THEORETICAL LIMITATIONS ON THE USE OF PARALLEL MEMORIES

Henry David Shapiro, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1976

Regardless of the underlying machine architecture, independently addressable memory modules contribute significantly to program speed-ups on modern computers. Because of memory conflicts which arise while accessing data, actual program speed-ups are generally less than theoretically possible. Organizing the data of a computation so as to avoid memory conflicts is particularly difficult for data which can logically be viewed as two-dimensional. Several geometric and algebraic conditions are presented which determine if the data of a computation can be organized to avoid memory conflicts. It is shown that a prime number of memory modules gives higher memory utilization and allows the use of simpler storage schemes than a power of two number of memory modules. The case of greatest practical significance, references to rows, columns and diagonals of a matrix, is given special attention. Finally, a brief discussion is presented which relates this research to that of a companion problem, the construction of memory-processor connection networks for single-instruction-multiple-data stream machines.

UIUCDCS-R-75-776

THEORETICAL LIMITATIONS ON THE USE OF PARALLEL MEMORIES

by

Henry David Shapiro
B.A., Johns Hopkins University, 1968
M.S., Stanford University, 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This work was supported in part by the National Science Foundation under grant no. NSF GJ 41538 and was submitted in partial fulfillment for the Doctor of Philosophy degree in Computer Science, 1975.
ACKNOWLEDGMENTS

Special thanks are due to the chairman of my doctoral committee, Professor C. L. Liu, for his constant encouragement and guidance throughout this research. Appreciation is also extended to the other members of the committee, Professors David Kuck, Duncan Lawrie, Judith Liebman, and Franco Preparata, for their many constructive suggestions. Professor Preparata deserves a special note of gratitude for his comments, which helped to simplify the proof of Theorem 5. Sincere appreciation is also extended to two fellow graduate students: Bruce Link, for his sustained interest in this research as well as his participation in discussions of these results; and Brian Hansche, for our discussions of the conjectures presented in Chapter 4.

To Mr. Stanley Zundo of the Computer Science drafting department goes a note of thanks for his assistance in the preparation of the numerous figures included in this dissertation. Thanks also go to Mrs. Connie Slovak for an outstanding job in the typing of this manuscript.

Recognition for the financial support which made this research possible is due to both a National Science Foundation Graduate Fellowship and NSF Grant GJ 41538.

Finally, a note of thanks to my wife, Jacqueline, who, more than anyone else, encouraged and cheered me when this work progressed slowly.

TABLE OF CONTENTS

1. THE DATA ORGANIZATION PROBLEM
   1.1 Machine Models and the Data Organization Problem
   1.2 Formalization of the Problem
   1.3 Elimination of Boundary Conditions
   1.4 Classes of Skewing Schemes and Some Particular Generalized Lines
   1.5 Summary
2. DETERMINATION OF VALID SKEWING SCHEMES
   2.1 Introduction
   2.2 The Basic Result
   2.3 Existence and Construction of Valid Skewing Schemes when the Number of Memory Modules Equals the Length of the Generalized Line
   2.4 Existence and Construction of Valid Periodic Skewing Schemes
3. SPECIAL RESULTS ON [x,y]-LINES
   3.1 Introduction
   3.2 Preliminaries
   3.3 The Special Case of a Prime Number of Memory Modules
   3.4 Generalization to Composite N
   3.5 Further Results and Examples
4. UNRESOLVED PROBLEMS AND DIRECTIONS OF FURTHER RESEARCH
   4.1 The Effectiveness of Linear and Periodic Skewing Schemes
   4.2 Questions Relating to Memory Utilization
   4.3 Comments on Broader Problems
LIST OF REFERENCES
VITA

LIST OF FIGURES

1 Multi-function Computer
2 Data Needed for the Evaluation of the Function, F
3 Parallel Computer
4 Geometric Realization of a Generalized Line
5 The Instance of the $[x,y]_N$-line whose First Component is (i,j)
6 Containment Relations Between Classes of Skewing Schemes
7 Instances of a Generalized Line, with their Designated Elements Marked with Asterisks
8 Checking the Condition in Theorem 4
9 Pictorial Presentation of the Proof of Theorem 4
10 Pictorial Presentation of the Proof of Theorem 4
11 Tessellation of the Plane by a Generalized Line
12 The Generalized Line L = ((0,0), (0,1), (0,2), (1,1), (2,1)) Cannot Tessellate the Plane
13 The Skewing Scheme Resulting From the Use of Theorem 5
14 Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,0), (1,1), (2,0), (2,1))
15 Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,0), (2,0), (2,1), (2,2))
16 One Possible Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,-1), (1,0), (1,1), (2,0))
17 Another Possible Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,-1), (1,0), (1,1), (2,0))
18 The "Wrap Around" Interpretation of a Generalized Line Used with Periodic Skewing Schemes
19 A Valid Linear Skewing Scheme for the Generalized Line, L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))
20 Proof of the Existence of a Periodic Skewing Scheme for L = ((0,0), (0,1), (2,1), (2,2))
21 [x,y]-lines on the Torus
22 Programmer's View of STARAN Memory
23 The Periodic Skewing Scheme Used in the STARAN Computer
24 Positioning Four Instances of the Generalized Line ((0,0), (1,0), (1,1), (2,0), (2,1)), so their Designated Elements Form a Parallelogram
25 Alternate Positionings of Instances for the Generalized Line ((0,0), (1,0), (1,1), (1,2), (2,2))
26 A Non-periodic Skewing Scheme, $\psi$, for which $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$ is not Valid
27 Examples of Translating by (p,q) and/or (r,s), so that all the Components of the Instance of the Generalized Line Lie Interior to the Parallelogram
28 Components of an Instance of a Generalized Line, after Translation by (p,q) and/or (r,s), which are Stored in Same Memory Module
29 An Example of a Polyomino for which There is a Valid Periodic Skewing Scheme, but no Valid Linear Skewing Scheme
30 Covers for the Generalized Line ((0,0), (0,1), (0,2), (1,1), (2,1)) which Tessellate the Plane

1. THE DATA ORGANIZATION PROBLEM

1.1 Machine Models and the Data Organization Problem

In the late 1950's and early 1960's computer architects began to explore the possibility of increasing the speed at which existing computers operated by performing some internal operations simultaneously. The overlapping of memory fetches with instruction decoding and execution was the basis of the increase in speed of several machines. A difficulty imposed by the hardware technology of that day was that the rate at which the control unit and arithmetic processor could manipulate data was higher than the rate at which a single memory unit could supply the data. Despite many changes in memory and circuit technology over the past twenty years, the inability of a single memory unit to satisfy the data demands of the central processor has not changed. It appears that this situation will persist in the foreseeable future.
Because of the relatively slow memory data rate, primary memory on most modern computers consists of several independent memory modules. Since memory fetches can go on simultaneously in different memory modules, the rate at which the memory can supply data to the central processor is effectively increased. A very successful machine designed on these general principles was the CDC 6600 [15]. Figure 1 depicts a block diagram of a computer which may be regarded as an abstraction of this machine. The designers of the CDC 6600 realized that effective use of the potentially highly overlapped functioning of their computer depended on reducing data dependencies in computations and on storing the data requested by the control unit so that data elements demanded in quick succession were in different memory modules.

[Figure 1: Multi-function Computer. Components shown: control unit; function units 1 through k with operand registers; bus; general registers; primary memory consisting of memory modules 1 through M.]

The problem of detecting and reducing data dependencies in a computation has received much serious attention in the literature. The question of how to organize the data so that data elements needed in quick succession by the central processor are not in the same memory module was, for a long time, generally ignored. The designers of the CDC 6600 tried to lessen contention for memory by arranging primary memory so that references to consecutively numbered memory locations cycled through all thirty-two memory modules before repeating. This scheme eliminates memory conflicts in the most common cases: fetching sequential instructions and manipulating the data of one-dimensional arrays stored in consecutive memory locations. The problem of organizing the data of a two-dimensional array, so that data requested in quick succession are in different memory modules, was left to the programmer and/or the compiler. To make this problem more explicit, consider the following specific example.
A FORTRAN programmer writes

      DIMENSION A(N,N)
      A(I,J)=F(A(I-1,J-1),A(I-1,J),A(I-1,J+1),A(I,J),
     1         A(I+1,J-1),A(I+1,J),A(I+1,J+1))

Normally the programmer envisions the memory allocated to array A as actually being two-dimensional, leaving it to the compiler to convert the doubly subscripted references to real machine addresses. Fetching the parameters for the function call can be thought of as fetching, in rapid succession, the data enclosed by the outline in Figure 2. Depending on the dimensions of the array A and the method of data organization employed by the FORTRAN compiler, some of the seven parameters needed by the function F may lie in the same memory module. If such a memory conflict occurs, the fetching of the data, and the overall computation, will be slowed. In general, organizing the data of a two-dimensional array so that memory conflicts are reduced or eliminated is a very difficult problem.

Another machine design in which this same type of problem arises is depicted in Figure 3. This is a single-instruction-multiple-data stream (SIMD) machine, an abstraction of ILLIAC IV. In many computations the goal is to fetch M words of data in parallel and then operate on them simultaneously. If even two of the data words to be fetched are in the same memory module, all the processors may have to sit idle while a second memory cycle is initiated. This can affect performance dramatically. Because memory conflicts can seriously degrade performance in machines of this design, organizing the data of a computation so that memory conflicts are avoided can be very important.

The purpose of this thesis is to develop some mathematical conditions which determine if the data of a two-dimensional array can be stored in a primary memory consisting of independent memory modules, so that during a given computation the data requested by the control unit and/or arithmetic processors can be fetched without memory conflicts. In Chapter 1 preliminaries are considered.
Chapter 2 provides a general discussion, while Chapter 3 focuses attention on some special cases of importance in practice. Chapter 4 informally presents some techniques which have promise in practice, even though a complete theoretical analysis of them has not yet been carried out.

[Figure 2: Data Needed for the Evaluation of the Function, F. The figure shows the matrix elements $a_{i+r,\,j+s}$ for $-3 \le r \le 3$ and $-2 \le s \le 2$.]

[Figure 3: Parallel Computer. Components shown: arithmetic processors, control unit, memory-processor connection network, primary memory.]

1.2 Formalization of the Problem

In Section 1.1, the problem of organizing data in parallel memories to eliminate memory conflicts was developed from a historical perspective. To treat this problem mathematically it is convenient to provide a model of the computations that abstracts the situation sufficiently so that machine-dependent details are eliminated. The data for the computations are to be stored in a doubly subscripted array.

Definition 1: A generalized line, L, of length n, is an n-tuple of ordered pairs of integers, the first ordered pair of which is (0,0).

A generalized line can be thought of as a rigid template which, during the course of a computation, is positioned at various locations over the matrix of data. The data enclosed by the template are to be fetched for a computation. Returning to the programming example used in Section 1.1, the outline in Figure 2 is to be viewed as the template, and its positioning over the matrix of data elements is determined by the actual values of I and J during execution. The data enclosed by the outline need to be fetched before computation of the function F can proceed.
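The conflict described for the FORTRAN example can be made concrete with a small sketch. It assumes, hypothetically, that the compiler stores A in FORTRAN's column-major order with a leading dimension `LDA`, that A starts at a word address divisible by the number of modules, and that memory is interleaved as described for the CDC 6600, so word address w lies in module w mod 32. The value of `LDA` and this exact layout are illustrative choices, not details given in the text.

```python
from collections import Counter

# Illustrative assumptions (not taken from the thesis):
#   * A is stored column-major with leading dimension LDA, as FORTRAN does;
#   * A starts at an address that is a multiple of N_MODULES;
#   * word address w lies in module w % N_MODULES (low-order interleaving).
LDA = 32
N_MODULES = 32


def module_of(i: int, j: int, lda: int = LDA, n_modules: int = N_MODULES) -> int:
    """Module holding A(i, j) (1-based indices) under the assumptions above."""
    address = (i - 1) + (j - 1) * lda      # column-major word offset
    return address % n_modules


def stencil(i: int, j: int) -> list:
    """The seven arguments of F in A(I,J)=F(...), in the order written."""
    return [(i - 1, j - 1), (i - 1, j), (i - 1, j + 1), (i, j),
            (i + 1, j - 1), (i + 1, j), (i + 1, j + 1)]


def conflicts(i: int, j: int, lda: int = LDA, n_modules: int = N_MODULES) -> dict:
    """Map each over-subscribed module to the number of arguments it must supply."""
    counts = Counter(module_of(p, q, lda, n_modules) for p, q in stencil(i, j))
    return {m: c for m, c in counts.items() if c > 1}


print(conflicts(10, 10))          # {8: 3, 10: 3}
print(conflicts(10, 10, lda=35))  # {}
```

With `LDA = 32` every element of a row lands in the same module, so the three arguments from row I-1 collide, as do the three from row I+1. Padding the leading dimension to 35 happens to remove every conflict for this stencil, because the module number becomes $(i-1) + 3(j-1) \bmod 32$, a linear function of the subscripts up to a constant; a padding that works for one access pattern need not work for another.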
The actual generalized line is an n-tuple, for example $L_1 = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))$. This is clearly just a formal way of specifying a geometric template. The template can be built by placing unit squares on the plane so that the unit squares are centered at the points of the plane indicated by the ordered pairs of the n-tuple. Figure 4 demonstrates this construction for the generalized line $L_1$.

There are a few minor points that need clarification. First, the labeling of the points of the plane is not the method commonly used in elementary algebra. The first coordinate indicates the vertical direction, with down being positive, and the second coordinate indicates the horizontal direction, with right being positive. This labeling was chosen to reinforce the fact that the data are stored in a two-dimensional array; this method of labeling is often used for two-dimensional arrays in the literature. This labeling scheme also conforms to that of other authors [3,10].

A second minor point is that, technically, $L_1$ and $L_2 = ((0,0), (-1,-1), (1,-1), (-1,0), (1,0), (-1,1), (1,1))$ are different generalized lines. Their realization by unit squares, however, gives the same geometric shape. A formal definition of equivalent generalized lines could be given; intuitively, two generalized lines are equivalent if they realize the same shape, without rotations or reflections.

A third point is that the geometric realization of a generalized line need not be a connected figure.

As has been pointed out, a generalized line can be viewed as a template. During the execution of a program this template will be positioned over the matrix of data, and the data elements enclosed by it will be referenced in parallel (or in quick succession, depending on the nature of the machine).
The positioning of a template can be viewed as the intuitive interpretation of the following definition.

[Figure 4: Geometric Realization of a Generalized Line. The figure shows the geometric realization of the generalized line $L_1 = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))$; the ordered pairs $(0,0)$ through $(2,2)$ indicate the labeling of points in the plane.]

Definition 2: An instance of a generalized line, L, is the ordered n-tuple, $L(a,b)$, resulting from the addition of the ordered pair $(a,b)$ to each component of the generalized line.

The positioning of the outline in Figure 2 corresponds to $L_2(i,j)$. If the equivalent generalized line $L_1$ (see Figure 4) is used, then this same collection of data elements corresponds to $L_1(i-1,j-1)$.

It is reasonable to consider a version of FORTRAN designed for machines with parallel functioning. The program segment of Section 1.1 might become

      TEMPLATE L=((0,0),(0,1),(0,2),(1,1),(2,0),(2,1),(2,2))
      DIMENSION A(N,N)
      A(I,J)=F(L(I-1,J-1) OF A)

With these definitions it is now possible to give a precise statement of the data organization problem.

Problem: Given a large matrix of data, and given that an algorithm requires, at various stages in its computation, the data elements contained in many instances of one or more generalized lines, is it possible to assign the data elements of the matrix to various memory modules so that, when the data elements of an instance of a generalized line are demanded, all the data elements lie in different memory modules?
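Definitions 1 and 2 translate directly into code. The sketch below is only an illustration: it represents a generalized line as a Python tuple of (row, column) pairs in the document's coordinate convention (first coordinate downward, second to the right) and checks that $L_2(i,j)$ and $L_1(i-1,j-1)$ cover the same data elements. The helper `same_shape` is hypothetical; it stands in for the informal notion of equivalent generalized lines (same shape up to translation, no rotations or reflections).

```python
from typing import Tuple

Point = Tuple[int, int]              # (row offset, column offset); rows grow downward
GeneralizedLine = Tuple[Point, ...]

L1: GeneralizedLine = ((0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2))
L2: GeneralizedLine = ((0, 0), (-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (1, 1))


def is_generalized_line(L: GeneralizedLine) -> bool:
    """Definition 1: a non-empty tuple of integer pairs whose first pair is (0, 0)."""
    return len(L) > 0 and L[0] == (0, 0)


def instance(L: GeneralizedLine, a: int, b: int) -> GeneralizedLine:
    """Definition 2: the n-tuple L(a, b) obtained by adding (a, b) to every component."""
    return tuple((r + a, c + b) for r, c in L)


def same_shape(L: GeneralizedLine, K: GeneralizedLine) -> bool:
    """Hypothetical helper: True if K, as a set of cells, is a translate of L."""
    def normalize(G: GeneralizedLine) -> frozenset:
        r0 = min(r for r, _ in G)
        c0 = min(c for _, c in G)
        return frozenset((r - r0, c - c0) for r, c in G)
    return normalize(L) == normalize(K)


i, j = 5, 7
print(instance(L1, i - 1, j - 1))
# ((4, 6), (4, 7), (4, 8), (5, 7), (6, 6), (6, 7), (6, 8))
print(set(instance(L2, i, j)) == set(instance(L1, i - 1, j - 1)))  # True
```

The comparison is made on sets because $L_1$ and $L_2$ list their components in different orders; as tuples the two instances differ even though they enclose the same data.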
The ability to store the data so that, when an instance of a generalized line is desired, all the data elements are in different memories is a goal in keeping with the functioning of machines designed along the lines of Figures 1 and 3. If it is possible to store the data so that no memory conflicts result when fetching instances of the generalized lines used by the algorithm, then in machines designed along the lines of Figure 1 contention for memory is lessened, and in machines designed along the lines of Figure 3 the data can be fetched in one memory cycle. This problem motivates the following definitions.

Definition 3: Given an $M \times M$ matrix and N independent memory modules, a skewing scheme is a mapping $\varphi: \{0,1,2,\ldots,M-1\} \times \{0,1,2,\ldots,M-1\} \to \{0,1,2,\ldots,N-1\}$, where $\varphi(i,j) = k$ means matrix element $a_{ij}$ is stored in memory module k.

Definition 4: Given a collection of generalized lines, $\{L_1, L_2, \ldots, L_p\}$, an $M \times M$ matrix of data, N memory modules, and a skewing scheme $\varphi$, the skewing scheme is said to be valid for this collection if and only if, given any instance of any of the generalized lines, $\varphi$ assigns the data elements of the instance which lie within the matrix bounds to distinct memory modules.

With these definitions the problem described earlier can be formalized as: Given an $M \times M$ matrix, N independent memory modules, and a collection of generalized lines used by an algorithm, is there a valid skewing scheme for this collection?

1.3 Elimination of Boundary Conditions

The reader may have noticed in the preceding section that the definition of a skewing scheme explicitly depends on M, the size of the matrix. From the point of view of the programmer this is unfortunate. The matrix size may vary from one run to the next. If a new skewing scheme were needed for each run, use of some programs might be difficult.
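Definitions 3 and 4 suggest a brute-force test, sketched below under the assumption that a skewing scheme is supplied as a Python function of (i, j). It enumerates every instance that touches the $M \times M$ matrix, keeps only the components inside the bounds, and checks that their modules are distinct. This exhaustive check is meant for small M only and is not a method proposed in the text; the two sample schemes are illustrative choices for the generalized line $L_1$ with N = 7.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[int, int]
GeneralizedLine = Tuple[Point, ...]
SkewingScheme = Callable[[int, int], int]

# The generalized line L_1 from Section 1.2.
L1: GeneralizedLine = ((0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2))


def instance(L: GeneralizedLine, a: int, b: int) -> GeneralizedLine:
    """Definition 2: add (a, b) to every component of L."""
    return tuple((r + a, c + b) for r, c in L)


def is_valid_on_matrix(phi: SkewingScheme, lines: Iterable[GeneralizedLine],
                       M: int, N: int) -> bool:
    """Brute-force test of Definition 4 on an M x M matrix with N modules."""
    # Definition 3: phi must send every matrix position to a module in 0..N-1.
    if any(not 0 <= phi(i, j) < N for i in range(M) for j in range(M)):
        return False
    for L in lines:
        rows = [r for r, _ in L]
        cols = [c for _, c in L]
        # An instance L(a, b) has a component inside the matrix only if
        # -max(rows) <= a <= M-1-min(rows), and similarly for b.
        for a in range(-max(rows), M - min(rows)):
            for b in range(-max(cols), M - min(cols)):
                modules = [phi(i, j) for i, j in instance(L, a, b)
                           if 0 <= i < M and 0 <= j < M]
                if len(modules) != len(set(modules)):
                    return False
    return True


def phi_good(i: int, j: int) -> int:
    return (i + 3 * j) % 7      # offsets of L1 go to 0, 3, 6, 4, 2, 5, 1


def phi_bad(i: int, j: int) -> int:
    return (i + j) % 7          # (0, 2) and (1, 1) both go to module 2


print(is_valid_on_matrix(phi_good, [L1], M=8, N=7))   # True
print(is_valid_on_matrix(phi_bad, [L1], M=8, N=7))    # False
```

Because only components inside the matrix are compared, a scheme that passes for some M also passes when restricted to any smaller matrix, which is the restriction property noted in the text.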
Notice, however, that if $\varphi$ is a valid skewing scheme for a collection of generalized lines on an $M \times M$ matrix, then $\varphi$ restricted to $\{0,1,2,\ldots,M'-1\} \times \{0,1,2,\ldots,M'-1\}$, $M' < M$, is also valid for this collection on an $M' \times M'$ matrix. The pragmatic consideration that M may not be known in advance, and can be large, leads to the search for valid skewing schemes with domain $\{0,1,2,\ldots\} \times \{0,1,2,\ldots\}$.

When skewing schemes with this domain are used, several benefits accrue. As noted above, such a skewing scheme can be used without prior knowledge of M. A secondary benefit is that special-case handling of instances which overlap the boundaries along the right and bottom of the $M \times M$ matrix can be simplified: zero or some other null value can be stored for data elements outside the actual array bounds. These practical considerations justify elimination of the boundaries along the right and bottom edges of the matrix, i.e., there are reasons to treat the matrix as infinite in size. It is also possible to show mathematically that widening the domain of skewing schemes to the quarter plane does not result in any loss of generality; that is, those collections for which valid skewing schemes exist on all finite domains have valid skewing schemes on the quarter plane.

Theorem 1: Given a collection of generalized lines, $\{L_1, L_2, \ldots, L_p\}$, and N, the number of memory modules, if for each M there exists a valid skewing scheme for this collection, $\varphi_M: \{0,1,2,\ldots,M-1\} \times \{0,1,2,\ldots,M-1\} \to \{0,1,2,\ldots,N-1\}$, then there exists a valid skewing scheme for this collection with domain $\{0,1,2,\ldots\} \times \{0,1,2,\ldots\}$.

Proof: The proof uses the König infinity lemma: if a rooted tree has infinitely many nodes, but each node has finitely many successors, then there is a path of infinite length in the tree [9]. We construct a rooted tree as follows.
The nodes at level i in the tree are the skewing schemes valid for this collection for an $i \times i$ matrix. Recall that for instances which overlap the boundaries of the matrix there must not be any memory conflicts for elements of the instance lying inside the matrix bounds. Also note that the one node at level 0, the root, is an artificial construct, since matrices of dimension zero have no data elements; i.e., one node at level 0 is created, by convention, to provide a root for the tree. A node at level i, $\varphi_i$, is connected to a node at level i+1, $\varphi_{i+1}$, if $\varphi_{i+1}$ restricted to $\{0,1,2,\ldots,i-1\} \times \{0,1,2,\ldots,i-1\}$ is just $\varphi_i$. (The node at level 0 is connected to all nodes at level 1, by convention.)

This construction produces a tree. To see this, note that every node at level i, i > 0, has a predecessor, for if $\varphi_i$ is a valid skewing scheme on the $i \times i$ matrix, then $\varphi_i$ restricted to $\{0,1,2,\ldots,i-2\} \times \{0,1,2,\ldots,i-2\}$ is a valid skewing scheme on the $(i-1) \times (i-1)$ matrix. Also note that each $\varphi_i$ has only one predecessor. Thus the construction yields a tree.

Next, notice that the tree has infinitely many nodes, since by assumption there is a valid skewing scheme for every M and, hence, at least one node at each level. Lastly, each node has only finitely many successors; in fact, a node at level i has exactly $N^{2i+1}$ possible candidates for successor nodes, since enlarging an $i \times i$ matrix to an $(i+1) \times (i+1)$ matrix adds $2i+1$ elements, each of which may be assigned to any of the N modules. Many of these candidates will presumably fail to be valid skewing schemes. Thus the König infinity lemma implies an infinite path in the tree. Let the nodes in this path be $\psi_0, \psi_1, \psi_2, \ldots$. Define $\varphi$ by $\varphi(i,j) = \psi_k(i,j)$, where $k > \max(i,j)$. Notice that $\varphi$ is a well-defined function, since if $k_2 > k_1$ then $\psi_{k_2}$ restricted to $\{0,1,2,\ldots,k_1-1\} \times \{0,1,2,\ldots,k_1-1\}$ is just $\psi_{k_1}$, by the way in which the tree was constructed. To see that $\varphi$ is valid for the collection $\{L_1, L_2, \ldots, L_p\}$, consider an arbitrary instance of one of the generalized lines.
Let k be selected sufficiently large so that this instance does not overlap the right or bottom boundaries of the $k \times k$ matrix. Now, since $\psi_k$ is valid for this collection on the $k \times k$ matrix, and $\varphi$ restricted to $\{0,1,2,\ldots,k-1\} \times \{0,1,2,\ldots,k-1\}$ equals $\psi_k$, all the data elements comprising the instance (except those which overlap the left and top boundaries) will be mapped to different memories by $\psi_k$, and hence by $\varphi$. Since the instance was arbitrary, $\varphi$ is a valid skewing scheme. ■

Theorem 1 shows that in searching for a valid skewing scheme, M can be ignored. As pointed out earlier, an additional benefit is that special handling at some matrix boundaries is eliminated. To extend this simplification to the left and top boundaries it is convenient to use $\{\ldots,-1,0,1,\ldots\} \times \{\ldots,-1,0,1,\ldots\}$ as the domain for skewing schemes. Unlike the earlier situation, this change cannot be justified on the practical grounds that the size of the matrix may not be known in advance. However, any difficulties at the left and top boundaries can be eliminated by the following theorem.

Theorem 2: Given a collection of generalized lines, $\{L_1, L_2, \ldots, L_p\}$, and N, the number of memory modules, if there exists a skewing scheme, $\varphi$, valid for this collection with domain $\{0,1,2,\ldots\} \times \{0,1,2,\ldots\}$, then there is a skewing scheme, $\varphi'$, valid for this collection with domain $\{\ldots,-1,0,1,\ldots\} \times \{\ldots,-1,0,1,\ldots\}$.

Proof: The proof is similar to the preceding proof, so only a few details are sketched. The main difference is that nodes at level i are chosen to represent skewing schemes valid for the collection of generalized lines with domain $\{-i,-i+1,\ldots,i-1,i\} \times \{-i,-i+1,\ldots,i-1,i\}$. The only new difficulty is to see that level i is not empty. This is so since $\varphi$ restricted to $\{0,1,2,\ldots,2i\} \times \{0,1,2,\ldots,2i\}$ is valid, so $\psi_i$ defined on $\{-i,-i+1,\ldots,i-1,i\} \times \{-i,-i+1,\ldots,i-1,i\}$ by $\psi_i(j,k) = \varphi(j+i,k+i)$ is also valid.
■

The content of Theorem 2 is that there is no loss of generality in considering only matrices of data that are infinite in all directions. This is particularly useful in formulating theoretical results, since proofs no longer need to account for any special conditions that might arise when only part of an instance lies inside the matrix bounds. Because of Theorem 2, only skewing schemes whose domain is the entire plane will be considered throughout the rest of this thesis.

1.4 Classes of Skewing Schemes and Some Particular Generalized Lines

Given a collection of generalized lines, it is desirable to have some conditions which determine whether or not a valid skewing scheme exists. In situations that arise in actual practice, the existence of such a valid skewing scheme is usually not sufficient. It is also highly desirable that $\varphi(i,j)$ be readily calculable, so that address computation does not overly degrade system performance. There are two approaches that can be taken. One is that $\varphi$ be represented by a simple mathematical formula. The second is to use some table look-up strategy.

Neither of these two methods of calculating $\varphi$ works well for arbitrary skewing schemes. In general, a closed-form mathematical expression for an arbitrary skewing scheme, $\varphi$, may not exist. Table look-up techniques will also not be of much help, since a large (theoretically infinite) table would have to be stored, and storing the table in memory so that memory conflicts are eliminated in obtaining information from the table is the same problem as storing the original matrix of data so that memory conflicts are eliminated. Because arbitrary skewing schemes cannot always be implemented conveniently, certain subclasses of the skewing schemes valid for the entire plane take on significance. Table look-up schemes motivate the following definition.

Definition 5: Given N, the number of memory modules, a skewing scheme, $\varphi$, is called periodic if and only if $\varphi(i,j) = \varphi(i+kN, j+\ell N)$ for $k, \ell = \ldots,-2,-1,0,1,2,\ldots$ and for any i and j.

If $\varphi$ is a periodic skewing scheme, then $\varphi(i,j) = \varphi(i \bmod N, j \bmod N)$. Therefore, to calculate the value of $\varphi$ at any point in the plane it is only necessary to know $\varphi$ on $\{0,1,\ldots,N-1\} \times \{0,1,\ldots,N-1\}$. If N is sufficiently small, periodic skewing schemes can be implemented by table look-up at reasonable cost. The needed values of $\varphi$ can be stored in a specially designed super-fast memory. As N becomes large, and especially in machines designed along the lines suggested by Figure 3, where each arithmetic unit may require a private copy of the basic $N \times N$ storage map, periodic skewing schemes realizable only by table look-up become unattractive. In such situations the first method suggested for computing $\varphi$, a simple mathematical formula, appears more reasonable.

One class of skewing schemes that has attracted attention in the literature is the class of linear skewing schemes.

Definition 6: Given N, the number of memory modules, a skewing scheme, $\varphi$, is called linear if and only if there exist constants a and b such that $\varphi(i,j) = ai + bj \bmod N$.

The class of linear skewing schemes is a subclass of the periodic skewing schemes, since if $\varphi$ is a linear skewing scheme, then $\varphi(i+kN, j+\ell N) = a(i+kN) + b(j+\ell N) \bmod N = ai + bj \bmod N = \varphi(i,j)$, and thus $\varphi$ is a periodic skewing scheme. That the linear skewing schemes are a subclass of the periodic skewing schemes shows that some periodic skewing schemes can be implemented without table look-up. There are other periodic skewing schemes which can also be implemented efficiently without table look-up; the one used in the STARAN computer will be mentioned in Chapter 3.

Budnik and Kuck [3] and Lawrie [10] have investigated linear skewing schemes in detail. Much of their work was motivated by considering machines designed as in Figure 3.
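The relationship between periodic and linear skewing schemes can be spot-checked numerically. The sketch below is illustrative and its function names are hypothetical: it builds a linear scheme per Definition 6, tests the periodicity condition of Definition 5 on a finite range of shifts, and re-implements the same scheme by table look-up on $\{0,\ldots,N-1\} \times \{0,\ldots,N-1\}$. A finite check can refute periodicity but cannot prove it; for linear schemes the algebraic argument above is what establishes it.

```python
from typing import Callable, List

SkewingScheme = Callable[[int, int], int]


def linear_scheme(a: int, b: int, N: int) -> SkewingScheme:
    """Definition 6: phi(i, j) = a*i + b*j mod N."""
    return lambda i, j: (a * i + b * j) % N


def looks_periodic(phi: SkewingScheme, N: int, span: int = 3) -> bool:
    """Spot-check Definition 5 for shifts k, l in [-span, span] (evidence, not proof)."""
    return all(phi(i + k * N, j + l * N) == phi(i, j)
               for i in range(N) for j in range(N)
               for k in range(-span, span + 1) for l in range(-span, span + 1))


def from_table(table: List[List[int]]) -> SkewingScheme:
    """A periodic scheme stored as its N x N storage map.

    Python's % returns a value in {0, ..., N-1} for N > 0 even when i or j is
    negative, so this equals phi(i mod N, j mod N) on the whole plane."""
    N = len(table)
    return lambda i, j: table[i % N][j % N]


N = 5
phi = linear_scheme(1, 2, N)
table = [[phi(i, j) for j in range(N)] for i in range(N)]
psi = from_table(table)

print(looks_periodic(phi, N))                                                      # True
print(all(psi(i, j) == phi(i, j) for i in range(-12, 12) for j in range(-12, 12)))  # True
print(looks_periodic(lambda i, j: (i // N + j) % N, N))                            # False
```

The last line shows a scheme that is not periodic: shifting i by N changes `i // N`, so no N x N table can reproduce it.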
After investigating the data requirements of programs written for similar machines, and after discussions with numerical analysts, they generally focused their attention on some commonly used generalized lines. In particular, the generalized lines consisting of N consecutive elements of a matrix row (the generalized line $R = ((0,0), (0,1), (0,2), \ldots, (0,N-1))$), N consecutive elements of a matrix column (the generalized line $C = ((0,0), (1,0), (2,0), \ldots, (N-1,0))$), N consecutive elements of a forward diagonal (the generalized line $D = ((0,0), (1,1), (2,2), \ldots, (N-1,N-1))$), N consecutive elements of a backward diagonal (the generalized line $B = ((0,0), (1,-1), (2,-2), \ldots, (N-1,-N+1))$), and, when N is a perfect square, $\sqrt{N} \times \sqrt{N}$ blocks (the generalized line $S = ((0,0), (0,1), \ldots, (0,\sqrt{N}-1), (1,0), (1,1), \ldots, (1,\sqrt{N}-1), \ldots, (\sqrt{N}-1,0), (\sqrt{N}-1,1), \ldots, (\sqrt{N}-1,\sqrt{N}-1))$) were of primary concern.

One of the main results of Budnik and Kuck [3] is that if $2 \mid N$ or $3 \mid N$, where N is the number of memories, then there is no valid linear skewing scheme for the collection of generalized lines $\{R, C, D, B\}$. In order to generalize this result and to provide a reasonable notation, a definition is useful.

Definition 7: An $[x,y]_N$-line is a generalized line of the form $((0,0), (y,x), (2y,2x), \ldots, ((N-1)y,(N-1)x))$.

Pictorially, the template for an $[x,y]_N$-line is formed by starting at the origin and going over x and down y, until a total of N points are generated (see Figure 5). In this notation the generalized line representing N consecutive elements of a row is the $[1,0]_N$-line, the generalized line representing N consecutive elements of a column is the $[0,1]_N$-line, and so on. The following result can be found in Budnik and Kuck [3], though different notation is used.
Theorem 3: Given N memory modules and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-line} \mid i = 1,2,\ldots,I\}$, $\varphi(c,d) = ac + bd \bmod N$ is a valid linear skewing scheme for this collection if and only if $(ay_i + bx_i, N) = 1$ for $i = 1,2,\ldots,I$.

Note that N is the length of the $[x,y]_N$-line, and $(c,d)$ denotes the greatest common divisor of c and d.

[Figure 5: The Instance of the $[x,y]_N$-line whose First Component is $(i,j)$.]

Proof: Suppose first that $\varphi(i,j) = ai + bj \bmod N$ is a valid skewing scheme for this collection. Take an arbitrary $[x,y]_N$-line from the collection, say $L = [x_r,y_r]_N$-line. Since $\varphi$ is valid, the N elements in the instance $L(0,0)$ must be mapped by $\varphi$ to different memory modules. Since this instance is just $((0,0), (y_r,x_r), \ldots, ((N-1)y_r,(N-1)x_r))$, it must be the case that $\varphi(0,0) \ne \varphi(vy_r, vx_r)$ for $v = 1,2,\ldots,N-1$. Thus $a \cdot 0 + b \cdot 0 \bmod N \ne avy_r + bvx_r \bmod N$ for $v = 1,2,\ldots,N-1$. But this is just $v(ay_r + bx_r) \not\equiv 0 \pmod N$ for $v = 1,2,\ldots,N-1$, and it is a well-known result of elementary number theory that this implies $(ay_r + bx_r, N) = 1$ [7]. (If $g = (ay_r + bx_r, N) > 1$, then $v = N/g$ would give $v(ay_r + bx_r) \equiv 0 \pmod N$.)

Conversely, suppose there exist a and b such that $(ay_i + bx_i, N) = 1$ for $i = 1,2,\ldots,I$. To show that $\varphi(i,j) = ai + bj \bmod N$ is valid for the collection, consider an arbitrary instance of an arbitrary $[x,y]_N$-line in the collection. It suffices to show that the N elements in this instance are mapped to different memory modules. For definiteness, consider $L = [x_r,y_r]_N$-line and the instance $L(i,j)$. Since the choice of line and the choice of instance were made arbitrarily, if $\varphi$ maps all the components of $((i,j), (i+y_r, j+x_r), \ldots, (i+(N-1)y_r, j+(N-1)x_r))$ to different memory modules, then $\varphi$ will be valid. But if $\varphi(i+vy_r, j+vx_r) = \varphi(i+v'y_r, j+v'x_r)$ for some $v, v' \in \{0,1,2,\ldots,N-1\}$ with $v \ne v'$, then $ai + avy_r + bj + bvx_r \equiv ai + av'y_r + bj + bv'x_r \pmod N$, which implies $(v-v')(ay_r + bx_r) \equiv 0 \pmod N$. Since $0 < |v-v'| < N$, we have $v - v' \not\equiv 0 \pmod N$, and this contradicts the assumption that $(ay_i + bx_i, N) = 1$ for $i = 1,2,\ldots,I$.
■

The result of Budnik and Kuck mentioned earlier follows immediately. The collection of generalized lines they consider is just $\{[1,0]_N\text{-line}, [0,1]_N\text{-line}, [1,1]_N\text{-line}, [1,-1]_N\text{-line}\}$, for which the quantities $ay_i + bx_i$ are b, a, a+b, and −a+b. One of a, b, and a+b is always divisible by 2, so if $2 \mid N$ no choice of a and b satisfies the conditions of the theorem. Similarly, one of a, b, a+b, and −a+b is always divisible by 3, so if $3 \mid N$ there is again no valid linear skewing scheme.

Lawrie [10] points out that in computers designed along the lines of Figure 3, solving the data organization problem is insufficient in practice. To use such a computer in a reasonable way, an efficient memory-processor connection network, which can route the data to the appropriate arithmetic unit, is also needed. In providing a data organization scheme and a connection network, Lawrie uses $P = 2^{2q}$ processors and $N = 2P$ memories. Within this framework he found that, by use of linear skewing, he could fetch any P consecutive elements of any row, column, forward diagonal, or backward diagonal, or any $\sqrt{P} \times \sqrt{P}$ block, without memory conflicts. (Take $a = \sqrt{P} + 1$ and $b = 2$.) This does not violate Theorem 3, since the number of memories, N, is not equal to the length of the lines, P. In addition to providing a skewing scheme, Lawrie also designed a network, the Ω-network, which routes the data to the appropriate processors in $O(\log P)$ time. F. Yao [16] has shown that the Ω-network is optimal.

Lawrie left unanswered the question of whether some non-linear skewing scheme could achieve the same conflict-free access while reducing the number of memories, N, to the number of processors, $P = 2^{2q}$. The restriction that the number of processors be a power of two was kept both so that arithmetic mod N could be performed rapidly by shifting and in the hope that the Ω-network, or some slight modification of it, would still be able to align the data.
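The criterion of Theorem 3 and the two applications above can be checked mechanically. The Python sketch below is an illustration, not code from the thesis, and all function names are hypothetical. A generalized line is represented as a list of (row, column) offsets; `xy_line` builds the $[x,y]_N$-line of Definition 7; `linear_valid` tests $\varphi(c,d) = ac + bd \bmod N$ directly on the instance at the origin (for a linear scheme every other instance is a translate whose module numbers are shifted by a constant mod N, so this suffices); and `theorem3_valid` applies the gcd test.

```python
from math import gcd, isqrt

# Rows, columns, forward and backward diagonals as (x, y) pairs of [x,y]_N-lines.
RCDB = [(1, 0), (0, 1), (1, 1), (1, -1)]


def xy_line(x, y, n):
    """Template of the [x,y]_n-line: ((0,0), (y,x), (2y,2x), ..., ((n-1)y, (n-1)x))."""
    return [(v * y, v * x) for v in range(n)]


def linear_valid(a, b, template, n):
    """Direct check that phi(c,d) = a*c + b*d mod n sends the instance at the origin
    to pairwise distinct modules; for a linear phi this covers every instance."""
    modules = [(a * c + b * d) % n for (c, d) in template]
    return len(set(modules)) == len(modules)


def theorem3_valid(a, b, pairs, n):
    """Theorem 3: (a*y_i + b*x_i, n) = 1 for every [x_i,y_i]_n-line in the collection."""
    return all(gcd(a * y + b * x, n) == 1 for (x, y) in pairs)


def budnik_kuck_exists(n):
    """Is any linear scheme mod n valid for rows, columns and both diagonals?"""
    return any(theorem3_valid(a, b, RCDB, n) for a in range(n) for b in range(n))


def lawrie_conflict_free(p):
    """Lawrie's setting: p = 2^(2q) processors, 2p memories, a = sqrt(p) + 1, b = 2.
    Checks rows, columns and diagonals of length p, and sqrt(p) x sqrt(p) blocks."""
    r, n = isqrt(p), 2 * p
    a, b = r + 1, 2
    templates = [xy_line(x, y, p) for (x, y) in RCDB]
    templates.append([(i, j) for i in range(r) for j in range(r)])
    return all(linear_valid(a, b, t, n) for t in templates)


# None exists when 2 | N or 3 | N (Budnik and Kuck); otherwise a = 1, b = 2 works.
print([n for n in range(2, 20) if budnik_kuck_exists(n)])   # [5, 7, 11, 13, 17, 19]
print(lawrie_conflict_free(16))                               # True
```

Note that `lawrie_conflict_free` uses the direct check rather than the gcd test, because Lawrie's lines have length P while there are 2P memories, so Theorem 3 does not apply to them.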
In Chapter 3 of this thesis it will be shown that if the number of memories equals the number of processors, which in turn is a power of two, then no skewing scheme of any type is valid for the collection of generalized lines Lawrie considers.

Swanson [14] has also studied the problem of designing efficient memory-processor connection networks. In his construction the number of memories, N, equals the number of processors, P, and P is prime. For this case he has designed a network based on k-apart shifters which operates in $O(\sqrt{P})$ time, but uses very little hardware. Unfortunately, to align the data with the processors his network must be followed by a shift network, which requires an additional O( [...]

[...] $b_1+y_1) = \varphi(a_2+x_1, b_2+y_1) = k$.

In order to see that $\varphi$ is not a valid skewing scheme for L, consider the instance

$L(a_1+x_1-x_j, b_1+y_1-y_j) = ((a_1+x_1-x_j+x_1, b_1+y_1-y_j+y_1), (a_1+x_1-x_j+x_2, b_1+y_1-y_j+y_2), \ldots, (a_1+x_1-x_j+x_M, b_1+y_1-y_j+y_M))$.†

† From the definition of generalized line, $(x_1,y_1) = (0,0)$.

[Figure 8: Checking the condition in Theorem 4 for L = ((0,0), (0,1), (0,2), (1,1), (2,1)) and N = 7. The condition is tested for k = 3. Overlap occurs in two places, implying the skewing scheme is not valid. Only a small section of the plane is shown.]

The j-th component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is $(a_1+x_1-x_j+x_j, b_1+y_1-y_j+y_j) = (a_1+x_1, b_1+y_1)$, which is mapped by $\varphi$ into memory k. The i-th component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is $(a_1+x_1-x_j+x_i, b_1+y_1-y_j+y_i) = (a_2+x_j+x_1-x_j, b_2+y_j+y_1-y_j) = (a_2+x_1, b_2+y_1)$, where $(a_1+x_i, b_1+y_i) = (a_2+x_j, b_2+y_j)$ was used. Since $\varphi(a_2+x_1, b_2+y_1) = k$, the i-th component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is also mapped into memory k. Because two distinct components of this instance are mapped by $\varphi$ into memory k, $\varphi$ is not valid for L.
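The construction in this half of the proof is purely geometric and can be sketched in a few lines of Python. This is a hypothetical illustration: an instance is the list of cells $L(a,b) = ((a+x_1, b+y_1), \ldots, (a+x_M, b+y_M))$ with $(x_1,y_1) = (0,0)$, and the scheme $\varphi(c,d) = (c + d) \bmod 7$ used in the demonstration is an arbitrary choice, not the scheme drawn in Figure 8.

```python
L = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)]   # the generalized line of Figure 8


def instance(template, a, b):
    """Cells of the instance L(a,b) = ((a+x_1, b+y_1), ..., (a+x_M, b+y_M))."""
    return [(a + x, b + y) for (x, y) in template]


def witness_from_overlap(template, s1, s2):
    """Given distinct instances L(s1), L(s2) sharing a cell, where the i-th component
    of L(s1) equals the j-th component of L(s2), return the start of
    L(a_1 + x_1 - x_j, b_1 + y_1 - y_j) together with i and j (None if disjoint)."""
    inst1, inst2 = instance(template, *s1), instance(template, *s2)
    x1, y1 = template[0]
    for i, cell in enumerate(inst1):
        if cell in inst2:
            j = inst2.index(cell)
            xj, yj = template[j]
            return (s1[0] + x1 - xj, s1[1] + y1 - yj), i, j
    return None


def phi(c, d):
    """Arbitrary demonstration scheme (hypothetical)."""
    return (c + d) % 7


# L(0,2) and L(1,1) have designated elements (0,2) and (1,1), both in module 2,
# and share the cell (1,3).
start, i, j = witness_from_overlap(L, (0, 2), (1, 1))
w = instance(L, *start)
print(start, w[j], w[i], phi(*w[j]), phi(*w[i]))   # (0, 0) (0, 2) (1, 1) 2 2
```

The j-th and i-th components of the returned instance are the two designated elements, so they land in the same module, which is exactly the conflict the proof exhibits.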
Figure 9 provides an intuitive picture for this part of the proof.

Conversely, suppose $\varphi$ is not a valid skewing scheme for the generalized line L. Then there is an instance of the generalized line, $L(a,b) = ((a+x_1, b+y_1), (a+x_2, b+y_2), \ldots, (a+x_M, b+y_M))$, whose i-th and j-th components, $i \ne j$, are both mapped by $\varphi$ into memory k. Then the two instances $L(a+x_i-x_1, b+y_i-y_1)$ and $L(a+x_j-x_1, b+y_j-y_1)$ violate the condition of the theorem. Their designated elements are $(a+x_i-x_1+x_1, b+y_i-y_1+y_1) = (a+x_i, b+y_i)$ and $(a+x_j-x_1+x_1, b+y_j-y_1+y_1) = (a+x_j, b+y_j)$, respectively, which, being the i-th and j-th components of L(a,b), are mapped into memory k. Furthermore, these two instances have an element in common, since the j-th component of $L(a+x_i-x_1, b+y_i-y_1)$ and the i-th component of $L(a+x_j-x_1, b+y_j-y_1)$ are both $(a+x_i+x_j-x_1, b+y_i+y_j-y_1)$. Figure 10 provides an intuitive picture of the situation. ■

Several remarks are appropriate concerning this theorem. When given a collection of several generalized lines, this theorem can still be used to determine the validity of a skewing scheme by applying it to each generalized line of the collection individually, since the skewing scheme must be valid for each generalized line independently of its validity for the others in the collection. A second consideration is that this theorem [...]

[Figure 9: $L(a_1,b_1)$ and $L(a_2,b_2)$, whose designated elements are stored in memory module k, have an element in common; $L(a_1+x_1-x_j, b_1+y_1-y_j)$ has two components mapped into memory module k.]

[...] the condition that each generalized line tessellate the plane is clearly needed. An intuitive visualization of the condition on the $\Theta_i$ makes Theorem 6 easier to use, and may aid in understanding the proof.
Imagine that each tessellation of the plane, $T_i$, is performed using the rigid template determined by the generalized line $L_i$, and that each tessellation is drawn on a separate sheet of clear plastic. In addition, let the designated elements of the instances be marked by asterisks. Then $\Theta_i$ is the set of points marked with an asterisk on the sheet used for $T_i$, and the condition $\Theta_1 = \Theta_2 = \cdots = \Theta_p$ becomes: when the sheets of clear plastic are overlaid, the asterisks on all the sheets coincide.

[Figure 13: The skewing scheme resulting from the use of Theorem 5 for L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)). The heavy lines indicate that the skewing scheme is periodic.]

Proof of Theorem 6: Suppose that $\varphi$ is valid for this collection. Then the method of proof used in Theorem 5, applied to each generalized line $L_v$ separately, produces a tessellation $T_v$ using $L_v$. In addition, the tessellation that results from following the construction in that proof has $\Theta_v = \{(c,d) \mid \varphi(c,d) = 0\}$. Since this set does not depend on v, $\Theta_1 = \Theta_2 = \cdots = \Theta_p$.

To establish the converse, suppose tessellations $T_v$ using $L_v$, $v = 1, 2, \ldots, p$, exist, and $\Theta_1 = \Theta_2 = \cdots = \Theta_p$. Then define $\varphi$ by $\varphi(c,d) = k$, where (c,d) is the (k+1)-st component of the instance of $L_1$ in $T_1$ that contains it. This is exactly the construction employed in the proof of Theorem 5, so $\varphi$ is well defined and is a valid skewing scheme for $L_1$. It must also be shown that $\varphi$ is a valid skewing scheme for $L_2, L_3, \ldots, L_p$. Pick an arbitrary generalized line, say $L_v$; Theorem 4 can be applied to show that $\varphi$ is valid for $L_v$. In verifying that the condition of Theorem 4 holds when k = 0, the instances that must be examined for common elements are just those comprising the tessellation $T_v$, since the elements mapped into memory 0 are exactly $\Theta_1 = \Theta_v$. Now consider verifying that the condition of Theorem 4 holds for arbitrary k. As in the proof of Theorem 4, the set of instances that must be examined for common elements is obtained by shifting the tessellation. However, the amount of the shift, $(x_{k+1}-x_1, y_{k+1}-y_1)$, where $(x_{k+1}, y_{k+1})$ is the (k+1)-st component of $L_1$, is determined by the generalized line used to construct $\varphi$, even though the condition is being verified for the generalized line $L_v$. Since a rigid translation of a tessellation is a tessellation, and v and k were arbitrary, the theorem is proved. ■

A few examples may help illustrate some of the subtleties that can arise in using this theorem. Consider the generalized lines $L_1 = ((0,0), (1,0), (1,1), (2,0), (2,1))$, whose geometric realization is a P-shaped pentomino, and $L_2 = ((0,0), (1,0), (2,0), (2,1), (2,2))$, whose geometric realization is a V-shaped (corner) pentomino. Each of these generalized lines tessellates the plane separately, as Figures 14 and 15 show. Since these are the only tessellations possible, except for rigid shifts, it is clear that no tessellations exist for these two generalized lines which can be positioned so that their designated elements coincide. Thus there is no valid skewing scheme for the collection $\{L_1, L_2\}$ using only five memories.
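These examples can be partially checked by computer. The sketch below (hypothetical names, an illustration only) searches just the linear schemes $\varphi(c,d) = ac + bd \bmod 5$, testing each on the instance at the origin, which suffices for a linear scheme because every other instance is a translate. It cannot confirm the stronger claim above that no skewing scheme of any kind exists for $\{L_1, L_2\}$, but it is consistent with it, and it also finds linear schemes for $L_1$ together with the plus-shaped line $L_3 = ((0,0), (1,-1), (1,0), (1,1), (2,0))$ considered next.

```python
from itertools import product

L1 = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]    # P-shaped pentomino
L2 = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]    # V-shaped pentomino
L3 = [(0, 0), (1, -1), (1, 0), (1, 1), (2, 0)]   # plus sign


def linear_valid(a, b, template, n):
    """phi(c,d) = a*c + b*d mod n maps the instance at the origin to distinct modules."""
    modules = [(a * c + b * d) % n for (c, d) in template]
    return len(set(modules)) == len(modules)


def valid_linear_schemes(lines, n):
    """All (a, b) mod n whose linear scheme is valid for every line in the collection."""
    return [(a, b) for a, b in product(range(n), repeat=2)
            if all(linear_valid(a, b, t, n) for t in lines)]


print(valid_linear_schemes([L1, L2], 5))   # []
print(valid_linear_schemes([L1, L3], 5))   # [(1, 2), (2, 4), (3, 1), (4, 3)]
```

For $L_1$ alone only $b \equiv 2a$ works, while $L_2$ excludes exactly that choice, which is why the first search comes back empty.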
A possible question arises from contemplating this example: if another generalized line, L′, which produces the same geometric realization as $L_v$, is substituted for $L_v$ in the collection $\{L_1, L_2, \ldots, L_p\}$, is it possible that a valid skewing scheme will now exist where before there was none? The answer to this question can be important in practice, since it is convenient to think in geometric terms. Fortunately, the substitution described above does not affect the existence of a valid skewing scheme, since tessellations of the plane using L′ appear to the eye as rigid shifts of tessellations of the plane using $L_v$. Thus, when actually using this theorem to find valid skewing schemes, it is permissible to simply draw pictures, as has been done throughout, and to pick the designated elements arbitrarily.

Figures 14 and 16 illustrate another situation. Both the generalized line $L_1$, given earlier, and $L_3 = ((0,0), (1,-1), (1,0), (1,1), (2,0))$, whose geometric realization is a plus sign, tessellate the plane, and it is clear that when these tessellations are overlaid, their designated elements coincide. Thus Theorem 6 guarantees a skewing scheme using five memory modules that allows conflict-free access to all instances of either generalized line.

[Figure 14: Tessellation of the plane by the generalized line L = ((0,0), (1,0), (1,1), (2,0), (2,1)).]

[Figure 15: Tessellation of the plane by the generalized line L = ((0,0), (1,0), (2,0), (2,1), (2,2)).]

[Figure 16: One possible tessellation of the plane by the generalized line L = ((0,0), (1,−1), (1,0), (1,1), (2,0)).]

However, there is an alternative tessellation for $L_3$, given in Figure 17. If this tessellation had been used, along with the only tessellation for $L_1$, then the incorrect conclusion that there is no valid skewing scheme for the collection $\{L_1, L_3\}$ might have been drawn. The statement of Theorem 6 only requires the existence of some set of tessellations which also satisfies an additional condition. Thus, if more than one tessellation exists for some of the generalized lines in the collection, they must all be tried before concluding that no valid skewing scheme exists.

2.4 Existence and Construction of Valid Periodic Skewing Schemes

As was pointed out in Chapter 1, periodic skewing schemes are a valuable subset of all skewing schemes, since, for restricted values of N, a reasonable amount of additional hardware permits address computation by table look-up. For this reason it would be useful if the theorems in the last two sections could be specialized to determine the existence of valid periodic skewing schemes.

The essential idea of the needed alteration is depicted in Figures 13 and 18. Consider the memory storage map, infinite in both directions, defined by a periodic skewing scheme. If the plane is partitioned into N×N squares, where N is the number of memory modules, then the memory map defined by the skewing scheme is identical within each partition. The bold lines in Figure 13 illustrate this. The main point to be observed is that when an instance of a generalized line extends over one of the partitioning lines, instead of considering the instance to be as in Figure 18(a), it can be considered to be as in Figure 18(b).
[Figure 17: Another possible tessellation of the plane by the generalized line L = ((0,0), (1,−1), (1,0), (1,1), (2,0)).]

[Figure 18: The "wrap around" interpretation of a generalized line used with periodic skewing schemes, panels (a) and (b).]

This view is appropriate in determining the validity of a periodic skewing scheme, since the data elements of the instance that are on the left in Figure 18(b) are stored in the same memory modules as the data elements of the instance that extend beyond the partition line in Figure 18(a). This situation is referred to as "wrapping around," and it can occur over horizontal partitioning lines as well. Since no special properties are used when an instance of a generalized line wraps around, the opposite edges of the N×N square can be identified, resulting in a torus. The entire problem of finding valid periodic skewing schemes can thus be recast as the problem of finding valid skewing schemes on the torus formed by identifying opposite edges of the N×N square. Definition 9 and Theorems 4, 5, and 6 carry over to the torus with only minor modification.

Theorem 7: Given a generalized line of length M, N memory modules, and a periodic skewing scheme, the skewing scheme is valid for this generalized line if and only if the following condition holds for every $k \in \{0, 1, 2, \ldots, N-1\}$: when all instances of the generalized line on the torus (formed by identifying opposite edges of the N×N square) whose designated element is mapped into memory k are considered, no two of these instances have an element in common.
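The condition of Theorem 7 is easy to test by brute force. In the hypothetical Python sketch below, a periodic scheme is stored as an N×N table `phi`, instances are taken on the torus, the designated element of L(a,b) is taken to be its first component (a,b), as in the proof of Theorem 4, and the components of the generalized line are assumed to be distinct mod N (otherwise an instance folds onto itself on the torus). The sketch compares the condition with a direct check of validity on every linear table and on some random tables.

```python
import random
from itertools import combinations


def instance(template, a, b, n):
    """Cells of L(a,b) on the n x n torus; assumes template[0] == (0, 0)."""
    return [((a + x) % n, (b + y) % n) for (x, y) in template]


def valid_direct(phi, template, n):
    """Every instance on the torus is sent to pairwise distinct modules."""
    for a in range(n):
        for b in range(n):
            mods = [phi[c][d] for (c, d) in instance(template, a, b, n)]
            if len(set(mods)) < len(mods):
                return False
    return True


def theorem7_condition(phi, template, n):
    """For each module k, the instances whose designated element lies in
    module k are pairwise disjoint."""
    for k in range(n):
        cells = [set(instance(template, a, b, n))
                 for a in range(n) for b in range(n) if phi[a][b] == k]
        if any(s & t for s, t in combinations(cells, 2)):
            return False
    return True


L = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]   # the line of Figure 13
n = 7
tables = [[[(a * c + b * d) % n for d in range(n)] for c in range(n)]
          for a in range(n) for b in range(n)]                   # every linear scheme
rng = random.Random(0)
tables += [[[rng.randrange(n) for _ in range(n)] for _ in range(n)] for _ in range(20)]
print(all(valid_direct(t, L, n) == theorem7_condition(t, L, n) for t in tables))  # True
```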
Definition 10: A generalized line tessellates the torus (formed by identifying the opposite edges of an N×N square) if and only if there exists a collection of instances of the generalized line such that every ordered pair on the torus is in one and only one of these instances.

Theorem 8: Given N memory modules and a generalized line of length N, there is a valid periodic skewing scheme for this generalized line if and only if it tessellates the torus (formed by identifying opposite edges of the N×N square).

Theorem 9: Given N memory modules and a collection of generalized lines, $\{L_1, L_2, \ldots, L_p\}$, all of length N, there is a valid periodic skewing scheme for this collection if and only if there exist tessellations of the torus (formed by identifying opposite edges of the N×N square), $T_1, T_2, \ldots, T_p$, such that $T_i$ is a tessellation using $L_i$ and $\Theta_1 = \Theta_2 = \cdots = \Theta_p$, where $\Theta_i = \{\text{designated elements of the instances of } L_i \text{ comprising } T_i\}$.

The proofs of these theorems are identical to those of Theorems 4, 5, and 6, except that the arithmetic must be done in the residue classes mod N. When drawing pictures to determine the existence of valid periodic skewing schemes, only an N×N square is needed, as long as instances extending beyond the bounds of the square are wrapped around.

A few additional comments are in order before closing this section. In working on the torus, as in the plane, the realization of a generalized line, as a rigid template composed of unit squares, need not [...]

An interesting question which can be asked is: can the situation be restricted still further to determine the existence of valid linear skewing schemes? The answer to this question is not fully known. Notice that the skewing scheme given in Figure 13 is periodic, but not linear. However, a valid linear skewing scheme does exist for the generalized line of Figure 13. One such is illustrated in Figure 19.
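The construction behind Theorem 8, and the two examples around Figures 19 and 20, can be reproduced with a short search. In this hypothetical sketch, `scheme_from_tiling` assigns each cell the index of its component in the tessellation instance containing it (the construction used in the proofs of Theorems 5 and 8), and `find_tiling` searches exhaustively for a tessellation of the N×N torus, which is practical only for very small N. For the line of Figure 13 with N = 7 the tessellation is instead read off from a valid linear scheme, by taking the instances whose designated elements are mapped to module 0.

```python
from itertools import combinations, product


def instance(template, a, b, n):
    return [((a + x) % n, (b + y) % n) for (x, y) in template]


def scheme_from_tiling(template, starts, n):
    """phi(cell) = k if cell is the (k+1)st component of the tiling instance containing it."""
    phi = [[None] * n for _ in range(n)]
    for (a, b) in starts:
        for k, (c, d) in enumerate(instance(template, a, b, n)):
            if phi[c][d] is not None:
                raise ValueError("instances overlap: not a tessellation")
            phi[c][d] = k
    if any(v is None for row in phi for v in row):
        raise ValueError("torus not covered: not a tessellation")
    return phi


def find_tiling(template, n):
    """Exhaustive search for n instances that tile the n x n torus (small n only)."""
    cells = list(product(range(n), repeat=2))
    for starts in combinations(cells, n):
        covered = {cell for s in starts for cell in instance(template, *s, n)}
        if len(covered) == n * n:
            return list(starts)
    return None


def valid_direct(phi, template, n):
    return all(len({phi[c][d] for (c, d) in instance(template, a, b, n)}) == len(template)
               for a in range(n) for b in range(n))


def linear_schemes(template, n):
    return [(a, b) for a, b in product(range(n), repeat=2)
            if len({(a * c + b * d) % n for (c, d) in template}) == len(template)]


# Line of Figures 13 and 19, N = 7: a valid linear scheme exists, and its
# module-0 designated elements give a tessellation of the torus.
LI = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
a, b = linear_schemes(LI, 7)[0]
zeros = [(c, d) for c, d in product(range(7), repeat=2) if (a * c + b * d) % 7 == 0]
print(valid_direct(scheme_from_tiling(LI, zeros, 7), LI, 7))   # True

# Disconnected line of Figure 20, N = 4: a periodic scheme exists, but no linear one.
LD = [(0, 0), (0, 1), (2, 1), (2, 2)]
print(linear_schemes(LD, 4), valid_direct(scheme_from_tiling(LD, find_tiling(LD, 4), 4), LD, 4))
# [] True
```

The exhaustive search examines every set of N starting cells, so it is only a toy meant to make Theorem 8 concrete.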
Only the N×N square is shown in Figure 19, since linear skewing schemes are periodic, and thus only the N×N square with wrap-around need be considered. Other generalized lines, like $L = ((0,0), (0,1), (2,1), (2,2))$, whose geometric realization is a disconnected shape (two separate dominoes), provide examples of generalized lines for which valid periodic skewing schemes exist, but for which no valid linear skewing schemes exist. Figure 20 establishes the existence of a valid periodic skewing scheme. Trying all possibilities for a and b in $\varphi(i,j) = ai + bj \bmod N$, where, since N = 4, a and b need only run over 0, 1, 2, and 3, rules out the existence of valid linear skewing schemes. The general question of when a valid periodic skewing scheme for a collection of generalized lines implies a valid linear skewing scheme appears quite difficult. Chapter 3 investigates this question for $[x,y]_N$-lines. The general problem is discussed again in Chapter 4.

[Figure 19: A valid linear skewing scheme for the generalized line L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)); a = 1, N = 7.]

[Figure 20: Proof of the existence of a periodic skewing scheme for L = ((0,0), (0,1), (2,1), (2,2)).]

3. SPECIAL RESULTS ON $[x,y]_N$-LINES

3.1 Introduction

In Chapter 2, geometric conditions were developed which aid in determining whether valid skewing schemes exist for collections of generalized lines. In this chapter, only collections of $[x,y]_N$-lines are considered. The highly structured nature of $[x,y]_N$-lines permits additional results to be obtained. The main result, which will be proved over the course of the next several sections, is

Theorem 10: Given N memory modules and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, there is a valid periodic skewing scheme for the collection if and only if there is a valid linear skewing scheme for the collection.

The if direction is trivial. The only if direction, however, is a rather surprising result, since the number of valid periodic skewing schemes for a collection of generalized lines usually greatly exceeds the number of valid linear skewing schemes. The proof technique is a generalization of an argument used by Pólya for a restricted subcase [13].

3.2 Preliminaries

As was discussed in Chapter 2, when dealing with periodic skewing schemes it is convenient to replace the plane with the torus formed by identifying opposite edges of the N×N square. An instance of an $[x,y]_N$-line can now be viewed as having its first component at (i,j), with successive components located by going over x and down y on the torus, until a total of N points are generated. Figure 21 illustrates this construction. It might happen that in generating these N points two of them coincide. If this should occur, there can be no valid periodic skewing scheme using N memory modules for this $[x,y]_N$-line, since there are two distinct matrix elements, in the same instance of this $[x,y]_N$-line, which must be assigned to the same memory module by any periodic skewing scheme. The condition that must be imposed is simple.

Lemma 1: If $(x_i, y_i, N) \ne 1$, then there is no valid periodic skewing scheme using N memory modules for the generalized line L, the $[x_i,y_i]_N$-line.

Proof: Suppose $(x_i, y_i, N) = s > 1$. Then $L(c,d) = ((c,d), (c+y_i, d+x_i), \ldots, (c+\ell y_i, d+\ell x_i), \ldots, (c+(N-1)y_i, d+(N-1)x_i))$. Note that $x_i/s$, $y_i/s$, and $N/s$ are integers and that $N/s \le N-1$. If $\varphi$ is an arbitrary periodic skewing scheme, then $\varphi(c,d) = \varphi(c + N\frac{y_i}{s}, d + N\frac{x_i}{s}) = \varphi(c + \frac{N}{s}y_i, d + \frac{N}{s}x_i)$. Since (c,d) and $(c + \frac{N}{s}y_i, d + \frac{N}{s}x_i)$ are distinct components of the same instance of L, $\varphi$ is not valid.
■

Thus, given a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, if even one of the $[x_i,y_i]_N$-lines is such that $(x_i, y_i, N) \ne 1$,† then there are no valid periodic skewing schemes, using N memory modules, for this collection. The condition $(x_i, y_i, N) = 1$ for $i = 1, 2, \ldots, I$ is, therefore, a necessary condition for the existence of a valid periodic skewing scheme. It is not a sufficient condition, however.

† $(x_i, y_i, N)$ is the greatest common divisor of $x_i$, $y_i$, and N.

When $(x_i, y_i, N) = 1$ some simplifications are possible. In general, given an arbitrary generalized line L of length N, there are $N^2$ distinct instances of L on the torus. However, letting L be the $[x_i,y_i]_N$-line, on the torus the components of $L(c,d)$, $L(c+y_i, d+x_i)$, ..., and $L(c+(N-1)y_i, d+(N-1)x_i)$ are the same. The ordering of the components within these instances is different, but for the purpose of determining the validity of a periodic skewing scheme these N instances can be regarded as one instance. Thus, for $[x,y]_N$-lines with $(x,y,N) = 1$, the number of instances on the torus is effectively N instead of $N^2$. Throughout the rest of this chapter an $[x,y]_N$-line with $(x,y,N) = 1$ will be regarded as having only N instances on the torus. Notice that no two of the N instances have any elements in common, and since the torus has $N^2$ elements, every element on the torus is in one and only one instance.

[Figure 21: $[x,y]_N$-lines on the torus. An example of an instance of an $[x,y]_N$-line on the torus, with i = 1, j = 2, x = 3, y = 2; the order in which the elements were generated is indicated on the figure.]

It is possible to characterize these N instances.

Lemma 2: Given an $[x,y]_N$-line with $(x,y,N) = 1$, each of the N instances of the $[x,y]_N$-line can be characterized by an integer in $\{0, 1, 2, \ldots, N-1\}$ in the following manner: if (w,z) is a component of an instance of the $[x,y]_N$-line, characterize this instance by $xw - yz \bmod N$.
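Lemmas 1 and 2 can be checked numerically for small N. In the hypothetical sketch below, `xy_instance` generates an instance of the $[x,y]_N$-line on the torus by going over x and down y, `label` computes the characterization $xw - yz \bmod N$ of Lemma 2, and `preimage` follows the construction in the proof of Lemma 2 that comes next (extended Euclidean algorithm, then multiplication by the inverse of (x,y) mod N). `pow(g, -1, n)` requires Python 3.8 or later.

```python
from math import gcd


def xy_instance(x, y, n, i, j):
    """Instance of the [x,y]_n-line on the n x n torus with first component (i, j)."""
    return [((i + v * y) % n, (j + v * x) % n) for v in range(n)]


def distinct_instances(x, y, n):
    """Distinct cell sets among the n*n instances on the torus."""
    return {frozenset(xy_instance(x, y, n, i, j)) for i in range(n) for j in range(n)}


def label(x, y, n, cells):
    """Lemma 2: x*w - y*z mod n, the same for every component (w, z) of an instance."""
    values = {(x * w - y * z) % n for (w, z) in cells}
    assert len(values) == 1, "characterization is not well defined"
    return values.pop()


def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t


def preimage(x, y, n, i):
    """A cell (w, z) with x*w - y*z = i (mod n), as in the proof of Lemma 2."""
    g, c, t = ext_gcd(x, y)   # x*c + y*t = g, i.e. x*c - y*d = g with d = -t
    d = -t
    ginv = pow(g, -1, n)      # exists because (g, n) = (x, y, n) = 1
    return (c * ginv * i) % n, (d * ginv * i) % n


# Lemma 1: (2, 2, 4) = 2 > 1, so an instance of the [2,2]_4-line repeats a cell.
print(len(set(xy_instance(2, 2, 4, 0, 0))))                            # 2
# (3, 2, 7) = 1: exactly 7 distinct instances, labelled 0..6.
print(sorted(label(3, 2, 7, s) for s in distinct_instances(3, 2, 7)))  # [0, 1, 2, 3, 4, 5, 6]
```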
Proof: The proposed characterization is a function from the N instances of the $[x,y]_N$-line to $\{0, 1, 2, \ldots, N-1\}$. First it must be shown that it is well defined. Let (w,z) and (w′,z′) be different components of the same instance of the $[x,y]_N$-line. Then $w' = w + vy$ and $z' = z + vx$ for some v, and $xw' - yz' = xw + xvy - yz - yvx = xw - yz$, so the mapping is indeed a function. Additionally, the function is a one-to-one correspondence. By the pigeonhole principle, to see this it suffices to show that for any i there exist w and z such that $xw - yz \equiv i$.† It is a well-known result in number theory that, given x and y, there exist c and d such that $xc - yd = (x,y)$ [7]. Since $((x,y),N) = (x,y,N) = 1$ and the residue classes of numbers relatively prime to N form a group under multiplication, there exists g such that $(x,y) \cdot g \equiv 1$. Hence $xcg - ydg \equiv 1$, and thus $xcgi - ydgi \equiv i$. Letting $w = cgi \bmod N$ and $z = dgi \bmod N$ gives the needed w and z. ■

† Congruences are mod N, unless otherwise indicated.

3.3 The Special Case of a Prime Number of Memory Modules

Theorem 10 is easy to prove if N, the number of memory modules, is a prime.

Lemma 3: Given N memory modules, N a prime number, and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, there is a valid periodic skewing scheme for the collection if and only if there is a valid linear skewing scheme for the collection.

Proof: As remarked earlier, the if direction is trivial. If $\varphi$ is a valid periodic skewing scheme, then exactly N points on the torus formed by identifying opposite edges of the N×N square will be mapped by $\varphi$ into memory module zero. This is so because exactly one component of each of the N disjoint instances of any $[x_i,y_i]_N$-line in the collection must be mapped to zero by $\varphi$. Without loss of generality, the element (0,0) can be assumed to be one of these elements. Consider any other element mapped into memory zero, say the element (y,x).
Construct the a and b for the linear skewing scheme as follows: if $y \equiv 0$, then $a = 1$ and $b = 0$; else if $x \equiv 0$, then $a = 0$ and $b = 1$; else $b = 1$ and $a = -y^{-1}x$, where $y^{-1}$ is the multiplicative inverse of y in the field of residue classes mod N. (Note that in the last case $y \not\equiv 0$ and N is prime.) The claim is that $\varphi'(c,d) = ac + bd \bmod N$ is a valid linear skewing scheme. Suppose not, i.e., $\varphi'(c,d) = \varphi'(c+vy_i, d+vx_i)$ for some c, d, $i \in \{1, 2, \ldots, I\}$, and $v \in \{1, 2, \ldots, N-1\}$. This implies $avy_i + bvx_i \equiv 0$, and since N is prime and $v \not\equiv 0$, this last equation implies that $ay_i + bx_i \equiv 0$. Three cases will be examined to show that $ay_i + bx_i \equiv 0$ contradicts the validity of $\varphi$.

Case 1: $y \equiv 0$. Then $ay_i + bx_i \equiv 0$ reduces to $y_i \equiv 0$, since a = 1 and b = 0. Since, by Lemma 1, $(x_i, y_i, N) = 1$, we have $x_i \not\equiv 0$. It follows that $x_i^{-1}$ exists in the field of residue classes mod N, since N is prime and $x_i$ is non-zero. Thus $(y,x) = (0,x) = (0, xx_i^{-1}x_i) = (0 + xx_i^{-1}y_i, 0 + xx_i^{-1}x_i)$. Now (0,0) and $(0 + xx_i^{-1}y_i, 0 + xx_i^{-1}x_i)$ are distinct components of the same instance of the $[x_i,y_i]_N$-line, because $xx_i^{-1}$ is non-zero. They are both, however, mapped by $\varphi$ to memory zero, contradicting the validity of $\varphi$.

Case 2: $x \equiv 0$. Then $ay_i + bx_i \equiv 0$ reduces to $x_i \equiv 0$. Here $y_i \not\equiv 0$, and, in a manner similar to Case 1, $(y,x) = (y,0) = (yy_i^{-1}y_i, 0) = (0 + yy_i^{-1}y_i, 0 + yy_i^{-1}x_i)$. This is the same situation as the one encountered in Case 1.

Case 3: $x \not\equiv 0$, $y \not\equiv 0$. Then $ay_i + bx_i \equiv 0$ becomes $-y^{-1}xy_i + x_i \equiv 0$. Now $y_i \not\equiv 0$, for $y_i \equiv 0$ implies $x_i \equiv 0$, and then $(x_i, y_i, N) \ne 1$, contrary to the requirement established by Lemma 1. But $y_i \not\equiv 0$ implies that $y_i^{-1}$ exists in the field of residue classes mod N. Thus $x \equiv yy_i^{-1}x_i$ and $(y,x) = (yy_i^{-1}y_i, yy_i^{-1}x_i) = (0 + yy_i^{-1}y_i, 0 + yy_i^{-1}x_i)$, giving rise to the same contradiction as in Case 1.

Thus, the assumption that $\varphi'(c,d) = ac + bd \bmod N$ is not a valid linear skewing scheme leads to the conclusion that $\varphi$ is not a valid periodic skewing scheme, contrary to the hypothesis of the lemma.
■

It is now possible to characterize precisely when valid periodic skewing schemes exist for a collection of $[x,y]_N$-lines if the number of memory modules is a prime.

Lemma 4: Given N memory modules, N a prime number, and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, there exists a valid periodic skewing scheme for this collection if and only if $(x_i, y_i, N) = 1$ for $i = 1, 2, \ldots, I$, and either $I \le N$ or, for all non-zero choices of $g_i$, $i = 1, 2, \ldots, N+1$, and any permutation σ of $(1, 2, \ldots, I)$, it is not the case that

$$(x_{\sigma(1)}, y_{\sigma(1)}) \equiv (0, g_1),\; (x_{\sigma(2)}, y_{\sigma(2)}) \equiv (g_2, g_2),\; \ldots,\; (x_{\sigma(i)}, y_{\sigma(i)}) \equiv (g_i(i-1), g_i),\; \ldots,\; (x_{\sigma(N)}, y_{\sigma(N)}) \equiv (g_N(N-1), g_N),\; (x_{\sigma(N+1)}, y_{\sigma(N+1)}) \equiv (g_{N+1}, 0).$$

Proof: Lemma 3 shows that it is sufficient to prove that the conditions stated above are precisely those needed to guarantee the existence of valid linear skewing schemes for the collection of $[x,y]_N$-lines. The need for the condition $(x_i, y_i, N) = 1$ for $i = 1, 2, \ldots, I$ has already been discussed.

The only if direction: Suppose $I \ge N+1$ and there exist non-zero $g_i$ and a permutation σ so that $(x_{\sigma(1)}, y_{\sigma(1)}) \equiv (0, g_1)$, ..., $(x_{\sigma(N)}, y_{\sigma(N)}) \equiv (g_N(N-1), g_N)$, and $(x_{\sigma(N+1)}, y_{\sigma(N+1)}) \equiv (g_{N+1}, 0)$. It will be shown that for any a and b, $\varphi(c,d) = ac + bd \bmod N$ is not a valid skewing scheme.

Case 1: $b \equiv 0$. Then $ay_{\sigma(N+1)} + bx_{\sigma(N+1)} \equiv a \cdot 0 + 0 \cdot g_{N+1} \equiv 0$. Thus $(ay_{\sigma(N+1)} + bx_{\sigma(N+1)}, N) \ne 1$, and by Theorem 3, $\varphi$ is not a valid skewing scheme.

Case 2: $b \not\equiv 0$. Because N is a prime number, $(ay_i + bx_i, N) = 1$ for all i if and only if $ay_i + bx_i \not\equiv 0$ for all i. Also, $ay_i + bx_i \not\equiv 0$ for all i if and only if $ab^{-1}y_i + x_i \not\equiv 0$ for all i. Now $ab^{-1} \equiv j$ for some $j \in \{0, 1, 2, \ldots, N-1\}$. Let k be the element of $\{0, 1, \ldots, N-1\}$ with $k \equiv -j$. Then $ab^{-1}y_{\sigma(k+1)} + x_{\sigma(k+1)} \equiv jg_{k+1} + kg_{k+1} \equiv jg_{k+1} - jg_{k+1} \equiv 0$, where $(x_{\sigma(k+1)}, y_{\sigma(k+1)}) \equiv (kg_{k+1}, g_{k+1})$ was used. Thus $ay_i + bx_i \equiv 0$ for some i, and Theorem 3 implies $\varphi$ is not valid.
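For small primes the criterion of Lemma 4 can be compared with an exhaustive application of Theorem 3. In the hypothetical sketch below, `direction` maps (x, y) to the index j with $(x,y) \equiv (gj, g)$, or to "inf" when $(x,y) \equiv (g, 0)$, so the configuration excluded by Lemma 4 is exactly the one in which all N + 1 such values occur; `lemma4_scheme` makes the choice of (a, b) used in the if direction of the proof, which follows. `pow(y, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations, product
from math import gcd


def direction(x, y, p):
    """j if (x, y) = (g*j, g) mod p for some g != 0; "inf" if (x, y) = (g, 0)."""
    x, y = x % p, y % p
    if y == 0:
        return "inf"
    return (x * pow(y, -1, p)) % p


def linear_exists(pairs, p):
    """Theorem 3 applied to every (a, b) mod p."""
    return any(all(gcd(a * y + b * x, p) == 1 for (x, y) in pairs)
               for a, b in product(range(p), repeat=2))


def lemma4_predicts(pairs, p):
    """(x_i, y_i, p) = 1 for all i, and not all p + 1 directions are present."""
    if any(x % p == 0 and y % p == 0 for (x, y) in pairs):
        return False
    return len({direction(x, y, p) for (x, y) in pairs}) < p + 1


def lemma4_scheme(pairs, p):
    """(a, b) = (-j, 1) for a missing direction j, else (1, 0) if "inf" is missing."""
    used = {direction(x, y, p) for (x, y) in pairs}
    for j in range(p):
        if j not in used:
            return (-j) % p, 1
    return (1, 0) if "inf" not in used else None


# Rows, columns and both diagonals with p = 5 use only 4 of the 6 directions.
RCDB = [(1, 0), (0, 1), (1, 1), (1, -1)]
print(lemma4_predicts(RCDB, 5), linear_exists(RCDB, 5), lemma4_scheme(RCDB, 5))
# True True (3, 1)
```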
The if direction: Suppose $(x_i, y_i, N) = 1$ for $i = 1, 2, \ldots, I$, and suppose, for the moment, that $I \ge N+1$ but that, for all non-zero choices of the $g_i$ and any permutation σ of $(1, 2, \ldots, I)$, the congruences in the statement of Lemma 4 do not all hold. A valid linear skewing scheme will be defined for the collection of $[x,y]_N$-lines.

Case 1: Suppose that there exists $j \in \{0, 1, 2, \ldots, N-1\}$ such that $(x_i, y_i) \not\equiv (g_{j+1}j, g_{j+1})$ for all i and all non-zero choices of $g_{j+1}$; that is, the reason a permutation satisfying the congruences of Lemma 4 cannot be constructed is that there is no possible choice for σ(j+1). (Note that when attempting to construct such permutations, $(x_i, y_i)$ can be congruent to only one of $(0, g_1), (g_2, g_2), \ldots, (g_N(N-1), g_N)$, and $(g_{N+1}, 0)$.) Take $a = -j$ and $b = 1$. Now if $(ay_k + bx_k, N) \ne 1$ for some k, then $ay_k + bx_k \equiv 0$, since N is prime, and hence $-jy_k + x_k \equiv 0$. Here $y_k \not\equiv 0$, for otherwise $x_k \equiv 0$ as well, contradicting $(x_k, y_k, N) = 1$. Then $(x_k, y_k) \equiv (jy_k, y_k)$, contrary to the assumption that $(x_i, y_i) \not\equiv (g_{j+1}j, g_{j+1})$ for any i and any non-zero $g_{j+1}$, since $g_{j+1}$ can be taken to be $y_k$. Hence $(ay_i + bx_i, N) = 1$ for all i, and $\varphi(c,d) = ac + bd \bmod N$ is valid.

Case 2: For any i and any non-zero $g_{N+1}$, $(x_i, y_i) \not\equiv (g_{N+1}, 0)$. This is similar to Case 1, and occurs when there is no possible choice for σ(N+1) satisfying the congruences of Lemma 4. Take a = 1 and b = 0. Again, suppose $(ay_k + bx_k, N) \ne 1$ for some k; then $ay_k + bx_k \equiv 0$ reduces to $y_k \equiv 0$, and $(x_k, y_k) \equiv (x_k, 0) \equiv (g_{N+1}, 0)$, where $g_{N+1}$ is taken as $x_k$ (which is non-zero because $(x_k, y_k, N) = 1$). This contradicts the assumption that $(x_i, y_i) \not\equiv (g_{N+1}, 0)$ for all i and all non-zero choices of $g_{N+1}$. Thus $\varphi(c,d) = ac + bd \bmod N$ is valid.
Since these two cases exhaust all the reasons why a permutation σ cannot be constructed so that, for some non-zero choices of the $g_i$, the congruences in the statement of Lemma 4 hold, there is a valid linear skewing scheme for the collection of $[x,y]_N$-lines. If $I \le N$, the pigeonhole principle gives the same two cases, since either there exists a $j \in \{0, 1, \ldots, N-1\}$ such that for any $i \in \{1, 2, \ldots, I\}$ and any non-zero choice of $g_{j+1}$, $(x_i, y_i) \not\equiv (g_{j+1}j, g_{j+1})$, or for any $i \in \{1, 2, \ldots, I\}$ and any non-zero choice of $g_{N+1}$, $(x_i, y_i) \not\equiv (g_{N+1}, 0)$. ■

3.4 Generalization to Composite N

To complete the proof of Theorem 10 it suffices to prove two statements:

1. Given N, the number of memory modules, N a composite number, a collection of $[x,y]_N$-lines, $C = \{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, and a valid periodic skewing scheme $\varphi$ for this collection using N memory modules, then if p is a prime factor of N, there is a valid periodic skewing scheme $\varphi'$, using only p memory modules, for the collection of $[x,y]_p$-lines $C' = \{[x_i,y_i]_p\text{-lines} \mid i = 1, 2, \ldots, I\}$.†

2. Given a collection of $[x,y]_M$-lines, $S = \{[x_i,y_i]_M\text{-lines} \mid i = 1, 2, \ldots, I\}$, and a valid linear skewing scheme $\varphi$ for S using M memory modules, and given a valid linear skewing scheme $\varphi'$ for $S' = \{[x_i,y_i]_{M'}\text{-lines} \mid i = 1, 2, \ldots, I\}$ using M′ memory modules, there is a valid linear skewing scheme $\varphi''$, using MM′ memory modules, for the collection of $[x,y]_{MM'}$-lines $S'' = \{[x_i,y_i]_{MM'}\text{-lines} \mid i = 1, 2, \ldots, I\}$.

To see that this is sufficient, note that, given a valid periodic skewing scheme using N memory modules for the collection of $[x,y]_N$-lines $\{[x_i,y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, statement 1 above implies valid periodic skewing schemes for each prime factor of N.
Lemma 3 then implies the existence of valid linear skewing schemes for this collection for each prime factor of $N$. Finally, statement 2 implies the existence of a valid linear skewing scheme for this collection using $N$ memory modules.

Because of Theorem 3, statement 2 is equivalent to

Lemma 5: Given a collection of ordered pairs, $\{(x_i, y_i) \mid i = 1, 2, \ldots, I\}$, and given $M$, $M'$, $a$, $b$, $a'$, and $b'$ such that $(a y_i + b x_i, M) = (a' y_i + b' x_i, M') = 1$ for $i = 1, 2, \ldots, I$, there exist $a''$ and $b''$ such that $(a'' y_i + b'' x_i, MM') = 1$ for $i = 1, 2, \ldots, I$.

Note (to statement 1 above): The $x_i$ and $y_i$ are the same in both $C$ and $C_p$, but the generalized lines are different, since they have different lengths.

Proof: The proof is constructive. Let $\pi$ be the product of those prime numbers which divide $M$ but do not divide $M'$. Let $\pi'$ be the product of those primes which divide $M'$ but do not divide $M$. Finally, let $\rho$ be the product of those primes which divide both $M$ and $M'$. For definiteness, include each prime factor only once when calculating $\pi$, $\pi'$, and $\rho$, even if it appears several times in the prime factorization of $M$ and/or $M'$. Also, if there are no prime factors from which to calculate $\pi$, $\pi'$, or $\rho$, set that product to one. Define
$$a'' = a\pi'\rho + a'\pi, \qquad b'' = b\pi'\rho + b'\pi.$$
The claim is that $(a'' y_i + b'' x_i, MM') = 1$ for $i = 1, 2, \ldots, I$. Suppose not. Then for some $i$ there is a prime, $p$, for which $p \mid a'' y_i + b'' x_i$ and $p \mid MM'$. If $p \mid a'' y_i + b'' x_i$, then $p \mid a\pi'\rho y_i + a'\pi y_i + b\pi'\rho x_i + b'\pi x_i$. By the definitions of $\pi$, $\pi'$, and $\rho$, and since $p \mid MM'$, $p$ divides exactly one of them. Assume $p \mid \pi$. Since $p \mid \pi$ implies $p \mid a'\pi y_i + b'\pi x_i$, it can be concluded that $p \mid a\pi'\rho y_i + b\pi'\rho x_i$. Since $p$ is a prime, $p \nmid \pi'$, and $p \nmid \rho$, it must be the case that $p \mid a y_i + b x_i$. However, $p \mid \pi$ implies $p \mid M$, so $(a y_i + b x_i, M) \ne 1$, contrary to the hypothesis of the lemma. Thus if $p \mid a'' y_i + b'' x_i$ and $p \mid MM'$, then $p \nmid \pi$. By similar arguments, $p \nmid \pi'$ and $p \nmid \rho$. But this is impossible. Hence $a'' y_i + b'' x_i$ and $MM'$ have no prime factors in common, i.e., $(a'' y_i + b'' x_i, MM') = 1$ for $i = 1, 2, \ldots, I$. ■

To complete the proof of Theorem 10, it is therefore sufficient to prove statement 1 above. Statement 1 will be proved by contradiction. To this end, suppose the following are given: a collection of $[x,y]_p$-lines, $C_p = \{[x_i, y_i]_p\text{-lines} \mid i = 1, 2, \ldots, I\}$, where $p$ is a prime number; no valid periodic skewing scheme exists for $C_p$ using $p$ memory modules; and a valid periodic skewing scheme exists, using $N$ memory modules, for the collection of $[x,y]_N$-lines, $C = \{[x_i, y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, where $p \mid N$. Some additional technical lemmas are useful.

Lemma 6:
$$
\begin{bmatrix}
p-1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1^{-1} & 1^{-2} & \cdots & 1^{-(p-2)} & 1^{-(p-1)} \\
0 & 2^{-1} & 2^{-2} & \cdots & 2^{-(p-2)} & 2^{-(p-1)} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & (p-1)^{-1} & (p-1)^{-2} & \cdots & (p-1)^{-(p-2)} & (p-1)^{-(p-1)}
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
0 & 1 & 2 & \cdots & p-1 \\
0 & 1 & 2^{2} & \cdots & (p-1)^{2} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 1 & 2^{p-2} & \cdots & (p-1)^{p-2} \\
0 & 1 & 2^{p-1} & \cdots & (p-1)^{p-1}
\end{bmatrix}
= (p-1)I,
$$
where all calculations are done in the field of residue classes mod $p$, $p$ a prime number.

Proof: For convenience, the second matrix will be denoted by $M_p$ and the first by $M'_p$. Consider the first row of the product matrix. Clearly, the first row of $M'_p$ times the first column of $M_p$ is $p-1$. The first row of $M'_p$ times any other column of $M_p$ is zero, since it is $(p-1)\cdot 1 + 0\cdot i + \cdots + 0\cdot i^{p-2} + 1\cdot i^{p-1} = (p-1) + i^{p-1}$. Fermat's theorem [7] states that if $i \ne 0$, then $i^{p-1} = 1$. Thus $p - 1 + i^{p-1} = p - 1 + 1 = p = 0$, and the first row of the product matrix is $[p-1 \;\; 0 \;\; \cdots \;\; 0]$. (Note that all equations here are in the field of residue classes mod $p$; in particular, $=$ means $\equiv_p$.) Now consider any other row of $M'_p$, $[0 \;\; i^{-1} \;\; i^{-2} \;\; \cdots \;\; i^{-(p-1)}]$, times any column of $M_p$, $[1 \;\; j \;\; j^2 \;\; \cdots \;\; j^{p-1}]^T$. The product is $0 + i^{-1}j + i^{-2}j^2 + \cdots + i^{-(p-1)}j^{p-1} = (i^{-1}j) + (i^{-1}j)^2 + \cdots + (i^{-1}j)^{p-1}$. In general $(x + x^2 + \cdots + x^{p-1})(x - 1) = x^p - x$. Since $p$ is prime, Fermat's theorem implies $x^p = x$ for any $x$, so $(x + x^2 + \cdots + x^{p-1})(x - 1) = 0$. But in a field, the product of two numbers is zero if and only if at least one of them is zero. Thus if $x \ne 1$, then $x + x^2 + \cdots + x^{p-1} = 0$; clearly, if $x = 1$, then $x + x^2 + \cdots + x^{p-1} = p - 1$.
Replace $x$ by $i^{-1}j$ and note that $i^{-1}j = 1$ if and only if $i = j$. It follows that the off-diagonal elements of the product matrix are zero and the diagonal elements are $p-1$. Thus $M'_p \times M_p = (p-1)I$. ■

Corollary 1: $\det(M_p) \ne 0$, where the calculations are performed in the field of residue classes mod $p$, $p$ a prime number.

Proof: $(p-1)M'_p$ is a left inverse for $M_p$, since $(p-1)^2 = p^2 - 2p + 1 = 1$. But if a matrix has a left inverse, the left inverse is a two-sided inverse, and the matrix has a non-zero determinant [8]. ■

Corollary 2:
$$\sum_{i=0}^{p-1} i^r = \begin{cases} p-1 & \text{if } r = p-1, \\ 0 & \text{if } r = 0, 1, 2, \ldots, p-2, \end{cases}$$
where $0^0 = 1$ by convention.

Proof: Matrix multiplication does not commute in general, but $M'_p \times M_p = M_p \times M'_p$, since $M'_p$ is (almost) the inverse of $M_p$. Note that the last column of $M'_p$ is all ones, since $i^{-(p-1)} = (i^{-1})^{p-1} = 1$ by Fermat's theorem. Thus $\left(\sum_{i=0}^{p-1} i^r\right) \bmod p$ is just row $r+1$ of $M_p$ times the last column of $M'_p$. Since $M_p \times M'_p = (p-1)I$, the last column of the product matrix is $[0 \;\; \cdots \;\; 0 \;\; p-1]^T$, and the corollary is proved. ■

Lemma 7:
$$\sum_{i=0}^{p^e-1} i^r \equiv_{p^e} \begin{cases} -p^{e-1} & \text{if } r = p-1, \\ 0 & \text{if } r = 0, 1, \ldots, p-2, \end{cases} \qquad \text{for } e \ge 1. \qquad (1)$$

Proof: The proof is by induction on $e$. For the basis case, $e = 1$, formula (1) reduces to
$$\sum_{i=0}^{p-1} i^r \equiv_p \begin{cases} -p^0 = -1 \equiv_p p-1 & \text{if } r = p-1, \\ 0 & \text{if } r = 0, 1, 2, \ldots, p-2. \end{cases}$$
This is just Corollary 2. Therefore, assume that the result is true for $e' = e - 1$. Since formula (1) is clearly true for $r = 0$, $r$ will be assumed greater than zero throughout the remainder of the proof. Now,
$$\sum_{\ell=0}^{p^e-1} \ell^r = \sum_{j=0}^{p^{e-1}-1} \sum_{i=0}^{p-1} (jp+i)^r = \sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} (jp+i)^r. \qquad (2)$$
Expanding $(jp+i)^r$ by the binomial theorem and rearranging the terms in the sum gives
$$\sum_{\ell=0}^{p^e-1} \ell^r = \sum_{i=0}^{p-1} \sum_{k=0}^{r} \sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k. \qquad (3)$$
Isolate an inner sum:
$$\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k = \binom{r}{k} p^{r-k} i^k \sum_{j=0}^{p^{e-1}-1} j^{r-k}. \qquad (4)$$
By the induction assumption,
$$\sum_{j=0}^{p^{e-1}-1} j^{r-k} \equiv_{p^{e-1}} \begin{cases} -p^{e-2} & \text{if } r-k = p-1, \\ 0 & \text{if } r-k = 0, 1, \ldots, p-2. \end{cases}$$
Thus,
$$\sum_{j=0}^{p^{e-1}-1} j^{r-k} = \begin{cases} -p^{e-2} + \alpha_{r,k}\, p^{e-1} & \text{if } r-k = p-1, \\ \alpha_{r,k}\, p^{e-1} & \text{if } r-k = 0, 1, \ldots, p-2, \end{cases} \qquad (5)$$
for some integer $\alpha_{r,k}$. By substituting (5) into (4) it is possible to determine the value of the sums in (3).

Case 1: Middle terms of the binomial expansion, $r > k > 0$. In this case $1 \le r-k < p-1$. Thus
$$\binom{r}{k} p^{r-k} i^k \sum_{j=0}^{p^{e-1}-1} j^{r-k} = \binom{r}{k} p^{r-k} i^k \alpha_{r,k}\, p^{e-1} \equiv_{p^e} 0. \qquad (6)$$
Thus for $k = 1, 2, \ldots, r-1$, $\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k \equiv_{p^e} 0$.

Case 2: The first term of the binomial expansion, $r > k = 0$. Here, again substituting (5) into (4),
$$\binom{r}{0} p^{r} i^{0} \sum_{j=0}^{p^{e-1}-1} j^{r} = p^r \sum_{j=0}^{p^{e-1}-1} j^r = \begin{cases} p^r\left(-p^{e-2} + \alpha_{r,0}\, p^{e-1}\right) & \text{if } r = p-1, \\ p^r \alpha_{r,0}\, p^{e-1} & \text{if } r = 1, 2, \ldots, p-2. \end{cases} \qquad (7)$$
Now in (7), $p^r \alpha_{r,0}\, p^{e-1} \equiv_{p^e} 0$, since $r \ge 1$. Also in (7), $p^r(-p^{e-2} + \alpha_{r,0}\, p^{e-1}) \equiv_{p^e} 0$ if $r = p-1 \ge 2$, that is, if $p \ge 3$. Thus (7) becomes
$$\sum_{j=0}^{p^{e-1}-1} \binom{r}{0} j^{r} p^{r} i^{0} \equiv_{p^e} \begin{cases} -p^{e-1} & \text{if } p = 2, \\ 0 & \text{otherwise}. \end{cases} \qquad (8)$$
The left side of (8) does not depend on $i$, so summing it over $i = 0, 1, \ldots, p-1$ multiplies it by $p$. Therefore, in every case, $\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} \binom{r}{0} j^{r} p^{r} i^{0} \equiv_{p^e} 0$.

Case 3: The last term of the binomial expansion, $k = r > 0$. Here $\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k$ reduces to
$$\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} i^r = p^{e-1} \sum_{i=0}^{p-1} i^r = \begin{cases} p^{e-1}(p - 1 + \alpha_r p) & \text{if } r = p-1, \\ p^{e-1} \alpha_r p & \text{if } r = 0, 1, \ldots, p-2, \end{cases} \qquad (9)$$
for some integer $\alpha_r$, where Corollary 2 is used to obtain the last equality. This reduces further to
$$\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} i^r \equiv_{p^e} \begin{cases} -p^{e-1} & \text{if } r = p-1, \\ 0 & \text{if } r = 0, 1, 2, \ldots, p-2. \end{cases} \qquad (10)$$
To complete the proof of the lemma, sum $\sum_{i=0}^{p-1} \sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k$ over all $k$ and use the results of Cases 1, 2, and 3. ■

Lemma 8: If $p^e \mid N$, $p^{e+1} \nmid N$, and $e \ge 1$, then $p^e \nmid \sum_{i=0}^{N-1} i^{p-1}$.

Proof: $p^e \mid \sum_{i=0}^{N-1} i^{p-1}$ if and only if $\sum_{i=0}^{N-1} i^{p-1} \equiv_{p^e} 0$. Since $i^{p-1} \bmod p^e$ depends only on $i \bmod p^e$,
$$\sum_{i=0}^{N-1} i^{p-1} \equiv_{p^e} \frac{N}{p^e} \sum_{i=0}^{p^e-1} i^{p-1}.$$
$p^{e+1} \nmid N$ implies $p \nmid N/p^e$, and, by Lemma 7, $\sum_{i=0}^{p^e-1} i^{p-1} \equiv_{p^e} -p^{e-1}$. Thus $\frac{N}{p^e} \sum_{i=0}^{p^e-1} i^{p-1} \equiv_{p^e} -\frac{N}{p^e}\, p^{e-1} \not\equiv_{p^e} 0$. ■

It is now possible to prove Theorem 10.
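Two pieces of the argument so far are easy to exercise numerically. The sketch below is our own code, and its function names are hypothetical. It implements the constructive step of Lemma 5, $a'' = a\pi'\rho + a'\pi$ and $b'' = b\pi'\rho + b'\pi$, and brute-force checks the congruences of Lemmas 7 and 8 for small primes. As an example it uses rows, columns, and both diagonals, combining schemes for 5 and 7 memory modules into one for 35. A brute-force search also finds no valid linear scheme for 8 or 16 modules, consistent with the remark later in this chapter about powers of two. `math.prod` requires Python 3.8 or later.

```python
from math import gcd, prod


def prime_factors(n):
    """Distinct primes dividing n (trial division; adequate for small n)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def combine_schemes(ab, m, ab2, m2):
    """Lemma 5: from (a, b) valid mod m and (a', b') valid mod m2, return
    (a'', b'') = (a*pi'*rho + a'*pi, b*pi'*rho + b'*pi) reduced mod m*m2.
    Each prime is used once; an empty product is 1, as in the proof."""
    (a, b), (a2, b2) = ab, ab2
    pm, pm2 = prime_factors(m), prime_factors(m2)
    pi, pi2, rho = prod(pm - pm2), prod(pm2 - pm), prod(pm & pm2)
    mm = m * m2
    return (a * pi2 * rho + a2 * pi) % mm, (b * pi2 * rho + b2 * pi) % mm


def valid(ab, lines, n):
    """Theorem 3 criterion for phi(c, d) = a*c + b*d mod n."""
    a, b = ab
    return all(gcd(a * y + b * x, n) == 1 for x, y in lines)


def first_valid(lines, n):
    """Brute-force search for some valid (a, b) mod n, or None."""
    return next(((a, b) for a in range(n) for b in range(n)
                 if valid((a, b), lines, n)), None)


def power_sum(upper, r, modulus):
    """Sum of i**r for i = 0..upper-1, mod modulus; pow(0, 0, m) == 1 gives 0^0 = 1."""
    return sum(pow(i, r, modulus) for i in range(upper)) % modulus


def lemma7_holds(p, e):
    q = p ** e
    return all(power_sum(q, r, q) == ((-p ** (e - 1)) % q if r == p - 1 else 0)
               for r in range(p))


def lemma8_holds(n):
    """For each p^e exactly dividing n: p^e does not divide sum_{i<n} i^(p-1)."""
    for p in prime_factors(n):
        e = 1
        while n % p ** (e + 1) == 0:
            e += 1
        if power_sum(n, p - 1, p ** e) == 0:
            return False
    return True


LINES = [(1, 0), (0, 1), (1, 1), (1, -1)]  # rows, columns, diagonals as [x, y]

ab35 = combine_schemes(first_valid(LINES, 5), 5, first_valid(LINES, 7), 7)
print(ab35, valid(ab35, LINES, 35))  # (12, 24) True
print(first_valid(LINES, 8))         # None
```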
Proof of Theorem 10: As was pointed out earlier, all that remains to prove is the first of the two statements found at the beginning of this section. To this end, let $p$ be a prime factor of $N$, and suppose a collection of $[x,y]_p$-lines, $C_p = \{[x_i, y_i]_p\text{-lines} \mid i = 1, 2, \ldots, I\}$, is given such that there is no valid periodic skewing scheme for this collection using $p$ memory modules. The assumption of the existence of a valid periodic skewing scheme, using $N$ memory modules, for the collection of $[x,y]_N$-lines, $C = \{[x_i, y_i]_N\text{-lines} \mid i = 1, 2, \ldots, I\}$, will lead to a contradiction.

Suppose a valid periodic skewing scheme exists for $C$ using $N$ memory modules. Call it $\varphi$. Then exactly $N$ points on the torus must be mapped by the skewing scheme into memory module zero, one element from each of the $N$ disjoint instances of the $[x_i, y_i]_N$-line. Let this set of $N$ points be $\{(u_j, v_j) \mid j = 0, 1, \ldots, N-1\}$. Because $\varphi$ is assumed to be a valid periodic skewing scheme for $C$ using $N$ memory modules, for any $[x_i, y_i]_N$-line in the collection each of the $(u_j, v_j)$ is a component of a different instance of the $[x_i, y_i]_N$-line. By Lemma 2, $\{x_i u_j - y_i v_j \bmod N \mid j = 0, 1, \ldots, N-1\} = \{0, 1, 2, \ldots, N-1\}$ for $i = 1, 2, \ldots, I$. Thus, by Lemma 8,
$$p^e \nmid \sum_{j=0}^{N-1} \left((x_i u_j - y_i v_j) \bmod N\right)^{p-1}, \qquad i = 1, 2, \ldots, I,$$
where $e$ is chosen so that $p^e \mid N$, $p^{e+1} \nmid N$, and $e \ge 1$ (the last because $p \mid N$ by assumption). Note that $\sum_{j=0}^{N-1} \left((x_i u_j - y_i v_j) \bmod N\right)^{p-1}$ does not depend on the choice of $i$. This sum will be denoted by $\Sigma$.

Since there is assumed to be no valid periodic skewing scheme for $C_p$ using $p$ memory modules, Lemma 4 gives two possibilities. Either $(x_i, y_i, p) \ne 1$ for some $i$, or $I \ge p+1$ and there exist non-zero $g_i$ and a permutation, $\sigma$, for which $(x_{\sigma(1)}, y_{\sigma(1)}) \equiv_p (0, g_1), \ldots, (x_{\sigma(p)}, y_{\sigma(p)}) \equiv_p (g_p(p-1), g_p)$, and $(x_{\sigma(p+1)}, y_{\sigma(p+1)}) \equiv_p (g_{p+1}, 0)$. Now if $(x_i, y_i, p) \ne 1$ for some $i$, then, since $p \mid N$, $(x_i, y_i, N) \ne 1$ also. This contradicts the assumed validity of $\varphi$.
Therefore, $I$ must be greater than or equal to $p+1$, and there must exist non-zero $g_i$ and a permutation, $\sigma$, with the required properties. Without loss of generality, assume $\sigma$ is the identity permutation. Consider the system
$$
\begin{bmatrix}
y_1^{p-1} & y_2^{p-1} & \cdots & y_p^{p-1} \\
y_1^{p-2}x_1 & y_2^{p-2}x_2 & \cdots & y_p^{p-2}x_p \\
\vdots & \vdots & & \vdots \\
y_1 x_1^{p-2} & y_2 x_2^{p-2} & \cdots & y_p x_p^{p-2} \\
x_1^{p-1} & x_2^{p-1} & \cdots & x_p^{p-1}
\end{bmatrix}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix}
= \Delta
\begin{bmatrix} y_{p+1}^{p-1} \\ y_{p+1}^{p-2}x_{p+1} \\ \vdots \\ y_{p+1}x_{p+1}^{p-2} \\ x_{p+1}^{p-1} \end{bmatrix},
\qquad (11)
$$
where the matrix is called $M$, $\Delta = \det(M)$, and $R = [y_{p+1}^{p-1} \;\; \cdots \;\; x_{p+1}^{p-1}]^T$. The entries of $\Gamma = [\gamma_1 \;\; \gamma_2 \;\; \cdots \;\; \gamma_p]^T$ are unknowns to be determined.

An important question is: does this system have a solution, and, if so, is it unique? The answer to both parts of this question will be yes if $\Delta \ne 0$. To prove $\Delta \ne 0$ and to obtain some information about the form of the $\gamma_i$, the system in (11) can be converted into a similar system in the field of residue classes mod $p$.

When the elements of $M$ are replaced by their values in the field of residue classes mod $p$, the resulting matrix is the $M_p$ of Lemma 6. To see this, number the rows of $M$ from $0$ to $p-1$ and the columns from $1$ to $p$, and observe that
$$M_{i,j} = y_j^{p-1-i} x_j^{i} \equiv_p g_j^{p-1-i}\big(g_j(j-1)\big)^{i} \equiv_p g_j^{p-1}(j-1)^{i} \equiv_p (j-1)^{i} = (M_p)_{i,j},$$
where $(x_j, y_j) \equiv_p (g_j(j-1), g_j)$ was used (recall $\sigma$ was assumed to be the identity permutation), and $g_j^{p-1} \equiv_p 1$ by Fermat's theorem. This observation justifies the choice of the name $M_p$ in Lemma 6. Also notice that $\Delta \bmod p = \det(M_p)$, where the determinant of $M_p$ is calculated in the field of residue classes mod $p$. This is nothing more than observing that the mod operator and the det operator commute. Because $\Delta \bmod p = \det(M_p)$, $\Delta_p$ is a reasonable notation to use for either of these quantities. In a manner similar to that used in the proof that $M$ is converted to $M_p$ by the mod operator, $R$ is converted to $R_p = [0 \;\; \cdots \;\; 0 \;\; 1]^T$. By Corollary 1 of Lemma 6, $\Delta_p \ne 0$, so $\Delta \ne 0$. Thus the system $M\Gamma = \Delta R$ and the system $M_p \Gamma_p = \Delta_p R_p$ have unique solutions. The reader is cautioned that, although both systems have unique solutions, it is not obvious that the mod operator applied to the $\gamma_i$
converts $\Gamma$ to $\Gamma_p$. The reason is that the $\gamma_i$ might not be integers, and if the $\gamma_i$ are only rational, it makes no sense to consider $\gamma_i \bmod p$. If, however, all the $\gamma_i$ are integers, then $\Gamma_p$ will in fact be $\Gamma \bmod p$, the column vector obtained by replacing each $\gamma_i$ with $\gamma_i \bmod p$. It is possible to show that each $\gamma_i$ is an integer by solving $M\Gamma = \Delta R$ by means of Cramer's rule. With Cramer's rule, $\gamma_i$ is calculated by replacing column $i$ of $M$ by $\Delta R$, giving a new matrix, $M_i$; then $\gamma_i = \det(M_i)/\det(M)$. By common rules for manipulating determinants [8], $\gamma_i = \Delta \det(M_i')/\Delta = \det(M_i')$, where $M_i'$ is the matrix formed by replacing column $i$ of $M$ by $R$. Since every element of $M_i'$ is an integer, $\det(M_i')$ is also an integer. Thus $\Gamma$ consists solely of integers, and $\Gamma_p = \Gamma \bmod p$.

To complete the argument by arriving at the contradiction mentioned at the beginning of the proof, it is convenient to determine the form of $\Gamma$. This can be done by first determining the form of $\Gamma_p$, which is easily done directly. In the proof of Corollary 1 of Lemma 6, it was shown that $M_p^{-1} = (p-1)M'_p$. As noted earlier, $R_p = [0 \;\; \cdots \;\; 0 \;\; 1]^T$, so
$$\Gamma_p = (p-1)M'_p\,\Delta_p R_p = (p-1)\Delta_p \left[1 \;\; 1^{-(p-1)} \;\; 2^{-(p-1)} \;\; \cdots \;\; (p-1)^{-(p-1)}\right]^T,$$
since $\left[1 \;\; 1^{-(p-1)} \;\; \cdots \;\; (p-1)^{-(p-1)}\right]^T$ is the last column of $M'_p$. But by Fermat's theorem $i^{-(p-1)} = 1$, so
$$\Gamma_p = \left[\Delta_p(p-1) \;\; \Delta_p(p-1) \;\; \cdots \;\; \Delta_p(p-1)\right]^T.$$
Renaming $\Delta_p(p-1) = c$, it follows that
$$\Gamma = \left[c + p\delta_1 \;\; c + p\delta_2 \;\; \cdots \;\; c + p\delta_p\right]^T,$$
where the $\delta_i$ may all be distinct.

Finally consider the sum
$$(c + p\delta_1)\sum_{j=0}^{N-1}(x_1 u_j - y_1 v_j)^{p-1} + \cdots + (c + p\delta_p)\sum_{j=0}^{N-1}(x_p u_j - y_p v_j)^{p-1} \equiv_N (c + p\delta_1)\Sigma + (c + p\delta_2)\Sigma + \cdots + (c + p\delta_p)\Sigma \equiv_N pc\Sigma + p(\delta_1 + \delta_2 + \cdots + \delta_p)\Sigma \equiv_N p\Sigma V,$$
where $V = \delta_1 + \delta_2 + \cdots + \delta_p + c$.

Now expand the sum another way:
$$(c + p\delta_1)\sum_{j=0}^{N-1}(x_1 u_j - y_1 v_j)^{p-1} + \cdots + (c + p\delta_p)\sum_{j=0}^{N-1}(x_p u_j - y_p v_j)^{p-1} = \cdots$$

… $(i + (N-1)y_k,\, j + (N-1)x_k))$ is an instance of the $[x_k, y_k]_N$-line. Similarly, $\varphi(i+y_k, j+x_k), \varphi(i+2y_k, j+2x_k), \ldots, \varphi(i+Ny_k, j+Nx_k)$ must all be distinct, since $((i+y_k, j+x_k), (i+2y_k, j+2x_k), \ldots, (i+Ny_k, j+Nx_k))$ is also an instance of the $[x_k, y_k]_N$-line. Since these two instances have $N-1$ ordered pairs in common, the pigeonhole principle requires that $\varphi(i, j) = \varphi(i+Ny_k, j+Nx_k)$. From this it is clear that $\varphi(i, j) = \varphi(i + \ell N y_k, j + \ell N x_k)$ for $\ell = \ldots, -1, 0, 1, \ldots$ and any $i$ and $j$. Since $k$ was arbitrary, it follows that for any $(c, d)$,
$$\varphi(c, d) = \varphi\Big(c + \sum_{i=1}^{I} a_i N y_i,\; d + \sum_{i=1}^{I} a_i N x_i\Big) = \varphi\Big(c + N\sum_{i=1}^{I} a_i y_i,\; d + N\sum_{i=1}^{I} a_i x_i\Big) = \varphi(c+N, d).$$
Similarly, by using the sequence of $b_i$'s, $\varphi(c, d) = \varphi(c, d+N)$. Since $(c, d)$ was arbitrary, $\varphi(c, d) = \varphi(c+N, d) = \varphi(c, d+N)$ establishes that $\varphi$ must be a periodic skewing scheme. ■

This condition, restrictive though it may be, is sufficient to resolve the most important practical case: $\{[1,0]_N\text{-line}, [0,1]_N\text{-line}, [1,1]_N\text{-line}, [1,-1]_N\text{-line}\}$. The sequences $(0,1,0,0)$ and $(1,0,0,0)$ clearly suffice: the first gives $\sum a_i y_i = 1$ and $\sum a_i x_i = 0$, and the second gives $\sum b_i y_i = 0$ and $\sum b_i x_i = 1$. Thus, for this important case, considered by Budnik and Kuck [3] and Lawrie [10], if there does not exist a linear skewing scheme using $N$ memory modules (and there does not when $N$ is a power of two), then there is no valid skewing scheme of any type whatsoever.

It is easy to be misled by the conclusion of Theorem 10. When given a collection of $[x,y]_N$-lines and deciding on a skewing scheme using $N$ memory modules, there may be advantages to choosing a non-linear, but still periodic, skewing scheme. Some periodic skewing schemes are so simple that they take very little hardware to perform address computation and to align the data with the correct processor, even less hardware than linear skewing schemes require. One such periodic skewing scheme has been used in the construction of an actual machine, the STARAN [1]. Abstracting from the exact details of the STARAN design, the programmer views memory as consisting of a $2^n \times 2^n$ array.
In the language of the designers of the STARAN, the programmer views the memory as having $2^n$ words of $2^n$ bits. The programmer can indicate that he wants to fetch all the bits of one word, or a bit-slice, the $j$th bit of each word (Figure 22). In the terminology used here, this is equivalent to fetching arbitrary instances of the $[1,0]_{2^n}$-line and the $[0,1]_{2^n}$-line from an array of data elements, using $2^n$ memory modules to store the data and a periodic skewing scheme. The skewing scheme employed is
$$\varphi(i, j) = (i \bmod 2^n) \oplus (j \bmod 2^n),$$
where $i \bmod 2^n$ and $j \bmod 2^n$ are expressed in binary notation and $\oplus$ is exclusive-or. This periodic skewing scheme is explicitly calculated for an $8 \times 8$ array in Figure 23. Unlike the other machine designs discussed earlier, the responsibility of deciding on a skewing scheme does not rest with the programmer or compiler, but is built directly into the hardware. Indeed, the user of the STARAN need not even know that a skewing scheme is employed. By appropriately setting the global address register, G, and the access mode register, M, either the correct word or bit-slice will be made to appear at the processing elements. Additional ways of setting M allow some, but not all, instances of other generalized lines (but not $[x,y]_{2^n}$-lines) to be fetched without memory conflict.

This skewing scheme is of practical importance because the address computations needed to fetch instances of the $[1,0]_{2^n}$-line and the $[0,1]_{2^n}$-line can be done using only $n$ exclusive-or gates. By comparison, $2^n$ adders would be needed if a linear skewing scheme were employed. Additionally, an exclusive-or can be performed in less time than an addition, since there is no carry propagation. The memory-processor connection network required in the STARAN is of the same order of complexity as Lawrie's Ω-network. Readers interested in the details of address computation, a proof of the validity of the skewing scheme, and the details of the memory-processor connection network should consult Batcher [1].

(Footnote: What has been called memory here is actually the memory of a single array module in the STARAN. Each array module has a memory consisting of a 256 × 256 array of bits. This memory physically consists of 256 independent memory modules, each with 256 one-bit words. Under the skewing scheme described in the body of the text, all the bits of any word and all the bits of any bit-slice lie in different memory modules, and can be fetched in one memory cycle. Readers interested in exact implementation details and the terminology used by the STARAN designers should consult [1].)

[Figure 22: Programmer's View of STARAN Memory. Word i is a row of the array; bit-slice j is a column.]

Figure 23: The Periodic Skewing Scheme Used in the STARAN Computer. The entry in row $i$, column $j$ is the memory module $i \oplus j$.
$$
\begin{array}{c|cccccccc}
i \backslash j & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
1 & 1 & 0 & 3 & 2 & 5 & 4 & 7 & 6 \\
2 & 2 & 3 & 0 & 1 & 6 & 7 & 4 & 5 \\
3 & 3 & 2 & 1 & 0 & 7 & 6 & 5 & 4 \\
4 & 4 & 5 & 6 & 7 & 0 & 1 & 2 & 3 \\
5 & 5 & 4 & 7 & 6 & 1 & 0 & 3 & 2 \\
6 & 6 & 7 & 4 & 5 & 2 & 3 & 0 & 1 \\
7 & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0
\end{array}
$$

This example was presented to show that non-linear periodic skewing schemes can be important in actual practice, even when valid linear skewing schemes exist. When comparing STARAN with the more general machine modeled in Figure 3, note that in STARAN the generalized lines that the programmer can access conveniently were fixed at the time of design. For these generalized lines the programmer need not be concerned with skewing schemes, as the hardware handles the data storage and unscrambling automatically. In the more general computer, modeled in Figure 3, the determination of an appropriate skewing scheme is left to the programmer, who may be restricted in his choices by the nature of the memory-processor connection network. While this may be more work for the user, it allows greater flexibility than is available in the STARAN.
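The sketch below is our own illustration, not Batcher's hardware description, and its names are hypothetical. It evaluates $\varphi(i,j) = (i \bmod 2^n) \oplus (j \bmod 2^n)$ for $n = 3$ and prints the $8 \times 8$ array of Figure 23. It then checks that every word and every bit-slice, starting anywhere in the plane, lands in 8 distinct modules, while a main diagonal does not. Finally, it confirms that no linear scheme $ai + bj \bmod 8$ reproduces the table.

```python
def xor_scheme(i, j, n):
    """STARAN-style periodic scheme: (i mod 2^n) XOR (j mod 2^n)."""
    size = 1 << n
    return (i % size) ^ (j % size)


def conflict_free(cells, n):
    """True if the cells lie in pairwise distinct memory modules."""
    modules = [xor_scheme(i, j, n) for i, j in cells]
    return len(set(modules)) == len(modules)


n = 3
size = 1 << n
table = [[xor_scheme(i, j, n) for j in range(size)] for i in range(size)]
for row in table:
    print(" ".join(str(v) for v in row))

# Every word (row) and every bit-slice (column) is conflict-free, wherever it
# starts; the scheme repeats with period 2^n in both directions.
words_ok = all(conflict_free([(i, j0 + k) for k in range(size)], n)
               for i in range(size) for j0 in range(-size, size))
slices_ok = all(conflict_free([(i0 + k, j) for k in range(size)], n)
                for j in range(size) for i0 in range(-size, size))
diagonal_ok = conflict_free([(k, k) for k in range(size)], n)  # all map to module 0
is_linear = any(all((a * i + b * j) % size == table[i][j]
                    for i in range(size) for j in range(size))
                for a in range(size) for b in range(size))
print(words_ok, slices_ok, diagonal_ok, is_linear)  # True True False False
```

XOR needs no carries, which is the property the text credits for the short address computation. The diagonal check shows that this convenience comes at the cost of the $[1,1]$ direction.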
Some authors [11] have discussed leaving the choice of skewing scheme and/or the address computation to the compiler, thus freeing the programmer from this bookkeeping.

4. UNRESOLVED PROBLEMS AND DIRECTIONS OF FURTHER RESEARCH

4.1 The Effectiveness of Linear and Periodic Skewing Schemes

One question has recurred throughout this thesis. Given a collection of generalized lines, when can the search for a valid skewing scheme for this collection safely be restricted to certain subclasses of skewing schemes? That is, when does a valid skewing scheme from a class of skewing schemes imply a valid skewing scheme from a subclass of this class? In Chapter 1, it was shown that for any collection of generalized lines, attention could safely be restricted from skewing schemes valid on the quarter plane to skewing schemes valid on the entire plane. Similarly, Theorem 10 shows that for collections of $[x,y]_N$-lines, attention can safely be restricted from periodic skewing schemes using $N$ memory modules to linear skewing schemes using $N$ memory modules. For an arbitrary collection of generalized lines, two questions remain unresolved: when attention can be restricted from arbitrary skewing schemes defined on the plane to periodic skewing schemes, and when it can be restricted from periodic to linear skewing schemes. In this section some conjectures, partial results, and interesting examples are presented for a class of generalized lines called polyominoes [5, 6].

Definition 11: A polyomino is a generalized line with the following property. Given any two components, $(x_1, y_1)$ and $(x_2, y_2)$, there exists a path $(x_1, y_1) = (x'_1, y'_1), (x'_2, y'_2), \ldots, (x'_r, y'_r) = (x_2, y_2)$ such that $(x'_j, y'_j)$ is a component of the generalized line for $j = 1, 2, \ldots, r$. Moreover, for $j = 1, 2, \ldots, r-1$, either $x'_{j+1} = x'_j \pm 1$ and $y'_{j+1} = y'_j$, or $x'_{j+1} = x'_j$ and $y'_{j+1} = y'_j \pm 1$.

In the geometric realization of a generalized line by unit squares, a polyomino is a generalized line that is connected. Except for the disconnected shape of Figure 20, all the generalized lines used as examples in Chapter 2 are polyominoes. There is an impressive amount of literature on polyominoes, and many problems concerning them remain unsolved. A general source is [6].

(Footnote: In the literature, polyominoes are usually defined to be the geometric realizations of the class of generalized lines described by Definition 11. Also, unlike here, most problems concerning polyominoes permit rotations and reflections of a polyomino and do not regard them as generating different polyominoes. Finally, a comment should be made about connectedness. Here, connected means connected by more than a corner. This kind of connectedness has been called rook-wise connected, because of the permissible motions of the chess piece of the same name.)

Conjecture: Given $N$ memory modules and a polyomino of length $N$, if there is a valid skewing scheme for the polyomino, then there is also a valid periodic skewing scheme for the polyomino.

This conjecture is supported by a construction, illustrated first by example. Consider the generalized line $((0,0), (1,0), (1,1), (2,0), (2,1))$ of Figure 24. Theorem 5 proves that the problem of finding valid skewing schemes is equivalent to determining tessellations of the plane. With the objective of analyzing possible tessellations, lay down an instance of this generalized line (see Figure 24). Without loss of generality the designated element can be assumed to be at $(0,0)$. Now, there is only one way a second instance can be positioned so that the square labeled A is covered without overlapping of the instances (Figure 24(b)). Again there is only one way to position a third instance so that the square labeled B is covered without any overlapping of the instances (Figure 24(c)). Finally, there is only one way for a fourth instance to be positioned so that the square labeled C is covered, again without overlapping of the instances (Figure 24(d)). Notice that the designated elements form the vertices of a parallelogram which contains "no holes" and has area $N$, the length of the polyomino (in this example, $N = 5$). Now, by replication of the parallelogram, it is clear that a tessellation of the plane results. The tessellation that results is very orderly.

[Figure 24: Positioning Four Instances of the Generalized Line ((0,0), (1,0), (1,1), (2,0), (2,1)), so their Designated Elements Form a Parallelogram. Panels (a) to (c) mark the squares A, B, and C; panel (d) shows the parallelogram with vertices (0,0), (p,q), (r,s), and (p+r, q+s).]

Sometimes, particularly when the polyomino has a high degree of symmetry, the construction informally presented above can yield more than one parallelogram. This is illustrated in Figure 25 by the generalized line $((0,0), (1,0), (1,1), (1,2), (2,2))$. The generalized line whose geometric realization is …-shaped exhibits the same phenomenon.

[Figure 25: Alternate Positionings of Instances for the Generalized Line ((0,0), (1,0), (1,1), (1,2), (2,2)).]

Suppose four instances of a polyomino of length $N$ can be laid down so that their designated elements form the four vertices of a parallelogram of area $N$ which contains no holes. Then the tessellation of the plane produced by replicating the parallelogram induces a periodic skewing scheme. The proof of this statement is reminiscent of the proof of Theorem 11. Label the vertices of the parallelogram as in Figure 24(d), and let $\varphi$ be the skewing scheme induced by the tessellation. Then clearly
$$\varphi(c, d) = \varphi(c+p, d+q) = \varphi(c+r, d+s) \qquad (14)$$
for any $(c, d)$.
Now $|ps - qr| = N$, since $|ps - qr|$ is the area of the parallelogram. Thus $(c, d) + (-rp, -rq) + (pr, ps) = (c, d + ps - qr)$, which is $(c, d+N)$ or $(c, d-N)$, depending on the sign of $ps - qr$. Combining this with (14) implies $\varphi(c, d) = \varphi(c, d+N)$. Similarly, $(c, d) + (sp, sq) + (-qr, -qs) = (c+N, d)$ (or $(c-N, d)$), and, thus, $\varphi(c, d) = \varphi(c+N, d)$. Since $(c, d)$ was arbitrary, $\varphi$ is periodic.

The obstacle to a proof of the conjecture is the following gap. If a polyomino tessellates the plane, a valid skewing scheme exists, but it has not been proved that four instances of the polyomino can then be positioned so that their designated elements form the vertices of a parallelogram of area $N$. This has been checked by hand for all polyominoes through $N = 7$, and no exceptions have been found. It is reasonable to believe that this always holds. For most polyominoes there is usually only one way to position a second instance so that some carefully chosen square is covered without the instances overlapping. This style of argument was used to show that the polyomino of Figure 12 cannot tessellate the plane. It was also used to construct the unique tessellation of the plane (except for rigid shifting) by the polyomino of Figures 14 and 24. Polyominoes with some symmetry, however, frequently produce several distinct tessellations.

It seems reasonable to believe that if a polyomino tessellates the plane, a parallelogram of area $N$ can always be constructed. There is, however, some evidence to the contrary. Some authors have considered tessellating the plane while allowing simultaneous use of several different polyominoes. Examples have been reported of collections of polyominoes from which a tessellation of the plane can be constructed, but from which no periodic tessellation (with any period whatsoever) can be constructed [5].
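The parallelogram argument can be checked mechanically. The sketch below is our own illustration, and its names are hypothetical. It models the four-instance construction as a search for a lattice spanned by $(p,q)$ and $(r,s)$, with $|ps - qr| = N$, whose translates of the polyomino cover the plane without overlap. It then verifies (14) and the resulting $N$-periodicity. Anticipating the prime-$N$ discussion below, it also checks the linear choice $a = -q$, $b = p$. Finally, for $N = 5$, it counts lattice tilings of the generalized lines in Figures 24 and 25. This count relies on a standard fact: for prime $N$, every lattice of index $N$ is the kernel of some $\alpha i + \beta j \bmod N$. Only lattice tilings are counted, and brute force is used throughout.

```python
from itertools import product

# Generalized lines from the captions of Figures 24 and 25 (N = 5).
P24 = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]
P25 = [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]


def is_polyomino(cells):
    """Definition 11: rook-wise connectivity, via depth-first search."""
    cells = set(cells)
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == cells


def in_lattice(u, v, p, q, r, s):
    """True if (u, v) is an integer combination of (p, q) and (r, s)."""
    det = p * s - q * r
    return (u * s - v * r) % det == 0 and (p * v - q * u) % det == 0


def tiles_by_lattice(cells, p, q, r, s):
    """Lattice translates tile the plane iff |ps - qr| = N and no two
    components differ by a lattice vector."""
    if abs(p * s - q * r) != len(cells):
        return False
    return not any(in_lattice(x1 - x2, y1 - y2, p, q, r, s)
                   for (x1, y1), (x2, y2) in product(cells, repeat=2)
                   if (x1, y1) != (x2, y2))


def find_parallelogram(cells):
    """Brute-force search for sides (p, q), (r, s); vectors congruent to
    (0, 0) mod N (the text's exceptional case) are skipped."""
    n = len(cells)
    for p, q, r, s in product(range(-n, n + 1), repeat=4):
        if (p % n, q % n) == (0, 0) or (r % n, s % n) == (0, 0):
            continue
        if tiles_by_lattice(cells, p, q, r, s):
            return (p, q), (r, s)
    return None


def induced_scheme(cells, p, q, r, s):
    """phi(c, d) = index of the component congruent to (c, d) mod the lattice."""
    return lambda c, d: next(k for k, (x, y) in enumerate(cells)
                             if in_lattice(c - x, d - y, p, q, r, s))


def linear_scheme(p, q, n):
    """Prime-N choice from the text: a = -q, b = p (fallback a = b = 1)."""
    a, b = (-q) % n, p % n
    if (a, b) == (0, 0):
        a, b = 1, 1
    return lambda i, j: (a * i + b * j) % n


def count_lattice_tilings(cells, n):
    """Prime N: count functionals alpha*i + beta*j mod N (up to scaling)
    that give every component a different value."""
    functionals = [(1, beta) for beta in range(n)] + [(0, 1)]
    return sum(len({(al * x + be * y) % n for x, y in cells}) == n
               for al, be in functionals)


(p, q), (r, s) = find_parallelogram(P24)
phi, lin = induced_scheme(P24, p, q, r, s), linear_scheme(p, q, 5)
print(count_lattice_tilings(P24, 5), count_lattice_tilings(P25, 5))  # 1 2
```

The counts, one lattice tiling for Figure 24's line and two for Figure 25's, are consistent with the remarks in the text on the unique tessellation and on alternate positionings.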
The example presented below indicates that there may be some unrecognized subtleties. To close the discussion of this conjecture, therefore, an alternate approach to its proof, known to be inadequate, will be discussed. Suppose a polyomino of length $N$ is given, and $\psi$ is a valid skewing scheme for this polyomino. Then define $\varphi(i, j) = \psi(i \bmod N, j \bmod N)$. Clearly $\varphi$ is periodic. The objective is to show that $\varphi$ is valid.

This approach is motivated by consideration of the $[1,0]_N$-line. Clearly, any valid skewing scheme, $\psi$, has the property $\psi(c, d) = \psi(c, d+N)$ for any $(c, d)$. However, it is easy to construct $\psi$ so that $\psi(c, d) \ne \psi(c+N, d)$, i.e., $\psi$ is non-periodic in the vertical direction. Note, however, that $\varphi$, defined by $\varphi(i, j) = \psi(i \bmod N, j \bmod N)$, is both periodic and valid for the $[1,0]_N$-line, provided $\psi$ is valid. This technique also works on generalized lines whose geometric realization is an $N_1 \times N_2$ rectangle, where $N_1 N_2 = N$. However, consideration of the generalized line $L = ((0,0), (0,1), (1,1), (1,2))$ shows that such an approach to the proof of the conjecture is inadequate. Figure 26 indicates a tessellation which induces a valid skewing scheme, $\psi$, for which $\varphi$, defined by $\varphi(i, j) = \psi(i \bmod N, j \bmod N)$, is not a valid skewing scheme. There are, however, valid periodic skewing schemes for $L$.

(Footnote to the tessellations by several polyominoes mentioned above: These tessellations should be carefully distinguished from those used in Theorem 6. There, several tessellations were constructed using different generalized lines, but each tessellation used instances of only one type. In the case mentioned here, several different polyominoes were used to construct a single tessellation.)

If this conjecture is true, then given $N$ memory modules and a polyomino of length $N$, attention can safely be restricted to periodic skewing schemes when looking for a valid skewing scheme. A good question to ask is: can attention safely be restricted to linear skewing schemes?
In general, the answer to this question is "no." However, the answer is "yes" if $N$ is prime and if the existence of a tessellation by a polyomino implies that the construction described earlier results in a parallelogram of area $N$. Using the notation of Figure 24(d), for $\varphi(i, j) = ai + bj \bmod N$, define $a$ and $b$ as follows: if $(p, q) \not\equiv_N (0, 0)$, then $a = -q$ and $b = p$; otherwise $a = 1$ and $b = 1$.

To see that $\varphi$ is valid, observe first that in the exceptional case $(p, q) \equiv_N (0, 0)$, either $(p, q) = (\pm N, 0)$ or $(p, q) = (0, \pm N)$. This is true because the polyomino is connected and has only $N$ components. Now $(p, q) = (\pm N, 0)$ or $(0, \pm N)$ implies that the polyomino is a $[0,1]_N$-line or a $[1,0]_N$-line, and then $\varphi(i, j) = i + j \bmod N$ is indeed valid. When $a = -q$ and $b = p$, the validity of $\varphi$ can be seen as follows. Note that $\varphi(p, q) = -qp + pq \bmod N = 0$ and $\varphi(r, s) = -qr + ps \bmod N = 0$. Thus adding multiples of $(p, q)$ and/or $(r, s)$ to a point does not affect the value of $\varphi$. Because of this, and because replications of the parallelogram tessellate the plane in an orderly way, $\varphi$ is valid provided it maps all the components of one instance to distinct memory modules. However, by adding …

[Figure 26: A Non-periodic Skewing Scheme, ψ, for which … The figure also shows an enlargement of a 4 × 4 square, giving the skewing scheme induced by the tessellation.]

… of $\{L_1, L_2, \ldots, L_p\}$, not necessarily all of the same length, determining the minimum number of memory modules, $N$, for which a valid skewing scheme using $N$ memory modules exists for the collection is quite difficult. A simple heuristic is to find a collection of generalized lines, $\{L'_1, L'_2, \ldots, L'_p\}$, such that $L'_i$ is a cover of $L_i$, the $L'_i$ are all of the same length, and they satisfy the conditions of Theorem 6.
The resulting skewing scheme will also be valid for $\{L_1, L_2, \ldots, L_p\}$. There is often great latitude in choosing the covers, since for a given generalized line, $L$, there may be many choices of $L'$ such that $L'$ tessellates the plane. Consider the generalized line $L = ((0,0), (0,1), (0,2), (1,1), (2,1))$. Suppose the search for covers that tessellate the plane is arbitrarily restricted to polyominoes, and covers symmetric to other covers are ignored. There are still three generalized lines that tessellate the plane and are covers for $L$ (see Figure 30). The rapid growth in the number of covers able to tessellate the plane indicates that formal evaluation of the heuristic given earlier may be very difficult.

Conjecture: Let $L$ be a polyomino of length $N$ for which there is no valid skewing scheme using $N$ memory modules. Then there is a valid skewing scheme for $L$ using $N+1$ memory modules if and only if there is a cover, $L'$, for $L$, of length $N+1$, $L'$ a polyomino, which tessellates the plane.

[Figure 30: Covers for the Generalized Line ((0,0), (0,1), (0,2), (1,1), (2,1)) which Tessellate the Plane.]

4.3 Comments on Broader Problems

In closing, it is useful to relate this research to the construction of real machines and to the work of others. Computers similar to the one modeled in Figure 3 have been built or proposed. Many researchers have realized that a very important problem is the construction of memory-processor connection networks [10, 14]. In actual computations the data must be aligned so that the first component of an instance always appears at processor $i_1$, the second component of the instance always appears at processor $i_2$, and so on. An example may be helpful. Consider the generalized line $L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))$ and the periodic skewing scheme depicted in Figure 13. If an instance is fetched whose designated element is stored in memory module zero, then the data is already aligned, i.e.,
the element demanded by processor zero is in memory module zero, the element demanded by processor one is in memory module one, etc. For an instance whose designated element is stored in memory module one, however, the data from memory module one needs to be routed to processor zero, the data from memory module two needs to be routed to processor one, the data in memory module four needs to be routed to processor two, etc. This situation can be described as follows: if an instance of L is fetched whose designated element is stored in memory module one, then to align the data with the processors, the memory-processor connection network must be able to sort the permutation (1 2 4 0 5 6 3). The phrase "sort the permutation" is appropriate because paths must be established from processor zero to memory module one, from processor one to memory module two, etc. This can be thought of as placing (1 2 4 0 5 6 3) as input at the processor side of the network and getting (0 1 2 3 4 5 6) as output at the memory side of the network. Using this terminology, to be able to align any instance of L using the skewing scheme of Figure 13, the network must be able to sort (0 1 2 3 4 5 6), (1 2 4 0 5 6 3), (2 4 5 1 6 3 0), (3 0 1 6 2 4 5), (4 5 6 2 3 0 1), (5 6 3 4 0 1 2), and (6 3 0 5 1 2 4).

In general, suppose a periodic skewing scheme is used that is derived from a tesselation generated by replicating parallelograms constructed as described in Section 4.1. Then, in order to align the data, the memory-processor connection network must be able to sort N permutations. Standard fan-in arguments show that this will take $O(\log N)$ time.

The results of Chapter 3 and Section 4.1 are very encouraging. It appears that in the most important practical cases valid linear skewing schemes exist if any valid skewing schemes exist. When linear skewing schemes are employed, the memory-processor connection network is simpler.
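The alignment step in the Figure 13 example can be sketched as a table lookup. The Python sketch below is an illustrative assumption, not a description of any particular network. ROUTING[m][k] is taken to be the memory module that holds component k of an instance whose designated element is in module m, using the seven permutations listed for the skewing scheme of Figure 13 (N = 7). Reordering the N module outputs by ROUTING[m] is the permutation that the memory-processor connection network must realize for that instance. The function names are hypothetical.

```python
N = 7

# ROUTING[m][k]: module holding component k when the designated element
# (component 0) is stored in module m, as listed for the Figure 13 example.
ROUTING = [
    (0, 1, 2, 3, 4, 5, 6),
    (1, 2, 4, 0, 5, 6, 3),
    (2, 4, 5, 1, 6, 3, 0),
    (3, 0, 1, 6, 2, 4, 5),
    (4, 5, 6, 2, 3, 0, 1),
    (5, 6, 3, 4, 0, 1, 2),
    (6, 3, 0, 5, 1, 2, 4),
]

def align(module_words: list, routing: tuple) -> list:
    """Reorder the words read from modules 0..N-1 so that processor k receives component k."""
    return [module_words[m] for m in routing]

def store_instance(routing: tuple) -> list:
    """Lay out components 'c0'..'c6' across the modules as the routing dictates."""
    words = [None] * N
    for k, m in enumerate(routing):
        words[m] = f"c{k}"
    return words

# Every row must be a permutation of the modules; otherwise two components
# of one instance would collide in the same module.
assert all(sorted(r) == list(range(N)) for r in ROUTING)
# The designated element of row m sits in module m.
assert all(r[0] == m for m, r in enumerate(ROUTING))

for r in ROUTING:
    assert align(store_instance(r), r) == [f"c{k}" for k in range(N)]
print(store_instance(ROUTING[1]))   # ['c3', 'c0', 'c1', 'c6', 'c2', 'c4', 'c5']
```

The network has to be able to realize all N rows of such a table, one per possible module of the designated element. This matches the count of N permutations given for periodic schemes.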
Suppose it is unnecessary for the first component of the generalized line to go to processor zero, and it is sufficient instead that it always go to processor $i_1$, and similarly for the other components of the generalized line. Then a network capable of performing arbitrary shifts is adequate. That is, the permutations to be sorted are just (0 1 2 ... N-1), (1 2 3 ... N-1 0), (2 3 4 ... N-1 0 1), ..., (N-1 0 1 ... N-2).

If only one generalized line is required for a computation, then no difficulties are created by always sending the first component to processor $i_1$, the second component to processor $i_2$, etc. If several generalized lines are used by an algorithm, however, problems may develop. It is often the case, as in matrix multiplication, that the algorithm will require that the $j$th component of all the generalized lines used be sent to the same processor, $i_j$. A simple shifting network may then no longer be adequate, since the $j$th component of one generalized line may not be sent to the same processor as the $j$th component of some other generalized line. A corrective transformation must be applied after (or before) the shifting is performed, and a different transformation may be required for each generalized line. For N a power of two, Lawrie's Ω-network [10] performs the shifting and the additional transformations needed simultaneously, without additional time delay or extra hardware.

Unfortunately, as the results presented here indicate, when only the problem of finding valid skewing schemes is considered, a power of two is not the best choice for N. Taking N to be a prime number gives consistently better results. Taking N to be prime has two major disadvantages, however: the modular arithmetic is much slower, and no networks that perform as well as the Ω-network are known. One possible way of constructing such a memory-processor connection network is to follow the shifting network by a Benes network [2], which can sort any permutation.
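For a linear skewing scheme, alignment reduces to a rotation of the module outputs. This is why a shifting network suffices for one generalized line but not, in general, for several. The Python sketch below assumes a hypothetical linear scheme $\varphi(i,j) = i + 2j \bmod 7$ and two hypothetical generalized lines: seven consecutive elements along j, and seven along i. The function names are illustrative. The sketch rotates the module outputs left by the module holding the designated element. After that rotation, component k always lands at processor $i_k = \varphi(\text{offset}_k)$, wherever the instance sits. However, the two lines disagree on which processor receives a given component.

```python
from itertools import product

def rotate_left(words: list, m: int) -> list:
    """Cyclic shift: output position p receives input position (p + m) mod n."""
    return words[m:] + words[:m]

def fetch_and_shift(cells, base, a, b, n):
    """Store component k of the instance at `base` in module phi(base + offset_k),
    read all n modules, then rotate left by phi(base). Returns, per processor,
    the index of the component it receives (None if no component arrives)."""
    x, y = base
    modules = [None] * n
    for k, (di, dj) in enumerate(cells):
        modules[(a * (x + di) + b * (y + dj)) % n] = k
    return rotate_left(modules, (a * x + b * y) % n)

A, B, N = 1, 2, 7                     # hypothetical scheme phi(i, j) = i + 2j mod 7
ALONG_J = [(0, k) for k in range(N)]  # seven consecutive elements along j
ALONG_I = [(k, 0) for k in range(N)]  # seven consecutive elements along i

# The processor assignment does not depend on where the instance is located ...
for base in product(range(-3, 4), repeat=2):
    assert fetch_and_shift(ALONG_J, base, A, B, N) == fetch_and_shift(ALONG_J, (0, 0), A, B, N)
    assert fetch_and_shift(ALONG_I, base, A, B, N) == list(range(N))

# ... but the two lines send component k to different processors, so a further,
# line-specific permutation is needed if component k must always reach one processor.
print(fetch_and_shift(ALONG_J, (0, 0), A, B, N))   # [0, 4, 1, 5, 2, 6, 3]
```

Because N is prime here, every nonzero coefficient gives a valid scheme for both lines. The price is the mod-7 arithmetic and the extra per-line permutation.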
The best algorithm for setting up the Benes network is due to Opferman and Tsao-Wu [12]. This algorithm, which takes $O(N \log N)$ time, is too slow for use on each memory cycle. However, the generalized lines used by a program are fixed at compile time. The way in which the network needs to be set up for each generalized line used by the program can therefore be calculated once, before the program is run, stored in a memory, and then read out to set the network up rapidly when needed. Adding a Benes network to the memory-processor connection network, as just described, will create additional cost and will slow down the machine somewhat, since the data will have to pass through more gates. This problem, and particularly incorporating the Benes network into the shifting network, is a good candidate for further research.

LIST OF REFERENCES

[1] Batcher, K. E., "The Multi-dimensional Access Memory in STARAN," presented at the 1975 Sagamore Conference on Parallel Processing and submitted for publication in the IEEE Transactions on Computers Special Issue on Parallel Processing.
[2] Benes, V. E., Mathematical Theory of Connecting Networks and Telephone Traffic, Academic Press, New York; 1965.
[3] Budnik, P. and D. J. Kuck, "The Organization and Use of Parallel Memories," IEEE Transactions on Computers, Vol. C-20, No. 12, pp. 1566-1569; December 1971.
[4] Chandra, A. K., "Independent Permutations, as Related to a Problem of Moser and a Theorem of Polya," Journal of Combinatorial Theory, Series A, Vol. 16, pp. 111-120; 1974.
[5] Gardner, M., "Mathematical Games; More About Tiling the Plane: The Possibilities of Polyominoes, Polyiamonds and Polyhexes," Scientific American, Vol. 233, No. 2, pp. 112-115; August 1975.
[6] Golomb, S. W., Polyominoes, Charles Scribner's Sons, New York; 1965.
[7] Hardy, G. H. and E. M. Wright, An Introduction to the Theory of Numbers, Oxford University Press, London; 1954.
[8] Hoffman, K. and R. Kunze, Linear Algebra, Prentice-Hall, Inc., Englewood Cliffs, New Jersey; 1961.
[9] König, D., Theorie der Endlichen und Unendlichen Graphen, Akademische Verlagsgesellschaft M.B.H., Leipzig; 1936.
[10] Lawrie, D. H., "Memory-Processor Connection Networks," Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Report No. UIUCDCS-R-73-557; February 1973.
[11] Muraoka, Y., "Storage Allocation Algorithms in the TRANQUIL Compiler," M.S. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Report No. 297; January 1969.
[12] Opferman, D. C. and N. T. Tsao-Wu, "On a Class of Rearrangeable Switching Networks," Bell System Technical Journal, Vol. 50, pp. 1579-1618; May-June 1971.
[13] Polya, G., "Über die 'doppelt-periodischen' Lösungen des n-Damen-Problems," in W. Ahrens, Mathematische Unterhaltungen und Spiele, Teubner, Leipzig, pp. 364-374; 1918.
[14] Swanson, R. C., "Interconnections for Parallel Memories to Unscramble p-Ordered Vectors," IEEE Transactions on Computers, Vol. C-23, No. 11, pp. 1105-1115; November 1974.
[15] Thornton, J. E., Design of a Computer: The Control Data 6600, Scott, Foresman and Company, Glenview, Illinois; 1970.
[16] Yao, F., Private Communication.

VITA

Henry David Shapiro was born in New York, New York on September 17, 1947. He received his bachelor of arts degree in mathematics from the Johns Hopkins University, graduating with both honors and departmental honors. At that time, he was also elected to Phi Beta Kappa. In October 1969, Mr. Shapiro was awarded a master of science degree in mathematics from Stanford University. Before coming to the University of Illinois, he taught high school mathematics and computer science at West Springfield High School, Fairfax County, Virginia, from September 1969 until June 1972.
While working on his doctorate, he held a National Science Foundation Graduate Fellowship and a Research Assistantship in the Department of Computer Science. Mr. Shapiro is a member of Sigma Xi, the Association for Computing Machinery, and the National Council of Teachers of Mathematics. He has had two papers accepted for publication: "Storage Schemes in Parallel Memories" (to appear in the Proceedings of the 1975 Sagamore Computer Conference on Parallel Processing) and "A New Approach to Teaching a First Course in Compiler Construction" with M. D. Mickunas (to appear in the Proceedings of the Sixth Symposium on Computer Science Education).

BIBLIOGRAPHIC DATA SHEET
Report No.: UIUCDCS-R-75-776
Title: THEORETICAL LIMITATIONS ON THE USE OF PARALLEL MEMORIES
Report Date: December 1975
Author: Henry David Shapiro
Performing Organization: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
Contract/Grant No.: NSF GJ 41538
Type of Report: Ph.D. Thesis
Key Words: parallel memories; memory conflicts; skewing schemes; linear skewing; tesselations