 
LIBRARY OF THE
UNIVERSITY OF ILLINOIS
AT URBANA-CHAMPAIGN

510.84
no.776-781
cop. 2
 
The person charging this material is responsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below.
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
 
 L161 — O-1096 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/theoreticallimit776shap 
 
THEORETICAL LIMITATIONS ON THE USE OF
PARALLEL MEMORIES

Henry David Shapiro
 
 
THEORETICAL LIMITATIONS ON THE USE OF 
 PARALLEL MEMORIES 
 
 Henry David Shapiro, Ph.D. 
 
 Department of Computer Science 
 
University of Illinois at Urbana-Champaign, 1976
 
 Regardless of the underlying machine architecture, independently 
 addressable memory modules contribute significantly to program speed-ups on 
 modern computers. Because of memory conflicts which arise while accessing 
 data, actual program speed-ups are generally less than theoretically possible. 
 Organizing the data of a computation so as to avoid memory conflicts is 
 particularly difficult for data which can logically be viewed as 
 two-dimensional. Several geometric and algebraic conditions are presented 
which determine if the data of a computation can be organized to avoid
memory conflicts. It is shown that a prime number of memory modules gives
higher memory utilization and allows the use of simpler storage schemes
than a power-of-two number of memory modules. The case of greatest practical
 significance, references to rows, columns and diagonals of a matrix, is 
 given special attention. Finally, a brief discussion is presented which 
 relates this research to that of a companion problem, the construction of 
 memory-processor connection networks for single-instruction-multiple-data 
 stream machines. 
 
UIUCDCS-R-75-776
 
 THEORETICAL LIMITATIONS ON THE USE OF 
 PARALLEL MEMORIES 
 
 by 
 
 Henry David Shapiro 
 
B.A., Johns Hopkins University, 1968
M.S., Stanford University, 1969
 
 Department of Computer Science 
University of Illinois at Urbana-Champaign
 Urbana, Illinois 
 
 This work was supported in part by the National Science Foundation 
under grant no. NSF GJ 41538 and was submitted in partial
 fulfillment for the Doctor of Philosophy degree in Computer Science, 
 1975. 
 
 
 ACKNOWLEDGMENTS 
 
 Special thanks are due to the chairman of my doctoral committee, 
 Professor C. L. Liu, for his constant encouragement and guidance throughout 
 this research. Appreciation is also extended to the other members of the 
 committee, Professors David Kuck, Duncan Lawrie, Judith Liebman, and 
 Franco Preparata, for their many constructive suggestions. Professor 
 Preparata deserves a special note of gratitude for his comments which 
 helped to simplify the proof of Theorem 5. 
 
 Also, sincere appreciation is extended to two fellow graduate 
 students, Bruce Link, for his sustained interest in this research as well 
 as his participation in discussions of these results; and Brian Hansche, 
for our discussions of the conjectures presented in Chapter 4.
 
 To Mr. Stanley Zundo of the Computer Science drafting department 
 goes a note of thanks for his assistance in the preparation of the numerous 
figures included in this dissertation. Thanks also go to Mrs. Connie Slovak
 for an outstanding job in the typing of this manuscript. 
 
 Recognition for the financial support which made this research 
 possible is due to both a National Science Foundation Graduate Fellowship 
and NSF Grant GJ 41538.
 
 Finally, a note of thanks to my wife, Jacqueline, who more than anyone 
 else, encouraged and cheered me when this work progressed slowly. 
 
TABLE OF CONTENTS

1. THE DATA ORGANIZATION PROBLEM
   1.1 Machine Models and the Data Organization Problem
   1.2 Formalization of the Problem
   1.3 Elimination of Boundary Conditions
   1.4 Classes of Skewing Schemes and Some Particular Generalized Lines
   1.5 Summary
2. DETERMINATION OF VALID SKEWING SCHEMES
   2.1 Introduction
   2.2 The Basic Result
   2.3 Existence and Construction of Valid Skewing Schemes when the Number of Memory Modules Equals the Length of the Generalized Line
   2.4 Existence and Construction of Valid Periodic Skewing Schemes
3. SPECIAL RESULTS ON [x,y]-LINES
   3.1 Introduction
   3.2 Preliminaries
   3.3 The Special Case of a Prime Number of Memory Modules
   3.4 Generalization to Composite N
   3.5 Further Results and Examples
4. UNRESOLVED PROBLEMS AND DIRECTIONS OF FURTHER RESEARCH
   4.1 The Effectiveness of Linear and Periodic Skewing Schemes
   4.2 Questions Relating to Memory Utilization
   4.3 Comments on Broader Problems
LIST OF REFERENCES
VITA
 
LIST OF FIGURES

1 Multi-function Computer
2 Data Needed for the Evaluation of the Function, F
3 Parallel Computer
4 Geometric Realization of a Generalized Line
5 The Instance of the $[x,y]_N$-line whose First Component is (i,j)
6 Containment Relations Between Classes of Skewing Schemes
7 Instances of a Generalized Line, with their Designated Elements Marked with Asterisks
8 Checking the Condition in Theorem 4
9 Pictorial Presentation of the Proof of Theorem 4
10 Pictorial Presentation of the Proof of Theorem 4
11 Tessellation of the Plane by a Generalized Line
12 The Generalized Line L = ((0,0), (0,1), (0,2), (1,1), (2,1)) Cannot Tessellate the Plane
13 The Skewing Scheme Resulting From the Use of Theorem 5
14 Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,0), (1,1), (2,0), (2,1))
15 Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,0), (2,0), (2,1), (2,2))
16 One Possible Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,-1), (1,0), (1,1), (2,0))
17 Another Possible Tessellation of the Plane by the Generalized Line, L = ((0,0), (1,-1), (1,0), (1,1), (2,0))
18 The "Wrap Around" Interpretation of a Generalized Line Used with Periodic Skewing Schemes
19 A Valid Linear Skewing Scheme for the Generalized Line, L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))
20 Proof of the Existence of a Periodic Skewing Scheme for L = ((0,0), (0,1), (2,1), (2,2))
21 $[x,y]_N$-lines on the Torus
22 Programmer's View of STARAN Memory
23 The Periodic Skewing Scheme Used in the STARAN Computer
24 Positioning Four Instances of the Generalized Line ((0,0), (1,0), (1,1), (2,0), (2,1)), so their Designated Elements Form a Parallelogram
25 Alternate Positionings of Instances for the Generalized Line ((0,0), (1,0), (1,1), (1,2), (2,2))
26 A Non-periodic Skewing Scheme, $\psi$, for which $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$ is not Valid
27 Examples of Translating by (p,q) and/or (r,s), so that all the Components of the Instance of the Generalized Line Lie Interior to the Parallelogram
28 Components of an Instance of a Generalized Line, after Translation by (p,q) and/or (r,s), which are Stored in Same Memory Module
29 An Example of a Polyomino for which There is a Valid Periodic Skewing Scheme, but no Valid Linear Skewing Scheme
30 Covers for the Generalized Line ((0,0), (0,1), (0,2), (1,1), (2,1)) which Tessellate the Plane
 
1. THE DATA ORGANIZATION PROBLEM 
 
 1.1 Machine Models and the Data Organization Problem 
 
In the late 1950s and early 1960s computer architects began to explore the possibility of increasing the speed at which existing computers operated by performing some internal operations simultaneously. The overlapping of memory fetches with instruction decoding and execution was the basis of the increase in speed of several machines. A difficulty imposed by the hardware technology of that day was that the rate at which the control unit and arithmetic processor could manipulate data was higher than the rate at which a single memory unit could supply the data. Despite many changes in memory and circuit technology over the past twenty years, the inability of a single memory unit to satisfy the data demands of the central processor has not changed, and this situation appears likely to persist for the foreseeable future. Because of the relatively slow memory data rate, primary memory on most modern computers consists of several independent memory modules. Since memory fetches can proceed simultaneously in different memory modules, the rate at which the memory can supply data to the central processor is effectively increased. A very successful machine designed on these general principles was the CDC 6600 [15]. Figure 1 depicts a block diagram of a computer which may be regarded as an abstraction of this machine.

[Figure 1: Multi-function Computer (block diagram showing a control unit; function units 1 through k, each with operand registers; a bus; general registers; and a primary memory consisting of memory modules 1 through M).]

The designers of the CDC 6600 realized that effective use of the potentially highly overlapped functioning of their computer depended on reducing data dependencies in computations and on storing the data requested by the control unit so that data elements demanded in quick succession were in different memory modules. The problem of detecting and reducing data dependencies in a computation has received much serious attention in the literature. The question of how to organize the data, so that data elements needed in quick succession by the central processor are not in the same memory module, was for a long time generally ignored. The designers of the CDC 6600 tried to lessen contention for memory by arranging primary memory so that references to consecutively numbered memory locations cycled through all thirty-two memory modules before repeating. This scheme eliminates memory conflicts in the most common cases: fetching sequential instructions and manipulating the data of one-dimensional arrays stored in consecutive memory locations.
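The cyclic arrangement just described can be sketched in a few lines of Python. This is a minimal model, not a description of the CDC 6600 hardware: it assumes only that address a resides in module a mod 32, and the helper names are hypothetical. It shows why sequential references avoid conflicts while references separated by a multiple of the number of modules do not.

```python
NUM_MODULES = 32  # thirty-two modules, as in the arrangement described above


def module_of(address: int, num_modules: int = NUM_MODULES) -> int:
    """Consecutively numbered locations cycle through all modules."""
    return address % num_modules


def modules_touched(start: int, stride: int, count: int) -> int:
    """Number of distinct modules hit by the addresses start, start+stride, ..."""
    return len({module_of(start + k * stride) for k in range(count)})


# Sequential instructions or a one-dimensional array: 32 references, 32 modules.
print(modules_touched(start=100, stride=1, count=32))   # 32
# Stepping along a row of a 32 x 32 array stored column by column
# (as FORTRAN does) has stride 32, so every reference hits the same module.
print(modules_touched(start=100, stride=32, count=32))  # 1
```

The second case is exactly the kind of two-dimensional access pattern that the cyclic arrangement leaves unresolved.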
 
 The problem of organizing the data of a two-dimensional array, 
 so that data requested in quick succession are in different memory 
 modules, was left to the programmer and/or the compiler. To make this 
 problem more explicit consider the following specific example. A FORTRAN 
 programmer writes 
 
 DIMENSION A(N, N) 
 A(I,J)=F(A(I-1,J-1),A(I-1,J),A(I-1,J+1),A(I,J),A(I+1,J-1),A(I+1,J),A(I+1,J+1)) 
 
 Normally the programmer envisions the memory allocated to array A as actually 
 being two-dimensional, leaving it to the compiler to convert the doubly 
 subscripted references to real machine addresses. Fetching the parameters 
 
for the function call can be thought of as fetching, in rapid succession, the data enclosed by the outlined template in Figure 2. Depending on the dimensions of the array A and the method of data organization employed by the FORTRAN compiler, some of the seven parameters needed by the function F may lie in the same memory module. If such a memory conflict occurs, the fetching of the data, and the overall computation, will be slowed. In general, organizing the data of a two-dimensional array so that memory conflicts are reduced or eliminated is a very difficult problem.
 
Another machine design in which this same type of problem arises is depicted in Figure 3. This is a single-instruction-multiple-data stream (SIMD) machine, an abstraction of ILLIAC IV. In many computations the goal is to fetch M words of data in parallel and then operate on them simultaneously. If even two of the data words to be fetched are in the same memory module, all the processors may have to sit idle while a second memory cycle is initiated. Because such memory conflicts can degrade performance dramatically, organizing the data of a computation so that memory conflicts are avoided is especially important in machines of this design.
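The cost of a conflict in such a machine can be modeled simply. The sketch below assumes, as a simplification, that all memory modules operate in parallel and each supplies one word per memory cycle; under that assumption the number of cycles needed for a parallel fetch is the largest number of requested words residing in any one module. The function name memory_cycles is hypothetical.

```python
from collections import Counter


def memory_cycles(modules: list[int]) -> int:
    """Memory cycles needed to fetch words stored in the listed modules,
    assuming all modules operate in parallel and each delivers one word
    per cycle: the busiest module determines the cost."""
    if not modules:
        return 0
    return max(Counter(modules).values())


print(memory_cycles([0, 1, 2, 3, 4, 5, 6, 7]))  # 1: no conflicts
print(memory_cycles([0, 1, 2, 3, 4, 5, 6, 0]))  # 2: two words in module 0
```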
 
The purpose of this thesis is to develop some mathematical conditions which determine whether the data of a two-dimensional array can be stored in a primary memory consisting of independent memory modules so that, during a given computation, the data requested by the control unit and/or arithmetic processors can be fetched without memory conflicts. In Chapter 1 preliminaries are considered. Chapter 2 provides a general discussion, while Chapter 3 focuses attention on some special cases of importance in practice. Chapter 4 informally presents some techniques which show promise in practice, even though a complete theoretical analysis of these techniques has not yet been carried out.

[Figure 2: Data Needed for the Evaluation of the Function, F (the array elements $a_{i-3,j-2}$ through $a_{i+3,j+2}$ surrounding $a_{i,j}$, with the elements needed by F enclosed by an outline).]

[Figure 3: Parallel Computer (block diagram showing a control unit, arithmetic processors, a memory-processor connection network, and primary memory).]
 
 1.2 Formalization of the Problem 
 
 In Section 1.1, the problem of organizing data in parallel 
 memories to eliminate memory conflicts was developed from an historical 
perspective. To treat this problem mathematically it is convenient to provide a model of the computations that abstracts the situation sufficiently that machine-dependent details are eliminated. The data for the computations are to be stored in a doubly subscripted array.
 
Definition 1: A generalized line, L, of length n, is an n-tuple of ordered pairs of integers, the first ordered pair of which is (0,0).
 
A generalized line can be thought of as a rigid template which, during the course of a computation, is positioned at various locations over the matrix of data. The data enclosed by the template are to be fetched for a computation. Returning to the programming example used in Section 1.1, the outlined region in Figure 2 is to be viewed as the template, and its positioning over the matrix of data elements is determined by the actual values of I and J during execution. The data enclosed by the outline need to be fetched before computation of the function F can proceed. The actual generalized line is an n-tuple, for example $L_1$ = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)). This is just a formal way of specifying a geometric template. The template can be built by placing unit squares on the plane so that the unit squares are centered at the points of the plane indicated by the ordered pairs of the n-tuple. Figure 4 demonstrates this construction for the generalized line $L_1$.
 
 There are a few minor points that need clarification. First, 
 the labeling of the points of the plane is not the method commonly used 
 in elementary algebra. The first coordinate indicates the vertical 
 direction, with down being positive, and the second coordinate indicates 
 the horizontal direction, with right being positive. This labeling was 
 chosen to reinforce the fact that the data are stored in a two-dimensional 
 array; this method of labeling is often used for two-dimensional arrays 
 in the literature. This labeling scheme also conforms to that of other 
authors [3,10]. A second minor point is that technically the generalized line $L_1$ of Figure 4 and $L_2$ = ((0,0), (-1,-1), (1,-1), (-1,0), (1,0), (-1,1), (1,1)) are different generalized lines. Their realizations by unit squares, however, give the same geometric shape. A formal definition of equivalent generalized lines could be given; intuitively, two generalized lines are equivalent if they realize the same shape, without rotations or reflections. A third point is that the geometric realization of a generalized line need not be a connected figure.
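The labeling convention and the notion of equivalence can be illustrated with a small sketch. The function name render and the text-picture format are hypothetical choices; L1 is the seven-element line of Figure 4 and L2 is its re-anchored equivalent. Because the picture is cropped to its bounding box, two generalized lines give the same picture exactly when one is a translate of the other.

```python
Point = tuple[int, int]


def render(line: list[Point]) -> str:
    """Draw the geometric realization of a generalized line.

    '#' marks a unit square centered at a component (row, column); the
    first coordinate grows downward and the second to the right.  The
    drawing is cropped to the bounding box, so equivalent lines (same
    shape, different anchor) give identical pictures.
    """
    cells = set(line)
    rows = range(min(r for r, _ in line), max(r for r, _ in line) + 1)
    cols = range(min(c for _, c in line), max(c for _, c in line) + 1)
    return "\n".join("".join("#" if (r, c) in cells else "." for c in cols) for r in rows)


L1 = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
L2 = [(0, 0), (-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (1, 1)]
print(render(L1))
# ###
# .#.
# ###
print(render(L1) == render(L2))  # True: same shape, different generalized lines
```

A line such as ((0,0), (2,0)) renders as two separated squares, matching the remark that a realization need not be connected.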
 
[Figure 4: Geometric Realization of a Generalized Line. Geometric realization of the generalized line $L_1$ = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)). The ordered pairs indicate the labeling of points in the plane.]

As has been pointed out, a generalized line can be viewed as a template. During the execution of a program this template will be positioned over the matrix of data, and the data elements enclosed by it will be referenced in parallel (or in quick succession, depending on the nature of the machine). The positioning of a template can be viewed as the intuitive interpretation of the following definition.

Definition 2: An instance of a generalized line, L, is the ordered n-tuple, L(a,b), resulting from the addition of the ordered pair (a,b) to each component of the generalized line.
 
The positioning of the outlined template in Figure 2 corresponds to $L_2(i,j)$. If the equivalent generalized line $L_1$ (see Figure 4) is used, then this same collection of data elements corresponds to $L_1(i-1,j-1)$. It is reasonable to consider a version of FORTRAN designed for machines with parallel functioning. The program segment of Section 1.1 might become
 
TEMPLATE L=((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2))
DIMENSION A(N,N)
A(I,J)=F(L(I-1,J-1) OF A)
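Definition 2 amounts to a coordinatewise translation. In the minimal sketch below, instance is a hypothetical helper name, and L1 and L2 are the two equivalent generalized lines discussed above. The sketch checks that $L_1(i-1,j-1)$ and $L_2(i,j)$ enclose the same seven array elements, namely the parameters of F in the program of Section 1.1.

```python
Point = tuple[int, int]


def instance(line: list[Point], a: int, b: int) -> list[Point]:
    """Definition 2: L(a, b) adds (a, b) to every component of L."""
    return [(r + a, c + b) for r, c in line]


L1 = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
L2 = [(0, 0), (-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (1, 1)]

i, j = 5, 7  # an arbitrary position of the template
print(instance(L1, i - 1, j - 1))
# [(4, 6), (4, 7), (4, 8), (5, 7), (6, 6), (6, 7), (6, 8)]
# Same seven array elements, listed in a different order:
print(sorted(instance(L1, i - 1, j - 1)) == sorted(instance(L2, i, j)))  # True
```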
 
 With these definitions it is now possible to give a precise 
 statement of the data organization problem. 
 
 Problem: Given a large matrix of data and given that 
 an algorithm requires, at various stages in its 
 computation, the data elements contained in many 
 instances of one or more generalized lines, is it 
 possible to assign the data elements of the matrix 
 to various memory modules, so that when the data 
 elements of an instance of a generalized line are 
 demanded, all the data elements lie in different 
 memory modules? 
 
The ability to store the data so that, when an instance of a generalized line is desired, all the data elements are in different memories is a goal in keeping with the functioning of machines designed along the lines of Figures 1 and 3. If it is possible to store the data so that no memory conflicts result when fetching instances of the generalized lines used by the algorithm, then in machines designed along the lines of Figure 1 contention for memory is lessened, and in machines designed along the lines of Figure 3 the data can be fetched in one memory cycle. This problem motivates the following definitions.
 
Definition 3: Given an $M \times M$ matrix and N independent memory modules, a skewing scheme is a mapping $\varphi: \{0,1,2,\dots,M-1\} \times \{0,1,2,\dots,M-1\} \to \{0,1,2,\dots,N-1\}$, where $\varphi(i,j) = k$ means matrix element $a_{ij}$ is stored in memory module k.
 
Definition 4: Given a collection of generalized lines, $\{L_1, L_2, \dots, L_p\}$, an $M \times M$ matrix of data, N memory modules, and a skewing scheme, $\varphi$, the skewing scheme is said to be valid for this collection if and only if, given any instance of any of the generalized lines, $\varphi$ assigns the data elements of the instance which lie within the matrix bounds to distinct memory modules.
 
With these definitions the problem described earlier can be formalized as: Given an $M \times M$ matrix, N independent memory modules, and a collection of generalized lines used by an algorithm, is there a valid skewing scheme for this collection?
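Definition 4 can be checked mechanically for a finite matrix. The sketch below is a brute-force illustration, not an efficient algorithm: is_valid is a hypothetical name, a skewing scheme is represented as a Python function of (i, j), and the two sample schemes in the usage lines are arbitrary choices. Instances that overlap the matrix boundary are included, and only their in-bounds elements are compared.

```python
from collections.abc import Callable
from itertools import product

Point = tuple[int, int]
Scheme = Callable[[int, int], int]


def is_valid(phi: Scheme, lines: list[list[Point]], M: int) -> bool:
    """Definition 4 on an M x M matrix: for every instance of every
    generalized line, the elements lying inside the matrix bounds must be
    assigned by phi to distinct memory modules."""

    def inside(p: Point) -> bool:
        return 0 <= p[0] < M and 0 <= p[1] < M

    for line in lines:
        # All offsets (a, b) that put at least one component inside the matrix.
        offsets = {(i - r, j - c) for r, c in line for i, j in product(range(M), repeat=2)}
        for a, b in offsets:
            modules = [phi(r + a, c + b) for r, c in line if inside((r + a, c + b))]
            if len(modules) != len(set(modules)):
                return False
    return True


N = 4
row = [(0, k) for k in range(N)]  # N consecutive elements of a row
col = [(k, 0) for k in range(N)]  # N consecutive elements of a column
print(is_valid(lambda i, j: j % N, [row], M=8))             # True
print(is_valid(lambda i, j: j % N, [row, col], M=8))        # False: a column sits in one module
print(is_valid(lambda i, j: (i + j) % N, [row, col], M=8))  # True
```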
 
1.3 Elimination of Boundary Conditions

The reader may have noticed in the preceding section that the definition of a skewing scheme explicitly depends on M, the size of the matrix. From the point of view of the programmer this is unfortunate. The matrix size may vary from one run to the next, and if a new skewing scheme were needed for each run, use of some programs might be difficult. Notice, however, that if $\varphi$ is a valid skewing scheme for a collection of generalized lines on an $M \times M$ matrix, then $\varphi$ restricted to $\{0,1,2,\dots,M'-1\} \times \{0,1,2,\dots,M'-1\}$, $M' < M$, is also valid for this collection on an $M' \times M'$ matrix. The pragmatic consideration that M may not be known in advance, and can be large, leads to the search for valid skewing schemes with domain $\{0,1,2,\dots\} \times \{0,1,2,\dots\}$. When skewing schemes with this domain are used, several benefits accrue. As noted above, such a skewing scheme can be used without prior knowledge of M. A secondary benefit is that special-case handling of instances which overlap the boundaries along the right and bottom of the $M \times M$ matrix can be simplified: zero or some other null value can be stored for data elements outside the actual array bounds. These practical considerations justify elimination of the boundaries along the right and bottom edges of the matrix, i.e. there are reasons to treat the matrix as infinite in size. It is also possible to show mathematically that widening the domain of skewing schemes to the quarter plane results in no loss of generality; that is, those collections for which valid skewing schemes exist on all finite domains have valid skewing schemes on the quarter plane.
Theorem 1: Given a collection of generalized lines, $\{L_1, L_2, \dots, L_p\}$, and N, the number of memory modules, if for each M there exists a skewing scheme $\varphi_M: \{0,1,2,\dots,M-1\} \times \{0,1,2,\dots,M-1\} \to \{0,1,2,\dots,N-1\}$ valid for this collection, then there exists a valid skewing scheme for this collection with domain $\{0,1,2,\dots\} \times \{0,1,2,\dots\}$.
 
Proof: The proof uses the König infinity lemma: if a rooted tree has infinitely many nodes, but each node has finitely many successors, then there is a path of infinite length in the tree [9]. We construct a rooted tree as follows. The nodes at level i in the tree are the skewing schemes valid for this collection for an $i \times i$ matrix. Recall that for instances which overlap the boundaries of the matrix there must not be any memory conflicts among the elements of the instance lying inside the matrix bounds. Also note that the one node at level 0, the root, is an artificial construct, since matrices of dimension zero have no data elements; one node at level 0 is created, by convention, to provide a root for the tree. A node at level i, $\varphi_i$, is connected to a node at level i+1, $\varphi_{i+1}$, if $\varphi_{i+1}$ restricted to $\{0,1,2,\dots,i-1\} \times \{0,1,2,\dots,i-1\}$ is just $\varphi_i$. (The node at level 0 is connected to all nodes at level 1, by convention.) This construction produces a tree. To see this, note that every node at level i, i > 0, has a predecessor, for if $\varphi_i$ is a valid skewing scheme on the $i \times i$ matrix, then $\varphi_i$ restricted to $\{0,1,2,\dots,i-2\} \times \{0,1,2,\dots,i-2\}$ is a valid skewing scheme on the $(i-1) \times (i-1)$ matrix. Also note that each $\varphi_i$ has only one predecessor. Thus the construction yields a tree. Next, notice that the tree has infinitely many nodes, since by assumption there is a valid skewing scheme for every M and, hence, at least one node at each level. Lastly, each node has only finitely many successors; in fact a node at level i has exactly $N^{2i+1}$ possible candidates for successor nodes, one for each assignment of memory modules to the 2i+1 elements added in passing from the $i \times i$ to the $(i+1) \times (i+1)$ matrix, many of which will presumably fail to be valid skewing schemes. Thus the König infinity lemma implies an infinite path in the tree.

Let the nodes in this path be $\psi_0, \psi_1, \psi_2, \dots$. Define $\varphi$ by $\varphi(i,j) = \psi_k(i,j)$, where $k > \max(i,j)$. Notice that $\varphi$ is a well-defined function, since if $k_2 > k_1$ then $\psi_{k_2}$ restricted to $\{0,1,2,\dots,k_1-1\} \times \{0,1,2,\dots,k_1-1\}$ is just $\psi_{k_1}$, by the way in which the tree was constructed. To see that $\varphi$ is valid for the collection $\{L_1, L_2, \dots, L_p\}$, consider an arbitrary instance of one of the generalized lines. Let k be selected sufficiently large so that this instance does not overlap the right or bottom boundaries of the $k \times k$ matrix. Now since $\psi_k$ is valid for this collection on the $k \times k$ matrix, and $\varphi$ restricted to $\{0,1,2,\dots,k-1\} \times \{0,1,2,\dots,k-1\}$ equals $\psi_k$, all the data elements comprising the instance (except those which overlap the left and top boundaries) will be mapped to different memories by $\psi_k$, and hence by $\varphi$. Since the instance was arbitrary, $\varphi$ is a valid skewing scheme. ■
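A program cannot follow an infinite path, but the tree itself is easy to build for tiny cases. The sketch below uses hypothetical helper names, and the choice of N = 2 with two short row templates is arbitrary. It generates each level exactly as in the proof, by extending every level-(i-1) node to the 2i-1 newly added cells in all possible ways and keeping only the valid extensions. For the two-element row template the levels keep growing, consistent with the valid quarter-plane scheme in which each row alternates between the two modules. For the three-element row template the tree dies out at level 3, so there is no infinite path, as expected, since three elements of a row cannot lie in two distinct modules.

```python
from itertools import product

Point = tuple[int, int]


def valid_on_square(scheme: dict[Point, int], line: list[Point]) -> bool:
    """Definition 4 for the square matrix whose cells are the keys of scheme."""
    offsets = {(i - r, j - c) for r, c in line for i, j in scheme}
    for a, b in offsets:
        mods = [scheme[(r + a, c + b)] for r, c in line if (r + a, c + b) in scheme]
        if len(mods) != len(set(mods)):
            return False
    return True


def tree_levels(line: list[Point], N: int, depth: int) -> list[list[dict[Point, int]]]:
    """Levels 0..depth of the tree in the proof of Theorem 1: level i holds
    every scheme on the i x i matrix that is valid for line, built by
    extending each level-(i-1) node to the 2i-1 new cells in all N**(2i-1) ways."""
    levels: list[list[dict[Point, int]]] = [[{}]]  # level 0: the artificial root
    for i in range(1, depth + 1):
        new_cells = [(i - 1, c) for c in range(i)] + [(r, i - 1) for r in range(i - 1)]
        next_level = []
        for parent in levels[-1]:
            for values in product(range(N), repeat=len(new_cells)):
                child = {**parent, **dict(zip(new_cells, values))}
                if valid_on_square(child, line):
                    next_level.append(child)
        levels.append(next_level)
    return levels


pair = [(0, 0), (0, 1)]            # two adjacent row elements
triple = [(0, 0), (0, 1), (0, 2)]  # three adjacent row elements
print([len(lv) for lv in tree_levels(pair, N=2, depth=4)])    # [1, 2, 4, 8, 16]
print([len(lv) for lv in tree_levels(triple, N=2, depth=4)])  # [1, 2, 4, 0, 0]
```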
 
Theorem 1 shows that in searching for a valid skewing scheme, M can be ignored. As pointed out earlier, an additional benefit is that special handling at some matrix boundaries is eliminated. To extend this simplification to the left and top boundaries it is convenient to use $\{\dots,-1,0,1,\dots\} \times \{\dots,-1,0,1,\dots\}$ as the domain for skewing schemes. Unlike the earlier situation, this change cannot be justified on the practical grounds that the size of the matrix may not be known in advance. However, any difficulties at the left and top boundaries can be eliminated by

Theorem 2: Given a collection of generalized lines, $\{L_1, L_2, \dots, L_p\}$, and N, the number of memory modules, if there exists a skewing scheme, $\varphi$, valid for this collection with domain $\{0,1,2,\dots\} \times \{0,1,2,\dots\}$, then there is a skewing scheme valid for this collection with domain $\{\dots,-1,0,1,\dots\} \times \{\dots,-1,0,1,\dots\}$.

Proof: The proof is similar to the preceding proof, so only a few details are sketched. The main difference is that nodes at level i are chosen to represent skewing schemes valid for the collection of generalized lines with domain $\{-i,-i+1,\dots,i-1,i\} \times \{-i,-i+1,\dots,i-1,i\}$. The only new difficulty is to see that level i is not empty. This is so since $\varphi$ restricted to $\{0,1,2,\dots,2i\} \times \{0,1,2,\dots,2i\}$ is valid, so $\psi_i$, defined on $\{-i,\dots,i\} \times \{-i,\dots,i\}$ by $\psi_i(j,k) = \varphi(j+i,k+i)$, is also valid, because translating an instance of a generalized line by (i,i) yields another instance of the same generalized line. ■
 
 The content of Theorem 2 is that there is no loss of generality 
 in considering only matrices of data that are infinite in all directions. 
 This is particularly useful in formulating theoretical results, since 
 proofs no longer need to account for any special conditions that might arise 
when only part of an instance lies inside the matrix bounds. Because of
 Theorem 2 only skewing schemes whose domain is the entire plane will be 
 considered throughout the rest of this thesis. 
 
1.4 Classes of Skewing Schemes and Some Particular Generalized Lines
 
Given a collection of generalized lines, it is desirable to have some conditions which determine whether or not a valid skewing scheme exists. In situations that arise in actual practice, the existence of such a valid skewing scheme is usually not sufficient. It is also highly desirable that $\varphi(i,j)$ be readily calculable, so that address computation does not unduly degrade system performance. There are two approaches that can be taken. One is that $\varphi$ be represented by a simple mathematical formula. The second is to use some table look-up strategy.
 
Neither of these two methods of calculating $\varphi$ works well for arbitrary skewing schemes. In general, a closed-form mathematical expression for an arbitrary skewing scheme, $\varphi$, may not exist. Table look-up techniques are also of little help, since a large (theoretically infinite) table would have to be stored, and storing the table in memory so that memory conflicts are eliminated in obtaining information from the table is the same problem as storing the original matrix of data so that memory conflicts are eliminated. Because arbitrary skewing schemes cannot always be implemented conveniently, certain subclasses of the skewing schemes valid for the entire plane take on significance. Table look-up schemes motivate the following definition.
 
Definition 5: Given N, the number of memory modules, a skewing scheme, $\varphi$, is called periodic if and only if $\varphi(i,j) = \varphi(i+kN, j+\ell N)$ for $k, \ell = \dots,-2,-1,0,1,2,\dots$ and for any i and j.
 
If $\varphi$ is a periodic skewing scheme, then $\varphi(i,j) = \varphi(i \bmod N, j \bmod N)$, where mod denotes the least nonnegative residue. Therefore, to calculate the value of $\varphi$ at any point in the plane it is only necessary to know $\varphi$ on $\{0,1,\dots,N-1\} \times \{0,1,\dots,N-1\}$. If N is sufficiently small, periodic skewing schemes can be implemented by table look-up at reasonable cost. The needed values of $\varphi$ can be stored in a specially designed super-fast memory.
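A periodic skewing scheme is exactly a table look-up on an $N \times N$ storage map. A minimal sketch follows; the 3 x 3 table is a hypothetical example, and the helper name periodic_scheme is likewise an illustrative choice. Python's % operator already returns the least nonnegative residue for a positive modulus, so negative coordinates need no special handling.

```python
from collections.abc import Callable


def periodic_scheme(table: list[list[int]]) -> Callable[[int, int], int]:
    """phi(i, j) = table[i mod N][j mod N] for an N x N table.  Python's %
    returns a value in 0..N-1 even for negative operands, so phi is defined
    on the whole plane."""
    N = len(table)

    def phi(i: int, j: int) -> int:
        return table[i % N][j % N]

    return phi


# A hypothetical storage map for N = 3 memory modules.
T = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]
phi = periodic_scheme(T)
print(phi(4, -2), phi(1, 1))  # 0 0: (4, -2) reduces to (1, 1)
```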
 
As N becomes large, and especially in machines designed along the lines suggested by Figure 3, where each arithmetic unit may require a private copy of the basic $N \times N$ storage map, periodic skewing schemes realizable only by table look-up become unattractive. In such situations the first method suggested for computing $\varphi$, a simple mathematical formula, appears more reasonable. One class of skewing schemes that has attracted attention in the literature is the class of linear skewing schemes.
 
Definition 6: Given N, the number of memory modules, a skewing scheme, $\varphi$, is called linear if and only if there exist constants a and b such that $\varphi(i,j) = ai + bj \bmod N$.
 
The class of linear skewing schemes is a subclass of the periodic skewing schemes, since if $\varphi$ is a linear skewing scheme, then $\varphi(i+kN, j+\ell N) = a(i+kN) + b(j+\ell N) \bmod N = ai + bj \bmod N = \varphi(i,j)$, and thus $\varphi$ is a periodic skewing scheme. That the linear skewing schemes are a subclass of the periodic skewing schemes shows that some periodic skewing schemes can be implemented without table look-up. There are other periodic skewing schemes which can also be efficiently implemented without table look-up; the one used in the STARAN computer will be mentioned in Chapter 3.
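The calculation above can also be seen numerically: a linear scheme computed from its formula agrees with the $N \times N$ table that a look-up implementation would store. The sketch assumes the arbitrary choice a = 1, b = 2, N = 5, uses hypothetical helper names, and checks agreement only on a finite window of the plane; the derivation above is what proves it everywhere.

```python
from collections.abc import Callable

Scheme = Callable[[int, int], int]


def linear_scheme(a: int, b: int, N: int) -> Scheme:
    """Definition 6: phi(i, j) = a*i + b*j mod N, computed without any table."""
    return lambda i, j: (a * i + b * j) % N


def table_of(phi: Scheme, N: int) -> list[list[int]]:
    """The N x N storage map that a table look-up implementation would need."""
    return [[phi(i, j) for j in range(N)] for i in range(N)]


N = 5
phi = linear_scheme(1, 2, N)
T = table_of(phi, N)
print(T[0])  # [0, 2, 4, 1, 3]
# Formula and table agree on a window of the plane, as periodicity requires.
print(all(phi(i, j) == T[i % N][j % N] for i in range(-20, 20) for j in range(-20, 20)))  # True
```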
 
Budnik and Kuck [3] and Lawrie [10] have investigated linear skewing schemes in detail. Much of their work was motivated by considering machines designed as in Figure 3. After investigating the data requirements of programs written for similar machines, and after discussions with numerical analysts, they generally focused their attention on some commonly used generalized lines. In particular, the generalized lines consisting of N consecutive elements of a matrix row (the generalized line R = ((0,0), (0,1), (0,2), ..., (0,N-1))), N consecutive elements of a matrix column (the generalized line C = ((0,0), (1,0), (2,0), ..., (N-1,0))), N consecutive elements of a forward diagonal (the generalized line D = ((0,0), (1,1), (2,2), ..., (N-1,N-1))), N consecutive elements of a backward diagonal (the generalized line B = ((0,0), (1,-1), (2,-2), ..., (N-1,-N+1))), and, when N is a perfect square, $\sqrt{N} \times \sqrt{N}$ blocks (the generalized line S = ((0,0), (0,1), ..., (0,$\sqrt{N}-1$), (1,0), (1,1), ..., (1,$\sqrt{N}-1$), ..., ($\sqrt{N}-1$,0), ($\sqrt{N}-1$,1), ..., ($\sqrt{N}-1$,$\sqrt{N}-1$))) were of primary concern. One of the main results of Budnik and Kuck [3] is that if $2 \mid N$ or $3 \mid N$, where N is the number of memories, then there is no valid linear skewing scheme for the collection of generalized lines {R, C, D, B}. In order to generalize this result and to provide a reasonable notation, a definition is useful.
 
 Definition 7: An $[x,y]_N$-line is a generalized line
 of the form $((0,0),(y,x),(2y,2x),\dots,((N-1)y,(N-1)x))$.

 Pictorially, the template for an $[x,y]_N$-line is formed by starting
 at the origin and repeatedly going over x and down y, until a total of N points
 are generated (see Figure 5). In this notation the generalized line
 representing N consecutive elements of a row is the $[1,0]_N$-line, the
 generalized line representing N consecutive elements of a column is the
 $[0,1]_N$-line, and so on. The following result can be found in Budnik and Kuck [3],
 though different notation is used.
 
 Theorem 3: Given N memory modules and a collection of
 $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-line} \mid i = 1,2,\dots,I\}$,
 $\varphi(c,d) = ac + bd \bmod N$ is a valid linear skewing
 scheme for this collection if and only if
 $(ay_i + bx_i, N) = 1$ for $i = 1,2,\dots,I$.

 Here N is the length of each $[x,y]_N$-line, and $(c,d)$ denotes the greatest
 common divisor of c and d.
 
 [Figure 5: The Instance of the $[x,y]_N$-line whose First Component is (i,j).]
 
 Proof: Suppose first that $\varphi(i,j) = ai + bj \bmod N$ is a valid
 skewing scheme for this collection. Take an arbitrary $[x,y]_N$-line from
 the collection, say $L = [x_r,y_r]_N$-line. Since $\varphi$ is valid, the N elements
 in the instance $L(0,0)$ must be mapped by $\varphi$ to different memory modules.
 Since this instance is just $((0,0),(y_r,x_r),\dots,((N-1)y_r,(N-1)x_r))$, it
 must be the case that $\varphi(0,0) \ne \varphi(vy_r, vx_r)$ for $v = 1,2,\dots,N-1$. Thus
 $a \cdot 0 + b \cdot 0 \bmod N \ne avy_r + bvx_r \bmod N$ for $v = 1,2,\dots,N-1$. But this is just
 $v(ay_r + bx_r) \not\equiv 0 \pmod N$ for $v = 1,2,\dots,N-1$, and it is a well-known result of
 elementary number theory that this implies $(ay_r + bx_r, N) = 1$ [7].

 Conversely, suppose there exist a and b such that $(ay_i + bx_i, N) = 1$
 for $i = 1,2,\dots,I$. To show that $\varphi(i,j) = ai + bj \bmod N$ is valid for the
 collection, consider an arbitrary instance of an arbitrary $[x,y]_N$-line in
 the collection. It suffices to show that the N elements in this instance
 are mapped to different memory modules. For definiteness, consider
 $L = [x_r,y_r]_N$-line and the instance $L(i,j)$. Since the choice of line and
 the choice of instance were made arbitrarily, if $\varphi$ maps all the components
 of $((i,j),(i+y_r,j+x_r),\dots,(i+(N-1)y_r, j+(N-1)x_r))$ to different memory
 modules, then $\varphi$ will be valid. But if $\varphi(i+vy_r, j+vx_r) = \varphi(i+v'y_r, j+v'x_r)$
 for some $v, v' \in \{0,1,2,\dots,N-1\}$ with $v \ne v'$, then $ai + avy_r + bj + bvx_r \equiv
 ai + av'y_r + bj + bv'x_r \pmod N$, which implies $(v-v')(ay_r + bx_r) \equiv 0 \pmod N$; since
 $v - v' \not\equiv 0 \pmod N$, this contradicts the assumption that $(ay_i + bx_i, N) = 1$
 for $i = 1,2,\dots,I$. ■
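Theorem 3 can be cross-checked against direct enumeration. In the sketch below (hypothetical helper names; each line is given as a pair (x, y) in the $[x,y]_N$ notation), `theorem3_valid` applies the gcd test and `brute_force_valid` maps every component of the instance L(0, 0) to its module. For a linear scheme, $\varphi(i+r, j+c) \equiv \varphi(i,j) + \varphi(r,c) \pmod N$, so every other instance uses the same modules shifted by a constant and need not be examined separately.

```python
from math import gcd


def xy_line(x: int, y: int, n: int):
    """Components of the [x, y]_N-line: (0,0), (y,x), (2y,2x), ..., ((N-1)y, (N-1)x)."""
    return [(v * y, v * x) for v in range(n)]


def theorem3_valid(a: int, b: int, lines, n: int) -> bool:
    """Theorem 3: phi(c, d) = a*c + b*d mod N is valid iff gcd(a*y + b*x, N) == 1."""
    return all(gcd(a * y + b * x, n) == 1 for (x, y) in lines)


def brute_force_valid(a: int, b: int, lines, n: int) -> bool:
    """Map every component of L(0, 0) and look for a repeated module.

    Every other instance L(i, j) uses the same modules shifted by the constant
    a*i + b*j (mod N), so checking L(0, 0) is enough for a linear scheme.
    """
    for (x, y) in lines:
        modules = {(a * r + b * c) % n for (r, c) in xy_line(x, y, n)}
        if len(modules) != n:
            return False
    return True


# Rows, columns, forward diagonals and backward diagonals as [x, y]_N-lines.
RCDB = [(1, 0), (0, 1), (1, 1), (1, -1)]
for n in range(2, 10):
    for a in range(n):
        for b in range(n):
            assert theorem3_valid(a, b, RCDB, n) == brute_force_valid(a, b, RCDB, n)
print("Theorem 3 agrees with enumeration for N = 2..9")
```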
 
 The result of Budnik and Kuck mentioned earlier follows
 immediately, since the collection of generalized lines they refer to is
 just $\{[1,0]_N\text{-line}, [0,1]_N\text{-line}, [1,1]_N\text{-line}, [1,-1]_N\text{-line}\}$,
 and one of a, b, and a+b will be divisible by 2, so if $2 \mid N$ no choices of a and b
 satisfy the conditions of the theorem. Similarly, one of a, b, a+b, and −a+b will be
 divisible by 3, so if $3 \mid N$ no choices of a and b satisfy them either.
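The corollary can also be confirmed by exhaustive search. The sketch below (hypothetical helper `valid_pairs`) lists every pair (a, b) modulo N that passes Theorem 3 for the $[1,0]_N$-, $[0,1]_N$-, $[1,1]_N$- and $[1,-1]_N$-lines, whose Theorem 3 quantities $ay + bx$ are b, a, a + b and −a + b. It also illustrates the converse, which follows from the same arithmetic: when N is divisible by neither 2 nor 3, the pair (a, b) = (1, 2) gives the quantities 2, 1, 3 and 1, all relatively prime to N.

```python
from math import gcd


def valid_pairs(n: int):
    """All (a, b), 0 <= a, b < N, giving a valid linear scheme for rows,
    columns, forward diagonals and backward diagonals (Theorem 3)."""
    return [
        (a, b)
        for a in range(n)
        for b in range(n)
        if all(gcd(t, n) == 1 for t in (b, a, a + b, b - a))
    ]


# A valid linear scheme exists exactly when neither 2 nor 3 divides N.
print([n for n in range(2, 30) if valid_pairs(n)])
# [5, 7, 11, 13, 17, 19, 23, 25, 29]
```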
 
 Lawrie [10] points out that in computers designed along the
 lines of Figure 3, solving the data organization problem is insufficient
 in practice. To be able to use such a computer in a reasonable way,
 an efficient memory-processor connection network, which can route the
 data to the appropriate arithmetic unit, is also needed. In providing
 a data organization scheme and a connection network, Lawrie uses $P = 2^{2q}$
 processors and $N = 2P$ memories. Within this framework he found that by
 use of linear skewing he could fetch any P consecutive elements of any
 row, column, forward diagonal, or backward diagonal, or any $\sqrt{P} \times \sqrt{P}$
 block, without memory conflicts. (Take $a = \sqrt{P} + 1$ and $b = 2$.) This does not
 violate Theorem 3, since the number of memories, N, is not equal to the
 length of the lines, P. In addition to providing a skewing scheme,
 Lawrie also designed a network, the Ω-network, which routed the data
 to the appropriate processor in $O(\log P)$ time. F. Yao [16] has shown
 that the Ω-network is optimal. Lawrie left unanswered the question of
 using some non-linear skewing scheme to achieve the same conflict-free
 access while at the same time reducing the number of memories, N, to be
 equal to the number of processors, $P = 2^{2q}$. The restriction that the
 number of processors be a power of two was kept so that any arithmetic
 mod N could be performed rapidly by shifting, and in the hope that the
 Ω-network, or some slight modification of it, would still be able to
 align the data. In Chapter 3 of this thesis it will be shown that
 if the number of memories equals the number of processors, which in turn
 is a power of two, then no skewing scheme of any type will be valid for
 the collection of generalized lines Lawrie considers.
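The claim about the choice $a = \sqrt{P} + 1$, $b = 2$ with $N = 2P$ memories can be checked mechanically for small P. The sketch below (hypothetical function name; it checks only the skewing scheme, not the Ω-network) tests the instance anchored at (0, 0) of each of the five access patterns. For a linear scheme every other instance uses the same modules shifted by a constant mod N, so this covers all instances. The optional `mem_factor` parameter is an illustrative addition that shows what happens if N is reduced to P with the same a and b.

```python
from math import isqrt


def lawrie_conflict_free(q: int, mem_factor: int = 2) -> bool:
    """Check phi(i, j) = (sqrt(P) + 1)*i + 2*j mod N for P = 2**(2q) processors
    and N = mem_factor * P memories (the setting in the text is mem_factor = 2)."""
    p = 2 ** (2 * q)
    s = isqrt(p)                 # sqrt(P), exact because P is an even power of 2
    n = mem_factor * p
    a, b = s + 1, 2

    def phi(i: int, j: int) -> int:
        return (a * i + b * j) % n

    patterns = [
        [(0, v) for v in range(p)],                    # P consecutive row elements
        [(v, 0) for v in range(p)],                    # ... of a column
        [(v, v) for v in range(p)],                    # ... of a forward diagonal
        [(v, -v) for v in range(p)],                   # ... of a backward diagonal
        [(r, c) for r in range(s) for c in range(s)],  # a sqrt(P) x sqrt(P) block
    ]
    # Linear scheme: phi(i + r, j + c) = phi(i, j) + phi(r, c) mod N, so the
    # instance anchored at (0, 0) represents every instance of each pattern.
    return all(len({phi(r, c) for (r, c) in pat}) == p for pat in patterns)


print(lawrie_conflict_free(2))                # P = 16, N = 32: True
print(lawrie_conflict_free(2, mem_factor=1))  # P = N = 16: False (rows collide)
```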
 
 Swanson [14] has also studied the problem of designing
 efficient memory-processor connection networks. In his construction
 the number of memories, N, equals the number of processors, P, and P
 is prime. For this case he has designed a network based on k-apart
 shifters which operates in $O(\sqrt{P})$ time but uses very little hardware.
 Unfortunately, to align the data with the processors his network must
 be followed by a shift network, which requires an additional $O(\log P)$ time
 and $O(\log P)$ hardware. The choice of P as a prime greater than 3, however,
 guarantees that any instance of the $[1,0]_P$-, $[0,1]_P$-, $[1,1]_P$-, and
 $[1,-1]_P$-lines can be fetched without conflict. The question of designing
 memory-processor connection networks will be discussed again briefly in Chapter 4.
 
 1.5 Summary
 
 In this chapter several computer designs which utilize parallel
 memory modules to achieve program speed-ups were considered. The problem
 of organizing the data so that computations can proceed efficiently was
 formalized. Generalized lines, and as special cases, $[x,y]_N$-lines, were
 defined. Various classes of skewing schemes were also defined. Figure 6
 depicts the containment relationships between these classes. Theorem 2
 shows that even though skewing schemes valid on the entire plane are a
 subclass of those valid on the quarter plane, the two classes of skewing
 schemes are of equal power, in that the collections of generalized lines
 they can handle are exactly the same. In Chapters 3 and 4 similar
 results will be presented for linear and periodic skewing schemes, but
 only for certain subclasses of generalized lines.
 
 [Figure 6: Containment Relations Between Classes of Skewing Schemes. From outermost to innermost: schemes valid in the quarter plane, $\varphi: \{0,1,2,\dots\} \times \{0,1,2,\dots\} \to \{0,1,2,\dots,N-1\}$; schemes valid in the whole plane, $\varphi: \{\dots,-1,0,1,\dots\} \times \{\dots,-1,0,1,\dots\} \to \{0,1,2,\dots,N-1\}$; valid periodic schemes, $\varphi(i,j) = \varphi(i+kN, j+\ell N)$ for $k, \ell \in \{\dots,-1,0,1,\dots\}$; valid linear schemes, $\varphi(i,j) = ai + bj \bmod N$.]
 
 2. DETERMINATION OF VALID SKEWING SCHEMES 
 
 2.1 Introduction 
 
 Chapter 1 dealt primarily with definitions and historical
 perspectives. In this chapter, results will be presented on whether a
 matrix of data can be stored in N memory modules so that instances of a
 given generalized line, or collection of generalized lines, can be
 fetched without memory conflicts. In some of the results the length of
 the generalized line(s) will be restricted to equal the number of memory
 modules. This restriction is motivated by the desire to maximally
 utilize memory in computers designed as indicated in Figure 3. If
 such a computer has M arithmetic units, then the data elements of
 instances of generalized lines of up to length M can be processed in
 parallel. If the number of memories is N, then memory utilization is
 limited to M/N, because only M out of the N memory modules will be referenced
 in one memory cycle. If 100% memory utilization is desired, N = M is
 required. Since the memory-processor connection network will be more complex
 for larger N, requiring N = M has additional benefits.

 In the last part of the chapter special attention is paid to
 periodic skewing schemes. The results developed in Sections 2.2 and 2.3
 will be seen to carry over to the periodic case without significant change.
 
 2.2 The Basic Result 
 
 In this section a necessary and sufficient condition is presented
 for determining the validity of a skewing scheme for a generalized line of
 length M, using N memory modules. To do this it is convenient to
 augment some earlier definitions.
 
 Definition 8: The designated element of a generalized
 line or an instance of a generalized line is the first
 ordered pair of the n-tuple.
 
 In the material to follow, the intuitive, geometric viewpoint
 is a valuable aid to understanding, and, for this reason, the intuitive
 as well as the formal approach will be presented. As discussed in
 Chapter 1, a generalized line can be viewed as a rigid template. The
 designated element can then be indicated by marking the appropriate
 square of the generalized line, or of an instance of the generalized line,
 with a distinguishing mark, such as an asterisk. For the I-shaped
 generalized line given by $L = ((0,0),(0,1),(0,2),(1,1),(2,0),(2,1),(2,2))$,
 this is illustrated in Figure 7.
 
 Theorem 4: Given a generalized line of length M, N
 memory modules, and a skewing scheme, the skewing
 scheme is valid for this generalized line if and
 only if the following condition holds for every
 $k \in \{0,1,2,\dots,N-1\}$: When all instances of the
 generalized line are considered in which the
 designated element of the instance is mapped by the
 skewing scheme into memory k, no two of these
 instances have an element in common.
 
 Before presenting the formal proof, an intuitive explanation of
 the condition presented above might be useful. Imagine a two-dimensional
 memory storage map laid out on the plane, in which the entry at (i,j) is
 the value of the skewing scheme at this point. Also assume an infinite
 supply of realizations of the generalized line by unit squares, with
 the designated element marked with an asterisk. Select a $k \in \{0,1,2,\dots,N-1\}$.
 For each location in the memory storage map containing a k, place a copy
 of the generalized line on the plane so that the asterisk is directly over the
 k. If, after completing this construction, no two instances of the
 generalized line laid down on the plane overlap, and this is true
 irrespective of the choice of k, then the skewing scheme is valid.
 Conversely, if the skewing scheme is valid, then when the construction
 described above is performed for any k, the instances will not overlap.
 This construction is illustrated in Figure 8.

 [Figure 7: Instances of a Generalized Line, with their Designated Elements Marked with Asterisks. L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)).]
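The construction just described can be sketched in code. This is a finite illustration under stated assumptions, not a general decision procedure: the skewing scheme `phi` is assumed to repeat with the given `period` in both coordinates (so anchors in one period window cover every case), and the line's first component, its designated element, is (0, 0). Two instances L(a1, b1) and L(a2, b2) share an element exactly when (a2 − a1, b2 − b1) is a nonzero difference of two components of the line, so only those offsets are examined. `directly_valid` checks the definition of validity and `theorem4_condition` checks the condition of Theorem 4; all names and the sample schemes are hypothetical.

```python
from itertools import product


def directly_valid(phi, line, period):
    """Each instance L(a, b) maps its M components to M distinct modules."""
    for a, b in product(range(period), repeat=2):
        modules = {phi(a + x, b + y) for (x, y) in line}
        if len(modules) != len(line):
            return False
    return True


def theorem4_condition(phi, line, period):
    """For every k, instances whose designated elements lie in module k never overlap.

    L(a, b) has designated element (a, b), since line[0] == (0, 0).  L(a, b) and
    L(a + da, b + db) overlap exactly when (da, db) is a nonzero difference of
    two components of the line.
    """
    offsets = {(xi - xj, yi - yj) for (xi, yi) in line for (xj, yj) in line}
    offsets.discard((0, 0))
    for a, b in product(range(period), repeat=2):
        for da, db in offsets:
            if phi(a, b) == phi(a + da, b + db):  # same k, overlapping instances
                return False
    return True


I_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
T_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)]


def phi7(i, j):
    return (j - 2 * i) % 7  # a period-7 scheme, valid for the I-shaped line


def phi5(i, j):
    return (i + j) % 5      # a period-5 scheme, not valid for the T-shaped line


print(directly_valid(phi7, I_SHAPE, 7), theorem4_condition(phi7, I_SHAPE, 7))  # True True
print(directly_valid(phi5, T_SHAPE, 5), theorem4_condition(phi5, T_SHAPE, 5))  # False False
```

As Theorem 4 predicts, the two tests agree on every scheme tried.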
 
 Proof of Theorem 4: Suppose first that for some $k \in \{0,1,2,\dots,N-1\}$
 there are two distinct instances of the generalized line whose designated
 elements are mapped by the skewing scheme, $\varphi$, into memory module k, and that
 these two instances have an element in common. To be specific, let the
 generalized line be $L = ((x_1,y_1),(x_2,y_2),\dots,(x_M,y_M))$, where
 $(x_1,y_1) = (0,0)$ by the definition of a generalized line, and let the two
 instances be $L(a_1,b_1) = ((a_1+x_1,b_1+y_1),(a_1+x_2,b_1+y_2),\dots,(a_1+x_M,b_1+y_M))$
 and $L(a_2,b_2) = ((a_2+x_1,b_2+y_1),(a_2+x_2,b_2+y_2),\dots,(a_2+x_M,b_2+y_M))$. Then
 saying that the two instances have an element in common means
 $(a_1+x_i,b_1+y_i) = (a_2+x_j,b_2+y_j)$ for some i and j. In addition, $i \ne j$, for
 $i = j$ implies $(a_1,b_1) = (a_2,b_2)$, contrary to the assumption that the instances
 are distinct. Since the designated elements of these two instances are mapped by
 $\varphi$ into memory k, $\varphi(a_1+x_1,b_1+y_1) = \varphi(a_2+x_1,b_2+y_1) = k$.

 In order to see that $\varphi$ is not a valid skewing scheme for L,
 consider the instance
 $L(a_1+x_1-x_j,\, b_1+y_1-y_j) = ((a_1+x_1-x_j+x_1,\, b_1+y_1-y_j+y_1),\, (a_1+x_1-x_j+x_2,\, b_1+y_1-y_j+y_2),\, \dots,\, (a_1+x_1-x_j+x_M,\, b_1+y_1-y_j+y_M))$.
 The jth component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is
 $(a_1+x_1-x_j+x_j,\, b_1+y_1-y_j+y_j) = (a_1+x_1, b_1+y_1)$, which is mapped by $\varphi$
 into memory k. The ith component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is
 $(a_1+x_1-x_j+x_i,\, b_1+y_1-y_j+y_i) = (a_2+x_j+x_1-x_j,\, b_2+y_j+y_1-y_j) = (a_2+x_1, b_2+y_1)$,
 where $(a_1+x_i,b_1+y_i) = (a_2+x_j,b_2+y_j)$ was used. Since $\varphi(a_2+x_1,b_2+y_1) = k$,
 the ith component of $L(a_1+x_1-x_j, b_1+y_1-y_j)$ is also mapped into memory k.
 Because two distinct components of this instance are mapped by $\varphi$ into
 memory k, $\varphi$ is not valid for L. Figure 9 provides an intuitive picture for
 this part of the proof.

 Conversely, suppose $\varphi$ is not a valid skewing scheme for the
 generalized line L. Then there is an instance of the generalized line,
 $L(a,b) = ((a+x_1,b+y_1),(a+x_2,b+y_2),\dots,(a+x_M,b+y_M))$, the ith and jth
 components of which, $i \ne j$, are mapped by $\varphi$ into the same memory, say memory k.
 Then the two instances $L(a+x_j-x_1, b+y_j-y_1)$ and $L(a+x_i-x_1, b+y_i-y_1)$ violate
 the condition of the theorem. Their designated elements are
 $(a+x_j-x_1+x_1, b+y_j-y_1+y_1) = (a+x_j, b+y_j)$ and $(a+x_i-x_1+x_1, b+y_i-y_1+y_1) = (a+x_i, b+y_i)$,
 respectively, which, being the jth and ith components of L(a,b), are mapped into
 memory k. Furthermore, these two instances have an element in common, since the ith
 component of $L(a+x_j-x_1, b+y_j-y_1)$ and the jth component of $L(a+x_i-x_1, b+y_i-y_1)$
 are both $(a+x_i+x_j-x_1,\, b+y_i+y_j-y_1)$. Figure 10 provides an intuitive picture
 of the situation. ■

 [Figure 8: Checking the Condition in Theorem 4. L = ((0,0), (0,1), (0,2), (1,1), (2,1)), N = 7. The condition in Theorem 4 is tested for k = 3. Overlap occurs in two places, implying the skewing scheme is not valid. Only a small section of the plane is shown.]
 
 Several remarks are appropriate concerning this theorem. When
 given a collection of several generalized lines, this theorem can still be
 used to determine the validity of a skewing scheme by applying it to each
 generalized line of the collection individually, since the skewing scheme
 must be valid for each generalized line, independently of its validity for
 the others in the collection. A second consideration is that this theorem
 is non-constructive: it provides only a method for checking the validity of a
 given skewing scheme. Nevertheless, this theorem will be used in the next
 section to establish a constructive result.

 [Figure 9: Pictorial Presentation of the Proof of Theorem 4. $L(a_1,b_1)$ and $L(a_2,b_2)$, whose designated elements are stored in memory module k, have an element in common; hence $L(a_1+x_1-x_j, b_1+y_1-y_j)$ has two components mapped by $\varphi$ to the same memory, and $\varphi$ is not valid.]

 [Figure 10: Pictorial Presentation of the Proof of Theorem 4. $L(a+x_j-x_1, b+y_j-y_1)$ and $L(a+x_i-x_1, b+y_i-y_1)$ have their designated elements stored in memory module k, and they have an element in common.]
 
 2.3 Existence and Construction of Valid Skewing Schemes when the Number of Memory Modules Equals the Length of the Generalized Line

 The results in this section will be developed first for one
 generalized line, and then extended to collections of generalized lines.
 
 Definition 9: A generalized line tessellates the
 plane if and only if there exists a (necessarily
 infinite) collection of instances of the generalized
 line such that every ordered pair in the plane is in
 one and only one of these instances.
 
 If the realization of a generalized line as a rigid template
 composed of unit squares is used, then a generalized line tessellates the
 plane if and only if it can tile the "infinite floor" without gaps or
 overlapping. Figure 11 shows a generalized line that tessellates the plane,
 but the generalized line given by $L = ((0,0),(0,1),(0,2),(1,1),(2,1))$, whose
 realization is T-shaped, cannot. Figure 12 illustrates why the T-shaped line
 cannot tessellate the plane. Notice that no matter where the first instance is
 laid down on the plane, there is only one way another instance can be laid
 down so that the square labeled A is covered and the instances do not
 overlap. Now the square labeled B cannot be safely covered.

 In the literature [5,6] tessellations of the plane normally permit rotations
 and reflections of the basic shapes. These transformations are excluded in
 this discussion.
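Finding a tessellation by hand is usually easy, but one restricted family of tessellations can be searched mechanically. The sketch below is an illustration, not a method from the text: it looks only for lattice tessellations, in which instances are placed at the points of a lattice spanned by (a, b) and (0, d), with rotations and reflections excluded as above. Every lattice of index N (the length of the line) has exactly one basis of that form with a·d = N and 0 ≤ b < d, and the translates tile the plane exactly when the N components of the line are pairwise incongruent modulo the lattice. A result other than None therefore proves that the line tessellates the plane; None rules out only lattice tessellations. The function names are hypothetical.

```python
def divisors(n: int):
    return [d for d in range(1, n + 1) if n % d == 0]


def residue(cell, a, b, d):
    """Canonical representative of cell modulo the lattice spanned by (a, b), (0, d)."""
    r, c = cell
    m = r // a  # floor division keeps 0 <= r - m*a < a, also for negative rows
    return (r - m * a, (c - m * b) % d)


def find_lattice_tessellation(line):
    """Return (a, b, d) if translates of `line` by the lattice spanned by (a, b)
    and (0, d) tile the plane, or None if no lattice of index N does.

    Every sublattice of index N = len(line) has exactly one basis of this form
    with a*d = N and 0 <= b < d, so all lattice tessellations are examined.
    """
    n = len(line)
    for a in divisors(n):
        d = n // a
        for b in range(d):
            # The N translates tile exactly when the N components fall into
            # N different residue classes (one per class, since the index is N).
            if len({residue(cell, a, b, d) for cell in line}) == n:
                return (a, b, d)
    return None


I_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
T_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)]
print(find_lattice_tessellation(I_SHAPE))  # (1, 2, 7)
print(find_lattice_tessellation(T_SHAPE))  # None
```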
 
 [Figure 11: Tessellation of the Plane by a Generalized Line.]

 [Figure 12: The Generalized Line L = ((0,0), (0,1), (0,2), (1,1), (2,1)) Cannot Tessellate the Plane. The squares A and B referred to in the text are marked.]
 
 Theorem 5: Given N memory modules and a generalized
 line of length N, there is a valid skewing scheme for
 this generalized line if and only if it tessellates
 the plane.
 
 Proof: Suppose that the generalized line tessellates the plane.
 Define $\varphi$ as follows: $\varphi(i,j) = k$, where $(i,j)$ is the $(k+1)$st component of
 the instance contained in the tessellation which covers the point $(i,j)$.
 $\varphi$ is well defined since each point in the plane is covered exactly once.
 The question is: Given any instance, does $\varphi$ map the N components of the
 instance into distinct memory modules? The reader should note that for
 $\varphi$ to be valid the answer to the above question must be yes for all
 instances of the generalized line, including those not contained in the
 tessellation. Theorem 4 will be applied to show $\varphi$ is valid.

 Consider verification of the condition in Theorem 4 when k = 0.
 The set of instances that must be checked for overlap are just those that
 comprise the given tessellation, by the way $\varphi$ was defined. Since the
 instances used in a tessellation do not overlap each other, the condition
 is verified for k = 0. Now let k be some other value in $\{0,1,2,\dots,N-1\}$.
 Note that every element of the plane stored in memory k is a fixed shift
 from an element stored in memory zero. To be specific, if the generalized
 line is $L = ((x_1,y_1),(x_2,y_2),\dots,(x_N,y_N))$, then every element stored in
 memory k is $(x_{k+1}-x_1, y_{k+1}-y_1)$ away from an element stored in memory zero.
 Conversely, every element in the plane $(x_{k+1}-x_1, y_{k+1}-y_1)$ away from an
 element stored in memory zero is stored in memory k. Thus when verifying
 the condition of Theorem 4 for arbitrary k, the set of instances that must
 be checked for overlap forms a tessellation, namely the tessellation formed by
 shifting the given tessellation by $(x_{k+1}-x_1, y_{k+1}-y_1)$. Again, since the
 instances comprising a tessellation do not overlap, the condition is seen to hold
 for all k, and, hence, $\varphi$ is valid.

 In order to prove the converse, suppose $\varphi$ is a valid skewing
 scheme for the generalized line L, using N memory modules. Let
 $T = \{L(i,j) \mid \varphi(i,j) = 0\}$, i.e. T is the set of instances whose designated
 elements are mapped by $\varphi$ into memory zero. The claim is that T is a
 tessellation. The condition in Theorem 4, applied when k = 0, guarantees
 that no two instances of T overlap. Thus, to prove that T is a tessellation
 it is necessary to show that every point in the plane is a component of
 some instance contained in T. Suppose not, i.e. there is a point (c,d)
 such that (c,d) is not a component of any instance in T. Now (c,d) is a
 component of precisely N instances of L, since there is exactly one
 instance of the generalized line in which it is the hth component, for
 $h = 1,2,\dots,N$. Now consider the N distinct designated elements of these N
 instances. $\varphi$ cannot map any of these N designated elements into memory
 zero, for it was assumed that (c,d) was not a component of any instance
 in T. Thus, by the pigeonhole principle, there are two elements in this set,
 say $(a_1,b_1)$ and $(a_2,b_2)$, such that $\varphi(a_1,b_1) = \varphi(a_2,b_2) = \ell \ne 0$. Now the
 condition of Theorem 4 is violated when $k = \ell$, since $L(a_1,b_1)$ and $L(a_2,b_2)$
 both have their designated elements mapped to memory $\ell$ and they both
 contain (c,d). This contradicts the validity of $\varphi$. This contradiction
 arose by assuming T did not cover (c,d). Thus T covers every point in the
 plane, and T is a tessellation. ■
 
 Given a tessellation of the plane by a generalized line, the proof
 of the theorem shows how a valid skewing scheme can be constructed. This
 theorem provides a practical means for determining the existence of a
 valid skewing scheme, since the construction of a tessellation can normally
 be done by observation. Figure 13 indicates the skewing scheme resulting
 from the I-shaped generalized line and the use of this theorem. The dark
 heavy lines indicate that in this case the skewing scheme obtained is
 periodic. More will be said about periodic skewing schemes in other sections.
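For a lattice tessellation, the construction in the proof of Theorem 5 takes a simple form: a point gets module k exactly when it is congruent, modulo the lattice, to the (k+1)st component of the line. The sketch below assumes such a lattice is already known. For the I-shaped line, the lattice spanned by (1, 2) and (0, 7) works, as can be checked by hand: the values j − 2i mod 7 for its seven components are 0, 1, 2, 6, 3, 4 and 5, all different. Helper names are hypothetical, and the resulting numbering is not claimed to reproduce the one drawn in Figure 13.

```python
def skew_from_lattice(line, a, b, d):
    """Theorem 5 construction for a lattice tessellation (a hypothetical helper).

    The instances of `line` anchored at the points of the lattice spanned by
    (a, b) and (0, d) are assumed to tile the plane.  phi(i, j) = k exactly when
    (i, j) is congruent to line[k], i.e. is the (k+1)st component of the
    instance covering it.
    """
    def residue(r, c):
        m = r // a
        return (r - m * a, (c - m * b) % d)

    index = {residue(x, y): k for k, (x, y) in enumerate(line)}
    if a * d != len(line) or len(index) != len(line):
        raise ValueError("these lattice translates of the line do not tile the plane")

    def phi(i, j):
        return index[residue(i, j)]
    return phi


def is_valid(phi, line, period):
    """Brute-force validity check, assuming phi repeats with `period` in i and j."""
    return all(
        len({phi(u + x, v + y) for (x, y) in line}) == len(line)
        for u in range(period)
        for v in range(period)
    )


I_SHAPE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
phi = skew_from_lattice(I_SHAPE, 1, 2, 7)
# A lattice of index 7 contains (7, 0) and (0, 7), so phi has period 7.
print([phi(0, j) for j in range(7)])  # [0, 1, 2, 4, 5, 6, 3]
print(is_valid(phi, I_SHAPE, 7))      # True
```

Because the scheme depends only on a residue class, table look-up is confined to N entries, which is one way to see why the scheme in Figure 13 is periodic.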
 
 The result of Theorem 5 can be extended to collections of
 generalized lines.
 
 Theorem 6: Given N memory modules and a collection
 of generalized lines, $\{L_1, L_2, \dots, L_p\}$, all of length
 N, there is a valid skewing scheme for this
 collection if and only if there exist tessellations
 of the plane, $T_1, T_2, \dots, T_p$, such that $T_i$ is a
 tessellation using $L_i$ and $O_1 = O_2 = \dots = O_p$, where
 $O_i = \{\text{designated elements of the instances of } L_i \text{ comprising } T_i\}$.
 
 Because of Theorem 5, the condition that each generalized line
 tessellate the plane is clearly needed. An intuitive visualization of the
 condition on the $O_i$ makes use of Theorem 6 easier, and may aid in
 understanding the proof. Imagine that each tessellation of the plane, $T_i$,
 is performed using the rigid template determined by the generalized line,
 $L_i$, and that each tessellation is done on a separate sheet of clear
 plastic. In addition, let the designated elements of the instances be
 marked by asterisks. Then $O_i$ is the set of points on the copy of the
 plane used for $T_i$ containing an asterisk, and the condition that
 $O_1 = O_2 = \dots = O_p$ becomes: When the sheets of clear plastic are overlaid,
 the asterisks on the sheets of plastic coincide.
 
 [Figure 13: The Skewing Scheme Resulting From the Use of Theorem 5. L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)). The heavy lines indicate that the skewing scheme is periodic.]
 
 Proof of Theorem 6: Suppose that $\varphi$ is valid for this
 collection. Then the method of proof used in Theorem 5, applied to
 each generalized line separately, produces a tessellation $T_v$ using $L_v$.
 In addition, the tessellation that results from following the construction
 in that proof yields $O_v = \{(i,j) \mid \varphi(i,j) = 0\}$. Since $O_v$ is independent
 of v, $O_1 = O_2 = \dots = O_p$.

 To establish the converse, suppose tessellations $T_v$ using $L_v$
 exist, and $O_1 = O_2 = \dots = O_p$. Then define $\varphi$ by $\varphi(i,j) = k$, where $(i,j)$
 is the $(k+1)$st component of the instance of $L_1$ contained in $T_1$. This is
 exactly the same construction employed in the proof of Theorem 5, and so
 $\varphi$ is well defined and a valid skewing scheme for $L_1$. It must also be
 shown that $\varphi$ is a valid skewing scheme for $L_2, L_3, \dots,$ and $L_p$. Pick an
 arbitrary generalized line, say $L_v$. Theorem 4 can be applied to show that
 $\varphi$ is a valid skewing scheme for $L_v$. In verifying that the condition of
 Theorem 4 holds when k = 0, the set of instances that must be examined for
 common elements are just those instances comprising the tessellation $T_v$,
 because the points stored in memory zero are exactly the points of $O_1 = O_v$.
 Now consider verifying that the condition of Theorem 4 holds for arbitrary k.
 As in the proof of Theorem 5, the set of instances that must be examined for
 common elements is obtained by shifting the tessellation $T_v$. However, the
 amount of the shift, $(x_{k+1}-x_1, y_{k+1}-y_1)$, is determined by the components of
 $L_1$, the generalized line used to construct $\varphi$, despite the fact that the
 condition is being verified for the generalized line $L_v$. Since a rigid
 translation of a tessellation is a tessellation, and v and k were arbitrary,
 the theorem is proved. ■
 
 A few examples may be helpful in illustrating some of the
 subtleties that can arise in using this theorem. Consider the generalized
 lines $L_1 = ((0,0),(1,0),(1,1),(2,0),(2,1))$ and $L_2 = ((0,0),(1,0),(2,0),(2,1),(2,2))$.
 Each of these generalized lines tessellates the plane separately, as
 Figures 14 and 15 show. Since these are the only tessellations possible,
 except for rigid shifting, it is clear that tessellations do not exist for
 these two generalized lines which can be positioned so their designated
 elements coincide. Thus there is no valid skewing scheme for the collection
 $\{L_1, L_2\}$ using only five memories.
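Restricted to lattice tessellations, the condition of Theorem 6 reduces to a shared lattice: when each line's designated element is (0, 0), the designated elements of a lattice tessellation are exactly the lattice points, so two lines that tile with the same lattice satisfy $O_1 = O_2$. The sketch below (hypothetical names; arbitrary non-lattice tessellations are not searched) lists, for each line, every lattice of index N, written with basis (a, b), (0, d) as before, whose translates of the line tile the plane. For the two lines of this example it finds a single lattice each, and they differ, in agreement with the conclusion above. For the plus-shaped line ((0,0), (1,−1), (1,0), (1,1), (2,0)) taken up in the next example, it finds a lattice shared with $L_1$.

```python
def lattice_tilings(line):
    """Every lattice spanned by (a, b) and (0, d), with a*d = N and 0 <= b < d,
    whose translates of `line` tile the plane (components pairwise incongruent)."""
    n = len(line)
    found = []
    for a in range(1, n + 1):
        if n % a:
            continue
        d = n // a
        for b in range(d):
            reps = {(r % a, (c - (r // a) * b) % d) for (r, c) in line}
            if len(reps) == n:
                found.append((a, b, d))
    return found


def common_lattice(lines):
    """A lattice tiled by every line in the collection, or None if there is none.

    A scheme that numbers residue classes modulo a shared lattice (as in the
    Theorem 5 construction) depends only on the class of (i, j), so it is valid
    for every line whose components are pairwise incongruent modulo that lattice.
    """
    n = len(lines[0])
    if any(len(line) != n or line[0] != (0, 0) for line in lines):
        raise ValueError("expects lines of equal length N with designated element (0, 0)")
    shared = set(lattice_tilings(lines[0]))
    for line in lines[1:]:
        shared &= set(lattice_tilings(line))
    return min(shared) if shared else None


L1 = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]   # Figure 14
L2 = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]   # Figure 15
L3 = [(0, 0), (1, -1), (1, 0), (1, 1), (2, 0)]  # plus-shaped line
print(lattice_tilings(L1), lattice_tilings(L2))  # [(1, 2, 5)] [(1, 4, 5)]
print(common_lattice([L1, L2]))                  # None
print(common_lattice([L1, L3]))                  # (1, 2, 5)
```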
 
 A possible question arises from contemplating this example:
 If another generalized line, L', which produces the same geometric
 realization as $L_v$, is substituted for $L_v$ in the collection $\{L_1, L_2, \dots, L_p\}$,
 is it possible that a valid skewing scheme will now exist where before there
 was none? The answer to this question can be important in practice, since
 it is convenient to think in geometric terms. Fortunately, the substitution
 described above does not affect the existence of a valid skewing scheme,
 since tessellations of the plane using L' appear to the eye as rigid shifts
 of tessellations of the plane using $L_v$. Thus, when actually using this theorem
 to find valid skewing schemes, it is permissible to just draw pictures, as has
 been done throughout, and to pick the designated elements arbitrarily.
 
 Figures 14 and 16 illustrate another situation. Both the
 generalized lines $L_1$, given earlier, and $L_3 = ((0,0),(1,-1),(1,0),(1,1),(2,0))$,
 whose geometric realization is plus-shaped, tessellate the plane,
 and it is clear that when these tessellations are overlaid, their designated
 elements coincide. Thus Theorem 6 guarantees a skewing scheme using five
 
 [Figure 14: Tessellation of the Plane by the Generalized Line L = ((0,0), (1,0), (1,1), (2,0), (2,1)).]

 [Figure 15: Tessellation of the Plane by the Generalized Line L = ((0,0), (1,0), (2,0), (2,1), (2,2)).]
 
J+3 
 
 * 
 
 * 
 
 
 * 
 
 * 
 
 
 ^t 
 
 * 
 
 
 i 
 
 * 
 
 * 
 
 * 
 
 * 
 
 * 
 
 
 * 
 
 
 * 
 
 * 
 
 
 * 
 
 <L 
 
 * * 
 
 
 4- 
 
 
 * 
 
 
 * 
 
 EL 
 
 * 
 
 
 * 
 
 * 
 
 
 * 
 
 
 ■i 
 
 * 
 
 * 
 
 * 
 
 * 
 
 *n : 
 
 * 
 
 
 * 
 
 * 
 
 
 * 
 
 ^ 
 
 ♦L 
 
 * -,r- ^ 
 
 c 
 
 n 
 
 •* 
 
 " HI 
 
 
 *C 
 
 * 
 
 
 
 5L 
 
 * 
 
 
 •j 
 
 t 
 
 * 
 
 * 
 
 * 
 
 •n : 
 
 x 
 
 
 * 
 
 * 
 
 
 *■ 
 
 
 >L 
 
 * * 
 
 
 4- 
 
 
 
 *T 
 
 -* 
 
 
 * 
 
 * 
 
 HI 
 
 
 * 
 
 *- 
 
 
 x 
 
 * 
 
 * 
 
 *L : 
 
 * 
 
 
 * 
 
 * 
 
 
 * 
 
 ■j 
 
 ♦ 
 
 * * 
 
 r " 
 
 bjl : 
 
 *l 
 
 
 Figure l6: One Possible Tesselation of the Plane by the 
 Generalized Line, L = ((0,0), (1,-1), (1,0), 
 (1,1), (2,0)). 
 
1+1+ 
 
 memory modules that allows conflict free access to all instances of 
 either generalized line. However, there is an alternative tesselation 
 for Lv, given in Figure 17. If this tesselation had been used, along with 
 the only tesselation for L , then the incorrect conclusion, that there 
 is no valid skewing scheme for the collection {L ,L }, might have been 
 drawn. The statement of Theorem 6 only requires the existence of a set 
 of tesselations which also satisfy an additional condition. Thus if more 
 than one tesselation exists for some of the generalized lines in the 
 collection, they must all be tried before concluding that no valid skewing 
 scheme exists. 
 
2.4 Existence and Construction of Valid Periodic Skewing Schemes
 
 As was pointed out in Chapter 1, periodic skewing schemes are 
 a valuable subset of all skewing schemes, since, for restricted values of 
 N, a reasonable amount of additional hardware permits address computation 
 by table look-up. For this reason it would be nice if the theorems in 
 the last two sections could be restricted to determine the existence of 
 valid periodic skewing schemes. 
 
 The essential idea of the needed alteration is depicted in 
 Figures 13 and 18. Consider the memory storage map, infinite in both 
 directions, defined by a periodic skewing scheme. If the plane is 
 partitioned into NxN squares, where N is the number of memory modules, 
 then the memory map defined by the skewing scheme is identical within 
 each partition. The bold lines in Figure 13 illustrate this. The main 
 point to be observed is that when an instance of a generalized line extends 
 over one of the partitioning lines, instead of considering the instance to 
be as in Figure 18(a), it can be considered to be as in Figure 18(b).
 
Figure 17: Another Possible Tesselation of the Plane by the Generalized Line, L = ((0,0), (1,-1), (1,0), (1,1), (2,0)).
 
Figure 18: The "Wrap Around" Interpretation of a Generalized Line Used with Periodic Skewing Schemes, panels (a) and (b).
 
This view is appropriate in determining the validity of a periodic skewing scheme, since in Figure 18(b) the data elements of the instance that are on the left are stored in the same memory modules as the data elements of the instance that extend beyond the partition line in Figure 18(a). This situation is referred to as "wrapping around." It can occur over horizontal partitioning lines as well. Since no special properties are used when an instance of a generalized line wraps around, the opposite edges of the N×N square can be identified, resulting in a torus. The entire problem of finding valid periodic skewing schemes can thus be recast as the problem of finding valid skewing schemes on the torus formed by identifying opposite edges of the N×N square.
 
Definition 9 and Theorems 4, 5, and 6 carry over to the torus with only minor modifications.
 
Theorem 7: Given a generalized line of length M, N memory modules, and a periodic skewing scheme, the skewing scheme is valid for this generalized line if and only if the following condition holds for every $k \in \{0, 1, 2, \ldots, N-1\}$: when all instances of the generalized line on the torus (formed by identifying opposite edges of the N×N square) are considered in which the designated element of the instance is mapped into memory k, no two of the instances have an element in common.
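Theorem 7 and the direct definition of validity can be compared mechanically. The following Python sketch is illustrative only and rests on assumptions not stated in the text: a periodic skewing scheme is given as an N×N table `phi` of module numbers, a generalized line is a list of (row, column) offsets whose first entry, (0,0), is taken to be the designated element, and an instance that wraps onto itself is counted as invalid. The helper names are hypothetical. `valid_direct` checks every wrapped instance, while `valid_theorem7` checks the disjointness condition of Theorem 7.

```python
from itertools import product


def wrapped_instance(c, d, offsets, n):
    """Points of the instance whose designated element is (c, d), wrapped onto the n x n torus."""
    return [((c + a) % n, (d + b) % n) for a, b in offsets]


def valid_direct(phi, offsets, n):
    """Valid iff every wrapped instance is stored in pairwise distinct modules."""
    for c, d in product(range(n), repeat=2):
        modules = [phi[r][s] for r, s in wrapped_instance(c, d, offsets, n)]
        if len(set(modules)) != len(modules):
            return False
    return True


def valid_theorem7(phi, offsets, n):
    """Theorem 7 form: for each module k, the instances whose designated element
    is stored in k must be pairwise disjoint (an instance that wraps onto
    itself is rejected as well)."""
    for k in range(n):
        covered = set()
        for c, d in product(range(n), repeat=2):
            if phi[c][d] != k:
                continue
            points = set(wrapped_instance(c, d, offsets, n))
            if len(points) != len(offsets) or points & covered:
                return False
            covered |= points
    return True


# Example: a periodic but non-linear scheme for L = ((0,0), (0,1), (2,1), (2,2)), N = 4.
L = [(0, 0), (0, 1), (2, 1), (2, 2)]
phi = [[0, 1, 0, 1],
       [0, 1, 0, 1],
       [3, 2, 3, 2],
       [3, 2, 3, 2]]
print(valid_direct(phi, L, 4), valid_theorem7(phi, L, 4))  # True True
```

The example table was constructed by hand for this sketch; it is not taken from Figure 20.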
 
Definition 10: A generalized line tesselates the torus (formed by identifying the opposite edges of an N×N square) if and only if there exists a collection of instances of the generalized line such that every ordered pair on the torus is in one and only one of these instances.
 
Theorem 8: Given N memory modules and a generalized line of length N, there is a valid periodic skewing scheme for this generalized line if and only if it tesselates the torus (formed by identifying opposite edges of the N×N square).
 
Theorem 9: Given N memory modules and a collection of generalized lines, $\{L_1, L_2, \ldots, L_p\}$, all of length N, there is a valid periodic skewing scheme for this collection if and only if there exist tesselations of the torus (formed by identifying opposite edges of the N×N square), $T_1, T_2, \ldots, T_p$, such that $T_i$ is a tesselation using $L_i$ and $D_1 = D_2 = \cdots = D_p$, where

$$D_i = \{\text{designated elements of the instances of } L_i \text{ comprising } T_i\}.$$
 
The proofs of these theorems are identical to those of Theorems 4, 5, and 6, except that the arithmetic must be done in the residue classes mod N. When drawing pictures to determine the existence of valid periodic skewing schemes, only an N×N square is needed, as long as instances extending beyond the bounds of the square are wrapped around.
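Theorem 8 turns the existence question into a finite search over the N×N square with wrap around. The following Python sketch is one way to carry out that search, under assumptions not taken from the text: offsets are (row, column) pairs, the designated element is the (0,0) component, and the function names are hypothetical. `tile_torus` backtracks over the placements that cover the first uncovered cell, so it either returns the designated elements of a tesselation of the torus or reports that none exists. `scheme_from_tiling` then stores the k-th component of every tile in module k; this is one possible construction and not necessarily the one used in the proof of Theorem 5, so `is_valid` re-checks the result directly.

```python
from itertools import product


def tile_torus(offsets, n):
    """Return designated elements t such that the wrapped instances t + L cover
    the n x n torus exactly once, or None if no tesselation exists."""
    cells = list(product(range(n), repeat=2))
    covered, chosen = set(), []

    def search():
        free = next((cell for cell in cells if cell not in covered), None)
        if free is None:
            return True  # every cell is covered exactly once
        for a, b in offsets:  # cover `free` with component (a, b) of some instance
            t = ((free[0] - a) % n, (free[1] - b) % n)
            points = {((t[0] + u) % n, (t[1] + v) % n) for u, v in offsets}
            if len(points) == len(offsets) and not (points & covered):
                covered.update(points)
                chosen.append(t)
                if search():
                    return True
                covered.difference_update(points)
                chosen.pop()
        return False

    return list(chosen) if search() else None


def scheme_from_tiling(offsets, n, tiles):
    """Store the k-th component of every tile in memory module k."""
    phi = [[None] * n for _ in range(n)]
    for t0, t1 in tiles:
        for k, (a, b) in enumerate(offsets):
            phi[(t0 + a) % n][(t1 + b) % n] = k
    return phi


def is_valid(phi, offsets, n):
    """Direct check: every wrapped instance uses pairwise distinct modules."""
    return all(
        len({phi[(c + a) % n][(d + b) % n] for a, b in offsets}) == len(offsets)
        for c, d in product(range(n), repeat=2)
    )


L = [(0, 0), (0, 1), (2, 1), (2, 2)]
tiles = tile_torus(L, 4)
print(tiles is not None and is_valid(scheme_from_tiling(L, 4, tiles), L, 4))  # True
```

For L = ((0,0), (0,1), (2,1), (2,2)) and N = 4 the search finds a tesselation of the torus, in line with the periodic scheme whose existence Figure 20 establishes.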
 
 A few additional comments are in order before closing this 
 section. In working on the torus, as in the plane, the realization of a 
 generalized line, as a rigid template composed of unit squares, need not 
 
 
An interesting question which can be asked is: can the situation be restricted still further to determine the existence of valid linear skewing schemes? The answer to this question is not fully known. Notice that the skewing scheme given in Figure 13 is periodic, but not linear. However, a valid linear skewing scheme does exist for the generalized line whose geometric realization is I-shaped, L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)). One such scheme is illustrated in Figure 19. Only the N×N square is shown, since linear skewing schemes are periodic, and thus only the N×N square with wrap around need be considered. Other generalized lines, like L = ((0,0), (0,1), (2,1), (2,2)), whose geometric realization is a disconnected shape, provide examples of generalized lines for which valid periodic skewing schemes exist, but for which no valid linear skewing schemes exist. Figure 20 implies the existence of a valid periodic skewing scheme. Trying all possibilities for a and b in $\varphi(i,j) = ai + bj \bmod N$, where, since N = 4, a and b need only run over 0, 1, 2, and 3, rules out the existence of valid linear skewing schemes. The general question of when a valid periodic skewing scheme for a collection of generalized lines implies a valid linear skewing scheme appears quite difficult. Chapter 3 investigates this question for $[x,y]_N$-lines. The general problem is discussed again in Chapter 4.
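The exhaustive check over a and b described above is small enough to automate. In this Python sketch (hypothetical helper names; offsets are (row, column) pairs), a linear scheme $\varphi(i,j) = ai + bj \bmod N$ is treated as valid for a generalized line exactly when the values $\varphi$ takes on the offsets are pairwise distinct, because $\varphi(c+i, d+j) = \varphi(c,d) + \varphi(i,j) \bmod N$ for every instance. The search finds no valid (a, b) for L = ((0,0), (0,1), (2,1), (2,2)) with N = 4, and, for the line of Figure 19 with a = 1 and N = 7, it finds that b = 3 and b = 4 both give valid schemes.

```python
from itertools import product


def valid_linear(a, b, offsets, n):
    """phi(i, j) = a*i + b*j mod n is linear, so phi(c + i, d + j) = phi(c, d) + phi(i, j);
    it is therefore valid iff the offsets of the line get pairwise distinct values."""
    values = {(a * i + b * j) % n for i, j in offsets}
    return len(values) == len(offsets)


def all_valid_linear(offsets, n):
    """Exhaustive search over a, b in {0, ..., n-1}."""
    return [(a, b) for a, b in product(range(n), repeat=2)
            if valid_linear(a, b, offsets, n)]


disconnected = [(0, 0), (0, 1), (2, 1), (2, 2)]
print(all_valid_linear(disconnected, 4))  # []

figure19 = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
print([b for a, b in all_valid_linear(figure19, 7) if a == 1])  # [3, 4]
```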
 
Figure 19: A Valid Linear Skewing Scheme for the Generalized Line, L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)); a = 1, b = [illegible], N = 7.
 
Figure 20: Proof of the Existence of a Periodic Skewing Scheme for L = ((0,0), (0,1), (2,1), (2,2)).
 
3. SPECIAL RESULTS ON $[x,y]_N$-LINES
 
3.1 Introduction
 
In Chapter 2, geometric conditions were developed which aid in determining if valid skewing schemes exist for collections of generalized lines. In this chapter, only collections of $[x,y]_N$-lines are considered. The highly structured nature of $[x,y]_N$-lines permits additional results to be obtained. The main result, which will be proved over the course of the next several sections, is
 
Theorem 10: Given N memory modules and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, there is a valid periodic skewing scheme for the collection if and only if there is a valid linear skewing scheme for the collection.
 
The if direction is trivial, since every linear skewing scheme is periodic. The only if direction, however, is a rather surprising result, since the number of valid periodic skewing schemes for a collection of generalized lines usually greatly exceeds the number of valid linear skewing schemes. The proof technique is a generalization of an argument used by Polya for a restricted subcase [13].
 
 3.2 Preliminaries 
 
As was discussed in Chapter 2, when dealing with periodic skewing schemes it is convenient to replace the plane with the torus formed by identifying opposite edges of the N×N square. An instance of an $[x,y]_N$-line can now be viewed as having its first component at (i,j), with successive components located by going over x and down y on the torus, until a total of N points are generated. Figure 21 illustrates this construction. It might happen that in generating these N points two of them coincide. If this should occur, then there can be no valid periodic skewing scheme using N memory modules for this $[x,y]_N$-line, since there are two distinct matrix elements, in the same instance of this $[x,y]_N$-line, which must be assigned to the same memory module by any periodic skewing scheme. The condition that must be imposed is simple.
 
Lemma 1: If $(x_i,y_i,N) \neq 1$,† then there is no valid periodic skewing scheme using N memory modules for the generalized line, L, the $[x_i,y_i]_N$-line.

Proof: Suppose $(x_i,y_i,N) = s > 1$. Then $L(c,d) = ((c,d), (c+y_i, d+x_i), \ldots, (c+\ell y_i, d+\ell x_i), \ldots, (c+(N-1)y_i, d+(N-1)x_i))$. Note that $x_i/s$, $y_i/s$, and $N/s$ are integers and that $N/s \le N-1$. If $\varphi$ is an arbitrary periodic skewing scheme, then

$$\varphi(c,d) = \varphi\left(c + N\tfrac{y_i}{s},\ d + N\tfrac{x_i}{s}\right) = \varphi\left(c + \tfrac{N}{s}y_i,\ d + \tfrac{N}{s}x_i\right).$$

Since $(c,d)$ and $(c + \frac{N}{s}y_i, d + \frac{N}{s}x_i)$ are distinct components of the same instance of L, $\varphi$ is not valid. ■
 
Thus, given a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, if even one of the $[x_i,y_i]_N$-lines is such that $(x_i,y_i,N) \neq 1$, then there are no valid periodic skewing schemes, using N memory modules, for this collection. The condition $(x_i,y_i,N) = 1$ for $i = 1,2,\ldots,I$ is, therefore, a necessary condition for the existence of a valid periodic skewing scheme. It is not a sufficient condition, however.
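The construction of an instance on the torus, and the failure described in Lemma 1, can be reproduced in a few lines of Python. This sketch assumes the component order of $L(c,d)$ used in the proof, $(c + vy, d + vx)$ for $v = 0, \ldots, N-1$, with all coordinates reduced mod N; the function names are hypothetical. Exhaustive testing over small N agrees with Lemma 1 and with its converse: an instance repeats a component exactly when $(x, y, N) \neq 1$.

```python
from math import gcd


def instance_on_torus(c, d, x, y, n):
    """Components of L(c, d) for the [x, y]_n-line: (c + v*y, d + v*x) mod n, v = 0..n-1."""
    return [((c + v * y) % n, (d + v * x) % n) for v in range(n)]


def has_repeated_component(x, y, n):
    """True if two components of an instance land on the same cell of the torus.
    (Translating the instance does not change this, so (c, d) = (0, 0) suffices.)"""
    points = instance_on_torus(0, 0, x, y, n)
    return len(set(points)) < n


print(instance_on_torus(0, 0, 2, 2, 4))  # [(0, 0), (2, 2), (0, 0), (2, 2)]
print(has_repeated_component(2, 2, 4), gcd(gcd(2, 2), 4))  # True 2
```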
 
When $(x_i,y_i,N) = 1$ some simplifications are possible. In general, given an arbitrary generalized line, L, of length N, there are $N^2$ distinct instances of L on the torus. However, letting L be the $[x_i,y_i]_N$-line, on the torus the components of $L(c,d)$, $L(c+y_i, d+x_i)$, ..., and $L(c+(N-1)y_i, d+(N-1)x_i)$ are the same. The ordering of the components within these instances is different, but for purposes of determining the validity of a periodic skewing scheme these N instances can be regarded as one instance. Thus, for $[x,y]_N$-lines with $(x,y,N) = 1$, the number of instances on the torus is effectively N, instead of $N^2$. Throughout the rest of this chapter an $[x,y]_N$-line with $(x,y,N) = 1$ will be regarded as having only N instances on the torus. Notice that no two of the N instances have any elements in common, and, since the torus has $N^2$ elements, every element on the torus is in one and only one instance. It is possible to characterize these N instances.

† $(x_i,y_i,N)$ is the greatest common divisor of $x_i$, $y_i$, and N.

Figure 21: $[x,y]_N$-lines on the Torus. An example of an instance of an $[x,y]_N$-line on the torus (i = 1, j = 2, x = 3, y = 2); the order in which the elements were generated is indicated on the figure.
 
Lemma 2: Given an $[x,y]_N$-line with $(x,y,N) = 1$, each of the N instances of the $[x,y]_N$-line can be characterized by an integer in $\{0,1,2,\ldots,N-1\}$ in the following manner: if $(w,z)$ is a component of an instance of the $[x,y]_N$-line, characterize this instance by $xw - yz \bmod N$.
 
Proof: The proposed characterization is a function from the N instances of the $[x,y]_N$-line to $\{0,1,2,\ldots,N-1\}$. First it must be shown that this is indeed a well-defined function. Let $(w,z)$ and $(w',z')$ be different components of the same instance of the $[x,y]_N$-line. Then $w' \equiv w + vy$ and $z' \equiv z + vx$,† and $xw' - yz' \equiv xw + xvy - yz - yvx \equiv xw - yz$, so, in fact, the mapping defined is a function.

Additionally, the function is a one-to-one correspondence. The pigeonhole principle implies that to see this it suffices to show that for any i there exist w and z such that $xw - yz \equiv i$. It is a well-known result in number theory that, given x and y, there exist c and d such that $xc - yd = (x,y)$ [7]. Since $((x,y),N) = (x,y,N) = 1$ and the residue classes of numbers relatively prime to N form a group under multiplication, there exists g such that $(x,y)\cdot g \equiv 1$. Hence $xcg - ydg \equiv 1$, and thus $xcgi - ydgi \equiv i$. Letting $w = cgi \bmod N$ and $z = dgi \bmod N$ gives the needed w and z. ■

† Congruences are mod N, unless otherwise indicated.
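The proof of Lemma 2 is constructive, so it translates directly into code. In this Python sketch (hypothetical names), `ext_gcd` is a standard extended Euclidean algorithm returning c and d' with $xc + yd' = (x,y)$, so $d = -d'$ gives $xc - yd = (x,y)$; `pow(g, -1, n)` (Python 3.8 or later) supplies the inverse of $(x,y)$ mod N, which exists because $(x,y,N) = 1$. The x and y in the usage line are borrowed from Figure 21, with N = 7 chosen for illustration.

```python
from math import gcd


def ext_gcd(a, b):
    """Extended Euclid: return (g, c, d) with a*c + b*d == g == gcd(a, b)."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, c1, d1 = ext_gcd(b, a % b)
    return (g, d1, c1 - (a // b) * d1)


def instance_label(w, z, x, y, n):
    """Lemma 2's label of the instance containing (w, z)."""
    return (x * w - y * z) % n


def point_with_label(i, x, y, n):
    """Follow the proof of Lemma 2: find (w, z) with x*w - y*z ≡ i (mod n)."""
    g, c, d_neg = ext_gcd(x, y)      # x*c + y*d_neg == g == (x, y)
    if gcd(g, n) != 1:
        raise ValueError("Lemma 2 requires (x, y, N) = 1")
    d = -d_neg                       # so x*c - y*d == (x, y)
    g_inv = pow(g, -1, n)            # inverse of (x, y) mod n
    return ((c * g_inv * i) % n, (d * g_inv * i) % n)


def instance_on_torus(c, d, x, y, n):
    """Components (c + v*y, d + v*x) mod n of the instance L(c, d)."""
    return [((c + v * y) % n, (d + v * x) % n) for v in range(n)]


print(point_with_label(5, 3, 2, 7))     # (5, 5)
print(instance_label(5, 5, 3, 2, 7))    # 5
```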
 
 3.3 The Special Case of a Prime Number of Memory Modules 
 
Theorem 10 is easy to prove if N, the number of memory modules, is a prime.
 
Lemma 3: Given N memory modules, N a prime number, and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, there is a valid periodic skewing scheme for the collection if and only if there is a valid linear skewing scheme for the collection.
Proof: As remarked earlier, the if direction is trivial. If $\varphi$ is a valid periodic skewing scheme, then exactly N points on the torus, formed by identifying opposite edges of the N×N square, will be mapped by $\varphi$ into memory module zero. This is so because exactly one component of each of the N disjoint instances of the $[x_1,y_1]_N$-line must be mapped to zero by $\varphi$. Without loss of generality (translating $\varphi$ if necessary), the element (0,0) can be assumed to be one of these elements. Consider any other element mapped into memory zero, say the element $(\hat{y}, \hat{x})$. Construct the a and b for the linear skewing scheme as follows:

 if $\hat{y} \equiv 0$ then $a = 1$, $b = 0$;
 else if $\hat{x} \equiv 0$ then $a = 0$, $b = 1$;
 else $b = 1$, $a = -\hat{y}^{-1}\hat{x}$,

where $\hat{y}^{-1}$ is the multiplicative inverse of $\hat{y}$ in the field of residue classes mod N. (Note that $\hat{y} \not\equiv 0$ in this last case and N is prime.)
 
The claim is that the linear skewing scheme $ac + bd \bmod N$ is valid. Suppose not, i.e. $ac + bd \equiv a(c + vy_i) + b(d + vx_i)$ for some c, d, $i \in \{1,2,\ldots,I\}$, and $v \in \{1,2,\ldots,N-1\}$. This implies $avy_i + bvx_i \equiv 0$, but since N is a prime and $v \not\equiv 0$, this last congruence implies that $ay_i + bx_i \equiv 0$. Three cases will be examined to show that $ay_i + bx_i \equiv 0$ contradicts the validity of $\varphi$.
 
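Case 1: $\hat{y} \equiv 0$. Then $ay_i + bx_i \equiv 0$ reduces to $y_i \equiv 0$, since a = 1 and b = 0. Since, by Lemma 1, $(x_i,y_i,N) = 1$, $x_i \not\equiv 0$. It follows that $x_i^{-1}$ exists in the field of residue classes mod N, since N is prime and $x_i$ is non-zero. Thus

$$(\hat{y},\hat{x}) \equiv (0,\hat{x}) \equiv (0,\ \hat{x}x_i^{-1}x_i) \equiv (0 + \hat{x}x_i^{-1}y_i,\ 0 + \hat{x}x_i^{-1}x_i).$$

Now (0,0) and $(0 + \hat{x}x_i^{-1}y_i, 0 + \hat{x}x_i^{-1}x_i)$ are distinct components of the same instance of the $[x_i,y_i]_N$-line, because $\hat{x}x_i^{-1}$ is non-zero ($\hat{x} \not\equiv 0$, since $(\hat{y},\hat{x})$ is not (0,0)). They are both, however, mapped by $\varphi$ to memory zero, contradicting the validity of $\varphi$.

Case 2: $\hat{x} \equiv 0$. Then $ay_i + bx_i \equiv 0$ reduces to $x_i \equiv 0$. Here $y_i \not\equiv 0$, and, in a manner similar to case 1,

$$(\hat{y},\hat{x}) \equiv (\hat{y},0) \equiv (\hat{y}y_i^{-1}y_i,\ 0) \equiv (0 + \hat{y}y_i^{-1}y_i,\ 0 + \hat{y}y_i^{-1}x_i).$$

This is a similar situation to that encountered in case 1.

Case 3: $\hat{x} \not\equiv 0$, $\hat{y} \not\equiv 0$. Then $ay_i + bx_i \equiv 0$ becomes $-\hat{y}^{-1}\hat{x}y_i + x_i \equiv 0$. Now $y_i \not\equiv 0$, for $y_i \equiv 0$ implies $x_i \equiv 0$ and then $(x_i,y_i,N) \neq 1$, contrary to the requirement established by Lemma 1. But $y_i \not\equiv 0$ implies that $y_i^{-1}$ exists in the field of residue classes mod N. Thus $\hat{x} \equiv \hat{y}y_i^{-1}x_i$ and

$$(\hat{y},\hat{x}) \equiv (\hat{y}y_i^{-1}y_i,\ \hat{y}y_i^{-1}x_i) \equiv (0 + \hat{y}y_i^{-1}y_i,\ 0 + \hat{y}y_i^{-1}x_i),$$

giving rise to the same contradiction as in case 1.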
 
Thus, the assumption that the linear skewing scheme $ac + bd \bmod N$ is not valid led to the conclusion that $\varphi$ is not a valid periodic skewing scheme, contrary to the hypothesis of the lemma. ■
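The choice of a and b in the proof of Lemma 3 can be sketched in Python as follows. Assumptions not in the text: the periodic scheme is an N×N table, each $[x_i,y_i]_N$-line is given as the pair $(x_i, y_i)$, N is prime, and all names are hypothetical. Rather than translating $\varphi$ so that (0,0) is stored in module zero, the sketch takes the difference of two cells stored in module zero, which has the same effect. The example scheme $\sigma((c + 2d) \bmod 5)$, with $\sigma$ an arbitrary relabelling of the modules, is a made-up valid periodic scheme that is not linear.

```python
from itertools import product


def is_valid_periodic(phi, lines, n):
    """Direct check of an n x n table against [x, y]_n-lines given as (x, y) pairs."""
    for x, y in lines:
        for c, d in product(range(n), repeat=2):
            modules = {phi[(c + v * y) % n][(d + v * x) % n] for v in range(n)}
            if len(modules) != n:
                return False
    return True


def linear_from_periodic(phi, n):
    """Lemma 3's choice of (a, b), for prime n, from two cells stored in module 0."""
    zeros = [(r, s) for r, s in product(range(n), repeat=2) if phi[r][s] == 0]
    (r0, s0), (r1, s1) = zeros[0], zeros[1]
    y_hat, x_hat = (r1 - r0) % n, (s1 - s0) % n  # plays the role of (y^, x^)
    if y_hat == 0:
        return 1, 0
    if x_hat == 0:
        return 0, 1
    return (-pow(y_hat, -1, n) * x_hat) % n, 1


lines = [(1, 0), (0, 1), (1, 1)]   # rows, columns, main diagonals
sigma = [0, 2, 1, 4, 3]            # arbitrary relabelling of the modules
phi = [[sigma[(c + 2 * d) % 5] for d in range(5)] for c in range(5)]
a, b = linear_from_periodic(phi, 5)
linear = [[(a * c + b * d) % 5 for d in range(5)] for c in range(5)]
print(is_valid_periodic(phi, lines, 5), (a, b), is_valid_periodic(linear, lines, 5))  # True (3, 1) True
```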
 
 
It is now possible to characterize precisely when valid periodic skewing schemes exist for a collection of $[x,y]_N$-lines, if the number of memory modules is a prime.
 
Lemma 4: Given N memory modules, N a prime number, and a collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, there exists a valid periodic skewing scheme for this collection if and only if $(x_i,y_i,N) = 1$ for $i = 1,2,\ldots,I$, and either $I < N+1$ or, for all non-zero choices of $g_i$, $i = 1,2,\ldots,N+1$, and any permutation, $\sigma$, of $(1,2,\ldots,I)$, it is not the case that

$$(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1),\quad (x_{\sigma(2)},y_{\sigma(2)}) \equiv (g_2,g_2),\quad \ldots,\quad (x_{\sigma(i)},y_{\sigma(i)}) \equiv (g_i(i-1),g_i),\quad \ldots,$$
$$(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N),\quad \text{and} \quad (x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0).$$
 
Proof: Lemma 3 shows that it is sufficient to prove that the conditions stated above are precisely those needed to guarantee the existence of valid linear skewing schemes for the collection of $[x,y]_N$-lines. The need for the condition $(x_i,y_i,N) = 1$ for $i = 1,2,\ldots,I$ has already been discussed.
 
The only if direction: Suppose $I \ge N+1$ and there exist non-zero $g_i$ and a permutation, $\sigma$, so that $(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1)$, ..., $(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N)$, and $(x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0)$. It will be shown that for any a and b, $\varphi(c,d) = ac + bd \bmod N$ is not a valid skewing scheme.

Case 1: $b \equiv 0$. Then $ay_{\sigma(N+1)} + bx_{\sigma(N+1)} \equiv a\cdot 0 + 0\cdot g_{N+1} \equiv 0$. Thus $(ay_{\sigma(N+1)} + bx_{\sigma(N+1)}, N) \neq 1$, and so, by Theorem 3, $\varphi$ is not a valid skewing scheme.

Case 2: $b \not\equiv 0$. Because N is a prime number, $(ay_i + bx_i, N) = 1$ for all i if and only if $ay_i + bx_i \not\equiv 0$ for all i. Also, $ay_i + bx_i \not\equiv 0$ for all i if and only if $ab^{-1}y_i + x_i \not\equiv 0$ for all i. Now $ab^{-1} \equiv j$ for some $j \in \{0,1,2,\ldots,N-1\}$. Let $k \in \{0,1,\ldots,N-1\}$ with $k \equiv -j$. Then

$$ab^{-1}y_{\sigma(k+1)} + x_{\sigma(k+1)} \equiv jg_{k+1} + kg_{k+1} \equiv jg_{k+1} - jg_{k+1} \equiv 0,$$

where $(x_{\sigma(k+1)},y_{\sigma(k+1)}) \equiv (g_{k+1}k, g_{k+1})$ was used. Thus $ay_i + bx_i \equiv 0$ for some i, and Theorem 3 implies $\varphi$ is not valid.
 
The if direction: Let $(x_i,y_i,N) = 1$ for $i = 1,2,\ldots,I$ and suppose, for the moment, $I \ge N+1$, but that for all non-zero choices of $g_i$ and any permutation, $\sigma$, of $(1,2,\ldots,I)$, it is not the case that $(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1)$, ..., $(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N)$, and $(x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0)$. A valid linear skewing scheme will be defined for the collection of $[x,y]_N$-lines.
 
Case 1: Suppose that there exists $j \in \{0,1,2,\ldots,N-1\}$ such that $(x_i,y_i) \not\equiv (g_{j+1}j, g_{j+1})$ for all i and all non-zero choices of $g_{j+1}$, i.e. the reason a permutation cannot be constructed with $(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1)$, ..., $(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N)$, and $(x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0)$ is that there is no possible choice for $\sigma(j+1)$. (Note that, when attempting to construct such permutations, $(x_i,y_i)$ can be congruent to only one of $(0,g_1), (g_2,g_2), \ldots, (g_N(N-1),g_N)$, and $(g_{N+1},0)$.) Take $a = -j$ and $b = 1$. Now if $(ay_k + bx_k, N) \neq 1$ for some k, then $ay_k + bx_k \equiv 0$, since N is prime, and hence $-jy_k + x_k \equiv 0$. Then $y_k \not\equiv 0$, since otherwise $x_k \equiv 0$ as well, contradicting $(x_k,y_k,N) = 1$, and $(x_k,y_k) \equiv (jy_k, y_k)$, contrary to the assumption that $(x_i,y_i) \not\equiv (g_{j+1}j, g_{j+1})$ for any i and any non-zero $g_{j+1}$, since $g_{j+1}$ can be taken to be $y_k$. Hence $(ay_i + bx_i, N) = 1$ for all i, and $\varphi(c,d) = ac + bd \bmod N$ is valid.
 
Case 2: For any i and any non-zero $g_{N+1}$, $(x_i,y_i) \not\equiv (g_{N+1},0)$. This is similar to case 1, and occurs when there is no possible choice for $\sigma(N+1)$ for which $(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1)$, ..., $(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N)$, and $(x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0)$. Take $a = 1$ and $b = 0$. Again, suppose $(ay_k + bx_k, N) \neq 1$ for some k; then $ay_k + bx_k \equiv 0$ reduces to $y_k \equiv 0$, so $x_k \not\equiv 0$ (since $(x_k,y_k,N) = 1$), and $(x_k,y_k) \equiv (x_k,0) \equiv (g_{N+1},0)$, where $g_{N+1}$ is taken as $x_k$. This contradicts the assumption that $(x_i,y_i) \not\equiv (g_{N+1},0)$ for all i and all non-zero choices of $g_{N+1}$. Thus $\varphi(c,d) = ac + bd \bmod N$ is valid.
 
Since these two cases exhaust all the reasons why a permutation, $\sigma$, cannot be constructed so that, for some non-zero choices of $g_i$, $(x_{\sigma(1)},y_{\sigma(1)}) \equiv (0,g_1)$, ..., $(x_{\sigma(N)},y_{\sigma(N)}) \equiv (g_N(N-1),g_N)$, and $(x_{\sigma(N+1)},y_{\sigma(N+1)}) \equiv (g_{N+1},0)$, there is a valid linear skewing scheme for the collection of $[x,y]_N$-lines.
 
If $I < N+1$, the pigeonhole principle gives the same two cases, since there will either exist a $j \in \{0,1,\ldots,N-1\}$ such that, for any $i \in \{1,2,\ldots,I\}$ and any non-zero choice of $g_{j+1}$, $(x_i,y_i) \not\equiv (g_{j+1}j, g_{j+1})$, or, for any $i \in \{1,2,\ldots,I\}$ and any non-zero choice of $g_{N+1}$, $(x_i,y_i) \not\equiv (g_{N+1},0)$. ■
 
3.4 Generalization to Composite N
 
To complete the proof of Theorem 10 it suffices to prove two statements:

1. Given N, the number of memory modules, N a composite number, a collection of $[x,y]_N$-lines, $C = \{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, and a valid periodic skewing scheme, $\varphi$, for this collection, using N memory modules, then if p is a prime factor of N, there is a valid periodic skewing scheme, $\varphi'$, using only p memory modules, for the collection of $[x,y]_p$-lines, $C' = \{[x_i,y_i]_p\text{-lines} \mid i = 1,2,\ldots,I\}$.†

2. Given a collection of $[x,y]_M$-lines, $S = \{[x_i,y_i]_M\text{-lines} \mid i = 1,2,\ldots,I\}$, and a valid linear skewing scheme, $\varphi$, for S, using M memory modules, and given a valid linear skewing scheme, $\varphi'$, for $S' = \{[x_i,y_i]_{M'}\text{-lines} \mid i = 1,2,\ldots,I\}$, using M' memory modules, then there is a valid linear skewing scheme, $\varphi''$, using MM' memory modules, for the collection of $[x,y]_{MM'}$-lines, $S'' = \{[x_i,y_i]_{MM'}\text{-lines} \mid i = 1,2,\ldots,I\}$.
To see that this is sufficient, note that, given a valid periodic skewing scheme, using N memory modules, for the collection of $[x,y]_N$-lines, $\{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, statement 1 above implies valid periodic skewing schemes for each prime factor of N. Lemma 3 then implies the existence of valid linear skewing schemes for this collection for each prime factor of N, and, finally, statement 2 implies the existence of a valid linear skewing scheme for this collection using N memories. Because of Theorem 3, statement 2 is equivalent to
 
Lemma 5: Given a collection of ordered pairs, $\{(x_i,y_i) \mid i = 1,2,\ldots,I\}$, and given M, M', a, b, a', and b' such that $(ay_i + bx_i, M) = (a'y_i + b'x_i, M') = 1$ for $i = 1,2,\ldots,I$, there exist a'' and b'' such that $(a''y_i + b''x_i, MM') = 1$ for $i = 1,2,\ldots,I$.
 
† Note that the $x_i$ and $y_i$ are the same in both C and C', but the generalized lines are different, since they have different lengths.
 
Proof: The proof is constructive. Let $\pi$ be the product of those prime numbers which divide M, but do not divide M'. Let $\pi'$ be the product of those prime numbers which divide M', but do not divide M. Finally, let $\rho$ be the product of those primes which divide both M and M'. For the sake of definiteness, in calculating $\pi$, $\pi'$, and $\rho$, a prime factor is included in the product only once, even if it appears several times in the prime factorization of M and/or M'. Also, if there are no prime factors from which to calculate $\pi$, $\pi'$, or $\rho$, set $\pi$, $\pi'$, or $\rho$, as the case dictates, to one.

Define $a'' = a\pi'\rho + a'\pi$ and $b'' = b\pi'\rho + b'\pi$. The claim is that $(a''y_i + b''x_i, MM') = 1$ for $i = 1,2,\ldots,I$. Suppose not, i.e. for some i there is a prime, p, for which $p \mid a''y_i + b''x_i$ and $p \mid MM'$. If $p \mid a''y_i + b''x_i$, then $p \mid a\pi'\rho y_i + a'\pi y_i + b\pi'\rho x_i + b'\pi x_i$. By the definitions of $\pi$, $\pi'$, and $\rho$, p divides exactly one of them. Assume $p \mid \pi$. Since $p \mid \pi$ implies $p \mid a'\pi y_i + b'\pi x_i$, it can be concluded that $p \mid a\pi'\rho y_i + b\pi'\rho x_i$. Since p is a prime, $p \nmid \pi'$, and $p \nmid \rho$, it must happen that $p \mid ay_i + bx_i$. However, $p \mid \pi$ implies $p \mid M$, so $(ay_i + bx_i, M) \neq 1$, contrary to the hypothesis of the lemma. Thus if $p \mid a''y_i + b''x_i$ and $p \mid MM'$, then $p \nmid \pi$. By similar arguments $p \nmid \pi'$ and $p \nmid \rho$. But this is impossible, since every prime dividing MM' divides one of $\pi$, $\pi'$, and $\rho$. Hence $a''y_i + b''x_i$ and MM' have no prime factors in common, i.e. $(a''y_i + b''x_i, MM') = 1$ for $i = 1,2,\ldots,I$. ■
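The construction in the proof of Lemma 5 can be sketched in Python as follows (hypothetical names; trial-division factoring is used only because the moduli here are small). `valid` applies the test $(ay_i + bx_i, M) = 1$ that Theorem 3 supplies, and `math.prod` (Python 3.8 or later) returns 1 for an empty set of primes, matching the convention in the proof.

```python
import math
from itertools import product


def prime_factors(n):
    """Set of distinct primes dividing n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def combine(a, b, m, a1, b1, m1):
    """Lemma 5: a'' = a*pi'*rho + a'*pi and b'' = b*pi'*rho + b'*pi."""
    P, P1 = prime_factors(m), prime_factors(m1)
    pi = math.prod(P - P1)      # primes dividing M only
    pi1 = math.prod(P1 - P)     # primes dividing M' only
    rho = math.prod(P & P1)     # primes dividing both
    return a * pi1 * rho + a1 * pi, b * pi1 * rho + b1 * pi


def valid(a, b, lines, m):
    """Theorem 3's test for the linear scheme a*i + b*j mod m."""
    return all(math.gcd(a * y + b * x, m) == 1 for x, y in lines)


lines = [(1, 0), (0, 1), (1, 1)]
a3, b3 = combine(1, 1, 3, 1, 1, 5)
print((a3, b3), valid(a3, b3, lines, 15))  # (8, 8) True
```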
 
To complete the proof of Theorem 10, it is, therefore, sufficient to prove statement 1, above. Statement 1 will be proved by contradiction. To this end, suppose that a collection of $[x,y]_p$-lines, $C' = \{[x_i,y_i]_p\text{-lines} \mid i = 1,2,\ldots,I\}$, is given, p is a prime number, there is no valid periodic skewing scheme for C' using p memory modules, and there is a valid periodic skewing scheme, using N memory modules, for the collection of $[x,y]_N$-lines, $C = \{[x_i,y_i]_N\text{-lines} \mid i = 1,2,\ldots,I\}$, where $p \mid N$.
 
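Some additional technical lemmas are useful.

Lemma 6:

$$
\begin{pmatrix}
p-1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1^{-1} & 1^{-2} & \cdots & 1^{-(p-2)} & 1^{-(p-1)} \\
0 & 2^{-1} & 2^{-2} & \cdots & 2^{-(p-2)} & 2^{-(p-1)} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & (p-1)^{-1} & (p-1)^{-2} & \cdots & (p-1)^{-(p-2)} & (p-1)^{-(p-1)}
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
0 & 1 & 2 & \cdots & p-1 \\
0 & 1^2 & 2^2 & \cdots & (p-1)^2 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 1^{p-2} & 2^{p-2} & \cdots & (p-1)^{p-2} \\
0 & 1^{p-1} & 2^{p-1} & \cdots & (p-1)^{p-1}
\end{pmatrix}
= (p-1)I,
$$

where all calculations are done in the field of residue classes mod p, p a prime number.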
 
Proof: For convenience the second matrix will be denoted by $M_p$ and the first by $M'_p$. Consider the first row of the product matrix. Clearly, the first row of $M'_p$ times the first column of $M_p$ is $p-1$. The first row of $M'_p$ times any other column of $M_p$, say $[1\ i\ i^2 \cdots i^{p-1}]^T$ with $i \neq 0$, is zero, since it is $(p-1)\cdot 1 + 0\cdot i + \cdots + 0\cdot i^{p-2} + 1\cdot i^{p-1} = (p-1) + i^{p-1}$. Fermat's theorem [7] states that, if $i \neq 0$, then $i^{p-1} = 1$.† Thus $p - 1 + i^{p-1} = p - 1 + 1 = p = 0$, and the first row of the product matrix is $[p-1\ 0 \cdots 0]$. Now consider any other row of $M'_p$, $[0\ i^{-1}\ i^{-2} \cdots i^{-(p-1)}]$, times any column of $M_p$, $[1\ j\ j^2 \cdots j^{p-1}]^T$. The product is

$$0 + i^{-1}j + i^{-2}j^2 + \cdots + i^{-(p-1)}j^{p-1} = (i^{-1}j) + (i^{-1}j)^2 + \cdots + (i^{-1}j)^{p-1}.$$

In general $(x + x^2 + \cdots + x^{p-1})(x-1) = x^p - x$, but since p is prime, Fermat's theorem implies $x^p = x$ for any x. Thus $(x + x^2 + \cdots + x^{p-1})(x-1) = 0$. But in a field, the product of two numbers is zero if and only if at least one of them is zero. Thus if $x \neq 1$, $x + x^2 + \cdots + x^{p-1} = 0$, and, clearly, if $x = 1$, $x + x^2 + \cdots + x^{p-1} = p-1$. Replacing x by $i^{-1}j$, and noting that $i^{-1}j = 1$ if and only if $i = j$, gives that the off-diagonal elements of the product matrix are zero and the diagonal elements are $p-1$. Thus $M'_p \times M_p = (p-1)I$. ■

† Note that all equations are in the field of residue classes mod p; in particular, = means ≡ (mod p).
 
Corollary 1: $\det(M_p) \neq 0$, where the calculations are performed in the field of residue classes mod p, p a prime number.

Proof: $(p-1)M'_p$ is a left inverse for $M_p$, since $(p-1)^2 = p^2 - 2p + 1 \equiv 1$. But if a matrix has a left inverse, the left inverse is a two-sided inverse, and the matrix has a non-zero determinant [8]. ■
 
Corollary 2:

$$\sum_{i=0}^{p-1} i^r \equiv \begin{cases} p-1 & \text{if } r = p-1 \\ 0 & \text{if } r = 0,1,2,\ldots,p-2 \end{cases} \pmod p,$$

where $0^0 = 1$, by convention.
 
Proof: While, in general, matrix multiplication does not commute, $M'_p \times M_p = M_p \times M'_p$, since $M'_p$ is (almost) the inverse of $M_p$. Note that the last column of $M'_p$ is all ones, since $i^{-(p-1)} = (i^{p-1})^{-1} = 1$, by Fermat's theorem. Thus $\left(\sum_{i=0}^{p-1} i^r\right) \bmod p$ is just the $(r+1)$st row of $M_p$ times the last column of $M'_p$. Since $M_p \times M'_p = (p-1)I$, the last column of the product matrix is $[0 \cdots 0\ \ p-1]^T$, and the corollary is proved. ■
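Lemma 6 and Corollary 2 are easy to spot-check numerically for small primes. This Python sketch (hypothetical names) builds $M'_p$ and $M_p$ as read from the display in Lemma 6, with $0^0 = 1$ (which is also what Python's `pow(0, 0, p)` returns for $p \ge 2$), and verifies both products and the power sums mod p. It checks particular cases only and is not a substitute for the proofs above.

```python
def m_p(p):
    """M_p: entry (r, j) is j**r mod p, for r, j = 0..p-1 (0**0 == 1)."""
    return [[pow(j, r, p) for j in range(p)] for r in range(p)]


def m_p_prime(p):
    """M'_p: first row [p-1, 0, ..., 0, 1]; row i >= 1 is [0, i^-1, i^-2, ..., i^-(p-1)]."""
    rows = [[p - 1] + [0] * (p - 2) + [1]]
    for i in range(1, p):
        inv = pow(i, -1, p)
        rows.append([0] + [pow(inv, r, p) for r in range(1, p)])
    return rows


def matmul_mod(A, B, p):
    """Matrix product with entries reduced mod p."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def power_sum_mod(p, r):
    """Left-hand side of Corollary 2: sum of i**r for i = 0..p-1, reduced mod p."""
    return sum(pow(i, r, p) for i in range(p)) % p


p = 5
target = [[p - 1 if i == j else 0 for j in range(p)] for i in range(p)]
print(matmul_mod(m_p_prime(p), m_p(p), p) == target)  # True (Lemma 6)
print([power_sum_mod(p, r) for r in range(p)])         # [0, 0, 0, 0, 4] (Corollary 2)
```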
 
Lemma 7:

$$\sum_{i=0}^{p^e-1} i^r \equiv \begin{cases} -p^{e-1} & \text{if } r = p-1 \\ 0 & \text{if } r = 0,1,\ldots,p-2 \end{cases} \pmod{p^e}, \quad \text{for } e \ge 1. \tag{1}$$
 
 Proof: The proof is by induction on e. For the basis case, 
 
 p-1 r r-p = -1 = p-1 if r = p-1 
 e =1, formula (l) reduces to Z i = < 
 
 i=0 P lo if r =0,1,2, . ..,p-2 
 
 This is just Corollary 2. 
 
 Therefore, assume that the result is true for e' = e-1. Since 
 
 formula (l) is clearly true for r = 0, r will be assumed greater than zero 
 
 throughout the remainder of the proof. Now, 
 
 p e -l p e_1 -l p-1 p-1 p e_1 -l 
 
 Z l T = S Z (jp+i) r = Z Z (jp+i) r • (2) 
 
 ,2=0 j=0 i=0 i=0 j=0 
 
 Expanding (jp+i) by the binomial expansion and rearranging the 
 terms in the sum gives 
 
 e e-1 
 
 „r „ I " /T\ .r-k r-k.k /vN 
 
 Z t = Z Z Z (,)o P i • (3) 
 
 1=0 k=0 i=0 j=0 
 
 Isolate an inner sum, 
 
 e-1 -, e-1 •] 
 
 p " /r N .r-k r-k.k ,r N r-k.k p „ .r-k ,, x 
 
 S L)o p i = (, )p l Z o (4) 
 
 j=o k k j=0 
 
 By the induction assumption, 
 
 e-1 , e-2 . . 
 
 P -1 r k f-P if r-k = p-1 
 
 S J ' %-l I 
 
 0=0 P LO if r-k = 0,1, ...,p-2 
 
66 
 
Thus,
$$\sum_{j=0}^{p^{e-1}-1} j^{r-k} = \begin{cases} -p^{e-2} + \alpha_{r,k}\, p^{e-1} & \text{if } r-k = p-1 \\ \alpha_{r,k}\, p^{e-1} & \text{if } r-k = 0,1,\ldots,p-2 \end{cases} \qquad (5)$$
for some integer $\alpha_{r,k}$. By substituting (5) into (4) it is possible to determine the value of the sums in (3).
 
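Case 1: Middle terms of the binomial expansion, $r > k > 0$. In this case $1 \le r-k < p-1$. Thus
$$\binom{r}{k} p^{r-k} i^k \sum_{j=0}^{p^{e-1}-1} j^{r-k} = \binom{r}{k} p^{r-k} i^k\, \alpha_{r,k}\, p^{e-1} \equiv_{p^e} 0, \qquad (6)$$
since $r-k \ge 1$ makes $p^{r-k}p^{e-1}$ a multiple of $p^e$. Thus for $k = 1,2,\ldots,r-1$, $\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k \equiv_{p^e} 0$.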
 
Case 2: The first term of the binomial expansion, $r > k = 0$. Here, again substituting (5) into (4),
$$\binom{r}{k} p^{r-k} i^k \sum_{j=0}^{p^{e-1}-1} j^{r-k} = p^r \sum_{j=0}^{p^{e-1}-1} j^r = \begin{cases} p^r\left(-p^{e-2} + \alpha_{r,0}\, p^{e-1}\right) & \text{if } r = p-1 \\ p^r \alpha_{r,0}\, p^{e-1} & \text{if } r = 0,1,\ldots,p-2. \end{cases} \qquad (7)$$
Now in (7), $p^r \alpha_{r,0}\, p^{e-1} \equiv_{p^e} 0$, since $r \ge 1$. Also in (7), $p^r\left(-p^{e-2} + \alpha_{r,0}\, p^{e-1}\right) \equiv_{p^e} 0$ if $r = p-1 \ge 2$, that is, $p \ge 3$. Thus (7) becomes
$$\binom{r}{k} p^{r-k} i^k \sum_{j=0}^{p^{e-1}-1} j^{r-k} \equiv_{p^e} \begin{cases} -p^{e-1} & \text{if } p = 2 \\ 0 & \text{otherwise.} \end{cases} \qquad (8)$$
Thus for $k = 0$, $\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k \equiv_{p^e} 0$, unless $p = 2$. But even if $p = 2$ the sum is zero mod $p^e$, since $\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k$ reduces, by use of (8), to $\sum_{i=0}^{p-1} \left(-p^{e-1}\right)$, which is $-p^{e-1} \cdot p \equiv_{p^e} 0$.
 
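Case 3: The last term of the binomial expansion, $r = k > 0$. Here, $\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k$ reduces to
$$\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} i^r = p^{e-1} \sum_{i=0}^{p-1} i^r = \begin{cases} p^{e-1}\left(p-1+\alpha_{r,r}\, p\right) & \text{if } r = p-1 \\ p^{e-1}\alpha_{r,r}\, p & \text{if } r = 0,1,\ldots,p-2 \end{cases} \qquad (9)$$
for some integer $\alpha_{r,r}$, where Corollary 2 is used in obtaining the last equality. This reduces further to
$$\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} i^r \equiv_{p^e} \begin{cases} -p^{e-1} & \text{if } r = p-1 \\ 0 & \text{if } r = 0,1,2,\ldots,p-2. \end{cases} \qquad (10)$$
Summing $\sum_{i=0}^{p-1}\sum_{j=0}^{p^{e-1}-1} \binom{r}{k} j^{r-k} p^{r-k} i^k$ over all $k$, and using the results of Cases 1, 2, and 3, completes the proof of the lemma. ■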
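Formula (1) is also easy to confirm by brute force for small prime powers. The sketch below uses our own function names and simply compares both sides of (1); it is a sanity check, not a substitute for the induction.

```python
def power_sum_mod(p, e, r):
    """sum_{i=0}^{p**e - 1} i**r reduced mod p**e (with 0**0 == 1)."""
    m = p ** e
    return sum(pow(i, r, m) for i in range(m)) % m


def lemma7_prediction(p, e, r):
    """Right-hand side of formula (1), as a residue in [0, p**e)."""
    m = p ** e
    return (-(p ** (e - 1))) % m if r == p - 1 else 0


for p in (2, 3, 5, 7):
    for e in range(1, 5):
        for r in range(p):
            assert power_sum_mod(p, e, r) == lemma7_prediction(p, e, r), (p, e, r)
print("Lemma 7 holds for all cases tested")
```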
 
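Lemma 8: If $p^e \mid N$, $p^{e+1} \nmid N$ and $e \ge 1$, then $p^e \nmid \sum_{i=0}^{N-1} i^{p-1}$.

Proof: $p^e \mid \sum_{i=0}^{N-1} i^{p-1}$ if and only if $\sum_{i=0}^{N-1} i^{p-1} \equiv_{p^e} 0$. Now
$$\sum_{i=0}^{N-1} i^{p-1} \equiv_{p^e} \frac{N}{p^e} \sum_{i=0}^{p^e-1} i^{p-1},$$
since the residues mod $p^e$ of $0, 1, \ldots, N-1$ run through $0, 1, \ldots, p^e-1$ exactly $N/p^e$ times. $p^{e+1} \nmid N$ implies $p \nmid N/p^e$ and, by Lemma 7, $\sum_{i=0}^{p^e-1} i^{p-1} \equiv_{p^e} -p^{e-1}$. Thus $\frac{N}{p^e} \sum_{i=0}^{p^e-1} i^{p-1} \not\equiv_{p^e} 0$. ■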
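Lemma 8, together with the block-counting reduction used in its proof, can be checked for small $N$ in the same way. The trial-division factorizer `factor_multiplicities` is a helper written only for this illustration.

```python
def factor_multiplicities(n):
    """Return {prime: exponent} for n >= 2 by trial division."""
    result, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            result[d] = result.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        result[n] = result.get(n, 0) + 1
    return result


def lemma8_holds(n):
    """For every p**e exactly dividing n, check p**e does not divide
    sum_{i<n} i**(p-1), and check the reduction used in the proof."""
    for p, e in factor_multiplicities(n).items():
        m = p ** e
        full = sum(pow(i, p - 1, m) for i in range(n)) % m
        block = sum(pow(i, p - 1, m) for i in range(m)) % m
        # The full sum is (n / p**e) copies of one block of p**e residues.
        assert full == (n // m) * block % m
        if full == 0:
            return False
    return True


assert all(lemma8_holds(n) for n in range(2, 300))
print("Lemma 8 holds for 2 <= N < 300")
```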
 
 It is now possible to prove Theorem 10. 
 
Proof of Theorem 10: As was pointed out earlier, all that remains to prove is the first of the two statements found at the beginning of this section. To this end, suppose a collection of $[x,y]$-lines, $C = \{[x_i,y_i]\text{-lines} \mid i = 1,2,\ldots,I\}$, and a prime factor $p$ of $N$ are given, so that there is no valid periodic skewing scheme for this collection using $p$ memory modules. The assumption of the existence of a valid periodic skewing scheme, using $N$ memory modules, for the same collection $C$ will lead to a contradiction.
 
Suppose a valid periodic skewing scheme exists for $C$ using $N$ memory modules. Call it $\varphi$. Then exactly $N$ points on the torus must be mapped by the skewing scheme into memory module zero, one element from each of the $N$ disjoint instances of an $[x_i,y_i]$-line. Let this set of $N$ points be $\{(u_j,v_j) \mid j = 0,1,\ldots,N-1\}$. Because $\varphi$ is assumed to be a valid periodic skewing scheme for $C$, using $N$ memory modules, for any $[x_i,y_i]$-line in the collection each of the $(u_j,v_j)$ is a component of a different instance of the $[x_i,y_i]$-line. By Lemma 2, $\{x_iu_j - y_iv_j \bmod N \mid j = 0,1,\ldots,N-1\} = \{0,1,2,\ldots,N-1\}$, for $i = 1,2,\ldots,I$.
 
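Thus, by Lemma 8, $p^e \nmid \sum_{j=0}^{N-1} \left((x_iu_j - y_iv_j) \bmod N\right)^{p-1}$, for $i = 1,2,\ldots,I$, where $e$ is chosen so $p^e \mid N$, $p^{e+1} \nmid N$ and $e \ge 1$, this last since $p \mid N$ by assumption. Note that $\sum_{j=0}^{N-1} \left((x_iu_j - y_iv_j) \bmod N\right)^{p-1}$ does not depend on the choice of $i$, since for every $i$ the values $(x_iu_j - y_iv_j) \bmod N$ run through $0, 1, \ldots, N-1$ exactly once. This sum will be denoted by $\Sigma$.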
 
Since there is assumed to be no valid periodic skewing scheme for $C$ using $p$ memory modules, by Lemma 4 either $(x_i,y_i,p) \ne 1$ for some $i$, or $I \ge p+1$ and there exist non-zero $g_i$ and a permutation, $\sigma$, for which $(x_{\sigma(1)},y_{\sigma(1)}) \equiv_p (0,g_1), \ldots, (x_{\sigma(p)},y_{\sigma(p)}) \equiv_p (g_p(p-1),g_p)$, and $(x_{\sigma(p+1)},y_{\sigma(p+1)}) \equiv_p (g_{p+1},0)$. Now if $(x_i,y_i,p) \ne 1$ for some $i$, then, since $p \mid N$, $(x_i,y_i,N) \ne 1$ also. This contradicts the assumed validity of $\varphi$. Therefore, $I$ must be greater than or equal to $p+1$ and there must exist non-zero $g_i$ and a permutation, $\sigma$, with the required properties. Without loss of generality, assume $\sigma$ is the identity permutation.
 
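Consider the system
$$\begin{bmatrix} y_1^{p-1} & y_2^{p-1} & \cdots & y_p^{p-1} \\ y_1^{p-2}x_1 & y_2^{p-2}x_2 & \cdots & y_p^{p-2}x_p \\ \vdots & \vdots & & \vdots \\ x_1^{p-1} & x_2^{p-1} & \cdots & x_p^{p-1} \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix} = \Delta \begin{bmatrix} y_{p+1}^{p-1} \\ y_{p+1}^{p-2}x_{p+1} \\ \vdots \\ x_{p+1}^{p-1} \end{bmatrix} \qquad (11)$$
where the matrix is called $M$, $\Delta = \det(M)$, $R = [y_{p+1}^{p-1} \;\cdots\; x_{p+1}^{p-1}]^T$, and the entries of $\Gamma = [\gamma_1 \;\; \gamma_2 \;\cdots\; \gamma_p]^T$ are unknowns to be determined. That is, $M_{i,j} = y_j^{p-i}x_j^{i-1}$, and (11) is the system $M\Gamma = \Delta R$.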
 
An important question is: Does this system have a solution, and, if so, is it unique? The answer to both parts of this question will be yes if $\Delta \ne 0$. In order to prove $\Delta \ne 0$ and to obtain some information about the form of the $\gamma_i$, the system in (11) can be converted into a similar system in the field of residue classes mod $p$.
 
When the elements of $M$ are replaced by their values in the field of residue classes mod $p$, the resulting matrix is the $M_p$ of Lemma 6. To see this observe that
$$M_{i,j} = y_j^{p-i}x_j^{i-1} \equiv_p g_j^{p-i}\left(g_j(j-1)\right)^{i-1} \equiv_p g_j^{p-1}(j-1)^{i-1} \equiv_p (j-1)^{i-1} = (M_p)_{i,j},$$
where $(x_j,y_j) \equiv_p (g_j(j-1),g_j)$ was used (recall $\sigma$ was assumed to be the identity permutation), and $g_j^{p-1} \equiv_p 1$ by Fermat's theorem. This observation justifies the choice of the name $M_p$ in Lemma 6.
 
Also notice that $\Delta \bmod p = \det(M_p)$, where the determinant of $M_p$ is calculated in the field of residue classes mod $p$. This is nothing more than observing that the mod operator and the det operator commute. Because $\Delta \bmod p = \det(M_p)$, $\Delta_p$ is a reasonable notation to use for either of these quantities. In a manner similar to that used in the proof that $M$ is converted to $M_p$ by the mod operator, $R$ is converted to $R_p = [0 \;\cdots\; 0 \;\; 1]^T$, since $y_{p+1} \equiv_p 0$ and $x_{p+1}^{p-1} \equiv_p g_{p+1}^{p-1} \equiv_p 1$.
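To see the reduction from $M$ to $M_p$ on a concrete case, the sketch below uses made-up data: $p = 5$, arbitrary non-zero multipliers $g_j$, and arbitrary integer lifts of the congruences $(x_j,y_j) \equiv_p (g_j(j-1),g_j)$ and $(x_{p+1},y_{p+1}) \equiv_p (g_{p+1},0)$. These numbers are an assumption for illustration only; they are not claimed to come from an actual collection of $[x,y]$-lines. Determinants are computed exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square integer matrix via Fraction-based elimination."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return int(result)


# Hypothetical example data (not from the thesis).
p = 5
g = [1, 2, 3, 4, 2, 3]                            # g_1, ..., g_{p+1}, non-zero mod p
xs = [g[j] * j + p * (j + 1) for j in range(p)]   # 0-indexed j stands for j-1
ys = [g[j] + p * (2 - j) for j in range(p)]
x_last, y_last = g[p] + p, 2 * p                  # congruent to (g_{p+1}, 0) mod p

# Row i (0-indexed) of M in system (11) holds y_j^(p-1-i) * x_j^i.
M = [[ys[j] ** (p - 1 - i) * xs[j] ** i for j in range(p)] for i in range(p)]
R = [y_last ** (p - 1 - i) * x_last ** i for i in range(p)]

# M_p of Lemma 6: the Vandermonde matrix on the nodes 0, 1, ..., p-1.
M_p = [[j ** i % p for j in range(p)] for i in range(p)]

M_reduced = [[v % p for v in row] for row in M]
delta = det(M)
print(M_reduced == M_p, delta % p, det(M_p) % p, [v % p for v in R])
```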
 
By Corollary 1 of Lemma 6, $\Delta_p \ne 0$, so $\Delta \ne 0$. Thus the system $M\Gamma = \Delta R$ and the system $M_p\Gamma_p = \Delta_pR_p$ have unique solutions. The reader is cautioned that despite the fact that both systems have unique solutions, it is not obvious that the mod operator applied to the $\gamma_i$ converts $\Gamma$ to $\Gamma_p$. The reason for this is that the $\gamma_i$ might not be integers, i.e., if the $\gamma_i$ are only rational it makes no sense to consider $\gamma_i \bmod p$. If, however, all the $\gamma_i$ are integers then $\Gamma_p$ will in fact be the column vector obtained by replacing each $\gamma_i$ with $\gamma_i \bmod p$.
 
It is possible to show, however, that $\gamma_i$ is an integer by solving $M\Gamma = \Delta R$ by means of Cramer's rule. When using Cramer's rule, $\gamma_i$ is calculated by replacing column $i$ of $M$ by $\Delta R$, getting a new matrix, $M_i$, and then
$$\gamma_i = \frac{\det(M_i)}{\det(M)}.$$
By using common rules for manipulating determinants [8],
$$\gamma_i = \frac{\Delta\det(M'_i)}{\Delta} = \det(M'_i),$$
where $M'_i$ is the matrix formed by replacing column $i$ of $M$ by $R$. Since every element of $M'_i$ is an integer, $\det(M'_i)$ is also an integer. Thus $\Gamma$ consists solely of integers, and $\Gamma_p$ is obtained from $\Gamma$ by reducing each entry mod $p$.
 
In order to complete the argument, by arriving at the contradiction mentioned at the beginning of the proof, it is convenient to determine the form of $\Gamma$. This can be done by first determining the form of $\Gamma_p$, which is easily done directly. In the proof of Corollary 1 of Lemma 6, it was shown that $M_p^{-1} = (p-1)M'_p$. As noted earlier, $R_p = [0 \;\cdots\; 0 \;\; 1]^T$, so $\Gamma_p = (p-1)M'_p\Delta_pR_p$ is $(p-1)\Delta_p$ times the last column of $M'_p$. But by Fermat's theorem every entry of that column is congruent to 1 mod $p$ (as noted in the proof of Corollary 2), so
$$\Gamma_p = [\Delta_p(p-1) \;\; \Delta_p(p-1) \;\cdots\; \Delta_p(p-1)]^T.$$
Renaming $\Delta_p(p-1) = c$, it follows that
$$\Gamma = [c+p\delta_1 \;\; c+p\delta_2 \;\cdots\; c+p\delta_p]^T,$$
where the $\delta_i$ may all be distinct.
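The Cramer's-rule argument, and the conclusion that every $\gamma_i$ is congruent to $c = \Delta_p(p-1)$ mod $p$, can be checked on hypothetical data. The sketch below takes $p = 5$ with made-up integer lifts of the congruences of the proof (none of these numbers come from the thesis), computes $\gamma_k = \det(M'_k)$, and verifies $M\Gamma = \Delta R$ exactly over the integers.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square integer matrix via Fraction-based elimination."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return int(result)


def with_column(m, k, col):
    """Copy of m with column k replaced by col (the matrix M'_k, 0-indexed)."""
    return [row[:k] + [col[i]] + row[k + 1:] for i, row in enumerate(m)]


# Hypothetical data (not from the thesis): (x_j, y_j) = (g_j (j-1), g_j) mod p.
p = 5
g = [1, 2, 3, 4, 2, 3]
xs = [g[j] * j + p * (j + 1) for j in range(p)]
ys = [g[j] + p * (2 - j) for j in range(p)]
x_last, y_last = g[p] + p, 2 * p

M = [[ys[j] ** (p - 1 - i) * xs[j] ** i for j in range(p)] for i in range(p)]
R = [y_last ** (p - 1 - i) * x_last ** i for i in range(p)]

delta = det(M)
gammas = [det(with_column(M, k, R)) for k in range(p)]   # Cramer: gamma_k = det(M'_k)

solves = all(sum(M[i][k] * gammas[k] for k in range(p)) == delta * R[i]
             for i in range(p))
c = (p - 1) * delta % p
print(solves, [gk % p for gk in gammas], c)
```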
 
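Finally consider the sum
$$(c+p\delta_1)\sum_{j=0}^{N-1}(x_1u_j-y_1v_j)^{p-1} + \cdots + (c+p\delta_p)\sum_{j=0}^{N-1}(x_pu_j-y_pv_j)^{p-1} \equiv_N (c+p\delta_1)\Sigma + (c+p\delta_2)\Sigma + \cdots + (c+p\delta_p)\Sigma \equiv_N pc\Sigma + p\Sigma(\delta_1+\delta_2+\cdots+\delta_p) \equiv_N pV\Sigma,$$
where $V = \delta_1+\delta_2+\cdots+\delta_p+c$. The first congruence holds because $(a \bmod N)^{p-1} \equiv_N a^{p-1}$, so each inner sum is congruent to $\Sigma$ mod $N$.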
 
Now expand the sum another way.
$$\begin{aligned} &(c+p\delta_1)\sum_{j=0}^{N-1}(x_1u_j-y_1v_j)^{p-1} + \cdots + (c+p\delta_p)\sum_{j=0}^{N-1}(x_pu_j-y_pv_j)^{p-1} = \\ &\quad (c+p\delta_1)x_1^{p-1}\sum_{j=0}^{N-1}u_j^{p-1} + (c+p\delta_1)(p-1)(-1)x_1^{p-2}y_1\sum_{j=0}^{N-1}u_j^{p-2}v_j + \cdots + (c+p\delta_1)(-1)^{p-1}y_1^{p-1}\sum_{j=0}^{N-1}v_j^{p-1} \\ &\quad + (c+p\delta_2)x_2^{p-1}\sum_{j=0}^{N-1}u_j^{p-1} + (c+p\delta_2)(p-1)(-1)x_2^{p-2}y_2\sum_{j=0}^{N-1}u_j^{p-2}v_j + \cdots + (c+p\delta_2)(-1)^{p-1}y_2^{p-1}\sum_{j=0}^{N-1}v_j^{p-1} \\ &\quad + \cdots \\ &\quad + (c+p\delta_p)x_p^{p-1}\sum_{j=0}^{N-1}u_j^{p-1} + (c+p\delta_p)(p-1)(-1)x_p^{p-2}y_p\sum_{j=0}^{N-1}u_j^{p-2}v_j + \cdots + (c+p\delta_p)(-1)^{p-1}y_p^{p-1}\sum_{j=0}^{N-1}v_j^{p-1} \end{aligned} \qquad (12)$$
 
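If the terms in the sum of equation (12) are added together by columns, the result is
$$\left(\sum_{j=0}^{N-1}u_j^{p-1}\right)\left(\sum_{i=1}^{p}(c+p\delta_i)x_i^{p-1}\right) + (p-1)(-1)\left(\sum_{j=0}^{N-1}u_j^{p-2}v_j\right)\left(\sum_{i=1}^{p}(c+p\delta_i)x_i^{p-2}y_i\right) + \cdots + (-1)^{p-1}\left(\sum_{j=0}^{N-1}v_j^{p-1}\right)\left(\sum_{i=1}^{p}(c+p\delta_i)y_i^{p-1}\right). \qquad (13)$$
But by considering the system $M\Gamma = \Delta R$, and the value for $\Gamma$ derived above,
$$\sum_{i=1}^{p}(c+p\delta_i)x_i^{p-k}y_i^{k-1} = \Delta x_{p+1}^{p-k}y_{p+1}^{k-1}, \qquad k = 1,2,\ldots,p,$$
and when this is substituted into (13) the sum becomes
$$\left(\sum_{j=0}^{N-1}u_j^{p-1}\right)\Delta x_{p+1}^{p-1} + (p-1)(-1)\left(\sum_{j=0}^{N-1}u_j^{p-2}v_j\right)\Delta x_{p+1}^{p-2}y_{p+1} + \cdots + (-1)^{p-1}\left(\sum_{j=0}^{N-1}v_j^{p-1}\right)\Delta y_{p+1}^{p-1} = \Delta\sum_{j=0}^{N-1}(x_{p+1}u_j-y_{p+1}v_j)^{p-1} \equiv_N \Delta\Sigma.$$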
 
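Combining this with the form of the sum derived earlier, $pV\Sigma \equiv_N \Delta\Sigma$, or $\Sigma(pV-\Delta) \equiv_N 0$. Now $e$ was chosen earlier in the proof so that $p^e \mid N$, but $p^e \nmid \Sigma$. However, if $\Sigma(pV-\Delta) \equiv_N 0$ then $p^e \mid \Sigma(pV-\Delta)$. Since at most $e-1$ factors of $p$ divide $\Sigma$, $p$ must divide $pV-\Delta$. Since $p \mid pV$, this implies $p \mid \Delta$. But $p \nmid \Delta$, since $\Delta_p \ne 0$. This is the desired contradiction, and the theorem is proven. ■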
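The passage from (13) to $\Delta\sum_j(x_{p+1}u_j-y_{p+1}v_j)^{p-1}$ rests on the identity $\sum_{i=1}^{p}\gamma_i(x_iu-y_iv)^{p-1} = \Delta(x_{p+1}u-y_{p+1}v)^{p-1}$, which holds exactly over the integers for every $u$ and $v$ once $M\Gamma = \Delta R$, because each binomial coefficient multiplies one row of that system. The sketch below checks it on hypothetical data ($p = 5$, made-up lifts of the congruences; not from the thesis) for a few arbitrary $(u,v)$.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square integer matrix via Fraction-based elimination."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return int(result)


def with_column(m, k, col):
    """Copy of m with column k replaced by col (the matrix M'_k, 0-indexed)."""
    return [row[:k] + [col[i]] + row[k + 1:] for i, row in enumerate(m)]


# Hypothetical data (not from the thesis).
p = 5
g = [1, 2, 3, 4, 2, 3]
xs = [g[j] * j + p * (j + 1) for j in range(p)]
ys = [g[j] + p * (2 - j) for j in range(p)]
x_last, y_last = g[p] + p, 2 * p

M = [[ys[j] ** (p - 1 - i) * xs[j] ** i for j in range(p)] for i in range(p)]
R = [y_last ** (p - 1 - i) * x_last ** i for i in range(p)]
delta = det(M)
gammas = [det(with_column(M, k, R)) for k in range(p)]   # Cramer's rule


def identity_holds(u, v):
    """sum_i gamma_i (x_i u - y_i v)^(p-1) == Delta (x_{p+1} u - y_{p+1} v)^(p-1)?"""
    lhs = sum(gk * (xk * u - yk * v) ** (p - 1) for gk, xk, yk in zip(gammas, xs, ys))
    return lhs == delta * (x_last * u - y_last * v) ** (p - 1)


print(all(identity_holds(u, v) for u, v in [(1, 0), (0, 1), (2, 3), (-4, 7), (11, -5)]))
```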
 
3.5 Further Results and Examples
 
In Sections 3.1 through 3.4 it was shown that given N memory modules and a collection of [x,y]-lines, restricting consideration to linear skewing schemes suffices to determine the existence of valid periodic skewing schemes for the collection. It is natural to ask whether this result can be extended further, so that by consideration of only linear skewing schemes, the question of the existence of an arbitrary skewing scheme, valid in the plane, can be settled. A limited answer is given by
 
Theorem 11: Given $N$, the number of memory modules, and a collection of $[x,y]$-lines, $\{[x_i,y_i]\text{-lines} \mid i = 1,2,\ldots,I\}$, if there exist two sequences of integers, $(a_1,a_2,\ldots,a_I)$ and $(b_1,b_2,\ldots,b_I)$, such that
$$\sum_{i=1}^{I}(a_ix_i, a_iy_i) = (0,1) \quad\text{and}\quad \sum_{i=1}^{I}(b_ix_i, b_iy_i) = (1,0),$$
then if there does not exist a valid linear skewing scheme for this collection, there is no valid skewing scheme for this collection.
 
Proof: The proof is quite simple. The conditions of this theorem and the result of Theorem 10 imply there is no valid periodic skewing scheme for this collection of [x,y]-lines. The conditions on the sequences of $a_i$'s and $b_i$'s will be seen to imply that if there are any valid skewing schemes, then they are periodic. This will establish the theorem.

Notice that for any valid skewing scheme, $\varphi$, $\varphi(i,j) = \varphi(i+Ny_k, j+Nx_k)$, for $k = 1,2,\ldots,I$ and any $i$ and $j$. This is so, because $\varphi(i,j), \varphi(i+y_k,j+x_k), \ldots, \varphi(i+(N-1)y_k, j+(N-1)x_k)$ must all be distinct, since $((i,j), (i+y_k,j+x_k), \ldots, (i+(N-1)y_k, j+(N-1)x_k))$ is an instance of the $[x_k,y_k]$-line. Similarly, $\varphi(i+y_k,j+x_k), \varphi(i+2y_k,j+2x_k), \ldots, \varphi(i+Ny_k,j+Nx_k)$ must all be distinct, since $((i+y_k,j+x_k), (i+2y_k,j+2x_k), \ldots, (i+Ny_k,j+Nx_k))$ is also an instance of the $[x_k,y_k]$-line. Since these two instances have $N-1$ ordered pairs in common, which occupy $N-1$ distinct memory modules, the pigeonhole principle requires that $\varphi(i,j)$ and $\varphi(i+Ny_k,j+Nx_k)$ both be the one remaining module, so $\varphi(i,j) = \varphi(i+Ny_k,j+Nx_k)$. From this it is clear that $\varphi(i,j) = \varphi(i+\ell Ny_k, j+\ell Nx_k)$, for $\ell = \ldots,-1,0,1,\ldots$ and any $i$ and $j$.

Since $k$ was arbitrary, it follows that for any $(c,d)$,
$$\varphi(c,d) = \varphi\Big(c+\sum_{i=1}^{I}a_iNy_i,\; d+\sum_{i=1}^{I}a_iNx_i\Big) = \varphi\Big(c+N\sum_{i=1}^{I}a_iy_i,\; d+N\sum_{i=1}^{I}a_ix_i\Big) = \varphi(c+N,d).$$
Similarly, by using the sequence of $b_i$'s, $\varphi(c,d) = \varphi(c,d+N)$. Since $(c,d)$ was arbitrary, $\varphi(c,d) = \varphi(c+N,d) = \varphi(c,d+N)$ establishes that $\varphi$ must be a periodic skewing scheme. ■
 
This condition, restrictive though it may be, is sufficient to resolve the most important practical case: {[1,0]-line, [0,1]-line, [1,1]-line, [1,-1]-line}. The sequences (0,1,0,0) and (1,0,0,0) clearly suffice. Thus for this important case, considered by Budnik and Kuck [3] and Lawrie [10], if there does not exist a linear skewing scheme using N memory modules (and there does not when N is a power of two), then there is no valid skewing scheme of any type whatsoever.
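The linear-scheme question for this practical case is easy to explore by brute force. The sketch below (our own code) follows the convention of the proof of Theorem 11, in which the instance of the $[x,y]$-line starting at $(i,j)$ is $(i+ky, j+kx)$ for $k = 0,\ldots,N-1$, tries every $\varphi(i,j) = ai+bj \bmod N$, and confirms that the sequences (0,1,0,0) and (1,0,0,0) satisfy the hypothesis of Theorem 11. It examines only a handful of values of N; the general claim for powers of two is the thesis's.

```python
from itertools import product

# The practical case: [1,0], [0,1], [1,1] and [1,-1]-lines, written as (x, y).
LINES = [(1, 0), (0, 1), (1, 1), (1, -1)]


def line_instance(i, j, x, y, n):
    """Instance of the [x,y]-line starting at (i, j), using the convention of
    the proof of Theorem 11: (i + k*y, j + k*x) for k = 0, ..., n-1."""
    return [(i + k * y, j + k * x) for k in range(n)]


def linear_scheme_valid(a, b, n, lines):
    """Is phi(i, j) = (a*i + b*j) mod n conflict-free for every line?

    phi is linear, so shifting an instance adds a constant to every module
    number; checking the instance through (0, 0) is therefore enough."""
    return all(
        len({(a * u + b * v) % n for u, v in line_instance(0, 0, x, y, n)}) == n
        for x, y in lines
    )


def valid_linear_schemes(n, lines):
    """All (a, b) in Z_n x Z_n that give a valid linear skewing scheme."""
    return [(a, b) for a, b in product(range(n), repeat=2)
            if linear_scheme_valid(a, b, n, lines)]


def combination(coeffs, lines):
    """sum_i coeffs[i] * (x_i, y_i), the quantity constrained by Theorem 11."""
    return (sum(c * x for c, (x, _) in zip(coeffs, lines)),
            sum(c * y for c, (_, y) in zip(coeffs, lines)))


print(combination((0, 1, 0, 0), LINES), combination((1, 0, 0, 0), LINES))
for n in (2, 4, 8, 16, 5, 7):
    print(n, len(valid_linear_schemes(n, LINES)))
```

Because a linear scheme shifts every module number by the same constant when an instance is translated, the single-instance check is exact rather than a heuristic.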
 
It is easy to allow oneself to be misled by the conclusion of Theorem 10. When given a collection of [x,y]-lines, and deciding on a skewing scheme using N memory modules, there may be advantages to choosing a non-linear, but still periodic, skewing scheme. Some periodic skewing schemes can be so simple that they take very little hardware to perform address computation and to align the data with the correct processor, even less hardware than required by linear skewing schemes. One such periodic skewing scheme has been used in the construction of an actual machine, the STARAN [1]. Abstracting from the exact details of the STARAN design, the programmer views memory as consisting of a $2^n \times 2^n$ array. In the language of the designers of the STARAN, the programmer views the memory as having $2^n$ words of $2^n$ bits, and the programmer can indicate he wants to fetch all the bits of one word, or a bit-slice, the $j$th bit of each word (Figure 22). In the terminology used here, this is equivalent to fetching arbitrary instances of the [1,0]-line and the [0,1]-line from an array of data elements, using $2^n$ memory modules to store the data, and using a periodic skewing scheme.
 
The skewing scheme employed is $\varphi(i,j) = (i \bmod 2^n) \oplus (j \bmod 2^n)$, where $i \bmod 2^n$ and $j \bmod 2^n$ are expressed in binary notation and $\oplus$ is exclusive-or. This periodic skewing scheme is explicitly calculated for an 8×8 array in Figure 23. Unlike the other machine designs discussed earlier, the responsibility of deciding on a skewing scheme does not rest with the programmer or compiler, but is built directly into the hardware. Indeed the user of the STARAN need not even know that a skewing scheme is employed; by appropriately setting the global address register, G, and the access mode register, M, either the correct word or bit-slice will be made to appear at the processing elements. Additional ways of setting M allow some, but not all, instances of other generalized lines (but not [x,y]-lines) to be fetched without memory conflict.
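The STARAN skewing scheme and the table of Figure 23 can be reproduced in a few lines. The sketch below (our own code, with $n = 3$ so the array is 8×8, and treating row $i$ as word $i$ and column $j$ as bit-slice $j$, which is an assumption about orientation) evaluates $\varphi(i,j) = (i \bmod 2^n) \oplus (j \bmod 2^n)$, checks that every word and every bit-slice lands in $2^n$ distinct modules, and, as a contrast, that a main diagonal does not. It models only the skewing scheme, not the STARAN's address or alignment hardware.

```python
def staran_module(i, j, n):
    """Module number (i mod 2**n) XOR (j mod 2**n)."""
    size = 1 << n
    return (i % size) ^ (j % size)


def conflict_free(cells, n):
    """True if the 2**n given cells are stored in 2**n distinct modules."""
    return (len(cells) == 1 << n
            and len({staran_module(i, j, n) for i, j in cells}) == len(cells))


n = 3
size = 1 << n
table = [[staran_module(i, j, n) for j in range(size)] for i in range(size)]
words_ok = all(conflict_free([(i, j) for j in range(size)], n) for i in range(size))
slices_ok = all(conflict_free([(i, j) for i in range(size)], n) for j in range(size))
diagonal_ok = conflict_free([(k, k) for k in range(size)], n)   # a [1,1]-line instance

for row in table:
    print(" ".join(map(str, row)))
print(words_ok, slices_ok, diagonal_ok)
```

The diagonal fails because $k \oplus k = 0$ for every $k$, which is consistent with the remark that this scheme does not serve [x,y]-lines beyond words and bit-slices.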
 
The reason that this skewing scheme is of practical importance is that the address computations needed to fetch instances of the [1,0]-line and the [0,1]-line can be done using only n exclusive-or gates. This compares to 2n adders that would be needed if a linear skewing scheme were employed. Additionally, an exclusive-or can be performed in less time than an addition, since there is no carry propagation. The memory-processor connection network required in the STARAN is of the same order of complexity as Lawrie's Ω-network. The reader interested in the details of address computation, a proof of the validity of the skewing scheme, and the details of the memory-processor connection network should consult Batcher [1].

What has been called memory here is actually the memory of a single array module in the STARAN. Each array module has a memory consisting of a 256 × 256 array of bits. This memory physically consists of 256 independent memory modules, each with 256 one-bit words. By the skewing scheme described in the body of the text, all the bits of any word and all the bits of any bit-slice lie in different memory modules, and can be fetched in one memory cycle. Readers interested in exact implementation details and the terminology used by the STARAN designers should consult [1].

[Figure: the memory array, with "words" and "bit-slices" labeled and word i and bit-slice j marked.]
Figure 22: Programmer's View of STARAN Memory.

0 1 2 3 4 5 6 7
1 0 3 2 5 4 7 6
2 3 0 1 6 7 4 5
3 2 1 0 7 6 5 4
4 5 6 7 0 1 2 3
5 4 7 6 1 0 3 2
6 7 4 5 2 3 0 1
7 6 5 4 3 2 1 0
Figure 23: The Periodic Skewing Scheme Used in the STARAN Computer.
 
This example was presented to show that non-linear periodic skewing schemes can be important in actual practice, even when there are valid linear skewing schemes. In comparing STARAN to the more general machine modeled in Figure 3, it should be noted that in STARAN the generalized lines that the programmer can access conveniently were fixed at the time of design. For these generalized lines the programmer need not concern himself with skewing schemes, as the hardware handles the data storage and unscrambling automatically. In the more general computer, modeled in Figure 3, the determination of an appropriate skewing scheme is left to the programmer, who may be restricted in his choices by the nature of the memory-processor connection network. While this may be more work for the user, it allows greater flexibility than is available in the STARAN. Some authors [11] have discussed leaving the choice of skewing scheme and/or the address computation to the compiler, thus freeing the programmer from this bookkeeping.
 
4. UNRESOLVED PROBLEMS AND DIRECTIONS OF FURTHER RESEARCH

4.1 The Effectiveness of Linear and Periodic Skewing Schemes
 
One question that has occurred throughout this thesis is: Given a collection of generalized lines, when can the search for a valid skewing scheme for this collection safely be restricted to certain subclasses of skewing schemes, that is, when does a valid skewing scheme from a class of skewing schemes imply a valid skewing scheme from a subclass of this class of skewing schemes? In Chapter 1, it was shown that for any collection of generalized lines, attention could safely be restricted from skewing schemes valid on the quarter plane to skewing schemes valid on the entire plane. Similarly, Theorem 10 shows that for collections of [x,y]-lines, attention can safely be restricted from periodic skewing schemes, using N memory modules, to linear skewing schemes, using N memory modules. In general, for an arbitrary collection of generalized lines, the question of when attention can be restricted from arbitrary skewing schemes defined on the plane to periodic skewing schemes, and from periodic skewing schemes to linear skewing schemes, is unresolved. In this section some conjectures, partial results, and interesting examples are presented for a class of generalized lines called polyominoes [5,6].
 
Definition 11: A polyomino is a generalized line in which, given any two components, $(x_1,y_1)$ and $(x_2,y_2)$, there exists a path $(x_1,y_1) = (x'_1,y'_1), (x'_2,y'_2), \ldots, (x'_r,y'_r) = (x_2,y_2)$ such that $(x'_j,y'_j)$ is a component of the generalized line, for $j = 1,2,\ldots,r$, and either $x'_{j+1} = x'_j \pm 1$ and $y'_{j+1} = y'_j$, or $x'_{j+1} = x'_j$ and $y'_{j+1} = y'_j \pm 1$, for $j = 1,2,\ldots,r-1$.
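Definition 11 amounts to rook-wise connectivity of the set of components, which a breadth-first search decides. A minimal sketch (the name `is_polyomino` is ours):

```python
from collections import deque


def is_polyomino(components):
    """Rook-wise connectivity test of Definition 11: consecutive cells on a
    path differ by 1 in exactly one coordinate (a shared corner is not enough)."""
    cells = set(components)
    if not cells:
        return False
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for step in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if step in cells and step not in seen:
                seen.add(step)
                queue.append(step)
    return len(seen) == len(cells)


print(is_polyomino([(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]))   # connected
print(is_polyomino([(0, 0), (1, 1)]))                           # corner contact only
```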
 
In the geometric realization of a generalized line by unit squares, a polyomino is a generalized line that is connected. Except for the disconnected shape of Figure 20, all the generalized lines used as examples in Chapter 2 are polyominoes. There is an impressive amount of literature, and there are many unsolved problems, concerning polyominoes. A general source is [6].
 
Conjecture: Given N memory modules, and a polyomino of length N, then if there is a valid skewing scheme for the polyomino, there is also a valid periodic skewing scheme for the polyomino.
 
This conjecture is supported by consideration of a construction, illustrated initially by example. Consider the generalized line ((0,0), (1,0), (1,1), (2,0), (2,1)), whose geometric realization appears in Figure 24. Theorem 5 proves that the problem of finding valid skewing schemes is equivalent to determining tesselations of the plane. With the objective of analyzing possible tesselations, lay down an instance of this generalized line (see Figure 24). Without loss of generality the designated element can be assumed to be at (0,0). Now, there is only one way a second instance can be positioned so that the square labeled A is covered without overlapping of the instances (Figure 24(b)). Again there is only one way to position a third instance so the square labeled B is covered without any overlapping of the instances (Figure 24(c)). Finally, there is only one way for a fourth instance to be positioned so that the square labeled C is covered and, again, there is no overlapping of the instances (Figure 24(d)). Notice that the designated elements form the vertices of a parallelogram which contains "no holes," and has area N, the length of the polyomino. Now, by replication of the parallelogram, it is clear that a tesselation of the plane results. The tesselation that results is very orderly. Sometimes, particularly when the polyomino has a high degree of symmetry, the construction, informally presented above, can yield more than one parallelogram. This is illustrated, in Figure 25, by the generalized line ((0,0), (1,0), (1,1), (1,2), (2,2)). The generalized line whose geometric realization is [inline figure] exhibits the same phenomenon.

In the literature, polyominoes are usually defined to be the geometric realizations of the class of generalized lines described by Definition 11. Additionally, unlike here, in most problems concerning polyominoes, rotations and reflections of a polyomino are permitted and are not regarded as generating different polyominoes. Finally, a comment should be made about connectedness. Here, connected means connected by more than a corner. This kind of connectedness has been called rook-wise connected, because of the permissible motions of the chess piece by the same name.

[Figure: panels (a) through (d) showing successive instances covering the squares labeled A, B, and C; in panel (d) the designated elements lie at (0,0), (p,q), (r,s), and (p+r,q+s).]
Figure 24: Positioning Four Instances of the Generalized Line ((0,0), (1,0), (1,1), (2,0), (2,1)), so their Designated Elements Form a Parallelogram.
 
In the examples of Figures 24 and 25, N = 5.

[Figure: alternate positionings of four instances, panel (a).]
Figure 25: Alternate Positionings of Instances for the Generalized Line ((0,0), (1,0), (1,1), (1,2), (2,2)).

When four instances of a polyomino of length N can be laid down so their designated elements form the four vertices of a parallelogram of area N which contains no holes, then the tesselation of the plane produced by replicating the parallelogram induces a periodic skewing scheme. The proof of this statement is reminiscent of the proof of Theorem 11. If the vertices of the parallelogram are labeled as in Figure 24(d), then letting $\varphi$ be the skewing scheme induced by the tesselation, it is clear that
$$\varphi(c,d) = \varphi(c+p,d+q) = \varphi(c+r,d+s), \qquad (14)$$
for any $(c,d)$. Now $|ps-qr| = N$ since $|ps-qr|$ is the area of the parallelogram, and thus $(c,d)+(-rp,-rq)+(pr,ps) = (c,d+N)$ (or $(c,d-N)$, depending on the sign of $ps-qr$). Combining this with (14) implies $\varphi(c,d) = \varphi(c,d+N)$. Similarly, $(c,d)+(sp,sq)+(-qr,-qs) = (c+N,d)$ (or $(c-N,d)$), and, thus, $\varphi(c,d) = \varphi(c+N,d)$. Since $(c,d)$ was arbitrary, $\varphi$ is periodic.
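The construction above can be mechanized. Rather than laying down instances by hand, the sketch below (our own approach, not the thesis's) enumerates every sublattice of $\mathbb{Z}^2$ of index N through its Hermite-normal-form basis $((a,b),(0,c))$ with $ac = N$ and $0 \le b < c$, and accepts a lattice when the N components of the polyomino fall into N distinct cosets, so that the lattice translates of the polyomino tile the plane. The two basis vectors play the role of $(p,q)$ and $(r,s)$. Numbering the cosets gives an induced skewing scheme, which the code checks is periodic with period N in both directions and valid for every instance.

```python
from itertools import product


def index_n_lattices(n):
    """Hermite-normal-form bases ((a, b), (0, c)) of all index-n sublattices
    of Z^2: a * c == n and 0 <= b < c.  Each sublattice appears exactly once."""
    for a in range(1, n + 1):
        if n % a == 0:
            c = n // a
            for b in range(c):
                yield (a, b), (0, c)


def coset(point, basis):
    """Canonical representative of point modulo the lattice spanned by basis."""
    (a, b), (_, c) = basis
    x, y = point
    k = x // a                      # subtract k*(a, b) so that 0 <= x - k*a < a
    return x - k * a, (y - k * b) % c


def tiling_lattice(components):
    """First lattice whose cosets are each hit exactly once by the components,
    i.e. whose lattice translates of the polyomino tile the plane; else None."""
    n = len(components)
    for basis in index_n_lattices(n):
        if len({coset(pt, basis) for pt in components}) == n:
            return basis
    return None


def induced_scheme(basis):
    """Skewing scheme that numbers the N cosets 0, ..., N-1."""
    c = basis[1][1]

    def phi(i, j):
        x, y = coset((i, j), basis)
        return x * c + y

    return phi


P = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]    # the generalized line of Figure 24
N = len(P)
basis = tiling_lattice(P)
phi = induced_scheme(basis)
periodic = all(phi(i, j) == phi(i + N, j) == phi(i, j + N)
               for i, j in product(range(-N, N), repeat=2))
valid = all(len({phi(i + x, j + y) for x, y in P}) == N
            for i, j in product(range(N), repeat=2))
print(basis, periodic, valid)
```

A lattice found this way need not coincide with the parallelogram drawn in Figure 24; any basis of the same lattice gives the same tesselation.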
 
The obstacle standing in the way of a proof of the conjecture is the inability to prove that if a polyomino tesselates the plane, which implies the existence of a valid skewing scheme, then four instances of the polyomino can be positioned so that the designated elements form the vertices of a parallelogram of area N. This has been checked by hand for all polyominoes through N = 7 and no exceptions have been found. It is reasonable to believe that this is in fact so, since, for most polyominoes, there is usually only one way a second instance can be positioned so that some carefully chosen square is covered, and at the same time the instances do not overlap. This style of argument was used to show that the [inline figure] cannot tesselate the plane (Figure 12), as well as to construct the unique tesselation of the plane (except for rigid shifting) by the [inline figure] (Figures 14 and 24). Polyominoes with some symmetry, however, frequently produce several distinct tesselations.
 
Even though it seems reasonable to believe that if a polyomino tesselates the plane, the construction of a parallelogram of area N will always be possible, there is some evidence to the contrary. Some authors have considered tesselating the plane, while allowing simultaneous use of several different polyominoes. Examples have been reported of collections of polyominoes from which a tesselation of the plane can be constructed, but from which no periodic tesselation (with any period whatsoever) can be constructed [5].

These tesselations should be carefully distinguished from those used in Theorem 6. There, several tesselations were constructed using different generalized lines, but each tesselation used instances of only one type. In the case mentioned here, several different polyominoes were used to construct a single tesselation.

Because the example presented below indicates that there may be some unrecognized subtleties, to close the discussion of this conjecture, an alternate approach to its proof, known to be inadequate, will be discussed. Suppose a polyomino of length N is given, and $\psi$ is a valid skewing scheme for this polyomino. Then, define $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$. Clearly $\varphi$ is periodic. The objective is to show that $\varphi$ is valid. This approach is motivated by consideration of the [1,0]-line. Clearly, any valid skewing scheme, $\psi$, has the property $\psi(c,d) = \psi(c,d+N)$ for any $(c,d)$. However, it is easy to construct $\psi$ so that $\psi(c,d) \ne \psi(c+N,d)$, i.e., $\psi$ is non-periodic in the vertical direction. Note, however, that $\varphi$, defined by $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$, is both periodic and valid for the [1,0]-line, provided $\psi$ is valid. This technique also works on generalized lines whose geometric realization is an $N_1 \times N_2$ rectangle, where $N_1N_2 = N$. However, consideration of the generalized line L = ((0,0), (0,1), (1,1), (1,2)) shows that such an approach to the proof of the conjecture is inadequate. Figure 26 indicates a tesselation which induces a valid skewing scheme, $\psi$, for which $\varphi$, defined by $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$, is not a valid skewing scheme. There are, however, valid periodic skewing schemes for L.
 
If this conjecture is true, then given N memory modules and a polyomino of length N, when looking for a valid skewing scheme attention can safely be restricted to periodic skewing schemes. A good question to ask is: Can attention safely be restricted to linear skewing schemes? While the answer to this question is "no," in general, if N is prime and if the existence of a tesselation by a polyomino implies that the construction described earlier results in a parallelogram of area N, then the answer is "yes." Using the notation of Figure 24(d), for $\varphi(i,j) = ai+bj \bmod N$, define $a$ and $b$ by:

    if $(p,q) \not\equiv_N (0,0)$ then $a = -q$, $b = p$
    else $a = 1$, $b = 1$.

To see that $\varphi$ is valid, observe first that in the exceptional case, $(p,q) \equiv_N (0,0)$, either $(p,q) = (\pm N,0)$ or $(p,q) = (0,\pm N)$. This is true because the polyomino is connected and has only N components. Now $(p,q) = (\pm N,0)$ or $(0,\pm N)$ implies that the polyomino is a [0,1]-line or a [1,0]-line, and then $\varphi(i,j) = i+j \bmod N$ is indeed valid. When $a = -q$ and $b = p$, that $\varphi$ is valid can be seen as follows. Note that $\varphi(p,q) = -qp+pq \bmod N = 0$, and $\varphi(r,s) = -qr+ps \bmod N = 0$, since $ps-qr = \pm N$. Thus adding multiples of $(p,q)$ and/or $(r,s)$ to a point does not affect the value of $\varphi$.
Because of this, and the orderly way in which replications of the parallelogram tesselate the plane, if $\varphi$ maps all the components of one instance to distinct memory modules, then $\varphi$ is valid. However, by adding $(p,q)$ and/or $(r,s)$ to some components of the $L(0,0)$ instance, where L is the polyomino, the condition on $\varphi$ can be changed to: $\varphi$ maps into distinct memory modules all ordered pairs (of integers) interior to the parallelogram or lying on the edges whose endpoints are (0,0) and (p,q), and (0,0) and (r,s), but not including (p,q) and (r,s). Figure 27 illustrates the situation. Now, $\varphi$ will be the same for two ordered pairs of integers $(i_1,j_1)$ and $(i_2,j_2)$ interior to or lying on the edges of the parallelogram if and only if the line segment connecting them is parallel to the line segment from (0,0) to (p,q), since only then will $|-qi_1+pj_1|$, the area of parallelogram I in Figure 28, equal $|-qi_2+pj_2|$, the area of parallelogram II in Figure 28. The existence of two ordered pairs of integers interior to or lying on the edges of the parallelogram, for which the line segment connecting them is parallel to the line segment connecting (0,0) to (p,q), implies the existence of an ordered pair of integers lying on the line segment connecting (0,0) to (p,q), strictly between its endpoints. But this can occur only if p and q have a common divisor other than one. Then this common divisor, which is less than N, divides $|ps-qr| = N$, which is impossible since N is prime. This contradiction implies $\varphi$ is a valid linear skewing scheme.

[Figure: a tesselation of the plane by instances of L, with an enlargement of a 4×4 square showing the skewing scheme induced by the tesselation.]
Figure 26: A Non-periodic Skewing Scheme, $\psi$, for which $\varphi(i,j) = \psi(i \bmod N, j \bmod N)$ is not Valid.
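For prime N, the linear scheme $a = -q$, $b = p$ can be checked mechanically. The sketch below finds a tiling lattice with a Hermite-normal-form search (an assumption of this illustration rather than the thesis's hand construction; any basis of the tiling lattice serves, because $\varphi$ then vanishes on both basis vectors), applies the rule above to a basis vector that is not $\equiv_N (0,0)$, and tests the resulting $\varphi(i,j) = ai+bj \bmod N$ on one instance, which suffices because $\varphi$ is linear. The test shapes are the generalized lines of Figures 24 and 25, both with N = 5.

```python
def index_n_lattices(n):
    """Hermite-normal-form bases ((a, b), (0, c)) of all index-n sublattices of Z^2."""
    for a in range(1, n + 1):
        if n % a == 0:
            c = n // a
            for b in range(c):
                yield (a, b), (0, c)


def coset(point, basis):
    """Canonical representative of point modulo the lattice spanned by basis."""
    (a, b), (_, c) = basis
    x, y = point
    k = x // a
    return x - k * a, (y - k * b) % c


def tiling_lattice(components):
    """First lattice whose cosets are each hit exactly once by the components."""
    n = len(components)
    for basis in index_n_lattices(n):
        if len({coset(pt, basis) for pt in components}) == n:
            return basis
    return None


def linear_from_lattice(basis, n):
    """Rule from the text: for an edge (p, q) not congruent to (0, 0) mod n,
    use a = -q, b = p; otherwise fall back to a = b = 1."""
    for p, q in basis:
        if p % n or q % n:
            return -q % n, p % n
    return 1, 1


def linear_valid(a, b, components, n):
    """phi is linear, so one instance (at the origin) decides validity."""
    return len({(a * x + b * y) % n for x, y in components}) == n


P5 = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]   # Figure 24
S5 = [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]   # Figure 25
for shape in (P5, S5):
    lattice = tiling_lattice(shape)
    a, b = linear_from_lattice(lattice, len(shape))
    print(lattice, (a, b), linear_valid(a, b, shape, len(shape)))
```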
 
When N is not a prime number, it is not always possible to
restrict attention to linear skewing schemes when determining whether valid
skewing schemes exist for a polyomino. Figure 29 illustrates that four
instances of the generalized line L = ((0,0), (0,1), (0,2), (0,3), (1,0), (1,1),
(1,2), (1,3), (2,0), (2,1), (3,0), (3,1)) can be positioned so that the
designated elements form a parallelogram of area N (N = 12). Thus there
is a valid periodic skewing scheme for L. There is, however, no valid linear
skewing scheme for L, as an exhaustive check of all the possibilities shows.
Notice that, in Figure 29, both of the line segments connecting (0,0) to
(p, q) and (0,0) to (r, s) pass through ordered pairs of integers other
than their endpoints. For N a composite number, it is frequently the case
that one of these line segments does not pass through any such ordered pair
of integers. In such a case, as in the proof outlined above,
φ(i,j) = ai + bj mod N is a valid linear skewing scheme, where a = -q and
b = p, or a = -s and b = r, depending on which line segment does not pass
through an ordered pair of integers.

Figure 27: Examples of Translating by (p, q) and/or (r, s), so
that all the Components of the Instance of the
Generalized Line Lie Interior to the Parallelogram.

Figure 28: Components of an Instance of a Generalized
Line, after Translation by (p, q) and/or
(r, s), which are Stored in Same Memory
Module. (Labeled regions: parallelogram I, parallelogram II, and the
parallelogram formed from the designated elements.)

Figure 29: An Example of a Polyomino for which There is
a Valid Periodic Skewing Scheme, but no Valid
Linear Skewing Scheme.
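The exhaustive check for the N = 12 example of Figure 29 is small enough to repeat by machine. The Python sketch below is our own illustration; the names are hypothetical. It tries all 144 coefficient pairs (a, b) for φ(i,j) = (a·i + b·j) mod 12. For a linear scheme, translating an instance adds the same constant to every module number, so testing the instance at (0, 0) is enough.

```python
# Sketch: exhaustive search over all linear skewing schemes
# phi(i, j) = (a*i + b*j) mod 12 for the generalized line of Figure 29.

FIG29_LINE = [(0, 0), (0, 1), (0, 2), (0, 3),
              (1, 0), (1, 1), (1, 2), (1, 3),
              (2, 0), (2, 1), (3, 0), (3, 1)]


def linear_valid(a, b, line, n):
    # A linear scheme adds the same constant (mod n) to every component when
    # an instance is translated, so the instance at (0, 0) decides validity.
    return len({(a * i + b * j) % n for (i, j) in line}) == len(line)


N = 12
found = [(a, b) for a in range(N) for b in range(N)
         if linear_valid(a, b, FIG29_LINE, N)]
print(found)  # []
```

The empty list agrees with the text: a valid periodic scheme exists for this line, but no linear one does.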
 
In summary, there is evidence that, in determining the
existence of a valid skewing scheme for a polyomino using the same
number of memory modules as the length of the polyomino, only periodic
skewing schemes need be considered, and that, if N is prime, attention can
be restricted further to linear skewing schemes.
 
4.2 Questions Relating to Memory Utilization
 
Throughout much of this work attention has been focused on the
case where the length of the generalized line(s) and the number of
memory modules are equal. Motivation for this was presented at the
beginning of Chapter 2 by considering computers designed as in
Figure 3. When dealing with [x,y]-lines, setting the length of the
generalized lines equal to the number of memory modules is an honest
reflection of the real situation. In actual computations involving
[x,y]-lines, and in particular the [1,0]-line and the [0,1]-line, what
is often wanted is an entire row or column of a matrix. However, the
number of processors in the actual computer dictates that the row or
column can be processed no faster than N elements at a time, where the
number of processors is taken to be equal to the number of memory modules,
N. Since N elements of the row or column are all that can be fetched and
processed in one memory cycle, [x,y]-lines are clearly the natural
candidates for consideration.
 
On the other hand, when a computation uses the generalized
line L = ((0,0), (0,1), (0,2), (1,1), (2,0), (2,1), (2,2)), whose geometric
realization is H-shaped, and the number of processors and the number of
memories are greater than the length of the generalized line, some
components of the computer may simply have to idle; the algorithm
may not be able to utilize the full potential of the machine. Having
extra hardware available that cannot be utilized by an algorithm may
still result in improved performance, however. To see this, again
consider an example presented in Chapter 2. Figure 12 indicated why the
T-shaped generalized line cannot tesselate the plane, and hence why there
exists no valid skewing scheme, using only five memory modules, for the
generalized line L = ((0,0), (0,1), (0,2), (1,1), (2,1)). However, if the
computer has more than five memory modules, perhaps a valid skewing scheme
can be found which uses six or more memory modules. This motivates the
following definition.
 
Definition 12: A generalized line, L', is called a cover for another
generalized line, L, if every component of L is contained in L'.
 
Notice that if φ is valid for a cover, L', of the generalized line L,
then φ is also valid for L, since each instance of L is contained in the
corresponding instance of L'. The length of the generalized line used
as a cover may conveniently be chosen to be the number of memory modules
of the computer, and then Theorem 5 can be applied to determine valid
skewing schemes for the cover.
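Definition 12 and the remark after it can be checked directly. The Python sketch below is illustrative only. It uses a hypothetical six-element polyomino cover of the five-element line ((0,0), (0,1), (0,2), (1,1), (2,1)) together with the linear scheme φ(i,j) = (4i + j) mod 6. We chose both; neither comes from the thesis, and the cover is not necessarily one of those shown in Figure 30. The sketch confirms that the scheme is valid for the cover and therefore also for the covered line.

```python
# Sketch: Definition 12 (covers) and the observation that a scheme valid
# for a cover is valid for the covered line.

def is_cover(cover, line):
    """True if every component of `line` is also a component of `cover`."""
    return set(line) <= set(cover)


def valid_linear(a, b, line, n):
    """Check phi(i, j) = (a*i + b*j) mod n on every instance of `line`."""
    for x in range(n):
        for y in range(n):
            modules = {(a * (x + i) + b * (y + j)) % n for (i, j) in line}
            if len(modules) != len(line):
                return False
    return True


T_LINE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)]
# Hypothetical six-element polyomino cover of T_LINE, chosen for
# illustration; it is not necessarily one of the covers in Figure 30.
COVER = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 1)]

print(is_cover(COVER, T_LINE))        # True
print(valid_linear(4, 1, COVER, 6))   # True (instance at (0, 0): modules 0, 1, 2, 4, 5, 3)
print(valid_linear(4, 1, T_LINE, 6))  # True, inherited from the cover
```

Here six modules suffice for a line of length five, even though five modules do not.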
 
 
In general, given a collection of generalized lines, {L_1, L_2, ..., L_p},
not necessarily all of the same length, determining the minimum number of
memory modules, N, for which a valid skewing scheme using N memory modules
exists for the collection is quite difficult. A simple heuristic is to find
a collection of generalized lines, {L_1', L_2', ..., L_p'}, such that L_i' is
a cover of L_i, the L_i' are all of the same length, and they satisfy the
conditions of Theorem 6. The resulting skewing scheme will also be valid for
{L_1, L_2, ..., L_p}. There is often great latitude in choosing the covers,
since given a generalized line, L, there may be many choices for L' such
that L' tesselates the plane. For the generalized line L = ((0,0), (0,1),
(0,2), (1,1), (2,1)), even if the search for covers that tesselate the
plane is arbitrarily restricted to polyominoes, and covers symmetric
to other covers are ignored, there are still three generalized lines
that tesselate the plane and are covers for L (see Figure 30). The
rapid growth in the number of covers that are able to tesselate the
plane indicates that a formal evaluation of this heuristic may be very
difficult.
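For small collections, a cruder alternative to the cover heuristic is a brute-force search restricted to linear skewing schemes. This is not the method of the text; it is a substitute technique, named plainly. It yields only an upper bound on the minimum number of modules, since non-linear periodic schemes are never tried. The lower bound is always the length of the longest line. The Python sketch below uses hypothetical names and an arbitrary search limit `max_n`. It applies the search to the T-shaped and H-shaped lines of this section, where the two bounds meet at 7.

```python
# Sketch: smallest n for which ONE linear scheme phi(i, j) = (a*i + b*j) mod n
# is valid for every line in a collection (an upper bound on the true minimum).

H_LINE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
T_LINE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)]


def linear_valid(a, b, line, n):
    # Translating an instance adds a constant mod n, so the (0, 0)
    # instance decides validity for a linear scheme.
    return len({(a * i + b * j) % n for (i, j) in line}) == len(line)


def smallest_common_linear(lines, max_n=64):
    """Return (n, a, b) for the first common linear scheme found, or None."""
    for n in range(max(len(line) for line in lines), max_n + 1):
        for a in range(n):
            for b in range(n):
                if all(linear_valid(a, b, line, n) for line in lines):
                    return n, a, b
    return None


print(smallest_common_linear([T_LINE, H_LINE]))  # (7, 1, 3)
```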
 
Conjecture: Given a polyomino, L, of length N,
for which there is no valid skewing scheme using
N memory modules, there is a valid skewing
scheme for L using N+1 memory modules if and only
if there is a cover, L', for L, of length N+1,
with L' a polyomino that tesselates the plane.
 
Figure 30: Covers for the Generalized Line
((0,0), (0,1), (0,2), (1,1), (2,1)) which
Tesselate the Plane.
 
 4.3 Comments on Broader Problems 
 
In closing, it is useful to relate this research to the
construction of real machines and to the work of others. Computers
similar to the one modeled in Figure 3 have been built or proposed.
Many researchers have realized that a very important problem is the
construction of memory-processor connection networks [10, 14]. In actual
computations it is necessary to align the data so that the first
component of an instance always appears at processor i_1, the second
component of the instance always appears at processor i_2, etc. An
example may be helpful. Consider the generalized line L = ((0,0), (0,1),
(0,2), (1,1), (2,0), (2,1), (2,2)) and the periodic skewing scheme depicted
in Figure 13. If an instance is fetched whose designated element is
stored in memory module zero, then the data is already aligned, i.e., the
element demanded by processor zero is in memory module zero, the element
demanded by processor one is in memory module one, etc. For an instance
whose designated element is stored in memory module one, however, the data
from memory module one needs to be routed to processor zero, the data from
memory module two needs to be routed to processor one, the data in memory
module four needs to be routed to processor two, etc. This situation can
be described as follows: if an instance of L is fetched whose designated
element is stored in memory module one, then to align the data with the
processors, the memory-processor connection network must be able to sort
the permutation (1 2 4 0 5 6 3). The phrase "sort the permutation" is
appropriate since paths must be established from processor zero to memory
module one, from processor one to memory module two, etc., and this can
be thought of as placing (1 2 4 0 5 6 3) as input at the processor side
of the network and getting as output (0 1 2 3 4 5 6) at the memory side
of the network. Using this terminology, to be able to align any
instance of L using the skewing scheme of Figure 13, the network must
be able to sort (0 1 2 3 4 5 6), (1 2 4 0 5 6 3), (2 4 5 1 6 3 0),
(3 0 1 6 2 4 5), (4 5 6 2 3 0 1), (5 6 3 4 0 1 2), and (6 3 0 5 1 2 4).
In general, if a periodic skewing scheme is used, derived from a
tesselation generated by replicating parallelograms constructed as
described in Section 4.1, then, in order to align the data, the
memory-processor connection network must be able to sort N permutations.
Standard fan-in arguments show that this will take O(log N) time.
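The seven permutations listed for Figure 13 can be regenerated mechanically. The Python sketch below rests on an assumption that this section does not state: we infer it from the listed permutations. The assumption is that the tesselation of Figure 13 is generated by the translations in the lattice {(i, j) : 5i + j ≡ 0 (mod 7)}, for example by (1, 2) and (3, -1). Each cell is then given the index of the component of L it corresponds to, so the instance at (0, 0) occupies modules 0 through 6 in order. Under this assumption, the permutation needed to align an instance depends only on the module holding its designated element.

```python
# Sketch under an ASSUMED tesselation for Figure 13 (lattice 5i + j = 0 mod 7).
N = 7
H_LINE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]


def coset(i, j):
    """Class of (i, j) modulo the assumed lattice of tile translations."""
    return (5 * i + j) % N


# Each class gets the index of the component of L that represents it, so the
# instance at (0, 0) is stored in modules 0, 1, ..., 6 in component order.
MODULE_OF_COSET = {coset(i, j): k for k, (i, j) in enumerate(H_LINE)}


def phi(i, j):
    """Periodic (non-linear) skewing scheme derived from the tesselation."""
    return MODULE_OF_COSET[coset(i, j)]


def alignment_permutation(x, y):
    """Modules holding components 0..N-1 of the instance designated at (x, y)."""
    return tuple(phi(x + i, y + j) for (i, j) in H_LINE)


# The permutation depends only on the module of the designated element,
# so scanning one N x N period collects all N of them.
perms = {}
for x in range(N):
    for y in range(N):
        perm = alignment_permutation(x, y)
        perms[perm[0]] = perm

for m in range(N):
    print(perms[m])  # one line per module, matching the list in the text
```

Replacing `phi` with a linear scheme would make every one of these permutations a cyclic shift of a single fixed permutation.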
 
The results of Chapter 3 and Section 4.1 are very
encouraging. It appears that, in the most important practical
cases, valid linear skewing schemes exist if any valid skewing schemes
exist. When linear skewing schemes are employed, the memory-processor
connection network is simpler. If the first component of the generalized
line need not go to processor zero, but it is sufficient that it always
go to the same processor, i_1, and similarly for the other components of
the generalized line, then a network capable of performing arbitrary
cyclic shifts is adequate. That is, the permutations to be sorted are just
(0 1 2 ... N-1), (1 2 3 ... N-1 0), (2 3 4 ... N-1 0 1), ..., (N-1 0 1 ... N-2).
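The claim that a shifting network suffices under a linear scheme can be checked on the H-shaped line. The Python sketch below uses φ(i,j) = (5i + j) mod 7, a valid linear scheme for that line that we chose for illustration. It fixes the processor assignment i_k as the module that holds component k of the instance at (0, 0). It then verifies that every other instance is displaced from this assignment by a single cyclic shift c, so a shift by -c aligns it. The function names are hypothetical.

```python
# Sketch: with a linear scheme, aligning any instance needs only a cyclic shift.
N = 7
H_LINE = [(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
A, B = 5, 1  # phi(i, j) = (5i + j) mod 7, one valid linear scheme (our choice)


def modules_of_instance(x, y):
    """Modules holding components 0..N-1 of the instance designated at (x, y)."""
    return [(A * (x + i) + B * (y + j)) % N for (i, j) in H_LINE]


# Processor assignment i_k: where component k lands for the instance at (0, 0).
target = modules_of_instance(0, 0)


def shift_to_target(x, y):
    """Return c such that a cyclic shift by -c aligns the instance at (x, y)."""
    mods = modules_of_instance(x, y)
    c = (mods[0] - target[0]) % N
    # Every component is displaced by the same amount c (mod N).
    assert all((m - t) % N == c for m, t in zip(mods, target))
    return c


shifts = {shift_to_target(x, y) for x in range(N) for y in range(N)}
print(target)          # [0, 1, 2, 6, 3, 4, 5]
print(sorted(shifts))  # [0, 1, 2, 3, 4, 5, 6]
```

All N shifts occur, which matches the N cyclic permutations listed in the text.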
 
If only one generalized line is required for a computation, then
no difficulties are created by always sending the first component to
processor i_1, the second component to processor i_2, etc. If several
generalized lines are used by an algorithm, however, problems may develop.
It is often the case, as in matrix multiplication, that the algorithm will
require that the jth component of all the generalized lines used be sent
to the same processor, i_j. A simple shifting network may no longer be
adequate, since the jth component of one generalized line may not be
sent to the same processor as the jth component of some other generalized
line. It is necessary to apply a corrective transformation after (or
before) the shifting is performed, and a different transformation may be
required for each generalized line. For N a power of two, Lawrie's
Ω-network [10] performs the shifting and the additional transformations
needed simultaneously, without additional time delay or extra hardware.
Unfortunately, as the results presented here indicate, when only the
problem of finding valid skewing schemes is considered, a power of two
is not the best choice for N; taking N to be a prime number gives
consistently better results. Taking N to be prime has two major
disadvantages, however: the modular arithmetic is much slower than for a
power of two, and no networks that perform as well as the Ω-network are
known.
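A small experiment illustrates why a prime N helps. The Python sketch below is our own construction and is restricted to linear schemes. It asks whether a single linear scheme φ(i,j) = (a·i + b·j) mod N can serve rows, columns, diagonals and anti-diagonals of length N at the same time. For a linear scheme the instance at the origin decides validity. With N = 8, such a scheme would need a, b and a + b all odd, which is impossible. With N = 7, one is found at once.

```python
# Sketch: one linear scheme for rows, columns and both diagonals of length n.


def lines(n):
    """Offsets of the four [x,y]-lines considered here, each of length n."""
    return {
        "row": [(0, k) for k in range(n)],
        "column": [(k, 0) for k in range(n)],
        "diagonal": [(k, k) for k in range(n)],
        "antidiagonal": [(k, -k) for k in range(n)],
    }


def linear_ok(a, b, line, n):
    # For a linear scheme the instance at the origin decides validity.
    return len({(a * i + b * j) % n for (i, j) in line}) == n


def common_linear_scheme(n):
    """First (a, b) valid for all four lines, or None if there is none."""
    for a in range(n):
        for b in range(n):
            if all(linear_ok(a, b, ln, n) for ln in lines(n).values()):
                return (a, b)
    return None


for n in (7, 8):
    print(n, common_linear_scheme(n))  # 7 (1, 2), then 8 None
```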
 
One possible way of constructing such a memory-processor
connection network is to follow the shifting network by a Beneš network [2],
which can sort any permutation. The best algorithm for setting up
the Beneš network is due to Opferman and Tsao-Wu [12]. This algorithm,
which takes O(N log N) time, is too slow for use on each memory cycle.
However, since the generalized lines used by a program are fixed at
compile time, the way in which the network needs to be set up for each
generalized line used by the program can be calculated once, before the
program is run, stored in a memory, and then read out to set the network
up rapidly when needed. Adding a Beneš network to the memory-processor
connection network, as just described, will create additional cost and
will slow down the machine somewhat, since the data will have to pass
through more gates. This problem, and particularly incorporating the Beneš
network into the shifting network, is a good candidate for further research.
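The compile-time scheme can be sketched as follows. The Python code below is ours. It is a looping-style routing for a Beneš network in the spirit of [12], not a transcription of that algorithm. It assumes N is a power of two, so the N = 7 example is padded to 8 lines, with the unused line routed straight through; this padding is an assumption. Switch settings for the seven Figure 13 alignments are computed once into a table, which plays the role of the setup memory. They are then read out and simulated. All names (`route`, `apply_network`, `settings_for`, `TABLE`) are hypothetical.

```python
from itertools import permutations


def route(perm):
    """Benes switch settings sending input x to output perm[x] (len a power of 2).

    Returns a bool (crossed?) for one 2x2 switch, otherwise a tuple
    (input_switches, upper_subnetwork, lower_subnetwork, output_switches).
    """
    n = len(perm)
    if n == 2:
        return perm[0] == 1
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    # Looping step: colour 0 = upper subnetwork, 1 = lower. Inputs sharing an
    # input switch, or bound for the same output switch, get different colours.
    color = [None] * n
    for start in range(n):
        x = start
        while color[x] is None:
            color[x] = 0
            y = inv[perm[x] ^ 1]  # shares x's output switch
            color[y] = 1
            x = y ^ 1             # shares y's input switch
    upper, lower = [0] * (n // 2), [0] * (n // 2)
    out_switches = [False] * (n // 2)
    for x in range(n):
        if color[x] == 0:
            upper[x // 2] = perm[x] // 2
            out_switches[perm[x] // 2] = perm[x] % 2 == 1
        else:
            lower[x // 2] = perm[x] // 2
    in_switches = [color[2 * i] == 1 for i in range(n // 2)]
    return (in_switches, route(upper), route(lower), out_switches)


def apply_network(net, data):
    """Push one item per input through the network; return the outputs."""
    if isinstance(net, bool):
        return [data[1], data[0]] if net else list(data)
    in_switches, upper, lower, out_switches = net
    up_in, low_in = [], []
    for i, crossed in enumerate(in_switches):
        top, bottom = data[2 * i], data[2 * i + 1]
        if crossed:
            top, bottom = bottom, top
        up_in.append(top)
        low_in.append(bottom)
    up_out, low_out = apply_network(upper, up_in), apply_network(lower, low_in)
    out = []
    for j, crossed in enumerate(out_switches):
        top, bottom = up_out[j], low_out[j]
        if crossed:
            top, bottom = bottom, top
        out.extend([top, bottom])
    return out


# Entry k of each tuple: memory module holding the component for processor k.
ALIGNMENTS = {
    0: (0, 1, 2, 3, 4, 5, 6), 1: (1, 2, 4, 0, 5, 6, 3),
    2: (2, 4, 5, 1, 6, 3, 0), 3: (3, 0, 1, 6, 2, 4, 5),
    4: (4, 5, 6, 2, 3, 0, 1), 5: (5, 6, 3, 4, 0, 1, 2),
    6: (6, 3, 0, 5, 1, 2, 4),
}


def settings_for(alignment, size=8):
    # Assumed padding: N = 7 embedded in an 8-line network, line 7 unused.
    to_processor = list(range(size))
    for k, module in enumerate(alignment):
        to_processor[module] = k
    return route(to_processor)


TABLE = {m: settings_for(a) for m, a in ALIGNMENTS.items()}  # "compile time"

# Quick self-check of the router on every permutation of four lines.
assert all(apply_network(route(p), list(range(4)))[p[x]] == x
           for p in permutations(range(4)) for x in range(4))

# "Run time": read out stored settings; module numbers in, processor order out.
print(apply_network(TABLE[1], list(range(8))))  # [1, 2, 4, 0, 5, 6, 3, 7]
```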
 
 
LIST OF REFERENCES

[1] Batcher, K. E., "The Multi-dimensional Access Memory in STARAN,"
presented at the 1975 Sagamore Conference on Parallel Processing and
submitted for publication in the IEEE Transactions on Computers
Special Issue on Parallel Processing.

[2] Beneš, V. E., Mathematical Theory of Connecting Networks and Telephone
Traffic, Academic Press, New York; 1965.

[3] Budnik, P. and D. J. Kuck, "The Organization and Use of Parallel
Memories," IEEE Transactions on Computers, Vol. C-20, No. 12,
pp. 1566-1569; December 1971.

[4] Chandra, A. K., "Independent Permutations, as Related to a Problem of
Moser and a Theorem of Polya," Journal of Combinatorial Theory,
Series A, Vol. 16, pp. 111-120; 1974.

[5] Gardner, M., "Mathematical Games; More About Tiling the Plane: The
Possibilities of Polyominoes, Polyiamonds and Polyhexes," Scientific
American, Vol. 233, No. 2, pp. 112-115; August 1975.

[6] Golomb, S. W., Polyominoes, Charles Scribner's Sons, New York; 1965.

[7] Hardy, G. H. and E. M. Wright, An Introduction to the Theory of Numbers,
Oxford University Press, London; 1954.

[8] Hoffman, K. and R. Kunze, Linear Algebra, Prentice-Hall, Inc., Englewood
Cliffs, New Jersey; 1961.

[9] König, D., Theorie der Endlichen und Unendlichen Graphen, Akademische
Verlagsgesellschaft M.B.H., Leipzig; 1936.

[10] Lawrie, D. H., "Memory-Processor Connection Networks," Ph.D. Thesis,
Department of Computer Science, University of Illinois at Urbana-Champaign,
Report No. UIUCDCS-R-73-557; February 1973.

[11] Muraoka, Y., "Storage Allocation Algorithms in the TRANQUIL Compiler,"
M.S. Thesis, Department of Computer Science, University of Illinois at
Urbana-Champaign, Report No. 297; January 1969.

[12] Opferman, D. C. and N. T. Tsao-Wu, "On a Class of Rearrangeable Switching
Networks," Bell System Technical Journal, Vol. 50, pp. 1579-1618;
May-June 1971.

[13] Polya, G., "Über die 'doppelt-periodischen' Lösungen des
n-Damen-Problems," in W. Ahrens, Mathematische Unterhaltungen und
Spiele, Teubner, Leipzig, pp. 364-374; 1918.

[14] Swanson, R. C., "Interconnections for Parallel Memories to Unscramble
p-Ordered Vectors," IEEE Transactions on Computers, Vol. C-23, No. 11,
pp. 1105-1115; November 1974.

[15] Thornton, J. E., Design of a Computer: The Control Data 6600, Scott,
Foresman and Company, Glenview, Illinois; 1970.

[16] Yao, F., Private Communication.
 
 
 VITA 
 
Henry David Shapiro was born in New York, New York on September 17,
1947. He received his bachelor of arts degree in mathematics from the
 Johns Hopkins University, graduating with both honors and departmental 
 honors. At that time, he was also elected to Phi Beta Kappa. In October 
 1969, Mr. Shapiro was awarded a master of science degree in mathematics from 
 Stanford University. Before coming to the University of Illinois, he 
 taught high school mathematics and computer science at West Springfield 
 High School, Fairfax County, Virginia, from September 1969 until June 1972. 
 While working on his doctorate, he held a National Science Foundation Graduate 
 Fellowship and a Research Assistantship in the Department of Computer Science. 
 Mr. Shapiro is a member of Sigma Xi, the Association for Computing Machinery, 
 and the National Council of Teachers of Mathematics. He has had two papers 
 accepted for publication: "Storage Schemes in Parallel Memories" (to appear 
 in the Proceedings of the 1975 Sagamore Computer Conference on Parallel 
 Processing) and "A New Approach to Teaching a First Course in Compiler 
 Construction" with M. D. Mickunas (to appear in the Proceedings of the Sixth 
 Symposium on Computer Science Education). 
 
BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-75-776

Title and Subtitle: THEORETICAL LIMITATIONS ON THE USE OF PARALLEL MEMORIES

Report Date: December, 1975

Author(s): Henry David Shapiro

Performing Organization Name and Address: Department of Computer Science,
University of Illinois at Urbana-Champaign, Urbana, Illinois

Contract/Grant No.: NSF GJ-41538

Type of Report & Period Covered: Ph.D. Thesis

Abstract: Regardless of the underlying machine architecture, independently
addressable memory modules contribute significantly to program speed-ups on
modern computers. Because of memory conflicts which arise while accessing data,
actual program speed-ups are generally less than theoretically possible.
Organizing the data of a computation so as to avoid memory conflicts is
particularly difficult for data which can logically be viewed as
two-dimensional. Several geometric and algebraic conditions are presented which
determine if the data of a computation can be organized to avoid memory
conflicts. It is shown that a prime number of memory modules gives higher
memory utilization and allows the use of simpler storage schemes than a power
of two number of memory modules. The case of greatest practical significance,
references to rows, columns and diagonals of a matrix, is given special
attention. Finally, a brief discussion is presented which relates this research
to that of a companion problem, the construction of memory-processor connection
networks for single-instruction-multiple-data stream machines.

Key Words and Document Analysis. Descriptors: parallel memories, memory
conflicts, skewing schemes, linear skewing, tesselations

Security Class (This Report): UNCLASSIFIED

Security Class (This Page): UNCLASSIFIED