L I E> R.A R.Y OF THE U N IVE.R.SITY OF ILLINOIS 510.84 I«fer no. Z50-£5 WORD DO X ->x . , C; END; GO TO HERE; THERE: END PROCEDURE SEARCH; - 8 - 3.U System K 3.U.1 Assembler During this quarter, the basic ILL I AC IV assembler was transferred from the 709*+ to the B5500 by rewriting the assembler algorithm in Burroughs Extended ALGOL. This assembler has been exhaustively checked out and is available for use in conjunction with the ILLIAC IV simulator for basic checkout of ILLIAC IV code. (in this respect, see Document No. 159) • 3.U.2 Simulator The simulator program has also been successfully transferred from the 7094 to the B5500. All orders of ILLIAC IV which are currently well defined have been implemented in the simulator and checked out. In associa- tion with the basic assembler, this program suite is now available for use by applications personnel. 3.^.3 Debugging System In recognition of the novelty of some aspects of ILLIAC IV program- ming, it was decided to investigate the provision of more sophisticated debug- ging facilities than are generally provided on commercially available machines, These debugging facilities are to be ultimately formalized as a language exten- sion to Tranquil. This work is currently in an early stage. As a tooth-cutting project, an investigation is being made of the feasibility of attaching the debugging language extensions to B5500 ALGOL. A temporary grammer for a metastatement in ALGOL has been developed. Such a statement should allow programmers to write tests of their code into the original code. The result of the test statement should indicate that the code passed the test, or give some diagnostic information. Some thought has been given to implementation by macrogenerating acceptable code into an ALGOL program, or by modifying an ALGOL compiler to accept the metastatements. 3 . h . k Operating System Full-time work on this part of the ILLIAC IV software started during this quarter. An overall approach to the problem was formulated and various design criteria defined. Operating system's personnel participated in - 9 - Work has begun on the mesh generator problem for the two dimensional heat equation for convex polygonal regions, with solution by means of alter- nating direction techniques. An algorithm has been developed and coded in ALGOL for simplified boundary conditions, and debugging on the B5500 has started. 4.1.2 User Support Codes It was found, in attempting to analyze the problem by writing partial differential codes for ILLIAC IV, that a rather general structure might be used on a model for a large number of codes. (it would also serve for problems in areas other than partial differential equations.) This structure has five functionally distinct parts. In simple codes, some of these may be vestigial, but they can be identified. They are: l) Data specification; 2) storage allocation; 3) input/output; h) stencil operations; and 5) mathematical kernels. The data specification includes mesh generation for PDE codes. In PDE problems, this segment may calculate coordinates of grid points, make lists of neighboring points, do interpolation for initial or boundary values, and set indices or flags to mark boundaries. Information supplied to this segment will include geometry data, mesh sizes, location of interfaces, and more. Storage allocation assumes a crucial and unique role for ILLIAC IV. Efficient use of the machine depends directly on proper storage allocation. A number of canned mappings have been found to apply to a widening class of problems, and, as more of these somewhat universal allocation schemes are found, they will be added to the library. For some problems, it may be necessary to do a line by line or block by block allocation. Statements will be provided to do this. I/O, in addition to providing reading and writing of symbolic disk files and PE blocks, will also provide for graphical display of data through the B65OO. In the latter area, it is expected to allow for graphing functions of two variables and simulated three-dimensional plots, with the ability to rotate and cross-section the image. It is also expected to provide for contour mapping and for the display of the map as it would appear on an arbitrary surface--i. e. , a weather map on a hemisphere. It is also hoped - 11 - to have the facility for simulated three-dimensional contour mappings and for rotating and cross-sectioning the image. The graphical display package would also permit vector field graphs on arbitrary surfaces. Stencil operations have been found useful in programming problems for irregular meshes and for expanding sparse matrices. They allow data to be addressed indirectly and they make resulting algorithms more regular and easier to write. On simple parts of the data structure, they may not be necessary. The mathematical kernel is difficult to define exactly. In present thinking, it has come to be associated with a code without I/O which operates on stencils--the part of the code which is involved with setting up and solv- ing the difference equation in a PDE program, inverting a matrix, or perform- ing a sequence of transformation in an eigenvalue problem. Kernels may be written specifically or automatically generated from codes written for size independent problems. Schematically, the following flow diagram pertains to this: 1. Data Specification 2. Storage Allocation 3. I/O h. Stencil Operations 5. Kernel Other Kernels - 12 - In an actual code, these segments interact in several ways. The compiler uses information from some parts in translating others — storage allocation is a declaration. At execution, parameters must be transmitted from one segment to another. Also, in real algorithms, many kernels will be linked together. 4.1.3 Slip Flow and Boundary Layer Studies In any hydrodynamic problem involving the motion of a body in a fluid, the interest is in determining the velocity vector, the pressure, the temperature, etc., at any point in the flow field- The aerodynamic charac- teristics (lift, drag, etc.) of the body can then be obtained. It is well established that the flow field can be divided into two regions: a very thin layer in the neighborhood of the body (boundary layer) where friction (viscous action) is important, and the remaining region out- side this layer where viscous effects can be neglected. The governing equa- tions for both regions—a system of nonlinear partial differential equations- are derived from the same principles which are the conservation of mass, momentum, and energy. The equations for the outer region, however, do not contain terms to account for viscous action and are called Euler's equations. For the boundary layer region, the resulting equations are the Navier-Stokes equations which are more complex. Because of the great mathematical difficulties connected with solving the full Navier-Stokes equations, certain approximations are usually introduced into these equations to simplify them. Such approximations lead to the, so-called, boundary layer equations. The task of solving the boundary layer equations is the main and only theme of the boundary layer theory. It should be mentioned that the form of the boundary layer equations and the difficulty encountered in solving them depends on the geometry of the body and the manner in which it is moving. The classical boundary conditions for the velocity vector W and the temperature are W = 0, T = T respectively, at the surface of the body, where T is the wall temperature of the body. It has been established that w in certain situations, these classical boundary conditions are not appropri- ate (the criterion is the value of a parameter called the rarefaction parameter) - 13 - In such circumstances, which are often encountered in outer space where the density is very low or in hypersonic speeds, the above boundary conditions are replaced by the velocity slip and temperature jump boundary conditions at the wall. The simplest problem in which velocity slip rather than zero velocity should be used as a boundary condition is the slow motion of a flat plate in a rarefied flow field. The governing equations for this problem are: U ^ U +■ ^ U cbc ciy a 2 u 2 v - constant, au + av = ck cty where all quantities are nondimensionalized by normalizing them. This system of equations can be reduced to a single second order partial differential equation by introducing the stream function ¥ and using the von Mises trans- formation. The resulting equation is: where Q = oil ox" : as i Re -v v -- ■ cW (1), U = U(Y,x) cW ST ; Re is the Reynold's number based on the free 7 stream mean free path, The transformed boundary conditions are: U OF, 0) = 1 ^f (°; x) = 1 U («, x) = 1 - Ik - In the present study, equation (l) has already been solved by using a difference scheme suggested very recently by Milton Lees for a simi- lar equation but with different boundary conditions than the above ones. The results of the solution were in good agreement with another solution developed in a laborious way by W. L. Chow who did not transform the system of equations to equation (l) and did not employ the mesh or grid technique to arrive at his numerical solution. The success of the difference scheme used for equation (l) encouraged its application to a more sophisticated boundary layer problem in which the speed does not have to be slow, the density does not have to be low, and the heat transfer effects are not neglected. 1'he governing equations for this suggested problem, in the X - ¥ space are: ox" Re cW t"- 1 U |} dT Re, Pr' o¥ u o¥ ♦ <*-!>■£ u(| where U = U (¥, x) and T = T (¥, x) . An attempt is now being made to solve the above system of equations, by using the velocity slip and temperature jump boundary conditions one time and by using the no slip, no jump boundary conditions the next time. k . 1 . k Matrix Operations An algorithm for multiplying a sparse matrix times a sparse matrix was developed and coded in ILLIAC IV assembly language. To form the product A x B where A and B are sparse, the algorithm simultaneously forms dot pro- ducts of the form a b. 1 < i < 256 where a is a row vector of A, and b. is r 1 — — r ' 1 a column vector of B [or a section of a row (column) vector of length 2^6 when A (B) is m x n, m > 2^6 (n > 256)]. The algorithm consists of two sections. The first section determines which multiplications have to be performed. Since these multiplications, in general, vary from column to column and thus from PE to PE, a queue consisting - 15 - of the nonzero elements of a that are to be used together with a tag denoting which element of b. is to be used is formed in each PE. 1 The second section performs the multiplications using the information and the operands stored in the queue. Thus, a PE only has to be disabled when its total number of multiplications is less than the total number of multiplications of another PE. To illustrate, consider a x B where: ' r B = [a 1( 0, 0, a v a 5 , 0] " b ll b 12 b 22 h 2k V b U3 V V V b 53 b 62 b 6k B is stored as: PE 11 12 U3 2k 31 22 53 %h 51 52 '6k '62 - 16 - The queue looks ]ike: PE k_ 2 «- "h tags Note that the tags do not refer to the row index of the b element but to the position of the element in the packed array. U.1.5 Mathematical Service Routines This quarter has seen the initiation of an effort to develop codes for standard algebraic and trigonometric functions and routines for ILLIAC IV. Several codes are being written, and the analysis for others has been undertaken. A square root code (6U bit) has been completed. The technique used is essentially that of finding a close approximation and then applying Newton's method, a fixed number of times--a standard procedure. A polynomial root finder using the techniques of multisectioning to find simple roots and then applying Bainstow's method for multiple roots is nearing completion. Codes for log, sine, and cosine are under development. U.1.6 Carlson's S Method n The programming of steady-state neutron transport in Tranquil was done during this quarter. The stead-state neutron transport was of the type existing in an homogeneous isotropic medium with no upscattering or independ- ent source. Another characteristic was that it was of one dimensional spher- ical geometry assuming 32 velocity groups, l4 angular directions (Carlson's - IT - discrete S , formulation), and 128 spatial mesh points. At full efficiency, 8 processing elements calculate the inner solution for 32 groups simultaneously. Details are recorded in a separate ILLIAC IV document. k.2 Signal Processing 1+.2.1 Application of ILLIAC IV to Phased Array Radar for Area Defense The use of the ILLIAC IV computer in handling data for large phased array radars for ICBM Area Defense has been investigated. A particular phased array radar system was modeled, assuming a certain number of objects, and pro- grammed for single quadrant operation on ILLIAC IV. This single quadrant will be more than sufficient to handle the real time data processing requirements of radar control, scanning, designation, and tracking of potential targets. The phased array radar model assumed has separate transmitter and receiver arrays. The radar system is capable of scanning 10 resolution o elements/second. These may have as many as 10 apparent targets from which 2 3 3 x 10 to 3 x 10 should be tracked as possible targets for further discrim- ination. The transmitter has four quadrants which can operate, gather, or transmit four separate pulses simultaneously. Each quadrant has 12,5^ radi- ating elements and requires 113 steering commands for each transmission. The receiver consists of clusters of elements where each cluster can operate independently. There are 20 clusters for scanning, 2 for designation, and 5 for tracking and multiple use; each cluster can receive 9 beams simultaneously. The three major functions which require most of the computer time (steering commands, designation, track) have been programmed for ILLIAC IV operations and are presently undergoing simulation debugging for the B5500 computer. This major operation requires approximately 50$> full computer capability of the single quadrant. A detailed report documenting the pro- cedure for handling the problem is presently in the final stages of preparation. I4..3 Matrices )4.3.1 Eigenvalue Problems In the last quarter, emphasis was given to eigenvalue problems of symmetric matrices. During this quarter, the eigenvalue problems for - 18 - non-symmetric matrices were dealt with (QR-algorithm) . It was found, that the QR-algorithm with a single origin shift will speed up the convergence greatly. It was also found that using a double origin shift is more advan- tageous in the case of complex -pair eigenvalues. The full report on Jacobi, Householder, and the QR-algorithm has been completed. The Tranquil language codes for the three methods are included in the report. k.k Linear Programming U.U.I Introduction The study of linear programming has been directed to the writing of a to tally> memory contained LP code in ILL1AC IV assembly language by using the general form revised simplex algorithm and to the investigation of a new algorithm for large scale linear programming problems. h.k.2 LP Code The totally contained LP code is quite suitable for the solution of relatively small scale linear programming problems, such as a size of 600 x 1000 or smaller. The main feature of this case is its simplicity, since no input-output operations are involved. A complete procedure description, including mathematical procedures and computational procedures, will soon appear in two ILLIAC IV Documents. This code will be simulated by the ILL1AC IV simulator using 6U bit precision. No other numerical provision is being included, such as reinversion of basis to restore the precision at some point of iteration. The new algorithm for large scale linear programming problems attempts to make use of the great sparsity of the activity matrix and the formulation of the revised simplex method. This new algorithm is the product form revised simplex method which has been proven to be computationally effective in han- dling larger scale linear programming problems in which the general form revised simplex method is no longer effectively applicable. This new algo- rithm has received thorough investigation. Proper mathematical procedures have been formulated in terms of the product form revised simplex method and made applicable to the ILLIAC IV computer. A brief comparison between the general form revised simplex method and the product form revised simplex - 19 - method was made in order to visualize the advantages of the product form revised simplex method over the general form revised simplex method. h. 5 Computer Graphics 4.5.1 Introduction Over the last quarter, ILLIAC IV s investigation of large scale weather calculations has reached the conclusion that general circulation modeling is a highly parallel process (as are most partial differential equations problems) to which ILLIAC IV can be efficiently applied. As in all applications, the real difficulty in using ILLIAC IV to simulate the large scale motion of the atmosphere lies in storing data across memory in such a way that the PE's can be used efficiently. In general, it seems that most global grid systems likely to be used in meteorology can be broken into large rectangle-like blocks, each of which can be stored in the machine in a straight-forward way. This, of course, does not eliminate the need for clever programming, but is a significant step towards parallelism. 4.5.2 NCAR Model Two heartening results related to the NCAR model have appeared in the last quarter. The first involved a model comparable numerically to that coded and timed for ILLIAC IV, but not core contained. Also, assuming a disk block size of 256 words/PE, the ratio of block transmission time between disk and ILLIAC IV memory to computation time on 1 block of data is approximately 1 to 10. This allows I/O to be effectively masked by computation. The sec- ond involved a very simple calculation related to general circulation mod-ling, The ratio of execution time on NCAR's CDC 6600 (problem coded in FORTRAN IV) to estimated ILLIAC IV execution time (hand timed machine code) was 800 to 1-- a substantial speedup. 4.5.3 Weather Mapping Algorithm A great simplification occurred in the weather mapping algorithm when it was learned that the vectors do not need to be ordered to follow the contour lines. The reason for this is that weather maps are difficult to interpret if they contain too many contour lines. With a smaller number - 20 - of contour lines, a display is fast enough to draw the vectors in a random manner without flicker. It is now apparent that a complete weather map can be generated in about one millisecond witn all tne work done in ILLIAC IV in a highly efficient manner. k . 5 . h Graphical Display System Work this quarter has centered on debugging the graphical display system described in last quarter's QPR, The system is presently operating on an IBM 709^ with output to a Calcomp platter (via tape) and on a CDS l6oU with output direct tc a scope. Parts of the system have run on the B5500; but as that machine does not allow independent subroutines, there is doubt as to whether the present version of the system will be used on it. Present work is centered on detailed debugging and improvements of this system and on the definition and use of surfaces in this system. U.6 Weapons Effects Calculations During this period, the performance of the ILLIAC IV on weapons effects calculations was studied. The specific problems investigated were Eulerian hydrodynamics in two and three dimensions, and the solution of the radiation transport problem by the method of long characteristics. Because of the assumed symmetry in the r direction in 2-D hydro, it is required that the number of cells in the Z direction equal roughly twice the number of cells in the r direction. Since 5 hydro quantities per cell must be stored, the total number of hydro cells must be less than 100,000. Subject to these constraints, the largest mesh that can be core contained with skewed storage is 256 x 125. However, there is a different method of storage allocation that will accommodate the entire 100,000 cells. The single material problem with a closed form equation of state was coded in assembly language and timed. It requires roughly 4000 clocks to advance the hydro quantities one time cycle in 256 cells. With an efficient I/O buffering scheme, it is possible to do very large mesh calculations with no apparent delays for the I/O. No interesting 3-D problem can be core contained. Although the compute time per cell increases by about 30$> over 2-D, the number of cells - 21 - that can be contained in a core load decreases by about a factor of 3- The I/O required per cell in 3-D is roughly double that required by 2-D. Because of these factors, the 3-D problem is I/O bound. If disc transfer rates were roughly doubled, the I/O bound would disappear, The radiation transport problem was coded in Tranquil and then ''hand" compiled. The hand corn pi. Led code (which may be quite different from the code produced by the compiler) requires about 5000 clocks to determine the radiation transport in a single direction and single frequency group across 256 eulerian cells. There appears to be no significant I/O problem. Very interesting problems occur in multimaterial hydro (in tho equation of state routine) and in the table look-up procedure required to do radiation transport. These problems are explained and possible solutions proposed in two ILLIAC IV documents now in preparation. 1+.7 Anthropology Applications 4.7.1 Population Genetic Program At this time, the experimentation with full chromosomal simulation in 'SCATRE' has been completed. This work demonstrated the feasibility of developing such a program for the ILLIAC IV. Although the model itself was shown to be operable, it is also apparent, both from the results of this experiment and from discussion with geneticists, that the scale and detail accuracy of the model would have to be improved for later exacting scientific use. A program is now under development in ALGOL for the B5500 which will test several other techniques for possible inclusion in an ILLIAC IV program. The original model was limited by small population size and small amounts of chromosomal material per individual, so that its development of trait forms was limited to a few small elementary linear, scalar, threshold characters with two levels of integration. The B5500 experiments will include several expansions. Instead of just simple phase shift and substitution mutations (which will also be done by a new algorithm), there will be included algo- rithms to simulate crossovers, loops, and other such higher level randomiz- ing and sorting factors. A seven level non-scalar integrative and combinatorial plan for phenotype determination and several possible mating schemes will be tried along with some techniques of pseudo-simulation - 22 - (recording the state of the stabile parts of the chromosome by a register indicating "changed/same"), thus saving space for more program density and population size. The cede for several ct the above algorithms has aLready been generated, but they have not yet been tested in a combined model. The exact form of the final ILLIAC IV model will depend on the results of these experiments. U.7.2 Other Applications Work on other applications has, to date, been confined to exploratory discussions of possible uses of computers in Anthropology with special respect being paid to the specific capabilities of tne ILLIAC IV. Possible future programs include: Archeology- -the establishment of n-dimensional matrices of time, space, and form related data of pot sherds and other artifacts, and the manipulation of these matrices tc discover patterns of continuity both in strata and in evolutionary progression; Physicax Anthropology- -the application of transformation algorithms tc constellations of points representing fossil material to determine continuities and aberations; Social Anthropology-— the possibility has emerged that a fast efficient computer such as the ILLIAC IV will al-LOW the expansion of some of the traditional theories of sociograms combined with the ideas of the social network theory and certain mathematical tools and concepts from graph theory such as the m-clique into a significant tool for ethnographic evaluation; Theoretical Anthropology—recent inquiries indicate a possible relationship between the ideas of Levi-Straus, the theory of binary cognitive models, and explorations in the character of such binary networks as are found in the retina of the eye; such that computer simulation might be a possible tool of analysis. Further applications are also under consideration. - 23 - REFERENCES [l] Koga, Yoshiaki. Methods for Generating Diagnostic Sequences . (October 17, 1967), ILLIAC IV Document No. 160. c [2] Knowlton, Kenneth C. "A Programmer's Description of L ." Comm. ACM 9, (August, 1966), pp. 616-625. [3] Grothe, D. A Description of the Basic Assembly Language for ILLIAC IV, (October 5, 1967), ILLIAC IV Document No. 159. - 25 - AUG IB