The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN i DEC 16 in L161 — O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/analysisofalgori401chas 'J6r Report No. 401 *f \jt ANALYSIS OF ALGORITHMS FOR FINDING ALL SPANNING TREES OF A GRAPH by Stephen Martin Chase October 19, 1970 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS The Library of the J UN 978 at Urbana-Champaign Report No. 1<-01 ANALYSIS OF ALGORITHMS FOR FINDING ALL SPANNING TREES OF A GRAPH* by Stephen Martin Chase October 19, 1970 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 6l801 This work was supported in part by the following grants: US NSF GJ 217, US NSF GJ 8l2, and project BUILD. The last stage of thesis rewriting was supported by IBM. This work was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, October 1970. ANALYSIS OF ALGORITHMS FOR FINDING ALL SPANNING TREES OF A GRAPH Stephen Martin Chase, Ph.D. Department of Computer Science University of Illinois at Urbana-Champaign, 1970 Relatively little attention has been paid to the problem of measuring the efficiency of graph algorithms. The fact that the amount of work required by most graph algorithms varies greatly and unpredictably with the structure of the graph to which it is applied, makes this problem both practically important and theoretically difficult. Two major goals were set at the outset of this investigation: first, to investigate and develop general approaches and specific techniques for analyzing the efficiency of graph algorithms, and second, to test and illustrate some of these approaches and techniques by using them for the analysis and comparison of specific algorithms. With respect to the first goal, empirical and analytical methods are dis- cussed. The use of empirical methods is greatly facilitated by a Graph Algorithm Software Package, GASP, which is an extension of PL/1 and has sets and graphs as additional data types. With respect to the second goal, the problem of finding all the spanning trees of a graph was chosen. All published algorithms are analyzed and compared. A new algorithm is described, analytically compared to the previous algorithms, and found to be superior. For example, on the complete graph on n nodes, cost (new algorithm) = cost (A) / (/z ), where A is the most efficient previous algorithm. iii ACKNOWLEDGMENT The author wishes to thank Professor Jurg Nievergelt for his extraordinary assistance, advice, and encouragement during the preparation of this thesis . The support of the author's graduate education "by the Department of Computer Science, University of Illinois at Urb ana-Champaign, is gratefully acknowledged. The thesis research was supported by the following grants: US NSF GJ 217, US NSF GJ 812, and project BUILD. The last stage of thesis rewriting was supported by IBM. The author greatly appreciates the typing by Mrs. Joanne Bennett , The efforts of the many people who aided the preparation of this thesis are also appreciated. Finally, the author wishes to dedicate this thesis to his wife, Mary, and parents, Martin and Doris, for their sacrifices which made this effort possible . TABLE OF CONTENTS Page ACKNOWLEDGMENT iii 1. INTRODUCTION. 1.1. Goals of This Investigation 1 1.2. Related Efforts 2 1.3- Notation 2 2. EMPIRICAL METHODS OF MEASURING EFFICIENCY OF COMPUTATION. . k 2.1. Advantages k 2.2. Disadvantages U 2.3. Lessening the Disadvantages by Using Better Measuring Techniques 5 3. ANALYTICAL METHODS OF MEASURING EFFICIENCY OF COMPUTATION . 7 3«1. Advantages 7 3-2. Disadvantages 7 3 -3- Types of Analysis 8 h. THE PROBLEM OF FINDING ALL THE SPANNING TREES IN A GRAPH. . 11 k.l. The Problem: Its Variations and Applications. ... 11 k.2. The Algorithms: Their Common Features 12 5. DESCRIPTION OF THE ALGORITHMS Ik 5'1> Exhaustion lU 5-2. Determinants lU 5'3« Decomposition 15 5.^. Tree Transformations 15 5.5. Hamiltonian Paths 16 5.6. Introduction to Expansion Algorithms 16 5 -7 • Cancellation of Non-Trees 18 5.8. Circuit-Free Expansion 18 5«9« Connected Expansion 20 5.10. Factoring 20 5-11. More Factoring 23 5-12. Pruning 25 5.13. Variations 28 TABLE OF CONTENTS Page 6. ANALYTICAL MEASUREMENTS OF SELECTED ALGORITHMS 29 6.1. A Priori Bounds 29 6.2. Worst Case 30 6.3- Computation Trees 30 6.k. Direct Comparisons 32 6.5. Special Graphs: The Quotient Operator ~$k 6.6. Complete Graphs 37 7. CONCLUSIONS . kl REFERENCES k-2 APPENDICES VITA 1. INTRODUCTION 1.1. Goals of This Investigation Graph theory and Its applications have received much attention over the past two decades. In particular, many algorithms have been proposed for the solution of several graph problems which arise frequently In certain applications. However, relatively little attention has been paid to the problem of measuring the efficiency of these proposed algorithms. The fact that the amount of work required by most graph algorithms varies greatly and unpredictably with the struc- ture of the graph to which it is applied, makes this problem both practically im- portant and theoretically difficult. Two major goals were set at the outset of this investigation: first, to investigate and develop general approaches and specific techniques for analyzing the efficiency of graph algorithms, and second, to test and illustrate some of these approaches and techniques by using them for the analysis and comparison of specific algorithms. With respect to the first goal, there are two major categories of methods for measuring the efficiency of graph algorithms: empirical and analytical. Empirical methods are discussed in chapter 2, analytical methods in chapter 3. Empirical methods are greatly facilitated by a Graph Algorithm Software Package, GASP, which is described in detail in appendix 1. With respect to the second goal, one graph problem, that of finding all the trees of a given graph, was chosen. It is discussed in chapter 4. There are many published algorithms for this problem, and these are described in chapter 5. By concentrating on these algorithms, useful techniques for efficiency analysis were developed and tested. Such detailed study also led to the development of a new algorithm for finding all the trees in a graph. This new algorithm is analytically compared to the best among the known algorithms in chapter 6 and is found to be superior to them. 1.2, Related Ef forte Previous effort! which share some of the goals of this investigation fall mainly Into two categories: first, the analysis of specific algorithms, and second, the development of general purpose graph aoftware. A brief review of some of the most relevant papers follow. Authors of spanning tree algorithms sometimes present data on the perfor- mance of their algorithms ([Dawson 68], [Stehman 69]). This approach suffers from the fact that one cannot deduce the efficiency of an algorithm from its performance on isolated examples. A systematic comparison of seven algorithms on 13 graphs was done by Fernandez la his thesis [Fernandez 69a], and is mentioned in an abstract [Fernandez 69b]. Notable among the analyses of other graph algorithms are Gotlieb and Cornell's experiments with algorithms for finding a fundamental set of circuits ([Gotlieb 67], see also [Paton 69]), Shirey's analysis of algorithms for testing the planarity of graphs [Shirey 69], and Cornell and Gotlieb's analysis of an algorithm for testing graph isomorphism [Cornell 70] . The second category consists of papers which describe languages for graph processing ([Friedman, 69], [Hart, 69], [Read ], [Wolfberg, 69]). These lan- guages and GASP are similar in the sense that they all include graphs and sets as data types, and they all are extensions of an existing base language (e.g., FORTRAN, LISP). GASP is the only one which is an extension of PL/1, the richest widely available programming language. 1.3. Notation Throughout this thesis, "n" will stand for the number of nodes in a given graph; "b M will stand for the number of branches; "t" will stand for the number of spanning trees; M c" will stand for the time cost of an algorithm. Upper bounds on c will be expressed as c « 0(f(n,b,t)), which means that there exists a constant A such that c < A • f(n,b,t) for sufficiently large n, b, and t. Similiarly, lower bounds will be expressed as f(n,b,t) ■ 0(c), which implies that there exists a constant A such that c £ A * f(n,b,t) for sufficiently large n, b, and t. The cardinality of a set s will be denoted |s|. The symmetric difference (exclusive or) of sets si and s2 will be denoted si • s2. Truth values, YES and NO, will be combined using "&", "or", and "-.". A graph G consists of a set of words {v., v« v } and a set of branches {e , ..., e, }. The set of branches incident to v will be denoted B.. The degree of a node, "degree (v )", is equal to JB |. 2. EMPIRICAL METHODS OF MEASURING EFFICIENCY OF COMPUTATION There are several methods which could be used to measure the efficiency of graph algorithms. These methods tend to fall into two categories, empirical and analytical. Of course, some methods have both empirical and analytical features, but the division into categories is still useful in order to understand general principles. Empirical methods consist of implementing the algorithm on a computer, running several tests on it, measuring the cost, and drawing some form of con- clusion from the observed data. This approach has several advantages and dis- advantages. 2.1. Advantages The first advantage is that empirical measures are often easier to obtain than analytic measures. This is especially true if one needs the implemented algorithm to solve problems. Obtaining data is relatively trivial. Interpreting the data may be easy (e.g., if one only wants to compare algorithms qualitatively to find out which algorithm is best), or may be very difficult (e.g., if one wants a quantitative prediction of the cost on graphs which have not been tested). A second advantage occurs after a graph algorithm has been implemented: the programmer often sees ways to improve its efficiency (both on a programming level and a graph theoretical level). Insights into measuring the efficiency may occur as well. Finally, empirical results produce numbers corresponding to actual run times which may prove to be more useful than analytically derived formulas which often yield only rates of growth. 2.2. Disadvantages One major disadvantage of experimental testing of efficiency of graph algorithms is that the run time of an implemented program depends on many factors /hich have little or nothing to do with the algorithm proper. These factors Include the particular computer, language, and programmer; the implementation; ind the method of representing graphs. A change in some of these factors could reverse the experimental conclusion of the superiority of one algorithm over mother. Similarly, once the machine on which they were obtained becomes ob- solete, experimental results are likely to lose their value. The other major disadvantage of experimental testing is that because of :omputer time costs, only a small number of tests can be run. If the amount of :omputation required by the algorithm is sensitive to the structure of the graph, it becomes very difficult to accurately extend the results of tests on a small number of graphs to the class of all graphs. Similarly, if the algorithm requires computation time which increases rapidly rfith increasingly large graphs, experimental measures will be limited to tests on small graphs. For tree-finding programs, 15-node graphs may be too large [Dawson 68] . Costs of algorithms applied to small graphs usually will permit only very poor extrapolations to the costs of larger graphs. Many authors hide the inefficiency of their algorithms by illustrating them on small graphs where they appear reasonable. When applied to slightly larger graphs, the algorithms require considerably more computation. 2.3. Lessening the Disadvantages by Using Better Measuring Techniques The two disadvantages mentioned in the previous section are due in varying degrees to the use of data consisting of computer run times. In order to obtain data dependent on properties of the algorithm rather than on the particular i computer system used, the following technique can be used. Divide the given program into logical groups of operations which have the i property that during any test of the program, all the operations in a section I [will be executed the same number of times. Insert counters into the program, jone to each logical section. Assign weights to each operation, and compute the \ total weight of a section as the sum of the weights of all the operations in the section. Take the section counts from a computer test, multiply them by the corresponding weights, sum over all sections, and you have the total cost of that test graph. This technique decreases the dependence on the particular computer system used because one can arbitrarily assign weights to operations in a manner con- sistent with any imaginable computer system. The cost then corresponds to the run time on the imaginary computer, perhaps quite different from the real time cost of the test run. Furthermore, many different costs (based on different imaginary machines) can be computed at the cost of just one real test. To achieve this, simply save the section counts and change the weight system. This technique may increase slightly the size of test graphs which can be directly measured. Once the program has been debugged, any code which does not affect the flow of the program can be removed, reducing the real time required for a test without changing the computed cost. GASP is very useful when the above technique is applied. GASP allows programmers to express graph and set operations in natural terms, without regard to how these objects are represented. Similarly, the operations on these objects are expressed independently of their implementation. Assigning a reasonable set of weights to GASP operations is easy. Because programs written in GASP are independent of the representations, it is possible to run the same program with many different versions of GASP, thereby obtaining experience with different representations. GASP is structured so that small changes can be made in some GASP routines and data structures without requiring changes in the routines which use them. 3. ANALYTICAL METHODS OF MEASURING EFFICIENCY OF COMPUTATION In contrast to empirical methods, analytical methods involve the mathematical analysis of the computational structure of algorithms. This approach also has its relative advantages and disadvantages. 3.1. Advantages First, analytical results hold for arbitrarily large graphs, where experi- mental results would have to be extrapolated. Thus analytical results give a better indication of the true nature of the algorithm. Second, analytical measures are usually performed on the algorithm proper rather than a machine-dependent implementation of the algorithm. Thus the re- sults will not become obsolete when implementations improve. 3.2. Disadvantages The big disadvantage with the analytical approach is that many graph al- gorithms are difficult to measure analytically, especially when the cost of the algorithm varies greatly with the structure of the graph (and not just its size). The goal of analytical methods is to express the cost in terms of a few easily calculated parameters of the input graphs. For some algorithms, this goal is unobtainable, and one must do at least one of the following: 1. Restrict the estimates and bounds to apply only to some subset of the set of all graphs. 2. Introduce more complicated parameters. 3. Accept larger measurement errors. Another possible disadvantage of analytical measures is that they are derived for large graphs, so that small terms and details can be ignored. However, if for some reason the algorithm is applied only to small graphs, the ignored information may be more important than the derived formula. 8 3-3» Types of Analysis There are several techniques which can he used in making analytical measures of efficiency. These techniques will he illustrated "by applying them to an algorithm, A, of the following structure . A: "Pick an arbitrary node X Q . For all nodes X adjacent to X , do S . " S is an operation whose cost is large hut constant, so that the total cost of A is determined "by the number of executions of S. Some of the techniques will he more significantly used (and therefore illustrated) in chapter 6. A standard technique for measuring an algorithm's efficiency is worst- case analysis . If applied to algorithm A, the following analysis might take place. "The bound variable X takes its values from the set of nodes of the graph; therefore, n is a bound for the number of times S is executed. Hence, c = 0(n) ." This method is usually the easiest to apply, but usually the least accurate. If an algorithm has c = 0(n ) with k very small (say 2 or 3), then worst-case analysis may be accurate for some graphs. However, for less efficient algorithms or for typical graphs, the errors can grow rapidly and often become intolerable. In order to get bounds which are tighter than those from worst-case analysis, it is usually necessary to make assumptions. That is, the test graphs are assumed to have certain properties. For example, assume all nodes have the same degree, d (which could be either a constant or some small function of n) . Algorithm A would be analyzed as follows: "X will take on d values because that is exactly the number of nodes adjacent to X Q . Therefore, c = 0(d)." Assumptions should be chosen with care. Too many will make the analysis easy, but the conclusions will be of limited use. Too few may weaken the analysis so that only very loose bounds can be obtained. Particularly useful assumptions are those which specify the test graphs in terms of one or more parameters. With such assumptions, analytic bounds can be derived and expressed in terms of the parameters. For many algorithms, a useful one-parameter family of test graphs is the complete graph on n nodes. Complete graph analysis of Algorithm A would be as follows: "X. is adjacent to all of the other n-1 nodes; therefore, c = O(n-l)." Other possible examples of parameterized classes of graphs include circuits of n nodes, ladders of r rungs, star graphs of b branches, rectangular grids of r rows and c columns, and others of even more parameters. In addition to making the analysis easier, assumptions may be chosen in a way that reflects the intended use of the algorithm. For example, if the applica- tion is in electrical network theory, assumptions such as planarity or bounded degree of nodes may reflect physical limitations of the hardware. The main disadvantage of these techniques is that the assumptions restrict the set of graphs for which the conclusions are valid. It is possible that the conclusions will be false for most graphs. This disadvantage is lessened when estimates which are derived on a small class of graphs can be used as bounds on a larger class. For example, if the cost of an algorithm increases whenever a non- parallel branch is added to the test graph, then the cost of that algorithm on the complete graph on n nodes will be an upper bound of the cost on any graph I on n nodes . When the task is to compare two or more algorithms and to determine which I one is best, there are two approaches which can be used. The first approach is ! to apply the previously discussed techniques on each algorithm individually, and i : then compare the derived estimates and bounds. The second approach is to analyze | directly the computational aspects of the differences between the competing ; algorithms. To illustrate the second approach, suppose Algorithm B is obtained by I 10 modifying Algorithm A so that X Q is chosen to be a node of minimum degree. Then the comparison analysis may be as follows: "If the computation required in B to find a minimum degree X Q is negligible, then Algorithm B is better than Algorithm A because S is executed fewer times." One advantage of direct comparison is that the analysis is often easier, thus fewer (if any) assumptions will be required. With fewer assumptions, the conclusions will be valid for a larger set of graphs (perhaps all graphs). Another advantage is that the inefficient parts of the algorithms are pin- pointed. Such knowledge about the parts would be useful if it is possible to recombine the parts into new algorithms, or if analogous parts appear in another pair of algorithms. A disadvantage of direct comparison is that numerical bounds for individual algorithms are not automatically produced. A related disadvantage is that this method cannot be used on an algorithm which has nothing in common with other algorithms. 11 4. THE PROBLEM OF FINDING ALL THE SPANNING TREES IN A GRAPH 4.1. The Problem: Its Variations and Applications In order to make meaningful comparisons among graph algorithms, it is useful to focus on a single graph problem. For this thesis, the chosen problem is that of finding (i.e., listing exactly once) all the spanning trees of a connected undirected graph. A [spanning] tree is a set of [n-1] branches which are connected and contain no circuits. There are several variations of the problem, including the following: 1. count the number of trees in any given graph [see section 5.2]; 2. find formulas for the number of trees in special graphs ([Bercovici 69], [Cayley 89], [Mullin 67], [Myers 65], [O'Neil 66b], [Riordan 60]); 3. find all spanning trees of a directed graph ([Chen 66b, 67], [Paul 67]); 4. find all spanning trees common to two related graphs ([Ardon 69], [Mayeda 66, 68], [Stehman 69]); 5. find, for a given (directed or undirected) graph, all k-trees (which span k specified components), or all co-trees (complements of trees), or all sets satisfying certain conditions ([Berger 68], [Chen 65, 66a, 69a, 69d], [Dunn 68], [Mayeda 57], [Paul 67]); 6. find two spanning trees with minimal intersection [Chase ]; 7. find all rooted ordered trees of the complete graph [Scions 68], Only spanning trees on undirected graphs will be considered for the remainder of this thesis, so the following conventions will be used. The term "tree" will mean spanning tree, "graph" will mean undirected graph, and "finding all trees" will mean listing all the trees of a given graph without duplications. Factoring the trees into (unions of) cartesian products is allowed; the applications (see below) can use answers in this form ([Bedrosian 62], [Chen 69a], [Dunn 68]). 12 The primary application of finding all trees is in the analysis of linear electrical networks ([Hakimi 66], [Stehman 69], [Weinberg 58]). A second appli- cation is in the analysis of multilevel masers [Bedrosian 62]. Other potential applications have been mentioned. 4.2. The Algorithms: Their Common Features At least ten distinct algorithms for finding all trees have been proposed in the vast literature on this subject. In addition to their large number, these algorithms have other properties which make them highly desirable objects for efficiency measurements. One property of these algorithms is that the cost, c (as well as the number of answers, t) grows exponentially with the size of the test graph. Exponential algorithms are desirable as objects of efficiency measurements (both empirical and analytical) because the large growth rates magnify the differences between algorithms. Thus the inferiority of a bad algorithm will be apparent even on small graphs. Competing exponential algorithms usually have a variety of growth rates, allowing analytical measurements to determine the most efficient algorithm, because only the growth rates of the costs of algorithms are considered in analytical measurements. Examples of competing algorithms which 3 cannot be analytically contrasted because they share a common growth rate [n ] are the better algorithms for testing the planarity of graphs [Shirey 69]. Although different ideas for exponential algorithms can be contrasted by analytical measurements, differences in graph representation and differences in implementation efficiency do not show up. If an algorithm is more efficient in a particular representation, it will always pay to convert the input graph into 2 that representation because the cost of conversion is 0(n ) which will be small when added to an exponential term. Similarly, implementation improvements can do no better than to reduce the cost by a constant factor, which will not affect growth rates. Another property of these algorithms is that analytical bounds are difficult 13 to derive (this explains the complete lack of meaningful bounds by the many authors of these algorithms). One of the reasons for this difficulty is that the cost of these algorithms depends greatly on the parameter t, which cannot be expressed in terms of n and b (except for a few special graphs). Fortun- ately, the scarcity of individual bounds in terms of n and b does not rule out comparison analysis. For example, any algorithm whose cost grows faster than t will be inferior to any algorithm whose cost grows slower than t. A property of these algorithms which aids direct comparison analysis is that many of them can be arranged in a sequence in which the difference between one algorithm and the next is small. This property aids both the description and the analysis of the algorithms because only the differences need to be described and analyzed. Ik 5. DESCRIPTION OF THE ALGORITHMS This chapter briefly describes all known classes of algorithms for finding all trees in a graph. Sections 5.1 through 5.4 are independent of each other; section 5.5 describes a variation of the algorithm in section 5.4; section 5.6 introduces the remaining algorithms, each of which is described in terms of the differences from the preceding algorithm. 5.1. Exhaustion Exhaustion algorithms simply search through a large set of candidate branch sets, testing each to see if it is a tree of the graph. One algorithm ([Hale 61], [MacWilliams 58], [Mason 57], [Mayeda 57], [Weinberg 58]) generates all sets of n-1 branches from the graph, and tests each set to see if it is a tree. Another algorithm ([Char 68], [Zobrist 64]) takes a previously computed list of all the trees on the complete graph on n nodes, and tests each tree to see if all of its branches belong to the input graph. 5.2. Determinants The most efficient method to calculate t, the number of trees in a graph, is to evaluate a determinant [Harary 59], Let M be a n-1 by n-1 matrix with entries m.. defined as follows: m. . = degree (v.), and (for i*j ) m. . = - (the number of branches connecting v. to v.). Then, t = det (M) can be calculated by using any standard method of evaluating determinants (e.g., Gaussian Elimination). Determinant algorithms ([Trent 54], [Weinberg 58]) to find all trees need to evaluate determinants symbolically, a complicated (and costly) process. Some of the more efficient "determinant" algorithms ([Chang 68]), [Chen 68], [Malik 67], [Nakagawa 58]) turn out to be different presentations of algorithms to be described later (sections 5.4, 5.7, and 5.10). 15 5.3. Decomposition Many authors ([Berger 68], [Chen 69a, 69b], [Hakimi 64], [Jong 66], [Kim 60], [Lee 63], [MacWilliams 58], [Mayeda 59], [Myers 67], [Row 61], [Watanabe 61]) have suggested decomposition as a method to find all trees. The basic idea is to divide the graph into two or more subgraphs, to find the trees on these subgraphs, and then to combine these partial trees into trees of the input graph. Unlike most decomposition algorithms for other graph problems (e.g., planarity tests), the final step of combining the partial answers is not trivial. There are other difficulties in constructing decomposition algorithms. Only a few algorithms ([Chen 69a, 69b], [Kim 60], [Mayeda 59], [Myers 67]) avoid duplication and its penalty of checking each tree against the list of trees. Some algorithms ([Chen 69a], [Lee 63], [MacWilliams 58], [Myers 67]) can be applied only to special types of graphs. Apparently only one of these algorithms [Chen 69b] is general, avoids duplications, and overcomes some of the difficulties of combining partial trees. Like most of the references to decomposition algorithms, significant details are not specified, so no algorithm will be described (or analyzed) here. If the details could be worked out, a decomposition algorithm might be competitive with the best of the existing algorithms. 5.4. Tree Transformations Several algorithms ([Chen 69c], [Fujisawa 59], [Hakimi 61, 66], [Kishi 69], [Malik 67], [Mayeda 65, 66, 68], [Stehman 69], [Watanabe 60], [Wing 63]) are based on "elementary tree transformations". Tree Y.. is transformed by adding any new branch a„ and removing any branch a., which lies in the path connecting the endpoints of a ? . The new tree Y * Y- 9 {a..,a 9 }. For any two trees, Y and Y, there is a sequence of trees such that for 1 " *"' " k ' Y# is an elementary 16 tree transformation of Y. .. The "distance" from Y Q to Y, denoted d(Y-,Y), is the minimum number of transformations necessary to change Y. into Y. For all Y and Y Q , d(Y Q ,Y) ^ n-1. A tree transformation algorithm begins with an initial tree Y_. First, all possible elementary tree transformations are applied to Y_ to get X 1 , the set of all trees at distance 1 from Y_. Next, X ? , the set of all trees at distance 2 from Y , is found by applying elementary transformations to the trees in X.. . Similarly, X^ is found from X- by elementary transformations. This process continues until X is found where r = max[over Y]d(Y_,Y). The details of this algorithm, such as how to avoid duplications, will not be described here (see [Mayeda 65]). On some graphs, the choice of Y~ can make a big difference in the cost of the algorithm. The best Y is a "central" tree ([Deo 66], [Malik 68]), for which max [over Y]d(Y ft ,Y) is a minimum. One algorithm for finding a central tree has been suggested [Amoia 69]. 5.5. Hamiltonian Paths The trees of any graph can be arranged in a (Hamiltonian Path) sequence Y , Y 2 , ..., Y such that for 1 < i < t-1, d ( Y ± > Y ±+1 ) = 1 ([Chen 67], [Cummins 66], [Shank 68]). Algorithms to find trees in such an order have been suggested ([Kamae 67], [Kishi 67, 68]). These algorithms will not be described here because they are too complicated. 5.6. Introduction to Expansion Algorithms The remaining algorithms (sections 5.7 through 5.12) expand the "variable Cartesian Product" X, * X„ x . . . * X n , where the definition of the set X. 1 2 n-1 j depends on the choices of elements from X.. through X - . The basic flowchar for these algorithms appears in figure 1, and will be explained in detail in this section. Subsequent flowcharts will be described by explaining the changes in the contents of boxes 1 through 4. t ( START Y-» initialize calculate X 3 17 pick a . from X process \a^, a ,..., a jl j = n >^ NO f return) J«-J-l Figure 1: Expansion Algorithms 18 The two interlocking loops in the basic flowchart are roughly equivalent to n-1 nested loops (fixed nested loops cannot be used because n is a variable). The variable j specifies the nesting level. The "highest" level is 1, the "lowest" level is n. Each level j (1 < j £ n-1) , begins (box 2) with the calculation of X., a set of branches. X controls the iterations at level j. Namely, an iteration begins (box 3) with one branch being picked (and deleted) from X and being assigned to the bound variable a.. At the lowest level, the set {a., a_, ..., a , } is processed as a tree 12 n-1 candidate (box 4). When computation at a level j is completed, the algorithm "backtracks" (box 5) to the previous level j-1 where another iteration (a. 1 from X .) leads to a new instance of level j . 5.7. Cancellation of Non-Trees This algorithm ([Bellert 62], [Chen 65], [Maxwell 66], [Piekarski 65]) is actually a method of expanding the symbolic determinant mentioned in section 5.2 [Myers 65]. The flowchart for this algorithm appears in figure 2. Box 2 reads "X. <- B. - {a,, a_, .... a. . }" which means that X, is the set B, (all J 3 1 2* j-1 j j branches incident to node v.) excluding any currently assigned a (i = 1,2 j-1). Box 4 reads "L = L 9 {{a n , a.,..., a ,}}", where L is a list 1 Z n— 1 of tree candidates which have been generated at previous instances of the lowest level. If {a n , a_, ..., a , } is equal to a set S already in L, then S 12 n-1 is removed from L. {a., a_, ..., a -} is added to L if and only if there 12 n-1 is no such match. When the algorithm terminates, L is the list of all trees of the graph. 5.8. Circuit-Free Expansion This algorithm ([Brownell 68], [Char 68], [Hobbs 59], [Mason 57]) differs (startY* J<-1 19 !,<-■ r YES pick a . from X 3<-j+i NO L=l©{{ ai , a 2 , ..., a^]} NO >' > J<-J-l RETURN 5 Figure 2i Cancellation of Non-Trees 20 from the previous algorithm by avoiding the generation of non-treee (rather than waiting to cancel them out of the list L) . This algorithm rejects the choice of any branch which forms a circuit with previously chosen branches . At the lowest level, the tree candidate can be output immediately because any set of n-1 branches which does not contain a circuit is a tree. The flowchart for this algorithm appears in figure 3. Box 2 now reads "X «- B - Circuit_Makers (a.., a 2 , . .., a J_i)"« Box ^ now reads "Output 1* 3 2 ' "**• a n-l * 5.9. Connected Expansion This algorithm ([Berger 67], [Cummins 64], [Feussner 02, 04], [Hirayama 63], [Minty 65], [O'Neil 66a]) differs from the previous algorithm in the method of avoiding non-trees. Instead of testing for circuits, this algorithm preserves connectedness. The references cited above offer a variety of algorithms; an efficient representative is described below. The flowchart for this algorithm appears in figure 4. The new variables are Y. (needed to avoid duplications) and p (representing the nodes in the current connected subgraph). Box 1 has added the initializing statements "Y +■ Branches of Graph" and "p ■*■ v ". Box 2 now reads "X «- Boundary {p 1 , p„, ..., p } n Y " which means that X contains all branches (in Y . ) which have exactly one endpoint belonging to {p , p P-«^* This guarantees that any branch picked from X will preserve connectedness and avoid circuits. Box 3 has added the statement "p.., *■ other endpoint (a.)", which means that the r j+l - j endpoint of branch a, which is not already in {p. , p„ p } is assigned to the node variable P. +1 • Also in box 3 are the statements "remove a from Y " and "Y.., **■ Y " , which limit the choice of branches at lower levels (see 3 J+l J box 2) in order to avoid duplications. 5.10. Factoring This algorithm ([Ardon 69], [Chang 68], [Chen 68, 69c], [Cummins 64], START >\ 21 3<-i . <-B - Circuit_Maker« (a,, a 2 » •••• *1-1^ pick a , from X NO Output {a lf a 2 , ..., a n-1 ] j<-j-l M RETURN Figure 5* Circuit-Froe Expansion G*EV 22 3<-i Y,<— Branches of Graph 2 r->l X^^-Boundary {p x> p 2> ...» P JO T YES > pick a from X p <- other_endpoint (ii) remove a from Y Y <— Y ^ NO Output £a x , a 2 , ..., a n ^J r IT NO :a ^( return) V 1 3<-j-i Figure 4 i Connected Expansion 23 [Hirayama 65], [Holt 68], [Mason 57], [Mcllroy 69], [Nakagawa 58], [Percival 53]) differs from the previous algorithm in that when node p is added to the currently connected subgraph, all branches in X. which are incident to p are factored together into a single iteration. As a consequence, at the lowest levels, instead of individual trees of n-1 branches, Cartesian products of n-1 factors are produced. The flowchart for this algorithm appears in figure 5. In box 3, "pick A from X " and "p. , *■ other endpoint (A.)" mean that a. is picked from j j J+ 1 J J X and p , is the other endpoint of a. (as in the previous algorithm), j j + l J However, a. is now extended to A., a factor set of branches: A. = J J J {a } u X n B ; that is, A. contains all branches in X. which are incident j 3 P j+1 J 2 to p , . All of A. is removed from X.. Similarly, "remove A. from Y." *3+l J J J J deletes the entire subset A. from Y.. J J In box 4, "output A * A„ * . . . * A " means that a family of trees is output in the form of a Cartesian product of the factor sets A., 1 < j < n-1. This factored form is adequate for the applications (see section 4.2), but if individual trees are desired, they can be obtained by finding all combinations of one branch from each of the n-1 factor sets (this Cartesian product expan- sion could be accomplished by the flowchart in figure 1, with box 2: "X, «- A." and box 4: "Output {a n , a , ..., a , }") . r 1' 2' n-1 5.11. More Factoring The idea behind this algorithm is to factor into a single iteration (the last one) all those cases in which only one branch from X. appears in a tree. To avoid duplication, the other (earlier) iterations from X lead to the choice (at level i+1) of an additional branch of X. . 3 The flowchart for this algorithm appears in figure 6. The new variables are d. (a truth value which controls X.) and Z. (temporary storage for X ). Box 2 now reads "if d. then Z. ■*■ X. «- Boundary {p., p„, ..., p J } n Y, j j 3 3 1 2 K j ■ j START J<-1 P 1^" V 1 Y.<- Branches of Graph ±. X ^-Boundary [p^ P 2> •.., Pj}n Y 1 A YES > pick A from X ^ 3 p ^_ other_endpoint (A ) remove A from Y 3 j Y <-Y 3<-3+* NO Output A^X A 2 *..• x A n _ 1 4 j<-j-i 24 RETURN Figure 5* Factoring 25 else X «-X. ,". This means that if d . = YES, then X. is calculated J 0-1 J J as in the previous algorithm, and stored in Z.. If d = NO, then X. is as- signed the current value of X. . (since one branch was picked from X , at level j-1, this guarantees another branch from x ._i at level j). Box 3 has the additional statements "d,,. <■ (-id,) or (X . = $)" and J+l J J "if d & d , then A. «- Z.". Thus, if d, = NO then d... •*■ YES. If J J+l J J j J+ 1 d = YES and X. is not empty, then d. in «- NO. If d. = YES and X. is J j J+l J J empty, then d - *■ YES and A. is replaced by Z. (the saved value of the full set X. before deletions). J 5.12. Prunin g In the algorithms of sections 5.9 through 5.11, branches are deleted from the Y.'s. Even though the input graph was connected, deleted branches may cause YA. = Y. u (A. x A„ x . . . x A ,) to fail to connect all the nodes J J 1 2 n-1 (denote this situation by "YA. fails"). Once YA. fails, further computation J J at levels j through n-1 is wasted because no spanning trees can be found on a disconnected graph. Thus it would be useful to know when YA. fails. On the other hand, an additional connectedness test would be expensive because it would be executed so many times. The algorithm of this section differs from the three previous algorithms in that needless computation is avoided when YA. fails, but an additional J test for connectedness is not needed. This is accomplished by using "failure to find trees" as a test for connectedness. The flowchart for this algorithm appears in figure 7. The new variable is k. (the count of executed iterations from X ). Box 2 initializes this •J J count: "k. *■ 0". Box 3 increments the count "k . -«- k. + 1". J J J The major change occurs in the NO branch of the "j=l?" decision box where a further test is inserted: "k . = 0?", which means "was X. empty in box 2?". If the answer is NO , then computation proceeds (as in previous 26 START Y<- Branches of Graph d 1 <_yas if d then Z <- X <— Boundary {p^ p 2 P j) nT 1 else X J^V pick A from X J J p. .-,<— other_endpoint (A ) remove A from Y 3 3 y <— y 3+1 3 d. ,<-(-id ) or (X = 0) 3+1 3 3 if d 4 d then A <-Z 3j+l 3 3 3^-3+1 NO Output A x yA 2 y ... *A n-1 £ 3«-3-i YES >f return) Figure 6% More Factoring START >f 27 3<-l Y. <— Branches of Graph d 1 <_YES JiL 2 if d then Z <— X <— Boundary £p , p , • ••, P.} fl Y pick A. from X /v p ^ — other endpoint (A,) remove A from Y J J J+l^ j d. .<-(-,d ) or (X = 0) if d & d then A <-2 j j+1 J 3 3 3+1 3^3+1 - 1". This means that control returns to the previous level (j ■*■ j-1) and continues to return to higher levels (pruning unnecessary iterations) until I. > 1. The remaining iterations at level j are then pruned by proceeding to box 5 (j - J-D. The idea behind box 6 is that the first iteration of box 3 does not change the value of YA. from that of YA. . Thus the final value of j leaving box 6 indicates the highest level at which YA. failed. J 5.13. Variations There are many variations of the algorithms of sections 5.7 through 5.12. Some will be mentioned very briefly in this section. There is an algorithm, Circuit Check, which is "half way" between Cancella- tion of Non-Trees [5.7] and Circuit-Free Expansion [5.8]. This algorithm is use- ful for analysis and will be described in section 6.4. The algorithms of sections 5.7 and 5.8 can be generalized [Maxwell 66] by replacing B. with a somewhat more general cutset. Sections 5.7 and 5.8 can be improved by labeling the nodes so that degree (v.) < degree (v , ) . For sections 5.9 through 5.12, a good heuristic is to always choose p. to be of minimal degree. J For graphs with b < 2(n-l), it may pay to find all co-trees (using some form of duality) and convert them to trees. Finally there are many special cases which can occur in graphs (either initially or during computation) which can be handled more efficiently than the general case. For example, the existence of separating nodes or separating branches allows a quick decomposition. 29 6. ANALYTICAL MEASUREMENTS OF SELECTED ALGORITHMS The algorithms described in chapter 5 will now be measured by the tech- niques described in section 3.3. A priori bounds (which do not depend on the structure of an algorithm) are given in section 6.1. An example of worst case analysis appears in section 6.2. In the remaining sections, only the expansion algorithms [5.6 through 5.12] are analyzed, using the "computation tree" defined in section 6.3. Section 6.4 employs direct comparisons of consecutive algorithms, Section 6.5 introduces the "quotient operator" and uses it to measure the New algorithm on "closed ladder" graphs. Finally, section 6.6 applies "complete graph analysis" in order to obtain upper bounds for the factoring algorithms. 6.1. A Priori Bounds Sometimes it is possible to derive bounds for an algorithm without knowing its structure. If an algorithm is difficult to analyze, a priori bounds may be the tightest available bounds. Find-all-trees algorithms illustrate this point because the required number of answers, t, grows exponentially. Any algorithm which finds trees one at a time must have t = 0(c) [recall from section 1.3 that "f(n,b,t) = 0(c)" means 4A such that c > A • f(n,b,t)]. For the algorithms of sections 5.4, 5.5, 5.8, and 5.9, tighter bounds are difficult to obtain. Another example of "a priori" bounding occurs in the exhaustion algorithms u (section 5.1). The first algorithm checks all ( ) combinations of n-1 n-I b! branches, so regardless of the details, -r. ■ v, - > — rrv = 0(c). The second ° (b-n+1) ! (n-1) ! n-2 algorithm checks each of the n [Cayley 89] trees of the complete graph, so n = 0(c). The storage required by the second algorithm is also larger o than n . These lower bounds are sufficient to demonstrate the inefficiency of these algorithms. 30 6.2. Worst Case This section will illustrate "worst case analysis" as applied to the first exhaustion algorithm [5.1]. Let the branches of the graph be numbered 1 through b. Represent each combination of branches by an ordered list of n-1 positions, p., each position containing a branch number (p. - i, where l degree (v.) 4 n— x A—l J J [the most efficient case], c.(G.CNT) < (— ) n_1 . Thus c = 0(nt(— ) n_1 ). 4 n n In the analysis to follow, it will be convenient to use the notation ST(G, A, w) to denote the subtree of CT(G, A) consisting of the node w [w e CT(G, A)] and all the nodes and branches connected to w from below. If w is the root node, ST(G, A, w ) = CT(G, A). Two subtrees, ST(G , A , w ) and ST(G„, A , w„), are isomorphic if there is a one-to-one and onto mapping of the nodes and branches which preserves incidence and level relationships . 6.4. Direct Comparisons The expansion algorithms (5.7 through 5.12) will now be sequentially compared in terms of their computation trees. The expression "c . (A ) < c.(A.)" means that for all graphs G, and for each size parameter c. (i = 2, 4), c.(G, A x ) < c ± (G, A 2 ). 33 The Circuit-Free Expansion (CFE) algorithm [5.8] introduces two changes from the previous algorithm (CNT) . First, the "non-tree test" changes from "check L for duplicates" to "check for circuits". Second, this test has been moved up from box 4 to box 2. In order to isolate the change in efficiency, let us make these changes one at a time. If the second change is made without the first [Piekarski 65], the cost increases [based on limited empirical evidence]; therefore, let us try the first change without the second. Call this the Circuit Check algorithm (CtC) . Since the only change is in box 4 ("check {a.. , a„, ..., a } for circuits"), the computation tree does not change: c.(CtC) = c.(CNT). However, cost (box 4) drops from O(n.t) to 0(n) [the cost of a circuit test]. Clearly, Circuit Check is more efficient than Cancellation of Non-Trees. Now add the second change. The cost of each box is 0(n) regardless of the placement of the circuit test (only the constants change). However c . (CFE) < c.(CtC) because the non-trees are discovered sooner, and needless computation is avoided. For nearly all graphs G, c.(G, CFE) < c.(G, CtC). Thus the second change is an improvement also. The Connected Expansion (Con) algorithm [5.9] will be considered equally efficient as Circuit-Free Expansion. Both algorithms find trees one at a time, avoiding non-trees and duplications, so c, (CFE) = c.(Con) = t. Empirically, Circuit-Free Expansion appears more efficient [Fernandez 69a]. The Factoring (Fac) algorithm [5.10] is clearly an improvement over "one 12 k tree at a time" algorithms. Each factor A. = {a., a., .... a.} (box 3, J J J J figure 5), corresponds to a node w. ,_ in CT(G, Fac); i.e., A. corresponds to the entire subtree ST(G, Fac, w,,.), Each a. corresponds to a node w.,, J +1 J J+l in CT(G, Con). Therefore, ST(G, Fac, w .) replaces k subtrees ST(G, Con, w 1 +1 ), i = 1, ..., k. Clearly c.(G, Fac) < c.(G, Con), with equality holding only if G is a tree. Typically, c.(G, Fac)/t (the "cost per 3h tree") goes to zero exponentially as n increases [6.6, figure 10]. For each X. calculated in box 2 of the Factoring algorithm, the trees which contain iust one branch from X. will be calculated k times over, J ' where k is the number of iterations necessary to empty X.. The More Factoring (MF) algorithm [5.11] combines these k cases into a single iteration, clearly an improvement in efficiency. Thus c.(G, MF) < c.(G, Fac) , with equality holding rarely. Typically, c . (MF) /c . (Fac) goes to zero exponentially as n increases [6.6]. An intuitive indication of the improvement is that the factors are larger; i.e., on a complete graph, Factoring will always find at least one Cartesian product family consisting of a single tree [Chang 68], while (if n>3) every family found by More Factoring will contain at least two trees. The Pruning algorithm [5.12] is clearly an improvement. The test for correctedness is obtained at negligible cost, but the potential savings are large. Naming this algorithm (with factoring and pruning, [figure 7]) the New algorithm, c.(G, New) < c.(G, Fac). 6.5. Special Graphs: The Quotient Operator This section (as well as the next) will illustrate the technique of measuring the cost of an algorithm on a parameterized class of graphs, G = G(p). For the algorithms to be measured, c = k c„(G) + k, c,(G) with k. = 0(n); thus, the only quantities which need to be measured are c (G) and c ,(G) . Since G = G(p), c.(p) will replace c.(G). Only algorithms with factoring will be measured directly; the "one at a time" algorithms [5.4, 5.8, 5.9] have c /(p) = c (p)» the number of trees as a function of p. For the classes of graphs to be considered, c~(p), c,(p), and t(p) are all exponential in p. In order to derive, compare, and plot these func- tions, f(p), the "quotient operator", Q(f, p), will be used: Q(f, 1) = f(l), and for p>l, Q(f, p) = f(p)/f(p-l) [it is not difficult to interpret 35 P f(l) > 1, thus Q(f, 2) is well defined]. Clearly, f(p) = n Q(f, i) . i=l For the functions to be considered, there will always exist a "quotient limit", q(f, p) , either a constant or a linear function of p, such that lim ^IxJSl = i. For example, q(p!, p) = p, q(k P , p) = k, q(k P , k) = 0. p-*» q(f, p) As a first example of a special class of graphs, consider L(r), the closed ladder of r rungs (see figure 9). Since n(r) = 2r and b(r) = 3r-2, this example will show that even on graphs with rank > nullity [i.e., b < 2(n-l)], the New algorithm has cost per tree, c/t, going to zero exponentially. To prove this claim, it suffices to show that q(c.(r, New),r)/ q(t, r) < 1. It is not difficult to derive the equation t(r) = 4 t(r-l) - t(r-2), with t(l) = 1, t(2) = 4. In fact, t(r) = t(i+l) • t(r-i) - t(i) • t(r-i-l), for any i, ln-l _/ x (n-1; / , yn-l N lim ^n-iv (n-1) . Q(c 4 , n) =— — 2 =(n-lX^2) . Since n _(^) (n-2) q(c^, n) = (n-l)e. n— 7 The quotient limit for t(n) = n is similarly derived: Q(t, n) = n-1 n " 2 1 n " 1 Hn, (n " 2+ n>(^T> n / o ■ 1\ / n N iim n n-i .. , , = (n-2+ -) (— -7) ; _„ . , - 1; therefore, . nX n-3 n n-1 ' n^-°° (n-2)e (n-1) q(t,n) = (n-2)e. Figure 10 plots c,(n, CtC), t = c,(n, CFE) = c,(n, Con), c.(n, Fac) , and c.(n, New) using the quotient operator Q(f, n) . The quotient limits derived above are the asymptotic limits of the plotted functions. From this analysis, it is obvious that the cost per tree, c/t, goes to zero exponentially (e ) for the Factoring algorithm, and the cost ratio of -n the New algorithm to Factoring also goes to zero exponentially (/2 ). Thus, the most efficient way to find trees one at a time is to use the New algorithm combined with a simple Cartesian product expansion algorithm [cost (New) < cost (simple expansion) < cost (other "one at a time" algorithms)]. 40 Figure 10: Complete Graph Analysis 41 7. CONCLUSIONS One important contribution of this thesis is the efficiency analysis of all published algorithms for finding all trees of a graph. A very rough summary of this analysis is as follows: Algorithm Cost 2 "check for duplications" t "one tree at a time" t ~n Factoring te New t(e/2)~ n Note that for the algorithms with factoring, the cost per tree goes to zero exponentially as n increases. The techniques which were used to measure efficiency include the following: (1) the use of special classes of graphs on which the cost of an algorithm can be accurately measured (e.g., complete graphs); (2) the direct comparison (e.g., using computation trees) of competing algorithms in order to show differences in efficiency without the need to derive individual bounds; (3) the isolation of each idea of an algorithm (e.g., factoring) so that the efficient ideas can be available for the development and analysis of new algorithms; (4) the minimization of implementation details in empirical measurements (e.g., using GASP and counting statements rather than seconds); (5) the use of measures which reflect the nature of the class of algorithms (e.g., the quotient operator which linearizes the exponential nature of "recur- sive" algorithms). The New algorithm is an important contribution of this thesis primarily because these techniques show that it is more efficient than any previous algorithm for finding all trees. U2 REFERENCES These references are broken into three sections, and each section has its own aims and criteria for inclusion. The first section aims at being an exhaustive list of papers which discuss various aspects of algorithms for finding all spanning trees of a graph. In addition, this section contains some references which discuss graph theoretical results of potential importance to such algorithms (e.g., the existence of a Hamiltonian circuit in the tree graph of a graph, or bounds on the number of trees in a graph) . Several of these references discuss applications which use all spanning trees, mainly in the analysis of linear electrical networks. The second section contains references to reports on general purpose graph- processing languages or software packages (programs which implement a specific graph algorithm are not included here) . The third section includes a few references to important papers on other graph algorithms, in particular those concerned with the following problems: a) Find a minimum cost spanning tree of a graph. b) Find a basis in the vector space of circuits of a graph (also known as a set of fundamental circuits) . c) Determine isomorphism of graphs. d) Determine if a graph is planar. e) Find shortest paths in a graph. 1*3 1. Spanning Trees Amoia, V., and Cottafava, G. "On Central Trees," Proceedings of the 12th Midwest Symposium on Circuit Theory , paper XIV. 1, 196*9^ Ardon, M., and Malik, N. "A Recursive Algorithm for Generating Trees and Signed Complete Trees, " Proceedings of the 12th Midwest Symposium on Circuit Theory , paper VII. 2, 1969. Bedrosian, S. "Application of Linear Graphs to Multilevel Maser Analysis," Journal of the Franklin Institute , Vol. 27^, No. k, pp. 278-283, October 1962. Bellert, S. "Topological Analysis and Synthesis of Linear Systems," Journal of the Franklin Institute , Vol. 27^, No. 6, pp. k25-kk3, December 1962. Bercovici, M. "Formulas for the Number of Trees in a Graph, " IEEE Trans - actions on Circuit Theory , Vol. CT-16, pp. 101-102, February 1969. Berger, I. "The Enumeration of Trees Without Duplication," IEEE Transactions on Circuit Theory , Vol. CT-1^, pp. 1+17-^-18, December 1967. Berger, I., and Nathan, A. "The Algebra of Sets of Trees, K-Trees, and Other Configurations," IEEE Transactions on Circuit Theory , Vol. CT-15, pp. 221-226, September 1968. Brownell, R. "Growing the Trees of a Graph," Proceedings of the IEEE , Vol. 56, pp. 1121-1123, June 1968. Cayley, A. "A Theorem on Trees," Quarterly Journal of Mathematics , Vol. 23, pp. 376-378, 1889. Chang, W., and Chan, S.G. "A Fast Tree-Finding Method," Proceedings of the 11th Midwest Symposium on Circuit Theory , pp. i+57-^62^ 1968. Char, J. "Generation of Trees, Two-Trees and Storage of Master Forests," IEEE Transactions on Circuit Theory , Vol. CT-15, pp. 228-238, September 1968. Chen, W. "Generation of Trees and K-Trees," Proceedings of the Third Allerton Conference on Circuit and Systems Theory , pp. 889-899; 1965 • Chen, W. "On the Generation of Non-Singular Submatrices and Their Corresponding Subgraphs, " Proceedings of the Fourth Allerton Conference on Circuit and Systems Theory , pp. 207-217, 1966a. Chen, W. "On the Realization of Directed Trees and Directed 2-Trees, " IEEE Transactions on Circuit Theory , Vol. CT-13, pp. 230-232, June 1966b. Chen, W. "Hamilton Circuits in Directed-Tree Graphs," IEEE Transactions on Circuit Theory , Vol. CT-14, pp. 231-233, June 1967 . Chen, W. "Iterative Procedure for Generating Trees and Directed Trees, " Electronic Letters , Vol. K, No. 23, pp. 516-518, November 1968. Chen, W. "Computer Generation of Trees and Co-Trees in a Cascade of Multi- terminal Networks," IEEE Transactions on Circuit Theory , Vol. CT-16, pp. 518-526, November 1969a. Chen, W. "Generation of Trees and Co-Trees of a Graph by Decomposition, " Proceedings of the IEE ( London ), Vol. Il6, No. 10, pp. l639-l61+3, October 1969b. Chen, W. "On the Generation of Trees Without Duplications," Proceedings of the IEEE, Vol. 57, pp. 1292-1293, July 1969c. kk Chen, W., and Mark, S. "On the Algebraic Relationship of Trees, Co-Trees, Circuits, and Cutsets of a Graph, " IEEE Transactions on Circuit Theory , Vol. CT-16, pp. 176-I8I+, May 1969d. Cummins, R., and Thomason, L. "An Efficient Tree-Listing Program," Unpublished, 196^. Cummins, R. "Hamilton Circuits in Tree Graphs, " IEEE Transactions on Circuit Theory , Vol. CT-13, pp. 82-90, March 1966. Dawson, D. "Computational Aspects of the Topological Approach to Active Linear Network Analysis, " Proceedings of Hawaii International Conference on System Sciences , pp . 113-115, 1968. Dunn, W. Jr., and Chan, S.P. "Topological Formulation of Network Functions Without Generation of K- Trees, " Proceedings of the Sixth Allerton Conference on Circuit and Systems Theory, pp. 822-831, 1968. Fernandez, E. "Analisis de Redes Electricas con Computador Digital mediante Formulas Topologicas, " Thesis, Departamento de Electricidad, Universidad de Chile, 1969a. Fernandez, E. "An Evaluation of Tree Generation Methods, " Proceedings of the 12th Midwest Symposium on Circuit Theory , paper VIlT5~ 1969b • Feussner, W. "Uber Stromverzweigung in Netzformigen Leitern, " Annalen der Physik , Vol. 9, pp. I30I4-I329, 1902 . Feussner, W. "Zur Berechnung der Stromstarke io Netzformigen Leitern, " Annalen der Physik , Vol. 15, pp. 385-39^, 190^+. Fujisawa, T. "On a Problem of Network Topology," IRE Transactions on Circuit Theory , Vol. CT-6, pp. 261-266, September 1959~ Hakimi, S. "On Trees of a Graph and Their Generation," Journal of the Franklin Institute , Vol. 272, No. 5, PP- 3^7-359, November 196I. Hakimi, S., and Green, G. "Generation and Realization of Trees and K- Trees, " IEEE Transactions on Circuit Theory , Vol. CT-11, pp. 2^7-255, June 196^. Hakimi, S., and Deo, N. "A Topological Approach to the Analysis of Linear Circuits, " Proceedings of the Fourth Allerton Conference on Circuit and Systems Theory , pp. 197-206, 1966 . Hale, H. "A Logic for ] dentifying Trees of a Graph," ALEE Transactions on Power Apparatus and Systems , Vol. 80, pp. 195-197, June 196I . Harary, F. "Graph Theory and Electrical Networks," IRE Transactions on Circuit Theory , Vol. CT-6, pp. 95-109, May 1959- Kirayama, H., Watanabe, H., and Harada, K. "Digital Determination of Trees In Network Topology, " Journal of the Institute of Electrical Communications Engineers of Japan , Vol. ^6, No. 1, pp. 23-30, January 1963 • Hirayama, H., and Ohtsuki, T. "Topological Network Analysis by Digital Computer," Journal, of the Institute of Electrical Communication Engineers of Japan , Vol. hti, No. 3, pp. h2h-k'52, March 1965. Hobbs, E., and MacWilliams, F. "Topological Network Analysis as a Computer Program, " IRE Transactions on Circuit Theory , Vol. CT-6, pp. 135-136, March 1959- Holt, A., and Fiedler, J. "Efficient Tree-Generation Method Suitable for Computer Programming," Electronic Letters , Vol. k, No. 10, pp. 183-1814-, May 1968. h5 Jong, M., Lau, H., and Zobrist, G. "Tree Generation," Electronic Letters , Vol. 2, No. 8, pp. 318-319, August 1966. Kamae, T. "The Existence of a Hamilton Circuit in a Tree Graph, " IEEE Transactions on Circuit Theory , Vol. CT-14, pp. 279-283, September 1967. Kim, W., Freiman, C, Younger, D., and Mayeda, W. "On Iterative Factorization in Network Analysis by Digital Computers, " Eastern Joint Computer Confer - ence , pp. 2*4-1-253, December i960. Kishi, G., and Kajitani, Y. "On Maximally Distinct Trees," Proceedings of the Fifth Annual Allerton Conference on Circuit and Systems Theory , pp. 635- 6U3, 1967. Kishi, G., and Kajitani, Y. "On Hamilton Circuits in Tree Graphs," IEEE Transactions on Circuit Theory , Vol. CT-15, pp. 1 +2-50, March 1968. Kishi, G.,'and Kajitani, Y. "Maximally Distant Trees and Principal Partition of a Linear Graph, " IEEE Transactions on Circuit Theory , Vol. CT-l6, pp. 323-330, August 1969. Lee, S. "On Topological Formulae," Proceedings of the First Annual Allerton Conference on Circuit and Systems" Theory" pp. ^•27- I +55, 19&3 • MacWilliams, J. "Topological Network Analysis as a Computer Program," IRE Transactions on Circuit Theory , Vol. CT-5, pp. 228-229, September I95H. Malik, N., and Lee, Y. "Finding Trees and Signed Tree-Pairs by the Compound Method, " Proceedings of the 10th Midwest Symposium on Circuit Theory , paper VI- 5, 1967- Mason, S. "Topological Analysis of Linear Non-Reciprocal Networks," Pro - ceedings of the IRE , Vol. K^, pp. 829-838, June 1957- Maxwell, L., and Cline, J. "Topological Network Analysis by Algebraic Methods," Proceedings of the IEE ( London ), Vol. 113, No. 8, pp. 13^-13^7, August 1966. Mayeda, W. "Digital Determination of Topological Quantities and Network Functions," Interim Technical Report No. 6 , Contract No. DA-11-022-0RD- 1983, University of Illinois, Urbana, Illinois, January 1957* Mayeda, W. "Reducing Computational Time in the Analysis of Networks by Digital Computers," IRE Transactions on Circuit Theory , Vol. CT-6, pp. 136-137, March 1959- Mayeda, W., and Seshu, S. "Generation of Trees Without Duplications," IEEE Transactions on Circuit Theory , Vol. CT-12, pp. 18I-I85, June 1965. - Mayeda, W. "Generation of Trees and Complete Trees," CSL Report R-28U , University of Illinois, Urbana, Illinois, April 1966. Mayeda, './., Hakimi, S., Chen, W. and Deo, N. "Generation of Complete Trees," IEEE Transactions on Circuit Theory , Vol. CT-15, pp. 101-105, June 1968. Mcllroy, M. "Generator of Spanning Trees," Communications of the ACM , Vol. 12, No. 9, p. 511, September 1969. Minty, G. "A Simple Algorithm for Listing All the Trees of a Graph," IEEE Transactions on Circuit Theory , Vol. CT-12, p. 120, March 19&5 • Mullin, R., and Stanton, R. "A Combinatorial Property of Spanning Forests in Connected Graphs," Journal of Combinatorial Theory , Vol. 3, pp- 236-2^3* 1967. k6 layers, B., and Auth, L. Jr. "The Number and Listing of All Trees in an Arbitrary Graph," Journal of Combinatorial Theory , Vol. 3, pp. 236-2^3, 1967. Myers, B., and Auth, L. Jr. "The Number and Listing of All Trees in an Arbitrary Graph, " Proceedings of the Third Allerton Conference on Circuit and Systems Theory , pp. 906-912, 1965. Myers, B. "Efficient Generation of Tree -Admittance Products in a Cascade of 2-Port Networks," Proceedings of the IEE ( London ), Vol. 114, No. 11, pp. l64l-l646, November 1967. Nakagawa, N. "On Evaluation of the Graph Trees and the Driving Point Admittance," IRE Transactions on Circuit Theory , Vol. CT-5, pp. 122-127, June 1958 . O'Neil, P., and Slepian, P. "An Application of Feussner's Method to Tree Counting," IEEE Transactions on Circuit Theory , Vol. CT-13, pp. 336-339, September 1966a. O'Neil , P., and Slepian, P. "The Number of Trees in a Network," IEEE Transactions on Circuit Theory , Vol. CT-13 , PP • 271-281, September 1966b. ul, A. Jr. "Generation of Directed Trees and 2-Trees Without Duplications," IEEE Transactions on Circuit Theory , Vol. CT-14, pp. 35^-356, September L-clval, W. "The Solution of Passive Electrical Networks by Means of Mathematical Trees," Proceedings of the IEE (London), pt. 3, Vol. 100, pp. 110-150, May 1955. Piekarski, M. "Listing of All Possible Trees of a Linear Graph," IEEE Transactions on Circuit Theory , Vol. CT-12, pp. 124-125, March 19&5 • Riordan, J. "The Enumeration of Trees by Height and Diameter," IBM Journal of Research and Development , Vol. h, No. 5, pp. 473-478, November i960. , P. "On a Tree Expansion Theorem, " IRE Transactions on Circuit Theory , Vol. CT-8, pp. U 96-5OO, December 19637! Scoins, H. "Placing Trees in Lexicographic Order," Machine Intelligence 3 ? pp. 1+5-60, 1968. Shank, H. "A Note on Hamilton Circuits in Tree Graphs," IEEE Transactions on ■Vir.-uit Theory , Vol. CT-15, p. 86, March 1968 . hman, C, Maenpaa, J., and Stahl, W. "Complete Tree Generation - Some ractical Experience," :EEE Transactions on Circuit Theory , Vol. CT-16, pp. 548,550, November 196^! .11. "A Note on Enumeration and Listing of All Possible Trees in a ;ted Linear Graph," Proceedings of the National Academy of Science , . 40, pp. 1004-1007, October 1954. Watanabe, H. "Computational Method for Network Topology," IRE Transactions on Circuit Theory , Vol. CT-7, pp. 296-302, September i960. Watanabe, . [ethod of Tree Expansion in Network Topology," IEEE Trans - actions on Circuit Theory , Vol. CT-8, pp. 4-10, March 1961. W dnb :r . L. "Kirchoff's Third and Fourth Laws," IRE Transactions on Circuit Theory , Vol. CT-5, pp. 8-30, March 1958. Wine, 0. "Enumeration of Trees," IEEE Transactions on Circuit Theory , Vol. CT-10, pp. 127-128, March I963T hi Zobrist, G., and Lago, G. "Digital Computer Analysis of Passive Networks Using Topological Formula, " Proceedings of the Second Annual Allerton Conference on Circuit and Systems Theory , pp. 513-595, 196*+ • 2. General Purpose Graph Software Christensen, C. "An Example of the Manipulation of Directed Graphs in the AMBIT/g Programming Language, " Interactive Systems for Experimental Applied Mathematics , pp. ^23-^35, 196b. Friedman, D.P., Dickson, D.C., Fraser, J. J., and Pratt, T.W. "GRASPE 1.5- A Graph Processor and its Application, " University of Houston Report RS I-69 , Houston, Texas, August 1969. Hart, R. "HINT: A Graph Processing Language," Research Report , Computer Institute for Social Science Research, Michigan State University, East Lansing, Michigan, February 19&9 • Read, R.C., King, C, Cadogan, C.C., and Morris, P. "The Application of Digital Computer Techniques to the Study of Graph Theoretical and Related Combinatorial Problems, " Scientific Report , Computing Centre, University of West Indies, Mona, Kingston 1, Jamaica. Wolfberg, M.S. "An Interactive Graph Theory System," Moore School Report No. 69-25 ? University of Pennsylvania, Philadelphia, Pennsylvania, June 1969. 3. Selected References to Other Graph Algorithms Chase, S. "How to Win Shannon Switching Games: A Case Study in Automatic Graph Processing, " Communications of the ACM , (to appear) . Corneil, D., and Gotlieb, C. "An Efficient Algorithm for Graph Isomorphism," Journal of the ACM , Vol. 17, No. 1, pp. 51-6^+, January 1970. Dijkstra, E. "A Note on Two Problems in Connection with Graphs," Numerische Mathematik , Vol. 1, No. 5, pp. 269-271, October 1959- Gotlieb, C, and Corneil, D. "Algorithms for Finding a Fundamental Set of Cycles for an Undirected Linear Graph," Communications of the ACM , Vol. 10, No. 12, pp. 780-783, December 1967. Paton, K. "An Algorithm for Finding a Fundamental Set of Cycles of a Graph, " Communications of the ACM , Vol. 12, No. 9, pp. 51^-518, September 1969. Shirey, R. "Implementation and Analysis of Efficient Graph Planarity Testing Algorithms," Ph.D. Thesis , University of Wisconsin, Madison, Wisconsin, 1969. r, S.H. "GIT - A Heuristic Program for Testing Pairs of Directed Line Graphs for Isomorphism," Communications of the ACM , Vol. 1, No. 1, pp. 26-3^, January 196^. Witzgall, C. "On Labelling Algorithms for Determining Shortest Pahts in Networks," NBS Report 98I+O, 1968. US APPENDICES 1. GASP MANUAL 1.1. Purposes of GASP The main purpose the the Graph Algorithm Software Package is to allow programmers of graph algorithms to code programs in a natural and machine independent way. Because operations are expressed in a language of graph and set terms, the programs will be easy to follow and estimates of the amount of computation will be easier to compute. Comparisons among different algorithms for the same problem will be much easier using GASP becuase it is easy to generate programs from their description in conventional graph-theoretic terms. Moreover, some tallies on the amount of computation are provided by GASP. The logical structure of GASP makes it possible to change representations with relatively little change in programs. A few low-level routines would have to be rewritten, but the higher level programs would not. 1.2. Basic Concepts and Terminology 1.2.1. Data Types An integer has the usual definition. A character string is a fixed-length string of characters. A truth value is a variable which can take on one of the two values: yes or no. A name references an object (see below). An object is a conglomeration consisting of one integer, one character string, three names, and (most important) one set . A set can exist only as part of an object. There are two types of objects: restricted and unrestricted . Restricted objects can- not belong to sets. 1.2.2. Definitions and Assignments GASP objects are available to the programmer only through one level of ^9 indirect addressing. The programmer deals with names which refer to objects which have values. This relationship (shown in figure 11) is very important for the understanding of GASP. name definition assignment values (for set, etc.) Figure 11 In order to distinguish one level from the other, we will use two sets of words. A name is defined if it references an existing object, otherwise it is undefined. A name may change its definition; that is, it can be made to refer to a different object. The number of names referring to an object may vary from zero to any reasonable positive integer. When the contents of a set are changed, we say the set is assigned a new value . Also, the object involved is assigned a new value . The main advantage of this indirect addressing scheme is that not all objects need to be accessed by permanent names. E.g., all objects in a set S may be made accessible by means of the statement "FOR (ALL, X, S)", even if no object in S has previously been given a name. In this case, we regard the bound variable X as a name whose definition ranges over all the objects in the set S. 1.2.3. Graphs A graph is represented as an object whose set contains the branches and nodes belonging to the graph. Each node and each branch is an unrestricted object. In this first implementation, the set associated with a branch is the 50 set of two incident nodes; the set associated with a node is the set of all incident branches and adjacent nodes. 1.2. A. System Objects GASP reserves a few restricted objects for special use. NULLSET is a read-only object whose set is always empty. AC is an object whose set holds intermediate values of set operations. USED is the object whose set contains all currently active unrestricted objects. NODES and BRANCHES are the objects whose sets contain all nodes and branches (respectively) belonging to the union of all graphs. 1.2.5. GASP Statement Forms GASP is an extension of PL/1 (through the use of the PL/1 Preprocessor). Thus any PL/1 statement could be considered a GASP statement. The 'pure' GASP statements fall into three categories: PL/1 statements which declare or assign values to GASP data types; GASP Procedure calls which constitute a complete statement starting with CALL (unless the procedure name begins with $) and ending with a semi-colon; Type - functions where type is one of the following: name, integer, truth value. A type-function call can be inserted almost any- where a 'type' variable is allowed. 1.3. The GASP Statements for Sets 1.3.1. Notation In the instruction set that follows, the actual code that must appear (as spelled) is capitalized while arbitrary names used as arguments are not. GASP statements will frequently be set off by quotation marks which are ob- viously not part of the code. 51 1.3.2. Declarations GASP variables are declared just as regular PL/1 variables are declared. Conversion from terms in section 2 to actual program words is shown below. Formal Description Computer Code name (and unrestricted object) $NAME integer $INTEGER character string $CHAR truth value $BIT restricted object $MAXSET NOTE: Declaring a name (or unrestricted object) does not define it. However, 'DCL x $MAXSET;' will create a restricted object and that object will be the definition of x. $MAXSET is the only declaration which cannot be factored; 'DCL (SI, S2) $MAXSET;' would result in both names SI and S2 referring to the same object. 1.3.3. Definitions A name, x, can be defined in two other ways besides 'DCL x $MAXSET; ' [previous paragraph] . ' $ALLOC (x) ;' creates (storage for) a new unrestricted object which will become the definition of the name x (previously declared $NAME) . Also the character string part of this object will be assigned the value 'x 1 . 'x = name-expression ; ' will define (or redefine) the name x to refer to the object named by name-expression (name-expression can be either a pre- viously defined name or an arbitrary expression which computes a name value) . 1.3.4. Freeing of Storage The storage taken up by an object can be freed as follows: '$KILL (x) ;' will free the unrestricted object named x and the name x will become undefined. 52 'CALL POP (x) ;' will free the restricted object named x and leave x undefined. 1.3.5. Operations on Sets 1.3.5.1. Notation Nearly all arguments will be defined names. In the following examples, those names beginning with 's' are to be considered as sets (of any object) and those beginning with 'e' should be considered as elements of sets ('e 1 names must refer to unrestricted objects). As is the case with most GASP operations, no names are changed by the instructions in this section. 1.3.5.2. Truth Value Functions '$IS-IN(e, s)' answers "does e belong to s?". '$EQUALS(sl, s2)' answers "does set si = set s2?". '$EMPTY_(s)' is equivalent to '$EQUALS(s, NULLS ET) ' . 1.3.5.3. Procedure Calls '$STORES (sname, s_expression) ; ' will assign to the set named sname the value of the set named s_expression (which remains unchanged). '$CLEAR(s); f is equivalent to '$STORES (s, NULLSET) ;'. '$CHANGES (s, elem, op)', where op = ADD or DELETE, will add (delete) elem to (from) the set s. If this does not change the truth value of the expression "elem e s", then it is a harmless waste of time. '$CSES (s, e) ;' (Clear and Store Element in Set) assigns to s the value {e}. 1.3.5.4. Integer Functions 'CARD(s)' returns the integer number of elements belonging to s. 1.3.5.5. Name Functions (Choice) The functions in this section pick elements out of sets with varying 53 side effects. 'ELEM_OF(s) ' will return the name of an object belonging to the set s. This statement should not be used unless it is known that s is not empty. The set s is unchanged by this instruction. *CAN_PIC (e, s)' is a truth value function which will answer "CAN one PICk an element from s?". If set s is not empty, e will name an object belonging to s, which will then be deleted from s. If s is empty, e will be undefined. 'ITH_EL (s, i)' will return (withoug deleting) the i-th element of the set s, where i is an integer. Since this depends on the arbitrary (but fixed) ordering of elements in the implementation of the set, it has little use, 'RANDEL(s)' will return a randomly chosen element from the set s, with- out deleting it. 1.3.5.6. Name Functions (Intermediate Results) The name functions in this section all perform some operation on the input sets and store the result in the AC. The name returned is always AC. The input sets remain unchanged (unless one of them is AC). 'UNION (si, s2)' takes the union of sets si and s2. 'INTER (si, s2) ' takes the intersection of sets si and s2. 'COMPL (s)' takes the complement of the set s with respect to the universal set of unrestricted objects (useless by itself). 'DIFF (si, s2)' contains those objects belonging to si but not to s2 . 'SYMDIF (si, s2) ' contains those objects belonging to exactly one of the sets si and s2 (exclusive or) . 1.3.6. Saving Object Values 'CALL PUSH (x) ; ' saves the value of the object named x. ^ 'CALL POP (x) ;' restores the saved value of the object named x. For example, consider the following code: CALL PUSH (x); 2 CALL POP (x) ; 4 The values (of all the parts) of the object named x will be the same at points 1, 2, and 4 regardless of the values at point 3. The definition of x remains unchanged throughout. As the words 'push' and 'pop' imply, any number of copies of an object may be saved in this way, and restored in the usual 'last in - first out' order. Implementation restrictions will limit the number of saved objects at any point during execution. 1.3.7. I/O GASP does not aid the user in the input of sets. 'CALL PELEMSK (s) ; ' (Print ELEMent and SKip to next line) will print the character string of the object s and the character string of all objects belonging to the set of s. EXAMPLE: DCL (SI, S2, El, E2, E3, E4) $NAME; $ALLOC (SI) ; $ALLOC (El); $ALLOC (E2); E3 = El; E4 = E2; S2 = SI; $CSES (S2, E3); $CHANGES (S2, E4 , ADD); 55 CALL PELEMSK (S2) ; END; would generate the output line SI =■ (El, E2) and skip to the next line. *$PUT (var-name);' is like 'PUT DATA (var-name) ; ' but with no restrictions on var-name. 'CALL ABDUMP;' dumps the entire data base (of objects). Regular PL/1 I/O is also available. 1.3.8. Expanding Operations 'EXPAND2 (subr, set)' is a name function which returns AC. Subr may be any name function (e.g., UNION, INTER, SYMDIF) which takes two names as arguments and performs a binary (usually associative and communtative) operation on their sets, returning the name of the set which holds the result. For example, if s = {el, e2, e3}, then EXPAND2 (UNION, s) is equivalent to UNION(el, UNION (e2, e3)). If s is empty, then EXPAND2 (subr, s) is empty, and if s = {el} then EXPAND2 (subr, s) = (the value of the set) el. 'CALL EXPAND1 (subr, set);' is a procedure call which can be used with any procedure subr which takes one name as input. EXPAND1 will call this routine with set as the argument, and then will call it with each object belonging to set as the argument. Useful choices for subr include PELEMSK, PUSH, and POP. 1.3.9. Loop Control '$FOR (q, x, s);' code '$END; ' allows code to be executed iteratively with each iteration having a different definition of the name x chosen from the set s. The quantifier, q, may 56 be any number, including ANY (equivalent to 1) and ALL (equivalent to cardinalit of set s). Code will be executed minimum (q, ALL) times. 's' may be any name or name function. Once the $FOR statement is executed, changing s will not affect or be affected by the iterations. The $FOR - $END pair is a PL/1 block, and the bound variable x is automatically declared within this block (it need not be declared before). The normal exit from a $FOR - $END section is to the next statement after $END. Any other jump outside must be expressed as 'GO_TO label ;'. '$4ALLPAIRS (bvl, bv2, s) ' code '$4APEND; * is similar to the $FOR statement except that code is executed for all possible unordered choices of bound variables bvl and bv2 subject to bvl, bv2 e s and bvl * bv2. Abnormal exits must be through the statement 'G0_2 label;'. 1.4. The GASP Statements for Graphs Nodes will be denoted by n, nl, n2; branches by b, graphs by g. 1.4.1. Truth Value Functions 'INCIDENT (n, b)' answers "is n incident to b?". 'ADJACENT (n, n2) ' answers "is n adjacent to n2?". 1.4.2. Simple Information Extraction To get the set of adjacent nodes or incident branches of a given n or the set of incident nodes of a given b, use the following name functions (all return AC) : 'SET_OF_INCIDENT (NODES, n) ' , 'SET_OF_INCIDENT (BRANCHES, n) ' , or 'SET OF INCIDENT (NODES, b) ' . 57 To get the set of nodes in g, use the name function (returns AC) '$NOF(g)'. Similarly, the branches of g are obtained by '$BOF(g)'. 'CALL GET_BAN (b, nl, n2, g) ;' defines nl and n2 to be the endpoints of b in g. 1.4.3. Advanced Graph Operations 'NBOUND (nodeset, g) ' is a name function which returns (AC) the set of all nodes of g which do not belong to nodeset but are adjacent to at least one node belonging to nodeset. 'BBOUND (nodeset, g)' is a name function which returns (AC) the set of all branches of g which have exactly one endpoint belonging to nodeset. 'CALL INTBANS (s, g) ;' is a procedure call which returns with s re- assigned the value of the subset of branches of g which have both endpoints belonging to s (a set of nodes at input time). 'D1ST (nl, n2, g) ' is an integer function which returns the distance from nl to n2 in g. 'CALL COLAPS (b, g) ; ' is a procedure call which changes g by merging the endpoints of b into a single node and removing any branches connecting those endpoints (such as b). 'CALL DELBAN (b, g) ;' is a procedure call which deletes all trace of b from g. 'CALL DELNOD (n, g) ; ' is a procedure call which deletes n and all of its incident branches from g. 1.4.4. Graph I/O 'CALL READGR (g) ;' is the procedure call to input g. The input format is a sequence of paths of node numbers (from 1 to the number of nodes) . A new path is begun by a minus sign in front of the starting node [only Euler graphs can be given by a single path]. The entire sequence is terminated by a zero. READGR also will output g (see below). 58 EXAMPLE: Given the output sequence 1, 2, 4, -2, 3, 4, 1, 0, READGR would create the graph shown in figure 12. Figure 12 'CALL DEF_BAN (b, nl, n2, g) ;' is a procedure call which will create a branch b connecting nl and n2 in g. ' $PUTGRAPH (g) ;' is the procedure call which outputs g as a set of nodes and branches. It is equivalent to 'CALL EXPAND1 (PELEMSK, g) ;'. 1.5. Measuring GASP Programs A count of the number of executions of each block of a GASP program is accomplished with the following statements [even though they are complete statements, they need not be followed by a semicolon]. '$DCLSTAT(k) ' declares k integers to be used for counting. '$STAT' is placed in each logical block to be counted. '$CLEARSTAT' initializes the counts to zero. '$0UTSTAT' prints out the k integers, in the order that the '$STAT"'s appeared (compilation-wise, not execution-wise). 1.6. Implementation Details 1.6.1. Data Structure A Universal SET (USET) contains all GASP objects. USET is a PL/1 struc- 59 ture subdivided into $TSIZE objects (level name is ELEMENT) of which $SIZE are unrestricted. The current systems has $SIZE»64 and $TSIZE»127, but these can be changed easily. The set part (SSET) of an object is a bit string of length $SIZE (this is the only reason for restricted objects: an increase in the number of restricted objects increases the memory requirement only linearly; an increase in unrestricted objects increases memory requirements quadratically) . The other parts of an object are CHARP (CHARacter string Part), INTP (INTeger Part), REFP (REFerence Part), RP_2 and RP_3 (Reference Part 2 and 3). PL/1 declarations for the various data types [1.2.1.] are as follows: $CHAR - CHAR (8), $INTEGER = BIN FIXED (15), $NAME = BIN FIXED, and $BIT = BIT (1). 1.6.2. How GASP Works GASP procedures which require only a line or so of code are translated by the PL/1 preprocessor. The identifiers which are translated by the GASP macros usually start with a '$'. Longer GASP procedures are incorporated into the programs as separate PL/1 procedures. The user has a choice of two methods which include these procedures in his program. The more efficient way is to include them as pre- compiled external procedures. The more flexible way is to have their source code inserted into the main program: this allows the user to set the limits $SIZE and $TSIZE to fit his needs. 1.6.3. Cost Parameters Since the PL/1 preprocessor and PL/1 compiler are used, compilation time is usually large, run time is usually reasonable. For example, a typical program took 20 seconds to compile, 6 seconds to execute. The core requirement for the basic GASP programs and data is around 120k bytes, a typical program might require a total of 150k bytes. 6o GASP macro definitions require 206 lines of PL/1 code, the source code of GASP procedures is around 240 lines. 1.6.4. Implementation Defects When coding a binary set operation (e.g., 'UNION (SI, S2)'), one must make sure that at least one of the arguments is not AC. 'G0_T0' and 'G0_2' are precompiled into more than one PL/1 statement and therefore should not appear immediately after a 'THEN' . 61 2. THE NEW ALGORITHM PROGRAMMED IN GASP NEW: PROC(GtN); DCL G $NAME, N SINTEGER; DCL ( SET_PJ t XJ_NODES, AJ, XJ, P(N) ) $NAME, ( IS_DISCON, D(N) ) $BIT, J SINTEGER; %DCL ( KJ, ZJ, SAVED, $CARD ) CHAR; /* USE PARTS OF OBJECTS */ % KJ = •INTP(XJ)' ; % ZJ = »REFP(AJ)« ; % $CARD = «INTP' ; % SAVED = 'RP_2' ; /* RP_2 POINTS TO THE SAVED VALUE OF OBJECTS */ IF N < 3 THEN RETURN ; $ALLOC ( TEM P ) ; $ALLOC(XJ ); $ALLOC(SET_PJ); $AL LOC ( X J_NODES ) 5$ALLOC( A J ) ;$ALLOC(YJ ) ; BOXl: J = 1 ; P(l) = NODES#(l); $CSES ( SET_PJ, P(l) ) ; SSTORES ( XJ, SET_OF_INCIDENT ( BRANCHES, P(l))) *, SSTORES ( YJ, $BOF ( G ) ) ; D( 1 ) = YES ; IS_DISCON = NO ; BOX2 : $STAT /* COUNT C2(G, NEW) */ IF D(J) THEN DO; $STAT SALLOC(ZJ); $STORES( ZJ, XJ ) ; $STORES(XJ_NODES, DIFF( EXPAND2 ( UN I ON , X J ) , SET_PJ ) ) ; /* NODE BOUNDARY ( P 1 , P2 , . . . , P J ) */ SCARD ( XJ_NODES ) = CARD ( XJ_NODES ) J END; KJ = ; BOX3 : IF - D( J ) THEN DO ; SSTORES ( TEMP, DIFF ( AJ, ZJ ) ) ; IF -n $EMPTY_( TEMP ) THEN DO; $STAT /* MOW PAY THE PRICE FOR INCORRECT XJ */ $STORES ( AJ, INTER ( AJ, ZJ ) ) ; SSTORES ( SAVED ( YJ ), UNION(YJ, TEMP) ) ; SSTORES ( SAVED ( XJ ), UN I ON ( SA VED ( XJ ) , TEMP) ) ; END; D ( J + 1 ) = YES ; GO TO BUMP ; END; /* FLSE IF D(J) THEN */ IF SCARD ( XJ_NODES ) > THEN D ( J + 1 ) = NO *, ELSE DO; $STAT D(J+1) = YES J SSTORES ( AJ, ZJ ) ; SKILL(ZJ) J END; BUMP: KJ = KJ + l ; CALL PUSH(AJ) ; J = J + 1 ; J_EO_N : IF J < N-l THEN GO TO BOX2 ; B0X4 : SSTAT /* CMG, NEW) */ 63 IF D(J) THEN $STORES ( AJ, YJ) ; ELSE SSTORES ( AJ, INTER(ZJ, YJ) ) ; /* 'OUTPUT Al X A2 X • . • X A(N-1)» COMES HERE */ BOX5 : J = J - 1 ; CALL POP(AJ); CALL POP(YJ); CALL POP(XJ); CALL POP ( X J_NODES ) ; SCHANGES ( SET_PJ, P(J+1), DELETE ) ; IF IS_DISCON THEN GO TO DISCON ; XJ_EMPTY_: IF $CARD (XJ_NODES) > THEN GO TO BOX3 ; J_EO_l : IF J = 1 THEN GO TO RETURN_ ; GO TO BOX5; niSCOM : $STAT IF O(J) THEN DO; $KILL(ZJ); $STAT END; IS_DISCON = ( KJ = 1 ) ; GO TO J_EO_l ; RETURN_ : $KILL(YJ); $KILL(AJ); $KILL(XJ); $K I LL ( X J_NODES ) ; $KILL(TEMP); $KILL(SET_PJ) ; RETURN; END NEW; 6k VITA The author, Stephen Martin Chase, was born in Urbana, Illinois, on September 21, 19^3* He received his Bachelor of Science degree in Mathematics in June 1965, and his Master of Science degree in Mathematics in June 1967 from the University of Illinois. From June 1965 to June 19J0, he was a research assistant in the Department of Computer Science of the University of Illinois at Urb ana- Champaign . In June 1970, he joined the research staff of the Thomas J. Watson Research Center in Yorktown Heights, New York. ■■-■■ tf0* \&tt