 
UIUCDCS-R-74-655

 June, 1974
 
 CLUSTERING BY CLIQUE GENERATION 
 Chih-Meng Cheng 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
 Submitted in partial fulfillment of the requirements for the degree of 
 Master of Science in Computer Science in the Graduate College of the 
 University of Illinois at Urbana-Champaign, June, 1974
 
 
 ACKNOWLEDGEMENT 
 
 I wish to thank my advisor, Professor D. S. Watanabe, for his constant 
 guidance, patience, and encouragement in the preparation of this thesis.
 
 Financial support from the Department of Computer Science of the 
 University of Illinois is also gratefully acknowledged. 
 
 
 TABLE OF CONTENTS

 1. INTRODUCTION

 2. BRON-KERBOSCH ALGORITHM

 2.1 Analysis

 2.2 Bron-Kerbosch Algorithm

 2.3 Moon-Moser Graphs

 3. IMPLEMENTATION

 3.1 Numerical Results

 LIST OF REFERENCES

 APPENDIX
 
 1. INTRODUCTION
 
 Given a set of objects each described by a vector of characteristics, 
 a clustering technique groups those objects with similar characteristics 
 together into subsets called clusters. The similarity criterion uses a
 distance function between objects; the appropriate function varies with the
 interpretation of the characteristic vectors for the set.
 
 Clustering techniques are useful in many areas. For example, they can 
 be used in medicine to identify new diseases and to refine existing disease 
 categories, in biology to develop taxonomies for plants and animals, and in 
 archaeology to classify artifacts with respect to period and style. 
 
 There is no general method which always yields useful clusters for an 
 arbitrary set of objects. Usually different techniques are tried, and often 
 relevant clusters can be obtained through comparison of the results. Two of 
 the most popular and effective techniques are the single-link method [5] 
 and clique generation. 
 
 In both methods, the set of objects is interpreted as an undirected 
 graph. For a given distance function, we can define a threshold $\delta$ such that
 if the distance between two objects is less than $\delta$, then the two objects are
 said to be similar. Using this concept, the set of objects can be interpreted 
 as a graph where nodes represent objects and edges join nodes representing 
 similar objects. 
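
 The construction just described can be sketched in a few lines. This is
 only an illustration under stated assumptions: the function names, the
 Hamming distance, and the four sample vectors are hypothetical and do not
 come from the report.

```python
from itertools import combinations

def threshold_graph(vectors, distance, delta):
    """Return adjacency sets: objects i and j are joined when distance < delta."""
    adj = {i: set() for i in range(len(vectors))}
    for i, j in combinations(range(len(vectors)), 2):
        if distance(vectors[i], vectors[j]) < delta:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# Hypothetical characteristic vectors, chosen only for illustration.
objects = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]
graph = threshold_graph(objects, hamming, delta=2)
```

 With these sample vectors, objects 0 and 1 and objects 1 and 2 are similar,
 while object 3 is similar to none of the others.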
 
 The single-link method is the simplest and oldest technique. In this 
 method a cluster is defined to be a connected component of the graph. A 
 connected component is a subgraph in which each pair of nodes is joined by 
 a path or sequence of edges. For objects which form distinct disconnected
 clusters, the single-link method often yields good results (see Figure 1a),
 and the simplicity of the method allows it to be applied efficiently to large 
 sets of objects. However, the obvious disadvantage of this method is the 
 chaining effect, where pairs of nodes may be joined by a path but the distance 
 between the objects they represent is large (see Figure 1b).

 Figure 1a. Single-link clusters

 Figure 1b. Chaining effect
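
 A minimal sketch of single-link clustering as a search for connected
 components follows. The adjacency-set representation and the small chain
 graph are assumptions made for illustration. The chain shows the chaining
 effect: nodes 0 and 2 are not adjacent, yet they fall in the same cluster.

```python
def connected_components(adj):
    """Single-link clusters: the connected components of an undirected graph."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first search from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Hypothetical graph: a chain 0 - 1 - 2 and an isolated node 3.
chain = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
clusters = connected_components(chain)
```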
 
 In clique generation a cluster is defined to be a maximal complete subgraph 
 or clique of the graph. A complete subgraph is one in which every node in the 
 subgraph is adjacent to every other node in the subgraph. A maximal complete 
 subgraph is a complete subgraph which is not properly contained in any other
 complete subgraph. The problem of finding all the cliques in an arbitrary
 graph is well known, and many algorithms have been proposed. The earliest
 were developed by Bonner and Bierstone [1], but the most efficient algorithm
 available was devised by Bron and Kerbosch [2]. Unfortunately, finding cliques
 is much more difficult than finding connected components. In fact, the problem
 of finding all the cliques in an arbitrary graph is polynomial-complete, and
 hence is equivalent in difficulty to the notoriously difficult
 "traveling-salesman" problem.
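
 The two definitions above translate directly into a membership test. The
 sketch below is illustrative only: the function names are hypothetical, and
 the edge list is the six-node example graph G of Figure 2.

```python
from itertools import combinations

def is_complete(adj, nodes):
    """Every node in `nodes` is adjacent to every other node in `nodes`."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def is_clique(adj, nodes):
    """Maximal complete subgraph: complete, and no outside node is adjacent to all of it."""
    nodes = set(nodes)
    if not is_complete(adj, nodes):
        return False
    return not any(nodes <= adj[v] for v in adj if v not in nodes)

# Edges of the example graph G (Figure 2).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (3, 6), (5, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)
```

 Here `{1, 2}` is complete but not a clique, because node 3 is adjacent to
 both of its nodes; `{1, 2, 3}` and `{2, 4}` are cliques.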
 
 These two methods lie at the extremes of any reasonable scale of cluster
 tightness: connected components are the loosest clusters that can be defined,
 and cliques are the tightest. Cliques contain more
 information about the structure of a graph than connected components, and 
 although they are often too tight to be used as clusters, they can form the 
 nuclei of useful clusters. Hence we will restrict our attention to clique 
 generation. 
 
 Chapter 2 presents a detailed study of the Bron-Kerbosch algorithm. The 
 algorithm is described, analyzed, and shown to be near optimal. Chapter 3 
 discusses the efficient implementation of the algorithm, describes an efficient 
 new implementation, and presents numerical results demonstrating the superiority 
 of the new implementation over previous ones. 
 
 
 2. THE BRON-KERBOSCH ALGORITHM 
 
 2.1 Analysis 
 
 Mulligan [4] studied the algorithms of Bonner, Bierstone, and Bron and
 Kerbosch in detail. His tests showed that the Bron-Kerbosch algorithm is
 fastest. To generate n cliques, it required time of $O(n^{1+\epsilon})$, where
 $\epsilon$ is a small positive quantity, while the other algorithms required
 time of $O(n^{[\ldots]})$. Unfortunately, the number of cliques in a graph
 generally is an exponential function of the number of nodes in the graph.
 Hence the speed of an algorithm is crucial.

 Bron and Kerbosch's presentation of their algorithm is not well motivated.
 Hence we will attempt to state clearly the motivation behind their algorithm 
 and to explain why it works so well. 
 
 Figure 2 illustrates how clique generation can be applied to a simple
 data set. Figure 2a presents the adjacency matrix A for the six objects in
 the set. Here $a_{ij}$ is 1 if and only if object i is similar to object j;
 otherwise $a_{ij}$ is 0. Figure 2b presents a graph G representing the data set.
 The nodes in G correspond to the objects in the set, and an edge connects any
 two nodes in G corresponding to similar objects. Figure 2c presents a graph
 T summarizing the possible ways of generating the cliques of G, starting with
 the empty set, $\phi$. Each node in T is a complete subgraph, and each edge from
 a node $\alpha$ in level $\ell$ to a node $\beta$ in level $\ell + 1$ is labeled with
 the node of G added to $\alpha$ to form $\beta$. A clique is generated by traversing
 a path or sequence of edges which terminates in a clique.
 
         1  2  3  4  5  6

     1   0  1  1  0  0  0
     2   1  0  1  1  0  0
     3   1  1  0  0  1  1
     4   0  1  0  0  0  0
     5   0  0  1  0  0  1
     6   0  0  1  0  1  0

 Figure 2a. Adjacency Matrix A

 Figure 2b. Graph G

 level 0   $\phi$
 level 1   {1} {2} {3} {4} {5} {6}
 level 2   {1,2} {1,3} {2,3} {2,4} {3,5} {3,6} {5,6}
 level 3   {1,2,3} {3,5,6}

 Figure 2c. Clique Generation Graph T

 Obviously one way to generate all the cliques is to visit every node and
 traverse all the paths in T. This approach is time-consuming and wasteful
 because most of the paths lead to cliques already generated. Most early
 algorithms perform an ordered traversal of T, but this is still wasteful
 because subsets of already generated cliques are repeatedly constructed.
 Bron and Kerbosch used a cleverer approach and were able to eliminate
 certain paths by applying the ideas formalized in the following lemmas.
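
 The waste in the exhaustive traversal can be made concrete by counting
 paths. The sketch below is an illustration, not part of the report: it
 traverses every path of T for the example graph G and compares the number
 of paths that end in a clique with the number of distinct cliques. The
 function name and the adjacency-set representation are assumptions.

```python
def count_paths_and_cliques(adj):
    """Traverse every path of T from the empty set.

    Returns (number of paths ending in a clique, number of distinct cliques).
    """
    cliques = set()
    paths = 0

    def extend(current, candidates):
        nonlocal paths
        if not candidates:      # no node is adjacent to all of `current`: a clique
            paths += 1
            cliques.add(frozenset(current))
            return
        for v in candidates:    # one edge of T per candidate node
            extend(current | {v}, candidates & adj[v])

    extend(frozenset(), set(adj))
    return paths, len(cliques)

# Edges of the example graph G (Figure 2).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (3, 6), (5, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)

paths, distinct = count_paths_and_cliques(adj)
```

 For G, every ordering of the nodes of a clique is a separate path, so the
 two three-node cliques and the one two-node clique are reached by
 3! + 3! + 2! = 14 paths in total.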
 
 Lemma 1. Suppose the paths from node $\alpha$ in T beginning with the edge
 labeled with node a of G have been explored so that all cliques containing
 $\alpha \cup \{a\}$ have been generated. Then only those paths from $\alpha$
 beginning with edges labeled with nodes of G not adjacent to a need be explored.
 Proof. Let C be any clique generated by exploring a path from $\alpha$ beginning
 with an edge labeled with a node adjacent to a. Then it must either contain
 or not contain nodes not adjacent to a. Suppose it contains such a node,
 say b. Then clearly it can be generated by exploring a path from $\alpha$
 beginning with the edge labeled with node b which is not adjacent to a. Suppose
 it contains no such nodes. It obviously contains $\alpha$, and it must contain a
 since all its other nodes by assumption are adjacent to a. Therefore it has
 already been generated. Q.E.D.

 Lemma 2. Suppose the paths from node $\alpha$ in T beginning with the edge labeled
 with node a of G have been explored so that all cliques containing
 $\alpha \cup \{a\}$ have been generated. Then at any node $\beta$ of T which
 properly contains $\alpha$, those paths beginning with an edge labeled with a
 can be ignored.
 Proof. Suppose a path from $\beta$ beginning with an edge labeled with a is
 explored. It must clearly terminate in a clique containing $\alpha \cup \{a\}$.
 But by assumption, all the cliques containing $\alpha \cup \{a\}$ have already
 been generated, and hence the path can be ignored. Q.E.D.
 
 The Bron-Kerbosch algorithm is recursive; upon arriving at a node in
 level $\ell$, the algorithm calls itself to explore the levels higher than $\ell$.
 Lemma 1 is applied at each level. Upon first arriving at node $\alpha$ in T at
 level $\ell$, the algorithm selects a node of G called FIXP that is adjacent to
 the most nodes adjacent to the partially constructed clique $\alpha$, moves to the
 node $\alpha \cup \{\mathrm{FIXP}\}$, and calls itself to construct all the cliques
 containing $\alpha \cup \{\mathrm{FIXP}\}$. This choice of FIXP eliminates the
 maximum number of paths from $\alpha$. Upon returning to node $\alpha$, the algorithm
 chooses a node of G called SEL that is adjacent to the nodes in $\alpha$ but not
 adjacent to FIXP, moves to the node $\alpha \cup \{\mathrm{SEL}\}$, calls itself to
 construct all the cliques containing $\alpha \cup \{\mathrm{SEL}\}$, and repeats
 this procedure for all such nodes. This process is illustrated
 for the graph G of Figure 2b in Figure 3.
 
 Lemma 1 cannot be used to eliminate all redundant edges. Note, for 
 example, that the edge labeled 6 from node {3} in Figure 3 is traversed 
 although it leads to the clique {3,5,6}, previously generated. However, 
 lemma 2 can be used to eliminate the edge labeled 5 from node {3,6}, and the 
 clique is not regenerated. 
 
 [ 1] Select 3 as FIXP at level 0, and move to {3}.
 [ 2] Select 1 as FIXP at level 1, and move to {1,3}.
 [ 3] Select 2 as FIXP at level 2, move to clique {1,2,3}, and back up to {3}.
 [ 4] Ignore 2 because 2 is adjacent to 1, FIXP at level 1.
 [ 5] Select 5 as SEL at level 1, and move to {3,5}.
 [ 6] Select 6 as FIXP at level 2, move to clique {3,5,6}, and back up to {3}.
 [ 7] Select 6 as SEL at level 1, and move to {3,6}.
 [ 8] Ignore 5 because 5 was selected at {3}, and back up to $\phi$.
 [ 9] Ignore 1, 2, 5, and 6 because 1, 2, 5, and 6 are adjacent to 3, FIXP at
      level 0.
 [10] Select 4 as SEL at level 0, and move to {4}.
 [11] Select 2 as FIXP at level 1, move to clique {2,4}, back up to $\phi$, and stop.

 Figure 3. Application of the Bron-Kerbosch Algorithm

 Lemma 1 is used only once at each level for the node FIXP. Conceivably
 it could be applied repeatedly for every node SEL at each level. However,
 if it is used to eliminate edges labeled with nodes adjacent to SEL, some
 previously ignored edges may have to be traversed. For example, if lemma 1
 is applied at node {3} in Figure 3 for node SEL, when SEL is 5, the edge
 labeled 6 can be ignored, but only if the previously ignored edge labeled 2
 is traversed. Hence, the lemma should be used selectively so that the number
 of new edges to be traversed is less than the number of edges to be eliminated.
 
 The elimination of all redundant edges requires additional tests. Thus 
 if lemma 1 is applied at node {3} for node 5, the edge labeled 2 must be
 traversed to generate the clique {2,3,6}, if this clique exists. However, if
 a test reveals that nodes 2 and 6 are not adjacent, this clique cannot exist, 
 and the edges labeled 2 and 6 can both be ignored. 
 
 These modifications were incorporated into the Bron-Kerbosch algorithm, 
 but the performance of the algorithm was not improved because the time 
 required to perform the additional tests was comparable to that required to 
 traverse the redundant edges. Therefore it seems unlikely that the Bron- 
 Kerbosch algorithm can be improved significantly. 
 
 
 2.2 Bron-Kerbosch Algorithm 
 
 The following formulation of the Bron-Kerbosch algorithm is very similar
 to Mulligan's formulation.
 
 ALGORITHM_BRON-KERBOSCH: PROCEDURE;

 DECLARE S the set of all data nodes,
         NIL the empty set,
         C a global integer variable,
         COMPSUB a global set of nodes;

         /* COMPSUB is a complete subgraph containing C nodes */
 STEP_1: /* Initially COMPSUB is empty, none of the nodes have been explored,
            and all nodes are candidates which can be added to COMPSUB.
            Hence call EXTEND with arguments NIL and S. */
         C = 0;
         COMPSUB = NIL;
         CALL EXTEND (NIL,S);
 EXTEND: PROCEDURE (EXPL,CAND) RECURSIVE;
 DECLARE EXPL a local set of nodes,
         /* EXPL = {a ∈ S | a adjacent to all the nodes ∈ COMPSUB, and
            all cliques containing COMPSUB ∪ {a} have been generated}.
            Nodes in EXPL are not added to COMPSUB because this would
            lead to cliques previously generated. */
         CAND a local set of nodes,
         /* CAND = {a ∈ S | a adjacent to all nodes ∈ COMPSUB,
            a ∉ EXPL}, the set of nodes called candidates that can be
            added to COMPSUB to form new complete subgraphs. */
         NEXPL a local set of nodes,
         /* NEXPL = {a ∈ EXPL | a adjacent to SEL}, the new set of
            explored nodes constructed in STEP_3 of EXTEND for the
            next recursive call to EXTEND. */
         NCAND a local set of nodes,
         /* NCAND = {a ∈ CAND | a adjacent to SEL, a ≠ SEL}, the
            new set of candidates constructed in STEP_3 of EXTEND
            for the next recursive call to EXTEND. */
         FIXP a local variable representing one node,
         /* FIXP is the first node ∈ EXPL ∪ CAND adjacent to the
            most nodes ∈ CAND. */
         SEL a local variable representing one node;
         /* SEL is a node ∈ CAND selected to be added to COMPSUB. */
 STEP_2: /* Choose FIXP and SEL. */
         FIXP = first node ∈ EXPL ∪ CAND adjacent to the most nodes ∈ CAND;
         IF FIXP ∈ EXPL THEN
            IF there are nodes ∈ CAND not adjacent to FIXP
               THEN SEL = first such node;
               /* Otherwise FIXP is adjacent to every candidate, and
                  no new clique can be formed. */
               ELSE RETURN;
         ELSE SEL = FIXP;
 STEP_3: /* Add SEL to COMPSUB, increment C, and construct NEXPL and NCAND.
            Note that the number of candidates decreases for each call to
            EXTEND; hence EXTEND always returns. */
         NEXPL = {a ∈ EXPL | a adjacent to SEL};
         NCAND = {a ∈ CAND | a adjacent to SEL, a ≠ SEL};
         COMPSUB = COMPSUB ∪ {SEL};
         CAND = CAND - {SEL};
         EXPL = EXPL ∪ {SEL};
         C = C + 1;
 STEP_4: /* If NCAND and NEXPL are empty, a clique has been generated. */
         IF (NEXPL = NIL) & (NCAND = NIL) THEN print the clique of C nodes
            contained in COMPSUB;
 STEP_5: /* If NCAND is not empty, COMPSUB can be extended further. */
         IF NCAND ¬= NIL THEN CALL EXTEND (NEXPL,NCAND);
 STEP_6: /* Either NEXPL is not empty and NCAND is empty which implies
            that a previously generated clique is being constructed, or
            a new clique has been printed, or a successful return from
            EXTEND has occurred. Hence back up by removing SEL from
            COMPSUB. If possible, select a new SEL and attempt to
            generate more cliques. Otherwise return. */
         C = C - 1;
         COMPSUB = COMPSUB - {SEL};
         IF there are nodes ∈ CAND not adjacent to FIXP
            THEN select the first such node as SEL and go to STEP_3;
            ELSE RETURN;
 END EXTEND;
 END ALGORITHM_BRON-KERBOSCH;
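
 For readers who prefer running code, the formulation above can be sketched
 in Python. This is an illustrative translation, not the implementation
 discussed in Chapter 3. It assumes that the graph is a dictionary of
 adjacency sets with sortable node names and no self-loops, and it tries the
 eligible candidates in sorted order rather than trying FIXP first, which
 changes the order of the output but not the set of cliques.

```python
def bron_kerbosch(adj):
    """Yield every clique of an undirected graph given as {node: set(neighbours)}."""
    compsub = []                                  # the partial clique COMPSUB

    def extend(expl, cand):
        # FIXP: first node of EXPL ∪ CAND adjacent to the most candidates.
        fixp = max(sorted(expl | cand), key=lambda v: len(adj[v] & cand))
        # Lemma 1: only candidates not adjacent to FIXP need be tried
        # (FIXP itself is included when it is a candidate).
        for sel in sorted(cand - adj[fixp]):
            nexpl = expl & adj[sel]
            ncand = cand & adj[sel]
            compsub.append(sel)
            if not nexpl and not ncand:
                yield list(compsub)               # a new clique
            elif ncand:
                yield from extend(nexpl, ncand)
            compsub.pop()                         # back up
            cand = cand - {sel}
            expl = expl | {sel}                   # Lemma 2: never select sel again

    if adj:
        yield from extend(set(), set(adj))

# Edges of the example graph G (Figure 2).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (3, 6), (5, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)

cliques = sorted(sorted(c) for c in bron_kerbosch(adj))
```

 On the example graph this sketch selects 3 as FIXP at level 0 and then
 tries only nodes 3 and 4 there, as in the trace of Figure 3.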
 
 2.3 Moon-Moser Graphs
 
 Bron and Kerbosch tested their algorithm on the Moon-Moser graphs [3]
 which contain more cliques per node than any other graphs. These graphs
 have 3k nodes grouped into k triplets, and each node is adjacent to every
 other node except the two nodes in the same triplet. The graph with 3k
 nodes contains $3^k$ cliques. They found that their algorithm constructed
 all the cliques in the Moon-Moser graph with 3k nodes in time of $O(3.14^k)$.
 We can gain some insight into this result by counting the number of
 comparisons required to generate the cliques.
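
 The structure of these graphs is easy to check for small k. The sketch
 below is an illustration with hypothetical function names: it builds the
 graph from the description above and confirms that each choice of one node
 per triplet is a clique, which gives $3^k$ cliques.

```python
from itertools import combinations, product

def moon_moser(k):
    """Adjacency sets for nodes 1..3k; nodes are adjacent unless they share a triplet."""
    nodes = range(1, 3 * k + 1)
    return {v: {u for u in nodes if (u - 1) // 3 != (v - 1) // 3} for v in nodes}

def one_per_triplet(k):
    """All ways of choosing one node from each of the k triplets."""
    triplets = [range(3 * t + 1, 3 * t + 4) for t in range(k)]
    return [set(choice) for choice in product(*triplets)]

def is_clique(adj, nodes):
    """True if `nodes` is complete and no outside node is adjacent to all of it."""
    complete = all(v in adj[u] for u, v in combinations(nodes, 2))
    extendable = any(nodes <= adj[v] for v in adj if v not in nodes)
    return complete and not extendable

k = 3
graph = moon_moser(k)
cliques = one_per_triplet(k)
```

 No clique can contain two nodes of the same triplet, since such nodes are
 not adjacent, so these choices account for all the cliques of the graph.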
 
 For a Moon-Moser graph with 3k nodes grouped into the triplets {1,2,3},
 {4,5,6}, ..., {3k-2, 3k-1, 3k}, the number of comparisons, $C_k$, is as follows.

 Operation                       Comparisons

 Find FIXP                       3k(3k-1)
 Construct lists EXPL,CAND       3k-1
 Call EXTEND                     $C_{k-1}$
 Find next SEL                   1
 Construct lists NEXPL,NCAND     3k-1
 Call EXTEND                     $C_{k-1}$
 Find next SEL                   1
 Construct lists NEXPL,NCAND     3k-1
 Call EXTEND                     $C_{k-1}$

 This is the best case because the choices for SEL are 2 and 3, and finding the
 next SEL requires only one comparison. Summing these counts yields

 $$C_k = 3C_{k-1} + 9k^2 + 6k - 1 .$$

 This linear difference equation is easily solved giving

 $$C_k = 3^k \sum_{i=1}^{k} (9i^2 + 6i - 1)\, 3^{-i} .$$

 The worst case occurs when the choices for SEL are 3k-1 and 3k. In this case

 $$C_k = 3^k \sum_{i=1}^{k} (9i^2 + 12i - 7)\, 3^{-i} .$$

 As $k \to \infty$, both sums converge yielding

 $$C_k^{\mathrm{best}} \sim 3^{k+2.6053}, \qquad C_k^{\mathrm{worst}} \sim 3^{k+2.6801} .$$

 If the number of comparisons is an adequate measure of the time required by
 the algorithm, then the Bron-Kerbosch algorithm operates at the theoretical
 limit of $O(3^k)$.
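
 The recurrence and its solution can be checked numerically. The sketch
 below is illustrative and uses hypothetical function names. It assumes
 $C_0 = 0$, iterates the recurrence with exact integers, compares it with
 the closed-form sum, and evaluates $\log_3(C_k / 3^k)$ for a moderately
 large k. The infinite sums are 17.5 and 19, so the limiting offsets are
 $\log_3 17.5$ and $\log_3 19$.

```python
from fractions import Fraction
from math import log

BEST = (9, 6, -1)     # coefficients of 9k^2 + 6k - 1
WORST = (9, 12, -7)   # coefficients of 9k^2 + 12k - 7

def comparisons(k, a, b, c):
    """C_k from the recurrence C_k = 3*C_{k-1} + a*k^2 + b*k + c, with C_0 = 0."""
    total = 0
    for i in range(1, k + 1):
        total = 3 * total + a * i * i + b * i + c
    return total

def closed_form(k, a, b, c):
    """C_k = 3^k * sum_{i=1..k} (a*i^2 + b*i + c) * 3^(-i), evaluated exactly."""
    s = sum(Fraction(a * i * i + b * i + c, 3 ** i) for i in range(1, k + 1))
    return 3 ** k * s

def exponent_offset(k, coeffs):
    """log_3(C_k / 3^k), which tends to a constant as k grows."""
    return log(comparisons(k, *coeffs), 3) - k

best_offset = exponent_offset(40, BEST)
worst_offset = exponent_offset(40, WORST)
```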
 
 3. IMPLEMENTATION
 
 In most areas where clustering techniques are applied, large amounts of 
 data are generally processed. Clique generation is often used as an important 
 first step in classifying the data. To make clique generation practical for 
 large data sets, it is essential to develop an efficient implementation of the 
 fastest available algorithm, the Bron-Kerbosch algorithm. 
 
 Bron and Kerbosch implemented their algorithm in Algol, while Mulligan 
 implemented it in PL/I. Since their implementations are identical, we will 
 restrict our attention to Mulligan's. 
 
 Mulligan's implementation is fairly fast. However, since enormous
 amounts of time are generally required to process large graphs, even a modest
 improvement in performance is of practical significance. In his implementation,
 EXPL and CAND are concatenated into a single vector of integers with a pointer
 indicating the boundary between the two lists. A selected candidate is
 transferred from the candidate list to the explored list by exchanging it with
 the first node in the candidate list and incrementing the pointer by one. This
 data structure has certain advantages. Additions to and deletions from the
 lists are simple, and the determination of the list contents is trivial.
 However, it complicates the execution of the principal operations of finding
 FIXP and constructing the lists NEXPL and NCAND. These operations must be
 performed serially node by node in loops. They could be speeded up if the lists
 were sorted, but this would require a more elaborate list structure, and
 additions to and deletions from the lists would be more complicated.
 
 We observed that the principal operations can be written in terms of set
 intersections as follows:

 FIXP = first node i ∈ EXPL ∪ CAND for which CAND ∩ {nodes ∈ S adjacent
        to i} has maximum number of nodes,
 NEXPL = EXPL ∩ {nodes ∈ S adjacent to SEL},
 NCAND = CAND ∩ {nodes ∈ S adjacent to SEL}.

 If the lists are represented by bit strings, then intersections of the lists
 can be computed rapidly using boolean operations which perform blocks of
 comparisons in parallel. Hence we chose to represent the lists of candidates
 and explored nodes and the rows of the adjacency matrix for an m-node graph
 by m-bit strings. This new data structure speeds up the principal operations,
 but it also creates new problems. The determination of the names and the
 number of nodes in a list, formerly trivial, now is fairly difficult. It
 would defeat the purpose of the new data structure to perform these operations
 bit by bit in a high-level language. Hence we chose to implement these
 operations in a low-level language. An efficient subroutine can be written
 to count the one bits in a bit string using the IBM/360 logical instruction
 "translate and test", which maps a byte into a table. Unfortunately, there
 is no IBM/360 instruction which extracts the locations of the one bits in a
 string. However, a subroutine which rapidly extracts the one bit locations
 can be written using a fast register to register add instruction.
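
 The bit-string representation can be illustrated with Python integers
 standing in for the m-bit strings. This is a sketch of the idea only and
 not a transcription of the PL/I and Assembler code in the Appendix; the
 helper names are hypothetical, and bit v of an integer represents node v.

```python
def to_bits(nodes):
    """Bit string with a one bit at position v for every node v."""
    bits = 0
    for v in nodes:
        bits |= 1 << v
    return bits

def count_ones(bits):
    """Number of one bits in a bit string."""
    return bin(bits).count("1")

def one_positions(bits):
    """Positions of the one bits, lowest first."""
    positions = []
    while bits:
        low = bits & -bits                    # isolate the lowest one bit
        positions.append(low.bit_length() - 1)
        bits ^= low
    return positions

# Rows of the adjacency matrix of the example graph G (Figure 2) as bit strings.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (3, 6), (5, 6)]
connected = {v: 0 for v in range(1, 7)}
for u, v in EDGES:
    connected[u] |= 1 << v
    connected[v] |= 1 << u

cand, expl = to_bits(range(1, 7)), 0
# FIXP: first node of EXPL ∪ CAND whose row intersects CAND in the most bits.
fixp = max(one_positions(expl | cand), key=lambda i: count_ones(cand & connected[i]))
sel = fixp
nexpl = expl & connected[sel]                 # NEXPL = EXPL ∩ row of SEL
ncand = cand & connected[sel]                 # NCAND = CAND ∩ row of SEL
```

 Each intersection is a single boolean operation on whole words, whereas
 counting and extracting the one bits are the operations that need special
 care, as the text explains.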
 
 We implemented the basic algorithm in PL/I. A listing of the PL/I
 procedure EXTEND and the Assembler subroutines is presented in the Appendix. 
 
 3.1 Numerical Results 
 
 The new implementation was compared to Mulligan's on several graphs.
 Unfortunately, accurate timing results could not be obtained because of the
 local multiprogramming environment.

 Some typical results are shown in Tables 1 and 2. Table 1 presents
 results for the Moon-Moser graphs with 3k nodes. Since the time estimates
 are contaminated with random errors, least squares approximations of the
 form ak + b were fitted to the logarithms of the times. These approximations
 indicate that the time required to generate $3^k$ cliques is proportional to
 $3.00^k$ for Mulligan's implementation and to $2.99^k$ for the new
 implementation. Given the timing errors, we can conclude that the actual
 execution time is probably proportional to $3^k$.

 Table 2 presents results obtained using data from a color-shape
 preference test for preschool children. Each child's performance is
 described by a characteristic vector of 72 bits. Two performances were
 judged to be similar if $\delta$ or more bits in the corresponding vectors matched.
 We analyzed an 80 node graph summarizing the data for 80 children. As the
 threshold $\delta$ decreases, the number of edges in the graph grows, and the number
 of cliques increases rapidly.
 
 In both examples, the new implementation is superior to Mulligan's. 
 Although the improvement in performance is relatively modest, it is 
 significant in view of the high cost of analyzing large data sets. 
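
 The least squares fit of ak + b to the logarithms of the times can be
 sketched as follows. This is an illustration with a hypothetical function
 name. The times are the five values per implementation as read from
 Table 1, and some digits of the scanned table are uncertain. Since the
 table shows only typical results, the factors computed from these five
 points need not reproduce the quoted figures exactly, but both come out
 close to 3.

```python
from math import exp, log

def fitted_base(ks, times):
    """Least squares fit of log(time) = a*k + b; returns exp(a), the growth factor per unit k."""
    n = len(ks)
    ys = [log(t) for t in times]
    k_mean = sum(ks) / n
    y_mean = sum(ys) / n
    slope = (sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
             / sum((k - k_mean) ** 2 for k in ks))
    return exp(slope)

# Values as read from Table 1 (times in seconds).
ks = [5, 6, 7, 8, 9]
mulligan = [1.43, 5.02, 14.13, 42.44, 129.82]
present = [0.84, 2.48, 7.43, 22.94, 66.34]

base_mulligan = fitted_base(ks, mulligan)
base_present = fitted_base(ks, present)
```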
 
                 Time (Seconds)
    k        Mulligan      Present

    5           1.43          .84
    6           5.02         2.48
    7          14.13         7.43
    8          42.44        22.94
    9         129.82        66.34

 Table 1. Moon-Moser Graphs with 3k Nodes

                 Number            Time (Seconds)
 Threshold     of Cliques      Mulligan      Present

    33             165            8.78         1.42
    31             315           14.39         2.87
    29             730           31.25         6.10
    27            2348           95.10        20.57
    25            7505          346.78        66.78

 Table 2. Color-Shape Preference Test Graph
 
 
 LIST OF REFERENCES 
 
 1. Augustson, J. G., and Minker, J., "An analysis of some graph theoretical
 cluster techniques," J. ACM 17, 571-588, 1970.

 2. Bron, C., and Kerbosch, J., "Finding all cliques of an undirected graph,"
 Comm. ACM 16, 575-577, 1973.

 3. Moon, J. W., and Moser, L., "On cliques in graphs," Israel J. Math. 3,
 23-28, 1965.

 4. Mulligan, G. D., Algorithms for finding cliques of a graph, Technical
 Report 41, Department of Computer Science, University of Toronto, 1972.

 5. Sibson, R., "Single-link cluster method," Comp. J. 16, 30-32, 1973.
 
 
 APPENDIX 
 
 PL/I and ASSEMBLER PROGRAMS 
 
 EXTEND:
   PROCEDURE(CAND,EXPL,N) RECURSIVE;

   /* RECURSIVE PROCEDURE GENERATING ALL POSSIBLE CLIQUES EXTENDED
      FROM THE PARTIAL SOLUTION IN "COMPSUB" USING "CAND"
      INITIALLY, CAND CONTAINS 1 BITS FOR ALL NODES PRESENT
                 EXPL IS THE NIL OR ZERO BIT STRING
      GLOBALLY DEFINED VARIABLES ARE:
        PRNTFLG   - BIT 1 => CLIQUES ARE TO BE PRINTED
                    BIT 0 => CLIQUES ARE NOT PRINTED JUST
                             COUNTED IN NUMOUT
        NUMOUT    - COUNTER OF CLIQUES WHEN PRNTFLG IS BIT 0
        CONNECTED - N DIMENSIONAL VECTOR, EACH ELEMENT IS
                    OF N BITS. ( ADJACENCY MATRIX )
                    CONNECTED(J) - BIT STRING REPRESENTING
                                   NODES ADJACENT TO J
        VERTEX    - N DIMENSIONAL VECTOR LIKE CONNECTED
                    VERTEX(J) IS OF N BITS WITH A 1 BIT IN THE
                    JTH POSITION ONLY */

   /* ASSEMBLER SUBROUTINES */

   /* COUNT(NB,STR,CT) IS A SUBROUTINE THAT COUNTS UP THE
      NUMBER OF 1 BITS IN A BIT STRING:
      CT = # OF 1 BITS IN BIT STRING STR, WHERE STR IS OF LENGTH
      NB BYTES */

   DCL COUNT OPTIONS(ASM)
         ENTRY(FIXED BIN(15,0), BIT(*), FIXED BIN(15,0)),
   /* XTRACT(NB,STR,LIST,M) IS A SUBROUTINE THAT EXTRACTS THE
      POSITIONS OF 1 BITS IN A BIT STRING
      LIST = LIST OF POSITIONS OF 1 BITS IN THE BIT STRING STR,
             WHERE STR HAS LENGTH NB BYTES, AND
      M = NUMBER OF ELEMENTS IN LIST */
       XTRACT OPTIONS(ASM)
         ENTRY(FIXED BIN(15,0), BIT(*), (*) FIXED BIN(15,0),
               FIXED BIN(15,0)),
       CAND BIT(*),               /* CANDIDATES PASSED */
       EXPL BIT(*),               /* EXPLORED NODES PASSED */
       (NCAN,NEXP) BIT(N),        /* NEW CAND, NEW EXPL */
       NB FIXED BIN(15,0),        /* NO. OF BYTES IN BIT STRINGS */
       ASEL BIT(N),               /* LIST OF FUTURE SELECT NODES */
       FIXP FIXED BIN(15,0),      /* MOST CONNECTED NODE W.R.T. THE
                                     CANDIDATES */
       RLIST(N) FIXED BIN(15,0),  /* RETURN LIST FROM ASM ROUTINES */
       ZERO BIT(N),               /* ZERO BIT STRING */
       /* CT, CNT ARE COUNTERS OF 1 BITS
          SEL IS THE SELECTED NODE
          IFL IS A FLAG
          OTHERS ARE INDEXING VARIABLES */
       (I,CNT,CT,IFL,IS,SEL,J,M) FIXED BIN(15,0);

   ASEL = ASEL & ¬ASEL;           /* INITIALIZE */
   ZERO=ASEL;
   IFL=0;
   NB=N/8;                        /* ASSUME N DIVISIBLE BY 8 */
   CNT=0;
   /* GOING THRU CAND AND EXPL LISTS TO FIND THE NODE THAT IS
      CONNECTED TO THE MOST CANDIDATES */
   IF ZERO=EXPL THEN GO TO SKP1;
   /* INTEGER REPRESENTATION OF ALL
      EXPL NODES INTO RLIST */
   CALL XTRACT(NB,EXPL,RLIST,M);
   DO I=1 TO M;                   /* SEARCH THRU EXPL LIST */
     J=RLIST(I);
     /* COUNT CONNECTIONS TO CANDS. */
     CALL COUNT(NB,CONNECTED(J) & CAND,CT);
     IF CT>CNT THEN               /* FIND MOST CONNECTED NODE */
       DO;
         CNT=CT; FIXP=J; ASEL = ¬CONNECTED(J) & CAND;
       END;
   END;
 SKP1: IF ZERO=CAND THEN GO TO SKP2;
   CALL XTRACT(NB,CAND,RLIST,M);  /* EXTRACT NODES FROM CAND LIST */
   DO I=1 TO M;                   /* SEARCH THRU CAND LIST */
     J=RLIST(I);
     CALL COUNT(NB,CONNECTED(J) & CAND,CT);
     IF CT>CNT THEN
       DO;
         CNT=CT; FIXP=J; IFL=1; ASEL = ¬CONNECTED(J) & CAND;
       END;
   END;
 SKP2: CALL XTRACT(NB,ASEL,RLIST,M); IS=1;
   IF IFL=0 THEN                  /* IFL=0 => FIXP IS IN EXPL LIST */
     DO;                          /* SELECTED NODE MUST BE A CAND.
                                     INSTEAD OF FIXP, CHOOSE A CAND.
                                     NOT CONNECTED TO FIXP */
       SEL=RLIST(IS); IS=IS+1;
     END;
   ELSE SEL=FIXP;                 /* ELSE FIXP IS A CAND., CHOOSE
                                     FIXP */
   DO WHILE (IS <= M+1);          /* WHILE THERE ARE STILL SEL'S */
     /* CONSTRUCT NEW CAND, EXPL */
     NEXP = EXPL & CONNECTED(SEL);
     NCAN = CAND & CONNECTED(SEL) & ¬VERTEX(SEL);
     C=C+1;
     COMPSUB(C)=SEL;              /* ADD SELECTED NODE TO PARTIAL
                                     SOLUTION OF CLIQUE */
     /* NO MORE CANDIDATES AND EXPLORED
        NODES => A CLIQUE IS FOUND */
     IF (NCAN | NEXP) = ZERO THEN
       IF PRNTFLG THEN
         PUT EDIT((COMPSUB(I) DO I=1 TO C)) (SKIP,(C)F(3));
       ELSE NUMOUT=NUMOUT+1;
     /* IF THERE ARE MORE CANDIDATES TO
        BE EXPLORED, CALL EXTEND TO GO
        DOWN ONE MORE LEVEL */
     ELSE IF NCAN ¬= ZERO THEN CALL EXTEND(NCAN,NEXP,N);
     C=C-1;                       /* DELETE EXPLORED NODE FROM
                                     CLIQUE */
     /* DELETE EXPLD. NODE FROM CAND */
     CAND = CAND & ¬VERTEX(SEL);
     /* ADD EXPLORED NODE TO EXPL */
     EXPL = EXPL | VERTEX(SEL);
     SEL=RLIST(IS);               /* SELECT ANOTHER NODE OUT OF LIST
                                     OF CANDIDATES NOT CONNECTED TO
                                     FIXP */
     IS=IS+1;
   END;
 END EXTEND;
 
 COUNT    BEGIN
          L     3,0(0,1)          R3=ADDR(LENGTH OF STRING IN BYTES)
          L     4,4(0,1)          R4=ADDR(STRING)
          L     5,8(0,1)          R5=ADDR(COUNT)
          LA    9,1               R9=CONSTANT 1
          LA    8,0               R8=COUNT REGISTER
          LH    6,0(0,3)          OBTAIN LENGTH OF STRING
          AR    6,4               R6=ADDR OF END OF STRING
          SR    4,9               SCAN START ONE BYTE TO THE LEFT
          LA    2,0               CLEAR R2,R1
          LA    1,0
 LOOP     TRT   1(256,4),TABLE    TRANSLATE AND TEST
          CR    6,1               HAS SCAN REACHED END?
          BC    12,OUT
          AR    8,2               NO, ADD TO COUNT REGISTER
          LR    4,1               RESET SCAN ADDR
          B     LOOP              BRANCH BACK
 OUT      STH   8,0(0,5)          YES. STORE INTO COUNT
          LEAVE
 TABLE    DC    X'000101020102020301020203020303040102020302030304'
          DC    X'020303040304040501020203020303040203030403040405'
          DC    X'020303040304040503040405040505060102020302030304'
          DC    X'020303040304040502030304030404050304040504050506'
          DC    X'020303040304040503040405040505060304040504050506'
          DC    X'040505060506060701020203020303040203030403040405'
          DC    X'020303040304040503040405040505060203030403040405'
          DC    X'030404050405050603040405040505060405050605060607'
          DC    X'020303040304040503040405040505060304040504050506'
          DC    X'040505060506060703040405040505060405050605060607'
          DC    X'04050506050606070506060706070708'
          END
 
 XTRACT   BEGIN
          L     2,0(0,1)          R2=ADDR OF N, # OF BYTES
          L     3,4(0,1)          R3=ADDR OF STRING
          L     4,8(0,1)          R4=ADDR OF PASSED LIST
          LA    9,8               R9=CONSTANT 8
          LA    6,1               R6=CONSTANT 1
          LA    5,0               R5=BYTE COUNT
          LA    8,0               R8=BIT COUNT
          LA    10,2              R10=CONSTANT 2
 LOP      IC    7,0(0,3)          PUT ONE BYTE IN R7
          SLL   7,24              SHIFT LEFT
 LOP1     AR    8,6               UP BIT COUNTER
          ALR   7,7               SHIFT LEFT ONE
          BC    12,NOAD           CARRY? NO, GO TO NOAD
          STH   8,0(0,4)          YES. STORE BIT INDEX
          BC    2,NXT             ZERO? YES, NEXT BYTE
          AR    4,10              READY FOR NEXT STORE
          B     LOP1              BACK FOR ANOTHER SHIFT
 NXT      AR    4,10              READY FOR NEXT STORE
          B     NEXT
 NOAD     BC    4,LOP1            NO CARRY. NOT ZERO. SHIFT AGAIN
 NEXT     AR    5,6               UP COUNT OF BYTES
          LR    8,5               RESET BIT COUNTER
          SLA   8,3
          AR    3,6               NEXT BYTE ON STRING
          CH    5,0(0,2)          # OF BYTES EXCEED LENGTH?
          BC    4,LOP             NO. GET ANOTHER BYTE
          L     7,12(0,1)         R7 = ADDR OF FOURTH PARAMETER
          S     4,8(0,1)          COUNT # OF INDEX STORED
          SRA   4,1
          STH   4,0(0,7)          STORE AND RETURN
          LEAVE
 