UIUCDCS-R-77-905                                   UILU-ENG 77 1759

Average Analysis of Simple Path Algorithms

Yehoshua Perl*
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

November 1977

*On leave from the Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel.
**This work was supported in part by the National Science Foundation under Grant No. NSF MCS-73-03408.

Abstract

Given a graph of n vertices and e edges, the average complexity of several known simple path algorithms is analyzed. The average and the standard deviation of the number of edges scanned to find the target vertex in both breadth first search and depth first search are shown to be of order n. Both the average and the variance of Prim's minimum spanning tree algorithm are shown to require O(n lg n ln(e/n)) time. The same result holds for Dijkstra's shortest path algorithm. Kruskal's minimum spanning tree algorithm, which competes with Prim's algorithm, requires O(n ln n lg e) on the average. The connection to related results is discussed.

1. Introduction

Given an undirected graph G(V,E) of n vertices and e edges, we discuss the average complexity of several known path algorithms: breadth first search (BFS), depth first search (DFS), Prim's and Kruskal's minimum spanning tree (MST) algorithms, and Dijkstra's shortest path algorithm. The analysis for directed graphs is essentially the same.

Recently there have been attempts to design new algorithms which are very efficient on the average even though they are no better in the worst case; for example, the transitive closure algorithm of Bloniarz, Fischer and Meyer [1] and the algorithm of Spira [15] for finding shortest paths between all pairs of vertices in a graph. Such works are motivated by the fact that in practice the average complexity of an algorithm is often more relevant than its worst case complexity. We analyze the average complexity of several known algorithms because, while designing new algorithms which are efficient on the average, we should know the average complexity of the known algorithms in order to compare them.

Average analysis of simple algorithms might also lead to the analysis of more complicated algorithms. Moreover, simple algorithms are sometimes applied as procedures in compound algorithms, and their analysis may help to analyze the complexity of the compound algorithms. For example, Dinic's maximum flow algorithm [5] applies both BFS and DFS.

We analyze the average complexity of the algorithms for a random graph, assuming equal probability over all graphs of n vertices and e edges, and assuming that the lengths of the edges are independently chosen from a non-negative distribution. We also assume that for every vertex a list of its edges is given. The connection of our results to the related works of Bloniarz, Fischer and Meyer [1], Spira [15] and Johnson [8] is discussed.
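The random graph model just described is easy to make concrete. The following Python sketch is our illustration, not part of the report; the function name random_graph and the choice of an exponential length distribution are assumptions. It draws a graph uniformly from all graphs with n vertices and e edges, attaches i.i.d. non-negative lengths, and returns the per-vertex edge lists the analysis assumes.

```python
import itertools
import random

def random_graph(n, e, length_dist=lambda: random.expovariate(1.0)):
    """Sample uniformly from all simple graphs with n vertices and e edges,
    with i.i.d. non-negative edge lengths. Illustrative sketch only; the
    exponential distribution is an arbitrary choice satisfying the model's
    assumption of a non-negative length distribution."""
    all_pairs = list(itertools.combinations(range(n), 2))
    chosen = random.sample(all_pairs, e)   # uniform over all e-edge subsets
    adj = {v: [] for v in range(n)}        # a list of its edges per vertex
    for u, v in chosen:
        l = length_dist()                  # independent non-negative length
        adj[u].append((v, l))
        adj[v].append((u, l))
    return adj
```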
Let d(v) denote the degree of a vertex v and l(u,v) the length of an edge (u,v). By ln x and lg x we denote the natural and binary logarithms, respectively.

2. Breadth First Search and Depth First Search

The two common procedures for scanning a graph are breadth first search (BFS) and depth first search (DFS). Both of them require, in the worst case, scanning all the edges of the graph. We want to find the average number of edges that must be scanned in order to find a path from a source vertex s to a target vertex t. Actually, we need the average number of edges scanned until we reach the vertex t for the first time, since from then on one can backtrack along the edges of the path, using pointers prepared while scanning.

Let us find the average behavior of BFS and DFS for a random graph. Following Erdos and Renyi [6] we assume equal probability over all graphs of n vertices and e edges, implying that the probability of every pair of vertices being connected by an edge equals $e/\binom{n}{2}$. The probability that an edge emanating from a vertex v ≠ t enters the target vertex t is p = 1/(n-1) (note that the list of the edges of the second vertex scanned, for example, contains an edge back to the source vertex).

Both BFS and DFS begin scanning at the source vertex s and continue to scan I edges emanating from vertices which were already visited, until the vertex t is reached. They differ in the order of scanning the edges: BFS scans the edges from the vertices in first visited first scanned order, while in DFS the order is last visited first scanned. (These seem to be extensions of the FIFO and LIFO orders, where every element is processed several times.)

Actually, while scanning the i-th edge of a vertex v, the probability of reaching t is 1/(n-i) and not 1/(n-1), since i vertices different from t are not reachable through this edge. But since the degrees of the vertices and the order of their scanning are not known, we shall use 1/(n-1) as a lower bound for the probability, obtaining an upper bound for the average number of edges scanned until reaching t, for both BFS and DFS or any other order of scanning. (Actually the probabilities for the first edges scanned in BFS are higher than in DFS, since in BFS more edges of the same vertices are scanned first, but again it is not clear how to calculate this difference.)

$$E(I) \le \sum_{i=1}^{e} i\,p(1-p)^{i-1} \le p \sum_{i=1}^{\infty} i\,(1-p)^{i-1} = \frac{1}{p} = n-1 .$$

The variance is

$$\mathrm{Var}(I) \le E(I^2) \le \sum_{i=1}^{\infty} i^2\,p(1-p)^{i-1} = \frac{2-p}{p^2} < \frac{2}{p^2},$$

so $\mathrm{Var}(I) < 2(n-1)^2$, and the standard deviation is $\sigma(I) < \sqrt{2}\,(n-1)$.

Note that our bound applies also in cases where the algorithm stops earlier because there is no path from s to t.

In practice we may assume that the number of edges required to reach the target t is linear in the number of vertices, since both the average and the standard deviation are linear in n. This result is interesting since it is independent of the density of the graph. Although we cannot calculate the difference between BFS and DFS, it is clear that BFS is slightly more efficient on the average. This may be expected, since BFS finds a shortest path while DFS does not necessarily do so.

A related result was obtained by Bloniarz, Fischer and Meyer [1] in their average analysis of a transitive closure algorithm, which essentially applies BFS from every vertex in the graph. They find that BFS requires scanning n ln n edges on the average in order to scan all the vertices reachable from s. Actually their proof is also valid for DFS.
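The quantity I can be computed directly. The Python sketch below (our own rendering of the two procedures, not code from the report) counts the edges scanned until the target is first reached; adj is the adjacency-list representation from the earlier sketch, and the use_bfs flag merely switches the scanning order between FIFO (BFS) and LIFO (DFS).

```python
from collections import deque

def edges_scanned_until_target(adj, s, t, use_bfs=True):
    """Scan edges out of already-visited vertices (FIFO order for BFS,
    LIFO for DFS) and return the number of edges scanned when t is first
    reached, or None if t is unreachable from s."""
    visited = {s}
    frontier = deque([s])
    scanned = 0
    while frontier:
        v = frontier.popleft() if use_bfs else frontier.pop()
        for u, _length in adj[v]:
            scanned += 1
            if u == t:
                return scanned          # t reached; backtracking via pointers
            if u not in visited:
                visited.add(u)
                frontier.append(u)
    return None
```

Averaging the returned count over many random instances should approach the O(n) bound derived above, for either scanning order.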
Our result shows that while there is no difference in the worst case between scanning for one target vertex and scanning all vertices, there is a difference in the average case.

Now, what is the average number of vertices scanned in BFS until reaching t? Let d denote the degree of the target vertex t. The probability of reaching t while scanning the source vertex s is p = d/(n-1), since this is the probability that s is one of the d neighbours of t. The probability of reaching t while scanning the i-th vertex, assuming t was not reached before, is d/(n-i) > p, since the i vertices already scanned are not possible neighbours of t. Again we shall use p as a lower bound for the probabilities, obtaining an upper bound for the average number of vertices scanned, assuming degree d for t:

$$E(I,d) \le \sum_{i=1}^{\infty} i\,p(1-p)^{i-1} = \frac{1}{p} = \frac{n-1}{d},$$

and the variance

$$V(I,d) \le \frac{2}{p^2} = 2\left(\frac{n-1}{d}\right)^2 .$$

The probability that the degree of t is d is $\binom{n-1}{d} q^d (1-q)^{n-1-d}$, where $q = e/\binom{n}{2}$ is the probability of an edge connecting two given vertices. If t is an isolated vertex then at most n-1 vertices are scanned. Thus the average number of vertices scanned until reaching t is

$$E(I) \le (n-1)(1-q)^{n-1} + \sum_{d=1}^{n-1} E(I,d) \binom{n-1}{d} q^d (1-q)^{n-1-d} \le (n-1)(1-q)^{n-1} + \sum_{d=1}^{n-1} \frac{n-1}{d} \binom{n-1}{d} q^d (1-q)^{n-1-d}.$$

Let us simplify the right hand side of the inequality. Denoting m = n-1 and using the identity $\binom{m}{d} = \sum_{i=d}^{m} \binom{i-1}{d-1}$ we obtain

$$m(1-q)^m + m \sum_{d=1}^{m} \frac{1}{d} \binom{m}{d} q^d (1-q)^{m-d} = m(1-q)^m + m \sum_{d=1}^{m} \frac{1}{d}\, q^d (1-q)^{m-d} \sum_{i=d}^{m} \binom{i-1}{d-1} = m(1-q)^m + m \sum_{i=1}^{m} \sum_{d=1}^{i} q^d (1-q)^{m-d}\, \frac{1}{d}\binom{i-1}{d-1},$$

and by the identity $\frac{1}{d}\binom{i-1}{d-1} = \frac{1}{i}\binom{i}{d}$,

$$= m(1-q)^m + m \sum_{i=1}^{m} \frac{1}{i} \sum_{d=1}^{i} q^d (1-q)^{m-d} \binom{i}{d} \le m\left(1 + \frac{1}{H_m}\right) \sum_{i=1}^{m} \frac{1}{i} \sum_{d=0}^{i} q^d (1-q)^{m-d} \binom{i}{d},$$

where $H_m = \sum_{i=1}^{m} 1/i = \ln m + O(1)$. (The inequality holds since the added d = 0 terms contribute $H_m(1-q)^m$ to the double sum, so the first term $m(1-q)^m$ is at most $1/H_m$ times m times the enlarged sum.) Writing $(1-q)^{m-d} = (1-q)^{m-i}(1-q)^{i-d}$ and applying the binomial theorem,

$$= m\left(1 + \frac{1}{H_m}\right) \sum_{i=1}^{m} (1-q)^{m-i}\, \frac{1}{i} \sum_{d=0}^{i} \binom{i}{d} q^d (1-q)^{i-d} = m\left(1 + \frac{1}{H_m}\right) \sum_{i=1}^{m} \frac{(1-q)^{m-i}}{i} = m\left(1 + \frac{1}{H_m}\right) \sum_{i=0}^{m-1} \frac{(1-q)^{i}}{m-i}.$$

No closed formula was found for this series; thus

$$E(I) \le (n-1)\left(1 + \frac{1}{H_{n-1}}\right) \sum_{i=0}^{n-2} \frac{\left(1 - e/\binom{n}{2}\right)^{i}}{n-1-i}.$$

3. Prim's Minimum Spanning Tree Algorithm

Consider Prim's nearest neighbour algorithm [13] for finding an MST of a given graph. We begin with a subtree T containing one arbitrary vertex u. In each step we choose a vertex v ∉ T with minimum distance D(v) to a vertex of the subtree T. Then v is added to T and the distances D(u_i) of the neighbours u_i ∉ T of v to the subtree T are updated if possible:

$$D(u_i) = \min(D(u_i),\, l(v,u_i)).$$

The straightforward implementation of this algorithm requires O(n^2) time. Thus Prim's algorithm is efficient for complete graphs, since O(n^2) edges must be scanned to obtain an MST. On the other hand, for sparse graphs, where e ≪ n^2, the straightforward implementation of Prim's algorithm is inferior to Kruskal's MST algorithm [11], which requires O(e lg e) time. But for sparse graphs there exists an O((e+n) lg n) implementation [12] of Prim's algorithm, using a heap (see for example [14]) as a priority queue for choosing the nearest neighbour to the subtree T. The vertices v ∉ T are stored in a heap according to their distances D(v) from T. Thus either choosing the nearest neighbour or updating D(v) requires at most O(lg n) time. (For every vertex v in the heap we keep a pointer to its place in the heap, which is updated while v moves in the heap.) There are at most e updates throughout the algorithm.
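A compact rendering of this heap-based implementation is sketched below in Python. Instead of the position-pointer decrease-key just described, it uses the common lazy-deletion variant, in which an update pushes a fresh heap entry and stale entries are skipped on extraction; this substitution and all names are ours, not the report's.

```python
import heapq

def prim_mst(adj, root=0):
    """Prim's nearest-neighbour MST with a binary heap and lazy deletion.
    Returns the MST edges of root's component as (parent, child) pairs."""
    INF = float("inf")
    D = {v: INF for v in adj}        # current distance of v to the subtree T
    parent = {v: None for v in adj}
    D[root] = 0.0
    heap = [(0.0, root)]
    in_tree = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in in_tree or d > D[v]:
            continue                  # stale entry: v already added or updated
        in_tree.add(v)                # v is the nearest neighbour of T
        for u, length in adj[v]:
            if u not in in_tree and length < D[u]:
                D[u] = length         # the update D(u) = min(D(u), l(v,u))
                parent[u] = v
                heapq.heappush(heap, (length, u))
    return [(parent[v], v) for v in adj if parent[v] is not None]
```

Lazy deletion trades the position pointers for at most e extra heap entries, so the worst-case bound stated next is unaffected.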
Thus the priority queue implementation of Prim's algorithm requires at most O((e+n) lg n) time.

Consider now the average complexity of this implementation. We assume that the lengths of the edges are independently chosen from a non-negative distribution. Let N(v) denote the number of times D(v) is updated in the algorithm; then N(v) ≤ d(v), where d(v) denotes the degree of v. $M = \sum_{v} N(v)$ is the total number of updates in the algorithm.

We refer to the edges of the vertex v in the order of their scanning from T in the algorithm. The variable $X_i(v)$, i = 1,2,...,d(v), has value 1 if D(v) was updated through its i-th edge and 0 otherwise. Clearly $N(v) = \sum_{i=1}^{d(v)} X_i(v)$.

The distance D(v) is updated through the i-th edge from T to v if this edge is shorter than the i-1 previous edges from T to v, and the probability of this is 1/i. Thus

$$E(X_i(v)) = \frac{1}{i}, \qquad \mathrm{Var}(X_i(v)) = E(X_i^2(v)) - E^2(X_i(v)) = \frac{1}{i} - \frac{1}{i^2}.$$

The distance D(v) is updated through some of the edges of v until v is added to T. Hence

$$E(N(v)) \le \sum_{i=1}^{d(v)} E(X_i(v)) = \sum_{i=1}^{d(v)} \frac{1}{i} = H_{d(v)} = \ln d(v) + O(1).$$

The event that D(v) was updated through the i-th edge is independent of the event that D(v) was updated through the j-th edge. Thus the variables $X_i(v)$, i = 1,2,...,d(v), are independent, and

$$\mathrm{Var}(N(v)) = \sum_{i=1}^{d(v)} \mathrm{Var}(X_i(v)) = \sum_{i=1}^{d(v)} \left(\frac{1}{i} - \frac{1}{i^2}\right) = H_{d(v)} - H^{(2)}_{d(v)} = \ln d(v) + O(1),$$

so Var(N(v)) ≤ E(N(v)).

The average of the total number of updates in the algorithm, $M = \sum_v N(v)$, is

$$E(M) = \sum_{v} E(N(v)) \le O\left(\sum_{v} \ln d(v)\right) = O\left(\ln \prod_{v} d(v)\right).$$

The maximum of $\prod_v d(v)$, where $\sum_v d(v) = 2e$, is obtained when all d(v) are equal (up to a difference of 1), i.e., d(v) = 2e/n. Hence

$$E(M) = O\left(\ln \prod_{v} d(v)\right) \le O\left(\ln (2e/n)^{n}\right) = O(n \ln (2e/n)).$$

Using the heap, each update requires at most O(lg n) time. Therefore the average time required by the priority queue implementation of Prim's algorithm is bounded by O(n lg n ln(e/n)).

There is a difficulty in computing the variance of $M = \sum_v N(v)$, since the variables N(v), v ∈ V, are dependent; for example, if N(u) = n-1 then N(v) < n-1 for every v ≠ u. Now

$$\mathrm{Var}(M) = \sum_{v \in V} \mathrm{Var}(N(v)) + 2 \sum_{u,v \in V} \mathrm{Cov}(N(u),N(v)),$$

where

$$\mathrm{Cov}(N(u),N(v)) = \mathrm{Cov}\left(\sum_{i=1}^{d(u)} X_i(u),\ \sum_{j=1}^{d(v)} X_j(v)\right) = E\left(\sum_{i=1}^{d(u)}\sum_{j=1}^{d(v)} \bigl(X_i(u) - E(X_i(u))\bigr)\bigl(X_j(v) - E(X_j(v))\bigr)\right) = \sum_{i=1}^{d(u)}\sum_{j=1}^{d(v)} \Bigl[E\bigl(X_i(u)\,X_j(v)\bigr) - E(X_i(u))\,E(X_j(v))\Bigr].$$

If the i-th edge to u is not actually the j-th edge to v, then $X_i(u)$ and $X_j(v)$ are independent and thus

$$E(X_i(u)\,X_j(v)) - E(X_i(u))\,E(X_j(v)) = 0.$$

Thus the only contribution to Cov(N(u),N(v)) is in the case (u,v) ∈ E. But in such a case it cannot happen that both u was updated from v and v was updated from u. Hence at most one of $X_i(u)$ and $X_j(v)$ equals 1, and $X_i(u) \cdot X_j(v) = 0$ in all cases. Thus if (u,v) ∈ E, say as the i-th edge of u and the j-th edge of v, then

$$\mathrm{Cov}(N(u),N(v)) = -E(X_i(u)) \cdot E(X_j(v)) = -\frac{1}{i} \cdot \frac{1}{j} \le -\frac{1}{n^2}.$$

Hence

$$\mathrm{Var}(M) \le \sum_{v} \mathrm{Var}(N(v)) - 2 \sum_{(u,v) \in E} \frac{1}{n^2} = \sum_{v} \mathrm{Var}(N(v)) - \frac{2e}{n^2}.$$

The fact that Var(M) is slightly smaller than it would be if the variables N(v) were independent is not surprising, since the dependency is of a "negative" nature. By this we mean that the only information we have is that if some N(v)'s are high enough then the values of the others are bounded, but if some are small then we have no information about the values of the others.
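The bound E(M) = O(n ln(2e/n)) can be checked empirically. The sketch below is our own experiment, not from the report; it reuses the hypothetical random_graph helper from the earlier sketch, and the instance sizes are arbitrary.

```python
import heapq
import math
import statistics

def count_updates(adj, root=0):
    """Run heap-based Prim and return M, the total number of times some
    D(v) is decreased through an edge (the quantity analyzed above)."""
    INF = float("inf")
    D = {v: INF for v in adj}
    D[root] = 0.0
    heap, in_tree, updates = [(0.0, root)], set(), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in in_tree or d > D[v]:
            continue
        in_tree.add(v)
        for u, length in adj[v]:
            if u not in in_tree and length < D[u]:
                updates += 1          # X_i(u) = 1: the i-th edge improves D(u)
                D[u] = length
                heapq.heappush(heap, (length, u))
    return updates

# Hypothetical experiment: the sample mean of M should track n * ln(2e/n).
n, e, trials = 200, 2000, 50
samples = [count_updates(random_graph(n, e)) for _ in range(trials)]
print(statistics.mean(samples), n * math.log(2 * e / n))
```

With the covariance bound above, the variance of M now follows.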
$$\mathrm{Var}(M) \le \sum_{v} \mathrm{Var}(N(v)) \le \sum_{v} E(N(v)) = E(M) = O(n \ln (2e/n)).$$

Since the standard deviation satisfies σ(M) ≤ √E(M), the time required by the algorithm in practice is quite concentrated near the average O(n lg n ln(e/n)).

For dense graphs, where e = Θ(n^2), we obtain O(n lg^2 n) average behavior of the algorithm. But better results are obtained for sparse graphs, because of the careful analysis of the complexity. For example, in case e = Θ(n lg n) the average time required is O(n lg n lg lg n) = O(e lg lg n), which is the worst case behavior of the very efficient algorithms of Yao [17] and Cheriton and Tarjan [2]. Hence, in practice, it might sometimes be reasonable to apply the simpler algorithm of Prim. For graphs which are even sparser, as in planar graphs for example, where e = O(n), the algorithm requires O(n lg n) time on the average.

Our result has some influence on Johnson's O(e) MST algorithm [8]. He shows that for graphs which are dense enough, where e ≥ n^{1+1/k} for a constant positive integer k, Prim's algorithm can be implemented as an O(e) algorithm by using as a priority queue a heap of constant height k (allowing n^{1/k} sons for every vertex in the heap). Johnson's main idea is that the e possible updates may only decrease the value of the updated element, and thus require only climbing up in the heap, which takes at most k operations per update. Eliminating a minimum element of the heap requires at most O(k n^{1/k}) operations, but only n minimum elements are eliminated in the algorithm. Thus the algorithm requires at most ke + k n^{1/k} n = O(e) operations.

Our result shows that even for complete graphs the average number of updates is small enough to make Johnson's implementation inefficient. This consequence is strengthened by the small variance obtained. Thus, from the average point of view, Johnson's implementation is inferior to the binary heap implementation.

4. Dijkstra's Shortest Path Algorithm

Dijkstra's very efficient algorithm [4] for finding a shortest path from a source vertex s to all other vertices in the graph is another example of a nearest neighbour algorithm. A set of vertices S contains all vertices for which the shortest distance from the source s has already been computed; initially S = {s}. For every vertex v, D(v) is the length of a shortest path from s to v through vertices of S. In each step we choose a vertex v ∉ S with minimum distance D(v) from s to be added to S, and the distances D(u_j) of the neighbours u_j ∉ S of v are updated if possible.

As for Prim's algorithm, there is an O((e+n) lg n) implementation [10] of Dijkstra's algorithm using a heap as a priority queue for the distances D(v) of the vertices v ∉ S. The analysis of the average complexity of this implementation is very similar to the analysis of Prim's algorithm. The difference is in the probability of an update of D(v) through the i-th edge. In Prim's algorithm, D(v) is updated through the i-th edge if it is shorter than the i-1 previous edges, and since the lengths of the edges are drawn independently from the same distribution this probability is 1/i. In Dijkstra's algorithm, D(v) is updated through the i-th edge (u,v), where u is the last vertex added to S, if

$$D(u) + l(u,v) < \min_{u_j} \bigl(D(u_j) + l(u_j,v)\bigr),$$

where the u_j are the other neighbours of v in S. The vertices are added to S in non-decreasing order of their distances from s. Thus D(u) ≥ D(u_j), and the probability of an update of D(v) through the i-th edge is bounded by 1/i.
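For concreteness, here is a heap rendering of Dijkstra's algorithm in the same style as the Prim sketch above, again our own illustrative code with lazy deletion substituted for the pointer-based decrease-key.

```python
import heapq

def dijkstra(adj, s):
    """Heap-based Dijkstra: returns D, the shortest distances from s.
    Each update D(v) = D(u) + l(u,v) pushes a new heap entry; stale
    entries are skipped on extraction."""
    INF = float("inf")
    D = {v: INF for v in adj}
    D[s] = 0.0
    heap, done = [(0.0, s)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)                        # u enters S with final distance d
        for v, length in adj[u]:
            if v not in done and d + length < D[v]:
                D[v] = d + length          # the update analyzed above
                heapq.heappush(heap, (D[v], v))
    return D
```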
Now we may apply the same analysis as for Prim's algorithm and obtain that the average time and the variance of Dijkstra's algorithm are bounded by the corresponding results for Prim's algorithm, i.e., O(n lg n ln(e/n)).

Applying Dijkstra's algorithm n times, once for each vertex of the graph, yields an algorithm for all shortest paths in a graph for which the average time and the variance are bounded by O(n^2 lg n ln(e/n)). Spira [15] gives an algorithm for this problem with O(n^2 lg^2 n) average time. The difference of ln(e/n) instead of lg n comes from the careful use of the possible sparseness of the graph. Our result for the variance was obtained by using the "negative" nature of the dependency of the variables and is much lower than the corresponding result in Spira's analysis. Also, our analysis is slightly simpler than Spira's.

There is an important difference between these two algorithms. Spira's algorithm applies n times a heap-priority-queue implementation of a one-source shortest path algorithm, which is actually due to Dantzig [3]. But this last algorithm requires an initial sorting of the edges of the graph, which might take O(n^2 lg n) time, higher than the time required by the straightforward implementations of the main part of the algorithm and of Dijkstra's algorithm, both of which require O(n^2) time. Therefore Spira's algorithm is suggested only for finding all shortest paths in the graph, while Dijkstra's algorithm is efficient for both problems.

As in Prim's MST algorithm, Johnson [9] uses a constant height heap to obtain an O(e) implementation of Dijkstra's algorithm. Our observation in Section 3 holds also for this case.

5. Kruskal's Minimum Spanning Tree Algorithm

In a previous section we analyzed the average behavior of Prim's MST algorithm. Let us now analyze the competing MST algorithm of Kruskal [11]. This algorithm first sorts all the edges of the graph in non-decreasing order. Then we scan the edges in this order, adding to the tree every edge which does not close a cycle with the edges already inserted into the tree. Another implementation of Kruskal's algorithm uses a heap as a priority queue for the edges instead of the initial sorting. Both implementations clearly require O(e lg e) time in the worst case, since checking whether an edge closes a cycle is performed very efficiently using the Union-Merge algorithm [7], [16].

Let us calculate the average behavior of this priority queue implementation by finding the average number of edges taken out of the priority queue. The first two edges, i.e. those with the smallest lengths, must be inserted into the tree. Assuming k edges have already been inserted into the tree, it is difficult to calculate the exact probability that an edge is the next edge of the tree, since it depends on the numbers of vertices in the subtrees of the forest containing the first k edges. But this probability is lowest in case all k edges are in the same subtree. Thus we can obtain an upper bound for the average by assuming that the k edges generate only one subtree of k+1 vertices. The number of edges closing a cycle is then $\binom{k+1}{2} - k = \binom{k}{2}$, and the probability that an edge can be inserted into the tree is bounded below by $p_k = 1 - \binom{k}{2}/\binom{n}{2}$. Note that this is the probability even in case the graph is not complete. Actually, even in the case of one subtree the probability is higher, since some edges connecting vertices of the tree might already have been scanned before.
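The priority-queue implementation whose average we are bounding can be sketched as follows; the code is ours, the union-find structure stands in for the Union-Merge algorithm of [7], [16], and the extraction count returned alongside the tree is the total number of edges taken out of the priority queue, which is analyzed next.

```python
import heapq

def kruskal_mst(adj):
    """Heap-based Kruskal: repeatedly extract the shortest remaining edge
    and insert it unless it closes a cycle (checked with union-find).
    Returns the tree edges and the number of heap extractions."""
    parent = {v: v for v in adj}

    def find(v):                       # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    heap = [(l, u, v) for u in adj for v, l in adj[u] if u < v]
    heapq.heapify(heap)                # O(e) heap construction
    tree, extracted = [], 0
    while heap and len(tree) < len(adj) - 1:
        l, u, v = heapq.heappop(heap)
        extracted += 1
        ru, rv = find(u), find(v)
        if ru != rv:                   # edge does not close a cycle
            parent[ru] = rv
            tree.append((u, v, l))
    return tree, extracted
```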
Let $I_k$ denote the number of edges scanned, while the tree already has k edges, until the (k+1)-th edge is inserted into the tree. The total number of edges taken out of the priority queue is

$$I = \sum_{k=0}^{n-2} I_k ,$$

and

$$E(I) = \sum_{k=0}^{n-2} E(I_k) \le \sum_{k=0}^{n-2} \sum_{i=1}^{\infty} i\, p_k (1-p_k)^{i-1} = \sum_{k=0}^{n-2} \frac{1}{p_k} = \sum_{k=0}^{n-2} \frac{\binom{n}{2}}{\binom{n}{2} - \binom{k}{2}} = \sum_{k=0}^{n-2} \frac{1}{1 - \frac{k^2-k}{n^2-n}} = \sum_{k=0}^{n-2} \sum_{i=0}^{\infty} \left(\frac{k^2-k}{n^2-n}\right)^{i} = \sum_{i=0}^{\infty} n^{-i}(n-1)^{-i} \sum_{k=0}^{n-2} (k^2-k)^{i} \le \sum_{i=0}^{\infty} n^{-i}(n-1)^{-i} \sum_{k=0}^{n-2} k^{2i}.$$

Now let us use the following approximation (see for example [14]):

$$\sum_{k=1}^{n} k^{t} = \frac{n^{t+1}}{t+1} + O(n^{t}).$$

We shall use the first term and later show that the second term contributes a lower order term. Thus

$$E(I) \le \sum_{i=0}^{\infty} \frac{(n-2)^{2i+1}}{(2i+1)\, n^{i}(n-1)^{i}} = n \sum_{i=0}^{\infty} \frac{1}{2i+1}\cdot\frac{n-2}{n}\left(\frac{(n-2)^2}{n(n-1)}\right)^{i} \le n \sum_{i=0}^{\infty} \frac{1}{i+1}\left(\frac{n-2}{n}\right)^{i+1}.$$

Denote $q = \frac{n-2}{n}$. Then

$$E(I) \le n \sum_{i=0}^{\infty} \frac{q^{i+1}}{i+1} = n \sum_{i=0}^{\infty} \int_0^q x^{i}\, dx = n \int_0^q \sum_{i=0}^{\infty} x^{i}\, dx = n \int_0^q \frac{dx}{1-x} = -n \ln(1-q) = -n \ln\left(1 - \frac{n-2}{n}\right) = n \ln \frac{n}{2}.$$

The contribution of the second term of the approximation is

$$O\left(\sum_{i=0}^{\infty} n^{-i}(n-1)^{-i}(n-2)^{2i}\right) \le O\left(\sum_{i=0}^{\infty} \left(\frac{n-2}{n}\right)^{i}\right) = O\left(\frac{1}{1 - \frac{n-2}{n}}\right) = O(n),$$

which is of lower order.

Each extraction of a shortest edge from the priority queue takes at most O(lg e) time. Thus the average time required by Kruskal's algorithm is bounded by O(n ln n lg e). Prim's MST algorithm's average behavior was of the same order, so these two algorithms are competitive in both worst case and average behavior.

Acknowledgment

I wish to thank Shmuel Zaks for simplifying the bound on the average number of vertices scanned in BFS.

References

[1] Bloniarz, P. A., M. J. Fischer and A. R. Meyer, "A note on the average time to compute transitive closures," Proc. of the 3rd Int. Colloquium on Automata, Languages and Programming, S. Michaelson and R. Milner (eds.), July 1976.

[2] Cheriton, D. and R. E. Tarjan, "Finding minimum spanning trees," SIAM J. Comput., 5(1976), 724-742.

[3] Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton, 1963, 363-366.

[4] Dijkstra, E. W., "A note on two problems in connexion with graphs," Numer. Math., 1(1959), 269-271.

[5] Dinic, E. A., "Algorithm for solution of a problem of maximum flow in a network with power estimation," Sov. Math. Dokl., 11(1970), 1277-1280.

[6] Erdos, P. and A. Renyi, "On random graphs I," Publicationes Mathematicae, 6(1959), 290-297.

[7] Hopcroft, J. E. and J. D. Ullman, "Set merging algorithms," SIAM J. Comput., 2(1973), 294-303.

[8] Johnson, D. B., "Priority queues with update and finding minimum spanning trees," Info. Proc. Let., 4(1975), 53-57.

[9] Johnson, D. B., "Algorithms for shortest paths," Ph.D. Thesis, Cornell University, 1973.

[10] Johnson, E. L., "On shortest paths and sorting," Proc. ACM 25th Annual Conference, August 1972, Boston, Vol. 1, 510-517.

[11] Kruskal, J. B., "On the shortest spanning subtree of a graph and the traveling salesman problem," Proc. Amer. Math. Soc., 7(1956), 48-50.

[12] Kerschenbaum, A. and R. Van Slyke, "Computing minimum spanning trees efficiently," Proc. ACM 25th Annual Conference, August 1972, Boston, Vol. 1, 518-527.

[13] Prim, R. C., "Shortest connection networks and some generalizations," Bell Sys. Tech. J., 36(1957), 1389-1401.

[14] Reingold, E. M., J. Nievergelt and N. Deo, Combinatorial Algorithms: Theory and Practice, Prentice Hall, Englewood Cliffs, N.J., 1977.

[15] Spira, P. M.,
"A new algorithm for finding all shortest paths in a graph of positive arcs in average time O(n^2 log^2 n)," SIAM J. Comput., 2(1973), 28-32.

[16] Tarjan, R. E., "Efficiency of a good but not linear set union algorithm," JACM, 22(1975), 215-225.

[17] Yao, A. C. C., "An O(|E| log log |V|) algorithm for finding minimum spanning trees," Info. Proc. Let., 4(1975), 21-23.