Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/extensionofhuffm853park 
 
3 /<?. ^ f~ 
 
 l ^Report No. UIUCDCS-R-77-853 
 
 EXTENSION OF THE HUFFMAN TREE CONSTRUCTION ALGORITHM 
 
 UILU-ENG 77 1729 
 NSF-OCA-MCS73-07980 A03-000026 
 
 ^ 
 
 by 
 
 D. Stott Parker, Jr, 
 
 May 1977 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 

 
Report No. UIUCDCS-R-77-853 
 
 EXTENSION OF THE HUFFMAN TREE CONSTRUCTION ALGORITHM 
 
 by 
 
 D. Stott Parker, Jr. 
 
 May 1977 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 * 
 
 This work was supported in part by the National Science Foundation 
 
 under Grant No. US NSF MCS73-07980 A03. 
 
I. Introduction 
 
 Although Huffman's algorithm was first presented in 1952, and 
 was developed then for a problem in discrete coding [Huf 52], 
 it is still undergoing a considerable amount of research as more and 
 more applications for it are uncovered in various fields. In the last 
 year alone, Itai [Itai 76], van Leeuwen [vanL 76], Glassey and Karp 
 [GK 76], and Golumbic [Gol 76] have presented new perspectives on how 
 the algorithm works and how it or related algorithms can be employed in 
 new ways. Until the present, however, all research has concentrated on 
 two variations of the algorithm which respectively minimize total 
 weighted path length, and measures akin to tree height, of the constructed 
 tree. Applications for weighted path length minimization include 
 
 (1) construction of optimal search trees [Zim 59] , [HT 71], [Itai 76], 
 
 (2) merging of lists [FB 72], [Liu 76], (3) minimization of absolute error 
 in sums [Cap 75] and relative error in products [Sam 75], (4) text file 
 compression [Rub 76], (5) optimal checking for leaky pipelines 
 
 and water pollution [GK 76], and of course (6) construction of minimum 
 redundancy codes [Huf 52] . Applications for tree height minimization 
 include (1) optimal execution time for fanning-in data (in limited task- 
 scheduling systems, and in arithmetic/Boolean sum- or product accumulation 
 (e.g., in dot-products, matrix multiplication), etc.) and related problems 
 related to speed in parallel processing [Gol 76] , (2) optimal use of 
 space (storage or registers) in parse trees, and (3) minimization of 
 error bounds in parse trees of sums [Sam 75]. And this is by no means 
 a complete list. 
 
 Our interest in the algorithm comes mainly from its import in compiling, 
 
Not only does the algorithm build optimal trees with respect to 
 execution time, space usage, and roundoff error for many classes of 
 limited expressions, it does so in near linear time. If N is the number 
 of leaves in the tree to be constructed, Huffman's algorithm can be 
 implemented in time 0(N logN) when a priority queue is used. Moreover, 
 van Leeuwen has shown that this time bound can be reduced to 0(N) if the 
 leaf weights are in sorted order [vanL 76] . (This would suggest that the 
 complexity of the algorithm is lower bounded by 0(N log N) , since sorting 
 is at least that difficult. However an 0(N log N) optimal parsing algorithm, 
 though not linear, is still respectable.) In our opinion the algorithm 
 has great potential in the development of future compiling algorithms, 
 as well as other areas of computer science. Whether it does or not, 
 however, the results of this paper should serve to expose the structure 
 of tree construction with the Huffman algorithm in a way not discussed 
 before. 
 
 This paper addresses and solves in part the following problem. The 
 two variations of the Huffman algorithm mentioned above are based on the 
 same construction process, but use different tree cost functions and node- 
 merging methods. (Specifically, the weighted path-length variation 
 produces internal nodes having weights equal to the sum of the weights 
 of its sons, while the tree-height algorithm uses the maximum of the 
 son weights plus some nonnegative constant. This will all be discussed 
 in greater detail below.). First, it is not clear why these two apparently 
 unrelated methods both produce optimal trees. Second, from the point of 
 view of compiling it would be nice if we could use yet other methods 
 
to construct trees optimal with respect to some other cost measure 
 besides tree height and path length. For example, suppose we wish 
 to construct parse trees for parallel evaluation of products of 
 arithmetic expressions, optimal with respect to some measure of both 
 roundoff and space used. Since error bounds in this case correspond 
 to path-length, and execution time to tree height, an optimal parse 
 tree cannot be constructed using the Huffman algorithm unless a node- 
 merging method more complicated than the two above is used. This 
 problem raises the following question: for exactly which methods will 
 the Huffman algorithm produce optimal trees under exactly which cost? 
 We will show a class of methods exists, encompassing the two standard 
 methods above, which produces optimal trees with the Huffman algorithm 
 under corresponding classes of tree cost functions and ties together in 
 a nice way some results from information theory and the theory of convex 
 functions . 
 
II. Basic Machinery for Huffman Tree Construction 
 
 In this section we define the notation to be used for the rest of 
 the paper. The exposition here is not really introductory and readers 
 seeking more background are referred to [Knu 68] or [Eve 73] . For the 
 time being we confine ourselves to binary tree construction until the 
 essential results are established. The extension to r-ary trees is 
 then straightforward. 
 
 In the binary tree construction problem one is given a set of n+1 
 leaves having corresponding weights { w , w , ... , w }. Although in 
 some problems a particular ordering is to be enforced on the leaves 
 (e.g., [HT 71]) we drop these considerations and presume in this paper 
 that the final order of the leaves in the constructed tree makes no 
 difference. Furthermore the weights need not be normalized so that 
 their sum comes out to be unity or anything like that; we require only 
 that they be nonnegative and, for convenience, sorted by index: 
 
 < w, < w„ < . . . < w , . 
 
 — 1 — 2 — — n+1 
 
 Construction of a (full) binary tree on these leaves is then effected by 
 n merge operations of pairs of available nodes. Each of the nodes in 
 the pair is marked unavailable and their father (the result of the merge) 
 is marked available, having as his weight some function of the pair's 
 weights. Each of the leaves is initially marked available, of course. 
 Note that n merges are necessary and sufficient since all full binary 
 trees on n+1 leaves have n internal nodes. 
 
Simple examples of tree construction are given in Fig. 1 a§b. In Fig. la 
 the weight combination function used is the sum of the son weights; in 
 Fig. lb it is their maximum plus 3.14 . 
 
 (a) weight (root) = weight (left son) 
 
 + weight (right son) 
 
 (b) weight (root) = 
 
 max (weight (left son), 
 weight (right son)) 
 
 + 3.14 
 
 Figure 1. Tree Construction 
 
 Note that each internal node defines the root of a full binary subtree 
 of the constructed tree, so tree construction can be defined inductively 
 in terms of forests (collections of trees) in the obvious way: the 
 construction begins with a forest of n+1 one-node trees and repeatedly 
 reduces the number of trees by 1 via merge operations until there is only 
 one tree left. 
 
 With this in mind we adopt the following notation: 
 
 w. -- j smallest leaf weight (i.e., w is smallest, w . largest) 
 £. -- path length (distance from the root) of the j leaf 
 W. -- i smallest internal node weight 
 
 With each of these the name of the tree or forest in question will be 
 added in parentheses whenever it is not clear from context which tree or 
 forest is meant. Thus W. (T) would be the i smallest internal node weight 
 
in the tree T, and £.(F) would be the current path length of w. to the 
 root of the tree containing it in the forest F. For example, if we 
 let T. and T_ be the trees in Fig. la and lb respectively, then 
 w^Tj) = Wl (T 2 ) = 1 
 
 W 2 ( - T l' ) = w 2 ( - T 2- ) = 2 
 w 3 ( Tl ) = w 3 (T 2 ) - 3 
 
 W = W = 4 
 
 C£ lf £ 2 , £ 3 , ^HTj) = ( 3, 3, 2, 1 ) 
 
 (j^, l 2 , £ 3 , £ 4 )(T 2 ) = ( 2, 2, 2, 2 ) 
 
 and 
 
 (W r W 2 , W 3 )(T 1 ) = ( 3, 6, 10 ) 
 
 (W^ W 2 , W 3 )(T 2 ) = ( 5.14, 7.14, 10.28 ) . 
 
 Finally, if we denote by R the nonnegative reals, let us define the 
 
 2 
 weight combination function F: R — >» R to be the symmetric function used 
 
 to produce the weight of internal nodes generated by a merge operation 
 
 (cf. Fig. 2), and the n-internal node tree cost function G: R + — ^ R to 
 
 be a function on the weights of all the internal nodes of the tree: 
 
 Cost(T) = G( W^T), W 2 (T), ... , W n (T)) . 
 
 Note that if such a tree cost is to be generally useful, it should be 
 
 extensible to arbitrary numbers of arguments and not dependent on some 
 
 fixed value of n, 
 
 (order c^ l«**S l* -K«< 
 
 \ 
 
 Fig. 2 Weight combination function F(x,y) 
 
Huffman ' s algorithm for binary tree construction is now simple to 
 state: To build the Huffman tree, merge at each step the two available nodes 
 of smallest weight (with ties resolved arbitrarily) . Now if 
 
 F(x,y) = x + y and G = sum 
 then it is not hard to show that the cost of any tree T in this system is 
 
 s w co *.cn 
 
 l<j<n+l J J 
 
 which is called the weighted path length of T. Also, if 
 
 F( x >y) = max(x,y) + c (c>0) and G = max 
 
 then the cost of any tree T in this system is 
 
 max ( w .CO + cJLCO ) 
 l£j<n+l J J 
 
 and we call this a tree-height measure of T because when c=l and w.=0 for 
 
 g j 
 
 j=l,...,n+l this cost is exactly the height of T (although it otherwise 
 has nothing directly to do with tree height) . 
 
 The importance of Huffman's algorithm is that it produces, in time 
 0(n log n) or less, optimal trees in both of these systems. Proof of 
 the optimality in the weighted path length system (the one originally 
 considered by Huffman) can be found in a paper by Zimmerman [Zim 59] . 
 To our knowledge a proof of the optimality of Huffman's algorithm in 
 the tree-height system has never been published, possily because 
 Zimmerman's proof mutatis mutandis will work for it as well, possibly because the 
 optimality is intuitively clearer. Examples of the construction in both 
 systems has already been given in Figure 1. In both cases the trees 
 illustrated are the unique optimal-cost trees; note that although they 
 have identical initial weights their structures are entirely different. 
 
III. General Characterization of Huffman Iree Construction 
 
 We begin this section with a result from a recent paper by 
 Glassey and Karp [GK 76], and show how it may be extended in a 
 natural way to characterize the weight structure obtained in trees 
 constructed with the Huffman algorithm in general. 
 
 Definition A weight sequence a is a set of nonnegative numbers 
 
 fa,, a , ... , a '] such that a, < a_ < a, < . . . < a . 
 L 1 2 m J 1—2—3— — m 
 
 We define a partial order on weight sequences of equal length as follows 
 
 Definition Given two weight sequences a = [a,, a-, ... , a "] and 
 
 b = Tb. , b~, ... , b 1, we write 
 L 1' 2 ' nr 
 
 a •< b 
 
 k k 
 
 if Z a. < S b. holds for all k, 1 < k < m 
 i — i * • 
 
 i=l i=l x 
 
 Theorem 1 (Glassey § Karp) Let W(S) = [W (S), W (S) , ... , W (S)] be 
 the weight sequence for the internal nodes in a tree constructed by the 
 Huffman algorithm in the weighted path-length system, and let 
 W(T) = [W 1 (T), W 2 (T), ... , W (T)] be the weight sequence for the 
 internal nodes of any other tree on the same leaf weights. 
 Then W(S) •< W(T) . 
 
 Proof Glassey and Karp actually prove the theorem for the general case 
 of r-ary tree construction, where r may be greater than 2 and the trees 
 need not be full. We only sketch the proof here (it may be found on 
 
pp. 371-373 of [GK 76]) for the binary case, which is sufficient for 
 our current needs. The theorem is a sharpening of the earlier result by 
 Hu and Tucker that "Huffman's algorithm gives an optimal m-sum forest" 
 in the weighted path-length system ( [HT 71], p. 518). In any case it is 
 an important characterization of the Huffman algorithm and will give rise 
 to most of the results in this paper. 
 
 The proof proceeds by induction on k, where we are trying to prove 
 
 k k 
 
 Z W. (S) < Z W. (T) for 1 < k < m. 
 i=l x i=l x 
 
 The basis k=l is trivially true since the Huffman algorithm selects 
 
 W 1 (S) = w. +w_ where w.. and w~ are the two smallest leaf weights. 
 
 The induction step breaks into two cases, one easy, one hard. 
 
 Case 1 : W (S) = W. (T) . In this event since w and w_ are minimal, T 
 
 has effectively combined w. and w~ in a way identical to the Huffman 
 
 algorithm. We are thus reduced to the problem of constructing binary 
 
 trees on the leaf weights {W, (S)=W 1 (T)=w,+w_, w 7 , w., ... , w ..}. 
 
 o 1 ' l v ' 1 2 3 4 n+1 
 
 By induction we know 
 
 k k 
 
 Z W. (S) < S W. (T) 
 i=2 X i=2 1 
 
 for this problem, so it follows that 
 
 k k 
 
 Z W. (S) £ E W. (T) . 
 i=l X i=l 1 
 
 Case 2 : First notice that the weight sequence JX (T) , ... , W,(T)^ 
 corresponds to the weights of internal nodes defining some subforest F, of 
 the tree T. This will be true for any T in the weighted path length system 
 since the weight combination function F satisfies F(x,y) > max(x,y). 
 
10 
 
 If £.(F.) denotes the path length of the leaf with weight w. within 
 this forest (i.e., the distance from this leaf to some root in the 
 
 forest) then 
 
 k n+1 
 
 Z W. (T) = Z w. £.(F, ) . 
 1=1 j = l J J 
 
 If I =max £.(F.) and W (T) is the weight of some internal node whose 
 max J j * p 
 
 two (leaf) sons have depth I , then 
 
 r max 
 
 W (T) _> W 2 (T) > W^S). 
 
 i.e., W (T) = w +w, where {w ,w, } f\ {w.,wj i {w-.w^}. 
 
 p 21 u <1 D XZ 1 Z 
 
 The rest of the theorem shows that the tree T which is 
 obtained by interchanging the leaf weights {w ,w } with {w ,w, } in T, satisfies 
 
 x. Z 3.D 
 
 k k 
 
 Z W. (T) < Z W. (T) . 
 i=l X i=l X 
 
 k k _ 
 
 But since W^T) = w 1 +w 2 we can apply Case 1 to get Z W^S) <_ Z t^CO 
 
 i=l i=l 
 
 and the theorem follows by induction. 
 
 Definition A function 4>:R ->• R is strictly increasing if 
 
 x < y implies 4>(x) < <Ky) • 
 If -<$> is strictly increasing then <b is strictly decreasing . 
 
 If <j) is either strictly increasing or strictly decreasing it is strictly 
 monotone . 
 
 Definition A function cf>:U -*- R is convex if II is a convex subset of R 
 (i.e., U is an interval) and for all x,yell, te[0,l] 
 <Htx+(l-t)y) < t<j>(x) + (i-t)<Hy). 
 cf):U -* R is concave if -<j) is convex. 
 
11 
 
 Examples 
 
 / 
 
 
 FUNCTIONS 
 
 CoHCAVE 
 
 
 \ ^<£ /sfcils J|?(f?*sirt"\^) 
 
 Figure 3. Convex and concave functions 
 
 The above definitions are not the only ones possible -- some texts, 
 for example, define these functional attributes in terms of derivatives: 
 
 $ strictly increasing (decreasing) <==> <J>' > (<J>' < 0) 
 
 $ convex (concave) <==> <J>" > ((J)" < 0) 
 Since we do not need these derivative properties we make no assumptions 
 about the function cf> other than continuity, and the characteristics stated 
 above where applicable. In particular cf> need not be differentiable. 
 
 Definition A function <j>:U ■*■ R is positive if cf)(x) > for all xeU, 
 
 negative if cf>(x) <_ for all xeU, 
 and sign- cons is tent if (J) is positive or negative. 
 
 Theorem 2 Let a and b be two weight sequences of length m such that a-< b, 
 
 If <j> is any concave, strictly increasing function and we define <J>(a) to be the 
 
 weight sequence [<Ka,) ,... ,<K a )1 and similarly for 4>(b), then 
 
 m n n 
 
 <Ka) < <Kb) i-e., 2 4>(a.) < E <j)(b.) for 1 < i < n. 
 
 " " ' i=l i=l 
 
 Proof The theorem is one of several based on analogous results of Hardy, 
 Littlewood, and Polya for convex functions. The proof here is adapted from 
 
12 
 
 section 2.24 of [Mit 70], and is attributed by Mitrinovic to Fuchs [Fuc 47] 
 Define 
 
 I). 
 
 <Ka.)-<Kb.) 
 
 for i=l , . . . , m, 
 
 where we assume without loss of generality that a-^b- for any i. 
 
 (If a.=b. we can delete both from the weight sequences a and b without 
 
 disturbing the inequality we wish to prove) . Since <$> is increasing we have 
 
 D. > for 1 <_ i <_ m; also since a «< b and since <$> is concave we get 
 
 k k 
 
 D. > D. .. Set A, = Z a. and B, = Z b. for 1 < k < m. 
 i-i-l k j=1 j k j=1 j 
 
 Then a ■< b implies A, £ B, for all k, or C\-B k ) < 0. 
 
 Therefore for 1 <_ n <_ m, 
 
 n-1 
 
 Z (A. -B. )(D. -D. J + (A -B ID < 
 k _ \ k JK k k+r n n n - 
 
 n-1 
 
 n-1 
 
 = \ (D k" D k + P + A n D n 1 * B k^ D k- D k + P + B n D n 
 k=l k=l 
 
 Z (A. -A. .) D. 
 , i i-l^ l 
 i=l 
 
 n 
 
 < Z (B.-B. J D. 
 
 — . l l-l l 
 
 i=l 
 
 Z a. D. < Z b. D. 
 
 . , ii — ., 11 
 i=l i=l 
 
 n 
 
 Z (a.-b.) D. < 
 ,i l l — 
 i=l 
 
 n 
 
 Z (J) (a.) - (j)(b.) < 
 i=l 
 
 n 
 
 n 
 
 E <t)(a. ) < Z (j)(b.) . 
 i=l i=l 
 
 QED 
 
13 
 
 With the above in mind we can make our extension of the Huffman 
 tree construction process. For the rest of this paper we assume that 
 our weight combination function F(x,y) has the form 
 
 F(x,y) = f l ( X<f>(x) + A<Ky) ) 
 where X is some positive constant and <j> is a continuous, strictly 
 monotone function. We restrict the domain of definition of $ to some 
 interval U of R , which we call the weight space , and require that 
 F: U 2 -*■ U so that F produces a weight when given two weights. 
 
 The class of functions F this generates has a number of 
 interesting properties: each such F is increasing since <j> is monotone. 
 Each F is symmetric in its variables and can be extended naturally to 
 functions of more than two arguments. (This latter property will be 
 useful at the end of this section when we consider the generalization 
 of binary to r-ary tree construction.) Moreover when A = 1 F is also 
 associative, i.e. , 
 
 F( F(u,v), F(x,y) ) = F( u, F( v, F(x,y) ) ) 
 
 = F( F(x,v), F(u,y) ) 
 = (j) _1 ( <J>(u) + <Kv) + <j>(x) + My) )■ 
 Also note that when X = 1 and <{)(x) = x we obtain 
 
 F(x,y) = x + y 
 -- the weight merging function for the weighted path-length system--, 
 and when X = exp(pc) [c >_ 0] and cf>(x) = exp(px), then 
 
 lim F(x,y) = max(x,y) + c 
 -- the function for the tree-height system. Thus this class of weight 
 merging functions F is broad enough, in the limit at least, to encompass 
 
14 
 
 the two known Huffman-optimal ones. The purpose of this paper is to 
 show first, what the Huffman algorithm produces with these weight 
 merging functions; second, which conditions are needed for this produce 
 to be optimal; and third, why all this is useful. 
 
 One assumption we can make immediately is that the strictly 
 monotone function <f> is strictly increasing since F is invariant of changes 
 to its sign. For this reason we will frequently make statements below 
 requiring <J> to be increasing; if <J> were actually taken to be decreasing 
 then the statement in question would hold for -<J>. More restrictions must 
 be made on 4> and A before we can prove the resulting function F will be 
 useful to us; these restrictions are mainly embodied in the following 
 lemma. First we know we must have F: U 2 ■+ U. Also, we need an analogue 
 of the fact used in the proof of Theorem 1 that F(x,y)=x+y is "monotone" 
 (in the sense that F(x,y) >^ max(x,y) ) which guarantees that the k smallest 
 internal node weights of any constructed tree T comprise the weights of 
 some subforest of T. We satisfy both these restrictions on F in the 
 following way: 
 
 Lemma 1 Let <{>: U -*- R be a strictly increasing function and A be a 
 
 -1 2 
 
 positive constant. If F(x,y) = <J) (A<J>(x) + A<Ky)) is to satisfy F:U •*■ U 
 
 and either F(x,y) <_ min(x,y) Vx,y«U or F(x,y) >_ max(x,y) »x,y^U, 
 
 then we must have \ ~>_1, and <f> must be sign-consistent on U. 
 
 Under these circumstances 
 
 F( x >y) < m in(x,y) Vx,yeU if $ is negative (increasing), 
 
 F( x >y) > max(x,y) yx,y^U if $ is positive (increasing). 
 
15 
 
 Proof Since $ is increasing we have 
 
 F(x,y) £. max(x,y) iff X(f>(x) + X<J)(y) >_ 4>(x) for all x > y in U, 
 F(x,y) £min(x,y) iff X<j>(x) + X<j>(y) <_ (J>(x) for all x <_ y in U. 
 Neither of these can be true if <J> is not sign-consistent, for then the 
 connectivity of the interval U and the continuity of <j) imply a neighborhood 
 of zero would exist in <j>(U) , and for example we could contradict the first 
 inequality above by selecting <f>(x) > 0, 4>(y) = -<J>(x) , giving X«0 = >_ <J)(x). 
 So we conclude <J> must be sign-consistent, and find the "monotonicity" 
 condition on F is satisfied when 
 
 def ( d)fx") \ * is ne S ative [F(x,y) < min(x,y)] 
 X > Xi === sup !■ , f \\J{ ■- ,) and 
 
 x,y\J>W+(p(.yjy ^ is positive [ F (x,y) > max(x,y)] 
 
 de£ / \ ♦ is negative [F(x,y) > max(x,y)] 
 X <_ X === inf| (j)(x) \ and 
 
 x,yU(x)+())(y)) <f> is positive [F(x,y) £min(x,y)] 
 
 We now show that the additional condition X >_ 1 is necessitated by the 
 
 requirement that F: U 2 -> U by considering what happens to the above 
 
 inequalities for all nonzero X (X = is uninteresting, giving F = constant) 
 
 (1) Assume X > 1/2. Then since F: U 2 -> U we have X (<j> (U) +<j> (U) ) cl <f>(U) , 
 implying $(10 must be unbounded, so we have Xo = and Xj = 1. 
 The only nontrivial condition we can satisfy is X >_ Xi = 1, 
 giving as stated F(x,y) >_ max(x,y) if <J) is positive, 
 
 F(x,y) <_ min(x,y) if $ is negative. 
 
 (2) Assume X = 1/2. Since we must always have X < 1/2 and 1/2 < Xi, 
 there is no way to satisfy either X_<X or X >_ Xi. 
 
 (3) Assume < X < 1/2. Then because X(<KU)+<KU)) £ <H U ) zero is 
 a limit point of <}>(U) , so we get X = and Xi > 1/2. Thus 
 there is again no nontrivial way to satisfy either X < Xq or X > Xi. 
 
16 
 
 As an interesting sidelight, note that when 4> is positive increasing , 
 the tree constructed by the Huffman algorithm (for any positive /,) on the 
 leaf weights {w , . . . ,w } is topologically isomorphic to the tree that 
 would be built by the Huffman algorithm with 
 
 F(x,y) = A(x + y) 
 on the leaf weights {<J>(w ■> ) > ••■ ><J>(w ■,)}> although the actual values of 
 the internal node weights would be different unless <t>(x)=x. If <J> is 
 positive decreasing , by contrast, the tree constructed by the Huffman 
 algorithm on {w , ... ,w } is topologically isomorphic to that which 
 would be built by the anti-Huffman algorithm (the tree construction 
 procedure in which the two nodes of greatest weight are merged at each 
 step) with F(x,y) = A(x+y) on the leaf weights (<t>(w-,) , ... ><K W ■>))■ 
 This all follows from the "order-preserving" properties of monotone 
 functions. It should be pointed out that when 4> is positive decreasing 
 and A >_ 1 the Huffman algorithm could always produce the following tree, 
 because then F(x,y) < min(x,y) and the smallest weight is always selected: 
 
 Fig. 4 
 
 This is also the structure of the tree that would be produced if <j) were 
 positive increasing, A >_ 1, and the anti-Huffman algorithm were used, 
 for then we would have F(x,y) >_ max(x,y) and the largest weight would 
 always be selected. This type of tree construction is not particularly 
 interesting but will be covered here in the interest of completeness. 
 
17 
 
 Lemma 1 establishes when F is a "monotone" weight combination 
 function; we are now in a position to extend the results of Theorem 1. 
 If F(x,y) >^max(x,y), then it is clear that the k smallest internal 
 node weights [W (T) , . . . ,W,(T)] of a tree T define a subforest of T. 
 (For, if there is any weight W. (T) in the set corresponding to an 
 internal node whose son's weight W.(T) is not also contained in the 
 set, then W. (T) < W.(T), for otherwise we would have W. (T) in the set. 
 But this is impossible because F(x,y) >_ max(x,y) implies W. (T) >_W.(T).) 
 Thus Lemma 1 asserts that if (f> is positive (resp. negative) increasing 
 and X >_ 1, the resulting internal node weights will have this subforest 
 characterization: every collection of least (resp. greatest) node 
 weights define some subforest. 
 
 The lemma also shows that, for "monotone" F, we can assume <$> is 
 positive (and strictly monotone continuous) instead of assuming it is 
 increasing, since again F is invariant of sign changes to (j>, and 4> must 
 now be either positive or negative. This assumption seems to be the 
 natural one to make because of the following result. 
 
18 
 
 Theorem 3 Let F(x,y) = <{) ( A<f)(x) + A<Hy) ) be the tree construction 
 weight combination function where <\> is convex , positive, and strictly monotone 
 and A >_ 1 . If, as in Theorem 1, W(S) and W(T) are the weight sequences 
 for the internal nodes of the trees S and T, constructed respectively by the 
 Huffman algorithm and by any other way, then 
 
 W(S) ■< W(T) . 
 (The same results hold if $ is concave , negative, and strictly monotone.) 
 
 Proof The proof has two cases, accordingly as <J> is strictly increasing or 
 
 strictly decreasing. 
 
 Case 1 ; (j) convex, positive, and strictly increasing. 
 
 We accomplish the proof in two steps: using the notation of Theorem 2, 
 
 we first show that the weight sequences <t>(W(S)) and 4>(W(T)) satisfy 
 
 <KW(S)) < <HW(T)) 
 and then, since <J> is concave increasing in this case, we can apply Theorem 2 
 to get W(S) < W(T) as desired. 
 
 w 
 
 If there are n+1 initial leaf weights w n , . . . ,w .. we have as above 
 
 & 1 n+1 
 
 (S) = rw i (S),...,W (S)] and W(T) = [w^T) , . . . ,W (T)] as the internal 
 
 node weight sequences where W. is the i -smallest such weight. In particular 
 
 since W. (S) designates the weight of some internal node which is the root of 
 
 some subtree S. of the Huffman tree S, if we define 
 i 
 
 1. = { j w. i 
 
 is a 
 
 leaf of S. } 
 
 i 
 
 &.(S.) = path length of weight w. in the subtree S. , 
 
 £.(S.) 
 a - Z A J x 4,(w ) 
 
 m* 3 
 
 then W. (S) = (t>" 1 (a.). 
 
19 
 
 Defining J*, and £.(T.) in an analogous manner, if we set 
 
 i.CT.) 
 
 b. = s x J x «v 
 
 i j e ^ J 
 
 then W. (T) = ^ _1 (b.). 
 
 We now claim that the sets a = [a.. , . . . ,a "] and b = [b. , . . . ,b "] 
 
 are weight sequences satisfying a -< b. First of all since <J> is 
 
 positive increasing and W(S) , W(T) are weight sequences, we know that 
 
 a = <})(W(S)) and b = <KW(T)) are weight sequences; in fact <J> "preserves 
 
 order" since if W. (S) < W. (S) then a. = <t>(W. (S)) < <j>(W.(S)) = a. 
 
 1 J i i J J > 
 
 and similarly for W(T) and b. 
 
 Second, by Lemma 1 we know that F(x,y) >^ x,y in this case, so the 
 k smallest node weights [w , . . . ,W, ] for either S or T correspond to a 
 subforest F, of S or T. This implies 
 
 k 
 
 
 k A (T.) 
 
 Z b. 
 
 = 
 
 Z Z A J <|)(w.) 
 
 i=l x 
 
 
 i=l j£3^ J 
 
 n+1 £ -C F v) 
 
 S ( A + A 2 + ... + A J k ) <J>(w.) 
 j = l J 
 
 ^/n» + l ^ (F k ) 
 
 XTi) 2 ( A J K - 1 ) <j)(w j if A > 1 
 j = l J 
 
 n+1 
 
 if A = 1, 
 
 j=l J k J 
 
 k 
 with a similar expression holding for Z a. . 
 
 i=l 1 
 
 We can now directly apply Glassey and Karp's method of proof for 
 rem 1. The proof proceeds by inducti 
 prove a <^ b by showing for all k that 
 
 Theorem 1. The proof proceeds by induction on k, where we are trying to 
 
 k k 
 
 Z a. < Z b.. 
 . , i — ,i 
 i=l i=l 
 
20 
 
 The basis k=l is trivial, and for the induction step there are two 
 
 possibilities, depending on the relationship between a. = ^(W.(S)) and 
 
 h 1 = (KW^T)). 
 
 Subcase 1 : a = b. 
 
 In this case we know <$> (a.) = <$> (b ) = F(w ,w_) and we are reduced to the 
 
 proof on the set of leaf weights {Ffw^w-), w , ... , w }, for which 
 
 we have by induction that 
 
 k k 
 
 Z a. < Z b. 
 . 1 — . i 
 i=2 i=2 
 
 So, 
 
 k k 
 
 Z a. < Z b 
 
 . i — . , 
 
 i=l i=l 
 
 Subcase 2 : a < b. 
 
 As in Theorem 1, we show there is a tree T with internal weights W. (T) = 4> (c^) 
 
 where c = [c ,,..., c ] satisfies Z c. < Z b. and, in addition, 
 
 in t — i ' 
 
 l<i<k l<i<k 
 
 c. = a, so we have (by reduction to subcase 1) 
 
 k k k 
 
 Z a. < Z c. < Z b. 
 . l — . , i — • i i 
 i=l i=l i=l 
 
 completing the proof that a < b. This is easily done by taking the 
 
 forest F, corresponding to the least k weights (W.. (T) , . . . ,W,(T) } of T and 
 
 defining as before the maximum path length in this forest 
 
 i k = max I . (F. ) . 
 max i k 
 
 3 J 
 
 We then choose an internal node having weight W (T)=c{> (b )=F(w ,w ) whose 
 
 P p r s 
 
 2 (leaf) sons have path length & in F, and have leaf weights w and w . 
 
 Since a n < b n < b we know (w ,w }0(w n ,w n ) 4 (w.,,w„}. Assuming w < w , 
 1 1 — p r' s 12 12 6 r — s 
 
 let T be the tree constructed exactly like T but with the leaf weights 
 
21 
 
 w and w. , w and w_ interchanged. Then F, is still a subforest of T 
 (topologically T and T are isomorphic) and determines a subset of k of 
 T's internal node weights, and consequently some k-subset of the weight 
 sequence c. Specifically, if we define T. , <) . , £.(T.) exactly as above 
 
 so that Mf) 
 
 C i " .? A ] ' <Kw.) 
 
 and W. (T) = <jf (c), then F, defines the set 
 
 1 X K 
 
 <Sl = -C i | cf> + (c.)is the weight of some internal node in F, } 
 
 and \&.\ = k. Moreover we must have 
 
 k 
 
 E c. < £ c. 
 i=l x - ie£ x 
 
 since the first k weights c. are the least such weights. But we also have 
 
 k 
 E c. < Z b. . 
 ie& x i=l 1 
 
 To show this we write for convenience 
 
 K(V ^ r (T) 4, CO 1(f) 
 
 A^X 1 = A r A 2 = X =X 
 
 l k (T) £ (T) £ (T) I. (T) £ (T) 
 
 A = X max = X r = A s = A X = A 2 
 m 
 
 <J> r = 4>O r ) <(> s = (|)(w s ) ^ = <j)(w 1 ) <J> 2 s (f,(w 2 ) 
 
 There fore 
 
 (f> > (J), and (J) > <})_, and in the case X > 1 
 T r — 1 s — T 2 
 
 since . 
 
 ,k 
 
 I AT), JL(T) < fc (T) 
 1 2 v ' — max v ' 
 
 we have 
 
 A, < A and A~ < A . 
 1 — m 2 — m 
 
 So, if A >1, 
 
22 
 
 Z c. 
 ie,8 x 
 
 k 
 
 Z b 
 1-1 : 
 
 (A) | < Vl + V2 + Vr + W 
 
 ( Vl + A 2*2 + Vr + Vs> 
 
 ( CA^ApC^-^) ♦ (A m -A 2 )(* 2 ^ s ) ) 
 
 . 
 
 A trivial modification of this argument gives the proof for X-l, so we 
 omit it here. Thus we have shown 
 
 £ c. < Z c. < Z b, 
 
 i=l 
 
 ie<& 
 
 i = l 
 
 but since c =<j> (W..(T)) = <p (F(w ,w~)) = a. we have, by reduction to 
 
 subcase 1, that 
 
 Therefore 
 
 Z a. < Z c. . 
 . , 1 — . , 1 
 i=l i=l 
 
 Z a. < Z b. 
 . , 1 — . . 1 
 i=l i=l 
 
 and Theorem 3 follows for Case 1, since we have shown that a <( b, and, 
 since here <J) is concave increasing we can apply Theorem 2 to get immediately 
 W(S) = c))" 1 (a) < f 1 ^) = W(T). 
 
 Case 2: tfi convex, positive, strictly decreasing 
 
 Actually in this case the Theorem is something of an understatement 
 
 We are comparing here the weight sequences 
 
 W(S) = [W 1 (S),...,W n (S)] = [^ 1 (a n ),...,0- 1 (a 1 )] 
 and W(T) = [W 1 (T),...,W n (T)] = fo" 1 ^) , . . . .f 1 ^)] , 
 where a = [a^...^] and b = [b , . . . ,b ] are weight sequences 
 as in case 1. From the discussion following Lemma 1 we see that 
 
23 
 
 a would be the internal node weight sequence formed using the 
 anti-Huffman algorithm on the leaf weights {<(>(w ) , . . . »<K w n+1 )} • 
 It follows that a, >_ b for 1 < k <n, and thus we easily have 
 both a y b and W(S)< W(T) as consequences. 
 
 Theorem 3 is apparently the most general possible result of its 
 kind. To show what can happen when cj) is not convex, we consider an example 
 where f is concave positive. Let <f>(x) = v'x , A = 1, and U = R + so that 
 
 F(x,y) = ( v^ *Jy ) 2 , 
 and suppose we are to build a tree given the leaf weights (1,2,3,4). The 
 Huffman algorithm produces the tree S 
 
 Fig. 5a 
 
 for which we have I W. (S) = 5.83 + 13.93 + 37.78 = 57.73, while the tree T 
 
 i=l x 
 
 Fig. 5b 
 
 3 j 
 
 has Z W. (T) = 9 + 9.90 + 37.78 = 56.68, so W(S) ifw(T). That this 
 
 i=l 1 ' 
 
 phenomenon will always happen when cf) is not convex is a result of the 
 
 converse of Theorem 2, which says that 
 
 m m 
 
 E 4>(a.) £ E <Kb.) for all a -< b ■ ■ " ■ > <p concave increasing, 
 i=l 1 i=l 1 
 
 The proof is easy and we omit it. 
 
 In addition to Theorem 3 we have the following characterization 
 of Huffman construction with the functions F considered in this paper. 
 
24 
 
 Theorem 4 If F(x,y) = 4> (A<J>(x) + A<t)(y)) where A >^ 1 and <\> is increasing, 
 continuous, and sign-consistent (as in Lemma 1), then 
 
 W^S) < W X (T) and WJS) < W n (T), 
 i.e., the smallest and largest Huffman internal node weights are no larger 
 than the corresponding smallest and largest node weights in any other tree. 
 
 Proof The proof is like that of Theorem 3. We discuss only the case where 
 <p is positive , so W.. (S) = F(w ,w ) <_ W (T) follows immediately and we must 
 show W (S) <_ W (T) . If <J> is negative, the proof is similar, but there 
 
 W (S) = F(w w ) <_ W (T) and we must show W (S) < W (T) . Here we have 
 
 n+1 £.(S) 
 W (S) = 4> _i ( Z X J <j>(w ) ), 
 j = l J 
 
 n+1 I (T) 
 W n (T) = <D _i ( I A J <J,(w ) ). 
 j = l J 
 
 and because <J) is increasing 
 
 w n (s) < W n (T) iff 4>CW n (S)) < «D(W n (T)) 
 
 MS) MT) 
 
 or equivalents iff Za j (w } < Zx J <, (w>)< 
 
 We prove this inequality by induction on n, the number of internal nodes 
 in the constructed tree. As a basis we have W (S) = W (T) for n=l, and 
 when n=2 there are only three inequivalent trees: 
 
 Fig. 6 T^ /\ \ V.-/ / \ 
 
 w l w 2 W 3 "l W 2 W 3 W l W 3 W 2 
 
 Assuming w <_ w~ <_ w , we have 
 
 (J)(W 2 (S)) = <KW 2 (Ti)) = A 2 (j)(wi) + A 2 cj)(w2) + A$(w 3 ) 
 
 and <j>(W 2 (T 2 )) = A <j>Oi) + A 2 (J>(w 2 ) + A 2 4>(w 3 ) 
 
 <KW 2 (T 3 )) = A 2 tj)(w 1 ) + A<j)(w 2 ) + A 2 <Kw 3 ). 
 Since (J) is increasing and A > 1 we find 
 
25 
 
 <KW 2 (S)) - cJ)(W 2 (T 2 )) = (A 2 -X) ((j)(wO - 4>(w 3 )) i 
 
 <KW 2 (S)) - <J)CW 2 (T 3 )) = (A 2 -X) C*(wi.) - (J>(w 2 )) < 0, 
 so for any tree T constructed here we have W 2 (S) <_ W 2 (T) . 
 
 For the induction step we borrow once again the argument of Glassey 
 and Karp used already in Theorems 1 and 3 by breaking down the problem 
 into two cases, one trivial, one not so trivial. 
 Case 1 W (S) = W-CT) 
 
 In this case the theorem follows by induction since we are reduced to the 
 problem of constructing a tree having (n-1) internal nodes on the leaf 
 weights {W 1 (S)=W 1 (T)=F(w 1 ,w 2 ), w^ ... , w^} . 
 
 Case 2 W^S) < W (T) 
 
 To show W (S) <_ W (T), or equivalently <f>(W (S)) <_ <j> (W (T)), we show there 
 
 exists a tree T such that <j> (W (S)) <_<j>(W (T)) <_<J>(W (T)). This is 
 
 accomplished in the manner described in Theorem 3: in the tree T we 
 
 select an internal node having weight W (T) = F(w ,w ) whose two (leaf) 
 
 sons have maximal path length I = max I . (T) in T. Since W, (S) f W. (T) 
 r max 1 11 
 
 J J 
 
 we know {w ,w }n{w.,w„} ^ {w..w,}. Assuming w < w , let T be the tree 
 rs 1*2 1*2 6 r — s 
 
 constructed exactly like T but with w and w n , w and w„ interchanged. 
 
 r 1 s 2 b 
 
 Then Wj (T) = F(w ,v* 2 ) = W (S) so by Case 1 above we have W (S) < W (f) . 
 
 However we also have 
 
 I I (T) I £ ? (T) 
 
 4>(W n (T)) - ♦CW n CT}) = (X maX - A - 1 )»(w l )-«(w r )) + (A maX - X )(4>(w 2 )-<|)(w s )) 
 
 < . 
 
 Consequently W (T) <_ W (T) and we obtain W (S) <_ W (T) as desired 
 
on 
 
 26 
 
 In summary, we have the following characterization of binary tree 
 
 construction with the Huffman algorithm using Ffx.yj = % (/i(x)*>^( / 
 
 Assuming X > 1 and <$> is positive, strictly monotone, and continuous, if 
 
 S is the Huffman tree and T is any other, then the internal node weight 
 
 sequences W(S) and W(T) satisfy 
 
 k k 
 
 (1) W(S) < W(T) i.e., E W. (S) < Z W. (T) f or 1 < k < n 
 
 i=l X i=l 
 
 (provided <$> is additionally convex . ) 
 
 (2) W 1 (S)<W 1 (T) and W n (S)<W n (T). 
 
 These results will be exploited in section IV. We finish this section by 
 outlining how the above characterization extends to r-ary tree constructi 
 
 Theorem 5 Let everything be defined as in Theorem 3, with the exception 
 that we let F:U ■+ U be the r-ary function (r >_ 2) 
 
 -1 r 
 F(x ,x , ... ,x ) = <|> X ( A Z <Kx.) ). 
 i z r i=1 i 
 
 Then the results of Theorems 3 and 4 still hold. 
 
 We omit the proof, which is virtually identical to that of these theorems, with 
 the changes that we must now define F on less that its full r arguments in 
 the natural way (In the binary case all constructed trees are full, but 
 that is no longer true in the r-ary case. If n+1 leaf weights are 
 provided, the Huffman algorithm selects exactly the 2 + [(n-1) mod (r-1)] 
 smallest weights for the first weight combination, and this quantity is 
 not necessarily equal to r; however choosing this many weights guarantees 
 that all future weight combinations can merge r weights.), and the details 
 of showing that the tree T gives us inequalities like W(S) •< W(T) < W(T) 
 are slightly more complicated but no different in method. These details 
 are covered in Glassey and Karp's proof in [GK 76]. 
 
27 
 
 IV. Cost Functions under which Huffman Trees are Optimal 
 
 In section III we described the properties of the internal node 
 weights in Huffman trees with the weight combination function 
 F(x,y) = <P (^M+AcKy)) • I n this section we exhibit several classes 
 of tree cost functions for which the Huffman trees are optimal by 
 exploiting these results as much as possible. As indicated above 
 in section II we are considering cost a function of the constructed 
 internal node weights, so formally 
 
 Cost(T) = G(W(T)) = GCWjCT), ... ,W n (T)). 
 Thus G: U + R is to be a function under which Huffman internal node 
 weight sequences have smallest image. We show now that if 4>: R ■*■ R 
 is some strictly increasing function, possibly satisfying other conditions 
 to be made explicit later, then cost functions of the forms 
 
 Gi(W(T)) = <K max W. (T) ) [or \\>( min W. (T) ) ] 
 
 G 2 (W(T)) =£ <K W.(T) ) 
 
 G 3 (W(T)) = ip" 1 C Z U W i (T) ) ) 
 
 G-(W(T)) = ^( G m (W(T)) ) m = 1,2,3,4 
 force Huffman trees to be optimal. This result constitutes our (partial) 
 answer to the question posed in the introduction, and leads to connections 
 between the tree height and weighted path-length systems discussed in 
 section II. Applications will be taken up below. 
 
 For the following four theorems we compare the cost of trees S and T 
 built using the weight combination function F(x,y) of section III, where 
 S is the tree built by the Huffman algorithm and T is any other tree. 
 
28 
 
 As usual W(S) = [W 1 (S),...,W n (S)] and W(T) = [Wj (T) , . . . .W^T) ] 
 denote the internal node weight sequences for these trees. 
 
 Theorem 6 If F(x,y) is as in Theorem 4 and the cost of T is given by 
 
 Cost^T) = GiOVCT)) = <KW n (T)) 
 or alternatively = ip(W.(T)) 
 
 where ty is any strictly increasing function, then Costi(S) < Costi(T). 
 
 Proof Immediate from Theorem 4. Notice only that if <$> is positive 
 increasing [decreasing] then from Lemma 1 we must have F(x,y) >_ max(x,y) 
 [F(x,y) £min(x,y)] so W (T) is the first [last] and W (T) the last [first] 
 
 node weight constructed. 
 
 Theorem 7 If F(x,y) is as in Theorem 3 and the cost of T is given by 
 
 n 
 Cost 2 (T) = G 2 (W(T)) = I iKW.(T)) 
 
 i=l 1 
 
 where \p is a continuous, strictly increasing function such that 
 
 ^oc}> is concave increasing if (J> is positive increasing, 
 
 and ipocj) is decreasing if $ is positive decreasing 
 
 ( o denotes functional composition) , then 
 
 Cost 2 (S) < Cost 2 (T). 
 
 This result was observed for the case (f>(x)=x, ^=1 by Glassey and Karp [GK 76] 
 
 Proof This is essentially a corollary of Theorem 3. Using the notation 
 in that proof, note that 
 
29 
 
 n n 1 
 
 Cost 2 (S) = I iKW. (S)) = I ij;«4>" (a.) 
 i=l x i=l 1 
 
 and similarly 
 
 n n 1 
 
 Cost 2 (T) = I ip(W . (T)) = £ iHfr" CO. 
 1-1 i=l 
 
 where a- [a r ...,a n ] = [WjCS)) , . . ,<f>(W n (S))] 
 
 and b = [b ,,..., b ] = [$(W 1 (T)) , . . ,<|>(W (T))] are weight sequences, 
 
 We now break the proof down into the two cases indicated above. 
 
 Case 1 tj> positive increasing 
 
 Since (J> is increasing we know from Case 1 of Theorem 3 that a -< b. 
 
 Now if i{/*4> is concave (we already know it is increasing since both 
 
 ij> and <J> are) then by Theorem 2 we get 
 
 lM -1 (a) < iM^Cb) 
 
 n -1 n -1 
 so in particular I ip« (f> (a.) <_ S ^°<}> (b.) 
 
 i=l X i=l ' 1 
 
 or equivalents ^^ (s) £ Cost2 (J) ^ 
 
 Case 2 $ positive decreasing 
 
 Since <j> is decreasing we have from Case 2 of Theorem 3 that a > b, 
 
 k — k 
 
 for all k, 1 <_ k <_ n. If ip«<J>~ is decreasing then 
 
 ^°<T (a R ) < \p<><$>~ (b k ) for all k, and clearly Cost 2 (S) <_Cost 2 (T). 
 
30 
 
 Theorem 8 If F(x,y) is as in Theorem 3 and the cost of T is given by 
 
 -1 n 
 Cost 3 (T) = G 3 (W(T)) = 4, l ( Z KW.(T)) ) 
 
 i=l 
 where ip is a continuous, strictly monotone function, then 
 
 Cost3(S) _< Cost 3 (T) if \p<><p satisfies the condition indicated in the 
 
 following table: 
 
 (J> positive increasing <J> positive decreasing 
 
 -i 
 \p increasing 
 
 \p decreasing 
 
 , a" 1 
 
 \p»<p concave increasing 
 
 tyttp convex decreasing 
 
 ^•0"' decreasing 
 
 ^•4> 
 
 1 
 
 increasing 
 
 Note that, since the sign of \p does not affect G we can always assume 
 ip to be increasing; if it were chosen to be decreasing we could simply 
 use -\p. The above table reflects this fact. 
 
 Proof Follows directly from Theorem 7. We can assume ip increasing as 
 just indicated, so \p is then increasing and we have 
 
 Cost 3 (S) = ip" 1 (Cost 2 (S)) < ^(Cost 2 CO) = Cost 3 (T). 
 
 Theorem 9 If F(x,y) is as in Theorem 3 and the cost of T is given by 
 
 CosnCT) = G^CWCT)) = iK G m (W(T)) ) (m=l,2,3) 
 where \p is a continuous, strictly increasing function, then 
 
 cosn(s) < cosncr). 
 
 Proof Immediate from the order-preserving properties of increasing functions 
 
31 
 
 Although these are the only cost functions we discuss here, it 
 
 should be emphasized that they are not the only possible Huffman-optimal 
 
 ones. For example we could define 
 
 k 
 Cost 5 (T) = G 5 k (W(T)) = I iKW.(T)) 
 
 ' i=l 
 
 for any k between 1 and n, and prove optimal ity of Huffman trees by 
 following the arguments used for Theorem 7. However the cost measures 
 given here seem the most useful and natural at present: among other 
 things they are all symmetric, associative, and extensible in the 
 senses described at the beginning of section III. It is, nevertheless, 
 possible that other cost functions may be determined to be useful 
 for specific problems,. 
 
 As a particular application of Theorem 8, consider the case where 
 
 U = R + and <j>(x) = x a , ty(x) = x 6 . (a, 3 ^0) 
 
 Note <$> is convex decreasing for a < 
 
 concave increasing for < a <_ 1 
 convex increasing for 1 <_ a. 
 
 Thus the table for Theorem 8 can be broken down as follows: since 
 
 1 
 
 8/a 
 
 l|K<j>: ( x ) = X 
 the Huffman algorithm gives optimal trees for the system generated by <J> and ty 
 
 (specifically F(x,y) = ( A(x a + y a ) ) / ) whenever 
 
 
 a > 
 
 ■ a < 
 
 3 > 
 
 < 8/a <_ 1 
 
 8/a < 
 
 3 < o 
 
 8/a < 
 
 < 8/a 
 
32 
 
 This can be plotted as follows 
 
 Fig. 7 
 
 ct ft 
 
 Therefore for A>_ 1, ())(x)=x , <Kx)=x , with any a,R in the shaded region, 
 
 Huffman's algorithm will produce an optimal tree. The class of weight 
 
 combination and tree cost functions generated by this <J> and ty has many 
 
 interesting properties, and have been studied extensively in the theory 
 
 of means, a major foundation of the 
 
 modern theory of inequalities. The interested reader is referred to 
 
 Chapters 1-3 of [HLP 34]. We stop here only to give one final result; 
 
 namely, the promised tie-up between the tree-height and weighted path- length 
 
 systems. 
 
 a ft 
 
 Let <j>(x) = x > 'P(x) = x as above. Then when X = 1, 
 
 if a = 1 we have 
 
 F(x,y) = x + y, 
 
 and lim F(x,y) = lim (x + y ) 
 
 a-* 00 
 
 a-* 10 
 
 max(x,y) . 
 
 If we let 3 = a and let a-* 30 we then find tp° $ = identity and 
 
 -1 n -1 
 
 COSt(T) = ty ( Z lpe(}) C E 4>(w.)) ) 
 
 i=l 
 
 JetTi 
 
 = lim ( Z Z (w.) a ) /a 
 
 a-*» i=l je5l ■* 
 
 max 
 
 w 
 
 l<;<n+l - 1 
 Thus we have a partial connection between the two systems, where the 
 
33 
 
 tree-height system is a limit of many other systems. To show that 
 
 F(x,y) = max(x,y) + c , 
 where c is a positive constant, is also contained in the framework 
 provided by Theorem 8 requires a little more work. We let p be some 
 positive real variable and define A = exp(pc), where c is the desired 
 
 en 
 
 constant, and let <j>(x) = <Kx) = exp(px), U = R . Th 
 F(x,y) = <fr -1 ( X<D(x) + A<Ky) ) 
 
 = - logC exp(p(x+c)) + expCpCy+c)) ). 
 
 Here <$> and ty are both increasing on U and ^°<f> (x) = x is concave increasing 
 
 so by Theorem 8 we know the Huffman algorithm produces optimal trees in 
 
 this system, for any positive value of p. But it then follows that 
 
 lim F(x,y) = max(x,y) + c 
 p-x» 
 
 and it is easily verified that the cost function G(W(T)) is the max 
 function as before. Strictly speaking we should demonstrate here that 
 such limits of sequences of tree construction systems are in fact tree 
 construction systems (producing optimal trees with the Huffman algorithm) 
 but since it is well-known that the tree-height system is such a system 
 we forgo such a demonstration here. Thus the class of weight combination 
 functions F and cost functions G given by Theorem 8 is, at any rate, 
 large enough to contain the previously known Huffman -optimal systems, 
 namely the path-length and tree-height ones. 
 
34 
 
 V. Applications and Open Problems 
 
 We have just shown that for wide classes of tree construction 
 systems the Huffman algorithm produces cost-optimal trees. These classes 
 are £0 wide that the general notation probably invites the reader to think 
 that the results are too vague to be useful. We try to counteract this 
 
 Q, g 
 
 impression by giving some examples beyond the <$>(x) = x , <Kx) = x 
 one presented just above. 
 
 Consider first F(x,y) = xy defined on the weight space U = [0,1]. 
 We of course have F:U 2 ■+ U, and find X = 1 and <K X ) = -log(x) on U, a 
 (positive) convex decreasing function (the base of the log is immaterial, 
 and for convenience we take it to be e here) . Theorem 8 then ensures 
 
 that any cost function G generated by, for example, 
 iKx) = < or 
 
 O _ 1 
 
 x (3 > 0) (so ipo<j>~ (x) = exp(-Bx) is decreasing) 
 
 a log (x) (a > 0) (so i|*> <J>~ (x) = +ax is increasing) 
 
 will give a system optimal with the Huffman algorithm. Specifically if 
 
 o 
 
 we choose 5=1 and ip(x) = x = x, then 
 
 , n n 
 
 Cost(T) = f\ E iKW.(T)) ) = E W. (T) 
 
 i=l i=l 
 
 so the cost of the tree is the sum of the internal node weights, each of 
 
 which is the product of its sons' weights. For example: 
 
 Fig. 8 
 
 Total cost = .06 + .024 + .0012 = .0852 
 
35 
 
 Also if we take a = 1 and ip(x) = -log(x), then 
 
 , n n 
 
 CostCT) = G(W(T)) = f^ S <J>(W (I)) ) = n W (T) 
 
 i=l x i=l x 
 
 so the cost of the tree is the product of the internal node weights, 
 
 each of which is the product of its sons' weights. This system is 
 
 therefore the multiplicative analogue of the weighted path-length system. 
 
 As an example, note that for the tree above the total cost would be here 
 
 (.06) (.024) (.0012) = (.000001728). 
 
 The example F(x,y) = xy is interesting because it changes 
 
 drastically if we change the weight space U from [0,1] to [0 ,°°) or [b,°°) 
 
 for any b >_ 1. When U = [O, 00 ) we can say nothing, for there is then no 
 
 positive, strictly monotone function <j> for which F(x,y) = <J> (<|>(x) + <Ky) )• 
 
 When U = [b, 00 ) we wind up with <j)(x) = +log(x), a (positive) concave increasing 
 
 function. In particular, then, if i|»(x) = x as for the example with U=[0,1] 
 
 then 4>°i> (x) = exp(x) 
 
 which is not concave increasing as required by Theorem 8 and will not 
 
 give optimal trees with the Huffman algorithm: 
 
 155^ 
 
 Fig. 9 
 
 zm 
 
 Huffman but suboptimal tree 
 Total cost = 6+20+120 = 146 
 
 non-Huffman but optimal tree 
 Total cost = 10+12+120 = 142 
 
V. 
 
 Another application of this extension of Huffman tree construction 
 is the generation of codes which are optimal under criteria other than 
 Huffman's original one, equivalent to weighted path-length [Huf 52]. A 
 moderate literature has grown up around this subject; it is surprising 
 that no corresponding analogue of Huffman's algorithm has also been 
 developed. We outline several known results, including interesting lower 
 bounds on average codeword length like that of the Noiseless Coding 
 Theorem, and then present these Huffman analogues. 
 
 In 
 
 the context of coding the leaf weights {w , . . . ,w . } are proba- 
 
 bilities (so Zw. = 1), representing the relative frequencies of occurrence 
 of a set of (n+1) messages which are to be encoded into D-ary codewords 
 (D >_ 2). Let the length of the message with probability w. be called %.; 
 we are then interested in minimizing the "quasiarithmetic mean codeword 
 
 length" [Acz 74], [Cam 66] 
 
 -1 n+1 
 L(u,{w },{£}) = y x ( Z w. y(JL) ) 
 
 J j i = l 
 
 or some similar code cost measure; here y is a continuous, strictly 
 increasing function on R . For example, when y(x) = x we get the 
 traditional weighted path-length; other "translative" forms of L have 
 been considered in [Cam 66], [Acz 74], and [Nath 75]. Although this 
 measure of codeword length is quite general, most special cases treated 
 in the literature can be handled by the extended Huffman construction 
 presented here. We consider three cases one by one; each is based on 
 Renyi ' s entropy of order a 
 
37 
 
 n+1 
 
 a 
 
 H fw ...... ,w .) = ■= — log„f S w. ) 
 
 or 1' n+1 1-a 6 D^ . , i J 
 
 3=1 
 
 Here D is the size of the code letter alphabet, i.e., codewords can 
 
 be viewed as D-ary numbers. Renyi's entropy has the interesting property 
 
 that its limit, as a -> 1, is the usual Shannon entropy 
 
 n+1 
 H(w_,...,w -) = - E w log D (w ). 
 
 j=l j J 
 
 Campbell [Cam 65] now defines an exponential codeword length average L(t) 
 
 tx 
 by setting y(x) = D so that 
 
 I . 
 
 L(t) = | log D ( E w. D t£ j ) = log A ( E Wj A j ) 
 
 where t > and A = D > 1 . He then proves that 
 
 (1) lim L(t) = I w. I. 
 t-+0 j : J 
 
 (2) lim L(t) = max I 
 
 (3) H (w, , . . . ,w . ) < L(t) where a = -r— — = ■= = ^rr- 
 
 v J or 1' n+1' — v * 1+t 1 + log_(A) 
 
 -I. 
 
 i ct ct 
 with equality holding when D J = w. /(Ew. ). 
 
 Now consider general Huffman construction as discussed in section II with 
 
 F(x,y) = A(x+y) and G(W(T)) = log, (W (T)). Then 
 
 a n 
 
 Cost(T) = G(WCT)) = L(t) = L( lo g[) CA) ), 
 so Huffman construction with this weight combination function F produces 
 optimal exponential-length-cost trees by Theorem 6. 
 
 Aczel [Acz 74], besides citing results of Campbell for the degenerate 
 
 case t<0 (A<1) above, considers the result when y(x) = (A -1)/(A-1) 
 
 (again, A = D ) and shows that 
 
 , n+1 
 L(y) = y~ C I w y(£ ) ) 
 j=l J J 
 

 satisfies ( ( E w . l ) ' - 1 )/( X - 1 ) < L( where again 
 
 a = l/(l+t) = 1/(1 + log A). Bat notice that wh- 
 
 and G(W(T)) = u _1 (j I W. (T) ), then because u(m) = 1 ♦ I ♦ ... ♦ z™" 1 (**l t J 
 
 Cost(T) = G(W(T)) = L(u). 
 So, by Theorems 7 and 9, since u ' is increasing, Huffman construction 
 with this function F again produces the optimal code tree (identical to 
 the one constructed for Campbell's average codeword length). 
 
 Lastly, Nath has come up with nice results by defining what he calls 
 
 the average codeword length of order a (a > 1) [Nath 75] 
 
 n+1 (ct-l)£. 
 L(a) = (a - l)" 1 log ( I w °Tj J / w a ) 
 
 j = l J 
 
 n+1 I. 
 log,( Z w X J I w ) 
 
 ex ex fcx- 1 1 
 
 where w = Z w. and A = D^ . He shows that 
 
 3 -I. 
 
 H (w . . ,w ) < L(a) with equality iff w. = D J for all j. 
 
 Now when F(x,y) = (Xx a + Xy a ) 1/a and G(W(T)) = log, (W (T) a / w a ) 
 
 A n 
 
 we find Cost(T) = G(W(T)) = L(a), so by Theorem 6 Huffman 
 construction with this function F produces optimal trees here, i.e., 
 produces code trees of least average length L(a). To illustrate this, 
 we consider construction of an optimal binary code for the ensemble of 
 13 messages given in [Huf 52]. One of the nice features of L(a) is that 
 its limit as a -> 1 is the traditional average codeword length (weighted 
 path-length); so in Figure 10 we display optimal code trees for the 
 ensemble under the cost function L(a) for both a = 1 and a = 2, giving 
 codeword assignments and L(cx) in each case. 
 
39 
 
 Fig. 10a a = 
 
 r xti 
 
 OOlU\ OOIHO 00110 O\0o\ OlOOD 0O|O| OPIOO olo| III mo on ooo to 
 
 lb) = Z>jlj ■ IX > 3-42 
 
 Fig. 10b a = 2 
 
 FCx ; ^=>[I^ r 
 
 OOO00 O000| O|O0O OIOOI 0O\0O OOIOI OOII OOOI 0|o/ /op fo/ ©'/ 
 
 UXj = 
 
 h* [Z wj* Z ai /Z «/] - «fl« [V-V-Sw/] >Mt 
 
40 
 
 There are several open problems remaining. First, it is currently 
 hard to determine, given some arbitrary weight combination and tree cost 
 functions F and G, the explicit forms for X, <f>, and \p (if they exist at 
 all) which let us apply the above theorems and prove the optimality of 
 Huffman trees. It is not difficult, though, to give necessary conditions 
 for F and G to be of the proper form. First both functions must be 
 symmetric, increasing functions of their variables. G is also associative 
 when it is not monadic, and F(x,y) = 4> (X<|)(x) + X<f>(y)) is associative 
 if and only if X=l; however F must satisfy the bi-symmetry property 
 
 F( F(u,v) , F(x,y) ) = F( F(u,x) , F(v,y) ) 
 [Acz 48] whether X = 1 or not. Much more is known for the case X = 1: 
 Thielman [Thi 51] gives forms for (f> when F (or G) is rational, and 
 
 Aczel and Daroczy [AD 75jp.l51] prove that homogeneity of F implies 
 
 ex 
 <J)(x) = x or (j)(x) = log x, as do Hardy-Littlewood-Polya [HLP 34]. 
 
 Thus a number of tests for decomposability into these conjugated forms 
 
 exist, but apart from tedious determination of the power series for 
 
 <j> (x) by repeated differentiation of the defining equation 
 
 F( (t) _1 (x) , c^V) ) = 4> _1 (2Xx) 
 
 no direct method of decomposition is known to us. 
 
 Another large problem is to find if other tree construction 
 systems apart from those considered here are optimal under the Huffman 
 
41 
 
 algorithm. If we consider F(x,y) = <j> (Acf>(x) + A<Ky)) with < X < 1 , 
 for example, the proof techniques in section III break down because 
 we are no longer guaranteed that F: U 2 -»■ U or that the least k internal 
 node weights in a tree determine a subforest of that tree. However 
 Huffman trees could still conceivably be optimal. Also, if we let 
 
 F(x,y) = (J) _:L ( CWCx)) + CWGO) ) 
 where £ belongs to some class of monotone functions besides £(z) = Xz, 
 it is open whether Theorem 3 can still be proved. It cannot be when 
 £(x) = x 2 , for then we have the example in Figure 11, for 
 
 which W(S).-<W(T). 
 
 Fig. 11 
 
 Huffman but suboptimal tree 
 W +W +W = 2+8+68 = 78 
 
 1 2 3 
 
 non-Huffman but optimal tree 
 W +W +W = 5+5+50 = 60 
 
 1 2 3 
 
 This breakdown is at least in part due to F's no longer being 
 "quasi linear," but the details must be clarified. It would be very 
 interesting if we could prove that quasilinearity of F is a necessary 
 condition for Huffman optimality under the cost functions of section IV; 
 currently this paper only outlines how quasilinearity is sufficient. 
 
42 
 References 
 
 [Acz 48] Aczel, J. "On Mean Values." Bull. AMS 54, 392-400(1948) . 
 
 [Acz 74] . "Determination of all additive quasi arithmetic mean 
 
 codeword lengths". Z. Wahrsch. verw. G. 29, 351-360(1974;. 
 
 [AD 75] § Z. Daroczy. On Measures of Information and their 
 
 Characterizations . NY: Academic Press, 1975. 
 
 [Cam 65] Campbell, L.L. "A Coding Theorem and Renyi's Entropy". 
 Information and Control 8, 423-429(1965). 
 
 [Cam 66] . "Definition of Entropy by Means of a Coding Problem" 
 
 Z. Wahrsch. verw. G . 6_, 113-118(1966). 
 [Cap 76] Caprani, 0. "Roundoff Errors in Floating-Point Summation" 
 
 BIT 15_, 5-9(1975). 
 
 [Eve 73] Even, S. Algorithmic Combinatorics , ch.7. NY: Macmillan, 1973. 
 
 [FB 72] Frazer, W.D. § B.T. Bennett. "Bounds on Optimal Merge Performance, 
 and a Strategy for Optimality." J ACM 1_9, 641-648 (Oct. 1972). 
 
 [Fuc 47] Fuchs, L. "A New Proof of an Inequality of Hardy-Littlewood-Polya." 
 Mat. Tidsskr. B 1947, pp. 53-54. 
 
 [GK 76] Glassey, C.R. & R.M. Karp. "On the Optimality of Huffman Trees." 
 SI AM J Appl Math 31_, 2, 368-372 (Sept . 1976). 
 
 [Gol 76] Golumbic, M.C. "Combinatorial Merging". IEEE Transactions on 
 Computers TC-25, 11, 1164-1167(Nov. 1976). 
 
 [HLP 34] Hardy, G.H., J.E. Littlewood, G. Polya. Inequalities . 
 
 Cambridge: The University Press, 1934. 
 [HT 71] Hu, T.C. $ A.C. Tucker. "Optimal Computer Search Trees and 
 
 Variable- length Alphabetical Codes." SI AM J Appl Math 21 , 
 
 4, 514-532(1971). 
 
 [Huf 52] Huffman, D.A. "A Method for the Construction of Minimum-redundancy 
 Codes." Proc IRE 40, 1098-1101(1952). 
 
 
43 
 
 [Itai76] Itai, A. "Optimal Alphabetic Trees". SI AM J Comput 5_, 9-18(1976). 
 
 [Knu 68] Knuth, D.E. Section 2.3.4.5 on "Path length" in Fundamental Algorithms: 
 
 The Art of Computer Programming v.l . Reading, MA: Addison-Wesley 1968 
 
 [Liu 76] Liu, J.W.S. "Algorithms for Parsing Search Queries in Inverted File 
 Document Retrieval Systems." ACM Trans Database Systems 1_, 
 4, 299-316(Dec. 1976). 
 
 [Mit 70] Mitrinovic, D.S. Analytic Inequalities . NY: Springer-Verlag, 1970. 
 
 [Nath75] Nath, P. "On a Coding Theorem Connected with Renyi's Entropy". 
 Information and Control 29, 234-242(1975). 
 
 [Rub 76] Rubin, F. "Experiments in Text-file Compression". CACM 19_, 
 
 11, 617-623(Nov. 1976). 
 [Sam 75] Sameh, A.H. Unpublished manuscript. 
 
 [Thi 51] Thielman, H.P. "A Note on a Functional Equation." 
 Am J Math 73, 2, 482-484(Apr. 1951). 
 
 [vanL76] van Leeuwen, J. "On the Construction of Binary Trees " 
 
 rd 
 Proc 3 Intern ational Colloquium on Automata, Languages, 
 
 § Programming . Edinburgh, July 1976, pp. 382-410. 
 [Zim 59] Zimmerman, S. "An Optimal Search Procedure." AMM 66 , 690-693(1959). 
 
IIBLIOGRAPHIC DATA 
 
 MEET 
 
 1. Report No. 
 
 UIUCDCS-R-77-853 
 
 . Title and Subtitle 
 
 EXTENSION OF THE HUFFMAN TREE CONSTRUCTION ALGORITHM 
 
 3. Recipient's Accession No. 
 
 Author(s) 
 
 D. Stott Parker, Jr, 
 
 5. Report Date 
 
 May, 1977 
 
 6. 
 
 8. Performing Organization Rept. 
 
 No -UIUCDCS-R-77-853 
 
 Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 !. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D. C. 
 
 Supplementary Notes 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 US NSF MCS73-07980 A03 
 
 13. Type of Report & Period 
 Covered 
 
 Technical Report 
 
 14. 
 
 . Abstracts 
 
 The Huffman algorithm is a well-known method, increasingly finding applications in 
 computer science, for constructing optimal binary (or r-ary) trees on a given set of 
 terminal nodes. Each node has some associated weight, and these weights are combined 
 as the construction continues to form new weights for the internal nodes of the tree; 
 the Huffman algorithm merely specifies which nodes are to be combined at any step of 
 the construction. Here a new formulation of this type of tree construction is pre- 
 sented in a way that leads naturally to a solution of the following question: for 
 exactly which weight combination functions does the Huffman algorithm produce optimal 
 trees under exactly which cost? It is shown that there is a broad class of such 
 functions which produce optimal trees in conjunction with the Huffman algorithm under 
 correspondingly wide classes of cost criteria. In addition the known results about 
 Huffman tree construction and related concepts from information theory and the theory 
 of convex functions are fieri together. anH snggPsM nns nn pn BC -tM P f,,t-„T-g app i-t^M^ 
 
 Key Words and Document Analysis. 1/a. Descriptors * * , 
 
 are given. 
 Tree construction 
 Huffman algorithm 
 Optimal (minimal-cost) trees 
 Tree-height 
 Weighted path-length 
 Renyi entropies 
 
 '• Identifiers/Open-Ended Terms 
 
 COSATI Field/Group 
 
 Availability Statement 
 
 fnlimited 
 
 M NTIS-35 (10-70) 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED. 
 
 20. Security Class (This 
 Page 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 45 
 
 22. Price 
 
 USCOMM-DC 40329-P71 
 
*V<> ! 6 ?97f' 
 
*** 
 
 - ----N