Define F: R² → R to be the symmetric function used to produce the weight of internal nodes generated by a merge operation (cf. Fig. 2), and the n-internal-node tree cost function G: Rⁿ → R to be a function on the weights of all the internal nodes of the tree:

    Cost(T) = G( W₁(T), W₂(T), …, Wₙ(T) ).

Note that if such a tree cost is to be generally useful, it should be extensible to arbitrary numbers of arguments and not dependent on some fixed value of n.

    Fig. 2: The weight combination function F(x,y).

Huffman's algorithm for binary tree construction is now simple to state: to build the Huffman tree, merge at each step the two available nodes of smallest weight (with ties resolved arbitrarily).

Now if F(x,y) = x + y and G = sum, then it is not hard to show that the cost of any tree T in this system is

    Σⱼ₌₁ⁿ⁺¹ wⱼ ℓⱼ(T),

the weighted path length, where ℓⱼ(T) denotes the path length of the leaf with weight wⱼ in T; if F(x,y) = max(x,y) + c (c > 0) and G = max, then the cost of any tree T in this system is

    max_{1≤j≤n+1} ( wⱼ + c·ℓⱼ(T) ),

the tree height.

Theorem 1, due to Glassey and Karp [GK 76], states that when F(x,y) = x + y the internal node weight sequence W(S) of the Huffman tree S satisfies W(S) ≺ W(T) for the weight sequence W(T) of any other tree T on the same leaf weights. (Here a weight sequence is arranged in increasing order, and a ≺ b means Σᵢ₌₁ᵏ aᵢ ≤ Σᵢ₌₁ᵏ bᵢ for every k.) The key step of the proof is this: the k smallest internal node weights W₁(T), …, Wₖ(T) form a subforest Fₖ of T, and if ℓⱼ(Fₖ) denotes the path length of the leaf with weight wⱼ within this forest (i.e., the distance from this leaf to some root in the forest) then

    Σᵢ₌₁ᵏ Wᵢ(T) = Σⱼ₌₁ⁿ⁺¹ wⱼ ℓⱼ(Fₖ).

If ℓ_max = maxⱼ ℓⱼ(Fₖ) and W_p(T) is the weight of some internal node whose two (leaf) sons have depth ℓ_max, then W_p(T) ≥ W₂(T) ≥ W₁(S), i.e., W_p(T) = w_r + w_s where {w_r, w_s} ∩ {w₁, w₂} ≠ {w₁, w₂}. The rest of the theorem shows that the tree T̄ obtained by interchanging the leaf weights {w_r, w_s} with {w₁, w₂} in T satisfies

    Σᵢ₌₁ᵏ Wᵢ(T̄) ≤ Σᵢ₌₁ᵏ Wᵢ(T).

But since W₁(T̄) = w₁ + w₂ we can apply Case 1 to get Σᵢ₌₁ᵏ Wᵢ(S) ≤ Σᵢ₌₁ᵏ Wᵢ(T̄), and the theorem follows by induction.

Definition. A function φ: R → R is strictly increasing if x < y implies φ(x) < φ(y). (If φ is strictly increasing then φ⁻¹ is strictly increasing.) A function φ: U → R is convex if U is a convex subset of R (i.e., U is an interval) and for all x, y ∈ U and t ∈ [0,1],

    φ( t·x + (1−t)·y ) ≤ t·φ(x) + (1−t)·φ(y);

φ is concave if −φ is convex. When φ is twice differentiable, φ convex (concave) ⟺ φ'' ≥ 0 (φ'' ≤ 0). Since we do not need these derivative properties we make no assumptions about the function φ other than continuity, and the characteristics stated above where applicable. In particular φ need not be differentiable.

Definition. A function φ: U → R is positive if φ(x) ≥ 0 for all x ∈ U, negative if φ(x) ≤ 0 for all x ∈ U, and sign-consistent if φ is positive or negative.

Theorem 2. Let a and b be two weight sequences of length m such that a ≺ b. If φ is any concave, strictly increasing function and we define φ(a) to be the weight sequence [φ(a₁), …, φ(aₘ)], then φ(a) ≺ φ(b), i.e.,

    Σᵢ₌₁ⁿ φ(aᵢ) ≤ Σᵢ₌₁ⁿ φ(bᵢ)   for 1 ≤ n ≤ m.

Proof. Set Dᵢ = ( φ(aᵢ) − φ(bᵢ) )/( aᵢ − bᵢ ) when aᵢ ≠ bᵢ (and let Dᵢ be the slope of any chord of φ at aᵢ otherwise). Since φ is increasing we have Dᵢ ≥ 0 for 1 ≤ i ≤ m; also, since a ≺ b and since φ is concave, we get Dᵢ ≥ Dᵢ₊₁. Set Aₖ = Σⱼ₌₁ᵏ aⱼ and Bₖ = Σⱼ₌₁ᵏ bⱼ for 1 ≤ k ≤ m. Then a ≺ b implies Aₖ ≤ Bₖ for all k, or (Aₖ − Bₖ) ≤ 0. Therefore for 1 ≤ n ≤ m,

    Σₖ₌₁ⁿ⁻¹ (Aₖ − Bₖ)(Dₖ − Dₖ₊₁) + (Aₙ − Bₙ)Dₙ ≤ 0,

which by partial summation says

    Σᵢ₌₁ⁿ (Aᵢ − Aᵢ₋₁)Dᵢ ≤ Σᵢ₌₁ⁿ (Bᵢ − Bᵢ₋₁)Dᵢ,   i.e.,   Σᵢ₌₁ⁿ aᵢDᵢ ≤ Σᵢ₌₁ⁿ bᵢDᵢ,

so that

    Σᵢ₌₁ⁿ (aᵢ − bᵢ)Dᵢ = Σᵢ₌₁ⁿ ( φ(aᵢ) − φ(bᵢ) ) ≤ 0,

and the theorem follows. ∎

Consider now weight combination functions of the form

    F(x,y) = φ⁻¹( λφ(x) + λφ(y) ),

where λ is a positive constant and φ is a continuous, strictly monotone function. We restrict the domain of definition of φ to some interval U of R, which we call the weight space, and require that F: U² → U, so that F produces a weight when given two weights. The class of functions F this generates has a number of interesting properties: each such F is increasing in its variables, since φ is monotone.
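The construction itself is a few lines of greedy selection. The following Python sketch (ours, not part of the original report — the function names are illustrative only) runs Huffman's algorithm with a pluggable weight combination function F and evaluates the two classical systems above.

```python
import heapq

def huffman_internal_weights(weights, F):
    """Repeatedly merge the two smallest available weights with F;
    return the internal node weights in order of creation."""
    heap = list(weights)
    heapq.heapify(heap)
    internal = []
    while len(heap) > 1:
        x = heapq.heappop(heap)           # smallest available weight
        y = heapq.heappop(heap)           # second smallest
        w = F(x, y)                       # weight of the new internal node
        internal.append(w)
        heapq.heappush(heap, w)
    return internal

wpl = huffman_internal_weights([1, 2, 3, 4], lambda x, y: x + y)
print(sum(wpl))        # G = sum: weighted path length = 19
c = 1.0
th = huffman_internal_weights([1, 2, 3, 4], lambda x, y: max(x, y) + c)
print(max(th))         # G = max: tree-height cost = 5.0
```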
Each F is symmetric in its variables and can be extended naturally to functions of more than two arguments. (This latter property will be useful at the end of this section when we consider the generalization of binary to r-ary tree construction.) Moreover, when λ = 1, F is also associative, i.e.,

    F( F(u,v), F(x,y) ) = F( u, F( v, F(x,y) ) ) = F( F(x,v), F(u,y) ) = φ⁻¹( φ(u) + φ(v) + φ(x) + φ(y) ).

Also note that when λ = 1 and φ(x) = x we obtain F(x,y) = x + y, the weight merging function for the weighted path-length system; and when λ = exp(pc) (c ≥ 0) and φ(x) = exp(px), then lim_{p→∞} F(x,y) = max(x,y) + c, the function for the tree-height system. Thus this class of weight merging functions F is broad enough, in the limit at least, to encompass the two known Huffman-optimal ones.

The purpose of this paper is to show first, what the Huffman algorithm produces with these weight merging functions; second, which conditions are needed for this product to be optimal; and third, why all this is useful.

One assumption we can make immediately is that the strictly monotone function φ is strictly increasing, since F is invariant under changes to the sign of φ. For this reason we will frequently make statements below requiring φ to be increasing; if φ were actually taken to be decreasing, then the statement in question would hold for −φ.

More restrictions must be placed on φ and λ before we can prove the resulting function F will be useful to us; these restrictions are mainly embodied in the following lemma. First, we know we must have F: U² → U. Also, we need an analogue of the fact, used in the proof of Theorem 1, that F(x,y) = x + y is "monotone" (in the sense that F(x,y) ≥ max(x,y)), which guarantees that the k smallest internal node weights of any constructed tree T comprise the weights of some subforest of T. We satisfy both these restrictions on F in the following way:

Lemma 1. Let φ: U → R be a strictly increasing function and λ be a positive constant. If F(x,y) = φ⁻¹( λφ(x) + λφ(y) ) maps U² into U, then we must have λ ≥ 1, and φ must be sign-consistent on U. Under these circumstances

    F(x,y) ≤ min(x,y)  for all x, y ∈ U,  if φ is negative (increasing);
    F(x,y) ≥ max(x,y)  for all x, y ∈ U,  if φ is positive (increasing).

Proof. Since φ is increasing we have F(x,y) ≥ max(x,y) iff λφ(x) + λφ(y) ≥ φ(x) for all x ≥ y in U, and F(x,y) ≤ min(x,y) iff λφ(x) + λφ(y) ≤ φ(x) for all x ≤ y in U. Neither of these can be true if φ is not sign-consistent, for then the connectivity of the interval U and the continuity of φ put 0 in the interior of φ(U), and for example we could contradict the first inequality above by selecting φ(x) > 0 and φ(y) = −φ(x), giving λ·0 = 0 ≥ φ(x) > 0. So φ must be sign-consistent, and we find the "monotonicity" condition on F is satisfied when

    λ ≥ λ₁ := sup_{x,y∈U} φ(x)/( φ(x) + φ(y) ),  giving F(x,y) ≤ min(x,y) if φ is negative, F(x,y) ≥ max(x,y) if φ is positive;

    λ ≤ λ₀ := inf_{x,y∈U} φ(x)/( φ(x) + φ(y) ),  giving F(x,y) ≥ max(x,y) if φ is negative, F(x,y) ≤ min(x,y) if φ is positive.

We now show that the additional condition λ ≥ 1 is necessitated by the requirement that F: U² → U, by considering what happens to the above inequalities for all nonzero λ (λ = 0 is uninteresting, giving F = constant).

(1) Assume λ > 1/2. Then since F: U² → U we have λ( φ(U) + φ(U) ) ⊆ φ(U), implying φ(U) must be unbounded, so we have λ₀ = 0 and λ₁ = 1. The only nontrivial condition we can satisfy is λ ≥ λ₁ = 1, giving as stated F(x,y) ≥ max(x,y) if φ is positive and F(x,y) ≤ min(x,y) if φ is negative.

(2) Assume λ = 1/2. Taking x = y in the definitions above shows λ₀ ≤ 1/2 ≤ λ₁, with equality only when φ is constant, so there is no nontrivial way to satisfy either λ ≤ λ₀ or λ ≥ λ₁.

(3) Assume 0 < λ < 1/2. Then because λ( φ(U) + φ(U) ) ⊆ φ(U) we get λ₀ = 0 and λ₁ > 1/2. Thus there is again no nontrivial way to satisfy either λ ≤ λ₀ or λ ≥ λ₁. ∎
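Lemma 1 is easy to probe numerically. The check below (ours; the particular φ's are merely convenient instances of the hypotheses) uses a positive increasing φ on U = [1, ∞) and a negative increasing φ on U = (0, 1], both with λ = 1.

```python
import math, itertools

def make_F(phi, phi_inv, lam):
    return lambda x, y: phi_inv(lam * (phi(x) + phi(y)))

# phi(x) = x^2: positive increasing on [1, inf); expect F(x,y) >= max(x,y)
F_pos = make_F(lambda x: x * x, math.sqrt, 1.0)
# phi(x) = log(x): negative increasing on (0, 1]; here F(x,y) = x*y <= min(x,y)
F_neg = make_F(math.log, math.exp, 1.0)

for x, y in itertools.product([1.0, 2.0, 5.0], repeat=2):
    assert F_pos(x, y) >= max(x, y)
for x, y in itertools.product([0.1, 0.5, 0.9], repeat=2):
    assert F_neg(x, y) <= min(x, y)
```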
As an interesting sidelight, note that when φ is positive increasing, the tree constructed by the Huffman algorithm (for any positive λ) on the leaf weights {w₁, …, w_{n+1}} is topologically isomorphic to the tree that would be built by the Huffman algorithm with F(x,y) = λ(x + y) on the leaf weights {φ(w₁), …, φ(w_{n+1})}, although the actual values of the internal node weights would be different unless φ(x) = x. If φ is positive decreasing, by contrast, the tree constructed by the Huffman algorithm on {w₁, …, w_{n+1}} is topologically isomorphic to that which would be built by the anti-Huffman algorithm (the tree construction procedure in which the two nodes of greatest weight are merged at each step) with F(x,y) = λ(x + y) on the leaf weights {φ(w₁), …, φ(w_{n+1})}. This all follows from the "order-preserving" properties of monotone functions.

It should be pointed out that when φ is positive decreasing and λ ≥ 1, the Huffman algorithm could always produce the following tree, because then F(x,y) ≤ min(x,y) and the smallest weight is always selected:

    Fig. 4: A completely skewed tree, in which each merge involves the node created at the previous step.

This is also the structure of the tree that would be produced if φ were positive increasing with λ ≥ 1 and the anti-Huffman algorithm were used, for then we would have F(x,y) ≥ max(x,y) and the largest weight would always be selected. This type of tree construction is not particularly interesting but will be covered here in the interest of completeness.

Lemma 1 establishes when F is a "monotone" weight combination function; we are now in a position to extend the results of Theorem 1. If F(x,y) ≥ max(x,y), then it is clear that the k smallest internal node weights [W₁(T), …, Wₖ(T)] of a tree T define a subforest of T. (For, if there were any weight Wᵢ(T) in the set corresponding to an internal node whose son's weight Wⱼ(T) is not also contained in the set, then Wᵢ(T) < Wⱼ(T), for otherwise we would have Wⱼ(T) in the set. But this is impossible because F(x,y) ≥ max(x,y) implies Wᵢ(T) ≥ Wⱼ(T).) Thus Lemma 1 asserts that if φ is positive (resp. negative) increasing and λ ≥ 1, the resulting internal node weights will have this subforest characterization: every collection of least (resp. greatest) node weights defines some subforest. The lemma also shows that, for "monotone" F, we can assume φ is positive (and strictly monotone continuous) instead of assuming it is increasing, since again F is invariant under sign changes to φ, and φ must now be either positive or negative. This assumption seems to be the natural one to make because of the following result.

Theorem 3. Let F(x,y) = φ⁻¹( λφ(x) + λφ(y) ), where φ is convex, positive, and strictly monotone and λ ≥ 1. If, as in Theorem 1, W(S) and W(T) are the weight sequences for the internal nodes of the trees S and T, constructed respectively by the Huffman algorithm and by any other way, then W(S) ≺ W(T). (The same results hold if φ is concave, negative, and strictly monotone.)

Proof. The proof has two cases, according as φ is strictly increasing or strictly decreasing.

Case 1: φ convex, positive, and strictly increasing. We accomplish the proof in two steps: using the notation of Theorem 2, we first show that the weight sequences φ(W(S)) and φ(W(T)) satisfy φ(W(S)) ≺ φ(W(T)); then, since φ⁻¹ is concave increasing in this case, we can apply Theorem 2 to get W(S) ≺ W(T) as desired.

If there are n+1 initial leaf weights w₁, …, w_{n+1} we have as above W(S) = [W₁(S), …, Wₙ(S)] and W(T) = [W₁(T), …, Wₙ(T)] as the internal node weight sequences, where Wᵢ(·) is the i-th smallest such weight.
In particular, since Wᵢ(S) designates the weight of some internal node which is the root of some subtree Sᵢ of the Huffman tree S, if we define

    𝒥ᵢ = { j : wⱼ is a leaf of Sᵢ },
    ℓⱼ(Sᵢ) = the path length of weight wⱼ in the subtree Sᵢ,
    aᵢ = Σ_{j∈𝒥ᵢ} λ^{ℓⱼ(Sᵢ)} φ(wⱼ),

then Wᵢ(S) = φ⁻¹(aᵢ). Defining 𝒥ᵢ′ and ℓⱼ(Tᵢ) in an analogous manner for T, if we set

    bᵢ = Σ_{j∈𝒥ᵢ′} λ^{ℓⱼ(Tᵢ)} φ(wⱼ),

then Wᵢ(T) = φ⁻¹(bᵢ).

We now claim that the sequences a = [a₁, …, aₙ] and b = [b₁, …, bₙ] are weight sequences satisfying a ≺ b. First of all, since φ is positive increasing and W(S), W(T) are weight sequences, we know that a = φ(W(S)) and b = φ(W(T)) are weight sequences too: φ "preserves order," since if Wᵢ(S) ≤ Wⱼ(S) then aᵢ = φ(Wᵢ(S)) ≤ φ(Wⱼ(S)) = aⱼ, and similarly for W(T) and b. Second, by Lemma 1 we know that F(x,y) ≥ max(x,y) in this case, so the k smallest node weights [W₁, …, Wₖ] for either S or T correspond to a subforest Fₖ of S or T. This implies

    Σᵢ₌₁ᵏ bᵢ = Σᵢ₌₁ᵏ Σ_{j∈𝒥ᵢ′} λ^{ℓⱼ(Tᵢ)} φ(wⱼ)
            = Σⱼ₌₁ⁿ⁺¹ ( λ + λ² + ⋯ + λ^{ℓⱼ(Fₖ)} ) φ(wⱼ)
            = ( λ/(λ−1) ) Σⱼ₌₁ⁿ⁺¹ ( λ^{ℓⱼ(Fₖ)} − 1 ) φ(wⱼ)   if λ > 1,
            = Σⱼ₌₁ⁿ⁺¹ ℓⱼ(Fₖ) φ(wⱼ)                           if λ = 1,

with a similar expression holding for Σᵢ₌₁ᵏ aᵢ.

We can now directly apply Glassey and Karp's method of proof for Theorem 1. The proof proceeds by induction on k, where we are trying to prove a ≺ b by showing for all k that Σᵢ₌₁ᵏ aᵢ ≤ Σᵢ₌₁ᵏ bᵢ. The basis k = 1 is trivial, and for the induction step there are two possibilities, depending on the relationship between a₁ = φ(W₁(S)) and b₁ = φ(W₁(T)).

Subcase 1: a₁ = b₁. In this case we know φ⁻¹(a₁) = φ⁻¹(b₁) = F(w₁,w₂), and we are reduced to the proof on the set of leaf weights {F(w₁,w₂), w₃, …, w_{n+1}}, for which we have by induction that Σᵢ₌₂ᵏ aᵢ ≤ Σᵢ₌₂ᵏ bᵢ. So Σᵢ₌₁ᵏ aᵢ ≤ Σᵢ₌₁ᵏ bᵢ.

Subcase 2: a₁ < b₁. As in Theorem 1, we show there is a tree T̄ with internal weights Wᵢ(T̄) = φ⁻¹(cᵢ), where c = [c₁, …, cₙ] satisfies Σᵢ₌₁ᵏ cᵢ ≤ Σᵢ₌₁ᵏ bᵢ and, in addition, c₁ = a₁. In T we select an internal node having weight φ⁻¹(b_p) = F(w_r,w_s) whose two (leaf) sons have path length ℓ_max in Fₖ and have leaf weights w_r and w_s. Since a₁ < b₁ ≤ b_p we know {w_r,w_s} ∩ {w₁,w₂} ≠ {w₁,w₂}. Assuming w_r ≤ w_s, let T̄ be the tree constructed exactly like T but with the leaf weights w_r and w₁, w_s and w₂ interchanged. Then Fₖ is still a subforest of T̄ (topologically T and T̄ are isomorphic) and determines a subset of k of T̄'s internal node weights, and consequently some k-subset of the weight sequence c. Specifically, if we define T̄ᵢ, 𝒥̄ᵢ, ℓⱼ(T̄ᵢ) exactly as above, so that

    cᵢ = Σ_{j∈𝒥̄ᵢ} λ^{ℓⱼ(T̄ᵢ)} φ(wⱼ),

let ℒ = { i : φ⁻¹(cᵢ) is the weight of some internal node in Fₖ }, with |ℒ| = k. We must have

    Σᵢ₌₁ᵏ cᵢ ≤ Σ_{i∈ℒ} cᵢ,

since the first k weights cᵢ are the least such weights. But we also have Σ_{i∈ℒ} cᵢ ≤ Σᵢ₌₁ᵏ bᵢ. To show this we write for convenience

    λ₁ = λ^{ℓ₁(T)},  λ₂ = λ^{ℓ₂(T)},  λₘ = λ^{ℓ_max(T)} = λ^{ℓ_r(T)} = λ^{ℓ_s(T)},
    φ_r = φ(w_r),  φ_s = φ(w_s),  φ₁ = φ(w₁),  φ₂ = φ(w₂).

Then φ_r ≥ φ₁ and φ_s ≥ φ₂, and since ℓ₁(T), ℓ₂(T) ≤ ℓ_max(T) we have, in the case λ > 1, λ₁ ≤ λₘ and λ₂ ≤ λₘ. So, if λ > 1,

    Σ_{i∈ℒ} cᵢ − Σᵢ₌₁ᵏ bᵢ = ( λ₁φ_r + λ₂φ_s + λₘφ₁ + λₘφ₂ ) − ( λ₁φ₁ + λ₂φ₂ + λₘφ_r + λₘφ_s )
                         = −( (λₘ − λ₁)(φ_r − φ₁) + (λₘ − λ₂)(φ_s − φ₂) ) ≤ 0.

A trivial modification of this argument gives the proof for λ = 1, so we omit it here. Thus we have shown

    Σᵢ₌₁ᵏ cᵢ ≤ Σ_{i∈ℒ} cᵢ ≤ Σᵢ₌₁ᵏ bᵢ;

but since c₁ = φ(W₁(T̄)) = φ(F(w₁,w₂)) = a₁, Subcase 1 applies to T̄ and gives Σᵢ₌₁ᵏ aᵢ ≤ Σᵢ₌₁ᵏ cᵢ. Hence Σᵢ₌₁ᵏ aᵢ ≤ Σᵢ₌₁ᵏ bᵢ for 1 ≤ k ≤ n, i.e., a ≺ b, and an application of Theorem 2 to the concave increasing function φ⁻¹ completes Case 1. Case 2, where φ is positive and strictly decreasing, proceeds along the same lines; there one obtains the pointwise inequalities aₖ ≥ bₖ for all k (a fact used again in Theorem 7 below). ∎
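Theorem 3 can be spot-checked by brute force on small inputs. The sketch below (ours, not the report's) enumerates every merge order on four leaves and verifies the majorization W(S) ≺ W(T) for the convex, positive, increasing choice φ(x) = x² with λ = 1.

```python
import itertools, math

F = lambda x, y: math.sqrt(x * x + y * y)   # phi(x) = x^2: convex, positive, increasing

def all_internal_sequences(ws):
    """Sorted internal-weight sequence of every possible merge order."""
    if len(ws) == 1:
        yield []
        return
    for i, j in itertools.combinations(range(len(ws)), 2):
        m = F(ws[i], ws[j])
        rest = [w for k, w in enumerate(ws) if k not in (i, j)]
        for seq in all_internal_sequences(rest + [m]):
            yield sorted([m] + seq)

def huffman_sequence(ws):
    ws, out = sorted(ws), []
    while len(ws) > 1:
        m = F(ws[0], ws[1])
        out.append(m)
        ws = sorted(ws[2:] + [m])
    return sorted(out)

S = huffman_sequence([1, 2, 3, 4])
for T in all_internal_sequences([1, 2, 3, 4]):
    for k in range(1, len(S) + 1):
        assert sum(S[:k]) <= sum(T[:k]) + 1e-9   # prefix sums: W(S) majorized by W(T)
```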

The convexity requirement on φ cannot simply be dropped. For example, let φ(x) = √x, λ = 1, and U = R⁺, so that F(x,y) = ( √x + √y )², and suppose we are to build a tree given the leaf weights (1,2,3,4). The Huffman algorithm produces the tree S of Fig. 5a, for which we have

    Σᵢ₌₁³ Wᵢ(S) = 5.83 + 13.93 + 37.78 ≈ 57.53,

while the tree T of Fig. 5b has

    Σᵢ₌₁³ Wᵢ(T) = 9 + 9.90 + 37.78 ≈ 56.68,

so W(S) ⊀ W(T).

    Fig. 5a: The Huffman tree S, pairing the leaves (1,2) and (3,4).
    Fig. 5b: The tree T, pairing the leaves (1,4) and (2,3).

That this phenomenon will always happen when φ is not convex is a result of the converse of Theorem 2, which says that

    Σᵢ₌₁ᵐ φ(aᵢ) ≤ Σᵢ₌₁ᵐ φ(bᵢ)

can hold for all weight sequences a ≺ b only if φ is concave.
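The counterexample is easy to reproduce; this sketch (ours) evaluates both trees of Fig. 5 directly.

```python
import math

F = lambda x, y: (math.sqrt(x) + math.sqrt(y)) ** 2   # phi(x) = sqrt(x), lambda = 1

def internal_weights(tree):
    """tree is a leaf weight or a nested pair; return (root weight, internal weights)."""
    if isinstance(tree, (int, float)):
        return tree, []
    left, right = tree
    wl, il = internal_weights(left)
    wr, ir = internal_weights(right)
    w = F(wl, wr)
    return w, il + ir + [w]

_, S = internal_weights(((1, 2), (3, 4)))   # the Huffman tree of Fig. 5a
_, T = internal_weights(((1, 4), (2, 3)))   # the tree of Fig. 5b
print(sorted(S), sum(S))   # [5.83.., 13.93.., 37.78..], total ~ 57.53
print(sorted(T), sum(T))   # [9.0, 9.90.., 37.78..],     total ~ 56.68
```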

Even without convexity, however, the extreme Huffman node weights are still optimal.

Theorem 5. Let F(x,y) = φ⁻¹( λφ(x) + λφ(y) ), where λ ≥ 1 and φ is increasing, continuous, and sign-consistent (as in Lemma 1). Then W₁(S) ≤ W₁(T) and Wₙ(S) ≤ Wₙ(T); i.e., the smallest and largest Huffman internal node weights are no larger than the corresponding smallest and largest node weights in any other tree.
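Before giving the proof, note that the theorem can be confirmed exhaustively on small inputs; the sketch below (ours) uses the non-convex φ(x) = √x of Fig. 5, for which majorization fails but the extreme node weights are still optimal.

```python
import itertools, math

F = lambda x, y: (math.sqrt(x) + math.sqrt(y)) ** 2   # phi = sqrt: increasing, not convex

def merge_sequences(ws):
    if len(ws) == 1:
        yield []
        return
    for i, j in itertools.combinations(range(len(ws)), 2):
        m = F(ws[i], ws[j])
        rest = [w for k, w in enumerate(ws) if k not in (i, j)]
        for s in merge_sequences(rest + [m]):
            yield sorted([m] + s)

def huffman(ws):
    ws, out = sorted(ws), []
    while len(ws) > 1:
        m = F(ws[0], ws[1])
        out.append(m)
        ws = sorted(ws[2:] + [m])
    return sorted(out)

S = huffman([1, 2, 3, 4])
for T in merge_sequences([1, 2, 3, 4]):
    assert S[0] <= T[0] + 1e-9     # W1(S) <= W1(T)
    assert S[-1] <= T[-1] + 1e-9   # Wn(S) <= Wn(T)
```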

Proof. The proof is like that of Theorem 3. We discuss only the case where φ is positive: there W₁(S) = F(w₁,w₂) ≤ W₁(T) is immediate, and we must show Wₙ(S) ≤ Wₙ(T). (When φ is negative the proof is similar, but there Wₙ(S) = F(w₁,w₂) ≤ Wₙ(T) is immediate and we must show W₁(S) ≤ W₁(T).) Here we have

    Wₙ(S) = φ⁻¹( Σⱼ₌₁ⁿ⁺¹ λ^{ℓⱼ(S)} φ(wⱼ) ),
    Wₙ(T) = φ⁻¹( Σⱼ₌₁ⁿ⁺¹ λ^{ℓⱼ(T)} φ(wⱼ) ),

and we must show φ(Wₙ(S)) ≤ φ(Wₙ(T)), or equivalently

    Σⱼ₌₁ⁿ⁺¹ λ^{ℓⱼ(S)} φ(wⱼ) ≤ Σⱼ₌₁ⁿ⁺¹ λ^{ℓⱼ(T)} φ(wⱼ).

We prove this inequality by induction on n, the number of internal nodes in the constructed tree. As a basis we have Wₙ(S) = Wₙ(T) for n = 1, and when n = 2 there are only three inequivalent trees:

    Fig. 6: The three trees on leaves w₁, w₂, w₃, pairing (w₁,w₂), (w₂,w₃), and (w₁,w₃) respectively.

Assuming w₁ ≤ w₂ ≤ w₃, we have

    φ(W₂(S))  = λ²φ(w₁) + λ²φ(w₂) + λφ(w₃),
    φ(W₂(T₂)) = λφ(w₁) + λ²φ(w₂) + λ²φ(w₃),
    φ(W₂(T₃)) = λ²φ(w₁) + λφ(w₂) + λ²φ(w₃);

since λ ≥ 1 we find

    φ(W₂(S)) − φ(W₂(T₂)) = (λ² − λ)( φ(w₁) − φ(w₃) ) ≤ 0,
    φ(W₂(S)) − φ(W₂(T₃)) = (λ² − λ)( φ(w₂) − φ(w₃) ) ≤ 0,

so for any tree T constructed here we have W₂(S) ≤ W₂(T). For the induction step we borrow once again the argument of Glassey and Karp used already in Theorems 1 and 3, by breaking the problem down into two cases, one trivial, one not so trivial.

Case 1: W₁(S) = W₁(T). In this case the theorem follows by induction, since we are reduced to the problem of constructing a tree having (n−1) internal nodes on the leaf weights {W₁(S) = W₁(T) = F(w₁,w₂), w₃, …, w_{n+1}}.

Case 2: W₁(S) < W₁(T). To show Wₙ(S) ≤ Wₙ(T), or equivalently φ(Wₙ(S)) ≤ φ(Wₙ(T)), we show there exists a tree T̄ such that φ(Wₙ(S)) ≤ φ(Wₙ(T̄)) ≤ φ(Wₙ(T)). This is accomplished in the manner described in Theorem 3: in the tree T we select an internal node having weight W_p(T) = F(w_r,w_s) whose two (leaf) sons have maximal path length ℓ_max = maxⱼ ℓⱼ(T) in T. Since W₁(S) ≠ W₁(T) we know {w_r,w_s} ∩ {w₁,w₂} ≠ {w₁,w₂}. Assuming w_r ≤ w_s, let T̄ be the tree constructed exactly like T but with w_r and w₁, w_s and w₂ interchanged. Then W₁(T̄) = F(w₁,w₂) = W₁(S), so by Case 1 above we have Wₙ(S) ≤ Wₙ(T̄). However we also have

    φ(Wₙ(T̄)) − φ(Wₙ(T)) = ( λ^{ℓ_max(T)} − λ^{ℓ₁(T)} )( φ(w₁) − φ(w_r) ) + ( λ^{ℓ_max(T)} − λ^{ℓ₂(T)} )( φ(w₂) − φ(w_s) ) ≤ 0.

Consequently Wₙ(T̄) ≤ Wₙ(T), and we obtain Wₙ(S) ≤ Wₙ(T) as desired. ∎

In summary, we have the following characterization of binary tree construction with the Huffman algorithm using F(x,y) = φ⁻¹( λφ(x) + λφ(y) ). Assuming λ ≥ 1 and φ is positive, strictly monotone, and continuous, if S is the Huffman tree and T is any other, then the internal node weight sequences W(S) and W(T) satisfy

    (1) W(S) ≺ W(T), i.e., Σᵢ₌₁ᵏ Wᵢ(S) ≤ Σᵢ₌₁ᵏ Wᵢ(T) for 1 ≤ k ≤ n (provided φ is additionally convex);
    (2) W₁(S) ≤ W₁(T) and Wₙ(S) ≤ Wₙ(T).

The generalization from binary to r-ary tree construction (r ≥ 2) uses the weight combination function

    F(x₁, x₂, …, x_r) = φ⁻¹( λ Σᵢ₌₁ʳ φ(xᵢ) ).

Now if ψ: R → R is some strictly increasing function, possibly satisfying other conditions to be made explicit later, then cost functions of the forms

    G₁(W(T)) = ψ( W₁(T) ) or ψ( Wₙ(T) )   (the least or greatest internal node weight),
    G₂(W(T)) = Σᵢ₌₁ⁿ ψ( Wᵢ(T) )

yield Huffman-optimal tree construction systems, as the next theorems show.

Theorem 6. If F(x,y) is as in Theorem 5 and the cost of T is given by Cost₁(T) = G₁(W(T)), where ψ is a continuous, strictly increasing function, then Cost₁(S) ≤ Cost₁(T). For if φ is positive increasing [decreasing], then from Lemma 1 we must have F(x,y) ≥ max(x,y) [F(x,y) ≤ min(x,y)], so W₁(T) is the first [last] and Wₙ(T) the last [first] node weight constructed; the result then follows from Theorem 5 and the monotonicity of ψ.

Theorem 7. If F(x,y) is as in Theorem 3 and the cost of T is given by

    Cost₂(T) = G₂(W(T)) = Σᵢ₌₁ⁿ ψ( Wᵢ(T) ),

where ψ is a continuous, strictly increasing function such that ψ∘φ⁻¹ is concave increasing if φ is positive increasing, and ψ∘φ⁻¹ is decreasing if φ is positive decreasing (∘ denotes functional composition), then Cost₂(S) ≤ Cost₂(T). This result was observed for the case φ(x) = x, λ = 1 by Glassey and Karp [GK 76].
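Before the proof, a small empirical check of Theorem 7 (ours): with φ(x) = x, λ = 1, and ψ = √x, the composition ψ∘φ⁻¹ = √x is concave increasing, so the Huffman tree should minimize Cost₂ = Σ√Wᵢ(T) — and exhaustive enumeration on five leaves agrees.

```python
import itertools, math

F = lambda x, y: x + y          # phi(x) = x, lambda = 1
psi = math.sqrt                 # psi o phi^-1 = sqrt: concave increasing

def cost2(internal):            # Cost2(T) = sum psi(W_i(T))
    return sum(psi(w) for w in internal)

def merge_orders(ws):
    if len(ws) == 1:
        yield []
        return
    for i, j in itertools.combinations(range(len(ws)), 2):
        m = F(ws[i], ws[j])
        rest = [w for k, w in enumerate(ws) if k not in (i, j)]
        for s in merge_orders(rest + [m]):
            yield [m] + s

def huffman(ws):
    ws, out = sorted(ws), []
    while len(ws) > 1:
        m = F(ws[0], ws[1])
        out.append(m)
        ws = sorted(ws[2:] + [m])
    return out

best = cost2(huffman([1, 2, 3, 4, 5]))
assert all(best <= cost2(T) + 1e-9 for T in merge_orders([1, 2, 3, 4, 5]))
```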
Proof. This is essentially a corollary of Theorem 3. Using the notation in that proof, note that

    Cost₂(S) = Σᵢ₌₁ⁿ ψ( Wᵢ(S) ) = Σᵢ₌₁ⁿ ψ∘φ⁻¹( aᵢ )

and similarly

    Cost₂(T) = Σᵢ₌₁ⁿ ψ( Wᵢ(T) ) = Σᵢ₌₁ⁿ ψ∘φ⁻¹( bᵢ ),

where a = [a₁, …, aₙ] = [φ(W₁(S)), …, φ(Wₙ(S))] and b = [b₁, …, bₙ] = [φ(W₁(T)), …, φ(Wₙ(T))] are weight sequences. We now break the proof down into the two cases indicated above.

Case 1: φ positive increasing. Since φ is increasing we know from Case 1 of Theorem 3 that a ≺ b. Now if ψ∘φ⁻¹ is concave (we already know it is increasing, since both ψ and φ⁻¹ are), then by Theorem 2 we get ψ∘φ⁻¹(a) ≺ ψ∘φ⁻¹(b), so in particular

    Σᵢ₌₁ⁿ ψ∘φ⁻¹( aᵢ ) ≤ Σᵢ₌₁ⁿ ψ∘φ⁻¹( bᵢ ),

or equivalently Cost₂(S) ≤ Cost₂(T).

Case 2: φ positive decreasing. Since φ is decreasing we have from Case 2 of Theorem 3 that aₖ ≥ bₖ for all k, 1 ≤ k ≤ n. If ψ∘φ⁻¹ is decreasing then ψ∘φ⁻¹(aₖ) ≤ ψ∘φ⁻¹(bₖ) for all k, and clearly Cost₂(S) ≤ Cost₂(T). ∎

Theorem 8. If F(x,y) is as in Theorem 3 and the cost of T is given by

    Cost₃(T) = G₃(W(T)) = ψ⁻¹( Σᵢ₌₁ⁿ ψ( Wᵢ(T) ) ),

where ψ is a continuous, strictly monotone function, then Cost₃(S) ≤ Cost₃(T) if ψ and φ satisfy:

    φ positive increasing:   ψ increasing,  ψ∘φ⁻¹ concave increasing
    φ positive decreasing:   ψ increasing,  ψ∘φ⁻¹ decreasing

Note that, since the sign of ψ does not affect G₃, we can always assume ψ to be increasing; if it were chosen to be decreasing we could simply use −ψ. The above table reflects this fact.

Proof. Follows directly from Theorem 7. We can assume ψ increasing as just indicated, so ψ⁻¹ is then increasing and we have

    Cost₃(S) = ψ⁻¹( Cost₂(S) ) ≤ ψ⁻¹( Cost₂(T) ) = Cost₃(T). ∎

Theorem 9. If F(x,y) is as in Theorem 3 and the cost of T is given by

    Cost₄(T) = G₄(W(T)) = ψ( G_m(W(T)) )   (m = 1, 2, 3),

where ψ is a continuous, strictly increasing function, then Cost₄(S) ≤ Cost₄(T).

Proof. Immediate from the order-preserving properties of increasing functions. ∎

Although these are the only cost functions we discuss here, it should be emphasized that they are not the only possible Huffman-optimal ones. For example we could define

    Cost₅(T) = G₅ᵏ(W(T)) = Σᵢ₌₁ᵏ ψ( Wᵢ(T) )

for any k between 1 and n, and prove optimality of Huffman trees by following the arguments used for Theorem 7. However the cost measures given here seem the most useful and natural at present: among other things they are all symmetric, associative, and extensible in the senses described at the beginning of section III. It is, nevertheless, possible that other cost functions may be determined to be useful for specific problems.

As a particular application of Theorem 8, consider the case where U = R⁺ and

    φ(x) = xᵅ,  ψ(x) = xᵝ   (α, β ≠ 0).

Note φ is convex decreasing for α < 0, concave increasing for 0 < α ≤ 1, and convex increasing for 1 ≤ α. Thus the table for Theorem 8 can be broken down as follows: since ψ∘φ⁻¹(x) = x^{β/α}, the Huffman algorithm gives optimal trees for the system generated by φ and ψ (specifically, F(x,y) = ( λ(xᵅ + yᵅ) )^{1/α}) whenever

    α ≥ 1 and 0 < β/α ≤ 1,   or   α < 0 and β/α < 0.

This can be plotted as follows:

    Fig. 7: The region of the (α, β) plane in which the Huffman algorithm gives optimal trees.

For λ ≥ 1, φ(x) = xᵅ, and ψ(x) = xᵝ the resulting functions F and G have many interesting properties, and have been studied extensively in the theory of means, a major foundation of the modern theory of inequalities. The interested reader is referred to Chapters 1-3 of [HLP 34]. We stop here only to give one final result; namely, the promised tie-up between the tree-height and weighted path-length systems.

Let φ(x) = xᵅ, ψ(x) = xᵝ as above. Then when λ = 1, if α = 1 we have F(x,y) = x + y, and

    lim_{α→∞} F(x,y) = lim_{α→∞} ( xᵅ + yᵅ )^{1/α} = max(x,y).

If we let β = α and let α → ∞ we then find ψ∘φ⁻¹ = identity and

    Cost(T) = ψ⁻¹( Σᵢ₌₁ⁿ ψ∘φ⁻¹( Σ_{j∈𝒥ᵢ} φ(wⱼ) ) ) = lim_{α→∞} ( Σᵢ₌₁ⁿ Σ_{j∈𝒥ᵢ} wⱼᵅ )^{1/α} = max_{1≤j≤n+1} wⱼ.

In the same way, let φ(x) = ψ(x) = exp(px) and λ = exp(pc), c ≥ 0; then φ and ψ are both increasing on U and ψ∘φ⁻¹(x) = x is concave increasing, so by Theorem 8 we know the Huffman algorithm produces optimal trees in this system, for any positive value of p. But it then follows that

    lim_{p→∞} F(x,y) = max(x,y) + c,

and it is easily verified that in the limit the cost function G(W(T)) is the max function as before. Strictly speaking we should demonstrate here that such limits of sequences of tree construction systems are in fact tree construction systems (producing optimal trees with the Huffman algorithm), but since it is well known that the tree-height system is such a system we forgo such a demonstration here. Thus the class of weight combination functions F and cost functions G given by Theorem 8 is, at any rate, large enough to contain the previously known Huffman-optimal systems, namely the path-length and tree-height ones.
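The exponential system's convergence to the tree-height merge rule is quick to see numerically. In this sketch (ours) F is the φ(x) = exp(px), λ = exp(pc) combination function, and its value approaches max(x,y) + c as p grows.

```python
import math

def F(x, y, p, c):
    """phi(x) = exp(p*x), lambda = exp(p*c):
    F(x,y) = (1/p) * log( exp(p*c) * (exp(p*x) + exp(p*y)) )."""
    return math.log(math.exp(p * c) * (math.exp(p * x) + math.exp(p * y))) / p

for p in (1.0, 10.0, 100.0):
    print(p, F(3.0, 5.0, p, c=1.0))   # 6.127..., 6.0000000002, 6.0: -> max(3,5) + 1
```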
V. Applications and Open Problems

We have just shown that for wide classes of tree construction systems the Huffman algorithm produces cost-optimal trees. These classes are so wide that the general notation probably invites the reader to think that the results are too vague to be useful. We try to counteract this impression by giving some examples beyond the φ(x) = xᵅ, ψ(x) = xᵝ system just treated.

Consider the weight space U = (0,1] with φ(x) = −log(x) and λ = 1, so that F(x,y) = xy. Then either

    ψ(x) = xᵝ  (β > 0)   (so that ψ∘φ⁻¹(x) = exp(−βx) is decreasing), or
    ψ(x) = −a·log(x)  (a > 0)   (so that ψ∘φ⁻¹(x) = +ax is increasing)

will give a system optimal with the Huffman algorithm. Specifically, if we choose β = 1 and ψ(x) = xᵝ = x, then

    Cost(T) = ψ⁻¹( Σᵢ₌₁ⁿ ψ( Wᵢ(T) ) ) = Σᵢ₌₁ⁿ Wᵢ(T),

so the cost of the tree is the sum of the internal node weights, each of which is the product of its sons' weights. For example:

    Fig. 8: A tree whose total cost is .06 + .024 + .0012 = .0852.

Also, if we take a = 1 and ψ(x) = −log(x), then

    Cost(T) = G(W(T)) = ψ⁻¹( Σᵢ₌₁ⁿ ψ( Wᵢ(T) ) ) = Πᵢ₌₁ⁿ Wᵢ(T),

so the cost of the tree is the product of the internal node weights, each of which is the product of its sons' weights. This system is therefore the multiplicative analogue of the weighted path-length system. As an example, note that for the tree above the total cost would here be (.06)(.024)(.0012) = .000001728.

The example F(x,y) = xy is interesting because it changes drastically if we change the weight space U from (0,1] to [0,∞) or [b,∞) for any b ≥ 1. When U = [0,∞) we can say nothing, for there is then no sign-consistent, strictly monotone function φ for which F(x,y) = φ⁻¹( φ(x) + φ(y) ) = xy. When U = [b,∞) with b ≥ 1 we may take φ(x) = log(x), which is positive increasing; but then with ψ(x) = x we get ψ∘φ⁻¹(x) = exp(x), which is not concave increasing as required by Theorem 8, and which will not give optimal trees with the Huffman algorithm:

    Fig. 9: On the leaf weights (2,3,4,5), the Huffman tree is suboptimal, with total cost 6 + 20 + 120 = 146, while the non-Huffman tree pairing (2,5) and (3,4) is optimal, with total cost 10 + 12 + 120 = 142.

Another application of this extension of Huffman tree construction is the generation of codes which are optimal under criteria other than Huffman's original one, equivalent to weighted path-length [Huf 52]. A moderate literature has grown up around this subject; it is surprising that no corresponding analogue of Huffman's algorithm has also been developed. We outline several known results, including interesting lower bounds on average codeword length like that of the Noiseless Coding Theorem, and then present these Huffman analogues.

In the context of coding, the leaf weights {w₁, …, w_{n+1}} are probabilities (so Σwⱼ = 1), representing the relative frequencies of occurrence of a set of (n+1) messages which are to be encoded into D-ary codewords (D ≥ 2). Let the length of the codeword for the message with probability wⱼ be called ℓⱼ; we are then interested in minimizing the "quasiarithmetic mean codeword length" [Acz 74], [Cam 66]

    L( y; {wⱼ}, {ℓⱼ} ) = y⁻¹( Σⱼ₌₁ⁿ⁺¹ wⱼ y(ℓⱼ) ),

or some similar code cost measure; here y is a continuous, strictly increasing function on R⁺. For example, when y(x) = x we get the traditional weighted path-length; other "translative" forms of L have been considered in [Cam 66], [Acz 74], and [Nath 75]. Although this measure of codeword length is quite general, most special cases treated in the literature can be handled by the extended Huffman construction presented here.
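Computing the quasiarithmetic mean codeword length is a one-liner once y and y⁻¹ are supplied; the sketch below (ours) evaluates it for y(x) = x and — anticipating Campbell's choice, discussed next — for y(x) = D^{tx}.

```python
import math

def quasiarithmetic_length(probs, lengths, y, y_inv):
    """L(y) = y^-1( sum_j w_j * y(l_j) )."""
    return y_inv(sum(w * y(l) for w, l in zip(probs, lengths)))

probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]               # e.g. the codewords 0, 10, 110, 111

# y(x) = x: the traditional weighted path length
print(quasiarithmetic_length(probs, lengths, lambda x: x, lambda x: x))   # 1.75

# Campbell's exponential average: y(x) = D**(t*x), here with D = 2 and t = 1
D, t = 2, 1.0
y, y_inv = (lambda x: D ** (t * x)), (lambda x: math.log(x, D) / t)
print(quasiarithmetic_length(probs, lengths, y, y_inv))                   # 2.0
```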
We consider three of these cases one by one; each is based on Rényi's entropy of order α,

    H_α( w₁, …, w_{n+1} ) = ( 1/(1−α) ) · log_D( Σⱼ₌₁ⁿ⁺¹ wⱼᵅ ).

Here D is the size of the code letter alphabet, i.e., codewords can be viewed as D-ary numbers. Rényi's entropy has the interesting property that its limit, as α → 1, is the usual Shannon entropy

    H( w₁, …, w_{n+1} ) = −Σⱼ₌₁ⁿ⁺¹ wⱼ log_D( wⱼ ).

Campbell [Cam 65] now defines an exponential codeword length average L(t) by setting y(x) = D^{tx}, so that

    L(t) = (1/t) · log_D( Σⱼ wⱼ D^{tℓⱼ} ) = log_λ( Σⱼ wⱼ λ^{ℓⱼ} ),

where t > 0 and λ = Dᵗ > 1. He then proves that

    (1) lim_{t→0} L(t) = Σⱼ wⱼ ℓⱼ;
    (2) lim_{t→∞} L(t) = maxⱼ ℓⱼ;
    (3) H_α( w₁, …, w_{n+1} ) ≤ L(t), where α = 1/(1+t) = 1/( 1 + log_D(λ) ),

with equality holding when D^{−ℓⱼ} = wⱼᵅ / ( Σᵢ wᵢᵅ ). Now consider general Huffman construction as discussed in section II with F(x,y) = λ(x + y) and G(W(T)) = log_λ( Wₙ(T) ). Then

    Cost(T) = G(W(T)) = L(t) = L( log_D(λ) ),

so Huffman construction with this weight combination function F produces optimal exponential-length-cost trees by Theorem 6.

Aczél [Acz 74], besides citing results of Campbell for the degenerate case t < 0 (λ < 1) above, considers the result when y(x) = ( λˣ − 1 )/( λ − 1 ) (again, λ = Dᵗ), and shows that

    L(y) = y⁻¹( Σⱼ₌₁ⁿ⁺¹ wⱼ y(ℓⱼ) )

satisfies a lower bound in terms of H_α analogous to (3), where again α = 1/(1+t) = 1/( 1 + log_D(λ) ). But notice that when F(x,y) = λ(x + y) and

    G(W(T)) = μ⁻¹( (1/λ) Σᵢ₌₁ⁿ Wᵢ(T) ),

then, because μ(m) = 1 + λ + ⋯ + λ^{m−1} = y(m) (λ ≠ 1),

    Cost(T) = G(W(T)) = L(y).

So, by Theorems 7 and 9, since μ⁻¹ is increasing, Huffman construction with this function F again produces the optimal code tree (identical to the one constructed for Campbell's average codeword length).

Lastly, Nath has come up with nice results by defining what he calls the average codeword length of order α (α > 1) [Nath 75]:

    L(α) = ( α − 1 )⁻¹ · log_D( Σⱼ₌₁ⁿ⁺¹ wⱼᵅ D^{(α−1)ℓⱼ} / wᵅ ) = log_λ( Σⱼ₌₁ⁿ⁺¹ wⱼᵅ λ^{ℓⱼ} / wᵅ ),

where wᵅ = Σⱼ wⱼᵅ and λ = D^{α−1}. He shows that

    H_α( w₁, …, w_{n+1} ) ≤ L(α),

with equality iff wⱼ = D^{−ℓⱼ} for all j. Now when

    F(x,y) = ( λxᵅ + λyᵅ )^{1/α}   and   G(W(T)) = log_λ( Wₙ(T)ᵅ / wᵅ ),

we find Cost(T) = G(W(T)) = L(α), so by Theorem 6 Huffman construction with this function F produces optimal trees here, i.e., produces code trees of least average length L(α).

To illustrate this, we consider construction of an optimal binary code for the ensemble of 13 messages given in [Huf 52]. One of the nice features of L(α) is that its limit as α → 1 is the traditional average codeword length (weighted path-length); so in Figure 10 we display optimal code trees for the ensemble under the cost function L(α) for both α = 1 and α = 2, giving codeword assignments and L(α) in each case.

    Fig. 10a (α = 1, F(x,y) = x + y): codeword assignments for the 13-message ensemble, with L(1) = Σⱼ wⱼℓⱼ = 3.42.
    Fig. 10b (α = 2, D = 2, λ = 2, F(x,y) = ( 2x² + 2y² )^{1/2}): codeword assignments for the same ensemble, with the corresponding value of L(2).

There are several open problems remaining. First, it is currently hard to determine, given some arbitrary weight combination and tree cost functions F and G, the explicit forms for λ, φ, and ψ (if they exist at all) which let us apply the above theorems and prove the optimality of Huffman trees. It is not difficult, though, to give necessary conditions for F and G to be of the proper form. First, both functions must be symmetric, increasing functions of their variables. G is also associative when it is not monadic, and F(x,y) = φ⁻¹( λφ(x) + λφ(y) ) is associative if and only if λ = 1; however, F must satisfy the bisymmetry property

    F( F(u,v), F(x,y) ) = F( F(u,x), F(v,y) )   [Acz 48]

whether λ = 1 or not.
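The bisymmetry property is easy to confirm for any member of the family, since F(F(u,v), F(x,y)) = φ⁻¹( λ²( φ(u) + φ(v) + φ(x) + φ(y) ) ) is symmetric in all four arguments. The check below (ours) does so for the arbitrary instance φ(x) = x², λ = 2.

```python
import itertools, math

lam = 2.0
phi, phi_inv = (lambda x: x * x), math.sqrt
F = lambda x, y: phi_inv(lam * (phi(x) + phi(y)))

# Bisymmetry holds even though lam != 1 (so this F is not associative):
for u, v, x, y in itertools.product([0.5, 1.0, 2.0, 3.0], repeat=4):
    lhs = F(F(u, v), F(x, y))
    rhs = F(F(u, x), F(v, y))
    assert abs(lhs - rhs) < 1e-9 * lhs
```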
Much more is known for the case λ = 1: Thielman [Thi 51] gives forms for φ when F (or G) is rational, and Aczél and Daróczy [AD 75, p.151] prove that homogeneity of F implies φ(x) = xᶜ for some constant c. Apart from attempting to recover φ by repeated differentiation of the defining equation

    F( φ⁻¹(x), φ⁻¹(x) ) = φ⁻¹( 2λx ),

no direct method of decomposition is known to us.

Another large problem is to find whether other tree construction systems, apart from those considered here, are optimal under the Huffman algorithm. If we consider F(x,y) = ( λφ(x) + λ…