UIUCDCS-R-73-602

DIFFERENCE-PRESERVING CODES

by

F. P. Preparata and J. Nievergelt

September 1973

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation (Grant GJ-31222) and in part by the National Research Council of Italy.

Author affiliations: Department of Electrical Engineering, Department of Computer Science, Coordinated Science Laboratory, University of Illinois, Urbana, Illinois. Dipartimento di Scienze dell'Informazione, Università di Pisa, 56100 Pisa, Italy.

Abstract: A code (of integers by binary sequences) is called difference-preserving (DP-code) if it has the following two properties:

1. if the absolute value of the difference between two integers is less than or equal to a certain threshold, the Hamming distance of their codewords is equal to this value.

2. if the absolute value of the difference between two integers exceeds the threshold, then the Hamming distance of their codewords also exceeds this threshold.

Such codes (or slight modifications thereof) have also been called path-codes, circuit-codes, or snake-in-the-box codes. This paper discusses the application of DP-codes to pattern recognition and classification problems, and presents a construction of efficient DP-codes whose information content is asymptotically (in the length of codewords) of the order of theoretical upper bounds.

Key Words and Phrases: coding, difference-preserving codes, bounded-error codes, path-codes, snake-in-the-box codes, pattern recognition and classification

1. Terminology, definition, and problems

Let $i, j$ be integers and $u, v$ binary sequences of $N$ bits each. Let $|i-j|$ be the absolute value of the difference between $i$ and $j$, and let $H(u,v)$ be the Hamming distance between $u$ and $v$ (i.e. the number of positions in which $u$ and $v$ differ). Let $t \ge 1$ be an integer called the threshold, and $K \ge 1$ be an integer called the range. Let $\mathcal{B}$, the code, be a mapping from the set $\{1, 2, \ldots, K\}$ into the set $\{0,1\}^N$ of binary sequences of length $N$. We say that $\mathcal{B}$ is a difference-preserving code with threshold $t$, or DP-$t$ code, if and only if, for all integers $i, j$ in the range $1 \le i \le K$, $1 \le j \le K$, the following two conditions hold:

1) $|i-j| \le t \Rightarrow H(\mathcal{B}(i), \mathcal{B}(j)) = |i-j|$

2) $|i-j| > t \Rightarrow H(\mathcal{B}(i), \mathcal{B}(j)) > t$.

Intuitively, the code $\mathcal{B}$ preserves the difference between integers, whereby all differences larger than $t$ are lumped together. When we want to specify the length $N$ of codewords of a DP-$t$ code, we may speak of an $(N,t)$-code, and if in addition we want to specify the range, we may speak of a $(K,N,t)$-code.
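The two defining conditions can be checked mechanically. The following is a minimal sketch, assuming codewords are given as bit strings in the order of the integers they encode; the names `hamming` and `is_dp_code` and the two sample codes are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which two equal-length bit strings differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))


def is_dp_code(codewords, t):
    """Check both DP-t conditions; codewords[k] encodes the integer k + 1."""
    for (i, u), (j, v) in combinations(enumerate(codewords), 2):
        diff, dist = j - i, hamming(u, v)
        if diff <= t and dist != diff:   # condition 1 violated
            return False
        if diff > t and dist <= t:       # condition 2 violated
            return False
    return True


# A "thermometer" code: distances equal differences for every pair.
thermometer = ["000", "001", "011", "111"]
# The reflected Gray code on 2 bits: |1 - 4| = 3 but the distance is only 1.
reflected_gray = ["00", "01", "11", "10"]
```

Under these assumptions `is_dp_code(thermometer, t)` holds for t = 1, 2, 3, while `is_dp_code(reflected_gray, 1)` fails because the first and last codewords are too close, which is exactly the situation the second condition rules out.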
As an example, the following is an (8,4,1)-code (range = 8, length = 4, threshold = 1):

    integer   codeword
       1        0000
       2        0001
       3        0011
       4        0111
       5        0110
       6        1110
       7        1100
       8        1101

This code is optimal in the sense that there is no DP-1 code of length 4 with range larger than 8.

It is natural to ask the following types of questions about difference-preserving codes:

-- What is the maximal range $K$ for given length $N$ and threshold $t$?
-- What is the minimal length $N$ for given range $K$ and threshold $t$?
-- What is the maximal threshold $t$ for given range $K$ and length $N$?
-- How can one systematically construct DP-codes for various choices of the parameters $K$, $N$ and $t$?
-- Are there efficient encoding and decoding algorithms for various DP-codes?

This paper will answer some of these questions, but first let us motivate our interest in difference-preserving codes, and survey the known results.

2. Motivation and applications

Our main interest in difference-preserving codes stems from their use in pattern recognition and classification problems, where the following technique is standard. With an object $A$ one associates a vector $(a_1, a_2, \ldots, a_f)$, where each component represents a feature that is measured by an integer value. The decision whether two objects $A$ and $B$ are equivalent is then based on whether or not their corresponding feature vectors $(a_1, \ldots, a_f)$ and $(b_1, \ldots, b_f)$ are close enough. In one of the most useful metrics, this amounts to deciding whether the inequality

$$\sum_{i=1}^{f} |a_i - b_i| \le t$$

holds, where $t$ is some threshold. DP-codes allow this decision to be made very efficiently, by replacing arithmetic operations (difference, absolute value, sum) by boolean operations (exclusive-or and "population count", i.e. the number of 1's in a binary sequence).

Assume that a feature vector $(a_1, \ldots, a_f)$ can be stored in a single memory cell $\alpha$, each component $a_i$ having been assigned a field of sufficient length to hold all possible values of this component.
If these values are represented by a DP-$t$ code, then the inequality $\sum_{i=1}^{f} |a_i - b_i| \le t$ holds if and only if the result of the bitwise exclusive-or of memory cells $\alpha$ and $\beta$, where $\beta$ holds the vector $(b_1, \ldots, b_f)$, has at most $t$ 1's. On a computer with long word length and a built-in population count operation (such as the CDC 6000 machines), this coding technique can speed up the comparison of feature vectors drastically. Since feature-vector comparisons frequently occur in inner loops of pattern classification programs, we believe that DP-codes are an important technique in pattern recognition and classification.

From the point of view of this application, we are interested in DP-codes for small ranges $K$, and thresholds $t$ that may be close to $K$. Typical values to be encoded might be the grey-levels of a digitized picture, the number of characters of English words, or the number of vertices of a graph that is abstracted from a handwritten character. In all of these cases a range of the order of magnitude of 10 is likely to suffice. Such small DP-codes can be constructed by ad-hoc procedures, or by techniques such as those described in section 5.

However, DP-codes are also interesting from the point of view of coding theory, since they have common aspects with two well-known classes of codes:

a) Gray codes, which are characterized as the special case $t = 1$ of the first requirement of DP-codes: $|i-j| \le t \Rightarrow H(\mathcal{B}(i), \mathcal{B}(j)) = |i-j|$.

b) Error-correcting codes, which share with DP-codes the requirement that codewords (all in the case of error-correcting codes, all but some in the case of DP-codes) are at a certain minimal distance from each other.

Because of these two properties, DP-codes might also be called codes of bounded error: if any $b$ bits ($b \le t$) in a codeword $\mathcal{B}(i)$ of a DP-$t$ code are in error, the resulting binary sequence either is not a codeword at all (error-detection), or else it is a codeword $\mathcal{B}(j)$ of an integer $j$ with $|i-j| = b \le t$ (bounded error).
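The exclusive-or and population-count comparison described at the start of this section can be sketched as follows. This is only an illustration under stated assumptions: it uses the (8,4,1)-code of Section 1, a hypothetical 4-bit field per feature, and Python integers in place of machine words; `pack`, `close_enough`, and `close_enough_arith` are names introduced here.

```python
from itertools import product

# The (8,4,1)-code of Section 1: CODE[k] is the codeword of the integer k.
CODE = {1: 0b0000, 2: 0b0001, 3: 0b0011, 4: 0b0111,
        5: 0b0110, 6: 0b1110, 7: 0b1100, 8: 0b1101}
FIELD_BITS = 4
T = 1  # threshold of the code


def pack(features):
    """Pack a feature vector into one integer, one 4-bit field per component."""
    cell = 0
    for a in features:
        cell = (cell << FIELD_BITS) | CODE[a]
    return cell


def close_enough(cell_a, cell_b, t=T):
    """Boolean test: exclusive-or followed by a population count."""
    return bin(cell_a ^ cell_b).count("1") <= t


def close_enough_arith(a, b, t=T):
    """Arithmetic reference: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b)) <= t


# Exhaustive agreement check over all pairs of two-component vectors.
vectors = list(product(range(1, 9), repeat=2))
agree = all(close_enough(pack(a), pack(b)) == close_enough_arith(a, b)
            for a in vectors for b in vectors)
```

The exhaustive check confirms, for this small case, that the boolean test and the arithmetic test give the same decision for every pair of vectors; on hardware with a population-count instruction the boolean version needs no per-component work at all.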
A DP-$t$ code can also provide a limited form of error-correction as follows: if any $b \le \lfloor t/2 \rfloor$ bits in a codeword $\mathcal{B}(i)$ are in error, then the resulting binary sequence can be decoded as an integer $j$ such that $|i-j| \le 2b \le t$.

From the point of view of coding theory, one is interested in the asymptotic properties of DP-codes for large ranges. We address this question in sections 3 and 4, where we describe an efficient class of DP-codes, and compare their information content to theoretical upper bounds.

It is this aspect of bounding or detecting errors which has motivated most of the early work on DP-codes. Kautz [8] introduced such codes in the special case $t = 1$, discussed their application to analog-to-digital conversion, and gave upper and lower bounds for the maximal number of codewords in circuit codes (a slight modification of the DP-codes discussed here, where the sequence of codewords forms a closed path). Vasil'ev [13] improved on these lower bounds by constructing circuit codes for $t = 1$, exploiting the connection already mentioned above between error-correcting codes and DP-codes. The class of DP-$t$ codes described in Section 3 of this paper results from a construction also based on this connection between the two families of codes. Our construction is substantially different from that of Vasil'ev and holds for arbitrary values of the threshold $t$.

Chien, Freiman and Tang [1] describe code-construction techniques based on the idea of combining two small codes into one larger one, for arbitrary threshold values. They also introduced a new technique for obtaining good upper bounds on the number of codewords. Further code-construction techniques and improved bounds are described in various papers, such as Singleton [12], Klee [9, 10], Danzer and Klee [2], Douglas [4, 5], and Wyner [14].
The main contribution of this paper is a new class of DP-$t$ codes which have, asymptotically for large codeword length $N$ and fixed threshold $t$, a higher information content than any of the known DP-codes. We also describe some new code-construction techniques suitable for constructing small codes, and we have already described the application of DP-codes to the fast computation of the distance of vectors as it is used in pattern classification problems.

3. A Class of DP-Codes Based on Error-Correcting Codes

The class of DP-codes to be presented in this section is based on an idea which we now informally describe. As stated before, an $(N,t)$ DP-code is a sequence of vertices in an $N$-cube path so that each vertex $v$ on the path is at distance greater than $t$ from any other code vertex, with the exception of those vertices corresponding to integers within difference $t$ from the integer represented by $v$. If one considers now a binary $t$-error-correcting code $C'$ of some length $N'$, each code point in $C'$ is at distance at least $(2t+1)$ from any other code point in $C'$. Thus we may think of threading all of the points of $C'$ with an $N'$-cube path $\mathcal{B}'$ in the hope to obtain an $(N',t)$ DP-code as the sequence of vertices of this path. Although this construction certainly meets the first condition of a DP-code (see section 1), it may not meet the second. In fact the $N'$-cube path just mentioned may be thought of as the catenation of subpaths each of which is contained within the decoding set of a code point $v \in C'$. While this ensures that each point $v \in C'$ be at a Hamming distance greater than $t$ from any other point of the $N'$-cube path, this may not occur for points towards the ends of subpaths within decoding sets. To obtain the necessary distance, we may think to construct a code $\mathcal{B}$ as a subset of the cartesian product of $\mathcal{B}'$ and of a new code $\mathcal{B}''$ of length $N''$.
The function of $\mathcal{B}''$ is to provide the necessary distance where $\mathcal{B}'$ is more likely to violate the second condition of DP-codes. Thus the resulting $(N'+N'')$-cube path will be traced on the coordinates of $\mathcal{B}''$ when the coordinates of $\mathcal{B}'$ are safe, and vice versa. As we shall now show more formally, such an $(N'+N'',t)$ DP-code can be constructed.

We begin by constructing the code $\mathcal{B}'$. An important property of $\mathcal{B}'$ is better elucidated by resorting to the polynomial representation of binary sequences, or vectors; in other words with a vector $f = (f_0, f_1, \ldots, f_m)$ we associate the polynomial

$$f(x) = \sum_{i=0}^{m} f_i x^i .$$

We also introduce some necessary nomenclature. For some positive integer $N'$, $A_{N'}$ denotes the algebra of polynomials over GF(2) in $x$ modulo $(x^{N'} - 1)$. For $f(x) \in A_{N'}$, the weight $W[f(x)]$ of $f(x)$ is the number of nonzero coefficients of $f(x)$, and $W[f_1(x) + f_2(x)] = H(f_1(x), f_2(x))$ is the Hamming distance between $f_1(x)$ and $f_2(x)$.

Let $C' \subset A_{N'}$ be a cyclic binary code with odd actual minimum distance $d = 2t+1$. These requirements are satisfied, for example, by a primitive binary BCH-code (see, e.g., [11], p. 282). Let $g(x)$ be the generator polynomial of $C'$ and $m(x)$ be a polynomial which has the minimum degree among the minimum-weight polynomials in $C'$ (clearly $g(x) = m(x)$ when $W[g(x)] = d$). We define

$$s = N' - \deg[m(x)]$$

and consider the set of polynomials $F = \{ f(x) \mid \deg[f(x)] < s \}$. We may now order the elements of $F$ to form a sequence $f_0(x), f_1(x), \ldots, f_{2^s-1}(x)$ so that

$$W[f_i(x) + f_{i+1}(x)] = 1 \quad \text{for } i = 0, 1, \ldots, 2^s-1 \ (\text{indices mod } 2^s).$$

It is well-known that such sequences exist. They are referred to as Gray code sequences and correspond to Hamiltonian circuits on the binary $s$-cube. Define now $v_i(x) = f_i(x) m(x)$. We have the following simple result:

Lemma 1: The set $V = \{ v_i(x) \mid v_i(x) = f_i(x) m(x),\ f_i(x) \in F \}$ is contained in $C'$, all the $v_i(x)$'s are distinct, and $H(v_i(x), v_{i+1}(x)) = d$ for $i = 0, 1, \ldots, 2^s-1$ (mod $2^s$).

Proof: (i) Since each $v_i(x)$ is a multiple of a codeword $m(x)$, it is also a multiple of $g(x)$; hence it belongs to $C'$.

(ii) To show that all the elements of $V$ are distinct, assume $v_i(x) = v_j(x)$ for $i \ne j$. Letting $m(x) = p(x) g(x)$, we have $\deg[m(x)] = \deg[p(x)] + \deg[g(x)]$. The equality $v_i(x) = v_j(x)$ can be rewritten as $(f_i(x) + f_j(x)) p(x) g(x) = 0$, i.e., $(f_i(x) + f_j(x)) p(x)$ must be a multiple of $h(x) = (x^{N'} + 1)/g(x)$. But

$$\deg[(f_i(x) + f_j(x)) p(x)] = \deg[p(x)] + \deg[f_i(x) + f_j(x)] < \deg[p(x)] + s,$$

whereas

$$\deg[h(x)] = N' - (\deg[g(x)] + \deg[p(x)]) + \deg[p(x)] = s + \deg[p(x)].$$

Thus $h(x)$ cannot divide $(f_i(x) + f_j(x)) p(x)$.

(iii) Recall that $H(v_i(x), v_{i+1}(x)) = W[(f_i(x) + f_{i+1}(x)) m(x)]$; since $f_i(x) + f_{i+1}(x) = x^{\delta}$ for some $\delta$ in the range $(0, s-1)$, then $H(v_i(x), v_{i+1}(x)) = W[x^{\delta} m(x)] = W[m(x)] = d$.

In the sequence $f_0(x), \ldots, f_{2^s-1}(x)$, in passing from $f_i(x)$ to $f_{i+1}(x)$ there is exactly one coefficient that changes: let it be the coefficient of $x^{\delta(i)}$, where $\delta(i) \in \{0, 1, \ldots, s-1\}$. We denote the pair $(v_i, v_{i+1})$ as a $\delta(i)$-transition.

We now construct a code $\mathcal{B}'$ as a sequence of $N'$-dimensional points as follows:

1. Let $x^{i_1}, x^{i_2}, \ldots, x^{i_d}$ be the powers of $x$ whose coefficients are nonzero in the polynomial $m(x)$. Construct the $N'$-cube path $V_i$ as the sequence of points $v_i = v'_0, v'_1, \ldots, v'_{d-1}$, where $v'_{j-1}$ and $v'_j$ differ exactly in the coefficient of $x^{i_j + \delta(i)}$ for $j = 1, 2, \ldots, d-1$.

2. $\mathcal{B}'$ is the $N'$-cube path obtained by catenating the paths $V_0, V_1, \ldots, V_{2^s-1}$.

This code $\mathcal{B}'$ has an interesting property. Since all points at Hamming distance less than or equal to $(d-1)/2$ from a point in $C'$ belong to distinct cosets of the vector subspace $C'$, it is clear that the points in a path segment $V_i$ are in different cosets and that a unique collection of $(d-1)$ cosets is associated with each of the $s$ transition types.

We now consider the construction of the code $\mathcal{B}''$.
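Before turning to the second component code, the statements of Lemma 1 and the shape of the path segments $V_i$ can be checked on a small case. The sketch below is illustrative only: it assumes the (7,4) Hamming code with generator $1 + x + x^3$ (so $m(x) = g(x)$), the reflected binary Gray code as the sequence $f_i$, and polynomials stored as integer bit masks; none of these representation choices come from the paper.

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials stored as bit masks
    (bit k holds the coefficient of x**k)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def weight(a):
    """Number of nonzero coefficients."""
    return bin(a).count("1")


N1, M = 7, 0b1011               # N' = 7, m(x) = 1 + x + x^3 (assumed generator)
s = N1 - (M.bit_length() - 1)   # s = N' - deg m(x)
d = weight(M)                   # minimum distance of the Hamming code

f = [i ^ (i >> 1) for i in range(2 ** s)]   # reflected Gray code sequence
v = [gf2_mul(fi, M) for fi in f]            # v_i(x) = f_i(x) m(x), degree < N'

exponents = [k for k in range(N1) if (M >> k) & 1]   # i_1, ..., i_d


def segment(i):
    """Path V_i: v_i followed by d - 1 single-coefficient changes."""
    delta = (f[i] ^ f[(i + 1) % len(f)]).bit_length() - 1
    points = [v[i]]
    for e in exponents[:d - 1]:
        points.append(points[-1] ^ (1 << (e + delta)))
    return points


# Catenation of V_0, ..., V_{2^s - 1}.
b_prime = [p for i in range(len(v)) for p in segment(i)]
```

With these choices $s = 4$ and $d = 3$; the 16 polynomials $v_i$ are distinct, cyclically consecutive ones are at distance 3, and the 48 points of the catenated path change one coordinate at a time. The sketch does not examine the coset property or the second DP condition.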
We begin by introducing a code $C''$ as a set of at least $s$ $N''$-dimensional points (as many as the coefficients of the $f_i(x)$'s) with minimum Hamming distance $(d-1)/2$. Denoting by $u$ an arbitrary element of $C''$, we now define an injective mapping

$$\varphi : \{0, 1, \ldots, s-1\} \to C'',$$

so that $u = \varphi(r)$ means that $u$ is associated with the $r$-transition. If $u_i$ and $u_j$ are two points in $C''$ at Hamming distance $q$, the sequence of the first $q$ elements of the $N''$-cube path $u_i = u'_0, u'_1, \ldots, u'_q = u_j$ is denoted as $U_{ij}$. We define a code $\mathcal{B}''$ as the set of points on the paths $U_{ij}$ for all $u_i$ and $u_j$ in $\varphi\{0, 1, \ldots, s-1\}$.

We have thus developed the necessary nomenclature for the construction of a DP-code $\mathcal{B} \subset \mathcal{B}' \times \mathcal{B}''$. A codeword $w$ of $\mathcal{B}$ is of the form $[v, u]$, where $v \in \mathcal{B}'$ and $u \in \mathcal{B}''$. The code $\mathcal{B}$, which is a mapping $\mathcal{B} : \{1, \ldots, K\} \to \{0,1\}^N$, is constructed as follows:

1. For each $v_i \in V$, let $u_i = \varphi[\delta(i)]$. Let $H(u_{i-1}, u_i) = q_i$. We construct the two sequences $v_i U_{i-1,i}$ and $V_i u_i$ (for example, $v_i U_{i-1,i}$ is the sequence $U_{i-1,i}$ to each element of which we juxtapose $v_i$ on the left). Note that $v_i U_{i-1,i}$ contains $q_i$ elements and that $V_i u_i$ contains $d$ elements. We then define

$$W_i = (v_i U_{i-1,i})(V_i u_i)$$

as the catenation of $v_i U_{i-1,i}$ and $V_i u_i$ ($v_i U_{i-1,i}$ preceding $V_i u_i$).

2. The code $\mathcal{B}$ is the catenation of the paths $W_0, W_1, \ldots, W_{2^s-1}$, from which exactly $(d-1)/2$ consecutive points have been removed.

Notice that $W_0$ is defined as $(v_0 U_{-1,0})(V_0 u_0)$, where $u_{-1} = \varphi[\delta(2^s-1)]$, since the sequence $f_0(x), \ldots, f_{2^s-1}(x)$ is a Hamiltonian circuit in the binary $s$-cube. We now claim:

Theorem 1: The code $\mathcal{B}$ is an $(N'+N'', (d-1)/2)$ DP-code.

Proof: Let $w^{(1)} = [v^{(1)}, u^{(1)}]$ and $w^{(2)} = [v^{(2)}, u^{(2)}]$ be two codewords of $\mathcal{B}$. We first consider the first requirement. If $w^{(1)}$ and $w^{(2)}$ belong to the same $W_i$, then $H(w^{(1)}, w^{(2)}) = |\mathcal{B}^{-1}(w^{(1)}) - \mathcal{B}^{-1}(w^{(2)})|$ for $H(w^{(1)}, w^{(2)}) \le d - 1 + (d-1)/2$, and the requirement is met. The same happens when $w^{(1)}$ and $w^{(2)}$ belong to two consecutive $W_i$'s.
We must now show that if $|\mathcal{B}^{-1}(w^{(1)}) - \mathcal{B}^{-1}(w^{(2)})| > (d-1)/2$, then also $H(w^{(1)}, w^{(2)}) > (d-1)/2$. Notice that

$$H \equiv H(w^{(1)}, w^{(2)}) = H(v^{(1)}, v^{(2)}) + H(u^{(1)}, u^{(2)})$$

and consider the following cases:

1. $v^{(1)} \in C'$. If $v^{(1)} = v^{(2)}$, the condition is obviously met since both $w^{(1)}$ and $w^{(2)}$ belong to the same $W_i$ sequence. Suppose $v^{(1)} \ne v^{(2)}$. If $v^{(2)} \in C'$, then $H \ge H(v^{(1)}, v^{(2)}) \ge d$. If $v^{(2)} \notin C'$, then it belongs to the sphere of radius $(d-1)/2$ centered on some other code point $v^* \in C'$; it follows that

$$H \ge H(v^{(1)}, v^*) - H(v^*, v^{(2)}) \ge d - (d-1)/2 = (d-1)/2 + 1 > (d-1)/2 .$$

2. $v^{(1)} \notin C'$, $v^{(2)} \notin C'$. Assume at first that $u^{(1)} = u^{(2)}$, i.e., $H = H(v^{(1)}, v^{(2)})$. In this case $v^{(1)} \in V^{(1)}$ and $v^{(2)} \in V^{(2)}$, where $V^{(1)}$ and $V^{(2)}$ are paths pertaining to a transition of the same type. Then if $v^{(1)}$ and $v^{(2)}$ belong to the same coset, $H(v^{(1)}, v^{(2)}) \ge d$, since each coset of $C'$ has the same distance structure as $C'$. Suppose that $v^{(1)}$ and $v^{(2)}$ belong to different cosets of $C'$, i.e., $v^{(1)} \in C^{(1)}$ and $v^{(2)} \in C^{(2)}$. Then let $v_i$ and $v_{i+1}$ be the elements of $C'$ which delimit $V^{(1)}$, and let $v^*$ be the element of $V^{(1)}$ in $C^{(2)}$; we can write

$$H(v_i, v^{(1)}) + H(v^{(1)}, v^*) + H(v^*, v_{i+1}) = d$$

by the construction of the code $\mathcal{B}'$. Now, if $H(v^{(1)}, v^*) \le (d-1)/2$, then $H \ge H(v^{(2)}, v^*) - H(v^*, v^{(1)}) \ge d - (d-1)/2$; the same holds otherwise, due to the distance property of $C'$. Finally, if $V^{(1)}$ and $V^{(2)}$ pertain to different types of transitions, then $H(u^{(1)}, u^{(2)}) \ge (d-1)/2$ by the property of $\mathcal{B}''$ and $H(v^{(1)}, v^{(2)}) \ge 1$, since $v^{(1)}$ and $v^{(2)}$ belong to the decoding subsets of two distinct points of $C'$. Hence $H \ge (d-1)/2 + 1 > (d-1)/2$. ∎

We now evaluate the efficiency of the DP-code $\mathcal{B}$ just constructed. The number $K$ of codewords in $\mathcal{B}$ is $(d-1)/2$ less than the sum of the numbers of codewords in each segment $W_i$ ($i = 0, \ldots, 2^s-1$). Denoting by $|W_i|$ the cardinality of $W_i$ we have

$$K = \sum_{i=0}^{2^s-1} |W_i| - \frac{d-1}{2} .$$

With the same notation we obtain $|W_i| = |U_{i-1,i}| + |V_i|$. While $|V_i| = d$ for every $i$, $|U_{i-1,i}| = q_i$ depends upon $i$. Thus

$$K = 2^s d - (d-1)/2 + \sum_i q_i .$$

The value of $\sum_i q_i$ depends clearly upon the mapping $\varphi$, i.e., upon the assignment of vectors $u \in C''$ to transitions. The selection of this mapping is very relevant to establishing the value of $K$ and, as we shall see later, to the encoding and decoding algorithms.

To gain some insight into this problem, we must refer to the choice of the sequence $f_0(x), \ldots, f_{2^s-1}(x)$. This sequence of $s$-dimensional vectors is equivalent to a sequence $T_s$ over the integers $\{0, 1, \ldots, s-1\}$, such that, if the $i$-th element of $T_s$ is $r$, we have $f_{i-1}(x) + f_i(x) = x^r$. One possible choice of $T_s$ is provided by the standard Gray code sequence, which is defined by

$$T_1 = 0, \qquad T_j = T_{j-1}\,(j-1)\,T_{j-1} \ \text{ for } j = 2, \ldots, s-1, \qquad T_s = T_{s-1}\,(s-1)\,T_{s-1}\,(s-1).$$

Hereafter we shall only consider such standard Gray code sequences. With this restriction, consecutive pairs in $T_s$ are either of type $(0,j)$ or of type $(j,0)$ ($j = 1, \ldots, s-1$). Furthermore, denoting by $\nu_j$ the total multiplicity of pairs

[...]

Encoding Algorithm $i \to \mathcal{B}(i)$

Let the integer $i$ be given.

1. Express $i$ as $i = p(d+b) + r$ where $r < d + b$.

2. Express $p$ by its binary Gray code representation $f_p(x)$.

3. If $r - b = h > 0$, compute $v(x) = f_p(x) m(x) + \sum_{m=1}^{h} x^{i_m + \delta(p)}$, set $u(x) = \varphi[\delta(p)]$ and go to step 5. Else, proceed.

4. Compute $v(x) = f_p(x) m(x)$ and $u(x)$ as the $(r+1)$st element on the path $U_{p-1,p}$.

5. $\mathcal{B}(i) = (v(x), u(x))$.

Decoding Algorithm $w \to \mathcal{B}^{-1}(w)$

Let the codeword $w = (v(x), u(x))$ be given.

1. Express $v(x)$ as $v(x) = f_p(x) m(x) + r(x)$ where $\deg[r(x)] < \deg[m(x)]$.

2a. If $r(x) = 0$, $p$ is the integer whose Gray code representation is $f_p(x)$. Then $u(x)$ belongs to the path $U_{p-1,p}$; let it be the $(r+1)$st element on this path. Then $\mathcal{B}^{-1}(w) = p(d+b) + r$.

2b. If $r(x) \ne 0$, inspect $u(x)$ and determine $\delta = \varphi^{-1}(u)$ (table look-up). Then determine, by performing at most $d$ trials, the integer $r$ for which

$$v(x) + \sum_{m=1}^{r+1} x^{i_m + \delta} = v^*(x)$$

is a multiple of $m(x)$. Express $v^*(x)$ as $f_p(x) m(x)$. Then $\mathcal{B}^{-1}(w) = p(d+b) + d + r$.
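Two primitives recur in the algorithms above: conversion between an integer $p$ and its Gray code representation $f_p(x)$, and division of $v(x)$ by $m(x)$ over GF(2). A minimal sketch of both follows, assuming the reflected binary Gray code and polynomials stored as integer bit masks (bit $k$ is the coefficient of $x^k$); the function names are illustrative and this is not a full encoder or decoder.

```python
def gray(p):
    """Reflected Gray code of p; bit k is the coefficient of x**k in f_p(x)."""
    return p ^ (p >> 1)


def gray_inverse(g):
    """Integer p whose reflected Gray code is g."""
    p = 0
    while g:
        p ^= g
        g >>= 1
    return p


def gf2_divmod(v, m):
    """Quotient and remainder of v(x) / m(x) over GF(2), bit-mask polynomials."""
    q, deg_m = 0, m.bit_length() - 1
    while v.bit_length() - 1 >= deg_m:
        shift = (v.bit_length() - 1) - deg_m
        q ^= 1 << shift        # record the quotient term x**shift
        v ^= m << shift        # cancel the leading term of v
    return q, v
```

For instance, with $m(x) = 1 + x + x^3$ stored as `0b1011`, `gf2_divmod(v, 0b1011)` returns the pair $(f_p(x), r(x))$ of decoding step 1, and when the remainder is zero `gray_inverse` recovers $p$ as in step 2a.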
Before closing this section, we illustrate the technique just presented by the following examples.

Examples

(1) A (71,9,1)-code can be constructed as follows. We choose the (7,4) single-error-correcting Hamming code as $C'$. The resulting code $\mathcal{B}'$ has parameters $N' = 7$, $s = 4$ and $d = 3$. The code $C''$ can be chosen as $\varphi(0) = 00$, $\varphi(1) = 11$, $\varphi(2) = 10$, $\varphi(3) = 01$ (i.e., $N'' = 2$). Since $d_2 = d_3 = 1$ and $d_1 = 2$, using expression (a) we obtain

$$K = 2^4 \cdot 3 - 1 + (2^3 \cdot 2 + 2^2 \cdot 1 + 2^2 \cdot 1) = 71 .$$

(2) Consider the (15,7) BCH double-error-correcting code as the code $C'$ for the construction of $\mathcal{B}'$ of parameters $N' = 15$, $s = 7$, $d = 5$. We must also choose a code $C''$ having at least 7 codewords at minimum distance $(5-1)/2 = 2$: one such code is the set of the even-weight binary sequences of length 4. Then we can select $\varphi(0) = 0000$, $\varphi(1) = 1111$, and $\varphi(2), \ldots, \varphi(6)$ as weight-2 codewords. From (a) we readily obtain

$$K = 2^7 \cdot 5 - 2 + 376 = 1014 ;$$

thus, we have a (1014,19,2)-code.

(3) Finally we construct a DP-3 code. Consider the famous (23,12) triple-error-correcting Golay code as the $C'$ code. We obtain a code $\mathcal{B}'$ with parameters $N' = 23$, $s = 12$, $d = 7$. The code $C''$ must have at least 12 codewords with minimum distance 3; then $C''$ can be chosen as the (7,4) single-error-correcting code. We select the mapping $\varphi$ so that $W[\varphi(0)] = 0$, $W[\varphi(1)] = 7$, $W[\varphi(i)] = 4$ for $i = 2, \ldots, 8$, and $W[\varphi(j)] = 3$ for $j = 9, \ldots, 11$. We obtain

$$K = 2^{12} \cdot 7 - 3 + 22512 = 51181 ,$$

that is, a (51181,30,3)-code. If instead we choose an (8,4) Hamming code as $C''$, this allows the following selection of $\varphi$: $W[\varphi(0)] = 0$, $W[\varphi(i)] = 4$ for every $i \ne 0$. This results in

$$K = 2^{12} \cdot 7 - 3 + (2^{12} + 1) \cdot 4 = 45057 ,$$

that is, a (45057,31,3)-code which is very simple to encode and decode.

In the next section we shall compare the efficiency of the DP-codes just described against upper bounds on the attainable efficiency.
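The ranges quoted in Example (1) and in the first variant of Example (3) can be reproduced from $K = 2^s d - (d-1)/2 + \sum q_i$. The sketch below rests on assumptions not spelled out in the surviving text: the transitions follow the reflected binary Gray code taken cyclically, $\varphi(0)$ is the all-zero word, and consequently each $q_i$ equals the weight of the nonzero-indexed codeword in the pair; `transition_sequence` and `code_range` are hypothetical names.

```python
def transition_sequence(s):
    """delta(0), ..., delta(2**s - 1) for the reflected Gray code, taken cyclically."""
    n = 2 ** s
    gray = [i ^ (i >> 1) for i in range(n)]
    return [(gray[i] ^ gray[(i + 1) % n]).bit_length() - 1 for i in range(n)]


def code_range(s, d, phi_weight):
    """K = 2**s * d - (d - 1)/2 + sum of q_i.

    phi_weight[r] is W[phi(r)]; phi(0) is assumed to be the all-zero word,
    so q_i = H(phi(delta(i-1)), phi(delta(i))) reduces to a single weight.
    """
    delta = transition_sequence(s)
    q_sum = 0
    for i in range(len(delta)):
        a, b = delta[i - 1], delta[i]     # delta[-1] closes the circuit
        assert a == 0 or b == 0           # pairs are of type (0, j) or (j, 0)
        q_sum += phi_weight[a + b]        # weight of the nonzero-indexed word
    return 2 ** s * d - (d - 1) // 2 + q_sum


k_example_1 = code_range(4, 3, [0, 2, 1, 1])
k_example_3 = code_range(12, 7, [0, 7] + [4] * 7 + [3] * 3)
```

Under these assumptions the two calls give 71 and 51181, matching the values stated in those two examples.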
4. Bounds on the Information Content of DP-codes

To evaluate the information content of DP-codes, several authors have referred to the number $K(N,t)$ of codewords in an $(N,t)$-code. These bounds have usually been presented for circuit codes (i.e. when the codeword sequence forms a closed path), but asymptotically they have the same behavior as the corresponding bounds for path-codes, i.e. our DP-codes.

Upper bounds to $K(N,t)$ are based on geometric arguments [1, 5, 7] and are reminiscent of the Hamming bound for error-correcting codes. Except for rather small values of $N$ and $t$, the discrepancy between the bounds and the best codes discovered is substantial. The upper bounds, however, are interesting in their asymptotic behavior, because they provide a useful guideline for evaluating the efficiency of various code constructions.

For given threshold $t$, the best upper bound known is due to Douglas [5]; it represents a small refinement of a result obtained by Chien, Freiman, and Tang [1]. This bound is of the form

$$K(N,t) < 2^{\,N\left[1 - H\left(\frac{t-1}{2N}\right)\right] + o(N)} \qquad \text{(e)}$$

where $H(\cdot)$ is the entropy function. A variant of this bound, presented by Wyner [14], gives asymptotically a substantial improvement over (e) when $t$ is a constant fraction of $N$. The bound expressed by (e) indicates that at least $N H[(t-1)/2N]$ bits of redundancy are needed by any $(N,t)$-code.

The constructions of DP-codes proposed by various authors provide lower bounds to $K(N,t)$. With one notable exception, there is a substantial gap between upper and lower bounds. In fact typical lower bounds to $K(N,t)$ for $t \ge 2$ are of the form (see [1], [12])

$$K(N,t) \ge 2^{2N/t} \quad (\text{for } t \ge 2) \qquad \text{(f)}$$

whereas for $t = 1$ Vasil'ev [13] and Danzer and Klee [2] were able to obtain

$$K(N,1) \ge 2^{\,N - \log_2 N + o(N)} .$$

Let us now consider the DP-codes constructed in Section 3. From relations (b) and (d) we obtain

$$K > 2^{\,2^n - 1 - nt} \approx 2^{\,N - (t+1.32)\log_2 N} = 2^{\,N + o(N)} ,$$

which shows that their redundancy is at most of order $(t+1.32)\log_2 N$.
Hence the codes described in Section 3 are comparable in efficiency to the codes constructed by Vasil'ev and by Danzer and Klee for $t = 1$, while for $t \ge 2$ they are superior to previously known codes. Furthermore, for fixed $t$ and asymptotically in $N$, their efficiency is of the same order as that prescribed by the upper bounds.

5. Simple code-composition techniques

The class of codes presented in section 3 is very efficient for large ranges. For short codes there is a certain disadvantage in that not all codeword lengths can be selected. This is due to the fact that the construction of $\mathcal{B}'$ must be based on existing binary cyclic codes. Thus there may be substantial gaps between realizable word lengths of codes in that class.

For the application to pattern classification problems, where small ranges are usually needed, other code construction techniques may be more appropriate, and so we describe two of these. The efficiency of codes constructed according to these methods, however, compares unfavorably to that of the codes of section 3 for large ranges: the redundancy is typically proportional to the codeword length $N$, as opposed to $t \log N$ for the codes of section 3.

For small values of $N$ (say $N \le 5$) and $t$ it is fairly easy to construct optimal codes (i.e. those of maximal range) by hand. These may be used as building blocks for the composition techniques described in this section. The table below gives the value of $K_{\max}(N,t)$ for $N$ from 1 to 6, and $t$ from 1 to 5.

Table: Ranges $K_{\max}(N,t)$ for optimal DP-codes.

    t \ N    1    2    3    4    5    6
      1      2    3    5    8   14   25
      2      2    3    4    6    8   14
      3      2    3    4    5    7    9
      4      2    3    4    5    6    8
      5      2    3    4    5    6    7

Notice that for $t \ge N - 1$, we have $K_{\max}(N,t) = N+1$, an optimal $(N,t)$-code being any shortest path in the $N$-cube that joins two vertices that are at maximal distance $N$ from each other. For $t = N - 2$, $K_{\max}(N,t) = N + 2$, an optimal code being a shortest path between opposite vertices of the $N$-cube, augmented by one more codeword.
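For very small lengths the table entries can be recomputed by exhaustive search rather than by hand. The following sketch is a plain depth-first search over paths in the $N$-cube, written for illustration; it is exponential in $N$ and is not a method proposed in the paper. The name `k_max` is introduced here.

```python
def k_max(n, t):
    """Largest range of a DP-t code of length n, by depth-first search.

    Exhaustive and exponential; intended only for very small n.
    """
    best = 0

    def extend(path):
        nonlocal best
        best = max(best, len(path))
        for w in range(2 ** n):
            ok = True
            # 'back' is the index difference between the candidate and u.
            for back, u in enumerate(reversed(path), start=1):
                dist = bin(u ^ w).count("1")
                if (back <= t and dist != back) or (back > t and dist <= t):
                    ok = False
                    break
            if ok:
                path.append(w)
                extend(path)
                path.pop()

    extend([0])   # by symmetry of the cube the first codeword may be fixed
    return best
```

Because every prefix of a DP-code is itself a DP-code, it is enough to test each candidate codeword against the codewords already on the path. For example, `k_max(4, 1)` returns 8 and `k_max(4, 2)` returns 6, in agreement with the corresponding table entries.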
For $t < N - 2$, there appears to be no simple description of optimal $(N,t)$-codes.

The following two techniques, judiciously used, allow one to construct reasonably efficient DP-codes for any small values of $N$ and $t$. Both are based on the idea of combining an $(N',t')$-code with an $(N'',t'')$-code to form an $(N,t)$-code, each of whose codewords $w$ is the juxtaposition of a codeword $u$ of the first code and a codeword $v$ of the second code. It is a simple exercise to verify that each of these techniques yields a DP-code with parameters as stated.

a) Threshold addition technique. Let $u_1, u_2, \ldots, u_p$ be an $(N',t')$-code, and $v_1, v_2, \ldots, v_q$ be an $(N'',t'')$-code, with $p \le q$. Then if $p < q$,

$$u_1 v_1,\ u_1 v_2,\ u_2 v_2,\ u_2 v_3,\ u_3 v_3,\ \ldots,\ u_p v_p,\ u_p v_{p+1}$$

is an $(N'+N'', t'+t'')$-code of range $K = 2p$. If $p = q$, $v_{p+1}$ does not exist, and $K = 2p - 1$.

b) Range extension technique. Let $u_1, u_2, \ldots, u_p$ be an $(N',t)$-code, and $v_1, v_2, \ldots, v_q$ be an $(N'',t)$-code. Denote by $V$ the sequence $v_1 v_2 \ldots v_q$, by $\bar{V}$ the sequence $v_q v_{q-1} \ldots v_1$, by $u_i V$ the sequence obtained by juxtaposing to each element of $V$ the codeword $u_i$ on the left, and similarly for $u_i \bar{V}$. Let

$$W_h = u_{h(t+1)+1} V,\ u_{h(t+1)+2}\, v_q,\ \ldots,\ u_{(h+1)(t+1)}\, v_q \quad (h \text{ even})$$

$$W_h = u_{h(t+1)+1} \bar{V},\ u_{h(t+1)+2}\, v_1,\ \ldots,\ u_{(h+1)(t+1)}\, v_1 \quad (h \text{ odd})$$

The sequence $W_0, W_1, \ldots, W_{\lceil p/(t+1) \rceil}$ is an $(N'+N'',t)$-code of range $K = \lceil p/(t+1) \rceil q + t \lfloor p/(t+1) \rfloor$.

Example

In order to construct a (6,3)-code, we may compose it by means of the threshold addition technique from a (3,1)-code and a (3,2)-code. Optimal codes for these parameters are readily found:

    (3,1)-code: K = 5      (3,2)-code: K = 4
          000                    000
          001                    001
          011                    011
          111                    111
          100

Composed (6,3)-code: K = 8

    000 000
    000 001
    001 001
    001 011
    011 011
    011 111
    111 111
    111 100

This compares rather favorably with the range K = 9 of an optimal (6,3)-code.

References

1. R. T. Chien, C. V. Freiman, and D. T. Tang, Error correction and circuits on the n-cube, Proc. Second Allerton Conference on Circuit and System Theory, Sept. 28-30, 1964, Univ. of Illinois, Monticello, Ill., pp. 899-912.
2. L. Danzer and V. Klee, Lengths of snakes in boxes, J. Combinatorial Theory 2 (1967), 258-265.
3. D. W. Davies, Longest 'separated' paths and loops in an N cube, IEEE Trans. Electronic Computers EC-14 (1965), 261.
4. R. J. Douglas, Some results on the maximum length of circuits of spread k in the d-cube, J. Combinatorial Theory 6 (1969), 323-339.
5. R. J. Douglas, Upper bounds on the length of circuits of even spread in the d-cube, J. Combinatorial Theory 7 (1969), 206-214.
6. R. G. Gallager, Information theory and reliable communication, Wiley, New York, N.Y., 1968.
7. V. V. Glagolev, An upper estimate of the length of a cycle in the n-dimensional unit cube (Russian), Diskret. Analiz., No. 6 (1966), 3-7.
8. W. H. Kautz, Unit-distance error-checking codes, IRE Trans. Electronic Computers EC-7 (1958), 179-180.
9. V. Klee, A method for constructing circuit codes, J. Assoc. Comput. Mach. 14 (1967), 520-528.
10. V. Klee, What is the maximum length of a d-dimensional snake?, Amer. Math. Monthly 77 (1970), 63-65.
11. W. W. Peterson and E. J. Weldon Jr., Error-correcting codes (2nd edition), MIT Press, Cambridge, Mass., 1972.
12. R. C. Singleton, Generalized snake-in-the-box codes, IEEE Trans. Electronic Computers EC-15 (1966), 596-602.
13. Ju. L. Vasil'ev, On the length of a cycle in the n-dimensional unit cube, Soviet Math. Dokl. 4 (1963), 160-163 (transl. from Dokl. Akad. Nauk SSSR 148 (1963), 753-756).
14. A. D. Wyner, Note on circuits and chains of spread k in the n-cube, IEEE Trans. on Computers C-20 (1971), 474.