 
 UIUCDCS-R-72-496

 COO-2118-0028
 
 Methodological Aspects 
 of Scene Segmentation 
 
 August 1972 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/methodologicalas496raul 
 
COO-2118-0028 
 
 UIUCDCS-R-72-496
 
 Methodological Aspects 
 of Scene Segmentation 
 
 By 
 Peter Raulefs 
 
 Department of Computer Science 
 
 University of Illinois 
 
 Urbana, Illinois 61801

 This work was supported in part by Contract AT(11-1)-2118
 with the U.S. Atomic Energy Commission.
 
 ACKNOWLEDGMENT 
 
 The author would like to express his gratitude to Professor 
 Bruce H. McCormick who introduced him into the area of image processing 
 and contributed to this thesis in numerous discussions and suggestions. 
 
 The author is also grateful to the members of the Illiac III 
 group for pleasant and helpful cooperation. In particular, he acknowledges 
 the help of Mrs. Judy Arter for typing and improving this thesis through 
 editing, and he appreciates the excellent drawings done by Mr. Stanley 
 Zundo. 
 
 
 TABLE OF CONTENTS 
 
 CHAPTER

 1 INTRODUCTION

 2 SCENE SEGMENTATION

   2.1 Property Space Representation

     2.1.1 Local Representation of Pictures
     2.1.2 Definition and Representation of Regions
     2.1.3 Criterion for Partitioning Pictures into Regions
     2.1.4 Density and Entropy of Sets in the Property Space
     2.1.5 Formal Description of Scene Segmentation into Regions
     2.1.6 Metrics in the Property Space

   2.2 The Clustering Approach to Scene Segmentation

     2.2.1 A Formal Concept of Clustering
     2.2.2 Application of Clustering to Scene Segmentation
     2.2.3 Review of Some Clustering Techniques
     2.2.4 A Combined Clustering Technique for Scene Segmentation

 3 STRUCTURAL ANALYSIS OF SCENES

   3.1 Data Representation of Composite Scenes

   3.2 Synthesis of Composite Objects in Scenes

     3.2.1 Synthesis of Composites Using Graph Transformation Rules
     3.2.2 Synthesis of Composites Using Cartesian Covers
     3.2.3 Matching Models to Scenes

 4 CONCLUSIONS

 REFERENCES
 
 
 LIST OF FIGURES 
 
 Figure

 1a (6,3) - Regular Tessellation of the Plane

 1b (4,4) - Regular Tessellation of the Plane

 1c (3,6) - Tessellation of the Plane

 2a (4,4) - Tessellation Which is 4-Imbedded in Another (4,4) - Tessellation

 2b Cohesions of Simple Regions

 3 Cohesions of Different Regions in a Scene
 
CHAPTER 1 
 INTRODUCTION 
 
 We consider scene analysis to consist of two major parts: picture 
 preprocessing and structural analysis. 
 
 Scenes are viewed as being pictorial representations of an assem- 
 bly of more or less well known objects. In image processing research, a 
 scene is usually given as a digitized picture constituting the output of a 
 scanning device. The raster points of a digitized picture are referred to 
 as picture points. The preprocessing phase consists of partitioning the 
 picture into sets of picture points. Subsequently, these sets have to be 
 interpreted as representations of certain objects (structural analysis). 
 
 The basic idea of this paper is that the representation of objects
 in a scene is given by sets of picture points such that the local properties
 of points within such a set are more similar to each other than those
 belonging to different sets. Such a set of picture points will be called a
 region. The idea of partitioning scenes into regions has proven to yield a
 convenient data structure for subsequent processing techniques (see [BF70],
 [GUZ70], [BP70], [PR70] and Sect. 3 of this paper).
 
 Although this approach to region definition is closely related to 
 the concept of clustering, a precise definition has not yet been given. It 
 is shown in Sect. 2.1 that the information-theoretical quantity entropy is 
 useful in characterizing a region. In Sect. 2.2, entropy is used to define 
 the cohesion of a region, and this term gives a tool to analyze clustering 
 methods and to specify an algorithm giving an optimal scene segmentation. 
 
 After having partitioned a picture into regions, each region must 
 be interpreted as a part of a known object. This requires a model of how 
 
objects may be represented and which relations between these representations 
 must hold. Then, recognition can be viewed as finding an optimal match of 
 regions to a collection of model objects. Although we do not pursue a 
 solution to this problem in this paper, regions are used to define a data 
 structure which is applied to approaches to scene analysis briefly reviewed 
 in Sect. 3. 
 
CHAPTER 2 
 SCENE SEGMENTATION 
 
 2.1 Property Space Representation 
 2.1.1 Local Representation of Pictures

 A picture P is considered to be given as a finite set of points
 in the plane that are associated with local attributes.
 
 Definition 2.1 Let P be a finite set of discrete points in the plane.
 A local attribute, $A_t$, is a function

   $A_t : P \to V_t = V'_t \cup \{*\}$
   $p_m \mapsto A_t(p_m)$

 mapping each point $p_m \in P$ into the value set, $V_t$, where
 $V_t$ consists of a finite set $V'_t$ of real numbers and the
 don't care value, *.

 Remarks: 1. An attribute $A_t$ assumes the don't care value, $A_t(p_m) = *$,
 at a point $p_m \in P$ whenever no real number is specified as
 the value of $A_t$ at $p_m$.

 2. We assume that the attributes $A_t$ are always specified so
 that the values in $V'_t$ are naturally ordered real numbers,
 but we do not assume any ordering between the elements
 in $V'_t$ and *.

 3. We do not assume that a distance function is known in any
 of the sets, $V_t$.

 With reference to some coordinate system in the plane,
 we adopt the convention that for any point, $p \in P$, with
 coordinates $x_1$ and $x_2$, the attributes $A_1$ and $A_2$ are defined
 by $A_i(p) = x_i$ (i = 1, 2).
 
With these notions we consider a picture, P, to be represented
 as a finite set of points in the plane such that each picture point, p,
 is associated with an N-dimensional vector $(A_1(p), \ldots, A_N(p))$ of
 attribute values.

 Definition 2.2 Let $A = \{A_1, \ldots, A_N\}$ be a set of local attributes. The
 property space X(A) is defined as the set

   $X(A) = \prod_{t=1}^{N} V_t$.

 Definition 2.3 Let P be a picture consisting of M points $p_1, \ldots, p_M$.
 The property space representation of P is defined to be
 the set $P^x = \{x_1, \ldots, x_M\}$ of M points in X(A):

   $P^x = \{ x_m \mid \text{for } 1 \le m \le M: x_m = (A_1(p_m), \ldots, A_N(p_m)) \}$

 Remark:

 To obtain a convenient representation of pictures, the
 don't care value, *, may be associated with a real number,
 $x_*$, which is not contained in any of the sets $V'_1, \ldots, V'_N$.

 We denote by $X'(A) = \prod_{t=1}^{N} V'_t$ the subset of X(A) that
 contains the N-tuples of all specified attribute values.
 
 Example 
 
 To illustrate the property space representation of a
 picture, we consider the gray value of picture points as
 an attribute $A_3$. The gray value is defined by quantizing
 the possible intensities of light associated with picture
 points and assigning a number to each intensity interval
 thus obtained. The corresponding property space,
 $X(\{A_1, A_2, A_3\})$, is three-dimensional, where two dimensions
 are used to describe the coordinates and the third to
 give the gray values of picture points.
 As the gray values of single picture points are often
 influenced by noise effects, attributes obtained, for
 example, by averaging the gray values over neighborhoods,
 may be considered.
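
 The gray value example can be sketched in code. This is a minimal
 illustration and not part of the original report: it assumes that a picture
 is stored as a dictionary from integer coordinates to quantized gray values,
 that `None` stands in for the don't care value, and the names `DONT_CARE`
 and `property_space_representation` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: None plays the role of the don't care value "*".
DONT_CARE = None

Coord = Tuple[int, int]
PropertyVector = Tuple[int, int, Optional[int]]


def property_space_representation(
    gray: Dict[Coord, Optional[int]],
) -> List[PropertyVector]:
    """Map each picture point p to the vector (A_1(p), A_2(p), A_3(p)).

    A_1 and A_2 are the planar coordinates, A_3 is the gray value.
    Points are listed in coordinate order so the result is reproducible.
    """
    ordered = sorted(gray.items(), key=lambda item: item[0])
    return [(x1, x2, g) for (x1, x2), g in ordered]


# A 2 x 2 picture; the gray value at (1, 1) is unspecified.
picture = {(0, 0): 3, (0, 1): 3, (1, 0): 7, (1, 1): DONT_CARE}
vectors = property_space_representation(picture)
```

 Each entry of `vectors` is one point of the three-dimensional property
 space; a noise-reducing attribute such as a neighborhood average would
 simply add a fourth component.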
 
 2.1.2 Definition and Representation of Regions

 The property space representation can be considered as a local
 description of pictures. A step towards a more global description is the
 formation of regions. This is motivated by the following considerations:

 1. From evaluating the importance of various parts of pictures
 for human perception of form, F. Attneave concluded that the "information
 contents" of a black-white picture is concentrated at points where the
 gradient of the density distribution of black points is large ([ATT54],
 [AA66]). It is suggested in [ATT54] that reducing a picture to a
 contour-line drawing reveals some redundancy among points circumscribed by
 closed contour-lines. This interpretation of experimental results in terms
 of information theory has been seriously questioned in [GC66]. But we can
 safely infer that the representation of pictures as an arrangement of
 regions (1) leads to a higher stage of conceptualization, and (2) reduces
 the number of code words necessary to represent a picture for automated
 picture processing.

 2. Scene segmentation into regions is a convenient step of
 generalization allowing the application of graph processing techniques.
 A similar data structure has been used in [BF70], [GUZ70], [BP70], and
 [PR70].
 
A. Regions in Discrete Point Sets

 Definition 2.4 Let P be a set of M points in the plane such that the
 coordinates $x = (x_1, x_2)$ of each point in P are defined
 with respect to some coordinate system.

 For any $\epsilon > 0$, a subset $U \subseteq P$ is called
 $\epsilon$-chain-connected if for any pair $x, x' \in U$, there is a
 chain $t^1, \ldots, t^k \in U$ such that:

 (i) $x = t^1$ and $x' = t^k$;

 (ii) for j = 2, ..., k: $\| t^{j-1} - t^j \| < \epsilon$,

 where $\| x \| = \sqrt{x_1^2 + x_2^2}$ denotes the Euclidean
 norm of a vector, $x = (x_1, x_2)$.

 Definition 2.5 An $\epsilon$-chain-connected set of picture points in a
 picture, P, is called an $\epsilon$-region in P for any $\epsilon > 0$.
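
 The chain condition amounts to connectivity in the graph that links every
 two points closer than the chosen threshold. The following sketch checks
 this with a depth-first search; the function name `is_eps_chain_connected`
 is an assumption made for illustration, and the quadratic neighbor scan is
 chosen for brevity rather than speed.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_eps_chain_connected(points: Sequence[Point], eps: float) -> bool:
    """Return True if every pair of points is joined by a chain whose
    consecutive members are less than eps apart (Euclidean norm)."""
    if len(points) <= 1:
        # A set with a single picture point is always chain-connected.
        return True
    visited = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(len(points)):
            # Strict inequality, as in condition (ii) of the definition.
            if j not in visited and math.dist(points[i], points[j]) < eps:
                visited.add(j)
                stack.append(j)
    return len(visited) == len(points)


row = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
connected_loose = is_eps_chain_connected(row, 1.5)
connected_tight = is_eps_chain_connected(row, 1.0)
```

 With a threshold of 1.5 the three collinear points form one region; with a
 threshold of 1.0 they do not, because the inequality is strict.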
 
 B. Regions in Cellular Images 
 
 In many applications, the acquisition of pictorial data is done
 by scanning pictures at all points of a pre-selected regular grid. A
 systematic method of defining a grid is to tessellate the plane into (q)
 regular p-gons as discussed in [GR71]. It can be shown (cf. [GR71]) that
 there are only three regular tessellations of the plane for (p,q) = (6,3),
 (4,4) and (3,6). Then, a regular grid, G, is obtained by placing grid
 points at the barycenters of all p-gons. Each p-gon is called a cell.
 
 Definition 2.6

 Let P be a set of M points of a regular grid, G, obtained
 by a regular tessellation of the plane into (q) regular
 p-gons (cells). A set, $U \subseteq P$, is G-chain-connected if for
 any two points $p, p' \in U$, there is a chain $q^1, \ldots, q^k \in U$
 such that:
 
Figure 1a (6,3) - Regular Tessellation of the Plane (labels: G-region, picture points)

 Figure 1b (4,4) - Regular Tessellation of the Plane (labels: picture points, G-region)

 Figure 1c (3,6) - Tessellation of the Plane (labels: picture points, G-region)

 (i) $p = q^1$ and $p' = q^k$;

 (ii) for j = 2, ..., k: $c(q^{j-1})$ and $c(q^j)$ are edge-adjacent
 or vertex-adjacent, where c(q) is the cell containing
 the point q.

 Definition 2.7 A G-chain-connected set of picture points in a picture,
 P, is called a G-region in P.
 
 Regular tessellations and G-regions are illustrated in Figures la-c. 
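
 For the (4,4) tessellation, edge- or vertex-adjacency of cells is the
 familiar 8-neighborhood of a square grid. The sketch below splits a set of
 occupied cells into maximal G-chain-connected subsets under that reading.
 It assumes cells are addressed by integer (row, column) pairs; the name
 `g_regions` is hypothetical, and the hexagonal and triangular grids would
 need a different neighbor rule.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def g_regions(cells: Set[Cell]) -> List[Set[Cell]]:
    """Partition occupied cells of a square grid into maximal sets that
    are connected through edge- or vertex-adjacent cells."""
    remaining = set(cells)
    regions: List[Set[Cell]] = []
    while remaining:
        seed = remaining.pop()
        region = {seed}
        stack = [seed]
        while stack:
            r, c = stack.pop()
            # Visit the eight surrounding cells (the centre is never in
            # `remaining`, so it is skipped automatically).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbor = (r + dr, c + dc)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        region.add(neighbor)
                        stack.append(neighbor)
        regions.append(region)
    return regions


# (0, 0) and (1, 1) touch at a vertex; (3, 3) is isolated.
found = g_regions({(0, 0), (1, 1), (3, 3)})
```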
 
 2.1.3 Criterion for Partitioning Pictures Into Regions 
 
 Early approaches to region finding, such as [ROB65], employ two
 steps: (1) edge finding; and (2) fitting the line drawing obtained after
 (1) to a model. In this section, we are interested in finding regions
 without regard to a model, thus assuming an early stage of image
 preprocessing.

 A different method, proposed in [MP67], bypasses the edge finding
 phase by directly forming regions as the union of squares with approximately
 the same gray value. In [BF70], regions are constructed by joining
 "atomic regions" according to two heuristic rules. In our notation, these
 "atomic regions" are chain-connected sets of picture points whose properties
 satisfy certain equivalence relations, e.g. equality of gray values. The
 method of [BF70] is based on the notion of the "strength of boundary", which
 is defined as the sum of the distances (in the plane) of all adjacent
 points separated by the boundary. The heuristics are then aimed at
 merging regions if the boundaries separating them are weak.
 
 The region finding mechanism of these approaches can be generalized 
 to the following scheme: 
 
 1. Define a similarity measure for ($\epsilon$- or G-) chain-connected
 regions represented in the property space.

 Remark: A set consisting of one picture point only is always
 chain-connected.

 2. Assuming some threshold value $\theta$, merge any two chain-connected
 regions to a single region if their similarity is larger than $\theta$.

 Remark: Two regions $S_1$ and $S_2$ are ($\epsilon$- or G-) chain-connected
 if any pair $(p_1, p_2) \in S_1 \times S_2$ is chain-connected.

 The definition of a similarity measure depends on defining a metric
 in the property space. Then, the threshold value, $\theta$, can be specified
 as that distance in the property space up to which regions are considered to
 be similar. By definition of the property space representation of pictures,
 the planar coordinates are also coordinates in the property space X.
 Consequently, the above partitioning scheme yields a segmentation into
 regions such that the density distribution of points in X is more uniform in
 each region than in the entire picture. This can be further generalized to
 the following criterion:

 [Region Criterion] An optimal partition of a picture into regions is
 given by:

 1. minimizing the average intra-set dispersion within regions, and
 2. maximizing the average inter-set dispersion between regions.

 To obtain a formal and precise statement of this criterion, the concept of
 a density distribution of points in the property space will be introduced
 in the next section. With this notation, we shall see that the [Region
 Criterion] can very well be interpreted in terms of information theory.
 
 2.1.4 Density and Entropy of Sets in the Property Space

 As the property space, X(A), is discrete and finite, the definition
 of a density distribution of points in subsets of X(A) depends critically
 on the neighborhoods in which densities are evaluated. These densities
 will be applied by interpreting them as probability densities. Therefore,
 we have to establish non-overlapping neighborhoods to define normalized
 densities. The lattices considered in crystallography are a useful notion
 for this purpose, as lattices are generated by applying elements of the
 translation group to unit cells (see, for example, [KIT63]).

 The property space X(A) is imbedded in the N-dimensional vector
 space, $R^N$, of N-tuples of real numbers. As points in X(A) may be
 distributed quite irregularly, we establish neighborhoods by partitioning
 $R^N$ into N-dimensional cubes with sides of length $\ell$.
 
 Definition 2.8 Let $\langle x^1, \ldots, x^N \rangle$ be an ordered set of
 orthogonal vectors in $R^N$ spanning an N-dimensional cube with sides
 of length $\ell$.

 The set

   $C(x^0) = \{ x \in R^N \mid \exists \lambda_1, \ldots, \lambda_N \in [0,1[ : x = x^0 + \sum_{n=1}^{N} \lambda_n x^n \}$

 is called a unit cell with origin $x^0$.

 We consider translations of the unit cell described by translation
 operators $T^n_\lambda$:

   $T^n_\lambda [C(x^0)] = \{ x' \in R^N \mid \exists x \in C(x^0): x' = x + \lambda x^n \}$

 for $1 \le n \le N$ and $\lambda = 0, \pm 1, \pm 2, \ldots$

 Definition 2.9 A standard tessellation, $T(x^0, \ell)$, of $R^N$ into unit
 cells is defined to be the set of all unit cells such that:

 (i) $C(x^0)$ is a unit cell with origin $x^0$ and sides of
 length $\ell$;

 (ii) if $C(x') \in T(x^0, \ell)$ then there are translations
 $T^{n_1}_{\lambda_1}, \ldots, T^{n_r}_{\lambda_r}$ such that

   $T^{n_1}_{\lambda_1}(T^{n_2}_{\lambda_2}(\ldots T^{n_r}_{\lambda_r}(C(x^0)) \ldots)) = C(x')$.

 Remarks: 1. By Definition 2.8 all unit cells contained in a standard
 tessellation of $R^N$ do not overlap.

 2. Whenever the length, $\ell$, is not relevant, a standard
 tessellation, $T(x^0, \ell)$, will be referred to as $T(x^0)$.
 
 Definition 2.10 Let S be a set of M points in X'(A) and $T(x^0)$ a standard
 tessellation. The normalized density, $\rho_S(x)$, of S at a
 point $x \in S$ is defined by

   $\rho_S(x) = \frac{1}{M} |S \cap C|$, where $C \in T(x^0)$ is a unit
   cell with $x \in C$.

 The normalized density has the properties $\sum_{x \in S} \rho_S(x) = 1$ and
 $0 \le \rho_S(x) \le 1$ for all $x \in S$. Therefore, we can utilize
 $\rho_S$ to define the entropy of the set S.

 Definition 2.11 The entropy H(S) of a set S of M points in X'(A) for a
 given standard tessellation, $T(x^0)$, is defined by

   $H(S) = - \sum_{x \in S} \rho_S(x) \log \rho_S(x)$.

 The entropy, H(S), has the usual properties discussed in information theory
 and thermodynamics: $0 \le H(S) \le \log M$, and $H(S) = \log M$ iff
 $\forall x \in S$: $\rho_S(x) = \text{const.}$, and H is a $\cap$-convex
 function (cf. [GAL68]).
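
 A small numerical sketch may help to fix these definitions. It makes two
 assumptions that are not stated in the report: the unit cells are
 axis-aligned cubes indexed by integer tuples, and the sum in the entropy is
 taken over occupied unit cells (one term per cell), which is the reading
 under which the densities add up to one. The names `cell_index` and
 `entropy_bits` are hypothetical, and logarithms are taken to base 2.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def cell_index(x: Vector, origin: Vector, side: float) -> Tuple[int, ...]:
    """Integer index of the half-open unit cell [0, side[^N containing x."""
    return tuple(math.floor((xi - oi) / side) for xi, oi in zip(x, origin))


def entropy_bits(points: Sequence[Vector], origin: Vector, side: float) -> float:
    """Entropy of a point set with respect to a standard tessellation.

    The normalized density of a cell is (number of points in it) / M.
    """
    m = len(points)
    counts = Counter(cell_index(x, origin, side) for x in points)
    return -sum((n / m) * math.log2(n / m) for n in counts.values())


spread = [(0.5,), (1.5,), (2.5,), (3.5,)]    # one point per cell
clumped = [(0.1,), (0.2,), (0.3,), (0.4,)]   # all points in one cell
h_spread = entropy_bits(spread, (0.0,), 1.0)
h_clumped = entropy_bits(clumped, (0.0,), 1.0)
```

 The spread set reaches the maximum log M = 2 bits for M = 4, while the
 clumped set has entropy zero.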
 
 2.1.5 Formal Description of Scene Segmentation into Regions

 The ideas developed in the previous section can now be applied
 to the segmentation of scenes into regions.

 By Definitions 2.5 and 2.7, a region, S, is an $\epsilon$- or
 G-chain-connected point set in the plane of picture points. We denote the
 property space representation of a region S by $S^x$. Clearly, $S^x$ is not
 necessarily a chain-connected set with respect to a standard tessellation
 of X(A), i.e. the unit cells containing points of $S^x$ may not be edge- or
 vertex-adjacent. However, the projection of $S^x$ into $V_1 \times V_2$ is
 $\epsilon$- or G-chain-connected.

 The intra-set dispersion of points in $S^x$ is measured by the negentropy
 $-H(S^x)$. If the density distribution of points in $S^x$ is uniform,
 $H(S^x)$ attains its maximum, $H(S^x) = \log |S^x|$. Consequently, the
 problem of finding regions by minimizing the average intra-set dispersion
 of points is equivalent to maximizing the average entropy of regions.

 To compare the dispersion of points of different regions in X(A),
 we have to define a measure which is independent of the area occupied by
 particular regions in the plane.

 Definition 2.12 The normalized entropy, $\bar{H}(S^x)$, of a set $S^x$ of
 M > 1 points in X(A) is given as

   $\bar{H}(S^x) = \frac{1}{\log M} H(S^x)$.

 For M = 1: $\bar{H}(S^x) = 0$.

 Remark: The normalized entropy of a set, $S^x$, in X(A) satisfies
 $0 \le \bar{H}(S^x) \le 1$. For a given segmentation, $S = \{S_1, \ldots, S_L\}$,
 of a picture into L regions, the average intra-set
 dispersion is measured by means of

   $\bar{H}(S) = \frac{1}{L} \sum_{\ell=1}^{L} \bar{H}(S^x_\ell)$.

 Minimizing the average intra-set dispersion is equivalent
 to maximizing $\bar{H}(S)$ over all possible partitions of S
 into regions.
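
 The quantity to be maximized can be evaluated for a candidate segmentation
 as sketched below. This is an illustration under assumptions: regions are
 given directly as lists of property vectors, the tessellation has its
 origin at zero, cell entropies are summed over occupied cells, and the
 names `normalized_entropy` and `average_normalized_entropy` are
 hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def normalized_entropy(points: Sequence[Vector], side: float) -> float:
    """Entropy of a region divided by log M; zero for M <= 1."""
    m = len(points)
    if m <= 1:
        return 0.0
    counts = Counter(tuple(math.floor(v / side) for v in x) for x in points)
    h = -sum((n / m) * math.log2(n / m) for n in counts.values())
    return h / math.log2(m)


def average_normalized_entropy(
    regions: Sequence[Sequence[Vector]], side: float
) -> float:
    """Mean normalized entropy over the L regions of a segmentation."""
    return sum(normalized_entropy(r, side) for r in regions) / len(regions)


uniform_region = [(0.5,), (1.5,), (2.5,), (3.5,)]   # normalized entropy 1
peaked_region = [(0.1,), (0.2,), (0.3,), (0.4,)]    # normalized entropy 0
score = average_normalized_entropy([uniform_region, peaked_region], 1.0)
```

 Because each region's entropy is divided by log M, a large and a small
 region contribute on the same scale, which is the purpose of the
 normalization.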
 
 2.1.6 Metrics in the Property Space 
 A. Objectives 
 
 The decomposition of scenes into regions depends decisively on how
 the distance of any two points in the property space is evaluated. We
 will restrict ourselves to considering weighted Euclidean metrics, i.e.,
 for any two points $x, y \in X$, the distance is given as

   $d(x, y) = \{ \sum_{n=1}^{N} [w_n (x_n - y_n)]^2 \}^{1/2}$

 where $w_n > 0$ is the weight for the n-th coordinate. This definition of a
 distance is equivalent to multiplying each coordinate of vectors in X with
 a particular weight (see [SEB62]). Hence, $(w_i \Delta_i) > (w_j \Delta_j)$
 can be interpreted as regarding the i-th coordinate to be more important
 than the j-th coordinate, where $\Delta_n = \max\{x_n\} - \min\{x_n\}$.
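
 The weighted distance and its equivalence to rescaling the coordinates can
 be checked numerically. The sketch below is illustrative only; the function
 names `weighted_distance` and `rescale` are assumptions.

```python
import math
from typing import Sequence


def weighted_distance(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Weighted Euclidean distance: sqrt(sum_n (w_n * (x_n - y_n))**2)."""
    return math.sqrt(sum((wn * (xn - yn)) ** 2 for xn, yn, wn in zip(x, y, w)))


def rescale(x: Sequence[float], w: Sequence[float]) -> tuple:
    """Multiply each coordinate by its weight."""
    return tuple(wn * xn for xn, wn in zip(x, w))


x, y = (0.0, 0.0), (3.0, 4.0)
w = (2.0, 0.5)  # the product of the weights is 1

d_weighted = weighted_distance(x, y, w)
# Same value from the ordinary Euclidean distance of the rescaled vectors.
d_rescaled = math.dist(rescale(x, w), rescale(y, w))
```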
 
 The necessity of specifying the relative importance of each
 coordinate with respect to all others can be seen by applying the Theorem of
 the Ugly Duckling ([WAT69a], [WAT69b]) to vectors in the property space, X:

 Let each coordinate of vectors in X represent a predicate with
 values 0 or 1, and let all possible predicates be represented as
 coordinate axes of X. Then, with each predicate, its negation is also a
 coordinate axis of X. The Theorem of the Ugly Duckling states that any
 pair of two vectors in X are as similar to each other as any other pair
 of two vectors, where the similarity is given as the number of
 corresponding coordinates which are equal. This "similarity" is identical to
 the Hamming distance of binary sequences. The Theorem of the Ugly
 Duckling still holds whenever any finite number of values can be attained
 by each coordinate and the similarity is defined in the same way as the
 number of equal, corresponding coordinates. As predicates with a finite
 number of discrete values are the properties defined in Section 2.1.1,
 we conclude that to introduce dissimilarities between vectors in X, the
 properties have to be selected and weighted.
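
 The counting argument behind the theorem can be reproduced for a small
 case. The sketch assumes the binary setting described above: the objects
 are n distinct items, and "all possible predicates" means every assignment
 of 0 or 1 to the n items, so that the negation of each predicate is
 included as well. The function name `agreement_counts` is hypothetical.

```python
from itertools import combinations, product
from typing import Dict, Tuple


def agreement_counts(n_objects: int) -> Dict[Tuple[int, int], int]:
    """For every pair of objects, count the predicates on which the two
    objects take the same value, over all 2**n_objects predicates."""
    predicates = list(product((0, 1), repeat=n_objects))
    return {
        (i, j): sum(1 for p in predicates if p[i] == p[j])
        for i, j in combinations(range(n_objects), 2)
    }


counts = agreement_counts(4)
distinct_similarities = set(counts.values())
```

 For four objects every pair agrees on exactly 8 of the 16 predicates, so
 unweighted predicates cannot distinguish any pair from any other.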
 
 The procedure of attaching weights to properties is usually
 referred to as feature selection. As the relative importance of features
 is not known a priori, we have to give a training algorithm to determine
 the weights. Although the problem of feature selection has been
 extensively investigated in the literature, these studies do not lead to
 efficient algorithms decomposing scenes into regions. An approach to
 feature selection using entropy maximization is given in [TOU69]. This
 method has characteristics typical of many other feature selection
 techniques: It is assumed that (1) the number of classes to be recognized
 is known and that (2) the components of feature (i.e. property) vectors
 are normally distributed random variables. Both assumptions are usually
 not applicable to picture segmentation.
 
 To obtain an algorithm that computes a metric from a given
 training set of examples, we consider the following system:

 1. A training set, $\{P_1, \ldots, P_j, \ldots, P_J\}$, of examples is given,
 consisting of several pictures already decomposed into regions.

 2. The distance between two points $x, y \in X$ is given by the weighted
 Euclidean distance

   $d(x, y) = \{ \sum_{n=1}^{N} w_n^2 (x_n - y_n)^2 \}^{1/2}$

 with the constraint $\prod_{n=1}^{N} w_n = 1$. This constraint guarantees
 that metrics derived from pictures differing only by a linear
 transformation (e.g. shrinkage) will be equal (cf. [SEB62]).

 Remark: Without the constraint $\prod_{n=1}^{N} w_n = 1$, d is not
 necessarily a metric, but only a pseudo-metric (see [BIR67]), as
 $d(x, y) = 0$ is possible even when $x \ne y$ for some weight
 vectors $w = (w_1, \ldots, w_N)$.

 3. The weights $\{w_1, \ldots, w_N\}$ are to be adjusted so that
 $P_1, \ldots, P_J$ satisfy the [Region Criterion] of Section 2.1.5.

 According to the [Region Criterion], $\bar{H}(S)$ is to be maximized for the
 training set of sample pictures. To obtain a maximized $\bar{H}(S)$, we have
 to adjust the weight vector, w, so that the density is distributed as
 uniformly as possible within each region of $P_1, \ldots, P_J$.
 
 B. Algorithm to Determine a Pseudo-Metric in the Property Space

 Maximization Problem:

 We assume that a standard tessellation of X and a training set
 $S = \{S_1, \ldots, S_\ell, \ldots, S_L\}$ of L regions is given, where the
 $\ell$-th region, $S_\ell$, consists of $M_\ell$ cell groups. At this point,
 the coordinate axes of X have some arbitrary scaling and we assume a weight
 vector w to be $w = (1, \ldots, 1)$. Any change of $w = (w_1, \ldots, w_N)$
 changes a coordinate value, $x_n$, to $w_n x_n$. If the unit cell, C, of the
 standard tessellation was initially given by some origin,
 $x^0 = (x_{01}, \ldots, x_{0N})$ and $\Delta x = (\Delta x_1, \ldots, \Delta x_N)$,
 this will change to $x'^0 = (w_1 x_{01}, \ldots, w_N x_{0N})$ and
 $\Delta x' = (w_1 \Delta x_1, \ldots, w_N \Delta x_N)$. The number of
 elements of $S_\ell$ contained in the unit cell that contains a vector,
 $x \in X$, is denoted by $\Delta n(x, \Delta x)$.
 Thus, the normalized density, $\rho(S_\ell, x)$, is given by

   $\rho(S_\ell, x) = \frac{\Delta n(x, \Delta x)}{M_\ell \, \Delta x_1 \cdots \Delta x_N}$, and

   $H(S_\ell) = - \int_{x \in S_\ell} \left\{ \frac{\Delta n(x, \Delta x)}{M_\ell \, \Delta x_1 \cdots \Delta x_N} \log \left( \frac{\Delta n(x, \Delta x)}{M_\ell \, \Delta x_1 \cdots \Delta x_N} \right) \right\} dx_1 \cdots dx_N$

 
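 With the weight vector, w, and the constraint, $\prod_{n=1}^{N} w_n = 1$,
 this entropy becomes:

   $H_w(S_\ell) = - \int_{x \in S_\ell} \frac{\Delta n(x', \Delta x')}{M_\ell \, \Delta x_1 \cdots \Delta x_N} \log \left( \frac{\Delta n(x', \Delta x')}{M_\ell \, \Delta x_1 \cdots \Delta x_N} \right) dx_1 \cdots dx_N$

 Hence, $H_w(S_\ell)$ attains a maximum when the distribution of
 $\Delta n(x', \Delta x')$ is as uniform as possible for varying w.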
 
 The Algorithm OPTW-H:

 1. Select some values $\Delta z_1, \Delta z_2, \ldots, \Delta z_N$ with
 $\Delta z_n > 0$;

 2. For each point $x \in S_\ell$ simultaneously: take x as the center
 of a polycylinder, and expand $\pm \Delta z_n$ in each dimension n,
 $1 \le n \le N$;

 3. Repeat step 2 for all $x, x' \in S_\ell$ that have not yet reached the
 following stopping criterion: For some n, the expansion performed
 in step 2 leads to a nonempty intersection with the
 interior of an adjacent polycylinder with a point $x' \in S_\ell$ as
 center. If this criterion is reached for some n, x, and x',
 then the last expansion $\pm \Delta z_n$ in step 2 is deleted and further
 expansions along the n-th coordinate terminated.
 
 4. If all expansions in step 2 are terminated by the criterion of
 step 3: For $1 \le m \le M_\ell$, the polycylinder around $x_m \in S_\ell$
 is given by its centerpoint, $x_m$, and its radii
 $r_i^{(m)} = n_i \Delta z_i$, where $n_i$ is the number of expansions
 around $x_m$ in the i-th dimension.
 Hence, the weight $w_n$ for the n-th coordinate is given by

   $w_n = \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} r_n^{(m)}$.

 The result of this algorithm depends critically on the selection of
 $\Delta z_1, \ldots, \Delta z_N$ in step 1, i.e. on the coarseness of the
 chosen raster.
 
 2.2 The Clustering Approach to Scene Segmentation 
 
 The basic idea of clustering is to partition a set of objects into
 subsets such that each subset contains objects which are as similar to
 each other as possible. Each of these subsets is called a cluster (a
 review of clustering techniques developed before 1965 is given in [BAL65]).
 In order to investigate the applicability of clustering techniques to scene
 segmentation, the concept of clustering will be precisely stated and
 related to that of region finding in the next section. Subsequently,
 existing clustering techniques will be reviewed as far as they are used
 to obtain a clustering approach to scene segmentation.
 
 2.2.1 A Formal Concept of Clustering 
 
 The underlying concept for the definition of a cluster, being
 developed in this section, is that a cluster, R, is a collection of objects
 such that the cohesion between all objects in R is somewhat larger than
 the sum of cohesions within the sets of any partition of R. A frequently
 used interpretation of cohesion is "average similarity". However, to
 apply clustering to region finding techniques, cohesion will be construed
 analogously to the mutual information defined in information theory.
 
 Definition 2.15 For any standard tessellation, $T(x^0)$, of the property
 space, X, a standard tessellation, $T'(x^{0'})$, is called
 K-embedded in $T(x^0)$ iff K is the smallest positive
 integer such that there exist K translations
 $T^{n_1}_{\lambda_1}, \ldots, T^{n_K}_{\lambda_K}$ with

   $C = \bigcup_{k=1}^{K} T^{n_k}_{\lambda_k}(C')$

 for cells $C \in T(x^0)$ and $C' \in T'(x^{0'})$.

 The following is an example for a (4,4)-tessellation
 which is 4-embedded in another (4,4)-tessellation:

 [Figure: square tessellation with an embedded tessellation of smaller squares]
 
 Definition 2.16 Let $T(x^0)$ and $T'(x^{0'})$ be standard tessellations of X
 and $T'(x^{0'})$ be K-embedded in $T(x^0)$. The cohesion $\zeta(R)$ of a
 set, R, consisting of P $T(x^0)$-cells, $C_1, \ldots, C_P$, is
 defined by:

   $\zeta(R) = \sum_{p=1}^{P} H(C_p) - H(R)$.
 
 Examples

 1. [Figure: tessellation $T(x^0)$ with P = 4 cells and tessellation
 $T'(x^{0'})$, which is 4-embedded in $T(x^0)$; each cell contains 4 points
 of X.] Cohesion: 1.375 bits

 2. [Figure] Cohesion: 4.00 bits

 3. [Figure] Cohesion: -2.00 bits

 
 Cohesions for different scene segmentations are illustrated in Figures 2 
 and 3. 
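
 The cohesion of Definition 2.16 can be computed directly from point counts.
 The sketch below is an illustration under assumptions: both tessellations
 are axis-aligned with origin zero, all entropies are evaluated over the
 cells of the embedded (finer) tessellation with one term per occupied cell,
 and the names `entropy_bits` and `cohesion` are hypothetical. The two point
 sets are one possible reading of the lost drawings for Examples 2 and 3.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Sequence, Tuple

Vector = Tuple[float, ...]


def entropy_bits(counts: Iterable[int]) -> float:
    """Entropy in bits of a distribution given by cell counts."""
    counts = list(counts)
    m = sum(counts)
    if m == 0:
        return 0.0
    return -sum((n / m) * math.log2(n / m) for n in counts if n > 0)


def cohesion(points: Sequence[Vector], coarse: float, fine: float) -> float:
    """Sum of the entropies of the coarse cells minus the entropy of the
    whole set, all measured over the fine cells."""
    per_coarse = defaultdict(Counter)
    overall = Counter()
    for x in points:
        c = tuple(math.floor(v / coarse) for v in x)
        f = tuple(math.floor(v / fine) for v in x)
        per_coarse[c][f] += 1
        overall[f] += 1
    parts = sum(entropy_bits(c.values()) for c in per_coarse.values())
    return parts - entropy_bits(overall.values())


# P = 4 coarse cells, K = 4 fine cells each, one point in every fine cell.
full = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
# One point in each of the 4 coarse cells.
sparse = [(0.5, 0.5), (2.5, 0.5), (0.5, 2.5), (2.5, 2.5)]

c_full = cohesion(full, coarse=2.0, fine=1.0)
c_sparse = cohesion(sparse, coarse=2.0, fine=1.0)
```

 Under these assumptions the fully occupied set gives 4 * 2 - log2(16) = 4
 bits and the sparse set gives 0 - log2(4) = -2 bits, which agrees with the
 values quoted for Examples 2 and 3.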
 
 The cohesion, $\zeta(R)$, can be interpreted in terms of information
 theory as the average mutual information of the $T(x^0)$-cells constituting
 R on the ensemble of $T'(x^{0'})$-cells belonging to R. In the context of
 measure theory (cf. [HAL50]), it is shown in [WAT69a] that $\zeta$ is a
 supra-additive measure, i.e. for two sets, $R_1$ and $R_2$,
 $\zeta(R_1 \cup R_2) \ge \zeta(R_1) + \zeta(R_2)$
 if $\zeta(R_1) \cdot \zeta(R_2) > 0$. This is due to the fact that the
 entropy is a sub-additive measure, i.e.
 $H(R_1 \cup R_2) \le H(R_1) + H(R_2)$. If the cohesion
 $\zeta(R)$ is negative, then for some i ($1 \le i \le P$), $H(C_i) = 0$.
 The extrema are given by
 $-\log P \le \zeta(R) \le P \log K - \log (P \cdot K)$. Definition 2.16 now
 permits us to define a cluster.

 Figure 2a (4,4) - Tessellation Which is 4-Imbedded in Another (4,4) - Tessellation

 Figure 2b Cohesions of Simple Regions (cohesion: 1.375 bits; cohesion: 4.00 bits)

 Figure 3 Cohesions of Different Regions in a Scene
 (cohesions: $\zeta(S_1)$ = 3.75 bits, $\zeta(S_2)$ = 3.78 bits,
 $\zeta(S_3)$ = 2.24 bits, $\zeta(S_4)$ = 2.62 bits, $\zeta(S_5)$ = 3.75 bits;
 listed separately: $\zeta(S_1)$ = 1.94 bits, $\zeta(S_2)$ = 1.52 bits)
 
 Definition 2.17 Let $T(x^0)$ be a standard tessellation such that there
 exists another standard tessellation, $T'(x^{0'})$, that is
 K-embedded in $T(x^0)$ with K > 1. A chain-connected set, R,
 of P $T(x^0)$-cells is called an $\epsilon$-cluster with respect
 to $T'(x^{0'})$, if $\epsilon > 0$ is the largest number such that there
 exists a partition $\{R_1, \ldots, R_P\}$ of R:

   $\zeta(R) - \sum_{p=1}^{P} \zeta(R_p) > \epsilon$.
 
 2.2.2 Application of Clustering to Scene Segmentation 
 
 The task of finding an optimal partitioning of a picture into 
 regions satisfying the [Region Criterion] can usually not be solved in a 
 reasonable time by comparing all possible partitions. Instead, we employ 
 the following strategy: 
 
 (1) Using a parameterized heuristic rule, find disjoint
 chain-connected sets of picture points such that each set is part
 of exactly one of the regions to be found.

 (2) Each of the sets determined in (1) is used as a core for a
 region by applying a "grow algorithm" to each set so that
 eventually the entire picture is partitioned.

 (3) Changing the parameters of step (1) allows further
 repartitioning.
 
 Psychologically, the decomposition of pictures into closed domains
 is usually guided by two aspects [ZUS70]: The attributes of points in a
 domain (1) do not vary very much in their distribution, and (2) are
 quite similar to each other.

 The first point of view led to the [Region Criterion] of Section
 2.1.5. The second suggests the application of a clustering technique.
 To develop a method which employs both ideas, we make the following
 heuristic assumption [HA]:

 [HA] Each region of a satisfactorily decomposed picture contains exactly
 one distinguished cluster.

 A "distinguished cluster" in this context is an $\epsilon$-cluster such that
 for any other $\epsilon'$-cluster contained in the same region,
 $\epsilon \gg \epsilon'$.
 
 Applying [HA] to the above strategy, we obtain the following 
 algorithm for segmentation of a scene S: 
 
 Algorithm S : 1. Select some value e; 
 
 2 Determine all e-clusters in S ,, obtaining the clusters 
 
 q I q f q t 
 
 1 » p » * * * » T * 
 
 3. Apply the algorithm GROW until all points in s are ■ 
 contained in one of the sets S„ , 1^£^L, obtained by 
 joining new points to S'. 
 The algorithm GROW joins points to a previously established cluster 
 center in accordance with the [Region Criterion]: 
 
 Algorithm GROW: Let $x \in S$ be a point not yet contained in any of the
 sets $S_1, \ldots, S_L$ of step 3 in the algorithm S. Perform the
 operation $S_\ell \leftarrow S_\ell \cup \{x\}$ if

 (1) $S_\ell \cup \{x\}$ is a chain-connected set and
 
 (2) $\forall \lambda \ne \ell$: $H(S_\ell \cup \{x\}) \ge H(S_\lambda \cup \{x\})$.
 
 
 2.2.3 Review of Some Clustering Techniques 
 
 The necessity of using a clustering technique arises in step 2 of
 algorithm S. Although many such techniques have been suggested (cf.
 [BAL 65], [SS 63]), their applicability in our case is severely
 restricted by the following:

 1. The number of points to be clustered is very large ( >10 ), but
 the number of attributes (gray value, color, etc.) is smaller
 than, for example, those considered in the numerical taxonomy
 of biological objects [SS 63].

 2. It is not possible to make a priori assumptions about the
 probability density distribution of points in the property
 space.

 3. Similarity measures as well as the metric in the property space
 are subject to pre-clustering considerations and may be adapted
 during the clustering procedure.

 Under these restrictions, we consider (1) a probabilistic and (2)
 a graph-theoretical technique. Their applicability to our scene
 segmentation approach will be evaluated in Section 2.2.4.
 
 1. A Probabilistic Clustering Technique 
 
 We consider the probabilistic clustering technique of the following
 scheme [TSY 71]:

 For a given set, X, of patterns, x, find L probability density
 distributions, $P_1, \ldots, P_L$, such that the mixture density,

 $$P(x) = \sum_{\ell=1}^{L} p_\ell \, P_\ell(x),$$

 attains each of its maxima ("modes") in exactly one of the disjoint,
 chain-connected subsets $X_1, \ldots, X_L$ of X with
 $\bigcup_{\ell=1}^{L} X_\ell = X$.
 
 This scheme satisfies restrictions 2. and 3. and is often referred
 to as an example of "self-learning" [TSY 71]. In order to reduce the
 computational complexity, and in accordance with restriction 1., we reduce
 this scheme to estimating modes. After having found all modes, the
 remaining points in X can be classified according to the nearest neighbor
 classification rule, which assigns a point, x, to the set containing
 its nearest neighbor. It is shown in [CH 67] that the probability of
 error for this rule is less than twice the error probability of Bayesian
 decision rules.
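
 The nearest neighbor rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the report: the function name `nearest_neighbor_classify`, the dictionary layout, and the use of the Euclidean metric are assumptions made for the example (the report leaves the metric in the property space open).

```python
import math

def nearest_neighbor_classify(classes, points):
    """Assign each point to the class that contains its nearest neighbor.

    classes: dict mapping a label to a non-empty list of classified points
             (for example, one detected mode per class).
    points:  iterable of unclassified points (tuples of numbers).
    Returns a dict mapping each label to its original members plus new ones.
    Assumption: distances are Euclidean and the stored classes stay fixed
    while the points are being assigned.
    """
    result = {label: list(members) for label, members in classes.items()}
    for x in points:
        # The winning class is the one holding the single closest member.
        best_label = min(
            classes,
            key=lambda label: min(math.dist(x, y) for y in classes[label]),
        )
        result[best_label].append(x)
    return result

# Two hypothetical modes used as initial classes.
modes = {"A": [(0.0, 0.0)], "B": [(10.0, 10.0)]}
labeled = nearest_neighbor_classify(modes, [(1.0, 2.0), (9.0, 7.0), (4.0, 4.0)])
```

 In this sketch every unclassified point is compared with every stored point, which is quadratic in the worst case; for the very large point sets of restriction 1. a spatial index would probably be needed.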
 
 The following method for multivariate mode-seeking in a set, X,
 is proposed in [HF 70] and [MD 65]:

 1. Compute the maximum eigenvector (associated with the largest
 eigenvalue) of the covariance matrix of X;

 2. Project X onto its maximum eigenvector;

 3. Determine the extrema of the one-dimensional probability
 density obtained in 2.;

 4. Partition X with hyperplanes perpendicular to the eigenvector
 found in step 1. and intersecting the eigenvector at the
 locations of relative minima found in step 3.;

 5. If only one extremum was found in step 3., the above procedure
 is repeated, starting at step 1. with the eigenvector of the
 next largest eigenvalue;

 6. For each new domain found in 4., the above procedure is
 repeated from step 1.
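
 Steps 1. to 4. of the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the eigenvector is found by power iteration, the one-dimensional density is approximated by a fixed-width histogram, and the names `principal_axis`, `split_at_minima`, and the parameter `bins` are hypothetical. Steps 5. and 6. (falling back to the next eigenvector and recursing into each new domain) are omitted.

```python
import math

def principal_axis(points, iterations=200):
    """Return (mean, unit eigenvector of the largest covariance eigenvalue)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    v = [1.0 / (k + 1) for k in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):            # power iteration
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:                    # degenerate case: no spread at all
            break
        v = [c / norm for c in w]
    return mean, v

def split_at_minima(points, bins=8):
    """Project onto the principal axis and cut at relative histogram minima."""
    mean, axis = principal_axis(points)
    proj = [sum((p[k] - mean[k]) * axis[k] for k in range(len(axis)))
            for p in points]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for t in proj:
        hist[min(int((t - lo) / width), bins - 1)] += 1
    # An interior bin is a cut location if it lies below its left neighbor
    # and does not exceed its right neighbor (a crude relative minimum).
    cuts = [lo + (b + 0.5) * width for b in range(1, bins - 1)
            if hist[b] < hist[b - 1] and hist[b] <= hist[b + 1]]
    groups = [[] for _ in range(len(cuts) + 1)]
    for p, t in zip(points, proj):
        groups[sum(t >= c for c in cuts)].append(p)
    return [g for g in groups if g]

cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
cluster_b = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1), (10.05, 10.05)]
groups = split_at_minima(cluster_a + cluster_b)
```

 The histogram width plays the role of a smoothing parameter here; with noisy data a coarser histogram or a smoothed density estimate would likely be needed to avoid spurious minima.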
 
 2. Graph-Theoretical Clustering Techniques 
 a. Matula's Clustering Concept 
 
 In [MAT70] and [MAT71], D.W. Matula introduced a concept of cluster- 
 ing which can be summarized in terms of our notation as follows: 
 
 
 A weighted graph, $G = (\Gamma, E)$, consists of a set, $\Gamma$, of nodes
 and a set, E, of edges between nodes in $\Gamma$ such that each edge
 $e \in E$ is associated with a weight, $\omega(e)$. Let S be a set of
 points representing a picture in the property space, X(A). S can be
 considered as a complete graph, $G(S) = (S, E(S))$, with
 $E(S) = \{(x,x') \mid x, x' \in S\}$. We take the weight, $\omega(e)$, of
 an edge, $e = (x,x') \in E(S)$, to be the reciprocal of the distance
 between its end points: $\omega(e) = \frac{1}{d(x,x')}$.

 Consequently, the weight of an edge in the graph, G(S), can be interpreted
 as the affinity between its endpoints. The affinity between two disjoint
 sets, $S', S'' \subset S$, of nodes in G(S) is given by

 $$\omega(S',S'') = \frac{1}{|S'|\,|S''|} \sum_{x \in S'} \sum_{x' \in S''} \omega(x,x').$$

 We consider G(S) to be pruned by specifying the threshold affinity graph,
 $G_t = \{(x,x') \in E(S) \mid d(x,x') < t\}$, for some cut level t > 0.
 
 If $\{G_1, G_2\}$ is a partition of any graph G, the cut set,
 $C(G_1,G_2)$, is given by
 $C(G_1,G_2) = \{(x,x') \mid x \text{ is a node in } G_1,\ x' \text{ is a node in } G_2\}$.
 A graph, G, of order >2 in which every cut set has at least k edges is
 called k-edge-connected, and a maximal k-edge-connected subgraph of G is
 called a k-component of G.

 For any graph of order >2, the edge-connectivity is defined as
 $\lambda(G) = \min\{|C| \mid C \text{ is a cut set of } G\}$. For any
 $x \in S \cup E(S)$, its cohesiveness h(x) is defined as
 $h(x) = \max\{\lambda(G') \mid G' \text{ is a subgraph of } G \text{ and } G' \text{ contains } x\}$,
 and the strength of the graph, G, is
 $\sigma(G) = \max\{\lambda(G') \mid G' \text{ is a subgraph of } G\}$.
 Every k-component (k>1) which does not contain a (k+1)-component
 of G, as well as every trivial component (consisting of one vertex only) of
 G, is a cluster of G. G itself is a cluster iff $\sigma(G) = \lambda(G)$.
 Any subgraph k' of a cluster k in G with $\lambda(k') = \lambda(k)$ is
 called a subcluster of G. Under the relation "is a proper subgraph of",
 the subclusters of a graph form a partial order with clusters as maximal
 elements. An algorithm determining the k-components and clusters of a
 graph is given in [MAT 71].
 
 b. Detection of Clusters Using Minimal Spanning Trees 
 
 In [ZAH 71], an algorithm for detecting clusters using minimal spanning
 trees is given. The following summary is a reinterpretation of this
 concept adapted for application to our scene segmentation approach.

 A spanning tree, ST(S), of the graph, G(S), defined under a. is a
 connected graph containing all nodes of G(S) but no circuits. A minimal
 spanning tree, MST(S), of G(S) is a spanning tree with a minimal sum of
 all weights. If $G_1$ and $G_2$ are two interconnected subgraphs of G,
 their distance, $\rho(G_1,G_2)$, is defined as the minimal weight of all
 edges connecting $G_1$ and $G_2$. The link set of $G_1$ and $G_2$ is the
 set of all edges connecting $G_1$ and $G_2$ with weights equal to
 $\rho(G_1,G_2)$. Any subset, C, of S is called a $\delta$-clump iff for
 any partition $\{C_1, C_2\}$ of C,
 $\rho(C, S-C) - \rho(C_1,C_2) \ge \delta$ with $\delta > 0$. It is shown
 in [ZAH 71] that the restriction of an MST(S) to a $\delta$-clump in S is
 a connected subtree of the MST(S).

 A $\delta$-clump is a set internally bound together more strongly than
 by the bonds between itself and nodes outside, whereas an
 $\varepsilon$-cluster of Definition 2.17 is completely defined by its own
 internal properties without respect to its environment.

 An efficient algorithm for determining a minimum spanning tree of a
 graph is given in [SEP 70] and others are referenced in [ZAH 71]. The
 strategy to detect an $\varepsilon$-cluster, given in [ZAH 71], is to
 determine inconsistent edges, whose weights differ significantly from
 the average weight of edges in the MST.
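
 For orientation, a minimal spanning tree of the complete graph over a small point set can be computed as in the following sketch. It uses Prim's algorithm, which is a standard textbook method and is not claimed to be the algorithm of [SEP 70]; the function name `minimal_spanning_tree` and the use of Euclidean distances as edge weights are assumptions made for the example.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm on the complete graph over `points`.

    Edge weights are Euclidean distances (an assumption of this sketch).
    Returns a list of (i, j, weight) triples, where node i was already in
    the tree when node j was attached.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Attach the outside node with the cheapest connection to the tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[j] = True
        if parent[j] >= 0:
            edges.append((parent[j], j, best[j]))
        for k in range(n):
            w = math.dist(points[j], points[k])
            if not in_tree[k] and w < best[k]:
                best[k], parent[k] = w, j
    return edges

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
mst = minimal_spanning_tree(pts)
longest = max(mst, key=lambda e: e[2])
```

 In this toy example the edge joining the two groups of points is much heavier than the average MST edge, which is the kind of edge the inconsistency test is meant to single out.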
 
 2.2.4 A Combined Clustering Technique for Scene Segmentation
 
 When applying a graph clustering technique to a set, S, of points in
 the property space, S is regarded as a complete graph, G(S). Clusters,
 as defined in [MAT 71], have to be obtained by taking appropriate values
 for t and searching for k-components in $G_t$ for some k. [MAT 70, MAT 71]
 do not give a way to choose t and k.
 
 A connection between the concepts of [ZAH 71] and [MAT 70, MAT 71] is
 established by the following theorem:

 (*) Let G be a complete graph and MST(G) be a minimal spanning
 tree of G. Then, the threshold affinity graph, $G_t$, partitions
 G into complete subgraphs, $G^t_1, G^t_2, \ldots, G^t_{n(t)}$, iff
 there are exactly n(t) edges,
 $(x_1,x'_1), \ldots, (x_{n(t)},x'_{n(t)})$, in MST(G) with
 $d(x_i,x'_i) \ge t$ for $i = 1, \ldots, n(t)$. The nodes of the
 subgraphs, $G^t_1, \ldots, G^t_{n(t)}$, are those of the subtrees of
 MST(G) after cutting all edges with weights $\ge t$.

 Proof: This property follows immediately from Theorems 1-3 in [ZAH 71].
 
 Any edge in a minimal spanning tree connects two subtrees, and by
 Theorem 2 of [ZAH 71], it is the edge with a smallest weight among all edges
 connecting the nodes of both subtrees in the complete graph containing all
 nodes of the tree. (*) allows us to reduce the problem of choosing the cut
 level, t, by considering all edges in the complete graph, G, to the task
 of looking at the edges of MST(G) only.
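
 The practical content of (*) can be checked on a small example: the node sets obtained by keeping only edges with d < t in the complete graph coincide with the node sets of the subtrees left after cutting all MST edges with weight at least t. The sketch below is an illustration only; it builds the tree with Kruskal's algorithm and a union-find structure, and all names (`find`, `components`, `kruskal_mst`) are hypothetical. Since cutting m edges of a tree leaves m + 1 subtrees, the sketch counts components directly rather than relying on the number of cut edges.

```python
import math
from itertools import combinations

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def components(n, edges):
    """Node sets of the connected components of a graph on nodes 0..n-1."""
    parent = list(range(n))
    for i, j in edges:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), set()).add(i)
    return {frozenset(g) for g in groups.values()}

def kruskal_mst(n, weighted_edges):
    """Minimal spanning tree as a list of (weight, i, j) triples."""
    parent, tree = list(range(n)), []
    for w, i, j in sorted(weighted_edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:            # the edge joins two different subtrees
            parent[ri] = rj
            tree.append((w, i, j))
    return tree

pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (12, 0)]
n, t = len(pts), 3.0            # t is the cut level
all_edges = [(math.dist(pts[i], pts[j]), i, j)
             for i, j in combinations(range(n), 2)]
mst = kruskal_mst(n, all_edges)

from_threshold_graph = components(n, [(i, j) for w, i, j in all_edges if w < t])
from_cut_mst = components(n, [(i, j) for w, i, j in mst if w < t])
cut_edges = [e for e in mst if e[0] >= t]
```

 The complete graph has n(n-1)/2 edges while the tree has only n-1, which is why inspecting the tree alone is attractive for large point sets.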
 
 The strength of a complete graph, G, with M nodes is
 $\sigma(G) = \lambda(G) = M-1$, i.e. any complete graph is a cluster as
 defined in [MAT 71] and has no subclusters except itself. Hence, the
 concept of clusters in [MAT 71] is not applicable to our case. Instead,
 we apply the idea of cohesion developed in Section 2.2.1 to graphs.
 
 
 Definition 2.18 Let T be a tree such that each edge, e, in T connecting
 two nodes, x and x', has a weight $\omega(e) = \frac{1}{d(x,x')}$.
 The entropy of T is defined as

 $$H(T) = -\sum_{e \in T} \frac{\omega(e)}{\omega(T)} \log \frac{\omega(e)}{\omega(T)},$$

 where $\omega(T) = \sum_{e \in T} \omega(e)$.

 Definition 2.19 The cohesion, $c(T; T_1, \ldots, T_M)$, of a tree, T, with
 respect to a partition, $\{T_1, \ldots, T_M\}$, of subtrees in T is
 defined as

 $$c(T; T_1, \ldots, T_M) = \sum_{m=1}^{M} H(T_m) - H(T).$$

 With Definitions 2.18 and 2.19, $\varepsilon$-clusters are also defined
 for trees.
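
 Definitions 2.18 and 2.19 translate directly into a few lines of code. The sketch below represents a tree only by the list of its edge weights, uses the natural logarithm (the base is not stated in the text, so this is an assumption), and uses the hypothetical names `tree_entropy` and `cohesion`.

```python
import math

def tree_entropy(weights):
    """H(T) = -sum over edges of p_e * log(p_e), with p_e = w(e) / w(T).

    `weights` holds the edge weights w(e) of the tree T. A tree without
    edges, or with total weight 0, is given entropy 0 in this sketch.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

def cohesion(tree_weights, subtree_weights):
    """c(T; T_1, ..., T_M) = sum_m H(T_m) - H(T).

    `subtree_weights` is a list with one edge-weight list per subtree.
    """
    return sum(tree_entropy(ws) for ws in subtree_weights) - tree_entropy(tree_weights)

# A tree with four equally weighted edges, split by removing one edge into
# a subtree with two edges and a subtree with a single edge.
h = tree_entropy([2.0, 2.0, 2.0, 2.0])
c = cohesion([1.0, 1.0, 1.0, 1.0], [[1.0, 1.0], [1.0]])
```

 Because the edge weights are normalized by $\omega(T)$, the entropy depends only on the relative weights, so rescaling all distances by a common factor leaves both quantities unchanged.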
 
 An efficient clustering algorithm has to cope with the problem of 
 finding appropriate partitions, {T , ..., T }, of subtrees. The method 
 of detecting inconsistent edges in [ZAH71] assumes a priori knowledge 
 about the distribution of weights, which was excluded by restriction 2. 
 of Section 2.2.3. An algorithm leading to nearly optimal results is given
 in [WAT 69a] but requires comparison of all possible dichotomies when
 sequentially partitioning the graph, thus violating restriction 1.
 Therefore, it seems appropriate to apply a probabilistic method which is
 more efficient when dealing with a large number of objects. This can be
 done by virtue of the following definition:
 
 Definition 2.20 Let G be a graph and x a node in G. Then the weight,
 $\omega(x)$, is defined as

 $$\omega(x) = \frac{1}{|E(x)|} \sum_{e \in E(x)} \omega(e), \quad \text{if } |E(x)| \ge 2,$$

 where E(x) is the set of all edges having x as one endpoint and
 $\omega(e) = \frac{1}{d(x,x')}$, if x' is the other endpoint of e.

 For $|E(x)| = 1$, we define $\omega(x) = 0$.

 If $\omega(G) = \sum_{x \text{ node in } G} \omega(x)$, the normalized
 weight of a node x is given by

 $$\frac{\omega(x)}{\omega(G)}.$$
 
 Interpreting the normalized weight of a node as a probability density,
 we can now apply the mode-seeking algorithm indicated in Section
 2.2.3 to minimal spanning trees. The resulting modes are nodes in the
 MST to be used as cluster centers. The distance between two nodes in a
 tree is given by the sum of all weights $\omega(e)$ of edges forming the
 connecting path between the nodes. After having determined the modes of
 an MST, a partition of subtrees can be obtained by applying the nearest
 neighbor classification technique.
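
 The node weights of Definition 2.20 can be computed from an edge list as in the following sketch. It assumes the reading in which the weight of a node is the mean of the weights $1/d$ of its incident edges, assumes strictly positive distances, and assigns weight 0 to every node with fewer than two incident edges; the name `node_weights` is hypothetical.

```python
def node_weights(n, tree_edges):
    """Node weights and normalized node weights of a tree on nodes 0..n-1.

    tree_edges: list of (i, j, distance) with distance > 0.
    Returns (weights, normalized), two lists indexed by node.
    """
    incident = [[] for _ in range(n)]
    for i, j, d in tree_edges:
        incident[i].append(1.0 / d)   # edge weight w(e) = 1 / d(x, x')
        incident[j].append(1.0 / d)
    # Mean incident edge weight; leaves (and isolated nodes) get weight 0.
    weights = [sum(ws) / len(ws) if len(ws) >= 2 else 0.0 for ws in incident]
    total = sum(weights)
    normalized = [w / total if total > 0 else 0.0 for w in weights]
    return weights, normalized

# A path 0 - 1 - 2 - 3 - 4 with one long edge between nodes 2 and 3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 8.0), (3, 4, 1.0)]
weights, normalized = node_weights(5, edges)
densest = normalized.index(max(normalized))
```

 Nodes surrounded by short edges receive large weights, so the maxima of the normalized weights point at tightly packed parts of the tree, which is what the mode-seeking step looks for.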
 
 Combining the preceding concepts, we obtain the following algorithm 
 to partition a chain-connected set, S, of points in the property space 
 into regions:
 
 
 Algorithm C 
 
 1. In the complete graph, G(S), determine the minimal spanning
 tree, MST(S), using the distances in the property space as
 weights of edges in G(S);

 2. Compute normalized weights of all nodes in MST(S);

 3. Apply the mode-seeking algorithm of Section 2.2.3 to MST(S),
 yielding modes $x^C_1, x^C_2, \ldots, x^C_n$, defined as the cluster
 modes in MST(S);

 4. Using $\{x^C_1\}, \ldots, \{x^C_n\}$ as initial classes, apply the
 nearest neighbor classification method to obtain a partition,
 $\{S_1, \ldots, S_n\}$, of chain-connected sets in S;

 5. Using parameters $\theta_1$ and $\theta_2$, repartition
 $\{S_1, \ldots, S_n\}$ by applying the algorithms MERGE and SLICE.

 a. Algorithm MERGE: Merge two adjacent subtrees, $S_i$ and $S_j$,
 of MST(S) if $c(S_i \cup S_j; S_i, S_j) > \theta_1$, for some
 parameter $\theta_1$.

 b. Algorithm SLICE: Partition a subtree, $S_i$, of MST(S) into
 subtrees, $S_{i1}, S_{i2}, \ldots, S_{i\,m(i)}$, if applying steps
 1 to 4 of Algorithm C yields a partition
 $\{S_{i1}, \ldots, S_{i\,m(i)}\}$ of subtrees in $S_i$ with
 $c(S_i; S_{i1}, \ldots, S_{i\,m(i)}) > \theta_2$.
 
 
 CHAPTER 3 
 STRUCTURAL ANALYSIS OF SCENES 
 
 3.1 Data Representation of Composite Scenes 
 
 We assume that the scene segmentation procedure described in Chapter 2
 has produced the following results:

 I. A scene, S, has been partitioned into M regions, $S_1, \ldots, S_M$;

 II. Considering a set $A^r = \{A^r_1, \ldots, A^r_T\}$ of T attributes
 applicable to regions, the vector $(A^r_1(S_m), \ldots, A^r_T(S_m))$
 has been evaluated for each region, $S_m$, in S.

 III. Let $R^r = \{R^r_1, \ldots, R^r_S\}$ be a set of functions

 $$R^r_s : S \times S \to W^{r*}_s = W^r_s \cup \{*\}, \qquad (S_i,S_j) \mapsto R^r_s(S_i,S_j),$$

 with $1 \le s \le S$ and $1 \le i,j \le M$, mapping pairs of different
 regions into a value set, $W^{r*}_s$, which consists of a discrete
 finite set, $W^r_s$, of real numbers and the don't care value, *.

 For each pair, $(S_i,S_j)$, of different regions, the vector
 $(R^r_1(S_i,S_j), \ldots, R^r_S(S_i,S_j))$ has been evaluated.
 
 Remarks: 1. The attributes $A^r_1, \ldots, A^r_T$ map regions into
 discrete, finite value sets, $V^r_t$ ($1 \le t \le T$), as do the local
 attributes introduced in Definition 2.1. Two different types of such
 attributes can be considered:

 a. attributes obtained by averaging local attributes
 over all picture points contained in a region. (Example:
 The arithmetic mean of gray values of points in a region,
 
 
 (Example: An attribute indicating the shape of
 a region, e.g. with a value set {1, 2, 3, *}, where
 1 = circular, 2 = triangular, 3 = rectangular, and
 * = (not 1, 2, or 3 or not specified).)

 2. The functions, $R^r_1, \ldots, R^r_S$, can be interpreted as binary
 relations with many values. (Example: The quantized distance,
 $d^r(S_i,S_j)$, of two regions, $S_i$ and $S_j$, is

 $$d^r(S_i,S_j) = \left[ \frac{1}{|S_i|\,|S_j|} \sum_{x_i \in S_i} \sum_{x_j \in S_j} \|x_i - x_j\| \right],$$

 where $\|x\|$ denotes the Euclidean norm of a point,
 $x = (x_1, x_2)$, in the plane and [r] denotes the largest
 integer less than or equal to r.)

 3. Although n-ary relations with n>2 may be used to describe
 properties of regions in a scene, the restriction to
 binary relations, like the functions $R^r_s$, does not restrict
 the generality of scene description. It is shown in
 [MON 71] that n-ary relations with n>2 can be reasonably
 well represented or approximated by binary relations.
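
 The quantized distance of Remark 2 can be sketched as follows, assuming that it is the integer part of the mean Euclidean distance over all pairs of points taken from the two regions. The function name `quantized_distance` and the representation of regions as lists of coordinate pairs are assumptions made for the example.

```python
import math

def quantized_distance(region_a, region_b):
    """Floor of the mean Euclidean distance over all point pairs.

    region_a, region_b: non-empty lists of (x1, x2) picture coordinates.
    """
    total = sum(math.dist(p, q) for p in region_a for q in region_b)
    return math.floor(total / (len(region_a) * len(region_b)))

s1 = [(0, 0), (0, 1)]
s2 = [(3, 0), (3, 1)]
d = quantized_distance(s1, s2)
```

 Taking the integer part keeps the value set of the relation discrete and finite for pictures of bounded size, as required of the functions $R^r_s$.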
 We consider two representations of scenes: 
 
 A. The Relationship Matrix - The relationship matrix, $R = [r_{ij}]$,
 is introduced in [CR 71]. For a scene partitioned into M
 regions, $S_1, \ldots, S_M$, R is an M x M matrix. The entries,
 $r_{ij}$, of R are:

 for $i \ne j$: S-dimensional vectors, $(R^r_1(S_i,S_j), \ldots, R^r_S(S_i,S_j))$;

 for $i = j$: T-dimensional vectors, $(A^r_1(S_i), \ldots, A^r_T(S_i))$.
 
 
 A complete description of all properties of a region, $S_i$, is
 provided by the i-th row and the i-th column of R. R is
 symmetric if all functions, $R^r_s \in R^r$, are symmetric with
 respect to their arguments, $(S_i,S_j) \in S \times S$.

 B. The Digraph Representation - A scene, $S = \{S_1, \ldots, S_M\}$, can
 be represented as a digraph, $G^r = (\Gamma^r, E^r)$, whose node set is
 $\Gamma^r = \{S_1, \ldots, S_M\}$. The set, $E^r$, of edges in $G^r$
 consists of labelled and weighted edges. There is an edge,
 $e^u_{ij}$, such that:

 for $i \ne j$: u, with $1 \le u \le S$, indicates the subscript in
 $R^r_u$ of a function in $R^r$, and $e^u_{ij}$ is associated
 with the weight $\omega(e^u_{ij}) = R^r_u(S_i,S_j)$;

 for $i = j$: u, with $1 \le u \le T$, indicates the subscript of
 an attribute $A^r_u$ in $A^r$, and $e^u_{ii}$ is associated with
 the weight $\omega(e^u_{ii}) = A^r_u(S_i)$.

 Remark: To simplify this representation, an edge, $(S_i,S_j)$,
 with a label, u, is deleted whenever $R^r_u(S_i,S_j) = *$.
 
 3.2 Synthesis of Composite Objects in Scenes 
 
 The scene segmentation procedure described in Chapter 2 results
 in partitioning scenes into primitive regions. Usually, scenes consist
 of composites of primitive regions, called figures. Hence, scene analysis
 has to cope with synthesizing figures from primitive regions. The synthesis
 of figures is done by merging regions according to some criteria to be
 established. However, these criteria depend upon the specific environment
 and the intentions considered. In this section, the formal structure of
 two such approaches is briefly reviewed, using the scene representations
 introduced in Section 3.1.
 
 3.2.1 Synthesis of Composites Using Graph Transformation Rules 
 
 Let $G^r = (\Gamma^r, E^r)$ be a digraph representing the scene
 $S = \{S_1, \ldots, S_M\}$. The synthesis of composites is obtained by
 merging regions, yielding a new graph $G^r_c = (\Gamma^r_c, E^r_c)$.
 Nodes in $\Gamma^r_c$ may either represent primitive regions in S or
 regions obtained by merging regions in S. The edges between any
 two nodes in $\Gamma^r_c$, which are also both nodes in $\Gamma^r$,
 remain unchanged. All other edges in $E^r_c$ have to be determined. The
 function mapping $G^r$ to $G^r_c$ is a special graph transformation.

 It is shown in [SMC 70] that many graph transformation rules
 can be conveniently inferred from invariance principles such as
 conservation of union and intersection of point sets.
 
 3.2.2 Synthesis of Composites Using Cartesian Covers 
 
 The concept of cartesian covers is introduced in [MMC 72]. A
 discrete, n-dimensional set
 $E = \{(x_1, \ldots, x_n) \mid x_i \in H_i \text{ for } 1 \le i \le n\}$ with
 $H_i = \{0, 1, \ldots, h_i - 1 \mid h_i > 0\}$ ($1 \le i \le n$) is
 considered. An element $(x_1, \ldots, x_n)$ in E is usually denoted by
 $e_j$ with

 $$j = x_n + \sum_{i=1}^{n-1} \left( x_{n-i} \prod_{k=0}^{i-1} h_{n-k} \right)$$

 and $e_j \in E$ is called an event.
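
 The index j is a mixed-radix encoding of the coordinates, with $x_n$ as the least significant digit. The following sketch computes j from an event and back again; the function names `event_index` and `event_from_index` are hypothetical, and Python's 0-based tuples stand in for the 1-based subscripts of the text.

```python
def event_index(x, h):
    """Index j of the event x = (x_1, ..., x_n) in a space with sizes h.

    Implements j = x_n + sum_{i=1}^{n-1} x_{n-i} * prod_{k=0}^{i-1} h_{n-k}.
    """
    n = len(x)
    j = x[n - 1]                      # x_n
    for i in range(1, n):
        radix = 1
        for k in range(i):
            radix *= h[n - 1 - k]     # h_{n-k} in 0-based indexing
        j += x[n - 1 - i] * radix     # x_{n-i} times its place value
    return j

def event_from_index(j, h):
    """Inverse mapping: recover (x_1, ..., x_n) from the index j."""
    x = []
    for size in reversed(h):
        j, digit = divmod(j, size)
        x.append(digit)
    return tuple(reversed(x))

j = event_index((1, 2), (3, 3))       # n = 2, h_1 = h_2 = 3
```

 For n = 2 the formula reduces to $j = x_2 + x_1 h_2$, so the events of a 3 x 3 space are numbered row by row from 0 to 8.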
 
 A cartesian literal, $X_i^{A_i}$, is defined as the set

 $$X_i^{A_i} = \{(x_1, \ldots, x_n) \mid x_i \in A_i\}$$

 for $A_i \subseteq H_i$ ($1 \le i \le n$).
 
 
 A set $L \subseteq E$ is called a cartesian complex if it is represented
 as the intersection of cartesian literals:
 $L = \bigcap_{i \in I} X_i^{A_i}$, where $I \subseteq \{1, \ldots, n\}$.
 L is also called

 an interval, if there are n numbers $a_i, b_i \in H_i$ ($1 \le i \le n$)
 such that $A_i = \{x_i \mid a_i \le x_i \le b_i\}$ for all i,
 $1 \le i \le n$;

 a factor, if there do not exist $a_i, b_i \in H_i$ (for all i,
 $1 \le i \le n$) such that $A_i = \{x_i \mid a_i \le x_i \le b_i\}$.

 Example: Let n = 2, $h_1 = h_2 = 3$, $A_1 = \{1,2\}$, $A_2 = \{0,1\}$,
 $A'_1 = \{0,2\}$, and $A'_2 = \{0,2\}$; then,
 $L = X_1^{A_1} \cap X_2^{A_2}$ is an interval, while
 $L' = X_1^{A'_1} \cap X_2^{A'_2}$ is a factor.
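
 The example can be checked mechanically. The sketch below represents a cartesian complex by one value set $A_i$ per coordinate and treats a complex as a factor whenever it is not an interval, which is one reading of the definition above; the names `is_range`, `classify_complex`, and `events` are hypothetical.

```python
from itertools import product

def is_range(a):
    """True if the non-empty integer set `a` equals {min, min + 1, ..., max}."""
    return len(a) == max(a) - min(a) + 1

def classify_complex(literals):
    """Return 'interval' if every A_i is a contiguous range, else 'factor'.

    literals: one non-empty set A_i of integers per coordinate.
    """
    return "interval" if all(is_range(a) for a in literals) else "factor"

def events(literals):
    """All events (x_1, ..., x_n) with x_i in A_i, i.e. the complex itself."""
    return set(product(*literals))

kind_l = classify_complex([{1, 2}, {0, 1}])         # L  from the example
kind_l_prime = classify_complex([{0, 2}, {0, 2}])   # L' from the example
corners = events([{0, 2}, {0, 2}])
```

 In the 3 x 3 event space the complex L' consists of the four corner events, which cannot be described by a single lower and upper bound per coordinate.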
 
 Let $f: E \to \{[0,1], *\}$ be a mapping defining the sets
 $F^a = \{e \in E \mid f(e) = a\}$ for $a \in \{0, 1, *\}$, the set
 $\{e \in E \mid 0 < f(e) < 1\}$, and for some $\lambda \in [0,1]$:
 $F^{1\lambda} = \{e \in E \mid f(e) \ge \lambda\}$,
 $F^{0\lambda} = \{e \in E \mid f(e) < \lambda\}$. A set
 $D(f|\lambda) = \{L_i \mid i = 1, 2, \ldots\}$ of cartesian complexes is
 called a cartesian cover of f under $\lambda$ if

 $$F^{1\lambda} \subseteq \bigcup_i L_i \subseteq F^{1\lambda} \cup F^{*}.$$

 If all cartesian complexes in $D(f|\lambda)$ are intervals (factors),
 $D(f|\lambda)$ is also called an interval (factor) cover. For simplicity,
 we assume $\lambda = 1$ and write $D(f|\lambda) = D(f)$,
 $F^{1\lambda} = F^1$, and $F^{0\lambda} = F^0$.
 
 The application of cartesian covers to the formation of composites
 consists of two steps:

 1. We interpret the property space as an n-dimensional discrete
 vector space E. Then, the attribute values of a region are represented by
 an event in E. Thus, a scene $S = \{S_1, \ldots, S_n\}$ is represented in
 E by n events $e_1, \ldots, e_n$.

 2. We establish criteria to define the optimality of a cartesian
 cover. Then, we try to find an optimal set of cartesian complexes covering
 the events $e_1, \ldots, e_n$. A composite is formed by joining all events
 covered by exactly one cartesian complex.
 
 An efficient algorithm to form optimal cartesian covers is
 described in [MR 72]. Examples for criteria defining the optimality of a
 cartesian cover L are:

 a) $E(L \cap F^1)$ = Max. - maximize the number of events in $F^1$
 covered by complexes in L;

 b) $\ell(L)$ = Min. - minimize the number of literals needed
 to represent L;

 c) $\mathrm{den}(L,F^1)$ = Max. - maximize $E(L \cap F^1)$ per literal
 in L, where

 $$\mathrm{den}(L,F^1) := \frac{E(L \cap F^1)}{\ell(L)}.$$
 
 For synthesizing switching circuits, the importance of these 
 criteria is well established. However, in the case of scene analysis, 
 little is known whether these criteria do actually contribute to the 
 description of regions, and further experimentation is necessary. For 
 this purpose, weights for specific attributes or events may be introduced 
 to compute weighted covers . 
 
 3.2.3 Matching Models to Scenes 
 
 It seems to be appropriate to assume that most scenes we are 
 dealing with are composed of composite figures such that each can be regarded 
 as a realization of some known 'model figure'. Instead of imposing criteria 
 of hardly known importance and meaning, we may try to fit models to a scene. 
 For this purpose, the relationship matrix appears to be a useful data 
 structure. 
 
 A model, $\underline{M}$, is a collection of m objects
 $\underline{M} = \{M_1, \ldots, M_m\}$, represented by an
 (m,n)-relationship matrix $R(\underline{M})$. We assume that we are given
 a set $\{\underline{M}_1, \ldots, \underline{M}_p\}$ of p models. Then,
 the task of interpreting a scene $S = \{S_1, \ldots, S_n\}$ in terms of
 models, $\underline{M}_1, \ldots, \underline{M}_p$, can be defined as
 finding an optimal match of S to some $\underline{M}_j$ such that (a)
 each region in S is associated with exactly one model, and (b) the number
 of regions not or only poorly matching a model is kept at a minimum.
 Although approaches toward matching algorithms have been made (e.g.
 [BP 70]), their efficiency is still rather poor.
 
 
 CHAPTER 4
 CONCLUSIONS 
 
 The property space representation allows us to define the density 
 of the spatial distribution of local properties of picture points. Properly 
 normalized, this density can be interpreted as a probability density which 
 serves to define entropy and cohesion of a set of picture points. It is 
 shown for an arbitrary but simple example that the cohesion gives a good 
 theoretical basis for applying clustering methods. Clearly, much more 
 experimentation is necessary to establish this concept. 
 
 Similar to parsing in syntax analysis, scene analysis has (1)
 to infer a global structure from local properties, and (2) to recognize
 given structures by matching their components to regions in the picture.
 Successful parsers usually employ a combination of the 'top-down' and
 'bottom-up' methods, and this strongly suggests an approach combining
 these two schemes in scene analysis, too. Therefore, it seems highly
 desirable to extend global analysis techniques such as graph
 transformations and covering so that they will also employ local
 properties.
 
 
 REFERENCES 
 
 [ATT 54] Attneave, F., "Some Informational Aspects of Visual Perception",
 Psychological Review, vol. 61 (1954), pp. 183-193.

 [AA 56] Attneave, F. and Arnoult, M.D., "The Quantitative Study of Shape
 and Pattern Perception", Psychological Bulletin, vol. 53 (1956),
 pp. 452-471.

 [BAL 65] Ball, G.H., "Data Analysis in the Social Sciences - What About
 the Details?", Proceedings of the 1965 Fall Joint Computer
 Conference, vol. 27, part 1 (1965), pp. 533-559.

 [BP 70] Barrow, H.G. and Popplestone, R.J., "Relational Descriptions in
 Picture Processing", in Machine Intelligence, Vol. 6, Meltzer, B.
 and Michie, D. (eds.), Edinburgh University Press, 1970,
 pp. 377-396.

 [BIR 67] Birkhoff, G., Lattice Theory, American Mathematical Society,
 Providence, Rhode Island, 1967.

 [BON 64] Bonner, R.E., "On Some Clustering Techniques", IBM Journal of
 Research and Development, January 1964, pp. 22-32.

 [BF 70] Brice, C.R. and Fennema, C.L., "Scene Analysis Using Regions",
 Artificial Intelligence, vol. 1 (1970), pp. 205-226.

 [CH 67] Cover, T.M. and Hart, P.E., "Nearest Neighbor Pattern
 Classification", IEEE Transactions, vol. IT-13 (1967), pp. 21-27.

 [CR 71] Chien, Y.T. and Ribak, R., "Relationship Matrix as a
 Multi-Dimensional Data Base for Syntactic Pattern Generation and
 Recognition", Proceedings of the Two Dimensional Digital Signal
 Processing Conference, October 1971, Columbia, Missouri.

 [CUL 68] Cullen, H.F., Introduction to General Topology, Boston, 1968.

 [FER 67] Ferguson, T.S., Mathematical Statistics, Academic Press, New
 York, 1967.

 [FOY 64] Foy, W.H., "Entropy of Simple Line Drawings", IEEE Transactions,
 vol. IT-10, April 1964, pp. 165-167.

 [FRA 67] Fralick, S.C., "Learning to Recognize Patterns Without a Teacher",
 IEEE Transactions, vol. IT-13, January 1967, pp. 57-64.
 
 
 [FU 69] Fu, K.S., "On Sequential Pattern Recognition Systems", in
 Methodologies of Pattern Recognition, Watanabe, S. (ed.),
 Academic Press, New York, 1969.

 [GAL 68] Gallager, R.G., Information Theory and Reliable Communication,
 Wiley and Sons, New York, 1968.

 [GC 66] Green, R.T. and Courtis, M.C., "Information Theory and Figure
 Perception: The Metaphor that Failed", Acta Psychologica,
 vol. 25 (1966), pp. 12-35.

 [GR 69] Gower, J.C. and Ross, G.J.S., "Minimum Spanning Trees and
 Single Linkage Cluster Analysis", Applied Statistics, vol. 18
 (1969), pp. 54-64.

 [GRA 71] Gray, S.B., "Local Properties of Binary Images in Two Dimensions",
 IEEE Transactions, vol. C-20 (1971), pp. 551-561.

 [GRE 65] Grenander, U., "Some Direct Estimates of the Mode", Ann. Math.
 Stat., vol. 36 (1965), pp. 131-138.

 [GUZ 68] Guzman, A., "Decomposition of Scenes into Bodies", AFIPS
 Conference Proceedings, vol. 33, part 1 (1968), pp. 291-304.

 [GUZ 70] Guzman, A., "Analysis of Curved Line Drawings Using Context and
 Global Information", in Machine Intelligence, Vol. 6, Meltzer,
 B. and Michie, D. (eds.), Edinburgh University, 1970, pp. 325-375.

 [HAL 50] Halmos, P.R., Measure Theory, Princeton Press, 1950.

 [HA 67] Hart, P.E., "A Brief Survey of Preprocessing for Pattern
 Recognition", Stanford Research Inst. Report No. RADC-TR-66-819 (1967).

 [HA 69] Harary, F., Graph Theory, Addison-Wesley, New York, 1969.

 [HARL 70] Haralick, R.M., "Multi-Image Clustering", Proceedings of the 1970
 Army Numerical Analysis Conference, pp. 75-90.

 [HF 70] Henrichon, E.G. and Fu, K.S., On Mode Estimation in Pattern
 Recognition, Purdue University Press, Lafayette, Indiana, 1970.

 [HK 67] Haralick, R.M. and Kelly, G.L., "Pattern Recognition with
 Measurement Space and Spatial Clustering for Multiple Images",
 Proceedings of the IEEE, vol. 57, April 1967.

 [HY 61] Hocking, J.G. and Young, G.S., Topology, Addison-Wesley, New
 York, 1961.
 
 
 [KIT 63] Kittel, C., Introduction to Solid State Physics, 2nd Edition,
 John Wiley, New York, 1963.

 [MAT 70] Matula, D.W., "Cluster Analysis via Graph Theoretic Techniques",
 Proceedings of the Louisiana Conference on Combinatorics, Graph
 Theory, and Computing, Mullin, R.C. (ed.), University of
 Manitoba, 1971, pp. 199-212.

 [MAT 71] Matula, D.W., "k-Components, Clusters, and Slicings in Graphs",
 preprint in 1971. To appear in SIAM Journal of Applied Mathematics.

 [MD 65] Mattson, R.L. and Dammann, J.E., "A Technique for Determining
 and Coding Subclasses in Pattern Recognition Problems", IBM
 Journal of Research and Development, July 1965.

 [MI 72] Michalski, R.S., "Varivalued Logic and Its Applications to
 Pattern Recognition", Department of Computer Science, University
 of Illinois, Urbana, Illinois Report, 1972 (in preparation).

 [MMC 72] McCormick, B.H. and Michalski, R.S., "CARTESIAN COVERS - A
 Theoretical Introduction", Department of Computer Science,
 University of Illinois, Urbana, Illinois, 1972 (in preparation).

 [MON 71] Montanari, U., "Networks of Constraints: Fundamental Properties
 and Applications to Picture Processing", Department of Computer
 Science Report, Carnegie-Mellon University, Pittsburgh,
 Pennsylvania, January 1971.

 [MP 67] Minsky, M.A. and Papert, S., Project MAC Progress Report IV,
 MIT Press, Cambridge, Massachusetts, 1967.

 [MR 72] Michalski, R.S. and Raulefs, P., "Computer Synthesis of Cartesian
 Covers and Varivalued Logic Expressions", Department of Computer
 Science Report, University of Illinois, Urbana, Illinois, 1972
 (in preparation).

 [ORE 62] Ore, O., Theory of Graphs, American Mathematical Society
 Publication, vol. 38, Providence, Rhode Island, 1962.

 [PR 70] Preparata, F.P. and Ray, S.R., "An Approach to Artificial
 Nonsymbolic Cognition", Coordinated Science Laboratory Report No.
 R-478, University of Illinois, Urbana, Illinois, 1970.
 
 
 [ROB 65] Roberts, L.G., "Machine Perception of Three Dimensional Solids",
 Optical and Electro-Optical Information Processing, Tippet, et
 al. (eds.), MIT Press, Cambridge, Massachusetts, 1965.

 [ROS 69] Rosenfeld, A., Picture Processing by Computer, Academic Press,
 New York, 1969.

 [SEB 62] Sebestyen, G.S., Decision-Making Processes in Pattern Recognition,
 New York, 1962.

 [SEP 70] Seppanen, J.J., "Algorithm 399", Communications of the ACM (Oct.
 1970), pp. 621-622.

 [SMC 70] Schwebel, J.C. and McCormick, B.H., "Consistent Properties of
 Composite Formation Under a Binary Relation", Information
 Sciences, vol. 2 (1970), pp. 179-209.

 [SS 63] Sokal, R.R. and Sneath, P.H.A., Principles of Numerical Taxonomy,
 San Francisco, 1963.

 [TOU 69] Tou, J.T., "Feature Selection for Pattern Recognition Systems",
 in Methodologies of Pattern Recognition, Watanabe, S. (ed.),
 Academic Press, New York, 1969, pp. 493-508.

 [TSY 71] Tsypkin, Ya. Z., Adaptation and Learning in Automatic Systems,
 Academic Press, New York, 1971.

 [WAT 60] Watanabe, S., "Information-Theoretical Aspects of Inductive and
 Deductive Inference", IBM Journal of Research and Development,
 vol. 4 (1960), pp. 208-231.

 [WAT 69a] Watanabe, S., Knowing and Guessing, New York, 1969.

 [WAT 69b] Watanabe, S., "Pattern Recognition as an Inductive Process",
 Methodologies of Pattern Recognition, Watanabe, S. (ed.),
 Academic Press, New York, 1969, pp. 521-534.

 [ZAH 71] Zahn, C.T., "Graph-Theoretical Methods for Detecting and
 Describing Gestalt Clusters", IEEE Transactions, vol. C-20 (1971),
 pp. 66-86.

 [ZUS 70] Zusne, L., Visual Perception of Form, Academic Press, New York,
 1970.
 
 
 Form AEC-427, AECM 3201
 U.S. ATOMIC ENERGY COMMISSION

 UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR
 DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT

 AEC REPORT NO.: COO-2118-0028 (UIUCDCS-R-72-496)

 TITLE: METHODOLOGICAL ASPECTS OF SCENE SEGMENTATION

 TYPE OF DOCUMENT: Scientific and technical report

 SUBMITTED BY: Peter Raulefs, Fellow, Department of Computer Science

 Organization: Department of Computer Science, University of Illinois,
 Urbana, Illinois 61801

 Date: January 24, 1972
 BIBLIOGRAPHIC DATA SHEET

 Report No.: UIUCDCS-R-72-496

 Report Date: June, 1972

 Title and Subtitle: Methodological Aspects of Scene Segmentation

 Author(s): Peter Raulefs

 Performing Organization Name and Address: Dept. of Computer Science,
 Univ. of Illinois at Urbana-Champaign, Urbana, Illinois 61801

 Project/Task/Work Unit No.: ILLIAC III

 Contract/Grant No.: AT(11-1)-2118

 Sponsoring Organization Name and Address: U. S. Atomic Energy Commission

 Type of Report & Period Covered: Feb. - June 1972

 Abstracts

 Scene segmentation into regions and structural analysis of scenes are
 discussed.

 The spatial distribution of properties of points in a digitized picture is used
 to define the entropy of point sets (regions) in a picture. An optimal partitioning
 of a scene into regions is given by maximizing the average entropy of all regions.
 Using entropy, the cohesion of regions is defined and applied to analyzing the
 clustering approach to scene segmentation.

 Structural analysis of scenes in terms of synthesizing composites is briefly
 reviewed by considering graph transformations and cartesian covers.

 Availability Statement: RELEASE UNLIMITED

 Security Class (This Report): UNCLASSIFIED

 Security Class (This Page): UNCLASSIFIED

 Form NTIS-35 (10-70)
 