BUREAU OF RESEARCH AND SERVICE
College of Education
University of Illinois
Urbana, Illinois

SIMILARITY BETWEEN PERSONS AND RELATED PROBLEMS OF PROFILE ANALYSIS*

Study performed under Contract N6ori-07135 with the Bureau of Naval Research

Lee J. Cronbach, Bureau of Research and Service, College of Education, University of Illinois
and
Goldine C. Gleser, Department of Neuropsychiatry, School of Medicine, Washington University

Technical Report No. 2
April, 1952

Studies of personality and behavior are turning increasingly to a simultaneous consideration of several traits or characteristics, and a great many investigations attempt to deal with profiles or patterns of scores. In this paper we bring together the procedures which may be used for describing relations between such patterns of multiple scores. A comparison of these possible treatments leads to recommendations for improved procedures in future investigations of similarity between persons.

The type of research on which our results bear can be illustrated by reference to several recent studies. One is the effort by Kelly and Fiske (22) to validate certain predictions made in the VA study of clinical psychologists. They compared profiles of assessors' ratings with profiles of criterion ratings.
Many studies concerned with classifying patients on the basis of Wechsler-Bellevue profiles have studied the similarity of patterns of scores, and Barnette (1) has compared psychometric profiles of occupational groups.

*This study was made in connection with Contract N6ori-07135 between the University of Illinois and the Office of Naval Research, Human Relations Branch. Technical Report #2, April 1952.

Other investigators have been interested in the possibilities of "inverse factor analysis", as introduced by Burt and developed by Stephenson. In Stephenson's hands, the so-called Q-technique (34) has been applied widely to the study of similarity between persons, and to the identification of types of persons. Fiedler and others (17, 18) have used the method not only to compare one person to another, but also to compare various perceptions by the same person. An example is the experiment in which A describes himself along many dimensions, A predicts how B will describe himself, and then B describes himself. Three comparisons are possible, which might be said to indicate the "real similarity" of A and B, A's "assumed similarity" to B, and A's "insight" into B.

In addition to the foregoing studies of the equivalence of one person's responses to another's, the statistical devices we consider are relevant to studies of stimulus equivalence. Osgood and Suci (26, 27), for example, are presently employing methods like those we discuss to study semantic problems by demonstrating which words elicit similar association patterns under controlled conditions. As another example, we find that sociometric data may be treated so as to indicate the extent to which two group members see the group in the same way, or so as to indicate the extent to which the two persons are perceived in the same way by the group. That is, we can study the persons as social perceivers and also as perceived objects.
The formulas we discuss are relevant to all the foregoing types of investigation. Despite the rather large number of studies which employ statistical measures of similarity, there has been no comprehensive analysis of the possible alternative procedures. In the present paper we state a general model which clarifies the problem of determining similarity of two score-sets. Within that model we compare the many formulas employed to date, and advance some proposals of our own. In our examination of procedures, we find that some popular methods, such as the procedure of correlating profiles, have serious limitations.

$j$: any of the variates $a, b, c, \ldots$, which are $k$ in number
$i$: any one of the persons $1, 2, \ldots, N$
$X, Y, Z$: classes of persons
$x_{ji}$: the score of person $i$ on variate $j$

Considering only two persons, we have the set of $x_{j1}$ $(x_{a1}, x_{b1}, \ldots, x_{k1})$ for person 1, and the set of $x_{j2}$ for person 2. Without placing any restriction upon our data, we may regard the $x_{j1}$ as the coordinates of a point $P_1$ in k-dimensional space. The $x_{j2}$ define a point $P_2$. When the variates are independent they are properly represented by orthogonal axes, whereas correlated variables are more appropriately represented by oblique axes. As two profiles become more similar, the points representing them fall closer together. Accordingly, we define the dissimilarity of two score-sets as the linear distance between the corresponding points.

The formulas to be presented in this section apply to score-sets of many types; viz., responses to a series of items, raw scores on a set of tests, profiles of deviation scores, individuals' ratings of a group of stimuli on a subjective scale, or responses to a Stephenson forced-sort procedure.
We shall later discuss the fact that in some of the above, and also in the treatment implied by conventional measures of correlation between persons, points are limited to certain subdivisions of the k-space. The formulas given here are as appropriate for these restricted score-sets as for the unrestricted case.

If we assume the axes to be orthogonal, the distance D between any two points may be easily obtained from its square by use of the generalized Pythagorean rule,

$$D_{12}^2 = \sum_j (x_{j1} - x_{j2})^2. \qquad (1)$$

In subsequent formulas, we shall often use the symbol $\Delta x_j$ to refer to the quantity in parentheses. The persons involved in the difference will be obvious from the context. This formula defines the basic measure under consideration in this paper.

We shall show that all the formulas presently used in psychological research involving similarity may be expressed in terms of formula (1) under certain stated restrictions. This points to the fact that in current practice correlations among the variates are generally ignored. We shall discuss later how formula (1) compares to the true measure of distance when score-sets are correlated. However, it can be noted here that when intercorrelations are uniformly low, no serious distortion in the ordering of similarities among a group of individuals is incurred by the use of the orthogonal measure. On the other hand, when one wishes to take the correlations among the variates into account, the advisable procedure is to transform the variates into an uncorrelated set, in which case (1) is fully appropriate. Formula (1) and its derivatives therefore promise to be suited for most psychological investigations of profile similarity.
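As an illustration of formula (1), the following is a minimal Python sketch, assuming each score-set is supplied as a list of k numbers in the same variate order. The function names and the two example profiles are hypothetical and are not taken from the report.

```python
import math


def squared_distance(p1, p2):
    """D^2 of formula (1): sum over variates of the squared score differences."""
    if len(p1) != len(p2):
        raise ValueError("both score-sets must cover the same k variates")
    return sum((a - b) ** 2 for a, b in zip(p1, p2))


def distance(p1, p2):
    """D: the linear distance between the two points in k space."""
    return math.sqrt(squared_distance(p1, p2))


# Hypothetical profiles on k = 4 variates (illustrative values only).
person_1 = [12.0, 9.0, 14.0, 10.0]
person_2 = [10.0, 9.0, 11.0, 14.0]

d2 = squared_distance(person_1, person_2)  # 4 + 0 + 9 + 16
d = distance(person_1, person_2)
```

Because the square root is monotonic, ranking pairs of persons by `d2` or by `d` gives the same ordering.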
With any one set of tests, the two most similar persons will have the smallest separation, D, and also the smallest $D^2$. If the two persons have identical score-sets, $D^2$ equals zero. If scores on any variate can range from $-\infty$ to $\infty$, as would be the case with normally distributed variates, D and $D^2$ can increase without limit. However, the large values have only an infinitesimal probability. D and $D^2$ result in identical ordering of individuals with respect to dissimilarity.

A particularly interesting distance in k space is that from the centroid of a population to any particular point $P_i$. This distance, which we shall call the eccentricity of an individual ($E_i$), is obtained by the formula:

$$E_i = \sqrt{\sum_j (x_{ji} - \bar{x}_{j.})^2}. \qquad (2)$$

The expected value of $E^2$ for the population, that is, the dispersion of all the points about the centroid, is given by

$$\overline{E^2} = \sum_j \sigma_j^2 \qquad (3)$$

where $\sigma_j$ is the standard deviation of variable j in the population.

Since $D^2$ depends on the number of variates in the set and on the size of their units, it is frequently unsuitable for making comparisons from one score-set to another. A measure of distance in standard units is therefore required. The dispersion of the points in a population provides a "yardstick" for this purpose, since the expected value of $D^2$ is just twice the dispersion of the population. That is,

$$\overline{D^2} = 2\,\overline{E^2} = 2 \sum_j \sigma_j^2. \qquad (4)$$

This standard index, which we call S, is defined by the equation:

$$S_{12}^2 = \frac{D_{12}^2}{\overline{D^2}} = \frac{D_{12}^2}{2\,\overline{E^2}}. \qquad (5)$$
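The eccentricity and the standard index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population standard deviations are taken as known and passed in as a list, and the function names, centroid, and numerical values are hypothetical.

```python
import math


def eccentricity(scores, centroid):
    """E_i: distance of one person's point from the population centroid."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(scores, centroid)))


def standard_index_squared(p1, p2, sigmas):
    """S^2: D^2 divided by its expected value, 2 * sum of the variate variances."""
    d2 = sum((a - b) ** 2 for a, b in zip(p1, p2))
    mean_e2 = sum(s ** 2 for s in sigmas)  # dispersion about the centroid
    return d2 / (2.0 * mean_e2)


# Hypothetical data: four variates, each assumed to have a population SD of 3.
person_1 = [12.0, 9.0, 14.0, 10.0]
person_2 = [10.0, 9.0, 11.0, 14.0]
sigmas = [3.0, 3.0, 3.0, 3.0]

s2 = standard_index_squared(person_1, person_2, sigmas)  # 29 / 72
```

A value of `s2` below 1 marks a pair that is closer together than the average pair in the reference population, and a value above 1 marks a pair that is farther apart.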
When the measures used in a pattern of scores have been standardized on a large sample, as is true, for example, for the Bellevue-Wechsler subtest scores, then the standard deviations of such a reference group may be used to determine $\overline{E^2}$. If, however, the only data available are those for a relatively small sample, then the best estimate of $\sigma_j^2$ for the population is

$$\text{est } \sigma_j^2 = \frac{N}{N-1} V_j, \qquad (6)$$

where $V_j$ is the obtained variance in the sample.

$$\text{est } \overline{E^2} = \frac{N}{N-1} \sum_j V_j. \qquad (7)$$

This is the value used in obtaining the standard index $S^2$.

$S^2$, like $D^2$, can range from zero, for identical score-sets, to infinity. In the population used for reference, the mean of all $S^2$ is 1. The large values of $S^2$ are decreasingly frequent, and for most types of distribution the probability of large values is infinitesimal.

Correlated variates and the oblique model. The formulas presented above ignore any correlation among the variates. The variates have been represented in an orthogonal model, with each axis perpendicular to the rest. If correlation were taken into account, the variates would be treated as a set of obliquely inclined axes, such that the angle between axes would be small for highly correlated variates. Such an oblique model is used in mathematical statistics, because it takes into account all the available information. This oblique model underlies the development of the discriminant function, the Hotelling T test, and the generalized distance measure of Mahalanobis (see Hodges (20), Rao (30)). We therefore examine how a distance measure based on the more comprehensive oblique model differs from that obtained through the orthogonal model.
The problem which confronted Mahalanobis, and others who have used his technique, was that of determining the distance between two groups. This measure has been used particularly in anthropological research, where the purpose is to study the similarity of racial and tribal groups on physical measurements. His formula is usually written in the following form (using the block $\mathbb{D}$ to distinguish this measure from our D):

$$\mathbb{D}^2 = \sum_i \sum_j \alpha^{ij} d_i d_j.$$

To avoid confusion, we can rewrite this in a notation consistent with ours:

$$\mathbb{D}^2 = \sum_j \sum_{j'} \sigma^{jj'} \,\Delta x_j \,\Delta x_{j'}. \qquad (8)$$

Here $\sigma^{jj'}$ is the $jj'$ element of the inverse of the combined within-group covariance matrix $\sigma_{jj'}$. As Mahalanobis develops the problem, he deals with differences between group means, but formula (8) can also be interpreted as related to the difference between individuals. The Mahalanobis formula gives the same result as would be obtained if the original variates were standardized, and then axes were rotated to any orthogonal set of variates so that formula (1) could be applied. In fact, since the computations required by (8) are impractical for more than a few variates, the usual method of dealing with an oblique space is to make such a transformation and apply (1). Rao (30) suggests one transformation (out of many possible), which is relatively easy to apply. Suppose we begin with variates $a, b, c, \ldots$ expressed in standard measure, and seek an orthogonal set $a_o, b_o, c_o, \ldots$. These equations may be used:

$$a_o = a$$

$$b_o = \frac{b - r_{ab}\, a}{\sqrt{1 - r_{ab}^2}}$$

$$c_o = \frac{c - a\, r_{ac} - b_o\, r_{cb_o}}{\sqrt{1 - r_{ac}^2 - r_{cb_o}^2}}$$

etc.

This transformation defines $b_o$ as the portion of b not predicted from a, and $c_o$ as the residual in c not predicted from a and $b_o$. Then $D^2$ determined from $a_o, b_o, \ldots$ is identical to $\mathbb{D}^2$ determined from $a, b, \ldots$.

We may note several important properties of $\mathbb{D}^2$, or of $D^2$ obtained from standardized and transformed variates.
This measure has a known distribution and thus forms the basis for testing the significance of the difference between groups. It may also be used to determine whether additional variables add significantly to the discrimination between groups. Moreover, $\mathbb{D}^2$ is closely related to Fisher's discriminant function, and particularly to the proportion of individuals classified into the wrong group by the most efficient possible discriminant function; it is therefore a measure of the efficiency of classification.

One of the striking features of $\mathbb{D}^2$ is that all orthogonal components in a set of variables have equal weight in the measure. The consequences of this if $\mathbb{D}^2$ is used as an index of similarity between individuals require discussion. In any set of variates, some variance is likely to be due to general qualities or factors represented in several variates, some is due to common factors present in only a few variates, and some is due to factors found only in a single variate. This unique variance may be due to real traits specific to a single test, or it may represent error variance. Ordinarily, when we wish to investigate similarity in a domain, we are concerned with general qualities found among a population of variables, rather than with characteristics defined by a single sample of items. We would like the similarity index obtained to be reliable from one sample of items to another, so that the same pairs of people will be reported as similar on both occasions. This problem is of greater importance in psychological work, where the number of variates is unlimited and some correlations between them are low, than in anthropological work, where variates are accurately measured and highly intercorrelated, and the total domain under study is relatively restricted.
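The equivalence between the generalized distance and the orthogonal formula applied to residual-transformed variates can be checked numerically for the two-variate case. The sketch below is an illustration, not the authors' computing procedure: it assumes two variates already in standard measure with a known correlation, and the function names and score values are hypothetical. The closed form used for the generalized distance is the quadratic form in the inverse of the 2 x 2 correlation matrix.

```python
import math


def orthogonalize_pair(a, b, r_ab):
    """Residual transformation for two standardized variates:
    a_o = a, and b_o = the part of b not predicted from a, rescaled to unit variance."""
    a_o = a
    b_o = (b - r_ab * a) / math.sqrt(1.0 - r_ab ** 2)
    return a_o, b_o


def squared_distance(p1, p2):
    """Orthogonal-model D^2: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(p1, p2))


def generalized_d2_pair(p1, p2, r_ab):
    """Generalized distance for two standardized variates, written out from
    the inverse of the correlation matrix [[1, r], [r, 1]]."""
    da = p1[0] - p2[0]
    db = p1[1] - p2[1]
    return (da * da - 2.0 * r_ab * da * db + db * db) / (1.0 - r_ab ** 2)


# Hypothetical standardized scores for two persons; r_ab = .70 is assumed.
r_ab = 0.70
person_1 = (1.0, 0.5)
person_2 = (-0.5, 1.0)

t1 = orthogonalize_pair(person_1[0], person_1[1], r_ab)
t2 = orthogonalize_pair(person_2[0], person_2[1], r_ab)

d2_transformed = squared_distance(t1, t2)                  # orthogonal formula on a_o, b_o
d2_generalized = generalized_d2_pair(person_1, person_2, r_ab)
d2_untransformed = squared_distance(person_1, person_2)    # correlation ignored
```

In this example `d2_transformed` and `d2_generalized` agree, while `d2_untransformed` differs, which is the contrast the surrounding discussion turns on.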
Stability of the similarity index from one set of variates to another demands more consideration in measuring similarity of individuals than in measuring similarity of groups. Unreliable factors will not discriminate appreciably between groups and therefore will not influence $\mathbb{D}^2$ between groups. Now the Mahalanobis measure, which is designed primarily to capitalize on separation of groups in any reliably measured factor, assigns equal weights to all factors, whether they be general or unique. In a set of physical measurements, it would assign equal weight to such factors as height, breadth with height constant, and so on. If $\mathbb{D}^2$ were applied to measuring the distance between two individuals on the Wechsler-Bellevue test, one factor might be general ability, and a second factor might be an element common among the verbal tests. If there were ten scores in the profile, however, there would be eight other independent factors extracted and assigned equal weight. Most of these would be specific to particular tests, and many of them would be primarily loaded with error of measurement. Hence $\mathbb{D}^2$ would assign as much weight to differences in these unimportant and perhaps meaningless factors as to the general factor. This means that for particular pairs of persons $\mathbb{D}^2$ will be unreliable from trial to trial and from one set of tests to another set chosen from the same general domain.
A satisfactory solution which also takes into account the correlation among the original variates is to assign weights to the transformed variates deliberately. Each $a_o, b_o, c_o, \ldots$ can be assigned a weight according to its apparent importance, before formula (1) is applied. This would be especially feasible if the orthogonal variates were based on a factor analysis, so that the investigator knows which scores represent important general qualities, and which are unimportant residuals. If the investigator knows that the Wechsler profile contains only four factors he wishes to weight in the similarity index, he can assign zero weight to the unimportant and unreliable factors. It is certainly troublesome, however, to transform variates, especially if factor scores must be estimated. Generally, a wiser procedure is for the investigator to make his initial measurements on a set of variates which are nearly uncorrelated, and each of which is important and reliably measured. This requires care in the original planning of an investigation, but once such a set is employed, formula (1) applies directly, and the similarity measures obtained would be generally stable if a second set of instruments measuring the same factors were applied to the same people.

We may next examine what happens if $D^2$ is applied directly to a set of standardized correlated variates, with the correlation not being taken into account. This in effect takes axes which are oblique to each other and stretches the space to place them perpendicular (Fig. 1). If we express the resulting measure in terms of the variates $a_o$, $b_o$, etc. considered above,

$$D^2 = (\Delta a_o)^2 + (1 - r_{ab}^2)(\Delta b_o)^2 + r_{ab}^2 (\Delta a_o)^2 + 2 r_{ab}\sqrt{1 - r_{ab}^2}\,\Delta a_o \Delta b_o + (1 - r_{ac}^2 - r_{cb_o}^2)(\Delta c_o)^2 + r_{ac}^2 (\Delta a_o)^2 + r_{cb_o}^2 (\Delta b_o)^2 + 2 r_{ac} r_{cb_o}\,\Delta a_o \Delta b_o + \cdots \qquad (10)$$

[Figure 1. Effect on separation of persons when data are treated by formulas for $\mathbb{D}$ or D. Panels: oblique representation of original data; variates standardized (distances shown are proportional to the $\mathbb{D}$ measure); variates treated as if orthogonal (distances shown are proportional to the D measure). The accompanying table of scores for points 1 to 5 is not recoverable from the scan.]

Here it is apparent that some factors are weighted more heavily than others. If we collect terms, we find coefficients as follows:

$(\Delta a_o)^2$: $1 + r_{ab}^2 + r_{ac}^2 + \cdots$ (k terms)
$(\Delta b_o)^2$: $(1 - r_{ab}^2) + r_{cb_o}^2 + r_{db_o}^2 + \cdots$ (k - 1 terms)
$(\Delta c_o)^2$: $(1 - r_{ac}^2 - r_{cb_o}^2) + r_{dc_o}^2 + \cdots$ (k - 2 terms)
etc.
$\Delta a_o \Delta b_o$: $2 r_{ab}\sqrt{1 - r_{ab}^2} + 2 r_{ac} r_{cb_o} + 2 r_{ad} r_{db_o} + \cdots$ (k - 1 terms)
etc. (11)

Since by definition $a_o, b_o, c_o, \ldots$ are orthogonal, any cross-product term summed over all pairs of persons in a sizeable sample approaches zero. Hence in considering the weights of the various factors, on the average, we may disregard these terms. Then the coefficients of the square terms indicate the weights of the various factors. If the original variates are left with unequal variance, those variances also would affect the weights. (For example, the terms in $(\Delta a_o)^2$ would be $\sigma_A^2 + \sigma_B^2 r_{ab}^2 + \sigma_C^2 r_{ac}^2 + \cdots$.)

It is immediately evident that a factor which appears in several of the original variates receives greater weight in D than a factor which appears in just a few variates. In particular, a unique factor receives relatively little weight, and for that reason D will be more stable than $\mathbb{D}$ from one trial to another or from one set of tests to another.

For any particular pair of individuals it is impossible to evaluate the exact weight of the various factors resulting from the use of D.
This is due to the fact that the cross-product terms contribute to the weight of each factor. Since the product of differences may be either positive or negative, the factors make a different contribution to D in each pair of persons. When a large number of variables are involved, these terms will tend to cancel out so that their sum approaches zero. Even so, it cannot be assumed for any given pair of persons that the cross-product terms are negligible. Of course, the more nearly the original variates are uncorrelated the less the cross-products influence the resulting D measure.

Figure 1 sketches the transformations involved in using $\mathbb{D}$ and D as measures of dissimilarity, for two variates with substantial correlation ($r_{AB} = .70$). It is evident that both procedures alter the distance between points, unless we begin with standardized variates (in which case $\mathbb{D}$ preserves the distances unaltered) or begin with orthogonal variates (in which case D preserves the distances unaltered). In the figure we note, for example, that originally 1 and 4 are closer together than 1 and 2; but for both D and $\mathbb{D}$ measures, 1 and 4 are found to be farther apart than 1 and 2.

The conclusions from our examination of the relationships between $\mathbb{D}$ and D are as follows:

1. While $\mathbb{D}$ is the appropriate statistic to use in testing hypotheses for significance, it is not a desirable descriptive measure of similarity for psychological work because it places excessive weight on unimportant residual factors. It is relatively unsatisfactory for exploratory studies seeking to chart similarity relations in order to formulate hypotheses.

2. D has the advantage over $\mathbb{D}$ that it will tend to be more stable from one sample of variates to another, but the presence of cross-product terms in D makes its psychological composition uncertain for any given pair.
This same uncertainty of factorial makeup applies to any other distance measure using an orthogonal model when variates are correlated.

3. If the investigator chooses his variables so that each one is important and so that the set is relatively uncorrelated, then D is quite satisfactory as a descriptive index. D will be stable from one set of reliably measured variates to another, provided the sets are "parallel" in content, i.e., designed to measure the same factors. (If variates are largely uncorrelated save for a single general factor, an index $D_w$, which we shall introduce later, provides for an altered weighting of such a general factor.)

Pearson's CRL. A precursor of the Mahalanobis measure was Pearson's coefficient of racial likeness (28), which was likewise intended to measure distances between groups. In its original form, CRL was essentially the same as our $D^2$, save that each variate was expressed in standard form, and that a multiplier involving the number of cases per group was included. A modified form of CRL which allows for correlation among variates was also developed, but was not used because of the computational labor required by it. Except for the factor representing number of cases, it is essentially the same as Mahalanobis' measure.

Many of those who tried to use Pearson's index in anthropological research were dissatisfied with it. The criticisms arising out of its sensitivity to differences in number of cases from group to group are irrelevant to our search for measures of similarity between individuals. Morand, in discussion of a paper by Rao (30), notes that the form of CRL which ignores correlations has given unreasonable results in some anthropological research, notably when the index is determined for groups which are intuitively or theoretically quite dissimilar.
This appears, from the context, to be a consequence of the high weight CRL (like D) assigns to any general factor having large loading among the variates. High correlations are usual among anthropometric measures. A solution to this difficulty appears to be an altered weighting for the first component, such as our $D_w$ (see below) provides.

Choice of scale for the measurement of dissimilarity. Although we have defined D and S as measures of dissimilarity, formulas have been presented in terms of $D^2$ and $S^2$. It is evident that for the purpose of description either the linear measures or their squares could be used. Both CRL and $\mathbb{D}^2$ are expressed in terms of the square of the distance. It should be noted, however, that these measures were developed to test whether groups differ significantly with respect to the linear distance between them.

Other metrics might also be chosen as measures of similarity. Cattell proposes a transformation of $S^2$ such that the values will range from 1 to -1. The usual product-moment correlation between persons may be obtained from $D''$ by a transformation of the form $1 - cD^2$. The choice of an appropriate scale depends upon both theoretical and practical considerations.

One desired property of a descriptive index is that it convey to the reader a sense of the magnitude of the quality assessed. D, which is interpretable ...
[Figure 2. Projections implied by various distance measures. Panels: D, data in k space; D', data projected into the k-1 hyperplane; D'', data projected from the k-1 hyperplane to the k-2 hypersphere; D*, data projected from k space to the k-1 hypersphere. Labeled quantities: eccentricity, elevation (times $\sqrt{k}$), scatter.]
Scatter is the square root of the sum of squares of the individual's deviation scores about his own mean $\bar{x}_{.i}$. That is, it is $\sqrt{k}$ times the standard deviation within the profile. Using $E'$ to represent scatter, and primes to represent scores expressed in deviate form,

$$E'_i = \sqrt{\sum_j (x_{ji} - \bar{x}_{.i})^2} = \sqrt{\sum_j x'^{\,2}_{ji}}. \qquad (15)$$

Shape is the residual information in the score-set after equating profiles for both elevation and scatter.

When we change scores to deviations about the person's mean, we develop new scores $x'_{ji}$, which are subject to the linear constraint

$$\sum_j x'_{ji} = 0. \qquad (16)$$

This removes from the scores any information about the person's average. Even though there are k scores still, there are only k-1 degrees of freedom. Two people whose scores in each j are separated by a constant amount have the same profile of $x'_j$. For example, suppose score sets in k space are as follows:

For person 1: 2 -2 0 3 2 (Elevation is 1)
For person 2: 0 -4 -2 1 0 (Elevation is -1)

Then the deviation score-set for either person is 1 -3 -1 2 1. For these people, $D^2$ in k space is 20, but $D'^2$ is zero. We shall use the prime to refer to measures in the k-1 hyperplane.

When a profile is subjected to one linear constraint, we have in effect projected the points into a space of one less dimension, which we refer to as a k-1 (dimensional) hyperplane. We place our hyperplane through the origin, perpendicular to the direction representing the elevation factor. Each point P projects into a new point (P') as the figure shows. The distance PP' represents the elevation (it equals $\sqrt{k}$ times the elevation), and OP' is the scatter.

It should be clear that k and k-1 spaces yield different information about the similarity of persons, although measures of similarity in both spaces may be of value.
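The decomposition of a profile into elevation, scatter, and deviation scores can be sketched as follows. This is a minimal illustration with hypothetical function names. The first score-set is the one given in the text's example; the second is taken here to be the same deviation profile shifted to an elevation of -1, which is an assumption wherever the scanned digits are illegible.

```python
import math


def elevation(scores):
    """Mean of all k scores for one person."""
    return sum(scores) / len(scores)


def deviation_scores(scores):
    """Scores expressed as deviations about the person's own mean (they sum to zero)."""
    m = elevation(scores)
    return [x - m for x in scores]


def scatter(scores):
    """Square root of the sum of squared deviation scores."""
    return math.sqrt(sum(d * d for d in deviation_scores(scores)))


def squared_distance(p1, p2):
    return sum((a - b) ** 2 for a, b in zip(p1, p2))


person_1 = [2, -2, 0, 3, 2]    # elevation 1
person_2 = [0, -4, -2, 1, 0]   # elevation -1, same deviation profile (assumed values)

d2_k_space = squared_distance(person_1, person_2)
d2_hyperplane = squared_distance(deviation_scores(person_1), deviation_scores(person_2))
```

Here `d2_k_space` is 20 and `d2_hyperplane` is 0: the two persons differ only in elevation, so they coincide once the points are projected into the k-1 hyperplane.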
Comparison of deviation scores is most frequently found in psychology in studies of Wechsler-Bellevue profiles, where attempts are made to interpret the shape and variability within a subject's profile. Burt also deals with such deviation scores when he employs covariances rather than correlations between persons to obtain a matrix which he then factors into types (5).

Before discussing further the measurement of dissimilarity in the k-1 hyperplane, let us consider how such score-sets may be constrained to lie on a k-2 (dimensional) hypersphere. A hypersphere is the locus in space of points all of which have the same distance from some center. This geometric property is imposed whenever all score-sets are subject to the constraint that the sum of squares for each set is a constant. But it may easily be seen that this is precisely the type of constraint which is imposed by standardizing a set of scores; i.e., dividing by their standard deviation. Dividing by the scatter of the profile has a similar effect. If we divide each deviation score for an individual by his scatter, this results in a score set ($x''_{ji}$) for which

$$\sum_j x''^{\,2}_{ji} = 1.$$

Since, whenever scores are constrained as in a set of $x''$, the sum of squares is a constant, differences in scatter among persons have been eliminated from consideration, just as differences in elevation are eliminated when scores are expressed as deviations from the person's mean. Conversion of score-sets from deviation scores to sets of $x''$ has projected points from the k-1 hyperplane into a k-2 hypersphere with unit radius. This is sketched in the third panel of Figure 2. (Because our sketch is based on a set of only three variates, k-2 is only one, and the sphere in this instance is reduced to a circle.) We define the measure of dissimilarity ($D''$) on the k-2 hypersphere as the distance between score-sets having unit scatter.
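A short sketch of the unit-scatter projection follows, with hypothetical function names and illustrative profiles. It also checks a relation that follows directly from the unit-scatter constraint: since each projected profile has a sum of squares of 1, the squared distance between two of them equals 2(1 - r), where r is the product-moment correlation between the two persons' profiles. The constant 1/2 in r = 1 - D''^2 / 2 is derived here from that constraint rather than quoted from the report.

```python
import math


def unit_scatter_profile(scores):
    """Deviation scores divided by the person's scatter, so the squares sum to 1."""
    k = len(scores)
    mean = sum(scores) / k
    dev = [x - mean for x in scores]
    e = math.sqrt(sum(d * d for d in dev))
    if e == 0.0:
        raise ValueError("a flat profile has no scatter and cannot be projected")
    return [d / e for d in dev]


def d_double_prime_squared(p1, p2):
    """Squared distance between two profiles after removing elevation and scatter."""
    u1, u2 = unit_scatter_profile(p1), unit_scatter_profile(p2)
    return sum((a - b) ** 2 for a, b in zip(u1, u2))


def correlation_between_persons(p1, p2):
    """Product-moment correlation computed across the k variates of two profiles."""
    u1, u2 = unit_scatter_profile(p1), unit_scatter_profile(p2)
    return sum(a * b for a, b in zip(u1, u2))


# Hypothetical profiles on k = 5 variates.
person_1 = [2.0, -2.0, 0.0, 3.0, 2.0]
person_2 = [1.0, 0.0, 4.0, 2.0, 3.0]

d2 = d_double_prime_squared(person_1, person_2)
r = correlation_between_persons(person_1, person_2)
```

Multiplying a profile by a positive constant or adding a constant to every score leaves `d2` unchanged, which is the sense in which elevation and scatter have been eliminated.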
We might have divided scores by their standard deviations, which would have placed all points on a sphere of radius $\sqrt{k}$. Distances on this sphere would be a constant multiple of corresponding distances on the unit sphere.

Eliminating differences in scatter from consideration is widespread in present statistical studies of profiles in psychology. Sometimes this is done consciously, as when Stephenson asks subjects to sort descriptive statements into piles with a fixed number of statements per pile, so that the resulting scores for each person have the same standard deviation. More commonly, standardization is introduced through a correlation formula. The product-moment formula, for example, divides cross-products by the product of the standard deviations, and thus standardizes. Other formulas such as rho, tau, and r have the same effect.

Our diagram shows how points P and Q, which appeared reasonably near each other in k space because they are quite similar in elevation, are found to be fairly distant from each other when measured in the k-1 hyperplane; and diametrically placed, i.e., virtually as dissimilar as possible, in the k-2 hypersphere. Differences of this sort make it imperative for the investigator to decide on a rational basis which type of score-set is to be his basis for studying the relation between persons.

The k-1 hyperspheres, which have not been used in psychological work, have properties of considerable interest. Such a distribution of points is obtained by dividing the original set of scores for each person by the square root of the sum of squares. If the original variates are measured in meaningful units with an absolute zero, then the square root of the sum of squares, which represents the distance of a point from the origin, might be considered to be a measure of overall "size." Division by this measure places all points at unit distance from the origin.
Two score sets which are in the same proportion, such as (4, 8, 2) and (2, 4, 1), lie on the same vector and thus project to the same point on the hypersphere. Thus proportional score sets have zero dissimilarity as measured on this hypersphere, just as geometric figures for which corresponding sides are in proportion are termed "similar". Dissimilarity may be measured in terms of the distance between the points on the unit hypersphere (D*) or in terms of the cosine of the angle between the vectors.

More appropriate to psychological data is projection onto the hypersphere with the centroid of the population as center. This projection is achieved by dividing each score by the eccentricity. Thus differences in eccentricity are removed from consideration. All persons who deviate in the same direction from the group average are projected into the same point and thus are considered to be the same "type". The measure of separation on this hypersphere is designated D*.

Relations between measures of dissimilarity for original and derived score-sets. Formula (1) for D, and (5) for S, are equally correct whether data occupy the k space or are confined to a smaller space by one or more constraints. It is of value to compare the indices by examining the effect of treating the same set of data successively in the various spaces. We begin with the relation between D and D'.

$$D'^2_{12} = \sum_j \Delta^2 x'_j = \sum_j \Delta^2 x_j - k\,\Delta^2 \bar{x} \tag{17}$$

The first member on the right-hand side is D² and the second component is proportional to the square of the difference in elevation, Δx̄; i.e.,

$$D'^2_{12} = D^2_{12} - k\,\Delta^2 \bar{x}_{12} \tag{18}$$

On the average over all pairs,

$$\overline{D'^2} = 2\sum_j \sigma_j^2 - 2k\,\sigma_{\bar{x}}^2 \tag{19}$$

Here, σ²_x̄ is the variance of elevation scores, over the population of persons.

$$S'^2_{12} = \frac{D^2_{12} - k\,\Delta^2 \bar{x}_{12}}{2\left(\sum_j \sigma_j^2 - k\,\sigma_{\bar{x}}^2\right)} \tag{20}$$
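Formula (18) can be checked numerically. The sketch below uses two hypothetical four-variate profiles and compares the squared distance between deviation scores with D² less k times the squared difference in elevation; the function names are illustrative only.

```python
def mean(profile):
    return sum(profile) / len(profile)

def deviations(profile):
    """Scores expressed as deviations from the person's own mean."""
    m = mean(profile)
    return [x - m for x in profile]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Hypothetical profiles on k = 4 variates.
p = [12.0, 9.0, 15.0, 10.0]
q = [6.0, 7.0, 5.0, 10.0]
k = len(p)

d_sq = squared_distance(p, q)                                   # D^2 in k space
d_prime_sq = squared_distance(deviations(p), deviations(q))     # D'^2 in the k - 1 plane
elevation_term = k * (mean(p) - mean(q)) ** 2                   # k times squared elevation difference

print(d_sq, d_prime_sq, elevation_term)
print(abs(d_prime_sq - (d_sq - elevation_term)) < 1e-9)
```

For these particular numbers D² is 140, the elevation term is 81, and D'² is 59, so more than half of the squared distance in k space is attributable to elevation.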
These relationships between measures of dissimilarity in k and k - 1 space suggest the possibility of constructing a new measure of dissimilarity in which elevation is given any desired weight w. Such a measure would allow for weighting the elevation and shape factors to predict a particular criterion, if one is available. It also permits reducing the excessive weight the elevation factor receives when variates are substantially correlated, as for the investigations where Morand found difficulties with CRL. Suppose we denote the new measure of distance by D_w.

$$D_w^2 = D'^2 + w\,k\,\Delta^2 \bar{x} \tag{21}$$

and

$$S_w^2 = \frac{D'^2 + w\,k\,\Delta^2 \bar{x}}{2\left(\sum_j \sigma'^2_j + w\,k\,\sigma_{\bar{x}}^2\right)} = \frac{D^2 - k(1-w)\,\Delta^2 \bar{x}}{2\left(\sum_j \sigma_j^2 - k(1-w)\,\sigma_{\bar{x}}^2\right)} \tag{22}$$

where σ'²_j is the variance of the deviation scores on variate j. When w is zero, D_w reflects differences in shape and scatter only; as w approaches 1, D_w approaches D. Because of its flexibility, formula (22) (or its numerator alone) appears to be the most suitable basis for determining similarity of persons. We shall discuss below some reasons for this recommendation.

The relationship of the measure of distance in k space to that on the k - 1 hypersphere may be derived from the law of cosines. For the hypersphere with center at the centroid and radius equal to unity,

$$D^{*2}_{12} = \frac{D^2_{12} - (E_1 - E_2)^2}{E_1 E_2} = \frac{D^2_{12} - \Delta^2 E_{12}}{E_1 E_2} \tag{23}$$

where ΔE is the difference in eccentricity of the two individuals. Since the distance measure is defined for a unit hypersphere, the values of D* have a possible range of 0 to 2, regardless of the number of variates involved. Thus, when score-sets are divided by the square root of their sum of squares the measure of distance is comparable from one set of variates to another and there is no need of further standardization by a measure such as S*.

The relationship between D' (in the k - 1 hyperplane) and D" (k - 2 hypersphere) is analogous to that which held between D and D*.

$$D''^2_{12} = \frac{D'^2_{12} - (E'_1 - E'_2)^2}{E'_1 E'_2} = \frac{D'^2_{12} - \Delta^2 E'_{12}}{E'_1 E'_2} \tag{24}$$

where E' is the measure of scatter.
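Formulas (21) and (24) lend themselves to a short numerical sketch. The code below is an illustration under stated assumptions: scatter E' is taken as the square root of the sum of squared deviation scores, the two profiles are hypothetical, and the function names are not from the report.

```python
import math

def mean(profile):
    return sum(profile) / len(profile)

def deviations(profile):
    m = mean(profile)
    return [x - m for x in profile]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def scatter(profile):
    """Assumed definition of E': root sum of squared deviation scores."""
    return math.sqrt(sum(d * d for d in deviations(profile)))

def d_w_squared(p, q, w):
    """Formula (21): D'^2 plus w times the elevation component."""
    k = len(p)
    d_prime_sq = squared_distance(deviations(p), deviations(q))
    return d_prime_sq + w * k * (mean(p) - mean(q)) ** 2

def d_double_prime_sq_by_24(p, q):
    """Formula (24): (D'^2 - (E'_1 - E'_2)^2) / (E'_1 * E'_2)."""
    e1, e2 = scatter(p), scatter(q)
    d_prime_sq = squared_distance(deviations(p), deviations(q))
    return (d_prime_sq - (e1 - e2) ** 2) / (e1 * e2)

def d_double_prime_sq_direct(p, q):
    """Squared distance between the two unit-scatter score-sets."""
    e1, e2 = scatter(p), scatter(q)
    return squared_distance([d / e1 for d in deviations(p)],
                            [d / e2 for d in deviations(q)])

p = [12.0, 9.0, 15.0, 10.0]
q = [6.0, 7.0, 5.0, 10.0]

for w in (0.0, 0.5, 1.0):
    print(w, d_w_squared(p, q, w))   # moves from D'^2 at w = 0 to D^2 at w = 1
print(d_double_prime_sq_by_24(p, q), d_double_prime_sq_direct(p, q))
```

Setting w between 0 and 1 simply interpolates the elevation component; the choice of w remains the investigator's decision.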
D" may be written in terms of the D measure from k space as

$$D''^2_{12} = \frac{D^2_{12} - k\,\Delta^2 \bar{x}_{12} - \Delta^2 E'_{12}}{E'_1 E'_2} \tag{25}$$

This formula shows clearly what types of differences between individuals, represented in the original data and in D, are discarded when we employ only k - 2 space information. One of the subtracted terms represents differences in elevation; the other represents differences in scatter. Here, again, since we have defined D" as a measurement on the unit sphere, the values range from 0 to 2 (0 ≤ D"² ≤ 4). The expected value of D"² is

$$\overline{D''^2} = 2\left(1 - k\,\sigma^2_{\bar{x}''}\right) \tag{26}$$

where σ²_x̄" is the variance, over variates, of the means of the scores after division by the scatter. It may be noted that the average value is 2 if and only if all variates (in derived score units) have equal means over all persons. This is of particular interest because of the close relationship of D"² to the correlation measure frequently used to show similarity between persons, which we now discuss.

Comparison of measures in k - 2 space. Let us specifically consider the relationship between Q, the correlation between persons, and D". It is easily shown that

$$D''^2 = 2(1 - Q) \tag{27}$$

Thus it is evident that our formulation in terms of D"² encompasses any results obtained by product-moment correlation among profiles, including those from Stephenson's forced-sort method. In particular, any distortions imposed by use of an orthogonal model for correlated variates will affect studies using correlation between persons. It may be noted that average Q for a population is zero when the average D"² = 2. But we have seen that this is true only when all variates have equal means.
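Formula (27) can be confirmed on any pair of profiles. The following sketch assumes that Q is the ordinary product-moment correlation computed across the k variates of two persons; the five-variate profiles and the function names are hypothetical.

```python
import math

def pearson_q(p, q):
    """Product-moment correlation between two persons across the k variates."""
    k = len(p)
    mp, mq = sum(p) / k, sum(q) / k
    cross = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    ss_p = sum((a - mp) ** 2 for a in p)
    ss_q = sum((b - mq) ** 2 for b in q)
    return cross / math.sqrt(ss_p * ss_q)

def unit_scatter(profile):
    """Deviation scores divided by the root sum of squared deviations."""
    m = sum(profile) / len(profile)
    dev = [x - m for x in profile]
    s = math.sqrt(sum(d * d for d in dev))
    return [d / s for d in dev]

def d_double_prime_sq(p, q):
    return sum((a - b) ** 2 for a, b in zip(unit_scatter(p), unit_scatter(q)))

p = [10.0, 14.0, 9.0, 12.0, 15.0]
q = [11.0, 9.0, 13.0, 10.0, 12.0]

q_corr = pearson_q(p, q)
print(d_double_prime_sq(p, q), 2 * (1 - q_corr))   # the two values agree
```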
Thus, if items of unequal popularity are chosen for the sample of traits, the expected value of Q is greater than 0. Inclusion of some items on which members of the sample tend to agree will increase the correlation between individuals. Some implications of this will be discussed later.

The three prominent correlational procedures using ranking are rho, Tau (23), and DuMas' r_ps (13). Rank-correlations are sometimes used in the belief that assumptions regarding the test score metric are thereby avoided. This is not the case for rho. When each score is assigned a rank, the separation between two adjacent ranks is fixed, over the whole range. The result is that all profiles are forced into the same rectangular distribution, just as Stephenson's forced-choice sorting forces all profiles into the same normal distribution. Such forced distributions appear to be fully justified only if the investigator regards a particular distribution as most likely to represent the nature of his profiles. Usually rho and product-moment correlations give about the same results for a particular set of data.

Kendall's Tau gives values substantially lower than rho. It is a rank correlation based on the direction of differences between all possible pairs of variates. Tau is preferred to rho in some studies because its sampling distribution is known. If Tau_1 > Tau_2, then rho_1 > rho_2, in almost all pairs of cases. That is to say, Tau is very nearly a function of rho. Analysis by Tau will therefore yield conclusions very like those from rho, and both of these will be reasonably close to results from D"² and Q. One difficulty with Tau is that the number of comparisons which must be made increases rapidly with the number of variates.

The third coefficient, proposed by DuMas,* is simple to compute.
Kelly and Fiske drew our attention to the fact that r_ps is an approximation of sorts to Tau. Whereas Tau calls for considering all possible pairs of variates, r_ps uses only the adjacent variates. I.e., if a profile is written in a certain order (Computational, Scientific, Mechanical, ...), r_ps would consider the direction of difference between Computational and Scientific, and Scientific and Mechanical, but would not use the difference of Computational and Mechanical. Rearranging the profile in different order would change the correlation, for different pairs would now be used. If the arrangement of variates is a random selection out of all possible orders, or if the variates are uncorrelated, r_ps is an estimate of Tau. If there is any rationale underlying the arrangement, r_ps is peculiarly biased. Consider a Wechsler profile of five verbal and five performance scores. These will conventionally be listed in that order. For this profile, Tau would be based on 45 pairs of scores: 10 verbal with verbal, 10 performance with performance, and 25 verbal with performance. r_ps would use only nine pairs: 4 VV, 4 PP, 1 VP. In this example, r_ps is determined almost wholly by the smallest differences in scores, which are least reliable. r_ps would therefore be lower than Tau for Wechsler profiles, and possibly by a large amount. Because it uses relatively little information (here, 9 pairs out of 45), r_ps is expected to be inexact even when it is unbiased.

*Incidentally, DuMas (13) suggests chi-square as the preferred method of estimating similarity where a more precise approach is required. This suggestion is unsound, since profile entries are scores rather than frequencies and chi-square cannot be used with such data.
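The pair counts in the Wechsler example can be reproduced by simple enumeration. The sketch below only counts which pairs of variates enter each coefficient; it does not implement DuMas' r_ps itself, whose formula is not given here. The labels and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Five verbal (V) followed by five performance (P) scores, in conventional order.
labels = ["V"] * 5 + ["P"] * 5

def pair_type(i, j):
    """Return 'VV', 'VP', or 'PP' for the pair of variates at positions i and j."""
    return "".join(sorted((labels[i], labels[j]), reverse=True))

# Tau considers every possible pair of variates.
all_pairs = Counter(pair_type(i, j) for i, j in combinations(range(len(labels)), 2))

# A coefficient restricted to adjacent variates uses only neighbouring positions.
adjacent_pairs = Counter(pair_type(i, i + 1) for i in range(len(labels) - 1))

print(sum(all_pairs.values()), dict(all_pairs))
print(sum(adjacent_pairs.values()), dict(adjacent_pairs))
```

The adjacent-pair scheme draws eight of its nine comparisons from within the verbal or within the performance group, which is the source of the bias described above.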
Basic Decisions in Profile Comparison

The comparison of two profiles will give different results, depending upon the investigator's choices at several points in the planning of the investigation. During the development of techniques of Q correlation, there has been some confusion and dispute regarding these matters, but at this point, Burt and Stephenson, at least, seem to be in agreement on the principles underlying the method. Many of the issues have been discussed with exceptional soundness by Burt, in The Factors of the Mind, Chapters VI and XI (6). Any one who proposes to study relations between persons by Q correlation or other measures should examine Chapter VI with care. Although Burt discusses specifically the use of Q correlation in factor analysis, the same questions regarding metric and domain apply to any descriptive studies of relation between persons.

The investigator must define a trait-domain within which similarity is to be investigated.
There is a certain amount of loose thinking regarding the concept of similarity of persons which occasionally leads investigators to regard their studies as an attempt to determine which persons are generally similar. Such views are encouraged by occasional references to Q-technique as a method for studying "the whole personality". Actually, the investigator, either by plan or by the necessary limitations of any instrument, can study only a relatively limited segment of the person, and it will be noted that Stephenson himself now places great emphasis on the proper definition of the segment of personality to be investigated.

The investigator defines the domain where he is seeking to investigate similarity by four choices:
1. He chooses the set of variates.
2. He chooses a metric for each variate.
3. He assigns equal or differential weights to each variate.
4. He decides to study similarity in k space or in some restricted portion of the k space.

The investigator can make each of these decisions quite arbitrarily, but he is more likely to arrive at useful and scientifically meaningful results if he has a carefully-considered reason for each decision. Different decisions will be arrived at in different problems. In the use of objectively measured variates, such as are used in anthropometric studies, the appropriate decision may be different from the decision reached in designing a study of subjective estimates of personality. This is a departure from Stephenson's view, since he always employs variates restricted to k - 2 space.

Choice of variates. Similarity is always similarity in some respects. If we know that two people are quite similar in ten different characteristics, we cannot infer that, on some other set of characteristics uncorrelated with the first ones, they will be any more similar than randomly selected people.
Since the number of characteristics which might be the object of study is essentially unlimited, it is reasonable to expect that people who are similar in one respect will be quite dissimilar in some other domains of behavior. The domain to be studied will have to be selected with care.* One group of qualities especially promising for investigations of similarity are pervasive and general variables which affect performance in many situations; examples are general mental ability, cultural background, and some of the commonly identified personality traits. Another type of variate which may be profitably used is the more specific qualities which seem likely to be associated with some criterion performance with which the experimenter is concerned. For example, in study of performance of a military group, it might be appropriate to determine the similarity of members' attitudes toward being in military service, or attitudes toward military discipline.

*This comment also applies to measures of "empathy" or "diagnostic accuracy." There is little evidence that the person who is able to judge one quality is also a superior judge of other qualities (8).

Having defined a domain of traits in which he is interested, the investigator might, in theory, draw a random sample of traits to be measured. He obtains much greater control over his investigation if he uses a planned or stratified sample in which deliberately chosen characteristics are measured as reliably as possible. Such a procedure is exemplified in Stephenson's recent use of a "factorial design" for selecting variates (34). In general, the more frequently a quality is represented in the set of variates, the more weight it has in the similarity measure.
If items are grouped into uncorrelated subtests, each of which has known variance, the variance of the subtest indicates its relative weight in the total. It is therefore appropriate to include a greater number of items dealing with qualities which are especially important for the investigation.

Choices regarding metric. All psychometric studies must make some assumption regarding the metric or scale units in which the variates are measured. Except in very limited problems, psychologists and educators have lacked scales with equal units, or scales in which a unit on one scale is exactly comparable to a unit on the other scale, representing equal amounts of the properties being measured. In fact, it is doubtful whether comparability between scales can ever be established save by arbitrary assumption. Yet, any study of similarity of persons demands assumptions of comparable units along a scale and between scales.

The investigator must choose for each variate a scale such that he regards one unit as representing the same amount of the property at all points of the scale. If the investigator does not regard this assumption as valid for some scale, he should transform the units to a scale he regards as more nearly linear with respect to the property measured. In most psychological studies, so much error is present to obscure relationships that failure to obtain a scale of equal units will have little effect upon the conclusions. When studies turn to more precisely measured variables than present psychological tests afford, this question becomes more crucial.

The second assumption is peculiar to profile analysis: the investigator must assume that one unit represents the same degree of similarity on all variates.
Comparability of metrics is unlikely to be testable in most problems. The assumption enters research on similarity because a two-point difference in Block Design, for instance, has the same effect on the index of profile similarity as does a two-point difference in Arithmetic. The use of standard scores is only a device to improve on manifestly non-comparable raw score units; the new units may also lack perfect comparability, and to that degree studies of profile similarity contain error. The investigator may modify the units to make them more comparable to one another, in whatever respect concerns him. If he regards one measure as more important than another in an overall estimate of similarity, he may deliberately assign larger units to that variate.

When an investigation deals with objectively measured variates, such as physical measures or test scores, the metric is altered by standardizing, weighting, and other transformations. We can illustrate choice of a metric by referring to the Rorschach M score. When an investigator uses raw score units, he is counting the difference from zero M to 3 M as equal to a difference from 30 M to 33 M. If he normalizes the score, he will weight the former difference more heavily because the raw-score distribution is skewed. Normalizing would be advisable if, on psychological grounds, the investigator regards the difference from 0 to 3 as more important than the second difference. No general recommendation can be made as to whether a variate distribution should be normalized or not.

The variance of each characteristic over persons, and therefore its influence on the D measure, will be determined by the choice of units. If variates are expressed in standard units, each variate is assigned equal weight. Now sometimes this is quite appropriate; it is common in identifying physical types to express length of
nose and length of limb in standard units so that they make equal contributions. Conceivably, in a study of resemblance in appearance, such equal variation would be inappropriate. Does a person who departs from the average by one standard deviation in length of eye-brow seem as distinctive as one who departs by one s.d. in length of nose? The selection of weights is ordinarily arbitrary, and equal weights (i.e., standardization of variates) is often the best arbitrary choice. If a criterion is available, optimal predictive weights may be selected. The discriminant function is a device for weighting variates to maximize separation between criterion groups.

When the measurements are subjective, the choice of metric presents further difficulties. Subjective ratings are used in studies of esthetic preference, in self-ratings, or in ratings of other persons. One may standardize the ratings assigned on each variate, but this assumes that the stimuli judged are equally variable on each quality. Perhaps it is more reasonable to suppose, for example, that pupils vary much more in sociability than in obedience. Ratings of different qualities can sometimes be made more comparable by defining the points along the rating scale explicitly. Sometimes the ratings by a person can be expressed in terms of his j.n.d. Sometimes one can accept the rating scale as a scale of equal-appearing intervals. After comparable subjective judgments on the several variates are obtained, differential weights according to supposed importance may be assigned if desired.

Inclusion of elevation in the difference measure. The domain is further defined by the decision to use k, k - 1, or k - 2 space. Elevation is defined by the average of a person's scores.
It has an obvious meaning in the Wechsler test, where elevation is essentially an overall measure of ability. In a Rorschach score-set, elevation represents responsiveness, being highly correlated with total R. Holzinger (21) has demonstrated that the average of scores is heavily loaded with the first principal component of the scores, i.e., with the general factor or other frequently represented factor. Thus, if scores are correlated, elevation represents the common thread among them. On the other hand, if scores have low correlations, the elevation score represents a mixture of factors and has no interpretable significance. A ratio based on the sum of interitem covariances has been suggested (11, 1?) as an index which generally reflects the extent to which elevation represents a common factor. If this ratio is large, one can regard elevation as saturated with some psychological quality. If this ratio is small, however, elevation lacks psychological meaning.

In fact, the elevation component may be purely arbitrary if scores are uncorrelated. For instance, many personality profiles could be scored as logically if the direction of some variables were reversed, submission being counted instead of dominance, for example. This would lower the elevation (average on all traits) for very dominant persons, and raise it for submissive ones. Such reversals do not alter D in k space, but they do affect D' and D".
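The effect of reversing the direction of a variate can be illustrated with a small sketch. It assumes that reversal simply changes the sign of that variate for every person; the two three-variate profiles and the function names are hypothetical.

```python
import math

def deviations(profile):
    m = sum(profile) / len(profile)
    return [x - m for x in profile]

def unit_scatter(profile):
    dev = deviations(profile)
    s = math.sqrt(sum(d * d for d in dev))
    return [d / s for d in dev]

def d(p, q):
    """Distance in the full k space."""
    return math.dist(p, q)

def d_prime(p, q):
    """Distance after elevation is removed (k - 1 plane)."""
    return math.dist(deviations(p), deviations(q))

def d_double_prime(p, q):
    """Distance after elevation and scatter are removed (k - 2 sphere)."""
    return math.dist(unit_scatter(p), unit_scatter(q))

def reverse_variate(profile, index):
    """Score one variate in the opposite direction (e.g. submission for dominance)."""
    reversed_profile = list(profile)
    reversed_profile[index] = -reversed_profile[index]
    return reversed_profile

p = [5.0, 3.0, 4.0]
q = [2.0, 4.0, 1.0]
p_rev, q_rev = reverse_variate(p, 0), reverse_variate(q, 0)

print(d(p, q), d(p_rev, q_rev))                            # unchanged
print(d_prime(p, q), d_prime(p_rev, q_rev))                # changed
print(d_double_prime(p, q), d_double_prime(p_rev, q_rev))  # changed
```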
Stephenson attempts to avoid this problem when he obtains self-descriptions from a set of variates which includes a statement and another nearly opposite in sense. He might have one submissive statement, and an opposite dominant one. For such a balanced set of variates, the elevation should be near zero for each person, and any non-zero elevation score could be safely disregarded as due to inconsistency of response.

Elevation can be considered a meaningful score rather than an arbitrary composite only when variates are generally correlated, so that the "positive" direction of each can be determined operationally. When the elevation score is interpretable, one can decide whether differences in elevation should affect D. Sometimes it is wise to include elevation in the difference measure and sometimes it is unwanted. Often the elevation score represents a response set (10) such as tendency to say Like to interest items in general, or to say Yes in checking descriptions of symptoms. Investigators differ in their judgment as to whether such variables are due to transient verbal sets or are important aspects of behavior, and, indeed, response sets seem to involve both qualities. If the investigator wishes to include elevation, whatever it measures, in determining differences between persons, he should use the full k-space data. He may be well advised, however, to use a special

Transformation to k - 1 sphere.
The projection which eliminates differences in eccentricity from consideration and places all points on a k - 1 hypersphere may or may not have practical value. In studies where configurations having geometric similarity are thought of as representing similar types, D* appears to deserve consideration. Such a problem is likely to be encountered in work with body-types or other physical measurement, where concern is literally with shape rather than with size.

Measurement of shape by projecting onto a sphere with the population centroid as center to obtain D* likewise has possible interest. Unlike measurement of shape by D' in the k - 1 plane, D* is invariant no matter which end of a dimension is taken as the positive direction. We may think of a person as having a factor specification equation, just as a test can be specified in terms of reference factors. The specification for the person tells what factors account for his deviation from the mean. Since D* treats as identical people who have the same factorial specification, no matter how far they deviate, it may be the appropriate measure for some type-theories. The limitations upon interpretation of D*, however, include the serious difficulties which we discuss below in connection with k - 2 space.

Considerations in using k - 2 sphere. The treatments in k - 2 space will be discussed at length, because such procedures are especially common. Projection onto the k - 2 sphere treats as identical those profiles which are proportional when expressed as deviations from the person's elevation. For example, D" would be 0, and Q would be 1, for this pair of score-sets:

    3  1  0  4   (Elevation = 2; deviation profile is 1 -1 -2 2)
    1 -3 -5  3   (Elevation = -1; deviation profile is 2 -2 -4 4)

Those profiles having small scatter are magnified in projection onto the sphere (or we might say that those having large scatter are diminished proportionately). Figure 3 draws attention to some consequences.
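The pair of score-sets just given can be followed through the three spaces. This is an illustrative sketch only, with hypothetical function names; it assumes scatter is the root sum of squared deviation scores and Q the product-moment correlation across variates.

```python
import math

def deviations(profile):
    m = sum(profile) / len(profile)
    return [x - m for x in profile]

def unit_scatter(profile):
    dev = deviations(profile)
    s = math.sqrt(sum(d * d for d in dev))
    return [d / s for d in dev]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

first = [3.0, 1.0, 0.0, 4.0]      # elevation 2, deviations 1 -1 -2 2
second = [1.0, -3.0, -5.0, 3.0]   # elevation -1, deviations 2 -2 -4 4

d_sq = squared_distance(first, second)                                    # k space
d_prime_sq = squared_distance(deviations(first), deviations(second))      # k - 1 plane
d_double_prime = math.sqrt(squared_distance(unit_scatter(first),
                                            unit_scatter(second)))        # k - 2 sphere
q_corr = sum(a * b for a, b in zip(unit_scatter(first), unit_scatter(second)))

print(d_sq, d_prime_sq, d_double_prime, q_corr)
```

The squared distance falls from 46 in k space to 10 once elevation is removed, and to zero (up to rounding) once scatter is removed as well, although the two persons differ in both elevation and scatter.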
We note that differences between persons near the center of the sphere are much magnified. The small D'_12 becomes a large D"_12, but D"_34 is little greater than D'_34. Points 1 and 2 represent persons with flat profiles. People who would be judged quite similar in k or k - 1 plane are sometimes reported as markedly dissimilar in the k - 2 measure.

[Figure 3. Magnification of distances in projection onto sphere]

Figure 4 indicates the effect of the projection when error of measurement is involved. Each sketch shows a set of obtained measurements such as might be obtained on repeated testing of one person, assuming that his error variance over trials is equal for all variates, and that errors are independent in k space. We show three cases: low scatter low error, low scatter moderate error, and high scatter moderate error.

[Figure 4. Effect of error and scatter on the projection onto a sphere. Panels: low scatter, low error; low scatter, moderate error; high scatter, moderate error]

The second circle makes it clear that if a profile has small scatter, even a small amount of error may cause a drastic variation in the person's position in k - 2 space. A person for whom the variates are truly equal would fall at C in the k - 1 plane. On different trials he would have an equal probability of falling anywhere on the sphere, and might at different times take diametrically opposite positions. The implication is that the position of some persons in k - 2 space will be far more variable than others, and that such methods as D", Q, rho, and Tau will give unreliable similarity measures for persons with rather flat profiles.
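The instability of flat profiles can be sketched by simulation. The code below is a rough illustration under assumptions that are not from the report: errors are independent normal deviates with the same standard deviation on every variate, the "true" profiles are invented, and reliability is summarized as the average correlation Q between two simulated administrations.

```python
import math
import random

def pearson_q(p, q):
    """Product-moment correlation between two score-sets across variates."""
    k = len(p)
    mp, mq = sum(p) / k, sum(q) / k
    cross = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    ss_p = sum((a - mp) ** 2 for a in p)
    ss_q = sum((b - mq) ** 2 for b in q)
    return cross / math.sqrt(ss_p * ss_q)

def mean_retest_q(true_profile, error_sd, trials, rng):
    """Average Q between two error-laden administrations of the same true profile."""
    total = 0.0
    for _ in range(trials):
        first = [t + rng.gauss(0.0, error_sd) for t in true_profile]
        second = [t + rng.gauss(0.0, error_sd) for t in true_profile]
        total += pearson_q(first, second)
    return total / trials

rng = random.Random(0)                               # fixed seed for repeatability
flat_profile = [0.2, -0.1, 0.1, -0.2, 0.1, -0.1]     # hypothetical low-scatter person
jagged_profile = [6.0, -4.0, 2.0, -5.0, 3.0, -2.0]   # hypothetical high-scatter person

q_flat = mean_retest_q(flat_profile, 1.0, 2000, rng)
q_jagged = mean_retest_q(jagged_profile, 1.0, 2000, rng)
print(q_flat, q_jagged)
```

With the same error on every variate, the retest correlation for the low-scatter person hovers near zero while that for the high-scatter person stays high, which is the pattern the three sketches of Figure 4 are meant to convey.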
This is an expression, in other terms, of the sometimes-neglected principle that differences between two variates within a profile cannot be interpreted with confidence unless the original variates are reliable and not saturated with a common factor (?5). If the conventional assumption that error of measurement is equal for all persons is approximately true for the original variates, and if flat profiles in k space can be expected, the assumption of equal error is not even approximately true for measures of people's positions in k - 2 space.

Stanley (33) has provided some data which confirm our analysis. He administered the Allport-Vernon Study of Values twice, and correlated the two profiles. This correlation is a measure of distance between the two profiles in k - 2 space. For each person, he had a correlation and also a measure of scatter within the profile; these two correlated .38 over all persons, the greater scatter being associated with the greater reliability.

The question must now be raised whether the study of profiles in k - 2 space, or more specifically, whether correlation between profiles in the usual manner, is a justifiable line of investigation. If the removal of the first factor and the magnification of error variance when scatter is equated are both disadvantageous, is a procedure which involves both of these worth further consideration?
Knowing, however, that Stephenson and others using his techniques have obtained significant results, we cannot dismiss the method until we determine why the faults we suspect have not interfered too drastically with their investigations. The explanation seems to be that there are conditions where k - 2 space data give useful and not-unduly inaccurate results.

Consider first the question of removal of the first principal component, as is done when deviate scores are obtained. This projects a distribution of points in k space into a k - 1 space, and in so doing removes the variance due to elevation. The same elimination of elevation is accomplished by the forced-sort technique. An essential condition for the resulting data to be useful is that the position in k - 1 space must be determined with substantial reliability. Under what circumstances can we expect reliability after the first component is removed? If the variates are nearly uncorrelated, each variate contributes to the total dispersion of persons approximately in proportion to V_j, and the elevation score removed constitutes one kth of the total variance. The component removed from D² will on the average be only one kth of the total, and D'² will be quite similar to D². Now this is what happens in the type of Q-sort Stephenson originally proposed, where variates were sampled from a heterogeneous collection. If a set of variates involves about fifty factors, all more or less equally weighted, removal of one factor is not expected to alter distances between persons enough to cloud results. As more correlated variates are used, extracting the elevation factor does discard more of the possibly-important variance, and the residual information will be more unreliable as a result.

The second question relates to the effect on reliability of projection from k - 1 hyperplane to k - 2 hypersphere. This projection leads to substantial magnification of error if a profile is flat.
Recalling that C represents the center of the sphere, and is the point corresponding to a flat k - 1 profile, and that O' is the centroid in k - 1 space, we can expect few flat profiles if the dispersion of persons $O'P'^2$ is much smaller than $CO'^2$. This demands that O' be some distance from C; in other words, that the means for the k variates not be equal. ($CO'^2$ is the sum of squares of these means.) The more persons fall close to C, the more will magnification of errors for them obscure results obtained with k - 2 measures. When many variates are used in the profile, as in Stephenson's forced-sort method, there is a good chance that some of the means will be unequal, and flat profiles then are less common. It is to be noted that when original scores are expressed as deviations around the group mean there will be many flat profiles; such scores are badly suited to k - 2 space procedures. In general, the essential condition is that flat profiles in k - 1 space be rare or absent.
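
The magnification of error for flat profiles can be illustrated numerically. The following sketch is not part of the original report. It assumes six hypothetical scores, a fixed hypothetical error vector added to both profiles, and takes the product-moment correlation between a profile and its perturbed copy as the k - 2 space index of agreement; the helper names `q_correlation` and `scatter` are illustrative only.

```python
import math

def q_correlation(a, b):
    """Product-moment correlation between two profiles (a k - 2 space index)."""
    k = len(a)
    mean_a, mean_b = sum(a) / k, sum(b) / k
    dev_a = [x - mean_a for x in a]
    dev_b = [x - mean_b for x in b]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return num / den

def scatter(profile):
    """Root sum of squared deviations about the person's own mean (assumed definition)."""
    m = sum(profile) / len(profile)
    return math.sqrt(sum((x - m) ** 2 for x in profile))

# Hypothetical data: the same profile shape with large and with small scatter.
shape = [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
scattered = [50 + 5.0 * s for s in shape]
flat = [50 + 0.5 * s for s in shape]

# Hypothetical error of measurement, identical for both persons.
error = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0]

r_scattered = q_correlation(scattered, [x + e for x, e in zip(scattered, error)])
r_flat = q_correlation(flat, [x + e for x, e in zip(flat, error)])

print(round(scatter(scattered), 2), round(r_scattered, 3))
print(round(scatter(flat), 2), round(r_flat, 3))
```

With these hypothetical numbers the same error leaves the scattered profile correlating about .99 with its perturbed copy, while the nearly flat profile correlates only about .54, although the error in k space is identical for the two persons.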
While use of variates with unequal means will reduce the number of flat profiles, this has the disadvantage that correlations then tend to become larger and more uniform, so that one obtains less information about differences between persons. In the extreme, if items differ widely in popularity, most persons will rank them in the same order and almost all Q correlations will be 1.00. Similarity between individuals or within a group can apparently be given no psychological interpretation unless it is measured in a domain in which at least some pairs of people are dissimilar.

The similarity index obtained for any set of items depends to a major degree on the discriminating power of the items. This means that the absolute magnitude of the Q correlations cannot be directly interpreted and may have no practical significance to investigations of similarity. Only when it is demonstrated that a difference in magnitude of correlation exists between groups or between pairs of individuals is it possible to offer an interpretation. Fiedler (16), for example, asked therapists of several schools to rate statements describing a therapeutic relationship in order to determine if they differ along school lines in their concept of an ideal therapeutic relationship. The correlations between ratings were positive and large (median .64). One would be tempted to interpret such correlations as indicating a high degree of similarity among the therapists regardless of school. But it is also possible that the statements used represented such markedly desirable and undesirable qualities that high agreement could be found in almost any
sample of persons acquainted with therapy. Undoubtedly statements about more debatable qualities in the therapeutic relationship could be found which would result in much lower correlations among therapists of different schools. However, Fiedler goes on to extract the valuable information that the expert therapists correlate higher with one another, regardless of school, than they do with non-expert therapists of the same school. This difference supports his major conclusion, since it indicates that the choice of items had not completely pre-determined the correlations. It is disturbing to realize, however, that choice of more obviously desirable and undesirable statements might have resulted in higher correlations in both groups, so that the differences he found would have been obscured. This demonstrates that while Q correlations can be used to show the relative similarity of two pairs of persons, or persons in two groups, little meaning can be attached to the size of a Q correlation per se.

It is not surprising that most profile studies today utilize comparisons in k - 2 space, since the problems have been conceived in terms of correlation as used to study relationships between tests. However, it is questionable whether that model is a particularly good one. For the determination of similarity between two tests, it is reasonable to eliminate the mean and variance from consideration. As Thomson (35) and Burt have pointed out, the test mean represents its general level of difficulty for the population, while the variance is a function of the units used. Both of these values are usually quite arbitrary, depending on the choice and number of items, and since we are mainly interested in the underlying relationship between tests, these values are equated. However, in dealing with similarity of individuals, it is necessary to consider rather carefully what is involved when individuals are equated for level and scatter.
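
What is discarded when individuals are equated for level and scatter can be seen in a small numerical sketch, which is not part of the original report. The two rating profiles are hypothetical, and elevation (the profile mean) and scatter (root sum of squares about the person's own mean) are taken here as working definitions. The sketch relies only on the algebraic identity that the squared k space distance equals the squared k - 1 space distance plus k times the squared difference in elevation.

```python
import math

def elevation(profile):
    """Mean of the profile (assumed definition of elevation)."""
    return sum(profile) / len(profile)

def deviations(profile):
    """Scores expressed as deviations about the person's own mean (k - 1 space)."""
    m = elevation(profile)
    return [x - m for x in profile]

def scatter(profile):
    return math.sqrt(sum(d * d for d in deviations(profile)))

def d_squared(a, b):
    """Squared distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def q_correlation(a, b):
    dev_a, dev_b = deviations(a), deviations(b)
    return sum(x * y for x, y in zip(dev_a, dev_b)) / (scatter(a) * scatter(b))

# Hypothetical ratings: identical rank order, different elevation and scatter.
judge_a = [7.0, 6.0, 5.0, 4.0, 3.0]
judge_b = [4.0, 3.5, 3.0, 2.5, 2.0]
k = len(judge_a)

q = q_correlation(judge_a, judge_b)                               # k - 2 space index
d_sq = d_squared(judge_a, judge_b)                                # k space
d_prime_sq = d_squared(deviations(judge_a), deviations(judge_b))  # k - 1 space
elevation_part = k * (elevation(judge_a) - elevation(judge_b)) ** 2

print(q, d_sq, d_prime_sq, elevation_part)
```

The two hypothetical judges correlate perfectly, yet their squared distance in k space is 22.5, of which 20 is due to the difference in elevation and 2.5 to the difference in scatter.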
To illustrate the interpretation that can be made for measures in k or k - 1 space, which measures in k - 2 space do not allow, we refer to a study by Bendig (2). He asked professors of psychology to rank 15 professional journals in terms of their importance for study by graduate students. These ranks were correlated and factor analyzed, leading to the conclusion that there were three bi-polar types of persons, described in terms of (a) interest in experimental approach to psychopathology, (b) interest in statistical and psychometric theory, (c) interest in theory construction in clinical area. Suppose Bendig had asked the judges to rate the journals on some objectively defined scale ranging, for example, from "Knowledge of most contents of this journal should be required on comprehensive examinations" to "Reading this journal will not be worthwhile for any student". Then the elevation factor (tendency to give many journals high ratings) would reveal something about the judge's view of graduate training. A judge who wants students to read many journals differs from a judge who rates only a few high, even though he gives the same rank order to the journals. Moreover, the variability of the ratings by a judge would indicate his tendency to differentiate within the field of psychology, regarding some areas as worthwhile and some as trivial. A judge with a flat profile would be reporting that he is equally sympathetic to a wide range of psychological interest. A judge with a wide variation of ratings indicates a stronger differentiation.
Two judges who ranked the journals in the same order, but who differed in the scatter of their ratings, would be expected to allow quite different latitude for students in training. At one point, Bendig characterizes his subjects as arranged from a "theoretical-experimental-statistical" pole to a "practical-nonexperimental-intuitive" value orientation. Possibly, rather than this typology, a k - 1 or k space measure would reveal that the judges could be better grouped in terms of specialized versus catholic values.

Combining our two conditions, it appears that measures in k - 2 space can give useful information only if the dispersion of persons in k - 1 space and also the scatter for nearly all persons are large relative to the error dispersion. Data in k - 1 space are required to determine whether these conditions are met. Then one can determine whether profiles in k - 1 space are reliable (14), and whether there are many flat profiles. Moreover, one can if he wishes eliminate the people with flat profiles from the study. The forced-sort does not collect k - 1 data and one has no basis for testing whether profiles are reliably located. It seems quite important for those studying similarity to investigate reliability directly by obtaining two estimates for each profile. Reliability of k - 2 space measures has ordinarily not been examined in past investigations of similarity.
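
One way such a check might be set up when two estimates of each profile are available is sketched below. This is not a procedure given in the report. It assumes that errors on the two occasions are independent and equally large, so that half the mean squared test-retest distance in k - 1 space estimates the error dispersion; the data, the function name `k1_space_checks`, and the rule for flagging flat profiles are all hypothetical.

```python
def deviations(profile):
    """Scores expressed as deviations about the person's own mean (k - 1 space)."""
    m = sum(profile) / len(profile)
    return [x - m for x in profile]

def d_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k1_space_checks(first, second):
    """first, second: one profile per person from each of two administrations."""
    n = len(first)
    dev1 = [deviations(p) for p in first]
    dev2 = [deviations(p) for p in second]
    # Assumed error model: expected squared test-retest distance is twice
    # the squared error dispersion of a single administration.
    error_sq = sum(d_squared(a, b) for a, b in zip(dev1, dev2)) / (2 * n)
    # Average the two estimates of each k - 1 space profile.
    avg = [[(x + y) / 2 for x, y in zip(a, b)] for a, b in zip(dev1, dev2)]
    centroid = [sum(col) / n for col in zip(*avg)]
    # Dispersion of persons about the centroid in k - 1 space.
    dispersion_sq = sum(d_squared(p, centroid) for p in avg) / n
    # Squared scatter of each person (the averaged deviations have mean zero).
    scatter_sq = [sum(x * x for x in p) for p in avg]
    # Hypothetical rule: a profile is "flat" if its scatter does not exceed the error.
    flat = [i for i, s in enumerate(scatter_sq) if s <= error_sq]
    return error_sq, dispersion_sq, scatter_sq, flat

# Hypothetical test and retest scores for three persons on four variates.
first = [[10, 14, 6, 10], [9, 11, 10, 10], [12, 8, 12, 8]]
second = [[11, 13, 7, 9], [11, 9, 10, 10], [11, 9, 13, 7]]

error_sq, dispersion_sq, scatter_sq, flat = k1_space_checks(first, second)
print(error_sq, dispersion_sq, scatter_sq, flat)
```

In this hypothetical example the dispersion of persons is large relative to the estimated error, but the second person's profile is flat and would be a candidate for exclusion before any k - 2 space analysis.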
While we have discussed the conditions under which measures which force equal scatter on all persons can be made maximally useful, we do not recommend such procedures. Our consideration of all possibilities leads us to suggest that the method most generally advisable is the measure of equation (21), where k - 1 plane data are combined with the measure of elevation using a deliberately chosen weight (which may be zero) for elevation. (When the weight is unity, this measure is the same as $D$ in k space.) Excepting treatment of physiological and anthropometric measures, we know of no psychological or educational problem where "correcting" profiles for scatter is advantageous.

In those studies where k - 2 space measures have been used in the past, properly interpreted positive results need not be discounted. The faults to which we have drawn attention operate to obscure true relations and to make the measurement technique insensitive. This would make non-significant results, or low Q-correlations, likely in some instances where a better technique would find more relationship. We know of no biasing factor or systematic error in these procedures, however, which would have introduced significant apparent relations where none should be found.

The specialized problem of comparing a person's profile with his estimated profile introduces an interesting minor question. Several such studies are listed in a recent paper by Brown (4). The usual method is to administer (say) the Kuder Preference Record, and then to require the person to rank his interest in the categories. The profile from the test is rank-correlated with the estimated profile. But this is not precisely the question that should be asked. If one were to predict the interests of the average man, they would not all be equal; on the contrary, some categories are generally more popular.
The estimated profile, obtained by the usual directions, is a k - 2 space profile based on the estimated strength of interests relative to each other. The test profile is a k (or k - 1) profile based on the estimated size of the deviations of the person's interests from the interests of the norm group. These two profiles should not normally be highly correlated, because the average popularity of the categories has been equalized in the test profile. To determine if people can estimate their own test profiles, the experiment can be redesigned to make the estimate more like the test in logical structure. Perhaps the easiest technique would be to ask the person to guess his percentile standing on each category. A $D$ measure (or $D_w$) based on this profile would take into account elevation and scatter, and would correctly compare profiles expressed in terms of derived scores.

Short-Cut Formulas Based on Mean $D^2$

One use of measures of similarity is to compare any two persons. In research, however, the questions more often relate to the similarity of two groups, or the homogeneity of some particular group. If questions could be answered without computing the measure of similarity for each pair of persons involved, it would be possible to obtain the answers much more rapidly. We have discovered several formulas based on $D^2$ which relate to such inquiries. Unfortunately, however, they are based on the average of $D$ squared for a set of pairs, and there seem to be no similarly helpful approaches for obtaining the average of $D$ directly. We have indicated earlier the difficulties which make $D^2$ inappropriate as an interval scale to measure distance. The following formulas are presented for three purposes.
They may be employed as a first rapid way of answering questions about groups, provided the investigator recognizes that different results might be obtained if mean $D$ or median $D$ had been determined instead of mean $D^2$. A second value of the formulas is that they provide insight into the nature of distance measures. Factors which increase mean $D^2$ will also, in general, increase mean $D$ and median $D$, even though not in the same amount. The third use is for checking computations.

Average similarity within group. It was previously noted that in any group,

$$\overline{D^2_{ii'}} = \frac{2}{N-1} \sum_i \sum_j (x_{ji} - \bar{x}_j)^2 \qquad (28)$$

where $i$ and $i'$ vary over all persons in the sample. This index expresses the average similarity of a group, i.e., its homogeneity, except that it gives greater weight to large distances than would a linear measure of distance. This formula might be used to compare the homogeneity of one group with that of another, as in an inspection of a grouping of persons into postulated "types".

If $O_s$ is the centroid of the sample in the space under analysis (whether it is the center of the reference class or not), and $i$ varies over the sample,

$$E^2 = \overline{D^2_{iO_s}} = \frac{1}{N} \sum_i D^2_{iO_s} \qquad (29)$$

This is the mean second moment of persons about the centroid, and is analogous to a variance for the distribution. It is not mathematically a variance, however, since the mean $E$ is greater than zero. We have referred to $E^2$ as a measure of dispersion. It will be noted that if points are distributed on a hypersphere, $O_s$ lies within the hypersphere, and no one can fall at the centroid of the sample.

Formulas comparable to the above can easily be written for $S^2$, and for measures in which weights are assigned to the variates.

Similarity of person to group. For a single person, it may be interesting to know his average distance from all other members in a group. If $i$ is a member of Group Y,

$$\overline{D^2_{ii'}} = \frac{1}{N-1} \sum_{i'} D^2_{ii'} = \frac{N}{N-1} D^2_{Yi} + \frac{1}{2} \overline{D^2_Y} \qquad (i = 1;\ i' = 2, 3, \ldots, N) \qquad (30)$$

Here $D^2_{Yi}$ is $\sum_j (x_{ji} - \bar{x}_{j(Y)})^2$, and
$\overline{D^2_Y}$ is the average similarity within Group Y. If $i$ is not a member of Group Y,

$$\overline{D^2_{ii'}} = D^2_{Yi} + \frac{N-1}{2N} \overline{D^2_Y} \qquad (i \text{ not in } Y;\ i' = 1, 2, \ldots, N) \qquad (31)$$

The difference between (30) and (31) is due to the inclusion of $i$ in Y in the first case. As $N$ increases, (30) approaches (31). As before, one must bear in mind that we have averaged the squared distances.

Distance between groups. The measure of similarity between two groups might be found in $D_{O_Y O_Z}$, the separation of their centroids. This is the measure most used in comparison of groups to test the null hypothesis, but we sometimes desire to determine instead the average $D^2$ between members of the two groups. It permits us to ask whether a group resembles another group as closely as members within the group resemble each other. For this we have

$$\overline{D^2_{ii'}} = D^2_{O_Y O_Z} + E^2_Y + E^2_Z \qquad (i = 1, 2, \ldots, N_Y;\ i' = 1, 2, \ldots, N_Z) \qquad (32)$$

Here we see the average cross-similarity as made up of three components: squared distance between group means, dispersion within the first group, and dispersion within the second group.

The formula can be rewritten as follows, if $\sigma^2_{j(Y)}$ is the variance of $j$ for the population Y represents, etc.:

$$\overline{D^2_{ii'}} = \sum_j \left( \sigma^2_{j(Y)} + \sigma^2_{j(Z)} + (\mu_{j(Y)} - \mu_{j(Z)})^2 \right) \qquad (33)$$

There is one term for the variance within each group, and one which is twice the variance between groups.

Summary and Recommendations

Studies attempting to determine the similarity of persons have used a variety of statistical procedures. Some of these procedures are more advantageous than others, and we have attempted to analyze each procedure so that investigators can choose the method most likely to reveal the effects they seek to measure.
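
The third use named for the short-cut formulas, checking computations, can be illustrated with a brief sketch that is not part of the original report. It compares brute-force averages of $D^2$ over pairs with the centroid-based expressions. The two groups are hypothetical, and the within-group identity is written on the assumption that the average is taken over distinct pairs of persons; the identities themselves are ordinary algebra on squared distances.

```python
from itertools import combinations

def d_squared(a, b):
    """Squared distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(group):
    n = len(group)
    return [sum(col) / n for col in zip(*group)]

def dispersion_sq(group):
    """E squared: mean squared distance of members from their group centroid."""
    c = centroid(group)
    return sum(d_squared(p, c) for p in group) / len(group)

# Hypothetical profiles on three variates.
group_y = [[1.0, 4.0, 2.0], [3.0, 5.0, 1.0], [2.0, 3.0, 6.0], [4.0, 4.0, 3.0]]
group_z = [[6.0, 1.0, 2.0], [7.0, 3.0, 4.0], [5.0, 2.0, 3.0]]
n_y, n_z = len(group_y), len(group_z)

# Average similarity within Group Y over distinct pairs.
pairs = list(combinations(group_y, 2))
within_brute = sum(d_squared(a, b) for a, b in pairs) / len(pairs)
within_short = 2 * n_y / (n_y - 1) * dispersion_sq(group_y)

# Average squared distance of an outside person from the members of Y.
outsider = group_z[0]
person_brute = sum(d_squared(outsider, p) for p in group_y) / n_y
person_short = d_squared(outsider, centroid(group_y)) + dispersion_sq(group_y)

# Average cross-similarity between members of Y and members of Z.
cross_brute = sum(d_squared(a, b) for a in group_y for b in group_z) / (n_y * n_z)
cross_short = (d_squared(centroid(group_y), centroid(group_z))
               + dispersion_sq(group_y) + dispersion_sq(group_z))

print(within_brute, within_short)
print(person_brute, person_short)
print(cross_brute, cross_short)
```

Each printed pair should agree apart from rounding error, so that only the centroids and dispersions of the groups, not the full set of pairwise distances, are needed to obtain mean $D^2$.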
... ("elevation") factor. The various orthogonal indices may be classified as follows:

k space measures, which reflect differences in profile shape, elevation, and scatter. These include the measure $D$ or $S$ which we describe, Cattell's $r_p$, and one form of Pearson's CRL.

k - 1 hyperplane measures, which remove differences in elevation from the data before comparison of profiles. The index $D'$ is used for such data. A special index $D_w$ is suggested which permits the investigator to reintroduce the elevation factor with any desired weight.

k - 1 hypersphere measures, which remove differences in eccentricity of profiles. Measures in this group are chiefly of theoretical interest.

k - 2 hypersphere measures, which remove differences in elevation and scatter from the profile. These include product-moment correlation, rho, tau, $r_{ps}$, and correlation based on Stephenson's forced-sort procedure.

The investigator should eliminate elevation and scatter from his distance measure only if there is a psychological reason for regarding differences in these as unimportant. For most purposes, we regard the index $D_w$
as best suited to similarity studies. When $w$ is one, this becomes $D$. If $D$ or $D_w$ is used, the investigator treats as alike those people who have the same profile, but considers that profiles having different elevation or different scatter are as truly different as profiles having different high or low points. In contrast, measures in k - 1 space (based on deviations around the person's mean) and measures in k - 2 space (with scores standardized in each profile) discard some of the most reliable information in the score set. Profiles in k - 1 space are less reliably determined than k-space profiles. In going to k - 2 space, error is greatly magnified for persons with small scatter. Such magnified errors are likely to obscure true relationships.

Most investigations have been based on k - 2 space measures. We do not believe that such indices are generally the best for research on similarity. It is true that some studies have successfully discovered relationships with these measures. Measures in k - 2 space can be dependable when variates are reliably measured, and where there are few flat profiles. Even in studies where k - 2 measures have been useful, a more powerful technique would be expected to produce the results with greater clarity. In studies which yielded no significant relations involving k - 2 similarity measures, an index such as $D_w$ might have found relationships of importance.

In choosing between $D$, $D'$, and $D_w$, the investigator must decide whether there is an interpretable elevation factor, and whether this factor should be allowed to influence his distance measure. If the variates do not have substantial positive intercorrelation, we recommend that $D$, computed on the original measures in k space, be used to determine dissimilarity. If the variates do generally measure a common factor, the investigator should consider the meaning of this factor and decide whether it is one he wishes to count.
If he wishes to eliminate it from consideration because he regards it as irrelevant to his problem, he will use $D'$ as his index. If he wishes to include the factor, he may choose an appropriate weight for it and use $D_w$. The advantage of $D_w$ over $D$ is that with substantially intercorrelated variates the elevation factor may receive greater weight in $D$ than it should, relative to the weight given to the shape of the profile.

The distance index may be expressed in terms of $D$, $D^2$, or some transformation to another scale. It appears unwise to force $D$ into a correlation-like index ranging from +1 to -1 as Cattell suggests. There is probably no limit on how dissimilar two people can be, save as one is imposed by the method of gathering data. Hence in k space or k - 1 hyperplane $D$ can range from 0 (perfect similarity) to $\infty$. If similarity is reported as $D^2$, we have useful formulas for mean $D^2$ under various situations. $D^2$, however, seems to be less meaningful than $D$ as a measure of distance, especially as $D$ is literally the distance between points in our geometric model. $D$ is also more likely to have statistical properties which make it possible to utilize means, variances, and product-moment correlations. Thus we advise that $D^2$ be used only in preliminary studies where its simplicity is of value and where ordering of similarity is the major question. The use of Q as a measure should also be limited to these conditions.

This paper has given little attention to problems of reliability, but it is clear that measures of distance between points cannot be determined dependably if the locations of the points are undependable. Therefore, any steps the investigator takes to make his profiles more reliable are well worth while.

Profile research is necessarily faced with severe difficulties. The results of any investigation are influenced by numerous choices which must be made in part arbitrarily.
Even when these decisions are made wisely, the difficulty of making reliable measurements on many variates at once is a severe one. We hope that in spite of these problems, the adoption of techniques of analysis which include as much information as the data permit, and which do not introduce additional errors of their own, will permit studies of similarity to advance psychological knowledge.

REFERENCES

1. Barnette, W. L. Occupational aptitude patterns of selected groups of counseled veterans. Psychol. Monogr., 1951, 65, No. 322.
2. Bendig, A. W. A Q-technique study of the professional interests of psychologists. J. Psychol., 1952, 33, 57-64.
3. Block, J., Levine, L., and McNemar, Q. Testing for the existence of psychometric patterns. J. abnorm. soc. Psychol., 1951, 46, 356-359.
4. Brown, M. N. Expressed and inventoried interests of veterans. J. appl. Psychol., 1951, 35, 401-403.
5. Burt, C. L. Correlations between persons. Brit. J. Psychol., 1937, 28, 59-96.
6. Burt, C. L. The factors of the mind. London: University of London Press, 1940.
7. Cattell, R. B. $r_p$ and other coefficients of pattern similarity. Psychometrika, 1949, 14, 279-298.
8. Chowdhry, K., and Newcomb, T. M. The relative abilities of leaders and non-leaders to estimate opinions of their own groups. J. abnorm. soc. Psychol., 1952, 47, 51-57.
9. Cronbach, L. J. "Pattern tabulation": a statistical method for analysis of limited patterns of scores, with particular reference to the Rorschach test. Educ. psychol. Measmt., 1949, 9, 149-172.
10. Cronbach, L. J. Further evidence on response sets and test design. Educ. psychol. Measmt., 1950, 10, 3-31.
11. Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 1951, 16, 297-334.
12. du Mas, F. M. A quick method for analyzing the similarity of profiles. J. clin. Psychol., 1946, 2, 80-83.
13. du Mas, F. M.
On the interpretation of personality profiles. J. clin. Psychol., 1947, 3, 57-65.
14. Ebel, R. L. Estimation of the reliability of ratings. Psychometrika, 1951, 16, 407-424.
15. Edgerton, H. A., Bordin, E., and Molish, H. Some statistical aspects of profile records. J. educ. Psychol., 1941, 32, 185-196.
16. Fiedler, F. E. The concept of an ideal therapeutic relationship. J. consult. Psychol., 1950, 14, 239-245.
17. Fiedler, F. E., Blaisdell, F. J., and Warrington, W. G. The relationship between unconscious attitudes and sociometric choice. J. abnorm. soc. Psychol., in press.
18. Fiedler, F. E., and Senior, Kate. An exploratory study of unconscious feeling reactions in fifteen patient-therapist pairs. J. abnorm. soc. Psychol., in press.
19. Gleser, Goldine C., Loevinger, Jane, and DuBois, P. H. Resolution of a pool of items into relatively homogeneous subtests. Amer. Psychologist, 1951, 6, 401.
20. Hodges, J. L., Jr. Discriminatory analysis: 1. Survey of discriminatory analysis. USAF School of Aviation Medicine, Randolph Field, Texas, 1950.
21. Holzinger, K. J. Factoring test scores and implications for the method of averages. Psychometrika, 1944, 9, 257-262.
22. Kelly, E. L., and Fiske, D. W. The prediction of performance in clinical psychology. Ann Arbor: University of Michigan Press, 1951.
23. Kendall, M. G. Rank correlation methods. London: Charles Griffin and Co., Ltd., 1948.
24. Moses, L. E. Non-parametric statistics for psychological research. Psychol. Bull., 1952, 49, 122-143.
25. Mosier, C. I. Batteries and profiles. In Lindquist, E. F. (ed.) Educational measurement. Washington: Amer. Council on Educ., 1951, pp. 764-808.
26. Osgood, C. E., and Suci, G. A difference method for analyzing the structure of intercorrelations among variables which is applicable to raw-score matrices. Psychol. Bull., in press.
27. Osgood, C. E., and Suci, G.
The nature and measurement of meaning. Psychol. Bull., in press.
28. Pearson, K. On the coefficient of racial likeness. Biometrika, 1928, 18, 105-117.
29. Rabin, A. I., and Guertin, W. H. Research with the Wechsler-Bellevue test: 1945-1950. Psychol. Bull., 1951, 48, 211-248.
30. Rao, C. R. The utilization of multiple measurements in problems of biological classification. J. roy. stat. Soc., Sec. B, 1948, 10, 159-203.
31. Rao, C. R. Tests of significance in multivariate analysis. Biometrika, 1948, 35, 58-79.
32. Rao, C. R., and Slater, P. Multivariate analysis applied to differences between neurotic groups. Brit. J. Psychol., Stat. Sec., 1949, 2, 17-29.
33. Stanley, J. C. Insight into one's own values. J. educ. Psychol., 1951, 42, 399-408.
34. Stephenson, W. A statistical approach to typology: the study of trait-universes. J. clin. Psychol., 1950, 6, 26-38.
35. Thomson, G. The factorial analysis of human ability. Fourth edition. London: University of London Press, Ltd., 1950.