 
 BUREAU OF RESEARCH AND SERVICE 
 College of Education 
 University of Illinois 
 Urbana, Illinois 
 
 THE RELATIONSHIP OF INTERPERSONAL PERCEPTION TO 
 EFFECTIVENESS IN BASKETBALL TEAMS 
 
 Fred E. Fiedler, Walter Hartmann, and Stanley A. Rudin
 College of Education, University of Illinois 
 
 Technical Report No. 3 
 
 SOME EMPIRICAL STUDIES OF THE RELIABILITY OF 
 SOCIAL PERCEPTION SCORES 
 
 Stanley A. Rudin, Irving Lazar, Mary E. Ehart and Lee J. Cronbach 
 College of Education, University of Illinois 
 
 Technical Report No. 4 
 
 Studies performed under Contract N6ori-07135 
 with the Office of Naval Research 
 
 Project on 
 Social Perception and Group Effectiveness 
 
 May, 1952
 
SOME EMPIRICAL STUDIES OF THE RELIABILITY OF 
 INTERPERSONAL PERCEPTION SCORES 
 
 Stanley A. Rudin, Irving Lazar, Mary E. Ehart, and Lee J. Cronbach 
 College of Education, University of Illinois 
 
 Study performed under Contract N6ori-07135
 with the Office of Naval Research 
 
 Project on 
 Social Perception and Group Effectiveness 
 
 Technical Report No. 4 
 May, 1952 
 
SOME EMPIRICAL STUDIES OF THE RELIABILITY OF 
 INTERPERSONAL PERCEPTION SCORES* 1/
 
 Stanley A. Rudin, Irving Lazar, Mary E. Ehart, and Lee J. Cronbach 
 College of Education, University of Illinois 
 
 Introduction 
 
 Fiedler and others have suggested that Assumed Similarity, Real Simi- 
 larity, and other measures of social perception represent psychological attri- 
 butes influencing group behavior. Each of these variables is operationally de- 
 fined in terms of the similarity between two descriptions of individuals on a 
 set of items. To demonstrate that these measures of social perception are 
 meaningful and general concepts, they must be shown to have reliability or 
 consistency. 
 
 This study is an attempt to determine whether such reliable individual 
 differences exist. Specifically, it is an investigation of the reliabilities of the 
 measures used in a study of effectiveness of basketball teams, reported by 
 Fiedler, Hartmann, and Rudin (3). Consistency of the similarity measures can 
 be demonstrated in these ways: 
 
 (1) over items;
 
 (2) over persons ("perceivees"; the objects of perceptions); 
 
 (3) over time. 
 
 The first two methods have been applied to Assumed Similarity scores in this
 study. Only the first type of reliability was studied for Real Similarity
 scores. Stability over time has not been investigated.

 *This study was made in connection with Contract N6ori-07135 between
 the University of Illinois and the Office of Naval Research, Human Relations
 Branch. Technical Report No. 4, May 1952.

 1/ The authors wish to acknowledge with pleasure the cooperation of W. G.
 Warrington in solving many computational problems.
 
 Development of the Instrument 
 
 Since the test was constructed under time pressure there was not suf- 
 ficient time to run a pilot study, to pretest items, or thoroughly to investigate 
 the assumptions involved in the various types of instrument that might have 
 been used. 
 
 On the basis of a previous study by Fiedler, it was decided to use a
 modification of the Stephenson Q-technique (5). Briefly, the instrument was
 devised in the following manner:
 
 (1) 100 statements descriptive of a person's behavior were derived
 from interviews with high school students. The statements were
 then grouped into 20 blocks of 5 items each. 2/ The items dealt
 with a variety of personality traits.
 
 (2) The subject was instructed to indicate which statement in each 
 block he considered to be most characteristic of himself, and 
 which statement in each block he considered least characteristic 
 of himself. 
 
 The subjects were high school basketball players from 14 teams, total- 
 ling 178 subjects. For a more detailed description of the subjects and pro- 
 cedures, see the paper referred to previously (3). 
 
 Measures 
 
 The instrument was administered to each subject four times, with these 
 instructions: 
 
 (1) Mark the items to describe yourself (self - s) 
 
 (2) Mark the items to describe how you would ideally like to be 
 (ideal - i) 
 
 (3) Predict how the person with whom you can cooperate best will 
 describe himself ("positive" choice - p) 
 
 (4) Predict how the person with whom you can cooperate least well will 
 describe himself ("negative" choice - n). 
 
 2/ More recent work in this project suggests that the five-item block is
 less efficient than Stephenson's sorting technique. Studies of this problem are
 still incomplete.
 
 Each series of responses could be correlated with another to obtain an
 index of the similarity with which the two tests had been marked. This was
 done by comparing each block in one test with the same block in the other test
 and assigning a numerical score to each pair of blocks. Five degrees of
 similarity were possible. Perfect agreement was counted as +2, an agreement of
 only one pair of the markings as +1, neither agreement nor disagreement as 0,
 a disagreement of only one pair of the markings as -1, and disagreement of both
 pairs of markings (perfect disagreement) as -2. Examples:
 
 [Figure: example markings of one five-item block on Test 1 and Test 2, each
 with "Most" and "Least" columns, for three cases: Perfect Agreement
 (scored +2), Partial Agreement (scored +1), and Perfect Disagreement
 (scored -2).]
 
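 The scores obtained in this way are a linear function of the Pearson
 product-moment correlation when block scores are added and divided by 40.
 Since each of the 20 blocks contributes between -2 and +2, the sum lies
 between -40 and +40 and the quotient between -1 and +1.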
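
 The block scoring rule described above can be sketched in a few lines. This is
 only an illustration under stated assumptions: the encoding of a block as a
 (most, least) pair of item positions and the names block_score and similarity
 are hypothetical and do not come from the report.

```python
from typing import List, Tuple

# Assumed encoding: one block is (position marked "most", position marked "least"),
# with positions 0-4 inside the five-item block and most != least.
Block = Tuple[int, int]


def block_score(a: Block, b: Block) -> int:
    """Score one pair of blocks on the five-step scale from -2 to +2."""
    most_a, least_a = a
    most_b, least_b = b
    # An agreement: the same item carries the same mark on both tests.
    agreements = int(most_a == most_b) + int(least_a == least_b)
    # A disagreement: an item marked "most" on one test is "least" on the other.
    disagreements = int(most_a == least_b) + int(least_a == most_b)
    return agreements - disagreements


def similarity(test_a: List[Block], test_b: List[Block]) -> float:
    """Sum the block scores and divide by twice the number of blocks (40 for 20 blocks)."""
    if len(test_a) != len(test_b) or not test_a:
        raise ValueError("tests must have the same, non-zero number of blocks")
    total = sum(block_score(a, b) for a, b in zip(test_a, test_b))
    return total / (2 * len(test_a))
```

 With 20 blocks the divisor is 40, as in the text. Under this encoding an
 agreement and a disagreement cannot occur within the same pair of blocks, so
 only the five scores -2, -1, 0, +1, and +2 arise.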
 
 Two classes of correlations can be obtained: (1) correlations between two
 tests taken by the same subject but with different instructions (the
 "internals"), and (2) correlations between two tests taken by two different
 subjects (the "externals"). The internals or intrapersonal correlations yield
 measures of Assumed Similarity (AS); i.e., a measure of how similar the
 subject perceives himself to be to someone else, or how similar he perceives two
 other persons to be. The externals or interpersonal correlations yield a measure
 of "Real Similarity" (RS), which is a measurement of two persons' actual
 similarity on these items.
 
 The reliability of the following measures was determined in the present 
 study: 
 
 Internals 
 
 (1) ASp: the assumed similarity between self and the prediction of the 
 best co-worker (s vs. p) 
 
 (2) ASn: the assumed similarity between self and the prediction of the
 person with whom the subject said he could cooperate least well
 (s vs. n)

 (3) ASo: the assumed similarity between the person's best co-worker
 and the person with whom he said he could cooperate least well
 (p vs. n).
 These measures are schematized in Figure 1. 
 
 [Figure 1. Schematic diagram of the measures used in this study: ASp relates
 s and p, ASn relates s and n, and ASo relates p and n.]

 s = self description
 p = prediction of best co-worker
 n = prediction of the person with whom S can cooperate least well
 
 Externals 
 
 RS: the "real similarity" between the self descriptions of two persons on
 this test.
 
 Internals 
 
 Method. Seventy-seven players were chosen at random from the total N of
 178. The test was split into odd-even halves of 10 blocks each. The internal
 similarity score over odd blocks was compared with the similarity score over
 even blocks by means of Guttman's split-half formula (4).
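
 As an illustration of the computation, the split-half formula given by
 Guttman (4) is commonly written r = 2(1 - (var_a + var_b) / var_total), where
 var_a and var_b are the variances of the two half scores and var_total is the
 variance of their sum. The sketch below assumes that form; the function name
 and the input layout (one half score per person in each list) are
 hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def guttman_split_half(half_a: Sequence[float], half_b: Sequence[float]) -> float:
    """Split-half reliability of the total score from two half-test scores.

    half_a[i] and half_b[i] are the odd-block and even-block similarity
    scores of person i.
    """
    if len(half_a) != len(half_b) or len(half_a) < 2:
        raise ValueError("need two equally long lists with at least two persons")
    totals = [a + b for a, b in zip(half_a, half_b)]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # Reliability falls as the half variances account for more of the total variance.
    return 2.0 * (1.0 - (pvariance(half_a) + pvariance(half_b)) / var_total)
```

 Unlike the Spearman-Brown correction, this form does not require the two
 halves to have equal variances.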
 
 Results. Correlations between the three internal measures are shown in 
 
 Table 1. 
 
 Table 1

 CORRELATIONS BETWEEN THE INTERNAL MEASURES (N = 77)

 Variables        r

 ASp vs. ASo     .18
 ASp vs. ASn     .30
 ASn vs. ASo     .62
 

 A summary of the reliabilities can be found in Table 2. Inspection of 
 distributions of the three measures showed them to be fairly normal. All these 
 correlations are significantly greater than zero (P <.01). 
 
 Table 2

 RELIABILITIES OF THE MEASURES OF ASSUMED SIMILARITY
 (N = 77)

 Variable    Mean*    SD*     Reliability

 ASp         .297     .179    .618
 ASn         .097     .200    .695
 ASo         .111     .214    .608

 *In terms of the Q correlation scale
 
 Conclusions. The basketball instrument is sufficiently reliable to detect 
 group differences in AS and reliable enough for investigations similar to the 
 basketball study. On the other hand, greater reliability would be necessary be- 
 fore the instrument could be used for drawing conclusions about single individu- 
 als. 
 
 The fact that AS scores are consistent from one set of items to another
 supports the hypothesis that this aspect of social perception is a trait in which
 consistent individual differences exist. The establishment of AS as a
 measurable trait justifies further investigation to determine its psychological
 meaning and practical significance.
 Externals

 Method. While it was considered desirable to find the reliability of
 interpersonal measures, correlating the similarity scores of odd vs. even blocks
 for each pair among 77 persons (2,926 comparisons) would have involved too
 much time and effort. Furthermore, mathematical theory indicates (cf.
 Technical Report No. 2) that the RS measures for different persons would have
 different reliability, depending on the variability of the traits measured within
 the person. Therefore, the reliability of RS scores was determined separately
 for each of five key subjects selected from one school. The five key subjects
 were selected at random.
 
 Their self descriptions were split into odd-even halves, by blocks. These
 halves were then compared with the corresponding halves of each of the 77
 subjects in the above-mentioned sample, and similarity scores summed for each
 half in the same manner as for the internal measures. Thus, each reliability
 coefficient indicates the consistency of the instrument in reporting which of the
 77 subjects are most similar to the one key person.
 
 Results. Table 3 presents the reliability of real similarity scores involv- 
 ing the five key persons used. As can be seen, all are quite low, and three of 
 the five are negative. The negative reliabilities may, for the purposes of the 
 present study, be regarded as chance departures from zero. 
 
 Table 3

 RELIABILITIES OF REAL SIMILARITY TO EACH OF FIVE KEY
 PERSONS (N = 77)

 Person    Reliability

 1         -.226
 2          .068
 3          .264
 4         -.571
 5         -.302
 
 Conclusions. On the present instrument, measures of interpersonal
 similarity are almost entirely lacking in reliability. With such scores, there is no
 possibility of establishing any correspondence between RS and any measure of
 group effectiveness.
 
 Attempts to Improve Reliability 
 
 Two methods were used in an attempt to improve reliability: (1) eliminat- 
 ing items which did not correlate with other items, and (2) scoring the test in 
 terms of clusters of items before computing similarity scores. 
 
 Items were therefore grouped into clusters. Three judges working to- 
 gether classified the items into 13 trial clusters on the basis of "traits" or 
 "factors" which they were presumably tapping. Keys for each cluster were made 
 up, and 100 persons (including the 77 used in the RS and AS studies referred to 
 above) were scored on each of the 13 clusters. (In scoring, the item marked
 "most characteristic" was counted as +1, and the item marked "least
 characteristic" as -1.) The internal consistency of each cluster was determined by means
 of coefficient alpha (1), modified to take into account the spurious correlation
 between items appearing in the same block. These alphas ranged from .48 to 
 -.10, with a median of .21. In addition, tetrachoric r's between clusters were 
 computed. 
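
 For orientation, the standard, unmodified form of coefficient alpha (1) can be
 sketched as below. The report's modification for items sharing a block is not
 described in enough detail to reproduce, so this sketch does not include it.
 The function name and the input layout are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def coefficient_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Standard coefficient alpha for one cluster.

    item_scores[i][p] is the score (+1, 0, or -1 in the scoring described
    above) of person p on item i of the cluster.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("a cluster needs at least two items")
    # Total cluster score of each person is the sum over that person's items.
    totals = [sum(person) for person in zip(*item_scores)]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("cluster totals have zero variance")
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1.0 - item_var_sum / var_total)
```

 Negative values, such as the -.10 reported above, arise when the items of a
 cluster covary negatively on balance.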
 
 
 Clusters were then discarded if their reliability was too low and combined
 with others if their intercorrelation was high. After application of these criteria,
 eight clusters were retained and improved by item analysis. In this manner,
 the test was reduced to a 59-item test of eight clusters. The reliabilities
 (internal consistencies) of these eight clusters were estimated by using the Guttman
 split-half formula on a new sample consisting of the remaining 78 cases from
 the total N. They are shown in Table 4.
 
 Table 4

 NAMES AND RELIABILITIES OF THE EIGHT REVISED CLUSTERS

 No.    Name                        Reliability
                                    (Split half)

 1      Leadership                  .554
 2      Friendly attitude           .238
 3      Comedian                    .262
 4      Politeness, conformity      .192
 5      Competitive                 .385
 6      Academic, intellectual      .299
 8      Aggressiveness              .596
 10     Independence                .348
 
 The clusters were originally derived in order that a profile for each
 person could be obtained. These profiles could then be compared, and the
 similarity between any two profiles indicated by the statistic D (cf. 2). Profile
 scoring involved the following steps: (1) each person's score on each cluster
 was obtained; (2) the two profiles were compared by using the formula:

 $$ D = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_k - b_k)^2} $$

 Here a_1 = score of person a on cluster 1
 b_1 = score of person b on cluster 1
 a_2 = score of person a on cluster 2
 b_2 = score of person b on cluster 2
 D = distance (or dissimilarity) measure
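
 A minimal sketch of the D computation follows, assuming that D is the square
 root of the summed squared cluster differences, as in the distance measure of
 Cronbach and Gleser (2). The function name is hypothetical.

```python
import math
from typing import Sequence


def profile_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance D between two cluster profiles of equal length.

    a[j] and b[j] are the scores of persons a and b on cluster j.
    Larger values mean less similar profiles; 0 means identical profiles.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same clusters")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

 Note that D is a dissimilarity measure, so it runs in the opposite direction
 from the correlation-type similarity scores used for the unclustered test.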
 
 The 78 new subjects who were used to find the reliabilities of the eight revised 
 clusters were also used in this second part of the study. 
 Internals.

 Only ASp was treated, since it was assumed that an improvement
 in reliability of one internal score would be indicative of the success or failure
 of the profile-scoring technique in this instance.
 
 Method. An odd-even split could not be used, since only 59 of the original
 100 items were retained, thereby making some of the blocks incomplete. There- 
 fore, the test was split into two halves with an equal number of blocks in each 
 half. The split was arranged so that each cluster was represented by an approxi- 
 mately equal number of items in each half. Also, no cluster was represented by 
 more than one item in any one block. This will be referred to as the "A-B 
 split". 
 
 In order to test the efficiency of profile scoring, it was necessary to ob- 
 tain reliabilities of ASp when scored by the profile method and when scored 
 unclustered. Two methods were therefore used to obtain reliability: (1) the 
 "unclustered" method, which was the same as the procedure used on the ori- 
 ginal 100-item test, except that the A-B split was used instead of an odd-even 
 split, and (2) the "profile-scoring" method. This second method involved the
 following steps: each of the eight clusters was split into halves on the basis 
 of the A-B split mentioned above. Thus, when a half A key and a half B key for 
 each cluster were applied to a self description, it was possible to get one pro- 
 file made up of all eight A halves, and another profile consisting of all eight B 
 halves. This procedure was duplicated on the prediction of the best co-worker. 
 This gave four profiles for each person; the A and B profiles on the self de- 
 scription and the A and B profiles of the prediction of the best co-worker. The 
 A profile of the self description and the A profile of the prediction were then 
 compared by the previously-given dissimilarity formula, and D obtained. The 
 two B profiles were compared in the same way. This procedure yielded two 
 assumed similarity scores, one on the A half, and one on the B half, for each 
 person. Correlating these scores yielded the reliability coefficient. 
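
 The profile-scoring reliability procedure just described can be sketched as
 follows. The sketch assumes that D is the square root of the summed squared
 differences and that the two half scores are correlated with an ordinary
 Pearson r, with no further correction. All function names and the input
 layout are hypothetical.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]  # one score per cluster, for one half (A or B) of the test


def profile_distance(a: Profile, b: Profile) -> float:
    """D between two profiles (assumed: root of summed squared differences)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def ab_reliability(self_a: List[Profile], pred_a: List[Profile],
                   self_b: List[Profile], pred_b: List[Profile]) -> float:
    """Correlate each person's half-A D score with the half-B D score.

    Index i in every list refers to the same person: the A and B profiles of
    the self description and of the prediction of the best co-worker.
    """
    d_a = [profile_distance(s, p) for s, p in zip(self_a, pred_a)]
    d_b = [profile_distance(s, p) for s, p in zip(self_b, pred_b)]
    return pearson_r(d_a, d_b)
```

 Because both halves yield D scores, the direction of the scale (distance
 rather than similarity) does not affect the sign of the reliability
 coefficient.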
 
 Because of the labor involved in these computations, a sample of 40 was 
 drawn from the group of 78 subjects and used in these calculations. 
 
 Results. A summary of the results is presented in Table 5. It can be 
 seen that both reliabilities obtained from the shortened test are substantially 
 higher than the reliability obtained from the original test. Significance of these 
 differences was not tested. 
 
 Conclusions. These results suggest that elimination of ambiguous items 
 or items uncorrelated with others in the test is beneficial, but that profile 
 scoring of this instrument does not improve reliability. They also tend to 
 confirm the conclusion that ASp can be reliably measured. 
 
 
 Table 5

 RELIABILITIES OF ASp OF THE ORIGINAL, SHORTENED,
 AND PROFILE SCORED TESTS

 Method of Scoring                  N     No. of Items    Reliability

 Unclustered                        77    100             .62
 Unclustered, selected items        40    59              .75
 Profile scoring, selected items    40    59              .67
 
 Externals. 
 
 Method. Substantially the same procedures were used here as in the AS
 section. Self descriptions of the five key persons were compared with the self
 descriptions of 40 others by the unclustered method. Self descriptions of two
 key persons, chosen at random from the original five, were compared with 40
 others, using the profile-scoring method. Only two of the original five key
 persons were used in this part of the study, because of the computational time
 and labor involved.
 
 Results. The results are summarized in Table 6, which includes RS
 reliabilities from the previous section for purposes of comparison. With
 selection of items, all RS unclustered scores attained reliability greater than zero,
 but not much greater. Cluster scoring did not appear to have a beneficial
 effect.
 
 Table 6

 RELIABILITIES OF RS OF THE ORIGINAL, SHORTENED, AND
 PROFILE SCORED TESTS FOR FIVE KEY PERSONS

 Key       100-Item Unclustered    59-Item Unclustered
 Person    (N = 77)                (N = 40)

 1         -.226                   .109
 2          .068                   .337
 3          .264                   .027
 4         -.571                   .099
 5         -.302                   .370

 59-Item Profile Scored (N = 40), two of the five key persons: -.624, .047
 
 Conclusions. For these items, RS shows extremely low reliability.
 Selecting items which correlate raised reliability, but not enough to make the RS
 score on this instrument useful. The slight improvement in reliability may be
 due to retaining more reliable items, but is probably also due to increasing
 the homogeneity of the items. Shortening of the test, of course, tends to affect
 reliability adversely.
 
 Profile scoring does not seem to help. This conclusion, for the five-item
 forced-choice block, is supported by preliminary indications from a
 mathematical study by Warrington.
 
 Reliability over Perceivees
 
 As stated in the introduction, the second method of testing reliability called 
 for the demonstration of consistency over perceivees. An ancillary study was 
 conducted by Fiedler in which this hypothesis along with several others was 
 tested. 
 
 Captains of seventeen university intramural basketball teams were asked
 to complete test blanks for (1) self description, (2) prediction of best cooperator,
 and (3) prediction of second best cooperator. The Assumed Similarity to the
 best cooperator was then correlated with the Assumed Similarity to the second
 best cooperator over the seventeen players. Rho was .58. This compares
 favorably with the split-half reliability of ASp on the original 100-item
 instrument (r = .618).
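
 A rank correlation of this kind can be sketched as below. The report does not
 say how rho was computed or how ties were handled; the sketch assumes the
 usual approach of assigning average ranks to ties and taking the
 product-moment correlation of the ranks. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation between two sets of scores for the same persons."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(rx, ry))
    sxx = sum((u - mean_x) ** 2 for u in rx)
    syy = sum((v - mean_y) ** 2 for v in ry)
    return sxy / math.sqrt(sxx * syy)
```

 Here x would hold each captain's Assumed Similarity to the best cooperator
 and y the Assumed Similarity to the second best cooperator.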
 
 While the small N must be kept in mind, it is tentatively concluded, pending
 study with a larger N, that a second phase of the reliability of ASp,
 reliability over perceivees, has been demonstrated. The result does tend to show
 that the person who assumes similarity in one case also does so when perceiv- 
 ing another, but the generality of this attitude is a problem for further study. 
 The fact that ASp does not depend on a specific perceivee makes it more signi- 
 ficant as an expression of some personality characteristic of the respondent. 
 
 While correlation of the two ASp measures was high, the correlation must
 be influenced by the degree of preference the perceiver has for the two
 perceivees. Thus, the correlation between ASp for best cooperator and ASp for,
 say, fifth cooperator would presumably be less than the correlation of .58
 between ASp for best cooperator and second best cooperator. Further, it seems
 likely that the distance between the best cooperator and the second best
 cooperator as used in this study is not the same for every subject.
 
 Summary

 A study was performed to investigate the reliability of certain interpersonal
 perception measures, using data gathered in a study on basketball teams (3).
 The two classes of variables were (1) Assumed Similarity, defined as the
 correlation between a person's self description and his prediction of the responses
 of another on a set of items, or the correlation between two predictions of others
 that one person has made, and (2) Real Similarity, defined as the correlation
 of different persons' descriptions or predictions on a set of items. The
 major findings were:
 
 1. Measures of Assumed Similarity derived from the basketball 
 instrument possess reliability sufficiently high to detect group 
 differences and allow for exploratory studies. The AS tendency 
 is reliable over sets of items and over perceivees. 
 
 2. The reliability of Assumed Similarity measures can be materially 
 improved by selecting items that are unambiguous and correlate 
 with other items. It was not improved by profile scoring. 
 
 3. Measures of Real Similarity on the present instrument were very
 unreliable and cannot be expected to correlate with criteria.
 The reliability of these measures increased slightly but not
 sufficiently by item selection. Profile scoring was of no help.
 However, it should be noted that the profile scoring procedure
 was not finally tested here, having been limited to five-item
 blocks and clusters of low internal consistency.
 The study implies a need for better measures of real similarity. Assumed
 similarity is established as a trait which can be measured and whose
 psychological properties should be further investigated. Pretesting of items is
 advised for further instruments.
 

 
References

 1. Cronbach, L. J. Coefficient alpha and the internal structure of tests.
 Psychometrika, 1951, 16, 297-334.

 2. Cronbach, L. J., and Gleser, Goldine C. Similarity between persons
 and related problems of profile analysis. Champaign-Urbana, Illinois:
 Univ. of Illinois, 1952. (Mimeographed, Technical Report No. 2,
 Contract N6ori-07135 between the University of Illinois and the Office
 of Naval Research.)

 3. Fiedler, F. E., Hartmann, W., and Rudin, S. A. The relationship of
 interpersonal perception to effectiveness in basketball teams.
 Champaign-Urbana, Illinois: Univ. of Illinois, 1952. (Mimeographed,
 Technical Report No. 3, Contract N6ori-07135 between the University
 of Illinois and the Office of Naval Research.)

 4. Guttman, L. A basis for analyzing test-retest reliability. Psychometrika,
 1945, 10, 255-282.

 5. Stephenson, W. A statistical approach to typology: the study of trait
 universes. J. clin. Psychol., 1950, 6, 26-38.
 