Psychological Examinations of College Students

BY
F. EDITH CAROTHERS

Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy, in the Faculty of Philosophy, Columbia University

NEW YORK
1922

ACKNOWLEDGMENT

The writer is indebted to Professor H. L. Hollingworth for the original plan of this investigation and the opportunity to conduct the tests in his laboratory, and for his constant interest and his valuable suggestions in the preparation of the results. For helpful criticism regarding the arrangement and statistical treatment of the data she is also indebted to Professor R. S. Woodworth, Professor A. T. Poffenberger, and Professor J. McKeen Cattell.

SECTION I

HISTORY OF FRESHMAN TESTS

The recommendation in 1882 by Galton [1] of the establishment of anthropometric and medico-metric laboratories for the examination of individuals represents the first definite recognition of the need of examining individuals in order to give them vocational guidance. Galton saw the importance, both to science and to individuals, of collecting complete life-histories of people, which should include photographs, anthropometric measurements, and medical facts. To meet this need he established his now famous laboratory in the South Kensington Museum, London. There, by payment of a small fee, individuals could have certain physical measurements made and undergo tests for keenness of vision and hearing, dynamometer pressure, reaction time, etc. Several years later, at the World's Columbian Exposition in 1893, [2] Professor Joseph Jastrow arranged a laboratory devoted to tests of a strictly psychological nature.
Prior to Jastrow's work, however, Cattell proposed [3] and tried out a series of ten mental tests and measurements on students in the psychological laboratory of the University of Pennsylvania. In devising his series of tests Cattell followed Galton in combining physical measurements with psychophysical and strictly mental tests. He went a step farther, however, by emphasizing the necessity of standardizing methods of procedure in administering tests so that results secured by different experimenters might be comparable. In addition to the Pennsylvania students, tests were also given to the students of Cambridge University and Bryn Mawr College.

Galton's work stimulated other investigators to devise tests for measuring the capacities of individuals. Of particular interest is the list of ten fundamental traits or properties proposed by Kraepelin [4] as the basic factors to be considered in examining both normal individuals and the "mentally sick." These so-called fundamental dispositions include: the mental capacity to do work, the ability to be influenced by practice, strength of practice or general memory, special memory ability, susceptibility, fatigability, the ability to recuperate, the depth of sleep, the intensity of distraction, and adaptability. To each one of these fundamental traits Kraepelin arbitrarily assigned a certain test, assuming that excellence of performance in the assigned test, say adding, would indicate excellence in the corresponding quality, say the capacity to do work. Although his assumption, made without statistical proof, that certain tests would measure certain functions renders his results inaccurate from the modern standpoint, his work is interesting in that it is representative of a distinct stage in the use of tests for diagnostic purposes.

[1] Fortnightly Review, 1882, p. 332.
[2] Cattell, J. McK., and Farrand, L. Physical and Mental Measurements of the Students of Columbia University.
[3] Cattell, J. McK. "Mental Tests and Measurements," with appendix by Francis Galton, Mind, 1890.
[4] Kraepelin, Emil. Der Psychologische Versuch in der Psychiatrie; Psychologische Arbeiten, 1895.

With the accumulation of data and the gradually increasing clearness of conception of the meaning of tests, methods of administering them were revised. In 1896 [5] appeared the first report of the results of mental and physical tests made on freshmen only. It concerned the work done by Professor Cattell and Dr. Farrand on one hundred Columbia University students in 1894-5 and 1895-6. At this time there was conceived the plan of testing Columbia students during their freshman and senior years. The tests comprised ten records and twenty-six measurements. Such physical measurements were taken as the color of hair and eyes, height and weight, breathing capacity, sensation areas, and strength of right and left hands. Other measures were of a sensory character, while certain simple tests of a mental character were taken, such as the rate of perception and the perception of space and time. In addition, a personal record-blank was filled out by the student, and the experimenter recorded the impressions the subject made upon him both before and after testing. The tests were given individually, the investigators and several assistants acting as experimenters, and required from forty minutes to one hour for their completion. The underlying purpose in giving these tests is clearly set forth in this statement by Cattell and Farrand: [6] "When used with freshmen on entering college the record is of interest to the man and may be of real value to him. It is well for him to know how his physical development, his senses, his movements, and his mental processes compare with those of his fellows. He may be able to correct defects and develop aptitudes.
Then when the tests are repeated later in the college course and in subsequent life the record of progress or regression may prove of substantial importance to the individual."

[5] Cattell, J. McK., and Farrand, L. Physical and Mental Measurements of the Students of Columbia University, Psychological Review, 1896, III, 618-647.
[6] Above reference.

These Columbia freshman tests continued to be given each year under Professor Cattell's direction. In 1901 [7] an account and discussion of the results was published by Wissler. He discusses the changes and additions made in the tests and considers the records of 250 freshmen, a small number of seniors, and some Barnard girls. The tests employed were: length and breadth of head, strength of hands, fatigue, eyesight, color-vision, hearing, perception of pitch, perception of weight, sensation areas, sensitiveness to pain, perception of size, color preference, reaction time, rate of perception, naming colors, rate of movement, accuracy of movement, perception of time, association, imagery, and memory (auditory, visual, logical, and retrospective). Records of stature, weight, etc., together with data concerning parentage, personal habits, and health, the physical measurements taken in the gymnasium, and academic marks were also secured. From the similarity of the results of freshmen tested each year, Wissler concluded that freshmen entering Columbia from year to year are a homogeneous group and represent a type. His general conclusions are:

1. That the laboratory mental tests show little intercorrelation in the case of college students. Correlations range from .28 (accuracy and speed in marking out A's) to +.39 (auditory and visual memory correctly placed).
2. That the physical tests show a general tendency to correlate among themselves, but only to a very slight degree with the mental tests.
3. That the markings of students in college classes correlate with themselves to a considerable degree. Correlations run from +.11 (mathematics and logical memory) to +.75 (Latin and Greek).

These early Columbia tests and measurements were principally motor and sensory in character, and the few tests that might be considered to have an intellectual quality were so simple that they proved of little value for determining the mental status of the college freshman. They are, however, significant in that they represent the first definite attempt to establish standards of performance for freshmen and to show students how their standing in various tests compared with the average standing of their class.

[7] Wissler, Clark. The Correlation of Mental and Physical Tests; Psychological Review, Monograph Suppl., Vol. III, 1901, p. 62.

Subsequent to the establishment of the practice of testing the Columbia students in their freshman and senior years, committees were appointed by the American Psychological Association in 1896 and 1907, respectively, to consider the possibility of accumulating mental and physical statistics through cooperation on the part of various psychological laboratories and to devise a standard series of group and individual tests. In 1896 the committee drew up a series of physical and mental tests appropriate for college students tested in a psychological laboratory.

Various other proposals were made for the scientific study of the college student. In 1899 President Harper of Chicago recommended that special study be made of the college student's character, intellectual capacity, and tastes, by the questionnaire method. In 1906 Thorndike [8] called attention to the fact that the entrance examinations given by the College Entrance Board of the Association of Colleges and Preparatory Schools of the Middle States and Maryland did not measure at all accurately the candidate's capacity, and emphasized the need of the scientific study of this matter.
Williams [9] also stressed the importance of studying the college student. Like President Harper, he recommended the questionnaire method for ascertaining facts concerning the student's personality, and suggested the use of Whipple's information test for obtaining a knowledge of the student's range of information. He also pointed out the need of vocational advisors for freshmen.

Calfee [10] in 1913 reported the results of four general intelligence tests on 103 freshmen (51 boys and 52 girls) of the University of Texas. The tests used were card-dealing, card-sorting, alphabet-sorting, and the mirror test, together with the spirometer test for vital capacity. She finds inter-test correlations for the boys and girls combined ranging all the way from +.50 to .00. The correlations between the tests and college grades range from +.32 (card sorting and grades) to +.16 (mirror test and grades). The correlation between the lung test and grades is .11. Considering the girls' records alone, the inter-test correlations range from +.45 to +.19, the correlations with college grades from +.28 to +.13, and the correlation of the lung test with grades is .00.

[8] Thorndike, E. L. An Empirical Study of College Entrance Examinations. Science, N.S., 1906, 23, 839-845.
[9] Williams, C. W. Scientific Study of the College Student.
[10] Calfee, M. College Freshmen and Four General Intelligence Tests, Journ. of Educ. Psychol., 1913, 4, 223-231.

No further attempt to measure the performance of college freshmen in tests is reported until December, 1915, when Dr. Karl T. Waugh presented a paper on "A New Mental Diagnosis of the College Student" before the American Psychological Association. [11] In 1912 he applied seven tests [12] individually to freshmen in Beloit College, and three years later, in 1915, he gave the same tests to thirty-nine of the same subjects. Waugh's inter-test correlations range from .43 to +.54, and he finds some improvement in the tests from freshman to senior year.
During the year 1913-14 Bingham [13] gave nine tests to 200 Dartmouth freshmen, seven of them being given individually. As a number of psychology students, unpracticed experimenters, assisted Professor Bingham in his testing, the results of his investigation are somewhat inaccurate. He gives norms for the nine tests (median, standard deviation, and coefficient of variability) and the range from the poorest to the best. As no correlations are reported we have no information as to the relationships between the tests. Bingham's chief contribution consists in his use of the method of ogive percentile graphs. The data in seven of his tests are presented in this form, thus serving as a scale. Given the score made by any individual, the experimenter by reference to the chart can readily assign him a rank among his classmates. The speed with which a student may be thus assigned his relative position in any given trait makes this method a most convenient one for the instructor. [14]

[11] Waugh, Dr. Karl T. A New Mental Diagnosis of the College Student. New York Times Magazine, January 2, 1916.
[12] Waugh's tests were: 1. Concentration of attention (cancellation of A's); 2. Range of information; 3. Speed of learning (substitution); 4. Quickness of association (opposites); 5. Ingenuity (puzzle-box); 6. Steadiness; 7. Memory for a passage (immediately after hearing it read and after an interval of two weeks).
[13] Bingham, W. V. Some Norms of Dartmouth Freshmen; Journ. of Educ. Psychol., March, 1916, Vol. 7, pp. 129-142.
[14] Bingham's tests were: 1. Endurance of grip; 2. Tapping; 3. Memory span for auditory digits; 4. Logical memory; 5. Cancellation; 6. Color naming; 7. Logical relations; 8. Mixed relations; 9. Perception of form.

At the University of Texas the same year Bell [15] gave nine tests [16] to about seven hundred and fifty freshmen. Bell definitely states that his aim was to devise a series of tests that would "be of assistance to college authorities in aiding freshmen to adjust themselves to their environment." The time required for testing was from forty to forty-five minutes. The tests were given not individually, but in groups averaging a little less than twenty each. The time-limit method was used. This, together with his arbitrary method of scoring the tests, may account in some measure for the unsatisfactory nature of his results. He weighted each test so that a perfect mark or the highest mark would approximate 100, and the other marks would range downward from this to zero. For example, in the Triangles test there were fifty triangles to be crossed out. Each one correctly crossed out counted two points, and five points were deducted for each error, positive or negative. Thus, if a student crossed out 35 triangles, omitted 3, and crossed out one circle, he earned 70 points for the 35 triangles and lost 20 for the four errors, so that his score was 70 minus 20 = 50. The other tests were scored in similar manner. Bell also obtained the correlations of freshmen university grades with each other and of the university grades with the mental tests. His conclusions are:

1. The correlations between freshmen university grades vary from +.34 (mathematics and history) to +.59 (English and history, science and history).
2. The highest correlation between class marks and test scores is +.31 (English and Completion).
3. Among the tests themselves the highest correlations are found between the Association and Recognition tests, and between the Directions, Alternatives, and Completion tests.
4. There is a considerable difference in the results of the tests with the best and the poorest students, but the scores are so variable as to be of little value for individual diagnosis.

[15] Bell, J. Carleton. Mental Tests and College Freshmen; Journ. of Educ. Psychol., Sept., 1916, Vol. 7, pp. 381-399.
[16] Bell's tests include: 1. Cancellation of triangles; 2. Addition; 3. Association or learning pairs; 4. Recognizing forms; 5. Marking right statements; 6. Easy directions; 7. Hard directions; 8. Alternatives; 9. Completion (using "The Strength of the Eagle" as material).

The investigations of Calfee, Waugh, Bingham, and Bell illustrate the striking change that has taken place in the character of mental tests since the early Columbia tests were first instituted. In place of sensory and motor tests we now employ tests which will measure diverse mental functions.

Motivated by this same desire to secure a group of tests for college students indicative of mental ability and correlative with college grades, Rowland and Lowden [17] began to try out groupings of psychological tests in 1912-13 and carried out their investigations over a period of three years. The tests were conducted individually on all the students in Reed College, twelve students of experimental psychology assisting in conducting the tests. The first grouping of tests was tried out on 54 students during 1912-13, after which the grouping was revised and given to 195 more subjects. No inter-test correlations are reported. The highest correlation between university grades and the groupings was that between the grades and the grouping composed of the letter-group g-r-s-t, cancellation, opposites, logical memory, judgment (syllogism), rote memory, and cancellation of words with a and / (a correlation of +.37 with a P.E. of .06).

[17] Rowland, E. and Lowden, G. Report of Psychological Tests at Reed College. Journ. of Exper. Psychol., 1916, I, 211-217.

Psychological tests have also been conducted for several years at Vassar College. Results of tests made upon Vassar freshmen during the years 1914, [18] 1915, and 1916 [19] show data collected from four sources, namely: 1. Answers to a questionnaire calling for information regarding the student's imagery, interests, language facility, and habits; 2. Results of the tests; [20] 3. Freshman academic grades; 4. Reports of promising students by their instructors.
To determine roughly the correlation between academic marks and test scores, the experimenters found the difference between the average class standing of students having test scores in the first or highest quarter and the average class standing of students with test scores in the last quarter. If there was a marked difference they concluded that a positive correlation existed. According to this rough method they found a positive correlation between academic marks and all the tests except Hard Directions. On the whole, the results of the Vassar tests appeared to indicate that ability in the tests correlates well with ability in freshman studies, while inability to do well in the tests is correlated with a similar inability to do well in freshman studies. Moreover, students designated as "promising" by their instructors tend to manifest a high grade of performance in the tests. (14.5% of 317 freshmen tested in 1917 who passed all the tests in the Terman Superior Adult Tests were rated by their instructors as being of only average ability.) The experimenters also found that the relation between success in freshman tests and academic success in three years' work is less than that between success in freshman tests and academic success in the freshman year. Inasmuch as there were thirty different testers, each one being assigned a small group of freshmen, little confidence may be placed in the accuracy of the data. The tests as conducted at Vassar are of value more for the opportunity they afford students of psychology to acquire training in experimental methods of procedure than for any contribution they make to our knowledge of freshman standards of performance in various tests.

[18] White, Sophie D.; May, Sybil; and Washburn, M. F. A Study of Freshmen. Minor Studies from the Psychological Laboratory of Vassar College, No. 31, Amer. Jour. of Psychol., 1917, Vol. 28, pp. 151-154.
[19] Montague, M.; Reynolds, M. M.; and Washburn, M. F. A Further Study of Freshmen. Amer. Jour. of Psychol., 1918, 29, 327-330.
[20] The tests described include: Verbal memory and memory for ideas; Reading Backwards; Hard Directions; Analogies; Sentence Building; Suggestibility; Free Association; Thurstone Reasoning.

An interesting contribution in connection with the application of psychological tests to college freshmen is that of Kitson [21] at the University of Chicago. With the general purpose of devising a "system for measuring the mental capacity of college students in order to guide their college work," Kitson selected sixteen tests. [22] About half the tests were given by the group method. The time required for testing was two and one half hours. From forty complete records Kitson computed norms of performance in the various tests. In addition, a graphic chart was arranged for each student to show his standing in each test and to furnish a net score combining his standing in all the tests. In the particular tests used, Kitson found a significant positive correlation only between: 1. memory for meaningful material seen and heard (+.54); 2. the first and second reproductions of this material (+.49); 3. the Opposites and Constant Increment tests (+.40). When correlations were computed of standings in each test with standings in the net score, they were found to be somewhat higher. The correlation between college marks and psychological tests was found to be +.44 (P.E. .09), but from forty records secured from a second group of freshmen the correlation was found to be only +.20 (P.E. .11). Kitson explains this low correlation on the ground that many other factors besides intelligence enter in to determine standing in school studies, such as the personal factor of the instructor, the student's will power, social surroundings, economic conditions, and physical condition.
The correlation between the psychological tests and intelligence as estimated by the dean was +.57 (P.E. .05). Twenty-one of the 1915 freshmen were retested in seven of the tests in their sophomore year, and improvement was shown in every test except one (Numbers heard). Comparison between the net scores for freshman and sophomore year shows a correlation of +.88 (P.E. .03).

[21] Kitson, H. D. The Scientific Study of the College Student. Psychol. Monog., 1917, 23 (No. 98), p. 81.
[22] The tests employed were: Number-checking; Memory for numbers heard; Memory for objects seen; Memory for logical material heard; Secondary memory for same; Immediate memory for logical material, seen; Secondary memory for same; Loss in logical material, heard; Loss in logical material, seen; Opposites; Constant increment; Hard directions, printed and oral; Word building; Sentence-building; and Business ingenuity.

Although his norms of performance in the tests and his inter-test correlations are not very reliable, based as they are upon only forty records, there is much to be said in favor of Kitson's general method of procedure. He emphasizes the importance of studying the individual student in his relation to the college. He also realizes that psychological measurements, however large the role they may play in determining a student's abilities and aptitudes, must not be considered the sole factor in such a determination, but should rather be so coordinated with measures of the student from various other aspects as to lead to our fuller understanding of the nature of the individual student and his potentialities. Both points signify a decided advance in the method of treating the problem. The splendid cooperation of all the students and his success in dealing with delinquent cases speak much for Kitson's general method.

Other minor investigations have been made on freshmen with the same purpose.
Sunne, [23] working at Newcomb College, found a low correlation between college grades and an information test tried on twenty-five freshmen, and with ninety-nine freshmen who were given a series of tests found correlations of tests with grades ranging from 0 to +.25. Haggerty [24] found correlations of a quality-of-reading test and an omnibus test with medical marks of +.62 and +.60, respectively, and of the two combined of +.65, in the case of sixty-nine candidates for medical school who had already completed two years of college. At the University of Iowa King, [25] working with a small group of nineteen freshmen, found a tendency for the students with high academic marks to make higher scores in the completion, logical memory, and lanes test than the students with low academic marks. He gives no statistical evidence in support of this statement. Later, using a series of five tests with 56 freshman engineers, he obtained a correlation between students' ranks in all the tests combined and their academic grades of +.27. The tests employed by King were: 1. Courtis Arithmetic, Series B (graded for speed and accuracy); 2. Hard Opposites; 3. Recognition of Forms; 4. The Kansas Silent Reading Test (H.S. Series); and 5. "Hall Cube Test," a test of visual imagination.

A little later Irving King and James M'Crory [26] followed Kitson's method more definitely. In the fall of 1916 they tested 276 women and 268 men freshmen in seven different tests: the Courtis Standard Arithmetic Test, Series B; mixed relations; two tests of "opposites"; a completion test used by Simpson; visualization; Whipple's information test; and a logical memory test. The group method of testing was used, the tests being given in groups of from ten to twenty-five. Their rather low inter-test correlations indicate, they state, that they are measuring a variety of mental functions. They find, moreover, fairly good correlations between the tests and academic grades (+.14 to +.45 in the case of the girls, and +.21 to +.84 in the case of the boys). In their attempt to make practical application of the tests for the diagnosis of their students in general and of cases of special ability and disability, as Kitson does, they have been fairly successful.

[23] Sunne, D. The Relation of Class Standing to College Tests, Journ. of Educ. Psychol., 1917, 8, 193-211.
[24] Haggerty, M. E. Tests of Applicants for Admission to University of Minnesota Medical School. Journ. of Educ. Psychol., 1918, 9, 278-286.
[25] King, I. The Relationship of Abilities in Certain Mental Tests to Ability as Estimated by Teachers, School & Society, 1917, 5, 204-209.
[26] King, I. and M'Crory, J. Freshman Tests at the University of Iowa, Journ. of Educ. Psychol., 1918, 9, 32-46.

At Northwestern University Uhl [27] obtained inter-test correlations ranging from +.18 (Trabue Completion K and Information) to +.42 (Trabue Completion M and Information) for a group of one hundred freshmen tested in the fall of 1916. His series contained only four tests: Trabue Completion K and M, a hard opposites list of twenty words, and an information test which consisted of the seventy most familiar words in Whipple's list plus thirty new words. Test correlations with the first semester English and Mathematics grades were determined and found to range from +.48 (English and Mathematics) to +.16 (Completion K and Mathematics). When he had three mathematics instructors rate these one hundred students for ability, Uhl found a correlation of +.93 between their ratings and the Mathematics grades of the students. This high correlation was no doubt due to the tendency on the part of the teachers to make their judgments of the students practically equivalent to the students' course grades.
The correlation between the instructors' judgments and the ranks of these same students in their last year of high school was +.59, and with all the tests combined was +.36. Uhl thinks his tests fail to measure accurately, the information test being the most unsatisfactory, and attributes his low correlations to the homogeneity of his group, the relative simplicity of the tests, and the unreliability of school marks.

Thurstone's [28] work represents a further development in the use of psychological tests. At the Carnegie Institute of Technology the attempt is made to use psychological tests as a criterion for admission. A series of six mental tests was given to 114 freshmen of the Margaret Morrison Carnegie School in October, 1917. The problem was to determine whether the number of students who were dropped or placed on probation for poor scholarship could be reduced by the use of the mental tests, and to determine whether the mental test ratings correlated with faculty estimates concerning the general ability of the students. The tests which agreed well with the judgment of the faculty were retained.

[27] Uhl, W. L. Mentality Tests for College Freshmen, Journ. of Educ. Psychol., 1919, 10, 13-28.
[28] Thurstone, L. L. Journ. of Educ. Psychol., March, 1919.

In working up his results Thurstone used the method of critical scores. After plotting scatter diagrams for each test, upper and lower critical scores were determined such that every student above the upper critical score is above the average in the opinion of the faculty, and every student below the lower critical score is below the average in the opinion of the faculty. The mental test rating was designated as the median percentile rank in all six tests plus 5 points for each test in which the student is above the upper critical score, and minus 5 points for each test in which he is below the lower critical score.
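The rating rule just described can be made concrete with a short sketch. It is an illustration only, not Thurstone's own procedure: it assumes the central percentile rank is the median of the six, and the function name, the sample percentile ranks, the raw scores, and the critical scores are all hypothetical values invented for the example.

```python
from statistics import median


def mental_test_rating(percentile_ranks, scores, upper_critical, lower_critical, step=5):
    """Sketch of the rating rule: median percentile rank, plus `step` points
    for each test above its upper critical score, minus `step` points for
    each test below its lower critical score. All arguments are parallel
    lists with one entry per test."""
    rating = median(percentile_ranks)
    for score, upper, lower in zip(scores, upper_critical, lower_critical):
        if score > upper:
            rating += step
        elif score < lower:
            rating -= step
    return rating


# Hypothetical student and hypothetical critical scores for six tests.
percentile_ranks = [40, 55, 60, 35, 70, 50]
scores = [12, 30, 18, 9, 25, 20]
upper_critical = [20, 28, 22, 15, 24, 26]
lower_critical = [8, 12, 10, 10, 11, 9]

rating = mental_test_rating(percentile_ranks, scores, upper_critical, lower_critical)
print(rating)
```

In this made-up case the median percentile rank is 52.5; two tests lie above their upper critical scores and one lies below its lower critical score, so the rating is 52.5 + 10 - 5.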
Students with a mental test rating of 10 or below were reported as doubtful. Thurstone found a correlation between instructors' estimates of students' ability and the combined mental test rating of +.60. From his results he concluded that: 1. The mental test rating would have eliminated seven of the eleven total failures at the beginning of the year. 2. No average or good student would have been eliminated by the mental test rating. All students who scored below the lower critical mental test rating were, without exception, poor students. Moreover, all the freshmen who were rated high by the faculty were above the average in the mental test rating. From all indications, this method is working out well at Carnegie.

The past three years have brought a further development in the use of psychological tests for measuring the intelligence of college freshmen. Since 1918 the Army Alpha test has been administered to freshmen in several colleges with varying degrees of success. Professor Stone [29] reports that its use at Dartmouth justifies the recent proposal to admit students who stand scholastically in the upper quarter of their class in approved schools. Strictly speaking, the work at Dartmouth should not be included in this history, since it deals with the results obtained in testing all the college classes rather than freshmen only. We mention it here, however, because the college authorities are now devoting particular attention to administering the test to the freshman class.

[29] Stone, Charles Leonard. "Intelligence and Scholarship;" The Dartmouth Alumni Magazine, March, 1920.

During the fall of 1918 the Army Alpha test was given to all the students in the Students' Army Training Corps, which included practically the entire student body. The average score in Alpha for the 677 S. A. T. C. men tested was 147.5. The average academic grade for the same men was 2.12, using the scale D = 1, C = 2, B = 3, A = 4.
The correlation between the academic marks and Alpha scores was +.44. There is also a significant correspondence between a student's score in the Alpha test and the scholarship quintile in which his academic record places him. Although less exact than Thurstone's method of assigning individuals their relative position in a group, this method serves to give a rough and quick estimate of a student's status.

Similar to this Dartmouth study is that of Walcott [30] at Hamline University. Here, too, not freshmen alone, but all students were given the Alpha test in the fall of 1918. Walcott's results are based on data secured from 61 men and 145 women. As in the Dartmouth investigation, a far greater proportion of men and women students secure a score in Alpha in the high grade intelligence group than was found in any of the army camps. The median score is 129 for the Hamline men and 133 for the women, with the same sharp differentiation between the poor and the good groups as Stone found at Dartmouth. The correlation between the results of the Alpha test for the women and their first term academic grades was +.47, slightly higher than the Dartmouth result. Although Walcott does not consider the army test the best device for determining the fitness of students for college work, he sees in the significant difference in score between the upper and lower half of the students tested a fact of practical use in the placing of students.

Similar investigations have also been conducted by Hill, Filler, [31] and Hunter at the University of Illinois, Dickinson College, and Southern Methodist University, respectively. At the University of Illinois 3,500 students were tested in twenty-four groups in March, 1919, [32] members of the faculty acting as experimenters. As at Dartmouth and Hamline, the scores of the students at each of these colleges show them to be a very select group compared to the army men.
The median score of the freshmen in the school of liberal arts and sciences at the University of Illinois is 147. At Southern Methodist University 33 the effort was made to secure select groups of students in order to compare their scores with the average score for the school. Each student was asked to name men and women students who they thought would make high scores in the Alpha test. For 16 men and 8 women named by from five to forty students as being able to make the highest scores, the average score for the men was 154, and for the women 156, justifying the judgment of the students. With a similar group of students named by the faculty as being able to make the highest scores even better results were obtained, the average for the men being 161 and for the women 167. In selecting a group of men and women whom they judged would make low scores the faculty were equally successful. Both faculty and students thus showed themselves fairly good in their ability to select students on the basis of intelligence, though this method of selection is inferior to selection on the basis of actual scores. The correlation between the Alpha scores of the women students and their college grades for the fall term was +.52. No correlations are given in the Illinois and Dickinson reports, which are only preliminary. The following is a comparative table showing scores obtained at the University of Illinois, Dickinson College, and Southern Methodist University:

                           University    Dickinson    Southern Methodist
                           of Illinois   College      University
Total number tested        3,254         213          321
Number of freshmen         489           72           128
Lowest freshman score      52            75           60
Highest freshman score     188           195          188
Median freshman score      147           141          127

Hunter explains the lower median score at Southern Methodist University as due to a difference in the method of conducting the test. More fully developed than these three preliminary investigations is the work being done at Brown University. 34

30 Walcott, G. D. "Mental Testing at Hamline University." School and Society, 1919, 10, 57-60.
31 Filler, M. G. "A Psychological Test." School and Society, 1919, 10, 208-209.
32 Hill, D. S. "Results of Intelligence Tests at the University of Illinois." School and Society, 1919, 9, 542-545.
33 Hunter, H. T. "Intelligence Tests at Southern Methodist University." School and Society, 1919, 10, 437-440.
34 Colvin, S. S. "Psychological Tests at Brown University." School and Society, 1919, 10, 27-30.

Colvin reports the results obtained from 103 freshmen with the Alpha test and two series of psychological tests, known as Brown University Series I and II, which were separated by an interval of several days. Each series consisted of four tests: mutilated sentences, vocabulary, analogies or mixed relations, and a reasoning test. The distribution of scores for both Series I and Series II separately and for the combined scores of Series I and II conformed closely to a normal probability curve. The correlation between Brown University Series I and II is +.75, and between the average of these two series and the Alpha test is +.79. The correlation between the Brown University tests and the average academic marks of the first and second terms is +.59, and between the army test and the average of the marks of the first and second terms is +.45. Practical application was made of the tests to foretell a student's probable academic success and to aid in diagnosing cases of failure in school work. Colvin found that two-thirds of 80 students reported as doing unsatisfactory work in the first term had made low scores in their psychological tests, while only one-sixth of the men had a satisfactory grade. Most of the cases of students doing poor college work who had obtained high scores in the tests were due not to lack of ability, but to other reasons.
So satisfactory have the tests been in determining the students' mental status and helping them that they are still being employed. In a recent article in the Educational Review 35 Professor Colvin compares in greater detail the scores and correlations obtained in the Brown University tests and the Alpha test, and reports results secured in giving the Brown tests and the Thorndike tests to 300 freshmen. The Brown tests require about fifty-five minutes of actual working time as contrasted with about three hours required by the Thorndike tests. The median score for the Brown tests is 62.4 with a standard deviation of 10.59, compared to the median score for the Thorndike tests of 76.5 with a standard deviation of 14.89, the difference being due to the fact that the Brown tests have a maximum score of 100, while the Thorndike tests have a maximum score of about 150. The correlation between the scores obtained by students in the two tests is +.816 with a P. E. of .0138, but the Thorndike tests show a higher correlation with academic marks (+.53) than the Brown tests (+.46). While the Thorndike tests show a slight superiority in prognostic value, nevertheless results show that men receiving scores in the lowest fifteen percentile of either the Brown or the Thorndike tests have a relatively small chance of graduating from college. Colvin warns against the danger of refusing men admission to college solely because of a low psychological record. He advocates the conservative position of regarding the psychological record as one among many factors to be considered in diagnosing cases of individual students.

35 Colvin, S. S. "The Validity of Psychological Tests for College Entrance." Educational Review, June, 1920.

At Ohio State University the Alpha test was successfully given to 5,950 students October 10, 1919, in groups of one hundred to two hundred and fifty.
The distribution of scores for the entire group conformed to the normal probability curve, the students being grouped into five classes as follows:

                                                 Approximate Percentage
Class   Score      Description                   in Each Class
I.      178-212    Very superior intelligence    5
II.     155-177    Superior                      20
III.    115-154    Average                       50
IV.     85-114     Fair                          20
V.      0-84       Poor                          5

The percentage of students falling into each of these five classes was then determined for the various university units separately, such as the Graduate School, Commerce and Journalism, Law, Medicine, Engineering, Arts, Education, Agriculture, Pharmacy, etc. The median, highest, and lowest scores, and the number examined for each class (college year), in each college and in the whole university, are reported. The highest median score, 157, was obtained by the Graduate School; Arts received second place with a median score of 147; Commerce and Journalism third, with a median score of 146; and so on down to a median of 112 (Veterinary Medicine group). The report gives an interesting comparison of the various college groups.

The Thorndike tests, previously mentioned, are rapidly becoming more widely employed for freshman testing than the Army Alpha. Jones, 36 writing in the Educational Review, clearly describes the general nature of these tests. Although conceding their practical value, he urges that they should be employed "not to the exclusion of other measures for determining fitness, but along with them." Evidence of a student's fitness to undertake college work should, in Professor Jones' opinion, include his preparation for college work, his character and promise, his health, and his intelligence as denoted by his score in the Thorndike test. In a brief report before the New York Branch of the American Psychological Association this year Mr. Wood stated that the purpose of the Thorndike tests was fourfold:

1. To select those fit for a college course;
2. To aid college committees;
3. To assist the progress of schools;
4. To assist the Dean in the administration of the college.

Results from a large number of freshmen showed a correlation between the total Thorndike score and the average college grade of +.52, and the median college grade of +.54. Although no published reports of results secured with the Thorndike tests have appeared, investigators who are employing the tests find them highly satisfactory.

36 Jones, A. L. "Psychological Tests for College Admission." Educational Review, 1919, 58, 271-278.

SECTION II

STATEMENT OF THE PROBLEM WITH A LIST OF THE TESTS EMPLOYED

The present investigation, begun at Barnard College in the fall of 1915, about two years before the Army Alpha and the Thorndike Tests were originated, was carried on during the years 1915-16, 1916-17, the fall of 1917, and the spring of 1919. The general purpose underlying the investigation was similar to that underlying the investigations of other experimenters during this period, a purpose which continues to motivate present studies. The aim was first, to establish norms and standards of performance in mental tests for Barnard freshmen, and second, to give students a clear conception of their abilities and aptitudes along various lines. More specifically, this investigation concerns the trial of a group of tests with the object of determining first, their reliability as measures; second, their correlation with freshman university grades; and third, their correlation with physical records taken in the gymnasium.

In selecting the particular group of tests to be used several factors contributed. Paramount in importance was the desire to select a series of tests of such nature as to call into play various mental functions. In addition, it was desired to secure tests which previous investigators had found to have a positive correlation with such factors as age, ability along some vocational line, or general intelligence.
Equally important in determining the final selection was the time-limitation factor. Owing to unwillingness on the part of students to act as subjects for a longer period, and to the factor of fatigue which would probably influence the results of tests completed after that time, it was found necessary to have a series of tests such as could be completed in one hour. Consideration of all these factors finally led to this selection of tests:

1. Coordination
2. Tapping
3. Cancellation
4. Checking
5. Color Naming
6. Directions
7. Opposites
8. Verb-object
9. Mixed Relations
10. Word Building
11. Word Naming
12. Knox Cube
13. Digit Span
14. Word Memory
15. Logical Memory
16. Substitution
17. Completion
18. Information
19. Vocabulary

SECTION III

METHOD AND TECHNIQUE OF THE INVESTIGATION

Shortly after the beginning of the academic year, in the fall of 1915, the series of tests selected according to the manner described in the preceding section was submitted to a preliminary trial in order to determine the best method of conducting the tests, and to afford the writer practice in their administration. After determining the general method of procedure, a notice was posted in the Freshman Study of Barnard, stating that a series of psychological examinations had been instituted for Barnard freshmen, and giving a description of the nature and purpose of the tests. It was stated that the time required for the examination was one hour, and an accompanying schedule indicated the hours at which the test might be taken. The place where the examinations were to be held was also indicated, and all freshmen interested were requested to sign their names on the schedule opposite the hour at which they could take the test.
This method of permitting the student to take the test at the hour most convenient for her, rather than at a time prescribed by the experimenter, seems advisable in that it establishes a certain uniformity in conditions, the student usually being in her best physical condition at the time of testing. In addition, letters were sent to individual students in the class, reminding them of the examination, and an account, written by Professor Hollingworth, of the widespread use of similar tests by reliable business firms and their value in selecting candidates for positions along various lines, appeared in the college weekly. A similar notice of the tests was posted in Freshman Study in the fall of 1916, and in the fall of 1917. Letters were also sent to individual students at these times.

The subjects, as indicated, were Barnard students in their freshman year. The fact that they had had no training in experimental psychology, and were unfamiliar with the tests employed, made them a suitable group for testing. Out of a class of about one hundred and forty freshmen during 1915-16, one hundred were tested. This constitutes our first group of subjects whom we will designate as Group I. During the year 1916-17 (class of 1920), eighty-five freshmen were tested, and in the fall of 1917 fifteen more (class of 1921) were given the tests. These last two groups together constitute our second group of one hundred freshmen whom we will designate Group II. In addition, in order to determine the reliability of the tests, the series was divided into two equivalent parts in a manner to be described later. In the spring of 1919, during the period extending from March 14 to May 15, forty-five freshmen from the class of 1922 were tested twice on the same day, each test requiring forty-five minutes of the student's time.

All the tests were given individually.
This enabled the experimenter to supervise personally the performance of each subject and to stop her at any indication that she did not fully understand the directions given. It was likewise an important factor in contributing to the standardization of the conditions of the experiment. The subject was by this means freed from any feelings of irritation or discouragement that might have arisen if she had taken the test with a group of students whom she knew to be more rapid workers than herself. In such a case the knowledge that others were accomplishing their work in a shorter period of time would operate to arouse in some subjects such feelings of the futility of competing with their companions that their resulting performance would have been much slower than it would have been under more favorable conditions.

Each freshman, then, was examined individually, and every effort was exercised to make the conditions of the experiment as uniform as possible. The room employed for the testing was one regularly used by the Department of Psychology for advanced experimental work, and from the point of view of light and ventilation it is well adapted for research. Except during the tapping and coordination tests, the subject sat at a small laboratory table, opposite the experimenter. As the room was so situated as to be almost unaffected by sounds from neighboring rooms, and was itself kept in a quiet condition, there was nothing to distract the subject's attention from her work.

As previously indicated, an attempt was also made to secure uniformity in administering the tests. Besides giving the tests individually, the experimenter followed the order in which the tests are listed. In a few cases circumstances rendered it necessary to deviate slightly from this order, but in general it was followed rigidly. The result of the preliminary trial had been to indicate the most satisfactory manner in which the tests should be administered.
The aim was to make the directions as clear, simple, and direct as possible. As a detailed account of the instructions given for each test will be considered in the next section, it is only necessary to mention here that the method of procedure agreed upon was carefully followed with one or two exceptions where misinterpretation of the directions resulted in the experimenter's repeating the instructions in a slightly different form.

SECTION IV

DISCUSSION OF THE TESTS, INCLUDING MATERIALS USED, METHODS OF PROCEDURE, AND RESULTS

Test No. 1. Coordination

This test, popularly termed the "three-hole test," calls for both speed and accuracy of movement and gives an indication of the subject's motor ability and coordination.

Apparatus: An oak plate tilted at an angle of 45 degrees to the base board, containing three brass-lined holes arranged in the form of an equilateral triangle, about 8 cm. apart. Contact of the metal rod with the bottom of the hole makes an electrical connection recorded by the automatic counter. Stop watch.

Instructions: "I want you to hold this (stylus) in your right hand and to touch the bottom of each one of these targets as quickly as possible, going around in a circle without skipping any of the holes. You see every time you do so, the contact is registered on the electric counter. I want to see how many contacts you can make in one minute. You start then when I say, 'Go' and stop when I say, 'Stop.'"

Method of scoring: The score represents the number of contacts made in one minute.

Results: The average, standard deviation, and range for Groups I and II (200 freshmen in all) are indicated in Table I below:

TABLE I

Test No. 1                                 Range
Coordination   Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        82.7      10.77   63.8                        109.0
Group II       84.1      11.92   60.8                        110.4

Test No. 2. Tapping

This test has been widely used as a test of motor speed and endurance and has been considered by some experimenters to afford the best index of motor capacity.

Apparatus: Tapping board with metal plate and electric counter. Tapping stylus with flexible connecting wire attached. Two dry cells. Stop watch.

Instructions: "I want you to hold this (stylus) in your right hand and tap on here (indicating the brass plate) as quickly as possible. I want to see how many times you can tap in a minute. Start when I say 'Go' and stop when I say 'Stop.'" These instructions were accompanied by an illustration of tapping by the experimenter. For this test the subject sat directly in front of the tapping board, resting her arm on the table, and assumed the position most convenient for her.

Method of scoring: The score represents the number of taps made in one minute.

Results: Table II shows the results obtained in this test:

TABLE II

Test No. 2                                 Range
Tapping        Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        376.26    51.69   263.2                       499.0
Group II       368.54    39.32   283.0                       451.4

Test No. 3. Cancellation

This test is well adapted for measuring concentration and alertness of attention, maximum effort being required to accomplish the task quickly and accurately. In addition to involving such factors as "speed of perception" and "discrimination" it is partly dependent upon the subject's muscular reaction to stimuli presented.

Owing to the fact, previously mentioned, that it was necessary to complete all the tests in one hour, it was found advisable to limit some of the tests. Inasmuch as we desired to include the Checking Test, which involves functions similar to those involved in Cancellation, and as it was believed that these two tests together would exert an unfavorable influence upon the results of following tests due to the eye-strain they would cause, it was deemed advisable to use only one half of the Cancellation blank and one half of the Checking blank.
The halves of these blanks have been found by Woodworth and Wells to be equal in difficulty, and they suggest that one half of the blank in the case of both these tests is a sufficient test. Thus we were able to avoid undue eye-strain and were further able to spend the extra time, saved from halving these two tests, in lengthening three of the Association tests.

Materials: Woodworth-Wells number blank, Form A. 37 Stop watch. A pencil was used for checking.

Instructions: After placing the blank on the table before the subject, face downwards, the following instructions were given: "When I say 'Go' I want you to turn over this sheet of paper, and cross out all the 3's, as quickly as possible, going across the paper like this (illustrating). There are five 3's on every cross line so you want to be sure to cross out all those on the first line before passing to the second line. Start when I say 'Go.'"

Method of scoring: The time taken to complete the cancellation was the score. Errors were very rare and were therefore entirely disregarded.

Results: Table III indicates the performance in this test.

TABLE III

Test No. 3                                 Range
Cancellation   Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        76.51     17.51   128.28                      52.12 sec.
Group II       76.77     13.82   105.60                      50.76 sec.

Test No. 4. Checking

This test measures functions similar to those employed in the Cancellation test, although here the functions involved are more complex. To quote Woodworth and Wells, "The detection of a pair of digits in a group is a specialized performance, not reducible to the acts of detecting the single digits. The difficulty of this test is mainly perceptual and the overlapping which is effective in finding pairs of digits must occur in the perceptive process." 38 Inasmuch as Professor Woodworth found the first half of his number blank, Form B, to be equal in difficulty to the second half, for the reason mentioned under "Cancellation" only one half of this blank was employed.

37 Woodworth, R. S., and Wells, F. L. Association Tests. Psychological Monograph, No. 57, 1911, p. 24.
38 Woodworth, R. S., and Wells, F. L., Op. cit.

Materials: Woodworth-Wells number blank, Form B. Stop watch. Pencil.

Method of procedure: As in the Cancellation Test, the blank was placed before the subject, face downwards, and the following instructions were given: "When I say 'Go' I want you to turn this paper over and check any way at all, as quickly as possible, all the numbers that contain both a 9 and a 6. Start when I say 'Go.'"

Method of scoring: The total number of checks to be made was 35. Therefore the score was obtained by dividing the time taken by the subject by the number of correct checks made and then multiplying by 35. No account was taken of wrong checks made as it was believed that the time spent in making them sufficiently penalized the subject.

Results: Table IV shows the performance attained in this test.

TABLE IV

Test No. 4                                   Range
Checking       Average       S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        102.93 sec.   19.64   152.28                      72.6
Group II       105.98        20.45   161.0                       76.86

Test No. 5. Color Naming

"This is a test of discrimination-reaction, involving prompt decision and correct reaction to a situation."

Materials: Woodworth-Wells Color Naming blank. 39 Stop watch.

39 Op. cit.

Method of procedure: Preliminary to the actual test the blank was placed before the subject with only the sample line of five colors showing. The subject was then asked to give the name of each color. Then the following directions were given: "I want you to name all these colors for me, as quickly as possible, going across the paper, from left to right, as in reading. Start when I say 'Go.'"

Method of scoring: The score was the time taken by the subject to complete the entire series of 100 reactions.

Results: The results are shown in Table V.

TABLE V

Test No. 5                                   Range
Color Naming   Average      S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        56.01 sec.   8.75    78.84 sec.                  41.16 sec.
Group II       58.55 sec.   9.36    81.32 sec.                  39.0 sec.

Test No. 6. Directions

This test measures the subject's speed in apprehension and her general intelligence.

Materials: Woodworth-Wells Hard Directions blank. Stop watch.

Instructions: "When I say 'Go' I want you to turn this blank over and follow directions, do just what the directions say, as quickly as possible."

Method of scoring: The score is the time in seconds required to complete the test. Errors were counted separately.

Results: Table VI indicates the performance in this test.

TABLE VI

Test No. 6                                   Range
Directions     Average       S. D.        Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        126.15 sec.   52.00 sec.   296.6 sec.                  64.08
Group II       119.76        41.65        243.2                       61.6

Test No. 7. Opposites

For a test which would indicate a general tendency or "adjustment to react according to instructions" and also measure the quickness and accuracy of association of ideas, the two equal lists of opposites proposed by Woodworth and Wells were combined into one list. Our reason for combining the lists was to get a real measure of the individual's ability to name opposites. If we had taken only the short list we would have obtained an adequate measure of the subject's alertness of attention and ability to adapt herself to a situation, but we desired to go further than this and find out whether the individual really had any special ability for naming opposites. This test also indicates facility in handling words and is generally considered to have a high correlation with general intelligence.

Materials: Woodworth-Wells Lists of Opposites printed on cardboard. Stop watch.
Method of procedure: These instructions were given: "I want you to name the opposite for each one of these words (showing card with lists, at a distance) as quickly as possible, not repeating the words themselves but just naming the opposite. For instance, if the word were 'tall,' you would say 'short.' Be sure you give the exact opposite of each word before proceeding to the next. Do you understand?" The subject was stopped if a wrong opposite was given and not permitted to proceed with the other words until the right opposite was given.

Method of scoring: As no errors were permitted to be made in the test, the score represents the time taken for completing the task.

Results: Table VII indicates the results obtained in this test.

TABLE VII

Test No. 7                                 Range
Opposites      Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        51.08     10.33   79.00 sec.                  34.84 sec.
Group II       50.88     8.55    71.52                       35.92

Test No. 8. Verb-object

This is also one of the association tests and measures ability to handle verbal relations. As in the Opposites Test we combined the two equivalent lists of verbs proposed by Woodworth and Wells into one test. Desire to obtain a real measure of the subject's innate ability to name objects was the reason for lengthening this test.

Materials: Two equal lists of verbs combined into one list and printed on cardboard. Stop watch.

Method of procedure: These instructions were given: "In this case I want you to name an object for each one of these verbs, as quickly as possible, not repeating the verbs themselves but simply naming the objects. For instance, if the verb were 'bake,' you would say 'bread' or 'cookie.' Do you understand?"

Method of scoring: As no errors were permitted to be made, the score represents the time required to complete the test.

Results: The results are indicated in Table VIII.

TABLE VIII

Test No. 8                                 Range
Verb-object    Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        65.55     12.32   99.56                       45.48 sec.
Group II       67.35     12.91   99.8                        47.24

Test No. 9. Mixed Relations or Analogies

This test measures facility in handling associations, and ability to perceive relationships among logical material. As in the two preceding Association Tests the two equal lists proposed by Woodworth and Wells ("Eye: see = Ear: ___; Oyster: shell = Banana: ___" and "Good: bad = Long: ___; Man: woman = Boy: ___") were combined into one long list for a reason similar to that which led us to lengthen the Verb-object and Opposites tests.

Materials: Combination of Woodworth-Wells two equal lists for Mixed Relations test, printed on cardboard. Stop watch.

Method of procedure: The subject was shown sample analogies and the following instructions given: "In this case there are three words given and you are to supply a fourth word that has the same relation to the third word as the second word has to the first. For example, in this case, 'Box: square = Orange: ___,' square gives the shape of the box. Then the shape of an orange is round, so you would supply 'round' as the fourth term. (Two other illustrations were then given.) The relations involved won't always be the same; it may be the case of shape, or opposites, etc. But you look at the first pair of terms in every case and then make the second pair express the same relationship as the first pair. Do you understand?"

Method of scoring: As no mistakes were allowed, the score is the time required to complete the test.

Results: The results are shown in Table IX.

TABLE IX

Test No. 9                                       Range
Mixed Relations   Average       S. D.        Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I           139.64 sec.   42.97 sec.   266.6 sec.                  82.88
Group II          131.66 sec.   32.97 sec.   227.2 sec.                  79.56

Test No. 10. Word Building

For a test that would indicate ingenuity and skill in the manipulation of letters and give a measure of the subject's command of vocabulary, the word building test was used.
The number of words written in a given time depends in part on whether the subject proceeds with a definite plan, combining, for example, "a" with all the other letters, then "e" with all the other letters, etc., or goes about the task in a vague or random fashion.

Materials: Sheet of paper at the top of which were written the letters a e i l p r.

Method of procedure: The procedure as given by Whipple 40 was followed with the exception that the time-limit was three minutes instead of five.

Method of scoring: The score represents the number of words written. A word was considered correct if it was included in Whipple's list of admitted words.

40 Whipple, G. M. Manual of Mental and Physical Tests. Part II, p. 275.

Results: Table X shows the results secured in this test.

TABLE X

Test No. 10                                Range
Word Building  Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        16.33     4.93    6.0                         27.2 words
Group II       16.23     4.52    6.4                         24.6

Test No. 11. Word Naming

This uncontrolled association test appears to be a good test for determining individual differences, the subjects tending to write words belonging to various categories. Such differences as the tendency to write series of rhymed words, to write a series of words that are grouped about one central idea, then to write another series of words grouped about a second central idea, suggested perhaps by the last word in the first series, etc., are revealed in this test. It also depends in part on the subject's speed of writing.

Materials: Stop watch. Sheet of paper and pencil.

Instructions as follows were given: "I am going to give you three minutes in which to write all the words you can. It makes no difference what sort of words they are; they can be anything you want to write."

Method of scoring: The score equals the number of words written.

Results: Table XI shows the results for this test.

TABLE XI

Test No. 11                                    Range
Word Naming    Average       S. D.         Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        67.14 words   12.78 words   40.8 words                  94.2
Group II       67.87         11.86         45.0                        93.0

Test No. 12. Knox Cube

This test gives an indication of the subject's power of observation, memory, and ability to concentrate her attention. It involves the ability to handle concrete objects and to imitate another's performance with accuracy.

Materials: Five one-inch cubes.

Method of procedure: Pintner's standardization of the Knox test was followed. Care was exercised to execute all movements slowly and deliberately and at a uniform rate.

Method of scoring: The score represents the number of lines correctly imitated.

Results: Results are indicated in Table XII.

TABLE XII

Test No. 12                                Range
Knox Cube      Average   S. D.   Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I        9.20      1.56    5.8                         11.4 lines
Group II       8.82      1.64    4.8                         12.0 lines

Test No. 13. Digit Span

To measure ability to reproduce with accuracy disconnected and non-logical material, the digit span test was employed. It tests the subject's power to concentrate her attention upon the series of digits as they are read aloud to her by the experimenter and so to retain the series in her mind that she may reproduce it with absolute accuracy immediately after the experimenter has ceased speaking. It affords an opportunity also to observe individual differences.

Materials: Digit Span blank. Stop watch.

Method of procedure: These instructions were given: "I am going to read some numbers to you and as soon as I have finished saying them, I want you to repeat them in exactly the same order." The smallest number of digits given was five. Three trials were given for each number. The attempt was made to repeat the numbers without rhythm.

Method of scoring: The score represents the highest number of digits correctly repeated two trials out of three.

Results: Table XIII indicates the results of this test.
TABLE XIII

Test No. 13                          Range (digits)
Digit Span       Average   S. D.    Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I          7.39      1.31     5                           10.2
Group II         7.67      1.29     5.2                         10.2

Test No. 14. Word Memory
Test No. 15. Logical Memory

Both of these tests call into play functions similar to those demanded in the digit span test. However, here the material to be reproduced has meaning, consisting in Test 14 of a series of concrete words and in Test 15 of a list of familiar proverbs.

Materials: Cards containing a list of 25 words and a list of 25 proverbs, respectively. Also two blanks containing 50 words and 50 proverbs, respectively. The cards and blanks were those employed by Edith Mulhall Achilles. 41

Method of procedure: Instructions were given as follows: "I am going to let you look at a list of words (or proverbs, as the case might be) for one minute, after which I am going to ask you to write as many of the words (or proverbs) as you remember." The subject was allowed one minute in which to write down the words she remembered and two minutes to write the proverbs. After recording the words remembered the subject was given a second list in which there were 25 words previously seen and 25 new words, and was asked to mark "y" all the words she recognized as having seen before and "n" those she thought she had not seen. Similar procedure was followed for the test with proverbs.

Method of scoring: For Recall the number of words or proverbs written constitutes the score. No account was taken of the order in which they were recalled, or of any false recollections recorded. In scoring Recognition this formula was employed to derive the score:

Score = 50 - (2 x number of errors), where 50 is the total number of words or proverbs.

Results: Tables XIV and XV indicate the results of these tests.

41 Achilles, Edith Mulhall. Archives of Psychology, No. 44, 1920.

TABLE XIV

Test No. 14                                       Range (words)
                             Average   S. D.     Poorest (Av. of lowest 5)   Best (Av. of best 5)
Word Memory Recollection
  Group I                    11.59     2.70      6.6                         17.4
  Group II                   10.91     2.79      6.2                         18.0
Word Memory Recognition
  Group I                    35.84     7.44      20.0                        47.2
  Group II                   35.07     8.33      14.8                        48.4

TABLE XV

Test No. 15                                       Range (proverbs)
                             Average   S. D.     Poorest (Av. of lowest 5)   Best (Av. of best 5)
Logical Memory Recollection
  Group I                    6.19      1.74      3.2                         9.6
  Group II                   6.50      1.76      [illegible]                 9.8
Logical Memory Recognition
  Group I                    36.75     8.95      17.2                        47.6
  Group II                   37.47     7.69      18.4                        48.4

Test No. 16. Substitution

For a test which would measure speed of learning new associations the Substitution test was employed. In this test a key is constantly referred to, and as the test proceeds it is gradually learned, the subject depending less and less upon it. Comparison between the time taken to complete the first and second halves of the blank gives a measure of the amount of time saved from learning the key.

Materials: Substitution test blank. The blank with 5 geometrical forms was used. Stop watch.

Method of procedure: The key was explained to the subject and then the blank was placed face downwards before her. She was instructed to turn over the Substitution blank at the signal "go," to begin with the first form and take each one as it came, going across the paper from left to right, and to write the proper number in each form according to the key at the top.

Method of scoring: Three scores were taken, representing the time for the first half of the blank, the second half, and the whole blank, respectively. Errors, being rare, were counted separately.

Results: The data for this test are found in Table XVI.

TABLE XVI

Test No. 16                                       Range (seconds)
                             Average   S. D.     Poorest     Best
                             (sec.)    (sec.)
Substitution 1st Half
  Group I                    64.33     9.69      87.68       46.8
  Group II                   66.68     12.14     97.60       46.0
Substitution 2nd Half
  Group I                    59.10     11.62     86.2        37.0
  Group II                   61.51     13.15     91.8        38.4
Substitution Whole
  Group I                    123.09    19.61     167.72      86.48
  Group II                   128.19    23.89     187.0       87.40

Test No. 17. Completion

For measuring correctness and facility in the use of words and readiness in perceiving and comprehending situations, and for affording some indication of creative ability, the Completion test was employed. To quote Trabue, "On the whole it will be found that ability to complete these sentences successfully is very closely related to what is usually called 'Language ability.'" 42

Materials: Trabue Language Scale A. Stop watch.

Method of procedure: The standard procedure suggested by Trabue was followed, a time-limit of four minutes being employed.

Method of scoring: In general, the method was to follow Dr. Trabue's scoring: "A score of 2 being given each sentence if perfectly completed, a score of 1 if almost but not quite perfectly completed, and a score of 0 if not attempted at all or if imperfectly done." A total of 48 points is the maximum score attainable in Scale A.

Results: Table XVII represents the performance of the freshmen in this test.

TABLE XVII

Test No. 17                          Range
Completion       Average   S. D.    Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I          36.08     4.33     26.8                        44.8
Group II         35.78     4.36     25.2                        44.4

42 Trabue. Completion-Test Language Scales.

Test No. 18. Information

To measure range of information and obtain some conception of the number and kind of objects known and the degree to which they are known, the information test was used. It tests the individual's knowledge rather than her ability.

Materials: The information test blank as specified in Whipple's Manual, containing 100 words and directions for marking them.

Method of procedure: The subject followed the directions at the top of the blank, marking each word with a certain letter which indicated the degree to which it was known to her. There was no time-limit in this test, the subject being allowed all the time she desired to finish the blank.
Method of scoring: The score represents the number of words marked "D," "E," "F," and "N," respectively. As no check was used in this test, the score probably shows over-estimation. The total score was obtained by assigning these values: D = 3; E = 2; F = 1; and N = 0, and taking their sum.

Results: The table following indicates the results of this test.

TABLE XVIII

                                                  Range
Test No. 18                Average       S. D.    Poorest       Best
Information D              21.47 words   9.71     3.6 words     41.6 words
Information E              13.70 words   6.16     3             28
Information F              14.81 words   6.43     1.8           26.2
Information N              50.01 words   10.35    69.6          29
Total Score: Group I       106.63        25.51    59.8          158.2
Total Score: Group II      104.71        26.79    55.4          161.8

Test No. 19. Vocabulary

This test merely indicates the number of words in the individual's vocabulary.

Materials: Vocabulary test blank as specified in Whipple's Manual. 43

Method of procedure: The subject was asked to follow the directions given at the top of the test blank and to mark the words carefully according to the directions.

Method of scoring: The score represents the number of words marked plus (+). This number indicates the vocabulary-index; the index, taken as a per cent, is multiplied into 28,000.

Results: Table XIX shows the results for this test.

TABLE XIX

Test No. 19                          Range (words)
Vocabulary       Average   S. D.    Poorest (Av. of lowest 5)   Best (Av. of best 5)
Group I          74.81     6.86     59.6                        86.6
Group II         73.90     7.60     59.4                        87.4

43 Op. cit., Vol. 2, p. 310.

SECTION V

NORMS OF PERFORMANCE AND THEIR PRACTICAL APPLICATION

To summarize the results of the preceding section, Table XX shows the norms of performance for the two hundred Barnard freshmen (Groups I and II) in all the various tests. The average, probable error, and range from the poorest to the best score are shown for each test.
To avoid misrepresentation of facts by undue weight being given extreme cases, the average of the five poorest scores is in each case taken as the poorest score, and the average of the five best scores as the best score.

The following table compares our results with those of other investigators who have employed some of these tests with freshmen. Only those cases are considered where the tests are identical and the method of scoring the same.

Test: Cancellation; Color Naming; Hard Directions; Opposites; Word Building; Digit Span; Information.
Barnard Norm, Bingham: 76.6 sec.; 48.3 sec.; 57.2 sec.; 56.2 sec.; 122.9 sec.; 50.9 sec.; 16.2 words; 7.53 digits; 7 digits; 20.4 words.
Kitson: 69.2 sec.
Other Investigators: Washburn, 153 sec.; 110.9 sec.; 52.6 sec.; 21.4 words; Sunne, 1 8 8.4 digits; Cattell, 7.6; Waugh, 24; King & M'Crory, 25; Smith, 10.9.

Figures 1 to 23, inclusive, show graphically the dispersion of measures about the average in the case of the Barnard freshmen. To secure uniformity and facilitate comparison, the charts are constructed with the average in each case as the mid-point and the scores expressed in terms of P.E. units from the average as a center. The P.E. was taken as the unit because it is a convenient and familiar measure. The vertical scale is also kept constant except in three tests where it is changed for reasons to be specified later.

Inspection of these figures reveals many interesting features. We may divide the tests roughly into five groups. 44 The first group contains the two motor tests, Coordination and Tapping.

44 Justification of this division of the tests will be given in Chapter VI.

[Table not legible.]

[FIGS. 1 to 16. Distributions of scores in P.E. units about the average (A): Fig. 1, Coordination; Fig. 2, Tapping; Fig. 3, Cancellation; Fig. 4, Number Checking; Fig. 5, Color Naming; Fig. 6, Directions; Fig. 7, Opposites; Fig. 8, Verb-Object; Fig. 9, Mixed Relations; Fig. 10, Word Building; Fig. 12, Knox Cube; Fig. 14, Word Memory Recollection; Fig. 15, Word Recognition; Fig. 16, Logical Memory Recollection.]

Here we have fairly uniform distributions. The actual range for Coordination is from -3½ P.E. to +5½ P.E. (skewed at the positive end), and for Tapping from -5½ P.E. to +7 P.E. But to take the actual range as the basis of our comparison is misleading. A clearer conception of the facts is obtained by noting the closeness with which the measures distribute themselves about the central tendency. In these two motor tests we find a fairly uniform distribution, suggesting that the tests are adequate for selecting good and poor subjects even in a group as homogeneous as college freshmen.

In the second group we may place those tests which involve powers of perception and comprehension, namely, Cancellation, Checking, Color Naming, Word Naming, and Substitution. Here again we find a distribution approximating the normal curve of distribution. At first glance it would appear that in four of these tests the curves are skewed toward the negative or poor end. In both Fig. 3 and Fig. 4 (Cancellation and Number Checking) we find a case at -7½ P.E.; in Fig.
5 (Color Naming) we find one at -7 P.E.; and in Figures 18, 19, and 20 (Substitution) we find cases at -9 P.E., -7 P.E., and -8 P.E.; while at the good end no case exceeds +4 P.E. We must take care, however, not to let these extreme cases mislead us as to the general character of the distribution. If we count up the cases on either side of the average we find 108 cases above the average in Cancellation, 109 in Number Checking, 106 in Color Naming, 107 in Substitution, and 98 in Word Naming. Thus we really have a more or less uniform distribution, with a tendency of the number of scores above the average to exceed the number below it. Disregarding the few extreme cases, we find the majority of the scores contained within the normal limits of the P.E. distribution (-4 P.E. to +4 P.E.).

In the third group we may place the tests involving associative relations, namely, Directions, Opposites, Verb-object, Mixed Relations, Word Building, and Completion. Here, likewise, as in the two preceding groups, we find fairly uniform distributions with a greater number of cases above than below the average (except in Word Building, where the distribution is about equal). The majority of cases are likewise contained within the normal range of 8 P.E., but there are a few extreme cases at the poor end in Completion, Opposites, Verb-object, and Mixed Relations, and an extreme case at both the good and bad end in the Word Building test.

The fourth group contains those tests which call into play powers of learning, viz., observation and retention; namely, Word Memory and Logical Memory.

[FIGS. 17 to 20. Distributions of scores in P.E. units about the average (A): Fig. 17, Logical Recognition; Fig. 18, Substitution, First Half; Fig. 19, Substitution, Second Half; Fig. 20, Substitution Whole.]

A word of explanation is needed here regarding the construction of the chart for Logical Memory (Recollection).
The categories into which the scores fall are so few that the finest grouping possible is in 1 P.E. units instead of ½ P.E. units as in the other tests. As we said before, to secure uniformity we let the P.E. represent the same interval along the base line in all tests. Now, in order to keep the area of a given number of cases constant for all tests, it is necessary, where we have scores in terms of 1 P.E. units, to reduce the vertical scale proportionately. Therefore, we regard the measures as distributed evenly over the P.E. intervals and reduce the vertical scale one-half. In this test and in Word Recollection we find a greater number of cases below the average than above. The curve is skewed toward the poor end in Word Recollection, and toward the good end in Word Recognition and Logical Recognition.

In our fifth group we have tests which depend on the subject's knowledge rather than her innate ability, namely, Information and Vocabulary. Here we find fairly uniform distributions with no extreme cases. This suggests the tendency of education to make a homogeneous group of individuals approach a general level of performance in a test of mere learning.

We have, finally, a miscellaneous group which comprises the Digit Span and Knox Cube tests, tests which showed both a low intercorrelation and low correlations with the other tests of the series. In the Knox Cube test the small number of categories makes it necessary to use 1 P.E. units, and in the Digit Span test it is necessary to use 2 P.E. units.

To sum up then, these surfaces of distribution are fairly symmetrical, if we disregard the few extreme cases. In addition, the fact that the averages and surfaces of distribution for the first group of one hundred freshmen (Group I) are approximately the same as for the second group of one hundred (Group II) corroborates this conclusion and supports the view that the norms here presented are reliable.
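The construction of these charts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample scores are hypothetical, and the bar height is taken as count divided by interval width, which is one way of keeping the area for a given number of cases constant. With that choice, passing from ½ P.E. intervals to 1 P.E. intervals halves the height contributed by each case, as described above.

```python
import math
from collections import Counter

# For a normal distribution the probable error is 0.6745 x the S.D.
PE_PER_SD = 0.6745


def pe_units(score, average, pe):
    """Deviation of a score from the average, expressed in P.E. units."""
    return (score - average) / pe


def histogram_in_pe(scores, average, pe, bin_width=0.5):
    """Group scores into intervals of `bin_width` P.E. units.

    Returns {lower edge of interval: bar height}. The height is
    count / bin_width, so that height x width always equals the
    number of cases in the interval (constant area per case).
    """
    counts = Counter(
        math.floor(pe_units(s, average, pe) / bin_width) for s in scores
    )
    return {k * bin_width: n / bin_width for k, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical scores, not taken from the tables of this study.
    scores = [9, 10.5, 11, 13]
    print(histogram_in_pe(scores, average=10, pe=2, bin_width=0.5))
    print(histogram_in_pe(scores, average=10, pe=2, bin_width=1.0))
```

The first call groups the cases in ½ P.E. intervals and the second in 1 P.E. intervals; in both the total area equals the number of cases.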
ACADEMIC GRADES

Besides their scores in the psychological tests we have additional information about the first group of one hundred freshmen (Group I) in the form of university grades and records taken in the gymnasium. The college subjects may be grouped into five classes: 1. Language (including English, Latin, Greek, German, French, Italian, and Spanish); 2. Mathematics; 3. Science (physics, chemistry, botany, or geology); 4. Philosophy (including psychology); and 5. History. Due to the freedom allowed the students in making out their programs, the same subjects are not taken by all, and the number of cases in each class therefore varies.

[FIGS. 21 to 23. Distributions of scores in P.E. units about the average (A): Fig. 21, Completion; Fig. 22, Information; Fig. 23, Vocabulary.]

The letter system of marking is employed at Barnard, the letters A (excellent), B (good), C (fair), D (poor), and F (failure) being used. For the statistical treatment of the data the letter grades were transformed into numbers according to the scale: A = 90, B = 80, C = 70, D = 60, and F = 50. Norms for these freshmen in their college work are shown in Table XXI.

TABLE XXI

Academic           Number                        Range (Actual)
Record             of Cases   Average   P. E.    Lowest   Highest
1. Language        97         75.31     4.69     50       90
2. Mathematics     88         76.99     6.99     50       90
3. Science         41         72.26     7.74     50       90
4. Philosophy      27         78.15     3.15     60       90
5. History         26         72.88     2.88     60       90

The averages tend to be approximately equal for all subjects, with a nearly equal range of distribution.

PHYSICAL MEASUREMENTS

Table XXII gives averages, P.E.'s, and range from poorest to best score of the physical measurements taken in the gymnasium.

TABLE XXII

                          Number                             Range (Actual)
Test                      of Cases   Average          P. E.  Poorest   Best
Height                    97         159.92 cm.       4.08   137       172.9
Weight                    97         120.59 lbs.      12.59  9         182
Lung Capacity             94         171.05 cu. cm.   13.50  118       230
Strength of Grip, r. h.   97         30.02 kg.        4.02   13        43
Strength of Grip, l. h.   97         27.27 kg.        4.27   16        38
Upper Back                97         20.60 kg.        3.4    12        42
Chest                     97         19.60 kg.        2.6    11        36

One of the main purposes of this investigation, as we remarked in a preceding section, was to give the individual student a knowledge of her strengths and weaknesses. Accordingly, at the completion of the entire series of examinations each year, an individual report was sent to each student who took the tests. This consisted of two blanks giving a description and interpretation of the various tests, with whatever significance each test was known to possess from a vocational standpoint. In addition to these explanatory blanks, there was a third blank which indicated the standing of the individual student in each of the tests, together with the average standing (with the P.E.) in each test for the entire group of one hundred freshmen, so that the individual could compare her own record with that of the average in every case.

[CHART 1. Psychographic record of A.M. (not legible).]

The ideal plan would have been for the experimenter, after sending each student her report, to have had a personal interview with her. In this she could have cleared up any difficulties the student might have had in interpreting her results and understanding their significance. She could also have rendered distinct aid by suggesting means whereby the student could make the best use of her abilities, or strengthen her weak points.
Where the girl was doing academic work of a grade below the level her test record showed her capable of, the experimenter could have sought to determine the cause of the girl's academic failure, whether due to too many distractions, outside work, or what not, and given advice accordingly. Lack of time made it impossible to do this, however. We therefore have no record of these girls in their last three years of college to show whether they benefited from their test results.

It is worth while at this point, nevertheless, to indicate how one may proceed to make practical use of these tests. Charts 1 to 6, inclusive, represent the psychographic records of six students from Group I. They are constructed as follows: Reading along the heavy horizontal base line, we have the names of the nineteen psychological tests (Substitution First Half and Substitution Second Half are omitted, since ability in this test is adequately measured by Substitution Whole), the academic subjects, varying from two to four according to the programs of study, and seven physical measurements. Opposite the name of each test, subject, and physical measurement is the individual's score, and below this, the amount of her plus or minus deviation from the average score, expressed in P.E. units. To make the individual's relative standing more concrete, her score in P.E. units is also expressed in terms of what her position would be in a group of one hundred freshmen selected at random. The vertical line (reading up from the base line) is divided into
equal divisions, indicating position in a group of one hundred freshmen selected at random, using the norms of Table XX as the basis. No. 1 is considered the poorest individual in each case, No. 100 the best. The heavy horizontal black line in the center represents the average individual, or the 50th individual in the group.

To illustrate the use of these charts let us consider Chart 1, A.M.'s record. In Coordination this individual scores 96. Referring to Table XX, we see that the average freshman score for this test is 83.42, with a P.E. of 7.5. A.M.'s deviation from the average score is, therefore, +12.58 (96 - 83.42) ÷ 7.5 (the P.E.), or +1.67 P.E. units above the average. We know from the normal curve of distribution that between the average and +1 P.E. are found 25% of the cases, or 25 cases in a group of one hundred individuals. Between +1 P.E. and +2 P.E. there are approximately 17% more cases, or 17 in a group of one hundred individuals, so that if a girl made a score of +2 P.E. she would rank 50 (average) + 25 + 17, or 92 in the group. A.M., however, does not quite reach this score. Her score reaches only .67 of the interval between +1 P.E. and +2 P.E., or .67 of the 17 cases contained within these limits. Now .67 x 17 = 11.39, i.e., A.M.'s score is that of the 11th individual in this group. This is only her approximate position, of course, since the scores are not distributed evenly over the interval. To secure her exact position we would transform her P.E. score into rank according to the proper table.
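The interpolation just described, and the exact conversion it approximates, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the figures of 25 and 17 cases per interval are those used in the text, and the values assumed for the third and fourth intervals (7 and 1) are our own assumption, chosen merely to complete the 50 cases on each side of the average. The exact rank relies on the standard relation that one P.E. equals 0.6745 S.D. in a normal distribution.

```python
from statistics import NormalDist

PE_PER_SD = 0.6745  # one probable error = 0.6745 standard deviations

# Cases (out of 100) in successive 1 P.E. intervals away from the average.
# 25 and 17 come from the text; 7 and 1 are ASSUMED to fill out the 50 cases.
CASES_PER_INTERVAL = [25, 17, 7, 1]


def approximate_rank(pe_dev):
    """Rank in a group of 100 by linear interpolation within P.E. intervals."""
    magnitude = abs(pe_dev)
    places = 0.0
    for i, cases in enumerate(CASES_PER_INTERVAL):
        # Portion of the i-th interval covered by the deviation (0 to 1).
        covered = min(max(magnitude - i, 0.0), 1.0)
        places += covered * cases
    places = round(places)
    return 50 + places if pe_dev >= 0 else 50 - places


def exact_rank(pe_dev):
    """Rank in a group of 100 read from the normal curve itself."""
    return 100 * NormalDist().cdf(pe_dev * PE_PER_SD)


if __name__ == "__main__":
    print(approximate_rank(1.67))   # Coordination, +1.67 P.E.
    print(approximate_rank(-0.15))  # a deviation of .15 P.E. below the average
    print(exact_rank(1.67))
```

For a deviation of +1.67 P.E. the interpolation gives rank 86, and for a deviation of .15 P.E. below the average it gives rank 46; the exact conversion differs from these by about one place or less.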
She therefore stands 50 + 25 + 11, or 86 in a group of one hundred freshmen in Coordination.

In Tapping her score is 368 taps. The average freshman score in this test is 372.4 taps, with a P.E. of 27.6. A.M.'s deviation from the average, accordingly, is 4.4 (372.4 - 368); her deviation in terms of P.E. is 4.4 ÷ 27.6 (the P.E.), or she is .15 P.E. units below the average. Her score therefore reaches .15 of the 25 cases in the interval between the average and -1 P.E. Now, .15 x 25 = 3.75. Her score therefore gives her a rank 3.75, or approximately 4, places below the average or 50th individual, i.e., she stands 46 in a group of one hundred freshmen. A similar method was employed in finding out the psychographic records of the other five students.

Considering the net scores in the psychological tests, A.M. ranked 97 in Group I, only three individuals surpassing her. When we group the tests under the five divisions suggested above, we see that although she would stand well above the average in a random group of one hundred freshmen, she makes her highest rank (88 average rank for [...]

5 Thorndike, E. L. Mental and Social Measurements.
[TABLE XXIII. Inter-test correlations (not legible).]

Inter-Test Correlations and their Significance

[...]tive, are low. This would indicate that we are testing here different mental abilities. The fact that we can group certain tests together on the basis of relationship shown by the correlation coefficients further supports this view. It is possible to find several groups of tests which correlate closely among themselves, but loosely with the other tests. The following table gives the various groupings with their correlations:

TABLE XXIV

GROUPING OF TESTS ON THE BASIS OF THEIR CORRELATION COEFFICIENTS

Group I. Coordination and Tapping. Correlation +.48 with each other.

Group II. Cancellation, Checking, Color Naming, Word Naming, Substitution.
    Average correlation of tests within group        +.32
    Cancellation with all others                      +.35
    Checking with all others                          +.36
    Color Naming with all others                      +.27
    Word Naming with all others                       +.34
    Substitution with all others                      +.30

Group III. Directions, Opposites, Verb-object, Mixed Relations, Word Building, and Completion.
    Average correlation of tests within group        +.32
    Directions with all others                        +.25
    Opposites with all others                         +.40
    Verb-object with all others                       +.31
    Mixed Relations with all others                   +.40
    Word Building with all others                     +.25
    Completion with all others                        +.30

Group IV. Word Recollection, Word Recognition, Logical Recollection, Logical Recognition.
    Average correlation of tests within group        +.38
    Word Recollection with all others                 +.39
    Word Recognition with all others                  +.37
    Logical Recollection with all others              +.40
    Logical Recognition with all others               +.35

Group V. Information and Vocabulary.

Miscellaneous: Digit Span, Knox Cube.

Thus Tapping and Coordination correlate +.48 with each other, but both tests show a much lower correlation with the other tests. (The correlations outside of the group range from +.33 to +.01).
This agrees with Thorndike's theory that tests of the motor sensory level correlate rather closely with each other, but only loosely with tests of other levels. In Group II, Checking has an average correlation of +.36 with the others of the group, and also a much lower correlation with tests outside Group II (ranging from +.30 to -.04). Similarly, in Group III, Opposites and Mixed Relations both have an average correlation of +.40 with the other tests in this group, but a lower correlation with any test outside the group, again conforming to Thorndike's contention that tests on the associative level correlate closely with each other, but rather loosely with tests on other levels. (The average correlation of Opposites with the tests outside Group III is +.15; the average correlation of Mixed Relations with tests outside Group III is +.10). In Group IV, also, Logical Recollection has an average correlation of +.40 with the other tests in the group, but a lower correlation with any test outside this group. (The correlations outside the group run from +.30 to +.01).

Information and Vocabulary differ from the other tests of the series in that they are indicative of one's learning rather than one's innate ability. There is only a chance correlation between them and the other tests. A more detailed discussion of this relationship we will postpone till the following section.

As for Knox Cube and Digit Span, perhaps the best plan is to consign them to the miscellaneous class. Knox Cube shows on the whole the closest correlations with the tests in Group II, but the average group correlation is not high enough to warrant our definitely placing it in this group rather than in Group IV. In like manner, aside from its surprisingly high correlation with Cancellation (+.77), Digit Span shows no close relationship with any other test.
If we omit these four tests (namely, Information, Vocabulary, Knox Cube, and Digit Span), we do get very definite groupings of the other tests, as shown in Table XXIV above, indicating that we are measuring different abilities. The rather high intercorrelations between the tests of each group, together with their low correlations with tests outside their own groups, would support this view.

There is no evidence from these results to support Spearman's theory that correlations are produced between all sorts of performance, the amount of the correlation being simply proportional to the extent that the performances concerned involve the use of a general common factor or "general ability." Our data give evidence neither of a common factor nor of a hierarchical arrangement of the correlations. Attempts to arrange the correlations to form a hierarchy met with even greater failure than Simpson reports.

The simplest and clearest way to explain the existing relationships between our tests seems, therefore, to arrange them in the groups indicated in Table XXIV, a grouping supported by the actual correlation coefficients. The tests within each group seem to be closely related to each other because they possess elements in common, elements serving to bind them closely to each other, but loosely to tests outside their own groups. Thus, Group I involves motor capacity and skill; Group II, powers of perception and comprehension; Group III, associational relations; Group IV, pure memory. Though there is some slight overlapping in the qualities called into play in the various groups, nevertheless it is not sufficient to spoil our classification.

[TABLE XXV. Inter-test correlations corrected for attenuation (not legible).]

Table XXV gives the inter-test correlations corrected for attenuation.
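As a hedged illustration of what such a correction involves, the sketch below applies Spearman's simple formula, in which the raw correlation between two tests is divided by the square root of the product of their reliability coefficients. Whether this exact form was the one used for Table XXV is an assumption on our part, as is the practice of limiting the result to 1.00; the function name is hypothetical. The figures in the usage example are the raw Cancellation and Digit Span correlation (+.77) and the reliability coefficients reported for those two tests in Table XXVII (+.60 and +.83).

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy, cap=True):
    """Spearman's simple correction for attenuation.

    r_xy : raw correlation between tests x and y
    r_xx : reliability coefficient of test x
    r_yy : reliability coefficient of test y
    cap  : ASSUMPTION, limit the result to the range -1.00 to +1.00
    """
    corrected = r_xy / math.sqrt(r_xx * r_yy)
    if cap:
        corrected = max(-1.0, min(1.0, corrected))
    return corrected


if __name__ == "__main__":
    # Cancellation with Digit Span: raw +.77, reliabilities +.60 and +.83.
    print(correct_for_attenuation(0.77, 0.60, 0.83))
    print(correct_for_attenuation(0.77, 0.60, 0.83, cap=False))
```

With these figures the uncorrected quotient exceeds unity, so that the capped value is 1.00.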
The correlations are all higher but show in general the same relationship. They range from + 1.00 (Cancellation and Digit Span; Word Recollection and Word Recognition; Word Recollection and Logical Recollection) to + .00 (Tapping and Word Recollection; Mixed Relations and Word Recollection). When the correlations are corrected for attenuation, Logical Recollection shows the highest average correlation (+ .39) with the other tests (omitting Information and Vocabulary). Then, in order, come Word Naming; Substitution, Word Recollection and Cancellation; Opposites, Verb-object and Word Building; Checking, Directions and Mixed Relations; Completion and Color Naming; Word Recognition and Logical Recognition; Coordination, Digit Span, Knox and Tapping.

The corrected coefficients of correlation also support the groupings of tests given in Table XXIV. It is possible to arrange the corrected correlations in the same groups as those given by the raw correlations. The corrected coefficients of correlation are higher than the raw correlations, but the relationship between the tests is similar.

To determine the reliability of the test scores, an investigation was conducted three years after the testing of the first group of one hundred freshmen (Group I). Two trials of the tests were given to a group of 45 freshmen during the period extending from March 14 to May 15, 1919, inclusive. The two trials occurred in every case on the same day and required approximately 45 minutes of the student's time. Table XXVI gives a list of the tests employed in the two trials.

The method of procedure in conducting these tests with the 45 freshmen was the same as that employed with the 200 freshmen in Groups I and II. Moreover, all the tests were conducted individually, just as was done in testing the freshmen in Groups I and II, and the room employed for the testing was the same as in the former investigations. Just as we found the averages and P.E.'s for the various tests to be approximately the same for both Groups I and II, so the norms for this group of 45 freshmen are approximately the same as those obtained for Groups I and II.

TABLE XXVI

1. Coordination: Trials 1 and 2 identical, same as with Groups I and II.
2. Tapping: Trials 1 and 2 identical, same as with Groups I and II.
3. Cancellation: First half of Woodworth-Wells blank used in Trial 1, and second half in Trial 2.
4. Checking: First half of Woodworth-Wells blank used in Trial 1, and second half in Trial 2.
5. Color Naming: Trials 1 and 2 identical.
6. Directions: Woodworth-Wells blank used in Trial 1; Wells' alternative form used in Trial 2.
7-9. Opposites, Verb-object, Mixed Relations: The first half of each of these Woodworth-Wells blanks was used in Trial 1, and the second half in Trial 2.
10. Word Building: Letters a e i l p r used in Trial 1 (same as in Groups I and II). Letters a e o b m t used in Trial 2.
11. Word Naming: Trials 1 and 2 identical.
12. Knox Cube: Trials 1 and 2 identical.
13. Digit Span: Trial 1 as in Groups I and II; equivalent form used in Trial 2.
14-17. Word Recollection, Word Recognition, Logical Recollection, Logical Recognition: Trial 1 the same as in Groups I and II; equivalent Mulhall form used in Trial 2.
18. Substitution: Given only once. (The closeness with which the correlations of the first half of the test with the other tests agreed with the correlations of the second half of the test with the other tests measures the reliability of this test.) The correlation between the score in the first half of the blank and the score in the second half of the blank was taken as the measure of reliability.
19. Completion: Given only once. The correlation between the score in the odd-numbered sentences and the score in the even-numbered sentences was taken as the measure of reliability.
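The split-half procedure described for Substitution and Completion in Table XXVI can be sketched as follows. This is an illustration only: the item scores are invented, the names `pearson_r` and `split_half_reliability` are our own, and the product-moment formula is assumed, since the text does not state how the coefficient was computed.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Correlate each subject's odd-item total with her even-item total.

    item_scores: one list per subject, one entry per item (1 = right, 0 = wrong).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: four subjects, six sentences each (not taken from the study).
hypothetical_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(hypothetical_scores), 2))
```

As in the text, the correlation between the two halves is taken directly as the measure of reliability; no further adjustment for the length of the test is described.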
Thus, since one group of Barnard freshmen appears very similar to any other group of Barnard freshmen selected at random, we may fairly assume that the coefficients of reliability secured with any one group will also be indicative of the relationship that would exist between two trials with any other group selected at random. If, then, we find the reliability of the tests high for this group of 45, it is fair to judge that it would have been equally high with the group of 100 freshmen (Group I), whose test scores were used in computing the correlations given in Table XXIII.

TABLE XXVII
CORRELATIONS BETWEEN TRIAL 1 AND TRIAL 2, GROUP OF 45 FRESHMEN

Test                                Correlation
 1. Coordination                       +.66
 2. Tapping                            +.77
 3. Cancellation                       +.60
 4. Checking                           +.88
 5. Color Naming                       +.88
 6. Directions                         +.76
 7. Opposites                          +.79
 8. Verb-object                        +.70
 9. Mixed Relations                    +.60
10. Word Building                      +.70
11. Word Naming                        +.71
12. Knox Cube                          +.69
13. Digit Span                         +.83
14. Word Memory Recollection           +.18
15. Word Memory Recognition            +.33
16. Logical Memory Recollection        +.48
17. Logical Memory Recognition         +.73
18. Substitution                       +.70
19. Completion                         +.77

Table XXVII shows the correlation between the first and second trial for each of the 19 psychological tests. With three exceptions, namely Word Recollection (+ .18), Word Recognition (+ .33), and Logical Recollection (+ .48), the correlations are high enough to indicate a high degree of reliability. These reliability correlations range from + .88 in the case of Checking and Color Naming to + .60 in the case of Cancellation and Mixed Relations. If we disregard Word Recollection, Word Recognition, and Logical Recollection on the ground that their low reliability coefficients suggest that their correlations with the other tests do not give us an exact measure of the existing relationship, we have remaining a series of 16 reliable tests.
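The reliability coefficients of Table XXVII are the quantities required for a correction for attenuation such as that reported in Table XXV. The text does not state the formula employed; the sketch below assumes Spearman's customary correction, in which the raw correlation is divided by the square root of the product of the two reliability coefficients. The function name and the capping of results at 1.00 are assumptions made for illustration. The raw value of + .77 for Cancellation and Digit Span is the one quoted in the General Summary.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy, cap=True):
    """Spearman's correction (assumed here): r_xy / sqrt(r_xx * r_yy).

    r_xy: raw correlation between two tests
    r_xx, r_yy: reliability coefficients of the two tests
    cap: if True, limit the result to the range -1.00 .. +1.00
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliability coefficients must be positive")
    corrected = r_xy / math.sqrt(r_xx * r_yy)
    if cap:
        corrected = max(-1.0, min(1.0, corrected))
    return corrected


# Cancellation (reliability .60) and Digit Span (reliability .83), raw r = .77.
uncapped = correct_for_attenuation(0.77, 0.60, 0.83, cap=False)
capped = correct_for_attenuation(0.77, 0.60, 0.83)
print(round(uncapped, 2), capped)

# A raw correlation of .00 remains .00 whatever the reliabilities.
print(correct_for_attenuation(0.00, 0.77, 0.18))
```

Under these assumptions the Cancellation and Digit Span pair exceeds unity before capping, which would be consistent with the + 1.00 entered for that pair in Table XXV.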
The inter-test correlations based upon the scores in these 16 tests are accurate indicators of the true relationship existing between these tests. Our conclusions drawn from these inter-test correlations are, moreover, strengthened by our knowledge that they are based upon reliable test scores which give an accurate measure of the freshman's ability in these tests.

SECTION VII

CORRELATIONS BETWEEN THE TESTS AND ACADEMIC MARKS

TESTS VERSUS MARKS AS MEASURES OF MENTAL ABILITY

The charts discussed in Section V showed that the freshman scores in the psychological tests were distributed according to the normal probability curve. Tables XXVIII to XXXII, inclusive, show the distribution for the five groups of academic marks, based on grades of freshmen in Group I.

               TABLE XXVIII   TABLE XXIX     TABLE XXX
               LANGUAGE       MATHEMATICS    SCIENCE
Grade          Frequency      Frequency      Frequency
F (50-60)          2              1              4
D (60-70)         14             14              6
C (70-80)         49             33             16
B (80-90)         30             30             12
A (90-100)         2             10              3

               TABLE XXXI     TABLE XXXII
               PHILOSOPHY     HISTORY
Grade          Frequency      Frequency
F (50-60)          0              0
D (60-70)          1              4
C (70-80)         10             16
B (80-90)         12              4
A (90-100)         4              2

Not only is there a coarse grouping (only five units) as compared with the fine grouping of scores in the various psychological tests (15 to 20 units), but the distributions fail to follow the normal error curve, which the test scores do follow. With the academic marks there is a decided skewing of the distribution curves toward the good or positive end. It seems as though instructors made a deliberate effort to avoid failing their students. As for the passing grades, inspection of the marks suggests that there is insufficient care in rating students according to their relative abilities in various courses.
Observation of the uniform surfaces of frequency obtained when these one hundred freshmen were given the twenty-three psychological tests, compared with the decidedly skewed distributions for the same students in academic marks, prepares us for correlation Tables XXXIII and XXXIV.

Table XXXIII shows the correlation between the scores of all the psychological tests (excluding Information) and the marks in each of the five academic groups for the freshmen in Group I. Language shows a fair positive correlation with Mixed Relations (+ .20), Word Building (+ .31), Completion (+ .30), and Vocabulary (+ .41), i.e., with the tests in which the language factor performs a significant role. Mathematics shows a fair positive correlation with Cancellation (+ .28) and Checking (+ .22), tests involving simple mathematical processes, and with Knox Cube (+ .24). Science shows positive correlations with Opposites (+ .33), Verb-object (+ .23), and Mixed Relations (+ .30), tests involving the higher thought processes needed in understanding the science courses given at Barnard; with Knox Cube (+ .34), a test involving powers of perception and observation which are necessary in scientific laboratory work; and with Logical Recollection (+ .21), which is also an important factor in scientific work. The correlations of Philosophy with Cancellation (+ .37), Word Naming (+ .29), Knox Cube (+ .28) and Digit Span (+ .22) are unexpected.

History shows positive correlations with Opposites (+ .30), Word Building (+ .24), Digit Span (+ .33), Logical Recollection (+ .40), and Substitution (+ .26), i.e., with the tests involving ability to memorize logical material and ability to perceive relationships between facts, two essentials for successful performance in the required first-year history course at Barnard.

In general, then, the five academic groups show positive correlation with tests which we would expect to correlate with them.

TABLE XXXIII
CORRELATIONS BETWEEN TESTS AND ACADEMIC RECORDS

                               Language  Mathematics  Science  Philosophy  History
Coordination                      .12       +.05        .03       +.03       +.15
Tapping                           .16       +.01       -.10       +.15       +.00
Cancellation                     +.14       +.28       +.04       +.37       +.10
Checking                          .01       +.22       +.06       +.10       +.02
Color Naming                     +.11       +.07       +.12        .07        .05
Directions                       +.03       -.10       -.03       -.22       +.13
Opposites                        +.17       -.01       +.33       +.01       +.30
Verb-object                      +.04       +.03       +.23       +.17        .05
Mixed Relations                  +.20       +.01       +.30       +.12       +.19
Word Building                    +.31       +.15       +.00        .17       +.24
Word Naming                      +.10       +.06       +.02       +.29       +.09
Knox Cube                        +.18       +.24       +.34       +.28       +.08
Digit Span                       +.19       +.19       +.05       +.22       +.33
Word Memory Recollection          .01        .23        .07        .27        .03
Word Memory Recognition          +.06       +.02       +.12       +.10       +.13
Logical Memory Recollection      +.13       +.13       +.21       -.03       +.40
Logical Memory Recognition        .03       +.06       +.03        .08       +.02
Substitution 1st Half            -.08       +.11       +.09       -.19       +.18
Substitution 2nd Half            -.05       +.08       +.06       -.14       +.26
Substitution Whole                .10       +.11       +.00        .19       +.14
Completion                       +.30       +.02       +.05       +.17       +.14
Vocabulary                       +.41       -.05       +.12       +.09       +.23

TABLE XXXIV
CORRELATION BETWEEN TESTS AND INTELLIGENCE QUOTIENT

                               Intelligence Quotient
Coordination                      +.18
Tapping                           +.17
Cancellation                      +.22
Checking                          +.20
Color Naming                      +.23
Directions                        +.20
Opposites                         +.24
Verb-object                       +.23
Mixed Relations                   +.20
Word Building                     +.22
Word Naming                       +.26
Knox Cube                         +.22
Digit Span                        +.16
Word Memory Recollection          +.14
Word Memory Recognition           +.17
Logical Memory Recollection       +.23
Logical Memory Recognition        +.18
Substitution First Half           +.27
Substitution 2nd Half             +.25
Substitution Whole                +.27
Completion                        +.21
Vocabulary                        +.03
Table XXXIV gives the correlations between the tests and the composite score of all the academic groups. The correlations are all positive, ranging from + .14 to + .27 (excluding Vocabulary), suggesting a positive relationship. They are, however, too low to be used for diagnostic purposes. Aside from a few correlations in Table XXXIII previously mentioned, the correlations between the various tests and each of the five academic groups are even less suitable for practical purposes.

In view of these low correlations and the wide variation in correlations obtained between tests and marks by other experimenters, the question arises: Do the academic marks or the psychological tests give the more reliable estimates of the student's mental ability? The present writer believes that the psychological tests give the more adequate measures. What meager experimental data there are relevant to this question of the reliability of school marks corroborate this view.

The skewed distributions in the case of the Barnard academic grades were indicated before, a fact which has been noted by investigators in the case of other institutions.⁴⁷ Professor Max Meyer,⁴⁸ making a statistical study of all the marks of forty instructors given during a period of five years at the University of Missouri, found a striking lack of uniformity in the standards of grading used. So striking was the non-uniformity that the college authorities were moved to establish a definite system of marking in 1908, with the aim of overcoming the tendency of the instructors to distribute grades according to personal opinion. Following Meyer, a study of the distribution of marks at the University of Wisconsin was made by Dearborn,⁴⁹ and of the marks at Harvard University and the University of California by Foster.⁵⁰ These, and studies made at the University of Chicago, Amherst College, and Columbia University, agreed in showing the same wide variation in the standards of grading employed by instructors. Aikins⁵¹ found a slight difference in the relative positions assigned to 17 students in a philosophy class by the students themselves on the basis of several ten-minute tests, and the positions he assigned them on the basis of four hour-long tests. Smith gives several plates, illustrating clearly the great discrepancies and marked lack of uniformity in marking systems at the University of Iowa.⁵² Zerbe, in a detailed study of the distribution of grades assigned for academic work and those assigned for shop work at the School of Applied Industries, Carnegie Institute of Technology, found that the grades as distributed for the shop work were based on a much lower standard than the grades assigned for the theoretical subjects.⁵³ He also observed a marked lack of conformity to a standard in the case of grades given by individual instructors.

⁴⁷ Kelly, in a monograph entitled "Teachers' Marks," has given a history of the standards of marking in elementary schools, high schools, and colleges.
⁴⁸ Meyer, Max. The Grading of Students. Science, 28; 243-252.
⁴⁹ Dearborn, W. F. School and University Grades.
⁵⁰ Foster, William T. Scientific vs. Personal Distribution of College Credits. Popular Science Monthly, 78; 378-408.
⁵¹ Aikins, H. A. The Reliability of "Marks." Science, N. S., 1910, 32; 18-19.
⁵² Smith, A. G. A Rational College Marking System. Journ. of Educ. Psychol., 1911, 2; 383-393.
⁵³ Zerbe, J. L. Distribution of Grades. Journ. of Educ. Psychol., 1917, 9; 575-588.

When Jones⁵⁴ gave an opposites test and a memory test to each of two elementary psychology classes, taught by different instructors, he obtained these interesting results:

                                 Instructor "A"    Instructor "B"
                                 (28 students)     (33 students)
Class standing and opposites          .09               .49
Class standing and memory             .44               .07

These correlations were explained when further investigation revealed that instructor A taught by the outline method, emphasizing the memory factor, whereas instructor B discouraged verbatim statements taken from the text book. Both instructors were teaching the same subject, but assigning grades according to entirely different standards.

After an exhaustive study of the question at Harvard and other institutions, President Foster of Reed College concluded that "Not only are there extreme variations among different courses, but there are variations in the same course from year to year that cannot be accounted for, apparently, by any of our scientific studies in the distribution of abilities among human beings. From Maine to California the administration of college credits, although alike in no other particular, agrees in this: That its basis is personal rather than scientific."⁵⁵

Recognition of this personal equation factor has led Smith, Weiss,⁵⁶ Zerbe, Foster, Starch, and other investigators to emphasize the need of a uniform system of grading. They agree, moreover, in maintaining that the distribution of college grades, when properly assigned, should conform to the normal probability curve. In 1914, a committee on standardizing grades at George Washington University made a similar proposal. Definite attempts to enforce such systems of marking are now being made at the University of Missouri, Reed College, and other institutions.

⁵⁴ Jones, E. S. A Suggestion for Teacher Measurement. School and Society, 1917, 6; 321-322.
⁵⁵ Zerbe, J. L. Distribution of Grades.
⁵⁶ Weiss, A. P. School Grades: To what Type of Distribution shall they Conform? Science, 1912, 36; 403-407.

Even in a more restricted and more objective situation, when instructors are asked to assign grades according to performance in a definite task, as for example in a written examination paper, there is great variability due to the widely different subjective standards employed by the teachers in judging.⁵⁷ Jacoby found a variation of 1.5 points out of 10 in the grades of six professors of astronomy in marking eleven astronomy papers.⁵⁸ Starch and Elliott had facsimile reproductions made of two first-year English papers and a geometry paper, printed on the same kind of paper the students had written them on.⁵⁹ These they then had rated by 142 high school teachers of these two subjects. The English papers were also rated by a class in the Teaching of English in the University of Wisconsin and by a Summer School class of teachers in the University of Chicago. They found that the grades assigned to the two English papers by 142 English teachers ranged in the case of one paper from 64 to 98, with a probable error of 4.0, and in the case of the other from 50 to 98, with a probable error of 4.8. The grades of the mathematics paper assigned by 118 mathematics teachers ranged from 28 to 92, with a probable error of 7.5 points.⁶⁰ In a later investigation Starch had ten college freshman English papers graded independently by ten instructors of the various sections of freshman English.⁶¹ He found as wide a range of marks as he obtained with the English and Mathematics papers of his former investigation. Moreover, when ten papers were regraded by the same instructor after a certain interval of time, Starch found an average difference between the first and second grading of 4.4 points. He also found a mean variation of the grades assigned by teachers in different schools of 5.4 points, by teachers in the same department and institution of 5.3 points, and of grades assigned at different times by the same teachers to their own papers of 2.2 points.
On the basis of all his data, he concluded that the best marking scale is 100, 95, 90, 85, 80, etc., and that the distribution of grades should follow the probability curve. All the studies thus far made in this field indicate this same variation in standards of grading.

⁵⁷ For illustrations of the variability of Civil Service examiners in rating the same papers, and of the variation between the marks of teachers in New York State on the one hand and the Regents on the other, see Kelly's monograph.
⁵⁸ Jacoby, H. The Marking System in the Astronomical Course at Columbia College, 1900-1910. Science, 31; 819.
⁵⁹ Starch and Elliott. Reliability of Grading High School Work in English. School Review, September, 1912.
⁶⁰ Starch and Elliott. School Review, 21; 254-259.
⁶¹ Starch, D. The Reliability and Distribution of Grades. Science, 1913, 38; 630-636.

There are, moreover, additional factors which render school marks absolutely unreliable measures of a student's mental ability, and cause low correlations between psychological tests and marks. James, from work done at Whitewater Normal School, gives these three reasons for the low correlations obtained by him:⁶²

"1. The reluctance of nearly all teachers, and their inability because of the limitations of our poor rating methods, to rate the good students as high as they should be rated, or the poor ones as low as they should be rated."

"2. The rather closer application to their studies made by the less able, due to greater anxiety and more time at their disposal."

"3. The easy-going satisfaction displayed by many able minds content with what is for them mediocre accomplishment, and the greater drain on their time imposed by fellow-students for outside activities of all kinds."
From data obtained from a questionnaire sent to 127 delinquent college freshmen and to their high school principals, Miner concluded that such traits as "lack of purpose, laziness, and lack of resistance to social and other distractions" often explain a student's failure in school work.⁶³ Their marks in such cases are unreliable measures of their ability. Scott manifested agreement with Miner when he stated that: "Where students stood high in the tests, but low or medium in estimates, their failure to succeed in class work was usually due to laziness, timidity, or disgust for the idea of struggling for marks."⁶⁴

⁶² James, B. B. Mutual Correlations of Intelligence, Scholarship, and Vocabulary. School & Society, 1919, 9; 427. In School & Society, 1918, 7; 238-239, James gives similar factors as influencing the correlations between marks and tests.
⁶³ Miner, J. B. The College Laggard. Journ. of Educ. Psychol., 1910, 1; 263-271.
⁶⁴ Scott, C. A. General Intelligence or "School Brightness." Journ. of Educ. Psychol., 1913, 4; 500-524.

Abundant statistical evidence, therefore, supports our contention that college marks are totally inadequate measures of students' ability. The contributing factors are: the striking lack of uniformity in standards of grading among instructors, making for skewed distributions of marks; the differences in grades assigned the same paper by teachers at different times; the personal equation in marking; the tendency of many able students to neglect studies for outside distractions and of poorer students to apply themselves more assiduously; and the role played by such factors as lack of purpose or incentive, interest in outside or in college activities, economic pressure causing students to devote much time to earning money, etc. All these factors are influential, moreover, in making Barnard marks as unreliable as marks given in other colleges. No attempt is made by Barnard instructors to distribute their grades according to the normal probability curve. Absolute freedom is permitted the teachers. As a result, the personal bias of the teachers plays a large part in the marks received by students. This, combined with the contributory causes above mentioned, renders Barnard marks untrustworthy.

The psychological tests, on the other hand, have much to recommend them as giving reliable estimates of freshmen's mental ability. All the tests employed are standard tests. They were, moreover, administered by one experimenter according to a carefully standardized method of procedure. All conditions were kept constant: the place of testing, the attitude of the experimenter, the method of conducting the tests, and the method of scoring. Every student undertook the examination with a determination to do her level best. Whereas, in school subjects, lack of interest or incentive often caused a girl to do a lower grade of work than she was mentally capable of doing, here there was a definite incentive impelling her to exert maximum effort. Each freshman expected to receive vocational guidance based on her test scores. She accordingly took the psychological test at an hour convenient for her when she was feeling in good condition. Genuine interest in the tests (noted in the case of all students), coupled with a keen desire to make a favorable record, renders their test scores reliable estimates of their ability. The fact that the scores conform to normal distribution curves further indicates the reliability of these measures.

We do not claim, however, that we can predict a student's future success in college from her psychological test record. The psychological examination gives an adequate measure of what each freshman can do. From it we can make an authentic psychograph of her mental abilities, indicating in which processes she is strong, and in which she is weak.
Whether she will make high academic grades or attain success in later life depends not only upon her mental capacity, but upon such other factors as interest, incentive, will-power, economic stress, environmental conditions, etc. The tests, not her academic marks, measure her mental capacity; to predict her future performance in school or her success in a particular vocation, we must also consider these other factors.

SECTION VIII

CORRELATIONS BETWEEN PSYCHOLOGICAL TESTS AND PHYSICAL MEASUREMENTS. THEIR SIGNIFICANCE

There is one further problem to be considered: the relation existing between the psychological tests and the physical measurements. The correlations shown in Table XXXV, based on the records of the one hundred freshmen in Group I, furnish an important contribution to our existing meagre data on this subject.

Most investigators who have hitherto reported correlations between physical traits and mental ability have used school marks or teachers' estimates as indicators of mental ability. Their subjects, moreover, have been school children. Porter, Smedley, De Buck, MacDonald, Gilbert, Baldwin, Pyle, King, Arnold, Wilson, and Schuyten are some of the chief workers in this field. Widely varying results have been reported, some experimenters finding positive correlations between physical traits and school progress, others negative, and still others indifferent or zero correlations. Discussing the significance of these varying correlations, Whipple says: "The trend of evidence is to the effect that all such correlations, where found, are largely explicable as phenomena of growth, i.e., as correlations with relative maturity. This makes intelligible the fact that, in general, the positiveness of all such correlations lessens with age, and that many of them, indeed, become difficult or impossible of demonstration in adults."⁶⁵

Of the investigations in which adults have been used as subjects, the work of Dr.
Karl Pearson is perhaps the most extensive. He made measurements of 1,000 Cambridge University students, obtaining these correlations:

Mental ability and dolichocephaly     +.03 ± .03
Mental ability and short heads         .08 ± .03
Mental ability and broad heads        +.04 ± .03

His method of rating his subjects for mental ability was extremely rough, consisting merely in grouping the men into two big classes, pass men and honor men. Similar correlations obtained by Pearson between head measurements and mental ability as measured by teachers in the case of 1856 school boys twelve years of age led Galton to conclude "that there is no marked correlation between ability and shape or size of the head."⁶⁶ In another investigation with Cambridge students, Pearson found zero correlations between mental ability, determined roughly as indicated above, and strength of pull, strength of squeeze, long sight, weight, and ratio of weight to stature.⁶⁷ Continued testing of Cambridge students and school children led Pearson to conclude in 1906 that "The results (of our investigations) confirm the previous conclusion that: While there exists a slight but sensible relation between size of head and intelligence, there is no possibility of using this relation to make even rough individual predictions."⁶⁸

⁶⁵ Whipple, G. M. Manual of Mental and Physical Tests. Part I, p. 71.

These investigations, although interesting, have no direct bearing upon our problem, which is concerned with the relationship existing between the performance of college freshmen in psychological tests and their physical measurements taken in the gymnasium. We have good reason to feel that these physical measurements are fully as reliable and accurate estimates as are the psychological test scores. The physical examinations were all conducted in the Thompson Gymnasium of Teachers College.
They were given individually, the head of the Department of Physical Education of Barnard College making all the measurements. These were then immediately recorded on the student's physical record card by an assistant. Thus any inaccuracy in taking the measurements would be a constant one, and would not disturb the relative ranking of the freshmen. Experimental conditions were as uniform as in the case of the psychological tests. Each girl came to the gymnasium at an hour convenient for her and went through all parts of the examination according to a standardized method of procedure. No clothing was worn during the examination, save for two light cloth flaps which were fastened loosely about the shoulders by means of a draw string and two similar flaps fastened about the waist which could easily be raised in taking measurements. These were provided by the physical director for the occasion.

⁶⁶ Pearson, K. On the Correlation of Intellectual Ability with the Size and Shape of the Head. Proc. Roy. Soc., 1902, LXIX, 333-342.
⁶⁷ Lee, A., Lewenz, M. A., and Pearson, K. On the Correlation of the Mental and the Physical Characters in Man. II. Proc. Roy. Soc., 1902, LXXI, 106-114.
⁶⁸ Pearson, K. On the Relationship of Intelligence to Size and Shape of Head, and to other Physical and Mental Characters. Biometrika, 1906, 5; 105-146.

The physical records taken were: height, measured in centimeters with a stadiometer; weight, measured in pounds with the Fairbanks scale; lung capacity, measured in cubic centimeters; and four strength tests (grip of right and left hand, upper back, and chest), measured in kilograms with a dynamometer. The norms for these measurements obtained for these one hundred freshmen were given in Section V. The curves of distribution for these seven measurements (which lack of space prevents us from printing) conform approximately to the normal probability curve.
The subjects, moreover, with a very few exceptions, were all eighteen years of age or over, so that the factor of relative maturity does not affect the correlations. The freshmen are a rather homogeneous group with respect to age. These facts, coupled with the accuracy of both the physical and psychological measures, give us good reason to believe in the reliability of the correlations in Table XXXV.

It is interesting to note that six of the seven physical measurements, all except lung capacity, manifest zero or chance correlations with all the psychological tests. The average correlation of each of these six measures with all the psychological tests is as follows: height, + .05; weight, + .06; strength of grip, right hand, + .04; strength of grip, left hand, + .02; strength of upper back, + .02; and strength of chest, + .05. As these correlations are all less than the probable error (.068), they indicate clearly that there is no connection between these physical measurements and a freshman's mental ability as indicated by her psychological test records. In the case of lung capacity, all the correlations (except with Vocabulary) are positive. They are markedly low, though, the average correlation between lung capacity and all the psychological tests being only + .10. This is little more than the probable error, indicating the existence of only a chance relationship.

The uniformity of the single correlations in exhibiting this tendency toward chance relationship is significant. In only eight cases out of the total number of 154 correlations are there correlations of + .20 or over; in fact, we might say in only six cases, since the correlations between Substitution First-half and lung capacity (+ .20) and between Substitution Second-half and lung capacity (+ .26) duplicate information yielded by the correlation between Substitution Whole and lung capacity (+ .24).
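The comparison of each average correlation with its probable error can be sketched as follows. The text does not give the formula behind the figure of .068; the sketch assumes the conventional expression P.E. = .6745 (1 - r²) / √N, with N = 100 freshmen, and the function names are illustrative only. The averages are those quoted in the text.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a correlation (assumed formula): .6745 (1 - r^2) / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def exceeds_probable_error(r, n, multiple=1.0):
    """True if |r| is larger than the chosen multiple of its probable error."""
    return abs(r) > multiple * probable_error_of_r(r, n)


# Average correlations with all the psychological tests, as quoted in the text.
averages = {
    "height": 0.05,
    "weight": 0.06,
    "lung capacity": 0.10,
    "grip, right hand": 0.04,
    "grip, left hand": 0.02,
    "upper back": 0.02,
    "chest": 0.05,
}

for measure, r in averages.items():
    pe = probable_error_of_r(r, 100)
    print(f"{measure:18s} r = {r:+.2f}  P.E. = {pe:.3f}  exceeds P.E.: {exceeds_probable_error(r, 100)}")
```

With r near zero and N = 100 this expression gives about .067, close to the .068 quoted above. Under the same assumption only the lung capacity average exceeds its probable error, and then by a small margin.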
The highest correlation is only + .26 (Substitution Second-half and lung capacity), which is too low to admit of use for diagnostic purposes. With these few exceptions, all the correlations between physical measurements and the tests, 146 correlations in all, show approximately zero relationship. The large number of these correlations justifies us in concluding that the relationship between the physical measures and the tests is one of chance only.

TABLE XXXV
CORRELATIONS BETWEEN TESTS AND PHYSICAL MEASUREMENTS

                                     Height  Weight  Lung cap.  Grip R.  Grip L.  Upper back  Chest
 1. Coordination . . .
 2. Tapping . . .
 3. Cancellation . . .
 4. Checking . . .
 5. Color Naming . . .
 6. Directions . . .
 7. Opposites . . .
 8. Verb-object . . .
 9. Mixed Relations . . .
10. Word Building . . .
11. Word Naming . . .
12. Knox Cube . . .
13. Digit Span . . .
14. Word Memory Recollection . . .
15. Word Memory Recognition . . .
16. Logical Memory Recollection       +.18    +.22     +.22      -.05     +.01      +.10      +.09
17. Logical Memory Recognition        +.09    +.11     +.14      -.05     +.04      -.01      +.05
18. Substitution First Half           +.19    +.05     +.20      +.02     +.07       .00      +.02
19. Substitution 2nd Half             +.17    -.02     +.26      +.05     +.07      +.09      +.02
20. Substitution Whole                +.19    +.00     +.24      +.04     +.07      +.06      +.01
21. Completion                        -.12     .05     +.04      -.02      .05      +.02       .03
22. Vocabulary                        -.02    +.07     -.17      +.07      .05       .04       .15
    Average                           +.05    +.06     +.10      +.04     +.01      +.01      +.04

It is interesting to know that the only other experimenter who has reported the results of a similar study with college freshmen supports this view.
Although Wissler, in his study of the results of the old Columbia freshman tests, reports only two correlations between the physical tests and the psychological tests, namely, a correlation between length of head and logical memory of + .21, and between breadth of head and logical memory of - .05, the observation of the records of freshmen in other physical tests compared with their records in the psychological tests led Wissler to conclude: "That the physical tests show a general tendency to correlate among themselves, but only to a very slight degree with the mental tests."⁶⁹

Although the physical measurements exhibit only a chance connection with a freshman's psychological test score, they should be taken into consideration by an instructor or advisor whose duty it is to give guidance to a student in planning her college course. In Section V we pointed out the case of a freshman (Chart 3, G.S.), whose net score in the psychological examination was well above the average freshman record, but whose standing in academic work was in the lowest quintile of the class. The fact that she made the best record in the class in the physical measurements, together with the information we later acquired concerning her athletic activities, explained her academic failure. The more varied measures of a student we have, the better qualified we will be to make an adequate psychograph of a student's relative abilities and disabilities in various lines.

⁶⁹ Wissler, Clark. Psychological Review Monograph Supplement, June, 1901.
GENERAL SUMMARY OF THE RESULTS WITH SUGGESTIONS FOR THE PRACTICAL USE OF THE TESTS

A series of nineteen psychological tests was given to two groups of one hundred Barnard freshmen each, with the aim, first, of establishing norms and standards of performance and giving students a clear conception of their abilities and aptitudes along various lines and, second, of determining the reliability of the tests and their correlations with freshman university grades and physical measurements. All the tests were given individually according to a standardized method of procedure and under standard conditions.

The averages and surfaces of distribution for the first group of one hundred freshmen (Group I) are approximately the same as for the second group of one hundred (Group II) and for a third group of forty-five freshmen, showing that Barnard freshmen are a homogeneous group, differing little from year to year.

The inter-test correlations range from + .77 (between Cancellation and Digit Span) to .00 (between Tapping and Word Recollection and between Mixed Relations and Word Recollection). The positive correlations between Cancellation and the other tests (+ .03 to + .77) contradict the old compensation theory. The fact that the correlations are all positive is suggestive of a definite relationship between Cancellation and these various tests.

Checking and Word Naming show the highest average correlation (+ .25) with the other tests (omitting Information, Vocabulary, Word Recollection and Word Recognition); then, in order, Opposites; Verb-object and Cancellation; Color Naming; Directions, Mixed Relations, Word Building, and Completion; Logical Recollection and Substitution Whole; Knox; Tapping and Digit Span; Coordination; Logical Recognition.

On the whole, the inter-test correlations, although mostly positive, are low, indicating that we are testing different mental abilities.
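The ranking of tests by their average correlation with the other tests may be illustrated by a minimal sketch in Python. The test names and coefficients below are hypothetical and are not taken from the tables of this study; the function name average_correlations is likewise an assumption made only for illustration.

```python
def average_correlations(names, matrix):
    """Return each test's mean correlation with all the other tests.

    `matrix` is a square, symmetric table of inter-test correlations;
    the diagonal (a test with itself) is left out of the average.
    """
    n = len(names)
    result = {}
    for i, name in enumerate(names):
        others = [matrix[i][j] for j in range(n) if j != i]
        result[name] = sum(others) / len(others)
    return result


# Hypothetical coefficients for three unnamed tests (not data from this study).
names = ["Test A", "Test B", "Test C"]
matrix = [
    [1.00, 0.30, 0.10],
    [0.30, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]

averages = average_correlations(names, matrix)

# Order the tests from highest to lowest average correlation.
ranked = sorted(averages, key=averages.get, reverse=True)
for name in ranked:
    print(name, round(averages[name], 2))
```

A test standing high in such a list shares more with the series as a whole; one standing low is measuring something comparatively distinct.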
On the basis of the relationship shown by the correlation coefficients we may divide the tests into six groups: (1) motor tests (Coordination and Tapping); (2) tests involving powers of perception and comprehension (Cancellation, Checking, Color Naming, Word Naming and Substitution); (3) tests involving associative relations (Directions, Opposites, Verb-object, Mixed Relations, Word Building and Completion); (4) tests which call into play powers of learning, viz., observation and retention (Word Memory and Logical Memory); (5) tests depending on the subject's knowledge more than on her innate ability (Information and Vocabulary); (6) a miscellaneous group (Digit Span and Knox Cube). There is only a chance correlation between Information and Vocabulary and the other tests. With the exception of this group and of Digit Span and Knox Cube, the remaining groups of tests correlate closely among themselves but loosely with the other tests. There is no evidence from these results of a general common factor nor of a hierarchical arrangement of the correlations. The tests within each group seem to be closely related to each other because they possess elements in common, elements serving to bind them closely to each other but loosely to tests outside their own groups.

The coefficients of correlation corrected for attenuation are considerably higher than the raw correlations but show in general the same relationships.

The coefficients of reliability are low for Word Recollection (+ .18), Word Recognition (+ .33) and Logical Recollection (+ .48). For the other tests they range from + .88 (Checking and Color Naming) to + .60 (Cancellation and Mixed Relations). We have, thus, a series of sixteen reliable tests. Inter-test correlations based upon the scores in these sixteen tests are accurate indicators of the true relationship existing between these tests.
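Why corrected coefficients run higher than raw ones may be seen from a minimal sketch in Python. It assumes the familiar Spearman form of the correction, in which the raw correlation is divided by the square root of the product of the two reliabilities; the summary does not state which form was actually applied. The raw correlation of + .25 is a hypothetical figure; the reliabilities + .88 and + .60 are the extremes reported above for the sixteen reliable tests, here paired only for illustration.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation (assumed form).

    r_xy : raw correlation between tests X and Y
    r_xx : reliability coefficient of test X
    r_yy : reliability coefficient of test Y
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliability coefficients must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical raw correlation; reliabilities are the extremes quoted above.
raw = 0.25
corrected = correct_for_attenuation(raw, 0.88, 0.60)
print(round(corrected, 2))
```

Since reliabilities are less than unity, the divisor is less than unity and the corrected coefficient exceeds the raw one. With a reliability as low as + .18 the divisor becomes very small and the correction correspondingly untrustworthy, which is one reason for setting the unreliable tests aside.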
The psychological tests show low correlations both with each of five academic groups, (1) Language, (2) Mathematics, (3) Science, (4) Philosophy and (5) History, and with the composite score of all the academic marks (+ .14 to + .27). Lack of uniformity in standards of grading among instructors, causing skewed distribution curves of marks, the personal equation in marking, and the role played by such factors as lack of incentive, interest in outside or college activities, economic pressure, etc., make college marks inadequate measures of the students' ability. There is evidence that the psychological tests give a true estimate of each freshman's mental capacity. To predict her performance in school or in a future vocation, both her capacity and such other factors as interest, incentive, will-power, environmental conditions, etc., must be considered.

The correlations between the physical measurements and the psychological tests show approximately zero or chance relationship.

Psychographic charts may be constructed, showing each student her relative rank in the tests, academic grades and physical measurements. Such psychographs may be put to practical use, as, for example, in cases where a student is doing academic work of a grade below the level of which her test record showed her capable.

The results of this investigation make it possible to offer a few tentative suggestions to college administrators who desire to institute a system of student guidance. The first step in such a plan might well be to put each member of the freshman class through a thorough physical examination to determine her physical fitness for undertaking college work. This examination should be made by the director of the Physical Education department or a competent assistant in the department. Students with correctible physical defects should be given proper treatment (eyeglasses, special physical exercises or what not) according to their needs.
Those suffering from a slightly run-down condition might be advised to take a light program until they regained their normal condition; those too far below par might be advised not to enter college.

The second step might be to obtain an estimate of each freshman's mental capacity on the basis of her score in a psychological examination. A psychologist (who might also act as vocational advisor) with an assistant might well be in charge of this work. If possible, each freshman should be tested individually, the same experimenter conducting all the tests according to a standard method of procedure. As for the particular tests to be used, they should be varied in character, adapted to measure various mental abilities. A series that may be divided into several groups, each group testing a rather definite mental ability, and such that tests within each group correlate highly among themselves but loosely with all tests outside their own group, as in the present investigation, perhaps represents the ideal type of tests. The particular series of tests employed in this study is not, however, recommended as the best series of tests that might be used. It is very probable that a series could be found that will test more significant mental abilities and such that the tests within each group will correlate more closely with each other and more loosely with other tests. Only by empirically trying out different series can the ideal series be found.

Where lack of time or the size of the freshman class makes it impossible to test each freshman individually, a comprehensive group test that has been found successful (as, for example, the Army Alpha or the Thorndike Group test) may be employed. In view of the successful results secured with these group tests and the speed with which they may be administered, it may well be that such a comprehensive group test as the Thorndike test would be the best to employ.
In the case of students who barely passed or who failed in this group test, such a series of tests as that used in the present investigation might be used to supplement the results of the group test. It would seem that a group test which might be supplemented, where necessary, by an individual examination would be the ideal arrangement.

As we stated before, a psychologist and an assistant should preferably be in charge of the psychological testing. Perhaps a group of fifteen to twenty persons with some experience in scoring psychological tests might be employed to score the tests immediately after the psychologist has given them. In this way the examinations might be easily scored within three or four days and the reports made out for each student very soon after. The results of the psychological examination and the physical examination, together with the student's academic entrance record, might then be submitted to the psychologist or vocational advisor. On the basis of these records, psychographic charts might be made out for each student indicating her strengths and weaknesses. The vocational advisor might then have an immediate interview with those students who showed any marked disabilities. In this personal conference the advisor might try to obtain from the student pertinent information concerning her interests, economic status, environmental conditions, etc. All these supplementary items of information would then enable him to form a comprehensive idea of the student's mental and moral calibre. With this as a basis, vocational advice could be given the student regarding her choice of subjects, study habits, participation in extra-curricular activities, etc. Perhaps such students might be asked to report at stated intervals for further conference. Much the same procedure might be followed with the other students, except that here fewer conferences would be necessary.
The advisor should be free to devote all his time to supervising the academic career of the students and to rendering needed advice. Obviously such a man should be a psychologist with both ability to interpret the various measures secured of each student's ability and tact in persuading students to follow his suggestions. From the attempts that have thus far been made in certain institutions to guide students' academic careers, it seems probable that, with an able vocational advisor aided by a competent assistant, such a system would be a distinct help in stimulating students to exert maximum effort in doing their college work.

VITA

The writer was born in Norwich, Chenango County, New York, on October 1, 1895. She graduated from Girls' High School, Brooklyn. In 1916 she received the degree of Bachelor of Arts from Barnard College, and in 1917 the degree of Master of Arts from Columbia University. Her work in psychology has been done under Professors R. S. Woodworth, H. L. Hollingworth, L. S. Hollingworth, E. L. Thorndike and A. T. Poffenberger. She was assistant in psychology in Columbia University, 1918-20; assistant in Summer School, 1918, 1919, and 1920; assistant and instructor, Extension Teaching, 1918-19; and instructor, Extension Teaching, 1920.