THE NORMAL SCHOOL
 QUARTERLY 
 
 Series 14 January, 1916 Number 58 
 
Standards Employd in the Determination of Teaching Efficiency
 
 
 By 
 
 EDWIN A. TURNER 
 
PUBLISHT JANUARY, APRIL, JULY, AND OCTOBER OF EACH YEAR
BY THE ILLINOIS STATE NORMAL UNIVERSITY, NORMAL, ILLINOIS
 
 Enterd August, 1902, at Normal, Illinois, as second-class mail matter 
 under Act of Congress of July 16, 1894 
 
N. B. Any teacher in Illinois may get the Normal School Quarterly regularly
by sending exact name and address, and by giving prompt notis of any change of
address.
 
Simplified spellings ar used in the offisial publications of the Illinois State
Normal University.
 
STANDARDS EMPLOYD IN THE DETERMINATION OF TEACHING EFFICIENCY
 
 By 
 Edwin A. Turner 
 
 At present our pedagogical literature bristles with the term 
 efficiency. Even writers of ability use it extravagantly. The 
 term itself seems to satisfy. It suggests the shop, the factory, 
 and the salesroom, where performances are judged in terms of 
 the concrete and where definit standards are blockt out in open 
 competition. It apparently pacifies the longing for scientific ac- 
 curacy and generates a feeling of confidence in him who sets it 
 up for his goal. 
 
 Unfortunately the teaching profession in the main has adopt- 
 ed efficiency as its slogan without making adequate provision for 
 determining when it is attaind. Until the spokesmen for the 
 profession can in a very simple and in a very practical way point 
 out the meaning of efficiency as it relates to specific attainment 
 and can give explicit directions for determining the degree of 
 efficiency of this or that sort of teaching, the term efficiency 
 must be considerd more or less platitudinous. 
 
 In the industries the ability of the performer is easily mes- 
 ured, since the products of his labor are objectiv, concrete, and 
 redily subjected to comparativ tests. The efficiency of the black- 
 smith is mesured by the length of time the shoe clings to the 
 hoof and by the degree of comfort it gives the horse. The effi- 
 ciency of a dentist is mesured by the length of time the filling 
 remains in order or by the permanency and comfort of the bridge 
 he has made. The efficiency of a gardener is determind by the 
 number and quality of vegetables produced per unit of area. In 
 any case when the result is better than that ordinarily produced 
 the performer is thought of as having superior ability and conse- 
 quently he is considerd efficient. 
 
Subjectivly considerd, efficiency is the ability to produce su- 
 perlativ results consistently. The median or average of a number 
 of such abilities is a desirable standard to use in an endevor to 
 determin the merit of individual performances. In the industrial 
 and scientific fields such standards are well known. In the 
 teaching profession we have just begun to use them advantageous- 
 ly. We cannot hope to attain efficiency until we are able to 
 determin when it is attaind. 
 
 With the single exception of the minimum knowledge re- 
 quirement, which is generally provided by law, there is no other 
 legally accepted standard for judging the ability of teachers. The 
 wide and varied use of standards employd in determining the 
 ability of teachers is notorious. 
 
 The far-reaching significance of the conditions resulting 
 from the application of dissimilar standards is beyond the com- 
 prehension of those who evaluate the teaching process in terms 
 of local and personal standards. There is not a little evidence to 
 substantiate the opinion that subnormality, retardation, disin- 
 terestedness, disobedience, and withdrawals from school are the 
 direct result of the inadequate standards held by administrators 
 and teachers. Until some of the standards now employd in mes- 
 uring the results of the teaching process are discarded and others 
 are materially modified, the proportion of abnormalities occur- 
 ring in the schools will not be materially changed. 
 
 STANDARDS OF MESUREMENT 
 
 There are two distinct classes of standards now employd in 
 determining the merit of teaching. These may well be cald the 
 a priori standards and the objectiv standards. The former are 
 deductions based upon definitions formd, principles assumed, or 
 inferences drawn from known causes. The latter are based upon 
 the mesured abilities of pupils. 
 
I. A PRIORI STANDARDS
 
 This class of standards is in the main the outgrowth of an 
 attempt on the part of those who have been responsible for the 
 direction of educational agencies to account for the character of 
 the servises renderd by teachers, on the basis of some real or 
 imaginary principle either directly or indirectly related to the art 
of teaching. The quality and relativ value of each standard in
this class depend upon the educational ideals and insight of
those who have establisht it.
 
 The standards employd in the early stages of educational 
 development and those still employd by persons unfamiliar with 
 the essentials of the teaching process are crude and often ludic- 
 rous. On the other hand the standards which have been estab- 
 lisht by educational experts, in the light of recent research, are 
 excedingly valuable in that they stimulate an analysis of the pro- 
 cess and give valuable direction to teaching. 
 
 The Attitude of Pupils and the Community Towards the Teacher 
 
 This standard is too frequently used by school officials in 
 determining the efficiency of their teachers. If the children and 
 the community are fond of a teacher it is assumed that he is 
 giving splendid servis in the classroom. If he is not generally 
 popular it is taken for granted that he is giving poor servis. 
 Doutless this standard was developt in and about the private 
 school, and especially the subscription school where the teacher 
 "boarded around". Under such conditions adaptability was the 
 prime requisit of survival. In spite of the wonderful growth in 
 the science of teaching there still exists in some communities 
 the notion that popularity is an index of efficiency. 
 
 It is reasonably certain that a teacher of character and of 
 fine teaching ability will win the respect and usually the admira- 
 tion of his pupils and patrons. It is quite as reasonably certain 
 that a relativly inferior teacher may and not infrequently does 
 win the esteem and harty support of the entire community in 
 which he teaches. This esteem may result from local political 
 activity, church connections, participation in club activities, or it 
 may be in response to a wholesome attitude of the teacher towards 
 the life of the community, all of which may be excellent sup- 
 plementary qualities for a teacher to possess. Certainly they 
 should not be the main consideration in the selection of a teacher. 
 
 Being a "good fellow" is an enviable human trait, but it has 
 no legitimate place among the basal standards which are employd 
 in determining the worth of teachers. The social and personal 
 qualities of the officers of a bank do not become an incentiv to 
 me as a depositor until the standing of the bank and the integrity 
 of the officials have been ascertaind. The harty greeting and 
the talkativ propensities of a barber do not become an inducement
to me to patronize his shop until I have determind the fine
 quality of his razor and the sanitary practises of his establish- 
 ment. No thoughtful parent will let church connections, social 
 prestige, political affiliations, or friendship of long standing be 
 the predominating factor in the choice of a physician for his 
 dangerously sick child. Certainly there are stronger reasons why 
 these supplemental and most desirable qualities should not be con- 
 siderd basic in the selection of a teacher. 
 
 Character of Grades and Number of Promotions 
 
 Another common and widely used standard of judging 
 teaching efficiency, closely related to the above, is that of grades 
 and promotions. It is passing strange that this standard of mes- 
 urement should be relied upon so extensivly. A parent usually 
 thinks his children well taught if they receiv high grades. He 
 is quite as strongly convinced of the teacher's inferiority if his 
 children fail of promotion. In view of recent investigations in 
 respect to the reliability of grades, as an index of actual achieve- 
 ment, this standard is a travesty upon the science of education. 
 A grade as ordinarily determind is, to say the least, the expres- 
 sion of a conglomerate impression which may be colored by a 
 single performance of the pupil, by his general attitude toward 
 the school, by the emotional controls of the teacher, or by the 
 personal relations which exist between teacher and pupil or 
 between the teacher and the family of the pupil. 
 
 Grades vary in proportion to the variation of personal stand- 
 ards. It is reasonable to suppose that an easy-going teacher is 
 more likely to give high grades than is the teacher who is ex- 
 cessivly conscientious and diligent in an endevor to improve the 
 standing of his pupils. It not infrequently happens that the 
 grades of two chums, or of two children whose families are inti- 
 mate, are adjusted from month to month so that first one pupil 
 and then the other has the higher grade. It is notorious that 
 good children receiv higher grades in proportion to their ability 
 than do mischievous children. Other influences well known to 
 the profession are factors in determining grades. The multi- 
 plicity of factors involvd in grade making is a strong indictment 
 of the practis of judging teachers exclusivly or even partially 
 on the basis of the promotion list. 
 
Classroom Technique 
 
 The value of this standard rests on the assumption that there 
 is a close correlation between the character of the stimuli employd 
 by the teacher and the character of the child's controls which 
 result from the use of such stimuli. 
 
 On the basis of this assumption one procedes to determin a 
 teacher's efficiency by an examination of her classroom technique. 
 The following items are usually considerd in such procedure: 
(1) forms of presenting subject-matter, such as the lecture
method, the textbook method, the developing method, or a
combination of two or more of these methods; (2) the character
of the questions employd: the direct question, indirect question,
elliptical question, leading question, etc.; (3) the sort of other
devices used: illustrations, drawings, field trips, concrete materials
for science work, pictures, maps, etc.; (4) the language of
the teacher, his intonation, the board work, the general appearance
of the classroom, and especially the spiritual atmosfere of
the room.
 
 This standard is decidedly more reliable than either of those 
 previously considerd. It finds justification in the common agree- 
 ment that the majority of teachers who get splendid results employ 
 a good technique. In fact, teachers of this type find technique 
 indispensable. It is in harmony also with certain generally 
accepted psychological principles. However, technique is not of
itself a sufficient guarantee of adequate results, because of the
 large number of variables introduced in its application. The 
 value of a device depends in large mesure upon the experiences, 
 judgment, temperament, zest, clearness of vision, physical energy, 
 and high ideals of the teacher. Without these attributes in their 
 proper proportion, technique in operation resolvs itself into the 
 lifeless movement of school machinery; with them it insures 
 accuracy, effectivness, consistency, and the proper distribution 
 of time and energy. 
 
 The Reactiv Attitude of the Child 
 
In comparing the relativ merit of this standard with that of
the preceding one, F. M. McMurry says: "Teachers, supervizors
of teachers, and authors of books on teaching, have been so intently
observant of the procedure of the teacher that they have
overlookt that of the pupil. Yet the center of gravity of the
school lies in the pupil, and what he himself finally does determins
 the value of the teacher's efforts. He, therefore, should be the 
 primary object of consideration rather than the teacher, and the 
 quality of the instruction should be judged mainly in terms of 
 his activity." 
 
In conformity to this notion McMurry formulated the following
criteria for the mesuring of teaching efficiency: (1) Motiv
on the part of the child; (2) Consideration of values by the
pupils; (3) Attention to organization by the pupils; (4) Initiativ
by the pupils.
 
 The superiority of this standard over those previously men- 
 tiond is at once evident. It strikes right at the hart of the lern- 
 ing process, or as Tompkins would put it, at the spiritual unity 
 within the child. The author of the above criteria not only be- 
 lievs in the theory that "the center of gravity of the school lies 
 in the pupil", but he applies this theory daily in his classroom. 
 Those who hav attended his classes know that he practises all 
 that he preaches. 
 
 If the pupil's reactiv attitude is the key to educational direct- 
 ion and the goal of educational effort, as we believ it to be, it 
 is fair to assume that it should be of paramount consideration in 
 any attempt to determin the quality of teaching. 
 
 As principles of direction the above criteria are all that is 
 desired. They force analysis of the teaching process, and suggest 
 the proper distribution and emfasis of the teaching agencies. 
 They are basic to our whole scheme of pedagogy. To abandon 
 the principles underlying these criteria would be to ignore teach- 
 ing as a profession. 
 
Tho indispensable as an agency for the improvement of
teaching, these criteria are decidedly inadequate as a means of
 determining the relativ merit of teaching. Their inadequacy is 
 due to the fact that the character of their application depends 
 entirely upon the judgments of those who attempt to determin 
 the merit of teaching. The necessity of interpretation introduces 
 a decided variable. 
 
 The decisions of several judges as to the merits of a certain 
 recitation will vary in proportion to the variation in their exper- 
 iences and insight. What may seem to be "motiv on the part of 
 the child" to one observer may appear as excessiv emotion to 
another. Indications of a "consideration of values" to one judge
may appear as a wanton neglect of essentials to another. What one
observer regards as "attention to organization" may impress his
associates as a mere juggling of facts. Indeed, what may
 seem to one critic as "initiativ of the pupils" may appear to another 
 as rampant individualism. Just as the jury is an uncontrollable 
 variable in the machinery of justis, so the supervizor as a per- 
 sonal judge of teaching efficiency is a variable which is exced- 
 ingly difficult to reckon with in the application of the McMurry 
 criteria. 
 
 Subjectiv Guides and Scales 
 
 Numerous guides and scales hav been developt of recent 
 years for estimating the work of teachers. These ar valuable 
 to the supervizor in that they force analysis of the teaching act 
 and thereby make it possible for him to point out definitly the 
 strong and weak points in the recitation, and afford an oppor- 
 tunity for him to give the teacher some practical suggestions as 
 to the improvement of his methods. 
 
 The following "Ten-point scale" is somewhat typical of 
 helps of this sort: 
 
TEN-POINT SCALE FOR ESTIMATING CLASSROOM WORK IN HIGH SCHOOLS¹

I. "Setting" of class topics in the course.
II. Mastery of intellectual content and effectiv logical organization of materials.
III. The mechanics of classroom management. Economy of time and grasp of pedagogical technique.
IV. Effectiv emfasis upon the mental processes and values peculiar and essential to the subject.
V. Independence of teacher and class as a growth toward their material.
VI. Suitability to the pupil of the type of recitation employd.
VII. The "common sense" factor.
VIII. Evidence of culture versus mere erudition.
IX. Class participation and class sense of responsibility.
X. Class respect for lerning.

¹ A tentativ scale now being prepared by Professor Charles Hughes Johnston
of the University of Illinois.

Scales of this sort do not, however, materially assist the
supervizor in judging the relativ results of teaching. In the
application of this scale, as in the application of the McMurry
standards, a markt variable is introduced in the judge who applies
it. Furthermore, the points are not of equal significance. Some
of these points are several times more significant than others.
Two teachers of widely different abilities, when mesured by this
scale, may receiv the same numerical mark. One may be stronger
in the essentials; the other stronger in the non-essentials.
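The objection that the points are of unequal significance can be made concrete with a small calculation. The sketch below is purely hypothetical: the 0-to-10 ratings, the choice of which points count as "essentials", and the weight of 3 are invented for illustration and are not part of Professor Johnston's scale.

```python
# Hypothetical illustration of the unequal-significance objection.
# Ratings (0-10 per point), the "essential" points, and the weights
# are all invented; they are not part of the scale itself.

POINTS = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
ESSENTIALS = {"II", "IV", "IX", "X"}  # assumed, for illustration only
WEIGHTS = {p: 3 if p in ESSENTIALS else 1 for p in POINTS}


def unweighted_total(ratings):
    """Sum the ten ratings as if every point counted equally."""
    return sum(ratings.values())


def weighted_total(ratings, weights):
    """Sum the ratings after multiplying each by its assumed weight."""
    return sum(ratings[p] * weights[p] for p in ratings)


# Teacher A is strong in the assumed essentials, Teacher B in the rest.
teacher_a = {p: 9 if p in ESSENTIALS else 4 for p in POINTS}
teacher_b = {p: 3 if p in ESSENTIALS else 8 for p in POINTS}

print(unweighted_total(teacher_a), unweighted_total(teacher_b))  # 60 60
print(weighted_total(teacher_a, WEIGHTS), weighted_total(teacher_b, WEIGHTS))  # 132 84
```

Weighting separates the two teachers, but it does not remove the other variable named above: the judge who assigns the ratings in the first place.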
 
 II. OBJECTIV STANDARDS 
 
 Objectiv standards may be divided roughly into two classes: 
 (1) standardised tests; (2) standardised scales. The former is 
 a graded series of problems accompanied by the number of cor- 
 rect answers obtaind by a median pupil of a widely selected group. 
 The Courtis Standard Tests, The Kansas Silent Reading Test, 
 and The Thorndike Reading Scale are standards of this type. 
 The latter is an arrangement of the carefully prepared work of 
 pupils into an evenly graded system which has been determind 
 and evaluated by a number of competent judges. Thorndike's 
 and Ayres' Handwriting Scales, The Harvard-Newton Com- 
 position Scales, and Thorndike's Drawing Scale are standards of 
 this type. 
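The defining feature of a standardized test is the number of correct answers obtained by the median pupil. A short calculation shows how that figure works. All scores below are invented; a real standard would rest on a far larger and more widely selected group.

```python
from statistics import median

# Invented scores (number of correct answers) for a standardizing group.
standardizing_group = [4, 6, 7, 7, 8, 9, 9, 10, 11, 12, 14]

# Invented scores for one class to be compared with the standard.
class_scores = [5, 6, 6, 8, 9, 9, 10]

standard = median(standardizing_group)  # score of the median pupil
class_median = median(class_scores)
at_or_above = sum(1 for s in class_scores if s >= standard)

print(standard, class_median, at_or_above)  # 9 8 3
```

Here the class median falls one answer below the standard, and three of the seven pupils reach or pass it. That statement does not depend on any observer's impression of the teacher.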
 
 A historical survey of the objectiv standards, accompanied by 
 a discussion of their relativ merit, is perhaps the easiest and dout- 
 less the most pedagogical way of showing the relativ educational 
 value of these standards as agencies in determining the quality 
 of teaching and in paving the way for placing teaching upon a 
scientific basis, a distinction which it does not as yet merit.

Origin of Objectiv Standards in America
 
 So far as I can ascertain, Dr. J. B. Rice is the father of the 
 objectiv standard in America. Zelous for better opportunities 
 for the child, enthused by his recent psychological studies at Jena 
 and Leipsic, free from prejudices which sometimes result from 
 inferior teaching, he set for his task the exposition of certain 
 evils which he conceivd to exist in the public schools. Conse- 
 quently from 1891 to 1896 he became a critical student of educa- 
 tion. He visited and examind the schools of one hundred Amer- 
 ican cities. He pointed out in the colums of the "Forum" what 
 seemd to him remedial mesures for these schools. After four 
years of constant investigation he came to the very decided
conviction that concerted effort towards obtaining satisfactory
results in public education is impossible until we know what
 satisfactory results are. "If we do not know", he wrote in the 
 "Forum", December, '96, "what we mean by satisfactory results, 
 how shall we be able with any degree of intelligence to judge 
 when our task has been satisfactorily performd? Until we come 
 to a definit understanding in regard to this matter, our entire 
 educational work will lack direction and we shall continue as 
 heretofore, to grope our way along passages completely envelopt 
 in darkness in an endevor to land we know not where. 
 
 "If we might have a standard which would enable us to tell 
 when our task has been completed, our attention might be 
 earnestly directed towards the discovery of short cuts in educa- 
 tional processes. For want of such a standard each individual 
 teacher has thus far been a law unto himself ; permitted to ex- 
 periment on his pupils in accordance with his own individual 
 educational notions, whether inherited from his grandfather or 
 the result of his study and reflection, entirely regardless of what 
 was being done by others. So long as this condition is possible, 
 pedagogy cannot lay claims to recognition as a science. Until an 
 accurate standard of mesurement (my italics) is recognized by 
 which such truths may be discoverd, ward politicians will con- 
 tinue to wield the baton and educational anarchy will continue to 
 prevail." 
 
 The First Objectiv Standard 
 
 Dr. Rice was not a faddist. Indeed, he was excedingly 
 practical. In his characteristic way he set out in 1896 to establish 
 a standard of mesurement for spelling. He undertook personally 
 the herculean task of examining 13,000 children in spelling. This 
 investigation extended over a period of sixteen months and in- 
 cluded sixteen American cities. 
 
 The children were tested on a list of words, on words given 
 to them in sentences, and on the words used in their compositions. 
The tabulated results in the "Forum" for April, 1897, are, so far as
I have discoverd, the first objectiv standard in spelling or in any
other subject. The list of words standardized by him consists
of too few words to be of servis in judging the spelling abilities
of children. The list of words presented in sentences is subject
to the same criticism. This objection does not hold for his composition
test. Had he estimated the percent of words correctly
speld in the compositions on the basis of the number of different
words used, insted of upon the basis of the entire number of words
used, he would have establisht the first practical objectiv standard.
As it is, his percents of words correctly speld are entirely too
high.
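The difference between the two bases of counting can be seen on a small invented composition. The sketch reads "on the basis of the number of different words" as follows: each different word is counted once, and it counts as correct only if every occurrence is spelled correctly. That reading is an assumption, not Rice's stated procedure.

```python
# Each pair is (word the pupil intended, what the pupil wrote).
# The sample composition is invented for illustration.
composition = (
    [("the", "the")] * 5
    + [("and", "and")] * 3
    + [("because", "because")]
    + [("receive", "recieve")]
    + [("separate", "seperate")]
    + [("friend", "freind")]
)


def percent_all_words(tokens):
    """Percent correct, counting every running word."""
    correct = sum(1 for intended, written in tokens if intended == written)
    return 100 * correct / len(tokens)


def percent_different_words(tokens):
    """Percent correct, counting each different word once; a word counts
    as correct only if every occurrence is spelled correctly (assumed)."""
    status = {}
    for intended, written in tokens:
        status[intended] = status.get(intended, True) and intended == written
    return 100 * sum(status.values()) / len(status)


print(percent_all_words(composition))        # 75.0
print(percent_different_words(composition))  # 50.0
```

Easy words that recur often, such as "the" and "and", raise the first figure. The second figure gives them no more weight than any other word.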
 
 Rice's Arithmetic Test 
 
 In the October number of the "Forum", 1902, Dr. Rice re- 
 ported the results of an arithmetic test which he had conducted 
 in seven different cities, including eighteen bildings and 8,000 
 children. As Stone pointed out, later, Dr. Rice's results were not 
 satisfactory as a standard, due to certain limitations in the prob- 
 lems used and the character of the methods employd in gathering 
 and scoring these. 
 
 Rice's Language Test 
 
 One year later Dr. Rice gave a detaild report of the test he 
 made in language. This test extended to nine cities, and included 
 twenty-two schools, containing 8,300 children. The compositions 
 were arranged in five groups on the basis of relativ merit. The 
 papers of each group were graded 100%, 75%, 50%, 25%, 
 0% respectivly. The results showd conclusivly that there was 
 a wide variation in the English abilities tested by him, but owing 
 to the strong probability of error in his results they hav not 
 been employd as a standard for determining English ability. 
 
 Tho Dr. Rice's results are of little value as standards, his 
 experiments have stimulated two lines of research in education 
 which are fraught with wonderful possibilities. I refer on the 
 one hand to the investigations which have had for their goal the 
 establishment of objectiv standards of mesurement, and on the 
 other to the investigations to determin minimum essentials. Both 
 of these problems were raisd by Dr. Rice and he has lived to see 
 some partial solutions of both. 
 
 The Cornman Spelling Standard 
 
 Dr. O. P. Cornman, of Philadelphia, stimulated by the work 
 of Rice, carried on a series of tests in spelling by the composition 
 method, extending from June, '96, to June, '98. In 1903 he 
 publisht the results of this investigation in a volume entitled 
Spelling in the Elementary School. Cornman's data were carefully
gatherd and the results methodically tabulated. He substituted
the median for the average employd by Rice. In his
composition test Rice counted all the words which were speld
correctly, including all recurring words when properly speld.
When a misspeld word recurd he counted it but once. On this
basis of counting he determind the spelling abilities of the children
in terms of the percentage of words speld correctly. This
accounts for the high percentages which he reported. Cornman
counted all words in the composition and exprest in percent the
proportion of them that were speld correctly. He
not only counted the recurring words which were speld correctly
but the recurring misspeld words as well. This accounts for his
percentages being lower than those reported by Rice.
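The two ways of counting can be compared on a single invented composition. The sketch follows the description above. Rice counts every correctly spelled word, recurrences included, but counts a recurring misspelled word only once. Cornman counts every word. Treating "once" as once per different intended word is an assumption. The last lines use invented figures to show how one low score pulls an average further than it pulls a median.

```python
from statistics import mean, median

# (intended word, written form); invented composition.
composition = (
    [("the", "the")] * 5
    + [("and", "and")] * 3
    + [("receive", "recieve")] * 3
    + [("friend", "freind")]
)


def rice_percent(tokens):
    """Correct words counted at every occurrence; each misspelled word once."""
    correct = sum(1 for intended, written in tokens if intended == written)
    misspelled = {intended for intended, written in tokens if intended != written}
    return 100 * correct / (correct + len(misspelled))


def cornman_percent(tokens):
    """Every word counted, recurring misspellings included."""
    correct = sum(1 for intended, written in tokens if intended == written)
    return 100 * correct / len(tokens)


print(rice_percent(composition))               # 80.0
print(round(cornman_percent(composition), 1))  # 66.7

# Invented class percentages: one weak speller affects the average more.
class_percents = [95, 92, 90, 88, 40]
print(mean(class_percents), median(class_percents))  # mean 81, median 90
```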
 
 The work of Rice and Cornman stimulated many young men 
 in the large educational centers. Edward L. Thorndike, who has 
 since become the wizard of the objectiv standard, wrote in the 
 "Forum" in 1905 as follows : "The study of education is begin- 
 ning to be quantitativ, we are becoming properly disgusted with 
 the one-sided booking which only takes account of dollars spent 
 and neglects the debit side, the income in knowledge, habits, 
 power, zeal and ideals. This ambition toward an exact objectiv 
 mesurement of the results of educational endevor is a symptom 
 of helthy scientific fervor and also of common sense wisdom. 
 No one possest of science or sense will deny the value of suc- 
 cessful quantitativ study of school work." 
 
 Arithmetic Abilities of Children in the Sixth Grade (Stone) 
 
 In 1908 C. W. Stone publisht in the Columbia University 
 Contributions to Education a report on the arithmetical abilities 
 of children in the six-A grade. Mr. Stone personally conducted 
 the examinations in twenty-six school systems, including seventy- 
 nine schools and 6,000 children. He gave one test in the funda- 
 mentals and one in the reasoning processes. 
 
 Stone's method of gathering data and of tabulating results 
 was superior. He set a standard in this particular which has 
 been emulated by later investigators. The exercises in the 
 tests proved, as Courtis pointed out, too complex for practical 
 mesurements. The results were a mesure of a combination of 
 abilities in the fundamentals and in the reasoning processes, and 
 consequently were difficult of interpretation and application. 
 Because of the difficulty of applying his results they have not 
 been used extensivly in determining arithmetical abilities. 
 
 The Thorndike Handwriting Scale 
 
 The first satisfactory result from a practical point of view of 
 all the agitation for quantitativ standards of mesurement occurd 
 in 1910. The Thorndike Scale for Judging Handwriting appeard 
 in the Teachers College Record of that year. Referring to this 
 scale, Ayres says, "The credit of developing the first mesuring 
 scale for handwriting belongs to Professor Edward L. Thorndike 
 of Teachers College, Columbia University. The publication, in 
 March, 1910, of his Handwriting Scale constituted a most im- 
 portant contribution not only to experimental pedagogy, but to 
 the entire movement for the scientific study of education." 
 
 In reference to the need of such a scale Thorndike said, "At 
 present we can do no better than estimate a handwriting as very 
 bad, good, very good, or extremely good, knowing only vaguely 
 what we mean thereby, running a risk of shifting our standards 
 with time, and only by chance meaning the same by a word as 
 some other student of the facts means by it. We are in the 
 condition in which the students of temperature were before the 
 discovery of the thermometer, or any other scale for mesuring 
 temperature beyond the very hot, hot, warm, lukewarm, and the 
 like, of subjectiv opinion." 
 
 Altho, as Ayres pointed out, this Handwriting Scale con- 
 stituted a most important contribution not only to experimental 
 pedagogy but to the entire movement for the scientific study of 
 education, Professor Thorndike in his presentation was sensitiv 
 of its imperfections. He says : "The scale is presented now in 
 spite of its imperfections, for these reasons. It is the result of 
 some twenty ratings, and ensures mesurements far more accurate 
 than anyone could make without it. For the present needs of 
 school practis and educational research, a very precise instrument 
 for mesuring handwriting is not required. The best way to get 
 a more perfect scale is by the use of this one as a starting point." 
 
 The Thorndike scale represents types of the handwriting of 
 children of grades five to eight inclusivly. The writing from 
 these grades was groupt into eleven groups on the basis of 
 quality. The quality of the groups is represented by figures 7, 8, 
 9, 10, 11, 12, 13, 14, 15, 16, and 17 respectivly. Quality 7 repre- 
 sents the poorest samples taken from grade five and quality 17 
represents the best samples taken from grade eight. The steps
of difference between the qualities were equal in the sense of
being cald equal by from twenty-three to fifty-five competent
judges. This means "that 14 is as much better than 13 as 13 is
than 12; that 13 is as much better than 12, as 12 is better than 11,
and so on; that quality 14 is two times as far above zero merit in
handwriting as quality 7."
 
 The scale includes quality 18, which was taken from a copy 
 book, and qualities 4, 5, and 6. Samples 5 and 6 were taken from 
 the fourth grade and sample 4 was manufactured for the purpose 
 of extending the scale below the merit of fourth-grade children. 
 
 The Thorndike Handwriting Scale is easily applied in testing 
 the quality of handwriting. After a little experience a teacher 
 can scale the writing of her entire room in a very short time. By 
 means of such a scale we have often mesured the writing of an 
 entire room in less time than a forty minute period. The several 
 samples supplied for each of the qualities 16, 15, 14, 13, 12, 11, 
 9, and 8 make it especially easy to apply this scale. 
 
 Teachers who habitually think of quality in terms of grades 
 can, for all practical purposes, easily transfer the qualities of the 
 scale into grades by multiplying the numbers of the scale by 5.8. 
 Those who have mesured the merit of handwriting with this or 
 the Ayres' scale will not be content to judge the merit of writing 
 in terms of personal experience. 
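The rule of multiplying a scale value by 5.8 is a single multiplication. The sketch below applies it to a few qualities. The function name is hypothetical, and rounding to one decimal place is a choice made for readability.

```python
def quality_to_grade(quality):
    """Convert a Thorndike scale value to an approximate percentage grade
    by the rule given above: multiply by 5.8 (rounded to one decimal)."""
    return round(quality * 5.8, 1)


for quality in (7, 10, 14, 17):
    print(quality, quality_to_grade(quality))
# 7 40.6
# 10 58.0
# 14 81.2
# 17 98.6
```

The best fifth-to-eighth-grade quality, 17, comes out just under 100. The rule is approximate "for all practical purposes": the copy-book sample, quality 18, would come out at about 104.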
 
 The Ayres Handwriting Scale 
 
In November, 1911, Leonard P. Ayres of the Russell Sage
Foundation began a preliminary experiment to determin the
relativ legibility of different samples of handwriting. He early
concluded that the scheme was feasible and proceded to perfect
a writing scale on that basis. His first printed scale appeard in
February, 1912. In discussing the merits of this scale he says:
 "The method by which the present scale has been produced, and 
 the criterion on which it rests as a basis differ radically from 
 those adopted by Professor Thorndike. The difference in the 
 basis is that in the present case legibility has been adopted as a 
 criterion for rating the different samples in place of 'general 
 merit' used as the basis of Thorndike's scale. The change sub- 
 stitutes function for appearance as a criterion for judging hand- 
 writing." 
 
 Ayres gatherd 1,578 samples of writing from forty school 
 systems. The samples were red by ten readers, each of whom by 
 means of a stop watch recorded the exact number of seconds 
 required to read each sample. The samples were then placed 
 in eight groups on the basis of the time required to read them. 
 The following table shows the rating of a type sample of each 
 group. 
 
Table I

Point on scale    Rating of sample found at each point
                  (words red per minute)

     90%                     209.2
     80%                     202.7
     70%                     195.1
     60%                     186.2
     50%                     175.7
     40%                     163.4
     30%                     149.1
     20%                     132.2
 
 The scale was divided into three longitudinal divisions on the 
 basis of slant. The top, or A division, contains the vertical 
 samples. The middle or B division, contains the samples of 
 medium slant, and the lower, or C division, contains the samples 
 of extreme slant. As implied in the above table the scale is divided 
 into eight vertical divisions, each of which contains a sample of 
 each slant. The three samples in the right colum are markt 90%, 
 those in the next colum to the left 80%, etc. 
 
 Because of its inclusion of samples representing the three 
 main types of slant, this scale is easily applied. The application 
 of this scale to the handwriting of most school systems at once 
 reveals wide variation in writing abilities, which implies either 
 widely different methods of teaching, widely different ideals as 
 to the sort of writing which should obtain, or widely different 
 degrees of zeal towards securing good writing. The following 
 graph (Figure I) of the writing abilities of the children of the 
 Training School of the Illinois State Normal University, as shown 
 by the first application of the Ayres scale, reveals the sort of 
 variation which frequently exists when subjectiv standards alone 
 are relied upon. 
 
 
 The application of the Handwriting Scale not only reveald 
 wide variation within each grade, but it reveald wide variation 
 between the grades as well. This first application of the scale 
 showd that there were two children in the sixth grade who wrote 
 better than any of the children of the seventh and eighth grades. 
 It showd also that there were six children in the sixth grade who 
 made a grade of 70 while there were but four children in both 
 the seventh and eighth grades who reacht the 70 mark. This 
 test made it perfectly evident on the one hand that grades five 
 and six needed no extra consideration relativ to drill in writing, 
 while on the other hand it showd that grades seven and eight 
 needed a writing revival. 
 
 The graph (Figure II) shows what was accomplisht by the 
 eighth-grade teacher after he became conscious of the relativ 
 needs of his pupils. In the November test fifteen pupils made 
 grades of 40% or less. In the May test none made a grade less 
 than 50%. In the November test only two pupils made grades 
 of 70% while in the May test ten pupils made grades of 70%, ten 
 pupils made grades of 80%, and three pupils made grades of 
 90%. A careful examination of Figure II will reveal other 
 markt changes which resulted from an application of the Hand- 
 writing Scale. 
 
 A record of the writing ability of the children of grades four 
 to eight inclusiv, taken in November and May, and filed for 
 reference becomes a definit and valuable guide for any school. 
 It makes it possible to determin at any time whether a sufficient 
 amount of time and energy is being given to this subject. Such a 
 record protects the children from the excessiv zeal or the indif- 
 ference of the teacher, and indicates to the teacher the relativ 
 merit of his endevor. 
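A record of this kind can be kept as a small table of scores by grade and month. The sketch below compares the November and May medians for each grade; the scores are invented for illustration, and using the median as the summary is an assumption, not a procedure given in the text.

```python
from statistics import median

# Invented record of Ayres-scale grades (percent), by school grade and month.
record = {
    7: {"November": [40, 50, 50, 60, 60, 70], "May": [50, 50, 60, 60, 70, 70]},
    8: {"November": [30, 40, 40, 50, 60, 70], "May": [50, 60, 70, 70, 80, 90]},
}


def median_gain(grade: int) -> float:
    """Change in the median grade from the November test to the May test."""
    tests = record[grade]
    return median(tests["May"]) - median(tests["November"])


for grade in sorted(record):
    print(f"grade {grade}: median changed by {median_gain(grade):+.0f} points")
```

A small gain in one grade beside a large gain in another would point to where more time and energy may be needed.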
 
Starch's Letter-Exposure Handwriting Test
 
Professor Daniel Starch of the University of Wisconsin
 reported his handwriting test in the Journal of Educational Psy- 
 chology for October, 1913. He pointed out that the Thorndike 
 and Ayres scales were mesures only of form and legibility 
 respectivly. He argued that a simple analysis of handwriting 
 shows that its three chief elements are legibility, producibility, 
 and form. 
 
Starch held that legibility can best be determind by reading
exposed areas of handwriting and thereby determining the
average rate per letter of such reading. In conformity with this
theory he prepared a device for mesuring handwriting as follows:
In a piece of cardboard were cut three circular openings in a
straight row 1.5 cm. apart. The openings were each 2.5 cm. in
diameter. By shifting the cardboard about over the writing to
be mesured, he was able to test its legibility at several places.
The number of letters exposed and the time required to read
them were recorded after each trial. From the records of several
exposed areas the average reading time per letter was computed.

[Figure I: Writing abilities of the children of the Training School of the Illinois State Normal University, by grade, as shown by the first application of the Ayres scale]

[Figure II: Writing abilities of the eighth-grade pupils in the November and May tests]
 
 Starch's experiments proved that there is a remarkably close 
 correlation in the results obtaind by the Letter-Exposure Test 
 and those secured by the Thorndike and Ayres scales. 
 
It is doubtful if the Letter-Exposure Test is as convenient
for testing the handwriting of large numbers of children as is
either the Thorndike or the Ayres scale.
 
 After testing the efficiency of writing scales Starch says : 
 "We may conclude that after some practis in the use of a scale 
 the mesurements with either scale are from three to four times 
 as accurate as the valuations made by the usual percentil marking 
 system." 
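The averaging step of the Letter-Exposure Test can be sketched as follows. The text says only that "the average reading per letter was computed"; pooling the seconds and letters of all exposures, rather than averaging the rate of each exposure, is our assumption, and the names `Exposure` and `seconds_per_letter` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Exposure:
    """One placing of the cardboard: letters shown and seconds taken to read them."""
    letters: int
    seconds: float


def seconds_per_letter(trials: Sequence[Exposure]) -> float:
    """Average reading time per letter, pooled over all exposed areas."""
    total_letters = sum(t.letters for t in trials)
    if total_letters == 0:
        raise ValueError("no letters were exposed")
    return sum(t.seconds for t in trials) / total_letters
```

Three exposures of 12, 10, and 8 letters read in 6, 4, and 5 seconds give 15 seconds for 30 letters, or half a second a letter; the more legible the writing, the smaller this figure.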
 
 The Courtis Standard Tests 
 
In December, 1910, S. A. Courtis, of Detroit, reported in
The Elementary School Teacher his Standard Test (Series A)
in Arithmetic. This test developt as a result of applying the
Stone test in the Detroit Home and Day School, in which Mr.
Courtis was hed of the Department of Science and Mathematics.
After a free use of his Series A Test, which consisted of testing
the pupils' ability to use the four fundamental processes when
employd in tables ordinarily used in schoolrooms, and of testing
the pupils' ability to employ the reasoning processes involvd in
the solution of problems suitable to the grammar grades, Mr.
Courtis concluded that "The work done with Series A has proved
that the basic problem in education to-day is that of ministering
adequately to individual needs. The first step towards this end
is the formation of definit objectiv standards." The standards
derived from the use of Series A, however, are either complex or
of questionable value, owing to the uncertainty of their meaning.
This is particularly true of the reasoning tests in which mere
ability to read is a large factor.
 
 Series B is the result of an attempt to secure definit objectiv 
 standards for each of the four fundamental operations with whole 
 numbers. With the establishment of this standard it is possible 
 to set for each grade just the degree of skill in each of the 
 fundamental processes that is within reach of the average, or 
 median, child of the grade. 
 
The following table shows the median skills of three distinct
groups of children in the fundamental operations of arithmetic as
mesured by the Courtis test. The close agreement of the three
series shows how general the results are.
 
Table II

                        5th grade            6th grade
                      D     B     G        D     B     G
Addition        A   6.7   7.2   7.1      8.4   8.3   8.0
                R   3.9   3.7   3.7      4.6   4.9   4.4
Subtraction     A   8.0   7.6   6.5      8.8   9.0   8.9
                R   5.5   4.9   4.9      6.2   6.3   6.1
Multiplication  A   6.0   5.8   6.0      7.4   6.9   7.2
                R   3.8   3.3   2.6      4.8   4.8   4.5
Division        A   4.9   4.5   4.5      6.4   5.5   5.8
                R   2.7   2.0   2.3      4.4   3.3   4.3

                        7th grade            8th grade
                      D     B     G        D     B     G
Addition        A   9.2   9.2   8.9     10.2  11.0   9.7
                R   3.4   5.6   4.7      6.7   7.5   5.6
Subtraction     A   9.8  10.0  10.2     12.3  11.4  11.7
                R   7.3   6.9   7.8      9.5   8.6   8.4
Multiplication  A   9.6   8.0   8.4     10.5   9.5   9.7
                R   6.0   5.1   5.2      7.0   6.5   6.4
Division        A   8.6   6.9   7.6     10.6   6.9   7.6
                R   7.1   5.1   5.1      8.8   6.9   6.3

D = Detroit (1,315 children tested)
B = Boston (20,441 children tested)
G = General (3,618 children tested)
A = Number of problems attempted
R = Number of problems right
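The A and R columns of Table II can be read together as a rough accuracy figure. The sketch below takes the Boston subtraction medians from the table and expresses problems right as a percent of problems attempted. The helper name is hypothetical, and since the table gives medians of A and of R separately, their ratio is only an approximate indicator, not the median accuracy of the pupils.

```python
# Boston medians for subtraction, copied from Table II:
# grade -> (problems attempted, problems right).
BOSTON_SUBTRACTION = {5: (7.6, 4.9), 6: (9.0, 6.3), 7: (10.0, 6.9), 8: (11.4, 8.6)}


def percent_right(attempted: float, right: float) -> float:
    """Right answers as a percent of problems attempted, to one decimal."""
    return round(100 * right / attempted, 1)


for grade, (attempted, right) in sorted(BOSTON_SUBTRACTION.items()):
    print(f"grade {grade}: {percent_right(attempted, right)}% right")
```

On these figures the share of subtraction problems worked correctly rises from about 64% in the fifth grade to about 75% in the eighth.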
 
 
 Courtis early discoverd the value of the objectiv standard in 
 determining individual variation. He says : "The results of the 
 tests disclosed the usual wide range of individual variation in 
every grade." After using the objectiv standard for some
time, Professor Courtis writes: "Not only did the variabilities
 decrease, but unhoped degrees of accuracy were attaind." 
 
 The following graphs of the abilities of intermediate pupils 
 in multiplication and oral reading as determind by the Courtis 
 and Gray scales show conclusivly how variability is easily detected 
 by the application of objectiv standards. 
 
 The graphs shown in Figure III reveal two distinct groups 
 of abilities in each subject. This may mean that little care has 
 been given to promotions. It is more likely to indicate a lack 
 of sufficient drill under proper conditions. After the abilities are 
 once reveald there is every reason to believ that a conscientious 
 teacher will raise the abilities of the lower group and thereby 
 reduce the degree of variability. 
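Two distinct groups of ability show up plainly when a class's scores are counted in equal steps. The sketch below uses invented multiplication scores; it keeps empty steps so that the gap between a lower and an upper group is visible. The bucket width of 2 is an arbitrary choice for illustration.

```python
from collections import Counter

# Invented scores (problems right) for one class, for illustration only.
scores = [3, 4, 4, 5, 4, 3, 11, 12, 12, 13, 11, 12, 5, 4]


def histogram(values, width=2):
    """Count values in buckets of `width`, keeping empty buckets so gaps show."""
    counts = Counter(v // width * width for v in values)
    low, high = min(counts), max(counts)
    return {start: counts.get(start, 0) for start in range(low, high + 1, width)}


WIDTH = 2
for start, count in histogram(scores, WIDTH).items():
    print(f"{start:2d}-{start + WIDTH - 1:2d} | {'#' * count}")
```

The empty rows for 6-7 and 8-9 mark the break between the two groups; raising the lower group would close that gap and reduce the variability.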
 
 Just as a proper diagnosis in medicin is a prerequisit to 
 effectiv medical treatment, so a proper diagnosis of the specific 
 abilities of pupils is a prerequisit to the application of proper 
 methods. 
 
The Hillegas Scale for the Mesurement of Quality in English Composition
 
 In September, 1912, Professor M. B. Hillegas publisht his 
 composition scale in The Teachers College Record. In the intro- 
 duction to this scale Professor Hillegas refers to the previous 
 efforts at quantitativ standards by Cornman, Rice, Stone, and 
 Thorndike. He does not, however, refer to Rice's pioneer effort 
 to establish a standard in English composition in 1902. 
 
 Hillegas used a method similar to the one Thorndike used in 
 determining quality in handwriting. He, aided by one other 
 person, graded about 7,000 compositions into ten classes. From 
 these ten classes seventy-five samples were chosen. Artificial 
 samples were employd at the extremes of his scale, as they were 
 in Thorndike's writing scale, in order to produce a scale of wide 
 range of mesurement. In all there were eighty-three samples 
 employd. These eighty-three samples were given to more than 
 one hundred persons, who were requested to rank them 1, 2, 3, 
 etc., in the order of their merit. 
 
[Figure III: Abilities of intermediate pupils in multiplication (Courtis test) and in oral reading (Gray scale)]
 
 Owing to misunderstandings and errors, only seventy-three 
 records were used. On the basis of like characteristics these 
 records were reduced to twenty-three. This reduced number of 
 samples containd all the important steps in quality from the poor- 
 est to the best. Six other samples, including two artificial ones, 
 were finally added, making a total of twenty-nine samples. 
 
 The twenty-nine samples were rankt by 234 judges. On the 
 basis of this ranking the number of samples was reduced to ten. 
 The difference between the merit of the first and second samples 
 in the scale is not identical with the difference in merit of any 
 other two successiv samples. These differences, however, are 
 sufficiently equal for practical purposes. 
 
The Hillegas scale is a meritorious piece of work. It is a
decided step in the right direction. The brevity of the samples
and the gradual steps from one quality to another make its
application from this point of view quite easy. The Hillegas
scale, however, has many defects. Commenting upon the
Hillegas scale, Frank W. Ballou of the Department of Educational
Investigation and Mesurement of the Boston
 Schools says : "An experiment with the Hillegas scale showd 
 that the use of such an objectiv mesure did unify the grades 
 given to compositions by teachers. It was also found, however, 
 that the Hillegas scale was not satisfactory to the teachers of 
 Newton, owing to what seemd to them to be inherent faults. 
 These faults may be stated briefly as follows : first, the scale aims 
 to mesure too varied a product ; second, the compositions in it are 
 not typical of good school work — (a) some are artificial, (b) 
 others are 'bookish', really reproductions, and (c) no conversa- 
 tion is containd in any of them." As Courtis's practical tests in 
 arithmetic grew out of an attempt to use the conclusions of Stone, 
 so an attempt on the part of the teachers of Newton, Mass., to 
 use the Hillegas scale led directly to the practical Harvard-New- 
 ton Scales for the Mesurement of English Composition. 
 
 Report of Superintendent Bliss on English Composition 
 
 While at Elmira, N. Y., Superintendent Bliss reported in the 
 Psychological Clinic for March, 1912, a series of tests he had 
 carried on in composition. He had the children reproduce stories 
 red to them. These reproductions were taken to the central offis 
and groupt, on the plan practist by Rice, into five groups. He
determind the median ability of all the children in each of the
grades above the third. He then compared the median ability of
all the children of each grade in the city with the median for
that grade in each school. He also publisht sample
compositions of each group of compositions in the scale.
 
 The results obtaind from the use of this scheme were little 
 less than marvelous. He says : "In a Massachusetts school 
 system, with 33 third-grade teachers the initial test showd a city 
 average of 8.5 points, with twenty-three classes below the re- 
 quirement and eight classes above. One year later the city average 
 was 19.2 points, with thirteen classes below the requirement and 
 nineteen classes above. This represented an increase of 126% 
 in the level of efficiency in the third grade." Mr. Bliss cites other 
 cases where even greater percents of increase were made by the 
 use of this method. 
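The 126% figure quoted by Mr. Bliss can be checked directly: the gain in the city average is taken as a percent of the starting average. The function name below is our own.

```python
def percent_increase(before: float, after: float) -> float:
    """Percent change from `before` to `after`, relative to `before`."""
    if before == 0:
        raise ValueError("the starting value must be nonzero")
    return (after - before) / before * 100


# City averages quoted by Mr. Bliss for the third grade.
print(round(percent_increase(8.5, 19.2)))  # 126
```

The rise of 10.7 points on a base of 8.5 points is about 125.9%, which rounds to the 126% given in the quotation.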
 
 The Harvard-Newton Scales 
 
 These scales are the product of the work of the eighth-grade 
 teachers and the elementary-school principals in the public schools 
of Newton, Mass., assisted by the teachers of English in the high
 schools of Newton, and by teachers and principals in Arlington, 
 Mass., and Boston, under the direction of Frank W. Ballou and 
 with the co-operation of the Joseph Lee Fellow for Research in 
 Education. 
 
 The compositions were written by the eighth-grade pupils of 
 Newton. All of the compositions of the eleven grade schools 
 were groupt into five groups. Each group included specimens of 
 a given type of composition (narration, description, etc.). Each 
 eighth-grade teacher selected 25% of the compositions of her 
 grade on the basis of their representativ merit. These selected 
 compositions from the eleven schools were then arranged into 
 four groups. Twenty-four readers were instructed to arrange 
 the themes in each group in the order of their merit and to arbi- 
 trarily rate the best theme 95% and each of the remaining themes 
 with reference to this standard. These ratings were tabulated 
 and the median grade for each composition was workt out. For 
 example, the highest grade for composition number one was 
 95%, the lowest grade was 68%, and the median grade was 83%. 
In like manner tabulation was made of the distribution of the
ranks given each composition. They were then arranged in
 serial order according to the median ranks, beginning with the 
 highest. By means of this latter method it was discoverd that 
 25% of the judges were radical in their judgment. Consequently 
 the 25% of radical readers were cut off. The scale was then 
 bilt on the median percentil basis. Out of the twenty-five composi- 
 tions which were chosen to represent each form of discourse, 
 six typical compositions were finally chosen for the scale. The 
 difference in degree of quality was carefully workt out and 
 the samples were arbitrarily markt 95%, 85%, 75%, 65%, 
 55%, and 45%, respectivly. 
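The median ratings and the removal of radical readers described above can be sketched as follows. The text does not say how the radical readers were identified; this sketch assumes, hypothetically, that a judge is more radical the farther his percent ratings lie in total from the median ratings, and it works on ratings rather than ranks. The judges and ratings are invented.

```python
from statistics import median
from typing import Dict

# ratings[judge][composition] = percent given by that judge.
Ratings = Dict[str, Dict[str, float]]


def median_scores(ratings: Ratings) -> Dict[str, float]:
    """Median percent for each composition across all judges."""
    compositions = next(iter(ratings.values())).keys()
    return {c: median(r[c] for r in ratings.values()) for c in compositions}


def drop_radical_judges(ratings: Ratings, fraction: float = 0.25) -> Ratings:
    """Drop the given fraction of judges who stray farthest from the medians.

    Hypothetical criterion: total absolute distance from the median ratings.
    """
    medians = median_scores(ratings)

    def distance(judge: str) -> float:
        return sum(abs(ratings[judge][c] - medians[c]) for c in medians)

    ranked = sorted(ratings, key=distance)
    keep = ranked[: len(ranked) - round(len(ranked) * fraction)]
    return {judge: ratings[judge] for judge in keep}


# Invented ratings from four judges on two compositions.
ratings = {
    "A": {"c1": 95, "c2": 80},
    "B": {"c1": 90, "c2": 78},
    "C": {"c1": 92, "c2": 82},
    "D": {"c1": 60, "c2": 95},
}
kept = drop_radical_judges(ratings)
print(sorted(kept), median_scores(kept))
```

Here judge D disagrees sharply with the others and is dropped, after which the medians are recomputed from the remaining three judges.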
 
 The Harvard-Newton Scales 1 commend themselvs to the 
 practical school man on the following points : first, there is a scale 
 for each form of discourse; second, the compositions in the 
 scale are the real productions of children and not "bilt up" 
 compositions for purposes of securing gradation in the scale; 
 third, each scale consists only of six types. This makes it an 
 easy matter for the person doing the grading to familiarize him- 
 self with the scales. The greatest weakness in these scales lies 
 in the fact that they are best suited for eighth-grade pupils. 
 
 An application of these scales reveals the fact that there is 
 but slight variation in the grades of two or more judges. Indeed, 
 the variation is so slight that a single investigator can feel rea- 
 sonably certain that his grades will not vary widely from the 
 median of several judges. 
 
 In our opinion, the Harvard-Newton Scale ranks for practi- 
 cability alongside the Thorndike and Ayres handwriting scales, 
 and the Courtis Tests in Arithmetic (Series B). It has the real 
ring to it and will doutless have a wide use.

The Courtis Test in English
 
 Professor S. A. Courtis has five different tests in English : I, 
 Handwriting Test; II, English Composition Test; III, Spelling, 
 Punctuation, and Grammar Test; IV, Rates of Reading and 
 Writing Test ; and V, Rates of Reproduction Test. In his writing 
 test Mr. Courtis uses four groups of letters with five in a group 
 in each of ten lines. Pupils are required to copy these as rapidly 
 as they can and maintain a good quality. The speed of each 
child is recorded and the quality of the writing is mesured by the
Thorndike and Ayres scales. Thru the co-operation of teachers
Mr. Courtis hopes to establish a standard test in both speed and
quality for each grade.

1 The Harvard Press, 50 cents.
 
 Mr. Courtis bases his English composition standard on an 
 original story, "Bessie's Adventures", parts of which are red 
 while other parts are imagind. His method of determining the 
 relativ merit of compositions is the same as that used by Dr. Rice. 
 Teachers are requested to group these original stories into five 
 groups, on the basis of merit. From each of these groups they 
 are requested to select a sample and return such samples to him. 
 In this way he hopes finally to establish a standard of English 
 abilities in the several grades, similar to those he has determind 
 in arithmetic. His other English investigations follow a similar 
 procedure. All of his tests may be had in his "Manual of In- 
 structions for Giving and Scoring the Courtis Standard Tests." 
 
 Mr. Courtis has not presented the exercizes in his English 
 tests so clearly and attractivly as he presented those of his arith- 
 metic tests. 
 
 The Thorndike Scale for Mesuring Achievement in Drawing 
 
 In the Teachers College Record for November, 1913, Pro- 
 fessor Thorndike presented a scale for "The Mesurement of 
 Achievement in Drawing". In reference to the purpose of the 
 scale he says : "It is the purpose to present a provisional scale by 
 which achievement and improvement in drawing can be mesured 
 with somewhat the same clearness, exactness, and commensura- 
 bility as achievement and improvement in lifting weights." 
 
 The same general method which was used in determining 
 the Thorndike Handwriting Scale and the Hillegas Composition 
 Scale was employd in the making of this drawing scale. Forty- 
 five drawings of children were first submitted to a number of 
 critics whose ratings reduced the number to a series of fifteen 
 drawings graded from zero up. 
 
 This series of fifteen drawings was rated by 376 persons, of 
 whom sixty were artists of distinction, eighty were supervizors 
 of art, and 236 were students of education and psychology. 
 
 The unit of the scale was one merit. This unit is "The dif- 
 ference of merit in children's drawings which 75% of artists, 
teachers of art, and intelligent judges generally can distinguish,
and which 25% of them fail to distinguish." The drawing lowest
 in the scale was judged of zero merit. The difference of merit 
 between two drawings is not necessarily a unit merit. It depends 
 upon the relativ number of judges who considerd one drawing 
 better than the other. If 75% of the judges considerd one draw- 
 ing superior to another the difference in quality is cald a unit 
 merit. If less than 75% of the judges distinguisht a difference in 
 merit between two drawings the difference between the two is less 
 than one merit. If more than 75% of the judges discernd a 
 difference in merit the difference in quality was markt more 
 than one merit. The following is the determind rating : 
 
 Table III 
 
 Drawing 1=0 merit Drawing 8=10.5 merit 
 
 Drawing 2=2.4 merit Drawing 9=11.8 merit 
 
 Drawing 3=3.9 merit Drawing 10=12.6 merit 
 
 Drawing 4=5.7 merit Drawing 11=13.5 merit 
 
 Drawing 5=6.5 merit Drawing 12=14.4 merit 
 
 Drawing 6=7.8 merit Drawing 13=16 merit 
 
 Drawing 7=8.6 merit Drawing 14=17 merit 
 
 The reader should see the drawings in the Teachers College 
 Record, which accompany these merit values. 
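The unit of the drawing scale rests on the rule that a difference which 75% of judges detect counts as one unit. The text does not give the formula for turning other percentages into fractions or multiples of a unit. One common way, and the assumption made in this sketch, is to treat the judges' opinions as normally distributed, so that 75% corresponds to the probable error (about 0.674 standard deviations). The function `merit_difference` is hypothetical.

```python
from statistics import NormalDist

# z-value below which 75% of a normal distribution lies (about 0.674).
UNIT_Z = NormalDist().inv_cdf(0.75)


def merit_difference(share_judging_better: float) -> float:
    """Difference in merit, in scale units, when this share of judges rate
    one drawing above another. Assumes normally distributed judgments."""
    if not 0.0 < share_judging_better < 1.0:
        raise ValueError("the share must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(share_judging_better) / UNIT_Z
```

Under this assumption a 75% verdict gives exactly one unit, an even split gives zero, and a verdict of 90% gives nearly two units, in keeping with the rule that more than 75% agreement means more than one merit.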
 
 No one is more conscious of the limitations of this scale than 
 is Professor Thorndike. In spite of its limitations it is a valua- 
 ble contribution to experimental education. The method of 
 attack, the care employd in determining differences in merit, and 
 the scientific attitude of the author in the whole procedure will 
 have a wholesome effect upon investigators. It is as practical in 
 determining the qualities of children's drawing as are the writing 
 scales in determining the quality of handwriting. It would better 
 meet the needs of the schools if it attempted to mesure the various 
 aspects of children's art insted of a single aspect. It is to be 
 hoped that it will be followd by other "drawing scales" which 
 are adapted to mesure the various aspects of children's drawings. 
 
 The Thorndike Reading Scale A: Visual Vocabulary 
 
 Thorndike's Reading Scale A for visual vocabulary appeard 
 in the Teachers College Record for September, 1914. In present- 
 ing this scale Professor Thorndike states that there are four 
fases of reading ability which should be mesured: "(1) A pupil's
ability to pronounce words and sentences seen; (2) a pupil's
ability to understand the meaning of words and sentences seen; (3) a
 pupil's ability to appreciate and enjoy what we roughly call 
 'good literature' ; and (4) a pupil's ability to read orally, clearly, 
 and effectivly." 
 
 The following scale in conjunction with the silent reading 
 tests perfected by both Kelly and Gray, given later in this report, 
 is an adequate mesurement of number (2) above. Gray's scale 
 for the mesurement of oral reading provides for number (1) 
 above. Professor Thorndike says that he is working on scales to 
 mesure (3) and (4). It is hoped that these scales will soon be 
 developt. 
 
 Thorndike Reading Scale A : Visual Vocabulary 
 
 Write your name here 
 
 Write your age here years months 
 
Look at each word and write the letter F under every word that means a flower.
Then look at each word again and write the letter A under each word that means an animal.
Then look at each word again and write the letter N under each word that means a boy's name.
Then look at each word again and write the letter G under each word that means a game.
Then look at each word again and write the letter B under each word that means a book.
Then look at each word again and write the letter T under each word like now or then that means something to do with time.
Then look at each word again and write the word GOOD under every word that means something good to be or do.
Then look at each word again and write the word BAD under every word that means something bad to be or do.
 
 4. camel, samuel, kind, lily, cruel 
 
 5. cowardly, dominoes, kangaroo, pansy, tennis 
 
 6. during, generous, later, modest, rhinoceros 
 
 7. claude, courteous, isaiah, merciful, reasonable 
 
 8. chrysanthemum, considerate, lynx, prevaricate, reuben 
 
 9. ezra, ichabod, ledger, parchesi, preceding 
 
10. crocus, dahlia, jonquil, opossum, poltroon
 
 10.5 begonia, equitable, pretentious, renegade, reprobate 
 
 11. armadillo, iguana, philanthropic 
 
28 
 
 The Kansas Silent Reading Test 
 
 Dean F. J. Kelly of the School of Education, University of 
 Kansas, while director of the Training School in the State Normal 
 at Emporia, developt and standardized The Kansas Silent Read- 
 ing Test. This test will appeal to practical school men. It is 
 definit, simple, and easily presented. The results can be quickly 
and definitly determind. In practicability it ranks with the
Thorndike and Ayres Handwriting Scales, the Courtis Arithmetic
Tests (Series B), the Harvard-Newton Composition Scales,
Thorndike's Reading Scale, and the Ayres Spelling Scale.
 
 The entire test consists of carefully graded groups of exer- 
 cizes; one for the primary grades, one for the grammar grades, 
 and one for the high school. The following exercizes are chosen 
 from the sixteen exercizes listed in the test for grades three, 
 four, and five. 
 
No. 1 (Value 2.1)
Mary is older than Nellie, and Nellie is older than
Kate. Which girl is older, Mary or Kate?

No. 9 (Value 4.9)
It was a quiet, snowy day. The train was late.
The ladies' waiting room was dark, smoky and close,
and the dozen women, old and young, who sat waiting
impatiently, all lookt cross, low spirited or
stupid.

In this scene, the women probably kept their
wraps on, because they wisht to be redy to take the
train. Pretty soon the station agent came and put
more coal in the stove, which was alredy redhot in
spots. Do you think this made the women happier?

No. 10 (Value 5.6)
Below are three lines. If the first is the shortest,
place a dot above it. If the last line is shorter,
put a cross above the longest. If each of the other
lines is longer than the last line, put a cross above
the shortest line.
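The text does not say how the Kansas test is scored. Assuming, hypothetically, that a pupil's score is the sum of the values of the exercises he answers correctly, the values printed beside the sample exercises would be used as below. The names are our own.

```python
# Difficulty values printed with the three sample exercises.
EXERCISE_VALUES = {1: 2.1, 9: 4.9, 10: 5.6}


def kansas_score(correct_exercises, values=EXERCISE_VALUES):
    """Hypothetical scoring: sum the values of the exercises answered correctly."""
    return round(sum(values[number] for number in correct_exercises), 1)
```

Under this assumption a pupil who answers Nos. 1 and 9 correctly scores 7.0, and one who also answers No. 10 scores 12.6, so a harder exercise counts for more.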
 
 
 
 
 
 The Gray Reading Tests 
 
 These tests were developt by Professor William S. Gray, 
 now in the School of Education, University of Chicago, while a 
 graduate student at Columbia and Chicago. In an endevor to 
 determin certain facts concerning reading achievement, rather 
 than in an attempt to devize a test per se, this scale was workt out 
 by Mr. Gray. The exercizes employd consist of carefully graded 
 selections. Those for the oral reading test increase in difficulty 
 of interpretation. This test is not so easily operated as is the 
 Kansas Silent Reading Test. 
 
 The oral test is designd to mesure abilities in pronunciation, 
 omissions, insertions, substitutions, and repetitions. The silent 
 test is intended to mesure the pupil's ability to determin the 
 thought essentials in a series of reading exercizes. 
 
 Alredy a sufficiently large number of children have been 
 tested to determin a pretty safe standard of the median abilities 
 of the children in grades three to eight inclusiv. It is to be hoped 
 that this scale will be put in a suitable form and soon be made 
 accessible to teachers. 
 
 The Ayres Spelling Scale 1 
 
 A scale for mesuring ability in spelling prepared by Dr. 
 Ayres was determind from data consisting of 1,400,000 spellings 
 by 7,000 children in 84 cities thruout the country. The words in 
the scale are 1,000 in number. These words are arranged in
colums on the basis of their difficulty. All the words in each colum
have practically the same difficulty. The scale shows the percent
that the median child of each grade should make on each colum
of words. For example, the median child in the third grade
should spell correctly 58% of the words in colum 14. The median
child in the fourth grade should spell correctly 79% of the words
in the same colum. Median abilities are indicated in like manner
for the other grades.

1 Single copies of the Ayres Spelling Scale and of the Ayres Handwriting
Scale may be had for 5 cents each, by addressing the Russell Sage Foundation, New
York City.
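Comparing a class with the standard is a matter of subtraction. The sketch below holds the two column-14 standards quoted above and reports a class's percent and its distance from the standard for its grade. The 20-word dictation in the example is an assumption, as is the function name.

```python
# Standards quoted in the text for column 14 of the Ayres Spelling Scale:
# percent of the column's words the median child of the grade spells correctly.
COLUMN_14_STANDARD = {3: 58, 4: 79}


def compare_with_standard(grade: int, words_right: int, words_given: int) -> tuple:
    """Return (class percent, points above (+) or below (-) the grade standard)."""
    percent = 100 * words_right / words_given
    return percent, percent - COLUMN_14_STANDARD[grade]
```

A third-grade child who spells 14 of 20 column-14 words gets 70%, twelve points above the standard of 58%; a fourth-grade child with 15 of 20 gets 75%, four points below the standard of 79%.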
 
The practicability of this scale is characteristic of Dr.
Ayres' contributions to the science of education. It is very
 satisfactory for determining the spelling abilities of children. 
 Indeed, it is quite doubtful if there will be any improvement upon 
 this scale for the mesurement of spelling abilities in the near 
 future. 
 
The Composition Method of Testing the Spelling Abilities of Children
 
 It will be rememberd that both Rice and Cornman used the 
 composition method of determining the spelling abilities of chil- 
 dren. The abilities as shown by these investigations were so high 
 that practical school men considerd them worthless as standards. 
 
 The high grades reported by both Rice and Cornman were 
 due to the methods employd. Rice found the ratio between all 
 of the words speld correctly (including duplicate words) and the 
 misspeld words (duplicate misspeld words not counted). This 
 method produced a low percentage of error. Cornman attempted 
 to correct this error by counting all duplicate misspeld words as 
 well as duplicate words which were speld correctly. As is evident,
 this method slightly increast the percentage of error in spelling.
 
 The error in both methods resulted from the fact that neither
 Rice nor Cornman recognized that children duplicate a
 larger proportion of words which they can spell correctly than of 
 words which they misspell. There are at least two reasons for 
 this: first, there is a nativ tendency to use freely words which one
 is confident he can spell and to avoid the use of words difficult to
 spell; second, there are a number of easily speld words such as in,
 on, and, the, so, for, is, etc., which make up the major portion of 
 the duplicated words. 
 
 If the above reasons are sound, it is evident that, when mesured
 by the Rice and Cornman plans, one's spelling grade is raisd by
 increasing the number of repetitions. Since children necessarily
 repeat a large number of simple words, it follows that the
 spelling grades of children will be too high when tested by the
 Rice-Cornman methods.
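The effect of repetition on the Rice and Cornman counts can be shown with a small Python sketch. It assumes each word a child wrote is recorded as a pair (intended word, spelled correctly or not), and it assumes Rice's ratio is expressed as percent correct of the combined count. Both assumptions, the function names, and the sample compositions are hypothetical and are not taken from Rice or Cornman.

```python
# Each word a child wrote is recorded as (intended_word, spelled_correctly).
# This representation and the sample compositions are hypothetical.

def rice_percent(words):
    """Rice's count as described in the text: every correctly spelled word
    counts, duplicates included, but each misspelled word counts only once.
    Assumption: the result is percent correct of that combined total."""
    correct = sum(1 for _, ok in words if ok)
    misspelled_once = len({w.lower() for w, ok in words if not ok})
    return 100 * correct / (correct + misspelled_once)

def cornman_percent(words):
    """Cornman's count: duplicates counted for correct and misspelled words alike."""
    correct = sum(1 for _, ok in words if ok)
    return 100 * correct / len(words)

# A short composition in which "because" is misspelled twice.
short = [("house", True), ("because", False), ("school", True), ("because", False)]
# The same composition padded with easy, frequently repeated words.
padded = short + [("the", True)] * 6 + [("and", True)] * 4

for name, paper in (("short", short), ("padded", padded)):
    print(name, round(rice_percent(paper), 1), round(cornman_percent(paper), 1))
```

The misspellings are the same in both papers, yet the padding lifts the grade from about 67% to about 92% under Rice's count and from 50% to about 86% under Cornman's. Cornman's figure is the lower of the two in each case, as the text leads one to expect.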
 
 
 Because I believd that a spelling standard based upon the 
 composition method is the only standard that is reliable for daily 
 use in the school room, I began to gather data in the spring of
 1915, for the purpose of determining a composition standard of 
 spelling which is free from the manifest errors in Rice's and Corn- 
 man's conclusions. Instructions were sent to a number of super- 
 intendents and principals who had previously manifested a wil- 
 lingness to assist in this investigation. So far thirteen schools 
 hav reported. These instructions were to the effect, first, that 
 all duplicate words and the words I and a in the compositions
 should be crost out ; second, that of the words not crost out the 
 ratio of the words speld correctly to those misspeld should be 
 exprest in percent. 
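The crossing-out rule just described can be sketched in Python. The sketch assumes that the first occurrence of a repeated word is the one left standing, and it reads the final ratio as the percent of the remaining words spelled correctly. The pair representation, the function name turner_percent, and the sample paper are hypothetical.

```python
def turner_percent(words):
    """words: (intended_word, spelled_correctly) pairs in the order written.
    Crosses out 'I', 'a', and every repeated word, then returns the percent
    of the remaining words spelled correctly.
    Assumptions: the first occurrence of a repeated word is the one kept,
    and the final figure is percent correct."""
    seen = set()
    kept = []
    for word, ok in words:
        key = word.lower()
        if key in ("i", "a") or key in seen:
            continue  # crossed out
        seen.add(key)
        kept.append(ok)
    if not kept:
        raise ValueError("every word was crossed out")
    return 100 * sum(kept) / len(kept)

paper = [("I", True), ("saw", True), ("a", True), ("dog", True),
         ("and", True), ("the", True), ("dog", True), ("because", False),
         ("the", True), ("because", False)]
print(turner_percent(paper))                         # 80.0
print(turner_percent(paper + [("the", True)] * 10))  # 80.0
```

Because repeated words are crost out, padding the paper with ten more copies of "the" leaves the grade unchanged, which is the property the Rice and Cornman counts lack.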
 
 Thirteen schools returnd papers properly markt. The results from
 eleven of these schools hav been tabulated, and the
 median ability for each grade determind as follows: 
 
 Table IV
 Median Spelling Abilities of Eleven Schools as
 Determind by this Composition Method

 Grade      3rd     4th     5th     6th     7th     8th
 Median     91%     93.6%   95.5%   96.6%   96.9%   98.2%

 The Median Spelling Abilities Reported by Cornman

 Grade      3rd     4th     5th     6th     7th     8th
 Median     94.6%   96.5%   97%     98.1%   98.9%   99.5%
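The size of the difference can be put in figures. This Python sketch copies the medians from the two tables and computes, for each grade, how many points Cornman's median exceeds that of Table IV, and how many times larger the error rate (100 minus the percent correct) is under the method described here. The function name compare is illustrative.

```python
# Median percent correct by grade, copied from the two tables above.
TABLE_IV = {3: 91.0, 4: 93.6, 5: 95.5, 6: 96.6, 7: 96.9, 8: 98.2}
CORNMAN = {3: 94.6, 4: 96.5, 5: 97.0, 6: 98.1, 7: 98.9, 8: 99.5}

def compare(ours, theirs):
    """Per grade: (points by which `theirs` exceeds `ours`,
    ratio of our error rate to theirs), both rounded to one decimal."""
    result = {}
    for grade in ours:
        gap = theirs[grade] - ours[grade]
        error_ratio = (100 - ours[grade]) / (100 - theirs[grade])
        result[grade] = (round(gap, 1), round(error_ratio, 1))
    return result

for grade, (gap, ratio) in compare(TABLE_IV, CORNMAN).items():
    print(f"grade {grade}: {gap} points apart, {ratio} times the errors")
```

By this reckoning the composition method described here finds from one and a half to more than three and a half times as many errors per hundred words as Cornman reported.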
 
 A comparison of the two tables reveals a decided difference
 between the two results. The real difference is even greater than
 the tables indicate, for our instructions were to give the test to
 the best school in the city.
 These instructions were given with the thought that a standard to 
 be of real value should represent abilities determind under most 
 favorable circumstances rather than under mediocre circum- 
 stances. It is quite probable that the median abilities shown in 
 our report (Table IV) ar decidedly higher than medians which 
 would be obtaind from testing all of the children in the cities 
 where these schools were located. 
 
 Table IV is but a tentativ report of this investigation. Addi- 
 tional data and a more critical examination of the various papers 
 reported are necessary before the reliability of these results can be 
 depended upon. It is very probable, however, that additional 
 data will show but slightly changed median abilities of the several
 grades with the single exception of the third grade. There is
 evidence that this mark is too low. 
 
 There is a prevailing notion abroad in educational circles 
 that objectiv standards can be used only in mesuring the skills 
 of pupils. Persons who hold this notion argue that since these 
 standards mesure skill only, the results of such mesurements are 
 of little value in determining the relativ merit of teachers. They 
 further argue that since the objectiv standards mesure form and 
 not content, any markt attention given to this sort of mesurement 
 will result in an over emfasis of form at the expense of content.
 
 These arguments are based upon two fallacies: (1) It is
 fallacious to assume that only skill can be mesured by the ob- 
 jectiv standard. It is true that standards for the mesurement of 
 skill were determind first. Standards for the mesurement of 
 abilities to reason, to enjoy, and to appreciate are following. The 
 Kansas Silent Reading Test and the Gray Silent Reading Test 
 are both standards of the latter type. (2) It is fallacious to assume 
 that attention to the mesurement of such abilities as the funda- 
 mentals in arithmetic, handwriting, spelling, form in reading, 
 etc., will result in an over emfasis of the formal subjects to the 
 detriment of the content subjects. The assumption would hold were
 it not that grades far above the median indicate an undue emfasis
 upon the subject taught and are consequently a mark of poor
 teaching.
 
 It must be rememberd that an application of a standard test 
 will detect an undue emfasis of some particular subject-matter 
 as well as an insufficient emfasis of it. 
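The point that a standard test catches overemphasis as well as neglect can be expressed as a two-sided check. In this Python sketch the tolerance band is a hypothetical parameter, since the text says only "far above the median" and sets no figure. The example uses the sixth-grade median from Table IV; the function name judge is illustrative.

```python
def judge(class_score, standard, tolerance):
    """Two-sided check of a class median against a standard median.
    `tolerance` is a hypothetical band, in percentage points."""
    if class_score < standard - tolerance:
        return "insufficient emphasis"
    if class_score > standard + tolerance:
        return "possible undue emphasis"
    return "near the standard"

SIXTH_GRADE_MEDIAN = 96.6  # Table IV
for score in (94.0, 96.0, 99.5):
    print(score, judge(score, SIXTH_GRADE_MEDIAN, tolerance=1.5))
```

A one-sided rule that merely rewarded high scores would invite the overemphasis the critics fear; the upper bound is what lets the standard flag it.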
 
 It is excedingly important that the interest of the school men 
 of the State of Illinois be elicited in support of a movement to 
 apply the objectiv standard more generally. We should have 
 Illinois standards for the various abilities which can now be 
 definitly mesured. 
 
 I would suggest that a bureau be establisht by the State 
 Teachers Association, or in connection with the Department of 
 Public Instruction, the State Normal Schools, or the School of 
 Education at the University of Illinois, for the direct purpose of 
 preparing and distributing these tests and for the purpose of 
 tabulating and distributing the results. Any one of these branches 
 of the public school system of the state should be, and I believ is,
 willing to undertake this work if it is the wish of the school men 
 of the state to have it done. 
 