BULLETIN No. 17

BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION

THE PRESENT STATUS OF WRITTEN EXAMINATIONS AND SUGGESTIONS FOR THEIR IMPROVEMENT

By Walter S. Monroe
Director, Bureau of Educational Research

Assisted by Lloyd B. Souders
Formerly Assistant in Bureau of Educational Research

PRICE 50 CENTS

PUBLISHED BY THE UNIVERSITY OF ILLINOIS, URBANA
1923

TABLE OF CONTENTS

Preface  5
Chapter I. Introduction  7
Chapter II. Summary of criticisms of written examinations  9
  Examinations yield inaccurate measures of school achievement  9
  Written examinations tend to encourage undesirable mental processes  13
  Passing the final examination an undesirable objective  14
  Examinations injurious to health of students  14
  Time devoted by teachers to written examinations not profitably spent  15
Chapter III. Preparation and administration of examinations in high school  16
  The data collected  16
  Requirement of final examinations in Illinois high schools  16
  Time devoted to written examinations  19
  Characteristics noted in marking examination papers  20
  Weighting of questions  22
  Recognition of rate of work  22
  Methods of marking examination papers  23
  Directions to students concerning methods of work  24
  Recognition of a standard distribution in assigning grades to examination papers  24
  Relation of examination grades to final grades  25
  Summary  25
Chapter IV. The constant and variable errors in examination grades  27
  Constant and variable errors of measurement  27
  Magnitude of variable errors of measurement in standardized test scores and in examination grades  28
  Methods employed in present investigation concerning reliability of written examinations  29
  Data collected for investigation  30
  Reliability of written examination grades and of standardized test scores  31
  Conditions tending to produce variable errors of measurement in examination grades  37
  Magnitude of constant errors of measurements in standardized test scores and in examination grades  40
  Results of present investigation and of previous studies compared  41
  Conclusion: relative accuracy of examination grades and of test scores  41
Chapter V. The content of written examinations  43
  The data collected  43
  Classification of questions  43
  Relation of examination questions to educational objectives  46
Chapter VI. The improvement of written examinations  48
  Reduction of constant errors  50
  Reduction of variable errors  54
  Agreement of content of examinations with educational objectives  55
  Simplification of administration of written examinations  56
Chapter VII. Rules for the preparation and administration of written examinations  63
Appendix  66

PREFACE

This bulletin reports the results of three extensive investigations relating to written examinations. These investigations were made by Mr. Souders under the direction of the Director of the Bureau of Educational Research. The tabulations and statistical calculations were made by Mr. Souders or by clerks working under his immediate direction. The preparation of the published report, however, is the work of the Director of the Bureau.

The Bureau of Educational Research wishes to acknowledge its indebtedness to the superintendents, principals, and teachers who cooperated by furnishing the necessary data. The data required in the study of reliability of written examinations necessitated considerable additional labor. Without their cooperation these investigations would not have been possible.

Walter S. Monroe, Director.
November 1, 1923.

Digitized by the Internet Archive in 2011 with funding from University of Illinois Urbana-Champaign. http://www.archive.org/details/presentstatusofw17monr

CHAPTER I
INTRODUCTION

Preparation and administration of written examinations important phases of the teacher's work. Written examinations, except in the few schools where they have been abolished, form a very important phase of the teacher's work, both because of the time devoted to their preparation and administration and because of the significance attached to the measures which they yield. The final grades upon which promotion and the awarding of school honors depend are determined largely by final examinations and by written tests given during the school term.
Altho standardized educational tests have become widely used during recent years, written examinations are still the most frequently used type of measuring instrument. This will probably always be true, particularly in the high school. Hence, we may expect that written examinations will occupy in the future, as in the past, an important place in the work of our schools.

Need for more information concerning written examinations. There have been numerous investigations which showed that the marking of written examination papers is highly subjective; that is, different teachers tend to assign different marks to the same paper. With the exception of these studies relatively little precise information is available in regard to written examinations, but a number of criticisms based upon experience and theoretical considerations have been made. As a result many teachers and other school officials have come to consider written examinations very inferior instruments and have abolished them in a number of schools.

A search through our educational literature, particularly textbooks, reveals an astonishing lack of information in regard to written examinations. Relatively little specific attention has been [7] given to their preparation and administration in our courses for the training of teachers. Inexperienced teachers have been left largely to their own devices in this important phase of their work. Careful inquiry and observation have indicated that there is a variety of practises with reference to the types of questions asked and to the administration of written examinations. Hence it appears that there is need for a comprehensive investigation of the present status of written examinations in order that a more intelligent estimate may be formed of their value in the process of education and that specific directions may be formulated in regard to their preparation and administration.

Purpose of this bulletin.
It is the purpose of this bulletin to present (1) a brief summary of certain previous investigations relating to written examinations and also of the arguments for and against written examinations; (2) the results of three extensive investigations conducted by the Bureau of Educational Research, (a) the preparation and administration of written examinations in Illinois high schools, (b) the constant and variable errors in examination grades, and (c) the content of written examinations; and (3) suggestions for the improvement of written examinations. In the concluding chapter the author presents a list of rules in regard to the preparation and administration of written examinations. [8]

CHAPTER II
SUMMARY OF CRITICISMS OF WRITTEN EXAMINATIONS¹

Plan of chapter. In this chapter the important criticisms of written examinations are briefly summarized. Following each criticism either a brief answer is given or a reference is made to a detailed discussion in a later chapter. By presenting both sides of the question in this way, it is hoped that the reader will be assisted in forming an intelligent estimate of the merits of written examinations.

I. Examinations yield inaccurate measures of school achievement. In support of this argument six points have been made.

1. The most important criticism relating to the accuracy of written examinations is that the marking of the papers is highly subjective. A large number of scientific investigations have yielded objective evidence that different teachers when working independently tend to assign widely varying marks to the same paper. One of the first studies of this type was by Starch and Elliott, who found that the marks assigned to the same examination paper in Plane Geometry by 116 teachers ranged from 28 to 92 on the scale of 100 percent.
The facts of such investigations as this can not be disputed, but as we have no means of securing perfectly accurate measures of achievement, the question at issue concerns the relative rather than the absolute accuracy of the measurements secured. Facts may be misinterpreted. In Chapter IV we shall present evidence to show that when judged in relation to other means for measuring school achievement, written examinations [9] yield relatively more accurate measures than generally supposed. In view of the additional information secured the subjectivity of written examinations loses much of its potency as a reason for their abolition.

¹Starch, Daniel, and Elliott, E. C. "Reliability of grading high-school work in mathematics," School Review, 21:254-59, April, 1913.
Morton, Robert L. "The examination method of licensing teachers," Educational Administration and Supervision, 6:421, November, 1920.
Wood, Ben D. "Measurement of college work," Educational Administration and Supervision, 7:301-34, September, 1921.
Kelly, F. J. "Teachers' marking," Teachers College Contributions to Education, No. 66. New York: Teachers College, Columbia University, 1914.

2. The questions of ordinary examinations are usually not equal in difficulty and weighting by teachers is highly subjective.² It has been inferred that this condition tends to increase materially the inaccuracy of examination marks. Comparisons of weighted and non-weighted scores yielded by standardized tests have revealed that the errors introduced by disregarding the unequal difficulty of exercises or questions are not significant in most cases.³

3. It has been pointed out that frequently the content of written examinations is not in agreement with recognized educational objectives. Catch questions relating to trivial facts or worded in a misleading way have been cited as illustrations.
Certain examination questions also have referred to items which had not been included in the course or at least had received only minor emphasis. Some evidence with reference to the justification of this criticism will be presented in Chapter V.

4. In most examinations the rate of work is neglected. The usual practise is to allow sufficient time for all pupils to finish or to base the mark only on the questions answered in the unfinished papers. Hence a student's examination grade is not influenced by the rate at which he answers the questions. It is easily possible to take into account the student's rate of work in determining the mark assigned to his examination paper. One plan is to set an examination of sufficient length so that all members of the class will be employed during the entire period. Another procedure is to have the student record the time when he finishes. In this way some weight can be given to his rate of work. This criticism is, however, a minor one. In some subjects the rate of work is an important consideration, but in others, particularly those in which reasoning predominates in answering the questions, the neglect of the rate of work will affect the accuracy of the examination marks only slightly, if at all.

²Comin, Robert. "Teachers' estimates of the abilities of pupils," School and Society, 3:67-70, January 8, 1916.
³Charters, W. W. "Constructing a language and grammar scale," Journal of Educational Research, 1:249-58, April, 1920.
Monroe, Walter S. "The description of the performances of pupils on exercises of varying difficulty," School and Society, 15:341-43, March, 1922. [10]

5. Written examinations are usually so short that they do not offer an adequate opportunity for a student to demonstrate his ability. This criticism is frequently expressed in the statement that it is unjust to base a student's standing for a semester or a year on an examination paper written during a brief examination period.
When stated in this way the criticism refers to two issues between which there is failure to distinguish. The first is in regard to the weight allowed the examination grades in determining a student's final standing. This question of the weight given the final examination is discussed in a later chapter, but it may be said here that the usual practise in high schools is to count the written examination as one-third of the student's total grade. The second refers to the inaccuracy of the grade due to the limited opportunity which is given the student to demonstrate his ability. For practical reasons it is necessary that measurement of the total achievement for the term be based upon a sample. In general, increasing the scope of the examination will tend to increase the accuracy of the measures yielded. Some evidence with reference to the reliability of examination grades based upon short samples will be presented in Chapter IV. It is possible for a teacher to make examinations more comprehensive. This can be accomplished in part by exercising more care in the preparation of the questions. The "new examination" in which pupils are required to do little or no writing affords one means for covering a wide range of subject-matter in a brief period. This method of improving examinations will be discussed in Chapter VI.

The final point to be made with reference to the inaccuracy of examination marks refers to the distinction between a "score" which describes a pupil's performance on the examination and a "grade" which interprets this score with reference to a norm. Failure to recognize this distinction is primarily responsible for too high grading by some teachers and too low by others. Even the same teacher is likely to assign "high grades" on some examinations and "low grades" on others.
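
The first of the two issues above, the weight allowed the examination grade, can be made concrete with a short sketch. The source states only that the examination usually counts one-third of the total grade; the assumption here is that the remaining two-thirds is a single term-work grade on the same percent scale. The function name `final_standing` and its parameters are hypothetical.

```python
from fractions import Fraction


def final_standing(term_grade, exam_grade, exam_weight=Fraction(1, 3)):
    """Combine a term-work grade and an examination grade.

    Assumption (not from the source): everything other than the
    examination is summarized in one term-work grade, and both grades
    are on the same percent scale.
    """
    weight = Fraction(exam_weight)
    if not 0 <= weight <= 1:
        raise ValueError("exam_weight must lie between 0 and 1")
    # Weighted average: the examination counts `weight`, term work the rest.
    combined = (1 - weight) * Fraction(term_grade) + weight * Fraction(exam_grade)
    return float(combined)


if __name__ == "__main__":
    # Examination counted as one-third of the total grade.
    print(final_standing(80, 50))
    # Standing based wholly on the examination, the practise criticized.
    print(final_standing(80, 50, exam_weight=1))
```

Under these assumptions a pupil with term work of 80 and an examination grade of 50 receives a final standing of 70 when the examination counts one-third, against 50 when the standing rests on the examination alone.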
In order to understand how norms (standards) are used in connection with the grading of examination papers it is necessary to distinguish between "scores," or measures, and "grades," or marks. A "score" simply describes the performance which has been recorded on the examination paper. For example, a pupil may answer 55 percent of the questions correctly. In this case 55 [11] is his "score." If a certain number of points or credits had been given for each question his score might be 129, or 91, or 217. A "grade" interprets this description with reference to certain norms. A "grade" indirectly describes a pupil's performance on an examination, but it tells also whether the performance is to be considered as above or below passing; whether the pupil is to receive the highest mark or the lowest mark or an average mark.

It is customary to describe the quality of examination papers in terms of the percent of questions answered correctly. For example, if an examination includes ten questions and a pupil answers seven of them correctly and an eighth one partially right, he is given a score of 75 percent, which is interpreted to mean that in the judgment of the examiner he has answered the questions 75 percent correctly. School marks or "grades" are also frequently expressed in terms of percents. Sometimes they are expressed in terms of letters or other symbols, but these in turn are defined in terms of percents. For example, the grade of "A" may be defined as being between 95 percent and 100 percent.

Since both "scores" and "grades" are generally expressed in terms of percents, it is only natural that the two have been confused and that "scores" have been used as "grades." A good illustration of their difference came to the writer recently. An examination in mathematics was given to nearly 1000 freshmen in one of our large universities. This examination may properly be described as "hard," considering the training which the students had received.
One student made a score of 100. The lowest score was 12. The average was approximately 55. From the standpoint of the distribution of scores this was a "good examination." If it had been easier, so that any considerable number of pupils received scores of 100 percent, it would have been unsatisfactory. If it had been so "hard" that a considerable number of students made zero scores it would also have been defective. In both cases it would have failed to differentiate between some students who were not equal in ability. However, obviously an injustice would be done if a passing mark of 70 or 75 were adopted and all pupils having scores below this mark were given a grade of failure. The passing mark for this particular examination should be in the neighborhood of 40. If the "scores" are to be represented in terms of "grades" a "score" of 40 should be translated into a "grade" of 70 or whatever passing mark has been adopted by the institution. [12]

The recognition of this distinction between "scores" and "grades" enables us to indicate the way in which subjective norms are implied in "grades." A "grade" is not a pure measure or description of the pupil's performance. It is rather an interpretation of the measure of his performance with reference to certain norms. When no distinction is made and "scores" are used as "grades," pupils will receive high "grades" if the examination is "easy" and low ones if it is "hard." Thus, the difficulty of the examination is one factor in establishing the norms with reference to which the "scores" are interpreted when they are used as "grades." Severe marking will tend to set high norms. Only when the examination is of average or "standard" difficulty and the marking is average in severity do "scores" and "grades" become identical in magnitude. Since the norms are established by the difficulty of the examination and the severity of the scoring, they must be subjective.
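
The translation of "scores" into "grades" described above can be sketched in a few lines. The source fixes only one point of the translation, a score of 40 becoming the passing grade of 70. The straight-line interpolation on either side of that point, the treatment of a score of 100 as a grade of 100, and the name `score_to_grade` are assumptions made for illustration, not the bulletin's own procedure.

```python
def score_to_grade(score, passing_score=40.0, passing_grade=70.0,
                   max_score=100.0, max_grade=100.0):
    """Translate a raw examination score into a grade.

    Only the anchor point (passing_score -> passing_grade) comes from
    the text; the piecewise-linear rule is an illustrative assumption.
    """
    if not 0.0 <= score <= max_score:
        raise ValueError("score lies outside the range of the examination")
    if score >= passing_score:
        # Map [passing_score, max_score] onto [passing_grade, max_grade].
        fraction = (score - passing_score) / (max_score - passing_score)
        return passing_grade + fraction * (max_grade - passing_grade)
    # Map [0, passing_score) onto [0, passing_grade).
    return score * passing_grade / passing_score


if __name__ == "__main__":
    # The lowest, passing, average, and highest scores of the example.
    for score in (12, 40, 55, 100):
        print(score, "->", score_to_grade(score))
```

Under this rule the average score of 55 becomes a grade of 77.5 and the lowest score of 12 a grade of 21, whereas using the "scores" directly as "grades" with a passing mark of 70 would fail the average student.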
In the investigations of the marking of examination papers it was shown that teachers varied widely in their judgments concerning the worth of examination papers. There is no reason to expect that they would agree more closely in estimating the difficulty of examinations. Hence, norms which depend upon teachers' estimates of the questions appropriate for examinations and upon their marking of the papers must be considered subjective. It is possible to increase greatly the objectivity of these norms and the first requirement is to recognize the distinction between "scores" and "grades." (See page 38 for a further consideration of this topic.)

Summary of inaccuracy of examination marks. From the preceding discussion examination marks are, without doubt, shown to be far from accurate measures of school achievement. However, it does not necessarily follow that the errors involved are of sufficient magnitude to justify the abolishment of written examinations. In the writer's belief the greatest benefit will come from making an intelligent inquiry into the nature of these errors and from taking steps to reduce them to the lowest magnitude.

II. Written examinations tend to encourage undesirable mental processes. Many critics have claimed that most examinations, particularly those given at the end of a course, tend to encourage "cramming." The assertion is made that many students do little or no studying until near the close of the term. Then by the process of "cramming" they are able to pass the final examination [13] and attain a relatively high standing in the course. This criticism assumes that "cramming" is an undesirable mental process and that final examinations are responsible for its occurrence. The undesirable feature is the neglect of study throughout the term.
This is not due to the fact that final examinations are given but that undue emphasis is placed upon them and that the teacher has failed to check up on the student's work day by day throughout the term.

One of the points which may be made in favor of final examinations is that they furnish an immediate incentive for review and organization of the content of the course. The writing of an examination itself may be an important part of the student's learning. This is particularly true in the case of questions which require reasoning and organization of information. "There is no impression without expression," and the writing of a three-hour examination is undoubtedly an intensive form of expression. Hence, one is justified in maintaining that written examinations tend more to encourage desirable mental processes than undesirable ones.

III. Passing the final examination an undesirable objective. The assertion has been made that when a final examination is required, the passing of it tends to become the objective for which many students work. When this occurs it is due not to the fact that the final examination is required but rather to the undue emphasis which is placed upon it by the school. If an examination consists of appropriate questions it is not undesirable to have the student keep it in mind as one of the objectives to be attained by studying the subject-matter of the course. However, as we shall show later (see page 25), the usual practise is to count the final examination grade as one-third in determining a student's final standing. In many schools it receives less weight. When the final examination counts only one-third or less in determining a student's final standing it is difficult to say in what respect it forms an important educational objective.

IV. Examinations injurious to health of students.
Some critics claim that written examinations, particularly those given at the end of a course, are injurious to the health of students, many of whom make very strenuous preparation for them. The obvious strain which accompanies the writing of answers to the questions of examinations sometimes lasting two or three hours [14] must also be borne. It is undoubtedly true that both the preparation and the writing frequently make enormous drains on the energies of students. However, no careful investigation has been conducted of the actual effect upon their health. To one who observes the great expenditures of time and energy devoted to social and athletic activities, it is difficult to believe that examinations are in general more injurious to the health of students than many other activities in which they are permitted and even encouraged to engage. Here again it should be realized that this criticism is not fundamentally a criticism of examinations, but rather of setting very long examinations or of placing extreme emphasis upon them by making the final grade of the course depend wholly or very largely upon the examination grade.

V. Time devoted by teachers to written examinations not profitably spent. In the opinion of some critics the time given to the preparation of questions and particularly to the marking of examination papers might be more profitably employed. Information concerning the time actually devoted to the preparation and the administration of written examinations is given in Chapter III. However, it may be pointed out here that a teacher can not attain a high degree of efficiency as an instructor unless he checks up the work of his students in order to assist those who need supplementary and remedial instruction. Only by knowing the extent to which his students have achieved individually and collectively can a teacher make his instruction fit the needs of his class. Thus considerable time must be given to measuring the results of teaching.
This is an indispensable portion of the teacher's task. It is only when a teacher devotes an undue proportion of his time to the preparation and administration of examinations that such work tends to be wasted. Doubtless, the time devoted to written examinations might in many cases be profitably increased. Students receiving low marks should have their answers studied in order to ascertain in what ways and why they have failed. Such information will frequently be exceedingly illuminating to the instructor, and aid him in determining his own shortcomings. [15]

CHAPTER III
PREPARATION AND ADMINISTRATION OF EXAMINATIONS IN HIGH SCHOOLS

The data collected. The purpose of the study reported in this chapter was to secure information concerning the present practise in the preparation and administration of written examinations in high schools. A questionnaire was mailed in the fall of 1922 to 254 high-school principals in Illinois and a second one was sent to approximately 2900 high-school teachers.¹ One hundred and eighty-nine replies were received from principals and 1816 from teachers. Of the latter it was necessary to discard eighty, so that the following report is based upon returns from only 1736 high-school teachers, who are distributed as follows:

Commercial Subjects   192
Drawing and Art        26
English               342
Home Economics        143
Latin                 118
Mathematics           247
Modern Languages       82
Music                  21
Science               309
Shop Work              58
Social Science        198

Representative character of data collected. The high schools from which answers to the questionnaire were received ranged from those established in rural communities to a large metropolitan high school. No supplementary investigation was made to ascertain the extent to which the replies were representative of conditions in Illinois, but in the tabulations there was no indication that the data collected were not representative of the state.
A few of the replies, particularly those of teachers, suggest that some slight misinterpretation of certain of the questions may have been made. (See page 23.) Such cases, however, were relatively rare and probably did not affect the median of the results.

Extent of the requirement of final examinations in Illinois high schools. Evidence of the subjectivity of the marking of examination papers, together with other adverse criticisms of written examinations, has tended to cause many teachers and superintendents to be skeptical of their value. In a number of schools [16] final examinations have been abolished or made optional with the teachers and they are not considered essential by many teachers. In order to ascertain the present practise in Illinois the high-school principals were asked, "Do you require your teachers to give final examinations?" Only twenty-one principals, or 11 percent, stated that final examinations were not required. Thus it is the practise in Illinois high schools to require that final examinations be given. This, however, does not mean that all students must take them. Of the 168 high schools in which final examinations are required, 101, or 60 percent, reported that it was their practise to exempt certain students. Scholarship, that is, making a grade on daily work above a certain average, was mentioned by all of these schools as one of the conditions on which exemption was based. Deportment was mentioned by 52 percent and attendance by 32 percent as additional conditions.

¹These questionnaires are reproduced in the appendix on pages 66 and 68.

No information was secured with reference to the explanation of the exemption from examinations of students meeting certain conditions, but general observation has indicated that two reasons are frequently recognized. The first is that promise of exemption from the final examinations operates as a powerful motive to secure a high quality of daily work, regular attendance, and good deportment.
The other is the belief held by many teachers that final examinations are unnecessary to determine a student's standing in a course. They contend that the average of a student's daily grades should be taken as a final grade for the course.

There is no doubt that the promise of exemption from the final examination operates as a powerful motive in the case of many students. It should, however, be recognized that such an incentive is artificial and therefore open to criticism. In so far as possible a student should be actuated by motives which sustain an intrinsic relation to the subject-matter. If it is necessary or advisable that the final examination be considered as a motive, it could be used to encourage systematic review and organization of the course. This should constitute a very important phase of studying. Students may, of course, be asked by their teachers to review frequently and to summarize and organize the work at the end of the term, but they cannot be convinced easily of the necessity of such work if it receives no more weight in determining their final grade than their performances during an equal period of time elsewhere in the course. [17]

The second reason is a valid one in many cases. In the experience of most teachers the mark made on the final examination changes the standing of relatively few students. Experienced teachers can under favorable conditions estimate with considerable accuracy the achievements of their pupils. If the class is reasonably small and if the teacher has used methods of instruction which call for frequent oral and written performances by the students and has kept a careful record of these performances throughout the term, his estimates will generally be relatively accurate measures of the achievements of the students.

There are, however, certain limitations which should be noted. Teachers may be unduly influenced in their estimates by the more recent performances of their students.
Unless careful records have been kept throughout the term, inferior work at the beginning tends to be overshadowed by good or excellent work during the closing weeks. In case the class is a large one the teacher may not have an adequate opportunity for becoming acquainted with all of its members. Teachers' estimates are likely to be materially affected by personal characteristics of students; one with a pleasing personality is in many cases rated higher than one who is unattractive. If the classwork is conducted so that there is little or no written performance required, teachers' estimates will necessarily be based almost wholly on the oral responses given during the class period. Some pupils make a good showing in class when the recitation is oral but are at a decided disadvantage when asked to record their answers in writing. Frequently this difficulty is encountered when they are careless in their thinking and do not have clear ideas to express. In oral recitation they are able to make a fair showing because of personal characteristics and because of the stimulus of detailed questioning by the instructor. Furthermore, in a class discussion a bright student who has a good command of language may easily pick up ideas from other members of the class and recall ideas from his general experience sufficient to make a good showing. On the other hand there are students who express themselves more effectively in writing. They may be good thinkers but a little slow in their mental processes and not clever in discussion. Thus there are cases in which it is difficult or impossible for a teacher to estimate accurately the real achievements of students from their daily work alone. The final examination at the [18] end of the term will in a considerable number of cases furnish additional information which is needed in arriving at the student's true standing.
The final examination in itself provides a distinct type of educational opportunity which does not occur elsewhere in the course. Altho the writers have no evidence to present upon this point they are convinced from their experience with college students and from the comments of a number who have been exempted from final examinations in high school that the practise deprives students of an important educational opportunity. Not infrequently students who have been "excused from examinations" in high school state that they experienced a distinct handicap when they entered college. If final examinations can be justified they should be required of all students. To use them only as a device for motivating the work of the term destroys much of their value.

Time devoted to written examinations. Three questions were asked relative to the time devoted to the preparation and administration of written examinations. The replies from the principals indicated that the most frequent practise is to allow ninety minutes for the writing of a final examination. This is the time allowed in 45 percent of the schools having final examinations. Fifteen percent allow eighty minutes and a slightly larger percent one hundred and twenty minutes.

The teachers were asked to state approximately the number of minutes which they use "in preparing questions for a final examination which students are allowed a total of ninety minutes to answer." The median time, which varies only slightly for the different subjects, is approximately fifty minutes. Individual teachers in the same subject differ widely in the amount of time which they give to this phase of their work. Two teachers, one in mathematics and one in science, stated that they spent more than six hours in the preparation of a set of final examination questions. In each subject there were a number of other teachers who stated that they devoted not more than thirty minutes to such work.
It is possible that some teachers failed to interpret this question correctly but doubtless much of the variation is due to differences in the practises of teachers during the semester. Some probably make a memorandum of questions as they occur during the term and use this list as a basis for preparing the final examination. Also experience is a contributing factor. Teachers who have become very familiar with the subject should be able to formulate questions more quickly than those who are not so well versed.

The teachers were asked also to give the approximate time which they used in "marking the papers of a final examination which students are allowed a total of ninety minutes for answering." They were directed to base their estimates upon a class of twenty-five students. The median time is approximately two and one-half hours. The variations between the different subjects are not large when the differences in their character are considered. The greatest number of hours are required for English and social science and the least for drawing and art. Here also there were wide variations in the amount of time reported by the individual teachers. A total of twenty-five teachers, representing all subjects except home economics, shop work, and social science, stated that they devoted not more than thirty minutes to marking a set of papers for twenty-five students. On the other hand, thirty-nine teachers stated that they spent 480 minutes or eight hours in the marking of a single set of papers. It is obvious from the replies received that some teachers treat this phase of their work much more seriously than others or that they employ widely different methods. Probably some correct all errors or insert references which will enable the students to correct their own errors when the papers are returned. Others merely check the errors and still others probably do not attempt to even check each error but estimate the worth of the paper as a whole.
The question concerning the amount of time which a teacher is justified in devoting to the marking of a set of examination papers may very profitably be raised. Final examination papers should be treated seriously and there should be an earnest endeavor on the part of the teacher to estimate as accurately as possible the grades which are assigned but it is doubtful if the expenditure of as much as twenty minutes per paper which was reported in some cases could be justified. The median practise seems to represent a more reasonable amount of time.

Characteristics noted in marking examination papers. The principals were asked to state whether it was the practise in their schools for teachers to subtract from a pupil's grade for (1) poor writing, (2) poor spelling, (3) poor English. Seventy-one principals or 42 percent stated that teachers were accustomed to make deductions for poor writing. In 60 percent of the schools it was the practise to lower a student's grade for misspelled words and in 68 percent for poor English. Fifteen principals or 9 percent reported that all three characteristics were recognized only in the marking of papers in courses in English.

The teachers were asked if they intentionally lowered a student's grade because of each of the three characteristics mentioned in the above paragraph. A summary of their replies is given in Table I, which indicates considerable variation with reference to the influence of poor writing, poor spelling, and poor English upon examination grades.

TABLE I. PERCENT OF TEACHERS REPORTING INTENTIONALLY LOWERING A STUDENT'S GRADE FOR POOR WRITING, POOR SPELLING, AND POOR ENGLISH

Subject                 Poor Writing    Poor Spelling    Poor English
Ancient Language             31              75               80
Commercial Subjects          61              74               72
Drawing and Art              65              82               67
English                      60              96               98
Home Economics               37              72               73
Mathematics                  31              48               48
Modern Language              31              82               74
Music                        48              60               48
Science                      35              53               56
Shop and Vocational          32              77               74
Social Science               37              61               65
Since writing, spelling and English may be considered essential parts of courses in English we should naturally expect that teachers of this subject would intentionally lower a student's grade for defects in any of these characteristics. Outside of the subject of English, the majority of teachers do not lower a grade for poor writing except in commercial subjects, and drawing and art. With the exception of mathematics deduction is made by most teachers for poor spelling. The potency of poor English in determining a student's grade is slightly less than that of spelling in a number of subjects.

The handwriting, spelling, and quality of English which a student uses in writing an examination should be recognized. It does not, however, follow that a student's standing should be intentionally lowered for poor handwriting, poor spelling, and poor English in school subjects other than English. When this is done his grade becomes a measure of these abilities as well as of the abilities in the field of the subject in which the examination is given. In history, for example, a student's grade would become a composite measure of his achievement in history, the legibility of his handwriting, the quality of his spelling and the use of grammatically correct English. As a result both the teacher and the student are likely to be confused concerning the shortcomings of the examination paper. A better procedure would be to keep a record of the errors in spelling, poor writing, and poor English and when it is considered desirable a separate grade may be given covering these three characteristics. Credit for a course may be withheld until the student has brought his handwriting, spelling and English up to a satisfactory standing.

The weighting of questions. Sixty-four percent of the high school principals indicated that their teachers were accustomed to give more credit for correct answers to difficult questions than to easy ones.
Approximately four-fifths of the teachers replying to the questionnaire stated that they attempted to weight examination questions on the basis of difficulty. Thus there is a very definite effort to eliminate the errors introduced in examination grades by the unequal difficulty of questions. (See page 10.)

Recognition of rate of work. Eighty-two percent of the teachers stated that they were accustomed to set examinations short enough so that practically all students could answer all the questions. Only 32 percent noted the time which each student spent in writing his examination paper, and only 8 percent said it was their custom to set examinations long enough so that practically no student would have time to answer all of the questions. Thus it is clear that relatively few teachers recognize the student's rate of work in determining his grade on an examination. Incidentally it may be noted that when examinations are short enough so that practically all students can finish a great deal of time is wasted. Individual differences exist in all classes and it is not at all unusual to find some student finishing in one-third to one-half of the time which others devote to the examination. Aside from the waste of time which results from this practise, it is likely that the confusion caused by the leaving of those pupils who have finished tends to disturb the attention of those who are still writing. If final examinations constitute a valuable educational opportunity, there is no justification for wasting time. It is much better to set an examination long enough so that practically all students will be occupied for the entire period.

Method of marking examination papers. Scientific investigation has revealed that the reliability of examination grades can be materially increased by adopting a systematic method in marking papers.¹ Among the procedures recommended are the writing out of correct answers, and the grading of one question on all of the papers before taking up another question. In order to ascertain the practise of high-school teachers in marking examination papers the following four questions were included in the questionnaire sent to them.

1. Before starting to grade a set of examination papers do you write out the answers which you consider correct?
2. Do you usually grade all the answers on one paper before taking up those of another paper?
3. Do you usually grade the answers to one question on all of the papers before taking up the answers to a second question?
4. Instead of marking the answers to each question separately do you attempt to estimate the value of the paper as a whole?

The percent of teachers giving affirmative answers to these questions is given in Table II. A few apparent discrepancies in this table are due to the fact that certain teachers did not answer all the questions. Normally we should expect a teacher who answered the second question affirmatively to answer the third one negatively. This, however, did not always happen.

¹Kelly, F. J. "Teachers' marks," Teachers College Contributions to Education, No. 66. New York: Teachers College, 1914, 83 p.

TABLE II. PERCENT OF TEACHERS GIVING AFFIRMATIVE ANSWERS TO FOUR QUESTIONS RELATING TO THE MARKING OF EXAMINATION PAPERS

Subject                 Correct Answers    Each Ques. on    One Ques. on    Each Paper
                            Written          One Paper       All Papers     as a Whole
Ancient Language               9                73               24             20
Commercial Subjects           48                82               16             31
Drawing and Art               40                75               25             42
English                       22                76               19             34
Home Economics                19                68               24             38
Mathematics                   72                75               18             24
Modern Language               15                74               23             33
Music                         35                80               21             42
Science                       27                72               27             27
Shop and Vocational           36                75               27             37
Social Science                18                75               22             40

With the exception of mathematics, it is not the custom of teachers to write out the answers to their questions.
No data are at hand to show what effect this has upon the accuracy of the examination marks. Experience with standardized tests would indicate that the failure to write out correct answers, at least in abbreviated form, would operate to make the grading of examination papers less accurate. About three-fourths of the teachers are accustomed to mark all the questions on one paper before taking up another. This plan has the advantage of enabling the teacher to consider a pupil's performance as a whole. In the case of students who make a large number of errors the teacher will find this helpful in providing remedial instruction. It has, however, been proposed that the reliability of examination grades can be increased by marking the answers to one question on all the papers before taking up the answers to another question.

Directions to students concerning methods of work. The high-school teachers were asked the following question, "Do you prepare in written form carefully worded directions to the students regarding the procedure they are to follow in answering the questions? (These directions might include such points as, order in which questions are to be answered, length of answer, arrangement of work, etc.)" Only 35 percent of the teachers gave an affirmative answer. It is possible that in some classes there is a sufficiently definite understanding concerning the methods of work to be followed and explicit directions are unnecessary. However, it is likely that in many cases students would be able to give more truthful evidence of their ability if they were given precise directions concerning the length of answer, desired arrangement of work, etc. The order in which the questions are to be answered is a point which should be stressed. In the case of questions that are at all indefinite or general there should be specifications concerning the degree of elaborateness which is expected in the answer.
Recognition of a standard distribution in assigning grades to examination papers. The teachers were asked the following question, "In assigning grades to examination papers do you attempt to have their distribution conform to any standard form such as the normal distribution?" Only 31 percent of the teachers gave an affirmative answer to this question. This probably means that relatively few teachers have recognized the distinction between "scores" and "grades" (see pages 11-13 for an explanation of this distinction) and for this reason are neglecting one means of making their grades more accurate measures of school achievement.

Relation of examination grades to final grades. The principals were asked if they advised their teachers as to the proportion of the final mark for the semester which should be based upon the final examination. Only eleven or 7 percent replied negatively. Of those who gave advice on this matter 95 percent made a definite ruling. The most frequently mentioned proportions allotted the final examination are 25, 30, 33⅓, and 40 percent. In 4 percent of the schools the examination counts for one-half in determining a student's final grade, in 1.3 percent for only one-tenth of the final grade.

The teachers were also asked this same question. The replies varied from 10 percent to 50 percent. Except in science and shop work, the median practise is to estimate the final examination mark as one-third in determining a pupil's final grade. A considerable number of teachers indicated that they gave not more than one-fourth or one-fifth value to the final examination.

Summary. The typical practise with reference to final examinations in Illinois high schools may be summarized as follows:

1. Final examinations are required of students and exemptions are made largely on the basis of scholarship.
2. Students are allowed ninety minutes for writing a final examination.
Teachers spend slightly less than one hour in preparing examination questions and from two to three hours in grading a set of papers for twenty-five students.
3. With the exception of mathematics, the majority of teachers lower a student's grade for spelling, and with the exception of mathematics and music the majority lower it for poor English. Poor writing is not a potent factor except in English, commercial subjects, and drawing and art.
4. About three-fourths of the teachers attempt to weight the questions on the basis of difficulty.
5. The majority of the teachers do not consider rate of work in estimating the grade assigned the final examination paper.
6. The majority of teachers do not write out the answers to the questions preparatory to the marking of papers. The general practise is to mark all answers on one paper before taking up the next.
7. About one teacher in three writes out directions with reference to the procedure which the students shall use in answering the questions.
8. About one teacher in three tries to have his grades conform to a standard distribution.
9. The proportion of the final mark for the semester which is based upon the final examination grade varies from 10 to 50 percent. The median practise is 33⅓ percent. The majority of the principals make a definite ruling regarding the value placed upon the final examination.

CHAPTER IV

THE CONSTANT AND VARIABLE ERRORS IN EXAMINATION GRADES

Constant and variable errors of measurement.¹ Two types of errors are encountered in educational measurement. The presence of variable errors is indicated when a test is given twice to the same group of pupils. The two average scores of the group may be the same but this will not be true for individual pupils. A few pupils will make the same or approximately the same score on the two trials.
Others will make higher scores on the second trial than on the first, while still others will make lower scores on the second trial than on the first. If we assume that the average of the two scores obtained represents an approximately true measure of a pupil's achievement then the differences between the first set of scores and the corresponding average scores would be the variable errors of the measures resulting from the first application of the test. Some of these differences approximate zero. Some of them are positive and about an equal number negative. Another set of variable errors would be obtained by using the scores secured from the second application of the test. In the case of a number of pupils the variable errors for the first application of the test will not be the same as those for the second application. Thus, as the name implies, variable errors change in magnitude from pupil to pupil within a group and also for the same pupil in a series of measurements of the same achievements.

A constant error is the same for all members of a group. Such an error occurs in teachers' marks where there is a tendency to grade too high or too low. It is found in the case of standardized educational tests when mistakes occur in the time allowed or when other departures are made from standard testing conditions. A constant error may be either positive or negative and it is generally different for different tests.

¹For a more detailed discussion of the nature and magnitude of the constant and variable errors of educational measurement, see Monroe, Walter S. "The constant and variable errors of educational measurements." University of Illinois Bulletin, Vol. 21, No. 10, Bureau of Educational Research Bulletin No. 15. Urbana: University of Illinois, 1923. 30 p.

These two types of errors usually occur in combination, that is, a given measurement may and frequently does involve both a constant error and a variable error.
The actual error is a combination of these two. However, in studying the accuracy of educational measurements it is helpful to distinguish between the two types and to consider each separately. The usual method used for calculating an index of the magnitude of the variable errors does not give any indication of the magnitude of the constant error. Also the method commonly used for determining the presence and probable magnitude of constant errors does not yield an index of the variable errors. Furthermore, different methods are required for decreasing the two types of errors in educational measurements.

Methods of describing the magnitude of the variable errors of measurement yielded by standardized educational tests. In describing the magnitude of variable errors in the measures yielded by standardized educational tests, the usual method is to have the test given twice to a typical group of pupils under as nearly the same conditions as possible. The coefficient of correlation between the two sets of measures is taken as the index of the magnitude of the variable errors. Usually there will be a constant error in one and sometimes in both of the sets of measures but the statistical procedure used is such that this error does not affect in any way the coefficient of correlation secured. This coefficient of correlation is commonly spoken of as the coefficient of reliability. A coefficient of 1.00 would mean that the variable errors were zero.

Available data with reference to the magnitude of errors of examination grades and standardized test scores not comparable. Investigations of the Starch-Elliott type have proven that examination grades involve errors but the method which they employed is different from that used in studying the errors in the scores yielded by standardized educational tests. Starch and Elliott confined their efforts to a study of the subjectivity of the marking of a single examination paper.
Except in the use of quality scales, as in handwriting and English composition, the scoring of standardized educational tests has been made highly objective. Hence, there has been little need for studying the subjectivity of the marking of test papers. On the other hand, there has not been, so far as the writers are aware, any reported attempt to apply to written examinations the method commonly used in studying the reliability of standardized educational tests. Hence, comparisons to show the relative reliability of the two types of measuring instruments cannot as yet be made. For this reason in the present investigation it has seemed worth while to apply to written examinations the same method which is commonly used in studying the reliability of educational tests. Certain modifications are of course necessary. These will be noted in the following paragraphs. The investigation pertains primarily to the variable errors involved in examination grades. Incidentally some light will be thrown upon the magnitude of constant errors.

Methods employed in the present investigation of the reliability of written examinations. The essential feature of the methods employed in the present investigation is securing two independent examination grades for each pupil for the same units of work. This requires that two examinations be given to each of the groups of students from which data were secured. Two methods were used. These are described in the following directions which were sent to those cooperating in this investigation.

Method I

Two sets of examination questions are to be prepared by a single person, or two or more persons working together. Each of the two lists should contain the same number of questions. There should be a distinct effort to make the two lists approximately equal in difficulty and as nearly as possible similar in respect to the type of questions.
After the two lists of questions have been made both should be given by each teacher to all of her pupils under as nearly the same conditions as possible. If not given on the same day, the two examinations should be given within a period of one week. For example, if two sets of examination questions in seventh grade geography have been prepared, both sets of questions should be given by each seventh grade teacher to all of her pupils. Each teacher is to mark both sets of examination papers for her pupils. In marking these papers the teacher should indicate the credit given for each question and write the total grade plainly upon the examination paper. When two or more teachers have given the same examinations it is not necessary that they confer in regard to the marking of the papers. If this is done a memorandum regarding the procedure should be attached to the examination papers.

This method may be followed by a single teacher who has two or more sections of a given subject. Both examinations should be given to all sections taught by this teacher. This method of studying the reliability of written examinations can be applied to any school subject. The Bureau of Educational Research is most interested in having it applied to arithmetic, history, geography, and language in the elementary school and to history, mathematics, English, and science in the high school.

Method II

Two sets of examination questions for the same subject are to be prepared by two teachers working independently, each teacher preparing a set. There is no requirement concerning the length or the difficulty of the two sets of examination questions except that both should cover the same amount of work. The teachers who prepare the questions should not confer concerning either the kind or the number of questions asked. After the questions have been prepared, both sets are to be given by each teacher to all of her pupils.
If not given on the same day, the two examinations should be given within a period of one week. After the examinations have been given each teacher will grade all of the papers written upon the questions that she prepared. This will mean that she will grade a set of papers for her own pupils and also a set for the pupils of the other teacher. There should be no conferring between the teachers in regard to the method of scoring. In marking these papers the teacher should indicate the credit given for each question and write the total grade plainly upon the examination papers.

The data collected. Through the city superintendents and high-school principals a general invitation was extended to school systems in Illinois to participate in this investigation at the close of the second semester 1921-22 and also at the close of the first semester 1922-23. No instructions other than those just noted were given to those who cooperated. It should, therefore, be borne in mind that the data collected are for written examinations as they are usually given and not for special types of examinations or for unusual methods in the administration or the grading of the test papers. The reliability of examination grades could probably have been increased if certain directions had been formulated in regard to the marking of the examination papers but the purpose of the investigation was to determine the reliability of typical written examinations administered in the usual way.

Returns were secured from seventy-two groups of children but it was necessary to discard the data for six groups because instructions had not been followed. The examinations given to the sixty-six groups were all of the traditional type. The papers were marked by the teachers on the scale of 100 percent, and were then sent to the Bureau of Educational Research. The coefficients of reliability reported in this chapter were calculated under the direction of the writers.
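The index used in this chapter, and the variable errors it summarizes, can be illustrated with a short computation. The sketch below is only illustrative: the two sets of grades are hypothetical and are not taken from the returns of this investigation, and the function names `pearson_r` and `variable_errors` are assumptions introduced for the example. It computes the coefficient of correlation between two sets of examination grades, shows that a constant error added to one set leaves the coefficient unchanged, and lists the variable errors of the first set as defined earlier in the chapter (each first grade minus the average of the pupil's two grades).

```python
from math import sqrt


def pearson_r(x, y):
    """Coefficient of correlation between two equally long lists of grades."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def variable_errors(first, second):
    """Each first-trial grade minus the average of the pupil's two grades."""
    return [a - (a + b) / 2 for a, b in zip(first, second)]


# Hypothetical grades for eight pupils on two examinations (assumed data).
first_exam = [82, 75, 90, 64, 71, 88, 59, 77]
second_exam = [78, 80, 86, 70, 65, 91, 62, 74]

reliability = pearson_r(first_exam, second_exam)

# A constant error: the second set marked five points too high throughout.
second_exam_lenient = [g + 5 for g in second_exam]
reliability_lenient = pearson_r(first_exam, second_exam_lenient)

errors_first = variable_errors(first_exam, second_exam)

print(f"coefficient of reliability: {reliability:.2f}")
print(f"with constant error of +5:  {reliability_lenient:.2f}")
print("variable errors, first examination:", errors_first)
```

The two printed coefficients agree because the correlation is computed from deviations about each set's own mean, so a uniform shift in one set cancels out. This is why a separate method is needed to detect constant errors.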
Coefficients of reliability of examination grades. The coefficients of reliability of written examinations for the sixty-six groups of students are summarized in Table III. This table also shows the number of students in each class, the number of questions in each examination and the method followed in giving the examination. The reliability coefficients have been grouped by subjects and have been arranged in descending order of magnitude. For those entries marked with an asterisk (*) in the column headed "Method," one of the examinations was given by the principal or some other person not actually teaching a class in the subject at that time. However, this person was considered competent to be in charge of the examination. The total distribution of reliability coefficients is given in Table IV.

Two of the coefficients of reliability are negative. The highest is .95. It is interesting to note that the coefficients given for history are, on the average, higher than those obtained for arithmetic. The most reliable examinations given were in algebra. With the exception of history, arithmetic, and algebra, the number of groups is so small that comparisons can not have much significance. The median coefficient of reliability, .65, may be used as a general index of the reliability of written examinations.

The coefficients of reliability of standardized educational tests. McCall¹ has stated that the "range of self-correlation for many standardized tests is about .5 to about .9." The writer's experience has indicated a somewhat greater range. In Table V the reliability of a number of standardized educational tests is given. Those for the silent reading tests by Brown, Starch and Courtis are taken from a recent bulletin² by the writer. The range in this table is from .19 to .92.

¹McCall, W. A. How to Measure in Education. New York: The Macmillan Company, 1922, p. 396.

²Monroe, Walter S. "A critical study of certain silent reading tests."
University of Illinois Bulletin, Vol. 19, No. 22, Bureau of Educational Research Bulletin No. 8. Urbana: University of Illinois, 1922, p. 33-34.

TABLE III. COEFFICIENT OF RELIABILITY FOR WRITTEN EXAMINATIONS SET BY TEACHERS

Class Number          No. of Pupils    No. of Questions    Coefficient of Correlation    Method
64 (Arithmetic)            21          10                        .76                      II
58     "                   35          7 and 8                   .74                      II
43     "                   24          10                        .73                      II
63     "                   38          10                        .71                      II
61     "                   64          10                        .69                      II
 2     "                   37          5                         .67                      I
62     "                   38          10                        .64                      I
 1     "                   88          10                        .64                      I
60     "                   41          5 and 10                  .61                      II
65     "                   33          10                        .60                      II
 4     "                   72          6                         .56                      I
71     "                   17          10                        .48                      *
44     "                   22          7 and 10                  .48                      II
 3     "                   55          6                         .47                      I
66     "                   73          10                        .47                      II
23     "                   27          5                         .33                      I
32     "                   56          5                         .30                      II
59     "                   21          6 and 10                  .29                      I
45     "                   43          10                        .06                      II
42     "                   27          8 and 10                 -.18                      II
 5 (Algebra)               51          10                        .92                      I
72     "                   20          30 and 19                 .91                      *
50     "                   23          5 and 7                   .88                      II
34     "                   37          10 and 6                  .82                      II
57     "                   36          10                        .81                      II
 6     "                   52          6                         .78
35     "                   45          15                        .73                      II
31     "                   54          6                         .61                      II
 7 (Language)              33          5                         .68
 9     "                   54          5                         .62
 8     "                   53          5                         .57
10 (Literature)            74          5                         .19                      I
18 (English)               15          3                         .78                      I
12     "                   43          5                         .53                      II
11     "                   27          5                         .53                      I
54     "                   37          7 and 45                  .50                      II
53     "                   21          6 and 7                   .47                      II
13     "                   52          5 and 8                   .41                      II
20 (History)               14          5                         .95
15     "                   28          5                         .85                      I
23     "                   19          5                         .78                      I
36     "                   43          5                         .76                      II
56     "                   19          9                         .75                      *
67     "                   53          5                         .75                      II
41     "                   32          8 and 10                  .67                      II
40     "                   28          5 and 7                   .66                      II
14     "                   64          5                         .63                      I
48     "                   24          10                        .57                      II
49     "                   30          5 and 10                  .55                      II
39 (Geography)             29          10                        .66                      II
46     "                   29          5 and 10                  .66                      II
47     "                   47          5 and 10                  .62                      II
16     "                   21          5                         .43                      I
68     "                   23          10 and 8                  .32                      *
38     "                   26          10 and 5                 -.11                      II
17 (Civics)                63          5                         .39                      I
37 (Latin)                 31          8                         .89                      II
19     "                   30          2                         .82                      I
51     "                   50          6 and 8                   .68                      II
21 (Spelling)              33          5                         .87                      I
52 (Geometry)              42          5                         .75                      II
70     "                   10          8                         .68                      *
22     "                   46          4 and 5                   .34
24 (Spanish)               18          5                         .79                      I
33 (German)                23          7 and 8                   .83                      I
69 (Commerce)              16          16 and 10                 .53                      *

TABLE IV.
SUMMARY DISTRIBUTION OF COEFFICIENTS OF RELIABILITY FOR WRITTEN EXAMINATIONS

Size of Coefficient of Correlation    Frequency
        .95                               1
        .90                               2
        .85                               4
        .80                               4
        .75                               9
        .70                               4
        .65                               9
        .60                               8
        .55                               4
        .50                               4
        .45                               5
        .40                               2
        .35                               1
        .30                               4
        .25                               1
        .20                               0
        .15                               1
        .10                               0
        .05                               1
       -.05                               0
       -.10                               0
       -.15                               1
       -.20                               1
      Total                              66
      Median                            .65

TABLE V. RELIABILITY COEFFICIENTS OF STANDARDIZED EDUCATIONAL TESTS

Test                                                                   Coefficient
Illinois General Intelligence Scale*                                       .92
Courtis Standard Research Tests, Series B‡                                 .87
Brown Silent Reading Test: Rate                                            .86
Courtis Silent Reading Test, No. 2: Rate                                   .85
Otis Group Intelligence Scale§                                             .84
Monroe Standardized Silent Reading Test Revised*: Rate                     .84
Courtis Silent Reading Test, No. 2: Comprehension, No. Quest.              .80
Starch Silent Reading Test: Comprehension, Words                           .77
Monroe General Survey Scale in Arithmetic*                                 .76
Monroe Standardized Silent Reading Test Revised*: Comprehension            .76
Monroe Standardized Silent Reading Test Revised*: Rate                     .75
Monroe Standardized Silent Reading Test Revised*: Comprehension            .72
Starch Silent Reading Test: Comprehension, Ideas                           .72
Indiana Attainment Scale, No. 1†                                           .66
Starch Silent Reading Test: Rate                                           .62
Pressey Primer Scale†                                                      .59
Courtis Silent Reading Test, No. 2: Comprehension, Index                   .58
Pressey First Grade Vocabulary Scale†                                      .37
Brown Silent Reading Test: Comprehension, Quantity                         .36
Pressey Primer Scale†                                                      .33
Brown Silent Reading Test: Comprehension, Quality                          .19

*Monroe, Walter S. "The Illinois Examination." University of Illinois Bulletin, Vol. 19, No. 9, Bureau of Educational Research Bulletin No. 6. Urbana: University of Illinois, 1921, p. 47.
†Pressey, L. W. "A Group Scale of Intelligence for Use in the First Three Grades: its validity and reliability," Journal of Educational Research, 1:285-94, April, 1920.
‡Unpublished data of the Bureau of Educational Research, University of Illinois.
§Colvin, S. S. "Some recent results obtained from the Otis Group Intelligence Scale," Journal of Educational Research, 3:1-12, January, 1921.

From certain unpublished studies by the writer the following information has been obtained. The Courtis Standard Research Test, Series B, Forms 1 and 2 were given to pupils as follows: Grade V, 89; Grade VI, 81; Grade VII, 52; and Grade VIII, 38. The thirty-two coefficients of reliability ranged from .409 to .904 with the median at .665. Forms 1 and 3 were given to a slightly larger group in each of the four grades. The thirty-two coefficients of correlation between the two sets of scores for this administration of Series B ranged from .528 to .963 with the median at .704. The Woody Arithmetic Scales, Series A, were given to several groups of pupils. Two scores were secured by using alternate items of each of the scales and applying Brown's formula.¹ The twelve coefficients of reliability computed in this way ranged from .91 to .46 with the average at .66. Forms 1 and 2 of Monroe's Standardized Reasoning Test in Arithmetic were given to pupils as follows: Grade V, 36; Grade VI, 92; Grade VII, 76; Grade VIII, 81. The coefficients of reliability for correct principle were as follows: .530, .630, .645, and .723; for correct answer they were .518, .528, .576, and .707. Using Brown's formula the coefficients of reliability for Gray's Silent Reading Tests were computed for thirty grade groups. These coefficients ranged from .55 to .85 with the median at .67. The number of pupils per group was less than 100 in only five cases. For several grade groups reliability coefficients were secured for Monroe's Standardized Silent Reading Tests which ranged from .222 to .907 with an average of .669.

¹$\frac{2r_h}{1+r_h}$. In this formula $r_h$ is the correlation between two scores which this test yields. One is based upon reproduction and the other upon answers to questions.

Haggerty has computed the reliability for both Sigma 1 and Sigma 3 of his Reading Examination by having the same test repeated. In the case of Sigma 1 the interval between the two applications of the test was six weeks.
For 200 children in Grades I to III inclusive the coefficient of reliability .84 was obtained. In the case of Sigma 3 the interval between the two applications was only two days. For 126 pupils from Grades V to VIII, inclusive, the coefficient of reliability was found to be .885. For the sentence test alone the reliability coefficient was .769 and for the paragraph test, .806.

For Thorndike's Scale Alpha for the Understanding of Sentences, McCall has reported a coefficient of reliability of .37. This was obtained by using a test similar to Alpha but not considered a duplicate form. Gates* reported reliability coefficients for Thorndike-McCall Reading Scale which ranged from .25 to .72. All of these were for pupils belonging to a single grade. For the Burgess Picture Supplement Scale the author has given coefficients of reliability ranging from .62 to .88 for grade groups from the second to sixth grades inclusive. In each case the number of pupils was relatively small. Gates gave coefficients of .62, .59 and .66 for three grade groups.

For the Otis Self-Administering Test of Mental Ability the author has reported an average reliability coefficient of .921 for the higher examination and of .948 for the intermediate examination. Presumably these coefficients are based on the scores secured from pupils for a sequence of several grades. For the separate tests of the Stanford Achievement Test the authors reported coefficients of reliability based upon separate grade groups which ranged from .75 to .96. When the composite score of all the tests was used the reliability coefficient was .98.

The relative reliability of written examinations and standardized educational tests. The data which have just been submitted indicate that the difference between the reliability of the two types of instruments is not as great as is commonly believed. The median of the reliability coefficient for written examinations given in Table IV is .65.
There are many reliability coefficients for standardized tests in Table V which are less than this. Furthermore, the additional citations of coefficients of correlation in the above paragraphs indicate that for a number of standardized educational tests which have been very widely used the median of the reliability coefficients for grade groups is in the neighborhood of .65. Thus the conclusion seems justified that altho some of our more elaborate standardized tests, such as the Stanford Achievement Test, the Illinois General Intelligence Scale, and the Otis Self-Administering Test of Mental Ability, may be expected to yield measures whose reliability is greatly in excess of that of typical written examinations, many widely used standardized educational tests yield measures which possess about the same degree of reliability as the grades obtained from written examinations prepared by teachers and other school officials.

*Gates, Arthur I. "An experimental statistical study of reading tests," Journal of Educational Psychology, 12:379, October, 1921.

It should be noted that reliability refers only to the variable errors of measurement. The constant errors, as we shall show (p. 40), are likely to be very much larger in examination grades than in the scores yielded by standardized educational tests. It should also be noted that the time required to give many of the standardized tests is much less than that devoted to a typical written examination.

The absolute reliability of examination grades. The statement that the reliability of a typical examination is equivalent to that of many standardized tests and only slightly less than that of a number of others still leaves a doubt with reference to the absolute reliability. For practical purposes the reliability coefficient of .65 needs to be interpreted in terms of the variable errors of measurement to be expected.
The correlation tables for eight groups having a reliability coefficient of approximately .65 were taken and the scores translated into a five point system of school grades. It is assumed that these classes were typical and the highest scores were translated into a mark of "A," the lowest into a mark of "E." This was done in an arbitrary way but the results indicate roughly one meaning which may be attached to a reliability coefficient of .65. It was found that in 40 percent of the cases the students received the same grade in the two examinations. In an additional 42 percent the grade which they received on the first examination was only one point higher or lower than that received on the second. For example, if a student in this group made a "D" on one examination, he made an "E" or "C" on the other. The two grades received by the remaining 18 percent differed by two points or more.

Conditions tending to produce variable errors of measurement in examination grades. Several sets of examination papers were examined in order to ascertain the conditions which tended to produce the lowest coefficients of reliability and hence the largest variable errors of measurement. The most potent cause appeared to be that the two teachers recognized widely different educational objectives in making out the two sets of examination questions. This seemed to be the case in Group 42, arithmetic, for which the coefficient of correlation was -.18. In Group 22, geometry, there was a difference in the general plan of the examinations; one teacher permitted the students to choose one of two questions in part of the examination while the other required that all questions be answered. This difference in the plan of the examination appeared to increase the variable errors of measurement. There was also a difference in the educational objectives recognized in that one teacher placed much more emphasis upon the practical application of geometry than the other.
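
The tabulation described above for the groups with a coefficient near .65 (same grade, one point apart, two or more points apart) may be sketched in a few lines of Python. The sketch is illustrative only: the function name `grade_agreement` and the ten pairs of grades are invented for the example and are not data from this investigation.

```python
from collections import Counter

# Five-point scale used in the text, highest mark first.
SCALE = "ABCDE"

def grade_agreement(first, second):
    """Return the percent of students whose two grades are the same,
    one point apart, or two or more points apart."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty lists of grades")
    counts = Counter()
    for g1, g2 in zip(first, second):
        gap = abs(SCALE.index(g1) - SCALE.index(g2))
        counts[min(gap, 2)] += 1  # key 2 stands for "two or more"
    n = len(first)
    return {
        "same": 100 * counts[0] / n,
        "one_point": 100 * counts[1] / n,
        "two_or_more": 100 * counts[2] / n,
    }

# Hypothetical grades of ten students on two examinations (not source data).
exam_1 = list("ABBCCCDDEE")
exam_2 = list("BBCCADDCEC")
result = grade_agreement(exam_1, exam_2)
print(result)
```

Applied to the paired grades of a real class, the three percents would correspond to the 40, 42, and 18 percent reported in the text.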

Another cause which operated to lower the degree of correlation and hence to increase the magnitude of the variable error was the adherence to different standards of excellence by the teachers who graded the papers. For example, in Group 45, arithmetic, one teacher considered only the final answer to the exercise: if that was right the student received full credit; if wrong, no credit was given. The other teacher gave credit for correct principle. The coefficient of reliability for this group was .06.

It was noticed that in general pupils made higher grades on the tests set by their own teacher than on those set by another person. This appeared to be true even when distinct differences could not be identified either in the educational objectives or in the methods of grading of the two teachers. In Group 32 for which a reliability coefficient of .30 was obtained when the grades made on the first examination were correlated with those made on the second examination, a second coefficient was calculated by comparing the student's grade made on the examination prepared by his own teacher with that set by another teacher. This procedure gave a coefficient of correlation of .40. When the two classes were taken separately coefficients of .57 and .44 were obtained. These data tend to supplement the evidence already cited that differences in the content of the examination and in the plan of marking are potent factors in producing the variable errors of measurement.

The magnitude of constant errors in examination grades. It is probable that most of the teachers marking the examination papers did not recognize the distinction between "scores" and "grades"* and that the marks placed upon the papers were considered as "grades." In several instances the "grades" made on one examination were on the average much higher than those made on the other. If "scores" were used as "grades," any differences between the averages of the two sets of measures indicate the presence of constant errors. In order to secure an index of their magnitude the differences were calculated for the sixty-six groups to which two examinations were given. These are assembled in Table VI. For three of these groups the difference between the averages of the two sets of "grades" was zero; for eight other groups it was one. At the other extreme we find a difference of 50 in the case of one group. The median difference is 6.2.

*See page 11 for a statement of this distinction.

TABLE VI. DISTRIBUTION OF DIFFERENCES BETWEEN AVERAGES OF EXAMINATION GRADES

    Difference    Frequency
        50            1
        32            1
        29            1
        27            1
        22            1
        21            1
        20            2
        18            3
        16            1
        15            1
        14            2
        13            2
        12            1
        10            2
         9            1
         8            4
         7            3
         6            6
         5            4
         4            6
         3            5
         2            6
         1            8
         0            3
    Total            66
    Median          6.2

It should be noted that the differences between the averages of two sets of examination grades are not constant errors. They are merely indicative of the presence of constant errors. If one examination was easy and the other hard the difference would be the sum of a positive error and a negative error. If both examinations were hard the difference would be smaller than the constant error in either average. The large differences shown in Table VI are probably caused by the combination of an easy examination with a difficult one. This was very obviously true in the case of the one difference of 50. Furthermore, in interpreting Table VI it should be remembered that possibly some of the teachers recognized the distinction between "scores" and "grades," and the marks would have been appropriately adjusted before being used as "grades."
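
The median of 6.2 reported for Table VI can be checked with a short calculation. The sketch below assumes an interpolation convention in which an integer difference d stands for the interval from d up to d + 1; this convention is an assumption made for the example and is not stated in the bulletin. The frequency of 3 for a difference of zero is taken from the statement in the text that three groups showed no difference. The function name `interpolated_median` is likewise only illustrative.

```python
# Frequencies from Table VI: difference between averages -> number of groups.
TABLE_VI = {
    50: 1, 32: 1, 29: 1, 27: 1, 22: 1, 21: 1, 20: 2, 18: 3, 16: 1, 15: 1,
    14: 2, 13: 2, 12: 1, 10: 2, 9: 1, 8: 4, 7: 3, 6: 6, 5: 4, 4: 6,
    3: 5, 2: 6, 1: 8, 0: 3,
}

def interpolated_median(freq, width=1.0):
    """Median of a frequency table, treating each value v as the interval
    [v, v + width) and interpolating within the interval that contains
    the middle case (assumed convention, see the lead-in)."""
    total = sum(freq.values())
    half = total / 2
    below = 0
    for value in sorted(freq):
        count = freq[value]
        if count and below + count >= half:
            return value + width * (half - below) / count
        below += count
    raise ValueError("empty frequency table")

total_groups = sum(TABLE_VI.values())
median_difference = interpolated_median(TABLE_VI)
print(total_groups, round(median_difference, 1))  # prints: 66 6.2
```

Thirty-two of the sixty-six groups fall below a difference of 6, so the middle case lies one sixth of the way through the six groups at 6, which gives 6.17 and rounds to the 6.2 of the table.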

So far as it was possible to ascertain from an analysis of the examination papers the large differences are due to two causes: differences in the difficulty of the two sets of examination questions and in the severity of the grading. For example one of the examinations which produced a difference of 40 consisted of seven questions of which the pupils were permitted to answer any five. These questions were relatively easy. In the other examination, there were ten questions and the pupils were required to answer all of them. Very few were able to complete this second examination in the time allowed and the teacher appears to have counted the unfinished exercises as failures. Nine out of twenty-two children in the second group made zero on the examination. In this way a very large constant error was introduced but the coefficient of reliability for this group was .48.

Relative magnitude of constant errors in examination grades and in standardized test scores. In another place* the writer has discussed the magnitude of the constant errors in educational tests. In cases where there has been coaching for tests, intentional or not, or disregard for standard directions, large constant errors have been introduced. In one extreme instance a constant error of over three and a half years occurred in the mental age scores of a group of children. In general, however, because of the standard directions for administering the tests and scoring the papers, of the objectivity of the marking, and of the norms for interpreting test scores, the constant errors in standardized tests are very much smaller, and are likely always to be smaller than those found in examinations given by teachers. However, some reduction in the magnitude of the constant errors in examination scores will result when the use of either very easy or very difficult sets of questions is avoided and when a conservative plan of marking is followed.

*Monroe, Walter S. "The constant and variable errors of educational measurements." University of Illinois Bulletin, Vol. 21, No. 10, Bureau of Educational Research Bulletin No. 15. Urbana: University of Illinois, 1923, p. 19-20.

Explanation of the apparent contradiction between the results of this investigation and previous studies of examination grades. The results of this investigation have caused the writers to revise their estimate of the accuracy of examination grades. The findings indicate that the errors are much less than they appeared to be from evidence resulting from investigations of the Starch-Elliott type. One naturally asks the question, "Why this apparent contradiction?" Starch and Elliott obtained similar results for three different examination papers and numerous other investigators have corroborated their findings. The mass of evidence accumulated is so extensive and uniform in character that one would naturally be inclined to accept the conclusions indicated in preference to the apparent contradictory results of the present investigation. However, a careful analysis of the procedures reveals that the results are not necessarily contradictory. The method followed by Starch and Elliott combines both constant errors and variable errors. The "grades" assigned to the examination paper in geometry were influenced both by the subjectivity of the marking and by the tendency of some teachers to grade high and of others to grade low. The present investigation has separated the variable errors from the constant. It has also shown that the examination scores have in some cases involved relatively large constant errors. The extreme differences between the grades assigned to the same paper reported by Starch and Elliott (see page 9) are easily explained when it is understood that they represent the combination of variable errors and constant errors.
Especially is this true when we realize that the constant errors would likely be larger for teachers of different schools as in their investigation than for teachers in the same school as in the present investigation.

Conclusion with reference to relative accuracy of examination grades and scores yielded by standardized tests. As already indicated the writers believe that the data presented in this chapter show that examination grades are more accurate measures of achievement than many persons have considered them to be. Standardized tests yield scores involving errors, both constant and variable, but in the case of our best standardized tests these errors are distinctly less than the corresponding errors in examination grades. Furthermore, measurement by means of standardized tests usually requires much less time than is commonly devoted to written examinations. This conclusion refers to written examinations of the traditional type and administered under typical conditions. It is likely that written examinations and their administration may be improved so that the difference in the accuracy of examination grades and test scores will become much less than at present.*

*The conditions of standardized tests would have been more closely approximated if both sets of examination questions had been prepared by the same person and marked by different persons. If this had been done it is reasonable to expect that the coefficients of reliability would have been somewhat higher and the differences in the averages of the two sets of scores smaller.

CHAPTER V

THE CONTENT OF WRITTEN EXAMINATIONS

The data collected. In response to an invitation sent to superintendents and high-school principals in Illinois sets of examinations were received from fifty-six schools for the first semester and from fifty schools for the second semester of the school year 1921-22. A range of approximately sixty subjects was represented.
It seemed desirable to restrict this analysis of sets of questions to the thirteen subjects listed in Table VII. The number of sets of questions and also the total number of questions are given in this table.

Classification of questions according to type. After considerable experimentation a list of fifty types of questions as given below was formulated.

    Aims
    Analysis
    Cause (give)
    Classification
    Comparison
    Completion
    Conjugation
    Construction (a figure, study or statement)
    Construction (give the)
    Contrast (general)
    Contrast (specific basis)
    Correction
    Criticism
    Decision (choice or preference)
    Declension
    Definition
    Description (characterization)
    Diagram (illustrate by)
    Discussion
    Effect (give the)
    Evaluation
    Example (illustrate by)
    Expansion
    Explanation (tell why or how)
    Facts (definite number)
    Facts (indefinite number)
    Factoring
    How many (tell)
    Law (give the)
    Mathematical operations of addition, subtraction, multiplication, and division
    Method
    Outline
    Parsing
    Proof
    Punctuation (capitalize and correct sentences)
    Recall
    Reduction to lowest terms
    Relationships (give the)
    Rule
    Scanning
    Simplification
    Source
    Substitution (values for letters)
    Summary
    Solving for unknown quantity
    Syllabus
    Translation (foreign language to English)
    Translation (English to foreign language)
    Use (give the)
    Where (tell)

TABLE VII. NUMBER OF QUESTIONS AND SETS OF QUESTIONS EXAMINED

    Subject              Sets    Questions
    English I             80        721
    English II            83        694
    English III           79        721
    Algebra I             80        731
    Plane Geometry        80        636
    Latin I               81        683
    Latin II              76        539
    Physics               76        795
    General Science       62        789
    Civics                59        560
    American History      62        550
    Domestic Science      42        392
    Domestic Art          41        368
    Total                901       7621

All questions for the thirteen subjects mentioned in Table VII were classified under some one of these types. This classification was made by Mr. Souders with the assistance of a single clerk working under his immediate direction.
Altho any classification of this kind is necessarily subjective, a relatively high degree of uniformity has, we believe, been secured.

Summary of classification. Twenty-six of the fifty types of questions were represented in six or more of the thirteen subjects. The relative frequency of each is given in Table VIII. This classification of examination questions shows a high frequency of certain types and very little or no use of a number of other types. If we omit Latin, Algebra, and Plane Geometry in which the nature of the subject-matter restricts the kind of question asked, we find that 32 percent of all the questions require "explanation." The next most frequent type used, 21 percent, calls for a "definite number of facts."

[TABLE VIII. Relative frequency of the twenty-six types of questions in each of the thirteen subjects. The body of the table is not legible in this copy.]

Frequently all questions are considered as belonging to one of two groups, "thought questions" or "memory questions." Such a definite classification is not, however, always possible. The character of the mental process involved in answering depends upon the person replying as well as upon the form of the question asked. Those questions calling for definite facts are almost certain to be based upon memory; on the other hand, those requiring classification, evaluation, contrast, etc. are likely to demand thought on the part of most students.
If, however, such classifications or evaluations have been made in a previous class exercise some students may easily remember the answers and, in such a case, a thought question for one student becomes a memory question for another. For the purpose of this study Types 10, 20, 22, 25, and 26 have been designated as probable memory questions, the remaining types as probable thought questions. The percent of each group is given in the last two lines of Table VIII. These percents can be considered as only a rough indication of the relative frequency of these two very general divisions.

In her investigation of "the question as a measure of efficiency in instruction," Dr. Stevens* attempted to determine the relative number of thought questions and memory questions asked by teachers in a single class period. The percents of memory questions for history, English and science were 83, 55, and 67 respectively. This relative frequency is much larger than indicated in Table VIII. The difference may be due to the fact that in the present investigation only written examination questions were considered, but it is altogether likely that it is indicative of a real change in the type of questions which teachers commonly ask of their students.

Relation of questions to educational objectives. The questions which teachers ask during class periods constitute a concrete expression of the educational objectives which they are day by day setting for their students. The questions of the final examinations should, therefore, be representative of the types of educational objectives set in the different school subjects.† The emphasis upon memory and some of the simpler types suggests a need for a modification in emphasis in most of the school subjects.

Quality of examination questions. Altho the writers have no objective evidence to present in regard to the quality of examination questions, those submitted for this study were in general considered good.
Catch questions or those stated so that they would not be understood easily by students were very rare. Many questions were stated so that the grading of the answers was objective and would indicate that their form had been influenced by the exercises of standardized tests. This was especially true of those examinations having questions of the true-false or recognition type. In the judgment of the writers the criticism that teachers are inclined to ask misleading or catch questions is not a valid one in the case of the examinations studied in this investigation.

*Stevens, Romiett. "The question as a measure of efficiency in instruction." Teachers College Contributions to Education, No. 48. New York: Teachers College, Columbia University, 1912.
†See page 55 for a further discussion of objectives in school subjects.

CHAPTER VI

THE IMPROVEMENT OF WRITTEN EXAMINATIONS

Altho we now have a number of standardized tests which are superior to written examinations, and we have reason to believe that they will be used even more extensively than at present, there is need to give attention to the improvement of written examinations. It does not appear likely that standardized tests will ever replace written examinations. The latter type of measuring instrument will probably continue to be the most frequently used means of measuring the achievements of school children.

Written examinations may be improved by correcting the faults which have been noted in the preceding chapter. In this chapter we shall consider four important improvements: (1) Reduction of constant errors; (2) Reduction of variable errors; (3) Securing a greater agreement of the content of examinations with recognized educational objectives; (4) Simplification of the administration of written examinations. There is some overlapping between these improvements.
For example, the magnitude of errors in measurement, particularly variable errors of measurement, will be reduced by securing a greater agreement between the content of the examination and recognized educational objectives. The devices for simplifying the administration of examinations also tend to make the results more accurate.

Causes of constant errors in examination grades. The fundamental cause of constant errors in examination grades, i.e., "high grades" or "low grades," is the failure to recognize the distinction between "scores" and "grades." (See page 11.) A pupil's grade tells his standing with reference to a norm, i.e., the passing mark. When no distinction is made between "scores" and "grades" this norm is subjective. Altho the passing mark may be defined numerically as 70 percent or 85 percent it is fixed in the case of a particular examination by the difficulty of the questions and by the severity of the marking of the papers. Pupils will receive "high grades" when the examination is easy or the plan of marking is generous. They will receive "low grades" when the examination is hard and a severe plan of marking is followed. If the teacher makes no distinction between "scores" and "grades" he sets the norm for a particular examination when he makes out the questions and decides upon the plan of marking. He implicitly expresses the opinion that the pupil whose achievements are barely "passing" will make a grade of 70, or the passing mark adopted. He also implies that the pupil whose achievements are exceptionally high will make a high grade, i.e., a grade of 95 or between 95 and 100. Such expressions are merely subjective.

Since the failure to recognize the distinction between "scores" and "grades" is the cause of constant errors the plan for improvement is obvious. The papers should be marked in terms of "scores." These may be on the scale of 100 but this is not essential.
In fact it will probably assist a teacher in keeping the distinction in mind if the scores are not on the scale of 100. After the papers have been marked the "scores" should be translated into "grades" by comparison with a norm in which the subjective elements are reduced to a minimum.

A standard average grade used as a norm. The simplest objective* norm is a standard average grade. This may be set arbitrarily but a more rational procedure would be to take the average of the grades given in a school on a particular subject during a period of several years.

*The adjective "objective" is not intended to indicate perfect objectivity or even as high a degree of objectivity as we have in the case of many standardized tests. As used here it means that the norm is distinctly less subjective than the norm commonly implied in the usual examination.

The standard average grade defines the grade into which the average score of a typical class should be translated. For example, if the standard average grade is 85 and the average score in a particular class is 57 the grade corresponding to this score would be 85. In case the class is made up of poor students the average grade of the class should be below the standard average grade. If the class is unusually bright their average grade should be higher than the standard average. The translation of the average score into the appropriate corresponding grade furnishes a basis for the translation of the other scores of the group.

The procedure just outlined is necessarily crude. It is partially subjective because the determination of the general status of the class is left to the teacher. However, the teacher may use previous school records or the measures obtained from a standardized test to assist him in arriving at a partially objective estimate of the general status of the class. The use of a standard distribution instead of merely a standard average grade represents a more systematic procedure.

1. Decreasing the magnitude of constant errors by means of a standard distribution of grades. For several years a number of educators have been urging that teachers make the distributions of their grades conform to a standard shape, i.e., that a specified percent of the members of a typical class be given a grade of A, another specified percent a grade of E, and so on for each of the marks adopted by the school.* A number of distributions have been recommended. For a five point system of grades several authors have recommended the following distribution, 7, 24, 38, 24, 7. Other distributions which have been advocated are 7, 18, 50, 18, 7 and 5, 15, 60, 15, 5.

*If grades are expressed in percents the corresponding intervals such as 95 to 100, 90 to 94, etc. would be used instead of A, B, C, etc.

The essential feature of the plan is a specification of the percent of the students of a given group who are to receive each mark rather than the particular form of distribution used. There is much evidence which indicates that the distribution of achievements of an unselected group of students approximates the normal probability curve.† If we assume that true measures of the achievements of an unselected group of 100 or more are distributed normally this adjustment does not fix the percent who are to receive each grade. The normal probability curve may be divided in many ways, for example, it is possible to divide the curve so that there would be 50 percent of A's, 20 percent of B's, 10 percent of C's, 10 percent of D's, and 10 percent of E's. In such a distribution a grade of A would be given to all students above the average of the class. An appropriate meaning could be stated also for each of the grades. A distribution which is symmetrical has certain advantages and one of those mentioned in the preceding paragraph is to be preferred. The particular standard distribution to be used is a matter of policy which each school should determine. Some argue that different standard distributions be adopted for the different years of the high school, some advocate different standard distributions for different school subjects. It should be noted, however, that there are certain advantages in uniformity. It would be desirable for all high schools, particularly those in a given state, to agree upon a common standard distribution and to use this for all subjects. Grades assigned in different schools can have a common meaning only when they conform to the same standard distribution.

†The normal probability curve is bell shaped and is symmetrical with the average or median as a center.

The proposal that teachers make the distributions of their grades conform to a standard shape has met with much criticism. As in any controversy there have been extremists on both sides and many of those participating have given evidence that they failed to understand clearly the nature of the proposal or its essential features. Among the advocates of the use of a standard distribution are those who have insisted that the normal probability curve explicitly defines the students who must receive A's, who must receive B's, etc. Cases have been reported of instructors who frankly admitted that a certain student deserved to receive an A but that they had used up all the A's which the distribution allowed, and that, therefore, the student must be satisfied with a B. One hears also of instructors who announce at the beginning of a course that a certain number of the class must fail. It is rumored that in some of these instances the students enrolled have hired certain other students who were indifferent to their scholastic standing to enter the course in order to provide the requisite number of failures. The opponents of the use of a standard distribution have contended that there was no a priori reason why any student should fail and that always the quality of the student's work should determine his scholastic standing.
Furthermore, they have pointed out that in any group of students brought together for instructional purposes it is extremely unlikely that the distribution of achievement would approximate at all closely a predetermined standard distribution. The mechanical and unintelligent application of a standard distribution by some instructors has given the opponents of the plan concrete examples of what they imagined to be the normal result of its use.

A standard distribution is merely a device which teachers may use in order to reduce to a minimum the constant errors in their grades, but to be helpful it must be used intelligently. It must be remembered that a standard distribution is a means and not an end. Whenever common sense indicates that the distribution of grades for a particular class should depart from the normal distribution no instructor should hesitate to award the grades which he believes the students deserve. It is intended that the standard distribution will be closely approximated only for a large unselected group of students. A particular class very frequently is made up of a selected group of students. Furthermore, classes of the usual size, 20 to 35, are so small that frequently there will be significant departures from this standard distribution.

Translating scores into school marks by means of a standard distribution. A standard distribution is useful in translating examination "scores" into "grades." The examination papers should be marked in terms of a score. This score may or may not be on the scale of 100 points. In order to avoid confusion between "scores" and "grades" it is wise to use a scale of points chosen so that the maximum score will not be 100. If the papers have been marked in this way the scores may be arranged in columns as indicated in the left hand margin. The first step in translating these scores into grades is to determine whether or not the class is typical.
If an experienced teacher has had a class for several weeks he will usually be able to estimate its general status with a fair degree of accuracy. At the beginning of a school year or in the case of an inexperienced teacher some outside information is needed. The previous school record of the students may be studied but in many cases it will be more convenient to administer a general intelligence test. The average mental age and the distribution of the I.Q.'s of the class will be a very reliable index of the composition of the group. If the median I.Q. is distinctly below 100 the teacher may know that he has poor pupil material. If it is much above 100 he knows that the class consists of pupils better than the average. If there is an unusually high number of low I.Q.'s he may expect a relatively high number of low grades.

[Margin illustration: the scores in descending order, grouped by mark. A: 74 73 70. B: 69 68 64 63 60. C: 58 57 56 55 54 51 50. D: 47 46 42 41 40. E: 38 35 32.]

With the general status of the class in mind the scores may be grouped in conformity with the system of marks used. In the illustration in the left hand margin it has been assumed that the class is approximately typical. The percent of A's and also the percent of failures are somewhat larger than the percent specified in most standard distributions. If the scores are arranged in the form shown below the general shape of the distribution will be more obvious. However, in the majority of cases it will be sufficient to use the arrangement given in the margin.

              58
              57
         47   56   69
         46   55   68
    38   42   54   64   74
    35   41   51   63   73
    32   40   50   60   70
    E    D    C    B    A

It is seldom that one will have exactly a symmetrical distribution of grades for a class of this size. Some departures from the standard distribution must be expected. In case the class is not typical one should expect marked departure from the standard distribution. For example, the distribution for a given class might be as shown below. In this there are no grades below passing but there are a number of poor students just above the passing mark.
Also the percent of A's and B's is unusually large. Such a distribution is not normal but might well represent the distribution of grades for a particular class even when the normal distribution had been adopted as the standard. If the teacher is able to show that the general status of the class justifies such a departure he deserves commendation rather than criticism for his distribution.

                   64
         42        63
         41   54   60   74
         40   51   58   73
         38   50   57   70
         35   47   56   69
         32   46   55   68
    E    D    C    B    A

An accumulative distribution used as a check upon constant errors. A standard distribution is also useful as a check upon the grades given by a teacher over a period of several terms. When the grades for the entire period are assembled in such a distribution any general tendency on the part of the teacher to give too high or too low grades will be revealed. Each teacher should keep an accumulative distribution of the grades in each subject he teaches. For example, a teacher in mathematics should keep an accumulative distribution of the grades given in classes of first-year algebra. When the total number of grades becomes large a comparison of this distribution with the standard distribution will reveal any tendency on the part of the teacher to grade too high or too low in this subject. A teacher should then take steps to correct any marked departures from the practise defined by the standard distribution. In large schools where there are several sections of the same subject it will be helpful to secure a distribution of grades each time they are issued. Any marked departures from the standard distribution will then be called to the attention of the teachers. However, one should avoid giving the impression that there must be uniformity with the standard distribution. Departures from this standard distribution are justified when the group of pupils can be shown to be selected.
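The accumulative check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 7, 24, 38, 24, 7 distribution is one of those quoted earlier, while the function names, the minimum of 100 grades before any comparison is made, and the tolerance of 5 percentage points are hypothetical choices that the bulletin does not specify.

```python
from collections import Counter

# One of the five point standard distributions quoted in the text
# (percent of A, B, C, D, E). Each school would substitute its own.
STANDARD = {"A": 7, "B": 24, "C": 38, "D": 24, "E": 7}


def accumulate(terms):
    """Pool the grades of several terms into one count per mark."""
    total = Counter()
    for grades in terms:
        total.update(grades)
    return total


def departures(counts, standard=STANDARD, min_total=100, tolerance=5.0):
    """Return {mark: observed percent minus standard percent} for marked departures.

    min_total and tolerance are illustrative assumptions, not values
    taken from the bulletin. Returns None while too few grades have
    accumulated for the comparison to mean anything.
    """
    n = sum(counts.values())
    if n < min_total:
        return None
    flagged = {}
    for mark, expected in standard.items():
        observed = 100.0 * counts.get(mark, 0) / n
        if abs(observed - expected) > tolerance:
            flagged[mark] = round(observed - expected, 1)
    return flagged
```

A flagged mark is an occasion for inquiry and not a verdict: the grades of a selected group may properly depart from the standard, so the output is best read as a list of points to investigate.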
Thus a departure from the standard distribution is a cause for an investigation on the part of the teachers concerned. If evidence can be produced which justifies the departure no change in the system of grading should be used. On the other hand when investigation reveals no reasons why there should be departures from the standard dis- tribution, the teachers should be urged to modify their system of grading so that a greater uniformity will be secured. 2. Decreasing the variable errors in examination scores. The reduction of the magnitude of variable errors of measurement in examination scores is to be secured mainly through the adop- tion of rules which will bring about greater uniformity in preparing and administering examinations. These rules should include specifications in regard to the effect of poor writing, poor spelling, and poor English upon a student's grade, and should be in agree- ment in regard to giving credit for correct principle and partial credit for exercises partly right or partly completed. The rules may properly include also specifications relating to the number and types of questions to be asked and the form in which they are to be presented to the students. For guidance in marking papers a teacher should write out, at least in abbreviated form, the cor- rect answers to the questions. The accuracy of examination scores will be increased also by making the examinations more uniform with respect to content.* *For recommended rules covering these and other points see Chapter VII. [54] It has been proposed that the use of types of questions which call for answers that may be objectively classified as either "right" or "wrong," would facilitate uniformity in marking the papers. This means of reducing the variable errors of measurement will be considered under the head of "simplifying the administration of written examinations." 3. Securing agreement of the content of examinations with recognized educational objectives. 
The intrinsic function of an examination is to measure certain achievements. In general the achievements for which we desire to secure measurements are those included in the recognized educational objectives. Hence, the questions should be in agreement with the objectives. It is, of course, impossible to cover all details in a given subject-matter field. The questions should relate to the most significant facts, principles, etc. of the course. Catch questions and those calling for unimportant details have no place in an examination. For example, an examination in spelling should not include unusual or obsolete words; an examination in history should not call for obscure dates or other trivial facts.

In securing agreement the teacher should make use of such lists of minimum essentials as are available. For example, in spelling a teacher may very properly select the test words from Ayres' list of the one thousand most frequently used words or from some other carefully prepared minimum essential list. In geography a teacher will find the Hahn-Lackey Geography Scale a helpful source of questions. In other subjects the teacher will not have access to lists of minimum essentials as complete as in these two subjects, but he should become familiar with curriculum studies and other investigations* relating to educational objectives.

*The following list is suggestive of studies relating to educational objectives:
Yearbooks of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company.
Part I of 14th: reading, writing, spelling, language and grammar, arithmetic, history, literature, geography.
Part I of 16th: reading, writing, spelling, arithmetic, history, physical education.
Part I of 17th: arithmetic, geography, reading, English, civics, history.
Part II of 17th: history, civics, economics, sociology, geography.
Part I of 19th — on new materials of instruction, reading, history, geography, mathe- matics, nature study, civics. Part I of 20th — on materials of Instruction — all subjects in elementary schools. Part II of 22nd— the social studies in the elementary and secondary school. "Arithmetic, course of study for the elementary schools. Including the kindergarten and the first six grades." Course of Study Monographs, Elementary Schools, No. 1 Berkeley, California: Public Schools, 1921. 86p. {Concluded on p. 56.) In some subjects there are valuable committee reports which give the consensus of opinion concerning the relative importance of the numerous distributions. The teacher must assume most of the responsibility for se- curing the agreement between the content of the examination and educational objectives. In many of the high-school subjects he can obtain little assistance from such sources as just indicated. However, if this purpose is kept in mind and if he is really famil- iar with the subject which he is teaching, gross inconsistencies with recognized educational objectives will be avoided. 4. Simplifying the administration of written examinations. The administration of written examinations, particularly the marking of the papers, can be greatly simplified by the use of certain types of exercises. For example, in the true-false type of exercise the pupil merely indicates whether the statement is true or false. Instead of asking the question, "Why did the Puritans come to America in the seventeenth century?" we may ask whether the following statement is true or false. "The Puritans came to America in the seventeenth century seeking wealth." The pupil may give his answer to this exercise by writing a plus sign after the statement if he considers it true and a minus sign if he considers it false. In case the statement is dictated to him he may write after the number of the exercise the word "true" or "false" or the appropriate sign. 
The answering of such exer- cises requires very little of the pupil's time and the scoring is ex- ceedingly simple. Questions which can be answered merely by "yes" or "no" also simplify the administration of examinations. Similar results can be secured with recognition exercises such as have been used in a number of standardized silent reading tests. The following is an exercise of this type. Ayres, L. P. "A measuring scale for ability in spelling." N. Y.: Division of Education, Russell Sage Foundation, 1915. 58p. Ayres, L. P. "Measuring scale for handwriting." N. Y.: Division of Education, Russell Sage Foundation, 1920. (Folder, chart.) Bagley, W. C. and Rugg, H. O. "The content of American history as taught in the seventh and eighth grades." University of Illinois Bu!letin,Vol. 13, No. 51. Urbana: University of Illinois. Charters, W. W. Curriculum Construction. N. Y.: Macmillan Co., 1923. 352p. Charters, W. W. and Miller, Edith. "A course in grammar." University of Missouri Bulletin, Vol. I, Education Series 9. Columbus: University of Missouri, 1915. Hahn, H. H. Hahn-Lackey Geography Scale. Wayne, Nebraska: H. H. Hahn, State Normal School. Hahn, H. H. Scale for Measuring Ability of Children in History. Wayne, Nebraska: H. H. Hahn, State Normal School. Moore, E.C. Minimum Course of Study. N. Y.: Macmillan, 1923. 402p. [56] "The first president of the United States was: Christo- pher Columbus, Benjamin Franklin, George Washington, Thomas Jefferson." In answering this exercise the pupil is asked to underline or mark in some other way the name required to make a true sentence. Completion exercises in which pupils are asked to supply words which have been omitted furnish still another means of simplifi- cation. Directions for constructing a true-false examination.^ 1. In constructing true-false exercises, a list of statements covering in some detail the portion of the subject on which the pupils are to be examined should be prepared. 
Some of the statements can then easily be changed so that they are false. The untruth of a state- ment should not be too obvious or it will be worthless for testing. Also statements should be selected which require an acquaintance with the subject in order to determine their truth or falsity. 2. In a true-false examination the number of true statements should approximate the number of false statements, and the arrangement should be such that there is no regular sequence be- tween true statements and false statements. 3. Since the pupil can give his responses very quickly, the examination should consist of not less than fifty statements. A true-false examination of one hundred statements can be given in the time usually devoted to an ordinary examination. 4. The examination should be mimeographed or printed so that each pupil will have a copy. He may give his answers in the margins of the sheets, or, if it is desired to use the same set of papers with another group of pupils, he may be given a sheet of paper on which there are numbered blanks. The pupils will then be asked to record in the blanks their answers to the corresponding exercises. 'A less desirable plan, which may be followed when it is not possible to secure mimeographed copies of the examination, is to read the statements to the pupils and have them record their answers in numbered blanks. The disadvantage of this plan is that the pupils do not have a satisfactory opportunity to study the statements. _^lso the class may give some indication of the answer if a statement appeals to them as being ridiculous. 5. The pupils should be given specific directions in regard to answering exercises about which they are uncertain. One writer *For an example of a true-false examination, see Appendix p. 69. [57] has suggested that the pupils be instructed to guess concerning the truth or falsity of the statement. 
Another writer who has used this type of examination instructed the pupils as follows: "First, go through the list quickly and mark all that you know for certain, then go back and study out the harder ones. Do not guess; the chances are against you on guessing. Don't endanger your score by gambling on those questions about which you know nothing." This second procedure is probably the better. The scoring of a true-false examination. Since only two re- sponses are possible, it is obvious that a pupil may give a correct response as the result of chance. In order to take this possibility into account, a pupil's score on an examination of this type is the number of exercises answered correctly minus the number answer- ed incorrectly. Exercises not attempted are not counted. Directions for constructing a recognition examination.' In constructing this type of examination none of the proposed an- swers should be too obviously incorrect. An exercise can yield an indication of a pupil's achievement only when he is forced to use judgment in determining which of the proposed answers is suit- able. For example, the illustrative exercise given would be practi- cally worthless for testing purposes if all the names, except that of George Washington, were of persons living today or of persons having no connection with our national life. In applying this type of exercise to the field of arithmetic the proposed answers should include erroneous answers which pupils are inclined to give: if the exercise called for the quotient of two fractions, one of the proposed an- swers should be the product of the fractions and another their sum, and perhaps another should be the fraction obtained by taking the sum of the numerators as a new numerator and the sum of the denominators for a new denominator. 
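The choice of proposed answers for the fraction exercise just described can be sketched as follows. This is a minimal illustration and not the bulletin's own procedure: the function name and the `seed` parameter are hypothetical, and the shuffling of the order of the answers is an added convenience. The three erroneous answers are exactly those named in the text: the product, the sum, and the fraction formed by adding numerators and adding denominators.

```python
import random
from fractions import Fraction


def quotient_choices(a, b, seed=None):
    """Return (correct, choices) for the exercise 'a divided by b'.

    The wrong answers are errors pupils are inclined to make, so that
    none of them is obviously incorrect. Duplicates are dropped, so
    fewer than four choices may result for some pairs of fractions.
    """
    correct = a / b
    distractors = [
        a * b,  # pupil multiplies instead of dividing
        a + b,  # pupil adds the fractions
        # pupil adds numerators and adds denominators
        Fraction(a.numerator + b.numerator, a.denominator + b.denominator),
    ]
    choices = [correct]
    for d in distractors:
        if d not in choices:
            choices.append(d)
    # Vary the position of the correct answer from exercise to exercise.
    random.Random(seed).shuffle(choices)
    return correct, choices
```

For 2/3 divided by 4/5 the correct answer is 5/6, and the other proposed answers are 8/15, 22/15, and 3/4, all of which are proper-looking fractions of about the same size.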
When the correct answer is included in a group of such answers as these, the pupil who does not know how to find the quotient of such fractions will be unable to determine the correct answer except as a matter of chance. On the other hand, if all of the answers except the correct one were integers or were so large that they were obviously incorrect, a bright pupil who knew nothing about division of fractions would be able to select the correct answer. The correct answer should not always be found in the same position; sometimes it should be 'For an example of a recognition examination see Appendix p. 75. [58] first, sometimes last, and sometimes in an intermediate position. As in the case of the true-false examination, a recognition ex- amination should consist of a large number of exercises. Examinations of this type should be mimeographed or printed and each pupil should have a copy. Definite instructions concern- ing methods of work should be given. It is probably best to in- struct the pupil to work through the test rapidly, answering those exercises about which he is certain. He should then go back over the list and try the more difficult ones. Not fewer than four pro- posed answers should be included in each statement and the pupils may be instructed to guess if they do not know, since the chance of success by guessing is slight. The pupil's score on an examination of this type may be taken as the number of exercises done correctly. A somewhat unusual but interesting type of recognition ex- ercise is that described as a "matching contest." In this a pupil is given two lists of statements, the first numbered 1, 2, 3, 4, 5, etc., the second marked A, B, C, D, E, etc. In the second list, there is a statement which corresponds in meaning to a statement in the first list and the pupil is to pair these statements, marking by the number of the first list the letter of the corresponding state- ment of the second. 
For example, in the exercises given below, by the date marked (5), 1898, we place the letter B to indicate the event for which that date is significant. It is difficult to construct such examinations so that they will require reasoning on the part of the student. Their most important use is in the elementary school for rapid drill in certain phases of some subjects, such as definitions in geography and grammar, etc. The following exercises, selected from the Spokane United States History Test, illustrate the use of such an examination in linking a certain date or person with the corresponding event.

     1. 1846          A. Lincoln's Emancipation Proclamation
     2. 1865          B. Spanish-American War
     3. 1863          C. Beginning of World War
     4. 1917          D. Declaration of Independence
     5. 1898          E. United States entered World War
     6. 1789          F. Election of Washington as President
     7. 1792          G. War with Mexico began
     8. 1776          H. Invention of the cotton gin
     9. 1861          I. Lee's surrender at Appomattox
    10. 1914          J. Beginning of Civil War

     1. Foch          A. Destroyed Spanish fleet in Manila Bay
     2. Lincoln       B. Invented the telephone
     3. Fulton        C. Leading Confederate General
     4. Dewey         D. Wrote the Declaration of Independence
     5. Pershing      E. Invented the steamboat
     6. Bell          F. Commanded allied armies in the World War
     7. Edison        G. Was President during the Civil War
     8. Jefferson     H. Commanded American Forces in the World War
     9. Lee           I. Was Revolutionary patriot, author, and inventor
    10. Franklin      J. America's most famous inventor

Directions for constructing completion exercises.* A completion exercise should be constructed so that no suggestion will be given of the correct words to be written in the blanks. Furthermore, the facts to be supplied should be important. The best plan is to prepare a list of important statements and principles covering the portion of the subject over which the pupils are to be examined and then from these statements to strike out a certain significant word or phrase.
In every case, if it is possible, the words omitted should be such that only one answer will be correct. Since little writing is required of the pupils they may be asked to fill in as many as one hundred blanks. The scoring of completion exercises is not as highly objective as in the two types mentioned above. Pupils will tend to write a variety of words in the blanks. Different words may have almost the same meaning, and frequently the scorer will be compelled to determine whether the meaning of some word is sufficiently near that of the correct answer to justify giving the pupil credit for having answered the exercises correctly. However, by a careful selection of statements and of the omitted words, this subjectivity may be greatly minimized. For example, in the sentence, "The first Continental Congress was held in ... .," only one possible word can be correct. In using completion exercises it is necessary to provide each pupil with a mimeographed or printed copy of the examination. The pupil's score is the number of blanks filled in correctly. Advantages of the "new examination." Examinations con- sisting of exercises of the types described above have certain ob- vious advantages. There will be a large saving of time for both teacher and pupil. The pupil is called upon to do little or no *For an example of a completion examination see Appendix p. 73. [60] writing in giving his answers and therefore is able to respond to a large number of exercises. The teacher in scoring will have little or no occasion to use judgment as he will need only to note the brief responses given by the pupils. Thus the labor of scoring will be greatly reduced and, more important, the scoring will be much more highly objective than that in the marking of examination papers of the usual type. The saving of time in the giving and scoring of the "new examination" will more than offset any ad- ditional time that may be expended in its construction. 
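How mechanical the scoring becomes may be seen from a short sketch of the three scoring rules stated in this chapter: rights minus wrongs for the true-false type, with exercises not attempted left uncounted; the number done correctly for the recognition type; and the number of blanks filled correctly for the completion type. The function names and the data layout are hypothetical. In particular, representing each blank by a set of accepted words is an assumption that stands in for the scorer's judgment about near-synonyms.

```python
def score_true_false(key, responses):
    """Rights minus wrongs; a response of None means 'not attempted'."""
    right = sum(1 for k, r in zip(key, responses) if r is not None and r == k)
    wrong = sum(1 for k, r in zip(key, responses) if r is not None and r != k)
    return right - wrong


def score_recognition(key, responses):
    """Number of exercises in which the pupil marked the keyed answer."""
    return sum(1 for k, r in zip(key, responses) if r == k)


def score_completion(accepted, responses):
    """Number of blanks filled with one of the accepted words.

    'accepted' holds one set of lowercase answers per blank; listing
    the acceptable variants in advance is an illustrative assumption.
    """
    return sum(
        1
        for ok, r in zip(accepted, responses)
        if r is not None and r.strip().lower() in ok
    )
```

A pupil who answers three true-false exercises correctly, one incorrectly, and omits one receives a score of 3 minus 1, or 2, under the first rule.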
Another advantage is that the new examination can be made more com- prehensive. Examinations as a rule consist of ten questions. Some are limited to a smaller number. Consequently the scope of ex- aminations of the traditional type is necessarily narrow. "New examinations" of the true-false type should consist of not less than fifty exercises and may have as many as one hundred. Other types of the "new examination" should be of a corresponding length. Hence a "new examination" will usually be more comprehensive than a traditional examination. Limitations of the "new examination." There are certain limitations of the new examination which should be noted along with its advantages. It can not be used in mathematics except to a limited extent. It can not be used at all in English Composition. In other subjects there are many phases of achievement which are not measured directly by examinations made up of exercises of the types described above. Hence, altho the "new examination" is more comprehensive with reference to information, and does meas- ure certain types of achievements, it is likely that pupils would miss much valuable experience and training if they were not at times asked to "compare," "explain," "discuss," "define," or "tell why." They should also be asked to summarize material presented on a topic or to apply certain principles. The following questions taken from Hahn's Scale for Measuring Ability of Children in History appear to require mental processes distinctly different from those called for by the "new examination." 1. "State points of similarity between the position of the United States in 1812 and their position in 1912." 2. "Arrange the following events in order of cause and effect: Force Bill, Carpet Baggers, Fifteenth Amendment, Negro Rule in Some of the Southern States, Ku Klux Klan." 3. "Name the presidents of the United States since 1892." [61] An intelligent attitude toward the "new examination." 
The simple administration of the new examination and other attrac- tive features should not blind one to the limitations just mention- ed. As indicated in Chapter II written examinations do more than merely secure measures of achievement. If they consist of the right kind of exercises they afford significant educational oppor- tunities. The educational opportunities of the "new examination" are necessarily restricted, and it would be unfortunate if it entirely replaced examinations of the traditional type. The new examina- tion, however, has a place. It may be used occasionally in most school subjects. It is useful when a teacher wishes to test the acquaintance of a class with a wide range of facts. It has little diagnostic value and examinations of the traditional type should be used when information is desired concerning the weaknesses of different members of a class. For this reason the "new exami- nation" is more appropriate for use at the end of a term than for tests during the term which have as their purpose both measure- ment and diagnosis. [62] CHAPTER VII RULES FOR THE PREPARATION AND ADMINISTRATION OF WRITTEN EXAMINATIONS. Below, a group of suggested rules governing the preparation and administration of written examinations are given. These represent the opinion of the writers which is based upon a careful study of the problems involved, as well as upon several years of experience in the measurement of school achievement. 1. Final examinations should be required. In school subjects such as shop work, in which the performances secured from pupils are highly objective, the waiving of this requirement may be justi- fied. When final examinations are given no student should be excused from them because of high daily grades, deportment or attendance. (See p. 16) 2. The content of final examinations should agree as closely as possible with recognized educational objectives. 
In fields where minimum essentials have been determined they should be used as a basis in formulating questions. (See p. 55) 3. The questions should be definite and stated so that all pupils will interpret them alike. Questions relating to items of minor importance should occupy a minor place in examinations. Questions relating to points which have not received attention in the course should be omitted.^ (See also rule 11.) 4. When the necessary equipment is available the questions should be mimeographed or typewritten so that each student will have a copy on his desk. In case they are written on the board the teacher should make certain that all pupils are able to read them correctly. It is well in either case to read the questions aloud to the class. 5. The examination should be sufficiently difficult so that few pupils will make perfect scores. (This rule should not apply ' incorporated city obtains from the government a under which it may elect its officials and conduct its business. 14. Civil service employees may be removed from service only for 15. The power of Impeaching a state officer is given to the 16. The is by far the most prominent and powerful executive official in the state. Ver)' state officers are appointed by him or are re- sponsible to him. 17. All important officials connected with the executive or judicial service of the United States may be removed by through the lower house of Congress and by in the senate. 18. Far more property is destroyed by than by all other agencies. 19. There is no task of state and local government which outranks in importance that of providing an education at public expense. 20. All rivers and canals within a single state are controlled by the in which they are located. 21. Most of the revenue for state and local governments is secured by a on 22. A state is the fundamental law which the people of the state have arranged for their government and protection. 23. A state constitution can be changed by means of an 24. 
The three-fifths compromise provided that five should be counted as equal to three when reckoning the ___ for either direct taxation or representation.

RECOGNITION EXAMINATION IN ALGEBRA

Prepared by Bureau of Educational Research
University of Illinois

Name ______  Boy or Girl ______
Age last birthday ______  Next birthday will be ______ 19__
Grade ______  Date ______
City ______  State ______
School ______  Teacher ______

Below you will find a number of statements. In each statement a word or number has been omitted. At the close of the statement several words or numbers have been given. One of these is the correct answer. Select the word or number which you think is correct and draw a line under it. Most, if not all, of the examples can be solved by mental calculation. If any figuring is necessary, work on the margin of the page. You will be allowed 17 minutes for the test.

1. Numbers that are represented by letters are called ___ numbers.  substituted; literal
2. When two or more letters are multiplied together each is called a ___ of the product.  factor; coefficient
3. If a man rides a certain distance in 10 hours, in h hours he rides
4. The statement 2x + 5 = 29 is called an ___.  identity; equation
5. If 16 is subtracted from three times a certain number the result is 110. The number is ___.  36⅔; 31⅓; 42
6. A number which is a factor of two or more numbers is called a ___ factor.  common; equal
7. If there are two equal factors of a number, either is called the ___ of the number.  square root; common factor
8. To multiply algebraic fractions take the ___ of the numerators for a new numerator and the product of the denominators for a new denominator.  sum; product
9. A fraction whose numerator or denominator (or both) contains fractions is called a ___ fraction.  multiple; complex
10. A ___ is a statement of a fact which is to be proved.  theorem; axiom
11. The name given the + sign is ___.  negative; positive
12. To find the sum of two numbers whose signs are opposite, take their ___ regarding each as positive, and prefix the sign of the larger number to the answer.  sum; difference; product
13. Whenever a number occurs without a sign, the ___ sign is to be understood.  ×; +; -
14. The number denoting the power of a term is called the ___.  prefix; exponent
15. If a = 2, b = -3, and c = -5, then [numerator illegible]/a = ___.  -6; -2; 30
16. In adding like terms add the coefficients for the new coefficient and ___ it by the common factor.  multiply; divide
17. An expression which contains more than one term is called a ___.  monomial; polynomial
18. If the length of a rectangle = 4 feet more than twice the width, the perimeter = 56 feet. The length = ___ feet.  8; 12; 16
19. Any term may be transposed from one side of an equation to the other, provided its ___ is changed.  sign; value
20. Any equation which contains no higher power of the unknown letter than the first is called a ___ equation.  radical; simple
21. The exponent of the product of two powers of the same number is equal to the ___ of the exponents of the factors.  product; sum
22. To raise the product of two numbers to any power, raise the numbers separately to that power and take their ___.  product; sum
23. The square of any two numbers is equal to the square of the first number ___ twice the product of the two plus the square of the second number.  plus; minus
24. A 20 foot ladder rests against a building, the bottom of the ladder being 12 feet from the cellar wall. The top is ___ feet from the ground.  8; 16
25. In division, the sign of the quotient is ___ whenever the dividend and divisor have like signs.  -; +
26. In finding the quotient of two powers of the same number the exponent of the quotient is equal to the exponent of the dividend ___ by that of the divisor.  increased; diminished
27. (3x² - 2x - 1) ÷ (x - 1) = ___.  3x + 1; 3x - 1
28. A factor which has no factor except itself and unity is called a ___ factor.  prime; multiple
29. The product of all the common prime factors of two or more numbers or expressions is called their ___ common factor.  highest; lowest
30. If one number is exactly divisible by another, the first is called a ___ of the second.  divisor; multiple
31. In algebraic fractions the dividend is called the ___.  denominator; numerator

UNIVERSITY OF ILLINOIS BULLETIN
Issued Weekly
Vol. XXI    November 26, 1923    No. 13

[Entered as second-class matter December 11, 1912, at the post office at Urbana, Illinois, under the Act of August 24, 1912. Acceptance for mailing at the special rate of postage provided for in section 1103, Act of October 3, 1917, authorized July 31, 1918.]

BULLETIN No. 17
BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION

THE PRESENT STATUS OF WRITTEN EXAMINATIONS AND SUGGESTIONS FOR THEIR IMPROVEMENT

By Walter S. Monroe
Director, Bureau of Educational Research
Assisted by Lloyd B. Souders
Formerly Assistant in Bureau of Educational Research

PRICE 50 CENTS

PUBLISHED BY THE UNIVERSITY OF ILLINOIS, URBANA
1923

The Bureau of Educational Research was established by act of the Board of Trustees June 1, 1918. It is the purpose of the Bureau to conduct original investigations in the field of education, to summarize and bring to the attention of school people the results of research elsewhere, and to be of service to the schools of the state in other ways.

The results of original investigations carried on by the Bureau of Educational Research are published in the form of bulletins. A complete list of these publications is given on the back cover of this bulletin. At the present time five or six original investigations are reported each year. The accounts of research conducted elsewhere and other communications to the school men of the state are published in the form of educational research circulars. From ten to fifteen of these are issued each year.

The Bureau is a department of the College of Education.
Its immediate direction is vested in a Director, who is also an instructor in the College of Education. Under his supervision research is carried on by other members of the Bureau staff and also by graduates who are working on theses. From this point of view the Bureau of Educational Research is a research laboratory for the College of Education.

Bureau of Educational Research
College of Education
University of Illinois, Urbana

THE UNIVERSITY OF ILLINOIS
THE STATE UNIVERSITY
URBANA
DAVID KINLEY, Ph.D., LL.D., President

The University Includes the Following Departments

The Graduate School
The College of Liberal Arts and Sciences (Ancient and Modern Languages and Literatures; History, Economics, Political Science, Sociology, Philosophy, Psychology, Education; Mathematics; Astronomy; Geology; Physics; Chemistry; Botany, Bacteriology, Zoology, Entomology; Physiology; Art and Design; Home Economics)
The College of Commerce and Business Administration (General Business, Banking, Insurance, Accountancy, Railway Administration, Railway Transportation, Industrial Administration, Foreign Commerce; Courses for Commercial Teachers and Commercial and Civic Secretaries; Commerce and Law)
The College of Engineering (Architecture; Architectural, Ceramic, Civil, Electrical, Gas, General, Mechanical, Mining, Municipal and Sanitary, Railway Engineering, and Engineering Physics)
The College of Agriculture (Agronomy; Animal Husbandry; Dairy Husbandry; Farm Mechanics, Farm Organization and Management; Horticulture, Landscape Gardening, and Floriculture; Agricultural Extension; Home Economics)
The College of Law (Three-year and four-year curriculums based on two years of college work)
The College of Education (General Education; Athletic Coaching; Agricultural Education; Home Economics Education; Industrial Education; Music Education; University High School; Bureau of Educational Research)
The Curriculum in Journalism
The Curriculums in Chemistry and Chemical Engineering
The School of Railway Engineering and Administration
The School of Music (four-year curriculum)
The Library School (two-year curriculum for college graduates)
The College of Medicine (in Chicago)
The College of Dentistry (in Chicago)
The School of Pharmacy (in Chicago; Ph.G. and Ph.C. curriculums)
The Summer Session (eight weeks)
Experiment Stations and Scientific Bureaus: U. S. Agricultural Experiment Station; Engineering Experiment Station; State Natural History Survey; Biological Experiment Station on Illinois River; State Water Survey; State Geological Survey; U. S. Bureau of Mines Experiment Station.

The library collections contain (July 1, 1923) 556,105 volumes and 127,941 pamphlets.

For catalogs and information address
THE REGISTRAR
Urbana, Illinois

BULLETINS OF THE BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION, UNIVERSITY OF ILLINOIS, URBANA, ILLINOIS

(Price follows each title.)

No. 1. Buckingham, B. R. Bureau of Educational Research, Announcement, 1918-19 .... .15
No. 2. First Annual Report .... .25
No. 3. Bamesberger, Velda C. Standard Requirements for Memorizing Literary Material .... .50
No. 4. Holley, Charles E. Mental Tests for School Use. (Out of print) .... .50
No. 5. Monroe, Walter S. Report of Division of Educational Tests for 1919-20 .... .25
No. 6. Monroe, Walter S. The Illinois Examination .... .50
No. 7. Monroe, Walter S. Types of Learning Required of Pupils in the Seventh and Eighth Grades and in the High School .... .15
No. 8. Monroe, Walter S. A Critical Study of Certain Silent Reading Tests .... .50
No. 9. Monroe, Walter S. Written Examinations and Their Improvement .... .50
No. 10. Bureau of Educational Research. Relation of Size of Class to School Efficiency .... .50
No. 11. Monroe, Walter S. Relation of Sectioning a Class to the Effectiveness of Instruction .... .15
No. 12. Odell, Charles W. The Use of Intelligence Tests as a Basis of School Organization and Instruction .... .50
No. 13. Monroe, Walter S., and Foster, I. O. The Status of the Social Sciences in the High Schools of the North Central Association .... .50
No. 14. Monroe, Walter S., and Carter, Ralph E. The Use of Different Types of Thought Questions in Secondary Schools and Their Relative Difficulty for Students .... .30
No. 15. Monroe, Walter S. The Constant and Variable Errors of Educational Measurements .... .25
No. 16. Odell, Charles W. An Annotated Bibliography Dealing With the Classification and Instruction of Pupils to Provide for Individual Differences .... .50
No. 17. Monroe, Walter S., and Souders, Lloyd B. Present Status of Written Examinations and Suggestions for Their Improvement .... .50