BULLETIN NO. 9
BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION

WRITTEN EXAMINATIONS AND THEIR IMPROVEMENT

by Walter S. Monroe, Director

PRICE 50 CENTS
PUBLISHED BY THE UNIVERSITY OF ILLINOIS, URBANA
1922

TABLE OF CONTENTS

Preface
Introduction: Basis of a rational estimate of the value of written examinations

CHAPTER I. Criticisms of Written Examinations Set by Teachers and Other School Officials
1. Examinations yield inaccurate measures of achievement
   A. Marking of examination papers subjective
   B. Questions of examination not equal in difficulty, and weighting by teachers subjective
   C. Content of examinations not in agreement with educational objectives
   D. Rate of work neglected
   E. Adequate opportunity for a pupil to demonstrate his ability not offered by single examination
   F. Marks assigned to examination papers imply subjective norms
2. Undesirable mental processes stimulated by examinations
3. Examinations tend to become educational objectives
4. Examinations injurious to health of students
5. Time devoted to marking of examination papers might be more profitably employed

CHAPTER II. The Defense of Written Examinations Set by Teachers and Other School Officials
1. Measurement of abilities of students necessary to high degree of school efficiency
2. Substitutes for written examinations
   A. Standardized educational tests versus examinations
   B. Teachers' estimates versus examination marks
   C. Daily "grades" versus examination marks
3. Inaccuracy of examination marks
   A. Neglect of the rate of work not necessary
   B. Unequal difficulty of questions not a serious defect
   C. Inaccuracy of single examination
4. Examinations force students to review and organize content of course
5. Examinations furnish effective motive
6. Proper use of examinations
7. Examinations as objectives
8. Effect of examinations upon health of students
9. Time devoted to examinations a profitable investment

CHAPTER III. Methods of Improving Examinations
1. Increasing objectivity in marking of examination papers
   A. Increasing objectivity of "grades" by improving examination questions
      (1) General methods
      (2) New examinations: use of questions permitting only one correct answer
         a. True-false exercises
         b. "Yes" and "no" exercises
         c. Recognition exercises
         d. Completion exercises
         e. Other advantages of "new examination"
         f. Limitations of "new examination"
   B. Rules for marking examinations
2. Increasing objectivity of norms for translating examination scores into school marks
3. Securing closer agreement of examinations with educational objectives

CHAPTER IV. Summary

APPENDIX
1. True-false examination in physiology
2. True-false examination in history and civil government
3. True-false examination in geography
4. Completion examination in American government
5. Recognition examination in algebra
6. Traditional examinations

PREFACE

During the past twenty years there have been many controversies concerning the value and place of written examinations. There have also been a number of investigations of examinations and examination "grades."
Since standardized educational tests have become widely used, a number of superintendents and teachers have proposed that they replace the written examinations set by teachers and other school officials. More recently some attention has been given to the improvement of written examinations by the application of certain principles of test construction. Because of the importance of the written examination, and also because a number of inquiries have been addressed to the Bureau of Educational Research, it has seemed wise to organize and publish a summary of the important ideas relating to both the criticism and the defense of examinations. To this have been added a number of suggestions for the improvement of examinations. It is hoped that this bulletin may foster intelligent thinking relative to written examinations and their use in our schools.

Although this bulletin is largely the product of the labors of the Director of the Bureau of Educational Research, it is only just that the contributions of other members of the staff should receive recognition. Both Mrs. Charles H. Johnston and Mr. Lloyd B. Souders have made substantial contributions.

Walter S. Monroe, Director

WRITTEN EXAMINATIONS AND THEIR IMPROVEMENT

INTRODUCTION

Basis of a Rational Estimate of the Value of Written Examinations

Until recently, examinations occupied a regular place in the work of the school. Students expected them as a matter of course, and the accuracy of the marks placed upon examination papers was not seriously questioned. However, for a number of years written examinations set by teachers and by other school officials have been subjected to criticism. During this period the defects and the limitations of examinations have been thoroughly canvassed. Many prominent educators have advised that they be abolished entirely, and in a number of school systems this has actually been done.
The friends of examinations, however, have urged their merits and have insisted that abolishing them would cause our educational system to deteriorate. The controversy has not been without prejudice on both sides. The marking of examination papers involves much drudgery for instructors. Students dislike examinations, partly because they require a type of intensive mental activity which many of them prefer to avoid and partly because it is fashionable in many schools to oppose them. Conservatives, naturally, have resented any proposal to change a system of education which they credited with producing the educated men of the present generation. Some, at least, have expressed the belief that examinations have been largely responsible for the quality of the output of our public schools and colleges.

In evaluating the criticisms and the defense of written examinations, it is imperative to keep in mind that examinations have more than one function. Written examinations are not merely measuring instruments, although this function is probably the most prominent in the thinking of many persons. The written examination is used as an instrument for measuring the achievements of students, but it also affords a unique type of opportunity for learning. Under rather well-defined conditions, certain tasks are set for the pupil, and he is required to demonstrate within a limited time what he is able to do. He is thrown upon his own resources and forced to work under pressure. In actually writing his answers to the questions of the examination, the pupil has an opportunity to learn. Ideas tend to become more definite as a result of expression in written form. Frequently the pupil gains new ideas as a result of the reflective thinking he does in answering the questions. It is true that pupils do not always learn in taking an examination, but it is also true that not all pupils take advantage of the other educational opportunities which are offered them.
In addition to the actual taking of the examination, the pupil frequently reviews in preparation for it; and, because he knows that later he must take the examination, he has a stronger motive for this review.

Not only is it important that we recognize functions other than measurement, but it is also imperative that we bear in mind two distinctions. First, we must distinguish between criticisms of examinations and criticisms of certain kinds of examinations. The fact that some teachers set poor examinations does not furnish an adequate basis for concluding that all examinations should be abolished. In the second place, we should distinguish carefully between criticisms of examinations and criticisms of the ways in which they are used. Good examinations may be used for wrong purposes. For example, a good examination might be given to a pupil or a group of pupils merely as a punishment for some misbehavior. If we believe that such use is not justified, it does not follow that the examination itself is subject to adverse criticism or that all examinations should be abolished.

CHAPTER I. CRITICISMS OF WRITTEN EXAMINATIONS SET BY TEACHERS AND OTHER SCHOOL OFFICIALS

The arguments advanced for and against examinations have dealt with various phases. Some of the criticisms have emphasized the effectiveness of the examination as a measuring instrument; others have had to do with the purposes for which examinations are used by teachers and by other school officials. Some criticisms are based upon facts, while others merely represent opinions. In the following pages the most significant criticisms have been summarized and grouped under a few major heads. In presenting these criticisms there will be no attempt to point out their limitations or to present the arguments in favor of written examinations; this will be reserved for the second chapter.

1. Examinations yield inaccurate measures of achievement.
A number of criticisms of written examinations set by teachers and by other school officials have referred to their effectiveness as instruments for measuring the achievements of students. These criticisms may be summarized under six heads.

A. Marking of examination papers subjective. Scientific investigation has proved that the marking of examination papers is subjective, i.e., different teachers, when working independently, tend to assign widely varying marks to the same paper. An investigation by Starch and Elliot [1] is typical of many that have been made. These investigators selected a final examination paper in geometry, written by a student in one of the largest high schools in Wisconsin. An exact reproduction of this paper and a set of the questions were sent to one hundred and eighty high schools in the North Central Association, with the request that the paper be graded by the principal teacher of mathematics according to the practise and standards of the school. One hundred and sixteen acceptable replies were received. The papers showed evidence of having been marked with unusual care and attention. In the seventy-three schools where the passing grade was 75, the lowest mark given was 39 and the highest 88. The mode was 75, with twelve teachers giving this mark. Of the one hundred and sixteen marks assigned to this paper, two were above 90 and one was below 30. Twenty were 80 or above, and twenty others were below 60. Forty-seven teachers assigned a passing mark or higher, but sixty-nine teachers thought the paper not worthy of a passing mark.

[1] Starch, Daniel, and Elliot, E. C. "Reliability of grading high school work in mathematics," School Review, 21: 254-59, 1913.

Robert L. Morton [2] reports an investigation of the reliability of the marking of examination papers written by teachers applying for a license to teach. In 1904, the Ohio Legislature provided for uniform questions for the teachers' examination.
These questions were to be prepared in the office of the State Superintendent of Public Instruction and sent to the eighty-eight county boards of examiners. Special examiners were appointed in each county to rate the papers. Morton selected an arithmetic paper from the files of one board of examiners. The paper was mimeographed, care being taken to reproduce exactly the language, spelling, and punctuation of the original. A copy of this paper, together with the questions, was sent to each of the eighty-eight county superintendents in Ohio with the request that it be graded by the special examiner for arithmetic. Replies were received from fifty-five counties. The lowest mark given to the paper was 60 and the highest 99. In marking the answer to one question on this paper, five examiners rated it at zero, twenty-one at 10, and the other twenty-nine assigned marks between these extremes. If each answer had been rated as in the county assigning the lowest mark to it, the total "grade" for the paper would have been 28. On the other hand, if the highest marks assigned to the answers of the various questions had been used to make up a "grade," a mark of 100 would have been given to the paper. Morton investigated in a similar way the marking of a paper in the theory and practise of teaching and also of one in geography. Similar variations in the marks were found.

A striking illustration of the subjectivity of the marking of examination papers by college instructors is cited by a recent writer [3]. One of a group of expert readers assigned to mark examination papers in history, after scoring a few papers, wrote out for his own convenience what he considered model answers to the questions. By some mischance this "model" examination paper fell into the hands of another expert reader, who graded it as a paper written by a student.
The mark he assigned to it was below passing and, in accordance with custom, this "model" was rated by a number of other expert readers in order to ensure that it was properly marked. The marks assigned to it by these readers varied from 40 to 90.

[2] Morton, Robert L. "The examination method of licensing teachers," Educational Administration and Supervision, 6: 421, November, 1920.

[3] Wood, Ben D. "Measurement of college work," Educational Administration and Supervision, 7: 301-34, September, 1921.

Scientific investigation of the marking of examination papers has been sufficiently extensive to prove that, except in a very few instances, the process is subjective. Except for accidental errors, different teachers should assign the same mark to an examination paper in spelling. The marking should also be highly objective in arithmetic unless there is an attempt to allow partial credit for examples and problems partially right, or for a correct principle when the answer is not correct. The marking of answers to questions which call for specific facts, such as dates or the names of places or persons, should approach objectivity. With the exception of these cases, the marking as it is ordinarily done is highly subjective, and hence the "grades" are inaccurate measures of achievement.

As might be expected, the degree of subjectivity varies with different school subjects. It is, however, sometimes found to be high where the nature of the subject matter leads one to expect that the marking will be relatively objective. For example, Starch and Elliot found that the marking of an examination paper in geometry was just as subjective as one in English or history. Kelly [4] found that the rating of examination papers in algebra was considerably more objective than in physics.

B. Questions of an examination not equal in difficulty, and weighting by teachers subjective. There is abundant evidence that the questions of an examination are generally not equal in difficulty.
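One common way to make a question's difficulty concrete is the percent of pupils who answer it correctly. The Python sketch below is an illustration only: the table of right and wrong answers is invented (it is not data from any investigation cited in this bulletin), and the function names are hypothetical. It computes that percent for each question and shows how assigning unequal point values to questions changes the percent credited to a pupil.

```python
from collections.abc import Sequence


def percent_correct(responses: Sequence[Sequence[int]]) -> list[float]:
    """Percent of pupils answering each question correctly.

    Each row is one pupil; each column is one question (1 = right, 0 = wrong).
    A high percent marks an easy question, a low percent a difficult one.
    """
    n_pupils = len(responses)
    n_questions = len(responses[0])
    return [
        100.0 * sum(row[q] for row in responses) / n_pupils
        for q in range(n_questions)
    ]


def weighted_percent(answers: Sequence[int], weights: Sequence[int]) -> float:
    """Percent of the total available credit earned under the given weights."""
    earned = sum(a * w for a, w in zip(answers, weights))
    return 100.0 * earned / sum(weights)


# Invented responses of four pupils to three questions (illustration only).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(percent_correct(responses))  # [100.0, 50.0, 25.0]

# The first pupil answered the two easier questions correctly.
first = responses[0]
print(round(weighted_percent(first, [1, 1, 1]), 1))   # equal weights: 66.7
# Hypothetical weights: 4 and 15 echo this section's example; 9 is invented.
print(round(weighted_percent(first, [4, 9, 15]), 1))  # 46.4
```

With these made-up figures the same pupil's percent drops from about 67 to about 46 solely because of the weights chosen, so any disagreement among teachers about which questions are difficult carries straight through into the marks.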
Frequently, in this respect, they vary widely. When the questions are submitted to a large number of pupils, some will be answered correctly by a large percent of the pupils, others by only a small percent. To give as much credit for answering an easy question as for a difficult one would appear to introduce serious errors into the marks assigned to the papers. Because it is recognized that the questions which make up an examination are generally unequal in difficulty, teachers frequently attempt to assign appropriate weights. For example, a difficult question may be assigned a credit of 15 points while an easy one is given a credit of only 4 points. One investigation [5] has shown that teachers' estimates of the difficulty of questions are highly subjective. Twenty teachers were asked to arrange twenty-three problems in arithmetic in the order of their difficulty. A very wide variation in these rankings was found. One problem was considered the easiest by one teacher and ranked twenty-first in difficulty by another. The results of this investigation seem representative. That being the case, any weighting of questions by teachers must be considered highly subjective, and hence not a satisfactory corrective for the unequal difficulty of questions.

[4] Kelly, F. J. "Teachers' marking," Teachers College, Columbia University, Contributions to Education, No. 66, 1914.

[5] Comin, Robert. "Teachers' estimates of the ability of pupils," School and Society, 3: 67-70, January 8, 1916.

C. Content of examinations not in agreement with educational objectives. The criticism is frequently made that teachers, in formulating examination questions, tend to ask for unimportant details and to neglect the minimum essentials of a subject, and that, therefore, a pupil's performance on an examination cannot be a truthful index of the extent to which he has achieved the educational objectives set for him. Some questions are described as "catch questions."
By this is usually meant that such questions call for some unimportant detail or that they are ambiguous in some way. There appears to have been no scientific investigation of the character of the examination questions asked of pupils. However, this criticism is doubtless justified in some cases, because teachers frequently give relatively little time to the preparation of their questions, and these often reflect any hobbies or prejudices which the teachers may have. Experience in the construction of standardized educational tests has shown that it is difficult to eliminate all ambiguity and indefiniteness in questions. Hence, it is likely that many questions are not well stated and for this reason are not properly understood by those taking the examination. When this is the case, the "grades" tend to be inaccurate measures of achievement.

When an examination is set by some person other than the teacher of the class, it not infrequently happens that many of the questions pertain to topics which have received little or no attention during the instruction periods. In many schools it seems to be the custom for the superintendent or the principal, without consultation with the teacher in charge, to make out the questions for the final examination on which the pupils' semester grades are largely based. For example, in a fifth-grade geography class in an Illinois city, four of the five questions of the examination concerned current conditions about which the children, instructed only in their texts, knew little. A few pupils, fortunate enough to have heard these matters discussed in their own homes, received a passing grade. The majority of the class failed. This examination, interesting and in itself not subject to criticism, should not, however, have been used as a means of measuring the achievements of that particular class.
It was not in agreement with the educational objectives toward which the teacher had directed the pupils' efforts. Such examinations are "hard" in the sense that even capable students will answer only a relatively small percent of the questions correctly, and they are rightly criticized as unjust because the students are not given an opportunity to demonstrate their achievements.

D. Rate of work neglected. The usual plan is to set an examination which practically all pupils can finish in the time allowed. No record is kept of the time which the pupil has spent in writing his answers. If two pupils write papers which are considered equivalent in quality, but one has completed the examination in forty minutes and the other in ninety minutes, it is not customary to distinguish between their performances. Both will receive the same "grade." This means that the rate of work is neglected. Since the rate at which a pupil is able to answer questions is one index of his ability, the ordinary examination fails in this respect to secure a truthful measure of his ability.

E. A single examination does not offer an adequate opportunity for a pupil to demonstrate his ability. Some critics urge that a single examination, even when carefully prepared and graded, will not in general yield a reliable measure of a student's ability. McAndrew [6], in reviewing the work of the New York City high schools, says, "New York City high schools use them (examinations) in deciding the promotion to the training school for teachers. We have every year some students whom their teachers have complimented regularly but who fail of graduation because of a three hour test which nullifies the work of four years. I cannot see how a pupil writing for three hours can be tested for what he has done for a year or more." Courtis [7] expresses much the same thought in the following statement: "The best examination is not that represented by the score of a single performance in a single day.
Human effort is variable and human skill too easily upset to make it fair to have promotion based upon chance scores." Thorndike [8] has summarized a number of investigations carried on at Columbia University to determine the reliability of the "grades" made on college entrance examinations as a basis for predicting the type of work which the student will do in college. He states that we cannot estimate the success of the student in college from his grades on entrance examinations with "enough accuracy to make the entrance examinations worth while and to prevent gross injustice being done to any individual. The record of eleven or more entrance examinations gives a less accurate prophecy of what a student will do in the latter half of his college course than does his high school record." Similar results have been obtained by other investigators [9]. Studies of this type do not necessarily prove that examination "grades" are inaccurate measures of achievement. Because of other elements which enter into college life, a student who has made a satisfactory record in his secondary school may not carry on his college work successfully.

[6] McAndrew, Wm. "Our old friend the examination," Proceedings of National Educational Association, 1916, pp. 527-33.

[7] Courtis, S. A. "Standardizing of teachers' examinations," Proceedings of National Educational Association, 1916, pp. 1078-86.

[8] Thorndike, E. L. "The future of the college entrance examination board," Educational Review, 31: 470-83, May, 1906.

F. Marks assigned to examination papers imply subjective norms. This criticism has to do with errors in interpreting the measures of achievement yielded by examinations rather than with errors in the measures themselves, but since examination marks are usually thought of as measures, it may be considered under this head.
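The distinction this criticism turns on, between a raw score and a "grade," is worked out in the paragraphs that follow, where the author argues that on a "hard" freshman mathematics examination a raw score of about 40 should be translated into the institution's passing grade of 70. As a minimal sketch of such a translation, the Python below assumes a piecewise-linear rule: a score of 0 stays 0, the chosen passing score maps to the passing grade, and the top score maps to 100. The bulletin does not prescribe this or any other formula; the rule and the function name `score_to_grade` are assumptions made for illustration.

```python
def score_to_grade(
    score: float,
    passing_score: float = 40.0,
    passing_grade: float = 70.0,
    max_score: float = 100.0,
    max_grade: float = 100.0,
) -> float:
    """Translate a raw examination score into a school "grade".

    Assumed piecewise-linear rule: 0 -> 0, passing_score -> passing_grade,
    max_score -> max_grade, with straight-line interpolation in between.
    """
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score must lie between 0 and {max_score}")
    if score <= passing_score:
        # Below the passing score: stretch 0..passing_score onto 0..passing_grade.
        return score * passing_grade / passing_score
    # Above it: compress passing_score..max_score onto passing_grade..max_grade.
    slope = (max_grade - passing_grade) / (max_score - passing_score)
    return passing_grade + (score - passing_score) * slope


# Lowest, suggested passing, approximate average, and highest scores
# from the freshman example.
for raw in (12, 40, 55, 100):
    print(raw, score_to_grade(raw))
# 12 21.0
# 40 70.0
# 55 77.5
# 100 100.0
```

Under this assumed rule, a student at the examination's average of about 55 would be reported as 77.5 rather than as a failing 55. An institution with a different passing mark could supply it through `passing_grade`.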
In reading the controversial literature on written examinations one will find little mention of the criticism expressed by the heading of this paragraph, but in the illustrations given below the reader will recognize that this weakness of examination marks has been sensed by most pupils and teachers. However, they appear to have failed to analyze the situation sufficiently to grasp the source of the difficulty. In the judgment of the writer, this is one of the most serious weaknesses of the traditional examination.

In order to understand how norms (standards) are used in connection with the grading of examination papers, it is necessary to distinguish between scores, or measures, and "grades," or marks. A score simply describes the performance which has been recorded in the examination paper. For example, a pupil may answer 55 percent of the questions correctly. In this case 55 is his score. If a certain number of points or credits had been given for each question, his score might be 129, 91, or 217. A "grade" interprets this description with reference to certain norms. A "grade" indirectly describes a pupil's performance on an examination, but it also tells whether the pupil's performance is to be considered above passing or below passing, and whether he is to receive the highest mark, the lowest mark, or an average mark. It is customary to describe the quality of examination papers in terms of the percent of questions answered correctly. For example, if an examination includes ten questions and a pupil answers seven of them correctly and an eighth one partially right, he is given a score of 75 percent, which is interpreted to mean that, in the judgment of the examiner, he has answered the questions 75 percent correctly. School marks or "grades" are also frequently expressed in terms of percents. Sometimes they are expressed in terms of letters or other symbols, but these in turn are defined in terms of percents.
For example, the grade of "A" may be defined as being between 95 percent and 100 percent.

[9] Lincoln, E. A. "Relative standing of pupils in high school and early college, and their college entrance examinations," School and Society, 5: 417-20, April 7, 1917.

Since both scores and "grades" are generally expressed in terms of percents, it is only natural that the two have been confused and that scores have been used as "grades." A good illustration of their difference came to the writer's attention recently. An examination in mathematics was given to nearly 1000 freshmen in one of our large universities. This examination may properly be described as "hard," considering the training which the students had received. One student made a score of 100. The lowest score was 12. The average was approximately 55. From the standpoint of the distribution of scores this was a "good examination." If it had been easier, so that any considerable number of students received scores of 100 percent, it would have been defective. If it had been so "hard" that a considerable number of students made zero scores, it would also have been defective. In both cases it would have failed to differentiate between some students who were not equal in ability. However, it is obvious that if a passing mark of 70 or 75 is adopted, it would be unjust to say that all students having scores below this passing mark should receive a "grade" of failure. The passing mark for this particular examination should be in the neighborhood of 40. If in this case it is desirable to have the scores represented in terms of "grades," a score of 40 should be translated into a "grade" of 70 or whatever passing mark the institution has adopted.

The recognition of this distinction between scores and "grades" enables us to indicate the way in which subjective norms are implied in "grades." A "grade" is not a pure measure or description of the pupil's performance.
It is rather an interpretation of the measure of his performance with reference to certain norms. When no distinction is made and scores are used as "grades," pupils will receive high "grades" if the examination is "easy"; if it is "hard," they will receive low ones. Thus, the difficulty of the examination is one factor in establishing the norms with reference to which the scores are interpreted when they are used as "grades." The severity of the marking is another: severe marking will tend to set high norms. It is only when the examination is of average or "standard" difficulty and the marking is average in severity that scores and "grades" become identical in magnitude. Since the norms are established by the difficulty of the examination and the severity of the marking, they must be subjective. In the investigations of the marking of examination papers it was shown that teachers varied widely in their judgments concerning the worth of examination papers. There is no reason to expect that they would agree more closely in estimating the difficulty of examinations. Hence, norms which depend upon teachers' estimates of which questions are appropriate for examinations and upon their marking of the papers must be considered subjective.

2. Undesirable mental processes stimulated by examinations. A number of critics have urged that examinations, and especially the preparation for them, tend to stimulate undesirable mental processes, and that as a result a student's learning is not of the right kind. It is claimed that memory is emphasized to the exclusion of higher mental processes. Many questions call only for facts, and even in the case of those which require reasoning the student must have some facts with which to reason. He is generally asked to answer the questions without access to his textbooks or to other sources of information. It is seldom that the examination provides the student with any of the facts or principles which he requires in the reasoning process.
Hence, what he records upon his examination paper must be based upon his own store of information. Thus, it is undoubtedly true that examinations make heavy demands upon the memory of the pupils. Those who have supported this criticism of examinations insist that this emphasis upon memory tends to change the child's mind into an automatic machine. The following statements are typical: "It is mere commonplace to say that in many schools the best preparation for an examination is to have in memory a vast number of details. The student may undertake the examination in such a condition of brain fatigue that he would find it difficult to solve a simple original problem; but if he has this plethoric memory of details he will succeed." [10] "The memory which gives back knowledge in the exact form in which it is received is likely to shine at examinations." [11]

In this connection it is claimed that examinations place a premium upon "cramming." Since memory is emphasized, it is only natural that students should engage in an intensive review of the course immediately preceding the examination period. It is a well-established principle that the recency of an experience materially affects its recall. The critics of examinations insist that students who have given little attention to the work of the course during the term may write a creditable examination paper merely as a result of intensive cramming immediately preceding the examination date [12]. Since this is true, students tend to loaf during the term and then "cram" for examinations. When a student's learning is limited to a brief period of intensive cramming immediately preceding the examination, he will tend to forget very rapidly. In addition, he has missed the regular growth that was possible during the term and has had no opportunity to organize and relate the various items of the course.

[10] Monroe, Paul. Encyclopedia of Education, Vol. 2, 536-38.

[11] Schaeffer, N. C. "The unmeasurable in teaching," N. E. A. Proceedings, 51: 169-72, 1913.

[12] Hollister, H. A. High School Administration. Boston: D. C. Heath & Company, 1909.

3. Examinations tend to become educational objectives. It has been charged that teachers tend to emphasize examinations in such a way that the students come to think of them as the educational objectives to be attained. Because the fear of making a low "grade" on an examination is an effective motive, teachers frequently use it. The students are reminded of approaching examinations and are warned that they will not be able to pass them unless they study certain topics. When the "grade" received on the final examination determines the student's success in the course, it naturally follows that he will think of "passing the examination" as the objective to be attained. Sometimes students secure former examination questions, or compile a list of the topics the instructor has emphasized, and study these to the exclusion of other topics which may be more important. Thus, the real objectives to be attained are obscured.

4. Examinations injurious to health of students. Because of the strenuous preparation for examinations, and also because of the obvious strain which accompanies the taking of an examination, a number of critics have urged that examinations are injurious to the health of students. Even students who have been faithful during the term will frequently cram for the examination because of a desire to take a high rank in their class or because of the additional credit for quality that is given in some institutions. When a student is required to write for a number of hours during a single day, as sometimes occurs, it is obvious that even under the best conditions he expends a large amount of nervous energy.

5. Time devoted to marking of examination papers might be more profitably employed.
A majority of our larger colleges and universities have recognized the fact that the marking of ordinary examination papers makes heavy demands upon the time of instructors and have provided "assistants" or "readers" who are to rate the papers. Not infrequently the examination period is dreaded by instructors as well as by students because of the drudgery involved in reading examination papers. One writer [13] has estimated that it requires approximately three hours to construct, give, and mark an examination for an average class. A teacher will give twenty or more examinations per year. This means that each teacher devotes not less than sixty hours to the construction, giving, and marking of examinations. This estimate is conservative; many teachers spend much more time than this. The critics of examinations have pointed out that this time might be more profitably devoted to other school activities.

[13] McCall, W. A. "New kind of school examination," Journal of Educational Research, 1:33-46, January, 1920.

CHAPTER II. THE DEFENSE OF WRITTEN EXAMINATIONS SET BY TEACHERS AND OTHER SCHOOL OFFICIALS

The criticisms of written examinations summarized in the preceding chapter appear to make a strong case against their use in schools. Some of the criticisms are established facts, and others corroborate our general observations. It is, however, necessary to remember that only one side of the case has been presented. Some of the criticisms are much less serious than their advocates would have us believe. Others are not fundamentally criticisms of examinations, but rather of the way in which they are used and of other phases of our school procedure. In addition, certain significant merits of examinations were not mentioned. In this chapter the criticisms set forth in Chapter I will be examined, and the defense of examinations presented.
In Chapter III we shall explain certain methods of improving written examinations which tend to eliminate or greatly minimize certain defects that now exist.

1. Measurement of abilities of students necessary to high degree of school efficiency. In considering written examinations as measuring instruments, it is necessary to bear in mind that the work of the school cannot be carried on efficiently without measuring the abilities of students from time to time. These measurements are essential to the organization and administration of our schools as well as to the instruction of the class room. The promotion and classification of pupils, the guidance of pupils, both educational and vocational, the supervision of instruction, and even instruction itself, cannot be most efficient unless the abilities of the pupils are measured at intervals. If these measurements are not made in one way, they must be made in another. If written examinations are abolished, the abilities of pupils must be measured in other ways, or the efficiency of our schools will decrease.

2. Substitutes for written examinations. It has been proposed by some critics that substitutes for written examinations may be used. Standardized educational tests, teachers' estimates, and daily "grades" furnish types of information upon which the measurement of the pupil's achievement may be based. However, for the following reasons it would seem that such means of measuring cannot be considered satisfactory substitutes for written examinations.

A. Standardized educational tests versus examinations. Standardized educational tests have been shown to be superior to ordinary examinations as measuring instruments, but in considering the proposal that they replace written examinations certain facts must not be overlooked. The number of satisfactory standardized educational tests is as yet very limited.
In only a few school subjects, such as handwriting, spelling, arithmetic, and oral and silent reading, do we have available standardized educational tests which might be used as substitutes for written examinations set by the teacher. Even in these subjects it may happen that for some reason the teacher has emphasized certain topics and omitted others or treated them casually. Furthermore, official courses of study vary widely from city to city. Standardized educational tests are of necessity confined to those topics which are uniformly taught or, at most, to those topics which are generally taught; therefore, in a particular instance there may not be available any standardized educational test which is adapted to the instruction which the class has been receiving. A teacher, however, can construct an examination which is specifically designed to measure the results of instruction given to a particular class. Thus, standardized educational tests can be used only to a limited extent as substitutes for examinations set by the teacher or by other school officials.

In this connection it is well to remember that standardized educational tests are not perfect measuring instruments. Even the best of them do not begin to approach the degree of accuracy to which we are accustomed in the measurement of physical objects. The results which they yield may involve errors so large as to distort or make entirely erroneous many of our interpretations. Thus, in considering the errors which are involved in examination "grades," we should not forget that our best instruments for measuring mental abilities are far from perfect with reference to accuracy.

B. Teachers' estimates versus examination marks. Some of the critics of examinations have insisted that experienced teachers are able to estimate the achievements of students by reason of their acquaintance with them during the term.
It has been claimed that these estimates will be more truthful measures of achievement than the "grades" obtained from a written examination given at the end of the term. There is no doubt that experienced teachers can, under favorable circumstances, estimate with considerable accuracy the achievements of students. If the class is reasonably small and if the teacher has used methods of instruction which have called for frequent oral and written performances by the students and has kept a careful record of the quality of these performances throughout the term, the estimates may be relatively accurate measures of the achievements of the students.

However, there are certain limitations which should be noted. Teachers may be unduly influenced in their estimates by the recent performances of the students. Unless careful records have been kept throughout the term, inferior performances at the beginning tend to be overshadowed, and the teacher's final estimate tends to be based upon the work of the last few weeks. In case the class is a large one, the teacher does not have an adequate opportunity of becoming intimately acquainted with the students. Teachers' estimates are likely to be materially affected by personal characteristics of pupils. A pupil with a pleasing personality or one who is liked by the teacher is likely to be rated higher than one who is unattractive.

If the class work is conducted so that there is little or no written performance on the part of the students, teachers' estimates will necessarily be based almost wholly upon the oral responses which students give during the class period. Some pupils make a good showing in class when the recitation is oral, but are at a decided disadvantage when asked to record their answers in writing. Frequently this difficulty is due to the fact that they are careless in their thinking and do not have well formed ideas to express.
In oral recitation they are able to make a fair showing because of personal characteristics and because of the stimulus of detailed questioning by the instructor. Furthermore, in a class discussion a bright student who has a good command of language may easily pick up ideas from other members of the class and recall items from his general experience sufficient to make a good showing. On the other hand, there are students who express themselves most effectively in writing. They may be good thinkers but a little slow in their mental processes and not clever in discussion. Thus, it is difficult or impossible for teachers to estimate accurately the real achievements of students from oral recitations alone.

In classes where a large amount of written work is required, as in English composition, the teacher has an opportunity, in formulating estimates, to consider the pupil's written as well as oral performances. In such subjects, the proposal to replace written examinations by teachers' estimates based upon the work of the course has much more merit than in other subjects where there has been only a small amount of written work. But even when much daily written work is required, a written examination covering the work of the term may give additional information concerning the real achievements of some students.

C. Daily "grades" versus examination marks. It has been proposed that the best measure of a student's achievements throughout a term is furnished by the average of his daily "grades." These "grades" include teachers' estimates of pupils' performances in class, marks given for written work submitted, and "grades" earned on short quizzes. Those who favor using the average of these marks as a pupil's final grade point out that by this means we secure a measure of a pupil's achievement which depends upon the quality of his work day by day and which, furthermore, is the average of a large number of measurements.
Probably no one would contend that a student's daily "grades" should not receive consideration in determining his final standing. There are, however, certain dangers in taking the average of a pupil's daily "grades" as his final mark. In the first place, if the student knows that his final mark depends only upon his daily work, he is likely to study for the day alone. His daily "grade" is based upon a rather small unit of work. Furthermore, it is based upon his performance immediately after studying the assignment. Thus, a pupil's ability to organize the content of a course and his retention of it are likely to receive little consideration in making up his final mark. Of course, it is possible for a teacher to ask the student to review frequently and to summarize and organize the work of the course at the end of the term. However, it is difficult to convince students of the necessity of reviewing the work of a course if the performance on this review receives no more weight in determining the final "grades" than the performances during an equal period of time elsewhere in the course.

When classes are large it is difficult to have each student recite each day, at least to a sufficient extent to furnish any basis for assigning a daily "grade." When the members of the class are called upon only occasionally, it frequently happens that they prepare only those lessons upon which they think they will be questioned. Of course, a resourceful instructor can do much to discourage this practise, but the fact remains that daily "grades" may not be representative samples of a student's achievements in a course. A written examination covering the work of the entire course or of a large division of it will furnish a measure of achievement which can be secured in no other way.

Conclusion: No satisfactory substitute for written examinations. Teachers' estimates and daily grades furnish certain types of measures of the achievements of pupils.
In some cases standardized educational tests can be used to secure relatively accurate measures of achievement in certain fields. However, these means of measuring cannot be considered satisfactory substitutes for written examinations. The written examination yields a measure of achievement which cannot be secured through any other means. This measurement is more important for some school subjects than for others, but in relatively few is one justified in abolishing the written examination as a means of measurement.

3. Inaccuracy of examination marks. In Chapter I several sources of error in examination marks were mentioned. In the case of some of these sources, sufficient experimental evidence was presented to be convincing. The only way in which these sources may be eliminated or minimized is by modifying examinations. These modifications will be treated in Chapter III, but three sources of error may be commented on here.

A. Neglect of the rate of work not necessary. It is not necessary that the rate of work be neglected in examinations set by teachers and other school officials. It is easily possible to take into account a student's rate of work in determining the mark which he is to receive on his examination paper. One plan is to set an examination of sufficient length to keep all members of the class employed during the entire period. In marking the papers, the number of questions which the student has answered should be considered as well as the quality of his answers. Another procedure is to have each student record the time when he finishes. This record will enable the teacher to determine the amount of time devoted to answering the questions. In the measurement of handwriting and of silent reading, as well as of abilities in a number of other fields, it is extremely important that the rate of work be considered.

B. Unequal difficulty of questions not a serious defect.
It does not appear that accurate measurements of the abilities of students can be secured by giving the same credit for answering an easy question as for answering a difficult one. However, investigations of this question in connection with the scoring of standardized educational tests have indicated that the errors introduced by this procedure, which appears to be illogical, are not large. After having weighted the exercises of his language and grammar tests on the basis of difficulty, Charters [1] dropped the weights because he found that the correlation between the weighted and unweighted scores was slightly over 0.90. A number of other test makers have likewise used exercises which were unequal in difficulty without assigning any weighted credits to them. Still other tests, which consist of exercises arranged in ascending order of difficulty, have been scored by taking the number of exercises done correctly, which amounts to giving as much credit for doing an easy exercise as for doing a more difficult one.

The writer [2] has shown in the case of certain standardized educational tests that the error introduced by disregarding the inequality of the questions in scoring is not larger than the variable error of measurement due to the variability of the performances of pupils. It therefore seems likely that the error introduced by disregarding the unequal difficulty of the questions of an examination will not be significant in most cases. When the examination consists of twenty or more questions, the unequal difficulty of the questions probably can be safely disregarded. Of course, it is wise to avoid extreme variations in difficulty whenever possible. Those setting examinations should endeavor to formulate questions which do not differ widely in this respect.

[1] Charters, W. W. "Constructing a language and grammar scale," Journal of Educational Research, 1:249-58, April, 1920.
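Charters' check can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the five pupils, the six exercises, and the difficulty weights are invented for the example (they are not Charters' data), and the function names are hypothetical. Each pupil is scored twice, once by counting the exercises done correctly and once by summing difficulty weights, and the Pearson correlation between the two sets of scores is then computed.

```python
import math
from typing import List, Sequence


def unweighted_score(responses: Sequence[bool]) -> int:
    """Number of exercises done correctly: each exercise counts one point."""
    return sum(1 for correct in responses if correct)


def weighted_score(responses: Sequence[bool], weights: Sequence[float]) -> float:
    """Sum of the (hypothetical) difficulty weights of the exercises done correctly."""
    return sum(w for correct, w in zip(responses, weights) if correct)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores each")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / math.sqrt(ss_x * ss_y)


# Invented data: six exercises in ascending order of difficulty,
# with hypothetical weights 1, 1, 2, 2, 3, 3, and five pupils.
weights = [1, 1, 2, 2, 3, 3]
pupils: List[List[bool]] = [
    [True, True, True, True, True, False],
    [True, True, True, False, False, False],
    [True, False, True, True, False, False],
    [True, True, False, False, False, False],
    [True, True, True, True, True, True],
]

plain = [unweighted_score(p) for p in pupils]
weighted = [weighted_score(p, weights) for p in pupils]
r = pearson_r(plain, weighted)
print(f"unweighted: {plain}")
print(f"weighted:   {weighted}")
print(f"correlation between the two scorings: {r:.2f}")
```

With so few invented pupils the resulting coefficient proves nothing about real examinations; the sketch shows only the procedure. Charters dropped his weights when the corresponding correlation for his own tests came out slightly over 0.90.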
There is some doubt whether the difficulty of a question can furnish a criterion for determining the amount of credit which should be given for answering it correctly. Achievements are not valuable merely because they are difficult but because of the relation which they sustain to our educational objectives. Questions which are relatively unimportant may be very difficult. In fact, lack of instruction is one reason why a question is difficult. Therefore, topics which are assigned minor places in a course of study may furnish the basis for very difficult questions, while the topics which have been emphasized because of their importance may furnish the easier questions. It is not unlikely that the best basis for weighting questions is their social importance rather than their difficulty.

C. Inaccuracy of single examination. It has been urged that a single examination cannot yield an accurate measure of the achievements of students. It is human nature to be variable in one's performances. For certain reasons students whose achievements are ordinarily high may make low "grades" on a single examination. It must therefore be admitted that many of the "grades" yielded by a single examination are likely to involve serious errors. However, in considering the significance of these errors with reference to the use of written examinations, it is necessary to distinguish between examination "grades" and final "grades." The fact that the "grade" for a single examination may be inaccurate and, hence, that a final "grade" should not be determined by a single final examination is not sufficient reason for abolishing written examinations. It simply means that more than one written examination should be given or that other measures of achievement should be taken into consideration in determining final "grades."

[2] Monroe, Walter S. "A discussion of the performances of pupils on exercises of varying difficulty," School and Society, 15:341-43, March, 1922.
Teachers' estimates and daily grades should have a place. Any written tests which may be given during the term should also be given some weight. A final examination, however, deserves a place also. In some subjects it should be given more weight than in others.

4. Examinations force students to review and organize content of course. One of the criticisms made in Chapter I was that written examinations tended to emphasize memory and consequently to minimize the opportunities for the functioning of the higher mental processes. On the other hand, the friends of written examinations insist that they tend to produce exactly the opposite effect. They contend that final examinations have a positive value in that they tend to force students to review and to organize the content of a course. This is a very important phase of learning in many subjects. It is pointed out that the emphasis which examinations place upon memory is not undesirable. Students should learn many facts, and they should learn them with sufficient accuracy and in sufficient detail to answer most examination questions. The memorization of a considerable number of facts is to be commended. Even the intensive cramming which is deplored by many persons has educational value. If a student studies throughout the course, an intensive review and organization of the material is extremely valuable. In fact, the profit to the student is probably relatively greater for time expended in this work than for any other equal amount of time devoted to the subject. Unless carried to an extreme, cramming is undesirable only when it has not been preceded by thoughtful study. Even in this case it is better to have the student "cram" for the examination than to go through the course without engaging in any learning.
Incidentally, it should be noted that the existence of "cramming" is not the fault of the examinations but of the type of instruction given and of the policy of making the student's final grade depend wholly or very largely upon his examination mark. With proper instruction and with considerable weight given to his record throughout the course, it will not be possible for a student to loaf during the term and secure a creditable final "grade" by "cramming" immediately preceding the examination. The fact that some students are able to earn high marks merely by a brief period of intensive study immediately preceding the examination is more of a reflection upon the content of the course than upon examinations. Undoubtedly, there are some students who are sufficiently brilliant to learn within a period of a few days as much as the average student learns throughout the course; and, furthermore, it is probably true that in some of our courses relatively little is demanded of students.

It may be pointed out, also, that the extreme emphasis upon memory cited by some critics is not the fault of examinations in general but of particular types of questions. It is possible to ask questions which may be answered entirely from memory. It is also possible to ask questions which require much reasoning and organization of information.

The writing of an examination may itself be an important part of the student's learning. "There is no impression without expression." The writing of a three-hour examination is undoubtedly an intensive form of expression. In case the questions are such that reasoning and organization of information are required, the student may learn a great deal from the act of writing his answers to the questions. From this point of view, it may be urged that a student who is not required to take final examinations is deprived of an important opportunity for learning.
Not infrequently students who have been "excused from examinations" in high school report that they experience a distinct handicap when they become college students.

5. Examinations furnish effective motive. Although the written examination should not be defended merely for the reason that it furnishes an effective motive, it must be admitted that this motive makes a special appeal to many pupils. In so far as possible, the pupils should be motivated by the intrinsic values of the subject matter. However, these values are frequently remote from the life of the student, and it is necessary to resort to motives which make a more immediate appeal. This point of view is effectively set forth in the following quotation:

"Something in addition to merely appealing to the future motives seems necessary at times to urge along the average reader, some means whereby his future means are made comparatively more immediate. Not merely does the examination stimulate the reader generally, but it also helps to determine his specific interest and behavior. If the child feels himself compelled to give an account of what comes within the field of his perception, he will soon develop a habit of attending to that for which he will probably be held accountable. The examination, when properly applied, constantly reminds the reader that he is responsible for the mastery of what is proposed for his learning." [3]

[3] Myers, G. C. "Examinations and the learner," Educational Review, 54:274-84, June, 1917.

6. Proper use of examinations. As we have just pointed out, examinations may fulfill a very important function in the education of students. They may also be abused. In some schools final examinations are used as a punishment. Students who attain a fixed average in their daily grades and who have not fallen below a minimum in their deportment are excused from final examinations.
In such a case the examination is regarded by both teachers and students as undesirable or evil and naturally fails to fulfill its function. This practise of excusing some students from examinations is not a defensible one. The implication is that the examination is not an integral part of the school work, but merely a punishment or a last-resort measuring instrument. It is probably wiser to abolish final examinations than to follow the practise of excusing students upon the basis of high daily grades and satisfactory conduct. In considering the defense of the written examination it is necessary to bear in mind that the existence of such abuses of examinations should not be interpreted as criticisms of examinations, but rather of the teachers and other school officials who use them.

7. Examinations as objectives. Examinations have been criticised because they tend to become immediate educational objectives. This is not wholly undesirable. If the examination consists of appropriate questions, a pupil who studies to pass the examination will be directing his energies to appropriate ends. It is only when the examinations tend to become exclusive objectives or consist of inappropriate questions that this criticism has much force. Furthermore, it should be recognized that this criticism also is not a criticism of examinations as such, but rather of the use that is made of them by teachers and other school officials.

8. Effect of examinations upon health of students. While it is undoubtedly true that the preparation for examinations and the writing of them frequently make enormous drains on the energies of students, we have no carefully conducted investigation of the actual effect upon their health.
To one who observes the enormous expenditures of time and energy which students devote to social and athletic activities, it is difficult to believe that examinations are in general more injurious to the health of students than many other activities in which they are permitted and even encouraged to engage. It is undoubtedly true that in the case of some individual students examinations make a heavier drain upon their energies than they should be asked to bear. However, here again it should be realized that this criticism is not fundamentally a criticism of examinations but rather of setting very long examinations or of placing extreme emphasis upon them by making the final "grade" depend wholly or very largely upon the examination "grade."

9. Time devoted to examinations a profitable investment. After a teacher has instructed his students to the best of his ability, some will have failed to learn; some will have learned certain details but neglected others; and a few may have mastered all of the assignment. It becomes necessary at some time for each teacher to check up the work of his students, in order to know which students need supplementary or remedial instruction. Only by thus ascertaining the extent to which his students have achieved can the teacher make his instruction fit the needs of his class. For these reasons the time which is devoted to constructing an examination, giving it, and rating the papers should be considered a legitimate expenditure of the teacher's time; in fact, in some cases it is doubtless among the most profitable of the time expenditures which the instructor makes. [4] Indeed, the instructor may very profitably do more than simply rate the examination papers. In the case of students receiving low marks, their answers may be studied in order to ascertain in what ways and why they have failed. Such information will frequently be exceedingly illuminating to the instructor.
This argument applies more forcibly to examinations or "tests" given at intervals throughout the course than to final examinations, but even in the latter case the instructor will frequently receive many helpful suggestions for improving his teaching.

[4] This argument applies only to a very limited extent to examinations given at the end of a course.

CHAPTER III. METHODS OF IMPROVING EXAMINATIONS

1. Increasing objectivity in marking of examination papers. The principal source of error in the "grades" assigned to examination papers is the subjectivity of the marking. This fault has been sufficiently demonstrated. It is, however, possible to increase materially the objectivity of written examinations. This may be accomplished in two ways: (1) improving the examination questions, and (2) introducing improved methods of marking examination papers.

A. Increasing the accuracy of "grades" by improving examination questions.

(1.) General methods. The questions should be so stated that all pupils will interpret them in the same way. When a question is ambiguous the performances which different pupils give are not comparable. Pupils who are so unfortunate as to interpret a question in a way not intended by the instructor can receive no credit for their answers unless the instructor recognizes the various interpretations. Even in this case not all answers are comparable, because some are judged with reference to one interpretation and others with reference to another. A striking illustration of the difficulty of eliminating all ambiguous questions occurs in Form 7 of the Army Alpha Intelligence Test. One exercise asks the person taking the test to tell whether "cleave" and "split" mean the same or opposite. Both answers are correct because "cleave" has two meanings which are exactly opposite. Apparently this ambiguity was not detected although the test was prepared with care by well trained persons.
However, in most cases teachers will be able to avoid ambiguous exercises if the exercises are subjected to careful scrutiny. All questions should be so stated that they will be clearly understood.

Pupils should be given definite instructions concerning the methods of work to be employed. They should be told whether they are to work rapidly or slowly. In some cases it is advisable for the examiner to say at appropriate intervals, "It is now time to begin on question ___." Perhaps students should not be forced to comply with these directions, but such announcements will tend to prevent an improper distribution of time over the list of questions. In questions in which pupils are asked to "discuss" or "explain," the completeness of the discussion desired should be indicated. In arithmetic, pupils should be given directions concerning the form in which their work is to appear. If it is to be copied in a particular form, all should do this; if it may be left in the form used to calculate the answer, all pupils should leave it in this form.

From the standpoint of securing accurate measurements it is best to set an examination which is relatively hard. If more than 10 per cent of the class make perfect "grades," the examination is "too easy"; i.e., it is not difficult enough to yield accurate measures of the achievements of bright pupils. Unless the class is a selected group, it is not possible for more than 10 per cent of the best pupils to possess exactly the same degree of achievement.

The examination should be long enough so that all pupils will be employed for practically the entire period. The examination is not only useful as a device for measuring the improvements of pupils, but also valuable as a school exercise. No pupil should be permitted to waste any of the time set aside for the examination.

(2.) New examinations: use of questions permitting only one correct answer.
The marking of examination papers becomes subjective when the scorer is asked to exercise judgment in determining the credit to be given for the pupil's performance. In spelling, a pupil's performance is either right or wrong, and our practise is to allow no credit for a performance which is not entirely correct. Thus, the marking of an examination paper in spelling is highly objective because the scorer is not asked to exercise judgment concerning the quality of the performance. A high degree of objectivity may be attained also in the operations of arithmetic by agreeing to give no credit for examples partly correct. In other subject matter fields we are accustomed to ask some questions which call for specific facts and, hence, admit of only one correct answer. It has been claimed that such questions appeal only to the pupil's memory and that they do not yield an index of his acquaintance with principles and of his ability to organize and apply his knowledge. In order to reach this phase of his education we have asked the pupil to "discuss," "tell why," "compare," etc. When a pupil is asked to formulate an answer consisting of one or more sentences, it is difficult or impossible to classify the performance as either right or wrong. When scorers are asked to exercise judgment in evaluating such performances, wide differences of opinion exist.

In order to overcome this subjectivity of marking, it has recently been proposed that we can measure a pupil's acquaintance with principles and ideas by means of certain types of exercises which permit of only one correct answer. Such exercises have been used in our standardized educational tests, and it is now suggested that they be used by teachers in their examinations. Four types of such exercises will be considered.

a. True-false exercises. Instead of asking the pupil to formulate an answer in response to a question, we ask him to tell whether a given statement is true or false.
For example, instead of the question, "Why did the Puritans come to America in the 17th century?" we ask whether the following statement is true or false: "The Puritans came to America in the 17th century seeking wealth." The pupil may give his answer to this true-false exercise by writing a plus sign if he considers it true and a minus sign if he considers it false. The mental processes required in answering such exercises do not appear to be the same as those which occur in answering questions of the usual type. However, experimental evidence indicates that there is a high correlation between the scores which pupils make on a true-false examination and their acquaintance with ideas and principles as determined by our ordinary examinations.
Directions for constructing a true-false examination.¹
1. In constructing true-false exercises, one may prepare a list of statements which cover in some detail the portion of the subject on which the pupils are to be examined. After such a list has been prepared, some of the statements can easily be changed so that they are false. The untruth of a statement should not be too obvious, or it will be worthless for testing. Also, statements should be selected which require an acquaintance with the subject in order to determine their truth or falsity.
2. In a true-false examination some of the statements should be true and some false, and the number of true statements should approximate the number of false statements. They should be arranged so that true and false statements do not follow any regular sequence.
3. Since the pupil can give his responses very quickly, the examination should consist of not fewer than 50 statements. A true-false examination of 100 statements can be given in the time usually devoted to an ordinary examination.
4. The examination should be mimeographed or printed so that each pupil will have a copy.
He may give his answers in the margins of the sheets, or, if it is desired to use the same set of papers with another group of pupils, he may be given a sheet of paper on which there are numbered blanks. The pupils will then be asked to record in the blanks their answers to the corresponding exercises. A less desirable plan, which may be followed when it is not possible to secure mimeographed copies of the examination, is to read the statements to the pupils and have them record their answers in numbered blanks. The disadvantage of this plan is that the pupils do not have a satisfactory opportunity to study the statements. Also, the class may give some indication of the answer if a statement appeals to them as being ridiculous.
5. The pupils should be given specific directions in regard to answering exercises about which they are uncertain. One writer² has suggested that the pupils be instructed to guess when uncertain concerning the truth or falsity of the statement. Another writer³ who has used this type of examination instructed the pupils as follows: "First, go through the list quickly and mark all that you know for certain, then go back and study out the harder ones. Do not guess; the chances are against you on guessing. Don't endanger your score by gambling on those questions about which you know nothing." This second procedure is probably the better.
¹ Two sample true-false examinations are given in the appendix.
The scoring of a true-false examination. Since only two responses are possible, it is obvious that a pupil may give a correct response as the result of chance. A pupil who guesses blindly may expect to be right on about half of such exercises and wrong on the other half. In order to take this possibility into account, a pupil's score on an examination of this type is the number of exercises answered correctly minus the number answered incorrectly, so that, on the average, blind guessing neither raises nor lowers the score. Exercises not attempted are not counted.
b. "Yes" and "no" exercises. These exercises are just what their name implies. Each is answered correctly by "yes" or "no." No other answers are appropriate.
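The true-false scoring rule (rights minus wrongs, omissions not counted) can be sketched in a few lines of Python. The "+" and "-" response codes follow the text; representing an exercise not attempted as `None`, and the function names, are our own assumptions. The second function enumerates every possible pattern of blind guesses on a short key to show why the wrong answers are subtracted: averaged over all patterns, guessing adds nothing to the score.

```python
from fractions import Fraction
from itertools import product

def score_true_false(responses, key):
    """Rights minus wrongs; exercises not attempted (None) are not counted."""
    if len(responses) != len(key):
        raise ValueError("responses and key must be the same length")
    right = sum(1 for r, k in zip(responses, key) if r is not None and r == k)
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return right - wrong

def average_blind_guess_score(key):
    """Average score over every possible pattern of pure guesses on the key."""
    patterns = list(product("+-", repeat=len(key)))
    total = sum(score_true_false(list(p), key) for p in patterns)
    return Fraction(total, len(patterns))

key = ["+", "-", "-", "+", "+", "-"]
pupil = ["+", "-", "+", None, "+", "-"]
print(score_true_false(pupil, key))     # 4 right, 1 wrong, 1 omitted -> 3
print(average_blind_guess_score(key))   # 0
```

Using `Fraction` keeps the average exact, so the zero expected score of a pure guesser is shown without rounding.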
The administration and scoring are similar to those of the "true-false" exercises, of which they may be considered a special type. In constructing "yes" and "no" exercises, care must be taken to avoid questions for which either of the two possible answers is absurd. In such a case, obviously, a correct answer will not furnish satisfactory evidence of the pupil's achievement in a school subject. Approximately the same number of exercises should call for the answer "yes" as for "no," and they should be arranged in no regular order of sequence.
c. Recognition exercises. Exercises in which the pupil is asked to choose from a number of proposed answers have also been used to make the scoring objective. This type of question has been called the "recognition exercise." It has been used in standardized silent reading tests and in a number of our group intelligence tests, and may be illustrated by the following: "The first president of the United States was: Christopher Columbus, Benjamin Franklin, George Washington, Thomas Jefferson." The correct answer may be indicated by an underline or by some other mark. If only one of the answers proposed may be considered correct, the marking of such exercises will be highly objective.
² McCall, W. A. "A new kind of school examination," Journal of Educational Research.
³ Wood, B. D. "The measurement of college work," Educational Administration and Supervision, 7: 301-34, September, 1921.
Directions for constructing a recognition examination.⁴ In constructing this type of examination, none of the proposed answers should be too obviously incorrect. An exercise can yield an indication of a pupil's achievement only when he is forced to exercise judgment in determining which of the proposed answers is suitable.
For example, the illustrative exercise given above would be practically worthless for testing purposes if all the names, except that of George Washington, were of persons living today or of persons who had had no connection with our national life. In applying this type of exercise to the field of arithmetic, the proposed answers should include erroneous answers which pupils are inclined to give. For example, if the exercise called for the quotient of two fractions, one of the proposed answers should be the product of the fractions and another should be their sum, and perhaps another should be the fraction obtained by taking the sum of the numerators as a new numerator and the sum of the denominators as a new denominator. When the correct answer is included in a group of such answers as these, the pupil who does not know how to find the quotient of such fractions will be unable to determine the correct answer except as a matter of chance. On the other hand, if all of the answers except the correct one were integers or were so large that they were obviously incorrect, a bright pupil who knew nothing about division of fractions would be able to select the correct answer. The correct answer should not always be found in the same position; sometimes it should be first, sometimes last, and sometimes in an intermediate position.
As in the case of the true-false examination, a recognition examination should consist of a large number of exercises. Examinations of this type should be mimeographed or printed and a copy given to each pupil. He should be given definite instructions concerning the method of work to be followed. It is probably best to instruct him to work through the test rapidly, answering those exercises about which he is certain. He should then go back over the list and try the more difficult ones. Not fewer than four proposed answers should be given. When this is done, the chances are slight that a pupil will give the correct answer by guessing.
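The advice on proposed answers for a division-of-fractions exercise can be made concrete with Python's standard `fractions` module. This is a sketch under stated assumptions: the function names are hypothetical, the three wrong answers are the ones the text names (product, sum, and sum of numerators over sum of denominators), and the random seed is an arbitrary choice used only to vary the position of the correct answer reproducibly.

```python
import random
from fractions import Fraction

def division_distractors(a, b):
    """Correct quotient a / b plus the three erroneous answers named in the text."""
    correct = a / b
    wrong = {
        "product": a * b,
        "sum": a + b,
        # Sum of the numerators over the sum of the denominators.
        # Fraction keeps a and b in lowest terms, so reduced terms are used.
        "added terms": Fraction(a.numerator + b.numerator,
                                a.denominator + b.denominator),
    }
    return correct, wrong

def make_exercise(a, b, rng):
    """Return the proposed answers in shuffled order and the index of the correct one."""
    correct, wrong = division_distractors(a, b)
    options = [correct]
    for value in wrong.values():
        if value not in options:   # skip a distractor that coincides with another answer
            options.append(value)
    rng.shuffle(options)           # vary the position of the correct answer
    return options, options.index(correct)

rng = random.Random(1922)          # arbitrary fixed seed for reproducibility
options, answer_index = make_exercise(Fraction(2, 3), Fraction(3, 4), rng)
print([str(x) for x in options], "correct:", options[answer_index])
```

For 2/3 divided by 3/4 the proposed answers are 8/9 (correct), 1/2, 17/12 and 5/7: four answers, all proper-looking fractions, so a pupil who guesses blindly has only one chance in four on each exercise.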
The pupils may be instructed to guess if they do not know, since the chance of success by guessing is slight. The pupil's score on an examination of this type may be taken as the number of exercises done correctly.
⁴ A sample recognition examination is given in the appendix.
A somewhat unusual but interesting type of recognition exercise is that described as a "matching contest." In this a pupil is given two lists of statements, the first numbered 1, 2, 3, 4, 5, etc., the second marked A, B, C, D, E, etc. Each statement in the first list corresponds in meaning to a statement in the second list, and the pupil is to pair these statements, writing beside each number of the first list the letter of the corresponding statement of the second. For example, in the first exercise given below, beside the date numbered 5, 1898, we place the letter B to indicate the event for which that date is significant. It is difficult to construct such examinations so that they will require reasoning on the part of the student. Their most important use is in the elementary school for rapid drill in certain phases of some subjects, such as definitions in geography and grammar. The following exercises, selected from the Spokane United States History Test, illustrate the use of such an examination in linking a certain date or person with the corresponding event.
1. 1846      A. Lincoln's Emancipation Proclamation.
2. 1865      B. Spanish American War.
3. 1863      C. Beginning of World War.
4. 1917      D. Declaration of Independence.
5. 1898      E. United States entered World War.
6. 1789      F. Election of Washington as President.
7. 1792      G. War with Mexico began.
8. 1776      H. Invention of the cotton gin.
9. 1861      I. Lee's surrender at Appomattox.
10. 1914     J. Beginning of Civil War.

1. Foch        A. Destroyed Spanish fleet in Manila Bay.
2. Lincoln     B. Invented the telephone.
3. Fulton      C. Leading Confederate General.
4. Dewey       D. Wrote the Declaration of Independence.
5. Pershing    E. Invented the steamboat.
6. Bell        F. Commanded allied armies in the World War.
7. Edison      G. Was President during the Civil War.
8. Jefferson   H. Commanded American Forces in the World War.
9. Lee         I. Was Revolutionary patriot, author, and inventor.
10. Franklin   J. America's most famous inventor.
d. Completion exercises. Pupils may be asked to fill in the words omitted from statements. The following illustrates this type of exercise: "Fill in the blanks. Revenue for paying the war debts of the states after the Revolutionary War was provided by a ____ and by ____ due largely to ____ influence. The slavery question in such states as should be carved from the Louisiana Territory was temporarily settled in ____ by the ____."
Directions for constructing completion exercises.⁵ A completion exercise should be constructed so that no suggestion will be given of the correct words to be written in the blanks. Furthermore, the facts to be supplied should be important. The best plan is to prepare a list of important statements and principles covering the portion of the subject over which the pupils are to be examined and then from these statements to strike out a certain significant word or phrase. In every case, if it is possible, the words omitted should be such that only one answer will be correct. Since little writing is required of the pupils, they may be asked to fill in as many as one hundred blanks.
The scoring of completion exercises is not as highly objective as that of the two types mentioned above. Pupils will tend to write a variety of words in the blanks. Different words may have almost the same meaning, and frequently the scorer will be compelled to determine whether the meaning of some word is sufficiently near to that of the correct answer to justify giving the pupil credit for having answered the exercise correctly. However, by a careful selection of statements and of the omitted words, this subjectivity may be greatly minimized.
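Counting answers that agree with a key is how the text scores recognition exercises ("the number of exercises done correctly") and completion exercises ("the number of blanks filled in correctly"). The Python sketch below applies one such count to both. It rests on assumptions that are ours, not the source's: the set of acceptable words for each blank is fixed before any paper is scored, capitalization and surrounding spaces are ignored, and the matching exercise, whose scoring the text does not state, is scored the same way. The function name and data layout are hypothetical.

```python
def score_against_key(responses, key):
    """Number of exercises answered correctly.

    key: maps each exercise to the set of answers accepted for it, fixed
         before any paper is scored.
    responses: maps each exercise to the pupil's answer; a missing or
         blank entry earns no credit.
    """
    score = 0
    for item, accepted in key.items():
        answer = (responses.get(item) or "").strip().lower()
        if answer and answer in {a.lower() for a in accepted}:
            score += 1
    return score

# Matching exercise (dates): 5 -> B is given in the text; 1 -> G and 8 -> D
# are our own reading of the list.
matching_key = {5: {"B"}, 1: {"G"}, 8: {"D"}}
print(score_against_key({5: "B", 1: "G", 8: "F"}, matching_key))   # 2

# Completion exercise with a single acceptable word for the blank.
completion_key = {"first_continental_congress": {"Philadelphia"}}
print(score_against_key({"first_continental_congress": " philadelphia "}, completion_key))   # 1
```

Where a blank really does admit near-synonyms, they would have to be added to its set in advance; the count itself then involves no further judgment by the scorer.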
For example, in the sentence, "The first Continental Congress was held in ____," only one possible word can be correct. In using completion exercises it is necessary to provide each pupil with a mimeographed or printed copy of the examination. The pupil's score is the number of blanks filled in correctly.
⁵ A sample completion examination is given in the appendix.
e. Other advantages of the "new examination." Examinations in which true-false questions and the other types of exercises described on pages 29-34 are used have been called "new examinations." Such examinations have certain advantages in addition to increasing the objectivity of the marking of the papers. There will be a large saving of time both for pupil and teacher. The pupil is called upon to do little or no writing in giving his answers, and he is, therefore, able to respond to a large number of exercises. The scorer will have little or no occasion to exercise judgment, as he will need only to note the brief responses given by the pupils. Thus, the labor of scoring will be greatly reduced. The saving of time in the giving and scoring will more than offset any additional time that may be expended in the construction of the "new examination."
Another advantage is that the examination can be made more comprehensive. It is traditional for examinations to consist of ten questions. A few are limited to a smaller number, and only occasionally do we find examinations consisting of more than ten questions. The pupils cannot write upon a large number of questions in the time allowed. Thus, the scope of traditional examinations is necessarily narrow. We have stated that true-false examinations should include not fewer than 50 exercises. Examinations consisting of completion exercises or recognition exercises should have a corresponding length. Thus, the "new examination" may be made distinctly larger in scope.
Children are apt, also, to be interested in the new examinations, which are distinctly different from the usual type of school exercise and which make an appeal somewhat in the nature of a game. Being relieved of much writing, which, especially in the case of young children, amounts in itself to a laborious exercise, they are probably less fatigued and are able to devote all their energies to the process of thinking.
f. Limitations of the "new examination." It does not appear likely that the "new examination," consisting of the types of exercises we have described, should entirely replace the traditional type of examination. The "new examination" cannot be used in mathematics, except to a limited extent. It cannot be used at all in English composition. The following questions, taken from Hahn's Scale for Measuring the Ability of Children in History, appear to require mental processes distinctly different from those for which the "new examination" calls:
"State points of similarity between the position of the United States in 1812 and her position in 1912."
"Arrange the following events in order of cause and effect: Force Bill, the Carpetbaggers, 15th Amendment, Negro Rule in some of the Southern States, Ku Klux Klan."
"Name the presidents of the United States since 1892."
Furthermore, it is likely that pupils would miss valuable experience and training if they were not asked at times to "compare," "explain," "discuss," "define," or "tell why." This is also true of questions in which they are asked to summarize material presented on a topic or to apply certain principles that have been presented. Hence, it is difficult to conceive of the "new examination" being a complete substitute for the traditional examination.⁶
B. Rules for marking examination papers. The marking of ordinary examination papers may be made more objective by following a few simple rules. It is often advisable to use the "sorting method."
According to this procedure the papers, as they are read, are sorted into piles, the best ones being placed in the first pile, the next best in the second, etc. Five piles will, in most cases, prove sufficient, but if, as the reading goes on, papers are found which do not appear to belong in any of the piles, additional ones may be started. After all of the papers have been distributed they should be re-read, taking one pile at a time, and the papers in it compared with each other. If these papers do not possess approximately the same value, changes in the sorting may be made. After the examiner has satisfied himself with reference to the sorting, "grades" may be assigned to the papers in the different piles.
Instead of actually sorting the papers, they may be marked with a symbol to indicate the distribution, i.e., the best papers may be marked 5, the next in quality 4, and so on. However, when this is done, the papers on which the same symbol has been placed should be brought together and re-read.
When a large number of papers are to be rated, it is well to consider one question at a time. In doing this it is desirable to follow essentially the sorting method, or, at any rate, to assign marks to the answers as read. All of the papers should be read for the first question and a mark assigned to each answer. A mark of 5 seems suitable for approximately perfect answers, 3 for average answers, and 1 for very poor answers. The marks of 2 and 4, and even fractional numbers, may be used as appropriate descriptions of quality. After the marks on this first question have been assigned, those papers which have been given the same marks may be brought together and re-read. The same procedure should be followed in reading each of the other questions. It is well to shuffle the papers in proceeding from one question to another so that the reader will not be influenced by previous marks.
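The bookkeeping of the question-at-a-time procedure (shuffle between questions, group answers by mark for re-reading, then total each paper's marks for a final distribution) can be sketched in Python as below. The reader's judgment cannot be coded, so the hypothetical `rate` callable stands in for it; the 1-to-5 scale follows the text, fractional marks are accepted, and the fixed seed is an arbitrary assumption. The re-reading of each group is left to the human reader, which is why the groups are returned.

```python
import random
from collections import defaultdict

def mark_by_question(papers, questions, rate, seed=0):
    """Mark one question at a time; return totals and, per question, papers grouped by mark."""
    rng = random.Random(seed)
    order = list(papers)
    totals = {p: 0 for p in papers}
    piles = {}
    for q in questions:
        # Shuffle so the reader is not influenced by marks on the previous question.
        rng.shuffle(order)
        by_mark = defaultdict(list)
        for p in order:
            mark = rate(p, q)              # 5 about perfect, 3 average, 1 very poor
            if not 1 <= mark <= 5:
                raise ValueError(f"mark {mark} outside the 1-5 scale")
            by_mark[mark].append(p)        # papers with equal marks, to be re-read together
            totals[p] += mark
        piles[q] = dict(by_mark)
    return totals, piles

marks = {("p1", "q1"): 5, ("p1", "q2"): 4,
         ("p2", "q1"): 3, ("p2", "q2"): 3,
         ("p3", "q1"): 1, ("p3", "q2"): 2}
totals, piles = mark_by_question(["p1", "p2", "p3"], ["q1", "q2"],
                                 lambda p, q: marks[(p, q)])
ranking = sorted(totals, key=totals.get, reverse=True)   # basis for the final distribution
print(totals)    # {'p1': 9, 'p2': 6, 'p3': 3}
print(ranking)   # ['p1', 'p2', 'p3']
```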
When the questions have all been read, the marks assigned to each paper may be totaled, and on this basis a final distribution of the papers made. Although this method may seem to require an almost prohibitive amount of time on the part of the scorer, the increased objectivity of the "grades" assigned may justify its adoption. It is especially advised in those cases where emphasis needs to be placed on the accuracy of examination grades.
⁶ In the appendix, questions selected from actual examinations given in Illinois schools are listed. These questions, in their demand for explanation, discussion, organization of material, exercise of judgment, etc., show that these traditional examinations call for a distinct type of ability not required by any form of the "new examination."
The marking of examination papers within a school system may be made more objective by formulating rules with reference to the credit to be given for certain types of answers. For example, uniform rules may be agreed upon with reference to the credit given for a correct principle when the answer is wrong, the credit given for answers partially right, and the penalty, if any, for misspellings, grammatical errors, and poor handwriting. In case the same examination is given by two or more teachers, some specific rules may be formulated for the particular examination. If time permits, the objectivity of the marks may be increased by having the papers rated independently by two or more teachers and using the average of the marks assigned. A modification of this procedure is to have one teacher rate all of the answers to certain questions and another teacher the answers to other questions.
F. J. Kelly⁷ describes an experiment which is indicative of the increase in the objectivity of the marking of examination papers when uniform rules are adopted. Six fifth-grade teachers gave the same examination in arithmetic to their pupils.
Each teacher marked the papers for her own pupils but did not record the marks on the papers. The superintendent then asked a teacher, who was unusually systematic in marking examination papers, to prepare a set of rules to be followed in the marking of these papers. After she had done so, she marked all of the papers in accordance with this plan. Then the teachers who had first marked the papers marked them a second time following her plan. This provided two marks for each paper given by the classroom teacher, the first without following any systematic plan and the second given in accordance with the rules formulated. Each of these marks was compared with the mark given by the one teacher who marked all of the papers.
In Table I, the six teachers are designated by the letters A, B, C, D, E, and F. The table is read as follows: When no rules were followed, teacher A marked one paper 16 to 20 points lower than the "judge," one paper 7 points lower, two papers 4 points lower, two papers 2 points lower, agreed with the "judge" on one paper, etc. The differences between the marks given when the classroom teachers followed no rules and when they followed the rules formulated are very striking. In the first instance the marks assigned by the teachers agreed with those assigned by the "judge" in only 5.5 percent of the cases, while in the second instance they agreed in 63.5 percent of the cases. This indicates a very marked increase in the objectivity of the marking of the papers.
⁷ Kelly, F. J. "Teachers' marks," Teachers College, Columbia University, Contributions to Education, No. 66, p. 83, 1914.
Table I. Distributions of Differences between Two Sets of Teachers' Marks on Fifth-Grade Arithmetic Papers: First, without any Effort to Unify the Methods Used, and Second, by a Common Standard (after Kelly)
[Table body not recoverable from the scan. Columns: range of differences from the judge's mark; teachers A to F and total, without standard and with standard. Papers marked: A 35, B 41, C 35, D 36, E 39, F 33; total 219.]
2. Increasing the objectivity of the norms for translating examination scores into school marks. The use of examination scores, when expressed in terms of percents, as school marks introduces errors which are as serious as the errors arising from the subjectivity of the marking of examination papers (see page 9). For standardized educational tests we have objective norms with which to compare the scores. Since a single examination is given to only relatively few pupils, it is not possible to secure objective norms by the procedure followed in standardizing educational tests. It is, however, possible to reduce greatly the errors in school marks arising from this source. The greatest assistance in this connection will be secured by recognizing explicitly that the scores assigned to examination papers and school marks are different. Occasionally they may coincide numerically, but they are fundamentally different. When this distinction is recognized, it will be possible for a teacher to introduce objective elements into the norms that are used as a basis for translating examination scores into school marks.
A teacher should first determine whether or not his class is typical. The giving of a general intelligence test will be helpful in this connection.
A distribution of the pupils' I.Q.'s may be considered a very reliable index of the composition of the group. If the median I.Q. of a class is below 100, the teacher may know that he has poor pupil material. If the median I.Q. is above 100, he may know that the class consists of better pupils than the average. If there is a relatively large number of low I.Q.'s, it may be expected that there will be an unusually large number of low "grades." Thus, by means of the intelligence quotient and in other ways, the teacher may come to know the general status of his class.
The method to be followed in translating the examination scores into school marks may be illustrated as follows: Suppose that a set of examination papers has been rated in terms of percents. The score placed upon a paper simply describes the percent of the examination which, in the judgment of the scorer, the pupil has answered correctly. The scores should then be assembled in some such way as that indicated below:
20-29: 25, 27
30-39: 32, 34, 35
40-49: 40, 42, 44, 47
50-59: 50, 52, 55, 55, 56, 58
60-69: 60, 63, 68, 69, 69
70-79: 70, 74, 75
There are 23 pupils in this class. The scores have been arranged in six groups. It would be better to make five groups if the school uses a system of five marks. The median score is 55. If the class is an average one, this median score of 55 should be translated into the median or average "grade" which the school recognizes. If the "grades" are reported in terms of percents and the passing mark is 75, the average grade will usually be approximately 85. If the class is known to possess superior ability, the median score of 55 should be translated into a higher "grade." On the other hand, if the class is known to be decidedly below average in ability, 55 should be translated into a lower grade, perhaps as low as 77 or 78. In an extreme case it might be translated even into the passing grade of 75. The translation of the median score into a grade furnishes a basis for translating the other scores.
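The arrangement of the scores and their median can be checked with a short Python sketch using the standard `statistics` module. The 23 scores are those of the example; grouping them by ten-point intervals (20-29 through 70-79) is our reading of the six groups. The translation of the median into a grade depends on the teacher's knowledge of the class, as described above, so it is deliberately not coded.

```python
from collections import defaultdict
from statistics import median

# The 23 scores (percents) from the example.
scores = [58, 56, 69, 47, 55, 69, 35, 44, 55, 68, 75, 27,
          34, 42, 52, 63, 74, 25, 32, 40, 50, 60, 70]

def group_by_tens(values):
    """Arrange scores in ten-point groups: 20-29, 30-39, and so on."""
    groups = defaultdict(list)
    for v in values:
        groups[(v // 10) * 10].append(v)
    return {f"{low}-{low + 9}": sorted(vs) for low, vs in sorted(groups.items())}

groups = group_by_tens(scores)
mid = median(scores)           # middle (12th) score of the 23 when ranked

for label, members in groups.items():
    print(label, members)
print("pupils:", len(scores), "median:", mid)   # pupils: 23 median: 55
```

With an odd number of pupils the median is simply the middle score, so no averaging of two scores is needed here.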
In general, the scores at the lower end of the distribution will be translated into "grades" below passing. The percent of pupils who receive such "grades" will vary widely with different classes. There is a somewhat prevalent opinion that the normal probability curve fixes the percent of pupils who should receive "grades" below the passing mark. This is a mistaken notion. The normal probability curve tells us nothing concerning the percent of pupils who should receive any "grade." It is true that from a statistical point of view there are certain divisions of the curve which are convenient. If the base line of the curve is limited to a length equal to five times the standard deviation (5σ) and this distance is divided into intervals of l