 
 UNIVERSITY OF ILLINOIS BULLETIN 
 
 Issued Weekly 
 Vol. XIX January 30, 1922 No. 22 
 
 [Entered as second-class matter December 11, 1912, at the post office at Urbana, Illinois, under the
 Act of August 24, 1912. Acceptance for mailing at the special rate of postage provided for in
 section 1103, Act of October 3, 1917, authorized July 31, 1918.]
 
 BULLETIN NO. 8 
 
 BUREAU OF EDUCATIONAL RESEARCH 
 COLLEGE OF EDUCATION 
 
 A CRITICAL STUDY OF CERTAIN 
 SILENT READING TESTS 
 
 By 
 Walter S. Monroe, Director 
 
 Price 50 Cents 
 
 PUBLISHED BY THE UNIVERSITY OF ILLINOIS 
 URBANA 
 
 
TABLE OF CONTENTS 
 
 PAGE 
 
 Preface 4 
 
 The Measurement of Silent Reading Ability 5 
 
 The Problem 5 
 
 The Data Collected 6 
 
 The Performances Required of a Pupil 9 
 
 Description of Pupils' Performances 13 
 
 Scoring Reproductions 15 
 
 The Idea-Counting Method 16 
 
 Brown's Method of Idea Counting 17 
 
 The Word-Counting Method 19 
 
 Subjectivity of Describing Reproductions 20 
 
 Constant Errors and Variable Errors 20 
 
 Summary for Describing Reproductions 25 
 
 Scoring Answers to Questions 25 
 
 Describing the Quality of Compositions 25 
 
 Time Required for Scoring Test Papers 25 
 
 Average Scores and Standard Deviations 26 
 
 Equivalence of Duplicate Forms 29 
 
 Relation of Vocabulary to Difficulty 30 
 
 Formation of Composite Scores 31 
 
 Reliability 32 
 
 Methods of Determining Reliability 32 
 
 Probable Error of r Due to Sampling 33 
 
 Reliability of Tests Studied 34 
 
 Discrimination 37 
 
 Comparison with Teachers' Ratings 39 
 
 Correlation of Comprehension with Memory 39 
 
 Corrected Coefficients of Correlation 41 
 
 Correlation of Comprehension with Vocabulary 42 
 
 Correlation of Cancellation Scores with Measures of Rate of Reading 44
 
 Correlation of Comprehension with Written Composition 44 
 
 Inter-correlation between Tests 46 
 
 Correlation of Single Tests with Composites 50 
 
 Summary of Conclusions 51 
 
 Correlation with Composites 52 
 
PREFACE 
 
 In the field of silent reading, as well as in the fields of other 
 school subjects, the number of available educational tests has been 
 increased so that one desiring to use a test is confronted with the 
 necessity of making a choice. If such a choice is to be made intelligently,
 it is necessary to have at hand experimental data with reference
 to the reliability and validity of the tests considered. The study
 which is reported in this monograph was undertaken for the purpose 
 of securing such data with reference to certain silent reading tests. 
 The report is presented in hopes that users of silent reading tests 
 will find the information that it contains helpful in making an intel- 
 ligent selection of educational tests in this field. The monograph 
 will doubtless also be of interest to students in the field of educa- 
 tional measurements. 
 
 Walter S. Monroe, 
 
 Director, Bureau of Educational Research. 
 
A CRITICAL STUDY OF CERTAIN SILENT 
 READING TESTS 
 
 The measurement of silent reading ability. The scores yielded 
 by silent reading tests may fail to be true measures of silent reading 
 ability for two reasons. First, the scores may not be reliable or ac- 
 curate. A score is lacking in reliability when two applications of a 
 test or of duplicate forms of it do not yield approximately the same 
 score when administered to the same pupils, as far as possible, 
 under the same conditions. Included in this is any lack of objectivity 
 in the scoring of the test. Second, the performance which a pupil 
 gives on a silent reading test may depend upon other factors in such 
 a way that it is an index of these factors rather than of silent read- 
 ing ability. For example, when a pupil answers questions from 
 memory his answers may be influenced to such an extent by his 
 ability to remember that his performance is not a truthful index of 
 his ability to read silently. 
 
 Two aspects of the activity of silent reading may be recognized. 
 First, the reading mechanism consists of perception, eye-movement 
 habits, etc. The rate of silent reading is largely dependent upon this 
 mechanism and hence any measure of rate is an index or symptom 
 of the quality of the mechanism. Second, the thought-getting or 
 comprehension aspect of silent reading involves the higher mental 
 processes. The quality of this is indicated by the comprehension 
 scores. Comprehension is not entirely independent of the mechan- 
 ism of silent reading, but, if sufficient time is allowed, pupils who 
 possess poor reading mechanism may stand high in thought-getting. 

 The problem. The problem of this study is to ascertain the
 reliability and, so far as possible, the function and validity of certain 
 silent reading tests. These tests, as will be shown later, differ in 
 the performances which are required of the pupils. They also differ 
 in other respects. Their titles suggest that all of the silent reading 
 tests included in this study are designed to measure silent reading 
 ability. The fact that they differ widely in certain respects suggests 
 the possibility that no two of them measure the same type of read- 
 ing ability, or at least that they do this with different degrees of 
 validity. The study has been restricted to tests which yield some 
 measure of the rate of reading as well as a measure of comprehension
 in order that the measurement of both phases of silent reading
 activity might be studied. With one exception, the tests which have
 been used have duplicate forms. In addition to the silent reading
 tests, certain other tests were given to the same pupils, because it was
 thought that the scores yielded by them might assist in the analysis 
 and interpretation of the scores yielded by the silent reading tests. 
 
 The data collected. Through the courtesy of Superintendent W. 
 W. Earnest and certain teachers of the Champaign Public Schools, 
 the tests chosen for this study were given in the spring of 1920 to 
 a number of pupils in the fourth and seventh grades. All of the 
 tests were administered by Miss Dora Keen, at that time a research 
 assistant in the Bureau of Educational Research. Care was exer- 
 cised to secure as nearly uniform testing conditions as can be ob- 
 tained in the ordinary schoolroom. The lapse of time between the 
 giving of the different forms of the same test was made as nearly 
 equal as possible for the different groups. Only in rare instances 
 were tests given after recess in the afternoon or during the afternoon 
 session on Friday. The tests were given to all pupils in four rooms 
 in both the fourth and seventh grades. The total number of pupils 
 tested in each grade was approximately 140. The study is, however, 
 based upon the records of only those pupils who took all of the tests. 
 The number of complete records in the fourth grade is 80 and in the 
 seventh grade, 91. 
 
 The following tests were given in the fourth grade: 
 
 1. The Courtis Silent Reading Test No. 2,¹ Form 1, "The
 Kitten Who Played May Queen," and Form 3, "The Kitten Who
 Caught a Fish."

 2. Brown's Silent Reading Test,² Form 1, "The Long Slide,"
 and Form 2, "A Morning Adventure."

 3. Monroe's Standardized Silent Reading Test I,³ Forms
 1, 2, and 3.
 
 ¹Courtis Silent Reading Test No. 2. Forty-sixth Annual Report. Kansas City,
 Missouri: Board of Education, 1917. pp. 79-85.

 ²Brown, H. A. "The Measurement of Ability to Read." A Manual of Directions
 Concerning Giving and Scoring of Reading Tests, Statistical Treatment of
 the Data and Diagnosis of School Class and Individual Needs. Concord: New
 Hampshire Department of Public Instruction (in cooperation with the General
 Education Board). Bureau of Research Bulletin No. 1, Second Edition, 1916.
 Pp. 57.

 ³Monroe, W. S. "Monroe's Standardized Silent Reading Tests." Journal of
 Educational Psychology, 9:303-12, June, 1918.
 
 4. Fordyce's Scale for Measuring Achievement in Reading,⁴
 Test No. 1, "Narcissus."

 5. Experimental Reproduction Test I, Form 1, based on pages
 84 and 85 of the supplementary reader, "The Strike at Shane's,"⁵
 and Form 2, based on pages 6 and 7 of the same publication. The
 passage for Form 1 contains 370 words and that for Form 2, 395
 words. In administering these tests the pupils read from the supplementary
 reader. The exact place of beginning had been marked
 in each copy. Also the end of the passage to be read was indicated.
 
 6. Cross-Out Silent Reading Test I, Form 1 and Form 2. This
 is an experimental silent reading test. In a passage of rather simple
 reading material, certain words were replaced by words which did not agree with
 the meaning of the preceding words in the sentence. A pupil is asked
 to cross out the words which do not fit. With the exception of the
 substituted words, the selection is a connected story.
 
 7. Vocabulary Test. The words of this test are those used by
 Terman and Childs. The form of the test is that proposed by
 Whipple.⁶

 8. Cancellation Test, "a-t" and "e-r."⁷

 9. Memory, "How Mr. Lincoln Helped the Pig."⁸

 The following tests were given in the seventh grade:
 
 1. Starch's Silent Reading Test No. 6 and Test No. 7.⁹

 2. Monroe's Standardized Silent Reading Test II, Forms 1,
 2, and 3.
 
 3. Fordyce's Scale for Measuring Achievement in Reading, 
 Test No. 2, "Spirit of Spring." 
 
 ⁴Fordyce, Charles. "A Scale for Measuring the Achievements in Reading."
 The University Publishing Company, Lincoln, Nebraska, and Chicago. 1916.

 ⁵"The Strike at Shane's." (Gold Mine Series, No. 2.) Boston: American
 Humane Education Society, 1908. Pp. 91.

 (A supplementary reader for the fourth grade which has as its lesson kindness
 to domestic animals.)

 ⁶Whipple, G. M. Manual of Mental and Physical Tests, Complex Processes,
 Chapter 12. Baltimore: Warwick and York, 1914.

 ⁷This test is described by Whipple in the Manual of Mental and Physical Tests,
 Simpler Processes, p. 311.

 ⁸Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
 pp. 207-10.

 ⁹Starch, Daniel. The Measurement of Efficiency in Reading. Journal of Educational
 Psychology, 4:1-24, 1915. These tests were used as duplicate forms.
 
 4. Experimental Reproduction Test II, Form 1, based on
 pages 6 and 7 of the supplementary reader, "Old English Heroes,"¹⁰
 and Form 2, based upon pages 8 and 9 of the same publication.
 The passage for Form 1 contains 662 words, and that for Form 2,
 611 words.

 5. Cross-Out Silent Reading Test II, Form 1 and Form 2.
 This test is similar to the Cross-Out Silent Reading Test used in the
 fourth grade but is based upon more difficult material.

 6. Pressey Silent Reading Test for Grades VI, VII, and VIII,
 Form 1 and Form 2. This is an experimental test.
 
 7. Vocabulary Test. This is the same test as that used in the 
 fourth grade. 
 
 8. Cancellation Test, "a-t" and "e-r." This is also the same 
 as that used in the fourth grade. 
 
 9. Memory Test, "Marble Statue."¹¹

 10. Composition Test. The Willing Composition Scale¹² and
 the directions which accompany it were used.
 
 In addition to the above tests a rating for ability in silent read- 
 ing was secured from the teachers. To guide them in making this 
 rating, the teachers were given the following directions: 
 
 Think of all the fourth (seventh) grade pupils with whose silent reading 
 ability you have ever become acquainted from the best to the poorest. Compare 
 each child in your present class with this distribution of pupils. Give a pupil a 
 rating of 5 if he has very superior ability in silent reading equalled only by about
 seven out of every hundred, or 7 percent of fourth (seventh) grade pupils. Give 
 him a rating of 4 if he has superior ability or ability above the average, yet is ex- 
 celled by the very superior group. About 24 out of every hundred, or 24 percent 
 of fourth (seventh) grade pupils, will fall in the superior group. Give him a rating 
 of 3 if he possesses average ability, i. e., ability which lies somewhere close to the 
 middle of the difference between the very best pupil and the very poorest. About 
 38 out of every hundred, or 38 percent of fourth (seventh) grade pupils, will fall 
 in this average group. If the pupil is below the average in ability to read and yet
 does not equal the poorest you have ever known give him a rating of 2. This group
 is called inferior and will contain about 24 out of every hundred, or 24 percent of
 fourth (seventh) grade pupils. Give the pupil a rating of 1 if he is very inferior
 in ability to read so that he is as poor or very nearly as poor as the poorest pupil
 you have ever known. About 7 out of every hundred, or 7 percent of fourth
 (seventh) grade pupils, will fall in this very inferior group.

 ¹⁰Bush, Bertha E. Old English Heroes. (Instructor Literature Series, No.
 116.) Danville, N. Y., and Chicago: F. A. Owen Publishing Co., and Hall and
 McCreary, 1909. Pp. 31.

 This is a supplementary reader suitable for the upper elementary grades. It
 contains brief sketches of the lives of Alfred the Great, Richard the Lion-Hearted,
 and the Black Prince.

 ¹¹Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
 pp. 107-10.

 ¹²Willing, M. H. Measurement of Written Composition in Grades IV to VIII,
 English Journal, 7:193-202, March, 1918.
 
 The above directions do not mean that you will necessarily be obliged to give 
 7 percent of your class a rating of 5; 24 percent, a rating of 4; 38 percent, a rating
 of 3; 24 percent, a rating of 2; and 7 percent, a rating of 1. They do mean, however,
 that a large number of pupils, a number running up into the hundreds, can be 
 divided in exactly this manner, i. e., 7 percent, very superior; 24 percent, superior; 
 38 percent, average; 24 percent, inferior; and 7 percent, very inferior. You are to 
 think of all the pupils you have ever known from the best to the poorest and by 
 comparison give each pupil in your present class the rating he would receive if 
 he were included with all the pupils you have known and the entire number should 
 be rated in the above manner. 
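
 The directions do not state how the five percentages were obtained. One reading, offered here only as an assumption, is that they are the shares of a normal distribution cut at 0.5 and 1.5 standard deviations on either side of the mean. A minimal sketch in Python checks the arithmetic; the cut points and the function name are illustrative and are not taken from the study.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed cut points, in standard deviations from the mean.
cuts = [-1.5, -0.5, 0.5, 1.5]

# Cumulative probabilities at the edges of the five rating groups.
edges = [0.0] + [normal_cdf(z) for z in cuts] + [1.0]

# Share of pupils in each group (ratings 1 through 5), in whole percents.
shares = [round(100 * (hi - lo)) for lo, hi in zip(edges, edges[1:])]
print(shares)
```

 Rounded to whole percents, the five shares come to 7, 24, 38, 24, and 7, which agrees with the figures given to the teachers.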
 
 The performances required of a pupil. All of the silent reading 
 tests in the above list are designed to measure the ability to read
 silently. However, they require a variety of performances from the
 pupil. In the Courtis Silent Reading Test No. 2, the pupil is re- 
 quired to read a continuous selection for three minutes. At the end 
 of this time he turns to another section of the test and answers ques- 
 tions based upon the selection he has just read. The questions are 
 to be answered by either "yes" or "no." The selection read is re- 
 peated in connection with the questions so that the pupil may refer 
 to it in case he does not remember the answer to any question. The 
 Brown Silent Reading Test and the Starch Silent Reading Tests 
 require the pupil to read a selection and then reproduce what he can 
 remember. Starch allows thirty seconds reading time, while Brown 
 allows one minute. The Monroe Standardized Silent Reading Tests 
 consist of a series of exercises. Each exercise consists of one para- 
 graph and a question based on it. Most of the answers are to be 
 given by drawing a line under a word. Five minutes are allowed 
 for the test. The Fordyce Scale for Measuring Achievement in
 Silent Reading¹³ requires the pupil to read a selection and then answer
 from memory questions based on it. The selection for Test 1
 contains 300 words. The time allowance is 125 seconds. The selection
 for Test 2 contains 512 words with a time allowance of 140
 seconds. The time allowed for the reading is intended to be such
 that 50 percent of the pupils will finish before time is called.

 ¹³This test has only one form. Test 1 was given in the fourth grade and Test
 2 in the seventh grade.

 The directions which accompany the Fordyce Scale for Measuring
 Achievement in Silent Reading are stated in general terms.
 For this reason it was necessary to formulate the exact explanation
 to be given to the pupils. The following was used:
 
 Do not turn over your paper until I tell you to begin. These papers have 
 a story on them. You are to read the story at your ordinary rate of reading, care- 
 fully enough so that you will be able to reproduce the leading thoughts. When 
 I say "mark," draw a line around the word at which you are looking at that time. 
 If you have not finished go right on reading until you come to the end of the 
 story. Then immediately turn your paper face down and sit quietly until all have 
 finished. You are to read the story once and once only, and just as soon as you 
 have finished, turn your paper down. Is there any one who does not understand 
 exactly what to do? All right! Begin!
 
 In the Experimental Reproduction Test the following directions 
 were used: 
 
 Do not open your books until I tell you to begin. Write your name and school 
 on the card.¹⁴
 
 This is a test to find out how rapidly and how well you can read. 
 Read carefully; for you will be asked to write out what you have read. Put your 
 finger in the book this way (illustrating). When I say "begin" open your books 
 and begin to read at the first blue mark here (illustrating). When I say, "mark," 
 draw a line around the word at which you are looking, (illustrate), then go right 
 on reading until you come to the last blue mark. Then close your book and sit 
 quietly until all have finished. Read over only once. Do not forget to draw a 
 line around the word where you are reading when I say, "mark." Is there anyone 
 who does not understand just what he is to do? All right! Begin! 
 
 The time allowance was thirty seconds. After they had com- 
 pleted the reading, the pupils were asked to write, in as nearly the 
 same words as possible, all that they had read. This reproduction 
 completed, they were asked to answer a list of questions based upon 
 the selection read. They were not given an opportunity to consult 
 the reproduction nor to add to it after answering the questions. 
 
 The nature of the Cross-Out Silent Reading Test is illustrated 
 by the directions given to the seventh grade pupils: 
 
 Below you will find a paragraph of a story. Certain words in this paragraph do 
 not belong there, that is, they do not make sense and do not agree with what has 
 gone before. Read this paragraph carefully and draw a line through all the words 
 which do not belong there. Do not write anything. Do nothing except cross out 
 the words which do not make sense with what has gone before. Is there anyone 
 who does not understand what he is to do? Remember to cross out only the words 
 which do not agree with what has gone before. All right! Go ahead!

 ¹⁴A 3x5 card was fastened to the copy of the supplementary reader which was
 given to each pupil. Before the books were distributed to another class the rate 
 scores were recorded on the cards and a new card attached. 
 
 "It happened in our country long ago, in those old days when only a few
 white people lived here and everything was rough and civilized. Strong men were at 
 work among the hills, cutting down the brooks and planting corn in the new 
 fields, and towns were springing up all along the walls, but still there were many 
 miles of forest where Indians hunted and bears and wolves had their palaces." 
 
 In this paragraph the words to be crossed out are "civilized",
 "brooks", "walls", and "palaces". These answers were read to the
 pupils after they had marked the paragraph. In case any failed to
 understand the nature of the exercise it was explained to them. They
 were then directed as follows:
 
 In the following pages you will find part of a story. It is not a fairy story. 
 In this story, as in the paragraph above, there are words which do not agree with
 the meaning of what has gone before. Cross them out just as you did in the above 
 paragraph. Be sure to cross out all the words which do not belong, but cross out 
 only those words; for if you cross out any word which should not be crossed out 
 it will be counted as a mistake. You will be allowed four minutes to work. Many 
 of you will be unable to finish during this time. It is more important, however, 
 to do your work correctly than to cover a great deal of ground. Do all three pages. 
 
 When I say "begin" turn the page and start to work. If anyone finishes before 
 the time is up, close your paper and sit quietly. Is there anyone who does not 
 understand just what he is to do? All right! Begin! 
 
 The directions to the fourth grade pupils differed from the above 
 in only two respects. Two additional illustrative paragraphs were 
 used and the time allowance was three minutes instead of four. 
 
 The nature of the Pressey Silent Reading Test for Grades six 
 to eight may be illustrated by the directions:

 Look at the first example given just below:
 
 1. February is the longest month in the year. The above statement is not 
 true; but there is only one word that makes the sentence untrue. This one word 
 is the word "longest"; if "longest" were changed to "shortest", the sentence would 
 then read, "February is the shortest month in the year", which is true. "Longest" 
 is wrong; so take your pencils and cross it out. Draw a line through it because
 it is wrong. 
 
 Look at the second example just below: 
 
 2. The day dawned bright and dreary; the clear morning light streamed in 
 through the windows and filled the room with its cheery brightness. 
 
 In this paragraph, also, there is one, and only one, word that is wrong, the 
 meaning of which does not fit in with the meaning of the rest of the paragraph. 
 The word is "dreary". Cross it out. 
 
 Two additional illustrative exercises were given and the pupil 
 directed as follows: 
 
 And now — everyone attention! In each of the paragraphs on the other side 
 of the page, there is one, and only one, word that is wrong, which makes the para- 
 graph untrue, or whose meaning does not fit in with the meaning of the rest of 
 
 
 the paragraph. Cross that word out. And remember, there is only one word in
 each paragraph that is wrong. Be sure to take the paragraphs in order. Never
 skip a paragraph without attempting it. Read rapidly and accurately. You will
 be given 10 minutes in which to work. Ask no questions.
 Now, turn over the page, and all start! 
 
 In the vocabulary tests the following directions, which are
 printed on the test papers, were read to the pupils:

 Below are 100 words which are designed to measure the size of your vocabulary.
 Consider each one carefully, and place before it one of these four marks:

 (1) the mark "D" if you could define it as exactly as words are ordinarily
 defined in the dictionary.

 (2) the mark "E" if you could explain it well enough to give some idea of
 its meaning to one who is not familiar with it, though you could not give an exact 
 definition that would satisfy an expert. 
 
 (3) the mark "F" if the word is merely roughly familiar, so that you have 
 only an indefinite idea of its meaning and could not use it intelligently. 
 
 (4) the mark "N" if the word is entirely new and unknown to you. 
 
 When you have finished, count the marks and fill out these blanks, making 
 sure that the numbers add to one hundred. 
 
 In the fourth grade these directions were modified somewhat 
 in order to make certain that the pupils would understand them. 
 Fifteen minutes were allowed for the test in both grades. 
 
 The Cancellation Tests consist of a page of Spanish text. For
 the "a-t" test the following directions were given to the pupils:
 
 On this paper you will find a large number of words from a foreign language. 
 Draw a line through each of these words which contain both an "a" and a "t." 
 
 If the word has an "a" but not a "t" in it do not cross out the word. If it 
 has a "t" but not an "a" do not cross it out. Be sure to draw a line through all 
 words which contain both an "a" and a "t," but only through these words; for if 
 you cross out a word which does not have both an "a" and a "t" in it, it will 
 count as a mistake. When I say "begin" turn over your paper and begin work. 
 You will be allowed two minutes to work. Your score will depend on the number 
 of words you cross out correctly. 
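
 The rule in these directions can be stated as a simple check on each word. The sketch below is in Python; the function name and the sample words are illustrative only and are not drawn from the page of text used in the test.

```python
def should_cross_out(word, first="a", second="t"):
    """Return True when the word contains both target letters."""
    lowered = word.lower()
    return first in lowered and second in lowered

# Hypothetical sample words, not the text used in the study.
sample = ["tarde", "casa", "tres", "gato"]
crossed = [w for w in sample if should_cross_out(w)]
print(crossed)
```

 The same check serves for the second form of the test by passing first="e" and second="r".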
 
 In addition to this explanation of the test, four non-consecutive 
 words were selected from the text and written on the blackboard 
 in order to illustrate the kind of words to be crossed out. The explanation
 for the "e-r" test is identical with the above except that
 "e" and "r" are used in the place of "a" and "t."
 
 In the Memory Tests the pupil was directed as follows:

 This is to be a test to see how well you remember what you hear. I am going
 to read a little story, and I want every one to pay close attention; for as soon as 
 I have finished I want you to write down, in as nearly the same words as possible, 
 what I have just read to you. Listen carefully, and as soon as I stop reading write 
 down all that I have just read. Your score will depend on how nearly you re- 
 member what has been read to you. Do not begin to write until I have finished 
 
 
reading. Is there anyone who does not understand just exactly what he is to do? 
 All right! Attention! 
 
 In the composition test the following topics were written on the 
 blackboard. Then the directions given below were read to the pupils: 
 
 AN EXCITING EXPERIENCE. 
 
 A storm. An unexpected meeting. 
 
 An accident. In the woods. 
 
 An errand at night. In the mountains. 
 
 A wonderful story. On the ice. 
 
 A runaway. On the water. 
 
 I want you to write me a story. It is to be a story about some exciting ex- 
 periences that you have had, or about something very interesting that has happened 
 to you. If nothing of the sort has ever happened to you, then tell me of an ex- 
 citing experience someone whom you know has had. You may even make up a 
 story of this kind, if you have to, though I believe you will do better, on the whole, 
 with a real one. I am going to give you about twenty minutes in which to write. 
 You are to write on both sides of the paper, to do all the work yourselves, and to 
 ask no questions at all after you begin. You may make whatever corrections you 
 wish between the lines. There will be no time to rewrite your story. 
 
 I have written the general subject on the blackboard, together with some sug- 
 gestions. You do not have to write on any of these topics unless you want to; 
 they are merely to help out in case you cannot think of an exciting experience 
 yourself. Is there anyone who does not understand just what he is to do? All 
 right! Begin! 
 
 Twenty minutes were allowed for the actual writing. Then the
 pupils were directed as follows:
 
 You are to have four or five minutes in which to finish your stories, make 
 corrections, and count the number of words written. Write this number at the 
 end of your story. 
 
 Description of pupils' performances. In order to eliminate or 
 reduce accidental errors and subjective errors to a minimum, all test 
 papers were scored independently by two persons working under 
 careful supervision. In the case of those scores for which the sub- 
 jective factor was negligible, any differences between the two scores 
 were reconciled by a third person.¹⁵ When a subjective error was
 involved the average of the two scores was taken unless the differ- 
 ence between them exceeded a fixed maximum. In this case the 
 paper was scored by a third person in an attempt to reconcile the 
 two scores. 
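
 The reconciliation rule just described may be sketched as follows. This is only an illustration in Python: the study does not report the fixed maximum, so max_difference is a hypothetical parameter, and third_scorer stands in for the judgment of the third person.

```python
def final_score(score_a, score_b, third_scorer, subjective, max_difference=None):
    """Combine two independent scorings of one paper.

    subjective=False: any disagreement goes to the third person.
    subjective=True:  the two scores are averaged unless they differ by
                      more than max_difference (an assumed value).
    """
    if score_a == score_b:
        return score_a
    if subjective and abs(score_a - score_b) <= max_difference:
        return (score_a + score_b) / 2
    return third_scorer(score_a, score_b)

# Hypothetical third-person decisions, supplied as plain functions.
close = final_score(10, 12, lambda a, b: 11, subjective=True, max_difference=3)
far = final_score(10, 20, lambda a, b: 14, subjective=True, max_difference=3)
objective = final_score(7, 8, lambda a, b: 8, subjective=False)
```

 In the first call the two scores are averaged; in the other two the paper is referred to the third person.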
 
 The description of a pupil's rate of reading is objective. Hence
 only accidental errors are involved. The rate was expressed in
 terms of words per minute. The scoring of comprehension in the
 following tests was also highly objective: Monroe's Standardized
 Silent Reading Tests, Courtis' Silent Reading Test No. 2, Cross-Out
 Silent Reading Tests, Pressey's Silent Reading Test, and Cancellation
 Test.

 ¹⁵This third person was the same for all tests, and also was the one who supervised
 the scoring.
 
 Monroe's Standardized Silent Reading Tests were scored for 
 comprehension according to the usual directions with a few slight 
 changes with respect to the answers which were considered correct. 
 The pupil's comprehension score is the sum of the comprehension 
 values of the exercises which he does correctly. 
 
 The directions which accompany the Courtis Silent Reading
 Test No. 2 provide for two measures of comprehension, the index
 of comprehension and the number of questions answered. The index 
 of comprehension is found by subtracting the number of wrong an- 
 swers from the number of right answers and dividing the difference 
 by the number of right answers. In addition to these two scores 
 the number of right answers was recorded. 
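
 The three Courtis scores can be illustrated with a short calculation. The sketch below is in Python and assumes that every question answered is either right or wrong, so that the number answered is the sum of the two; the function name and the sample figures are illustrative.

```python
def courtis_scores(right, wrong):
    """Return the three scores recorded for the Courtis test."""
    answered = right + wrong  # assumption: answered = right + wrong
    # Index of comprehension: (right - wrong) / right, undefined if right is 0.
    index = (right - wrong) / right if right > 0 else None
    return {
        "index_of_comprehension": index,
        "questions_answered": answered,
        "right_answers": right,
    }

scores = courtis_scores(right=20, wrong=4)
print(scores)
```

 A pupil with 20 right and 4 wrong answers thus has an index of 0.8 on 24 questions answered.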
 
 Two methods of scoring the Cross-Out Silent Reading Tests
 for comprehension were used. It was found that pupils made two
 types of errors. Some crossed out words which should not have
 been crossed out, and some failed to mark words which should have
 been crossed out. One description was obtained by taking the difference
 between the number of words correctly marked and the
 number of words wrongly marked. (This included only the first
 type of error.) This score is indicated by the symbols c - w. In
 the second score, the number of inconsistent words which the pupil
 failed to mark in the part of the test read was also recognized.
 The score was obtained by evaluating the following fraction:

 $$\frac{c - w}{c + o}$$

 In this fraction c and w have the same meaning as above and o
 stands for the number of words omitted.¹⁶
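
 The two Cross-Out scores can be worked for a single paper. The sketch below, in Python, takes the first score as c - w and reads the fraction for the second score as (c - w) / (c + o); the function name and the sample counts are illustrative.

```python
def cross_out_scores(c, w, o):
    """c: words correctly marked, w: words wrongly marked,
    o: inconsistent words left unmarked in the part of the test read."""
    first = c - w
    # The second score also takes account of omissions.
    second = (c - w) / (c + o) if (c + o) > 0 else None
    return first, second

first, second = cross_out_scores(c=18, w=2, o=2)
print(first, second)
```

 With 18 words correctly marked, 2 wrongly marked, and 2 omitted, the first score is 16 and the second is 0.8.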
 
 In the Pressey Silent Reading Test a pupil's comprehension 
 score is the number of exercises which he does correctly within the 
 time allowed. In order to have an exercise counted as right the 
 correct word must be crossed out and no other word in the para- 
 graph marked. 
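
 The Pressey rule, that an exercise counts only when the wrong word and no other is crossed out, can be sketched as follows. The example is in Python; the key uses the two illustrative exercises quoted in the directions, while the pupil's markings are invented.

```python
def pressey_score(marked_words, answer_key):
    """Count exercises in which exactly the one wrong word was crossed out."""
    score = 0
    for marked, wrong_word in zip(marked_words, answer_key):
        if set(marked) == {wrong_word}:
            score += 1
    return score

key = ["longest", "dreary"]
# Hypothetical paper: the second exercise has an extra word crossed out.
responses = [["longest"], ["dreary", "bright"]]
result = pressey_score(responses, key)
print(result)
```

 Only the first exercise is counted as right, since a second word was marked in the other.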
 
 ¹⁶Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
 p. 313.

 ¹⁷Whipple, G. M. Manual of Mental and Physical Tests, Part II, Complex
 Processes, pp. 310-11.

 The Vocabulary Test was scored according to standard directions.¹⁷
 Each "D" and "E" was regarded as indicating one point
 and each "F" as indicating a half-point. (See page 12.) The total
 number of points represents a vocabulary-index. This index, taken
 as a percent and multiplied by 18,000, affords a measure of the size
 of the pupil's total vocabulary.
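
 The arithmetic of the vocabulary score may be illustrated as follows. The sketch is in Python; the function name and the sample counts of marks are illustrative, and the check that the four counts add to one hundred follows the directions printed on the test paper.

```python
def vocabulary_estimate(d, e, f, n, base_vocabulary=18000):
    """Return the vocabulary-index and the estimated total vocabulary."""
    if d + e + f + n != 100:
        raise ValueError("the four counts should add to one hundred")
    # One point for each "D" and "E", a half-point for each "F".
    index = d + e + 0.5 * f
    # The index is a percent of the 100 words; scale it to 18,000 words.
    estimate = index * base_vocabulary / 100
    return index, estimate

index, estimate = vocabulary_estimate(d=30, e=20, f=10, n=40)
print(index, estimate)
```

 A pupil marking 30 words "D", 20 "E", 10 "F", and 40 "N" has an index of 55 and an estimated vocabulary of 9,900 words.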
 
 In the cancellation tests the score was obtained by converting
 rate and accuracy into a single index of efficiency (E).¹⁸ This index
 was obtained by the following formulae:

 $$A = \frac{c - w}{c + o}, \qquad E = eA$$

 Here A = the index of accuracy,
 E = the index of net efficiency,
 e = the number of words examined,
 o = the number of words erroneously omitted,
 c = the number of words crossed,
 w = the number of words wrongly crossed.

 After computing the index of accuracy the score in terms of the index
 of efficiency was obtained.
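
 A worked case may make the two indices concrete. The sketch below, in Python, takes the formulae as A = (c - w) / (c + o) and E = eA, using the symbols defined above; the function name and the sample counts are illustrative.

```python
def cancellation_indices(e, c, w, o):
    """e: words examined, c: words crossed, w: words wrongly crossed,
    o: words erroneously omitted. Returns (A, E)."""
    if c + o == 0:
        return None, None  # no words crossed or omitted; accuracy undefined
    accuracy = (c - w) / (c + o)
    # E = e * A, written out so the multiplication precedes the division.
    efficiency = e * (c - w) / (c + o)
    return accuracy, efficiency

accuracy, efficiency = cancellation_indices(e=200, c=38, w=2, o=2)
print(accuracy, efficiency)
```

 A pupil who examines 200 words, crosses 38 of which 2 are wrong, and omits 2 has an accuracy of 0.9 and an efficiency of 180.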
 
 The scoring of answers to questions obtained from Fordyce's 
 Scale for Measurement of Achievement in Silent Reading and from 
 the Experimental Reproduction Tests is less objective than the scor- 
 ing of the tests just described. Fordyce gives a list of correct an- 
 swers. This, together with the nature of the questions, makes the 
 scoring of his test highly objective for its type. In the course of scoring 
 the answers to the questions of the Experimental Reproduction Tests, 
 lists of correct answers were compiled and all scoring was done in 
 accordance with them. The acceptable answers were chosen with 
 care from the complete array of all answers given in each of the 
 tests. Any word or group of words judged to give correctly the total 
 idea called for by the question was counted as correct. 
 
 Scoring Reproductions. The reproductions obtained from 
 Brown's Silent Reading Test, Starch's Silent Reading Tests, the 
 Experimental Reproduction Tests, and the Memory Tests were scored 
 by both the "idea-counting method" and the "word-counting 
 method." In addition, Brown's tests were scored according to the 
 directions which he gives. The description of a reproduction is not 
 highly objective. Pupils differ widely with respect to vocabulary 
 and to sentence structure. In addition to incorrect statements, re- 
 productions contain superfluous statements and repetitions. The 
 order of ideas is frequently transposed so that their significance is 
modified. Ideas contained in the passage read are expressed with
various degrees of completeness. These characteristics of
reproductions create many opportunities for differences of opinion in
their description.

Whipple, G. M. Manual of Mental and Physical Tests, Part I, Simpler
Processes, pp. 312-13.
 
 1. The idea-counting method. The first step in using this 
method is to divide the selection read into ideas. In making this
 division one may adopt a relatively small unit, which is essentially 
 a word or phrase, or a large unit, which approximates a sentence. 
 After experimenting with these two plans of division the former was 
 chosen. A portion of Brown's Silent Reading Test, "The Long 
 Slide," with the divisions indicated, is reproduced below: 
 
 THE LONG SLIDE 
 
 The boys / and girls / who live / in a certain part / of a small / town/ in the 
 country / several miles / from any village / attend / school / in a little / red / school- 
 house / known as / the Long Hill / school. / 
 
 It has / this name / because / it is situated / on the top / of a very long / steep/ 
 hill./ Ever since anyone / can remember, / the scholars / of the Long Hill / school / 
 have always had / time / to slide / down the hill / just once / at recess / in winter / 
 and get back / to the school house / before the bell / rings / to call them back again / 
 into school. / They can go down / very rapidly, / but it takes / a long time / to walk 
 back./ 
 
 Last Monday / morning / Frank Lane / appeared / at school / with a fine / new/ 
 sled. / It was a double-runner / which his uncle, / who owns / a carriage factory / in 
 the city, /had given him. / He named / his new / sled / the Simoon / and almost had/ 
 a fight /with Tom Smith, / who said / it was foolish / to put / such a name / on a 
 sled, / but he kept on / calling it / the Simoon. / 
 
 At recess / that day / Frank / invited / the whole / school / to go / for a coast/ 
 and the twelve / boys / and girls / got onto / the sled / and away they went / down 
 the steep hill. / When recess was over / Miss Black, / the teacher, / rang the bell / 
 but not a scholar / appeared./ Thinking that / the children / had stopped / to play / 
 on the way back / from their slide, / Miss Black / went / to the door / and looked / 
 down the hill / and rang / the bell / again./ But not a scholar / was in sight./ Then 
 she was greatly astonished / and began / to be very angry, / for nothing / like this / 
 had ever happened / in all of her twenty-eight / years / as a teacher. / She waited / 
 and waited / but still / no scholars / appeared. / She stopped / every team / that 
 came / up the hill, / but no one / had seen / anything of them. / 
 
 She stayed / at the schoolhouse / and wondered / what had become of / her 
 children / until it was time / to let out / school / and then / she went / over to John 
 Reed's / who lives / nearest to the school house / and whose son / and daughter / 
 were among the missing / scholars. / Mr. Reed / was greatly frightened / at what 
 Miss Black / told him / about the disappearance / of her school / and immediately/ 
 hitched up / his horse / to go in search / of the lost / children. / Just / as he was 
 driving / out of the dooryard / the scholars / appeared / far down the hill. / It was 
 almost / dark / before / they got back / to the schoolhouse. / 
 
The pupil's score is the number of ideas which he reproduces
correctly. Thus, the scorer must determine what ideas, occurring in
the passage read, appear in the pupil's reproduction. Two rules
were adopted.
 
 1. Misplaced clauses and phrases, that is, clauses and phrases 
 which are tacked on to the wrong part of a sentence, are to be 
 counted as incorrect. 
 
2. Correct ideas found in a statement which, as a whole, is
directly contrary to the meaning of the text read are to be counted
as correct. The following example may be cited: "John Shane was
not cruel." Here, both the ideas, "John Shane" and "cruel," are held
to be correct, while "was not" is incorrect. In practically all cases
coming under this rule the incorrectness of the statement was caused
by the use of a wrong verb or a wrong adverbial modifier, as in this
illustration.
 
 The scorers were urged to keep in mind the general rule that 
 they were to match up identical ideas in the passage read and in 
 the pupil's reproduction, even though sometimes the ideas were not 
 expressed in the same language. In order to secure independent 
 scorings, each selection, with the divisions into ideas indicated as 
 shown above, was mimeographed. The scorer indicated on this 
 mimeographed copy the ideas which in his judgment the pupil had 
 reproduced. In this way no record of the scoring was made on the 
 pupil's test paper, and complete independence of scoring was secured. 
 
In putting together the results from two independent scorings,
when the difference in the number of ideas was six or less, the
average was taken. In the case of a difference of more than six a
third person went over both papers to change too lenient or too
severe scoring. These changes were made until the difference was
reduced to six or less. Then the average was taken.
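The rule for combining two independent scorings can be sketched as below. The function name and the use of `None` to signal that a third person must revise the scoring are illustrative assumptions; the revision itself is a matter of judgment and is not modeled.

```python
def reconcile(first_score, second_score, tolerance=6):
    """Average two independent scores when they differ by no more than
    `tolerance`; otherwise return None to signal that a third person
    must go over both papers before an average is taken."""
    if abs(first_score - second_score) <= tolerance:
        return (first_score + second_score) / 2
    return None


print(reconcile(20, 25))               # difference of 5: averaged
print(reconcile(20, 30))               # difference of 10: needs rescoring
print(reconcile(40, 47, tolerance=8))  # a wider tolerance may be supplied
```

The tolerance is left as a parameter because the permitted difference need not be the same for every scoring method.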
 
 Brown's method of idea-counting. Brown has given directions 
 for describing the reproductions written by pupils in terms of 
 "quantity of reproduction" and "quality of reproduction." As a 
 basis for his method of scoring, the selection is divided into sections 
 each of which he considers to represent a unit of thought. A por- 
 tion of "The Long Slide" is reproduced to show his plan of division: 
 
 THE LONG SLIDE 
 
The boys and girls who live in a certain part of a small town in the country
several miles away from any village attend school (1) in a little red schoolhouse
 known as the Long Hill School. (2) 
 
 It has this name because it is situated on the top of a very long, steep hill. (3) 
 Ever since anyone can remember, the scholars of the Long Hill school have always 
 had time to slide down the hill just once at recess in winter and get back to the 
 schoolhouse before the bell rings to call them back again into school. They can 
 go down very rapidly, but it takes a long time to walk back. (4) 
 
Last Monday morning Frank Lane appeared at school with a fine, new sled. (5)
It was a double-runner which his uncle, who owns a carriage factory in the city, had
given him. (6) He named his new sled the Simoon (7) and almost had a fight
 with Tom Smith, (8) who said it was foolish to put such a name on a sled, but 
 he kept on calling it the Simoon. (9) 
 
At recess that day Frank invited the whole school to go for a coast, and the 
twelve boys and girls got on to the sled and away they went down the steep hill. (10)
When recess was over, Miss Black, the teacher, rang the bell but not a scholar
appeared. Thinking that the children had stopped to play on the way back from
their slide, Miss Black went to the door and looked down the hill and rang the
bell again. But not a scholar was in sight. (11) Then she was greatly astonished
 and began to be very angry, (12) for nothing like this had ever happened in all 
 of her twenty-eight years as a teacher. (13) She waited and waited, but still no 
 scholars appeared. (14) She stopped every team that came up the hill, but no one 
 had seen anything of her school. (15) 
 
 She stayed at the schoolhouse and wondered what had become of her children 
 until it was time to let out school (16) and then she went over to John Reed's, who 
 lives nearest to the schoolhouse (17) and whose son and daughter were among the 
 missing scholars. (18) Mr. Reed was greatly frightened at what Miss Black told 
 him about the disappearance of her school (19) and immediately hitched up his 
 horse to go in search of the lost children. (20) Just as he was driving out of the 
 dooryard, the school appeared far down the hill. (21) It was almost dark before 
 they got back to the schoolhouse. (22) 
 
 The idea which he considered expressed in each of these sec- 
 tions has been condensed in a short statement. These form a key 
 for scoring. The statements corresponding to the sections in the 
 portion of the test reproduced above are given below: 
 
1. Some children in the country attend school.
2. The schoolhouse is known as the Long Hill School.
3. It is situated on top of a long hill.
4. The pupils slide down hill once at recess in winter.
5. One day a boy brought to school a new sled.
6. His uncle had given it to him.
7. He named it the Simoon.
8. He almost had a fight with another boy.
9. This boy said the name was foolish.
10. At recess the pupils went for a slide.
11. At the end of recess no pupils appeared.
12. The teacher was astonished and angry.
13. Nothing like this had ever happened before.
14. After a long wait no scholars appeared.
15. No one in passing teams had seen her school.
16. She stayed at school until closing time.
17. Then she went to the nearest neighbor.
18. His children were among the scholars.
19. He was greatly frightened.
20. He started to search for the children.
21. Just then they appeared down the hill.
22. They reached the schoolhouse just before dark.
 
For using this key he gives the following directions. (Brown's
statement of these directions has been modified in order to make
their meaning clear.)
 
1. Each child's written reproduction should be carefully ex- 
 amined, and the number of points in the key which are reproduced 
 by him should be determined and expressed as a percent of the total 
 number in that portion of the selection read. For example, in the 
 part read by a certain child, there may have been forty-eight points, 
 and he may have reproduced twelve of these. The amount repro- 
 duced is, therefore, twenty-five percent of the amount read. This 
 is called "quantity of reproduction". In arriving at a measure of 
 quantity of comprehension, every idea reproduced by the child 
should be counted which, in most respects, is complete and which,
in general, is correctly stated, even though some of the less
important details are lacking. Credit for quantity of comprehension is
 given only when all elements of the idea expressed by the words in 
 italics in the key are either expressed or plainly implied in the child's 
 reproduction. 
 
 2. The reproductions should be examined a second time and 
 only those ideas counted which are entirely correct in every respect 
 and of which every detail is reproduced. This is called "quality of 
 reproduction". 
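The worked example in the first direction (twelve points reproduced out of forty-eight read) can be checked with a short sketch. The function name is hypothetical; judging which points of the key are actually reproduced remains the scorer's task and is not modeled here.

```python
def quantity_of_reproduction(points_reproduced, points_read):
    """Percent of the key points in the portion read that the pupil
    reproduced ("quantity of reproduction")."""
    if points_read <= 0:
        raise ValueError("the pupil must have read at least one point")
    return 100 * points_reproduced / points_read


quantity = quantity_of_reproduction(points_reproduced=12, points_read=48)
print(quantity)
```

Dividing by the number of points in the portion actually read, rather than by the twenty-two or more points of the whole key, keeps a slow reader from being penalized twice.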
 
 2. The word-counting method. In applying this method, a 
 pupil's reproduction is examined and the words which do not cor- 
 rectly reproduce the selection read are crossed out. The pupil's 
 score is the number of words remaining. The directions for cross- 
 ing out words were essentially the same as those used by Starch in 
 scoring his own silent reading tests. The scorers were directed to 
 cross out the following classes of words: 
 
 (a) Words which incompletely reproduce the thought. 
 
 (b) Words which introduce new ideas. 
 
 (c) Words which represent ideas reproduced elsewhere. 
 
 (d) Superfluous connectives. 
 
 The scorers were, also, directed to bear constantly in mind that 
 the aim of this method is to ascertain the number of words which 
 actually reproduce the thought contained in the passage read. In 
 order to secure independence on the part of the scorers when using 
 the word-counting method, the lines of the reproductions were num- 
 bered. Sheets of ruled paper were then prepared with numbered 
 lines. In scoring the reproductions, the words to be omitted in a 
 line, when computing the pupil's score, were written on the corre- 
 sponding line of the sheet of ruled paper. The number of words 
 remaining in the line of the reproduction was then recorded in the 
right hand margin. The sum of these entries constituted a pupil's
score. No mark other than the numbers of the lines of the
reproductions was made upon the pupil's test paper. Thus, the second
 scorer was not influenced in any way by the work of the first. The 
 two independent scorings were reconciled by a third person, accord- 
 ing to the rules given in the case of the idea-counting method, except 
 that a difference of eight rather than of six was allowed before re- 
 scoring was undertaken. This exception does not apply to the 
 Memory Test. 
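The bookkeeping of the word-counting method, with omitted words listed on a separate numbered sheet, can be sketched as follows. The data structures and the sample reproduction are hypothetical, and deciding which words to cross out still rests on the scorer's judgment.

```python
def word_count_score(reproduction_lines, omitted_by_line):
    """Return the words remaining on each line and their sum.

    reproduction_lines: the pupil's reproduction, one string per line.
    omitted_by_line: maps a line number (starting at 1) to the words the
    scorer listed for omission on the separate ruled sheet.
    """
    remaining = []
    for number, line in enumerate(reproduction_lines, start=1):
        words_written = len(line.split())
        words_omitted = len(omitted_by_line.get(number, []))
        remaining.append(words_written - words_omitted)
    return remaining, sum(remaining)


# Hypothetical two-line reproduction; the scorer omits two superfluous
# connectives on line 2.
lines = [
    "The boys and girls went down the hill",
    "on a new sled and then they were very late",
]
per_line, score = word_count_score(lines, {2: ["and", "then"]})
print(per_line, score)
```

Because the omissions live on a separate sheet keyed by line number, a second scorer can work from the untouched test paper.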
 
 Subjectivity of describing reproductions. An examination of 
 the records of scoring the reproductions shows many differences of 
 opinion on the part of the scorers. One scorer gave credit for 
 certain words or ideas which the other scorer rejected, while the 
 second scorer gave credit for words and ideas rejected by the first 
 scorer. These differences of opinion tend to balance each other in 
 the resulting scores but not entirely. For some reproductions, two 
 persons will give the same score. For others, the two scores will 
differ. In a few cases the difference will be marked. Whenever
there is a difference, at least one score, and probably both, involve
an error. Even when the two scores are identical both may involve
an error.
 
 Constant errors and variable errors. The scoring of reproduc- 
 tions even under favorable conditions, such as prevailed in this 
 investigation, involves two types of errors — constant errors and vari- 
 able errors. A constant error results in a scorer assigning scores 
 which, in general, are too high or too low. A liberal attitude toward 
 the reproductions will result in high scores. On the other hand, a 
 conservative procedure will result in low scores. An indication of 
 the presence of a constant error may be secured by comparing the 
 averages of the two sets of scores assigned independently by two 
 scorers to the same set of papers. Any differences in their general 
 policy will be reflected by a difference between the averages of the two 
 sets of scores. However, this difference cannot be considered to be 
 an index of the magnitude of the constant error because both per- 
 sons may be inclined to be liberal in their scoring, or both may be 
 conservative, or one may be conservative and the other liberal. 
 
Variable errors are indicated by the fact that in scoring one
reproduction Scorer A will assign a score of 90, and Scorer B a score
of 75; but in scoring a second reproduction Scorer A may assign a
score of 60, and Scorer B a score of 80. This may happen although
Scorer B is, in general, more liberal than Scorer A. In studying the
variable errors it is necessary to isolate them from the constant
errors. Constant errors which affect the average of the scores
assigned by either person do not affect the coefficient of correlation.
Hence, it may be used as an index of the magnitude of the variable
errors.

A score is said to involve an error when it differs from the true
score, which is defined as the average of a large number of scores
assigned by different persons.
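That a constant error changes the average but leaves the coefficient of correlation untouched can be illustrated with a small sketch. The scores below are invented, and the constant error is modeled, as a simplifying assumption, by adding the same ten points to every score of one scorer.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Product-moment coefficient of correlation between two score lists."""
    mean_first, mean_second = mean(first), mean(second)
    dev_first = [x - mean_first for x in first]
    dev_second = [y - mean_second for y in second]
    cross = sum(a * b for a, b in zip(dev_first, dev_second))
    return cross / sqrt(
        sum(a * a for a in dev_first) * sum(b * b for b in dev_second)
    )


# Hypothetical scores assigned by two scorers to the same five papers.
scorer_a = [90, 60, 72, 55, 81]
scorer_b = [75, 80, 70, 50, 78]
# The same Scorer B with a constant error of +10 (uniformly more liberal).
scorer_b_liberal = [score + 10 for score in scorer_b]

difference_of_averages = mean(scorer_b_liberal) - mean(scorer_b)
r_original = pearson_r(scorer_a, scorer_b)
r_shifted = pearson_r(scorer_a, scorer_b_liberal)
print(difference_of_averages, r_original, r_shifted)
```

The averages differ by the full ten points, while the two coefficients agree, which is why the coefficient can serve as an index of the variable errors alone.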
 
 Tables I and II give data relative to both the constant and 
 variable errors involved in the word-counting and in the idea-count- 
 ing methods. Table I shows the facts for the first method and 
 Table II for the second. The scorers are represented by letters. 
 The numbers in the column headed "Difference of Average Scores" 
 were obtained by subtracting the average of the scores assigned by 
 the second scorer from the average of the scores assigned by the 
 first scorer. A positive difference means that the first scorer gave, 
 on the average, higher scores than the second. A negative differ- 
 ence has the opposite meaning. In some cases the difference closely 
 approximates zero, but in others it is relatively large. This indi- 
 cates that, for some scorers, the constant error is relatively large. 
 One is justified in asserting that, on the basis of the possible con- 
 stant error in the scores assigned to reproductions by a single scorer, 
 no reliable inferences can be made concerning the differences in 
 reading ability of two groups of pupils unless the differences 
 between their average scores are large. 
 
TABLE I. SUBJECTIVITY OF SCORING REPRODUCTIONS BY THE WORD-COUNTING METHOD

Test            Grade  Number    Scorers  Difference  r_12   P.E.     P.E. Est._1t
                       of scores          of average         Est._1t  / Average
                                          scores
Memory          IV     92        Y-C      -9.9        .89    4.5      .06
Memory          IV     27        Y-K      -5.1               3.4      .04
Memory          IV     116       Y-C      -2.0               3.3      .05
Memory          VII    123       Y-K      -7.5               5.5      .05
Memory          VII    100       Y-C      -8.2               3.9      .04
Memory          VII    31        Y-K      +4.1        .90    2.6      .03
Reproduction    IV     94        L-K      +6.8        .97    3.1      .06
Reproduction    IV     31        L-C      -1.6        .98    2.4      .06
Reproduction    IV     68        L-K      +4.7               4.2      .10
Reproduction    VII    117       M-F      -0.5        .96             .06
Reproduction    VII    113       F-C      -6.0                        .05
Brown           IV     111       T-Mj     +12.8       .96    9.2      .15
Brown           IV     110       T-Mj     +6.9               5.5      .08
Starch (No. 7)  VII    119       M-C      -5.8               2.6      .07
Starch (No. 6)  VII    121       M-C      -2.0        .97    2.1      .05
 
TABLE II. SUBJECTIVITY OF SCORING REPRODUCTIONS BY THE IDEA-COUNTING METHOD

Test            Grade  Number    Scorers  Difference  r_12   P.E.     P.E. Est._1t
                       of scores          of average         Est._1t  / Average
                                          scores
Memory          IV     121       Y-P      +0.1        .95    1.1      .04
Memory          IV     116       Y-P      +0.6        .84    1.1      .05
Memory          VII    122       Y-P      +1.0        .89    1.6      .04
Memory          VII    128       Y-P      +0.6        .85    1.0      .04
Reproduction    IV     94        F-P      -0.6        .94    1.6      .07
Reproduction    IV     100       F-P      +0.7        .95    1.4      .08
Reproduction    VII    116       F-P      -7.9        .91    5.6      .08
Reproduction    VII    112       S-F      +0.7        .88    4.5      .10
Brown I*        V      77        Cl-S     +0.4        .88    2.5      .10
Brown II*       V      75        Cl-S     +1.5        .85    2.4
Brown, Quantity IV     112       P-C      +8.7        .69    8.4      .18
Brown, Quantity IV     116       P-C      +7.8        .75    6.1      .16
Brown, Quality  IV     113       P-C      -6.7        .68    5.2      .24
Brown, Quality  IV     118       P-C      +0.1        .56             .30
Starch (No. 7)  VII    122       S-Cl     -2.3        .92    1.6      .08
Starch (No. 6)  VII    124       S-Cl     -1.0        .95    1.3      .08

*Brown I is The Long Slide; Brown II, A Morning Adventure.
 
 It appears that a scorer is not always consistent with respect 
 to his constant error. In Table I, Scorer Y and Scorer K show neg- 
 ative differences for two sets of papers and a positive difference for 
 a third set. The same condition is exhibited by Scorer P and Scorer 
C in Table II. This reversal of policy may be due in part to
differences in the character of the reproductions, but, doubtless, the
instability of subjective judgment is also a factor.
 
In the column headed "$r_{12}$", the coefficient of correlation
between the two sets of scores is given. In the next column the
probable error of estimate is given. This was calculated by the formula

$$P.E.\ Est._{1t} = .6745\,\sigma\sqrt{1 - r_{12}}$$
 
The probable error of estimate for two sets of related data is given
by the formula $P.E.\ Est._{1} = .6745\,\sigma_1\sqrt{1 - r_{12}^2}$
(see Yule, Introduction to the Theory of Statistics, page 177). In this
formula $r_{12}$ is the coefficient of correlation between the two sets
of data and $\sigma_1$ is the standard deviation of the corresponding
distribution. The probable error of estimate for the first set of
scores ($P.E.\ Est._{1}$) is a measure of the amount of change which
would be necessary to bring these scores into perfect correlation with
the other set of scores. Professor T. L. Kelley has shown that the
correlation between one set of obtained scores and the corresponding
true scores is given by the formula $r_{1t} = \sqrt{r_{12}}$.
Therefore, the formula
$P.E.\ Est._{1t} = .6745\,\sigma_1\sqrt{1 - r_{12}}$ gives the probable
error of estimate of the first set of scores with respect to the
corresponding set of true scores. A similar formula would give the
probable error of estimate for the other set of scores. Since both
sets of scores were assigned to the same set of reproductions, the best
measure is the average of the two formulae. Hence, $\sigma$ is the
average of $\sigma_1$ and $\sigma_2$.
 
As used here the probable error of estimate should be interpreted
as a description of the magnitude of the variable errors, or
departures of the assigned scores from the corresponding true scores
after the constant error has been eliminated. We may, therefore,
speak of the probable error of estimate in this case as the probable
variable error of scoring. A probable variable error of scoring of 3.4
means that, in general, the variable errors for the two scorers from
whom the data were obtained are greater than 3.4 for fifty percent of
the scores. It also means that for fifty percent of the scores the
variable errors are less than 3.4.
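The probable variable error of scoring can be computed along the following lines. This is a sketch under stated assumptions, not the procedure actually used in the study: the formula is taken to be .6745 times sigma times the square root of (1 - r_12), sigma is taken as the average of the two scorers' standard deviations computed with N in the denominator, and the scores are invented. The last lines check the well-known fact that .6745 is the point that cuts off half of a normal distribution on either side of its center, which is what gives the "fifty percent" reading.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def pearson_r(first, second):
    """Product-moment coefficient of correlation between two score lists."""
    mean_first, mean_second = mean(first), mean(second)
    dev_first = [x - mean_first for x in first]
    dev_second = [y - mean_second for y in second]
    cross = sum(a * b for a, b in zip(dev_first, dev_second))
    return cross / sqrt(
        sum(a * a for a in dev_first) * sum(b * b for b in dev_second)
    )


def probable_variable_error(first, second):
    """.6745 * sigma * sqrt(1 - r_12), with sigma the average of the two
    standard deviations (population form, an assumption)."""
    r12 = pearson_r(first, second)
    sigma = (pstdev(first) + pstdev(second)) / 2
    # max() guards against a tiny negative value caused by rounding.
    return 0.6745 * sigma * sqrt(max(0.0, 1 - r12))


# Hypothetical scores from two scorers on the same four papers.
first_scorer = [1, 2, 3, 4]
second_scorer = [1, 3, 2, 4]
pe = probable_variable_error(first_scorer, second_scorer)

# Half of a normal distribution lies within about .6745 sigma of its mean.
quartile_point = NormalDist().inv_cdf(0.75)
print(round(pe, 5), round(quartile_point, 4))
```

For these invented scores the coefficient of correlation is .80, and the probable variable error comes to about a third of a point.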
 
 The probable variable error of scoring cannot be given a definite 
 significance except in comparison with the magnitude of the score with 
 which it is to be associated. A probable error of 5 does not have 
 the same meaning when associated with a score whose magnitude 
 is 25 as it has when associated with a score of 100. It is, therefore, 
 necessary to compare the probable variable error of scoring with the 
 magnitude of the scores with which it is associated. The same de- 
 gree of objectivity will result in larger variable errors of scoring for 
 large scores than for small scores. Since the probable variable error 
 of scoring which we have obtained is, itself, an "average" it may 
 consistently be compared with the average score. This has been 
 done in obtaining the quantities given in the last column of the 
 table. The probable variable error of scoring has been divided by 
the average score. A quotient of .06 is to be interpreted as meaning
that the chances are one to one that the score assigned to a paper
will differ from the true score by as much as six percent of its
magnitude.
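The quotient in the last column is a simple ratio, sketched below. The function name is hypothetical, and the average of 40 is an invented figure chosen only so that a probable variable error of 2.4 yields the quotient of .06 discussed above.

```python
def relative_variable_error(probable_error, average_score):
    """Probable variable error of scoring expressed as a fraction of the
    average score with which it is associated."""
    if average_score <= 0:
        raise ValueError("the average score must be positive")
    return probable_error / average_score


ratio = relative_variable_error(probable_error=2.4, average_score=40.0)
# The same probable error weighs far more heavily on a small score.
ratio_small = relative_variable_error(probable_error=5, average_score=25)
ratio_large = relative_variable_error(probable_error=5, average_score=100)
print(round(ratio, 2), ratio_small, ratio_large)
```

The second pair of figures repeats the comparison made in the text: a probable error of 5 is a fifth of a score of 25 but only a twentieth of a score of 100.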
 
 In both tables, the coefficients of correlation are high in the 
sense that most of them differ only slightly from 1.00. With the
 exception of coefficients for "quality of reproduction" and "quantity 
 of reproduction" of Brown's Silent Reading Test, only one is below 
 .83. A number are above .90. There are four coefficients of .97. 
 One is .98. With three exceptions, the number of cases on which 
 these coefficients are based is sufficiently large so that the probable 
 error of the coefficient of correlation due to sampling is relatively 
 small. The description of the variable errors of scoring in terms of 
 the probable variable error of scoring and the ratio of the probable 
 variable error of scoring to the average suggest that these errors are 
 much larger than might be concluded from a consideration of the co- 
 efficients of correlation. For example, in Table I the highest coefficient 
of correlation is .98 for the second form of the Experimental
Reproduction Test in the fourth grade. The probable variable error of
scoring is 2.4 units, which is six percent of the average score. This
 means that, in general, the chances are one to one that the score 
 assigned to a pupil's reproduction in this group of papers will differ 
 by at least six percent of its magnitude from the true score. This 
 is the effect, only, of the variable error of scoring. The actual error 
 of a pupil's score may be larger, due to the effect of the constant 
 error on the part of the scorer. 
 
 It should also be noted that the highest coefficient of correlation 
 is not always paired with the lowest ratio of the probable error of 
 scoring to the average. In Table II, a ratio of .04 is obtained for 
 three tests. The coefficients of correlation for these are .95, .89, and 
 .85. In Table I, there are four ratios of .06. The corresponding 
coefficients of correlation are .89, .97, .98, and .96. The lowest ratio,
.03, is associated with a coefficient of .90. Comparisons between
 the coefficients of correlation and the probable variable errors of 
 scoring, likewise, show many cases of non-agreement. In Table I, 
 the largest probable variable error, 9.2, corresponds to a coefficient 
 of correlation of .96. The lowest coefficient of correlation, ."TJ, cor- 
 responds to a probable variable error of 5.5. The smallest proba- 
 ble variable error, 2.1, corresponds to a coefficient of .97. This lack 
 of agreement is due largely to differences in the magnitude of the 
 scores. 
 
 The scoring of Brown's Silent Reading Test for quality and 
 quantity of reproduction clearly involves the largest variable error. 
 This is indicated both by the coefficient of correlation and by the 
 probable variable error of scoring. If we exclude from our consid- 
 eration these two scores of Brown's test, neither the idea-counting 
 method nor the word-counting method is distinctly superior. In 
 general, the word-counting method appears to involve a slightly 
 smaller variable error when this error is considered in relation to 
 the average score. However, both methods must be described as 
highly subjective. They involve a probable variable error of scoring
 of .06 or more in addition to a constant error which, in some cases, 
 is probably large. 
 
 The scoring of Brown's test appears to be somewhat less ob- 
 jective than that of the others. This is especially true in the case 
 of the word-counting method. In addition to the variable errors, 
this method appears to introduce a large constant error. The scores,
"quantity of reproduction" and "quality of reproduction," which
Brown recommends, are clearly less objective than the scores
obtained by either of the other methods. In fact, they are so highly
subjective that their use cannot be defended.
 
Summary for describing reproductions. The description of
reproductions involves large errors, both constant and variable. Even
 when the scoring is done under careful supervision reliable scores 
 cannot be expected. For this reason, alone, silent reading tests re- 
 quiring reproduction cannot be considered satisfactory. The method 
 which Brown recommends for scoring reproductions appears to be 
 inferior to both the word-counting method and the idea-counting 
 method. 
 
 Scoring answers to questions. The scoring of the answers to 
 the questions in the case of the Experimental Reproduction Tests 
 and Fordyce's test is not perfectly objective unless an elaborate list 
 of acceptable answers is prepared. This was done for both of these 
 tests and, consequently, the scores used in this study may be con- 
 sidered objective in the sense that the scoring approximated uni- 
 formity. These tests, however, should not be considered as being 
 perfectly objective when used independently by different persons 
 who do not have access to elaborate directions for scoring. 
 
 Describing the quality of compositions. The scoring of the com- 
 positions for story value by means of the Willing Scale for Written 
 Composition is not highly objective. Eighty-six compositions were 
 scored independently by two persons. The difference between the 
 averages of the two sets of scores was 6.7. The coefficient of corre- 
 lation between the two sets of scores was .86. The probable variable 
 error of scoring was 2.9 and the ratio of this to the average was .04. 
 The magnitude of the variable error of scoring indicated by the prob- 
 able error and its ratio to the average is less than that involved in 
 either method of scoring the reproductions. 
 
 Time required for scoring test papers. All scorers kept a record 
 of the time devoted to scoring the different tests. As we have in- 
 dicated, care was exercised in the scoring and this probably tended 
 to increase the time consumed. Furthermore, in the scoring of re- 
 productions the procedure followed was not the most economical 
one. The average number of papers scored per hour is given in
 Table III. The most rapid scoring was done in the case of the 
 questions of the Experimental Reproduction Tests. The scoring 
 was nearly as rapid for Monroe's Standardized Silent Reading Tests 
 and for the Pressey Test. The scoring of the tests requiring repro- 
 ductions was relatively slow except in the case of Starch's Silent 
 Reading Tests for ideas. 
 
Average scores and standard deviations. In Tables IV and V, 
 the average scores and standard deviations are given for each of the 
 tests in each grade. The averages for the comprehension scores in- 
 dicate that widely different units are used in describing the per- 
 formances on the different tests. In the fourth grade the averages 
 range from 6.2 for one method of scoring the Cross-Out Test to 87, 
 the average index of comprehension yielded by the Courtis Silent 
 Reading Test, No. 2. Even in the case of tests for which the unit 
 is given the same name we have differences in magnitude. For ex- 
 ample, the word is used as a unit in describing the reproductions. 
 The average scores for tests requiring reproduction differ widely 
 for the same pupils. In the seventh grade the average score for 
 Form I of Starch's test is 40; for the Experimental Reproduction 
 Test it is 155. The conditions under which these two tests are ad- 
 ministered are not the same and this is, doubtless, one factor which 
 causes the difference in the scores. Differences in the difficulty of 
 the tests also tend to produce differences in the average scores. It 
 is, however, likely that the units are not equivalent in the two cases. 
 At least, they do not have equivalent interpretations when used as 
 measures of comprehension. 
 
 TABLE III. AVERAGE NUMBER OF PAPERS SCORED 
 PER HOUR 
 
 Test 
 
 Method of 
 Scoring 
 
 Grade 
 
 
 IV 
 
 VII 
 
 
 Usual 
 
 Usual 
 
 Word 
 Idea 
 
 Word 
 Idea 
 
 Word 
 
 Idea 
 
 Question 
 
 Usual 
 
 Usual 
 
 Usual 
 
 Usual 
 
 Usual 
 
 48 
 26 
 15 
 
 60 
 21 
 
 27 
 
 20 
 
 S3 
 
 Courtis 
 
 
 Brown 
 
 _ 
 
 Brown 
 
 — 
 
 
 18 
 
 
 43 
 
 Reproduction 
 
 Reproduction 
 
 Reproduction .... 
 
 Cross-Out 
 
 II 
 
 8 
 
 56 
 
 39 
 
 
 28 
 
 
 47 
 
 Vocabulary 
 
 Composition 
 
 16 
 13 
 
TABLE IV. AVERAGE SCORES AND STANDARD DEVIATIONS FOR MEASURES 
 OF COMPREHENSION 
 
 Test 
 
 Grade IV 
 
 Form I 
 
 Av. 
 
 Form II 
 
 Av. 
 
 Grade VII 
 
 Form I 
 
 Av. 
 
 Form II 
 
 Av. 
 
 Monroe . 
 
 Courtis, Index 
 
 Courtis, Question 
 
 Courtis, Questions Correct
 
 133 
 
 84.3 
 36 
 
 Brown, Quantity.. 
 Brown, Quality. . . 
 Brown, Average. . 
 Brown, Efficiency. 
 Brown, Words. . . . 
 Brown, Ideas 
 
 Starch, Words. 
 Starch, Ideas. . 
 
 Reproduction, Question... 
 
 Reproduction, Ideas 
 
 Reproduction, Words. ... 
 
 Cross-Out, C — W. 
 
 Cross-Out, . 
 
 C+0 
 
 Fordyce. 
 Pressey. . 
 
 Memory, Ideas. . 
 Memory, Words. 
 
 Vocabulary. . 
 Composition. 
 
 10.3 
 
 20.8 
 54.5 
 
 6.2 
 
 42.2 
 
 62.5 
 
 26.9 
 
 77-7 
 
 45,0 
 
 6.2 
 
 16.2 
 
 II-3 
 9.2 
 
 20.9 
 17.6 
 18. 1 
 35-3 
 23-4 
 IC.5 
 
 2.8 
 II .1 
 
 30.5 
 
 29.9 
 159 
 
 6.8 
 19.6 
 
 17. 1 
 
 15-5 
 
 91 
 
 17.8 
 40.2 
 
 8.5 
 43-4 
 
 21.3 
 76.0 
 
 5-1 
 
 14.2 
 10.4 
 II .0 
 
 14-5 
 II. 7 
 12.2 
 19.9 
 21.5 
 8.0 
 
 2.3 
 10.4 
 
 25.7 
 
 5-9 
 
 27.4 
 
 3-9 
 134 
 
 23 -9 
 
 29 -3 
 
 9.7 
 
 40.9 
 16.9 
 
 65 
 155 
 
 16 
 
 67 
 
 72.3 
 139 
 
 36.2 
 104 -3 
 
 63 -4 
 
 67.2 
 
 18.3 
 8.7 
 
 2.7 
 
 33-5 
 
 82.2 
 
 7-1 
 
 22.5 
 
 17 
 
 3 
 
 7-4 
 19 " 
 
 38.1 
 18.7 
 
 9-9 
 
 42.5 
 104.2 
 
 18.1 
 69.5 
 
 14. 1 
 
 27.9 
 91 .0 
 
 22.3 
 9.6 
 
 3-0 
 23.6 
 54-3 
 
 8.2 
 
 23 -7 
 
 3-2 
 
 3-8 
 I3-I 
 
 The non-equivalence of units is even more obvious in the case 
 of the average rate scores. In four of the tests the pupil is engaged 
 in continuous reading: Courtis Silent Reading Test, No. 2, Brown's 
 Silent Reading Test, Starch's Silent Reading Tests, and the Experi- 
 mental Reproduction Tests. The average rate scores for these tests 
 exhibit differences sufficiently large to indicate that a word is not 
 a constant unit for the measurement of the rate of reading. For ex- 
 ample, the rate score for Form 3 of the Courtis Silent Reading Test 
 is 153 words per minute. For Brown's Silent Reading Test the rate 
 is 182 words per minute. Similar differences are to be found in the 
 
 
TABLE V. AVERAGE SCORES AND STANDARD DEVIATIONS FOR MEASURES 
 
 OF RATE 
 
 Test 
 
 Grade IV 
 
 Form I 
 
 Av. 
 
 Form II 
 
 Av. 
 
 Grade VII 
 
 Form I 
 
 Av. 
 
 Form II 
 
 Av. 
 
 Monroe 
 
 Courtis 
 
 Brown 
 
 Starch 
 
 Reproduction.. . . 
 
 Cross-Out 
 
 Fordyce, Words. 
 Pressey 
 
 79-5 
 150.0 
 164.6 
 
 151 .6 
 
 75-3 
 125 .0 
 
 24.9 
 47-9 
 60.3 
 
 77-7 
 28.2 
 27.0 
 
 94-9 
 1531 
 182.5 
 
 J54-7 
 84.0 
 
 21 .1 
 
 55-2 
 78.5 
 
 77-1 
 
 22.2 
 
 Composition (Number of 
 words written.) 
 
 104.0 
 
 193.0 
 
 218.3 
 
 III .7 
 
 179.0 
 
 24.0 
 
 218.6 
 
 30.9 
 
 56.5 
 82.7 
 30.1 
 35-5 
 1-5 
 
 85. 5 
 
 140.7 
 
 202.8 
 216.3 
 133-8 
 
 23-5 
 
 24.8 
 
 88.9 
 
 70.2 
 34-3 
 
 1.7 
 
 seventh grade between Starch's Silent Reading Tests and the Ex- 
 perimental Reproduction Tests. 
 
 The rate scores for all of the tests are expressed in terms of words 
 per minute. However, in the case of Monroe's Standardized Silent 
 Reading Tests, The Cross-Out Silent Reading Tests, and the Pressey 
 Silent Reading Test, the pupil does not do continuous reading. He 
 must stop frequently to give responses. This, naturally, tends to 
 reduce the rate scores. This is clearly shown in Table V. The rate 
 scores for these tests are in most cases considerably less than rate 
 scores in tests where the pupil does continuous reading. The differ- 
 ence is less marked in the seventh grade than in the fourth. 
 
 In Fordyce's Scale for Measuring Achievement in Reading, the 
 pupil reads continuously, but the time allowance is such that a ma- 
 jority of the pupils complete the reading. Thus, they do not have 
 an opportunity to give evidence of their rate of reading. This is the 
 principal reason why the average rate scores for Fordyce's Tests are 
 smaller than for the other tests in which the pupil does continuous 
 reading. 
 
 The standard deviations also exhibit differences. Differences 
 in the magnitude of the units would naturally affect the standard
 deviations as well as the averages. The standard deviation is also 
 affected by the shape of the distribution. In a number of cases, the 
 
distribution of scores does not approximate the normal shape. This 
 is, doubtless, one factor affecting the differences between the stand- 
 ard deviations. 
 
 Equivalence of duplicate forms. The facts given in Tables IV
 and V indicate that the forms of these tests are not equivalent. In
 some cases an effort was made to construct the different forms so
 that they would be equivalent. This is true of Monroe's Standardized
 Silent Reading Tests. A study[22] planned to determine the degree
 of equivalence of these tests has indicated very definitely that
 they are not equivalent. The degree of non-equivalence revealed
 by that study is approximately that which is indicated here. The
 two forms of the Experimental Reproduction Tests, which were 
 constructed without any preliminary study to determine their equiv- 
 alence, appear to be as nearly equivalent as those of any other 
 test in the list, as far as the rate is concerned. In the case of com- 
 prehension, there is considerable difference between the average 
 scores. The two forms of the Cross-Out Tests were also constructed 
 without much regard to equivalence and the average scores differ 
 widely in most cases. 
 
 There is no published statement concerning the procedure 
 followed by the authors of the other tests in order to secure 
 equivalence of the duplicate forms. The average scores for the 
 Courtis Silent Reading Test No. 2 do not differ widely. In fact, 
 the two forms of this test appear to be the most nearly equiv- 
 alent of any of the tests studied. The two Starch tests, No. 6 
 and No. 7, were not intended by the author to be equivalent 
 forms. No. 7 (Form I) was intended to be more difficult, and 
 lower average scores are, therefore, to be expected. This is what 
 we find, except for the word-counting method of describing the 
 reproductions. It is, however, obvious that it is difficult, or im- 
 possible, to construct duplicate forms which will be essentially equiv- 
 alent, especially in the case of a small group of pupils. In addition 
 to any lack of equivalence which may exist, the practise effect, due to 
 one form being given after the other, would tend to produce dif- 
 ferences between the average scores. The amount of this practise 
 effect was not studied, since it was not pertinent to the major problem.

 [22] Monroe, Walter S. Report of Division of Educational Tests for 1919-20.
 University of Illinois Bulletin, Vol. XVIII, No. 21, Page 19.
 
Relation of vocabulary to difficulty. In an effort to determine 
 whether the vocabulary of a selection tends to determine its diffi- 
 culty, the selections read by pupils in tests requiring reproduction 
 were analyzed. All the words occurring in each selection were listed 
 and the frequency of each one determined. The number of words 
 in each selection not occurring in Ayres' list of one thousand words 
 was also determined. In the case of the selections which formed 
 duplicate tests, the vocabularies were compared, and the number 
 of words common to the two selections was found. The results of 
 this study are given in Table VI. For the Courtis Silent Reading 
 Test, No. 2, 16 percent of the vocabulary in Form 1 and 19 percent 
 of the vocabulary in Form 3 are not found in the Ayres' list. The
 number of different words, or the vocabulary, of Form 1 is 37 per- 
 cent of the length of the selection. This means that, on the average, 
 each word in the selection is used nearly three times. In the case 
 of Form 3, the number of different words is 44 percent 
 of the total number of words in the selection. The number of words 
 common to the two selections is 15 percent of the average number 
 of words in the two selections. These facts show that for these two 
 forms of the Courtis Silent Reading Test, No. 2, the two selections 
 are approximately equivalent with respect to the percent of words 
 not found in the Ayres' list. Form 3 contains a slightly larger
 percent of words not in this list. Such words will, in general, be 
 unusual words unless they are proper names. Form 3 has a relatively
 larger vocabulary, and makes a greater demand upon a pupil's
 acquaintance with words. The percent of words which are common
 to the two selections is surprisingly small in view of the simple
 character of the material and of the fact that the two selections are
 considered equivalent in difficulty.

 TABLE VI. ANALYSIS OF SELECTIONS READ BY PUPILS IN
 SILENT READING TESTS REQUIRING REPRODUCTION

 Test                          Words not in   Different   Words common to
                               Ayres' list    words       both selections

 Courtis I                         .16           .37           .15
 Courtis III                       .19           .44
 Starch, No. 6                     .30           .55
 Starch, No. 7                     .31           .59
 Brown, Long Slide                 .13           .37           .13
 Brown, Morning Adventure          .14           .35
 Old English Heroes I              .19           .43           .13
 Old English Heroes II             .19           .44
 The Strike at Shane's I           .19           .44           .13
 The Strike at Shane's II          .19           .52
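
 The three quantities used in the vocabulary analysis can be sketched
 in a few lines of Python. This is only an illustration under stated
 assumptions: the tokenizer, the function names, and the toy word list
 standing in for Ayres' list are all hypothetical, and the bulletin does
 not say how inflected forms or proper names were counted.

```python
import re


def tokenize(text):
    """Lower-case a passage and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_stats(text, reference_list):
    """Return the length of a selection and two vocabulary ratios.

    different_words: distinct words divided by the length of the selection.
    not_in_list:     share of the distinct words missing from reference_list.
    """
    tokens = tokenize(text)
    vocab = set(tokens)
    return {
        "length": len(tokens),
        "different_words": len(vocab) / len(tokens),
        "not_in_list": len(vocab - set(reference_list)) / len(vocab),
    }


def common_word_share(text_a, text_b):
    """Distinct words shared by two selections, divided by their average length."""
    a, b = tokenize(text_a), tokenize(text_b)
    average_length = (len(a) + len(b)) / 2
    return len(set(a) & set(b)) / average_length


# Toy data only; this is not Ayres' list and not a test selection.
reference = {"the", "dog", "ran", "to", "a", "boy"}
form_1 = "The dog ran to the boy"
form_2 = "A boy ran to the pond"

stats_1 = vocabulary_stats(form_1, reference)
stats_2 = vocabulary_stats(form_2, reference)
shared = common_word_share(form_1, form_2)
```

 A "different words" ratio of .37, as reported for Courtis Form 1,
 corresponds to each word being used about 1/.37, or roughly 2.7, times.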
 
 A comparison of the facts contained in Table VI with those in 
 Tables IV and V indicates that the explanation for the non-equiv- 
 alence of the two forms of the same test is not to be found in the 
 vocabularies of the two selections in the respective tests. Evidently, 
 the difficulty of a selection is determined by some factor other than 
 the actual words used. 
 
 Formation of composite scores. The scores yielded by the
 different tests are expressed in terms of different scales. Therefore,
 it is necessary to reduce them to a common scale before combining
 them to form composite scores. The procedure adopted was to
 choose as a base the scale of Monroe's Standardized Silent Reading
 Test I, Form 1, for the fourth grade and the scale of Test II,
 Form 1, for the seventh grade. All other scores were reduced to
 the scale of these tests. The formula for reducing the scores
 obtained from one scale to equivalent scores on another scale is as
 follows:

 $$S_1 = \frac{\sigma_1}{\sigma_2}\, S_2 + \left( \mathrm{Av}_1 - \frac{\sigma_1}{\sigma_2}\, \mathrm{Av}_2 \right)$$

 In this formula, $S_2$ is the obtained score on Form 2 and $S_1$ is the
 equivalent score expressed in terms of the scale of Form 1. $\mathrm{Av}_1$
 refers to the average of the scores obtained from Form 1; $\mathrm{Av}_2$ refers
 to the average of the scores obtained from Form 2. The standard
 deviation of the distribution of the Form 1 scores is $\sigma_1$, and $\sigma_2$ is the
 standard deviation of the distribution of the Form 2 scores. This
 formula is based upon the usual assumption that corresponding
 deviations from averages are equal when expressed in terms of the
 standard deviation of the distribution; in other words, that

 $$\frac{S_1 - \mathrm{Av}_1}{\sigma_1} = \frac{S_2 - \mathrm{Av}_2}{\sigma_2}$$

 When this equation is solved for $S_1$ we obtain the formula as given
 above. The application of the above formula involves the determination
 of the numerical value of the ratio $\sigma_1/\sigma_2$ by which the Form 2
 score is to be multiplied and the determination of the numerical
 equivalent of the constant term of the formula (i. e., of the expression
 in parentheses). This latter numerical equivalent may be plus or
 minus. When it is positive it is to be added and when negative it
 is to be subtracted.
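
 The reduction to a common scale can be illustrated with a minimal
 Python sketch. The function names and the sample scores are
 hypothetical, and the use of the population standard deviation
 (`statistics.pstdev`) is an assumption, since the bulletin does not
 say which divisor was used.

```python
from statistics import mean, pstdev


def conversion_constants(base_scores, other_scores):
    """Return (ratio, constant) for converting other_scores to the base scale.

    ratio    = sigma_base / sigma_other
    constant = Av_base - ratio * Av_other   (the term in parentheses)
    """
    ratio = pstdev(base_scores) / pstdev(other_scores)
    constant = mean(base_scores) - ratio * mean(other_scores)
    return ratio, constant


def to_base_scale(score, ratio, constant):
    """Equivalent score on the base scale: S_base = ratio * S_other + constant."""
    return ratio * score + constant


# Made-up scores for the same five pupils on two scales.
base = [10, 12, 14, 16, 18]
other = [40, 50, 60, 70, 80]

ratio, constant = conversion_constants(base, other)
converted = [to_base_scale(s, ratio, constant) for s in other]
```

 After conversion the second set of scores has the same average and
 standard deviation as the base set, which is what the assumption of
 equal standardized deviations requires.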
 
 After the scores were reduced to the same scale, composite
 scores were formed by calculating the averages of certain groups of
 scores. Composite AI is the average of Monroe, Form 1
 (comprehension), Courtis, Form 1 (answers correct), and Reproduction,
 Form 1 (answers correct). (In the seventh grade the Courtis Test
 was not given and this composite score includes only the other two
 tests.) Composite AII is obtained from the second form of these
 tests.[23] Composite BI is the average of Brown's Silent Reading
 Test (both quality and quantity scores), and the Experimental
 Reproduction Tests (ideas and words). In the seventh grade, Starch's
 Silent Reading Test[24] (ideas and words) is used in the place of Brown's
 test. Composite CI is the average of Composite AI and Composite
 BI. Composite BII and CII were obtained in a corresponding way
 from the second forms of these tests. Composite I is obtained by
 combining all Form 1 scores. Composite II is obtained by combining
 all Form 2 scores.
 
 Reliability. Since, with the exception of Fordyce's Scale for
 the Measurement of Achievement in Reading, two forms of each
 test were given, it is possible to compute measures of the extent to
 which equivalent scores were yielded by the different forms of a
 test. It is also possible to compute the probable error of measurement,
 which is a measure of the magnitude of the departures of the
 obtained scores from the corresponding true scores.[25] These departures
 are the variable errors of measurement. No account is taken of the
 constant error of measurement in the following discussion. In the
 case of the tests for which the scoring is subjective, the computed
 reliability is greater than the true reliability for the reason that the
 averages of two independent scorings were used instead of the scores
 assigned by one person.[26]
 
 Methods of determining reliability. In Tables VII and VIII,
 the reliability of these tests is described in terms of four quantities.
 (1) The coefficient of reliability is represented by the symbol $r_{12}$,
 and is the coefficient of correlation between the two sets of scores
 yielded by the two forms of the test. (2) The index of reliability
 is represented by the symbol $r_{1\infty}$. This quantity is the coefficient
 
 [23] In the case of the Courtis Tests, Form 3 was used instead of Form 2.
 [24] No. 7 is Form 1 and No. 6, Form 2.

 [25] A true score is defined as the average of the scores yielded by a large
 number of duplicate forms of a test.

 [26] See page 17 for the exact method used.
 
 of correlation between one set of obtained scores and the set of
 corresponding true scores. The relation between the index of
 reliability and the coefficient of reliability is expressed by
 $r_{1\infty} = \sqrt{r_{12}}$. This formula was used in calculating the indices of
 reliability given in these two tables. (3) The probable error of
 measurement is represented by the symbol P.E.M. This quantity was
 calculated by the formula $P.E._M = .6745\,\sigma\sqrt{1 - r_{12}}$.[27]
 The probable error of measurement (P.E.M) is a measure of the
 variable errors of measurement, or the differences between the obtained
 scores and the corresponding true scores. (4) The ratio of the
 probable error of measurement to the average of the scores from
 which it was calculated is represented by the symbol P.E.M/Av. Table
 VII gives information concerning the reliability of rate scores and
 Table VIII, the corresponding information for comprehension scores.
 In case the test was scored by more than one method, the information
 is given for all methods of scoring.
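
 The first two quantities can be illustrated with a short Python
 sketch. It assumes that the coefficient of correlation is the ordinary
 product-moment coefficient; the function names and the two sets of
 scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation between paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def index_of_reliability(r12):
    """Index of reliability: the square root of the coefficient of reliability."""
    return sqrt(r12)


# Made-up scores of six pupils on two forms of one test.
form_1 = [12, 15, 9, 20, 17, 11]
form_2 = [14, 13, 10, 19, 18, 9]

r12 = pearson_r(form_1, form_2)      # coefficient of reliability
r1_inf = index_of_reliability(r12)   # index of reliability
```

 For a coefficient of reliability of .85 the index of reliability is
 about .92. Because the index is a square root, it is always at least
 as large as the coefficient from which it is computed.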
 
 Probable error of r due to sampling. The coefficients of cor- 
 relation, given in Tables VII and VIII and in the following tables, 
 are subject to an error of sampling when interpreted with respect 
 to the existence of relationship between the two sets of data from 
 which they were derived. All of the correlations in the following 
 tables are based on 80 cases in the fourth grade and 91 in the seventh 
 
 TABLE VII. MEASURES OF RELIABILITY, RATE 
 
 Test 
 
 Grade IV 
 
 rit 
 
 P.E.M 
 
 P.E.M 
 
 Av. 
 
 Grade VII 
 
 rit 
 
 P.E.M 
 
 P.E.M 
 
 Av. 
 
 Monroe I-II. . 
 Monroe I-III. 
 Monroe I I-II I 
 
 Courtis 
 
 Brown 
 
 Starch 
 
 Reproduction. 
 
 Cross-Out 
 
 Pressey 
 
 .76 
 .64 
 
 .85 
 
 74 
 
 .87 
 .80 
 .82 
 
 .92 
 •93 
 
 .86 
 
 "•3 
 13.6 
 II. 8 
 
 19-3 
 26.0 
 
 39-5 
 14.4 
 
 •13 
 
 • 15 
 .12 
 
 ■13 
 •15 
 
 .26 
 .18 
 
 .63 
 ■55 
 .69 
 
 .62 
 
 ■45 
 .76 
 ■50 
 
 ■79 
 ■74 
 •83 
 
 ■79 
 ■67 
 .87 
 •71 
 
 17.0 
 16.6 
 12.3 
 
 44.8 
 
 56.6 
 
 15^8 
 
 I .1 
 
 .11 
 .11 
 
 .09 
 
 ■23 
 .26 
 
 13 
 
 •05 
 
 [27] For explanation of this formula and the method of application see page 24.
 
TABLE VIII. MEASURES OF RELIABILITY, COMPREHENSION 
 
 Test 
 
 Grade IV 
 
 P.E.M 
 
 P.E.M 
 
 Av. 
 
 Grade VII 
 
 P.E.M 
 
 P.E.M 
 
 Av. 
 
 Monroe I-II 
 
 Monroe I-III 
 
 Monroe II-III 
 
 Courtis, Index 
 
 Courtis, No. of Questions 
 Courtis, No. of Questions 
 Correct 
 
 Brown, Quantity 
 
 Brown, Quality 
 
 Brown, Average 
 
 Brown, Efficiency 
 
 Brown, Words 
 
 Brown, Ideas 
 
 Starch, Words 
 
 Starch, Ideas 
 
 Reproduction, Questions, 
 
 Reproduction, Ideas 
 
 Reproduction, Words 
 
 Reproduction, Questions 
 and Words 
 
 Cross-Out C-W 
 
 Cross-Out 
 
 C+0 
 
 Pressey 
 
 Memory, Ideas 
 
 Memory, Words 
 
 •54 
 ■50 
 
 •44 
 .62 
 
 ■52 
 
 •35 
 .40 
 
 76 
 
 • 69 
 
 •73 
 ■71 
 
 •63 
 
 • 79 
 
 .72 
 
 •59 
 •63 
 
 3.0 
 3^4 
 
 2.5 
 
 9 9 
 
 4.6 
 
 5-7 
 
 14.2 
 
 13-1 
 16.6 
 20.1 
 
 13-7 
 6.6 
 
 19 
 19-5 
 
 4.0 
 
 3-4 
 19.9 
 
 4-3 
 12.8 
 
 .20 
 •37 
 • 41 
 
 •30 
 
 •47 
 ■47 
 
 .18 
 ■17 
 
 .69 
 .60 
 .61 
 
 •83 
 
 • 77 
 .78 
 
 5^2 
 5.6 
 
 5^2 
 
 • 17 
 
 77 
 72 
 
 60 
 
 72 
 87 
 
 64 
 
 67 
 52 
 
 •65 
 
 •56 
 •34 
 
 •75 
 
 9^7 
 4.8 
 
 1-9 
 
 33 
 13 3 
 
 •25 
 • 27 
 
 .20 
 •15 
 .19 
 
 .26 
 .21 
 
 14 
 
 .10 
 .13 
 
 grade. In order to economize space, we give in Table IX probable
 errors due to sampling for various values of r. Most of the
 coefficients of correlation appearing in these tables are sufficiently large
 in comparison with the probable error due to sampling that they
 may be interpreted as indicating the existence of a distinct positive
 relationship. We are, however, more interested in securing a
 measure of the departure from perfect correlation. Hence, the probable
 error of measurement (P.E.M) is a much better index of the degree
 of reliability of a test than either $r_{12}$ or $r_{1\infty}$.
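
 The probable error of a coefficient of correlation due to sampling
 can be sketched as follows. The formula is not stated in this section
 of the bulletin; the conventional expression .6745(1 - r^2)/sqrt(N) is
 assumed here, and the function name and example values are
 hypothetical.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Conventional probable error of r for a sample of n cases (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Illustration: a coefficient of .50 based on 100 cases.
pe = probable_error_of_r(0.50, 100)

# The probable error shrinks as r approaches 1 and as n grows.
pe_high_r = probable_error_of_r(0.90, 100)
pe_more_cases = probable_error_of_r(0.50, 400)
```

 Under this expression a high coefficient is determined more closely
 than a low one from the same number of cases, and quadrupling the
 number of cases halves the probable error.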
 
 Reliability of the tests studied. Brown's Silent Reading Test,
 when scored in the way which he recommends, is the least reliable. 
 The ratio of the probable error of measurement to the average is .54 
 for the quality and .55 for the average of quantity and quality. 
 
 TABLE IX. PROBABLE ERRORS OF THE COEFFICIENT OF CORRELATION (r12)
 DUE TO USING A LIMITED NUMBER OF CASES*

 r12       P. E.

 .10      .0798
 .20      .0774
 .30      .0734
 .40      .0677
 .50      .0605
 .60      .0516
 .65      .0466
 .70      .0411
 .75      .0353
 .80      .0290
 .85      .0224
 .90      .0153
 .95      .0079

 *80 in Grade IV and 91 in Grade VII.
 
 The "efficiency score" has a ratio of .43. The scoring of this test 
 by means of either the idea-counting method or the word-counting 
 method results in scores that are more reliable. Considering both 
 rate and comprehension, the most reliable test is the Courtis Silent 
 Reading Test, No. 2. For rate, the index of reliability is .92 and the 
 ratio of the probable error of measurement to the average is .13. 
 Three comprehension scores are used in connection with this test. 
 The number of questions answered is shown to be the most reliable. 
 The probable error of measurement and the ratio of the probable 
 error of measurement to the average score indicate a degree of 
 reliability for the rate scores yielded by Monroe's Standardized 
 Silent Reading Tests which is surprisingly high, considering the 
 character of the tests. This is particularly true in the seventh 
 grade. With the exception of Pressey's Silent Reading Test, they 
 are the most reliable. In the fourth grade, the reliability is exceeded 
 only by Courtis's Silent Reading Test, No. 2. In Monroe's Stand- 
 ardized Silent Reading Tests a pupil does not read continuously but 
 is forced to stop at the end of each exercise and answer a question. 
 According to the rules for scoring these tests, a pupil receives no 
 credit for an exercise unless he has completed his reading of it to the 
 extent of recording his answer. The increments added to a rate 
 score for doing additional exercises are relatively large, particularly 
 in Test II. Thus, a pupil who has failed only in recording his an- 
 swer to an exercise receives a score which does not indicate his rate 
 of reading. His score is the same as that of the pupil who has just 
 barely completed the preceding exercise. In all of the other tests 
 with the exception of Pressey's Silent Reading Test, the pupil's 
 rate score represents the actual amount read. In view of these facts 
 
 
it is surprising to find that Monroe's Standardized Silent Reading 
 Tests yield rate scores which have such a high degree of reliability. 
 The figures which are given may be affected somewhat by the fact 
 that these tests proved too short and a considerable number of pu- 
 pils made perfect scores. 
 
 In general, the degree of reliability is higher in the seventh 
 grade than in the fourth. Exact comparisons cannot be made be- 
 cause identical tests were not given in the two grades; but where 
 similar tests were given the results for the seventh grade show a dis- 
 tinctly higher degree of reliability. This may be due to a superior- 
 ity in the tests for the seventh grade or it may be due to the fact that 
 the increased maturity of the pupils causes them to be less variable 
 in their performances. 
 
 The degree of unreliability shown in Tables VII and VIII is 
 distressingly high. As we have indicated, the ratio of the probable 
 error of measurement to the average probably furnishes the most 
 significant statement of the degree of unreliability. Brown's Test, 
 scored by any method, appears to be so highly unreliable that it 
 should be rejected. In interpreting the figures in Table VIII it 
 should be borne in mind that the actual degree of unreliability is some- 
 what larger than that indicated because the element of subjectivity 
 in scoring has been largely eliminated. It appears that individual 
 scores yielded by these tests are very imperfect measures of reading 
 ability. However, the variable errors involved do not affect, to the 
 same degree, the scores of classes or larger groups. Although the 
 scores yielded by these tests must be considered as having only a 
 very limited significance in the case of individual pupils, they are 
 much more significant for groups of pupils. 
 
 Both the Experimental Reproduction Tests and the Cross-Out 
 Tests were merely experimental. The reproduction tests were in- 
 tentionally so. It was desired to ascertain whether a crude repro- 
 duction test, such as might be constructed by a teacher and ad- 
 ministered directly from a supplementary reader, would yield results 
 as reliable as tests more carefully constructed and more conveniently 
 arranged. These tests are shown to be among the least reliable, 
 with the exception of Brown's Silent Reading Test. This is to be 
 expected; but the difference in reliability, particularly in the seventh 
 grade, is not marked. In fact, the Experimental Reproduction 
 Tests exhibit a relatively high degree of reliability in the measure- 
 ment of comprehension. Thus, the reliability of a crude test of 
 
 
this type is only slightly less than that of tests whose construction 
 was more refined. 
 
 Discrimination. The distributions of the rate scores yielded by 
 the different tests indicate that certain tests fail to yield scores which 
 discriminate between a number of pupils with respect to rate of 
 reading. Form 3 of Monroe's Standardized Silent Reading Tests 
 I and II is clearly too short. In the seventh grade 58 percent of the 
 pupils and in the fourth grade 27 percent completed the test. All 
 such pupils received the maximum rate score. The distributions
 for Forms 1 and 2 of this test contain no such extreme deviations
 from the normal shape, although Form 2 of Test I and Form 1 of
 Test II cannot be said to approximate closely the normal
 distribution.
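
 The check described here, the share of pupils who reach the
 maximum score and therefore cannot be told apart, can be sketched in
 Python. The function name and the scores are hypothetical.

```python
def share_at_maximum(scores, maximum):
    """Fraction of pupils whose score equals or exceeds the maximum possible."""
    return sum(1 for s in scores if s >= maximum) / len(scores)


# Made-up rate scores for ten pupils on a test whose maximum is 120.
rate_scores = [80, 95, 120, 120, 110, 120, 70, 120, 120, 105]
at_ceiling = share_at_maximum(rate_scores, 120)
```

 A large share at the maximum, such as the 58 percent reported for
 Form 3 in the seventh grade, means the test is too short to separate
 the faster readers from one another.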
 
 The Cross-Out Tests yield distributions which exhibit many 
 irregularities and which cannot be said to do more than suggest 
 the normal distribution. As was to be expected, a large percent of 
 the pupils completed the reading of the selection in the case of the
 Fordyce test. Forty-nine percent of the pupils in the fourth grade
 and 29 percent in the seventh grade received the maximum rate
 score. The Pressey Test proved too short for the time allowed.
 Seventy-six percent completed Form 1 and 56 percent completed
 Form 2. The Courtis, Brown, Starch, and Experimental Repro- 
 duction Tests yielded rate scores which formed distributions closely 
 approximating the normal shape. A few irregularities were exhibited 
 by the Experimental Reproduction Tests and by Brown's test. 
 
 As judged by the shape of the distribution of the rate scores,
 the Courtis Silent Reading Test, No. 2, exhibits the least lack of 
 discrimination. The Cross-Out, Pressey, Fordyce, and Form 3 of 
 Monroe's tests exhibit such great departures from the normal dis- 
 tribution that they must, obviously, fail to discriminate properly 
 with respect to the rate of reading for a considerable number of 
 pupils. 
 
 In the case of comprehension, the distributions of scores for 
 Monroe's Standardized Silent Reading Tests closely approximate 
 the normal. The third form appears to have been a little too easy;
 but, in other respects, the irregularities exhibited by the distribu- 
 tions cannot be considered to indicate a serious lack of discrimina- 
 tion. The index of comprehension for the Courtis Silent Reading 
 Test, No. 2, fails to discriminate properly between a number of pupils. 
 Both the number of questions answered and the number of questions 
 answered correctly approach more nearly the normal distribution.
 
 
TABLE X. CORRELATIONS WITH TEACHER RATING 
 
 Test 
 
 Rate 
 
 Grade IV Grade VII 
 
 Comprehension 
 
 Grade IV 
 
 Grade VII 
 
 Monroe I. . . 
 Monroe II. . 
 Monroe III. 
 
 Court 
 Court 
 Court 
 Court 
 Court: 
 Court 
 Court 
 Court 
 
 s I Index 
 
 s I Questions 
 
 sl Questions Correct., 
 s I Words per minute. . 
 
 sIII Index. 
 
 s III Questions 
 
 s III Questions Correct, 
 s III Words per minute. 
 
 .38 
 .34 
 • 43 
 
 •SI 
 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 Brown 
 
 I Quantity 
 
 I Quality 
 
 I Average 
 
 I Efficiency 
 
 I Words 
 
 I Ideas 
 
 I Words per minute. 
 
 II Quantity 
 
 II Quality 
 
 II Average 
 
 II Efficiency 
 
 II Words 
 
 II Ideas 
 
 II Words per minute. 
 
 Starch I Words 
 
 Starch I Ideas 
 
 Starch I Words per minute. 
 
 Starch II Words 
 
 Starch II Ideas 
 
 Starch II Words per minute. 
 
 Reproduction I Questions 
 
 Reproduction I Ideas 
 
 Reproduction I Words 
 
 Reproduction I Words per minute. 
 
 Reproduction II Questions 
 
 Reproduction II Ideas 
 
 Reproduction II Words 
 
 Reproduction II Words per minute. 
 
 Cross-Out I 
 
 Cross-Out I C-W 
 
 C-W 
 
 C+0 
 
 Cross-Out I Words per minute. 
 Cross-Out II C-W 
 
 C-W 
 Cross-Out II 
 
 c+o 
 
 Cross-Out II Words per minute. 
 
 Fordyce. 
 
 Pressey I. . 
 Pressey II. 
 
 Composite AI . . 
 Composite All. 
 Composite BI. . 
 Composite BII. 
 Composite CI. . 
 Composite CII. 
 Composite I. . . 
 Composite II. . 
 
 .36 
 
 .32 
 
 .36 
 
 . 19 
 
 • 41 
 
 .26 
 
 ■ 55 
 • Sl 
 
 .29 
 
 .08 
 
 .60 
 .64 
 .63 
 
 ■ 29 
 . 29 
 
 • 41 
 
 • 45 
 
 • 38 
 •SI 
 
 .58 
 .60 
 .44 
 
 • 34 
 
 • 59 
 
 • 56 
 
 •32 
 .50 
 • 39 
 
 .46 
 
 • 46 
 
 • 51 
 
 .49 
 
 • 34 
 
 • 47 
 
 • 23 
 
 • SI 
 
 • 49 
 .46 
 
 . 21 
 .27 
 
 .46 
 
 • 37 
 
 .40 
 
 • 55 
 •S3 
 .58 
 .51 
 .63 
 
 • 58 
 
 • 58 
 
 
 This is particularly true of the latter. The distributions for the
 Brown, Starch, and Experimental Reproduction Tests exhibit many
 irregularities; but there is in all cases a distinct resemblance to the
 normal distribution. A few of the distributions approach very
 closely the normal one. Others contain rather marked departures
 from it. In the case of Brown's test, the distributions for the quality
 scores exhibit greater departures than the distributions for the
 quantity scores.
 
 Comparison with teachers' ratings. All scores, both rate and
 comprehension, were correlated with the ratings in silent reading
 given by the teacher. The coefficients of correlation were
 calculated, also, for certain composite scores. These coefficients of
 correlation are given in Table X. With the exception of one
 coefficient for the second form of Brown's test, all coefficients are positive
 and in general sufficiently large to indicate a distinct positive
 relationship between the test scores and the teachers' ratings. Rate
 of reading correlates more highly with the teachers' rating in the
 fourth grade than in the seventh. For rate, the average of the
 coefficients, not including the composite scores, is .43 in the fourth
 grade and .26 in the seventh. The average of the coefficients for
 comprehension, not including the composite scores, is .40 in the
 fourth grade and .44 in the seventh.
 
 In the fourth grade, comprehension, as measured by Monroe's 
 Standardized Silent Reading Tests, correlates most highly with the 
 teachers' ratings. In fact, the coefficients for the three forms of 
 this test equal or exceed all of those for the composite scores. In the 
 seventh grade this test does not exhibit as high correlations with 
 teachers' ratings. Neither do its rate scores correlate as highly 
 with teachers' ratings as the rate scores yielded by some other tests. 
 It is interesting to note that the correlation between the second form 
 of Brown's Test for "quantity of reproduction" and "quality of 
 reproduction" is essentially zero. For Form 1 the correlations for
 these two scores are lower than the correlations for any other scores.
 This suggests that Brown's method for scoring his test is undesirable.
 The correlations of the composite scores with teachers' ratings
 indicate that, in the fourth grade, teachers judge silent reading ability
 more on the basis of the pupils' ability to answer questions than of
 their ability to reproduce. In the seventh grade, the teachers give
 greater weight to the pupils' ability to reproduce or to tell what has
 been read.
 
 Correlation of comprehension with memory. In those tests 
 which require the pupil to answer questions from memory or to 
 
 
TABLE XI. CORRELATION OF COMPREHENSION WITH MEMORY
 
 Test 
 
 Brown I Quantity 
 
 Brown II Quantity 
 
 Brown I Quality 
 
 Brown II Quality 
 
 Starch I Ideas 
 
 Starch II Ideas 
 
 Starch I Words 
 
 Starch II Words 
 
 Reproduction I Questions 
 Reproduction II Questions 
 
 Reproduction I Ideas 
 
 Reproduction II Ideas 
 
 Reproduction I Words 
 
 Reproduction II Words. . . , 
 
 Monroe I 
 
 Monroe II 
 
 Monroe III 
 
 Maximum 
 
 Minimum 
 
 Average 
 
 Grade IV 
 
 Ideas 
 
 •32 
 
 .27 
 
 .36 
 • 19 
 
 II 
 
 29 
 
 Words 
 
 ■ 39 
 
 .23 
 
 .36 
 • 14 
 
 II 
 
 28 
 
 Grade VII 
 
 Ideas 
 
 II 
 
 Words 
 
 II 
 
 •31 
 
 •25 
 
 ■47 
 •34 
 
 .26 
 .20 
 
 .36 
 •35 
 
 ■33 
 ■39 
 
 •35 
 •24 
 .26 
 
 ■47 
 .20 
 
 • 32 
 
 TABLE XII. CORRECTED COEFFICIENTS OF CORRELATION OF 
 COMPREHENSION WITH MEMORY 
 
 Test 
 
 Grade IV 
 
 Ideas 
 
 Words 
 
 Grade VII 
 
 Ideas 
 
 Words 
 
 Brown Quantity 
 
 Brown Quality 
 
 Starch Ideas 
 
 Starch Words 
 
 Reproduction Questions 
 Reproduction Ideas. . . . 
 Reproduction Words. . . 
 
 Monroe I-II 
 
 Monroe I-III 
 
 Monroe II-III 
 
 .67 
 .68 
 
 .66 
 ■54 
 
 
reproduce the passage read, it would seem that a pupil's ability to 
 remember would materially affect his comprehension score. In 
 order to ascertain the extent to which ability to remember does affect 
 the comprehension score yielded by such tests, the pupils were given 
 the memory test^^ described on page 7. In this test a selection was 
 read to the pupils and they were asked to reproduce the story from 
 memory. The coefficients of correlation between the memory 
 scores and the comprehension scores for silent reading tests are given 
 in Table XI. It is significant that none of these coefficients are 
 large. The first three tests listed in this table require the pupil to 
 give his pe rformances from memory. Monroe's Standardized Silent 
 Reading Tests do not appear to make any considerable demand 
 upon the pupil's memory; he has the passage before him and can read 
 it and re-read it if he desires. If any memory is involved it is im- 
 mediate in character. It is significant that the coefficients of cor- 
 relation for this test closely approximate those for other tests. 
 
 Corrected coefficients of correlation. The measures yielded 
 by these tests involve variable errors. It has been shown in our 
 consideration of the reliability of these tests that these errors are 
 relatively large for the reproduction tests. The presence of these 
 variable errors tends to reduce the coefficients of correlation, and it 
 is possible that the coefficients of correlation given in Table XI do 
 not represent the true relation between comprehension and memory. 
 
 When two forms of both tests have been given to the same pupils
 it is possible to compute a corrected coefficient of correlation which
 is free from the effect of the variable errors of measurement. This
 has been done by means of the following formula:†

 $$r_{pq} = \frac{\sqrt{(r_{p_1 q_2})(r_{p_2 q_1})}}{\sqrt{(r_{p_1 p_2})(r_{q_1 q_2})}}$$

 $r_{pq}$ here indicates the true correlation between two series of
 measures, $p$ and $q$, of the facts A and B.
 $p_1$ and $p_2$ are two independent measures of A.
 $q_1$ and $q_2$ are two independent measures of B.
 $r_{p_1 q_2}$ is the correlation obtained from the first measure of A
 and the second measure of B.
 $r_{p_2 q_1}$ is the correlation obtained from the second measure of A
 and the first measure of B.
 $r_{p_1 p_2}$ is the correlation between the two measures of A.
 $r_{q_1 q_2}$ is the correlation between the two measures of B.

 *It is assumed that this test measures ability to remember.

 †Thorndike, E. L. "An Introduction to Mental and Social Measurements."
 New York. Teachers College, Columbia University, 1916. Page 179.
 
 In applying this formula the factors of the numerator are obtained
 from Table XI. For example, in calculating the corrected
 coefficient of correlation for Brown's Silent Reading Test with
 memory, $r_{p_1 q_2}$ is the coefficient of correlation of Brown I with
 Memory II. This is given as .21. The coefficient of correlation of
 Brown II with Memory I is $r_{p_2 q_1}$. This is given as .27. The
 factors of the denominator are the reliability coefficients of the two
 tests. These are to be found in Table VIII. They are .36 for Brown's
 Silent Reading Tests and .35 for the Memory Tests. Substituting these
 values in the formula,

 $$r_{pq} = \frac{\sqrt{.21 \times .27}}{\sqrt{.36 \times .35}} = \sqrt{.45} = .67$$

 This is the first entry of the first column of Table XII.
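
 The substitution above can be checked with a short computation. The
 following is a minimal sketch in Python and is not part of the original
 study; the function name `corrected_correlation` and its parameter names
 are illustrative assumptions chosen to mirror the symbols of the formula.
 The four input values are those quoted in the text for Brown's Silent
 Reading Test and the Memory Test in the fourth grade.

```python
import math


def corrected_correlation(r_p1q2, r_p2q1, r_p1p2, r_q1q2):
    """Corrected coefficient of correlation between traits A and B.

    r_p1q2: correlation of the first measure of A with the second measure of B
    r_p2q1: correlation of the second measure of A with the first measure of B
    r_p1p2: reliability coefficient of A (correlation of its two measures)
    r_q1q2: reliability coefficient of B (correlation of its two measures)
    """
    numerator = math.sqrt(r_p1q2 * r_p2q1)
    denominator = math.sqrt(r_p1p2 * r_q1q2)
    return numerator / denominator


# Values quoted in the text: Brown I with Memory II, Brown II with Memory I,
# and the reliability coefficients of the Brown and Memory tests.
brown_with_memory = corrected_correlation(0.21, 0.27, 0.36, 0.35)
print(round(brown_with_memory, 2))  # 0.67
```

 Rounded to two places, the result agrees with the value of .67 obtained
 by hand.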
 
 A study of the corrected coefficients given in Table XII indi- 
 cates that, in the case of the Experimental Reproduction Tests in 
 the fourth grade, the correlation between Memory and the scores 
 based upon the pupil's reproduction is very high. For ideas it is 
 .97. For words it is .88. For Brown's Silent Reading Tests the 
 correlation is not as high. In fact, it closely approximates that for 
 Monroe's Standardized Silent Reading Tests. In the seventh grade 
 the correlation of Memory with Monroe's Standardized Silent Read- 
 ing Tests is higher than that for either Starch or the Experimental 
 Reproduction Tests, although the difference is not marked in the 
 case of the latter. It, therefore, appears that in the seventh grade 
 memory is not a major factor in determining the comprehension 
 scores of tests which require reproduction unless it is also the de- 
 termining factor in the case of tests which do not appear to involve 
 memory. The statement which has been made with reference to
 reproduction tests, that they measure the ability to read and
 remember, does not appear to be justified by the facts which are
 presented here.
 
 Correlation of comprehension with vocabulary. In Table XIII, we
 give the coefficients of correlation between the comprehension scores
 and the scores obtained from the vocabulary test. In the fourth
 grade most of the coefficients are negative, but all of them cluster
 closely around zero. This means that, measured by the tests used,
 there is no relation between a pupil's vocabulary and his ability to
 read. It is, of course, obvious that, in order to read, a pupil must
 be acquainted with words. It is, therefore, impossible to believe
 that vocabulary is not a factor in the reading process. The facts
 presented here probably mean that, in the fourth grade, vocabulary
 is not a determining factor and the pupil's ability to read depends
 primarily upon abilities other than the extent of his acquaintance
 with words. In the seventh grade the coefficients are all positive
 but none of them are large. This probably means that, in the
 seventh grade, vocabulary is a minor factor in determining the pupil's
 comprehension. It is, of course, possible that the vocabulary test
 used does not measure the extent of a pupil's acquaintance with
 words.

 TABLE XIII. COEFFICIENTS OF CORRELATION BETWEEN VOCABULARY AND
 COMPREHENSION

 Test

 Monroe I
 Monroe II
 Monroe III
 Courtis I Index
 Courtis I No. of Questions
 Courtis I Questions Correct
 Courtis III Index
 Courtis III No. of Questions
 Courtis III Questions Correct
 Starch I Words
 Starch I Ideas
 Starch II Words
 Starch II Ideas
 Brown I Quantity
 Brown I Quality
 Brown I Average
 Brown I Words
 Brown I Ideas
 Brown II Quantity
 Brown II Quality
 Brown II Average
 Brown II Words
 Brown II Ideas
 Reproduction I Questions
 Reproduction I Ideas
 Reproduction I Words
 Reproduction II Questions
 Reproduction II Ideas
 Reproduction II Words
 Cross-Out I C-W
 Cross-Out I (C-W)/(C+O)
 Cross-Out II C-W
 Cross-Out II (C-W)/(C+O)
 Fordyce
 Pressey I
 Pressey II
 Composite AI
 Composite AII
 Composite BI
 Composite BII
 Composite CI
 Composite CII
 Composite I
 Composite II

 Grade IV Grade VII

 .02

 -.03

 -.02

 -.20

 .19
 .10

 -.20
 .06
 .15

 -.11
 -.12

 .14
 .01

 -.04

 -.23
 -.21
 -.16

 -.15
 -.19

 -.15
 -.10
 -.09
 .12
 -.04
 -.04

 -.07
 -.05

 .09
 .02

 .04

 .22
 .22
 .13

 .31
 .31
 .29
 .22

 .14
 .17
 .13
 .19
 .24
 .26

 .18
 .08

 .16

 .01

 .13

 —

 .21

 .00

 -.02

 .23

 .01

 .20

 -.08

 .32

 -.20

 .28

 -.05

 .26

 -.13

 .25

 .30

 .21
 
 Correlation of cancellation scores with measures of rate of reading. 
 In Table XIV, the coefficients of correlation for the scores yielded 
 by the Cancellation Test with measures of rate of silent reading are 
 given. With few exceptions, these coefficients are positive but small.
 In general, they are slightly smaller in the seventh grade than in 
 the fourth grade. In most cases, there does not seem to be any 
 marked relationship between ability to do the Cancellation Test and 
 the rate of silent reading. One might expect a distinct positive re- 
 lationship between the Cross-Out Silent Reading Tests and the 
 Pressey Silent Reading Tests. It does, however, appear that the 
 relationship which exists with respect to these tests is greater than 
 that which exists for Monroe's Silent Reading Tests. 
 
 The table also includes coefficients of correlation for the scores 
 yielded by the Cancellation Test with the comprehension scores 
 yielded by the Cross-Out Tests. The coefficients are, likewise, small, 
 two of them being slightly negative. It appears, therefore, that the 
 ability to strike out letters from words is not related to the ability 
 called for by the Cross-Out Tests. 
 
 TABLE XIV. CORRELATION OF CANCELLATION SCORES WITH MEASURES
 OF RATE OF READING AND WITH THE CROSS-OUT TESTS

 Test

 Grade IV: Cancellation* I, II
 Grade VII: Cancellation* I, II

 Monroe I
 Monroe II
 Monroe III
 Courtis I
 Courtis III
 Brown I
 Brown II
 Starch I
 Starch II
 Reproduction I
 Reproduction II
 Cross-Out I
 Cross-Out II
 Fordyce, No. of Words
 Pressey I
 Pressey II
 Cross-Out I C-W
 Cross-Out I (C-W)/(C+O)
 Cross-Out II C-W
 Cross-Out II (C-W)/(C+O)

 .28
 .26
 .23

 .12

 .20

 .25
 .07

 .22
 .13

 .20
 .23

 .30

 .08
 .02

 .17
 .14

 .14

 .15

 .20

 .15

 .14

 .16

 .08

 .13

 .15
 .13

 .07
 .10

 .03

 .20
 .18
 .22

 .01
 .03
 .06

 .03

 .06

 -.01

 .15

 .03

 .05
 .10

 .25
 .22

 .18
 .14

 .25
 .33

 .08

 .11

 .03
 .06

 .11

 .15

 .21

 .18

 .11

 .05

 .16

 .17

 .11

 -.01

 *In Cancellation Test I, the words containing both "a" and "t" were marked;
 in Test II, those containing both "e" and "r."

 Correlation of comprehension with written composition. Another
 measure of a pupil's vocabulary is secured from his written
 composition. The pupils in the seventh grade were asked to write a
 composition on an exciting experience. (See page 10.) In Table
 XV, we give the coefficients of correlation between measures of
 comprehension and two measures of these written compositions, the
 number of words written and the story value. The number of words
 which a pupil writes in such an exercise is, undoubtedly, an index
 of his writing vocabulary. It is, of course, possible that his writing
 vocabulary and his reading vocabulary are not closely related. The
 coefficients of correlation, in Table XV, show that there is little or
 no relation existing between measures of comprehension and the
 number of words which were written in these compositions. Even in
 the case of comprehension scores based upon the number of words
 and the number of ideas contained in reproductions, the coefficients
 of correlation fail to indicate the existence of any marked
 relationship. In fact, the coefficients of correlation for measures of
 comprehension gained through reproduction are lower, in most cases,
 than the coefficients of correlation of the number of words written
 with the comprehension scores derived from Monroe's Standardized
 Silent Reading Tests.
 
 
 A higher degree of correlation is indicated between the "story
 value" and the measures of comprehension. Some of the coefficients 
 of correlation are sufficiently large to indicate a distinct positive re- 
 lationship between these two traits. It is not unlikely that this re- 
 lationship can be explained in terms of a common general factor, 
 such as general intelligence. 
 
 TABLE XV. CORRELATION OF COMPREHENSION WITH WRITTEN
 COMPOSITION, SEVENTH GRADE, 90 PUPILS

 Test

 Columns: Number of words written; Story value

 Monroe I
 Monroe II
 Monroe III
 Starch I Words
 Starch I Ideas
 Starch II Words
 Starch II Ideas
 Reproduction I Questions
 Reproduction I Ideas
 Reproduction I Words
 Reproduction II Questions
 Reproduction II Ideas
 Reproduction II Words
 Cross-Out I C-W
 Cross-Out I (C-W)/(C+O)
 Cross-Out II C-W
 Cross-Out II (C-W)/(C+O)
 Fordyce Percent
 Pressey I
 Pressey II

 .18
 .24

 .29
 .33
 .31

 .10
 .07
 .14
 .09

 .31

 .28

 .36
 .33

 .12

 .11
 .22

 .24

 .14

 .18

 -.07
 .26
 .28

 .11

 .37
 .43

 .13

 .23

 .09

 .11

 .16

 .06

 .04

 .11

 .12

 .12

 .10
 .05

 .29
 .18

 Inter-correlation between tests. Since in each grade all of the
 tests were given to the same pupils, it is possible to calculate the
 coefficients of correlation between scores yielded by the different
 tests. These are given in the appendix. The magnitude of the
 coefficients of correlation is influenced by the reliability of the scores
 and, therefore, does not truthfully reflect the relationship which
 exists between the scores yielded by the different tests. In order
 to secure more accurate indices of the relationship existing between
 traits measured by the different tests, the corrected coefficients of
 correlation have been calculated by means of the formula given on
 page 41. Since the factors of both numerator and denominator of the
 formula are square roots, it is impossible to calculate corrected
 coefficients when one of the raw coefficients is negative. This
 accounts for the fact that certain corrected coefficients are not given
 in Tables XVI and XVII. It will be noted in these tables that,
 occasionally, a coefficient greater than 1.00 is given. This is due
 to chance errors in the raw coefficients of correlation which, in turn,
 are due to the fact that a sample of the total population was used in
 calculating them. The corrected coefficients are, in general, larger
 than the corresponding raw coefficients.
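
 The two peculiarities just described, the omission of a corrected
 coefficient when a raw coefficient is negative and the occasional value
 greater than 1.00, can be illustrated with a brief sketch in Python.
 This sketch is not part of the original study. The function name
 `corrected_or_none` is a hypothetical helper, and the input values below
 are invented for illustration only; they are not taken from any table
 in this bulletin.

```python
import math
from typing import Optional


def corrected_or_none(r_p1q2: float, r_p2q1: float,
                      r_p1p2: float, r_q1q2: float) -> Optional[float]:
    """Return the corrected coefficient, or None when it is not computed.

    Following the rule stated in the text, no corrected coefficient is
    given when one of the raw coefficients is negative. A reliability
    coefficient of zero is also rejected to avoid division by zero.
    """
    if min(r_p1q2, r_p2q1, r_p1p2, r_q1q2) < 0:
        return None
    if r_p1p2 == 0 or r_q1q2 == 0:
        return None
    return math.sqrt(r_p1q2 * r_p2q1) / math.sqrt(r_p1p2 * r_q1q2)


# Hypothetical case 1: one raw coefficient is negative, so no entry is given.
print(corrected_or_none(-0.05, 0.20, 0.40, 0.50))  # None

# Hypothetical case 2: raw correlations that are large relative to the
# reliability coefficients yield a corrected value above 1.00.
print(round(corrected_or_none(0.50, 0.50, 0.40, 0.50), 2))  # 1.12
```

 The second case shows how chance errors that make the raw coefficients
 a little too large, or the reliability coefficients a little too small,
 are enough to carry the corrected coefficient beyond 1.00.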
 
 Table XVI gives the corrected coefficients for the comprehen- 
 sion scores. A significant characteristic of this table is the variation 
 in the degree of intercorrelation between the tests. For example, 
 Monroe's Standardized Silent Reading Test I correlates very highly 
 with the number of questions answered correctly on the Courtis 
 Silent Reading Test, No. 2. It correlates less highly with the other 
 two scores of this test. The degree of its correlation with the other 
 tests is moderately low. It is significant that the corrected coeffi- 
 cients of correlation between the two tests requiring reproduction 
 are not higher. For example, the highest coefficient of correlation 
 between Brown's test and the Experimental Reproduction Test I 
 is .79. The lowest is .26. The corrected coefficient of correlation
 between the scores obtained by the word-counting method is .33; 
 for the idea-counting method the coefficient of correlation is .62. 
 The highest correlation between Brown's test and the Experimental 
 Reproduction Test I is for the number of questions answered cor- 
 rectly. In the seventh grade, the corrected coefficients of 
 correlation between the question scores yielded by the 
 Experimental Reproduction Test II and Starch's Silent Reading 
 Test are as high as those obtained from the reproductions. Both 
 Starch's test and the Experimental Reproduction Test correlate 
 nearly as highly with Monroe's Standardized Silent Reading Test 
 as with each other. A number of the coefficients of correlation for 
 the Cross-Out Test are relatively high. It correlates most highly 
 with Monroe's Standardized Silent Reading Test. In general, the
 coefficients are higher for the scores obtained by C - W than for
 (C - W)/(C + O). The former is probably the better plan of scoring.
 
 Table XVI appears to bear out the usual assumption that different
 silent reading tests measure different phases of silent reading
 ability. It is very obvious, in a number of cases, that the same
 traits are not measured by different tests. However, it should be
 noted that these differences exist for tests that are similar in
 structure as well as for tests which possess marked differences in
 structure. In fact, the variations in these corrected coefficients of
 correlation are so erratic that one is inclined to be skeptical of any
 conclusions which may be drawn from them with reference to the
 functions of the different tests.

 [TABLE XVI. Corrected coefficients of correlation, comprehension
 scores. Column headings: II-I, III-I, III-II; Index of Comp., No. of
 Questions, Questions Correct; Quantity, Quality, Average, Words, Ideas,
 Questions; C-W, (C-W)/(C+O). Entries not recoverable.]

 [TABLE XVII. Corrected coefficients of correlation, rate scores.
 Column headings: Grade IV Monroe (I-II, I-III, II-III); Grade VII
 Monroe (I-II, I-III, II-III); Grade IV Courtis; Grade IV Brown;
 Grade VII Starch; Grade IV Reprod.; Grade VII Reprod.; Grade IV
 Cross-Out; Grade VII Cross-Out; Grade VII Pressey; Grade IV Comp.;
 Grade VII Comp. Entries not recoverable.]
 
 The corrected coefficients for the rate scores are given in Table 
 XVII. These are, in general, higher than those for comprehension. 
 In general, the correlation between tests in which the pupil reads 
 continuously is higher than between one test in which the pupil 
 reads continuously and another in which his reading is not contin- 
 uous. However, the correlation between Monroe's Standardized 
 Silent Reading Test I and the Cross-Out Test, in the fourth grade, 
 is as high as that for any of the other tests. The fact that some of 
 the tests were too short and failed to discriminate between a consid- 
 erable number of pupils probably accounts for the fact that a num- 
 ber of coefficients of correlation are not higher. An examination of 
 this table indicates that the rate score secured by means of Monroe's 
 Standardized Silent Reading Tests is a true measure of the pupil's 
 rate of reading. 
 
 Correlation of single tests with composites. In Tables XVI and 
 XVII, the corrected coefficients of correlation for each test with cer- 
 tain composite scores are given. These, in general, are larger than 
 the coefficients of correlation between single tests. In the fourth 
 grade, composite A for comprehension is the average of Monroe, 
 comprehension, Courtis, answers correct, and Reproduction, answers 
 to questions. In the seventh grade, the Courtis test was not given 
 and this composite includes only the other two tests. Composite B 
 for comprehension is the average of the comprehension scores de- 
 rived from reproductions. In the case of Brown's Silent Reading 
 Tests, both quality and quantity are used. In the other cases, the 
 scores obtained by both the idea-counting method and the word- 
 counting method are used. Composite C is the average of composite 
 A and composite B. The general composite is formed by combining 
 all of the scores obtained. 
 
 Monroe's Standardized Silent Reading Tests are shown to cor- 
 relate very highly with composite A. The correlation with com- 
 posite B is very much less, as might be expected. The rate scores 
 derived from this test also correlate very highly with the general 
 composite scores. In fact, with the exception of Pressey's test, the 
 correlation of single tests with the composite scores is very high. It 
 appears, therefore, that each of the tests yields rate scores which
 may be accepted as correlating very highly with the true rate of
 silent reading. The scores derived from the Experimental Repro- 
 duction Tests in the fourth grade correlate more highly with com- 
 posite B than those derived from Brown's Silent Reading Test. In 
 the seventh grade, the correlations between Starch's test and com- 
 posite B are slightly higher than those for the Experimental Repro- 
 duction Tests. It appears, however, that the Experimental Repro- 
 duction Tests yield approximately as valid measurements of ability 
 to comprehend as are secured by means of the other tests which, 
 presumably, have been devised with greater care. 
 
 SUMMARY OF CONCLUSIONS. 
 
 1. The scoring of reproductions is so highly subjective that a 
 silent reading test requiring reproduction of material read cannot be 
 considered satisfactory. 
 
 2. Brown's Silent Reading Test is very unreliable for both 
 comprehension and rate. This is true, even when the average of 
 two independent scores is used as a measure of comprehension. 
 
 3. The correlation between scores yielded by the memory 
 test and comprehension scores based upon reproductions is only 
 slightly higher than that existing between the scores derived from 
 the memory test and the comprehension scores yielded by Monroe's 
 Standardized Silent Reading Test. This makes doubtful the usual 
 assumption that measures of comprehension based upon reproduc- 
 tions are affected by the pupil's ability to remember. 
 
 4. Correlation between extent of vocabulary and ability to read 
 is surprisingly low. There is little, if any, relation between these 
 two abilities. 
 
 5. The intercorrelations between tests indicate that different 
 tests measure slightly different traits; but it is surprising to find, in 
 a few instances, a high degree of correlation existing between scores 
 yielded by tests which exhibit marked differences in structure. 
 
 6. There appears to be a higher degree of correlation between 
 the story value of written compositions and comprehension than 
 between the number of words written and the measures of compre- 
 hension. This is true even when the measures of comprehension 
 are based upon reproductions and the reproductions are described 
 in terms of the number of words or number of ideas reproduced. 
 
 7. In the measurement of rate of silent reading, the Courtis 
 Silent Reading Test No. 2, is shown to have the highest degree 
 of reliability. Monroe's Standardized Silent Reading Tests, which
 were intended to yield only very crude measures of rate of silent
 reading, are shown to be among the most reliable tests. 
 
 8. In measuring comprehension, the Courtis Silent Reading 
 Test, No. 2, is the most reliable.
 
 9. The coefficient of reliability is shown not to be a satisfactory 
 measure of reliability. 
 
 10. Comparisons with teachers' ratings indicate that, in the 
 fourth grade, teachers tend to judge silent reading ability on the 
 basis of the pupil's ability to answer questions. In the seventh grade, 
 teachers give greater weight to the pupil's ability to reproduce or
 tell what he has read.
 
 Correlation with composites. In Tables XVI and XVII, the 
 corrected coefficients of correlation of each test with the composite 
 scores are given. These, in general, are larger than the correlations 
 between single tests. Monroe's Standardized Silent Reading Test 
 correlates very highly with composite A. This means that this test, 
 which is very simple to administer, yields measures of essentially 
 the same traits as are secured by means of this composite, which 
 in the fourth grade involves three scores and in the seventh, two 
 scores. The correlation with composite C and with the general
 composite is also high. In fact, with the partial exception of Starch's
 Test, no other correlations with these two composites are as high as
 those of the Monroe Silent Reading Tests. It, therefore, appears, as judged
 by composite scores, that this test yields measures of comprehen- 
 sion which agree more closely with the composite measures secured 
 from this group of tests than any other single test. The correla- 
 tions for rate are also high. 
 
 
 
 
 BULLETINS OF THE BUREAU OF EDUCATIONAL RE- 
 SEARCH, COLLEGE OF EDUCATION, UNIVERSITY 
 OF ILLINOIS, URBANA, ILLINOIS. 
 
 Price. 
 
 No. I. Buckingham, B. R. Bureau of Educational Research, 
 
 Announcement, 1918-19 15 
 
 No. 2. First Annual Report 25 
 
 No. 3. Bamesberger, Velda C. Standard Requirements for 
 
 Memorizing Literary Material 50 
 
 No. 4. Holley, Charles E. Mental Tests for School Use. 
 
 (Out of print) 50 
 
 No. 5. Monroe, Walter S. Report of Division of Educational 
 
 Tests for 1919-20 25 
 
 No. 6. Monroe, Walter S. The Illinois Examination 50 
 
 No. 7. Monroe, Walter S. Types of Learning Required of 
 Pupils in the Seventh and Eighth Grades and in the 
 High School 15 
 
 No. 8. Monroe, Walter S. A Critical Study of Certain Silent 
 
 Reading Tests 50 
 
 No. 9. Monroe, Walter S. Written Examinations and Their 
 
 Improvement. (In preparation) 50 
 