Bulletin No. 58

BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION

A Critical Summary of Research Relating to the Teaching of Arithmetic

By
Walter S. Monroe
Director, Bureau of Educational Research
and
Max D. Engelhart
Assistant, Bureau of Educational Research

Published by the University of Illinois, Urbana
1931

PREFACE

In view of the fact that several summaries of research in the field of arithmetic are available, the preparation and publication of another one requires justification. As the title indicates, the present summary is restricted to research relating to methods of learning and teaching arithmetic. A more significant characteristic is the attempt to effect a systematic and critical evaluation of the researches summarized. There have been several assertions that a considerable portion of the research reported during recent years is faulty, and a few studies have been criticized by writers in educational periodicals. In the preparation of summaries of research, however, there has been very little evaluation of the studies included.
Although the authors of this bulletin have recognized certain specified criteria in their evaluation, the judgments are largely subjective, and, consequently, the conclusions relative to the dependable findings concerning the teaching of arithmetic may not be entirely valid. It is hoped, however, that the publication of this bulletin will contribute to a more adequate understanding of what a critical summary involves.

Controlled experimentation has been hailed as a means of securing dependable evaluations of all factors of the teaching process. Careful study, however, indicates certain significant difficulties, and it is hoped that the discussion in the final chapter of this bulletin will contribute to a saner understanding of experimentation. The expenditures required for certain types of studies do not appear to represent wise investments, and those who are interested in educational research should give careful attention to the probable dependability of the outcomes of the studies they undertake or sponsor.

TABLE OF CONTENTS

Chapter I. Introduction
Chapter II. Methods of Teaching and Learning the Fundamentals
Chapter III. Drill in the Fundamentals
Chapter IV. Methods of Teaching Pupils to Solve Verbal Problems
Chapter V. Methods of Diagnosis and Remedial Treatment
Chapter VI. Methods of Teaching Reading of Arithmetical Subject-Matter
Chapter VII. Motivation of Learning in Arithmetic
Chapter VIII. General Summary and Conclusion
Chapter IX. Bibliography of Investigations Pertaining to Methods of Teaching Arithmetic

CHAPTER I
INTRODUCTION

General purpose of this bulletin. The general purpose of this bulletin is to present a summary and an evaluation of the research relating to instructional methods employed in teaching arithmetic in Grades I to VIII.
For each group of investigations the discussion appears under three heads: (1) summary of reported conclusions, (2) evaluation of experiments, (3) justified conclusions.

Sources of references to investigations. The sources of practically all of the references were the "Summary of Educational Investigations Relating to Arithmetic" of Buswell and Judd¹ and the annual supplements prepared by Buswell.² An investigation of Brownell on the techniques employed in research on arithmetic was of service in locating in the above summaries investigations of the types desired.³ The writers were able to include in the present summary some references not given in the sources cited above.

¹ Buswell, G. T. and Judd, C. H. "Summary of Educational Investigations Relating to Arithmetic," Supplementary Educational Monographs No. 27. Chicago: University of Chicago Press, 1925. 212 p.
² These supplements are published in the Elementary School Journal, as for example: Buswell, G. T. "Summary of Arithmetic Investigations, 1928," Elementary School Journal, 29:691-8, 737-47; May, June, 1929.
³ Brownell, W. A. "The Techniques of Research Employed in Arithmetic," Twenty-Ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930, p. 415-443.

General types of research included. Most of the investigations included in this summary may be characterized as experiments. Many of these experiments are of the single-group type, and as such may be labeled "uncontrolled" experiments. In investigations of this kind, the experimenter subjects a single group of pupils to the method or procedure which he wishes to try out, and estimates by observation or by administering tests the improvement in achievement assumed to be due to the new method or procedure. Where the gains in achievement are large, the new method may, with some justification, be claimed effective, but it is evident that usually an unknown amount of the gain is due to the operation of other factors. Investigations of this kind are termed "experiments," even though uncontrolled factors are operative, because they possess one important characteristic of all experimentation: that of trying something out to see what happens.

A number of the experiments referred to in this summary are of the controlled type. In place of a single group, two or more equivalent groups of pupils are used. In the typical controlled experiment, the two groups of pupils are equated with respect to intelligence or achievement test scores, or both; hence, they are considered potentially equivalent with respect to the planned instruction. These groups are subjected to instruction differing only with respect to the experimental factor. For example, one of the groups is taught to add in the upward direction, while the other group is taught to add in the downward direction. After a period of such instruction, in which attempts are made to prevent irrelevant factors from operating unequally on the two groups, the final achievement test is given. The difference in final-test scores, or in mean gains in achievement from initial to final tests, is then computed, and interpretations are made with respect to the relative superiority of the one method or of the other.

Several laboratory experiments have been included in this summary. In these investigations, laboratory apparatus, such as that used in recording eye-movements, is used to secure an understanding of the characteristics of arithmetical learning activity. Some of the investigations are of the type in which data are collected by means of a single administration of a test. In a few places in this summary, relevant "case studies" are cited. Previous summaries of research in the field of arithmetic have occasionally been used to supplement the judgments of the writers with respect to the original research.
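The controlled-experiment procedure described above (equate two groups on an initial test, instruct them differently, then compare mean gains) can be made concrete with a short Python sketch. It is illustrative only: the function names, the rank-and-alternate (A-B-B-A) rule used here to equate the groups, and all of the scores are hypothetical assumptions, not procedures or data taken from any study summarized in this bulletin.

```python
from statistics import mean


def equate_by_ranking(pupils):
    """Split (name, initial_score) pairs into two groups.

    Hypothetical equating rule: rank pupils on the initial test and assign
    them in an A-B-B-A pattern so that the group means come out nearly equal.
    """
    ranked = sorted(pupils, key=lambda pupil: pupil[1], reverse=True)
    group_a, group_b = [], []
    for position, pupil in enumerate(ranked):
        # Positions 0, 3, 4, 7, 8, ... go to A; 1, 2, 5, 6, ... go to B.
        if position % 4 in (0, 3):
            group_a.append(pupil)
        else:
            group_b.append(pupil)
    return group_a, group_b


def mean_gain(initial_scores, final_scores):
    """Mean gain: final-test mean minus initial-test mean."""
    return mean(final_scores) - mean(initial_scores)


if __name__ == "__main__":
    # Hypothetical scores for four pupils in each group.
    gain_up = mean_gain([20, 25, 30, 35], [28, 33, 37, 44])    # taught to add upward
    gain_down = mean_gain([21, 24, 31, 34], [26, 30, 36, 40])  # taught to add downward
    print(f"upward gain {gain_up:.1f}, downward gain {gain_down:.1f}, "
          f"difference {gain_up - gain_down:.1f}")
```

With these made-up scores the script prints a difference in mean gains of 2.5 points. Such a difference is only an index; the criteria that follow determine how far it may be trusted.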
It should be mentioned that investigations of the nature of pupil responses, as, for example, the researches on the relative difficulty of the number combinations, have been excluded from this study. The same is true of analyses of arithmetic texts and practice materials. Research of this kind, however important, is, in the judgment of the writers, more relevant to the problems of the arithmetical curriculum than to problems of methods of teaching arithmetic.

Criteria recognized in the evaluation of investigations. Evaluation of experiments is largely a subjective matter, but the utilization of specified criteria will tend to make it more dependable. A critical reader may apply these same criteria to the experiments evaluated in this summary and determine, to his own satisfaction at least, whether or not the evaluations of the present writers are justified.

1. Definition and restriction of the experimental factor. In experimental investigations of methods of teaching, the ideal procedure is to vary one of the factors that affect pupil achievement while all others are kept constant. The factor that is varied is designated as "experimental," and, obviously, it must be defined in specific terms. Otherwise the basis of the experimentation cannot be definitely known. For example, if the method of instruction is the experimental factor and is designated merely as "the project method versus the traditional method," the precise nature of the variation is not clear. Usually the experimental factor must be restricted to a single phase or detail of method. If it is complex, the experimenter cannot know which element of the method produced the observed effect in the pupil achievement. Hence, the factor that is being made the basis of experimentation must be defined and restricted in such a way that the results may be interpreted in definite terms.

2. Control of pupil factors.
Variation in the experimental factor is secured by employing two or more groups of pupils and maintaining a specified status of this factor for each of the groups. For example, if the type of drill exercises on addition of integers is the experimental factor, one type is used with Group A, a second type with Group B, a third type with Group C, and so on. Since achievement is influenced by the capacity of the pupils to learn, by their previous school experience, by their interest in the field of learning, and the like, it is obviously necessary that all significant pupil factors be controlled. This control is usually secured by forming groups that are equivalent with respect to all significant pupil factors. Hence, unless some other means of control is effected, the degree of equivalence of the groups is a criterion of the dependability of the results of the experiment.

3. Control of important non-experimental factors. The achievement of pupils is affected by several factors. The more important ones appear to be the following:

(1) Instructional techniques
(2) Skill of the teacher in using the instructional techniques
(3) Zeal of the teacher
(4) Personality traits of the teacher
(5) Instructional materials
(6) Time spent in learning activity

The significance of these factors varies with the character of the achievement, but usually none of them should be neglected. The skill and the zeal of the teacher appear to be more significant than is commonly realized. Control of these factors may be attained by securing equivalence, or by determining the effect of variation and making appropriate allowance for this effect in interpreting the results.

4. Accuracy and validity of measures of differences in achievement.
An index of the relative effectiveness of two methods of instruction or of two types of instructional materials is obtained by computing the difference between the means of the scores on the test administered at the close of the experiment, or, preferably, between the mean gains in achievement, obtained by subtracting the initial-test means from the final-test means. The obtained difference is affected by the variable and systematic errors of measurement. It is possible, if the coefficients of reliability of the tests used are known, to make appropriate allowances for variable errors of measurement. If the test is administered to both groups under approximately the same conditions, any systematic errors of measurement, while they may raise or lower both means similarly, will not influence the difference of the means to a significant extent. It should be noted, also, that fluctuations of testing conditions tending to create systematic errors in certain groups of scores will tend to produce variable errors when several groups are combined. Hence, when the number of pupils is large, the systematic errors are likely to be less significant than when the group of pupils is small. It should be emphasized in this connection, however, that, when the groups of pupils and the obtained differences in achievement are relatively small, the systematic and variable errors of measurement are not likely to be of negligible significance. It is, therefore, essential that adequate recognition be given to their possible or probable influence. The probable effect of systematic errors cannot be calculated by any formula, and for this reason they are the more difficult to deal with.

The problem of an experiment usually specifies or implies the nature of the achievement on which the evaluation of the experimental factor is to be based.
Hence, it is necessary to consider the extent to which the instruments used actually measure the specified or implied pupil achievements. This may not be the same as the usual validity of the test, because in this case one is concerned only with the extent to which the test measures the achievement designated in its specified or implied function. It is possible that a test may be more valid with respect to the instructional methods or materials of one group than of the other. For example, a test consisting of addition and subtraction examples in a mixed order would be more valid for a group that had been taught addition and subtraction together than it would be for a group that had been taught these processes separately. A test may also be valid with respect to the measurement of the more specific abilities engendered in arithmetic and yet be quite invalid with respect to such general outcomes as attitudes, ideals, and interests. If the achievement of one of the groups includes such outcomes, the differences in achievement obtained will contain errors of validity. The effect of invalidity is to introduce additional variable errors, and, as in the case of the variable errors of measurement, the effect tends to become negligible when the groups are large. However, the validity of the test used should not be neglected when interpreting smaller differences in gains.

5. Justification of generalization. If the preceding criteria have been satisfied, the conclusions reported may be accepted as dependable with respect to the pupils participating in the experiment. If, however, the investigator wishes to generalize, his data must satisfy an additional criterion: they must be representative of the larger population to which the generalization is to be applied.
If the sample of pupils used in the experiment was random, the investigator is justified in using the standard, or probable, error of sampling as an index of the representativeness of his groups. If, on the other hand, the sample was not random, the investigator must use other means to show the extent to which his sample is representative. While no specific rules may be stated, the investigator should consider all of the available evidence relative to the traits of the groups concerned. For example, if he has scores of his pupils on intelligence and standardized achievement tests, he may compare the means and standard deviations of these scores with the corresponding measures of the larger population. If this comparison indicates that his sample is typical of the larger population, generalizations may be accepted with a reasonable degree of confidence. If the data do not satisfy this criterion of representativeness, the investigator should refrain from generalizing, or limit his generalizations appropriately.

The application of these criteria. In the evaluation of the studies reviewed in this bulletin, the second and third criteria are most prominent. The reader, however, should not infer that the other criteria are not important. Usually the definition and restriction of the experimental factor are obvious, and the instructional techniques applicable in the teaching of arithmetic tend to be relatively specific rather than general. Hence, a large proportion of the experiments in the field being considered satisfy this criterion.

In judging the dependability of the differences in achievement reported in the experiments summarized in this bulletin, some attention has been given to their "statistical" significance. The combined allowance to be made for variable errors of measurement and of sampling may be determined through the use of appropriate formulae.⁴ The employment of this procedure yields either the probable, or standard, error of the difference, and it is customary to recognize a difference as "statistically" significant when it is equal to, or greater than, 2.78 times its standard error, or approximately 4.1 times its probable error.⁵ When an obtained difference is 2.78 times its standard error, the chances are not less than 369 to 1 (interpreting the standard error as a limit) that the difference has the same sign, or is in the same direction, as it would have if variable errors of measurement and of sampling were eliminated. The "statistical" significance of a difference is, therefore, not very meaningful in itself, since a difference may be "statistically" significant and yet be undependable because of other limitations of the data, such as lack of equivalence, failure to control non-experimental factors, variable errors of validity, and systematic errors of measurement, validity, and sampling. It is a safe assumption that any difference not "statistically" significant in the customary usage would not be of acceptable dependability if consideration is given to all of the probable faults of the data. On the other hand, if an obtained difference is "statistically" significant, its dependability is more certain because of this, but it is by no means guaranteed. In the estimation of the dependability of differences reported in the experiments reviewed in this summary, "statistical" significance has been recognized, therefore, as but one aspect of the matter.

⁴ See pages 101 to 106.

The magnitude of possible systematic errors due to lack of equivalence, to failure to control non-experimental factors, to failure to secure comparable testing conditions in experimental and control groups, and to failure to measure the same outcomes in both groups is difficult to determine from the report of an experiment, unless the investigator explicitly refers to the matter.
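The significance rule described above reduces to a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions, not the authors' own computation: it takes the probable error as 0.6745 times the standard error (the normal-curve relation), computes the standard error of a difference from the two groups' standard errors with an optional correlation term `r` (r = 0 treats the groups as independent; for equated groups a positive r would normally be entered), and converts a critical ratio into one-sided odds from the normal curve. All function names are hypothetical.

```python
import math
from statistics import stdev

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)


def se_of_mean(scores):
    """Standard error of a mean: sample standard deviation / sqrt(N)."""
    return stdev(scores) / math.sqrt(len(scores))


def se_of_difference(se_a, se_b, r=0.0):
    """Standard error of the difference between two means.

    r is the correlation between the two means; r = 0 assumes the groups
    are independent (an assumption made for this sketch).
    """
    return math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)


def one_sided_odds(critical_ratio):
    """Odds that the true difference has the same sign as the obtained one,
    treating the critical ratio as a normal deviate."""
    tail = 0.5 * math.erfc(critical_ratio / math.sqrt(2))
    return (1 - tail) / tail


def is_significant(difference, se_diff, threshold=2.78):
    """'Statistically' significant if |difference| >= threshold x SE."""
    return abs(difference) >= threshold * se_diff


if __name__ == "__main__":
    print(f"2.78 SE = {2.78 / PE_FACTOR:.2f} PE")
    print(f"odds at 2.78 SE: about {one_sided_odds(2.78):.0f} to 1")
```

Run as a script, it reports that 2.78 standard errors equal about 4.12 probable errors, and that the one-sided odds at that point are roughly 370 to 1, in line with the figure quoted in the text. As the text stresses, passing this test accounts only for variable errors of measurement and sampling.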
Unless some unusual achievement is specified or implied, most tests designed to measure calculation skills are probably of rather high validity. They, of course, measure the current ability of pupils rather than the permanent residue of achievement. It is likely that the latter type of achievement should be considered, but few, if any, investigators have attempted to base their conclusions on it. Consequently, the present writers have not applied this more severe test in their evaluations. When the achievement to be measured includes abilities other than calculation skills, the validity of the measures is an important matter, but it is very difficult to determine the degree of validity.

⁵ Monroe, W. S. and Engelhart, M. D. "Experimental Research in Education," University of Illinois Bulletin, Vol. 27, No. 32, Bureau of Educational Research Bulletin No. 48. Urbana: University of Illinois, 1930, p. 59-76. See also: McCall, W. A. How to Measure in Education. New York: Macmillan Company, 1922, p. 404-5.

The organization of the summary. This summary of research relating to instructional methods in arithmetic has been divided into six major divisions represented by the following rubrics: (1) methods of learning and teaching the fundamentals, (2) methods of drill in the fundamentals, (3) methods of teaching pupils to solve verbal problems, (4) methods of providing diagnosis and remedial treatment, (5) methods of teaching the reading of arithmetical subject-matter, (6) methods of motivating learning activity in arithmetic. A chapter is devoted to each of these divisions.

CHAPTER II
METHODS OF TEACHING AND LEARNING THE FUNDAMENTALS

The general nature of the experimental factor. The experimental factors of the studies summarized in this chapter are essentially methods of learning or performing the fundamental operations of arithmetic.
Requesting pupils to add upward or to add downward, as the case may be, may be thought of as a method of teaching, but the essential element is the activity of the pupil. In the same way, requesting pupils to use the subtractive method of subtraction in which borrowing or decomposition is used, and directing pupils in the use of this method, may be regarded as a method of learning. The research summarized in this chapter has been classified under the following heads: (1) addition; (2) subtraction; (3) division; (4) fractions, decimals, percentage, proportion, and denominate numbers.¹

¹ The absence of multiplication from this classification should be noted. The present writers have been unable to discover any experimental investigations of methods of teaching or learning multiplication.

ADDITION

1. Summary of conclusions as reported. The relative efficiency of upward and downward addition has been studied in one experiment and in two investigations of other types. In the experiment reported by Buckingham² the group that was taught to add downward attained greater, but not significantly greater, achievement in addition. From an analysis of test results, Cole³ reported that individuals add more accurately downward but less rapidly. Buckingham⁴ has also reported the findings of a questionnaire study in which it was discovered that, while more people prefer to add upward when the column is long, they add downward when the column is short. On the basis of the logical advantages that he claims for downward addition, and because of this variation, Buckingham recommends that downward addition be taught.

Procedures for adding a column of figures have been studied in four experiments and in one investigation where an observation technique was used. Overman⁵ investigated the relative effectiveness of the following methods of teaching addition (and subtraction) of two- and three-place numbers in terms of transfer to untaught types, such as the addition of four two-place numbers, of two three-place numbers, and of one three-place number and one one-place number.

(1) In Method A the pupils were shown how to perform the process, and there was no generalization or consideration of underlying principles ....
(2) In Method B (generalization) the pupils were helped to formulate general methods of procedure from the specific types taught, and these generalizations were constantly emphasized throughout the teaching ....
(3) In Method C (rationalization) the reasons and principles underlying the specific types taught were discussed with the pupils. The formulation of general rules of procedure was avoided as much as possible ....
(4) In Method D (generalization and rationalization) general methods of procedure were formulated, and the underlying principles were discussed.

Method B was reported as the most effective; Method D was found to be almost as effective as Method B, and Method C only slightly more effective than Method A, which was the least effective of all.

In connection with his experiment on transfer of learning in addition and subtraction, Olander⁶ investigated the effectiveness of instruction in generalizing groups of combinations. "For example, these children were led to recognize the law common to zero combinations. They noted that combinations appeared in reverse form such as 6 + 7 and 7 + 6, and they observed that a combination such as 10 - 6 was intimately related to 6 + 4." In his conclusions the investigator states that short daily instruction of this character had no significant effect on the arithmetic scores of the pupils taught by the method. Conard and Arps⁷ discovered that strikingly superior results were secured by teaching children to "think results only."⁸ Ballenger⁹ concluded that it is effective to teach children who have been having difficulty with addition to break long columns into two parts and to add each part separately. Arnett¹⁰ reported that the most rapid and accurate individuals add the digits in regular serial order. Excessive combination, or rearrangement, of digits is detrimental to rate and accuracy, but a moderate amount proves beneficial to some individuals. Finally, Clark and Vincent¹¹ found that teaching the pupils to check their answers results in greater, but not significantly greater, accuracy.

² Buckingham, B. R. "Upward versus Downward Addition," Journal of Educational Research, 16:315-22, December, 1927. (18)
³ Cole, L. W. "Adding Upward and Downward," Journal of Educational Psychology, 3:83-94, February, 1912. (29)
⁴ Buckingham, B. R. "Adding Up or Down: A Discussion," Journal of Educational Research, 12:251-61, November, 1925. (15)
⁵ Overman, J. R. "An Experimental Study of the Effect of the Method of Instruction on Transfer of Training in Arithmetic," Elementary School Journal, 31:183-90, November, 1930. (97)
⁶ Olander, H. T. "Transfer of Learning in Simple Addition and Subtraction," Elementary School Journal, 31:358-69, 427-37; January, February, 1931. (94)
⁷ Conard, H. E. and Arps, G. F. "An Experimental Study of Economical Learning," American Journal of Psychology, 27:507-29, October, 1916. (32)
⁸ In this method the individual in the process of adding 3, 4, 9, and 6 thinks 7, 16, and 22 rather than 3 and 4 are 7, 7 and 9 are 16, and 16 and 6 are 22.
⁹ Ballenger, H. L. "Overcoming Some Addition Difficulties," Journal of Educational Research, 13:111-17, February, 1926. (6)
¹⁰ Arnett, L. D. "Counting and Adding," American Journal of Psychology, 16:327-36, July, 1905. (4)
¹¹ Clark, J. R. and Vincent, E. L. "A Study of the Effect of Checking Upon Accuracy in Addition," Mathematics Teacher, 19:65-71, February, 1926. (27)

2. Evaluation of the experiments.
In the only experiment on upward versus downward addition, Buckingham (18) used seven pairs of groups of second- and third-grade pupils, the groups varying in size from eleven to twenty-eight pupils. The paired groups were equated with respect to scores made on an initial test in addition. Each of the teachers participating in this experiment taught a pair of groups, rotated at the end of each week the time of day during which addition was taught, assigned no home work in arithmetic, and introduced no new arithmetic topics. The teacher administered the final test as soon as her pair of groups had attained reasonable proficiency in adding short columns of one-place numbers. The differences in mean gains for the six pairs of groups favored the method of downward addition, but in only one case was the difference "statistically"¹² significant when compared with its probable error.

In this investigation, the experimental factor, the direction of adding a column of figures, is specific and appears to have been satisfactorily isolated. The control of the pupil factors by grouping the pupils on the basis of the scores made on an initial addition test probably was not entirely satisfactory. The general intelligence and the addition habits of the pupils were not directly considered. The control of the teacher factors was attempted by having each teacher instruct a pair of groups, one in adding upward and the other in adding downward. This procedure, however, does not ensure control, because the teacher's zeal and skill may have differed between the two methods. The validity of the test was not explicitly considered; it depends upon the ability that is specified as the criterion of merit of the direction of adding. If validity is defined as "ability to add throughout the pupils' school experience" or "ability to add when he becomes an adult," it must be admitted that the degree of validity is unknown.
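Buckingham's experimental factor is specific enough to state precisely: adding upward and adding downward differ only in the order in which the digits are taken, and hence in the intermediate sums the pupil must form, never in the final answer. The Python sketch below (the function name is hypothetical) lists those running totals for the column 3, 4, 9, 6, the same column used in the bulletin's illustration of "thinking results only." The downward totals after the first digit, 7, 16, and 22, are exactly the results that method has the pupil think.

```python
from itertools import accumulate


def running_totals(column, direction="down"):
    """Running totals formed while adding a column of digits.

    column lists the digits from top to bottom; "down" starts with the top
    digit and "up" starts with the bottom digit.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    digits = column if direction == "down" else column[::-1]
    return list(accumulate(digits))


if __name__ == "__main__":
    column = [3, 4, 9, 6]
    print("down:", running_totals(column, "down"))  # [3, 7, 16, 22]
    print("up:  ", running_totals(column, "up"))    # [6, 15, 19, 22]
```

Only the final total is shared; every intermediate combination differs, which is what makes the direction of adding a well-isolated experimental factor.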
In view of the relatively small differences in gains, it seems reasonable to say that the findings, which are interpreted as favoring downward addition, are not dependable. When one considers the information yielded by Buckingham's questionnaire study (15) and by Cole's experiment with adults (29), it still appears that the relative merit of the two directions of adding has not been determined. Neither does one have dependable evidence to support the common-sense view that the direction makes little or no difference.¹³

¹² See pages 11 and 12.
¹³ It may be somewhat immaterial whether children are taught to add upward or downward, since the carefully controlled experiment of Beito and Brueckner (9) would seem to indicate that there is a large amount of transfer of training from learning to add in one direction to learning to add in the other, or reverse, direction. It is stated in their conclusions that "When pupils of any mental level are taught only the direct form of an addition combination, such as 7, as nearly as can be, the reverse form, 4, is learned concomitantly at least as completely as the direct form." See: Beito, E. A. and Brueckner, L. J. "A Measurement of Transfer in the Learning of Number Combinations," Twenty-Ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930, p. 569-87. (9)

Cole (29) had thirty persons add the same problems both upward and downward. The subjects were accustomed to the upward method, and this fact causes one to question the results obtained in this investigation. It is possible that they added downward more accurately because they added more slowly and took greater pains with an unfamiliar method.

Overman (97) used four groups of 112 second-grade pupils which were equivalent with respect to sex, mental age, teacher's estimate of general ability, and score on a preliminary test. The experimental factors¹⁴ appear to have been adequately defined, but there is some uncertainty in regard to the control of the non-experimental factors, especially teacher skill and zeal. Each of the groups was given twenty minutes of practice a day for fifteen days, eight days being used for testing and seven for instruction and practice. Tests were given at the beginning and at the end of the experiment, and twice during the experiment. The differences in achievement, as measured by these tests, were "statistically" significant for Methods B and D compared with Method A, but not for Method C. The conclusions of the experiment seem reasonably dependable. For the most part, they are also the conclusions one would logically expect. That pupils should be stimulated to generalize is so well established that an experimental comparison of a method with generalization and a method without generalization seems somewhat futile. One wonders, in the case of this experiment, why generalization plus rationalization should have proved inferior to generalization alone. Common sense would lead to the inference that a combination of both would be most effective. It would seem justifiable to ascribe the apparent inferiority not to the method which combines generalization with rationalization but to the limitations of the experiment.

¹⁴ See page 15 for a description of these factors.

In evaluating the effectiveness of instruction in generalizing groups of combinations, Olander (94) used three hundred pairs of second-grade pupils who were equivalent with respect to growth in arithmetic ability over a period of five weeks. The reason given for using this technique is the following: if two groups exhibit similar learning curves under similar instruction until a certain point is reached, it can be assumed that the groups are equal in the function in question. The experiment was conducted for twelve more weeks. During this time the pupils of the experimental group were given instruction in generalizing for three minutes of the daily twenty-minute period. Achievement was tested at the start of the experiment, at the end of five weeks, at the end of eleven weeks, and at the close of the experiment, at the end of seventeen weeks. The tests included the one hundred addition and the one hundred subtraction combinations and were administered by the flash-card method. The pupils in the group not given the three minutes of daily generalizing instruction were able to generalize practically as well as the pupils who were given this instruction. It seems probable that the experimental factor, the generalization instruction, was not applied long enough or intensively enough to add materially to the generalizing abilities the pupils acquired on their own account. On the basis of Olander's data, the interpretation that generalization instruction is not worth while does not seem to be justified.¹⁵

In studying the effect of teaching pupils to "think results only," Conard and Arps (32) used two groups of thirty-two grade-school children. Their approximate equivalence was shown by comparison of scores on the Courtis test. After eight work periods of seven examples in each of the four fundamentals, the final test was administered. This experiment does not appear to justify a very high rating with reference to any of the criteria. The experimental factor was not adequately defined. The control of the non-experimental factors probably was not sufficient to justify accepting the obtained results as demonstrating the superiority of "thinking results only." There is evidence that experimental conditions were in some respects abnormal and that the experimental pupils sometimes forgot to "think results only."
It may be argued that these faults, for the most part, would tend to reduce, rather than to increase, the difference in favor of the experimental method. On this argument the findings should be accepted as dependable evidence. In view of the limitations, however, the argument is not convincing, and the reported conclusion probably should not be accepted as dependable.

¹⁵ The other conclusions stated by Olander appear to be reasonably dependable.

Ballenger (6) used a single group of 130 fourth-, fifth-, and sixth-grade children. These children were taught to divide long columns of figures and to add each part separately. They improved significantly in accuracy, but the results of this uncontrolled experiment cannot be regarded as more than suggestive. Such a procedure might be effective for backward children, but it probably should not be recommended as a standard method of teaching addition. Children should be taught to add columns of increasing length. Splitting columns, as advocated by Ballenger, would seem to form undesirable habits which would need to be unlearned later.

Arnett (4) used chronoscopic apparatus to determine the methods of counting and of adding used by several adults in a psychological laboratory. His results are suggestive, but they should be verified by observation and by controlled experimentation with school children.

Clark and Vincent (27) studied the effect of checking on accuracy. They used two groups of fifth- and sixth-grade children which were equated on the basis of M. A., I. Q., and initial addition test scores. The size of these groups is not reported. After twenty days of practice, the final test was administered.
The principal limitations of this experiment are the lack of control of non-experimental factors, especially special teacher zeal, and the unknown validity of the tests. The difference in final-test means was in favor of the method of checking, but not significantly so. This might be interpreted to mean that teaching pupils to check additions may be expected to increase the accuracy of their work only very slightly. In view of the limitations of the experiment, however, this conclusion probably is not justified.

3. Justified conclusions. It is evident that none of the experiments completely satisfies the criteria stated in Chapter I. Those of Buckingham (18), Overman (97), and Olander (94) come nearest to doing so, but the limitations of these experiments render the conclusions of somewhat doubtful dependability. More experiments must be reported before justified conclusions can be expressed on such problems as adding upward versus adding downward, the effect of checking on accuracy, and the like. The merits of instruction involving generalization and rationalization should be tested in experiments where failure to control important non-experimental factors does not obscure the effectiveness of such instruction.

SUBTRACTION

The relative merits of the four principal methods of subtraction have been studied in a number of investigations. These methods may be described briefly by noting the steps in subtracting 25 from 43. In the subtractive, or take-away, method in which borrowing or decomposition is employed, the steps are:

5 from 13 = 8
2 from 3 = 1

In the subtractive, or take-away, method in which carrying or equal addition is used, the steps are:

5 from 13 = 8
3 from 4 = 1
The additive method in which borrowing or decomposition is used requires the following steps:

5 and what are 13, write 8
2 and what are 3, write 1

The additive method in which carrying or equal addition is used is illustrated as follows:

5 and what are 13, write 8
3 and what are 4, write 1

Decomposition, when used as illustrated in the first of these examples, has usually been called the "first Italian method." Equal addition, when used as in the second example, has been called the "second Italian method." No name is given to the third method, but the fourth is well known as the "Austrian method." Irmina¹⁶ has described a "complementary method" in which either decomposition or equal addition may be used. No experimental evidence has been presented on its merits, however, so this method is not considered here.

1. Summary of reported conclusions. The conclusions of Buckingham,¹⁷ Mead and Sears,¹⁸ and Taylor¹⁹ favor the subtractive methods in comparison with the additive methods.²⁰ The only conclusion favorable to the additive methods is that of Beatty,²¹ who found that their use resulted in greater accuracy but less speed. Ballard,²² McClelland,²³ and Winch²⁴ studied the relative merits of decomposition, or borrowing, versus equal addition, or carrying, in connection with the subtractive procedure.²⁵ In each case the results favored the equal-addition, or carrying, process.²⁶ Johnston's²⁷ subjects included users of both the subtractive and the additive general methods. For both of these groups, equal addition, or carrying, was found to be superior.

¹⁶ Irmina, Sister M. "The Relative Merits of the Methods of Subtraction," Catholic University of America Educational Research Bulletins, Vol. III, No. 9. Washington: Catholic Education Press.
¹⁷ Buckingham, B. R. "The Additive versus the Take-Away Method of Teaching the Subtraction Facts," Educational Research Bulletin (Ohio State University), 6:265-69, September 28, 1927. (16)
¹⁸ Mead, C. D. and Sears, Isabel. "Additive Subtraction and Multiplicative Division Tested," Journal of Educational Psychology, 7:261-70, May, 1916. (72)
¹⁹ Taylor, J. S. "Subtraction by the Addition Process," Elementary School Journal, 20:203-7, November, 1919. (114)
²⁰ These conclusions favor the first two procedures illustrated on pages 19 and 20 rather than the last two. The following study, not accessible to the writers, also favored the subtractive equal-addition method: "Methods of Subtraction," St. Louis Public School Messenger, 26:28-32, September 1, 1928. (128)
²¹ Beatty, W. W. "The Additive versus the Borrowing Method of Subtraction," Elementary School Journal, 21:198-200, November, 1920. (8)
²² Ballard, P. B. "Norms of Performance in the Fundamental Processes of Arithmetic, with Suggestions for Their Improvement," Journal of Experimental Pedagogy, 2:396-405, December 5, 1914; 3:9-2U, March 5, 1915. (5)
²³ McClelland, W. W. "An Experimental Study of the Different Methods of Subtraction," Journal of Experimental Pedagogy, 4:293-99, December 5, 1918. (69)
²⁴ Winch, W. H. "'Equal Additions' versus 'Decomposition' in Teaching Subtraction: An Experimental Research," Journal of Experimental Pedagogy, 5:207-20, 261-70; June 5, December 6, 1920. (125)
²⁵ The first two procedures illustrated on page 19.
²⁶ The second of the procedures illustrated on page 19.
²⁷ Johnston, J. T. "The Merits of Different Methods of Subtraction," Journal of Educational Research, 10:279-90, November, 1924. (52)

2. Evaluation of experiments. McClelland (69), Mead and Sears (72), Winch (125), and Buckingham (16) experimented with school children. In all cases the experimental factor was defined and sufficiently restricted. The other criteria, however, were not fully satisfied.
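The four procedures for subtracting 25 from 43, described at the opening of this section, differ in only two choices. The first is whether the pupil finds each difference by taking away ("5 from 13") or by adding on ("5 and what are 13"). The second is whether the ten given to a column is repaid by decreasing the next upper figure (decomposition, or borrowing) or by increasing the next lower figure (equal addition, or carrying). The Python sketch below is offered only as an illustration of that two-way classification. The function name, its flags, and the method labels are hypothetical names introduced here. The step wording copies the phrasing of the examples in the text.

```python
def subtraction_steps(minuend: int, subtrahend: int,
                      additive: bool, equal_addition: bool) -> tuple[list[str], int]:
    """Spoken steps (right to left) and answer for minuend - subtrahend.

    additive=False       -> take-away wording ("5 from 13 = 8")
    additive=True        -> additive wording ("5 and what are 13, write 8")
    equal_addition=False -> decomposition: the next upper figure is reduced by one
    equal_addition=True  -> equal addition: the next lower figure is increased by one
    """
    if not 0 <= subtrahend <= minuend:
        raise ValueError("this sketch assumes 0 <= subtrahend <= minuend")
    upper = [int(d) for d in str(minuend)][::-1]      # units column first
    lower = [int(d) for d in str(subtrahend)][::-1]
    lower += [0] * (len(upper) - len(lower))          # pad missing columns with 0

    steps: list[str] = []
    answer = 0
    ten_taken = 0  # 1 if ten was added to the upper figure of the previous column
    for place, (top, bottom) in enumerate(zip(upper, lower)):
        if equal_addition:
            bottom += ten_taken   # "carry" one to the lower figure
        else:
            top -= ten_taken      # "borrow" one from the upper figure
        ten_taken = 1 if top < bottom else 0
        top += 10 * ten_taken     # give this column ten when it is needed
        diff = top - bottom
        if additive:
            steps.append(f"{bottom} and what are {top}, write {diff}")
        else:
            steps.append(f"{bottom} from {top} = {diff}")
        answer += diff * 10 ** place
    return steps, answer


# Hypothetical labels pairing the two choices, following the names used in the text.
METHODS = {
    "take-away, decomposition (first Italian)": (False, False),
    "take-away, equal addition (second Italian)": (False, True),
    "additive, decomposition": (True, False),
    "additive, equal addition (Austrian)": (True, True),
}

if __name__ == "__main__":
    for name, (additive, equal_addition) in METHODS.items():
        steps, answer = subtraction_steps(43, 25, additive, equal_addition)
        print(f"{name}: {'; '.join(steps)} -> {answer}")
```

For 43 - 25, each of the four settings reproduces the pair of steps given in the text and the answer 18. Treating the two choices as independent flags mirrors the way the investigations summarized here compare them: additive against subtractive wording, and decomposition against equal addition.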
McClelland (69) employed two groups of children between twelve and one-half and thirteen and one-half years of age in an English school. One group of thirty-four had been accustomed to the method of equal addition, and the other group of thirty-two to the method of decomposition. An initial program of testing revealed that the equal-addition group was significantly superior. The groups were then practiced in their respective methods for a period of twenty weeks. The equal-addition group achieved the greater percentage increase in speed and accuracy. McClelland is clearly to be criticized for failure to secure equivalence at the beginning of his experiment. It is possible that the group using the equal-addition method consisted of more intelligent children and, in consequence, made the greater gain in achievement. Furthermore, the degree of control of non-experimental factors is not known.

Winch (125) conducted two experiments with girls in English schools. In the first, two groups of nineteen eleven-year-old girls were equated on the basis of scores on a series of initial subtraction tests. All of the children had previously used the decomposition method. In the experiment, one group was practiced in this method, while the other learned the equal-addition method. After eight lessons of fifteen to twenty minutes each, the achievement of the group learning the equal-addition method slightly surpassed that of the other, as shown by the scores on a series of final subtraction tests. The second experiment was conducted with two groups of twenty-three girls, eight and one-half years old, who had been accustomed to the equal-addition method. After equivalence had been secured with respect to ability to subtract, one group was practiced in the equal-addition method, while the other group learned the method of decomposition. After eight lessons of thirty minutes each, four final tests were given.
The difference between the final-test means in favor of the equal-addition method is approximately seven times its probable error. Winch is to be commended for his care in securing equivalence with respect to initial subtraction ability, for efforts to control non-experimental factors, and for the statistical treatment of his results. He is to be criticized for the non-representativeness and smallness of his groups and for the short duration of his experiments. The techniques used in these experiments are in many respects excellent, but it seems unsafe to generalize the findings reported.

Mead and Sears (72) used two second-grade classes of unreported size which were shown to be approximately equivalent with respect to ability in addition. One group was taught additive subtraction for four months while the other group learned the subtractive method. The final test revealed a possibly significant difference in favor of the subtractive method, so far as single-column subtraction was concerned. An additional test of three-figure subtraction examples revealed no significant difference between the groups. Mead and Sears are to be criticized for failure to secure more adequate equivalence and for not revealing the size of their groups. They are to be commended for certain precautions taken to secure control of non-experimental factors and for their rather satisfactory interpretation of results.

In the experiment of Buckingham (16), seven pairs of groups ranging in size from five to twenty-nine pupils were equated in seven schools by means of the Pressey Primary Classification Test. Each teacher participating in the experiment taught both groups of a pair for a period of seven months. At the end of that time the pupils were tested for their proficiency in single-column subtraction.
For six of the seven pairs of groups, the differences in achievement favored the subtractive method over the additive, but in no case was the difference "statistically" significant. Buckingham is to be commended for his techniques in securing equivalence, for using children with no initial ability in subtraction, and for certain precautions taken to secure control of non-experimental factors. He is also to be commended for using so many different groups and schools. His interpretation of the data, however, would seem to exaggerate the effectiveness of the subtractive method; a more conservative interpretation would seem to be required.

Ballard (5), Beatty (8), Taylor (114), and Johnston (52) have reported the results of investigations in which the data were collected by test from pupils whose method of subtracting had been determined. Ballard (5) administered his test to 18,678 eight- and nine-year-old English school children. He found that achievement in subtraction was significantly higher in schools where equal addition, or carrying, was taught than in schools where decomposition, or borrowing, was taught. He is to be criticized for failure to determine more adequately the methods the pupils actually used.

Taylor (114) had teachers of 11,368 fourth-, fifth-, and sixth-grade children put a subtraction example on the board. The teachers then asked the children what they would say in solving it, and in this way determined the methods of subtraction the children were using. His data showed that only 37.6 per cent were continuing to use the additive equal-addition method which they had supposedly been taught. The rest of the pupils had somehow learned subtractive methods and were using them.
Beatty (8) has criticized Taylor (114) for concluding that his results showed the inferiority of the additive equal-addition method, since evidence was not secured to prove that no other method was taught. Beatty (8) administered the Courtis Research Standard Tests, Series B, to 54 pupils who used the additive methods and 115 pupils who used the borrowing (subtractive?) methods. His results favor the additive methods for accuracy but the borrowing methods for speed. He is to be criticized for the small number of his cases and for failure to define the methods evaluated. He does contribute the information that 51.8 per cent of one group of eighty-three children actually did abandon the additive for borrowing methods.

Johnston (52) determined the subtraction methods used by 277 normal-school students and tested the students for speed and accuracy. His results are slightly significant with respect to the superiority of equal addition, or carrying, when used with both additive and subtractive methods. They are entirely inconclusive, however, with respect to the additive versus the subtractive methods. Ruch, Knight, and Lutes²⁸ have criticized Johnston for failure to make adequate allowance for the statistical limitations of his data. Their computation of the probable errors of the differences showed that none of the differences were "statistically" significant. Johnston²⁹ replied that their computations failed to consider the significant difference in speed in favor of the equal-addition method. He claims that the difference is significant when the accuracy means are corrected for speed. Ruch, Knight, and Lutes³⁰ replied that no differences based on groups of eight, thirteen, or twenty-three cases can be considered "statistically" significant. They add that the original report should have contained adequate information on standard deviations and probable errors.

3. Justified conclusions.
The great majority of the investigations favored the subtractive, or take-away, methods over the additive methods. They also favored the equal-addition, or carrying, process over that of decomposition, or borrowing. The techniques used in these investigations were faulty, however, and no truly significant differences in achievement between the methods were found. These facts cast doubt on the dependability of a conclusion in favor of the subtractive method in which equal addition is used, although the evidence is in its favor. It is interesting to note that Buswell, in two summaries of research in the field of arithmetic, favors the subtractive method in which equal addition is used.³¹ This conclusion agrees with that of Irmina.³² It differs from that of Knight, Ruch, and Lutes,³³ who present certain theoretical considerations in favor of the subtractive method in which borrowing or decomposition is used. Osburn³⁴ has reported a summary in which he computed the statistical errors of the differences given in the experimental literature. He states that the differences are significantly in favor of the subtractive equal-addition method as compared with the subtractive decomposition method. The subtractive equal-addition method, however, has not been shown to be significantly superior to the additive methods, although the chances are 16 to 1 in its favor.

²⁸ Ruch, G. M., Knight, F. B., and Lutes, O. S. "On the Relative Merit of Subtraction Methods: Another View," Journal of Educational Research, 11:154-55, February, 1925.
²⁹ Johnston, J. T. "Still on the Relative Merits of Subtraction Methods," Journal of Educational Research, 12:80-83, June, 1925.
³⁰ Ruch, G. M., Knight, F. B., and Lutes, O. S. "A Rejoinder to Professor Johnston's Criticisms," Journal of Educational Research, 12:83-85, June, 1925.
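Several reports discussed in this section state their results in terms of the probable error (P.E.). Winch's difference was "approximately seven times its probable error." Ruch, Knight, and Lutes recomputed the probable errors of Johnston's differences. Osburn expressed one comparison as chances of 16 to 1. The short Python sketch below shows how these quantities are conventionally related. It assumes normally distributed sampling errors and independent groups. The function names are hypothetical. Reading "chances" as one-tailed odds is an assumption about how such figures were usually stated, not a reconstruction of any author's computation. The sketch does not assume any particular cut-off for significance.

```python
from math import sqrt
from statistics import NormalDist

# The probable error marks off the middle 50 per cent of a normal curve,
# so it equals the 75th-percentile z value (about 0.6745) times the standard error.
PE_FACTOR = NormalDist().inv_cdf(0.75)


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a group mean."""
    return PE_FACTOR * sd / sqrt(n)


def pe_of_difference(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Probable error of the difference between two independent group means."""
    return sqrt(pe_of_mean(sd1, n1) ** 2 + pe_of_mean(sd2, n2) ** 2)


def chances_to_one(ratio: float) -> float:
    """Odds that the true difference has the observed sign, given D / P.E.
    (one-tailed normal-curve reading; an assumption of this sketch)."""
    p = NormalDist().cdf(ratio * PE_FACTOR)
    return p / (1 - p)


def ratio_for_chances(odds: float) -> float:
    """Inverse of chances_to_one: the D / P.E. ratio implied by 'odds to 1'."""
    return NormalDist().inv_cdf(odds / (odds + 1)) / PE_FACTOR


if __name__ == "__main__":
    for ratio in (1.0, 2.5, 7.0):
        print(f"D = {ratio} P.E.: chances about {chances_to_one(ratio):,.0f} to 1")
    print(f"16 to 1 corresponds to D = {ratio_for_chances(16):.2f} P.E.")
```

A difference of exactly one probable error gives chances of 3 to 1, because by definition half of the normal curve lies within one P.E. of the mean. On the same reading, "16 to 1" corresponds to a difference of roughly 2.3 probable errors. That is well short of the seven probable errors reported for Winch's second experiment.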
In another recent review of the subtraction experiments the opinion is expressed that "the differences among the rival methods of subtraction must be small; otherwise centuries of observation and a dozen empirical studies would long since have laid down the broad outlines of truth."³⁵

DIVISION

1. Summary of reported conclusions. There have been only two investigations of the methods of teaching and of learning division. Mead and Sears³⁶ report that multiplicative division is superior to the traditional method. They illustrate multiplicative division as follows:

The . . . multiplicative-division class said: "5 | 20, five times what are twenty? Five times four are twenty."

Conard and Arps (32) reported that in division the most effective results are secured when pupils are taught to "think results only."

³¹ Buswell, G. T. and Judd, C. H. "Summary of Educational Investigations Relating to Arithmetic," Supplementary Educational Monographs, No. 27. Chicago: University of Chicago Press, 1925, p. 78. Buswell, G. T. "A Critical Survey of Previous Research in Arithmetic," Twenty-Ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930, p. 460-61.
³² Irmina, op. cit., p. 26-27.
³³ Knight, F. B., Ruch, G. M., and Lutes, O. S. "How Shall Subtraction Be Taught?" Journal of Educational Research, 11:168, March, 1925.
³⁴ Osburn, W. J. "How Shall We Subtract?" Journal of Educational Research, 16:237-46, November, 1927.
³⁵ Ruch, G. M. and Mead, C. D. "A Review of Experiments on Subtraction," Twenty-Ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930, p. 678.
³⁶ Mead and Sears, op. cit.

2. Evaluation of experiments. Two third-grade classes of unreported size participated in the experiment by Mead and Sears (72).
The initial test, which was in addition, showed some lack of equivalence so far as the trait tested was concerned. No other attempt was made to estimate the degree of equivalence. The division practice of both groups was restricted to simple division by fives. At the end of four months a possibly significant difference was found in favor of "multiplicative" division, as so restricted. A final test containing longer examples showed no significant difference between the groups. Mead and Sears are to be criticized on four counts. They failed to secure equivalent groups and failed to report the size of the groups used. Their training was restricted in character. They also attempted to correct for lack of equivalence in an unjustifiable manner. The units and zero points of the initial and final tests were shown to be different. Correction by subtracting the difference between initial-test means from the difference between final-test means therefore cannot be condoned.³⁷ Furthermore, there was not adequate control of the non-experimental factors. The experiment of Conard and Arps (32) was evaluated under addition.³⁸

3. Justified conclusions. The faults of these two experiments make the listing of a justifiable conclusion impossible. It is doubtful whether the conclusion of Mead and Sears (72) should be regarded as indicative or suggestive.

FRACTIONS, DECIMALS, PERCENTAGE, PROPORTION, AND DENOMINATE NUMBERS

1. Summary of reported conclusions. Collier³⁹ has reported that children learn to multiply fractions effectively if addition of fractions is used as a point of departure. For example, a child may be taught to multiply 4 by a fraction whose numerator is 2 through a request to add that fraction four times. When the child has obtained the sum, the teacher should point out that its numerator, 8, is the product of 4 × 2. Anspaugh⁴⁰ has reported that drill on the fundamental combinations is effective in securing greater efficiency in handling common and decimal fractions.⁴¹

³⁷ Monroe, W. S. and Engelhart, M. D. "Experimental Research in Education," University of Illinois Bulletin, Vol. 27, No. 32, Bureau of Educational Research Bulletin No. 48. Urbana: University of Illinois, 1930, p. 63. (Footnote 14)
³⁸ See page 18.
³⁹ Collier, Myrtle. "Learning to Multiply Fractions," School Science and Mathematics, 22:324-29, April, 1922. (30)
⁴⁰ Anspaugh, G. E. "Teaching the Number Facts in the Komensky School," Chicago Principals' Club, Second Yearbook. Chicago: Chicago Principals' Club, 1927, p. 88-89. (2)
⁴¹ Knight and Setzafandt have shown that training in the addition of fractions having certain denominators transfers to the addition of fractions having other denominators. Some inferences about effective methods of teaching the addition of fractions might be drawn from their conclusions. See: Knight, F. B. and Setzafandt, A. O. H. "Transfer within a Narrow Mental Function," Elementary School Journal, 24:780-87, June, 1924. (62)

Clapp, Chase, and Merriman⁴² found that specially prepared practice material was more effective than ordinary textbook material. Their material focuses the attention of the pupils on the kind of percentage problem they are attempting to solve. In ordinary textbook material, problems solved in the same way are grouped together. In the experimental material, however, "the pupil is not aided in solving the second problem (of a group of problems) by having solved the first one, unless he begins to understand the principle that underlies the solution of such problems." The wording of the problem statements is varied in the experimental material. Some problems not involving percentage are included to keep the pupils alert to the kinds of problems they are solving.

Monroe⁴³ concluded that children do not learn to place the decimal point in a quotient by a general rule, or as the result of the acquisition of a general ability.
He contends that the placing of the decimal point in quotients requires several specific abilities.

Drushel⁴⁴ investigated the relative merits of two methods of placing the decimal point in long division by a test administered to college freshmen. In Method A the student used the rule: "There are as many places in the quotient as those in the dividend exceed the divisor." In Method B the rule was: "First render the divisor an integer by multiplying both dividend and divisor by 10 or some power of 10. Then proceed as with integral divisors." The conclusion favors Method B.

Winch⁴⁵ has reported that the "method of unity" is an effective method of teaching proportion. This method is illustrated in the following problem: I pay 4 shillings for 2 pairs of boots. What shall I have to pay for 1 pair? What shall I have to pay for 3 pairs? The two questions lead the pupil from the easier part of the problem to the more difficult. Winch also reported three further findings. Proportion in its simpler forms may be taught to children as young as seven years of age. There do not appear to be any clear sex differences in ability to handle proportion. Vacation seemed to have little effect on the proportion abilities. He states the very interesting conclusion: "The pupils of schools of very low social class, 'slum schools,' cannot, even in the most favorable pedagogical circumstances, be expected to undertake the work at as early an age as the others."

⁴² Clapp, F. L., Chase, W. J., and Merriman, Curtis. "A Study of the Effectiveness of Two Kinds of Teaching Material," Introduction to Education. Boston: Ginn and Company, 1929, p. 420-24. (25)
⁴³ Monroe, W. S. "The Ability to Place the Decimal Point in Division," Elementary School Journal, 18:287-93, December, 1917. (77)
⁴⁴ Drushel, J. A. "A Study of the Amount of Arithmetic at the Command of High-School Graduates Who Have Had No Arithmetic in Their High-School Course," Elementary School Journal, 17:657-61, May, 1917. (35)
⁴⁵ Winch, W. H. "Should Young Children Be Taught Arithmetical Proportion?" Journal of Experimental Pedagogy, 2:79-88, 319-30, 406-20; June, 1913; June 5, December 5, 1914; 3:89-95, June 5, 1915. (126)

Springer⁴⁶ conducted an experiment comparing two ways of learning the facts of tables of cubic and linear measure. One was to memorize the tables; the other was to use their facts in connection with problems. The conclusions favor isolated memorizing of denominate-number facts over learning them while solving the problems in which they occur.

2. Evaluation of the experiments. The studies of Collier (30); Clapp, Chase, and Merriman (25); Winch (126); and Springer (109) were experimental in nature. Collier (30) used two groups of fifth-grade children, each consisting of four individuals. No attempt was made to secure equivalence, and the experiment lasted only five days. It was observed that the experimental pupils learned to multiply fractions more quickly than did the control pupils. This was evidently a very crude experiment. Its faults are many: small groups, lack of equivalence, short duration, inadequate measurement of gains, and so on. The conclusion that children should be taught to multiply fractions through addition of fractions seems reasonable, but Collier's evidence in support of it is of doubtful value.

Clapp, Chase, and Merriman (25) employed twenty-three pairs of groups of unreported size. Both groups of a pair were taught in the same room by the same teacher. Equivalence was sought with respect to intelligence and initial ability in arithmetic. The duration of the experiment is not stated. At the end of the experiment three tests were administered, each consisting of eight percentage problems and two other problems.
The results in twenty of the twenty-three rooms favored the experimental factor, the novel percentage practice material. Clapp, Chase, and Merriman are to be commended for using so many pairs of groups and for attempting to secure equivalence with respect to two important pupil characteristics. They are also to be commended for attempting to control non-experimental factors by having the same teacher instruct both experimental and control children. Instruction of a pair of classes by the same teacher, however, does not necessarily insure complete control of the non-experimental factors. Since the practice material was novel, it would not be surprising if there were some lack of equivalence in the teacher factors of zeal and effort. Furthermore, the method of reporting possibly obscures the merit of the experiment. One wishes for data on the sizes of the groups, the degree of equivalence secured, and the differences in gains in achievement, together with measures of the "statistical" significance of these differences.

⁴⁶ Springer, Isidore. "Teaching Denominate Numbers," Journal of Educational Psychology, 6:630-32, December, 1915. (109)

The report by Winch (126) refers to five single-group experiments and one control-group experiment. The single groups varied in size from 39 to 361. The smallest group was located in a school in a good district; the rest were located in schools in the poorer districts of London, England. There was no attempt in the single-group experiments to control non-experimental factors. These experiments lasted from three to five months in the different groups. At the close of each experimental period, informal tests were administered and the improvement was noted. In the controlled experiment, two groups of twenty-three English school girls averaging nine years of age were equated with respect to initial arithmetical ability, as revealed by a series of preliminary tests.
One group was taught in the usual fashion, while the other group was instructed in proportion by the method of unity. After three practice periods of 17, 16, and 22 minutes' duration, two of the preliminary tests were repeated. The difference in achievement favors the method of unity. Since this difference is only 2.5 times its probable error, however, it may not be regarded as "statistically" significant. Winch is to be commended for his careful analysis of the method of instruction used and for repeated experiments. He is also to be commended for his attempts to allow for the influences of non-experimental factors even where control groups were not used.

Springer (109) used two sixth-grade groups of fifty pupils each. Equivalence was secured with respect to initial ability in arithmetic, as revealed by a test of arithmetical problems, and with respect to language ability, as shown by a language test. The experimental factor does not appear to have been adequately defined and isolated. Although the groups were rotated, the control of the non-experimental factors was not satisfactory. The experiment is also to be criticized for its short duration: six periods of ten minutes each. The differences in achievement in favor of the isolated learning of the denominate-number facts appear to be fairly significant, although no standard or probable errors are given. The experiment is to be criticized for its failure to secure adequate control of non-experimental factors, as well as for its short duration.

Monroe (77) collected his data on the abilities required in placing the decimal point in division by means of four tests lasting one minute each. These were administered to seventy-eight sixth-, seventh-, and eighth-grade pupils. Anspaugh (2) merely reports what happened in a few elementary schools as a result of greater attention to the mastery of the fundamental number combinations.
His study may be termed "experimental" only in the sense that any trial of a new method is experimental.

Drushel (35) collected his data by administering his test to 624 entering college freshmen. The test results revealed that the method in which the divisor is rendered an integer by multiplying both dividend and divisor by 10 or by some power of 10 is significantly the better method. Considering the number of cases on which it is based, the "statistically" significant differences in achievement, and the approximate equivalence of the groups in general arithmetical ability, this conclusion seems quite dependable. This investigation, however, is not an experiment, and consequently the degree of control of non-experimental factors is unknown. Hence, the superiority of "rendering the divisor an integer by 10 or some power of 10" cannot be said to have been demonstrated.

3. Justified conclusions. The crudity of the experiments described prevents the listing of justified conclusions.

CHAPTER III

DRILL IN THE FUNDAMENTALS

Consideration is given first in this chapter to the experiments which have been conducted for the purpose of revealing the effect of drill in the fundamentals. Attention is given next to the relative merits of systematic and incidental instruction in calculation. This is followed by a summary of the investigations in which the type of learning exercises was made the experimental factor. The chapter closes with an evaluation of the research on methods of distributing practice time in arithmetical calculation and on the influence of requests for speed and for accuracy on achievement in the fundamentals.

THE EFFECT OF SYSTEMATIC DRILL IN THE FUNDAMENTALS

1. Summary of reported conclusions.
Studies of the effect of a period of systematic drill on achievement in arithmetical calculation¹ have produced evidence in support of the widespread belief that ability to add, subtract, multiply, and divide may be increased by systematic drill. Hagen² is the only investigator whose findings are not in entire agreement with this belief.

¹ Brown, J. C. "An Investigation on the Value of Drill Work in the Fundamental Operations of Arithmetic," Journal of Educational Psychology, 2:81-88, February, 1911; 3:485-92, 561-70, November, December, 1912. (13)
Burton, C. B. "Results of Definite Drill in the Four Fundamental Processes as Shown by the Woody-McCall Mixed Fundamentals," Fifth Yearbook of the Department of Elementary School Principals. Washington: National Education Association, 1926, p. 323-28. (19)
Kerr, M. A. "Effects of Six Weeks Daily Drill in Arithmetic," Studies in Arithmetic, Indiana University Studies No. 32. Bloomington: Indiana University, 1916, p. 79-95. (56)
Phillips, F. M. "Value of Daily Drill in Arithmetic," Journal of Educational Psychology, 4:159-63, March, 1913. (100)
Smith, J. H. "Individual Variations in Arithmetic," Elementary School Journal, 17:195-200, November, 1916. (107)
Wimmer, H. "Experimental Study of the Effects of Drill in Arithmetic Processes under Varying Conditions," Indiana University Studies No. 32. Bloomington: Indiana University, 1916, p. 96-102. (124)
² Hagen, H. H. "A Study of Practice Periods in Arithmetic Fundamentals," Chicago Principals' Club, Second Yearbook. Chicago: Chicago Principals' Club, 1927, p. 93-95. (45)

2. Evaluation of the experiments. Although Brown's study (13) is the earliest of this group, the technique used seems to have been superior to the techniques of any of the later experiments. In the first of Brown's studies, two groups of twenty-five sixth-, seventh-, and eighth-grade children were paired on the basis of their initial ability in arithmetic. The arithmetic instruction of one of the groups differed from that of the other in that five minutes of each of thirty recitation periods were devoted to drill in the four fundamentals. At the end of the experiment, a final test, similar to the initial test by which the groups were equated, was administered. The second experiment was similar to the first with the exception that 222 pupils in four schools participated for twenty recitation periods. Brown is to be commended for the techniques which he used in securing equivalent groups, for his care in controlling non-experimental factors, and for his elaborate analysis of the data. He is also to be commended for repeating his experiment with pupils in several schools and in different cities. His differences in gains in achievement, secured in this way, are of sufficient magnitude to support adequately his conclusion with respect to the effect of systematic drill of five minutes per day on achievement in arithmetical calculation.

The other five studies of the effect of systematic drill are subject to criticism. Kerr (56) used 423 sixth-, seventh-, and eighth-grade children in her single-group experiment. These children received five minutes of drill in addition daily for a period of six weeks. The application of an initial and a final test showed a gain in ability to add, but the significance of this gain is obscured by the experimenter's failure to employ a control group. Phillips (100) used two groups of thirty-four and thirty-five sixth-, seventh-, and eighth-grade children. After these pupils had been paired on the basis of initial ability in arithmetic, the members of the experimental group were given ten minutes of daily drill in the fundamental operations and in reasoning problems (mental arithmetic). At the end of two months the final test showed a "statistically" significant gain for the drill group.
The techniques employed by Phillips seem much superior to those employed by Kerr (56), but his experiment does not seem to be without fault. His groups were small, and the instructional conditions were not entirely normal.

Smith (107) used three fifth- and sixth-grade classes of unreported size. No attempt was made to secure equivalence. One class received what amounted to diagnosis and remedial treatment during drill. The second class received extra drill for the inferior pupils. The third class was merely drilled. After four weeks of drill, in three periods of twenty-five minutes each per week, the final tests were administered. The magnitude of the gains in achievement seems to warrant the statement: "All three types of drill produced very large increases in the achievement of the pupils." The conclusions which state that the first type of drill is significantly superior to the other two would seem to be less dependable. Smith is to be criticized for failure to secure equivalent groups, for evidently poor control of the time factor, and for failure to report the size of his groups. With respect to the comparative value of the three types of drill, this must be regarded as a single-group or uncontrolled experiment.

Wimmer (124) employed fifth-, sixth-, seventh-, and eighth-grade pupils. The pupils in the sixth grade were divided into two apparently equivalent groups of twenty-two pupils each. The other classes, which averaged about thirty-five pupils each, were used as single groups. The Courtis Standard Test, Series A, was administered at the beginning of the experiment, at the end of six weeks, and again at the close of the experiment, twelve weeks from the beginning. Comparisons are made between the gains of the different classes and between the two groups of the sixth-grade class. The classes which had systematic drill made the greater gains, but the magnitude of the differences in gains is obscured by faulty equivalence or a complete lack of it.
The gains are large enough, however, for the classes which had drill to justify the conclusion that "it pays to give regular drill work in arithmetic."

Burton (19) employed 2500 third-, fourth-, fifth-, sixth-, seventh-, and eighth-grade pupils in the white rural schools of a county in one southern state. Systematic drill was administered ten minutes daily for a period of six weeks. Curves are given to show the consistent gains in efficiency made by the pupils. The experimenter is to be commended for the large number of pupils used, but he is to be criticized for not using some of the pupils for control purposes.

Hagen (45) employed twelve pairs of groups of fourth-, fifth-, sixth-, and seventh-grade pupils, which were equated on the basis of intelligence test scores. Each teacher participating in the experiment taught a pair of groups. One of the groups of each pair received systematic drill in fundamental problems twice each day, while the other group received the drill once a day. After three months of such instruction the final test was administered. The difference in achievement, when the gains of all the groups are averaged, slightly favors drill once a day. That this difference is not of much significance is shown by the fact that in six of the twelve pairs of groups the mean differences in achievement slightly favor the use of drill twice a day. The following statement of Buswell relative to the experiment seems justified: "Data might be interpreted differently."³

3. Justified conclusions. If the Law of Exercise is accepted, it is obvious that pupils who have not attained their maximum skill in arithmetical calculation will profit from systematic drill, especially when the drill is conducted in a way that stimulates a desire to increase achievement in this field. Consequently this group of six studies may be labelled as "attempts to prove the obvious." The conclusions, except possibly certain incidental details, are merely what should have been anticipated.

³ Buswell, G. T. "Summary of Arithmetic Investigations," Elementary School Journal, 28:705, May, 1928.

THE RELATIVE VALUE OF SYSTEMATIC VERSUS INCIDENTAL TEACHING OF CALCULATION

1. Summary of reported conclusions. Meriam⁴ and Collings⁵ secured results that favored incidental teaching of calculation, but Gates, Batchelder, and Betzner⁶ have reported that the differences in arithmetic achievement in their experiment favored the "systematic" rather than the "opportunistic" method of instruction. Wilson⁷ has reported recently that incidental instruction of the informational type is just as effective as instruction of the traditional type, so far as the first two grades are concerned, and that a combination of both types with more emphasis on systematic drill results in very superior arithmetical achievement in the third grade. One of the conclusions of the investigation recently reported by Olander (94) may be interpreted in favor of systematic teaching of calculation:

Examination of the scores of one group of children who had no formal instruction in arithmetic for twelve out of the seventeen weeks of the experiment and of another group who had no formal arithmetic instruction whatsoever during the entire seventeen weeks shows that, during the time when no class instruction in numbers was being given, the children learned from approximately a third to less than a half as many number combinations as did the children who were being given the regular class instruction.

2. Evaluation of the experiments. Meriam (73) merely reported a comparison of the high-school grades of 362 pupils who had received incidental instruction in arithmetic in the elementary school with the grades of those who had had the more traditional form of instruction.
The findings of such an investigation cannot be accepted as conclusive in any sense; too many factors that may have influenced the results were left unaccounted for.

⁴ Meriam, J. L. "How Well May Pupils Be Prepared for High School Work without Studying Arithmetic, Grammar, etc., in the Grades?" Journal of Educational Psychology, 6:361-64, June, 1913. (73)
⁵ Collings, Ellsworth. An Experiment with a Project Curriculum. New York: Macmillan Company, 1923. 346 p. (31)
⁶ Gates, A. I., Batchelder, M. I., and Betzner, Jean. "A Modern Systematic versus an Opportunistic Method of Teaching," Teachers College Record, 27:679-700, April, 1926. (40)
⁷ Wilson, G. M. "New Standards in Arithmetic: A Controlled Experiment in Supervision," Journal of Educational Research, 22:351-60, December, 1930. (123)

Collings (31) used forty-one pupils in one rural school as his experimental group and sixty pupils in two other rural schools as his control group. The initial arithmetic test revealed that the experimental pupils were slightly inferior to the control pupils in ability in the four fundamentals. Collings also presents much evidence relative to the approximate equivalence of the groups with respect to reading ability, handwriting ability, spelling ability, chronological age, number of years of schooling, number of years spent in the experimental schools, school attitudes, community attitudes, social and economic status of the districts, parentage of children, length of school term, course of study, and so on. After four years of the project curriculum in the experimental school and four years of the traditional curriculum in the control schools, the final tests were administered. With respect to ability in the four fundamentals, the differences favor, but not significantly, the informal method.
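Judgments such as "favor, but not significantly" rest, throughout these evaluations, on the ratio of a difference to its probable error; Winch's difference, for instance, was 2.5 times its probable error. The following Python sketch shows that arithmetic under stated assumptions: two independent groups, roughly normal scores, and invented means and standard deviations (only the group size of twenty-three is taken from Winch's controlled experiment). The function names are hypothetical, and the conversion to a modern two-sided probability is added for comparison; it is not a calculation the authors report.

```python
# Sketch: judging a difference between two group means by its probable error.
# Assumptions (not from the bulletin): independent groups, roughly normal
# scores, and invented means and SDs in the example below.
import math
from statistics import NormalDist

# The probable error is the half-width of the middle 50 per cent of a normal
# distribution: about 0.6745 standard errors (the 75th percentile of N(0, 1)).
PE_FACTOR = NormalDist().inv_cdf(0.75)


def probable_error_of_mean(sd: float, n: int) -> float:
    """Probable error of a group mean from the score SD and the group size."""
    return PE_FACTOR * sd / math.sqrt(n)


def probable_error_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent means.

    Equated groups would call for subtracting a correlation term, which
    makes the probable error smaller; that refinement is omitted here.
    """
    return math.hypot(pe_a, pe_b)


def two_sided_p(ratio_to_pe: float) -> float:
    """Chance of a ratio at least this large if the true difference is zero."""
    z = ratio_to_pe * PE_FACTOR  # probable-error units -> standard-error units
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    n = 23  # group size from Winch's controlled experiment; scores are invented
    pe_diff = probable_error_of_difference(
        probable_error_of_mean(6.0, n), probable_error_of_mean(6.0, n)
    )
    ratio = (31.0 - 28.0) / pe_diff
    print(f"difference / PE = {ratio:.2f}; two-sided p = {two_sided_p(ratio):.3f}")
    print(f"a ratio of 2.5 gives two-sided p = {two_sided_p(2.5):.3f}")
```

Because the probable error is about 0.6745 of a standard error, a ratio of 2.5 probable errors corresponds to only about 1.7 standard errors, which helps explain why the authors treat differences of that size cautiously.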
Collings has been criticized for his failure to control important non-experimental factors:

In the experiment by Collings the children taught by the project method achieved more than those taught by the traditional method, but it appears from Collings' report that these teachers worked much harder at their task than did the teachers in the control schools. In view of this fact, it does not appear justifiable to ascribe the superior achievement of the project-method group entirely to the method of instruction.⁸

Gates, Batchelder, and Betzner (40) employed two groups of twenty-five first-grade children who were approximately equivalent with respect to such traits as sex, chronological age, mental age, general information, speed of reading, oral spelling, and so on. The group subjected to the opportunistic method was somewhat inferior to the other group in initial ability in oral arithmetic. Techniques used to control teacher factors are described in the following quotation:

Both teachers were interested in the project as an experimental study; both, understanding that the results would in no way reflect upon their professional reputation, taught their pupils as under ordinary circumstances except for certain imposed limitations and regulations which were cheerfully accepted and faithfully observed. Both teachers followed the same general schedule, the same time assignment to different phases of the work, recesses, lunch periods, assembly music, gymnasium work, and so forth. Neither teacher gave any out-of-school time to individual pupils nor allowed others to do so; neither suggested home work, and each, as far as possible, prevented it. Neither was given any assistance in teaching; neither enjoyed any advantage in clerical or other help, in funds for materials, in special demonstrations, and so on.
It is the opinion of the present writers that the techniques used to control the teacher factors and the other non-experimental factors in this experiment were superior to those used by Collings (31). "Each of the two methods, 'the modern systematic' and the 'opportunistic,' was followed by an exceptionally able teacher who was experienced in the method and believed it to be, on the whole, the best one."⁹ If this was the case, it would seem that the teacher factors, skill and zeal, were rather adequately controlled. The difference in achievement, as revealed by the final test in arithmetic at the end of the year, was 2.5 times the probable error of the difference. As such, the difference may be regarded as possibly "statistically" significant. A limitation of this experiment, so far as arithmetic is concerned, is the lack of equivalence in arithmetic ability at the beginning of the experiment. Some of the difference in the final achievement in arithmetic may be attributed to the initial superiority of the systematic group. Hence, it appears that the difference should not be interpreted as more than suggestive.

⁸ Monroe, W. S. and Engelhart, M. D. "Experimental Research in Education," University of Illinois Bulletin, Vol. 27, No. 32, Bureau of Educational Research Bulletin No. 48. Urbana: University of Illinois, 1930, p. 36.
⁹ Gates, Batchelder, and Betzner, op. cit., p. 682.

Wilson (123) compared the scores of 475 pupils completing the second grade, who had received informal or incidental instruction in arithmetical calculation, with the scores of one group of 174 second-grade pupils and one group of 154 third-grade pupils, who had received the traditional formal type of instruction.
These data support the contention that up to the close of the second grade the informal type of arithmetical instruction results in achievement equal to, and possibly superior to, the achievement resulting from formal instruction. In the later phases of Wilson's experiment over one thousand third-grade children were subjected to a combination of incidental and systematic instruction. One day a week during the third year was devoted to incidental instruction of the informational type, while the other four days were devoted to systematic drill on addition and subtraction. The tables of test results indicate that the pupils attained a very high level of achievement in addition and subtraction. While Wilson's conclusion seems well supported by his data, one wonders whether too much emphasis was not placed on the informal aspect of the instruction and too little recognition given to the part played by systematic drill in securing the superior achievement of the third-grade children.

In Olander's experiment (94) one group of one hundred second-grade pupils received no instruction in arithmetic for the last twelve of the seventeen weeks of the experiment.¹⁰ Another group of eighty-six pupils received no formal arithmetic instruction during the entire seventeen weeks. The achievements of these groups were compared with each other and with the achievement of a group of 296 pupils receiving daily instruction. The initial ability of the group of eighty-six was considerably superior, and that of the group of one hundred slightly superior, to the initial ability of the group receiving daily instruction. Since the groups without class instruction learned less in spite of this initial advantage, it would seem that the differences in favor of systematic daily instruction are rather highly reliable.

¹⁰ See pages 17 to 18 for evaluation of this experiment, which had to do with the effectiveness of generalization instruction.

3. Justified conclusions.
The conflicting conclusions of the experiments evaluated prevent the formulation of a justified conclusion favoring either the incidental or the traditional method of instruction in arithmetic. The question as to which method is superior awaits further experimental investigation. In view of the relatively specific character of calculation abilities and the demonstrated efficacy of systematic drill, it is difficult to conceive of the incidental method alone as highly efficient. It is possible that the best method would be a combination of the two procedures.

THE RELATIVE MERITS OF CERTAIN GENERAL TYPES OF LEARNING EXERCISES FOR DRILL IN CALCULATION

1. Summary of reported conclusions. Ten experimental investigations are summarized under this heading. Evans and Knoche¹¹ have reported that drill in which Studebaker Economy Practice Exercises are used results in achievement superior to that resulting from the use of learning exercises based on materials devised by the teacher. Kelly¹² compared the effectiveness of the Courtis Standard Practice Tests, the Studebaker Economy Practice Exercises, and "the best methods of drill which the teachers could devise." He reported that the Courtis drill material is superior to the Studebaker material, but that both are superior to drills devised by the teachers. Mead and Johnson¹³ compared the Courtis Standard Practice Tests with the Thompson Minimum Essentials and reported a conclusion favorable to the Courtis material. Morgan¹⁴ compared the effectiveness of the Economy Remedial Exercise Cards, when used with the Compass Diagnostic Tests, with that of Lennes' Pads and reported a conclusion favorable to the former. Newcomb¹⁵ found that drill exercises prepared in such a way that proportionate drill is given on the higher decades are more effective than those ordinarily used.
Fowlkes¹⁶ concluded that it is desirable "to teach the one hundred combinations (multiplication) by means of text material alone, the teacher doing as little talking as possible" and "to make remedial adjustments by means of printed directions and devices rather than oral instruction." Knight¹⁷ has reported a conclusion which favors "drill material carefully constructed as to the distribution of practice in addition, subtraction, multiplication, and division of whole numbers," rather than drill material "slightly in excess as to sheer amount but so built that certain combinations were slighted."¹⁸ The conclusions of Newcomb (88), Fowlkes (38), and Knight (60) all favor the contention that the relative difficulty of the number combinations must be accounted for in preparing efficient materials of instruction for use in drill.

¹¹ Evans, J. E. and Knoche, F. E. "The Effects of Special Drill in Arithmetic as Measured by the Woody and the Courtis Arithmetic Tests," Journal of Educational Psychology, 10:263-76, May-June, 1919. (37)
¹² Kelly, F. J. "The Results of Three Types of Drill on the Fundamentals of Arithmetic," Journal of Educational Research, 2:693-700, November, 1920. (55)
¹³ Mead, C. D. and Johnson, C. W. "Testing Practice Material in the Fundamentals," Journal of Educational Psychology, 9:287-97, May, 1918. (71)
¹⁴ Morgan, L. D. "Specific vs. General Drill in the Fundamentals of Arithmetic," School Science and Mathematics, 29:528-29, May, 1929. (80)
¹⁵ Newcomb, R. S. "Effective Drill Exercises in Arithmetic," Journal of Educational Psychology, 16:127-31, February, 1925. (88)
¹⁶ Fowlkes, J. G. "A Report of a Controlled Study of the Learning of Multiplication by Third-Grade Children," Journal of Educational Research, 15:181-89, March, 1927. (38)
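The principle that drill material must account for the relative difficulty of the combinations can be made concrete as an allocation problem: a fixed number of drill items is divided among combinations in proportion to how hard each is. The following Python sketch assumes hypothetical difficulty weights (they are not figures from Newcomb, Fowlkes, or Knight), and `allocate_drill` is a hypothetical helper name. Largest-remainder rounding is an illustrative choice that keeps the item counts summing exactly to the total.

```python
# Sketch: spread a fixed number of drill items over number combinations in
# proportion to their relative difficulty. The weights in the example are
# hypothetical placeholders, not data from the studies summarized here.
from math import floor


def allocate_drill(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Hypothetical helper: give each combination a share proportional to its weight."""
    weight_sum = sum(weights.values())
    exact = {combo: total_items * w / weight_sum for combo, w in weights.items()}
    counts = {combo: floor(share) for combo, share in exact.items()}
    # Largest-remainder rounding: the items lost by flooring go to the
    # combinations whose exact shares had the largest fractional parts.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for combo in by_remainder[:leftover]:
        counts[combo] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical relative difficulties: harder combinations get larger weights.
    difficulty = {"2 + 3": 1.0, "7 + 5": 2.0, "9 + 8": 3.0, "8 x 7": 4.0}
    print(allocate_drill(difficulty, total_items=50))
```

Proportional weighting is only one way to act on the principle; this bulletin does not give a specific allocation formula for any of the materials compared.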
Kulp¹⁹ investigated the relative effectiveness of two types of practice material, the essential difference between the two being that one of the types provided practice in solving reasoning problems in connection with computational drill. It is reported that the material which provided practice in arithmetical reasoning was relatively more effective in securing computational achievement, and that its use resulted in a decided increase in arithmetical reasoning ability. A similar conclusion is reported by Rosse.²⁰ These conclusions seem to agree with that reported by Kirkpatrick²¹ several years ago. Kirkpatrick found that use in calculation is a more effective means of learning the multiplication combinations than memorization divorced from use.

Myers and Myers²² investigated whether it was better to find mistakes among a group of examples of addition, multiplication, and subtraction combinations or to think of the corresponding correct associations. Their results are favorable to learning exercises which emphasize correct associations rather than learning exercises which demand the observation of errors. It is interesting to note, among their conclusions, that pupils thought the discovery of errors made by other people much more interesting than the drill in which correct associations were exercised.

The problem of whether learning exercises should be restricted to one arithmetical operation or should deal with more than one has been studied in three experiments. Buckingham²³ sought to determine whether it is "better to teach subtraction facts in connection with related addition facts than to teach the addition facts first and the subtraction facts afterward." His conclusions favor the teaching of addition and subtraction together. Myers and Myers²⁴ prepared learning exercises which required the pupils to shift rapidly among the four fundamental operations. Their conclusions are distinctly unfavorable to such mixed exercises: "... rapid shifting by the pupil from one process to another not only causes great confusion of processes, but the pupil so confused also tends to be more confused when he later works on combinations grouped twenty-five to a process." Repp²⁵ prepared two sets of drill material the objective of which was the maintenance of skill. Each of the exercises of one set of material dealt with a single topic, such as addition of fractions, while each exercise of the other set of material was of mixed nature. This difference in organization was the only difference in the content of the two sets of drill material. The conclusions are distinctly favorable to the mixed type of drill material as a basis of learning exercises for the maintenance of skills in arithmetic: "All pupils profited by use of drills furnished them, but those using mixed drills showed 23 per cent greater gain than those using isolated drills."

¹⁷ Knight, F. B. "The Superiority of Distributed Practice in Drill in Arithmetic," Journal of Educational Research, 15:157-65, March, 1927. (60)
¹⁸ Knight summarizes in this article the report of an experiment conducted by Luse. See: Luse, E. M. "Transfer within Narrow Mental Functions, A Study of the Effects of Distributed versus Non-Distributed Drill in Arithmetic," University of Iowa Monograph in Education No. 5. Iowa City: University of Iowa. (61)
¹⁹ Kulp, C. L. "A Study of the Relative Effectiveness of Two Types of Standard Arithmetic Practice Materials," Journal of Educational Research, 22:381-87, December, 1930. (65)
²⁰ Rosse, J. C. "An Experiment to Test the Increase in Reasoning Ability from the Use of Test and Practice Sheets in 6A Arithmetic," Journal of Educational Research, 22:210-13, October, 1930. (105)
²¹ Kirkpatrick, E. A. "An Experiment in Memorizing versus Incidental Learning," Journal of Educational Psychology, 5:405-12, September, 1914. (58)
²² Myers, G. C. and Myers, C. E. "Finding Mistakes versus Correct Associations in Simple Number-Learning," Journal of Educational Research, 18:25-31, June, 1928. (86)
²³ Buckingham, B. R. "Teaching Addition and Subtraction Facts Together or Separately," Educational Research Bulletin (Ohio State University), 6:228-29, 240-42; May 25, 1927. (17)
²⁴ Myers, G. C. and Myers, C. E. "The Cost of Quick Shifting in Number Learning," Educational Research Bulletin (Ohio State University), 7:327-34, October 31, 1928. (85)
²⁵ Repp, A. C. "Mixed versus Isolated Drill Organization," Twenty-Ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930, p. 535-49. (103)

2. Evaluation of experiments. Evans and Knoche (37) used two groups of sixth-grade children of unreported size. With respect to equivalence they state that "the children in the two rooms were quite similar in ability. The 6A class was one semester in advance of the 6B group." The pupils of the 6B class were drilled with the Studebaker Economy Practice Exercises five minutes each day for forty-three days, the time being taken from their regular arithmetic work. The tests administered at the end of the experimental period yielded a probably significant difference in mean gain for the group using the Studebaker Exercises. The experimenters are to be criticized for not attempting to secure equivalent groups and for utilizing pupils whose arithmetic instruction, apart from the experimental factor, differed so greatly. It is stated that during the period of drill "the main work for the 6A grade was percentage with a general review of the fundamental processes. The work of the 6B grade was decimals." It is possible that the zeal of the teacher for the novel practice material was another uncontrolled factor.

Kelly (55) used three groups of 133, 146, and 173 fourth-, fifth-, sixth-, seventh-, and eighth-grade children, making no effort to secure equivalence. The groups used the Courtis Standard Practice Tests, the Studebaker Economy Practice Exercises, or informal exercises prepared by the teachers, for eight to fifteen minutes of drill per day, depending on the grade level, for twenty successive days. The techniques used in this experiment are open to criticism. A lack of equivalence is indicated by the unequal representation of the different school grades in each of the groups. For example, there were no fourth-grade children in the group using the Courtis material and no VA or VIB pupils in the group using the Studebaker material. Failure to control important teacher factors is indicated in the statement that "The differences from class to class by the same method suggest that after all the efficiency of any method depends mostly on the teacher who is using it."

Mead and Johnson (71) used two groups of 105 fifth- and sixth-grade pupils. No attempt was made to secure equivalence, and the preliminary tests reveal some departures from equivalence. The pupils of one group practiced ten minutes a day with the Courtis material, while the pupils of the other group used the Thompson material. No attempt was made to prevent home practice, the experimenters feeling that if a practice material stimulated such practice, the stimulation should be allowed to operate during the experiment. After ninety days of practice the Courtis Research Test was administered; the results were possibly significantly in favor of the Courtis Standard Practice Tests. This experiment is faulty in that no effort was made to secure equivalence or to control practice time. Precision in experimentation demands that pupils of experimental and control groups spend an equal amount of time in learning.
Another possible fault is that the Courtis Research Test would be more valid with respect to the Courtis drill material than with respect to the Thompson drill material.

Morgan (80) used two groups of twenty-eight fourth-grade pupils. The groups were equated on the basis of average scores made on two standardized arithmetic tests. One group used the Economy Remedial Exercise Cards and was subjected to the Compass Diagnostic Tests, while the other group merely used practice pads prepared by Lennes. Both groups were taught by the same teacher for a period of twelve weeks. At the end of this period, the other forms of the initial tests were administered and the average scores computed. The difference in mean gains significantly favors the group that used the Economy Remedial Exercise Cards and that had the Compass Diagnostic Tests administered to it. There seems little reason to doubt the reliability of the findings, but it is impossible to ascribe the superior achievement of the group which excelled to either the practice material or the diagnostic tests alone. It would seem, therefore, that the chief criticism which may be made with respect to this experiment has to do with the failure of the experimenter to restrict the experimental factor to a single technique.

Newcomb (88) used an experimental group of fifty-one pupils and a control group of twenty-one seventh-grade pupils. With respect to equivalence he states, "A comparison of the intelligence quotients of the pupils of the several classes did not reveal on the whole any appreciable differences." The experimental group practiced five or six minutes a day for thirty-five days on drill material which provided practice on the higher decades, while the instruction of the control group was conducted "in the usual manner."
The administration of the Courtis Standard Research Test at the close of the experiment revealed probably significantly superior achievement by the experimental group. Newcomb is to be criticized for not securing more adequate equivalence of groups and for not specifying the type of learning activity engaged in by the control pupils. The teachers may have exerted greater zeal in using the experimental drill material, since the failure to mention the type of drill material used by the control pupils would indicate a lack of enthusiasm for it.

Fowlkes (38) used a single group of thirty-one third-grade pupils whose median I. Q. was 104.5. This group was drilled on multiplication twenty minutes a day for twenty days by means of the text material alone, "the teacher doing as little talking as possible," and remedial adjustments were made by printed directions and devices. The author claims that this instruction produced achievement significantly better than that of other third-grade classes. While a single-group technique is not usually to be relied upon, the fact that Fowlkes was able to compare his results with those of other third-grade classes gives his conclusions some dependability. He should perhaps have allowed for the somewhat superior intelligence of his third-grade class in formulating his conclusions.

Luse, as reported by Knight (60), used two groups of three hundred fifth-grade pupils which were equivalent with respect to general arithmetic ability. One of these groups used carefully constructed material, while the other used material which slighted certain of the number combinations. "All other conditions were held constant." After fifty consecutive drill periods of fifteen minutes each, the final tests were administered.
The differences in achievement were probably "statistically" significant in favor of drill material in which practice is carefully distributed over the number combinations. The techniques used in this experiment compare favorably with the best of contemporary experimental research in education.

In the experiment of Kulp (65), four classes used practice material which did not provide practice in arithmetical reasoning, while six classes used material which did. A total of 113 fourth-grade pupils took the final test. The figures given in the report make it evident that the experimental and control groups were initially equivalent in computational ability, but that the group receiving the training in reasoning was initially superior in reasoning ability. The teacher factor, experience with the instructional procedure, favored the practice material which did not provide practice in solving reasoning problems, but the influence of this experience may have been offset by the greater zeal usually shown for a new method or procedure. The experiment lasted from October to April. The differences in gains in achievement are apparently significantly in favor of the material which provided practice in arithmetical reasoning along with calculation drill. The investigator is to be criticized for failing to secure more adequate equivalence at the beginning of the experiment and for failing to indicate more clearly the differences in gains in achievement and the "statistical" significance of these differences. He is to be commended for his careful description of the compared factors, for the measures taken to control non-experimental factors, and for conducting his experiment over a comparatively long period of time. His conclusions would seem to be fairly dependable with respect to the groups used in the experiment.
Further experimentation is needed before generalization is justified.

Rosse (105) used two groups of eighteen sixth-grade pupils which were equivalent with respect to initial arithmetic reasoning ability, as measured by the Otis Arithmetic Reasoning Test, and with respect to intelligence, as measured by the National Intelligence Test. One group used practice sheets which provided drill in reasoning problems, while the other group used an ordinary arithmetic text. At the end of fifty-eight days the same form of the Otis Arithmetic Reasoning Test was administered. The difference in achievement favors, but not significantly, the method using the practice sheets with drill in reasoning problems. While the conclusions do not seem highly dependable, because of the small size of the groups, the lack of control of important non-experimental factors, and the unreliability of the difference reported, they may be accepted as evidence supplementing that presented by Kulp (65).

Kirkpatrick (58) used two groups of ten and two groups of twenty-five normal-school students and two groups of twenty sixth-grade pupils, making no attempt to secure equivalence of groups. No mention is made of any procedures used to control non-experimental factors. The groups were tested at the end of ten days, and the normal-school students again at the end of three weeks. The differences in achievement in each case favored the method of learning multiplication combinations through use. It is evident that this experiment can be regarded only as crude. Since no attempt was made to secure equivalence of groups or to control non-experimental factors adequately, the differences in achievement cannot with certainty be ascribed to the method reported superior.

Myers and Myers (86) used two groups of one hundred fourth- and fifth-grade pupils which were matched on the basis of initial arithmetic ability.
These groups were also matched with other groups of equal size in order to control the practice effect of the initial test. The experiment lasted only long enough for the pupils of one group to observe errors in the answers to a set of twenty number combinations, while the members of the other group examined twenty combinations with their correct answers. The difference in achievement, as shown by the final test, was probably significantly in favor of the exercise in which the pupils observed only correct answers. The chief criticism of this experiment is its short duration. The confusion caused by the exercise containing errors might have worn off with more prolonged use, and in the long run its use might result in superior achievement. It may be true, also, that this type of exercise would develop the ability to locate mistakes, a well-recognized objective of arithmetic instruction.

Buckingham (17) equated seven pairs of groups of from twelve to twenty-eight second-grade children in seven schools on the basis of scores on the Pressey Primary Classification Test. During a daily period of twenty minutes, one group of each pair was taught related addition and subtraction facts together, as for example: 1 + 6, 6 + 1, 7 - 1, and 7 - 6. The other group was taught all of the addition facts and then all of the subtraction facts, for the same time per day. Apart from this difference in the learning exercises, the instructional materials and techniques used for each pair of groups were the same. No homework was required, and no new topics in arithmetic were introduced during the experimental period. The hour of instruction was alternated for each pair of groups at the end of each week. The report states that the experiment lasted about a month for one of the pairs of groups, but says nothing in this respect about the others.
Three of the differences in achievement revealed by the final test are "statistically" significant in favor of the "together" method, and three more favor the together method, but not significantly. One difference favors, but not significantly, the separate method. Buckingham attaches great significance to this "all but unanimous verdict." He states, "When an experiment conducted seven times yields six results all in the same direction, the evidence is rather conclusive even though some of the differences, when considered individually, are small or lacking in statistical significance." He recognizes the limitation of the short duration of his experiment and the failure to test retention. While the techniques used in this experiment have some admirable features, a question may be raised concerning the validity of the final test. Was it adapted to the type of learning exercises used by the different groups? If its examples were of a mixed nature, the test was probably more valid with respect to the mixed learning exercises. If, however, one group of a pair had a test in which addition and subtraction were kept separate while the other group had the same items mixed, the results would probably be more valid with respect to each group, but it is difficult to see how they could be considered comparable. In the face of this dilemma of measurement, one does not seem justified in accepting the conclusions as highly dependable.

Myers and Myers (85) used fifty fifth-grade pupils, sixty-four sixth-grade pupils, and fifty normal-school girls selected in a random fashion. "The first pupil of a given group was tested with the grouped combinations followed by the mixed combinations; the next pupil was tested with the mixed examples first and then with the grouped examples; the third pupil began with the grouped examples and so on alternating throughout the group."
The pupils made their responses orally, and the experimenter recorded the time required. An analysis was made of the results, and a check was made of the practice effect. The results significantly favor the method of grouped, rather than the method of mixed, exercises. Applying the two types of exercises to alternate pupils does not insure that they were applied to equivalent groups.²⁶ Another criticism concerns the length of the tests, each of which contained only forty items. More dependable results could have been secured by using a much longer test, or by using a long period of learning prior to a final test. However, if this were done, the experimenter would still face the dilemma of choosing between a doubtfully valid mixed test and non-comparable separate tests.

²⁶ This technique is probably justified when the groups are very large. For example, Monroe used a similar technique, but with a total of 9,256 pupils. See: Monroe, W. S. "How Pupils Solve Problems in Arithmetic," University of Illinois Bulletin, Vol. 26, No. 23, Bureau of Educational Research Bulletin No. 44. Urbana: University of Illinois, 1929. 31 p. (79)

Repp (103) used groups of 263 and 267 twelve-year-old pupils which were equivalent with respect to arithmetical ability as shown by an initial test with a reliability of .97 ± .006. One of these groups used drill material consisting of twenty-six twenty-minute exercises, each of which dealt with one topic. The other group used material of the same total content but of mixed organization. After twenty-six weeks an exhaustive final test of a mixed nature, also with a reliability of .97 ± .006, was administered. The results of this test are "statistically" in favor of the mixed drills. The final test probably was more valid with respect to the abilities engendered by the mixed drills than with respect to those engendered by the isolated drills.
It should be mentioned, however, that an analysis of the achievement during practice ultimately favored the mixed drills. The conclusion may be justified, therefore, that mixed drills are superior for maintenance of skill, while isolated drills are superior in the earlier stages of learning.

3. Justified conclusions. If one accepts the principle that arithmetical ability in the field of calculation is specific, or at least largely so, and that, consequently, ability to calculate consists of a large number of specific abilities, it follows that drill must be provided on each specific ability, unless it is believed that there is essentially complete transfer from one specific ability to another when these abilities are at all closely related.²⁷ Furthermore, it appears reasonable that the more difficult combinations should receive more drill than the easier ones. Consequently, learning exercises constructed with due recognition of the specific abilities to be engendered, and of their relative difficulties and interrelations, should be more effective than learning exercises not so constructed. This group of investigations supports this general hypothesis and appears to justify the assertion that the hypothesis has been demonstrated. It might be argued that the hypothesis is obvious and, hence, that the principal contribution of these studies is to be found in their details. The more significant of these detailed findings appear to be:

²⁷ The conclusions of the recent investigations of Beito and Brueckner (9) and of Olander (94) would seem to indicate that there is a large amount of transfer in the case of certain abilities. The conclusions of Beito and Brueckner (9) were referred to in a footnote on page 16.
Olander (94) has reported that "The ability gained by children on fifty-five simple number combinations in addition and on fifty-five similar combinations in subtraction transferred almost completely to the forty-five remaining simple number combinations in each of the two processes." This conclusion seems to be reasonably dependable, since Olander used relatively large equivalent groups, controlled non-experimental factors rather adequately, and secured measures of achievement which seem acceptably reliable and valid. Such a conclusion would not seem to oppose the contention above that the best materials for drill are those constructed so that the more difficult combinations receive the greater practice. It is commonly accepted as a principle in education that the best way to insure attainment is to practice the needed abilities directly rather than to depend on transfer.

1. Practice material prepared by experts seems to be more effective than learning exercises based on material prepared by teachers.
2. Learning exercises in which the practice is carefully distributed over the number combinations, so that none are slighted and the more difficult combinations occur with relatively greater frequency, are superior to learning exercises which have not been thus prepared.
3. Learning exercises to be used in the initial stages of learning calculation should probably require the practice of addition, subtraction, multiplication, and division separately. Learning exercises whose objective is the maintenance of skill should be mixed in character. The pupils should be given some opportunity to practice their calculation abilities in the situations represented by examples varied with respect to the fundamental process called for.

THE INFLUENCE OF DISTRIBUTION OF PRACTICE TIME ON ACHIEVEMENT IN THE FUNDAMENTALS

1. Summary of reported conclusions.
Three experiments have been reported on the effect of distribution of practice time on learning, and one has been reported on the distribution of practice needed for retention, or maintenance of skill. Kirby²⁸ compared practice periods in addition of 22½, 15, 6, and 2 minutes' duration and in division of 20, 10, and 2 minutes' duration. The gains in achievement, for both addition and division, favored the two-minute interval. Hahn and Thorndike²⁹ compared practice periods in addition of 5, 7½, 10, 11¼, 15, 20, and 22 minutes' duration. Their results tend to favor the longer periods. Wimmer (124) reported that pupils who were given one fifteen-minute drill per week made greater progress than those who were given five minutes of drill five times per week. Reed³⁰ compared a single hour of practice in addition with a distribution of twenty minutes a day for three days, ten minutes a day for six days, and ten minutes twice a week for three weeks. The gains in achievement favor the distribution of twenty minutes a day for three days. Norem and Knight³¹ investigated the distribution of practice needed for retention, or maintenance, of skill. They concluded with respect to drill in multiplication that when mastery has been attained "one practice a week is sufficient for maintenance." They state also, however, that one practice a week "is often insufficient practice for maintaining the combinations during the first two weeks following the initial learning of them."

²⁸ Kirby, T. J. "Practice in the Case of School Children," Teachers College, Columbia University Contributions to Education, No. 58. New York: Bureau of Publications, Teachers College, Columbia University, 1913. 98 p. (57)

²⁹ Hahn, H. H. and Thorndike, E. L. "Some Results of Practice in Addition under School Conditions," Journal of Educational Psychology, 5:65-84, February, 1914. (46)

³⁰ Reed, H. B. "Distributed Practice in Addition," Journal of Educational Psychology, 15:248-49, April, 1924.
(102)

³¹ Norem, G. B. and Knight, F. B. "The Learning of the One Hundred Multiplication Combinations," Twenty-Ninth Yearbook of the National Society for the Study of Education, Bloomington, Illinois: Public School Publishing Company, 1930, p. 551-68. (91)

2. Evaluation of the experiments. Kirby (57) employed groups of 194, 104, 205, and 229 fourth-grade children in his addition experiment. These groups practiced addition for fifteen minutes as an initial test. They were then given forty-five minutes of practice divided into periods of 22½, 15, 6, or 2 minutes in length. Finally, they practiced for another fifteen-minute interval, which served as the final test. The experimenter exercised considerable care to prevent the children from practicing outside of the practice intervals and to control other non-experimental factors. He conducted the practice himself in practically all of the classes. The division experiment, with practice divided into periods of 20, 10, and 2 minutes' duration, was conducted in a similar fashion, using groups of 204, 209, and 193 third- and fourth-grade children. The differences in gains seem possibly significant with respect to addition practice periods of two minutes' duration and certainly significant with respect to division practice periods of the same length.

Kirby is to be commended for his attempt to secure a representative sample of school children. He checked the performance of thirty-eight of the school classes used in this experiment, which were located in New York City, against results obtained with a class outside of this city. One fault to be found with this experiment is the failure to secure equivalent groups. While the failure to secure equivalence does not invalidate the results, it does obscure their precise significance.
The experimenter calls attention to the possible influences of factors not inherent in the short practice period:

(1) The groups, working in shorter periods, because of the number of days over which the experiments ran, had greater opportunity during the experiment to profit from the regular school work than other classes .... (2) The groups working in shorter periods had a longer time in which to catch the spirit of the experiment and to become enthusiastic over surpassing their previous performance. They had their records read to them more times and had the incentives to intense effort repeated more often. (3) They also had greater opportunity and incentive to do work outside of the time given to the experiment.

The experiment of Wimmer (124) was described and evaluated on page 32. His conclusion with respect to the distribution of practice time may not be regarded as dependable.

Hahn and Thorndike (46) used eight experimental groups which, after approximate equivalence had been secured, varied in size from six to nineteen fourth-, fifth-, sixth-, and seventh-grade pupils. These groups were given ninety minutes of practice in addition, divided into periods of 5, 7½, 10, 11¼, 15, 20, and 22 minutes' duration. While the use of the practice sheets would seem to make the teacher factors negligible, an important extra-school factor may have been uncontrolled. The investigators state:

It should be kept in mind throughout the reading of what follows that any child was free to write out sums and to practice with them at home, during the course of the experiment .... no attempts were made to prevent practice apart from the specified practice in school.

The differences favor, but not significantly, the longer practice intervals. More dependence could be placed on this conclusion if larger groups had been used, with more adequate control of non-experimental factors.
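Each of these experiments holds the total amount of practice fixed and varies only how it is divided into periods. Reed's four conditions, as listed in the summary of reported conclusions, make that bookkeeping easy to check. The Python below is only an illustrative sketch: the names `REED_CONDITIONS`, `total_minutes`, and `common_total` are hypothetical, and each schedule is reduced, as an assumption, to minutes per period times number of periods.

```python
# Hypothetical helper: check that Reed's practice schedules share one total.
# Each condition is (description, minutes per period, number of periods).
REED_CONDITIONS = [
    ("one hour, undistributed", 60, 1),
    ("20 minutes a day for 3 days", 20, 3),
    ("10 minutes a day for 6 days", 10, 6),
    ("10 minutes twice a week for 3 weeks", 10, 2 * 3),
]


def total_minutes(minutes_per_period: int, periods: int) -> int:
    """Total practice time for a schedule of equal-length periods."""
    return minutes_per_period * periods


def common_total(conditions):
    """Return the shared total if every schedule has the same total time, else None."""
    totals = {total_minutes(m, n) for _, m, n in conditions}
    return totals.pop() if len(totals) == 1 else None


if __name__ == "__main__":
    for label, m, n in REED_CONDITIONS:
        print(f"{label:<38} {n:>2} x {m:>2} min = {total_minutes(m, n):>3} min")
    print("common total:", common_total(REED_CONDITIONS))
```

Every condition comes to sixty minutes, so the comparison isolates the distribution of practice rather than its amount. A schedule of ten minutes twice a week for six weeks would come to 120 minutes and break that equality.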
Reed (102) used four groups of 60, 50, 51, and 42 first- and second-year college students. The scores on the initial test in addition indicate that these groups were only approximately equivalent. One group practiced addition for a period of one hour, while the other groups practiced an equal amount of time distributed in periods of twenty minutes a day for three days, ten minutes a day for six days, or ten minutes twice a week for three weeks. It should be mentioned that the initial ten minutes of practice and the final nineteen minutes constituted the initial and final tests. The results significantly favor the distributed practice as compared with the one hour of undistributed practice. Among the distributed schedules, the results favor, but not significantly, the daily twenty-minute practice periods. The chief criticisms of this experiment are that it was conducted with adults and that the groups having the distributed practice were initially superior. Hence, its conclusions are probably not applicable to school children. The adults merely relearned an old skill; results might be quite different with new learning.

Norem and Knight (91) used twenty-five third-grade pupils in their investigation of the distribution of practice effective for retention or maintenance of skill in multiplication. The parents of the pupils were requested to refrain from assisting them in drill at home, and the pupils were instructed not to practice except when required to do so by the experiment. After an initial administration of two tests, given a week apart, which disclosed unlearned combinations, each pupil was individually drilled to the point of mastery of his formerly unlearned combinations. The pupil was then tested once a week for a period of six weeks on these newly mastered combinations, and then once a month for three months. The analysis of the practice and test achievements of these twenty-five pupils is a commendable feature of this experiment.
It would seem to justify the conclusion that one practice a week is sufficient for maintenance of skill in multiplication after mastery has been attained, so far as this group of pupils is concerned. The investigation should probably be repeated with larger groups for greater reliability in the findings.

3. Justified conclusions. The conclusions of Kirby (57) and of Hahn and Thorndike (46) are opposed to each other, while that of Reed (102) tends to agree with that of Hahn and Thorndike (46). The conclusions of Norem and Knight (91) seem reliable for the pupils used in their experiment, but do not seem more than suggestive for pupils in general. The conflicting testimony, together with the obviously faulty techniques of the experiments, prevents the authors from stating a justified conclusion. It would seem, however, that until more adequate experimental evidence has been presented, the teacher will be acting wisely in employing practice periods approximately twenty minutes in length, once a day, until mastery has been attained. After this objective has been reached, shorter practice periods distributed at longer intervals will possibly serve to maintain skill.

THE INFLUENCE OF REQUESTS FOR SPEED OR ACCURACY ON ACHIEVEMENT IN THE FUNDAMENTALS

1. Summary of reported conclusions. The influence of requests for speed or accuracy has been studied in three experiments.³² Wimmer (124) has reported that "the difference in progress made by the two groups, one being drilled for accuracy and the other for speed is not very large." Messick³³ reports that if speed is the objective of achievement in addition, it makes little difference which is requested, speed or accuracy. However, if accuracy is the objective, it is much better to request accuracy rather than speed. He states, "In teaching addition to pupils of the fourth and fifth grades of the elementary schools it is better to emphasize accuracy rather than speed."
Myers³⁴ concludes that requests for speed are causes of inaccuracy in the fundamentals: "One may conclude that the loss to learning efficiency from the strong speed pressure as applied to the simple number combinations in arithmetic under which many school children must work in school today is appalling."

³² There have been several investigations of the relation of speed to accuracy in the fundamentals of arithmetic; see:
Bird, G. E. "A Test of Some Standard Tests," Journal of Educational Psychology, 11:275-83, May, 1920.
Courtis, S. A. "Courtis Standard Research Tests: Third, Fourth, and Fifth Annual Accountings, 1913-16," Bulletin No. 4. Detroit: Department of Cooperative Research, 1916. 112 p.
Luderman, W. W. "Speed and Scholarship Arithmetical Accuracy," School Science and Mathematics, 25:522-24, May, 1925.
Monroe, W. S. "A Report of the Use of the Courtis Standard Research Tests in Arithmetic in Twenty-Four Cities," Studies by the Bureau of Educational Measurements and Standards, No. 4. Emporia: Kansas State Normal School, 1915. 94 p.
Phelps, C. L. "A Study of Errors in Tests of Adding Ability," Elementary School Teacher, 14:29-39, September, 1913.

³³ Messick, A. I. "Effect of Certain Types of Speed Drills in Arithmetic," Mathematics Teacher, 19:104-09, February, 1926. (75)

³⁴ Myers, G. C. "The Price of Speed Pressure in the Learning of Number," Educational Research Bulletin (Ohio State University), 7:265-68, September 19, 1928. (84)

2. Evaluation of the experiments. The experiment of Wimmer (124) was described and evaluated on page 32. His conclusion with respect to speed versus accuracy may not be regarded as dependable.

Messick (75) used two groups of 136 fourth- and fifth-grade children. No attempt was made to secure equivalence. One group practiced addition four minutes a day for twenty days, with emphasis on speed.
The other group practiced addition for the same length of time, but with requests for accuracy rather than for speed. The final tests revealed a certainly "statistically" significant difference in accuracy in favor of the group for which accuracy was emphasized. The small difference in speed, also in favor of this group, cannot be regarded as "statistically" significant. This experiment is faulty in that no attempt was made to secure equivalence. There is some reason for believing that important non-experimental factors were not adequately controlled, and the experiment was rather short in duration.

Myers (84) used one group of ten first-grade children. These children, who had been practiced for two months in addition, were given a test whose results indicated almost 100 per cent accuracy. After two years, "The ten who were still in school were studied again. In the meantime, these children .... had been exposed to rapid-fire drills in the simple addition facts and the basic subtraction facts. The test-flash card .... was their torturer almost daily .... They were frequently subjected to games in which the fastest answers won." The children were then given five practice-test periods, after each of which they were told that they had done very well and were urged to go faster. This experiment shows a significant decrease in accuracy as more and more emphasis was placed on speed. Myers is to be commended for prolonging his investigation over so long a period of time. He is to be criticized for securing data from so small a group, for failing to employ a control group, and for creating what appear to be abnormal conditions. The conditions to which these children were subjected may not be typical of good, or even usual, school practice.

3. Justified conclusions.
While dependable conclusions must await further controlled experimentation, it seems justifiable to recommend requests for accuracy rather than requests for speed. In any case, it seems justifiable to hold that requests for accuracy should precede requests for speed. After pupils have attained satisfactory accuracy at a given level of difficulty, a teacher is possibly justified in encouraging them to increase their rate.

CHAPTER IV

METHODS OF TEACHING PUPILS TO SOLVE VERBAL PROBLEMS

It is commonly assumed that the responses made by pupils when presented with verbal problems in arithmetic are the result of reflective thinking. The first part of this chapter considers investigations of the nature of pupil responses to verbal problems. The experimental factors of the experiments summarized in the second part of the chapter are variations in types of verbal problems and of problem statements; those of the third and final part are various methods of teaching pupils to solve verbal problems in arithmetic.

THE NATURE OF PUPIL RESPONSES TO VERBAL PROBLEMS

1. Summary of reported conclusions. Three studies have been reported on the part played by reasoning when pupils attempt to solve verbal problems in arithmetic. Bradford¹ reported from an analysis of test results that "arithmetical work is not done in a critical frame of mind." This conclusion has since been substantiated by the more comprehensive investigation of Monroe,² in which it was concluded that "a large per cent of seventh-grade pupils do not reason in attempting to solve arithmetic problems . . . Many of them appear to perform almost random calculations upon the numbers given. When they do solve a problem correctly, the response seems to be determined largely by habit."
Kline and Anderson³ have reported a laboratory study whose findings indicate the nature of the dual role of specific habits and reasoning abilities in solving verbal problems in arithmetic.

¹ Bradford, E. "Suggestion, Reasoning, and Arithmetic," Forum of Education, 3:3-12, February, 1925. (11)

² Monroe, W. S. "How Pupils Solve Problems in Arithmetic," University of Illinois Bulletin, Vol. 26, No. 23, Bureau of Educational Research Bulletin No. 44. Urbana: University of Illinois, 1929. 31 p. (79)

³ Kline and Anderson. "The Role of Habit in Reasoning," School Science and Mathematics, 26:156-67, February, 1926. (59)

2. Evaluation of the investigations. The data in the investigations of both Bradford (11) and Monroe (79) were collected by means of a single administration of tests. The tests of Bradford (11), which were administered to several hundred pupils in Standards VII and VIII in certain elementary schools in England, were composed of examples impossible of solution, of which the following, quoted from the report, are illustrative:

1. If the distance from Arles to St. Brieuc is 500 miles, and from Vire to St. Malo is 50 miles, how far is it from St. Brieuc to St. Malo?
2. If Henry VIII had six wives, how many had Henry II?

The extent to which attempts were made to solve such problems was taken by Bradford to indicate the absence of critical reflective thinking in the solving of arithmetical problems by school children. While this conclusion seems reasonably dependable, it should be remembered that the data refer to the children of English schools and for this reason may be somewhat less applicable to American children. It is in agreement, however, with the conclusion of the investigation reported by Monroe (79).

Monroe (79) secured his data by administering a test to 775 sixth-grade, 5902 seventh-grade, and 2579 eighth-grade pupils in forty-one Illinois cities.
These pupils were divided into four groups, and equivalence was secured by distributing the tests to the pupils in a random manner:

In order that each of the tests might be given to a random sample of pupils, the four tests were arranged in alternate order so that when distributed to the pupils in the class, the first, fifth, ninth, thirteenth, and so forth, would receive Test A; the second, sixth, tenth, fourteenth, and so forth, would receive Test B; the third, seventh, eleventh, fifteenth, and so forth, would receive Test C; the fourth, eighth, twelfth, sixteenth, and so forth, would receive Test D. Since the tests were to be given in a large number of classes, it seemed that this plan of sampling would provide equivalent groups.

It is evident that the four groups were equivalent not only in arithmetical ability but also with respect to teachers, textbooks, and other factors. In general, each of the four equivalent groups was equally represented in each classroom, and this representation was secured in a random fashion.

The tests administered to these groups differed only in the way the problems were stated. For example, in Test A, the second problem is stated in simple terminology, all of the data given are relevant, and the setting is concrete. In Test B, technical terminology is used, all the data given are relevant, and the setting is concrete. The difference in the statement of the problem in these two tests is the change from simple terminology to technical terminology. In Test C, the problem is stated in simple terminology, the data given are relevant, and the setting is abstract. In Test D, technical terminology is used, irrelevant data are included, and the setting is abstract. The problems of the tests are so stated that comparisons are possible with respect to the relative influences on correctness of response of simple and technical terminology, of wholly relevant and partially irrelevant data, and of concrete and abstract setting.
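Monroe's plan of handing out the four forms in rotation within each class can be written as a short routine. The sketch below is illustrative only: it assumes pupils are numbered from 1 in the order in which the papers were passed out, and the function name `assign_tests` and the class size of 30 are hypothetical, not taken from the report. It shows why the four groups come out nearly equal in size within every classroom: the counts for the four forms never differ by more than one.

```python
from collections import Counter

# Hypothetical sketch of the rotation described above: the 1st, 5th, 9th, ...
# pupil receives Test A; the 2nd, 6th, 10th, ... receives Test B; and so on.
TESTS = ("A", "B", "C", "D")


def assign_tests(n_pupils: int, tests: tuple[str, ...] = TESTS) -> list[str]:
    """Return the test form for each pupil, in the order the papers are handed out."""
    # Pupil k (counting from 1) sits at index k - 1, so index % 4 cycles A, B, C, D.
    return [tests[i % len(tests)] for i in range(n_pupils)]


if __name__ == "__main__":
    forms = assign_tests(30)  # a hypothetical class of 30 pupils
    print(forms[:8])          # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D']
    print(Counter(forms))     # Counter({'A': 8, 'B': 8, 'C': 7, 'D': 7})
```

Because the rotation restarts in every classroom, each form reaches pupils of every teacher and textbook, which is the property the writers rely on in calling the four groups equivalent. Strictly, the scheme is systematic rather than random; it approximates random sampling only if the order in which papers are handed out is unrelated to arithmetical ability.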
These comparisons are made for the data of this investigation, and the results are presented in tabular form in the report of the research. The techniques used in this study appear to be reasonably free from criticism. There seems to be little question that the sample of pupils was representative and that the groups used were equivalent with respect to all significant factors. The data secured seem to be of sufficient quality to warrant the statement that responses of pupils to verbal problems are usually characterized by absence of reasoning.

The experiment of Kline and Anderson (59) was conducted with four adults in a psychological laboratory. Time and accuracy were recorded for the responses to four hundred questions, such as "If Thursday is the twelfth, what day is the eighteenth?" The conclusions of this experiment are interesting, but may not safely be applied to school children. It would seem, however, that Kline and Anderson have made but another attempt to prove the obvious. It is commonly recognized that there is close interdependence between specific habits and reasoning.

3. Justified conclusions. The data secured in these three investigations appear to justify the conclusions stated, insofar as they apply to the groups of pupils to which the tests were given and by which the test exercises were used. The generalization of the conclusions may be questioned, especially for all types of problems and for all conditions of responding to them. Hence, the generalization should be considered tentative. It should also be noted that these investigations deal with the question of what responses pupils make as the result of the instruction they have received. They do not consider the type of responses that pupils should make.

THE EFFECT OF DIFFERENT TYPES OF PROBLEMS AND PROBLEM STATEMENTS

1. Summary of reported conclusions.
Myers, 4 Hydle and Clapp, 5 Washburne and Morphett, 6 Bowman, 7 Mitchell, 8 Monroe, 9 Wheat, 10 and Osburn and Drennan 11 have reported conclusions relative to the effect upon pupil responses of certain variations in the statement of the problems.

4 Myers, G. C. "Imagination in Arithmetic," Journal of Education, 105:662-63, June 13, 1927. (83)
5 Hydle and Clapp. "Elements of Difficulty in the Interpretation of Concrete Problems in Arithmetic," Bureau of Educational Research Bulletin No. 9. Madison: University of Wisconsin, 1927. (50)
6 Washburne, C. W. and Morphett, M. V. "Unfamiliar Situations as a Difficulty in Solving Arithmetic Problems," Journal of Educational Research, 18:220-24, October, 1928. (118)
7 Bowman, H. L. "The Relation of Reported Preference to Performance in Problem Solving," University of Missouri Bulletin, Vol. 30, No. 36, Education Series, No. 29. Columbia: University of Missouri, 1929. 52 p. (10)
8 Mitchell, Claude. "The Specific Type of Problem in Arithmetic versus the General Type of Problem," Elementary School Journal, 29:594-96, April, 1929. (76)
9 Monroe, op. cit.
10 Wheat, H. G. "The Relative Merits of Conventional and Imaginative Types of Problems in Arithmetic," Teachers College, Columbia University Contributions to Education, No. 359. New York: Bureau of Publications, Teachers College, Columbia University, 1929. 124 p. (121)
11 Osburn, W. J. and Drennan, L. J. "Problem Solving in Arithmetic," Educational Research Bulletin (Ohio State University), 10:123-28, March 4, 1931. (95)

Myers (83) administered two problems to fifth-grade pupils and reported that these pupils were able to solve the "imaginatively stated" one much more easily. Hydle and Clapp (50) studied the following characteristics of arithmetical problems in an effort to determine whether or not these characteristics were causes of difficulty in problem solving:

1. Objective setting
2. Size of numbers
3. Unfamiliar objects
4. Arrangement in a series
5. Nonessential elements
6. Visualization vs. experience
7. Project vs. problem form of statement
8. Symbolic terms

Variations of these characteristics, with the exception of the arrangement of similar problems in series and the presence of nonessential elements, were found to be "statistically" significant causes of difficulty. In addition to this conclusion the authors state that problem solving for pupils is largely a matter of visualization. Problems should be formulated with this in mind in the earlier stages of learning, but in order that generalizing ability might be engendered, it is concluded that the pupils should have as learning exercises a considerable number of problems not related to their first-hand experiences.

Washburne and Morphett (118) report that fifth-grade pupils achieve better results with familiar problems than with those containing unfamiliar elements. The following problems, quoted from the report, are illustrative of those used in their study; the first is in unfamiliar terminology, and the second, in familiar terminology:

A merchant sold 20 bags of charcoal. Each bag held 35 pieces. How many pieces of charcoal did he sell?

The girls have to make 30 boxes of taffy. Each of the boxes holds 25 pieces. How many pieces of taffy do they have to make?

Bowman (10) reported that pupils of high ability, as measured in his study, performed equally well on the following types of problems:

1. Problems based upon adult activities
2. Problems based upon children's activities
3. Problems whose setting is in the field of science
4. Problems so stated as to take on the nature of a puzzle
5. Problems of pure computation only, where directions for the right procedure are given

Pupils of lower ability showed a higher relative degree of performance on problems of the pure computation type.

Mitchell (76) reported that "Problems with definitely expressed numerical quantities seem to be more readily understood and solved than problems of a general nature involving general principles." The following examples illustrate the types of problems compared in this study. The first is a specific problem, and the second, a general problem:

The width of a room is 10 feet, and its length is 15 feet. Find its perimeter.

If you know the length and the width of a room, how can you find the perimeter?

Monroe (79) reported as another of the conclusions of his study that "If the problem is stated in the terminology with which they [the pupils] are familiar and if there are no irrelevant data, their response is likely to be correct."

Wheat (121) determined the relative achievements of pupils with conventionally stated problems and imaginatively stated problems. He reported that differences in achievement are negligible. The first of the examples quoted below illustrates the conventional type of statement; the second illustrates the imaginative type:

Margaret spent $3.68 for handkerchiefs at 23 cents each and gave one-fourth of them to her sister. How many did her sister get?

Margaret had been shopping all morning for Christmas presents. She had bought presents for her father and mother and brothers but could not decide what to get for her sister and several of her friends; there were so many things to pick from. Just then she saw some pretty handkerchiefs which were marked 23 cents each. These were just what she wanted, so she counted her money, found she had $3.68, and spent all of it for handkerchiefs. She kept out one-fourth of the handkerchiefs to give to her sister and gave the rest to her friends. How many did she keep out to give to her sister?

Osburn and Drennan (95) have reported a recent experiment in which vocabulary difficulty did not appear to be a significant factor in problem-solving achievement.
These investigators conclude that their data "seem to indicate that pupils are able to sense the meaning of problems even if they do not understand all the words." The conclusion is also reported that a few of the most important problem types should be taught thoroughly, with the expectation that transfer of training will take care of the remainder.

2. Evaluation of the experiments. Myers (83) administered his two problems to 513 fifth-grade children. One hundred and ninety-seven solved the first problem correctly, while 253 correctly solved the second and more imaginatively stated problem. Since the pupils attempted the imaginative problem after the first one, it seems probable that the difference is due to practice effect rather than to the fact that the second problem was more imaginatively stated than the first.

Hydle and Clapp (50) constructed tests in which the problems were paired with respect to each of the elements of difficulty investigated. That is to say, a problem appearing in one form of the test differed from its mate in the other form with respect to a given element. For example, in the case of symbolic terms one problem statement would contain symbols, such as X, Y, and Z, instead of the names of objects given in the other problem statement. The tests included five pairs of problems for each of the following elements of difficulty: (1) objective setting, (2) size of numbers, (3) unfamiliar objects, (4) arrangement in a series, (5) nonessential elements, (6) visualization vs. experience, (7) project vs. problem form of statement, (8) symbolic terms. The tests were administered to groups of pupils varying in number from 5870 to 7029. These pupils were widely distributed in village and city schools. Those taking the tests were divided into two groups of approximately equal ability as shown by a test of twenty-five problems of a concrete character.
The statistical interpretation of the data indicated that variations in six of the eight elements investigated might dependably be expected to cause difficulty. These elements are (1) objective setting, (2) size of numbers, (3) unfamiliar objects, (4) visualization vs. experience, (5) project vs. problem form of statement, and (6) symbolic terms. Hydle and Clapp are to be commended for their comprehensive and intensive investigation. The possible invalidity of their problem tests is adequately recognized in the report of the study. The investigators are to be commended for this and, in the opinion of the present writers, for not contending that the arithmetic curriculum should be so constructed that difficult elements in problem solving are eliminated.

Washburne and Morphett (118) used a single group of 441 fifth-grade pupils in six different towns. A test of eight pairs 12 of problems was administered to all of these children. The results appear to be "statistically" significant in favor of the problems containing familiar elements. The data collected would seem to be sufficiently reliable to warrant acceptance of the conclusion. However, this experiment would seem to be but another attempt to prove the obvious. A more worth-while investigation would be one that would attempt to show whether or not problems containing unfamiliar elements should be used as learning exercises.

12 An illustration of one of the pairs of problems is given on page 53.

Bowman (10) administered both forms of his test to a total of 564 seventh-, eighth-, and ninth-grade pupils of Sedalia, Missouri. Evidence is presented to show that the pupils of this group represent an approximately normal distribution of intelligence and are typical of the grades they represent with respect to parentage, parental occupations, and environment. 13 Each of the two test forms contained twenty-five problems of the types previously referred to.
At the bottom of each page of the forms was placed the following statement to be completed by the pupil: "The problem on this page I liked best is No. ____." This was done to secure data relevant to preferences for different types of problems. 14 The coefficients of reliability and of validity for the test as a whole were quite high. The coefficient of reliability was reported as .95 ± .003 in the measurement of performance and .77 ± .01 in the measurement of preference, and the coefficient of validity was reported as .82 ± .01 when the scores secured from an administration of the Stanford Arithmetic Reasoning Test were used as the criterion. The representativeness of the group and the comparatively high reliability and validity of the instrument used constitute strong arguments for the dependability of the conclusions that pupils of high ability perform equally well on (1) problems based upon adult activities, (2) problems based upon children's activities, (3) problems whose setting is in the field of science, (4) problems so stated as to take on the nature of a puzzle, and (5) problems of pure computation only, and that pupils of lower ability perform relatively better on problems of the purely computational type.

Mitchell (76) administered a test containing fifteen quantitative problems and fifteen general problems (problems without expressions of numerical quantities) to seventy eighth-grade and sixty seventh-grade pupils. The mean difference in scores between the two types of problems is large enough to seem "statistically" significant, although no standard or probable error is reported. The dependability of the findings may be questioned, however, because of certain faults in the data. The sample of pupils is too small to be regarded as representative. It may also be that the pupils had greater difficulty with the general, or non-quantitative, type of problem because of lack of experience with problems of this type.
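The "±" figures attached to Bowman's coefficients are probable errors. As a rough consistency check, the classical large-sample formula for the probable error of a correlation coefficient, PE = 0.6745(1 - r^2)/sqrt(N), can be applied to them. The sketch below assumes that every coefficient rests on all 564 pupils tested, which the summary does not state (the validity coefficient in particular may be based on fewer cases); the function name `probable_error_r` is hypothetical.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n).

    The factor 0.6745 converts a standard error into a probable error, i.e.
    the half-width of the central 50 per cent of a normal distribution.
    """
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Assumption: each coefficient is based on all 564 pupils tested.
N = 564
reported = [
    ("reliability, performance", 0.95, 0.003),
    ("reliability, preference", 0.77, 0.01),
    ("validity", 0.82, 0.01),
]
for label, r, pe_reported in reported:
    print(f"{label}: r = {r:.2f}, computed PE = {probable_error_r(r, N):.3f}, "
          f"reported PE = {pe_reported}")
```

Under that assumption the computed values (about .003, .012, and .009) round to the reported .003, .01, and .01, so the reported figures are at least consistent with a sample of this size.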
13 While measures of intelligence of some of the pupils are not reported, there is no reason to believe that they were less typical of children in general than those for whom data are reported.
14 This matter will be referred to again in the summary of research on motivation of learning in arithmetic. See page 81.

Wheat (121) administered tests containing ten pairs of conventional and imaginative problems to approximately two thousand fifth-, sixth-, and eighth-grade pupils in several towns in different parts of the country. The differences in achievement between the conventional and imaginative types of problems were not of sufficient magnitude to be considered "statistically" significant, with the possible exception that the conventional type of problem required much less time. Wheat is to be commended for the size and representativeness of his sample, but his procedures for handling and interpreting his data have been seriously criticized. Osburn 15 states that Pearson coefficients of correlation are computed from unsuitable data:

In at least two cases correlations are figured which are partly based upon the number of problems solved. The distribution of the number of problems solved is not normal; in fact it is clearly of the U type. The use of the Pearson coefficient of correlation with distributions of this sort may be justifiable if the regression lines are rectilinear. This necessary condition is not substantiated, and the use of the Pearson technique is therefore open to question. Again the Pearson correlation was originally intended for use with two variables only. In a number of cases in this study it is used where three and even four variables are involved. For example, a correlation is shown between intelligence quotients and indices of similarity scores.
In this case four variables are really involved, but they appear as two because quotients of respective pairs are used. This is handy, but hardly justifiable, as a statistical procedure.

Osburn also criticizes the study from other points of view. He states, "In conventional problems, as here defined, the setting is left to the imagination, while in the imaginative problems the setting is made explicit by description but is still not perceptually present." The critic points out that the pupils quite possibly received previous training only on the conventional type of problem. In spite of the fact that they had had little or no training in the solution of imaginative problems, the pupils did well with them. This might mean the existence of transfer, or it might indicate a marked advantage for the imaginative type when the factor of previous training is properly controlled by acceptable scientific techniques.

Finally, Osburn contends that Wheat is to be criticized for assuming that arithmetic material should be used which can be bought cheaply and taught quickly and easily. Osburn holds that the objectives of arithmetic must be considered here. "The question therefore is not which problem is most economical to teach, or to buy, but which one will better prepare the pupil for quantitative thinking in real conditions, the sorts of situations which he will meet in life." Osburn then presents arguments for the imaginative type of problem.

The present writers are inclined to grant that most of Osburn's criticisms appear to be justified. It should be pointed out, however, that Osburn is somewhat inconsistent. For example, he holds that the two types of problems are synonymous and then contends that training has been different with respect to each. If they are synonymous, why should each not be equally well adapted to engender those abilities accepted as the objectives of arithmetic?
After all, it would seem that the conclusion that "pupils of the intermediate grades are neither hindered nor helped in their problem practice exercises by problems of the imaginative type, when no limits are imposed upon the amounts of time of the practice periods," may be accepted as fairly dependable until better evidence has been obtained experimentally which reverses it.

15 Osburn, W. J. "Two Recent Books on Arithmetic," Educational Research Bulletin (Ohio State University), 9:66-73, February 5, 1930.

Osburn and Drennan (95) had teachers of two classes of third-grade children teach a representative list of problems with particular emphasis on the "cues," or language aspects, of the problems. An examination made up of twenty verbal problems, including new ones but no additional vocabulary difficulty, was given after six weeks of such instruction. On the next day, another test was administered containing twenty problems which involved vocabulary difficulties, illustrated by such terms as narcissus, gypsum, tortoise, chemist, sulfuric acid, and excavating. The data indicate that the pupils made very acceptable scores on both tests. The investigators suggest that the changes in vocabulary may have been a factor of little significance, because "mainly just 'nouns' were changed, and since the test was given the next day after the first test, that the pupils sensed the similarity of Test II to the test of the day before." This appears to be a very serious limitation of this investigation. The present writers are inclined, therefore, to give little weight to its conclusions as compared with those of other studies of the influence of terminology on problem-solving achievement in arithmetic.

3. Justified conclusions. These eight studies of the effect of different types of problems and problem statements are not comparable, and, hence, it is difficult to synthesize the findings.
Most of them, however, support the principle that pupils make higher scores on tests consisting of familiar problems, or problems stated in familiar terminology. The conclusion that pupils respond more correctly to problems stated in concrete rather than imaginative or abstract form, with irrelevant elements excluded, and related to activities experienced by children is less unanimously supported by the experimental evidence. This generalization is an obvious inference from the psychology of learning, but these studies contribute to our understanding of what makes a problem unfamiliar.

METHODS OF TEACHING PUPILS TO SOLVE VERBAL PROBLEMS

1. Summary of reported conclusions. Newcomb, 16 Stevenson, 17 Greene, 18 Clark and Vincent, 19 Washburne and Osborne, 20 Lutes, 21

Merton and others. "Remedial Work in Arithmetic," Second Yearbook of the Department of Elementary School Principals. Washington: National Education Association, 1923, p. 395-429. (74)
Smith. "Individual Variations in Arithmetic," Elementary School Journal, 17:195-200, November, 1916. (107)
Stevenson, P. R. "Increasing the Ability of Pupils to Solve Arithmetic Problems," Educational Research Bulletin (Ohio State University), 3:267-70, October, 1924. (112)
Yeager, William. "Diagnosis and Remedial Measures in Arithmetic," Twelfth Annual Schoolmen's Week Proceedings. Philadelphia: University of Pennsylvania, 1925, p. 296-305. (127)
Buswell, G. T. and John, Lenore. "Diagnostic Studies in Arithmetic," Supplementary Educational Monographs, No. 30. Chicago: University of Chicago Press, 1926. 212 p. (20)
Kathleen, Sister. "Results of Remedial Instruction Following the Use of Diagnostic Arithmetic Tests," Catholic Educational Review, 24:19-27, January, 1926. (54)
O'Brien. "Improvement of Instruction in Arithmetic," University of Kansas Bulletin of Education, Vol. 1, No. 4. Lawrence: University of Kansas, 1927. 42 p. (93)
Otto. "Remedial Instruction in Arithmetic," Elementary School Journal, 28:124-33, October, 1927. Also in: Journal of the National Education Association, 17:87-89, March, 1928. (96)
Clemens and Neubauer. "A Supervision Project in Multiplication," Journal of Educational Research, 18:387-96, December, 1928. (28)
Neal and Foster. "An Experiment with Remedial Work in Common Fractions," Elementary School Journal, 29:280-83, December, 1928. (87)
Brownell. "Remedial Cases in Arithmetic," Peabody Journal of Education, 7:100-107, September, 1929. (14)
Chase. "Diagnosis and Treatment of Some Common Difficulties in Solving Arithmetic Problems," Journal of Educational Research, 20:335-42, December, 1929. (24)
Guiler. "Improving Computational Ability," Elementary School Journal, 30:111-16, October, 1929. (43)

seem worth while to present in detail the reported conclusions of all of these investigations. The conclusions of the single-group experiments and case studies contribute to our understanding of the effectiveness of discovering the individual arithmetical disabilities of pupils by means of diagnostic tests and by means of first-hand observation of the work of the pupil in which he is requested to think aloud in performing the fundamental operations or in solving problems.
The conclusions of these investigations also contribute to our understanding of the effectiveness of intensive and zealous instruction to eliminate the disabilities so discovered, either through the use of practice materials prepared in advance or informally at the time. These conclusions, important as they are, do not contribute materially, however, to our knowledge with respect to the relative effectiveness of the various methods of diagnostic and remedial treatment.

The conclusions of the controlled experiments contribute, in some measure, to our knowledge of the relative effectiveness of the various methods of diagnostic and remedial treatment. Smith (107) reported that class drill, supplemented by individual assistance on points of weakness revealed by diagnostic tests, is more effective than either class drill with extra drill periods provided for the slow pupils, who were drilled in groups rather than individually, or class drill in which explanations were made only to the group as a whole. Sister Kathleen (54) reported that remedial treatment is more effective when based on analysis and classification of the errors made on the test than when based only on class medians on the test. Neal and Foster (87) have reported that "organized practice material in the hands of the children, with provision for the diagnosis of difficulties and remedial work, is more effective in economy of the teacher's time and of the children's time and in final results in maintaining skill in the manipulation of common fractions than is the usual practice provided by the teacher." The conclusion of Stone (113) that diagnostic and practice tests produce "greater gains in ability to reason in arithmetic than does the regular work in arithmetic that the tests may displace in classroom use" agrees with that of Neal and Foster (87).

Lazar, May. Diagnostic and Remedial Work in Arithmetic Fundamentals for Intermediate Grades. (66)
"An Experimental Study in Improving Ability to Reason in Arithmetic," Twenty-ninth Yearbook of the National Society for the Study of Education. Bloomington, Illinois: Public School Publishing Company, 1930.

2. Evaluation of the investigations. The studies reported by Merton and others (74) and by Yeager (127) are to be characterized as "descriptive accounts of what is going on in some school." Some quantitative data are given and some comparisons in achievement of different classes are reported, but it is not possible to justify the labeling of such investigations "experiments." The studies of Kallom (53), Brownell (14), Chase (24), Gabbert (39), and Soth (108) were based on data secured from the following numbers of cases: 3, 4, 17, 1, 1. Descriptive accounts of what is taking place in schools and reports of case studies are interesting. They should be very suggestive to teachers in practice. It is impossible, however, to generalize from data so restricted.

Morton (81), Stevenson (112), O'Brien (93), Otto (96), Clemens and Neubauer (28), Guiler (43), and Lazar (66) conducted single-group experiments. Morton (81) used one group of thirty-six eighth-grade pupils for a period of five months. He measured the improvement of these pupils as a result of diagnostic and remedial treatment by means of tests constructed by himself. The substantial gains shown may not with certainty be ascribed to the experimental factor, because of the failure to employ a control group. The single-group experiment of Stevenson (112) was described and evaluated, rather unfavorably, in the previous chapter. 21 O'Brien (93) used 357 pupils in the seventh, eighth, ninth, and tenth grades of three small school systems.
After an initial program of mental and achievement testing, diagnosis was made with respect to "mental ability, previous schooling, achievement in various phases of the subject, and specific types of errors or difficulties which characterized the students' work." The program of remedial instruction was based on the weaknesses discovered by the tests. Pupils were informed of their individual weaknesses, and the teachers were provided with general and detailed suggestions for carrying out the remedial instruction. They were also provided with advice in conferences and with information in the form of abstracts of selected articles in current literature. At the end of five months the final tests were administered. While the increases in achievement are large, it is difficult to ascribe these increases to any specific experimental factor. No control groups were used, and it is evident that the pupils were subjected to a complex of factors.

Otto (96) used a single group of nine fourth-grade pupils for a period of seven months. Achievement was measured by diagnostic tests, and remedial treatment was provided by means of prepared practice materials, but, again, because of lack of control, it is impossible to say how much of the improvement found is to be ascribed to the experimental factor.

21 See pages 58 to 67.

Clemens and Neubauer (28) employed a single group of 425 fourth-, fifth-, sixth-, seventh-, and eighth-grade pupils in twelve elementary schools of one city. The authors constructed tests covering forty-two multiplication difficulties. Tests were administered four times: (1) at the beginning of the experiment, (2) at the end of a week, (3) at the end of two more weeks, and (4) three months after the administration of the third test. "Individual help was given to each pupil who failed to obtain a perfect score in the first test.
After correcting the child's error and showing him how to work the example correctly, the teacher gave him the drill card designed to meet his difficulty." Substantial gains in achievement were indicated by the test results, but failure to use a control group again makes it impossible to determine how much of this gain is to be ascribed to the experimental factor.

Guiler (43) used a single group of ten seventh-grade pupils for one hour a week for twelve weeks. An analysis was made of the errors of these pupils on the diagnostic tests used, and remedial instruction adapted to individual needs was provided. The gains in achievement were measured by several standardized arithmetic tests, but it must be repeated that failure to use a control group renders the conclusions of doubtful dependability.

Lazar (66) used a single group of forty-three sixth-grade pupils. The initial status of these pupils was determined by means of intelligence tests, standardized arithmetic achievement tests, a diagnostic arithmetic test, and individual observation and oral examination. Ten minutes of the daily arithmetic period were devoted to remedial work characterized by the experimenter as follows: (1) specific instruction on class or individual weaknesses as determined by diagnosis was given; (2) the Courtis Standard Practice Tests were used for drill on the operations in which deficiencies were shown; (3) supplementary material was devised to overcome difficulties with the addition combinations, with long division, and with fractions; (4) the pupils were taught how to make records and graphs to show their achievement, and the teacher made graphs of the class achievement; (5) training the pupils to have the proper attitude toward their deficiencies was an important phase of the work. At the end of five months the initial arithmetic tests were again administered. The gains in achievement appear to be "statistically" significant.
While a control group was not used, some of the functions of a control group were attained by comparison of the experimental results with test norms. The experiment is to be commended for its comprehensive and intensive nature, but Lazar's experiment deserves criticism similar to that applied to the experiment of O'Brien (93): the experimental factor was exceedingly complex.

Buswell and John (20) investigated the problem of arithmetical diagnosis by means of two types of laboratory technique and by means of a comprehensive single-group experiment. In the laboratory study of eye-movements in column addition, two fourth-grade, eight fifth-grade, and seven sixth-grade pupils were used, together with three adults. The report of this research gives data in graphic form which constitute dependable evidence with respect to the nature of eye-movements in column addition. This evidence emphasizes the need for diagnosis in arithmetical instruction.

The second laboratory investigation, in which thirty subjects were used, was conducted by means of dictaphone and kymograph apparatus. Time analyses were made of the four fundamental operations. What each child was asked to do is described in the following quotation:

The children who participated in the experiment were seated one at a time at a table on which was a sheet of paper. On this paper were typewritten the examples which they were to work. The only piece of apparatus in the room was a specially constructed telephone transmitter, which was clamped to the edge of the table. The experimenter sat beside the child and instructed him as to his procedure. The child was asked to give his partial answers aloud and also to say the digits which he wrote on the paper at the same time that he wrote them. In the case of an example in column addition the child was instructed to give each of the sums as he proceeded down the column.
The sound of the child's voice was reproduced by an amplifier in another room and recorded by means of a dictaphone. These records were then "transcribed on kymograph paper by using an electric time-marker and a telegraph key." The kymograph record may be described as follows. One line, broken at regular intervals, showed the time elapsed in fifths of a second. The second line, broken at irregular intervals, revealed the time required for each partial answer. To illustrate by data secured from one child: in adding a single column of thirteen digits, the child required three-fifths of a second each to add 4 + 9, 13 + 3, and 16 + 2. He required 19/5 of a second to add the combination 29 + 3. Data relative to the time required to perform the fundamental operations for all the subjects are presented in tabular form in the monograph. An examination of the description of the techniques used gives no reason to doubt the reliability of these data. They are additional evidence of the need for diagnosis in arithmetical instruction.

Buswell and John used a single group of 303 children, in nine classes, in twelve elementary schools. In a preliminary study they used a single group of 250 children in the third, fourth, fifth, and sixth grades. Diagnostic sheets for each of the fundamental processes were followed by remedial treatment administered by the teachers to suit the individual needs of the pupils. The Cleveland Survey Test was administered before and after the ten-weeks' period of diagnosis and remedial treatment, and substantial gains were found. Buswell and John hold that these gains may be ascribed to the experimental factor, even though a control group was not used. They state:

Owing to the lack of a refined technique in carrying on the experiment, a small difference between the actual improvement shown and the normal expected improvement cannot be considered significant.
However, if the difference is fairly large, it seems fair to conclude that the difference is due to the diagnostic procedure and remedial instruction given by the teacher.

If this contention is accepted as correct, the evaluation of the dependability of the conclusions of the other single-group experiments must be modified. The gains in achievement were, without exception, large. The present writers do not feel, however, that the conclusions derived from data secured by single-group experimentation can be as satisfying, other things being equal, as those obtained from controlled experimentation. Obviously, it is impossible to determine how much of the gains in achievement was due to inherent qualities in the methods of diagnosis and remedial treatment and how much was due to additional and zealous instruction and to the mere drill afforded.

Control groups were used in the experiments of Smith (107), Sister Kathleen (54), Neal and Foster (87), and Stone (113). The experiment of Smith (107) has already been described and evaluated somewhat unfavorably. 22 Sister Kathleen (54) used two groups of fifty sixth- and seventh-grade pupils in neighboring schools in her investigation of the relative effectiveness of remedial treatment based on analysis and classification of the errors made on a diagnostic test and remedial treatment based only on class medians on the test. With respect to equivalence, she stated that the groups were of "about the same average mental ability." The differences in gains, which are not highly "statistically" significant, were measured by the Woody-McCall Mixed Fundamentals Test, Forms I and II. The conclusions of Sister Kathleen seem to be somewhat more dependable than those of Smith (107), but the techniques used in this experiment were not beyond criticism. There is evidence of failure to control important non-experimental factors, particularly the factor of zeal on the part of the teachers.

22 See pages 30 to 33.

Neal and Foster (87) used approximately six hundred experimental and approximately four hundred control pupils in the fifth grade. These groups were not equivalent according to the initial-test scores, but allowance for non-equivalence is made in interpreting the results. The pupils in the larger group used "organized practice material, with provision for diagnostic and remedial work," while the pupils in the smaller group had "the usual practice provided by the teacher." The experiment lasted three months. The differences in gains in achievement, which are possibly "statistically" significant, were measured by the Stanford Achievement Test, Forms A and B, and by an informal fraction test prepared by the investigators. The experimentation deserves commendation with respect to the directions given participating teachers by means of mimeographed sheets. The conclusions stated would be more satisfying to the critical reader if appropriate qualifications had been stated in addition to the recognition given to faulty equivalence.

Stone (113) made comparisons between paired groups of fifth-, sixth-, seventh-, and eighth-grade pupils; the groups varied in size. In his preliminary trial 175 pairs of equivalent pupils were used. In his main trial comparisons were made between a total of 172 pairs. Other comparisons were made without resorting to pairing. The pupils participating in the experiment were located in twenty-three schools of five school systems. These pupils were paired with respect to arithmetic scores, mental age, chronological age, and school grade. Pairs were located in the same school systems. The pupils in the experimental groups had the benefit of a program of diagnostic and practice tests described by the experimenter as follows:

The diagnostic tests were designed to accompany the survey tests. Their purpose is to afford more precise means of locating each pupil's difficulties in arithmetical reasoning.
They enable each pupil to think, by graduated steps, into and through his individual difficulty. The practice tests were designed to follow the diagnostic tests. Their purpose is to afford needed practice on specific difficulties, as located by survey and diagnostic tests. They enable each pupil to rethink the reasoning involved in his individual difficulty.

The pupils of the control group had the regular work in arithmetic without the benefit of a program of diagnosis and remedial treatment. The experiment lasted for five weeks. Gains in achievement were measured by the Stone Survey Tests I and II and by the Stone Reasoning Tests in Arithmetic. The differences in gains appear to be "statistically" significant. The chief criticism with respect to this experiment concerns the validity of the measuring instruments used. It seems possible that the tests may have been more valid with respect to the abilities engendered by the practice material. If this was the case, some of the differences in gains should be attributed to this cause. The techniques used in this experiment are for the most part very commendable, especially those used in securing a representative sample and equivalent groups. The conclusions in favor of the diagnostic and remedial methods used with the experimental pupils are stated conservatively and as such seem quite dependable.

Justified conclusions. The generalization seems justified that diagnosis and remedial treatment should be recognized as necessary phases of instruction in arithmetic. The conclusions relative to the methods of diagnosis and remedial instruction are less certain. It seems evident from the comprehensive investigation of Buswell and John (20) that individual diagnosis and remedial instruction adapted to the needs of individual pupils are most effective. Other investigators obtained good results by means of diagnostic tests and practice material placed in the hands of the pupils, with less individual attention being given. There seems to be no reason to doubt that such methods are effective. Further research is needed, however, before it may be said that such methods are as effective as, or more effective than, methods in which emphasis is placed on direct observation of the pupil engaged in arithmetical learning activity and in which remedial instruction is provided immediately for the disabilities discovered. It is quite evident that more attention should be given, in experimental evaluations of diagnostic and remedial methods, to the evaluation of specific aspects of such instruction rather than to evaluation of a complex of factors.

CHAPTER VI

METHODS OF TEACHING READING OF ARITHMETICAL SUBJECT-MATTER

It is fairly well known that children differ in their abilities to read various types of subject-matter. The reading of examples and of verbal problems in arithmetic involves the use of abilities quite different from those used in reading historical description or exposition. The research referred to in the first part of this chapter indicates the necessity of recognizing the significance of unique reading skills as factors in arithmetical achievement. The small number of experimental evaluations of methods of teaching the reading of arithmetical subject-matter is an indication that this problem has not received wide recognition among research workers in the field of arithmetic. One of the experiments described deals with the effectiveness of general training in reading. The second experiment deals with the effectiveness of a questioning method; it is also an attempt to evaluate dramatization and story telling as means of teaching the reading of verbal problems. In the third experiment, instructions in reading were included on the problem solution sheets provided for the pupils.
There is need for an evaluation of a method which is more likely to engender the specific reading abilities required for arithmetical subject-matter.

1. Summary of reported conclusions. The necessity of instructing pupils in the reading of arithmetical subject-matter has been shown in a number of studies. Buswell and John, 1 Brooks, 2 Chase, 3 Edano, 4 and Partridge 5 have reported that a technical vocabulary is needed by children engaged in arithmetical learning activity. The conclusion stated by Chase (23) is typical:

.... the investigation here recorded has shown after careful study of numerous textbooks, that many problems involve conditions that are quite untrue to life; that many of the words used are quite unknown to the one hundred children tested; and finally that forty-five experienced teachers from various school systems have found the subject-matter and vocabularies of the various texts which they have used quite unsuited to the capacities of their pupils.

1 Buswell, G. T., and John, Lenore. "The Vocabulary of Arithmetic," Supplementary Educational Monographs, No. 38. Chicago: University of Chicago Press, 1931. 146 p. (21)
2 Brooks, ... "A Study of the Technical and Semi-Technical Vocabulary of Arithmetic," Educational Research Bulletin (Ohio State University), 5:219-22, May 26, 1926. (12)
3 Chase, S. E. "Waste in Arithmetic," Teachers College Record, 18:360-70, September, 1917. (23)
4 Edano, ... "An Analysis of Arithmetic Textual Matter," Philippine Public Schools, 1:81-84, February, 1928. (36)
5 Partridge, ... "... Needs in Children's Reading Activities," Elementary School Journal, ... (99)

Several studies of errors made by pupils in the solution of arithmetical problems indicate that reading disability is an important cause of errors. 6 Studies of the correlation between arithmetical ability and reading ability seem to indicate that a small but "statistically" significant correlation exists. 7 In certain discussions of measurement in arithmetic it has been indicated that arithmetical achievement is in part a function of reading ability. 8

In the opinion of the present writers the most significant evidence relative to the importance of instructing pupils to read arithmetic is to be found in the laboratory studies of Buswell and John 9 and of Terry. 10 The latter investigator has stated some suggestions for instructing pupils in reading arithmetical problems which seem worthy of quotation:

1. Pupils should be taught to distinguish between the first reading and the re-reading phases in their attack on problems.
2. They should learn to consider numerals and the accompanying descriptive conditions as different elements of a problem and separable for reading purposes.
3. During the first reading, they should devote their attention to the conditions of the problem.
4. At the same time skill should be developed in partial reading of numerals.
5. While this skill is being acquired, pupils should be apprised of the essential similarity between the conditions of the problem and such details of the numerals as are perceived by partial reading. 11

Experimental investigations of methods of instructing pupils to read arithmetical subject-matter have been reported by Newcomb, 12 Wilson, 13 and Lessenger. 14 The pupils in the experiment of Newcomb (90) were given instructions in reading problems on problem solution sheets, while in the experiment of Wilson (122) the pupils were taught to read problems by a questioning method and through dramatization and story telling. Lessenger (67) reported an experiment in which general reading instruction was the experimental factor. These experiments lead to the general conclusion that reading instruction increases significantly the ability of pupils to solve arithmetical problems.

6 Hydle, L. L., and Clapp, F. L. "Elements of Difficulty in the Interpretation of Concrete Problems in Arithmetic," Bureau of Educational Research Bulletin No. 9. Madison: University of Wisconsin, ... (50)
John, Lenore. "Difficulties in Solving Problems in Arithmetic," Elementary School Journal, 31: ... (51)
Morton, ... "An Analysis of Errors in the Solution of Arithmetic Problems," Educational Research Bulletin (Ohio State University), 4:187-90, April 29, 1925. (82)
Stevenson, P. R. "Increasing the Ability of Pupils to Solve Arithmetic Problems," Educational Research Bulletin (Ohio State University), 3:267-70, October 15, 1924. (112)
Stevenson, P. R. "Difficulties in Problem Solving," Journal of Educational Research, 11:95-103, February, 1925. (111)
7 Hackler, J. M. "The Relation between Successful Progress in Mathematics and the Ability to Read and Understand, and the Factors that Contribute to Success or Failure in Mathematics." Unpublished master's thesis in Education. Chicago: University of Chicago, 1921. 82 p. (44)
Harlan, C. L. "Years in School and Achievements in Reading and Arithmetic," Journal of Educational Research, 8:145-49, September, 1923. (48)
Wheat, H. G. "The Relative Merits of Conventional and Imaginative Types of Problems in Arithmetic," Teachers College, Columbia University Contributions to Education, No. 359. New York: Bureau of Publications, Teachers College, Columbia University, 1929. 124 p. (121)
8 Dawson, C. D. "Some Results in Using Starch's Arithmetic Reasoning Test," Journal of ...
... "... Reasoning Tests in Arithmetic," School and Society, 8:295-99, ...
9 Buswell, G. T., and John, Lenore. "Diagnostic Studies in Arithmetic," Supplementary Educational Monographs, No. 30. Chicago: University of Chicago Press, 1926. 212 p. (20)
10 Terry, P. W. "How Numerals Are Read: An Experimental Study of the Reading of Isolated Numerals in Arithmetical Problems," Supplementary Educational Monographs, No. 18. Chicago: University of Chicago Press, 1922. 110 p. (115)
See also: Terry, P. W. "The Reading Problem in Arithmetic," Journal of Educational Psychology, 12:365-77, October, 1921. (A summary of the monograph referred to above.) (116)
11 Terry, P. W. "How Numerals Are Read: An Experimental Study of the Reading of Isolated Numerals in Arithmetical Problems," Supplementary Educational Monographs, No. 18. Chicago: University of Chicago Press, 1922. (115)
12 Newcomb, ... "... How to Solve Problems in Arithmetic," Elementary School Journal, 23:183-89, November, 1922. (90)

2. Evaluation of experiments. The investigations of Brooks (12), Chase (23), Edano (36), and Partridge (99) were analytical rather than experimental in character. The need for instruction in reading was inferred from analyses of arithmetical materials of instruction. The investigations of Hydle and Clapp (50), John (51), Morton (82), Stevenson (111), and Stevenson (112) were also analytical in nature, but the analysis was made of pupil responses to arithmetical problems. Buswell and John (21) prepared group tests of arithmetical vocabulary and administered them to 1500 fourth-, fifth-, and sixth-grade pupils in several school systems. Their findings are probably the most significant in this group.

It is evident that the analytical investigations are limited by the inferences which had to be made. One may not be sure from observing a mistake made in a problem whether the cause of the faulty response was lack of reading ability or lack of some other ability. For example, the written performances of two pupils may be identical and thus give no indication that one of the pupils was handicapped by arithmetic disability while the other failed to solve the problem correctly because of reading disability.
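The objection raised in the discussion of the correlation studies that follows, namely that partialling out intelligence may reduce the arithmetic-reading coefficient too much, can be seen in the standard first-order partial correlation formula, r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)). The Python sketch below evaluates that formula. The function name `partial_r` is hypothetical, and the three coefficients are invented for illustration; they are not taken from Hackler, Wheat, or Harlan.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical coefficients, chosen only to illustrate the argument:
# 1 = arithmetic score, 2 = reading score, 3 = intelligence score.
r_arith_read = 0.60
r_arith_iq = 0.70
r_read_iq = 0.80  # high, because the intelligence test itself demands reading

print(round(partial_r(r_arith_read, r_arith_iq, r_read_iq), 3))  # 0.093
```

Because the reading scores and the intelligence scores overlap heavily in this illustration (r23 = .80), most of the raw coefficient of .60 is removed along with intelligence, and only about .09 remains. A low partial coefficient of this kind therefore cannot be read as showing that reading ability matters little.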
Hackler (44) and Wheat (121) indicated the importance of reading ability in arithmetic learning activity by typical correlation techniques, and Harlan (48), presenting his correlation in graphic form, showed that arithmetic and reading ability tend to occur together. The correlation studies of Hackler (44), Wheat (121), 15 and Harlan (48) are limited in dependability in the sense that all correlation studies are limited when the attempt is made to interpret them in terms of cause and effect. The raw coefficients obtained between arithmetic scores and reading scores are probably due in a large measure to the common factor of intelligence. If an attempt is made to partial out intelligence, the coefficient so obtained may be too much reduced, because intelligence as represented in the intelligence score usually obtained includes reading ability. Partial correlation would not separate the two effectively, and the partial coefficient would of necessity be low. 16

13 Wilson, Estaline. "Improving the Ability to Read Arithmetic Problems," Elementary School Journal, 22:380-86, January, 1922. (122)
14 Lessenger, W. E. "Reading Difficulties in Arithmetical Computation," Journal of Educational Research, 11:287-91, April, 1925. (67)
15 See page 57 for unfavorable criticism of Wheat's use of correlation methods.

The laboratory investigation of Buswell and John (20) has already been described and favorably evaluated. 17 Terry (115) used similar techniques. A portion of his data was secured by having his subjects record, by means of a telegraph key and kymograph apparatus, the time spent in the first reading and in the re-reading of arithmetical problems.
The following data secured from one subject on one problem are illustrative:

7.6 seconds: time required for first reading
1.4 seconds: time required to re-read one numeral
2.4 seconds: time required to re-read another numeral
0.2 seconds: time required to re-read last sentence

Additional data were secured by means of eye-movement apparatus. All of Terry's data appear to be reliable evidence of the important function of reading ability in solving arithmetical problems. The suggestions made by Terry with respect to instruction in reading arithmetical problems may be regarded, however, only as suggestions. Terry has not shown by experimental trial that the method suggested is effective in increasing reading ability with respect to arithmetical problems.

The experiment of Newcomb (90) has already been described and criticized with respect to lack of representativeness of pupils used, lack of equivalence, and failure to secure adequate control of non-experimental factors. 18 Wilson (122) used one group of thirty-four sixth-grade pupils of relatively low intelligence. These pupils were given the Stone Reasoning Test at the beginning of the experiment and were taught to read problems by a questioning method for twelve minutes three times a week for five weeks; at the end of this time they were tested again. The significant increase in achievement may not be ascribed with certainty to the experimental factor. Wilson reported similar results for instruction in which the children were directed to convert problems into stories and to dramatize them. One wonders how much of the reading ability so engendered would transfer to ordinary problem-solving activity.

Lessenger (67) used data collected from a single group of 111

16 ... Journal of Educational Psychology, 21:657-79, December, 1930.
17 See pages 72 and 73.
... are stated in familiar terminology and without ... reasonably dependable. The conclusion that responses are less likely to be satisfactory when problems are stated imaginatively is less dependable but appears reasonable. The conclusions with respect to other aspects of problem statements are even less dependable. Further research is needed for determining what is most effective with respect to these aspects.

33. What is the effectiveness of providing pupils with systematic training in finding the facts pertaining to the problem, in deciding the processes to be used, and in finding the answer in round numbers?
Systematic training in finding the facts pertaining to the problem, in deciding the ... The evidence supporting this conclusion appears reasonably dependable.

34. What is the relative effectiveness of teaching pupils to solve problems by the graphic and by the conventional methods?
It is reported that it is more effective to teach pupils to solve ... problems ...

35. What is the effectiveness of assigning large numbers of problems in teaching children to solve problems?
Assigning large numbers of problems is reported to be effective in increasing problem-solving achievement. This conclusion, while not based on acceptable experimental data, agrees with the Law of Exercise.

36. What is the effectiveness of teaching pupils to see the analogies between difficult written problems and correspondingly easy oral problems?
It is reported that the method is not effective. Since this conclusion is not supported by very dependable experimental evidence, and since the method would appear to be compatible with the Law of Association, it would seem reasonable to suppose that the method is effective.

37. What is the value of diagnostic and remedial treatment in arithmetic?
Diagnostic and remedial treatment is highly effective in the field of arithmetic. The experimental evidence in support of this conclusion is comprehensive and reasonably dependable.

38. What is the relative effectiveness of individual diagnosis in which "first-hand observation is made of the actual work of the pupil" and diagnosis by means of diagnostic tests?
Conclusions have been reported in favor of both methods of diagnosis. Further research is needed to determine which method is relatively more effective. It seems reasonable to suppose that both methods are feasible.

39. What is the relative effectiveness of remedial treatment in which pupils are given organized drill material affording practice of abilities diagnosed as weak and of remedial treatment with informal material prepared by the teacher?
The conclusion favoring the expertly prepared remedial drill material is not supported by adequate experimental evidence, but it does conform with other conclusions respecting expertly prepared drill material.

40. To what extent is reading ability a factor in arithmetical achievement?
That reading is an important factor in arithmetic achievement seems reasonably well established. Further research is needed to show the precise magnitude of the influence of this factor.

41. What is the effectiveness of general training in reading in engendering greater achievement in arithmetic?
General training in reading is reported effective in engendering greater achievement in arithmetic. The experiment in which general training in reading constituted the experimental factor was very crude, but the conclusion is supported by the research which reveals that reading ability is a factor in arithmetical achievement.

42. What is the effectiveness of solution sheets containing information with respect to the manner of reading problems and containing spaces for recording of data useful at different stages in the solution of the problem?
Solution sheets containing information with respect to the manner of reading problems and containing spaces for the recording of data useful at different stages in the solution of problems are reported to be an effective device in teaching pupils to solve problems. While the experimental evidence is not of acceptable dependability, the method would seem to be feasible since more direction is given to the learning activity. It is possibly more desirable for the earlier than for the later stages of learning to solve verbal problems.

43. What is the effectiveness of story-telling and dramatization in teaching pupils to read verbal problems in arithmetic?
Story-telling and dramatization are reported, on the basis of very limited experimental evidence, to be effective devices in teaching pupils to read verbal problems in arithmetic. This conclusion appears to be in agreement with the principle that intensive effort is secured in learning activity through creating a need. It is likely, however, that neither of these devices should be given prolonged use.

44. What types of learning exercises are most stimulating to learning activity in arithmetic?
While the experimental evidence is not of acceptable dependability, it seems ... difficult. In order that well motivated learning activity may be secured, ... superior pupils may be stated ... of the purely computational type.

45. In stimulating learning activity in arithmetic, what is the effectiveness of informing pupils of definite goals to be achieved?

46. In stimulating learning activity in arithmetic, what is the effectiveness of informing pupils with respect to their status or progress?
The conclusions in favor of this method of stimulating learning activity are supported by dependable experimental evidence both from arithmetic and from other subject-matter.

47. What is the value of competition as a means of stimulating learning ...?
It is ... on the basis of fairly dependable experimental evidence that competition ... device to help a class of pupils out of a slump in learning by relieving the monotony of ordinary learning exercises.

48. What are the relative merits of commendation and reproof in stimulating learning ...?
Commendation and reproof are both reported to be stimulating to learning ... This conclusion also conforms to the Law of Effect.

The contributions of research to the teaching of arithmetic. What constitutes a contribution depends upon the interpretation given to that term. It may be considered a contribution to show that an instructional procedure as applied to a particular group of pupils produces as satisfactory results, or nearly as satisfactory results, as another procedure may produce. Usually, however, a contribution is interpreted to mean the demonstration of the relative merits of two or more comparable procedures not merely for a particular group of pupils, but for all groups of pupils of a certain intellectual and educational status. If this more restricted interpretation is applied to the conclusions indicated in the preceding list, it is apparent that the dependable contributions of research in the teaching of arithmetic are relatively meager.

Probably the most significant contributions relate to the specificity of calculation abilities and to the use of practice materials constructed so that adequate exercise is provided for each specific ability involved. Although research has not yet produced a complete and dependable list of the specific abilities in the field of arithmetical calculation, there are tentative lists for certain segments of this field which appear to be rather highly dependable with reference to many of the items. The superiority of practice materials which provide for the exercise of each specific ability in proportion to the difficulty of attaining it has been demonstrated.
It is, of course, not unlikely that, as these tentative lists of specific abilities are refined, superior practice materials may be devised, but this possibility does not detract from the fact that research has already contributed to the improvement of practice materials. Closely related to this contribution is the demonstration of the effectiveness of diagnosis and of remedial instruction, and of systematic practice.

Research has contributed to an understanding of the nature of pupil responses to verbal problems and of the effect of introducing certain changes in the problem statement. Pupil responses to verbal problems are more satisfactory when the problems are stated in familiar terminology, and it appears that very little reasoning enters into the response of most pupils. Reading ability appears to be an important factor in the ability to respond to verbal problems, but the precise nature of its function has not been ascertained. Systematic training in finding the data given in a problem, in deciding upon calculations to be made, and in estimating the answer in round numbers is an effective procedure for teaching pupils to solve verbal problems.

Informing pupils of the status of their achievements in arithmetic is an effective means of securing intensity and persistence of effort in attaining higher levels of achievement. This procedure encourages each pupil to compete with his own past record. Competition between individual pupils and between groups is also effective.

There is considerable evidence that several alternative calculation techniques differ little, if at all, in merit. For example, the data secured in the studies of downward versus upward addition have been interpreted as favoring the latter technique, but the fact that the differences in achievement are so small that their significance is doubtful suggests the generalization just stated. This conclusion is also supported by a priori reasoning.
If there is any significant difference in the relative merits of such alternative techniques, it would probably not be very apparent except at the higher levels of achievement; and since the function of the school is not to produce highly expert calculators, it seems that the generalization stated above is the most significant contribution of the research attempting to evaluate alternative calculation techniques. Of course, this generalization does not apply to cases in which one of the techniques is obviously time-consuming or otherwise inefficient. For example, it should not be applied in support of "counting on the fingers."

Suggestions for research in the field of instructional methods in arithmetic. The evaluation and summary of research relating to the teaching of arithmetic afford a basis for some suggestions for future studies in this field. Although it is difficult to cite much definite evidence, the present writers have been impressed with the need for additional studies of verbal problems and of the nature of pupil responses to them. In the field of arithmetical calculation, investigators have gone far in identifying the types of examples and the abilities involved in responding to them. It seems reasonable to assume that there are also types of verbal problems. Research is needed to identify these types, if they exist. There is also need for more information about the function of reading in pupil responses to verbal problems and about the relation of the form and vocabulary of problem statements to these responses.

Another suggested field of research relates to the instructional procedures employed in teaching pupils to solve problems. Should a method of analysis be employed? Should a complex problem be broken up into a series of simpler problems? Should a pupil be directed to compare the problem with ones he has solved and with solutions given in the text?
What sort of attention should be given to the vocabulary? What types of learning exercises should be used in connection with verbal problems? Should pupils be taught a variety of problem types simultaneously, or should each type be taught separately? To what extent and for what pupils is problem-solving activity stimulated by an occasional problem of the puzzle type? To what extent is the level of intelligence of the pupils a factor in generalization from number combinations specifically taught to those not taught? To what extent are flash cards used for drill purposes likely to engender improper eye-movement habits with respect to arithmetical subject-matter?

The possibility of evaluating comparable instructional procedures. The relatively meager contribution of the research summarized in this bulletin has probably suggested to the thoughtful reader that comparable instructional procedures cannot be evaluated with a high degree of precision. The evaluation of a procedure by experimentation is dependent upon the control of all factors affecting the learning of pupils except the one being studied. The zeal and skill of the teacher in applying a given procedure affect the achievements of the pupils, and these factors are difficult or impossible to control in many cases. Consequently, it does not appear that precise and highly dependable evaluations of comparable instructional procedures should be expected. Attempts to determine the relative merits of certain "methods of teaching" will show that the procedures are approximately equal in merit, except when one of the procedures is distinctly inferior. In most such cases it is likely that a competent person could accurately predict this inferiority. In support of this judgment, the requirements for precise and dependable evaluation of instructional procedures are briefly described.

REQUIREMENTS FOR PRECISE EVALUATION OF INSTRUCTIONAL METHODS IN ARITHMETIC²

1. Equivalent groups. The groups of pupils used in the experiment should be equivalent in all respects that will affect their arithmetical achievement during the experiment. This requirement can be approximated by pairing pupils on the basis of intelligence test scores and then comparing the groups thus formed with respect to chronological age, previous achievement in the school subject, and measures of arithmetical reading ability. If the differences between the means and between the standard deviations of the groups with respect to these three characteristics are relatively small, the groups may be considered approximately equivalent. It is desirable that the groups also be approximately equivalent with respect to personality traits, physical condition, sex, and race.

Two other techniques of securing equivalent groups may be suggested. The first is particularly suitable for investigations of the relative effectiveness of differing types of learning exercises. It consists of using groups so large that equivalence with respect to many factors is secured through the operation of chance.³ It should be noted that this procedure is feasible only where the learning activity of the pupils is wholly directed by means of printed or mimeographed instructions. When this procedure is used, the different groups are equally represented in all the classes participating in the experiment.

² These requirements have been taken, with considerable adaptation, from Monroe, Walter S., and Engelhart, Max D. "Experimental Research in Education," University of Illinois Bulletin, Vol. 27, No. 32, Bureau of Educational Research Bulletin No. 48. Urbana: University of Illinois, 1930, p. 77-79.

³ For a description of this technique, see: Monroe, Walter S. "How Pupils Solve Problems in Arithmetic," University of Illinois Bulletin [...], Bureau of Educational Research Bulletin No. 44. Urbana: University of Illinois.
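The pairing procedure described under requirement 1 can be sketched as follows. Assumptions: the helper names `pair_and_split` and `describe` and all pupil records are hypothetical; pupils are ranked on the pairing score, adjacent pupils form a pair, and a random draw decides which member of each pair joins the experimental group. This is one common way to carry out such matching, not necessarily the procedure used in the studies reviewed.

```python
import random
from statistics import mean, pstdev


def pair_and_split(pupils: list[dict], key: str = "iq", seed: int = 0):
    """Form matched pairs on `key` and send one member of each pair to each group.

    With an odd number of pupils, the lowest-ranked pupil is left unpaired
    and omitted from both groups.
    """
    rng = random.Random(seed)
    ranked = sorted(pupils, key=lambda p: p[key], reverse=True)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides which member is experimental
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


def describe(group: list[dict], trait: str) -> tuple[float, float]:
    """Mean and (population) standard deviation of one characteristic."""
    values = [p[trait] for p in group]
    return float(mean(values)), float(pstdev(values))


# Hypothetical pupils: intelligence score, age in months, previous arithmetic
# score, and arithmetical reading score.
pupils = [
    {"name": "A", "iq": 120, "age": 118, "prior": 52, "reading": 31},
    {"name": "B", "iq": 118, "age": 121, "prior": 49, "reading": 29},
    {"name": "C", "iq": 105, "age": 124, "prior": 44, "reading": 26},
    {"name": "D", "iq": 104, "age": 119, "prior": 46, "reading": 27},
    {"name": "E", "iq": 95, "age": 127, "prior": 38, "reading": 22},
    {"name": "F", "iq": 93, "age": 125, "prior": 40, "reading": 21},
]
experimental, control = pair_and_split(pupils)

# Compare the groups on the three characteristics named in the text.
for trait in ("age", "prior", "reading"):
    print(trait, describe(experimental, trait), describe(control, trait))
```

If the means and standard deviations printed for the two groups are close on each trait, the groups may be treated as approximately equivalent in the sense described above.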
The second procedure which may be suggested is that used by Olander.⁴ This investigator paired pupils chiefly on the basis of growth in arithmetical ability over a period of five weeks during which the pupils were subjected to the same, or similar, instruction. The argument presented for this technique may be quoted here:

If two groups exhibit similar learning curves under similar instruction until a certain point is reached, it can be assumed that the groups are equal in the function in question. If a variation in the instruction of one group is then introduced which causes the learning curve of that group to rise abnormally, whereas the curve of the group under the unchanged technique continues to rise normally, it may be assumed that a difference in scores at any later point on the curve is attributable to the entrance of the variation in instruction.

2. Specification of experimental factor and control of non-experimental factors. The experimental factor should, if possible, be restricted to a single phase or detail of instructional procedure. The method used with the experimental group should vary from that used with the control group in only this single phase, and if other variations are permitted, their effect must be accurately measured or a plan of neutralization must be devised.⁵ The total instructional procedure to be used in both groups should be specified in writing, or at least a detailed record should be kept of what is done. Controlled experimentation involves maintaining equal status for all factors in both the experimental and the control groups, except the single phase or detail of procedure which constitutes the experimental factor; or, if equal status is not maintained, the non-equivalence must be recognized and its effect on the experimental learning must be determined.
The teacher factors whose control in arithmetic experiments appears to be most important are (1) the instructional techniques employed during the recitation period, especially those relating to the assignment and to motivation; (2) the skill of the teacher in carrying out instructional techniques and classroom-management procedures; (3) the zeal of the teacher; and (4) the personality traits of the teacher. In addition, care should be exercised to avoid marked differences in the minor teacher factors: physical condition, sex, and age.

The important factors under the head of general and extra-school factors are (1) materials of instruction, (2) the environment in which learning activity takes place, and (3) minutes per day devoted to learning activity in arithmetic. The materials of instruction, desks, chairs, light, heat, ventilation, and other aspects of the learning environment should be identical for both groups. Study and recitation periods should be of equal length in the experimental and control groups. Parents should be urged to refrain from influencing the arithmetical learning activity of the pupils and, possibly, should be asked to cooperate in restricting the arithmetic learning activity to the classroom.

⁴ Olander, H. T. "Transfer of Learning in Simple Addition and Subtraction," Elementary School Journal, 31:363, January, 1931.

⁵ This requirement is sometimes designated as the Law of the Single Variable.

It should be noted that the precise prescription of an instructional procedure and the strict control of non-experimental factors are incompatible with good teaching. A teacher should adapt her techniques to the needs of her pupils as they become apparent. Hence conformity to the requirements for precise experimentation will, in many cases, tend to reduce the effectiveness of the teaching, and this in turn will introduce an element of uncertainty into the interpretation of the results of the experiment.

3. The measurement of achievement. In considering the requirements under this head, careful attention should be given to the meaning of the validity of a test. The problem of an experiment, when fully defined, either specifies or definitely implies the achievement to be measured. This achievement may be restricted to certain calculation skills, or it may also include certain items of knowledge and certain general patterns of conduct. It may be restricted to the degree of ability possessed at the close of the period of experimentation, or it may consist of the residue after a period during which there is limited exercise of the ability.

A test that is highly valid for one purpose may be distinctly lacking in validity when used for another purpose. Consequently, the validity of a test is a relative rather than an absolute characteristic, and the validity of a test used in an experimental investigation can be determined only with reference to the specifications or implications of the problem. This means that the experimenter must assume the responsibility for determining the validity of the tests that he uses.

The reliability of a test refers to the variable errors in the resulting scores, assuming perfect validity. If validity is also considered, any variable errors introduced because the achievement measured is not identical with that specified by the problem must be added to the effects of unreliability. Consequently, the actual variable errors in the measures of achievement may be considerably larger than is indicated by the coefficient of reliability. Finally, the measures of achievement may involve constant, or systematic, errors.

4. The interpretation of differences in mean gains in achievement. In a typical experiment the treatment of the data results in a difference between the mean gains in achievement, or between the means of the final-test scores, of the experimental group and of the control group. If the groups are perfectly equivalent, if all non-experimental factors have been completely controlled, and if the measures of achievement are perfect (i.e., do not involve any errors, either variable or systematic), the obtained difference may be accepted as the actual difference in the mean gains of the two groups. These conditions are seldom, if ever, completely realized. Furthermore, when interpreting a difference in mean gains, the investigator usually desires to generalize, i.e., to make a statement with reference to the probability that the obtained difference has the same sign as the difference which might be obtained from any repetition of the experiment. The investigator may also wish to make a statement with reference to the probability that the obtained difference, in addition to having the same sign, is of the same order of magnitude as the difference which might be obtained from any repetition of the experiment. Hence, it is necessary to consider also the effect of sampling upon the data secured. In the following paragraphs attention is directed first to the statistical procedures to be employed in making allowances for variable errors of measurement and of sampling.

The statistical procedures outlined in the following paragraphs yield the standard error⁶ of the difference in mean gains, or of the difference between final-test means, due to the combined⁷ effect of variable errors of measurement and variable errors of sampling.
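The allowances discussed in requirements 3 and 4 rest on a few standard formulae, sketched below for the simplest case. Assumptions, all labeled here because they are not quoted from this bulletin: the standard error of measurement is taken as the classical $\sigma\sqrt{1-r}$; independent sources of variable error are combined by adding their variances; and the standard error of the difference is the one for two independent (unmatched) groups, $\sqrt{\sigma_e^2/N_e + \sigma_c^2/N_c}$. For matched groups the bulletin adopts a modification of a formula given by Lindquist, which this sketch does not attempt to reproduce. The function names and sample figures are hypothetical.

```python
from math import erf, sqrt

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical estimate of the variable error in one score: SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def combine_errors(*errors: float) -> float:
    """Combine variable errors assumed independent by adding their variances."""
    return sqrt(sum(e * e for e in errors))


def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean, sigma / sqrt(N)."""
    return sd / sqrt(n)


def se_of_difference(sd_e: float, n_e: int, sd_c: float, n_c: int) -> float:
    """Standard error of the difference between two independent group means."""
    return sqrt(se_of_mean(sd_e, n_e) ** 2 + se_of_mean(sd_c, n_c) ** 2)


def odds_within(k: float) -> float:
    """Odds that a normally distributed error lies within +/- k standard errors."""
    inside = erf(k / sqrt(2.0))  # P(|z| < k) for a standard normal z
    return inside / (1.0 - inside)


# Hypothetical experiment: gains of 36 experimental and 25 control pupils.
diff = 8.5                                    # obtained difference in mean gains
se_d = se_of_difference(12.0, 36, 10.0, 25)   # sqrt(2.0**2 + 2.0**2)
print(round(diff / se_d, 2))                  # difference in standard-error units
print(round(PE_FACTOR * se_d, 2))             # the same error as a probable error
print([round(odds_within(k), 2) for k in (1, 2, 3)])  # roughly 2.15, 21 and 369 to 1
```

The unreliability term from `standard_error_of_measurement` covers only the errors indicated by the reliability coefficient; as requirement 3 notes, errors arising from imperfect validity would have to be added on top of it.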
If the difference in mean gains, or in final-test means, is equal to, or greater than, 2.78 times the standard error of the difference, or 4.4 times the probable error of the difference, it is customary to recognize the difference as "statistically" significant. In interpretation, the statement may be made that the chances are 369 to 1, or better, that the sign of the obtained difference is not due to the combined effect of the variable errors of measurement and the variable errors of sampling. The chances that the true difference does not differ from the obtained difference by more than plus or minus the standard error of the difference are 2.15 to 1; by more than plus or minus twice the standard error of the difference, 21 to 1; and by more than plus or minus three times the standard error of the difference, 369 to 1. This interpretation may be used when the investigator is interested in stating the probabilities that the true difference is of the same order of magnitude as well as of the same sign as the obtained difference.⁸

⁶ The probable error may be obtained by multiplying the standard error by the constant .6745.

⁷ For a discussion of the fact that $\sigma/\sqrt{N}$ allows for the combined effect of variable errors of measurement and variable errors of sampling, see: Kelley, T. L. "Note upon Holzinger's Formula for the Probable Error," Journal of Educational [...]; and Douglass, H. R. "On the Standard Errors of the Mean Due to Sampling and to Measurement," Journal of Educational Psychology, 19:643-49, December, 1928.

The maximum allowance which needs to be made for the combined effect of variable errors of measurement and variable errors of sampling may be determined by means of the following formulae, in which [...]

[...] or where scores have not been converted into comparable units, calculation of individual gains is impossible. In using the above formulae, [...] should represent the relationships between the test scores used in pairing and the final[-test scores ...] standard error of the diff[erence ...] of the experimental and control pupils, respectively. The third formula then yields the standard error of the difference between final-test means for matched groups.

¹⁵ Lindquist, op. cit., p. 202-03.

[...] The formula given by Lindquist has been slightly modified by the authors to represent the standard error of the difference in mean gains, and the symbols have been changed.

$N_e$ and $N_c$ represent the numbers of pupils in the experimental and control groups, usually the same, $\sigma_e$ and