UNIVERSITY OF ILLINOIS
Urbana, Illinois

An Attempt to Find an A Priori Measure of Step Size

Ellen F. Rosen and Lawrence M. Stolurow

COMPARATIVE STUDIES OF PRINCIPLES FOR PROGRAMMING MATHEMATICS IN AUTOMATED INSTRUCTION

Technical Report No. 13
July, 1964

Co-Investigators:
Lawrence M. Stolurow, Professor, Department of Psychology, Training Research Laboratory
Max Beberman, Professor, College of Education, University of Illinois Committee on School Mathematics (UICSM)

Project Sponsor:
Educational Media Branch, U. S. Office of Education, Title VII, Project No. 711151.01

Table of Contents

List of Tables and Figures
Problem
Method
    Materials
    Judge's materials
    Procedure for judges
Results
    Correlations of judgments with empirical difficulty
Conclusions
Summary
Appendix A
Appendix B
References

List of Tables

Table 1. Descriptive Statistics on Judges' Ratings of Two Versions of Part 112 of the UICSM Programed Learning Materials
Table 2. Distribution Statistics for Empirical Difficulty (Student's Response) Under Three Conditions of Use for the Two Versions
Table 3. Correlation of Judged and Observed Step Size for the Condition of Use Called Pure Mode (Program Only)
Table 4. Correlations of Judged and Observed Step Size for the Condition of Use Called Lead Mode (Program First)
Table 5. Correlations of Judged and Observed Step Size for the Condition of Use Called Follow Mode (Program Follow)

An Attempt to Find an A Priori Measure of Step Size

Ellen F. Rosen and Lawrence M. Stolurow

PROBLEM

Step size is an important determiner of student performance. Although it may seem to be so, step size is not readily measurable. Logically, the most reasonable measure of step size is empirical difficulty as calculated from student performance, but this is an a posteriori measure. An a priori measure is needed. The present investigation is an attempt to find a fine-grain predictor of empirical difficulty.

METHOD

Subjects and Judges

The judges who served as raters were ten programers from the staff of UICSM. The subjects (students) have been described elsewhere (Beberman and Stolurow, 1963, Quarterly Reports 9 and 10, Chapter VII).

MATERIALS

Student's materials. The materials consisted of the two versions of Part 112 of the UICSM-PIP materials (see Beberman and Stolurow, 1963). The large step size version was prepared by Clark Himmel. Two booklets were prepared for the students' use and were assigned randomly to the students available for the study. One version was called the small step version and designated 112S; the other was the large step version and designated 112L.

Both versions were given to students as learning materials under three conditions of use in conjunction with a teacher. In one condition, the program was given to the students, after which the teacher covered the material. This was called the "lead" mode. In a second condition, the program was given to the students after the teacher had covered the material. This was called the "follow" mode. In the third condition, called the "pure" mode, only the program was given to the student; the teacher did not cover the material.

Judge's materials. Two booklets, Judges 1 and Judges 2, were prepared for the judges. Each booklet consisted of a segment from each of the student versions, so that each judge rated half of each student version.

Procedure for judges. Each judge was given one form of the judge's booklet and asked to rate it according to four categories.
A copy of the instructions to judges is presented in Appendix A. The instructions are self-explanatory. They define and illustrate the judge's task, which was to relate pairs of adjacent steps and to rate changes in complexity on a scale from -5 through +5 on four separate characteristics: (a) the concept; (b) the vehicle; (c) the numeral; and (d) the response.

RESULTS

The judges' ratings were converted into standard scores for each category (Guilford, 1956, pp. 489-494). The standard scores for each step were then averaged across judges within categories, and across both categories and judges. Thus two sets of ratings were arrived at, one for each (student) booklet version. From the students' responses an empirical difficulty was calculated for each page (the percent of students getting all the problems on the page correct). The means and standard deviations for the ratings and for the students under the three different conditions of teacher presentation are presented in Table 1 and Table 2, respectively.

Correlations of Judgments With Empirical Difficulty

Tables 3, 4, and 5 present the correlations of step size judgments with empirical difficulty. The judgments and empirical difficulty were paired by considering the difficulty of the last page of the step as the measure to be predicted. Thus, for example, each judge's rating of the step from page 1 to page 2 of Part 112 was paired with the empirical difficulty as calculated from students' responses to the questions on page 2 of Part 112. Because empirical difficulty is expressed as percent correct, a negative correlation indicates that steps judged to involve a larger increase in complexity were followed by pages that fewer students answered correctly. It might be noted here that Part 112 has more than one problem per frame. Consequently these data are likely to have greater reliability than those obtained from more conventional linear programs with only one response per page.
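The scoring and pairing procedure described above can be sketched in code. What follows is a minimal illustration, not the authors' computation: it assumes that each judge's raw ratings within a category are converted to z-scores using that judge's own mean and standard deviation (the report cites Guilford, 1956, but does not spell out the conversion), and every function name and data value in it is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standard_scores(ratings):
    """Convert one judge's raw ratings (-5..+5) in one category to z-scores.

    Assumption: standardization is done within judge and within category.
    """
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        # A judge who gave the same rating to every step carries no information.
        return [0.0 for _ in ratings]
    return [(r - m) / s for r in ratings]


def average_across_judges(per_judge):
    """Average the standardized rating of each step across judges."""
    return [mean(step) for step in zip(*per_judge)]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: two judges rate the change in "response" complexity for
# the steps 1-2, 2-3, 3-4, and 4-5 of a five-page booklet.
judge_ratings = [
    [0, 2, -1, 3],
    [1, 3, 0, 4],
]
# Hypothetical percent of students answering every problem correctly, pages 1-5.
percent_correct = [90.0, 85.0, 70.0, 88.0, 60.0]

step_size = average_across_judges([standard_scores(j) for j in judge_ratings])
# The step from page n to page n+1 is paired with the difficulty of page n+1,
# so page 1 has no rating to pair with and is dropped.
difficulty_of_last_page = percent_correct[1:]
r = pearson_r(step_size, difficulty_of_last_page)
print(f"r = {r:.3f}")
```

Standardizing within judge before averaging keeps a judge who uses the extremes of the scale from dominating the average; whether the original analysis standardized in exactly this way is not stated in the report.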
Table 1

Descriptive Statistics on Judges' Ratings of Two Versions of Part 112 of the UICSM Programed Learning Materials

Version                       Category(a)   Mean Amount   Rank   Standard
                                            of Change            Error
Part 112S (small step)(b)     Concept          -.085        5      .846
                              Vehicle           .011        1      .593
                              Numeral           .010        2      .654
                              Response         -.004        3      .668
                              Total            -.017        4      .401
Part 112L (large step)(c)     Concept           .172        1      .503
                              Vehicle           .008        4      .854
                              Numeral          0.000        5      .797
                              Response          .045        3      .696
                              Total             .056        2      .523

(a) These categories are described in Appendix A.
(b) Based on the average rating of five judges on 51 steps using a standard score conversion of scale values.
(c) Based on the average rating of five judges on 32 steps using a standard score conversion of scale values.

Table 2

Distribution Statistics for Empirical Difficulty (Student's Response) Under Three Conditions of Use for the Two Versions

Version                Condition of Use             Mean         Standard
                                                    Difficulty   Deviation
112S (small step)      Program Lead(a)               78.425       18.705
                       Program Follow(b)             75.490       19.007
                       "Pure" (Only Program)(c)      75.686       17.740
112L (large step)      Program Lead(d)               78.361       17.983
                       Program Follow(e)             76.875       22.141
                       "Pure" (Only Program)(f)      74.023       18.229

(a) Based on sample of 11 students on 51 pages.
(b) Based on sample of 8 students on 51 pages.
(c) Based on sample of 20 students on 51 pages.
(d) Based on sample of 13 students on 32 pages.
(e) Based on sample of 10 students on 32 pages.
(f) Based on sample of 16 students on 32 pages.

Table 3

Correlation of Judged and Observed Step Size for the Condition of Use Called Pure Mode (Program Only)

Version                   Concept   Vehicle   Numeral   Response   Total
Part 112S (51 frames)      -.080     -.071     -.010    -.278**    -.178
Part 112L (32 frames)      -.270     -.293     -.329    -.360*     -.429*

*For H0: rho = 0, r.05 = .349 for 30 df (two-sided).
**For H0: rho = 0, r.05 = .274 for 49 df (two-sided).
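The asterisks in Table 3 can be reproduced by comparing the absolute value of each coefficient with the critical value given in the table's footnotes. The sketch below does only that. The coefficients and critical values are copied from Table 3; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Critical values of r at the .05 level (two-sided), from the footnotes to Table 3.
CRITICAL_R = {"112S": 0.274, "112L": 0.349}  # 49 df and 30 df, respectively

# Pure-mode correlations between judged step size and empirical difficulty (Table 3).
TABLE_3 = {
    "112S": {"Concept": -0.080, "Vehicle": -0.071, "Numeral": -0.010,
             "Response": -0.278, "Total": -0.178},
    "112L": {"Concept": -0.270, "Vehicle": -0.293, "Numeral": -0.329,
             "Response": -0.360, "Total": -0.429},
}


def significant_entries(table, critical):
    """Return (version, category) pairs whose |r| reaches the critical value."""
    return [
        (version, category)
        for version, row in table.items()
        for category, r in row.items()
        if abs(r) >= critical[version]
    ]


flagged = significant_entries(TABLE_3, CRITICAL_R)
for version, category in flagged:
    print(f"Part {version}, {category}: r = {TABLE_3[version][category]:.3f}")
```

Run as written, the sketch flags the response category for both versions and the total rating for Part 112L, the same entries that carry asterisks in Table 3.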
Table 4

Correlations of Judged and Observed Step Size for the Condition of Use Called Lead Mode (Program First)

Version                   Concept   Vehicle   Numeral   Response   Total
Part 112S (51 frames)      -.096     -.127     -.213    -.336**    -.312**
Part 112L (32 frames)      -.248     -.188     -.419*   -.271      -.386*

*For H0: rho = 0, r.05 = .349 for 30 df (two-sided).
**For H0: rho = 0, r.05 = .274 for 49 df (two-sided).

Table 5

Correlations of Judged and Observed Step Size for the Condition of Use Called Follow Mode (Program Follow)

Version                   Concept   Vehicle   Numeral   Response   Total
Part 112S (51 frames)      -.031      .065      .052    -.293**    -.089
Part 112L (32 frames)      -.089      .108     -.434*   -.175      -.289

*For H0: rho = 0, r.05 = .349 for 30 df (two-sided).
**For H0: rho = 0, r.05 = .275 for 49 df (two-sided).

Correlations significantly different from zero at the .05 level were obtained in: (1) the pure mode (Table 3), between (a) the response category ratings and the empirical difficulty for both the large and small step size programs, and (b) the overall average (total) rating and difficulty for the large step sequence; (2) the lead mode (Table 4), between (a) the numeral category and difficulty for the large step sequence, (b) the response category and difficulty for the small step sequence, and (c) the average overall rating across categories and difficulty for both sequences; and (3) the follow mode (Table 5), between (a) the numeral category and difficulty for the large step sequence, and (b) the response category ratings and difficulty for the small step sequence.

CONCLUSIONS

The results of this study are not entirely clear-cut. Table 2 indicates that, within each presentation mode, the average empirical difficulty of the steps did not in fact differ appreciably between the two versions. This is probably due to the fact that the two versions were prepared before the beginning of the study. The large step version was generated by deleting frames which were felt to be unnecessary. Thus, it is quite probable that the two versions really did not differ in terms of step size.
This has potentially important implications for the previous studies of step size (Coulson and Silberman, 1960; Evans, Glaser and Homme, 1960; Glaser and Reynolds, 1962; Maccoby and Sheffield, 1958; Margolius and Sheffield, 1961; Smith and Moore, 1962), in which the typical method of manipulation has been the simple deletion or addition of frames to create the so-called larger step version. The present results suggest that the deletion procedure may produce an illusion of change rather than an actual change in step size. Certainly this simple manipulation is suspect unless step size changes are documented by some additional information relating to the program changes produced by frame deletion.

The important point of these results is that step size and number of frames deleted are most likely not in one-to-one correspondence; when aiming at increasing step size one must consider quality (the kind of material deleted) as well as quantity (the number or amount of material deleted). This issue of quantity and quality will be discussed in a report on sequential analysis of parts within the sequence and frames within the parts.

The data in Tables 3, 4, and 5 suggest that variations in difficulty probably could be achieved by systematic variation in the response and numeral characteristics of the steps. These two dimensions seem to be the most promising basis for changing step size.

Contrary to the finding of Rothkopf (1963), this study has shown that judges can reliably estimate empirical difficulty by examining the stimulus materials. In part, reliability was obtained, with the present rating scale, by using judgments based upon changes between adjacent frames. The indices that seem to be most promising for this purpose are response and numeral, the former being somewhat more dependable (significant correlations in three out of four possibilities) than the latter (significant correlations in one out of four possibilities).
SUMMARY

This study is an attempt to develop a methodology for the estimation of empirical difficulty under conditions in which the relative range of step sizes is small. Judgments of the changes taking place from frame to frame were obtained with a standardized scale running from -5 through +5, which required the judges to evaluate four characteristics of the stimulus materials: concept, vehicle, numeral, and response. Judgments were obtained for a "small-step" version and for the same material with some steps deleted ("large step"). The stimulus materials were booklets consisting of 54 and 35 frames respectively, taken as a random sample from the original version of the experimental edition of the UICSM High School mathematics programed materials.

APPENDIX A

Instructions for Judges

(Prepared and developed by Clark Himmel to conform to the dimensional requirements developed in work with a program on fractions by L. M. Stolurow with the assistance of Gaiia Grubb.)

We are interested in the similarities and differences in pairs of adjacent pages or "learning steps" contained in the accompanying booklet of programed instruction, and we would like your help in finding out how much these adjacent pages are different from and similar to each other with regard to the complexity (abstractness) of certain given characteristics of the material present in the pages. (The pages to be judged will be considered in serial order, i.e., pages 1 and 2 will be compared, then pages 2 and 3, then pages 3 and 4, etc., through the final two pages in the booklet.)

We want you to rate the changes in complexity (abstractness) of certain characteristics in going from the first page of the pair to the second page on a scale from -5 through +5, with a rating of zero (0) representing no change in the complexity of a characteristic, ratings above zero representing progressively increasing complexity from the first to the second page, and ratings below zero representing progressively decreasing complexity from the first to the second page, so that a rating of 5 represents the most extreme change in complexity of a characteristic in either direction. If a characteristic is not present on either of the pages of the pair, record a zero (0) as your rating.

The four characteristics that we want you to consider are (A) the Concept, (B) the Vehicle, (C) the Numeral, and (D) the Response. A description of each of these characteristics, along with an example and a rating guide, is given below.

Concept: refers to the mathematical rule, principle, idea, or closely related group of rules, concepts, conventions, ideas, or principles in mathematics, such as the associative principle of addition, or the axiomatic system in Euclidean geometry, or the idea of negative numbers. You should be looking for one of the following: changes in the complexity, in levels of description, or in manner of presentation. You are to identify and rate these changes when leaving one concept and turning to another as they happen within two adjacent pages. Also, note changes in overall complexity when two or more concepts (or, if you prefer, "sub-concepts") are presented simultaneously on one or both of the pair of pages being considered.
For example, if only addition is presented on one page and both addition and multiplication are presented on the following page, the change probably is an increase in the complexity of this characteristic. If this occurred, then the rating assigned to the pair of pages might be a +2 for the concept.

Vehicle: that which is used to help communicate or convey the concept (and the associated material) being presented by giving a concrete or exemplar background or "real setting" to the problems and expository material, such as two airplanes traveling toward each other in a rate of travel problem in algebra, or the ledger entries for a retail business in a bookkeeping problem. This characteristic is one which may not be present on all program steps. Consider the vehicle "a road with mile markers" for presenting the idea of real numbers (both positive and negative), where a trip from R to B (represented 3) is a +3 and a trip from T to B (represented 2) is a -2.

[Diagram: a road with lettered mile markers.]

If this same vehicle with no additions or deletions is present on both pages of a pair, the rating assigned would be zero (0). If it is absent only on the second page of the pair, the rating assigned would be +5. (The above assumes that no new vehicle characteristics were introduced on either of the pages in the pair.) If something (diagrams, notation, verbal explanation) is added to the vehicle or a new vehicle is introduced in going from the first page to the second, a rating commensurate with the accompanying change in complexity should be assigned. If the same material were deleted from the second page, a rating commensurate with this change should be assigned.

Numeral: refers simply to all symbols for or representations of numbers presented, whether by Roman numerals, Hindu-Arabic numerals, or others, plus their accompanying "operators" and "designators," such as +, ÷, ², =, or -, so that an entire expression like (+16 ÷ -4) × +2 = -8 would be considered under this characteristic.
Consideration should be given to changes in complexity in the types of numerals given on the pages. This should be relatively straightforward, since numerals and their "operators" and "designators" are presented in an explicit notation system. For example, a first page might present addition of simple three digit numerals while the next page calls for multiplication of the square roots of similar three digit numerals. Then the pair would probably receive a fairly high positive rating, perhaps a +3.

Response: refers to the particular answer(s) to be chosen, constructed or written, or in some way indicated by the student as he finishes the problem(s) or question(s) on a page. Response complexity will vary due to the characteristics of the actual response given and due to the abstractness or difficulty of the specific question(s) or explicitly stated problem(s) to be answered or solved. For example, a response that would be relatively complex in the UICSM Unit I material would be one which is constructed or written by the student, such as "the associative principle of addition." A relatively less complex response would be choosing one of two alternatives. The second facet of "response" to be considered is the nature of the problem(s) or question(s) to be answered. It also can be scaled in terms of complexity or abstractness. A question like "2 + 2 = ?" is probably less complex than a long and tedious word problem which also requires only a single digit answer.

Each of the characteristics on the pair of steps (pages) to be compared should be rated with regard to the change in complexity (or abstractness in the sense of being abstruse, more difficult to comprehend, ideationally complex or intricate) in going from one step to the next one.

On your rating sheets you will find the four characteristics listed as headings of four columns. Each pair of pages to be compared and then rated is listed at the left.
When comparing pairs of pages, do not include the answers and "feedback" material (usually included between the statements "check your answers" and "record your results") in your considerations for rating. We are interested in having you rate the "instructional" and "question" portions of the pages.

Remember:

1. Rate changes on the scale from

       +5                 Mid-point                -5
    Increased            (no change)            Decreased
    Complexity                                  Complexity

2. Consider the four following characteristics when rating each pair of pages:
   A. Concept
   B. Vehicle
   C. Numeral
   D. Response

3. For each characteristic consider the amount of change in your perception.

APPENDIX B

Sample Rating Sheet

Name ____________________     Date ____________________

PAGES     Concept     Vehicle     Numeral     Response
1-2
2-3
3-4
4-5
. . .
43-44
44-45

(The sheet lists one blank row for each pair of adjacent pages, from 1-2 through 44-45.)

REFERENCES

Beberman, M., and Stolurow, L. M. Comparative studies of principles for programming mathematics in automated instruction. Semi-Annual Report (Quarterly Reports 9 and 10), USOE, Project No. 711151.01, June 6, 1963 - December 6, 1963.

Coulson, J. E., and Silberman, H. F. Effects of three variables in a teaching machine. J. educ. Psychol., 1960, 51, 135-43.

Evans, J. L., Glaser, R., and Homme, L. E. An investigation of variation in the properties of verbal learning sequences of a teaching machine type. In A. A. Lumsdaine and R. Glaser (Eds.), Teaching machines and programed learning: A source book. Washington, D. C.: Dept. of Audiovisual Instruction, Nat. Educ. Assoc., 1960. Pp. 446-51.

Glaser, R., and Reynolds, J. K. Multi-tracking in programmed instruction: Studies with a program on mathematical bases for management decision making. Pittsburgh, Pa.: Univer. of Pittsburgh, Programmed Learning Lab., 1962.

Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1956.

Maccoby, N., and Sheffield, F. D. Theory and experimental research on the teaching of complex sequential procedures by alternate demonstration and practice. In G. Finch and F. Cameron (Eds.), Symposium on Air Force Human Engineering, Personnel and Training Research. Washington, D. C.: Nat. Academy of Science, Nat. Research Council, 1958.

Margolius, G. J., and Sheffield, F. D. Optimum methods of combining practice with filmed demonstration in teaching complex response sequences: serial learning of a mechanical assembly task. In A. A. Lumsdaine (Ed.), Student response in programmed instruction. Washington, D. C.: Nat. Academy of Sciences, Nat. Res. Council, 1961.

Rothkopf, E. Z. Some observations on predicting instructional effectiveness by simple inspection. J. programed Instruction, 1963, 2 (2), 19-20.

Smith, W. I., and Moore, J. W. Size of step and cueing. Psych. Reports, 1962, 10, 287-94.