UNIVERSITY OF ILLINOIS
Agricultural Experiment Station
BULLETIN No. 119
TYPE AND VARIABILITY IN CORN
BY EUGENE DAVENPORT AND HENRY L. RIETZ
URBANA, ILLINOIS, OCTOBER, 1907

SUMMARY OF BULLETIN No. 119.

Three conceptions of type should rest in the breeder's mind:
1. The ideal, or standard for selection; attained by few individuals, perhaps by none.
2. The mode, or prevailing type, as represented by the highest proportion of what the breeder actually produces.
3. The mean, or average of all the breeder produces.

Variability is deviation from type. It is best indicated by the standard deviation, a mathematical expression involving the deviation of every individual.

Variability may be reckoned from the mean, the mode, the selection standard, or any other desired basis.

The coefficient of variability is a purely abstract expression for variability, so that by its means the variability of one character may be compared with that of another, either in the same or in different races.

The effect of selection is to shift the type without greatly reducing variability.

Each character of every race has a variability that is natural, and this variability cannot be greatly reduced by selection.

The indirect effect of selection is to influence physical or other characters correlated with those selected.

The type of ear is directly affected by fertility so far as length, circumference, and weight are concerned, but not as to number of rows.

Variability is slightly less on fertile land than on lands giving lower yields.

The breeder of the future will be a statistician and a bookkeeper.

TYPE AND VARIABILITY IN CORN
BY EUGENE DAVENPORT, PROFESSOR OF THREMMATOLOGY, AND HENRY L.
RIETZ, STATISTICIAN

The purpose of the present bulletin is to outline and define a clearer conception of type and variability than commonly rests in the breeder's mind, and to present certain data showing conditions that influence type and variability in corn.

TYPE AND VARIABILITY IN GENERAL

The subject is treated by the statistical method, now everywhere employed for the study of the more complicated questions in variation and heredity.[1] This method was first used by Galton in his study of the stature of English people (see Natural Inheritance) and was afterward elaborated by Pearson and others and applied to the study of heredity problems generally. No excuse is offered for employing this method here, because it is the only proper one for these purposes, and because the time has come when breeders generally are expected to be somewhat familiar with it. The reader is therefore urged not to pass by this form of study merely because it may be new and unfamiliar. The technical terms and conceptions, such as standard deviation and coefficient of variability, are no more difficult than interest and percentage, and a little careful attention will enable the reader to become fully acquainted not only with their meaning and method of determination, but also with the larger conceptions of heredity that come with their habitual use.

[1] For a more complete statement of this method of study of breeding problems the reader is referred to chapters X and XI of Pearson's "Grammar of Science," published by A. and C. Black, London, or to Part III of Davenport's "Principles of Breeding," Ginn and Company, Boston.

WHAT IS MEANT BY TYPE

A farmer plants corn from an ear, say, ten inches in length. What he gets is not a crop of ears all ten inches long, nor of any other single length, but rather a mass of ears ranging in length all the way from three or four inches up to perhaps eleven or twelve, and very unevenly distributed between the extremes. The same
principle would have held if the ear planted had been nine inches long instead of ten, except that the distribution would have been different, the lengths running in general slightly lower. That is to say, the length of ear in the offspring is not the same as that of the parent; it constitutes a "distribution" extending both above and below that length.

So far as known, this principle of transmission holds true in all races and for all characters. Stated in more general terms, applying to all breeding, we may say that the offspring as a whole is not the same as the immediate parents, but constitutes a distribution extending from near the lower to approximately the upper limits of the race. This suggests at once the idea of type, and of that deviation from type which we call variability.

What now is our conception of type? If ten-inch ears will not produce ten-inch ears but something else, and not only something else but a considerable variety of lengths, and if what we get extends both above and below the parent, then we arrive at once at a double conception of type: the type of the offspring is not the same as that of the parent. The type of the parent is very definite, representing an ideal; but if the offspring is distributed both above and below that ideal, some better and some not so good, then a close analysis of the real character of that offspring becomes necessary in order to make any just comparison between the two, or to arrive at any adequate conception of type in a mixed population, even in one arising from a selected ancestry.

A concrete case will serve best to illustrate the principle involved. In the year 1906 some Leaming corn was raised on good ground from seed ears ten inches in length.
A "random sample"[2] of this crop, consisting of 327 ears, gave the following distribution as to length: one ear was 3.0 inches long; one was 4.0 inches; two were 5.0 inches; three were 5.5 inches; nine were 6.0 inches; eight were 6.5 inches; twelve were 7.0 inches; nineteen were 7.5 inches; thirty-two were 8.0 inches; forty were 8.5 inches; sixty-seven were 9.0 inches; sixty-three were 9.5 inches; thirty-eight were 10.0 inches; twenty-one were 10.5 inches; eight were 11.0 inches; two were 11.5 inches; and one was 12.0 inches long.[3]

[2] By a random sample is meant a sufficient portion of the whole, taken so much at random as to fairly represent the entire crop, or total "population" as the technical phrase goes. Statistical problems were first studied with reference to people, and the term population was thus a natural one. As the studies have been extended to other fields, even of inanimate nature, we still retain the old terms, and "population" in this sense is applicable as well to animals as to men, and to bricks or stems as to either.

[3] Measurements might be taken at quarter inches with a seemingly higher degree of accuracy, but repeated trials show that the same final results follow whether measurements are taken at the quarter inch or at the half inch. The main point is that the numbers be sufficient and that the sample be representative. Judgment must dictate as to the accuracy of the sample, but the number depends upon the degree of reliability desired. This matter will be fully discussed in the appendix under probable error, but experience shows that in studies with corn excellent results can be obtained with 300 to 400 ears, and very fair results may generally be had from half that number.

Put in tabular form as it appears in actual work, we have the following:[4]

  Length of ear,          Number of ears,
  V (inches)              f (frequency)
     3.0                        1
     3.5                        0
     4.0                        1
     4.5                        0
     5.0                        2
     5.5                        3
     6.0                        9
     6.5                        8
     7.0                       12
     7.5                       19
     8.0                       32
     8.5                       40
     9.0                       67
     9.5                       63
    10.0                       38
    10.5                       21
    11.0                        8
    11.5                        2
    12.0                        1
    Total                     327

[4] This is the most convenient form in which to make the original record. A mark is made for every individual examined, after which the marks are added. The totals constitute the "frequency distribution"; each group (as 12, 19, etc.) is known as a "class," and its measurement (as 7.0, 7.5, etc.) is known as the class mark or value.

Here we have a "frequency distribution" representing the entire "population" or crop, and as it lies spread out before the eye a glance is sufficient to afford considerable information as to the prevailing type. It will be noted at once that there are more ears 9.0 inches long than of any other length, and that the distribution decreases in both directions from this "highest frequency."

The Mode. This highest frequency, or most common length, is called the mode. It shows clearly what is the prevailing type as to length in the crop, as distinct from the selection type in the seed ear. The mode represents the value or measurement of most common occurrence, and it is held by statisticians and by students generally to be the best obtainable single expression for type. When it is ascertained, therefore, we know at once what one might conceive as the natural type of the race or variety so far as the character in question is concerned. When this is determined for a number of important characters, we shall have a good knowledge of racial type as a whole.
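The tally sheet and the search for the highest frequency can be sketched in Python. This is a minimal sketch under stated assumptions, not a procedure given in the bulletin: only the class totals were published, so the hypothetical list `measured_lengths` is rebuilt from those totals to stand in for a field record, and the helper names `tally` and `empirical_mode` are our own. Lengths are assumed to be rounded to the nearest half inch.

```python
import math
from collections import Counter

# Published class totals for the 327 ears: length in inches -> number of ears.
PUBLISHED = {
    3.0: 1, 3.5: 0, 4.0: 1, 4.5: 0, 5.0: 2, 5.5: 3, 6.0: 9, 6.5: 8,
    7.0: 12, 7.5: 19, 8.0: 32, 8.5: 40, 9.0: 67, 9.5: 63, 10.0: 38,
    10.5: 21, 11.0: 8, 11.5: 2, 12.0: 1,
}

# Hypothetical raw record, one entry per ear, rebuilt from the totals above.
measured_lengths = [length for length, count in PUBLISHED.items() for _ in range(count)]


def tally(lengths, step=0.5):
    """Round each length to the nearest class mark and count the ears per class."""
    counts = Counter(math.floor(x / step + 0.5) * step for x in lengths)
    low, high = min(counts), max(counts)
    n_classes = int(round((high - low) / step)) + 1
    # Include empty classes (such as 3.5 and 4.5) so the table has no gaps.
    return {low + i * step: counts.get(low + i * step, 0) for i in range(n_classes)}


def empirical_mode(distribution):
    """Class mark with the highest frequency (the first one, if there is a tie)."""
    return max(distribution, key=distribution.get)


distribution = tally(measured_lengths)
for value, f in distribution.items():
    # One stroke per ear, grouped in fives as on a hand-kept tally sheet.
    groups = ["|||||"] * (f // 5) + (["|" * (f % 5)] if f % 5 else [])
    print(f"{value:5.1f} {f:4d}  {' '.join(groups)}")
print("n =", sum(distribution.values()), "  empirical mode =", empirical_mode(distribution))
```

Run as written, the loop prints one line per class, and the last line reports the 327 ears and an empirical mode of 9.0 inches, in agreement with the table.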
Thus we might, after the same manner, obtain the mode for circumference, number of rows, weight of ear, color of grain, percent of corn to cob, or any other desired character. Having done so, we could definitely describe a typical ear of this variety. We thus arrive at an accurate idea of type, and of its definite measurement as well.[5]

[5] For the purpose of comparing the variability of races we use the "coefficient of variability," to be described later. The modal coefficient is chiefly valuable for comparing one type with another within the race, which is all that is required in practical breeding.

Practical Value of the Frequency Distribution, the Mode, and the Modal Coefficient. The practical importance of the information afforded by these values must be apparent. By means of the frequency distribution the breeder is enabled, whenever he can secure sufficient numbers, to spread out before his eyes a good and fair representation of the whole population of the variety or race he is breeding, in respect to any character which he can measure or accurately estimate.[6]

[6] The Empirical and the Theoretical Mode. It is evident on inspection of the frequency table that if measurements had been taken at the quarter inch, or some smaller fraction, the highest frequency would have fallen not at the nine-inch point but slightly above it, say at 9.25, for the next frequency above (63) is greater than the next one below (40); that is to say, the mode is to some extent dependent upon the scheme of measurements adopted. A mode so determined is therefore only a close approximation to the actual most common length, and it is known as the empirical mode. If, however, the theoretical curve were plotted, all values would be accurately represented (see appendix), and the highest point of this curve would be the actual, or as it is called, the theoretical mode.
In practical breeding operations the empirical mode arising from convenient measurements is sufficiently accurate and leads to no error, because a convenient scheme of measurements is generally employed by all observers, so that empirical modes are comparable. Thus the scheme of half-inch measurements is the one likely to be universally employed for corn.

When the breeder has ascertained the mode, he knows what the natural type is, for the mode indicates type; and he then knows by how much, if at all, it differs from the type which he has chosen as the standard for selection. By this he may judge whether, and to what extent, he is operating at variance with nature.

The Mean. There is still another conception of type as to this distribution, and that is the average, or mean as it is technically called. It will be noted that the distribution does not decline uniformly both above and below the mode: there are twelve classes below it and only six above, from which we conclude that the average length of ear is somewhat different from the most usual length. By multiplying each separate length by the number of ears of that length and adding the products (or, what is the same thing, adding together the lengths of all the ears), then dividing by the total number of ears, we find the average or mean length to be 8.83 inches. Accordingly we have the following rule for the determination of the mean:[7] Multiply each value by its frequency, add the results, and divide the sum by the number of individuals or variates.

Applying this principle to the case in hand, we have:[8]

    V          f          fV
   3.0   x     1   =     3.0
   3.5   x     0   =     0.0
   4.0   x     1   =     4.0
   4.5   x     0   =     0.0
   5.0   x     2   =    10.0
   5.5   x     3   =    16.5
   6.0   x     9   =    54.0
   6.5   x     8   =    52.0
   7.0   x    12   =    84.0
   7.5   x    19   =   142.5
   8.0   x    32   =   256.0
   8.5   x    40   =   340.0
   9.0   x    67   =   603.0
   9.5   x    63   =   598.5
  10.0   x    38   =   380.0
  10.5   x    21   =   220.5
  11.0   x     8   =    88.0
  11.5   x     2   =    23.0
  12.0   x     1   =    12.0
               327        2887.0

2887.0 ÷ 327 = 8.83, the mean length of ear in inches.
[7] By "mean" is here meant the "arithmetical average," which is the average most commonly accepted.

[8] In this table "V" stands for "values" or "magnitudes," in this case length, and "f" stands for frequency, or the number of variates (ears) in each separate class. The heading "fV" means the products of the values (lengths) multiplied by the corresponding frequencies.

Here we have a third valuation for type (8.83), representing the average, as distinct from the 9.0 of the highest frequency, representing the most usual length, and both distinct from the 10 inches of the ear planted.

Practical Use of the Mean. The mean gives a good average value of the character and establishes the practical or commercial value of a race or variety, for it shows what it will do on the average. It is not always, however, a good index of the prevailing type, for, as often happens, the variety with the higher mean may have the lower mode. Neither is the mean always a good index of conditions; for example, in a population of one thousand paupers and one millionaire, the mean wealth is fair, but the type is clearly that of the pauper.

Here, then, are three separate and very definite conceptions of type, all of which have distinct applications to the practical affairs of breeding:
1. The ideal, which is used in selecting the parentage.
2. The prevailing type of the offspring, as represented by the highest frequency (the mode).
3. The average of the offspring, as represented by the mean.

These distinctions apply not only to length of ears in corn but to all characters and all races; that is, to breeding in general. The breeder of pedigreed stock is interested primarily in the ideal and in the mode or highest frequency, while the general farmer who multiplies or raises the stock for the open market is most interested in the mean or average production.[9]
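The rule for the mean, set beside the other two conceptions of type, can be sketched in Python. This is a minimal sketch: the dictionary repeats the published frequencies, `ideal` is simply the length of the seed ear, the function names are our own, and the wealth figures in the last lines are hypothetical numbers chosen only to mirror the pauper-and-millionaire illustration, which gives no figures.

```python
# Published frequencies: length of ear in inches -> number of ears.
distribution = {
    3.0: 1, 3.5: 0, 4.0: 1, 4.5: 0, 5.0: 2, 5.5: 3, 6.0: 9, 6.5: 8,
    7.0: 12, 7.5: 19, 8.0: 32, 8.5: 40, 9.0: 67, 9.5: 63, 10.0: 38,
    10.5: 21, 11.0: 8, 11.5: 2, 12.0: 1,
}


def mean_of(dist):
    """Multiply each value by its frequency, add the products, divide by n."""
    n = sum(dist.values())
    sum_fv = sum(v * f for v, f in dist.items())
    return sum_fv / n, sum_fv, n


def mode_of(dist):
    """Value with the highest frequency."""
    return max(dist, key=dist.get)


ideal = 10.0  # length of the seed ear planted
mean, sum_fv, n = mean_of(distribution)
mode = mode_of(distribution)
print(f"sum of fV = {sum_fv:.1f}, n = {n}")
print(f"ideal = {ideal}, mode = {mode}, mean = {mean:.2f}")

# Hypothetical figures (in dollars) for 1000 paupers and one millionaire.
wealth = {0.0: 1000, 1_000_000.0: 1}
wealth_mean, _, _ = mean_of(wealth)
print(f"wealth: mean = {wealth_mean:.0f}, mode = {mode_of(wealth):.0f}")
```

The first two lines reproduce 2887.0, 327 and 8.83 from the table; the last shows how a mean of about 999 can sit beside a mode of 0.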
[9] It is to be noted that the generation to which the selected parent belonged had also its own mode and mean, which may have been quite different from those of the offspring.

VARIABILITY, OR DEVIATION FROM TYPE

Having established definite distinctions as to type, the student of breeding problems should form equally clear conceptions as to deviation from type, commonly known as variability.[10]

[10] The term variability should not be understood as expressing departure in the sense of wandering from a fixed standard. Students sometimes gain the impression that if the law of heredity were infallible all offspring would be of a common type, and that any departure from the type of the race, variety, or breed is to be regarded as by so much a failure of heredity and a concession to variation. The truth is that all transmission is heterogeneous, in the sense that the individuals of any race, whether parents or offspring, belong not to a fixed type but to a frequency distribution similar to the one now under discussion, and the idea of type arises out of the distribution. The chief conception to rest in the mind of the breeder is that, whatever the parentage, the offspring will constitute a distribution extending through a considerable range, and that the parent itself also belonged to, and was drawn from some portion of, a frequency distribution not very different from that of the race in general. Variability is therefore not the opponent of heredity but its inevitable accompaniment in transmission, and our problem is to devise methods of accurately measuring and expressing its range and extent in any particular instance.

In the study of variability it is worse than useless to study a few scattered individuals here and there. What we seek is a measure of what may be called the average tendency to deviate from type.
Some individuals deviate but little, others more, and still others very much; and we seek a measure of this non-conformity to type. To find this we must study groups of individuals sufficiently large to be representative of their race. This brings us back to the frequency distribution and what it can teach as to variability.

Again the concrete serves well as a medium for teaching a principle. In this connection we refer once more to our distribution of 327 ears and note that every ear in the lot deviates somewhat from the mean of 8.83 inches. The range and extent of this deviation are shown in the following table, column D.[11]

    V        f        D
   3.0       1     -5.83
   3.5       0     -5.33
   4.0       1     -4.83
   4.5       0     -4.33
   5.0       2     -3.83
   5.5       3     -3.33
   6.0       9     -2.83
   6.5       8     -2.33
   7.0      12     -1.83
   7.5      19     -1.33
   8.0      32     -0.83
   8.5      40     -0.33
   9.0      67      0.17
   9.5      63      0.67
  10.0      38      1.17
  10.5      21      1.67
  11.0       8      2.17
  11.5       2      2.67
  12.0       1      3.17
           327

[11] "D" indicates the deviation of the several classes from the common mean of the population, 8.83 inches. Thus the first ear deviates by the difference between 3 inches and 8.83 inches, or -5.83 inches, and being below the mean this deviation is written with the negative sign. Likewise the 21 ears 10.5 inches long deviate 10.5 - 8.83, or 1.67 inches, from the mean, and being above the mean this deviation is written with the positive sign; and similarly for the other values.

The practical question now is to reduce this column of deviations to a single expression denoting the variability of the population of which this distribution is representative. Manifestly, when this is done, the variability of this distribution can be compared directly with that of any other distribution, at the present or any future time. Two methods of procedure are possible in thus securing a general expression for the average amount of deviation, giving rise to two similar but slightly different values; viz., the average deviation and the standard deviation.

The Average Deviation. If each deviation (column D) represented an equal number of ears, this "single expression" could be readily derived by adding the deviations and dividing by the total number. But these deviations do not represent equal numbers of ears. The deviation 5.83, for example, represents but one ear, while no less than twelve ears deviated 1.83 inches below the mean and two ears deviated 2.67 inches above, with others unevenly distributed. Manifestly each deviation should first be multiplied by the number of ears involved, thus:[12]

    f         D        f x D
    1   x   5.83   =    5.83
    0   x   5.33   =    0.00
    1   x   4.83   =    4.83
    0   x   4.33   =    0.00
    2   x   3.83   =    7.66
    3   x   3.33   =    9.99
    9   x   2.83   =   25.47
    8   x   2.33   =   18.64
   12   x   1.83   =   21.96
   19   x   1.33   =   25.27
   32   x   0.83   =   26.56
   40   x   0.33   =   13.20
   67   x   0.17   =   11.39
   63   x   0.67   =   42.21
   38   x   1.17   =   44.46
   21   x   1.67   =   35.07
    8   x   2.17   =   17.36
    2   x   2.67   =    5.34
    1   x   3.17   =    3.17
  327                 318.41

The result of this calculation is that the total deviation of all the 327 ears from their average length is 318.41 inches, some above and some below the mean.[13] If now we divide 318.41 by 327, the number of ears involved, we have 0.97+ inches, which is a good expression for the average deviation of this particular population. If another variety should give a larger quotient, we should conclude it to be more variable. In this manner we may reduce the variability of a whole population to a single expression.

[12] When the variability is to be obtained in this way the minus sign is disregarded.

[13] The reader will note that this total, 318.41, is exactly what would have resulted if we had added the deviations of each separate ear of the entire 327 measured from their average length, 8.83.

Standard Deviation. Mathematicians have another method of calculating variability. It differs from the one just discussed in only one detail; viz., the deviations are squared before multiplying by their respective frequencies, thus:[14]

    V        f        D          D²          D²f
   3.0       1     -5.83     33.9889     33.9889
   3.5       0     -5.33     28.4089      0.0000
   4.0       1     -4.83     23.3289     23.3289
   4.5       0     -4.33     18.7489      0.0000
   5.0       2     -3.83     14.6689     29.3378
   5.5       3     -3.33     11.0889     33.2667
   6.0       9     -2.83      8.0089     72.0801
   6.5       8     -2.33      5.4289     43.4312
   7.0      12     -1.83      3.3489     40.1868
   7.5      19     -1.33      1.7689     33.6091
   8.0      32     -0.83      0.6889     22.0448
   8.5      40     -0.33      0.1089      4.3560
   9.0      67      0.17      0.0289      1.9363
   9.5      63      0.67      0.4489     28.2807
  10.0      38      1.17      1.3689     52.0182
  10.5      21      1.67      2.7889     58.5669
  11.0       8      2.17      4.7089     37.6712
  11.5       2      2.67      7.1289     14.2578
  12.0       1      3.17     10.0489     10.0489
           327                          538.4103

[14] The column marked D² is secured by squaring the various deviations, thus eliminating the minus sign; for example, -5.83 x -5.83 = 33.9889. The column marked D²f is obtained by multiplying the squared deviations, each by its respective frequency, on the same principle as before; for example, 8.0089 x 9 = 72.0801, the seventh number down the last column, corresponding to the frequency 9 and the deviation -2.83.

Dividing 538.4103 by 327, after the manner of finding the average deviation, we have the quotient 1.6465; but as the deviations have all been squared during the operation, it is necessary to extract the square root of this number in order to return to the units in which the measurements were taken. The square root of 1.6465 is 1.28+, and this is the so-called standard deviation of the mathematician.
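Both measures of deviation, computed by the long method, can be checked in Python. This is a minimal sketch that follows the bulletin in reckoning deviations from the mean rounded to 8.83 inches; the function name `deviations_from_mean` and the dictionary keys are our own.

```python
import math

# Published frequencies: length of ear in inches -> number of ears.
distribution = {
    3.0: 1, 3.5: 0, 4.0: 1, 4.5: 0, 5.0: 2, 5.5: 3, 6.0: 9, 6.5: 8,
    7.0: 12, 7.5: 19, 8.0: 32, 8.5: 40, 9.0: 67, 9.5: 63, 10.0: 38,
    10.5: 21, 11.0: 8, 11.5: 2, 12.0: 1,
}


def deviations_from_mean(dist, mean):
    """Long method: reckon each class deviation D = V - mean directly."""
    n = sum(dist.values())
    # Average deviation: the minus sign is disregarded and |D| is weighted by f.
    sum_f_abs_d = sum(f * abs(v - mean) for v, f in dist.items())
    # Standard deviation: D is squared before it is weighted by f.
    sum_f_d2 = sum(f * (v - mean) ** 2 for v, f in dist.items())
    return {
        "sum_f_abs_d": sum_f_abs_d,
        "average_deviation": sum_f_abs_d / n,
        "sum_f_d2": sum_f_d2,
        # Divide by n, then take the square root to return to inches.
        "standard_deviation": math.sqrt(sum_f_d2 / n),
    }


result = deviations_from_mean(distribution, mean=8.83)
for name, value in result.items():
    print(f"{name:20s} {value:10.4f}")
```

The printed totals match the columns above (318.41 and 538.4103), giving an average deviation of about 0.97 and a standard deviation of about 1.28 inches.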
Hence, to find the standard deviation: find the deviation of each class from the mean; square each deviation and multiply by its corresponding frequency; add the products, divide by the total number of variates, and extract the square root.

Shortening the Method. The calculations just described necessarily involve long decimals. These can be avoided, and the process of finding both the mean and the standard deviation very much shortened, by assuming as a mean the nearest probable measurement as determined by inspection of the frequency distribution, and afterward applying the necessary correction. For example, in the present instance we should judge by inspection that the mean cannot be far from 9.0.[15] This we infer from the fact that the distribution falls off both ways from this point, and quite evenly. Proceeding with this assumption, denoting our "guess" by G, and reckoning deviations provisionally from this point, we have the following, using exactly the same methods as before:[16]

[15] The advantage of assuming this value from which to reckon deviation lies in the fact that it is exact and contains but one decimal, while the true mean has at least two decimal places, making relatively large numbers to deal with.

[16] The following table will be found useful for obtaining the squares of numbers containing only two significant figures (1.0 to 9.9), correct to three significant figures.

SQUARES OF NUMBERS
         .0     .1     .2     .3     .4     .5     .6     .7     .8     .9
  1    1.00   1.21   1.44   1.69   1.96   2.25   2.56   2.89   3.24   3.61
  2    4.00   4.41   4.84   5.29   5.76   6.25   6.76   7.29   7.84   8.41
  3    9.00   9.61   10.2   10.9   11.6   12.2   13.0   13.7   14.4   15.2
  4    16.0   16.8   17.6   18.5   19.4   20.2   21.2   22.1   23.0   24.0
  5    25.0   26.0   27.0   28.1   29.2   30.2   31.4   32.5   33.6   34.8
  6    36.0   37.2   38.4   39.7   41.0   42.2   43.6   44.9   46.2   47.6
  7    49.0   50.4   51.8   53.3   54.8   56.2   57.8   59.3   60.8   62.4
  8    64.0   65.6   67.2   68.9   70.6   72.2   74.0   75.7   77.4   79.2
  9    81.0   82.8   84.6   86.5   88.4   90.2   92.2   94.1   96.0   98.0
    V       f     V-G    f(V-G)    (V-G)²    f(V-G)²
   3.0      1    -6.0     -6.0     36.00      36.00
   3.5      0    -5.5      0.0     30.25       0.00
   4.0      1    -5.0     -5.0     25.00      25.00
   4.5      0    -4.5      0.0     20.25       0.00
   5.0      2    -4.0     -8.0     16.00      32.00
   5.5      3    -3.5    -10.5     12.25      36.75
   6.0      9    -3.0    -27.0      9.00      81.00
   6.5      8    -2.5    -20.0      6.25      50.00
   7.0     12    -2.0    -24.0      4.00      48.00
   7.5     19    -1.5    -28.5      2.25      42.75
   8.0     32    -1.0    -32.0      1.00      32.00
   8.5     40    -0.5    -20.0      0.25      10.00
   9.0     67     0.0      0.0      0.00       0.00
   9.5     63     0.5     31.5      0.25      15.75
  10.0     38     1.0     38.0      1.00      38.00
  10.5     21     1.5     31.5      2.25      47.25
  11.0      8     2.0     16.0      4.00      32.00
  11.5      2     2.5      5.0      6.25      12.50
  12.0      1     3.0      3.0      9.00       9.00
          327
  Sum of negative products     -181.0
  Sum of positive products      125.0
  Difference                    -56.0                 548.00

This method gives us both the mean and the standard deviation. Considering first the mean: in column f(V-G) we find that, after multiplying the deviations from our assumed mean (9.0) by their respective frequencies, the sum of the negative products (-181.0) exceeds the sum of the positive products (125.0) by 56.0; that is, the algebraic sum of the products is -56.0. Our assumed mean is therefore too high by 56.0 ÷ 327[17] = 0.171. We then reduce our assumed mean by this amount (9.0 - 0.171 = 8.829) and arrive at the true mean, 8.83.[18]

Considering next the standard deviation: in column f(V-G)² we have 548.00 as the sum of the products of the several frequencies into their respective squared deviations from the assumed mean, derived on the same plan as when working from the true mean in column D²f. Dividing by the total number (327), we have 548.00 ÷ 327 = 1.6758, corresponding to the quotient 538.4103 ÷ 327 = 1.6465 of the previous calculation when working from the true mean. The correction made in the mean was -0.171, but as we are now dealing with second powers it seems natural that this amount be squared before it is taken from the quotient 1.6758.[19] The square of -0.171 is 0.029241, or 0.0292+. We have therefore, after applying this correction, 1.6758 - 0.0292 = 1.6466.
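The shorter method can be sketched the same way. This is a minimal sketch, assuming the guessed mean G = 9.0 used above; the name `short_method` is our own. Unlike the hand calculation, it carries the correction at full precision instead of rounding it to 0.171 before squaring, so the corrected quotient comes out as 1.6465 rather than 1.6466. The squared correction is subtracted because the mean of the squared deviations from G equals the mean of the squared deviations from the true mean plus the square of the distance between G and the true mean.

```python
import math

# Published frequencies: length of ear in inches -> number of ears.
distribution = {
    3.0: 1, 3.5: 0, 4.0: 1, 4.5: 0, 5.0: 2, 5.5: 3, 6.0: 9, 6.5: 8,
    7.0: 12, 7.5: 19, 8.0: 32, 8.5: 40, 9.0: 67, 9.5: 63, 10.0: 38,
    10.5: 21, 11.0: 8, 11.5: 2, 12.0: 1,
}


def short_method(dist, guess):
    """Reckon deviations from an assumed mean G, then apply the corrections."""
    n = sum(dist.values())
    sum_f_dev = sum(f * (v - guess) for v, f in dist.items())        # column f(V-G)
    sum_f_dev2 = sum(f * (v - guess) ** 2 for v, f in dist.items())  # column f(V-G)^2
    correction = sum_f_dev / n  # negative when the guess is too high
    mean = guess + correction
    # The square of the correction is always subtracted.
    variance = sum_f_dev2 / n - correction ** 2
    return sum_f_dev, sum_f_dev2, mean, variance, math.sqrt(variance)


sum_f_dev, sum_f_dev2, mean, variance, sd = short_method(distribution, guess=9.0)
print(f"sum f(V-G) = {sum_f_dev:.1f}, sum f(V-G)^2 = {sum_f_dev2:.2f}")
print(f"mean = {mean:.3f}, quotient = {variance:.4f}, standard deviation = {sd:.2f}")
```

Because the deviations from 9.0 are exact halves, both column sums (-56.0 and 548.00) are computed without any rounding at all.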
[17] We divide by the total number (327) because we are dealing with a column of products arising from the introduction of the frequencies.

[18] On the other hand, should the sum of the positive products exceed the sum of the negative products, it would indicate that our assumed value is too small, and we should add the correction in order to arrive at the true mean.

[19] This can be justified by a strictly mathematical proof. It is to be noted that in the case of standard deviations the square of the correction is always to be subtracted.

This agrees very nearly with the value 1.6465 previously found, but the shorter method is the more accurate because no decimals have been lost during the process. The square root of 1.6466 is 1.28+, the standard deviation sought, agreeing with the former value and derived by a very much shorter method. The first method is useful for expounding the principles involved, but the latter is far preferable for actual use, not only on account of its brevity but of its increased accuracy as well.

The farmer is at liberty, of course, to choose whether he will use the average deviation or the standard deviation as an index of variability. The average deviation is the simpler, but it is seldom used by mathematicians. As its results are different (generally smaller), they cannot be compared with those found in the standard literature of this kind.

The standard deviation, obtained by either of the two methods just described, is strongly recommended. It is the one that will be used in all publications of this station. The breeder may employ either the shorter method or the longer and slightly less accurate one. The shorter method is far more convenient and is no more complicated except in making the correction, and this, after a little practice, offers no difficulty.

Practical Value of Standard Deviation. The standard deviation is a good measure of deviation from the mean.
It is therefore a good measure of variability reckoned from that point. It is manifest that by the same methods we could calculate the deviation and express the variability from the mode, the selection standard, or any other type on which the mind might rest. The practical value of the standard deviation is that it stands as a definite measure of the variability of the population in question, and if records are kept, the variability of any race may be compared from year to year. The advantage of being able to make comparisons of this sort is too obvious to require elaboration.

Coefficient of Variability. It is often desirable to compare the variabilities of different characters measured in different units, either within the same race or between separate races; thus, which is more variable: the length, the circumference, or the weight of ear? In such cases one standard deviation cannot be compared directly with another, for two reasons: first, one mean may be very much larger than another; and second, they may be in entirely different units, as inches and pounds, in which case direct comparison is impossible. We seek, therefore, an abstract expression combining the ideas of both standard deviation and type. Such an expression is known as the coefficient of variability, and it is found as follows: divide the standard deviation by the mean as a base, and the result will be an excellent index of variability in the form of a rate percent. Thus for the case in question we have 1.28 ÷ 8.83 = 0.145 nearly, indicating the variability of this population to be about 14.5 percent of its own mean. Here we have a mathematical expression for comparing variability on an abstract basis, and by this means we can compare the variability of this population with that of any other from any race, plant or animal, and for any character of which accurate measurements can be made.
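The coefficient of variability reduces to a single division. This is a minimal sketch with a function name of our own; the conversion of inches to centimeters (1 inch = 2.54 cm) is added only to show that the coefficient does not depend on the unit of measurement.

```python
INCH_TO_CM = 2.54


def coefficient_of_variability(standard_deviation, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * standard_deviation / mean


cv_inches = coefficient_of_variability(1.28, 8.83)
# The same ears measured in centimeters: both constants scale, the ratio does not.
cv_cm = coefficient_of_variability(1.28 * INCH_TO_CM, 8.83 * INCH_TO_CM)
print(f"C (inches) = {cv_inches:.1f} percent, C (cm) = {cv_cm:.1f} percent")
```

Both figures print as 14.5 percent, which is what allows a coefficient for length of ear to be set beside one for weight of ear or for a measurement of man.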
For example, the coefficient of variability has been worked out for a large number of characters in man, as is shown in the following table:[20]

COEFFICIENTS OF VARIABILITY (PERCENT) FOR CHARACTERS IN MAN
  Nose length           9.49        Head length            2.44
  Nose breadth          7.57        Head breadth           2.78
  Nose height          15.2         Upper arm length       6.50
  Forehead height      10.4         Forearm                3.85
  Underjaw length       4.81        Upper leg              5.00
  Mouth breadth         5.18        Lower leg              5.04
                                    Foot                   5.92

[20] See Vernon, Variation in Animals and Plants, p. 24.

From this we note that the most variable character in these physical measurements of man is the height of the nose from the plane of the face (15.2), and this is the only character that is as variable as length of ear in the distribution now under discussion (14.5 percent). It is manifest that the variability of the nose in man and that of the weight of animals could not be directly compared one with another, because the units are different and because they are reckoned on different means; but when variability is reduced to a coefficient, direct comparison becomes entirely possible and intelligible.

Practical Use of Statistical Constants. The practical advantage to the breeder of being able to calculate the mode, mean, and variability of the animals and plants he is breeding, and thus to know definitely their behavior from generation to generation under his methods of selection and treatment, is too obvious to need discussion. Breeding operations in the past have lacked much in definiteness because of the inability of breeders to possess themselves of this class of knowledge, or even to appreciate its bearing upon breeding operations. The successful breeder of the future will be a statistician and a bookkeeper. He will keep himself as accurately and as fully informed as may be as to the type and variability in succeeding generations of the breeds and strains he attempts to improve, and he will know this of all important characters that can be subjected to any form of measurement.
Manifestly the methods here given do not avail with characters that cannot be subjected to measurement, nor can they be employed when it is impossible to find sufficient numbers to make the calculations reliable.

The characters that can be classified and measured, at least approximately, are, however, more numerous than might at first seem possible. Dimensions and weights are in most cases easily taken. Gains in weight, yield of milk, rate of speed, etc., are readily handled by the statistical methods, and even such characters as color, degree of intelligence, and the like are not impossible of classification and approximate measurement. While most characters can thus be brought into form for statistical treatment, it is useful to know that present-day knowledge of breeding operations seems to indicate that all characters, whether measurable or not, tend to behave according to the same general principles as to type and variability, so that we may confidently believe that every character of every individual belongs in some portion of a distribution, whether or not that distribution could be definitely written.

PROBABLE ERROR

Clearly no calculations based on a portion of the population can represent the entire race with absolute accuracy. If one more ear had been measured, it would have fallen somewhere in the scheme of distribution, and wherever it fell it would have slightly changed our calculations. When we are able to examine all the individuals involved in a problem, we can of course determine absolute values to within the limits of measurement, as in the average weight of a bunch of steers or the yield per acre of a field of grain. Our present discussion, however, concerns a class of problems in which we can never hope to see and examine more than a fraction of the total population, as when we ask what is the average weight of steers the country over, or the average length or weight of ears of corn.
In practice, when dealing with this class of problems, we can do no better than to take a random sample and assume it to be representative of the entire population, accepting whatever error may be involved; and there is always an error of some magnitude, for no random sample can be assumed to be completely representative of the entire race to which it belongs. Now no method can inform us as to the exact magnitude of this error. If it could, we should at once correct for it and thus come into possession of the true value; but methods are known by which we may judge fairly well of the degree of confidence which may be reposed in results of this kind. These methods result in deducing what is called the "probable error," written plus or minus E (±E). The formulas for calculating E are derived by mathematical methods which are too complicated for discussion here, but which are briefly stated in the appendix, with the following results:

1. Probable Error of Mean. The formula for probable error in determinations for mean is

$E = \dfrac{\text{Standard deviation}}{\sqrt{n}} \times 0.6745,$

in which n is the number of variates examined and 0.6745 is a mathematical constant. In words, it is: Multiply the standard deviation by 0.6745 and divide by the square root of the number of variates examined. Thus in the present instance, by substituting the values of the standard deviation (1.28) and of the number of ears, we have E = 0.047. This probable error is but a small fraction of the value determined, and indicates that a high degree of confidence can be placed in the result. If another calculation should show a smaller value for probable error, we should conclude that a still higher degree of confidence could be placed in its accuracy. The reader will note that the number (n) is the only element of the formula that is under our control, the others arising necessarily out of the problem.
He will note, too, the overwhelming influence of numbers: as numbers increase the denominator increases and probable error decreases, and when the number should reach infinity, E would become zero.

2. Probable Error of Standard Deviation. The formula for probable error in determinations for standard deviation is

$E = \dfrac{\text{Standard deviation}}{\sqrt{2n}} \times 0.6745.$

In words, it is: Multiply the standard deviation by 0.6745 and divide by the square root of twice the number examined. Substituting values for the case in hand, we have as the probable error of the standard deviation 1.28

$E = \dfrac{1.28}{\sqrt{2n}} \times 0.6745 = 0.034.$

3. Probable Error of Coefficient of Variability. The formula for probable error in determinations for coefficient of variability below 10 percent is

$E = \dfrac{\text{Coefficient of variability}}{\sqrt{2n}} \times 0.6745.$

In words, it is: Multiply the coefficient of variability by 0.6745 and divide by the square root of twice the number. In the case in hand, however, the coefficient of variability (14.5) is greater than ten percent, and in such cases the following formula is used, C being the coefficient of variability:

$E = 0.6745\,\dfrac{C}{\sqrt{2n}}\left[1 + 2\left(\dfrac{C}{100}\right)^{2}\right]^{1/2},$

which equals 0.39 as the corrected probable error.

Meaning of Probable Error. It is important that the meaning of probable error be not misunderstood. It has no reference to errors in our computations, which are assumed to be correct. It is not the actual magnitude of errors made, nor is it the most probable size of any mistake, neither does it set the limits within which errors must lie. Such limits cannot be set; but it does mean that the chances are even that the true value lies within the range set by E. That is, if the determination (as of the mean) be 7.25 and E be 0.02, then the chances are even that the true value is neither less than 7.23 (7.25 - 0.02) nor greater than 7.27 (7.25 + 0.02). Of course the chances are also even that the true value may lie outside this range, but these chances rapidly decrease as we increase the range.
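The three formulas above can be collected into a short sketch. It is an illustration only: the function names are our own, the sample sizes in the usage lines are made up, and `odds_within` assumes, as the normal-curve argument of the appendix does, that errors follow the normal curve, so that E = 0.6745σ. Under that assumption the same curve shows how rapidly the chances of the true value lying outside the range fall off as the range is widened.

```python
import math
from statistics import NormalDist

# The constant 0.6745: to four places it is the 75th percentile of the
# standard normal curve, so half of a normal population lies within
# +/- 0.6745 standard deviations of the mean.
PE_FACTOR = 0.6745
assert round(NormalDist().inv_cdf(0.75), 4) == PE_FACTOR

def pe_mean(sd, n):
    """Probable error of a mean: sd / sqrt(n) * 0.6745."""
    return sd / math.sqrt(n) * PE_FACTOR

def pe_sd(sd, n):
    """Probable error of a standard deviation: sd / sqrt(2n) * 0.6745."""
    return sd / math.sqrt(2 * n) * PE_FACTOR

def pe_cv(cv, n):
    """Probable error of a coefficient of variability given in percent.

    Below 10 percent the simple formula is used; otherwise the result is
    multiplied by the correction factor [1 + 2 (cv/100)^2] ** 0.5.
    """
    simple = cv / math.sqrt(2 * n) * PE_FACTOR
    if cv < 10:
        return simple
    return simple * math.sqrt(1 + 2 * (cv / 100) ** 2)

def odds_within(k):
    """Odds (to 1) that the true value lies within +/- k probable errors,
    assuming normally distributed errors, so that E = 0.6745 sigma."""
    outside = math.erfc(k * PE_FACTOR / math.sqrt(2))   # P(|error| > kE)
    return (1 - outside) / outside

# Hypothetical sample sizes, chosen only to show the influence of numbers:
for n in (100, 400):
    print(f"n = {n}: E of mean = {pe_mean(1.28, n):.3f}, "
          f"E of s.d. = {pe_sd(1.28, n):.3f}, E of C.V. = {pe_cv(14.5, n):.2f}")

for k in range(1, 10):
    print(f"{k}E: {odds_within(k):,.1f} to 1")
```

Multiplying n by four halves every probable error. The computed odds agree in order of magnitude with the table of chances that follows (for example, about 142 to 1 at 4E), though some entries differ somewhat from the printed figures, such as roughly 4.6 and 22 to 1 at 2E and 3E.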
The chances against the true value lying outside twice the probable error are as 4.5 to 1. The following table* shows the rapid increase in the chances that the true value lies within the range set by E, 2E, etc.:

Range     Chances that the true value lies within it
E         even
2E        4.5 to 1
3E        21 to 1
4E        142 to 1
5E        1,310 to 1
6E        19,200 to 1
7E        420,000 to 1
8E        17,000,000 to 1
9E        about a billion to one

* C. B. Davenport, Statistical Methods, p. 14.

It will be noticed that by the time we have made an allowance of three or four times the probable error we have reached a chance which amounts to practical certainty, and even 21 to 1 involves far less chance than is involved in most business transactions.

Degree of Confidence Shown by Probable Error. There is a popular notion that most affairs of life rest on a positive basis of fact, and that unless error or chance can be entirely eliminated from our calculations no confidence is to be placed in the results. This is erroneous. A large element of uncertainty is nearly always involved in all affairs of life, whether we recognize the fact or not. Problems of the class now under discussion differ from ordinary affairs of life, therefore, only in this: that we can calculate the probable error involved and by that determine the degree of confidence that may be reposed in the conclusions. If the probable error is large as compared with the determination, then we have nothing but a shrewd guess unless we increase the numbers; but if the probable error is small as compared with the determination, then a high degree of confidence may be placed in the results and the facts taught may be relied upon as practically certain.

For a graphic illustration of the meaning of ±E, suppose in the following figure the line AB represents our determination and the lines ab and a₁b₁ represent +E and -E:

[Figure: line AB (the determination) flanked by lines ab and a₁b₁, each at a distance E from it.]

Now this means that the chances are even that the line which would represent the true value is not outside the limits set by the lines ab and a₁b₁, representing ±E. If now we set other lines at 2E,

[Figure: line AB flanked by two outer lines, each at a distance 2E from it.]

then we know from the table that the chances are 4.5 to one that the line which represents the true value is not outside the two outer lines, each removed twice the probable error from the determination.

The first of the two purposes of this bulletin is accomplished in the text up to this point and in the appendix. It remains to present certain data showing how type and variability of corn behave under various influences.

INFLUENCE OF SELECTION UPON TYPE AND VARIABILITY

In 1896 Dr. C. G. Hopkins, chemist of the Experiment Station, began a series of breeding experiments to determine whether the chemical composition of corn could be influenced by selection. The mass of data which has accumulated during the ten years of the investigation affords some of the most reliable information that has ever been secured concerning the influence of selection upon both type and variability. The selections were made in four directions, namely, for high oil, for low oil, for high protein, and for low protein, giving rise to four strains known as Illinois high-oil, Illinois low-oil, Illinois high-protein, and Illinois low-protein. All four strains sprang from the same original stock of 163 good ears of a local strain known as Burr's White.

Table 1 exhibits the distribution of the original 163 ears and the effects of ten years of selection for oil content. The column headed "Seed" gives the average of the seed ears planted both for high oil and for low oil, and the distributions opposite show the actual oil content of the entire crop of good ears raised therefrom.¹ The column headed "Average" gives the average content of oil as determined by chemical analysis of the actual ears, differing slightly from what would be the computed mean of these distributions because in the distribution the ears are made to fall in definite classes.

¹ These different strains were raised in small isolated plots, each ear in a separate row.

Nothing can exhibit more clearly than this table the readiness with which the type responds to selection. In the fourth crop the high oil and low oil strains parted company; that is to say, their distributions no longer overlapped. In the seventh year the entire low oil crop dropped below the lowest ear of the original stock, and in the ninth the entire high oil distribution was above the highest ear of the original stock. That is to say, the two strains of high and low oil, though developed from the same stock, had separated by a space wider than that covered by the original distribution or by that of either strain. Both strains had entirely departed from the space occupied by the original stock, and the mean oil content of the high oil corn was nearly twice that of the foundation, or almost three times that of the low oil strain. This illustrates the principle of progression as it is illustrated by no other definite data known to the writers. The distributions began from the very first to separate, and within four years the separation was complete. Not only are these facts clearly established, but the separation continued and all the distributions are normal; that is, they slope both ways from a maximum that is not far from the middle point. The plain conclusion is that response to selection is rapid and pronounced, and that by persistent selection the type may be carried entirely beyond the former limits of the race.
Table 2 gives in condensed form the effect of selection upon the mean, standard deviation, and coefficient of variability for all four strains of corn through the ten years of the experiment. A glance at this table will show that the high and low protein strains followed the same general principle as that indicated by the high and low oil strains. It will be noted, however, that the response to selection was upon the whole least prompt in the low protein corn.

[Table 1. Distribution of oil content in the original 163 ears and in the successive crops of the high-oil and low-oil strains, with the average oil content of the seed ears ("Seed") and of each crop ("Average").]

TABLE 2. VARIABILITY OF CORN BRED FOR OIL AND PROTEIN
(S.D. = standard deviation; C.V. = coefficient of variability)

Percent protein in original stock, 163 ears: mean 10.93±0.05; S.D. 1.04±0.04; C.V. 9.50±0.35

        Percent protein in high-protein selection   Percent protein in low-protein selection
Year    Mean         S.D.         C.V.             Mean         S.D.         C.V.
1897    10.99±0.07   1.16±0.05    10.90±0.50       *
1898    10.98±0.05   1.22±0.04    11.15±0.33       *
1899    11.62±0.06   1.28±0.04    11.00±0.36       *
1900    12.62±0.05   1.02±0.03     8.09±0.26        9.13±0.06   1.04±0.04    11.34±0.45
1901    13.78±0.07   1.17±0.05     8.48±0.38        9.63±0.07   1.10±0.05    11.47±0.49
1902    12.90±0.08   1.10±0.05     8.50±0.43        7.86±0.05   0.75±0.04     9.60±0.48
1903    13.51±0.09   1.36±0.06    10.04±0.48        8.00±0.06   0.83±0.04    10.41±0.50
1904    15.04±0.10   1.34±0.07     8.94±0.43        8.17±0.06   0.86±0.04    10.55±0.51
1905    14.71±0.08   1.24±0.05     8.84±0.42        8.55±0.06   1.05±0.05    12.24±0.54
1906    14.25±0.08   1.33±0.06     9.33±0.41        8.66±0.06   0.93±0.04    10.77±0.47

* For 1897-1899 only two rows of low-protein values are printed, in this order: 10.49±0.08, 1.32±0.06, 12.61±0.53; and 9.59±0.06, 1.01±0.04, 10.50±0.42. The years to which they belong cannot be read from the source.

Percent oil in original stock, 163 ears: mean 4.68±0.02; S.D. 0.41±0.02; C.V. 8.83±0.33

        Percent oil in high-oil selection           Percent oil in low-oil selection
Year    Mean         S.D.         C.V.             Mean         S.D.         C.V.
1897    4.79±0.03    0.38±0.02    7.87±0.42        **
1898    5.10±0.02    0.48±0.02    9.33±0.30        **
1899    5.65±0.03    0.42±0.02    7.47±0.34        **
1900    6.10±0.03    0.44±0.02    7.26±0.33        3.57±0.02    0.36±0.01    10.13±0.40
1901    6.24±0.03    0.45±0.02    7.26±0.31        3.45±0.02    0.26±0.01     7.59±0.32
1902    6.25±0.04    0.50±0.03    8.06±0.41        3.00±0.02    0.32±0.02    10.84±0.55
1903    6.51±0.03    0.46±0.02    7.07±0.34        2.99±0.02    0.23±0.01     7.83±0.39
1904    7.12±0.04    0.58±0.03    8.19±0.39        2.91±0.02    0.25±0.01     8.45±0.40
1905    7.30±0.03    0.55±0.02    7.47±0.33        2.56±0.02    0.28±0.01    10.86±0.55
1906    7.37±0.03    0.45±0.02    6.15±0.27        2.66±0.02    0.31±0.01    11.69±0.51

** For 1897-1899 only two rows of low-oil values are printed, in this order: 3.95±0.02, 0.32±0.01, 8.13±0.37; and 3.85±0.02, 0.32±0.01, 8.42±0.33. The years to which they belong cannot be read from the source.

This table shows the effects of selection. It will be noted that the means steadily increase or decrease with selection, but that the coefficient of variability does not greatly change. This all shows that the effect of selection is to shift the type without sensibly reducing variability.

The chief interest in Table 2 is with respect to variability, as shown by the standard deviations, or better yet by the coefficients of variability, of the different strains. On this point one fact is clear-cut and significant: while the different strains differ as to variability, the high oil upon the whole being least variable and the low protein most variable, yet in every instance the variability was not sensibly reduced during the ten years of rigid selection. True, it fluctuates from year to year, but rarely more than is accounted for by the probable error, and it cannot be said from these figures that the effect of selection is greatly to reduce variability. This agrees with other modern studies in the field of breeding and
in connection with the data given in Table 1, tends strongly to confirm the statement that in general the effect of selection is to shift the type without greatly altering variability. All of this means that after great improvement has been secured there is still left abundant variability on which to base future selection, and that if the limits of improvement are ever reached it will be for some reason other than the failure of variability.

INDIRECT EFFECTS OF SELECTION

Table 3 gives the physical characters of these four strains of corn for two separate crops, 1905 and 1906. By this we see that the effect of selecting for chemical content has been also to alter the physical characters of the different strains. That is to say, while the ears differed in the two years, 1905 and 1906, yet in both cases the low protein strain had the longest and the high oil strain the shortest ears; the low oil corn had the largest and the high protein the smallest circumference; and the low oil corn was the heaviest in both years and the high protein the lightest. Aside from this there seems to be a tendency to affect the number of rows; in any event, in both these years the high oil corn had the largest number of rows and the low oil corn the smallest. While these two years are not enough to determine whether these differences will remain permanent with these strains, the data presented are certainly sufficient to show that these four strains of corn are coming to differ decidedly in respect to physical characters, though not as widely as they differ in their chemical composition, for which they were selected.

TABLE 3. EFFECT OF SELECTION FOR CHEMICAL CONTENT UPON PHYSICAL CHARACTERS OF CORN
(S.D. = standard deviation; C.V. = coefficient of variability)

LENGTH OF EAR (inches)
                  Crop of 1905                         Crop of 1906
Chemical strain   Mean        S.D.        C.V.         Mean          S.D.          C.V.
High protein      7.21±0.04   1.27±0.03   17.6±0.4     7.880±0.043   1.201±0.030   15.24±0.39
Low protein       7.80±0.04   1.54±0.03   19.7±0.4     8.841±0.050   1.364±0.035   15.43±0.4?
High oil          6.87±0.04   1.39±0.03   20.2±0.4     7.606±0.042   1.014±0.030   13.33±0.41
Low oil           7.48±0.04   1.30±0.03   17.4±0.4     8.378±0.056   1.510±0.040   18.02±0.49

CIRCUMFERENCE OF EAR (inches)
                  Crop of 1905                         Crop of 1906
Chemical strain   Mean        S.D.        C.V.         Mean          S.D.          C.V.
High protein      5.76±0.01   0.44±0.01   7.6±0.2      5.863±0.014   0.393±0.010   6.70±0.17
Low protein       6.51±0.02   0.61±0.01   9.4±0.2      6.495±0.018   0.478±0.013   7.36±0.19
High oil          6.05±0.01   0.53±0.01   8.8±0.2      6.134±0.017   0.417±0.012   6.80±0.20
Low oil           6.65±0.02   0.59±0.01   8.9±0.2      6.717±0.018   0.515±0.013   7.52±0.19

WEIGHT OF EAR (ounces)
                  Crop of 1905                         Crop of 1906
Chemical strain   Mean        S.D.        C.V.         Mean          S.D.          C.V.
High protein      7.53±0.04   2.50±0.03   33.2±0.4     8.289±0.081   2.061±0.057   24.86±0.72
Low protein       9.66±0.10   3.30±0.07   34.2±0.7     10.77±0.11    2.680±0.071   24.88±0.74
High oil          7.79±0.07   2.43±0.05   31.2±0.6     8.850±0.075   1.763±0.053   19.92±0.63
Low oil           9.84±0.08   2.87±0.06   29.2±0.7     11.50±0.13    3.349±0.091   29.13±0.84

NUMBER OF ROWS IN EAR
                  Crop of 1905                         Crop of 1906
Chemical strain   Mean        S.D.        C.V.         Mean          S.D.          C.V.
High protein      13.72±0.03  1.85±0.02   13.5±0.2     13.774±0.060  1.707±0.042   12.39±0.31
Low protein       14.17±0.06  1.94±0.04   13.7±0.3     14.597±0.072  1.922±0.051   13.17±0.35
High oil          15.65±0.06  2.08±0.04   13.3±0.3     14.712±0.073  1.802±0.052   12.25±0.36
Low oil           12.80±0.05  1.77±0.04   13.8±0.3     13.310±0.067  1.897±0.047   14.25±0.38

EFFECT OF FERTILITY UPON TYPE AND VARIABILITY OF CORN

Upon this point the experiments of Dr. Hopkins upon soil fertility afford considerable information (Tables 4 and 5).

[Tables 4 and 5. Mean, standard deviation, and coefficient of variability of ear characters, including weight of individual ears in ounces, for plots under different soil treatments: none; legume; manure; manure and lime; and combinations of legume or manure with lime, phosphorus, and potassium.]

... That is, in other symbols, ${}_nC_r\left(\tfrac{1}{2}\right)^n$, where the symbol ${}_nC_r$ indicates the number of combinations of n things taken r at a time.

In order to emphasize the fact that there is a much greater probability of getting an almost equal number of heads and tails than of getting widely different numbers, and in order to lead up to the normal probability curve, we present the following table for n = 999, obtained from Quetelet's Lettres sur la Théorie des Probabilités. As indicated in the table, columns 1 and 2 give the numbers of heads and tails whose probabilities are in question, while column 3 gives the corresponding probabilities:

Heads (1)   Tails (2)   Probability (3)
499         500         0.025225
490         509         0.021069
480         519         0.011794
470         529         0.004423
460         539         0.001110
450         549         0.0001863
440         559         0.0000209
430         569         0.0000016
420         579         0.00000004

It may be observed from this table that in the long run one should expect 499 heads and 500 tails more than 600,000 times as often as 420 heads and 579 tails.

In Figure 2 the results in this table are presented graphically by taking as class marks the various combinations 499/500, 490/509, etc., and taking as ordinates the probabilities recorded in column 3 of the table. If we had taken all the intermediate combinations from 420/579 to 499/500, we should have had ten times as many points, which would arrange themselves on the curve in Fig. 2. By increasing the number of coins and decreasing the horizontal scale we can get the plotted points as close together as we please. The curve so obtained is known as the normal probability curve; the curve in Fig. 2 is a close approximation to it. Clearly, the probabilities can be converted into frequencies by multiplying each of them by the same large number, and we then obtain the normal frequency curve, identical with the probability curve except for an adjustment of the scale.
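The binomial term above can be evaluated exactly with integer arithmetic, which gives a check on the Quetelet table. This is a sketch; the function name is our own.

```python
import math

def probability_of_heads(n, heads):
    """Probability of exactly `heads` heads (and n - heads tails) when n fair
    coins are tossed: nCr * (1/2)**n.

    math.comb and 2**n are exact integers, and Python's true division of two
    integers returns the correctly rounded float, so no precision is lost.
    """
    return math.comb(n, heads) / 2 ** n

n = 999
for heads in (499, 490, 480, 470, 460, 450, 440, 430):
    print(heads, n - heads, f"{probability_of_heads(n, heads):.7f}")
```

The printed probabilities for 499 down to 430 heads are reproduced to the places given. Extending the loop to 420 heads gives a value near 7.7 × 10⁻⁸, roughly twice the printed 0.00000004, so that far-tail entry, and the ratio of 600,000 built on it, are best read as approximate.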
The causes of deviations in the case of biological measurements are analogous to the causes which produce deviations in the tossing of pennies, and it has, furthermore, been found by experience that the frequency curves of many populations obtained in biology follow the normal probability curve. While more will be said later about distributions which are not normal, for the present let us assume that we are dealing with normal distributions, and proceed to justify the standard deviation as a measure of variability.

Geometrical Meaning of Standard Deviations. It should be noted that there are two points, A and B (Fig. 2), on the normal frequency curve such that, as we follow the course of the curve from left to right, the curve changes at A from concave upward to concave downward, and at B from concave downward to concave upward. Such points on a curve are called points of inflexion. The important fact is that one-half the distance between these two points is the standard deviation* of the population represented by the frequency curve, and that this distance determines the curve in a manner analogous to the way in which the radius determines a circle. For this reason we can say that the standard deviation is a perfect measure of variability for a normal distribution: when it is given along with the type, we can draw the curve, which is completely descriptive of variability. In other words, the form of the population can be reproduced. This completely justifies the use of the standard deviation as a measure of variability for a normal distribution. When the distribution is not normal, the standard deviation can at most be considered as only approximately descriptive, but it is always a significant measure of variability.

* This is proved by methods involving the calculus.

If along the base line of the probability curve (Fig. 3) we measure distances in terms of the standard deviation, so that x is any horizontal distance OP in ordinary units and σ the standard deviation in the same units, we can present the following useful table of areas which correspond to x/σ. For a value of x = OP, the area concerned is bounded by the base line, the probability curve, OY, and the line through P parallel to OY. These areas are given in Table 1 for various values of x/σ, with such a unit of area that the total area under the curve is unity.

[Table 1 (appendix). Areas under the normal probability curve corresponding to x/σ.]

... above which ¼ of the population lies. Then (S - S₁)/2 is the probable error of a single variate. Applied to the population given on p. 27, we have S = 7.88, S₁ = 6.65, S - S₁ = 7.88 - 6.65 = 1.23, and (S - S₁)/2 = 0.61.

Probable Error of the Mean. Given the mean of n variates taken at random from a larger group, we set the problem of finding the probable error in the mean. Imagine that we continue selecting random samples of n variates from this group until we have found a considerable number m of means from these different samples. Let M₁, M₂, ..., M_m be these means. They will not all be equal to each other, but will themselves constitute a population which can be represented by a frequency curve. Such a frequency curve of means will, of course, be much steeper than the frequency curve of the original observations. The standard deviation of this population of means can be shown, by calculus methods, to be equal to the standard deviation of the single population divided by the square root of n. Now we can apply to this population of means the same definition of probable error that we have applied to a single observation, and if E_M represents the probable error of the mean,

$E_M = \dfrac{0.6745\,\sigma}{\sqrt{n}}.$

Probable Error of Standard Deviation. Suppose we have found the standard deviations of m different populations, each of n variates, and obtained σ₁, σ₂, ..., σ_m.
These constitute a population whose standard deviation can be shown to be σ/√(2n), so that, by the same definition, the probable error of the standard deviation is

$E_\sigma = \dfrac{0.6745\,\sigma}{\sqrt{2n}}.$

[Table: arranged by number of variates.]

In beginning this investigation, the measurements of length and circumference were taken to the nearest tenth inch. Great care must, however, be taken to get such a close measurement, and, in general, results derived from such measurements are no better than results derived from measurements taken with less apparent accuracy. In fact, our experiments have shown that length may well be taken to half inches, circumference to three-tenths inches, and weights to ounces. The closeness of measurements is closely connected with the question of grouping measurements into classes.

c) Grouping into Classes. In forming the frequency distribution the measurements are grouped into classes, as has been shown in this bulletin. There is no object in taking measurements with extreme accuracy and then grouping them into broad classes. In fact, the nature of the frequency distribution with a given grouping must help to settle the question of grouping, and this in turn the closeness of the measurements. In short, measurements should be so grouped as to show the variability and at the same time to leave the frequency distribution fairly smooth. In the matter of grouping there are two opposing dangers: grouping into too few classes to show the variability, and grouping into too many classes to give a smooth distribution. In the latter case the law of distribution is hidden because of too much detail.
We may lay it down as a general rule that the classes should be only just broad enough to make the distribution fairly smooth; that is, there should be no vacant classes except very near the extremes of the range, and there should be a gradual increase from one extreme up to a maximum and then a gradual decrease to the other extreme, if there is only one maximum in the distribution, as is, in general, the case with these populations. In grouping into classes the characters treated in this bulletin, we have settled upon one-half inch classes for length of ears, three-tenths inch for circumference, one ounce for weight, and even numbers for rows. This classification or grouping was decided upon after experimenting with classes taken at more frequent intervals.

There is a further danger of error in grouping besides the narrowness or broadness of classes. For example, at first we measured ears to the nearest tenth inch in length; now suppose we had made quarter-inch groupings as follows: 4.00, 4.25, 4.50, 4.75, 5.00, 5.25, 5.50, 5.75, 6.00, etc. At 5.75 would be grouped all ears which measured 5.7 or 5.8, while at 5.00 would be grouped those which measured 4.9, 5.0, or 5.1. In the long run this would clearly result in placing more ears at 5.00 than at 5.25, other things being equal. If we should group measurements taken to the nearest tenth inch in 0.5-inch or 0.3-inch classes, no such difficulty arises. A grouping such as that into quarter-inch groups would not greatly disturb the mean and variability, but would destroy the smoothness of the distribution. Again, if we measure to quarter inches but group to half inches, some measurements fall on the division lines between classes. Then one-half of such a variate may be recorded in each of the two classes between which it falls, or, if we are dealing with large numbers, such variates may be put alternately into the class above and the class below.
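The grouping rules just described can be tried out in a short sketch. The function names and the five quarter-inch readings are hypothetical, chosen only for illustration. `class_counts` counts how many possible tenth-inch readings fall into each class (working in hundredths of an inch so the arithmetic is exact); `group_to_half_inches` splits a reading that falls on a division line half-and-half between the two neighboring classes; and `grouped_stats` computes the mean, standard deviation (squared deviations divided by the total frequency), and coefficient of variability of the grouped distribution.

```python
from collections import Counter

def class_counts(width, start=400, stop=800):
    """Count the possible tenth-inch readings falling into each class.

    Values are in hundredths of an inch. Readings run from `start` to `stop`
    in steps of 10 (one tenth inch); each goes to the nearest class mark,
    the marks being multiples of `width`.
    """
    counts = Counter()
    for reading in range(start, stop + 1, 10):
        mark = (reading + width // 2) // width      # index of nearest mark
        counts[mark * width / 100] += 1             # class mark in inches
    return counts

def group_to_half_inches(readings):
    """Group readings taken to quarter inches into half-inch classes; a
    reading on a division line counts one-half in each neighboring class."""
    freq = Counter()
    for r in readings:
        q = round(r * 4)                  # reading in quarter inches
        if q % 2 == 0:                    # on a class mark (whole or half inch)
            freq[q / 4] += 1.0
        else:                             # on a division line between classes
            freq[(q - 1) / 4] += 0.5
            freq[(q + 1) / 4] += 0.5
    return freq

def grouped_stats(freq):
    """Mean, standard deviation, and coefficient of variability (percent)."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total
    sd = (sum(f * (x - mean) ** 2 for x, f in freq.items()) / total) ** 0.5
    return mean, sd, 100 * sd / mean

quarter = class_counts(25)   # quarter-inch classes
half = class_counts(50)      # half-inch classes
print([quarter[m] for m in (5.0, 5.25, 5.5, 5.75, 6.0)])   # [3, 2, 3, 2, 3]
print([half[m] for m in (5.0, 5.5, 6.0)])                  # [5, 5, 5]

freq = group_to_half_inches([5.0, 5.25, 5.5, 5.5, 5.75])
print(dict(freq))                                  # {5.0: 1.5, 5.5: 3.0, 6.0: 0.5}
m, s, c = grouped_stats(freq)
print(f"mean {m:.2f}, s.d. {s:.2f}, c.v. {c:.1f}")  # mean 5.40, s.d. 0.30, c.v. 5.6
```

The alternating 3, 2, 3, 2 pattern is the unevenness described above for quarter-inch classes, while half-inch classes (and likewise 0.3-inch classes, three readings each) receive equal numbers of possible readings.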
While many other questions may arise in taking the measurements of a certain character, this brief discussion covers the main difficulties in obtaining measurements for this bulletin.