INTRODUCTION TO ECONOMIC STATISTICS

BY GEORGE R. DAVIES, Ph. D.
PROFESSOR OF SOCIOLOGY, UNIVERSITY OF NORTH DAKOTA

NEW YORK
THE CENTURY CO.
1922

Copyright, 1922, by The Century Co.
Printed in U. S. A.

PREFACE

The application of statistical methods to special fields such as demography, education, and economics has in recent years advanced very rapidly. As a result a study of statistical principles must be confined chiefly to one of the special fields if it is not to be lost in the multiplicity of specific methods and illustrations. This text-book has been written with the interests of the student of economics in mind.

A common difficulty which the teacher of statistics encounters is a lack of provision for laboratory work. An attempt has here been made to supply this need by furnishing illustrative problems, graphs, and data which may be worked over by the student, and by adding to each chapter a list of related exercises. The exercises will be found extensive enough so that the teacher may select those which are adapted to his requirements. The longer problems should be subdivided, and the parts assigned to different members of the class. Both the tables and the exercises may very well be supplemented by the use of data drawn from such sources as the Survey of Current Business, the Monthly Labor Review, and the Statistical Abstract of the United States (Superintendent of Documents, Government Printing Office, Washington, D. C.).

The topics covered in the text represent probably a maximum of what can be mastered by a college class in a term. Perhaps it may be found advisable to omit certain topics, such as interpolating for quartiles, theories of price indexes, parabola trends, seasonal variations, and the more complex methods of correlation.
Nearly all the material here presented has been accumulated from experience in the statistical laboratory and class-room. Particular attention has been given to the requirements in respect to fundamental theory of the statistical departments of the larger banks and business houses. Some of the recently developed methods of handling business barometers have therefore been touched upon, and some attention has been given to the theory of price and production indexes.

The book is an outgrowth of the undergraduate course in statistics given by the writer at Princeton University during the school year 1920-1921. This course was modeled in its general features upon the course given the preceding year by Professor J. H. Williams, now of Harvard University. The writer wishes to acknowledge his indebtedness to Professor Williams for the general plan of the laboratory exercises, as well as for many specific suggestions. Thanks are also extended to Professors F. A. Fetter and E. W. Kemmerer of Princeton for their interest and encouragement, and to Professor W. F. Willcox of Cornell who read the first draft of the manuscript and made several valuable suggestions for its revision. Indebtedness is also acknowledged to the following for their kind permission to reprint data: Mr. Roger W. Babson, Professor Stanley E. Howard, Bureau of Labor Statistics, National City Bank, National Bureau of Economic Research, National Industrial Conference Board, Review of Economic Statistics, and the Quarterly Journal of the University of North Dakota.

CONTENTS

CHAPTER
I Tabulation
II Types and Measures of Dispersion
III Indexes of Wages and Prices
IV Quantity Indexes and Their Uses
V Trends and Cycles
VI Correlation
Appendix I Laboratory Material and References
Appendix II Tables of Powers and Roots
Appendix III A Picture of the Progress of the United States During 120 Years of National Life
Index

CHAPTER I

TABULATION

The term "statistics" when used to designate a branch of study, implies an exposition of certain methods employed in presenting and interpreting the numerical aspects of a given subject. The science of statistics consists, therefore, of principles and methods, rather than of data. The principles are essentially the same whether the application is made to biology, demography, education, or economics. But the detailed methods in these and other fields have in late years become so specialized that it is hardly practicable any longer to study statistics in the abstract. The field of application here adopted is chiefly that of general economics.* Illustrations will be given of the methods employed in organizing data, in computing and employing indexes, and in measuring trends and correlations.

* It should be noted that economic statistics are commonly distinguished from business statistics. The former subject studies general market conditions, while the latter subject deals with the details of a specific business establishment, and is therefore an adjunct of accounting. Business statistics vary so greatly from one establishment to another that it is difficult to generalize from them. Their problems consist largely of the application of statistical principles to specific situations. For a discussion of the distinction see "The Scope of Business Statistics," by R. P. Falkner, in the Quarterly Publications of the American Statistical Association, June, 1918, pp. 24-29.

Preliminary Schedules. Statistical field work commonly begins with the preparation of schedules in which to enter the desired data. These schedules may be in the nature of questionnaires, or they may be mere forms in which to copy certain records.
Since such schedules will vary with the particular task in hand, few rules can be laid down for their preparation. Experience will teach, however, that they must be very carefully worked out in advance. In the first place, it must be determined as precisely as possible what data will be needed. If the schedule is in questionnaire form, great care must be taken to make the questions unambiguous and easily comprehended by those who are to answer. Errors may often be checked by asking a question in two ways, at different places in the list; as by calling for both the age and the date of birth.

After the preliminary schedules have been compiled, the statistician begins to organize his data, so that conclusions may be deduced and presented in simplified form. In so doing he will make use of the processes of tabulation. These may best be explained by taking an example. As such an example, we shall choose certain wage schedules which have been recorded in the Aldrich Report on "Wholesale Prices, Wages, and Transportation" (Senate Report No. 1394, dated 1893). The data here used will be found in Vol. IV, pages 1463-1497, and refer to a Connecticut woolen mill designated as Establishment No. 86. The wage rolls as recorded in the report cover about half a century, ending in 1891. They show the daily wages paid to each class of workmen employed in the mill in January and July of each year. We shall select for tabulation only the wages paid in July of 1870, 1880, and 1891.

The Primary or General Purpose Table. In transcribing the selected wage schedules into a primary table, it will be found advisable to edit the original figures by making certain minor modifications. An inspection of the schedules will show that most of the wage rates are expressed as multiples of five cents.
Of course, we may assume theoretically that the exact economic values which are approximated in the actual wages must form a continuous series, instead of one having a regular interval. That is, if the wages could be expressed as theoretically exact values, and if a very large number of workers were involved, the rates would be separated by intervals of only a small fraction of a cent. The case may be compared with the measurement of the height of the individuals in a large group. If the measurements are taken with very precise instruments, the results will be expressed in hundredths or perhaps thousandths of an inch. But for practical purposes measurements to perhaps the nearest quarter inch are sufficiently accurate. In the same way when the employer made an offer of wages he set his figure at a multiple of five cents, or in the case of the larger wages at a multiple of twenty-five cents. For his purpose, such an estimate of the market value of labor was sufficiently accurate. The exceptions will be found to be rates that were paid to an especially large number of workers. In such cases differences of a cent or two are of considerable consequence.

Continuous and Discrete Series. A series of measurements or values occurring only at more or less regular intervals is said to be discrete. Sometimes a series will be naturally discrete, as when flowers are classified by the number of petals, or when the spots in a number of dice-throws are tabulated. But a series which is theoretically continuous becomes artificially discrete when a limit of accuracy is determined upon, as when height is measured to the nearest quarter-inch, or wages are expressed in multiples of five cents. Our series of wage rates, as it stands in the records, is therefore discrete at intervals of five cents, except for a few items. For purposes of classification it is desirable to have our series of wage rates regularly discrete throughout.
Since it is not possible to break down the five cent intervals to smaller ones, it will be necessary to modify such rates as 67c. and $1.28 so as to classify them as multiples of five. If there were only a few scattered cases of this sort affecting only a small number of workers, they might merely be entered at the nearest five cent intervals; that is, 67c. could be entered as 65c. and $1.28 as $1.30. But since the number of workers at these irregular rates is exceptionally large, such a procedure might result in too great a degree of inaccuracy. We shall therefore apply a familiar arithmetical device and break up the given number of workers at each inconvenient rate into two groups, one having a higher and one a lower rate than that stated, but maintaining together the same average wage. The procedure may be illustrated as follows:

15 workers at 67c. = 9 workers at 65c. + 6 workers at 70c.

This result is obtained by taking 3/5 and 2/5 of the fifteen workers (the fractional parts of the five cent interval which lie between 67c. and the nearest multiples of 5) and placing the larger of the two results at the rate expressed by the multiple of five nearest to 67c., that is, at 65c. The smaller of the two results is placed at the rate of 70c. That the average wage has not been changed by this operation is shown by the fact that 15 × 67c. = 9 × 65c. + 6 × 70c.

By the foregoing method all irregular wage rates may now be reduced to approximately equivalent multiples of five cents. Of course, when a fraction appears in the operation the nearest whole number is taken. Thus modified, our wage schedules appear as shown in Table I, which may be taken as an example of a primary or general purpose table.

TABLE I. WAGE ROLL IN A CONNECTICUT WOOLEN MILL, JULY OF SPECIFIED YEARS
(Columns: occupation; number of workers and daily wage for each of the years 1870, 1880, and 1891. Occupations listed: burlers; card cleaners; card tenders; carpenters; cloth inspectors; drawers-in; dressers; dyers; firemen; foremen-burlers; fullers and giggers; handers-in; harness hands; loom fixers; machinists; machinists' helpers; number sewers; overseers of the carding, dyehouse, finishing, fulling, spinning, spooling, and weaving departments; piecers; second hands; sewers; shearers; sorters; speckers; spinners, jack and mule; spoolers; teamsters; twisters; watchmen; weavers; pattern weavers; winders; yarn carriers. Totals: 108 workers in 1870, 213 in 1880, 361 in 1891.)
* Female employes.

The Frequency Curve.
The tabulation which we are about to undertake has for its immediate object the presentation of the frequency distribution of the wages in question. Since the concept of a frequency distribution has a concise theoretical basis, it will be of advantage to turn briefly at this point to the theoretical aspects of the subject.

If we take the square of a binomial, as $a^2 + 2ab + b^2$, we have three classes of values as expressed by the letters and their exponents, and these classes have frequencies expressed by their coefficients, 1:2:1. If instead of the second power we take the fourth power of the binomial, we have five classes of values, having frequencies respectively of 1:4:6:4:1. These frequencies graphed as vertical blocks will form a figure such as is outlined by the dotted line in Fig. 1. If instead of the fourth power of the binomial we should take the thousandth or millionth power, the steps in this blocked frequency polygon would practically disappear, and the figure would approach a smoothed bell-shaped curve as indicated in the same figure.

Figure 1. Normal frequency curve (solid line), and an approximation to it (dotted line) based on the fourth power of a binomial. Horizontal scale in units of standard deviation from average (0). Quartile deviation (Q.D.), and average deviation (A.D.)

This theoretical distribution of classes of values, or something similar to it, may be discovered to exist very generally in natural and social phenomena, and is also the expression of what are known as the laws of chance. The length of leaves on a given tree, the height of a group of persons, the per cent net earnings of corporations, or the deviations from normal of a price index through a series of years, will show when properly classified and graphed an approximation to the bell-shaped frequency curve.
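The passage from the blocked polygon to the smooth curve can be checked numerically. The following is only an illustrative sketch, not part of the original text: the function names are hypothetical, and the coefficients come from Python's standard `math.comb`. It lists the frequencies for the second and fourth powers and then shows that the largest step between adjacent classes, measured against the peak, shrinks as the power rises.

```python
from math import comb


def binomial_frequencies(power):
    """Coefficients of (a + b)**power, read as class frequencies."""
    return [comb(power, k) for k in range(power + 1)]


def relative_step(power):
    """Largest jump between adjacent classes as a fraction of the peak.

    A small value means the blocked polygon is close to a smooth curve.
    """
    freqs = binomial_frequencies(power)
    peak = max(freqs)
    largest_jump = max(abs(b - a) for a, b in zip(freqs, freqs[1:]))
    return largest_jump / peak


if __name__ == "__main__":
    print(binomial_frequencies(2))  # frequencies for the square
    print(binomial_frequencies(4))  # frequencies for the fourth power
    for power in (4, 100, 1000):
        print(power, round(relative_step(power), 4))
```

The ratio falls steadily with the power, which is the sense in which the steps "practically disappear."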
In order to discover whether this curve is inherent in a given set of data, it is necessary first that the data contain a considerable number of items, and second that the classification be suitably adjusted to the range and numbers. If, for example, the height of a hundred persons were taken merely to the nearest foot, only two or three classes would appear. If, however, the measurements were taken accurately to .01 inch, so many classes would appear that the frequencies would be hopelessly scattered. But if we made our measurements to the nearest inch, we would obtain a series of frequencies somewhat like the following (the classes ranging from 60 to 73 inches inclusive): 1:2:4:7:10:14:16:16:12:8:5:3:1:1. These frequencies when graphed will give an approximation to the bell-shaped curve. It will be necessary, therefore, in tabulating our wage data to work out experimentally the most suitable classification.

The Tally Sheet and Frequency Table. To facilitate the classification of the wages selected for study, a tally sheet is drawn up as shown in the first two columns of Table II. We shall show here the details for only the 1891 figures, leaving the 1870 and 1880 figures to be worked out by the student. In studies where the items must be entered singly, it is customary to use the familiar "four and cross" method of tallying (four strokes and a cross stroke = 5), but this is inappropriate when the items are already partially grouped as they are here. In this case the number of workers as shown in the wage roll is entered in the appropriate line of the tally sheet, much as journal items are posted to a ledger. Each entry is separated from adjacent ones by a dash. Each line is then totaled, and the result entered under the five cent column of "Frequency Classes." A series of values thus arranged according to magnitude is known as an array.
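The posting and totaling just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the entries below are hypothetical (they are not the figures of Table II), the function names `tally` and `regroup` are invented for the sketch, and the rule for forming wider classes (counting upward from the lowest wage in fixed steps) is an assumption, since the text leaves the most suitable classification to be found by experiment.

```python
from collections import defaultdict


def tally(entries):
    """Post (wage_in_cents, workers) entries to wage lines and total each line.

    Returns a dict ordered by wage, that is, an array of five cent frequencies.
    """
    posted = defaultdict(list)
    for wage_cents, workers in entries:
        posted[wage_cents].append(workers)
    return {wage: sum(counts) for wage, counts in sorted(posted.items())}


def regroup(frequencies, width, start):
    """Combine five cent classes into classes `width` cents wide.

    Classes are labeled by their lower limit, counting up from `start`
    (an assumed convention for this sketch).
    """
    grouped = defaultdict(int)
    for wage, count in frequencies.items():
        lower = start + ((wage - start) // width) * width
        grouped[lower] += count
    return dict(sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical wage-roll entries: (daily wage in cents, number of workers).
    entries = [(110, 14), (110, 1), (115, 1), (120, 1), (125, 2), (75, 1)]
    five_cent = tally(entries)
    print(five_cent)
    print(regroup(five_cent, width=15, start=75))
    print(regroup(five_cent, width=25, start=75))
```

Trying several widths side by side is one way to carry out the experimental search for a suitable classification; the total number of workers is the same at every width.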
An inspection of the five cent frequencies shows that we have discovered only a very rough approximation ...

TABLE II. WAGES IN A CONNECTICUT WOOLEN MILL, JULY, 1891
(Columns: daily wage in dollars; tally of the number of workers; frequency classes at intervals of five cents and larger.)

... percentages. The value is, of course, unaffected by such reduction; and for purposes of later comparison the form as given is preferable.

* Use is sometimes made of the harmonic mean in finding the average of the relatives. This average is found by taking the reciprocals of the relatives, computing their average, and then taking the reciprocal of this average. It may easily be seen that the effect of taking the reciprocals of the relatives is to reverse the base; that is, the later year becomes the base instead of the earlier year. If weights are used it is therefore preferable that they be derived from the data of the later year in finding the harmonic mean. Taking the reciprocal of the average again reverses the bases. But the index obtained by the usual direct method will not be quite the same as that obtained by the harmonic mean. This will ordinarily be the case whether the weights are shifted to the later base, or not; or indeed ...

Figure 8. Straight-line trend. TT', line of least squares for the seven points indicated; UU', line of unit slope.

... squares should be thoroughly understood, not only because of its usefulness as a trend, but also because it is the basis from which the principal method of computing correlation is developed. In explaining it, the following simple illustration will be taken:

Suppose that it is required to find a straight-line trend for the index y (see the table below), the average of which is zero. The data are plotted upon coordinate paper, the x and y scales having preferably the same unit, as shown in Figure 8. The average of the data is made to fall on the x-axis, and the middle item is plotted on the y-axis. If we consider the vertical distance of each index from the x-axis to represent a force bearing upon that line, then the total moments of these forces will be expressed by the sum of the xy's. This sum may be compared with the sum of the moments of a line passing through the same ordinates, and forming an angle of forty-five degrees with the x and y axes at their point of intersection.* Such a line is said to have a unit slope; that is, it rises one unit (y) for each unit (x) to the right. Its slope may also be expressed by saying that the tangent of its angle (UOX) is unity. The sum of its xy's is, of course, identical with the sum of its x's squared.
The slope of the line of least squares is found by comparing the moments of the data with those of the line of unit slope, as expressed in the formula:

$$S = \frac{\Sigma xy}{\Sigma x^2}$$

in which,
S = slope of the line of least squares, or tangent of its angle
x = position of items relative to middle ordinate
y = items, as given

* The angle will vary, of course, if the x and y scales differ.

The data (y), their moments (xy), the moments of the line of unit slope ($x^2$), and the computation of the line of least squares, are shown in the following table:

   y      x     x^2     xy    Trend
  -3     -3      9       9    -4.5
  -4     -2      4       8    -3
  -2     -1      1       2    -1.5
  -1      0      0       0     0
   1      1      1       1     1.5
   5      2      4      10     3
   4      3      9      12     4.5

$\Sigma x^2 = 28$; $\Sigma xy = 42$; $S = 42/28 = 1.5$

The last column headed "trend" gives the line of least squares. This column is computed from the middle ordinate (0) by adding the slope (S = 1.5) once for each successive ordinate in a positive direction, and subtracting it once for each successive ordinate in a negative direction.

Data having an average of zero have here been taken merely for the sake of simplicity; the same process with but little modification may be applied to any values. The dates or other numbers corresponding to the items cannot be used, however, but must be replaced by an x scale centering at the middle point of the series. The average of the data is found, and the trend is computed from this average by successive additions of the slope in a positive direction, and subtractions in a negative direction. The method is illustrated by the use of the following data, which parallel those used in the preceding illustration, except that the average is 100. This increase in the values of y disappears from $\Sigma xy$ because it affects equally both the minus and the plus items.
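The computation of the slope and trend can be retraced with a short sketch. It is an illustration only: the function name `least_squares_trend` is hypothetical, and the x scale is built by centering positions on the middle of the series as the text prescribes. Because the x values sum to zero, raising every y by a constant leaves $\Sigma xy$, and therefore S, unchanged.

```python
def least_squares_trend(y):
    """Return (average, slope, trend) for equally spaced items y.

    x is measured from the middle of the series, so that the x values
    sum to zero and the slope is S = sum(x*y) / sum(x*x).
    """
    n = len(y)
    x = [i - (n - 1) / 2 for i in range(n)]  # e.g. -3, -2, ..., 3 for n = 7
    average = sum(y) / n
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    trend = [average + slope * xi for xi in x]
    return average, slope, trend


if __name__ == "__main__":
    # The seven items of the zero-average illustration.
    data = [-3, -4, -2, -1, 1, 5, 4]
    print(least_squares_trend(data))
    # The same items raised by 100: the slope is unchanged.
    print(least_squares_trend([v + 100 for v in data]))
```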
  Year     y      x     x^2     xy     Trend
  1900     97    -3      9     -291     95.5
  1901     96    -2      4     -192     97
  1902     98    -1      1      -98     98.5
  1903     99     0      0        0    100
  1904    101     1      1      101    101.5
  1905    105     2      4      210    103
  1906    104     3      9      312    104.5

$\Sigma y = 700$; $\Sigma x^2 = 28$; $\Sigma xy = 42$
A (average) = 700/7 = 100; S = 42/28 = 1.5

The following details may be noted: (a) If there are an even number of ordinates, the y-axis will lie midway between the two middle ordinates, which are numbered as -0.5 and 0.5 respectively. The horizontal positive scale will therefore read 0.5, 1.5, 2.5, etc., and the negative scale will be the reverse. (b) It will sometimes be found that the value of S is negative. This indicates a downward slope of the line of least squares. (c) The position of the line of least squares is described by designating the period coinciding with the y axis as the point of origin, and by expressing the value of y algebraically in terms of the average and the slope. Thus in the above illustration the point of origin is 1903, and the equation of the trend is $y = 100 + 1.5x$.*

* It has been suggested that the method of least squares might be applied to the finding of a price index (Quarterly Journal of Economics, August, 1921, page 567). The expenditure for any given commodity may be plotted on a coordinate chart as the value of y, and the number of units bought may be plotted as the value of x. The slant of a line (tangent of the base angle) drawn from the intersection of the axes to the point determined by the values of x and y, represents the price. The average price of a number of commodities may be taken as the slant of the line of least squares determined by all the coordinate pairs of x and y, and having the intersection of the axes as the point of origin (y = Sx). In such a case, of course, all the values of xy will be positive. To give definite comparisons at different dates, this method would require the use of "dollar's worths" as physical units. While the method is ingenious, it is of questionable validity, since in effect it involves a weighting of the prices by the square of the quantities in the process of finding the average. A defense of the method on the basis of the use of least squares in the theory of errors does not appear to be valid, since the theory of errors would call for merely the arithmetic mean of the number of determinations.

The Parabola Trend. A broad treatment of the subject of curve fitting would lead the student beyond the range of ordinary statistical work. We shall not, therefore, follow the subject farther, except to take up a simple method of adjusting a parabola to an index. The method is one which is often used by engineers, and has also recently come into use to some extent among statisticians.

Figure 9. Index of wholesale prices in the United States (see Table X) and trend. Indexes for 1870-1880 converted to a gold basis.

The method of adjusting a parabola will be illustrated by applying it to the Bureau of Labor Statistics' wholesale price index for the years 1896 to 1915 inclusive, as charted in Figure 9. This figure includes also the same index for the years 1870-1895 (gold prices), to which a line of least squares has been fitted. But it may be easily seen that a similar trend would not be suited to the succeeding index numbers. The somewhat regular curve of the latter portion of the index indicates that a parabola of the second order would be appropriate.

In fitting the parabola, the year 1895 has been taken as the point of origin, though this year is not included in the results. It is estimated by inspection of the graph that the trend, if extended to 1895, would have a value of 64 at that date. Two other points determining the trend may be similarly located, one at about the middle of the series and one at the end. These points have been taken as 88 for the year 1905 (x = 10), and 102 for the year 1915 (x = 20).
We have, then, these coordinate values of x and y:

If x = 0, y = 64
If x = 10, y = 88
If x = 20, y = 102

The equation of a parabola of the second order is,

$$y = a + bx + cx^2$$

If the coordinate values of x and y as just stated are substituted successively in this equation, the following results will be obtained:

64 = a
88 = a + 10b + 100c
102 = a + 20b + 400c

Solving for the constants gives,

a = 64
b = 2.9
c = -.05

Substituting these values in the original equation gives the equation of the required trend,

$$y = 64 + 2.9x - .05x^2$$

from which the value of each item in the trend may be found by substituting the coordinate value of x.

Since trends are used as a basis for measuring fluctuations, the deviations of the data from the trend are usually computed. This is done by subtracting each item in the trend from the corresponding item in the data. If the trend has been accurately constructed, the positive and negative deviations should be practically equal; that is, their sum should be zero. Where the parabola has been used, a certain error will probably be found to have resulted from the fact that the original points determining the curve were located merely by inspection. An adjustment (centering) to remove this error may be made by finding the sum of the deviations ($\Sigma D$), dividing it by the number of the items ($N$), adding the result $\left(\frac{\Sigma D}{N}\right)$ to each of the trend items, and subtracting it from each of the deviations. This is expressed in the equation of the trend simply by adding the correction $\left(\frac{\Sigma D}{N}\right)$ to the value of a. In work in correlation, however, the correction may be more easily made by another method, as will be evident later. When comparisons of the fluctuations in different series are to be made, either by graphing or by the computation of a coefficient of correlation, it is often necessary to find the standard deviation.
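The solution for the constants can be sketched as follows. This is an illustrative sketch only: the function name `parabola_through_points` is hypothetical, and it assumes, as in the text, that the first of the three points lies at the point of origin (x = 0), so that a is read off directly and only two equations remain to be solved for b and c.

```python
def parabola_through_points(y_at_origin, point_1, point_2):
    """Constants (a, b, c) of y = a + b*x + c*x**2 through three points.

    The first point is (0, y_at_origin); the other two are (x, y) pairs.
    """
    a = y_at_origin
    (x1, y1), (x2, y2) = point_1, point_2
    # Remaining equations:  b*x1 + c*x1**2 = y1 - a
    #                       b*x2 + c*x2**2 = y2 - a
    determinant = x1 * x2**2 - x2 * x1**2
    if determinant == 0:
        raise ValueError("the two points must have distinct, nonzero x values")
    b = ((y1 - a) * x2**2 - (y2 - a) * x1**2) / determinant
    c = (x1 * (y2 - a) - x2 * (y1 - a)) / determinant
    return a, b, c


if __name__ == "__main__":
    # Points located by inspection: 64 at x = 0, 88 at x = 10, 102 at x = 20.
    a, b, c = parabola_through_points(64, (10, 88), (20, 102))
    print(a, b, c)
    # Trend items for the first three years after the point of origin.
    print([round(a + b * x + c * x * x, 2) for x in (1, 2, 3)])
```

Locating the points by inspection is what makes the later centering correction necessary; the algebra itself is exact for whatever three points are chosen.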
For graphic representation, the deviations are reduced to multiples of the standard deviation, which serves as a comparable unit. Table XVI shows the derivation of the trend just discussed, together with the correction, and the reduction of the deviations to multiples of the standard deviation. Figures 10 and 11 show price cycles as thus measured compared with other cycles similarly obtained.

TABLE XVI. DERIVATION OF TREND AND CYCLES OF WHOLESALE PRICES, BASED ON BUREAU OF LABOR STATISTICS INDEX, UNITED STATES, 1896-1915

Equation of trend, $y = 64 + 2.9x - .05x^2$
Equation, corrected, $y = 63.825 + 2.9x - .05x^2$
Point of origin, 1895.

  YEAR   PRICE INDEX (y)    x    TREND      D     D CENTERED     D^2     CYCLES* (D/σ)
  1896         66           1    66.85     -.85      -.68       .4624       -.33
  1897         67           2    69.60    -2.60     -2.42      5.8564      -1.17
  1898         69           3    72.25    -3.25     -3.08      9.4864      -1.49
  1899         74           4    74.80     -.80      -.62       .3844       -.30
  1900         80           5    77.25     2.75      2.92      8.5264       1.41
  1901         79           6    79.60     -.60      -.42       .1764       -.20
  1902         85           7    81.85     3.15      3.32     11.0224       1.60
  1903         85           8    84.00     1.00      1.18      1.3924        .57
  1904         86           9    86.05     -.05       .12       .0144        .06
  1905         85          10    88.00    -3.00     -2.82      7.9524      -1.36
  1906         88          11    89.85    -1.85     -1.68      2.8224       -.81
  1907         94          12    91.60     2.40      2.58      6.6564       1.25
  1908         91          13    93.25    -2.25     -2.08      4.3264      -1.00
  1909         97          14    94.80     2.20      2.38      5.6644       1.15
  1910         99          15    96.25     2.75      2.92      8.5264       1.41
  1911         95          16    97.60    -2.60     -2.42      5.8564      -1.17
  1912        101          17    98.85     2.15      2.32      5.3824       1.12
  1913        100          18   100.00      .00       .18       .0324        .09
  1914        100          19   101.05    -1.05      -.88       .7744       -.43
  1915        101          20   102.00    -1.00      -.82       .6724       -.40

  Sum of positive items                   16.40     17.92                   8.66
  Sum of negative items                  -19.90    -17.92                  -8.66
  Net sum                                 -3.50      0         85.9880

K = -3.50/20 = -.175; $\sigma^2$ = 85.9880/20 = 4.2994; $\sigma$ = 2.07

* Any set of deviations taken from an average or trend, and intended to measure cyclic movements, are commonly designated as cycles. They need not necessarily be reduced to units of the standard deviation. In some cases no complete cyclic movement may be discovered, but the same designation may be used.
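The arithmetic of Table XVI can be retraced with the sketch below. It is offered as an illustration rather than as the author's procedure: the function name `price_cycles` is hypothetical, the index values are those of the table, and the computation carries full precision throughout, whereas the table squares deviations already rounded to two places, so the last digit of a result may differ slightly.

```python
from math import sqrt

# Wholesale price index, 1896-1915, as given in Table XVI.
PRICE_INDEX = [66, 67, 69, 74, 80, 79, 85, 85, 86, 85,
               88, 94, 91, 97, 99, 95, 101, 100, 100, 101]


def price_cycles(index, a=64.0, b=2.9, c=-0.05):
    """Return (K, sigma, cycles) for a parabola trend with origin before item 1.

    K is the centering correction (sum of deviations / number of items);
    cycles are the centered deviations in units of the standard deviation.
    """
    n = len(index)
    trend = [a + b * x + c * x * x for x in range(1, n + 1)]
    deviations = [y - t for y, t in zip(index, trend)]
    k = sum(deviations) / n
    centered = [d - k for d in deviations]  # equivalent to adding K to a
    sigma = sqrt(sum(d * d for d in centered) / n)
    return k, sigma, [d / sigma for d in centered]


if __name__ == "__main__":
    k, sigma, cycles = price_cycles(PRICE_INDEX)
    print(round(k, 3), round(sigma, 2))
    print([round(v, 2) for v in cycles[:5]])
```

Subtracting K from each deviation and adding K to the constant a are the same adjustment seen from two sides, which is why the corrected equation in the table begins with 63.825.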
The parabola may be used where compound curves are required by adding the term dx³, and perhaps ex⁴, to the equation. For each term thus added, an additional point may be located by inspection, and the trend may thus be more exactly fitted. But the work of solving and applying such equations becomes very laborious. With practice, however, the student will find ways of abbreviating the process and modifying it to suit his purposes. Often the terms of the equation may be estimated by simple experimentation. The position of the point of origin may be varied to suit given requirements. A compound curve shaped somewhat like an italic f may be obtained by using only odd numbered powers of x in the equation, and taking the point of origin near the middle of the original series. With a little ingenuity, sine curves and other trends may be experimentally fitted.¹

Figure 10. Cycles of wholesale prices (solid line) and commitments to New York State prisons (broken line). Reprinted, with permission, from The Quarterly Journal of the University of North Dakota, January, 1922.

Figure 11. Cycles of wholesale prices (solid line) and marriage rate (broken line) in the United States. Reprinted, with permission, from The Quarterly Journal of the University of North Dakota, January, 1922.

¹ When it appears that the trend of an index series increases or decreases by approximately a fixed ratio, an exponential curve may be readily fitted as follows. Find the logarithms of the data, plot them, and construct a straight-line trend by means of the semi-averages, or by a line of least squares. Read the items of the trend from the chart, consider them to be logarithms, and find the corresponding numbers. These numbers will be the items of the required trend. In a long series the work may be abbreviated by grouping the original items, as by decades, and fitting the curve to the averages of these groups.

Analyzing Business Barometers. A somewhat intricate problem in trends is met when monthly data are employed as indexes, or barometers, of business conditions. Since such barometers are very generally consulted as guides to business activities, their interpretation is a matter of great practical importance. The difficulty involved in their use lies in the complexity of the influences playing upon them. For convenience of analysis these influences have been classified as (1) a seasonal variation usually due to the dependence of industry upon weather conditions, illustrated by the rising of the interest rate with the movement of crops, (2) a cyclic movement covering an interval of several years and marked by alternating industrial depression and activity, and (3) a secular trend or gradual change due in most cases to the growth movement, as seen in the increase in production. In addition to influences which may be appropriately classed under one of these three headings, there are others that must be looked upon as more or less accidental interruptions, of which no exact account can be taken.

Measuring Seasonal Variations. The most satisfactory method of analyzing monthly business barometers is first to compute an index of seasonal variations, and then to subtract it, month by month, from an index of the data based upon the secular trend. The result is an index of the cycles. As has already been noted, the twelve-month moving average is sometimes assumed to measure the data as distinct from the seasonal variations. But such a method of elimination takes into account the fluctuations of only one year at a time. Other methods have therefore been resorted to with the purpose of measuring the seasonal swing more exactly on the basis of several years.
A simple method of this sort, which may be considered valid for a period in which the cyclic influences are moderate and well distributed along an approximately straight-line trend, is illustrated in Table XVII.

[Table XVII. Seasonal variations in the interest rate, 1909-1913, found by the method of averages. The body of the table is not recoverable from the source.]

The Method of Averages. The problem stated and solved in the table is the finding of the seasonal variations in the interest rate on the somewhat slender basis of the five years 1909 to 1913, inclusive. The monthly rates as given are first averaged both by columns and rows in order to obtain the annual averages and the month averages. The process may therefore be called the method of averages.

Figure 12. Average interest rate on commercial paper in the United States (lower line) compared with index of wholesale prices (upper line). Adapted from Monthly Review, New York Federal Reserve Bank.

If the interest rate had maintained approximately the same level from year to year during the interval studied, nothing more would be needed than to reduce the month averages, or types, to a percentage of their own annual average. The results would form an index of the seasonal variations. But if there is a secular movement, it must be canceled from such an index.
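The method of averages lends itself to a short sketch in code. The function below is an illustration only: its name, its arguments, and the sample figures are hypothetical (the figures of Table XVII are not reproduced here). With `monthly_slope` left at zero it handles the level case just described; a nonzero slope cancels a secular movement by dividing the month averages by a trend whose origin lies midway between June and July, and the quotients are then centered on their common average.

```python
from statistics import mean

def seasonal_index_by_averages(rates_by_year, monthly_slope=0.0):
    """Seasonal index (average = 100) by the method of averages.

    rates_by_year: one list of twelve monthly figures (Jan..Dec) per year.
    monthly_slope: secular change per month; 0.0 assumes a level series.
    """
    # Month averages ("types"): mean of each calendar month over the years.
    month_types = [mean(year[m] for year in rates_by_year) for m in range(12)]
    overall = mean(month_types)

    # Trend within the typical year, origin midway between June and July:
    # June lies half a slope below the average, July half a slope above.
    trend = [overall + monthly_slope * (m - 5.5) for m in range(12)]

    # Cancel the trend, then center so that the twelve items average 100.
    ratios = [t / tr for t, tr in zip(month_types, trend)]
    center = mean(ratios)
    return [100 * r / center for r in ratios]

# Hypothetical rates (per cent): low in the first half-year, high in the second.
sample = [
    [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0],
    [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0],
]
index = seasonal_index_by_averages(sample)
print([round(v, 1) for v in index])
```

Subtracting 100 from each item of the index gives the percentage deviation for that month.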
If an interval of half or three-quarters of a century is taken into account, a general downward trend of the interest rate may be discovered, as may be seen in Figure 12. But the five years here studied happen to be an exception. By means of the line of least squares, a positive slope of 0.019 monthly is revealed. By applying this slope to the average of the month types, an annual trend is constructed, having its point of origin midway between the June and July items. It will be noted that one-half the slope must be added to the average to obtain the July trend item, and the same amount subtracted to obtain the June item. In obtaining the remaining trend items, the slope is applied as previously explained.

Confusion may perhaps here arise from the fact that the slope was computed from the five annual averages, but was applied to the construction of a trend with which to compare the monthly data. But it should be obvious that in this case a slope obtained from the month averages would be materially affected by the seasonal swing. This we wish to retain, while the secular trend we wish to cancel. Of course we could find the secular trend from the monthly data taken consecutively as sixty items, but such a method would be unnecessarily laborious. Hence we find it from the five annual averages, and apply it to the construction of a line that will serve to cancel the secular trend from the month types. This cancellation is accomplished by dividing the month averages, item by item, by the trend. The quotients should be centered, if necessary, by reducing them to percentages of their common average. The result is an index of seasonal variations, from which the percentage deviations, month by month, may be directly stated.¹

Applying the Seasonal Index. The method of applying the seasonal index has already been suggested, and may be described as follows.
A secular trend for the whole period under consideration is constructed (in this case the line of least squares already found may be extended) and the data are reduced month by month to percentages of the trend. From each month's item as thus found the seasonal index for the same month is subtracted. The remainders are assumed to measure the cycles, and may be plotted as deviations above and below a horizontal axis. The seasonal index may, with caution, be applied to other years than those from which it is derived; of course, the greater the number of normal years from which it is derived, the safer such an extension of its use to comparable years becomes.

¹ Another variation of the method of averages, one that is perhaps theoretically preferable, but which involves more extended calculations, may be briefly described as follows: A twelve-month moving average of the monthly data through a given series of years is first found. This is adjusted to make it conform to the ordinates of the original series by deriving from it a two-month moving average. There is then obtained, for each month, the ratio of the original monthly item to the corresponding adjusted moving average. The median of the ratios so obtained for the Januaries is taken as the index of seasonal variation for January; and index numbers for the other months are similarly obtained. The twelve results are then centered, if necessary, by reducing them to percentages of their common average (cf. Jordan, Business Forecasting, p. 212).

The Link-relative Method. A complex but more accurate method for measuring seasonal variations has been developed by Professor Persons, and applied to the analyses appearing in the early numbers of the Review of Economic Statistics. In a somewhat simplified form, this method is illustrated in Table XVII-A, which is based on the data of Table XVII. Briefly stated, the method consists in finding what are called "link-relatives"; that is, the percentage which the index of each month is of the preceding month. The median link-relatives are then selected from each month's series, and are tabulated as the month types.² Beginning with December as a base (100%), the types are multiplied consecutively, producing an index series from January to December for a typical year. If the final December item fails to come out to 100%, in conformity with the base in the preceding December, a secular trend is evidently disturbing the index. The discrepancy, if moderate, may be removed by distributing it throughout the year; that is, by subtracting one-twelfth of it from the January index, two-twelfths from the February index, and so on through the year.³ The results are designated an adjusted index. The adjusted items are next centered by reducing them to a base of the average of the series. In making the computations the figures were carried to one more place than is shown in the table.

² To get the best results, the median should be based on a larger number of years than are here taken.

³ A more exact method is to apportion the discrepancy geometrically; that is, to divide the January index by the twelfth root of the final December index (written as a decimal), the February index by the square of this root, and so on.

[Table XVII-A. Seasonal variations in the interest rate, 1909-1913, found by the link-relative method from the data of Table XVII. The body of the table is not recoverable from the source.]

The superiority of this method lies in the fact that in taking the link-relatives, and in selecting their medians as the month types, the effects of the cyclic movements and of chance influences are minimized. The secular trend is satisfactorily eliminated by the process of adjusting. The index thus obtained should be accurate for the years just preceding the establishment of the Federal Reserve System. Since that time seasonal changes have lessened.⁴

Business Cycles. The Review of Economic Statistics, in the analyses just referred to, has made exhaustive studies of the cyclic movements of the commonly used business barometers. After measuring the influence of seasonal variations by a process similar to that just described, it determined the cycles on the basis of lines of least squares. It then combined twelve of the principal barometric indexes into three composite indexes which are taken as measurements of speculative activity, business activity, and banking strain, respectively. A chart of these three composite indexes for the years 1903 to 1913, inclusive, is here reproduced (Figure 13). The chart shows very clearly the general stages of the business cycle, from the predominance of the speculative activity which marks the awakening from a period of depression, through the period of intensified production, and into the period of banking strain which heralds another depression. The various barometric series used in constructing the figure are indicated in the explanation accompanying the title.
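The seasonal measurement "similar to that just described" is the link-relative method, and its steps can be sketched in code. The function below is a hedged illustration of the simplified procedure given in the text (link-relatives, medians, chaining from a December base, arithmetic distribution of the discrepancy, centering); the function name and the sample data are hypothetical and are not the figures of Table XVII-A.

```python
from statistics import mean, median

def link_relative_index(monthly, prev_december):
    """Seasonal index (average = 100) by the simplified link-relative method.

    monthly: one list of twelve figures (Jan..Dec) per year, in order.
    prev_december: the figure for the December preceding the first year.
    """
    flat = [prev_december] + [v for year in monthly for v in year]

    # Link-relatives: each month as a percentage of the preceding month.
    links = [100 * cur / prev for prev, cur in zip(flat, flat[1:])]

    # Month types: the median link-relative for each calendar month.
    types = [median(links[m::12]) for m in range(12)]

    # Chain the types, starting from the preceding December as 100.
    chained, level = [], 100.0
    for t in types:
        level *= t / 100
        chained.append(level)

    # Spread the December discrepancy over the year: one-twelfth from
    # January, two-twelfths from February, and so on.
    discrepancy = chained[-1] - 100
    adjusted = [v - discrepancy * (k + 1) / 12 for k, v in enumerate(chained)]

    # Center on the average of the adjusted series.
    avg = mean(adjusted)
    return [100 * v / avg for v in adjusted]

# Hypothetical series: the same seasonal pattern repeated for five years.
pattern = [90, 95, 100, 105, 110, 115, 110, 105, 100, 95, 90, 85]
index = link_relative_index([pattern] * 5, prev_december=85)
print([round(v, 1) for v in index])
```

With a series that merely repeats the same pattern each year, the chained December returns to 100, no adjustment is needed, and the index reproduces the pattern. The geometric apportionment mentioned in the footnote is not implemented in this sketch.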
It should be added that while New York bank loans point to the speculative aspects of the cycle, those outside New York conform more closely to business activity. For practical purposes measurements of the general aspects of the cycle need to be supplemented by indexes showing the position of particular industries, as indicated by relative production and stocks of goods on hand. This need is now being met in part by the Federal Reserve Bank of New York, and by other agencies.

⁴ For the most exhaustive study of seasonal variations in the interest rate, the student should consult "Seasonal Variations in the Relative Demand for Money and Capital in the United States" (National Monetary Commission, 1910), by E. W. Kemmerer. See also the Monthly Review, Federal Reserve Bank of New York, Feb. 1, 1922.

Figure 13. The index of general business conditions, 1903-14.
A, Speculation: New York bank clearings, average price of industrial stocks, average price of railroad stocks, and average price of railroad bonds.
B, Business: Bank clearings of the United States outside New York City, Bradstreet's index of wholesale commodity prices, United States Bureau of Labor Statistics' index of wholesale commodity prices, and pig-iron production.
C, Banking: Interest rates on 60-90 day and on 4-6 months commercial paper in New York City, loans and deposits of New York City clearing house banks (both inverted).
Reproduced from Review of Economic Statistics, by permission of the editors.

Several phenomena more psychological than economic in nature show a tendency to fluctuate more or less closely with the business cycle. Professor A. H. Hansen has recently shown statistically that since 1898 strikes increased with prosperity, though before that date, when the trend of prices was downward, they increased with depression (cf. American Economic Review, Dec., 1921, pp. 617-621). Mr. Roger Babson has traced a connection between church growth and the business cycle; religious activity being apparently intensified during a period of depression. Unemployment, failures, suicides, and crime generally, also increase during a period of depression. On the other hand, immigration, the marriage rate, and extravagance tend to increase with a period of prosperity (cf. Figures 10 and 11, pages 114 and 115).

As an example of the application of statistical methods to the practical problem of forecasting the cyclic movements of business, the "Annalist Barometer and Business Index Line" may be cited. This index is published in graphic form, together with a brief explanation, each week in the Annalist, a well-known financial paper of New York. It is derived from the data elaborated by the Review of Economic Statistics, already briefly described. The index is the reciprocal of a weighted average of the deviations from normal of commodity prices, interest rates, pig iron production, New York bank clearings, and bank clearings outside of New York. Since these series of data, when directly combined, measure the later phases of the business cycle, their decline precedes and forecasts the rise in stocks marking the beginning of the next cycle; and their rise similarly forecasts a decline in stocks. By taking the reciprocal of the deviations of the combined series, the forecast is made direct instead of inverse. By a comparison of the forecasting index and the movement of stocks in former years, the decisiveness of change in the former necessary to constitute a forecast of the latter has been determined. A detailed account of the construction and use of the index will be found in the Annalist of March 28 and of October 24, 1921.
The student who desires to inquire more intensively into the statistics of the business cycle should consult for himself the data published in the Review of Economic Statistics already mentioned. In addition he should become familiar with Wesley C. Mitchell's standard work on Business Cycles, and with a more recent work by D. F. Jordan on Business Forecasting. In connection with the underlying causes of the cycle, reference should be made to the interesting but very technical works of H. L. Moore (cf. Economic Cycles, and articles in the Quarterly Journal of Economics, February, August, and November, 1921). Professor Moore discovers some relation to exist between a weather cycle of heavier and lighter rainfall and the business cycle, the average duration of each cycle being about eight years. The moist years bring as a rule larger crops, with some tendency to a lowering of the general price level, followed during the drier years by a rise in the price level. This relation seems to be clearer for English prices than for American. The weather cycle is shown to be synchronous in several countries, and to be correlated with a cycle of barometric pressure, which in turn may have astronomical causes. But while the subject is very interesting and valuable theoretically, the correlations disclosed are too irregular to be of great practical value.

REFERENCES

Babson, Roger W., Business Barometers.
Davies, G. R., "Social Aspects of the Business Cycle," Quarterly Journal of the University of North Dakota, January, 1922.
Hurlin, Ralph G., "The Long-Time Trend of Prices in the United States," The Annalist, July 4, 1921.
Jordan, D. F., Business Forecasting.
Kemmerer, E. W., High Prices and Deflation.
Mitchell, Wesley C., Business Cycles.
Moore, Henry L., Economic Cycles: Their Law and Cause.
Peddle, John B., The Construction of Graphical Charts, Chapter VI.
Persons, W. W., "Construction of a Business Barometer Based upon Annual Data," American Economic Review, December, 1916, pp. 739-769.
Piatt, Andrew A., National Monetary Commission.
Tingley, Richard H., "Another Yardstick of Banking Conditions," The Annalist, November 28, 1921, p. 511.

EXERCISES

1. Plot the data for production and price of wheat, 1870-1920 (Tables XIV and XIV-A, pp. 76 and 77), and draw a free-hand trend for each series.

2. As in Exercise 1, construct free-hand trends for the production and price of corn.

3. Plot the index of physical production of crops, 1870-1920 (Table XV, p. 81), and draw a straight-line trend by inspection.

4. Apply the method of semi-averages to the construction of a straight-line trend for the data of the preceding exercise.

5. Compute a five-year moving average of per capita production, 1890-1918. Plot both the trend and the data from which it is derived on 17"x22" cross-section paper, and measure graphically the deviations from the trend.

6. Plot on a horizontal axis the deviations obtained in the preceding exercise. Find the average deviation, and indicate this on the graph for both the plus and the minus deviations.

7. Compute and plot a straight-line trend (line of least squares) for the following price index.

Year    Prices (3 articles)
1897     85
1898     70
1899     90
1900    130
1901    125

8. Find straight-line trends (lines of least squares) for the production and price of pig iron as given below, taking each year separately.

PRODUCTION AND PRICE OF PIG IRON (Iron Age)
(000 omitted from production)

             1909            1910            1911            1912            1913
Month     Tons      $     Tons      $     Tons      $     Tons      $     Tons      $
Jan.     1,797  16.25    2,608  17.25    1,759  14.25    2,057  13.25    2,795  16.95
Feb.     1,707  16.13    2,397  17.06    1,794  14.25    2,100  13.31    2,586  16.69
March    1,832  15.05    2,617  16.30    2,188  14.25    2,405  13.50    2,763  16.31
April    1,738  14.25    2,483  15.37    2,065  14.25    2,375  13.75    2,752  15.65
May      1,883  14.50    2,390  15.00    1,893  13.95    2,512  14.15    2,822  14.94
June     1,930  14.70    2,265  14.85    1,787  13.44    2,440  14.25    2,628  14.06
July     2,103  15.75    2,148  14.75    1,793  13.25    2,410  14.70    2,560  13.75
Aug.     2,248  16.38    2,106  14.31    1,926  13.45    2,512  15.06    2,543  14.06
Sept.    2,385  17.35    2,056  14.25    1,977  13.31    2,463  15.87    2,505  14.25
Oct.     2,599  17.88    2,093  14.25    2,102  13.25    2,689  16.80    2,546  14.35
Nov.     2,547  17.75    1,909  14.25    1,999  13.20    2,630  17.25    2,233  13.87
Dec.     2,635  17.45    1,777  14.25    2,043  13.19    2,782  17.25    1,983  13.95
Total   25,410  16.12   26,855  15.16   23,329  13.67   29,383  14.93   30,722  14.90

(Reprinted, with permission, from Babson's Desk Sheet.)

9. Find the average index of per capita physical production in the United States (page 81) for each decade from 1870 to 1919. Using the resulting five averages, construct a line of least squares. Plot the original data and the trend thus found.

10. Find the average index of production of iron and copper by five year periods from 1870 to 1914. Plot these averages, and fit to them a parabola of the second degree.

11. Compute and plot the cycles of the interest rate, 1909-1913, using the data and index of seasonal variations presented in Table XVII, page 117.

12. Using the data given below, find an index of seasonal variations in exports of merchandise (a) by the method of averages, and (b) by the link-relative method.

EXPORTS OF MERCHANDISE, UNITED STATES, 1909-1913
(In Millions of Dollars)

             1909   1910   1911   1912   1913
January       157    144    197    202    227
February      126    125    176    199    194
March         139    144    162    205    187
April         125    133    158    179    200
May           123    131    153    175    195
June          117    128    142    138    163
July          109    115    128    149    161
August        110    135    144    168    188
September     154    169    196    200    218
October       201    208    210    255    272
November      194    207    202    278    246
December      172    229    225    250    233
(December, 1908, 170)

13. The following table, adapted from the Yearbook of the Department of Agriculture, 1918, gives the farm price of wheat in the United States (cents per bushel) on the first day of each month for the years 1909 to 1913 inclusive. Using the method of link relatives, derive an index of seasonal variations.

                1909    1910    1911    1912    1913
January 1       93.5   103.4    88.6    88.0    76.2
February 1      95.2   105.0    89.8    90.4    79.9
March 1        103.9   105.1    85.4    90.7    80.6
April 1        107.0   104.5    83.8    92.5    79.1
May 1          115.9    99.9    84.6    99.7    80.9
June 1         123.5    97.6    86.3   102.8    82.7
July 1         120.8    95.3    84.3    99.0    81.4
August 1       107.1    98.9    82.7    89.7    77.1
September 1     95.2    95.8    84.8    85.8    77.1
October 1       94.6    93.7    88.4    83.4    77.9
November 1      99.9    90.5    91.5    83.8    77.0
December 1      98.6    88.3    87.4    76.0    79.9
(December 1, 1908, 92.2)

14. Using the method of averages, derive an index of seasonal variations from the following table of farm prices of wheat in the United States (cents per bushel) for the years 1909 to 1918, inclusive.

Yearly averages        Monthly averages
1909    101.3          Jan. 1     109.4
1910     96.5          Feb. 1     115.2
1911     86.9          March 1    115.2
1912     87.4          April 1    116.4
1913     78.4          May 1      125.6
1914     88.4          June 1     126.0
1915    105.2          July 1     117.7
1916    125.9          Aug. 1     117.9
1917    200.8          Sept. 1    117.4
1918    204.3          Oct. 1     116.5
                       Nov. 1     119.7
                       Dec. 1     118.6

15. Compute and plot the cycles in the price of wheat, 1909-1913, as measured from a line of least squares. Use the data of Exercise 13.

16. From financial journals and other sources obtain monthly or weekly quotations for recent and current dates illustrating barometric subjects such as are mentioned on pages 66 and 124. Plot the data and construct trends. On the basis of these barometers and such other information as may be available, make a forecast of business conditions for the immediate future, allowing for seasonal variations.

CHAPTER VI

CORRELATION

Correlation Defined.
A study of the cycles of business barometers leads to the problem of classifying and measuring the relationships among them. These relationships may be discovered in various forms and degrees. For example, the cycles of building permits and pig iron production will be found to move somewhat closely together. Then again, stock prices and commodity prices form similar waves, though the latter usually follow a few months behind. On the other hand, commodity prices and business failures show opposite movements: when one is up the other is down. All such relationships between two sets of data are known as correlations. When the two sets of cycles agree, the correlation is called positive; when they disagree, it is negative. When the cycles are not quite coincident in point of time, the one which follows is said to show a "lag" of a given interval.¹ Correlation carries the idea of a fundamental relationship: either one phenomenon acts or reacts upon the other, or both are due to common causes. The principle is not limited to time series. Comparison might be made, for example, between the advertising and the rate of earnings of given business firms. But in any case the principle would be the same as before, and the methods used would be practically identical.

¹ The term "lag" is also sometimes used to designate a smaller degree of variation occurring in one series than in another comparable to it. Thus the lag of retail prices behind wholesale prices is largely a matter of degree, and only slightly a matter of time. But it is the time element only that enters into the calculation of correlation. In allowing for the lag, the series coming later in time is considered as moved back by the length of the lag, and the corresponding items are then compared. When the length of the lag is difficult to determine, estimates must be made, and the correlation computed on the basis of each estimate. The lag resulting in the most marked correlation is assumed to be the correct one. In determining the lag it is often necessary to take into account the causal relation existing between the two series under consideration, as in a case where the assumption of a lag for one series results in a positive correlation, while the transfer of the lag to the other series results in a negative correlation.

The Graphic Method. A fairly good study of correlation in economic phenomena can often be made without any more elaborate methods than those already described in isolating and plotting the cycles. Two time series, reduced to standard deviation cycles and plotted on equal horizontal scales, may be very well compared by superimposing one on the other. To facilitate comparison, one may be drawn on a transparent medium, such as tracing cloth; or a mimeoscope may be used. By shifting the superimposed cycles back and forth, the lag may be fairly accurately determined. The correlation may be described as positive or negative, high, moderate or low, and the lag and its consistency may be stated. This is the method adopted by the Review of Economic Statistics in its study of the correlations existing among 24 business barometric series for the years 1903-1914.

Method of Concurrent Deviations. It is often desirable, however, to measure the degree of correlation in precise mathematical terms. This is particularly true when correlated data are being used in support of a given theory. In order to obtain a precise result, mathematical methods developed originally for use in biometrics have been borrowed and adapted.

As an introduction to the mathematical methods of measuring correlation, we may take up a simple formula which is well adapted to the comparison of short-time fluctuations. The formula is the expression of what is called the method of concurrent deviations.
It may be illustrated by applying it to a comparison between the short-time fluctuations of real wages and per capita production in the United States (cf. pp. 51, 53, and 81). The fluctuations may be most readily determined by reference to a graph of each series (cf. Fig. 6, p. 101). If at any given year the line makes an inverted angle, like a caret (^), the fluctuation is registered on the index as positive (+). If the angle is V-shaped, it is registered as negative (-). If no angle is formed, the year is indicated as neutral (0). In some cases it may not be possible to determine from the graph whether the angle is neutral, or slightly positive or negative; in which case resort may be had to the data.

After the deviations of both series have all been registered, they are compared across, item by item. If in a given year both indexes show a positive fluctuation, one agreement is counted. If positive and negative meet, one disagreement is counted. If one or both of the fluctuations of a given year are neutral, one-half is added to both the agreements and disagreements. When this summation is complete, the larger of the two totals thus obtained is designated as the number of concurrent deviations, denoted in the formula by the letter C. The sign of the coefficient to be obtained by the use of the formula is determined by the nature of the concurrences. If they are agreements, the sign is positive; if disagreements, the sign is negative. The formula for correlation (R) as thus measured, with N denoting the number of comparisons, is:

R = ± √((2C - N) / N)

In the case of the wage and production indexes just mentioned, the number of disagreements, or concurrences, totals to 33½ and the number of comparisons is 49. The formula therefore becomes:

R = - √((2 × 33½ - 49) / 49) = - .61

The derivation of the formula is of little importance, as it is patterned empirically on the one next to be described. The significance of the coefficient will become evident in the same connection.
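The counting rules above may be sketched in code as follows. This is an illustration under stated assumptions rather than the author's own procedure: the shape of the angle at each year is read here as the sign of the second difference (the year's item compared with the mean of its two neighbors), two negative fluctuations are counted as an agreement, and all function names are hypothetical.

```python
import math

def fluctuation_signs(series):
    """+1 for a caret-shaped angle, -1 for a V-shaped angle, 0 for none.

    Assumption: the angle is judged by comparing each item with the
    mean of its two neighbors; the end items have no angle and are skipped.
    """
    signs = []
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        bend = cur - (prev + nxt) / 2
        signs.append((bend > 0) - (bend < 0))
    return signs

def coefficient(agreements, disagreements):
    """R = +/- sqrt((2C - N) / N), C being the larger of the two totals."""
    n = agreements + disagreements
    c = max(agreements, disagreements)
    r = math.sqrt((2 * c - n) / n)
    return r if agreements >= disagreements else -r

def concurrent_deviations(series_a, series_b):
    """Coefficient of concurrent deviations for two series of equal length."""
    agree = disagree = 0.0
    for a, b in zip(fluctuation_signs(series_a), fluctuation_signs(series_b)):
        if a == 0 or b == 0:      # neutral: one-half to each total
            agree += 0.5
            disagree += 0.5
        elif a == b:              # like signs: an agreement
            agree += 1
        else:                     # unlike signs: a disagreement
            disagree += 1
    return coefficient(agree, disagree)

# The wage and production case of the text: 33 1/2 disagreements in 49.
print(round(coefficient(15.5, 33.5), 2))
```

The last line reproduces the coefficient of the text from its two totals; `concurrent_deviations` applies the same rule directly to two series of data.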
The Pearson Method. We come now to the so-called Pearson "r", the most satisfactory method to apply to straight line correlations: that is, to those which when graphed show an approximation to a straight line rather than a curve. This statement does not refer to the trends of the two series taken separately, but only to the trend formed by the two sets of cycles plotted as x and y, respectively. In explaining the method, it is most convenient to begin with two sets of cycles, or deviations, already reduced to units consisting of their respective standard deviations. The following table gives two such series, and the process of finding their correlation. The two cycles are graphed as coordinates in Figure 14, page 136.

CORRELATION OF PRICES (X) AND EMPLOYMENT (Y)
JANUARY, 1920, TO JANUARY, 1921
Cycles, in Units of Standard Deviation

            x        y        x²        xy
           .75      .59     .5625     .4425
           .94      .85     .8836     .7990
           .91      .51     .8281     .4641
           .88     1.10     .7744     .9680
           .89      .68     .7921     .6052
           .57      .51     .3249     .2907
           .37      .34     .1369     .1258
           .18      .42     .0324     .0756
          -.14     -.09     .0196     .0126
          -.53     -.34     .2809     .1802
          -.98     -.59     .9604     .5782
         -1.74    -1.27    3.0276    2.2098
         -2.10    -2.71    4.4100    5.6910
Totals                    13.0334   12.4427

r = 12.4427 ÷ 13.0334 = .96

Since the units used in both cases are standard deviations, the spread on the two axes, as measured by the cycles squared, must necessarily be equal. If every deviation in one series concurs with an equal deviation in the other series, the points when plotted will necessarily fall on a diagonal sloping upward from left to right at 45°. If positive deviations concur with negative, the points will lie in a diagonal sloping downward from left to right at 45°. In the first case a line of least squares drawn through the points will necessarily have a slope of +1, and in the second case of -1. These are obviously the largest results, both positive and negative, that could be obtained from two such correlative series.
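The totals of the table of prices and employment can be checked with a short script. The sketch below is illustrative only; the variable names are hypothetical. Besides the ratio used in the table, it also computes the general symmetric form of the Pearson coefficient, which is brought in here for comparison and is not part of the passage above.

```python
import math

# Cycles in units of standard deviation, January 1920 to January 1921.
x = [0.75, 0.94, 0.91, 0.88, 0.89, 0.57, 0.37,
     0.18, -0.14, -0.53, -0.98, -1.74, -2.10]   # prices
y = [0.59, 0.85, 0.51, 1.10, 0.68, 0.51, 0.34,
     0.42, -0.09, -0.34, -0.59, -1.27, -2.71]   # employment

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Slope of the line of least squares through the plotted points,
# the form used in the table.
r_slope = sum_xy / sum_x2

# General symmetric form, which does not assume equal spread on both axes.
r_symmetric = sum_xy / math.sqrt(sum_x2 * sum_y2)

print(f"sum x^2 = {sum_x2:.4f}, sum xy = {sum_xy:.4f}")
print(f"slope form: {r_slope:.3f}; symmetric form: {r_symmetric:.3f}")
```

Run on the rounded figures of the table, the ratio of the two totals comes to roughly .955, and the symmetric form to about .96, the value printed in the table; the small difference arises because the two rounded series do not have exactly equal sums of squares.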
A neutral result of zero would be obtained if the points as plotted fall in haphazard positions about the two axes. The slope of the line of least squares (the tangent of its angle with the X-axis) is therefore taken as the measure of correlation. Its basic formula is:

r = Σxy / Σx²

In ordinary work it is, of course, necessary first to find a trend for each series, if the cycles are to be measured. If the deviations are taken from the average of each series, the general direction and form of the two lines will be contrasted. This is equivalent to assuming [...]

[Figure 14 and Table XVIII, together with the text accompanying them, are not recoverable from the source.]

The Probable Error. It will be seen that the first part of Table XVIII records merely the finding of the trends (lines of least squares) for the two series. The figures have not been carried out beyond approximately one per cent accuracy.
The production and price cycles shown in the latter part of the table are obtained by the usual method of subtracting the trend items [...]

[The remainder of this discussion, with its tables, is not recoverable in this copy. The text resumes:]

[...] positive influence upon the latter within a period of a year or two, but the relation is not very regular.*

Correlation from Frequency Tables. A somewhat difficult application of the Pearson method of measuring correlation is encountered when the two series which are to be compared are each compiled in frequency tables. The case is illustrated in Table XIX, where average entrance examination grades and average scholarship grades for the four college years are compared. The entrance examination data are expressed in per cents, tabulated to the nearest multiple of five. The scholarship grades were expressed primarily in six groups, ranking downward in order from the first to the sixth. The averaging of such groups for four years gave results carried out to fourths of a group, as shown in the table. For convenience of calculation the values of both scales are converted into unit intervals measuring the deviations, the new scales centering at a zero set opposite the values (70 and 3¼) which are assumed as the averages. The number of frequencies for the combined series is written at the appropriate coordinate points in the body of the table, while the frequencies for each scale taken independently are written to the right and below (column F and row F). The initial steps in the computation will be readily understood by reference to the short-cut method of finding the standard deviation.
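The short-cut computation referred to here, and its extension to the cross-products of a correlation table, can be sketched in Python as below. The small frequency table is hypothetical (Table XIX is not reproduced), and the function name correlation_from_table is an assumption made for illustration. Deviations x and y are measured in unit intervals from the assumed averages; K1 and K2 are the corrections (sum of FD divided by N), and the total of the cross-products is corrected by subtracting N times K1 times K2.

```python
from math import sqrt

def correlation_from_table(cells):
    """cells maps (x, y) -> frequency F, where x and y are deviations in
    unit intervals from the ASSUMED averages of the two scales."""
    n = sum(cells.values())
    k1 = sum(f * x for (x, y), f in cells.items()) / n   # correction for x
    k2 = sum(f * y for (x, y), f in cells.items()) / n   # correction for y
    # Short-cut standard deviations: sqrt(sum(F * D^2) / N - K^2).
    sd_x = sqrt(sum(f * x * x for (x, y), f in cells.items()) / n - k1 ** 2)
    sd_y = sqrt(sum(f * y * y for (x, y), f in cells.items()) / n - k2 ** 2)
    # Each frequency is multiplied by its two coordinate values.
    sum_xy = sum(f * x * y for (x, y), f in cells.items())
    return (sum_xy - n * k1 * k2) / (n * sd_x * sd_y)

# Hypothetical table (assumed data): (x deviation, y deviation) -> frequency.
table = {(-2, -1): 1, (-1, -1): 2, (-1, 0): 1, (0, 0): 3,
         (0, 1): 1, (1, 0): 2, (1, 1): 3, (2, 2): 1}

r = correlation_from_table(table)
print(round(r, 3))
```

The result is the same as would be obtained by listing every item separately and measuring its deviations from the true averages; the assumed averages serve only to keep the multipliers small.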
By this method the standard deviations for both series, respectively, are determined. The finding of $\sum xy$ is a somewhat long process, since each of the frequencies at the coordinate points in the body of the table must be taken into account. Each of these frequencies is multiplied by its two coordinate values, and the products are totaled. To illustrate, the first three columns of frequencies give the following results:

   F     x     y    Fxy
   1    -6     1     -6
   1    -6    -2     12
   2    -5    -1     10
   1    -4     1     -4
   2    -4    -1      8

By continuing this computation, a total result of 148, as the value of $\sum xy$, is obtained.

Since the two inserted scales measuring the deviations are not centered with precision at the two axes of the table, as determined by the averages of the two series, respectively, the plus and minus deviations from the assumed averages will not exactly balance. Hence a correction must be made in the $\sum xy$, just as in the two standard deviations. The corrections ($K_1$ and $K_2$) applied to the finding of the standard deviations are, of course, merely $\sum FD \div N$. It may be shown that $\sum xy$ will be increased by the product of the two corrections, for every item included. The corrected summation of the moments about the coordinate axes is therefore expressed:

$$\sum xy - N K_1 K_2$$

In other respects the formula is as previously used. For convenience, however, it is written in the revised form shown at the foot of the table.† Applying the formula to the data in question, the value of r is found to be .62 ± .06, a well-marked correlation.

* H. L. Moore, in Economic Cycles, finds a positive correlation between an eight-year crop cycle and general prices, allowing a four-year lag to the latter.

† In correlations where the deviations to be contrasted are intended to be taken from the average of each series, the coefficient may be found directly from the original items. This is done by the use of the formula given in Table XIX. An average of zero is assumed for both series, and the items are treated as positive deviations from this average. The standard deviations are found by the modified formula explained on page 41.

The Method of Rank-differences. One further modification of the Pearson method of correlation, known as the method of rank-differences, may be noted. This method has the advantage of simplicity, and is especially applicable to comparisons which are made on the basis of approximate measurements only. In Table XX this method is illustrated by applying it to a comparison of the ranking of twenty-nine states in 1860 for manufacturing and literacy. The rankings as here shown are based upon the census of 1860.

TABLE XX
CORRELATION OF RANKING OF STATES IN MANUFACTURING AND IN LITERACY, U. S., 1860

                   Rank in manufactures    Rank in literacy
State              (capital per sq. mile)  of native whites     D     D²
Alabama                     24                    23            1      1
Arkansas                    29                    24            5     25
Connecticut                  3                     2            1      1
Delaware                     8                    21           13    169
Florida                     28                    22            6     36
Georgia                     23                    27            4     16
Illinois                    16                    14            2      4
Indiana                     14                    17            3      9
Iowa                        26                    13           13    169
Kentucky                    15                    25           10    100
Louisiana                   25                    18            7     49
Maine                       12                     5            7     49
Maryland                     9                    15            6     36
Massachusetts                2                     1            1      1
Michigan                    17                     9            8     64
Mississippi                 27                    16           11    121
Missouri                    19                    19            0      0
New Hampshire                7                     6            1      1
New Jersey                   4                    11            7     49
New York                     6                     7            1      1
North Carolina              22                    29            7     49
Ohio                        10                    12            2      4
Pennsylvania                 5                    10            5     25
Rhode Island                 1                     8            7     49
South Carolina              21                    20            1      1
Tennessee                   18                    28           10    100
Vermont                     11                     3            8     64
Virginia                    13                    26           13    169
Wisconsin                   20                     4           16    256
Total                                                               1618

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 1618}{29 \times 840} = .60$$

P.E. = .10

In arranging such rankings, ties may sometimes occur. In such a case the average rank of the tied items is applied to each of the items. Thus if the second and third items happen to be equal, each is ranked 2½; if the second, third, and fourth are equal, each is ranked 3. When the rankings have been tabulated, as shown, the difference between the two ranks for each state is found.
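The procedure for Table XX, including the averaging of tied ranks, can be sketched in Python as follows. This is an illustration rather than part of the original text: the function names average_ranks and rank_difference_r are hypothetical. The two rank lists are those of Table XX, in the order in which the states are tabulated.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) downward, giving tied items
    the average of the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every item tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2      # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_difference_r(rank_a, rank_b):
    """Return (sum of D squared, r), with r = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))

# Ranks from Table XX, Alabama through Wisconsin.
manufactures = [24, 29, 3, 8, 28, 23, 16, 14, 26, 15, 25, 12, 9, 2, 17,
                27, 19, 7, 4, 6, 22, 10, 5, 1, 21, 18, 11, 13, 20]
literacy = [23, 24, 2, 21, 22, 27, 14, 17, 13, 25, 18, 5, 15, 1, 9,
            16, 19, 6, 11, 7, 29, 12, 10, 8, 20, 28, 3, 26, 4]

sum_d2, r = rank_difference_r(manufactures, literacy)
print(sum_d2, round(r, 2))                 # 1618 0.6

# Tied second and third items are each ranked 2.5.
print(average_ranks([90, 80, 80, 70]))     # [1.0, 2.5, 2.5, 4.0]
```

The total of the squared differences and the coefficient agree with the figures at the foot of Table XX.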
These differences are then squared, and the squares totaled. The formula, as given at the foot of the table, is an adaptation from the one last discussed. Applying the formula, we find that a correlation of .60 exists between the two series. This comparison is an illustration of a number of interesting relationships which may be shown to exist between the economic and the social environment.

Conclusion. The purpose of this chapter will have been served if the student has gained a knowledge of the simpler methods commonly employed in measuring correlation. The full theory of the subject is very complex, and is hardly within the scope of an introductory course. A caution must be sounded, however, against an undiscriminating application of the methods here explained. In particular, conclusions stating causal relationships should never be based on mathematical processes alone. The data, their methods of collection, and the concrete realities they are assumed to measure, must all be subjected to careful scrutiny. The same caution may indeed very properly be extended to the whole field of statistical methods. These methods should prove to be valuable tools in the interpretation of physical, biological, and social phenomena, but they may be a source of positive error if their use is not directed by an adequate comprehension of the field of knowledge in which they are employed.

REFERENCES

Bowley, Arthur L., Elements of Statistics (4th Edition), Part II, Chapters VI-IX.
Jevons, W. Stanley, The Principles of Science.
King, W. I., "The Correlation of Historic Economic Variables," Quarterly Publications of the American Statistical Association, December, 1917, pp. 847-853.
Persons, W. W., "The Correlation of Economic Statistics," Quarterly Publications of the American Statistical Association, December, 1910, pp. 287-322.
Secrist, Horace, Readings and Problems in Statistical Methods, Chapter X.
West, Carl S., Introduction to Mathematical Statistics.
Yule, G. U., An Introduction to the Theory of Statistics, Chapters IX-XII.

EXERCISES

1. On separate sheets of cross-section paper having the same horizontal scale, and with the vertical scales so adjusted as to bring the deviations as nearly as possible to the same measured average, plot the cycles of production and price as obtained in exercises 1 and 2 of the preceding chapter. Similarly plot the cycles in the interest rate, measuring them from Figure 12, and the price cycles as shown in Figure 6 (pp. 118 and 101). Copy these cycles on tracing paper. Describe the correlation of production and price of the two crops (allowing a lag of one year for prices), and of the interest and price cycles.

2. By the method of concurrent deviations, measure the following correlations (Tables X, XI, and XV, pp. 51, 53, and 81): (a) Wholesale prices and per capita production (both concurrently, and allowing a lag of one year for prices), and (b) Wholesale prices and real wages.

3. Using the table given in exercise 8 of the preceding chapter, and the lines of least squares there obtained, measure by the Pearson "r" the correlation of production and price of iron for each year there studied.

4. From the data on page 53, find the correlation between wages and the cost of living for the years 1913-1920 inclusive (Pearson "r"). Measure the deviations from the average of each series, respectively; that is, assume a horizontal trend.

5. Reduce the deviations obtained in the preceding exercise to units of the standard deviation of each series, respectively, and plot as coordinates the two sets of deviations thus obtained. Compute the line of least squares for the points so plotted, and show that the slope of this line is identical with the coefficient of correlation.

6. Correlate the following indexes (Pearson "r"), taking the deviations from the average, without finding a trend.
Explain the significance of the result.

   Year    Prices    Unemployment
   1912     110           70
   1913     100          120
   1914      90          140
   1915      90          100
   1916     110           70

7. Find the Pearson coefficient of correlation for the indexes of normal seasonal variations of merchandise exports from the United States and the price of sterling exchange at New York, as follows:

   Month        Exports    Sterling
   January        110        100
   February        95        108
   March           99        109
   April           90        115
   May             87        116
   June            80        120
   July            78        119
   August          85        106
   September       98         74
   October        125         70
   November       123         80
   December       130         83

8. (a) Find the Pearson coefficient of correlation measuring the relationship between the following indexes of seasonal variation in the visible supply and the price of wheat in the United States, based on the years 1909-1913. (b) Find the coefficient, as before, but assuming that prices tend to anticipate the supply by about a month.

                Visible     Price
   Month        supply      (first of mo.)
   January        139         97
   February       130        100
   March          122        101
   April          112        101
   May             89        103
   June            69        106
   July            52        104
   August          60        100
   September       77         97
   October         99         97
   November       118         98
   December       133         96

9. The following correlation table presents entrance groups (vertical scale) and scholarship groups (horizontal scale) for a certain class of students. Find the coefficient of correlation (Pearson "r") and the probable error.

[Correlation table not recoverable in this copy. Scale values run from 10 to 1; the frequencies as extracted, with their cell positions lost, read: 2 1 1 1 1 2 1 1 1 4 2 3 1 1 1 1 2 1 1 1 1 1 3 2 1 2.]

10. The following correlation table classifies to the nearest twenty-five per cent sixty-nine important commodities according to their price indexes (base, 1913) in May, 1920, and May, 1921 (Monthly Labor Review, Aug., 1921, pp. 84-85). The correlation measures approximately the evenness of the price changes occurring between the two dates. Find Pearson's "r."
[Correlation table not recoverable in this copy. Vertical scale: price indexes, May, 1921, from 400 down to 75; horizontal scale: price indexes, May, 1920; total frequency, 69.]

11. The following table shows the ranking of states in (a) Noted men born in state, per 1000 population in 1880, (b) Population per square mile in 1890, and (c) Per cent of urban population in 1890. By the method of rank-differences measure the correlations existing among these three series.

   State              (a)     (b)     (c)
   Alabama             23     24      25.5
   Arkansas            29     28      28
   Connecticut          4      4       3
   Delaware             8      9      11
   Florida             28     29      20
   Georgia             26     22      23
   Illinois            16     10      10
   Indiana             15     11      17
   Iowa                19     20      19
   Kentucky            18     12      21
   Louisiana           25     26      18
   Maine                5     27       9
   Maryland            10      7       8
   Massachusetts        2      2       2
   Michigan            17     18.5    14
   Mississippi         27     25      29
   Missouri            21     16      16
   New Hampshire        3     14       6
   New Jersey          11      3       5
   New York             7      5       4
   North Carolina      22     21      27
   Ohio                 9      8      12
   Pennsylvania        12      6       7
   Rhode Island         6      1       1
   South Carolina      20     17      25.5
   Tennessee           24     13      24
   Vermont              1     18.5    13
   Virginia            13     15      22
   Wisconsin           14     23      15

APPENDIX I

LABORATORY MATERIAL AND REFERENCES

Equipment for Graphic Work. While the larger part of statistical work may be done without any systematic training in mechanical drawing, yet some degree of skill in this field is necessary if graphic representations are to be satisfactorily prepared. The necessary degree of skill may readily be acquired. The student should provide himself with a drawing board, celluloid triangles and irregular curves, a ruler with decimal subdivisions of the inch, a ruling pen, some round-pointed and fine pens for lettering, India ink, several styles of cross-section paper, and a loose-leaf notebook. If he is unfamiliar with the use of drafting materials, he should read the introductory directions given in an elementary treatise on mechanical drawing.
In addition to the material just mentioned, some of the more complicated apparatus used in an engineer's drafting room will be found useful. This equipment may include a drafting machine, a pantagraph, a line-spacer, a map-measurer, and a planimeter. A blue-print outfit is also very useful, indeed is almost a necessity, unless some improved copying device like the photostat is available for use. For elementary work a simple 8" x 10" photography printing frame, and corresponding blue-print paper, will be found quite satisfactory and inexpensive.

Lettering. It is not difficult to learn to draw freehand the italic letters used by draftsmen. Directions for such work will be found in "Lettering for Draftsmen, Engineers, and Students," by Charles W. Reinhardt (D. Van Nostrand Company, New York), or in other books on the same subject.

Other Material. There are several aids to statistical work that will materially lighten the drudgery incidental to long mathematical processes. The most common of these is the slide rule. A ten-inch rule, giving squares and cubes, will be found sufficient for the greater part of the work involving multiplication, division, and powers or roots. The slide rule is not difficult to use, and should be mastered by every student of statistics. Besides being an inexpensive and portable device for mathematical operations, it will be found useful in the drawing of logarithmic or ratio graphs, which are now coming into general use. If, however, it is necessary to obtain products or quotients accurate to four or five significant figures, a large cylindrical slide rule may be used, such as the Thatcher, though this is less convenient and considerably more expensive. For powers, roots, and reciprocals, elaborate printed tables are obtainable. If possible, an adding machine (listing) should be available for occasional use, such as the Dalton, the Burroughs, or the Federal.
While such a machine is a convenience, a calculating machine is an absolute necessity if very extensive work is to be attempted; and it is well for the student to become acquainted with its operation. Several successful models are now on the market, among which may be mentioned the Burroughs, the Comptometer, the Monroe, and the Marchant. The last two are dial machines, particularly adapted for subtractions and divisions. For certain kinds of statistical work tabulating machines (Hollerith and Powers types) are required, but these machines are so complex and expensive that they can hardly be made available except in the larger laboratories.

Recording. Laboratory exercises and other statistical work should be recorded fully, and should be put in clear and neat form. Every graph should be accurately labeled, and the units used in each scale should be indicated. When the scales of a graph do not begin at the zero point, the initial coordinates should not be drawn more heavily than the others, since they are likely to be looked upon as base lines if so drawn. In so far as is practicable, the tables of data from which a graph is drawn should accompany the figure, and the source should be noted. Graphing and lettering should be done in pencil first; the pencil draft may then be completed with India ink, and the pencil lines erased with a soft rubber. Errors in calculation should not be tolerated. All mathematical operations should be performed twice, or some other reliable method of checking should be adopted. Data copied from an original source should always be carefully verified. Tables and mathematical processes may most conveniently be recorded on cross-section paper having one-fifth or one-sixth inch spacing. In any given study the conclusions should be brought out clearly, and their significance explained.

Various Types of Graphs. Most of the types of graphs in common use have been illustrated in the preceding pages.
In addition, mention may be made of certain elementary types. One of these is the bar diagram, in which bars of uniform width, and proportional in length to given magnitudes, are used. They are drawn horizontally, except in time series. When they are subdivided, the parts are distinguished by various kinds of cross-hatching and shading. Another is the "pie diagram," in which a circle is subdivided by radii. This diagram is particularly adapted to the representation of percentage subdivisions, such as the relative expenditures for certain classes of goods in a family budget. Another type is the polar chart, designed for graphing seasonal data. Drawings of similar surfaces and solids are sometimes used in the representation of given magnitudes. It should be remembered that, geometrically, magnitudes compared by the use of similar surfaces vary as the square of the dimensions; and by the use of similar solids, as the cube of the dimensions. Sometimes in such drawings it is explicitly stated that the ratio is represented by one dimension only, as when the military forces of different countries are set forth by means of drawings of soldiers whose heights are proportional to the size of the armies. Such drawings are not scientific, however, and are justified only in material of a very popular nature. In general, the use of similar surfaces and solids in the representation of magnitudes is to be avoided. A more complex type of graph is the statistical map. This may be drawn in so many different ways that a general description is impossible. The student having occasion to use it should consult the excellent examples contained in the Statistical Atlas of the United States.

Sources, References, and Tables. Brief summaries are given below of the principal sources of statistical material, and the textbook references and statistical tables which are most likely to be of use in connection with an introductory course.
SOURCES OF STATISTICAL DATA

Aldrich Report (Senate Report No. 1394)
Annalist
Bradstreet's
Commercial and Financial Chronicle
Dun's Review
Federal Reserve Bulletin
Financial Review (year book)
Monthly Labor Review
Monthly Review (Federal Reserve Bank of New York)
Monthly Summary of Foreign Commerce of the United States
Review of Economic Statistics (Harvard)
Statesman's Yearbook
Statistical Abstract of the United States (yearbook)
Statistical services: Babson's, Banker's Statistical Corporation, Brookmire's, and Prentice-Hall
Survey of Current Business
United States Census
Weather, Crops, and Markets (U. S. Dept. of Agriculture)
World Almanac
Yearbook of the Department of Agriculture

TEXTBOOKS, TABLES, AND GENERAL REFERENCES

American Economic Review (Bi-monthly)
Bailey and Cummings, Statistics
Barker, E. H., Computing Tables and Formulas
Barlow's Tables
Bowley, A. L., Elements of Statistics
Brinton, W. C., Graphic Methods for Presenting Facts
Copeland, M. T., Business Statistics
Davenport, C. B., Statistical Methods
Jordan, D. F., Business Forecasting
Journal of Political Economy (Monthly)
King, W. I., Elements of Statistical Method
Marshall, Wm. C., Graphical Methods
Quarterly Journal of Economics
Quarterly Publications of the American Statistical Association
Secrist, H., An Introduction to Statistical Methods
Secrist, H., Readings and Problems in Statistical Methods
West, C. S., Introduction to Mathematical Statistics
Whipple, G. C., Vital Statistics

APPENDIX II

TABLE OF POWERS AND ROOTS

                                  Square    Cube
   No.   Square        Cube        Root     Root
     1        1           1       1.000    1.000
     2        4           8       1.414    1.259
     3        9          27       1.732    1.442
     4       16          64       2.000    1.587
     5       25         125       2.236    1.709
     6       36         216       2.449    1.817
     7       49         343       2.645    1.912
     8       64         512       2.828    2.000
     9       81         729       3.000    2.080
    10      100       1,000       3.162    2.154
    11      121       1,331       3.316    2.223
    12      144       1,728       3.464    2.289
    13      169       2,197       3.605    2.351
    14      196       2,744       3.741    2.410
    15      225       3,375       3.872    2.466
    16      256       4,096       4.000    2.519
    17      289       4,913       4.123    2.571
    18      324       5,832       4.242    2.620
    19      361       6,859       4.358    2.668
    20      400       8,000       4.472    2.714
    21      441       9,261       4.582    2.758
    22      484      10,648       4.690    2.802
    23      529      12,167       4.795    2.843
    24      576      13,824       4.898    2.884
    25      625      15,625       5.000    2.924
    26      676      17,576       5.099    2.962
    27      729      19,683       5.196    3.000
    28      784      21,952       5.291    3.036
    29      841      24,389       5.385    3.072
    30      900      27,000       5.477    3.107
    31      961      29,791       5.567    3.141
    32    1,024      32,768       5.656    3.174
    33    1,089      35,937       5.744    3.207
    34    1,156      39,304       5.830    3.239
    35    1,225      42,875       5.916    3.271
    36    1,296      46,656       6.000    3.301
    37    1,369      50,653       6.082    3.332
    38    1,444      54,872       6.164    3.361
    39    1,521      59,319       6.244    3.391
    40    1,600      64,000       6.324    3.419
    41    1,681      68,921       6.403    3.448
    42    1,764      74,088       6.480    3.476
    43    1,849      79,507       6.557    3.503
    44    1,936      85,184       6.633    3.530
    45    2,025      91,125       6.708    3.556
    46    2,116      97,336       6.782    3.583
    47    2,209     103,823       6.855    3.608
    48    2,304     110,592       6.928    3.634
    49    2,401     117,649       7.000    3.659
    50    2,500     125,000       7.071    3.684
    51    2,601     132,651       7.141    3.708
    52    2,704     140,608       7.211    3.732
    53    2,809     148,877       7.280    3.756
    54    2,916     157,464       7.348    3.779
    55    3,025     166,375       7.416    3.802
    56    3,136     175,616       7.483    3.825
    57    3,249     185,193       7.549    3.848
    58    3,364     195,112       7.615    3.870
    59    3,481     205,379       7.681    3.892
    60    3,600     216,000       7.745    3.914
    61    3,721     226,981       7.810    3.936
    62    3,844     238,328       7.874    3.957
    63    3,969     250,047       7.937    3.979
    64    4,096     262,144       8.000    4.000
    65    4,225     274,625       8.062    4.020
    66    4,356     287,496       8.124    4.041
    67    4,489     300,763       8.185    4.061
    68    4,624     314,432       8.246    4.081
    69    4,761     328,509       8.306    4.101
    70    4,900     343,000       8.366    4.121
    71    5,041     357,911       8.426    4.140
    72    5,184     373,248       8.485    4.160
    73    5,329     389,017       8.544    4.179
    74    5,476     405,224       8.602    4.198
    75    5,625     421,875       8.660    4.217
    76    5,776     438,976       8.717    4.235
    77    5,929     456,533       8.774    4.254
    78    6,084     474,552       8.831    4.272
    79    6,241     493,039       8.888    4.290
    80    6,400     512,000       8.944    4.308
    81    6,561     531,441       9.000    4.326
    82    6,724     551,368       9.055    4.344
    83    6,889     571,787       9.110    4.362
    84    7,056     592,704       9.165    4.379
    85    7,225     614,125       9.219    4.396
    86    7,396     636,056       9.273    4.414
    87    7,569     658,503       9.327    4.431
    88    7,744     681,472       9.380    4.447
    89    7,921     704,969       9.433    4.464
    90    8,100     729,000       9.486    4.481
    91    8,281     753,571       9.539    4.497
    92    8,464     778,688       9.591    4.514
    93    8,649     804,357       9.643    4.530
    94    8,836     830,584       9.695    4.546
    95    9,025     857,375       9.746    4.562
    96    9,216     884,736       9.797    4.578
    97    9,409     912,673       9.848    4.594
    98    9,604     941,192       9.899    4.610
    99    9,801     970,299       9.949    4.626
   100   10,000   1,000,000      10.000    4.641

Note: In the above table the last two columns are correct to three decimal places, without allowance for decimals dropped.

APPENDIX III

A PICTURE OF THE PROGRESS OF THE UNITED STATES DURING 120 YEARS OF NATIONAL LIFE

Compiled from Official Sources by O. P. Austin, Statistician of The National City Bank of New York

[Folding table, arranged in columns by year; the row headings and figures are not recoverable in this copy.]
INDEX

A
Aggregate method, 56
Aldrich Report, 4
American Economic Review, 53, 96, 124, 127
American Statistical Association, 96-148
Annalist, The, 97
Annalist Barometer, 125
Arithmetic mean, 20
Average, 20
Average deviation, 36

B
Babson, Roger W., 76, 124, 126
Babson's index, 65
Bailey, W. B., 19
Bowley, Arthur L., 19, 44, 67, 148
Bradstreet's index, 64, 124
Bureau of Labor Statistics, 51, 56, 63, 113, 124
Barnett, George E., 66
Brinton, W. C., 44
Business barometers, 115
Business cycles, 123
Business statistics, 3

C
Circulation, 98
Concurrent deviations, 132
Continuous series, 5
Correlation, 131
Cost of living, 53
Corn, 76, 77
Cummings, John, 19
Cycles, 101, 113, 137

D
Day, E. E., 19, 96
Derived table, 13
Discrete series, 5
Dun's index, 64

E
Exports, 129, 150

F
Falkner, R. P., 3
Farm wages, 70
Federal Reserve Bank of New York, 124
Federal Reserve Board, 64
Federal Reserve Bulletin, 65
Field, J. A., 44
Fisher, Irving, 44, 67, 96
Fisher's index, 86
Food prices, 56
Foreign exchange, 68
Free-hand method, 100
Frequency curve, 9
Frequency polygon, 16

G
Graphic work, 153

H
Hansen, A. H., 124
Howard, Stanley E., 49, 67
Hoffmann, F. L., 96
Hurlin, Ralph G., 126

I
Income distribution, 92
Indexes, 47, 62, 65
Index of physical production, 75
Index of quantity, 74
Index of value, 74
Ingalls, W. R., 96
Interest rates, 118
Interpolation, 30

J
Jevons, W. Stanley, 148
Jordan, D. F., 121, 126, 127

K
Kemmerer, E. W., 88, 123, 127
King, W. I., 44, 96, 98
Knauth, Oswald W., 92
Koren, John, 19

L
Lettering, 153
Line of least squares, 105, 135
Link-relative method, 120
Lorenz curve, 40, 44

M
Machine tabulation, 18
Macauley, Frederick E., 92
Marshall, Wm. C., 44
Median, the, 27
Meeker, Royal, 67, 96
Method of rank-differences, 147
Method of averages, 118
Method of semi-averages, 102
Method of standard quantities, 83
Mitchell, Wesley C., 67, 92, 127
Mode, 21
Monthly Labor Review, 29, 70
Monthly Review, 118, 123
Moore, H. L., 126, 127, 144
Moving average, 102

N
National Bureau of Economic Research, 91, 96
National income, 91
National Industrial Conference Board, 54
Normal curve, 10

O
Ogive, 34, 36

P
Parabola trend, 110
Pearson method, 134
Persons, W. W., 121, 127, 148
Peddle, John B., 127
Pig iron, 76, 77, 128
Physical production in U. S., 80, 81, 98
Pratt, Andrew A., 127
Price indexes, theory of, 82
Prices, 51, 55, 60, 63, 65, 77, 113, 128, 142
Primary table, 5
Probable error, 30, 140
Property ownership, 96
Proportional expenditure method, 57
Pareto's law, 94

Q
Quantity indexes, 74
Quantity theory, 87
Quartile deviation, 29
Quartile dispersion, 32
Quarterly Journal of Economics, 109, 126

R
Rank-differences, 147
Ratio paper, 33
Real wages, 48, 51, 53
Rectangular histogram, 25
Reference list, 156
Review of Economic Statistics, 96, 121, 123, 124, 126
Royal Statistical Society, 67
Rugg, H. O., 19

S
Salaries, 16
Salaries in Universities, 28
Schedules, 4
Seasonal index, 120
Seasonal variations, 116
Secrist, Horace, 19, 44, 67, 148
Semi-logarithmic paper, 33
Skewness, 43
Slichter, C. S., 105
Sources of data, 155
Standard price, 78
Statistical units, 17
Stewart, Walter W., 67, 96

T
Tabulation methods, 18
Tally sheet, 11
Tingley, Richard H., 127
Trends, nature of, 100
Types of graphs, 154
Types compared, 27

V
Value indexes, 74
Visible supply of wheat, 141, 150

W
Wages and salaries, 16
Wage indexes, 49, 51, 53, 70
Wage roll, 7
Walsh, C. M., 97
Weighted average, 20
Weights, cost of living, 71
West, Carl S., 148
Wheat, 76, 77, 129, 141, 150
Whipple, G. C., 44
Working, Holbrook, 97