FACULTY WORKING PAPERS
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign

January 16, 19TJ

DATA ANALYSIS

Gerald D. Brighton, Professor, Department of Accountancy
Robert H. Michaelson, Graduate Student in Accountancy

#540

Summary:

In any comprehensive research project, there are essentially five steps. First, one starts with a literature review with regard to a particular research question. Second, one seeks to develop a theory. Third, the research question is finalized, frequently in the form of a hypothesis to be tested. Fourth, data are collected. Fifth, the subject matter of this paper, the data are analyzed in order to come to a resolution of the research question.

There are two general approaches to analyzing research data. If the data were gathered concerning a "research question," a description of the data may be sufficient. However, if the data were gathered to accept or reject a formal hypothesis, statistical analysis is usually in order. This paper briefly surveys the principal data analysis methodologies that are available.

I. Introduction

In any comprehensive research project, there are essentially five steps. First, one starts with a literature review with regard to a particular research question. Second, one seeks to develop a theory. Third, the research question is finalized, frequently in the form of a hypothesis to be tested. Fourth, data are collected. Fifth, the subject matter of this paper, the data are analyzed in order to come to a resolution of the research question.

There are two general approaches to analyzing research data. If the data were gathered concerning a "research question," a description of the data may be sufficient. Also, the data may be so obvious that a statistical test is not really necessary. However, if the data were gathered to accept or reject a formal hypothesis, statistical analysis is usually in order.

Since this is a brief paper intended only to remind the aspiring researcher of the principal data analysis methodologies that are available, it of necessity includes some "technical jargon" that is not fully explained. To help with that potential problem, we have cited a general reference just below and are citing specific references in each section of the paper.

General reference: Kerlinger, Fred N., Foundations of Behavioral Research, New York: Holt, Rinehart & Winston, 1973.

II. Nonstatistical Methods

Once data have been collected concerning a study, it will always be necessary to arrange these data in a logical manner, whether or not statistical analysis is employed, so that the relationships of the research problem can be studied. If statistical analysis is not employed, the data are still analyzed in a descriptive manner, often with the use of frequency distributions, crossbreaks, and graphs. Once the data are properly analyzed, interpretation can proceed so that conclusions can be drawn concerning the research relationships studied.
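To make the idea of a frequency distribution and a crossbreak concrete, the brief sketch below tabulates a small set of hypothetical responses. The data, the variable names, and the use of the pandas library are illustrative assumptions only, not part of the methods surveyed here.

```python
# Illustrative only: hypothetical data arranged as a frequency
# distribution and a two-way crossbreak (cross-tabulation).
import pandas as pd

# Hypothetical responses: each row is one respondent.
responses = pd.DataFrame({
    "group":   ["control", "treatment", "treatment", "control",
                "treatment", "control", "treatment", "control"],
    "opinion": ["favor", "favor", "oppose", "oppose",
                "favor", "favor", "favor", "oppose"],
})

# Frequency distribution of a single variable.
print(responses["opinion"].value_counts())

# Crossbreak: joint frequencies of group membership and opinion.
print(pd.crosstab(responses["group"], responses["opinion"]))
```

Tables of this kind are the descriptive arrangement from which interpretation can proceed.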
In addition to informal description, formalized methods of nonstatistical data analysis have been developed. This formalization concerns the manner in which the data are measured, categorized, ordered, and summarized. For example, scales are employed to measure variables. A scale is a set of symbols or numerals so constructed that they can be assigned by rule to individuals to whom the scale is applied, the assignment being indicated by the individual's possession of whatever attribute the scale is supposed to measure. As an example of a particular application, a scaling technique that has been used to analyze the decisions of Tax Court judges is a cumulative attitude scale called the Guttman scale.

Guttman scaling postulates that if the responses to a set of items referring to an attitude area can be arranged in certain specified ways, that area is scalable and a respondent's rank makes it possible to predict exactly which items were answered favorably. If a set of data is scalable, scalogram analysis will establish an order among the data and pinpoint deviations from the group. This method is simpler than sophisticated statistical methods, but does not provide their detailed specification of pivotal factors.

Reference: Edwards, A., Techniques of Attitude Scale Construction, New York: Appleton, 1957.

Again, even if statistical methods are being applied, it is always necessary first to sort and arrange the data in some logical order. This may be as simple as tables of data. It is a mistake, therefore, to think that nonstatistical analysis methods are not important. The real point is that increasingly they are not sufficient.

Research emphasizes statistical methods because they are a more scientific approach and consequently increase the acceptability of the results. The remainder of this discussion concerns statistical methods of data analysis.

III. Modern Statistical Methods

A. Situations With Only One Independent and Dependent Variable (or Only Two Groups)

The most common situations with one independent and one dependent variable include comparison of a mean, variance, correlation coefficient, or proportion of a sample with the same property of another sample or of the population. Because such statistics require a simple experimental design, they are often inapplicable in more complex situations.

Reference: Glass, G., and Stanley, J., Statistical Methods in Education and Psychology, Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1970.
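As a simple illustration of this class of tests, the sketch below compares the means of two small hypothetical samples with a two-sample t test. The data and the use of the scipy library are assumptions made only for the example.

```python
# Illustrative only: comparing the means of two hypothetical samples.
from scipy import stats

sample_a = [12.1, 11.8, 13.0, 12.4, 11.9, 12.7]   # hypothetical scores, group A
sample_b = [13.2, 12.9, 13.5, 13.1, 12.8, 13.4]   # hypothetical scores, group B

# Two-sample t test of the difference between the group means.
t_statistic, p_value = stats.ttest_ind(sample_a, sample_b)
print(t_statistic, p_value)
```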
B. Situations With More Than One Independent or Dependent Variable (or More Than Two Groups)

In more complex situations, three basic approaches are available: analysis of variance, regression analysis, and factor analysis. Each of these approaches is superior to the others in a particular set of circumstances. In the remainder of this discussion, the question of when each of these approaches is preferred will be briefly sketched.

1. Analysis of Variance (Anova)

Analysis of variance is a method of identifying, breaking down, and testing for statistical significance the variances in the dependent variable that come from different sources of variation. According to Kerlinger (general reference, p. 238), analysis of variance is the preferable method of data analysis for the following reasons. It:

(1) Permits us to test several hypotheses at one time.

(2) Permits us to test hypotheses that cannot be tested in any other way, at least with precision.

(3) Gives insight into research approaches and methods by focusing sharply and constantly on variance thinking, by making clear the close relationship between research problems and statistical methods and inference, and by clarifying the structure of research design.

Although analysis of variance would virtually always be preferable if it were applicable, applicability is a serious problem. In general, anova is best suited to experimental research in which the subjects can be randomly assigned to cells (groups), the group n's thus kept equal and the assumptions behind the method more or less satisfied (Kerlinger, general reference, pp. 268-269). Anova is not nearly so well suited to ex post facto (data retrieval or field study) research, or to experimental research that uses a number of nonexperimental (attribute) variables, for the following reasons:

(1) In such cases, interaction can be caused by some extraneous, unwanted, uncontrolled effect.

(2) If the n's in the cells of a factorial design are not equal or proportionate, the independence of the independent variables is impaired. While adjustments can be made, they are awkward and not too satisfactory. In nonexperimental anova, the n's get beyond the control of the researcher. Also, in experiments with more than one categorical variable (like race and sex), n's almost necessarily become unequal.

A brief description of the various types of anova follows. An illustrative sketch of a two-factor analysis is given at the end of this section.

a. Factorial Anova

Factorial anova is the statistical method that analyzes the independent and interactive effects of two or more independent variables on a dependent variable.

Reference: Kirk, R., Experimental Design: Procedures for the Behavioral Sciences, Belmont, Calif.: Brooks/Cole, 1968.

b. Nonparametric Anova

Nonparametric statistics use properties of data other than the strictly quantitative. These statistics are based on properties of data that can be tested against chance expectation: rank, range, periodicity, distribution, etc.

Reference: Siegel, S., Nonparametric Statistics for the Behavioral Sciences, New York: McGraw-Hill, 1956.

c. Analysis of Covariance (Ancova)

Ancova is used when it is necessary to study groups as they are, when subjects cannot be matched or assigned at random. Ancova tests the significance of the differences between means of final experimental data by taking into account the correlation between the dependent variable and one or more covariates, and by adjusting the initial mean differences in the experimental groups.

Reference: Kirk, R., Experimental Design: Procedures for the Behavioral Sciences, Belmont, Calif.: Brooks/Cole, 1968.

d. Multivariate Anova

Multivariate anova is the generalization of anova to any number of independent variables and any number of dependent variables.

Reference: Tatsuoka, M., Multivariate Analysis: Techniques for Educational and Psychological Research, New York: Wiley, 1971.
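The following sketch is the two-factor illustration referred to above: it fits a factorial anova with interaction to a small set of hypothetical scores. The data, the factor names, and the use of the statsmodels and pandas libraries are assumptions made only for this example.

```python
# Illustrative only: a two-factor (2 x 2) factorial anova on hypothetical data.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced design: two categorical factors, one dependent variable.
data = pd.DataFrame({
    "method": ["lecture"] * 4 + ["discussion"] * 4,
    "sex":    ["m", "m", "f", "f"] * 2,
    "score":  [70, 72, 75, 77, 80, 82, 85, 88],
})

# Fit a model with both main effects and their interaction, then produce
# the anova table, which partitions the variance of the dependent variable
# among the two factors, their interaction, and the residual.
model = smf.ols("score ~ C(method) * C(sex)", data=data).fit()
print(anova_lm(model, typ=2))
```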
2. Regression Analysis

Regression analysis is a method for studying the effects, and the magnitude of the effects, of one or more independent variables on one or more dependent variables using principles of correlation and regression. According to Kerlinger (general reference, p. 268), regression analysis is the best method for analyzing data when the unequal n problem (discussed under anova) appears. This problem is most common with ex post facto (data retrieval or field study) research, or with experimental research that uses a number of nonexperimental (attribute) variables.

A brief description of the various types of regression analysis follows.

a. Multiple Regression Analysis

Multiple regression analysis is a method for studying the effects and the magnitudes of the effects of more than one independent variable on one dependent variable using principles of correlation and regression. This analysis can be applied to cross-sectional data, time-series data, or a combination of both. An illustrative sketch of such an analysis is given at the end of this paper.

Reference: Kerlinger, F., and Pedhazur, E., Multiple Regression in Behavioral Research, New York: Holt, Rinehart and Winston, 1973.

b. Discriminant Analysis

A discriminant function is a regression equation with a dependent variable that represents group membership. The function maximally discriminates the members of the groups; it tells us to which group each member probably belongs. In short, if we have two or more independent variables and the members of, say, three groups, the discriminant function gives the "best" prediction, in the least squares sense, of the correct group membership of each member of the sample. The discriminant function, then, can be used to assign individuals to groups on the basis of their scores on two or more measures. The discriminant approach generally provides the most accurate predictions in the sense of the number of correct predictions divided by the number of predictions made.

Reference: Tatsuoka, M., Discriminant Analysis: The Study of Group Differences, Champaign, Ill.: Institute for Personality and Ability Testing, 1970.

c. Canonical Correlation

Canonical correlation is simply multiple regression analysis with more than one dependent variable. There are limitations in the interpretation of the results it yields.

Reference: Cooley, W., and Lohnes, P., Multivariate Data Analysis, New York: Wiley, 1971.

3. Factor Analysis

Factor analysis is a method for determining the number and nature of the underlying variables among larger numbers of measures. It serves the cause of scientific parsimony. It reduces the multiplicity of tests and measures toward greater simplicity. It tells us, in effect, what tests or measures belong together, that is, which ones virtually measure the same thing and how much they do so. It is thus a technique for reducing the number of variables with which the scientist must cope.

Reference: Harman, H., Modern Factor Analysis, 2nd ed., Chicago: University of Chicago Press, 1967.

IV. Conclusion

Obviously, no one method of data analysis is appropriate in all circumstances. The best method must be determined on a project-by-project basis using the following considerations:

1. Nature of the data available. See the discussion under anova.

2. Intent or essence of the hypothesis or research question.

3. Feasibility.

4. Cost.

5. Skill of the researcher.

Finally, since data analysis is the last of the five steps in the research process, it will need to be appropriate to establishing the validity of the data resulting from the preceding steps.
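The closing sketch referred to above follows: a multiple regression of one hypothetical dependent variable on two independent variables. The data, the variable names, and the use of the statsmodels and pandas libraries are assumptions made only for the example.

```python
# Illustrative only: multiple regression of one dependent variable
# on two independent variables, using hypothetical cross-sectional data.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "score": [72, 75, 78, 81, 69, 85, 90, 77, 83, 88],   # dependent variable
    "hours": [2, 3, 4, 5, 1, 6, 8, 3, 5, 7],              # independent variable 1
    "prior": [2.8, 3.0, 3.1, 3.3, 2.6, 3.5, 3.8, 3.0, 3.4, 3.6],  # independent variable 2
})

# Ordinary least squares regression; the summary reports the magnitude
# (coefficients) and statistical significance of each independent
# variable's effect on the dependent variable.
model = smf.ols("score ~ hours + prior", data=data).fit()
print(model.summary())
```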