FACULTY WORKING PAPERS
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign

January 16, 19TJ

DATA ANALYSIS

Gerald D. Brighton, Professor, Department of Accountancy
Robert H. Michaelson, Graduate Student in Accountancy

#540

Summary:

In any comprehensive research project, there are essentially five steps. First, one starts with a literature review with regard to a particular research question. Second, one seeks to develop a theory. Third, the research question is finalized, frequently in the form of a hypothesis to be tested. Fourth, data are collected. Fifth, the subject matter of this paper, the data are analyzed in order to come to a resolution of the research question.

There are two general approaches to analyzing research data. If the data were gathered concerning a "research question," a description of the data may be sufficient. However, if the data were gathered to accept or reject a formal hypothesis, statistical analysis is usually in order. This paper briefly surveys the principal data analysis methodologies that are available.

I. Introduction

In any comprehensive research project, there are essentially five steps. First, one starts with a literature review with regard to a particular research question. Second, one seeks to develop a theory. Third, the research question is finalized, frequently in the form of a hypothesis to be tested. Fourth, data are collected. Fifth, the subject matter of this paper, the data are analyzed in order to come to a resolution of the research question.

There are two general approaches to analyzing research data. If the data were gathered concerning a "research question," a description of the data may be sufficient. Also, the data may be so obvious that a statistical test is not really necessary. However, if the data were gathered to accept or reject a formal hypothesis, statistical analysis is usually in order.

Since this is a brief paper intended only to remind the aspiring researcher of the principal data analysis methodologies that are available, it of necessity includes some "technical jargon" that is not fully explained. To help with that potential problem, we have cited a general reference just below and are citing specific references in each section of the paper.

General reference: Kerlinger, Fred N., Foundations of Behavioral Research, New York: Holt, Rinehart & Winston, 1973.

II. Nonstatistical Methods

Once data have been collected concerning a study, it will always be necessary to arrange these data in a logical manner, whether or not statistical analysis is employed, so that the relationships of the research problem can be studied. If statistical analysis is not employed, the data are still analyzed in a descriptive manner, often with the use of frequency distributions, crossbreaks, and graphs. Once the data are properly analyzed, interpretation can proceed so that conclusions can be drawn concerning the research relationships studied.
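To make the idea of a frequency distribution and a crossbreak concrete, the brief sketch below tabulates a small set of hypothetical responses. The data, the variable names, and the use of the pandas library are illustrative assumptions only, not part of the methods surveyed here.

```python
# Illustrative only: hypothetical data arranged as a frequency
# distribution and a two-way crossbreak (cross-tabulation).
import pandas as pd

# Hypothetical responses: each row is one respondent.
responses = pd.DataFrame({
    "group":   ["control", "treatment", "treatment", "control",
                "treatment", "control", "treatment", "control"],
    "opinion": ["favor", "favor", "oppose", "oppose",
                "favor", "favor", "favor", "oppose"],
})

# Frequency distribution of a single variable.
print(responses["opinion"].value_counts())

# Crossbreak: joint frequencies of group membership and opinion.
print(pd.crosstab(responses["group"], responses["opinion"]))
```

Tables of this kind are the descriptive arrangement from which interpretation can proceed.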
In addition to informal description, formalized methods of nonstatistical data analysis have been developed. This formalization concerns the manner in which the data are measured, categorized, ordered, and summarized. For example, scales are employed to measure variables. A scale is a set of symbols or numerals so constructed that they can be assigned by rule to individuals to whom the scale is applied, the assignment being indicated by the individual's possession of whatever attribute the scale is supposed to measure. As an example of a particular application, a scaling technique that has been used to analyze the decisions of Tax Court judges is a cumulative attitude scale called the Guttman scale.

Guttman scaling postulates that if the responses to a set of items referring to an attitude area can be arranged in certain specified ways, that area is scalable and a respondent's rank makes it possible to predict exactly which items were answered favorably. If a set of data is scalable, scalogram analysis will establish an order among the data and pinpoint deviations from the group. This method is simpler than sophisticated statistical methods, but does not provide their detailed specification of pivotal factors.

Reference: Edwards, A., Techniques of Attitude Scale Construction, New York: Appleton, 1957.

Again, even if statistical methods are being applied, it is always necessary first to sort and arrange the data in some logical order. This may be as simple as tables of data. It is a mistake, therefore, to think that nonstatistical analysis methods are not important. The real point is that increasingly they are not sufficient.

Research emphasizes statistical methods because they are a more scientific approach and consequently increase the acceptability of the results. The remainder of this discussion concerns statistical methods of data analysis.

III. Modern Statistical Methods

A. Situations With Only One Independent and Dependent Variable (or Only Two Groups)

The most common situations with one independent and one dependent variable include comparison of a mean, variance, correlation coefficient, or proportion of a sample with the same property of another sample or of the population. Because such statistics require a simple experimental design, they are often inapplicable in more complex situations.

Reference: Glass, G., and Stanley, J., Statistical Methods in Education and Psychology, Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1970.
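As a simple illustration of this class of tests, the sketch below compares the means of two small hypothetical samples with a two-sample t test. The data and the use of the scipy library are assumptions made only for the example.

```python
# Illustrative only: comparing the means of two hypothetical samples.
from scipy import stats

sample_a = [12.1, 11.8, 13.0, 12.4, 11.9, 12.7]   # hypothetical scores, group A
sample_b = [13.2, 12.9, 13.5, 13.1, 12.8, 13.4]   # hypothetical scores, group B

# Two-sample t test of the difference between the group means.
t_statistic, p_value = stats.ttest_ind(sample_a, sample_b)
print(t_statistic, p_value)
```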
B. Situations With More Than One Independent or Dependent Variable (or More Than Two Groups)

In more complex situations, three basic approaches are available: analysis of variance, regression analysis, and factor analysis. Each of these approaches is superior to the others in a particular set of circumstances. In the remainder of this discussion, the question of when each of these approaches is preferred will be briefly sketched.

1. Analysis of Variance (Anova)

Analysis of variance is a method of identifying, breaking down, and testing for statistical significance the variances in the dependent variable that come from different sources of variation. According to Kerlinger (general reference, p. 238), analysis of variance is the preferable method of data analysis for the following reasons. It:

(1) Permits us to test several hypotheses at one time.

(2) Permits us to test hypotheses that cannot be tested in any other way, at least with precision.

(3) Gives insight into research approaches and methods by focusing sharply and constantly on variance thinking, by making clear the close relationship between research problems and statistical methods and inference, and by clarifying the structure of research design.

Although analysis of variance would virtually always be preferable if it were applicable, applicability is a serious problem. In general, anova is best suited to experimental research in which the subjects can be randomly assigned to cells (groups), the group n's thus kept equal and the assumptions behind the method more or less satisfied (Kerlinger, general reference, pp. 268-269). Anova is not nearly so well suited to ex post facto (data retrieval or field study) research, or to experimental research that uses a number of nonexperimental (attribute) variables, for the following reasons:

(1) In such cases, interaction can be caused by some extraneous, unwanted, uncontrolled effect.

(2) If the n's in the cells of a factorial design are not equal or proportionate, the independence of the independent variables is impaired. While adjustments can be made, they are awkward and not too satisfactory. In nonexperimental anova, the n's get beyond the control of the researcher. Also, in experiments with more than one categorical variable (like race and sex), n's almost necessarily become unequal.

A brief description of the various types of anova follows. An illustrative sketch of a two-factor analysis is given at the end of this section.

a. Factorial Anova

Factorial anova is the statistical method that analyzes the independent and interactive effects of two or more independent variables on a dependent variable.

Reference: Kirk, R., Experimental Design: Procedures for the Behavioral Sciences, Belmont, Calif.: Brooks/Cole, 1968.

b. Nonparametric Anova

Nonparametric statistics use properties of data other than the strictly quantitative. These statistics are based on properties of data that can be tested against chance expectation: rank, range, periodicity, distribution, etc.

Reference: Siegel, S., Nonparametric Statistics for the Behavioral Sciences, New York: McGraw-Hill, 1956.

c. Analysis of Covariance (Ancova)

Ancova is used when it is necessary to study groups as they are, when subjects cannot be matched or assigned at random. Ancova tests the significance of the differences between means of final experimental data by taking into account the correlation between the dependent variable and one or more covariates, and by adjusting the initial mean differences in the experimental groups.

Reference: Kirk, R., Experimental Design: Procedures for the Behavioral Sciences, Belmont, Calif.: Brooks/Cole, 1968.

d. Multivariate Anova

Multivariate anova is the generalization of anova to any number of independent variables and any number of dependent variables.

Reference: Tatsuoka, M., Multivariate Analysis: Techniques for Educational and Psychological Research, New York: Wiley, 1971.
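The following sketch is the two-factor illustration referred to above: it fits a factorial anova with interaction to a small set of hypothetical scores. The data, the factor names, and the use of the statsmodels and pandas libraries are assumptions made only for this example.

```python
# Illustrative only: a two-factor (2 x 2) factorial anova on hypothetical data.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced design: two categorical factors, one dependent variable.
data = pd.DataFrame({
    "method": ["lecture"] * 4 + ["discussion"] * 4,
    "sex":    ["m", "m", "f", "f"] * 2,
    "score":  [70, 72, 75, 77, 80, 82, 85, 88],
})

# Fit a model with both main effects and their interaction, then produce
# the anova table, which partitions the variance of the dependent variable
# among the two factors, their interaction, and the residual.
model = smf.ols("score ~ C(method) * C(sex)", data=data).fit()
print(anova_lm(model, typ=2))
```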
2. Regression Analysis

Regression analysis is a method for studying the effects, and the magnitude of the effects, of one or more independent variables on one or more dependent variables using principles of correlation and regression. According to Kerlinger (general reference, p. 268), regression analysis is the best method for analyzing data when the unequal n problem (discussed under anova) appears. This problem is most common with ex post facto (data retrieval or field study) research, or with experimental research that uses a number of nonexperimental (attribute) variables.

A brief description of the various types of regression analysis follows.

a. Multiple Regression Analysis

Multiple regression analysis is a method for studying the effects and the magnitudes of the effects of more than one independent variable on one dependent variable using principles of correlation and regression. This analysis can be applied to cross-sectional data, time-series data, or a combination of both. An illustrative sketch of such an analysis is given at the end of this paper.

Reference: Kerlinger, F., and Pedhazur, E., Multiple Regression in Behavioral Research, New York: Holt, Rinehart and Winston, 1973.

b. Discriminant Analysis

A discriminant function is a regression equation with a dependent variable that represents group membership. The function maximally discriminates the members of the groups; it tells us to which group each member probably belongs. In short, if we have two or more independent variables and the members of, say, three groups, the discriminant function gives the "best" prediction, in the least squares sense, of the correct group membership of each member of the sample. The discriminant function, then, can be used to assign individuals to groups on the basis of their scores on two or more measures. The discriminant approach generally provides the most accurate predictions in the sense of the number of correct predictions divided by the number of predictions made.

Reference: Tatsuoka, M., Discriminant Analysis: The Study of Group Differences, Champaign, Ill.: Institute for Personality and Ability Testing, 1970.

c. Canonical Correlation

Canonical correlation is simply multiple regression analysis with more than one dependent variable. There are limitations in the interpretation of the results it yields.

Reference: Cooley, W., and Lohnes, P., Multivariate Data Analysis, New York: Wiley, 1971.

3. Factor Analysis

Factor analysis is a method for determining the number and nature of the underlying variables among larger numbers of measures. It serves the cause of scientific parsimony. It reduces the multiplicity of tests and measures toward greater simplicity. It tells us, in effect, what tests or measures belong together, that is, which ones virtually measure the same thing and how much they do so. It is thus a technique for reducing the number of variables with which the scientist must cope.

Reference: Harman, H., Modern Factor Analysis, 2nd ed., Chicago: University of Chicago Press, 1967.

IV. Conclusion

Obviously, no one method of data analysis is appropriate in all circumstances. The best method must be determined on a project-by-project basis using the following considerations:

1. Nature of the data available. See the discussion under anova.

2. Intent or essence of the hypothesis or research question.

3. Feasibility.

4. Cost.

5. Skill of the researcher.

Finally, since data analysis is the last of the five steps in the research process, it will need to be appropriate to establishing the validity of the data resulting from the preceding steps.
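The closing sketch referred to above follows: a multiple regression of one hypothetical dependent variable on two independent variables. The data, the variable names, and the use of the statsmodels and pandas libraries are assumptions made only for the example.

```python
# Illustrative only: multiple regression of one dependent variable
# on two independent variables, using hypothetical cross-sectional data.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "score": [72, 75, 78, 81, 69, 85, 90, 77, 83, 88],   # dependent variable
    "hours": [2, 3, 4, 5, 1, 6, 8, 3, 5, 7],              # independent variable 1
    "prior": [2.8, 3.0, 3.1, 3.3, 2.6, 3.5, 3.8, 3.0, 3.4, 3.6],  # independent variable 2
})

# Ordinary least squares regression; the summary reports the magnitude
# (coefficients) and statistical significance of each independent
# variable's effect on the dependent variable.
model = smf.ols("score ~ hours + prior", data=data).fit()
print(model.summary())
```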