Fuzzy linear regression-based detection of earnings management

Henrik Höglund ⇑
Hanken School of Economics, Handelsesplanaden 2, 65101 Vasa, Finland

Article info

Keywords: Earnings management; Discretionary accruals; Fuzzy linear regression

Abstract

A large number of accounting studies have examined the occurrence of earnings management in various contexts. In most of these studies, the earnings management detection model is based on the linear regression model suggested by Jones (1991). A considerable problem with the Jones model is the requirement of long time series of financial statement data. An alternative to estimating the linear regression model coefficients with ordinary least squares (OLS) is to use fuzzy linear regression (FLR). One of the main advantages of FLR described in the literature is its ability to handle small data sets. The purpose of this study is to compare the performance of the OLS-based Jones model with the performance of the FLR-based Jones model. The results show that the performance of both types of models decreases when the length of the time series decreases and that there is no significant difference in the estimated discretionary accruals between the models. The results also show that the FLR-based Jones model outperforms the OLS-based Jones model in detecting simulated earnings management when the estimation time series is short. Overall, the results show that the FLR-based Jones model is a feasible alternative to the OLS-based Jones model, especially when the length of the estimation time series is restricted by data availability.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

A large number of accounting studies have examined the occurrence of earnings management in various contexts.
There is, for example, evidence suggesting that firms manage earnings in order to avoid violating debt covenants (DeFond & Jiambalvo, 1994) or to influence initial public offering (IPO) valuations (Teoh, Welch, & Wong, 1998). A major challenge in earnings management studies is to find a measure for how much firms have managed their earnings. A common assumption is that earnings are managed through accounting accruals. With this assumption the measure of earnings management is the unexpected part of a firm’s total accruals. Various models have been suggested for dividing the total accruals into non-discretionary (expected) and discretionary (unexpected) accruals. Most of these models are based on the earnings management detection model suggested by Jones (1991). The Jones model is a linear regression model where the level of total accruals is assumed to be explained by property, plant and equipment and the change in sales. In the original Jones model, the regression coefficients are estimated using a firm specific time series comprising data prior to the event year. The expected level of accruals is then calculated using the estimated coefficients with event year data.

A considerable problem with the Jones model is the requirement of long time series of financial statement data. Typically, in studies where the Jones model is used the requirement is set to at least ten years of data prior to the event year (e.g. Dechow, Sloan, & Sweeney, 1995; Thomas & Zhang, 2000). This requirement might lead to several problems, such as survivorship bias and non-stationary regression coefficients (Peasnell, Pope, & Young, 2000; Young, 1999). An alternative to estimating the firm specific regression coefficients with ordinary least squares (OLS) regression is using fuzzy linear regression (FLR), first introduced by Tanaka, Uejima, and Asai (1982). Contrary to probability theory-based OLS regression, FLR is based on possibility theory and fuzzy set theory.
With FLR the objective is to minimize the fuzziness of the model rather than the sum of squared residuals. One of the main advantages of FLR described in the literature is its ability to handle small data sets (Shapiro, 2004). This was corroborated empirically by Kim, Moskowitz, and Koksalan (1996) who provided evidence that FLR outperforms statistical linear regression in predictive capability with small data sets. Thus, it is possible that the Jones model regression could be run with shorter time series using FLR. If this is the case, an FLR-based Jones model could reduce several problems originating from the requirement of long time series.

The purpose of this study is to compare the performance of the OLS-based Jones model with the performance of the FLR-based Jones model. The focus is on the time series version of the Jones model and the comparison is made using different lengths of the time series.

Expert Systems with Applications 40 (2013) 6166–6172. http://dx.doi.org/10.1016/j.eswa.2013.05.046. 0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
⇑ Tel.: +358 (0)40 3521768. E-mail address: henrik.hoglund@hanken.fi

The remainder of this study is organized as follows. The basic operating principle of the linear regression-based accrual models is covered in Section 2. In Section 3 the estimation of FLR coefficients and prediction with FLR models are covered. The research design is presented in Section 4 and the results from the empirical study are presented in Section 5.
Section 6 concludes the study.

2. Time series-based discretionary accrual estimation models

During the past 30 years a large number of studies have examined the occurrence of earnings management in various contexts. Typically, the assumption in these studies has been that deviations from a normal or expected level of accruals constitute earnings management. In the early earnings management detection models the expected level of accruals was defined as the average accruals over an estimation period (Healy, 1985) or as the previous year accruals (DeAngelo, 1986). The shortcoming of these models is that they do not consider changing circumstances of the firms. For example, a normal increase in accruals as a result of an increase in sales would turn up as earnings management. Jones (1991) relaxed the assumption that accruals remain stationary over time by suggesting a regression approach where the level of total accruals (TACC) is explained by the change in sales (ΔREV) and property, plant and equipment (PPE). The change in sales is assumed to explain the current accruals, such as receivables, inventory and payables, whereas property, plant and equipment should mainly explain the level of depreciation.

$$
\frac{TACC_t}{TA_{t-1}} = \alpha_0 \frac{1}{TA_{t-1}} + \alpha_1 \frac{\Delta REV_t}{TA_{t-1}} + \alpha_2 \frac{PPE_t}{TA_{t-1}}
$$

Typically, the variables in the Jones model are deflated with lagged total assets (TA) in order to reduce heteroscedasticity. The coefficients $\alpha_0, \ldots, \alpha_2$ in the Jones model are estimated using a sufficiently long firm specific time series. The coefficients are then used to calculate the expected level, or the non-discretionary part, of total accruals using event period data. The difference between the observed event period total accruals and the calculated non-discretionary accruals is treated as unexpected, or discretionary, accruals, which are taken as the measure of earnings management.
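The two-step procedure described above (estimate the coefficients on pre-event data, then subtract the expected accruals from the observed event period accruals) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the normal-equations solver, and all numbers are hypothetical and are not taken from the study. The estimation data are generated so that they follow the model exactly, which makes it easy to see that the injected amount is recovered as discretionary accruals.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix · x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def jones_regressors(lagged_ta, delta_rev, ppe):
    """Deflated regressors: 1/TA_{t-1}, ΔREV_t/TA_{t-1}, PPE_t/TA_{t-1}."""
    return [1.0 / lagged_ta, delta_rev / lagged_ta, ppe / lagged_ta]


def fit_jones_ols(rows):
    """Estimate (α0, α1, α2) by OLS from (TACC, TA_lag, ΔREV, PPE) tuples."""
    xs = [jones_regressors(ta, drev, ppe) for _, ta, drev, ppe in rows]
    ys = [tacc / ta for tacc, ta, _, _ in rows]
    k = 3
    # Normal equations X'X α = X'y; the model has no separate constant term.
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def discretionary_accruals(alpha, event_row):
    """Observed deflated accruals minus the expected (non-discretionary) part."""
    tacc, ta, drev, ppe = event_row
    expected = sum(a * x for a, x in zip(alpha, jones_regressors(ta, drev, ppe)))
    return tacc / ta - expected


# Made-up estimation period: (TA_{t-1}, ΔREV_t, PPE_t) for eight years.
TRUE_ALPHA = (5.0, 0.1, -0.05)  # hypothetical coefficients used to build the data
history = [
    (100.0, 10.0, 60.0), (120.0, -5.0, 70.0), (150.0, 20.0, 95.0),
    (130.0, 15.0, 80.0), (160.0, -8.0, 90.0), (180.0, 25.0, 120.0),
    (200.0, 12.0, 110.0), (210.0, 30.0, 150.0),
]
estimation_rows = [
    (TRUE_ALPHA[0] + TRUE_ALPHA[1] * drev + TRUE_ALPHA[2] * ppe, ta, drev, ppe)
    for ta, drev, ppe in history
]
alpha_hat = fit_jones_ols(estimation_rows)

# Event year: normal accruals of -0.2 plus 4.4 of simulated earnings management.
event_row = (4.2, 220.0, 18.0, 140.0)
da = discretionary_accruals(alpha_hat, event_row)
print(alpha_hat, da)
```

With real data the fit is never exact, and the short firm specific series discussed in this paper make the OLS estimates noisy, which is the motivation for considering FLR as an alternative estimator.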
Jones (1991) used 13 years of data for estimating the regression coefficients whereas subsequent studies have typically settled for a minimum of 10 years (e.g. Dechow et al., 1995; Thomas & Zhang, 2000). A firm specific time series this long might lead to several problems. First, issues with survivorship bias are likely to arise (Bartov, Gul, & Tsui, 2000; Peasnell et al., 2000; Young, 1999). Second, a large number of firms do not have data stretching back ten years, reducing the size of the data set. Third, it is unlikely that the regression model coefficients remain stationary over a long period of time (Peasnell et al., 2000). Finally, the self-reversing nature of accruals can lead to problems with serial correlation of residuals (Peasnell et al., 2000).

To remedy some of these problems and shortcomings, DeFond and Jiambalvo (1994) suggested a cross-sectional approach where the regression coefficients are industry and year specific rather than firm specific. This approach does, however, also have its shortcomings. First, the industry membership is usually defined at a 2-digit SIC level. As a result, some industries might not have a sufficient number of observations to run the regression, which reduces the size of the data set. Second, the assumption is that the accrual generating process is similar among the firms in the same industry. There is, however, evidence suggesting that this assumption does not necessarily hold (Ecker, Francis, & Olsson, 2011; Höglund, 2013; Kothari, Leone, & Wasley, 2005). For example, Ecker et al. (2011) showed that the performance of the Jones model improved when running the regression with firms matched on lagged total assets rather than on industry membership.

3. Fuzzy linear regression

The concept of fuzzy linear regression (FLR) builds on the fuzzy set theory developed by Zadeh (1965) and it was first introduced by Tanaka, Uejima, and Asai (1982).
The idea behind FLR is that the deviations between the observed and estimated values are assumed to originate from imprecise observations or vague relations between the model variables. This is contrary to OLS regression where random errors are assumed to be the reason for differences between observed and estimated values. Thus, the uncertainty in FLR models is fuzziness rather than randomness (Yang & Lee, 2002).

3.1. Estimating fuzzy linear regression coefficients

In OLS regression the objective is to minimize the sum of squared residuals. In FLR, on the other hand, the objective is to minimize the fuzziness of the model. The general form of the FLR model is:

$$
\hat{Y}_i = \tilde{A}_0 + \tilde{A}_1 x_{1i} + \cdots + \tilde{A}_j x_{ji}
$$

where $x_1, \ldots, x_j$ denote the independent variables, $\tilde{A}_0$ the estimated fuzzy intercept coefficient, $\tilde{A}_1, \ldots, \tilde{A}_j$ the estimated fuzzy slope coefficients and $\hat{Y}_i$ the estimated fuzzy output. The fuzzy coefficients are generally represented as symmetric triangular fuzzy numbers where $\alpha_j$ equals the center value of $\tilde{A}_j$ and $c_j$ the spread (see Fig. 1). However, membership functions other than the symmetric triangular, such as asymmetric triangular and trapezoidal ones (Ishibuchi & Nii, 2001), are also used.

Before estimating the coefficients of the FLR, the fit between the FLR model and the data set has to be defined. This is done by setting the value of the h term, also called the target degree of belief (e.g. Chang & Ayyub, 2001), between 0 and 1. Each observed fuzzy $\tilde{Y}_i$ or crisp $Y_i$ output must fall within the estimated fuzzy output $\hat{Y}_i$ at h (see Fig. 2). As the value of h increases, the fuzziness of the FLR model also increases. The h value does not, however, affect the center value of the fuzzy coefficients (Tanaka & Watada, 1988).

The FLR coefficients $\tilde{A}_0, \ldots, \tilde{A}_j$ are estimated using linear programming with the objective of minimizing the total spread or fuzziness of the FLR model.
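The roles of the center value, the spread and the h term described above can be illustrated with a small Python sketch. It assumes a symmetric triangular membership function; the function names and the numbers are hypothetical and serve only as an illustration. The last function shows why a higher h leads to a fuzzier model: the interval of values with membership at least h shrinks by the factor (1 - h), so a larger spread is needed to keep the same observation inside it.

```python
from typing import Tuple


def membership(center: float, spread: float, value: float) -> float:
    """Membership degree: 1 at the center, falling linearly to 0 at center ± spread."""
    if spread == 0:
        return 1.0 if value == center else 0.0
    return max(0.0, 1.0 - abs(value - center) / spread)


def h_interval(center: float, spread: float, h: float) -> Tuple[float, float]:
    """Interval of values whose membership degree is at least h."""
    half_width = (1.0 - h) * spread
    return (center - half_width, center + half_width)


def min_spread_to_cover(center: float, observed: float, h: float) -> float:
    """Smallest spread that keeps `observed` inside the h-level interval."""
    if not 0.0 <= h < 1.0:
        raise ValueError("h must satisfy 0 <= h < 1")
    return abs(observed - center) / (1.0 - h)


# A fuzzy coefficient with center 2.0 and spread 0.5 (made-up values).
print(membership(2.0, 0.5, 2.25))
print(h_interval(2.0, 0.5, 0.5))

# The same observation needs a larger spread as h increases.
for h in (0.0, 0.5, 0.75):
    print(h, min_spread_to_cover(10.0, 12.0, h))
```

Note that the center does not appear in the spread calculation other than through the distance to the observation, which is in line with the statement that h affects the spreads but not the center values.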
The following objective function, where S denotes the total fuzziness of the regression model, m the number of independent variables and n the number of observations, is to be minimized:

$$
S = \sum_{j=0}^{m} c_j \sum_{i=1}^{n} |x_{ji}|
$$

The FLR linear programming problem comprises two sets of constraints. First, the spread $c_j$ of the fuzzy coefficients $\tilde{A}_j$ has to be zero or positive. Second, all observed output variables $\tilde{Y}_i$ must fall within the estimated fuzzy output variables $\hat{Y}_i$ at level h. $Y_i$ is the center

Fig. 1. Fuzzy coefficient with center value and spread. (Membership function of $\tilde{A}_j$: membership axis from 0.0 to 1.0, center $\alpha_j$, spread $c_j$ on either side.)
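As a rough illustration of the objective function and the two sets of constraints, the following Python sketch evaluates S and checks whether a candidate set of coefficients is admissible. It does not solve the linear program. Several assumptions are made here and are not taken from the paper: the observed outputs are crisp, the coefficients are symmetric triangular, the intercept is handled by setting $x_{0i} = 1$, and the spread of the estimated output for observation i is taken to be $\sum_j c_j |x_{ji}|$, scaled by (1 - h) at level h. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def with_intercept(row: Sequence[float]) -> List[float]:
    """Prepend x_0 = 1 so that index 0 refers to the intercept coefficient."""
    return [1.0, *row]


def total_fuzziness(spreads: Sequence[float],
                    observations: Sequence[Sequence[float]]) -> float:
    """Objective S: sum over j of c_j times the sum over i of |x_ji|."""
    design = [with_intercept(row) for row in observations]
    return sum(
        c_j * sum(abs(x_row[j]) for x_row in design)
        for j, c_j in enumerate(spreads)
    )


def satisfies_constraints(centers: Sequence[float],
                          spreads: Sequence[float],
                          observations: Sequence[Sequence[float]],
                          outputs: Sequence[float],
                          h: float) -> bool:
    """Check non-negative spreads and inclusion of every crisp output at level h."""
    if any(c_j < 0 for c_j in spreads):
        return False
    for row, y in zip(observations, outputs):
        x = with_intercept(row)
        center = sum(a_j * x_j for a_j, x_j in zip(centers, x))
        spread = sum(c_j * abs(x_j) for c_j, x_j in zip(spreads, x))
        if abs(y - center) > (1.0 - h) * spread:
            return False
    return True


# Made-up data: one independent variable, four observations.
observations = [[1.0], [2.0], [3.0], [4.0]]
outputs = [3.2, 4.9, 7.3, 8.8]
centers = [1.0, 2.0]   # candidate center values (intercept, slope)
narrow = [0.4, 0.0]    # candidate spreads, all fuzziness on the intercept
wide = [0.8, 0.0]

print(total_fuzziness(narrow, observations),
      satisfies_constraints(centers, narrow, observations, outputs, h=0.0))
print(total_fuzziness(wide, observations),
      satisfies_constraints(centers, wide, observations, outputs, h=0.5))
```

In an actual estimation a linear programming solver would search over the centers and spreads for the admissible combination with the smallest S; this sketch only shows what is being minimized and what counts as admissible.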