Fuzzy linear regression-based detection of earnings management



Henrik Höglund ⇑
Hanken School of Economics, Handelsesplanaden 2, 65101 Vasa, Finland

Article info

Keywords:
Earnings management
Discretionary accruals
Fuzzy linear regression

Abstract

A large number of accounting studies have examined the occurrence of earnings management in various
contexts. In most of these studies, the earnings management detection model is based on the linear
regression model suggested by Jones (1991). A considerable problem with the Jones model is the
requirement of long time series of financial statement data. An alternative to estimating the linear
regression model coefficients with ordinary least squares (OLS) is to use fuzzy linear regression (FLR)
instead. One of the main advantages of FLR described in the literature is its ability to handle small
data sets. The purpose of this study is to compare the performance of the OLS-based Jones model with
the performance of the FLR-based Jones model. The results show that the performance of both types of
models decreases when the length of the time series decreases and that there is no significant
difference in the estimated discretionary accruals between the models. The results also show that the
FLR-based Jones model outperforms the OLS-based Jones model in detecting simulated earnings management
when the estimation time series is short. Overall, the results show that the FLR-based Jones model is
a feasible alternative to the OLS-based Jones model, especially when the length of the estimation time
series is restricted by data availability.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

A large number of accounting studies have examined the occurrence of earnings management in various
contexts. There is, for example, evidence suggesting that firms manage earnings in order to avoid
violating debt covenants (DeFond & Jiambalvo, 1994) or to influence initial public offering (IPO)
valuations (Teoh, Welch, & Wong, 1998). A major challenge in earnings management studies is to find a
measure for how much firms have managed their earnings. A common assumption is that earnings are
managed through accounting accruals. Under this assumption, the measure of earnings management is the
unexpected part of a firm’s total accruals. Various models have been suggested for dividing the total
accruals into non-discretionary (expected) and discretionary (unexpected) accruals. Most of these
models are based on the earnings management detection model suggested by Jones (1991). The Jones model
is a linear regression model where the level of total accruals is assumed to be explained by property,
plant and equipment and the change in sales. In the original Jones model, the regression coefficients
are estimated using a firm specific time series comprising data prior to the event year. The expected
level of accruals is then calculated using the estimated coefficients with event year data.

A considerable problem with the Jones model is the requirement of long time series of financial
statement data. Typically, in studies where the Jones model is used the requirement is set to at least
ten years of data prior to the event year (e.g. Dechow, Sloan, & Sweeney, 1995; Thomas & Zhang, 2000).
This requirement might lead to several problems, such as survivorship bias and non-stationary
regression coefficients (Peasnell, Pope, & Young, 2000; Young, 1999). An alternative to estimating the
firm specific regression coefficients with ordinary least squares (OLS) regression is using fuzzy
linear regression (FLR), first introduced by Tanaka, Uejima, and Asai (1982). Contrary to probability
theory-based OLS regression, FLR is based on possibility theory and fuzzy set theory. With FLR the
objective is to minimize the fuzziness of the model rather than the sum of squared residuals. One of
the main advantages of FLR described in the literature is its ability to handle small data sets
(Shapiro, 2004). This was corroborated empirically by Kim, Moskowitz, and Koksalan (1996) who provided
evidence that FLR outperforms statistical linear regression in predictive capability with small data
sets. Thus, it is possible that the Jones model regression could be run with shorter time series using
FLR. If this is the case, an FLR-based Jones model could reduce several problems originating from the
requirement of long time series.

The purpose of this study is to compare the performance of the
OLS-based Jones model with the performance of the FLR-based Jones
model. The focus is on the time series version of the Jones model and
the comparison is made using different lengths of the time series.

The remainder of this study is organized as follows. The basic
operating principle of the linear regression-based accrual models
is covered in Section 2. In Section 3 the estimation of FLR coefficients and prediction with FLR
models are covered. The research design is presented in Section 4 and the results from the empirical
study are presented in Section 5. Section 6 concludes the study.

Expert Systems with Applications 40 (2013) 6166–6172
http://dx.doi.org/10.1016/j.eswa.2013.05.046
0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
⇑ Tel.: +358 (0)40 3521768. E-mail address: henrik.hoglund@hanken.fi

2. Time series-based discretionary accrual estimation models

During the past 30 years a large number of studies have examined the occurrence of earnings
management in various contexts. Typically, the assumption in these studies has been that deviations
from a normal or expected level of accruals constitute earnings management. In the early earnings
management detection models the expected level of accruals was defined as the average accruals over an
estimation period (Healy, 1985) or as the previous year accruals (DeAngelo, 1986). The shortcoming of
these models is that they do not consider changing circumstances of the firms. For example, a normal
increase in accruals as a result of an increase in sales would turn up as earnings management. Jones
(1991) relaxed the assumption that accruals remain stationary over time by suggesting a regression
approach where the level of total accruals (TACC) is explained by the change in sales (ΔREV) and
property, plant and equipment (PPE). The change in sales is assumed to explain the current accruals,
such as receivables, inventory and payables, whereas property, plant and equipment should mainly
explain the level of depreciation.

$$
\frac{TACC_t}{TA_{t-1}} = \alpha_0 \frac{1}{TA_{t-1}} + \alpha_1 \frac{\Delta REV_t}{TA_{t-1}} + \alpha_2 \frac{PPE_t}{TA_{t-1}}
$$

Typically, the variables in the Jones model are deflated with lagged total assets (TA) in order to
reduce heteroscedasticity. The coefficients α0, ..., α2 in the Jones model are estimated using a
sufficiently long firm specific time series. The coefficients are then used to calculate the expected
level, or the non-discretionary part, of total accruals using event period data. The difference
between the observed event period total accruals and the calculated non-discretionary accruals is
treated as unexpected, or discretionary, accruals and is taken as the measure of earnings management.
Jones (1991) used 13 years of data for estimating the regression coefficients whereas subsequent
studies have typically settled for a minimum of 10 years (e.g. Dechow et al., 1995; Thomas & Zhang,
2000). A firm specific time series this long might lead to several problems. First, issues with
survivorship bias are likely to arise (Bartov, Gul, & Tsui, 2000; Peasnell et al., 2000; Young, 1999).
Second, a large number of firms do not have data stretching back ten years, reducing the size of the
data set. Third, it is unlikely that the regression model coefficients remain stationary over a long
period of time (Peasnell et al., 2000). Finally, the self-reversing nature of accruals can lead to
problems with serial correlation of residuals (Peasnell et al., 2000).
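The estimation and prediction steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the normal-equation solver and all numbers are assumptions made for illustration. The synthetic estimation years are generated from assumed coefficients (5.0, 0.1, -0.05) with no noise, and the event year contains 30 units of simulated accrual manipulation.

```python
# Illustrative sketch only: firm specific Jones model fitted by OLS on synthetic data.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def jones_regressors(lagged_ta, delta_rev, ppe):
    """Regressors deflated by lagged total assets: 1/TA, dREV/TA, PPE/TA."""
    return [1.0 / lagged_ta, delta_rev / lagged_ta, ppe / lagged_ta]

def fit_jones_ols(rows):
    """rows: (tacc, lagged_ta, delta_rev, ppe) for each estimation year."""
    xs = [jones_regressors(ta, drev, ppe) for _, ta, drev, ppe in rows]
    ys = [tacc / ta for tacc, ta, _, _ in rows]
    # Normal equations (X'X) b = X'y; the model has no separate constant term.
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)

def discretionary_accruals(coefs, event_row):
    """Observed deflated accruals minus the expected (non-discretionary) part."""
    tacc, ta, drev, ppe = event_row
    expected = sum(c * x for c, x in zip(coefs, jones_regressors(ta, drev, ppe)))
    return tacc / ta - expected

# Synthetic estimation period: TACC = 5.0 + 0.1 * dREV - 0.05 * PPE (assumed values).
estimation_rows = [
    (-5.0, 1000.0, 100.0, 400.0),
    (-5.0, 1100.0, 50.0, 300.0),
    (-33.0, 1200.0, -30.0, 700.0),
    (-1.0, 1250.0, 200.0, 520.0),
    (-32.0, 1400.0, 80.0, 900.0),
]
# Event year: expected TACC is -15.0, observed is 15.0 (30 units of simulated management).
event_row = (15.0, 1500.0, 120.0, 640.0)

coefs = fit_jones_ols(estimation_rows)
da = discretionary_accruals(coefs, event_row)
print("estimated coefficients:", coefs)
print("discretionary accruals (deflated):", da)
```

With only five estimation years and three coefficients, the fit has very few degrees of freedom, which is the short-series setting that motivates the comparison with FLR.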

To remedy some of these problems and shortcomings, DeFond and Jiambalvo (1994) suggested a
cross-sectional approach where the regression coefficients are industry and year specific rather than
firm specific. This approach does, however, also have its shortcomings. First, the industry membership
is usually defined at a 2-digit SIC level. As a result, some industries might not have a sufficient
number of observations to run the regression, reducing the size of the data set. Second, the
assumption is that the accrual generating process is similar among the firms in the same industry.
There is, however, evidence suggesting that this assumption does not necessarily hold (Ecker, Francis,
& Olsson, 2011; Höglund, 2013; Kothari, Leone, & Wasley, 2005). For example, Ecker et al. (2011) showed
that the performance of the Jones model improved when running the regression with firms matched on
lagged total assets rather than on industry membership.

3. Fuzzy linear regression

The concept of fuzzy linear regression (FLR) builds on the fuzzy set theory developed by Zadeh (1965)
and it was first introduced by Tanaka, Uejima, and Asai (1982). The idea behind FLR is that the
deviations between the observed and estimated values are assumed to originate from imprecise
observations or vague relations between the model variables. This is contrary to OLS regression where
random errors are assumed to be the reason for differences between observed and estimated values.
Thus, the uncertainty in FLR models is fuzziness rather than randomness (Yang & Lee, 2002).

3.1. Estimating fuzzy linear regression coefficients

In OLS regression the objective is to minimize the sum of
squared residuals. In FLR, on the other hand, the objective is to
minimize the fuzziness of the model. The general form of the FLR
model is:

$$
\hat{Y}_i = \tilde{A}_0 + \tilde{A}_1 x_{1i} + \cdots + \tilde{A}_j x_{ji}
$$
where x1, ..., xj denote the independent variables, Ã0 the estimated fuzzy intercept coefficient,
Ã1, ..., Ãj the estimated fuzzy slope coefficients and Ŷi the estimated fuzzy output. The fuzzy
coefficients are generally represented as symmetric triangular fuzzy numbers where αj equals the
center value of Ãj and cj the spread (see Fig. 1). However, membership functions other than symmetric
triangular, such as asymmetric triangular and trapezoidal (Ishibuchi & Nii, 2001), are also used.

Before estimating the coefficients of the FLR, the fit between the FLR model and the data set has to
be defined. This is done by setting the value of the h term, also called the target degree of belief
(e.g. Chang & Ayyub, 2001), between 0 and 1. Each observed fuzzy Ỹi or crisp Yi output must fall
within the estimated fuzzy output Ŷi at h (see Fig. 2). As the value of h increases, the fuzziness of
the FLR model also increases. The h value does not, however, affect the center value of the fuzzy
coefficients (Tanaka & Watada, 1988).
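The effect of h can be illustrated with a short Python sketch. It assumes the usual symmetric triangular membership function, max(0, 1 - |y - center| / spread), and reads "falls within the estimated fuzzy output at h" as "has a membership of at least h". The function names and numbers are hypothetical and serve only as illustration.

```python
# Illustrative sketch only: symmetric triangular fuzzy number and the role of h.

def triangular_membership(y, center, spread):
    """Membership of y in a symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        return 1.0 if y == center else 0.0
    return max(0.0, 1.0 - abs(y - center) / spread)

def h_level_interval(center, spread, h):
    """Interval of values whose membership is at least h (0 <= h < 1)."""
    half_width = (1.0 - h) * spread
    return center - half_width, center + half_width

def minimum_spread(y, center, h):
    """Smallest spread for which y still has a membership of at least h (0 <= h < 1)."""
    return abs(y - center) / (1.0 - h)

# Hypothetical center of an estimated output and a crisp observed output.
center, observed = 0.10, 0.13
for h in (0.0, 0.5, 0.9):
    print("h =", h, "minimum spread =", minimum_spread(observed, center, h))
```

The required spread grows as h approaches 1 while the center is left unchanged, which matches the behaviour described above.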

The FLR coefficients Ã0, ..., Ãj are estimated using linear programming with the objective of
minimizing the total spread or fuzziness of the FLR model. The following objective function, where S
denotes the total fuzziness of the regression model, m the number of independent variables and n the
number of observations, is to be minimized:

$$
S = \sum_{j=0}^{m} c_j \sum_{i=1}^{n} |x_{ji}|
$$
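As an illustration of the objective function, the following Python sketch evaluates S for a candidate set of spreads and checks whether a candidate solution keeps every crisp observation inside the estimated fuzzy output at level h. It is a sketch under assumptions, not the linear program itself: it assumes symmetric triangular coefficients, crisp outputs and an intercept column x0i = 1, and all names and numbers are hypothetical.

```python
# Illustrative sketch only: evaluate the FLR objective and check a candidate solution.

def total_fuzziness(spreads, xs):
    """S = sum over j of c_j * sum over i of |x_ji|; each row of xs starts with x_0i = 1."""
    return sum(c * sum(abs(row[j]) for row in xs) for j, c in enumerate(spreads))

def is_feasible(centers, spreads, xs, ys, h, tol=1e-12):
    """True if all spreads are non-negative and every y_i lies within the
    estimated fuzzy output at level h (assumed to mean membership >= h)."""
    if any(c < 0 for c in spreads):
        return False
    for row, y in zip(xs, ys):
        center = sum(a * x for a, x in zip(centers, row))
        spread = sum(c * abs(x) for c, x in zip(spreads, row))
        if abs(y - center) > (1.0 - h) * spread + tol:
            return False
    return True

# Hypothetical data: intercept column plus one independent variable.
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [2.1, 3.9, 6.2]
centers = [0.0, 2.0]   # candidate center values
spreads = [0.5, 0.0]   # candidate spreads

print("S =", total_fuzziness(spreads, xs))
print("feasible at h = 0.5:", is_feasible(centers, spreads, xs, ys, 0.5))
print("feasible at h = 0.9:", is_feasible(centers, spreads, xs, ys, 0.9))
```

Because S is linear in the spreads and the inclusion conditions are linear in both centers and spreads, a standard linear programming solver could search over such candidates for the one with the smallest S.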

The FLR linear programming problem comprises two sets of constraints. First, the spread cj of the
fuzzy coefficients Ãj has to be zero or positive. Second, all observed output variables Ỹi must fall
within the estimated fuzzy output variables Ŷi at level h. Yi is the center

[Figure: membership (0.0 to 1.0) of the fuzzy coefficient Ãj, with center value αj and spread cj.]
Fig. 1. Fuzzy coefficient with center value and spread.
