Edinburgh Research Explorer 
 
 
Dynamic prediction of financial distress using Malmquist DEA

Citation for published version:
Li, Z, Crook, J & Andreeva, G 2017, 'Dynamic prediction of financial distress using Malmquist DEA', Expert
Systems with Applications, vol. 80, pp. 94-106. https://doi.org/10.1016/j.eswa.2017.03.017

Digital Object Identifier (DOI):
10.1016/j.eswa.2017.03.017

Link:
Link to publication record in Edinburgh Research Explorer

Document Version:
Peer reviewed version

Published In:
Expert Systems with Applications

https://www.research.ed.ac.uk/portal/en/publications/dynamic-prediction-of-financial-distress-using-malmquist-dea(c2c245a9-3a0a-4208-977f-5e5a721e4945).html



 
Dynamic Prediction of Financial Distress Using Malmquist DEA

 
Zhiyong Li (a, b, *), Jonathan Crook (b), Galina Andreeva (b)

a School of Finance, Southwestern University of Finance and Economics, Chengdu 611130, China 

b Credit Research Centre, University of Edinburgh Business School, Edinburgh EH8 9JS, UK 

 
Emails: zhiyong.li@ed.ac.uk (Z. Li), j.crook@ed.ac.uk (J. Crook), Galina.Andreeva@ed.ac.uk (G. 
Andreeva) 

*Corresponding author: Zhiyong Li 
Tel.: +86 185 0822 6040 
Address: Room 521, Gezhi Building, School of Finance, SWUFE, 555 Liutai Avenue, Wenjiang, 

Chengdu 611130, Sichuan, China. 
  


 
Abstract 

Creditors such as banks frequently use expert systems to support their decisions when issuing loans 

and credit assessment has been an important area of application of machine learning techniques for 

decades. In practice, banks are often required to provide the rationale behind their decisions in addition 

to being able to predict the performance of companies when assessing corporate applicants for loans. 

One solution is to use Data Envelopment Analysis (DEA) to evaluate multiple decision-making units 

(DMUs or companies) which are ranked according to the best practice in their industrial sector. A 

linear programming algorithm is employed to calculate corporate efficiency as a measure to distinguish

healthy companies from those in financial distress. This paper extends the cross-sectional DEA models 

to time-varying Malmquist DEA, since dynamic predictive models allow one to incorporate changes 

over time. This decision-support system can adjust the efficiency frontier intelligently over time and 

make robust predictions. Results based on a sample of 742 Chinese listed companies observed over 10 

years suggest that Malmquist DEA offers insights into the competitive position of a company in 

addition to accurate financial distress predictions based on the DEA efficiency measures.   

 
Keywords: Malmquist DEA; Bankruptcy prediction; Financial distress; Efficiency; Dynamic model 

 
Highlights: 

- This paper is the first to apply dynamic time-varying efficiency scores in distress prediction. 

- Malmquist DEA is used to produce dynamic efficiency scores over several periods. 

- Various efficiency scores are compared in discrete time hazard models. 

- A comparison is made between generic and industry-specific models. 

 

 
1. Introduction 

Decision-support systems are vital for creditors who need to distinguish between firms that will 

perform well and those that under-perform, and therefore may have difficulties in repaying their loans.  

Such systems use various techniques including traditional statistical models and expert systems or 

intelligent machine-learning algorithms to evaluate the creditworthiness of borrowers. Those

predictive methods are often regarded as Early Warning Systems to give early signals of potential 

business bankruptcy or financial distress. Numerous applications of machine learning algorithms

include Neural Networks (NN) by López Iturriaga & Sanz (2015), Genetic Algorithm (GA) by Gordini

(2014), Support Vector Machines (SVM) by Yang, You, & Ji (2011), and Ensemble models that 

combine several statistical and intelligent classifiers (Fedorova, Gilenko, & Dovzhenko, 2013; 

Marques, Garcia, & Sanchez, 2012). These studies compared the predictive accuracy of different

algorithms and examined statistically significant predictors of bankruptcy or distress. It has been 

shown that machine learning algorithms outperform statistical methods in terms of classification

accuracy because the objective of machine learning is to minimise misclassification errors so that an

optimal solution can be found.  

Yet predictive accuracy is not the only feature that lenders are interested in, since there is a need 

(and in many cases a regulatory requirement) to understand and explain the risk drivers or factors that 

affect the probability of financial distress. Many machine-learning techniques, e.g. neural networks,

cannot provide such an explanation in contrast to Data Envelopment Analysis (DEA), which is an 

optimisation algorithm based on linear programming. It allows one to find the efficiency frontier (or 

benchmark) so that relative efficiency of Decision Making Units (DMUs) can be measured by the 

distance to this frontier. Cielen, Peeters, & Vanhoof (2004) argued that DEA as a type of machine 

learning technique can provide insights into the value of a company’s efficiency for bankruptcy 

prediction. The idea was developed further by Xu & Wang (2009), Psillaki, Tsolas, & Margaritis 

(2010), Premachandra, Chen, & Watson (2011), Shetty, Pakkala, & Mallikarjunappa (2012) etc. These 

studies demonstrated that corporate efficiency measures can successfully distinguish between ‘good’ 

and ‘bad’ companies, while Min & Lee (2008) suggested using efficiency scores generated by DEA to 

predict bankruptcy directly as a practical approach. 

However, most studies on the significance of efficiency in estimating the probability of financial

distress used cross-sectional models that fail to capture temporal changes in efficiency, and yet internal 

and external conditions associated with company performance do change over time. So as Shumway 

(2001) argues, dynamic models, in contrast to cross-sectional or static models, are preferred in business 

failure prediction. Premachandra, Chen, & Watson (2011), Li, Crook, & Andreeva (2014), Wanke, 



 
Barros, & Faria (2015) and Wanke, Azad, & Barros (2016) all strongly suggested using dynamic 

efficiency models to predict the risk of bankruptcy or financial distress. Yet so far, to the best of our 

knowledge, no study has conducted an analysis of efficiency scores in dynamic prediction models. 

This paper fills this gap and is the first to develop a dynamic model integrated with Malmquist DEA 

and hazard models in order to predict financial distress, which can be easily extended to other 

bankruptcy or business failure prediction models within the scope of credit assessment. We explore 

the question of whether changes in the efficiency of companies over time really affect the chance that

they will suffer financial distress.  

Our paper adds to the literature in several important ways. First, we propose two-stage DEA-based

programming models as decision-support systems in identifying efficient (healthy) and inefficient

(distressed) companies in advance. Second, it enhances the bankruptcy prediction literature by 

providing a dynamic model that offers insights into the competitive position of a business, in addition 

to accurate distress predictions. Third, we address the methodological limitations of existing studies

(Xu & Wang, 2009; Yeh, Chi, & Hsu, 2010; Cielen, Peeters, & Vanhoof, 2004; Premachandra,

Bhabra, & Sueyoshi, 2009; Premachandra, Chen, & Watson, 2011), which either assume constant

returns to scale (CRS) or homogeneous production technology in their samples. Our estimation of 

corporate efficiency is more realistic given mixed industrial sectors.  

This paper builds on the work of Premachandra, Chen, & Watson (2011), Paradi, Asmild, & Simak 

(2004), Shetty, Pakkala, & Mallikarjunappa (2012) and Li, Crook, & Andreeva (2014), all of which  

employ cross-sectional analysis. Our analysis is based on a sample of 742 Chinese listed companies 

observed over 10 years, in total 5,490 company-years. Dynamic DEA efficiency scores calculated by 

the Malmquist Index are used to classify healthy and distressed companies and to predict the 

probability of distress as the firm progresses through its lifecycle. Observations are made about the 

predictive utility of several model specifications with different efficiency scores and assumptions that 

contribute to decision-support modelling for bankruptcy prediction. We find that computing the 

efficiency of a company relative to those across a broad range of industries gives more accurate 

predictions than computing the efficiency relative only to others in the same industry. We also find 

that if one does the latter, then comparing the efficiency of a company with the most efficient

companies at any time throughout the sample period is more accurate than using only technical or 

super efficiency. 

The rest of the paper is structured as follows. Section 2 reviews the literature on dynamic corporate 

credit risk models and dynamic DEA models, which is fundamental to our extensions of previous 

studies in both areas. Sections 3 and 4 introduce the methodology and the data used, respectively, with 

the descriptions of the sample and variables. The results of four comparative models that employ 



 
different types of efficiency scores are presented and discussed in Section 5, and Section 6 concludes 

the paper.  

 
2. Literature review 

2.1 Dynamic credit risk models 

Altman (1968) introduced statistical methods (Discriminant Analysis or DA) to corporate 

bankruptcy prediction before the age of machine learning techniques. Statistical methods estimate 

coefficients of financial ratios in a parametric format and were more popular compared to machine 

learning techniques given the limited computing power decades ago. Nevertheless, Shumway (2001) 

claimed that half of the financial ratios that were found to be successful in cross-sectional models 

turned out to be unrelated to bankruptcy probability in later periods.  

As a result Shumway (2001) proposed a hazard model that has advantages over cross-sectional 

models. First, hazard models incorporate the effect of time on the risk of an event occurring. Second, 

hazard models can also incorporate Time-Varying Covariates (TVCs) relating to an individual firm 

and to macroeconomic factors, the latter representing systemic effects. Third, hazard models can make 

better predictions by utilising data observed over several time periods. Fourth, hazard models can 

handle censoring: where an event occurs but is not observed in the observation time window. All of 

these advantages imply that dynamic or hazard models should be preferred in credit risk modelling.  

Unlike static models, dynamic models imply a time varying hazard rate and thus are more 

appropriate to make predictions. An event (e.g. default, bankruptcy or financial distress) can happen 

any time during the interval $[t, t + \Delta t]$ ($t$ is duration time) in Cox proportional hazard regression, and that

was applied by Bonfim (2009) to Portuguese firms over the period 1996 to 2002 to predict bankruptcy 

risk during different macroeconomic cycles.  

In corporate credit, the default event is usually defined to occur within a specific period of time, 

commonly one year (Carling, Jacobson, Linde, & Roszbach, 2007). Covariates are also observed only 

at given points of time when financial statements are disclosed. Therefore it is more appropriate to use 

a discrete time version rather than a continuous time hazard model. The discrete time hazard model 

(DHM) is equivalent to multi-period logistic regression in terms of computation but with an additional

term $h_0(t)$ as the baseline hazard function. Such a method was applied to predict corporate default

risk by Shumway (2001), Carling, Jacobson, Linde, & Roszbach (2007), Nam, Kim, Park, & Lee 

(2008), Wilson & Altanlar (2014) etc. 
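
The equivalence between the DHM and a multi-period logistic regression rests on arranging the data as one row per firm-year, with the event indicator equal to 1 only in the year in which the event occurs. A minimal sketch of that data layout in Python is given here. The firm histories, the function names and the use of the raw event frequency per period as a stand-in for the baseline hazard are illustrative assumptions and are not taken from the cited studies.

```python
from collections import defaultdict

def to_firm_years(histories):
    """Expand firm histories into one row per firm-year.

    histories maps a firm id to (last_observed_period, event_flag), where
    event_flag is True if the firm suffered the event in its last period and
    False if it was censored (still healthy when observation stopped).
    """
    rows = []
    for firm, (last_period, event) in histories.items():
        for t in range(1, last_period + 1):
            # The indicator is 1 only in the period in which the event occurs.
            d = 1 if (event and t == last_period) else 0
            rows.append({"firm": firm, "t": t, "d": d})
    return rows

def empirical_hazard(rows):
    """Share of firms at risk in period t that experience the event in t."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for row in rows:
        at_risk[row["t"]] += 1
        events[row["t"]] += row["d"]
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}

# Hypothetical sample: three firms followed for up to four periods.
histories = {"A": (4, False), "B": (2, True), "C": (3, True)}
rows = to_firm_years(histories)
hazard = empirical_hazard(rows)
```

A logistic regression fitted to such rows, with period effects and time-varying covariates as regressors, is what the multi-period logit formulation amounts to; censored firms simply contribute rows whose indicator is always 0.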

 

 
2.2 Dynamic DEA models 

If DEA efficiency is one of the covariates in dynamic models, there is an obvious need to evaluate 

it in multiple periods correspondingly. Whilst conventional (i.e. static) DEA models are constructed 

for a single period, many researchers and practitioners are interested in how efficiency changes over 

time. Specifically, if a DMU can be observed at different points of time, its change in efficiency over 

the periods can be informative for predicting future financial distress. One possible approach is to solve 

DEA problems period by period separately and build a panel dataset consisting of these efficiency

scores, as Bryan, Fernando, & Tripathy (2013) suggested. Yet it can be argued that methodologically

the scores in different periods are incomparable because DEA scores are based on the frontier formed 

by the peers in that period. That is, a relative efficiency of 0.5 in the second period may be no better 

than a relative efficiency of 0.3 in the first period since efficiency also depends on the frontier shift for 

the industry, for example a change in membership or in technology. 

A possible solution to this might be to still perform a static DEA analysis for each period separately 

but in the second stage using a standard regression method to estimate the change over time and then 

extend it to further periods. Emel, Oral, Reisman, & Yolalan (2003) and Min & Lee (2008) used this 

two-stage method to forecast DEA scores and hence bankruptcy. However, Cook & Seiford (2009) 

commented that this approach was unsatisfactory because it failed to capture the interaction of one 

period with another. 

Window DEA was introduced by Charnes, Clark, Cooper, & Golany (1984) to deal with the 

efficiency change in the sense of time series. The idea of Window DEA, similar to other window 

analyses, is to set up a fixed observation window, and to move it across the whole period. Finally the 

movements and stability of the results can be analysed across different panel subsets. However Cooper, 

Seiford, & Tone (2006) argued that a shortcoming of Window DEA was evident in the initial and last 

period where cases were less well evaluated.  
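
The moving-window construction can be illustrated with a few lines of Python. The number of periods and the window width used here are hypothetical, and the sketch only enumerates which periods fall into each window; the DEA problem solved within each window is not shown.

```python
def dea_windows(n_periods, width):
    """Return the list of period windows for a Window DEA analysis.

    Periods are numbered 1..n_periods; each window covers `width`
    consecutive periods and is moved forward one period at a time.
    """
    if not 1 <= width <= n_periods:
        raise ValueError("width must lie between 1 and n_periods")
    return [list(range(start, start + width))
            for start in range(1, n_periods - width + 2)]

def appearances(n_periods, width):
    """Count how many windows each period appears in."""
    counts = {t: 0 for t in range(1, n_periods + 1)}
    for window in dea_windows(n_periods, width):
        for t in window:
            counts[t] += 1
    return counts

windows = dea_windows(5, 3)
counts = appearances(5, 3)
```

With five periods and a width of three, the first and last periods each fall into a single window while the middle period falls into three, which is one way to read the remark that the initial and last periods are less well evaluated.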

The Malmquist DEA model is particularly suitable in dealing with panel data. The original idea of 

the Malmquist Index (MI) was to compare the production technology of two economies, so it is a 

bilateral index. Let $f(\mathbf{x})$ be the production function of an economy, where $\mathbf{x}$ is a vector of inputs such
as labour and capital. To calculate the MI between Economy $a$ and Economy $b$ of different
production functions, we can substitute $\mathbf{x}_a$ in $f_b(\cdot)$ and vice versa. So the MI is defined as

\[
\mathrm{MI} = \sqrt{\frac{f_a(\mathbf{x}_b)}{f_b(\mathbf{x}_b)} \cdot \frac{f_a(\mathbf{x}_a)}{f_b(\mathbf{x}_a)}}
= \sqrt{\frac{f_a(\mathbf{x}_a)\, f_a(\mathbf{x}_b)}{f_b(\mathbf{x}_a)\, f_b(\mathbf{x}_b)}} .
\tag{1}
\]
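
As a numerical illustration of the bilateral index, the sketch below reads equation (1) as the geometric mean of the ratios $f_a(\mathbf{x})/f_b(\mathbf{x})$ evaluated at the two input vectors. The Cobb-Douglas production functions, their parameters and the input values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def malmquist_index(f_a, f_b, x_a, x_b):
    """Geometric mean of the output ratios of technologies a and b,
    evaluated at the input bundles of both economies."""
    ratio_at_a = f_a(*x_a) / f_b(*x_a)
    ratio_at_b = f_a(*x_b) / f_b(*x_b)
    return math.sqrt(ratio_at_a * ratio_at_b)

# Hypothetical Cobb-Douglas technologies over (labour, capital).
def f_a(labour, capital):
    return 2.0 * labour ** 0.5 * capital ** 0.5

def f_b(labour, capital):
    return 1.0 * labour ** 0.5 * capital ** 0.5

x_a = (4.0, 9.0)   # inputs of economy a
x_b = (16.0, 1.0)  # inputs of economy b

mi = malmquist_index(f_a, f_b, x_a, x_b)
```

Because technology $a$ produces twice the output of technology $b$ at any input bundle in this example, the index equals 2, and swapping the roles of the two economies gives the reciprocal.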

Inspired by Caves, Christensen, & Diewert (1982) who introduced this index in productivity analysis, 

Färe, Grosskopf, Lindgren, & Roos (1992) and Färe, Grosskopf, Norris, & Zhang (1994) integrated 



 
the MI into DEA and developed a DEA-based Malmquist productivity index. The Malmquist 

productivity index evaluates the total factor productivity change of a DMU between two periods where 

a  and b  in equation (1) each relate to a period. It is defined as the product of efficiency change (catch-

up) and technological change (frontier-shift) where the catch-up effect describes how much closer a 

DMU gets to the most efficient production frontier, and the frontier-shift effect describes the 

technology improvement in the sample. The decomposed elements of the MI can determine how much 

of a relative efficiency increase from period $t$ to $t+1$ can be credited to individual effort and how

much to industry innovation. The efficiency change reflects the extent to which a DMU improves or 

worsens its efficiency, while technological change reflects the change of the efficiency frontiers 

between two periods. Since the introduction of the MI, there have been various studies of productivity

change over time in different fields, for example in Italian manufacturing firms (Costa, 2012), Spanish 

government tax offices (Fuentes & Lillo-Bañuls, 2015), Taiwanese banks (Shyu & Chiang, 2012) and 

Korean universities (Sohn & Kim, 2012). These studies looked into the efficiency changes over time 

to derive managerial implications and strategic recommendations and did not consider distress 

prediction. 

Table 1 Summary of corporate credit prediction literature

| Literature | Sample | Event | Method | Type of prediction |
|---|---|---|---|---|
| Altman (1968) | US Manufacturing firms | Bankruptcy | DA | Static |
| Hua, Wang, Xu, Zhang, & Liang (2007) | Chinese listed companies | Distress | SVM | Static |
| Sun, Jia, & Li (2011) | Chinese listed companies | Distress | Ensemble models | Static |
| Yang, You, & Ji (2011) | Polish firms | Bankruptcy | SVM | Static |
| Cao (2012) | Chinese listed companies | Distress | Choquet integral | Static |
| Fedorova, Gilenko, & Dovzhenko (2013) | Russian manufacturing firms | Bankruptcy | Ensemble models | Static |
| Gordini (2014) | Italian manufacturing firms | Bankruptcy | GA | Static |
| López Iturriaga & Sanz (2015) | US banks | Bankruptcy | NN | Static |
| Bonfim (2009) | Portuguese firms | Default | Proportional hazard model | Dynamic |
| Shumway (2001) | US firms | Bankruptcy | DHM | Dynamic |
| Chava & Jarrow (2004) | US firms | Bankruptcy | DHM | Dynamic |
| Carling, Jacobson, Linde, & Roszbach (2007) | Swedish firms | Default | DHM | Dynamic |
| Nam, Kim, Park, & Lee (2008) | Korean listed companies | Bankruptcy | DHM | Dynamic |
| Wilson & Altanlar (2014) | UK new firms | Bankruptcy | DHM | Dynamic |
| Emel, Oral, Reisman, & Yolalan (2003) | Turkish manufacturing firms | Bankruptcy | DEA, DA | Static |
| Cielen, Peeters, & Vanhoof (2004) | Belgian firms | Bankruptcy | DEA | Static |
| Paradi, Asmild, & Simak (2004) | US Manufacturing firms | Bankruptcy | DEA | Static |
| Min & Lee (2008) | Korean manufacturing firms | Bankruptcy | DEA | Static |
| Xu & Wang (2009) | Chinese listed companies | Distress | DEA+DA, Logit & SVM | Static |
| Psillaki, Tsolas, & Margaritis (2010) | French manufacturing firms | Bankruptcy | DEA+Logit | Static |
| Premachandra, Bhabra, & Sueyoshi (2009) | US firms | Bankruptcy | DEA | Static |
| Yeh, Chi, & Hsu (2010) | Taiwanese manufacturing firms | Bankruptcy | DEA+Rough Set Theory & SVM | Static |
| Premachandra, Chen, & Watson (2011) | US firms | Bankruptcy | DEA | Static |
| Shetty, Pakkala, & Mallikarjunappa (2012) | Indian firms | Bankruptcy | DEA | Static |
| Bryan, Fernando, & Tripathy (2013) | US firms | Bankruptcy | DEA+DA | Static |
| Li, Crook, & Andreeva (2014) | Chinese listed companies | Distress | DEA+Logit | Static |
| Paradi, Wilson, & Yang (2014) | US Non-manufacturing firms | Bankruptcy | DEA+DA | Static |
| Kingyens, Paradi, & Tam (2016) | US retail companies | Bankruptcy | DEA | Static |
| Yang & Dimitrov (2017) | US Non-manufacturing firms | Bankruptcy | DEA+SVM | Static |

 

 
Studies such as Emel, Oral, Reisman, & Yolalan (2003), Paradi, Asmild, & Simak (2004), Cielen, 

Peeters, & Vanhoof (2004), have been amongst the first to explore accuracy of bankruptcy predictions 

using DEA efficiency (see Table 1). It is worth extending it to a panel analysis as Malmquist DEA 

models proved capable of analysing change in performance over time. Unfortunately, we have only 

seen applications restricted to cross-sectional analysis, even in the latest studies such as Kingyens, 

Paradi, & Tam (2016) and Yang & Dimitrov (2017).  

Having understood the advantages of Malmquist DEA in dealing with panel data over other methods, 

this paper applies Malmquist DEA scores to dynamic prediction of financial distress by taking the time 

dimension into account, which bridges the literature of DEA applications and dynamic credit risk 

modelling. The details of the methodology are presented next. 

 
3. Methodology 

We compute the MI and dynamic efficiency scores first and then regress financial distress on the 

indicators of dynamic efficiency and other variables. Negative values are occasionally observed in 

financial data that can be used as inputs and outputs for DEA. Therefore, dealing with negative values 

becomes necessary. An appropriate model would be the input-oriented VRS Slack Based Model 

(SBM), given that only the outputs contain negative values (Cooper, Seiford, & Tone, 2006), as in our 

case. The Malmquist DEA model is based on these assumptions while other choices may not guarantee 

both translation and units invariance at the same time. 

 
3.1 Malmquist DEA 

In order to build the required panel dataset, it is necessary to calculate the efficiency scores for each 

company in each year of observation. Using Malmquist DEA we assume multiple inputs and outputs 

when DMUs are repeatedly observed on a certain interval basis. 

Caves, Christensen, & Diewert (1982) defined a distance function $D(\cdot)$ based on the Malmquist
productivity index (Malmquist, 1953) to calculate technical efficiency (TE). A company is efficient if
$D(\mathbf{x}, \mathbf{y}) = 1$. Let $\mathbf{x}_0^t$ denote a vector of inputs and $\mathbf{y}_0^t$ denote a vector of outputs for DMU$_0$, both at
period $t$. The relative efficiency of DMU$_0$ at period $t$, $D_0^t(\mathbf{x}_0^t, \mathbf{y}_0^t)$, is calculated as the amount by
which input $\mathbf{x}_0$ can be reduced while producing the given output level $\mathbf{y}_0$ compared to the most
efficient company on the frontier. Similarly, $D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$ is its efficiency score at period $t+1$. Thus,
with multiple periods, $D_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$ and $D_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})$ are actually efficiency scores using a set of
inputs and outputs in one period, $t+1$ and $t$ respectively, compared with the frontier of the other
period, $t$ and $t+1$ respectively.

Following the ideas of Farrell (1957) to decompose the total factor productivity into the efficiency
change (EC) and the technology change (TC), Färe, Grosskopf, Lindgren, & Roos (1992) defined the
input-oriented Malmquist productivity index (MI) to measure the productivity change of DMU$_0$
between period $t$ and $t+1$ as

\[
\begin{aligned}
\mathrm{MI}_0 &= \left[ \frac{D_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \cdot \frac{D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \right]^{1/2} \\
&= \frac{D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \left[ \frac{D_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})} \cdot \frac{D_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})}{D_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \right]^{1/2} .
\end{aligned}
\tag{2}
\]

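The first part is the relative change of efficiency from period $t$ to $t+1$. Hence they defined

\[
\mathrm{EC} = \frac{D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})}
\tag{3}
\]

and

\[
\mathrm{TC} = \left[ \frac{D_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{D_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})} \cdot \frac{D_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})}{D_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \right]^{1/2} .
\tag{4}
\]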
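
The decomposition can be checked numerically. The sketch below takes the four distance-function values for one DMU (own-period and cross-period scores) and returns MI, EC and TC, with EC the ratio of the two own-period scores and TC the geometric mean of the two frontier-shift ratios. The function name and the four scores are hypothetical and serve only to show that MI equals the product of EC and TC.

```python
import math

def malmquist_decomposition(d_t_t, d_t1_t1, d_t_t1, d_t1_t):
    """Return (MI, EC, TC) from four distance-function values.

    d_t_t   : D^t(x^t, y^t), period-t data against the period-t frontier
    d_t1_t1 : D^{t+1}(x^{t+1}, y^{t+1}), period-(t+1) data against its own frontier
    d_t_t1  : D^t(x^{t+1}, y^{t+1}), period-(t+1) data against the period-t frontier
    d_t1_t  : D^{t+1}(x^t, y^t), period-t data against the period-(t+1) frontier
    """
    ec = d_t1_t1 / d_t_t                                      # catch-up
    tc = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))     # frontier shift
    mi = math.sqrt((d_t_t1 / d_t_t) * (d_t1_t1 / d_t1_t))     # productivity change
    return mi, ec, tc

# Hypothetical scores for one DMU.
mi, ec, tc = malmquist_decomposition(d_t_t=0.8, d_t1_t1=0.9,
                                     d_t_t1=1.0, d_t1_t=0.75)
```

In this example EC is 1.125, so the DMU has moved closer to the frontier, and TC is also above 1, so part of the productivity gain comes from the frontier itself moving.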

Other than using a distance function $D(\cdot)$ to calculate efficiency, under the nonparametric

framework Färe, Grosskopf, Norris, & Zhang (1994) calculated the MI by an oriented radial DEA

model, while other DEA models are also suitable (Cooper, Seiford, & Tone, 2006).

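Let $x_{ij}^t$ $(i = 1, \ldots, m)$ and $y_{rj}^t$ $(r = 1, \ldots, q)$ denote the inputs and outputs for DMU$_j$ ($j = 1, \ldots, n$)
respectively at any given point of time $t$. The production possibility set of a VRS model is defined by
Cooper, Seiford, & Tone (2006) as

\[
(X^t, Y^t) = \Bigl\{ (\mathbf{x}, \mathbf{y}) \;\Big|\; \mathbf{x} \ge \sum_{j=1}^{n} \lambda_j \mathbf{x}_j^t,\; \mathbf{0} \le \mathbf{y} \le \sum_{j=1}^{n} \lambda_j \mathbf{y}_j^t,\; \mathbf{e}\boldsymbol{\lambda} = 1,\; \boldsymbol{\lambda} \ge \mathbf{0} \Bigr\} ,
\tag{5}
\]

where $\mathbf{e}$ is the row vector with all elements equal to one and $\boldsymbol{\lambda} \in R^n$ is the intensity vector. Let $\delta_0^t(\mathbf{x}_0^t, \mathbf{y}_0^t)$
be the optimal solution to the programming problem (6):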

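\[
\begin{aligned}
\delta_0^t(\mathbf{x}_0^t, \mathbf{y}_0^t) = \min \quad & 1 - \frac{1}{m} \sum_{i=1}^{m} \frac{s_i^-}{x_{i0}^t} \\
\text{s.t.} \quad & \mathbf{x}_0^t = X^t \boldsymbol{\lambda} + \mathbf{s}^- \\
& \mathbf{y}_0^t \le Y^t \boldsymbol{\lambda} \\
& \mathbf{e}\boldsymbol{\lambda} = 1 \\
& \boldsymbol{\lambda} \ge \mathbf{0},\; \mathbf{s}^- \ge \mathbf{0} ,
\end{aligned}
\tag{6}
\]

where $\mathbf{s}^-$ is a vector of slacks, $\boldsymbol{\lambda}$ is a non-negative vector and $\sum_{j=1}^{n} \lambda_j = 1$.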

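The reciprocal efficiency $\delta_0^t(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$ is the optimal solution of equation (7):

\[
\begin{aligned}
\delta_0^t(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1}) = \min \quad & 1 - \frac{1}{m} \sum_{i=1}^{m} \frac{s_i^-}{x_{i0}^{t+1}} \\
\text{s.t.} \quad & \mathbf{x}_0^{t+1} = X^t \boldsymbol{\lambda} + \mathbf{s}^- \\
& \mathbf{y}_0^{t+1} \le Y^t \boldsymbol{\lambda} \\
& \mathbf{e}\boldsymbol{\lambda} = 1 \\
& \boldsymbol{\lambda} \ge \mathbf{0},\; \mathbf{s}^- \ge \mathbf{0} .
\end{aligned}
\tag{7}
\]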

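By solving the linear programmes (6) and (7) four times for $\delta_0^t(\mathbf{x}_0^t, \mathbf{y}_0^t)$, $\delta_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$, $\delta_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$,
and $\delta_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})$, we have the MI, $M_0^{t \text{ to } t+1}$, as

\[
M_0^{t \text{ to } t+1} = \left[ \frac{\delta_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{\delta_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \cdot \frac{\delta_0^{t+1}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})}{\delta_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})} \right]^{1/2} .
\tag{8}
\]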

This study is interested in the relative efficiency of a DMU calculated by the interaction of periods.
Multiplying $\delta_0^t(\mathbf{x}_0^t, \mathbf{y}_0^t)$ by $M_0^{t \text{ to } t+1}$ will give the relative efficiency at period $t+1$ compared to period
$t$ for each DMU.

In the domain of Malmquist DEA, the reference set, namely period $t$ mentioned above, to which
the relative efficiency is compared, is of importance in our research. More specifically, the model
defined on two consecutive periods $t$ and $t+1$ can be seen as using adjacent periods as the reference set,
which was the original design in Färe, Grosskopf, Lindgren, & Roos (1992). Suppose there are five
periods $t = 1, \ldots, 5$. By running Malmquist DEA with adjacent references, one can only get relative
efficiency $\delta_1$, $\delta_2$ compared to period 1, $\delta_3$ compared to period 2 and so on. However, this gives no intuitive
measure of the relative efficiency of period 3 compared to period 1 or of period 4 compared to period 2.
Thus it is not possible to interpret the relative efficiency directly with adjacent moving references. A
solution is to use a fixed reference set as suggested by Berg, Førsund, & Jansen (1992). Therefore, in
this research, all relative efficiency is referred to the first period as the beginning of the observation.
Thus it is not period $t+1$ compared to period $t$ but period $t$ ($t \ge 2$) compared to period 1. In this way,
it is very likely that in later periods efficiency scores are larger than 1 as technology develops. It should
be noted that apart from scores in period 1, all other scores larger than 1 do not necessarily imply being
efficient in that period.
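
The effect of a fixed reference can be sketched as follows, under the assumption that the score for period t is obtained by scaling the period-1 efficiency by the Malmquist index between period 1 and period t. The function name and all numbers are hypothetical; the sketch only shows why scores above 1 can appear in later periods.

```python
def fixed_reference_scores(base_efficiency, mi_from_first):
    """Efficiency of one DMU in each period, expressed relative to period 1.

    base_efficiency : efficiency score of the DMU in period 1 (at most 1)
    mi_from_first   : dict mapping period t >= 2 to the Malmquist index
                      between period 1 and period t (hypothetical inputs)
    """
    scores = {1: base_efficiency}
    for t in sorted(mi_from_first):
        # Scale the period-1 score by the productivity change since period 1.
        scores[t] = base_efficiency * mi_from_first[t]
    return scores

scores = fixed_reference_scores(0.8, {2: 1.1, 3: 1.3, 4: 0.9, 5: 1.5})
above_one = [t for t, s in scores.items() if s > 1.0]
```

Here the DMU is inefficient in period 1 (0.8) yet scores above 1 in periods 3 and 5, because those scores are measured against the period-1 frontier rather than the frontier of their own period.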

 
3.2 Discrete hazard model with DEA scores 



 
Computing efficiency scores calculated by DEA in the first stage and inputting them into the second 

stage analysis where other classifying methods are involved is a popular approach. Examples can be 

found in Psillaki, Tsolas, & Margaritis (2010), Xu & Wang (2009), Yeh, Chi, & Hsu (2010), Li, Crook, 

& Andreeva (2014), etc. The advantage of doing this is that we can evaluate the marginal effects and 

the statistical significance of new variables conditional on other covariates such as financial ratios,

which have been shown to have predictive power in detecting potential bankruptcy risk.  

Dyson et al. (2001) and Li, Crook, & Andreeva (2014) have argued that homogeneity of DMUs in 

terms of technology is important both in the DEA modelling step and in the regression analysis. Thus 

industry diversity becomes a critical issue in the process. Although the homogeneity requirement, as 

Li, Crook, & Andreeva (2014) commented, may increase the complexity of modelling, it is consistent 

with the findings in corporate credit risk modelling that attention should be given to the differences

between industrial sectors (Bonfim, 2009; Chava & Jarrow, 2004).  

This research employs this two-stage modelling process and takes into account industrial differences

separately in both stages, i.e. the sectors are separated in the Malmquist DEA programming, and in the 

discrete hazard model with the sectors being represented by dummy variables. After Malmquist 

efficiency scores of multiple periods are obtained in the way given in the last section, they enter the 

discrete time hazard model with sector dummies written as 

\[
\operatorname{logit}\bigl(h_{d=1}(t)\bigr) = \alpha_0 h_0(t) + \sum_{s=1}^{S} D_s\, \boldsymbol{\beta}_{1,s}^{T} \mathbf{x}^{e}_{i,s,t-2} + \boldsymbol{\beta}_{2}^{T} \mathbf{x}^{r}_{i,t-2} ,
\tag{9}
\]

where $d = 1$ when a company suffers financial distress, 0 otherwise;

$h_0(t)$ is the baseline hazard function;

$\mathbf{x}^{e}_{i,s,t-2}$ is a vector of efficiency scores for sector $s$ company $i$ at time $t-2$, $t = 3, 4, \ldots, T_i$;

$\mathbf{x}^{r}_{i,t-2}$ is a vector of financial ratios for company $i$ at time $t = 3, 4, \ldots, T_i$;

$D_s = 1$ if company $i$ is a member of sector $s$, 0 otherwise, $s = 1, \ldots, S$;

$\alpha_0$ is the coefficient of the baseline hazard to be estimated;

$\boldsymbol{\beta}_{1,s}$ is a vector of parameters for efficiency scores for sector $s$ to be estimated;

$\boldsymbol{\beta}_{2}$ is a vector of parameters for financial ratios to be estimated.

 
MaxDEA Pro 6.1 is used to solve Malmquist DEA problems indicated in equations (6), (7) and (8).  

It can handle unbalanced panel datasets where some cases of late entry or early exit are censored. 

Combined with equation (9), the base model for financial distress prediction is proposed.  
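
To make the structure of the discrete time hazard model concrete, the sketch below evaluates a logit-linear predictor made up of a baseline hazard term, a sector-specific efficiency term selected by sector dummies, and a financial-ratio term, and converts it to a hazard. All coefficient values, sector names and covariate values are hypothetical; in the paper the parameters are estimated from the data.

```python
import math

def distress_hazard(sector, x_eff, x_ratio, alpha0, baseline,
                    beta_eff_by_sector, beta_ratio):
    """Hazard of distress implied by a logit-linear predictor.

    sector             : sector of the company (selects its efficiency coefficients)
    x_eff, x_ratio     : lagged efficiency scores and financial ratios
    alpha0, baseline   : coefficient and value of the baseline hazard function
    beta_eff_by_sector : dict mapping sector -> efficiency coefficients
    beta_ratio         : coefficients of the financial ratios
    """
    eta = alpha0 * baseline
    for s, beta_s in beta_eff_by_sector.items():
        dummy = 1 if s == sector else 0   # sector dummy D_s
        eta += dummy * sum(b * x for b, x in zip(beta_s, x_eff))
    eta += sum(b * x for b, x in zip(beta_ratio, x_ratio))
    return 1.0 / (1.0 + math.exp(-eta))   # inverse of the logit link

# Hypothetical coefficients and one hypothetical company.
betas = {"manufacturing": [-2.0], "retail": [-1.0]}
h_low_eff = distress_hazard("retail", [0.5], [1.0], alpha0=-1.5, baseline=2.0,
                            beta_eff_by_sector=betas, beta_ratio=[0.5])
h_high_eff = distress_hazard("retail", [1.0], [1.0], alpha0=-1.5, baseline=2.0,
                             beta_eff_by_sector=betas, beta_ratio=[0.5])
```

With a negative efficiency coefficient, as assumed here, the company with the higher efficiency score has the lower predicted hazard.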

 

 
3.3 Global reference 

Further to fixed reference, Pastor & Lovell (2005) introduced the idea of global reference. In some 

cases where efficient frontiers of different periods cross each other, a global reference set represents 

the best practices in all periods. For example, in Figure 1, there are four DMUs lying on each of two 

frontiers ABCD and EFGH. The DMU to be evaluated, unit N, could be referred to frontier ABCD, 

frontier EFGH or the most efficient units ever over the observation period AFBGH. It is acceptable 

that when the observation window is long enough, all DMUs at the current period are under the cover 

of the best historical DMUs, possibly including themselves. Thus the relative efficiency in this 

circumstance can be treated as absolute efficiency, if the sample is very large. The scores to global 

reference would be less than or equal to 1. In practice, when the model is built, it is the historic data 

prior to the current moment that is used in model training and the historic global reference of the past 

that is available. Therefore, the efficiency calculated by the global reference as an option is embedded 

into the comparative models.  

Figure 1 An example of global reference 
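
The difference between a contemporaneous and a global reference set can be illustrated with a deliberately simplified case: one input, one output and constant returns to scale, so that the frontier reduces to the best output/input ratio. This is not the VRS SBM model used in the paper, and the data and function name are hypothetical.

```python
def ratio_efficiency(unit, reference):
    """Output/input ratio of `unit` relative to the best ratio in `reference`.

    Each unit is an (input, output) pair. With one input, one output and
    constant returns to scale, the frontier is simply the best ratio.
    """
    best = max(y / x for x, y in reference)
    x0, y0 = unit
    return (y0 / x0) / best

# Hypothetical (input, output) data for the same three DMUs in two periods.
period_1 = [(2.0, 2.0), (4.0, 3.0), (5.0, 2.5)]
period_2 = [(2.0, 1.6), (4.0, 3.6), (5.0, 3.0)]
global_reference = period_1 + period_2

unit = period_2[1]
contemporaneous = ratio_efficiency(unit, period_2)        # period 2 only
global_score = ratio_efficiency(unit, global_reference)   # all periods pooled
all_global = [ratio_efficiency(u, global_reference) for u in global_reference]
```

The chosen unit is on the period-2 frontier (score 1) but scores 0.9 against the global reference, because a period-1 unit achieved a better ratio; every score measured against the pooled reference is at most 1.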

 
3.4 Super efficiency 

Cooper, Seiford, & Tone (2006) referred to the two intertemporal scores, $\delta_0^{t}(\mathbf{x}_0^{t+1}, \mathbf{y}_0^{t+1})$ and $\delta_0^{t+1}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t})$,
as the ‘exclusive schemes’. They explain that the exclusive scheme in solving intertemporal

programming treats the DMU in the period to be evaluated as having been removed from the evaluator 

group of the other period. This is mathematically equivalent to what is known as ‘super efficiency’ in 

DEA. Super efficiency is used as a solution to the problem that common DEA models do not provide 



 
an efficiency ranking for efficient units as their scores are all equal to 1 (Andersen & Petersen, 1993). 

The difference between a super efficiency model and standard models is that in super efficiency models 

the DMU to be evaluated is eliminated from the reference frontier, so its score can be greater than 1, 

as shown in Figure 2. Units A, B, C, D and E constitute the production possibility set. If unit E is to

be evaluated, its efficiency score is 1 as it is on the frontier AECD of standard DEA models; in super

efficiency models, the new frontier ABCD is employed. For another unit C, its new reference frontier is AED. In

this way, though units E and C are both efficient (score = 1) in standard models, a difference between

them can be observed by obtaining a new unbounded score greater than 1.

Figure 2 An example of super efficiency 
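The idea of removing the evaluated unit from its own reference set can be illustrated with a small sketch. It assumes two inputs per unit of a single output and uses a radial (Andersen & Petersen style) input-oriented score, which is simpler than the slack-based measure of Tone (2002) adopted in this paper. The coordinates are hypothetical, chosen only to mimic the layout described for Figure 2, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Inputs = Tuple[float, float]  # two inputs used per unit of output

# Hypothetical coordinates chosen to mimic the layout described for Figure 2.
UNITS: Dict[str, Inputs] = {
    "A": (1.0, 8.0), "B": (2.5, 4.5), "C": (4.0, 2.0),
    "D": (8.0, 1.0), "E": (2.0, 4.0),
}


def radial_score(target: Inputs, reference: List[Inputs]) -> float:
    """Smallest factor by which the target's inputs must be scaled so that
    some convex combination of reference units uses no more of either input."""
    a, b = target
    # Candidate 1: each reference unit on its own.
    best = min(max(p / a, q / b) for p, q in reference)
    # Candidate 2: blends of two reference units. In two dimensions the optimum
    # lies on a segment, at the weight w where both scaled inputs are equal.
    for (p1, q1), (p2, q2) in combinations(reference, 2):
        slope = (p1 - p2) / a - (q1 - q2) / b
        if slope == 0:
            continue
        w = (q2 / b - p2 / a) / slope
        if 0.0 <= w <= 1.0:
            best = min(best, (w * p1 + (1.0 - w) * p2) / a)
    return best


def standard_score(name: str) -> float:
    """The evaluated unit is part of its own reference set, so the score is at most 1."""
    return radial_score(UNITS[name], list(UNITS.values()))


def super_score(name: str) -> float:
    """The evaluated unit is excluded, so efficient units can score above 1."""
    others = [v for key, v in UNITS.items() if key != name]
    return radial_score(UNITS[name], others)


if __name__ == "__main__":
    for name in UNITS:
        print(name, round(standard_score(name), 4), round(super_score(name), 4))
```

With these coordinates, E and C both score 1 in the standard model but obtain different super efficiency scores, while the inefficient unit B keeps the same score in both variants.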

 
DEA as a frontier technique is arguably an outlier analysis. However, extreme outliers may change 

the local frontier sufficiently for other units referred to it to be incorrectly measured. In this 

circumstance, super efficiency can be used to identify outliers (Banker & Chang, 2006). In addition,

super efficiency scores offer more discriminatory power between efficient units, which is particularly

useful in classifying good and bad companies in credit risk models. This can be found in the model of 

Premachandra, Chen, & Watson (2011) who employed super efficiency scores to predict corporate 

failure. Our paper considers super efficiency in a model for comparison with the base model. 

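The Malmquist SBM DEA model with super efficiency is described by Tone (2002), where $\xi_i = s_i^{-}/x_{i0}^{t}$, as:

$$
\begin{aligned}
\delta_0^{t}(\mathbf{x}_0^{t}, \mathbf{y}_0^{t}) = \min_{\boldsymbol{\xi},\,\boldsymbol{\lambda}} \quad & 1 + \frac{1}{m}\sum_{i=1}^{m}\xi_i \\
\text{s.t.} \quad & (1+\xi_i)\,x_{i0}^{t} \ge \sum_{j=1,\; j\neq 0}^{n} \lambda_j x_{ij}^{t} \quad (i = 1,\ldots,m) \\
& \mathbf{y}_0^{t} \le Y^{t}\boldsymbol{\lambda} \\
& \mathbf{e}\boldsymbol{\lambda} = 1 \\
& \boldsymbol{\lambda} \ge \mathbf{0},\ \boldsymbol{\xi} \ge \mathbf{0}
\end{aligned}
\qquad (10)
$$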

3.5 Model specification 

This paper uses a two-stage analysis. In the first stage, DEA efficiency scores for each company at 

each period are calculated by DEA models defined in the previous sections. In the second stage, the 

proposed discrete hazard model incorporates efficiency scores as covariates in a panel dataset. Four 

models outlined in Figure 3 are to be compared with each other for the reasons given below. 

Figure 3 Model specification 

 
Model One, as introduced in Sections 3.1 and 3.2, uses Malmquist DEA scores calculated by

equations (6), (7) and (8) as covariates in the hazard model (equation (9)). The efficiency score is 

Technical Efficiency under the VRS assumption. Model One is the base model of this research and 

predicts the probability of financial distress in two years’ time given that the company has survived 

until the time of prediction.  

Model Two applies global reference as introduced in Section 3.3. In this model, the relative 

efficiency score of company $i$ in sector $s$ at period $t$ is calculated with reference to the most



 
productive companies in all possible periods in the same sector. It is of interest to investigate the 

predictive power of efficiency scores compared in a cross-period scenario.  

Model Three follows the super efficiency setting in Section 3.4. Super efficiency scores provide 

more discrimination for efficient companies so one may expect to see some improvement in predictive 

accuracy. 

Model Four is the simplest method in terms of DEA calculation and regression, but heterogeneous 

technologies are combined. In the first stage when calculating the efficiency scores, all industries are 

pooled together so the efficient frontier may be pushed outward as more units are considered. In the 

second stage, the term $\sum_{s=1}^{S} D_s \boldsymbol{\beta}_{1,s}^{T}\mathbf{x}^{e}_{i,s,t-2}$ in equation (9) is replaced by a simpler form $\boldsymbol{\beta}_{1}^{T}\mathbf{x}^{e}_{i,t-2}$ without the

sector dummies. This is in line with previous literature that pools heterogeneous samples for 

bankruptcy prediction. This approach is referred to as the ‘generic’ model. 
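The contrast between the sector-specific efficiency term and the generic one can be sketched as follows. This is a minimal illustration, assuming a logit link for the discrete hazard (as in Shumway, 2001) and a scalar efficiency score; the coefficient values, constant and function names are hypothetical and are not the estimates reported later in the paper.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only (not the paper's estimates).
BETA_BY_SECTOR: Dict[str, float] = {"1510": -2.5, "2010": -3.0, "4040": -3.2}
BETA_GENERIC = -5.0
ALPHA_LN_DURATION = 0.1
CONSTANT = -3.0


def logistic(z: float) -> float:
    """Map a linear predictor to a probability (assumed logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


def sector_specific_term(efficiency: float, sector: str,
                         beta_by_sector: Dict[str, float]) -> float:
    """sum_s D_s * beta_s * x: the dummy D_s equals 1 only for the company's own sector."""
    return sum((1.0 if s == sector else 0.0) * beta * efficiency
               for s, beta in beta_by_sector.items())


def generic_term(efficiency: float, beta: float = BETA_GENERIC) -> float:
    """Pooled model: one efficiency coefficient, no sector dummies."""
    return beta * efficiency


def distress_probability(duration_years: float, efficiency_term: float,
                         other_terms: float = 0.0) -> float:
    """Hazard at the given duration; other_terms stands in for the financial ratios."""
    z = (CONSTANT + ALPHA_LN_DURATION * math.log(duration_years)
         + efficiency_term + other_terms)
    return logistic(z)


if __name__ == "__main__":
    term = sector_specific_term(0.8, "2010", BETA_BY_SECTOR)
    print(round(distress_probability(7.0, term), 4))
    print(round(distress_probability(7.0, generic_term(0.8)), 4))
```

With a negative efficiency coefficient, a higher efficiency score lowers the predicted probability in both formulations; the only difference is whether the coefficient is allowed to vary by sector.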

The predictive accuracy of models is compared on the out-of-time test sample by standard measures 

used in predictive modelling (Lessmann, Baesens, Seow, & Thomas, 2015): AUC (the area under the

Receiver Operating Characteristic curve), KS (the Kolmogorov-Smirnov statistic), Type I error (a distressed

company that is wrongly classified as a non-distressed company) and Type II error (a non-distressed 

company that is wrongly classified as a distressed company). For the latter two measures the cut-off is 

set to the percentage of the distressed companies in the training set. 
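The four measures can be computed directly from predicted probabilities and observed outcomes. The sketch below is illustrative only: the scores and labels are made up, and the cut-off rule is one possible reading of the text, namely that the share of companies flagged as distressed (those with the highest predicted probabilities) equals a given distress rate.

```python
from typing import List, Tuple


def split_scores(scores: List[float], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Separate predicted probabilities of distressed (1) and healthy (0) cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a distressed case is scored above a healthy one (ties count half)."""
    pos, neg = split_scores(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores: List[float], labels: List[int]) -> float:
    """Largest gap between the empirical score distributions of the two groups."""
    pos, neg = split_scores(scores, labels)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(n <= t for n in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


def error_rates(scores: List[float], labels: List[int],
                flagged_share: float) -> Tuple[float, float]:
    """Flag the top `flagged_share` of cases as distressed (assumed cut-off rule)
    and return (Type I error, Type II error)."""
    n_flag = round(flagged_share * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flag])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    missed = sum(1 for i, y in enumerate(labels) if y == 1 and i not in flagged)
    false_alarms = sum(1 for i, y in enumerate(labels) if y == 0 and i in flagged)
    return missed / n_pos, false_alarms / n_neg


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.15, 0.1, 0.05, 0.04, 0.02]
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(auc(scores, labels), ks_statistic(scores, labels),
          error_rates(scores, labels, flagged_share=0.2))
```

Because distress is rare, a cut-off tied to the distress rate flags few companies, which tends to produce a high Type I error alongside a low Type II error.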

 
4. Data 

4.1 Sample description 

The data in this research is taken from the two Chinese stock exchanges, which by 2014 were listing 

over 2,500 Chinese companies. The Chinese government imposes ‘Special Treatment’ (ST) on listed

companies in financial distress, so ST is chosen as the official indicator of distress (marked as $d = 1$)

in this research. Predicting financial distress of Chinese listed companies indicated by ST is consistent 

with many previous studies using various machine learning techniques, for example, Hua, Wang, Xu, 

Zhang, & Liang (2007), Sun, Jia, & Li (2011) and Cao (2012). Since the number of employees is only 

available in annual reports after the year 2000, the observation period is between 2001 and 2010. After 

the initial filtering, 2,027 individual listed companies over the period 2001 to 2010, giving a total of 12,431

firm years, were left in the sample for analysis. Among them, there are 12,058 healthy firm years and

373 distressed firm years, giving a distress rate of approximately 3%.  



 
As industry classification is essential to this research, the starting point was to consider all industries.

Banking and insurance companies are excluded from the sample as their accounting conventions are 

different from those in other sectors. For some industries the ST numbers were very low, so we focus 

on three industries (a total of 742 individual companies) that accounted for nearly half of all distressed 

cases (49.87%). These are Raw Materials (sector code 1510), Industrial Equipment (sector code 2010) 

and Real Estate (sector code 4040). A sufficient number of ST companies is necessary for the following

two reasons. First, as the panel analysis covers ten years, the valid number of firm years falling in each 

period cannot be too small. Second, DEA models require that in each period the number of units is 

more than double the number of inputs and outputs (8 in our case) for good estimates (Dyson et al., 

2001). In the end, 5,490 firm years in these three industries were left in the sample. 
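The sample-size rule stated above can be checked mechanically. The sketch below applies the rule exactly as worded in the text (units must exceed twice the number of inputs plus outputs) to the yearly Real Estate totals reported in Table 2, the smallest of the three sectors; the function name is illustrative.

```python
from typing import Dict

N_INPUTS = 5   # employees, liabilities, costs, assets, share capital
N_OUTPUTS = 3  # profits, cash flow, sales

# Yearly number of Real Estate firm observations, as reported in Table 2.
REAL_ESTATE_UNITS: Dict[int, int] = {
    2001: 126, 2002: 124, 2003: 115, 2004: 106, 2005: 101,
    2006: 100, 2007: 88, 2008: 84, 2009: 82, 2010: 80,
}


def enough_units(n_units: int, n_inputs: int = N_INPUTS,
                 n_outputs: int = N_OUTPUTS) -> bool:
    """Rule of thumb as stated in the text: units > 2 * (inputs + outputs)."""
    return n_units > 2 * (n_inputs + n_outputs)


if __name__ == "__main__":
    for year, n in REAL_ESTATE_UNITS.items():
        print(year, n, enough_units(n))
```

Even the smallest cell (80 Real Estate companies in 2010) is well above the threshold of 16 implied by five inputs and three outputs.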

Table 2 indicates that the average distress rate across all years is 3.37% (185/5490 = 3.37%). The

average number of observations for each company in ten years is 7.4 (5490/742 ≈ 7.40). In the

years 2002, 2003 and 2006 there are significantly more companies suffering financial distress than in 

other years.  

Figure 4 shows the distress rate of the three sample industries in each period of observation. In most 

years, there are more distressed Real Estate companies than in the other two sectors. In later years 

(2008 to 2010) the distress rate is considerably lower than those during 2002-2003 and 2006-2007. 

The whole sample is split into a training set covering the eight years 2001-2008 and a test set that includes

companies with covariates measured over 2009 to 2010 and a distress/non-distress indicator measured

in 2011 and 2012. This provides an out-of-time sample and is consistent with Shumway (2001) and 

other studies using DHM (Nam, Kim, Park, & Lee, 2008; Wilson & Altanlar, 2014). 
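An out-of-time split of this kind is simple to express in code. The sketch below assumes firm-year records stored as dictionaries with a `year` key for the covariate year; the record layout and the helper name are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_out_of_time(records: List[Record],
                      last_training_year: int = 2008) -> Tuple[List[Record], List[Record]]:
    """Training set: covariate years up to and including `last_training_year`.
    Test set: all later covariate years, never seen during estimation."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test


if __name__ == "__main__":
    # Hypothetical firm-year records.
    records = [
        {"firm": "F1", "year": 2007}, {"firm": "F1", "year": 2008},
        {"firm": "F1", "year": 2009}, {"firm": "F2", "year": 2010},
    ]
    train, test = split_out_of_time(records)
    print(len(train), len(test))
```

Splitting by calendar year rather than at random keeps all test observations strictly later than the training observations, which is what makes the evaluation out-of-time.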

Table 2  Distributions of samples in three industries over 2001-2010 

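Number of companies (N): Raw Materials 328; Industrial Equipment 277; Real Estate 137; Total 742.

Columns: RM = Raw Materials, IE = Industrial Equipment, RE = Real Estate; 0 = non-distressed firm years, 1 = distressed firm years.

| Year | RM 0 | RM 1 | RM Total | IE 0 | IE 1 | IE Total | RE 0 | RE 1 | RE Total | All 0 | All 1 | All Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | 208 | 1 | 209 | 167 | 5 | 172 | 121 | 5 | 126 | 496 | 11 | 507 |
| 2002 | 217 | 8 | 225 | 176 | 6 | 182 | 115 | 9 | 124 | 508 | 23 | 531 |
| 2003 | 224 | 9 | 233 | 181 | 8 | 189 | 104 | 11 | 115 | 509 | 28 | 537 |
| 2004 | 234 | 6 | 240 | 193 | 7 | 200 | 101 | 5 | 106 | 528 | 18 | 546 |
| 2005 | 231 | 5 | 236 | 189 | 6 | 195 | 98 | 3 | 101 | 518 | 14 | 532 |
| 2006 | 237 | 7 | 244 | 189 | 13 | 202 | 86 | 14 | 100 | 512 | 34 | 546 |
| 2007 | 251 | 9 | 260 | 214 | 4 | 218 | 82 | 6 | 88 | 547 | 19 | 566 |
| 2008 | 273 | 5 | 278 | 221 | 5 | 226 | 82 | 2 | 84 | 576 | 12 | 588 |
| 2009 | 265 | 6 | 271 | 219 | 2 | 221 | 80 | 2 | 82 | 564 | 10 | 574 |
| 2010 | 254 | 10 | 264 | 214 | 5 | 219 | 79 | 1 | 80 | 547 | 16 | 563 |
| Total | 2394 | 66 | 2460 | 1963 | 61 | 2024 | 948 | 58 | 1006 | 5305 | 185 | 5490 |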

 

 
Figure 4 Distress rates of the three sectors over 2001-2010 

 
4.2 DEA inputs and outputs 

Variables for the MI are selected from physical or monetary items that are contained in standard 

annual reports. There are five inputs: number of employees, total liabilities, total costs, total assets, 

and share capital, and three outputs: total profits, total cash flow and total sales. The reason for keeping 

both total sales and total profits is that a large revenue does not necessarily imply a large profit. Having 

correlated variables in DEA does not lead to a problem because their weights can automatically adjust 

without a significant impact on the efficiency score (Dyson et al., 2001). On the contrary, Dyson et al. 

(2001) (p.249) argued ‘omission of a highly correlated variable can on occasion lead to significant

changes in efficiencies’. This argument may also be applied to the inclusion of both the number of 

employees and total costs as the latter covers labour costs. Therefore, simplistically, companies make 

use of resources (measured by total assets and share capital), hire people (measured by the number of 

employees), pay for labour and raw materials (measured by total costs), turn them into products and 

services, sell them for revenue (measured by total sales) and aim for large earnings (measured by total 

profits) and positive cash inflow (measured by total cash flow).  

The descriptive statistics of the covariates are reported in aggregate because Malmquist DEA models 

are estimated on the whole dataset (both training and test samples). For convenience of presentation, 

only graphs of means over time are presented in Figure 5. 



 
Figure 5 Descriptions of DEA variables over 2001-2010 (panels: Number of employees; Debts (mCNY); Costs (mCNY); Assets (mCNY); Capitals (mCNY); Profits (mCNY); Cash (mCNY); Sales (mCNY))

  
Generally, the size of listed companies (in terms of total assets) increased over the ten years under 

study. Their total debts, total costs and share capital had similar growth rates, as did total

sales on the output side. However, there were some changes that did not follow the trend. For example, the

number of employees in sector 2010 (Industrial Equipment) nearly doubled in the three years 2006-2008.

There was a noticeably large drop in profit for sector 1510 (Raw Materials) in years 2008 and

2009, which might be due to the influence of the global financial crisis. Additionally, there was a sharp

increase in net cash inflow in sector 4040 (Real Estate) in 2009. These large changes highlight the importance of

running DEA peer comparison analyses separately for each industry so that the relative efficiency

scores are not biased.

 
4.3 Duration time and variables  

The sample of this study consists of listed companies so following Shumway (2001) we choose the 

stock trading age to be the duration time in the hazard model, because companies met the same 

requirements to be listed on an exchange. The average trading age in the sample is 7.79 years. 

Following experimentation, the baseline function of the duration time, $\ln(t)$, proved to be a good fit in

the models, as in Shumway (2001).  

The indicator of financial distress, ST, is a status indicator where a company can go to ST and 

recover from it. Here, only the first occurrence of ST is regarded as the event of distress and the 

information after that is ignored. All companies at the time of entering the observation window in 2001 

are healthy companies (not in the status of ST). So the model predicts the probability of a company 

going into financial distress (ST) for the first time in the next two years, conditional on lagged values 

of covariates, given the duration time since the company was listed on the stock exchange.  
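The treatment of the ST indicator described above (first occurrence only, later information ignored, covariates lagged by two years) can be sketched as a small data-preparation step. The input layout, the `lead` parameter and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_hazard_rows(st_by_year: Dict[int, int], lead: int = 2) -> List[Tuple[int, int]]:
    """Turn one company's yearly ST status (0/1) into (covariate_year, event) rows.

    Covariates observed in year y are paired with the status in year y + lead.
    The event is 1 only at the first occurrence of ST; all later years are dropped.
    """
    years = sorted(st_by_year)
    first_st = next((y for y in years if st_by_year[y] == 1), None)
    rows: List[Tuple[int, int]] = []
    for y in years:
        target = y + lead
        if target not in st_by_year:
            continue  # outcome year not observed for this company
        if first_st is not None and target > first_st:
            break  # information after the first ST event is ignored
        rows.append((y, 1 if target == first_st else 0))
    return rows


if __name__ == "__main__":
    # Hypothetical company that enters ST in 2004, recovers, and re-enters in 2007.
    status = {2001: 0, 2002: 0, 2003: 0, 2004: 1, 2005: 1, 2006: 0, 2007: 1}
    print(build_hazard_rows(status))
```

For the hypothetical company shown, only the 2001 and 2002 covariate years are kept, and the second ST spell in 2007 does not generate a further event.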

Six categories of financial ratios are considered in the regression model: profitability, liabilities and 

liquidity, capital and asset composition, cash flow, operation and growth rate. In the preliminary

analysis of group mean difference tests and collinearity, one ratio from each category is selected to 

represent that aspect of a company’s financial position. These six ratios are Return on Equity, Current 

Liabilities / Total Liabilities, Tangible Assets / Total Assets, Cash Flow from Operation per Share, 

Total Assets Turnover and Total Assets Growth.  

  
5. Results 

5.1 Dynamic DEA score 

 

 
Table 3  Description of efficiency scores 

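| Sector | Distress | Stats | Technical Efficiency | Global Efficiency | Super Efficiency | Generic Efficiency |
|---|---|---|---|---|---|---|
| Raw Materials | No | N | 2394 | 2394 | 2394 | 2394 |
| | | Mean | 1.225 | 0.653 | 1.234 | 0.918 |
| | | SD | 0.772 | 0.167 | 0.788 | 0.315 |
| | | Min | 0.029 | 0.021 | 0.029 | 0.028 |
| | | Max | 8.638 | 1 | 8.638 | 5 |
| | Yes | N | 66 | 66 | 66 | 66 |
| | | Mean | 0.756 | 0.464 | 0.756 | 0.67 |
| | | SD | 0.487 | 0.207 | 0.487 | 0.313 |
| | | Min | 0.04 | 0.026 | 0.04 | 0.035 |
| | | Max | 3.458 | 0.929 | 3.458 | 2.234 |
| | Mean dif. btw groups | F | 17.484 | 53.593 | 21.914 | 30.198 |
| | | p | 0.000 | 0.000 | 0.000 | 0.000 |
| Industrial Equipment | No | N | 1963 | 1963 | 1963 | 1963 |
| | | Mean | 1.033 | 0.457 | 1.059 | 0.936 |
| | | SD | 0.53 | 0.18 | 0.603 | 0.405 |
| | | Min | 0.07 | 0.032 | 0.07 | 0.062 |
| | | Max | 7.378 | 1 | 7.378 | 6.178 |
| | Yes | N | 61 | 61 | 61 | 61 |
| | | Mean | 0.702 | 0.307 | 0.702 | 0.664 |
| | | SD | 0.329 | 0.193 | 0.329 | 0.293 |
| | | Min | 0.004 | 0.002 | 0.004 | 0.003 |
| | | Max | 1.93 | 1 | 1.93 | 1.821 |
| | Mean dif. btw groups | F | 4.970 | 13.121 | 25.262 | 28.014 |
| | | p | 0.026 | 0.000 | 0.000 | 0.000 |
| Real Estate | No | N | 948 | 948 | 948 | 948 |
| | | Mean | 0.972 | 0.702 | 1.017 | 0.929 |
| | | SD | 0.569 | 0.196 | 0.666 | 0.503 |
| | | Min | 0.059 | 0.049 | 0.059 | 0.059 |
| | | Max | 6.277 | 1 | 8.204 | 5.931 |
| | Yes | N | 58 | 58 | 58 | 58 |
| | | Mean | 0.66 | 0.518 | 0.667 | 0.628 |
| | | SD | 0.381 | 0.26 | 0.389 | 0.35 |
| | | Min | 0.024 | 0.021 | 0.024 | 0.023 |
| | | Max | 1.805 | 1 | 1.805 | 1.603 |
| | Mean dif. btw groups | F | 3.683 | 21.919 | 14.439 | 22.350 |
| | | p | 0.050 | 0.000 | 0.000 | 0.000 |
| Mean dif. btw sectors | | F | 4.598 | 104.354 | 3.192 | 0.187 |
| | | p | 0.010 | 0.000 | 0.041 | 0.829 |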

 
Figure 6 Distributions of efficiency scores over 2001-2010 (panels: Technical Efficiency; Global Efficiency; Super Efficiency; Generic Efficiency)

  
In order to see how efficiency and technology change over time, we plot graphs of mean efficiency 

scores across all periods in Figure 6. For convenience, graphs for three industrial sectors are drawn on one chart. Generally the

efficiency and technology levels increased while, in the later years, there were some declines, 

presumably due to the influence of the financial crisis in 2008. 

 
5.2 Regression results 

Six ratios, together with efficiency scores, are integrated in Models One to Four as shown in Figure

3. The results are presented in Table 4. The $\chi^2$ tests indicate that all four models explain significant

amounts of variation in the probability of distress. The coefficient for each type of efficiency has a 

negative sign, which indicates that the more efficient a company is, the less likely it is to go into 

financial distress. The value of the coefficient on each type of efficiency differs between the models 

because their mean values and distributions are different. For Models One to Three, in which the three

industries are treated separately, the efficiency parameters differ between sectors. All efficiency parameters are

significant at the 5% level. The parameters of the financial ratios in Table 4 have the expected signs

and all are statistically significant. 

Table 4  Model results  

| Covariates | Model One | Model Two | Model Three | Model Four |
|---|---|---|---|---|
| ln(duration) | -0.042 | -0.101 | 0.219 | 0.194 |
| Technical Efficiency (1510) | -2.646** | | | |
| Technical Efficiency (2010) | -2.968** | | | |
| Technical Efficiency (4040) | -3.056** | | | |
| Global Efficiency (1510) | | -4.762** | | |
| Global Efficiency (2010) | | -7.630** | | |
| Global Efficiency (4040) | | -3.865** | | |
| Super Efficiency (1510) | | | -2.565** | |
| Super Efficiency (2010) | | | -2.886** | |
| Super Efficiency (4040) | | | -2.941** | |
| Generic Efficiency | | | | -5.485** |
| Return on Equity | -9.084** | -9.386** | -9.105** | -8.656** |
| Current Liabilities / Total Liabilities | 6.776** | 6.670** | 6.809** | 6.567** |
| Tangible Assets / Total Assets | -1.463** | -1.553** | -1.508** | -1.555** |
| Cash Flow from Operation per Share | -0.971** | -0.890** | -0.964** | -0.872** |
| Total Assets Turnover | -1.551** | -1.136** | -1.562** | -0.546** |
| Total Assets Growth | -2.695** | -2.595** | -2.777** | -2.185** |
| Constant | -4.721** | -4.499** | -4.703** | -2.984** |
| Log likelihood | -453.36 | -448.03 | -451.59 | -437.53 |
| Number of observations | 4017 | 4017 | 4017 | 4017 |
| LR $\chi^2$ | 380.4 | 391.04 | 383.94 | 412.06 |
| Prob > $\chi^2$ | 0 | 0 | 0 | 0 |
| Pseudo R2 | 0.2955 | 0.3038 | 0.2934 | 0.3201 |

** indicates that the coefficient is significant at the 5% level of significance.
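As a rough consistency check on the fit statistics in Table 4, the log likelihood, the likelihood-ratio statistic and the Pseudo R2 can be related to each other. The worked example below assumes that the reported Pseudo R2 is McFadden's measure (the text does not name the measure), with $\ell$ the fitted log likelihood and $\ell_0$ the log likelihood of the constant-only model, and uses the Model One figures:

$$
\mathrm{LR} = 2(\ell - \ell_0) \;\Rightarrow\; \ell_0 = \ell - \tfrac{1}{2}\mathrm{LR} = -453.36 - 190.20 = -643.56,
\qquad
R^2_{\text{pseudo}} = 1 - \frac{\ell}{\ell_0} = 1 - \frac{453.36}{643.56} \approx 0.2955 .
$$

Under that assumption the Model One entries are mutually consistent.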

 
5.3 Predictive accuracy 

Type I error occurs when a distressed company is wrongly classified as a healthy company while 

Type II error occurs when a healthy company is wrongly classified as a distressed company. The results 

of Models One to Four in Table 5 show much larger Type I errors in the test set than those in the 

training set. However, the opposite is the case with Type II errors, which is attributed to the lower 



 
distress rate in the later years, as the classifications are based on the cut-off which is measured by the 

percentage of distressed companies in the training set.  

The AUC and KS statistics measure relative rank ordering of predicted probabilities of distress for 

healthy and distressed companies, with higher values corresponding to better models (Lessmann, 

Baesens, Seow, & Thomas, 2015). Values in Table 5 indicate very similar accuracy of rank orderings 

between all the models. The overall predictive accuracy is around 95%, which is higher than what is 

found in the cross sectional logit model combined with DEA efficiency in Xu & Wang (2009) (overall 

accuracy 91%) and in Li, Crook, & Andreeva (2014)  (overall accuracy 93%). 

Table 5  Predictive accuracy 

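| Model | Training AUC | Training KS | Training Type I error | Training Type II error | Test AUC | Test KS | Test Type I error | Test Type II error |
|---|---|---|---|---|---|---|---|---|
| Model One | 0.881 | 0.622 | 58.28% | 2.28% | 0.861 | 0.629 | 88.89% | 1.45% |
| Model Two | *0.883* | 0.631 | *56.95%* | *2.22%* | *0.866* | *0.655* | *77.78%* | *1.27%* |
| Model Three | 0.882 | *0.632* | 57.62% | 2.25% | 0.860 | 0.625 | 83.33% | 1.36% |
| Model Four | **0.898** | **0.661** | **56.29%** | **2.20%** | **0.880** | **0.670** | **72.22%** | **1.18%** |

Figures in bold indicate the best performance across all models while the next best is marked in italics.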

 
Model Four, which disregards industrial classification, seems to do consistently better than the other 

models. This indicates that by relaxing the assumption of homogeneity between DMUs of DEA and 

pooling all industries together before calculating DEA scores and probabilities of default in hazard 

models, one can in practice make more accurate predictions of future corporate distress

than by distinguishing between industrial sectors. These results are similar to the findings in former

studies in consumer credit of Banasik, Crook, & Thomas (1996) and Bijak & Thomas (2012), where

segmentation did not produce the expected effect because of the sample size.

Model Two is the next best model in terms of AUC and KS on the test set in Table 5. Model Two 

with global reference has a larger AUC (0.866) than those of Models One and Three (0.861 and 0.860) in the out-of-time

predictions. The Super Efficiency model (Model Three) was slightly less accurate than other models. 

This suggests that greater discrimination between the most efficient companies is unnecessary for the 

prediction of financial distress. 

 
6. Conclusion 

    One of the aims of expert systems and machine-learning algorithms is to provide analytical support

for business decisions based on intelligent data analysis. We present one way of providing such support

which offers the benefits of insights into relative performance of companies and captures its change

over time. We contribute to previous research in the area of DEA in expert systems (Min & Lee, 2008;

Shetty, Pakkala, & Mallikarjunappa, 2012; Xu & Wang, 2009) by introducing the time-varying

component; by comparing industry-specific models against the generic models; and by investigating

the potential effect on predictive accuracy of different efficiency measures.

Dynamic models have inherent advantages over static models in the context of event prediction 

because conditions and behaviours change over time, so predictions need to be adjusted by 

incorporating as much information as possible. In this paper, our financial distress hazard models are 

enhanced by dynamic DEA scores which provide insights into the efficiency of a company relative to 

others over time. In the domain of DEA, Malmquist DEA is the only one that catches temporal changes 

of DMUs so it allows efficiency to be compared in both cross-sectional and time series formats. Other 

DEA algorithms such as standard DEA, Network DEA or Window DEA do not quite fit the bankruptcy 

prediction paradigm. A Malmquist productivity index is defined as the product of efficiency change 

(catch-up) and technological change (frontier-shift) and is calculated by the standard DEA scores in 

two periods and two inter-temporal scores with reference to the efficiency frontier of the other period.

A weakness of Malmquist DEA is that it is computationally intensive and cannot handle very large 

datasets. Nevertheless, corporate loan portfolios are often relatively small in sample size as compared 

to retail credit portfolios, such as mortgages.  

No previous research has attempted a panel analysis of DEA efficiency in predicting the probability

of financial distress. This paper has bridged this gap by calculating dynamic relative efficiency scores 

using Malmquist DEA and incorporating them as covariates in hazard models. Our models therefore 

provide time-varying information about the probability of distress and use panel rather than cross-

sectional estimators. We find all efficiency scores are negatively associated with the probability of 

financial distress. These results confirm the findings from previous literature that the more efficient a 

company is, the less likely it is to encounter financial difficulties.  

We have experimented with several types of efficiency measures that offer insights into relative 

performance of companies over time. This offers the possibility for lenders to understand risk drivers 

of a company’s financial distress and how they vary over time. Our findings imply that the highest 

predictive accuracy is achieved when pooling the industries together and using generic efficiency for 

prediction. This implies that lenders can achieve their goals of accurate predictions and interpretable 

results without the need for segmented models and component efficiency scores. 

Pooling all industries together rather than carrying out a DEA analysis for each industry separately 

to calculate DEA scores may be practically effective in detecting financial distress, since the best 

predicting model in our analysis is the Generic Efficiency one. However, the second best is Global 

Efficiency Malmquist DEA with only slightly less accurate predictions than the generic scores. The 



 
Global efficiency takes all historic records into account and chooses the most efficient company years 

as the reference units. This implies that when the sample is sufficiently large, global efficiency can be 

seen as absolute efficiency which is generalised in all units and periods. Therefore, if one is concerned 

with the essential assumption of homogeneity of technology for DEA, the Global Efficiency model 

that keeps this assumption and at the same time produces accurate predictions is the one to choose.  

Although this paper only employs data from three sectors in the empirical analysis, it can obviously 

be extended to a large variety of industries as long as their production technologies are similar, 

otherwise they should be dealt with carefully. Our method provides a feasible solution to accommodate 

differences and commonalities in the business operations of companies. From the comparisons with 

cross-sectional peers and time series histories, managers of companies will be able to identify the 

weaknesses in their businesses and improve their performance to avoid financial difficulties.  

Another useful application of the current paper would be to extend the combination of Malmquist 

DEA and the discrete hazard model to a single sector such as financial institutions. The failure of banks 

was noticeable during the subprime crisis, and would have great impact on the real economy of all 

countries. Giving dynamic early warning of their failure by inclusion of the time dimension will be of 

interest in the scope of bankruptcy prediction. Finally, incorporation of DEA scores into

machine-learning predictive algorithms can be another fruitful avenue of investigation.

 
References 

Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. 

Journal of Finance, 23, 589-609. 
Andersen, P., & Petersen, N. C. (1993). A procedure for ranking efficient units in data envelopment

analysis. Management Science, 39, 1261-1264.
Banasik, J. L., Crook, J. N., & Thomas, L. C. (1996). Does scoring a subpopulation make a difference. 

The International Review of Retail, Distribution and Consumer Research, 6, 180-195. 

Banker, R. D., & Chang, H. (2006). The super-efficiency procedure for outlier identification, not for 
ranking efficient units. European Journal of Operational Research, 175, 1311-1320. 

Berg, S. A., Førsund, F. R., & Jansen, E. S. (1992). Malmquist Indices of Productivity Growth during 
the Deregulation of Norwegian Banking, 1980-89. The Scandinavian Journal of Economics, 94, 
S211-S228. 

Bijak, K., & Thomas, L. C. (2012). Does segmentation always improve model performance in credit 
scoring? Expert Systems with Applications, 39, 2433-2442. 

Bonfim, D. (2009). Credit risk drivers: Evaluating the contribution of firm level information and of 
macroeconomic dynamics. Journal of Banking & Finance, 33, 281-299. 

Bryan, D., Fernando, G. D., & Tripathy, A. (2013). Bankruptcy risk, productivity and firm strategy. 

Review of Accounting and Finance, 12, 309-326. 
Cao, Y. (2012). MCELCCh-FDP: Financial distress prediction with classifier ensembles based on firm 

life cycle and Choquet integral. Expert Systems with Applications, 39, 7041-7049. 



 
Carling, K., Jacobson, T., Linde, J., & Roszbach, K. (2007). Corporate credit risk modeling and the 
macroeconomy. Journal of Banking & Finance, 31, 845-868. 

Caves, D. W., Christensen, L. R., & Diewert, W. E. (1982). The Economic Theory of Index Numbers 

and the Measurement of Input, Output, and Productivity. Econometrica, 50, 1393-1414. 
Charnes, A., Clark, C. T., Cooper, W. W., & Golany, B. (1984). A developmental study of data 

envelopment analysis in measuring the efficiency of maintenance units in the U.S. air forces. Annals 
of Operations Research, 2, 95-112. 

Chava, S., & Jarrow, R. A. (2004). Bankruptcy Prediction with Industry Effects. Review of Finance, 

8, 537-569. 
Cielen, A., Peeters, L., & Vanhoof, K. (2004). Bankruptcy prediction using a data envelopment

analysis. European Journal of Operational Research, 154, 526-532. 
Cook, W. D., & Seiford, L. M. (2009). Data envelopment analysis (DEA) – Thirty years on. European 

Journal of Operational Research, 192, 1-17. 

Cooper, W. W., Seiford, L. M., & Tone, K. (2006). Data Envelopment Analysis: A
Comprehensive Text with Models, Applications, References and DEA-Solver Software (2nd ed.).

Springer.
Costa, R. (2012). Assessing Intellectual Capital efficiency and productivity: An application to the 

Italian yacht manufacturing sector. Expert Systems with Applications, 39, 7255-7261. 

Dyson, R. G., Allen, R., Camanho, A. S., Podinovski, V. V., Sarrico, C. S., & Shale, E. A. (2001). 
Pitfalls and protocols in DEA. European Journal of Operational Research, 132, 245-259. 

Emel, A. B., Oral, M., Reisman, A., & Yolalan, R. (2003). A credit scoring approach for the 
commercial banking sector. Socio-Economic Planning Sciences, 37, 103-123. 

Färe, R., Grosskopf, S., Lindgren, B., & Roos, P. (1992). Productivity changes in Swedish pharmacies

1980–1989: A non-parametric Malmquist approach. Journal of Productivity Analysis, 3, 85-101. 
Färe, R., Grosskopf, S., Norris, M., & Zhang, Z. (1994). Productivity Growth, Technical Progress, and 

Efficiency Change in Industrialized Countries. The American Economic Review, 84, 66-83. 

Farrell, M. J. (1957). The Measurement of Productive Efficiency. Journal of the Royal Statistical 
Society. Series A (General), 120, 253-290. 

Fedorova, E., Gilenko, E., & Dovzhenko, S. (2013). Bankruptcy prediction for Russian companies: 
Application of combined classifiers. Expert Systems with Applications, 40, 7285-7293. 

Fuentes, R., & Lillo-Bañuls, A. (2015). Smoothed bootstrap Malmquist index based on DEA model to 

compute productivity of tax offices. Expert Systems with Applications, 42, 2442-2450. 
Gordini, N. (2014). A genetic algorithm approach for SMEs bankruptcy prediction: Empirical evidence 

from Italy. Expert Systems with Applications, 41, 6433-6445. 
Hua, Z., Wang, Y., Xu, X., Zhang, B., & Liang, L. (2007). Predicting corporate financial distress based 

on integration of support vector machine and logistic regression. Expert Systems with Applications, 

33, 434-440. 
Kingyens, A. T., Paradi, J. C., & Tam, F. (2016). Bankruptcy Prediction of Companies in the Retail-

Apparel Industry Using Data Envelopment Analysis. In J. Aparicio, C. A. K. Lovell & J. T. Pastor 
(Eds.), Advances in Efficiency and Productivity (pp. 299-329). Cham: Springer International
Publishing. 

Lessmann, S., Baesens, B., Seow, H.-V., & Thomas, L. C. (2015). Benchmarking state-of-the-art 
classification algorithms for credit scoring: An update of research. European Journal of Operational 

Research, 247, 124-136. 
Li, Z., Crook, J., & Andreeva, G. (2014). Chinese companies distress prediction: an application of data 

envelopment analysis. Journal of the Operational Research Society, 65, 466-479. 

López Iturriaga, F. J., & Sanz, I. P. (2015). Bankruptcy visualization and prediction using neural 
networks: A study of U.S. commercial banks. Expert Systems with Applications, 42, 2857-2869. 

Malmquist, S. (1953). Index numbers and indifference surfaces. Trabajos de Estadística, 4, 209-242.



 
Marques, A. I., Garcia, V., & Sanchez, J. S. (2012). Exploring the behaviour of base classifiers in 
credit scoring ensembles. Expert Systems with Applications, 39, 10244-10250. 

Min, J. H., & Lee, Y.-C. (2008). A practical approach to credit scoring. Expert Systems with 

Applications, 35, 1762-1770. 
Nam, C. W., Kim, T. S., Park, N. J., & Lee, H. K. (2008). Bankruptcy prediction using a discrete-time

duration model incorporating temporal and macroeconomic dependencies. Journal of Forecasting, 
27, 493-506. 

Paradi, J., Asmild, M., & Simak, P. (2004). Using DEA and Worst Practice DEA in Credit Risk 

Evaluation. Journal of Productivity Analysis, 21, 153-165. 
Paradi, J. C., Wilson, D. A., & Yang, X. (2014). Data Envelopment Analysis of Corporate Failure for 

Non-Manufacturing Firms Using a Slacks-Based Measure. Journal of Service Science and 
Management, 7(4), 14.

Pastor, J. T., & Lovell, C. A. K. (2005). A global Malmquist productivity index. Economics Letters, 
88, 266-271. 

Premachandra, I. M., Bhabra, G. S., & Sueyoshi, T. (2009). DEA as a tool for bankruptcy assessment: 
A comparative study with logistic regression technique. European Journal of Operational Research, 
193, 412-424. 

Premachandra, I. M., Chen, Y., & Watson, J. (2011). DEA as a tool for predicting corporate failure 
and success: A case of bankruptcy assessment. Omega, 39, 620-626. 

Psillaki, M., Tsolas, I. E., & Margaritis, D. (2010). Evaluation of credit risk based on firm performance. 
European Journal of Operational Research, 201, 873-881. 

Shetty, U., Pakkala, T. P. M., & Mallikarjunappa, T. (2012). A modified directional distance 
formulation of DEA to assess bankruptcy: An application to IT/ITES companies in India. Expert 
Systems with Applications, 39, 1988-1997. 

Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. Journal of 
Business, 74, 101-124. 

Shyu, J., & Chiang, T. (2012). Measuring the true managerial efficiency of bank branches in Taiwan: 
A three-stage DEA analysis. Expert Systems with Applications, 39, 11494-11502. 

Sohn, S. Y., & Kim, Y. (2012). DEA based multi-period evaluation system for research in academia. 
Expert Systems with Applications, 39, 8274-8278. 

Sun, J., Jia, M.-y., & Li, H. (2011). AdaBoost ensemble for financial distress prediction: An empirical 
comparison with data from Chinese listed companies. Expert Systems with Applications, 38, 
9305-9312. 

Tone, K. (2002). A slacks-based measure of super-efficiency in data envelopment analysis. European 
Journal of Operational Research, 143, 32-41. 

Wanke, P., Azad, M. A. K., & Barros, C. P. (2016). Financial distress and the Malaysian dual baking 
system: A dynamic slacks approach. Journal of Banking & Finance, 66, 1-18. 

Wanke, P., Barros, C. P., & Faria, J. R. (2015). Financial distress drivers in Brazilian banks: A dynamic 
slacks approach. European Journal of Operational Research, 240, 258-268. 

Wilson, N., & Altanlar, A. (2014). Company failure prediction with limited information: newly 
incorporated companies. Journal of the Operational Research Society, 65, 252-264. 

Xu, X., & Wang, Y. (2009). Financial failure prediction using efficiency as a predictor. Expert Systems 
with Applications, 36, 366-373. 

Yang, X., & Dimitrov, S. (2017). Data envelopment analysis may obfuscate corporate financial data: 
using support vector machine and data envelopment analysis to predict corporate failure for 
nonmanufacturing firms. INFOR: Information Systems and Operational Research, 1-17. 

Yang, Z., You, W., & Ji, G. (2011). Using partial least squares and support vector machines for 
bankruptcy prediction. Expert Systems with Applications, 38, 8336-8342. 

Yeh, C.-C., Chi, D.-J., & Hsu, M.-F. (2010). A hybrid approach of DEA, rough set and support vector 
machines for business failure prediction. Expert Systems with Applications, 37, 1535-1541. 

