A new dynamic modeling framework for credit risk assessment

Expert Systems With Applications 45 (2016) 341–351

Maria Rocha Sousa a,∗, João Gama a,b, Elísio Brandão a

a School of Economics and Management, University of Porto, Portugal
b Laboratory of Artificial Intelligence and Decision Support of the Institute for Systems and Computer Engineering, Technology and Science, Portugal

Keywords: Credit risk modeling; Credit scoring; Dynamic modeling; Temporal degradation; Default concept drift; Memory

Abstract: We propose a new dynamic modeling framework for credit risk assessment that extends the prevailing credit scoring models built upon historical data in static settings. The driving idea mimics the principle of films, composing the model from a sequence of snapshots rather than a single photograph. In doing so, dynamic modeling consists of sequential learning from the new incoming data. A key contribution is provided by the insight that different amounts of memory can be explored concurrently, where memory refers to the amount of historical data used for estimation. This is important in the credit risk area, which often undergoes shocks: during a shock, limited memory is important; at other times, a larger memory has merit. An application to a real-world financial dataset of credit cards from a financial institution in Brazil illustrates our methodology, which is able to consistently outperform the static modeling schema.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

In banking, credit risk assessment often relies on credit scoring models, so-called PD models (Probability of Default models).1 These models output a score that translates the probability of a given entity, a private individual or a company, becoming a defaulter in a future period. Nowadays, PD models are at the core of the banking business: in credit decision-making, in price settlement, and in determining the cost of capital. Moreover, central banks and international regulation have evolved dramatically toward a setting where the use of these models is favored, to achieve soundness standards for credit risk valuation in the banking system.

Since 2004, with the worldwide implementation of the regulations issued by the Basel Committee on Banking Supervision within the Basel II Accord, banks were encouraged to strengthen their internal model frameworks to reach the A-IRB (Advanced Internal Rating Based) accreditation (BCBS, 2006; BIS, 2004). To achieve this certification, banks had to demonstrate that they were capable of accurately evaluating their risks, complying with Basel II requirements, by using their internal risk model systems, and of keeping their soundness. Banks owning A-IRB accreditation gained an advantage over the others, because they were allowed to use lower coefficients to weight the exposure of credit at risk, the risk weighted assets, and so benefit from lower capital requirements.

∗ Corresponding author. Tel.: +351 967139811. E-mail addresses: 100427011@fep.up.pt, jsc@inescporto.pt (M.R. Sousa), jgama@fep.up.pt (J. Gama), ebrandao@fep.up.pt (E. Brandão).
1 Other names can be used to refer to PD models, namely: credit scoring, credit risk models, scorecards, credit scorecards, rating systems, or rating models, although some have different meanings.

http://dx.doi.org/10.1016/j.eswa.2015.09.055
A lot of improvements have been made in the existing rating frameworks, extending the use of data mining tools and artificial intelligence. Yet, this may have been bounded by a certain unwillingness to accept less intuitive algorithms or models going beyond the standard solutions implemented in the banking industry, settled in-house or delivered through analytics providers.

Developing and implementing a credit scoring model can be time and resource consuming, easily ranging from 9 to 18 months from data extraction until deployment. Hence, it is not rare that banks use unchanged credit scoring models for several years. Bearing in mind that models are built using a sample file frequently comprising 2 or more years of historical data, in the best case scenario the data used in the models are shifted 3 years away from the point at which they will be used. Should conditions remain unchanged, this would not significantly affect the accuracy of the models; otherwise, their performance can greatly deteriorate over time. The recent financial crisis confirmed that the financial environment fluctuates greatly, and in an unexpected manner, drawing renewed attention to models built upon time-frames that are by far outdated. By 2007–2008, many financial institutions were using stale credit scoring models built with historical data of the early decade. The degradation of stationary credit scoring models is an issue with empirical evidence in the literature (Avery, Calem, & Canner, 2004; Crook, Thomas, & Hamilton, 1992; Lucas, 2004; Sousa, Gama, & Gonçalves, 2013b); however, research is still lacking more realistic solutions.

Dominant approaches rely on static learning models.
However, as the economic conditions evolve in the economic cycle, either deteriorating or improving, so varies the behavior of an individual, and his ability to repay his debt. Furthermore, the default evolution echoes trends of the business cycle and, related with this, regulatory movements and interest rate fluctuations. In good times, banks and borrowers tend to be overoptimistic about the future, whilst in times of recession banks are swamped with defaulted loans and high provisions, and tightened capital buffers turn them highly conservative. The former leads to more liberal credit policies and lower credit standards; the latter promotes sudden credit cuts. Hence, default needs to be regarded as time-changing.

Traditional systems that are one-shot, fixed-memory-based, trained from fixed training sets in static settings are not prepared to process evolving data. And so, they are not able to continuously maintain an output model consistent with the actual state of the environment, or to quickly react to changes (Gama, 2010). These are some of the features of classic approaches that evidence the constraints of the existing credit scoring systems. As the processes underlying credit risk are not strictly stationary, consumers' behavior and default can change over time in unpredictable ways.
A few limitations of the existing approaches, idealized in the classical supervised classification paradigm, can be traced in the published literature:

• The static models usually fail to adapt when the population changes. Static and predefined sample settings often lead to an incomplete examination of the dynamics influencing the problem (Gama, 2010; Hand, 2006).
• Certain assumptions that are implicit to the methods often fail in real-world environments (Yang, 2007). These assumptions relate to:
– Representativeness – the standard credit scoring models rely on supervised classification methods that run on 2-year-old static samples, in order to determine which individuals are likely to default in a future fixed period, 1 year for PD models (Thomas, 2010; Thomas, Edelman, & Crook, 2002). Such samples are supposed to be representative of the potential borrowers of the future, the through-the-door population. They should also be sufficiently diverse to reflect different types of repayment behavior. However, a wide range of research is conducted on samples that are not representative.
– Stability and non-bias – the distribution from which the design points and the new points are drawn is the same; classes are perfectly defined, and definitions will not change. Not infrequently, there are selection biases over time. Simple examples of this occurrence can be observed when a bank launches a new product or promotes a brand new segment of customers. It can also occur when macroeconomics shifts abruptly from an expansion to a recession phase, or vice versa.
– Misclassification costs – these methods assume that the costs of misclassification are accurately known, but in practice they are not.
• The methods that are most widely used in the banking industry, logistic regression and discriminant analysis, are associated with some instability with high-dimensional data and small sample sizes.
Other limitations regard the intensive variable selection effort and the incapability of efficiently handling non-linear features (Yang, 2007).
• Static models are usually focused on assessing the specific risk of applicants and obligors. However, a complete picture can only be achieved by looking at the return alongside risk, which requires the use of dynamic rather than static models (Bellotti & Crook, 2013).

There is a new emphasis on running predictive models with the ability to sense themselves and learn adaptively (Gama, 2010). Advances in the concepts for knowledge discovery from data streams suggest alternative perspectives to identify, understand and efficiently manage the dynamics of behavior in consumer credit in changing, ubiquitous environments. In a world where events are not preordained and little is certain, what we do in the present affects how events unfold in unexpected ways. So far, no comprehensive set of research dealing with time-changing default has had much impact on practice. In credit risk assessment, a great deal of sophistication is needed to introduce economic factors and market conditions into current risk-assessment systems (Thomas, 2010).

The study presented in this paper is a large extension of previous research that delivered the winning model in the BRICS 2013 competition in data mining and finance (Sousa, Gama, Brandão et al., 2013a; Sousa et al., 2013b). This competition, open to academics and practitioners, was focused on the development of a credit risk assessment model, tilting between the robustness of a static modeling sample and the performance degradation over time, potentially caused by gradual market changes along a few years of business operation. Participants were encouraged to use any modeling technique, under a temporal degradation or concept drift perspective.
In the research attached to the winning model, Sousa, Gama, and Gonçalves (2013b) proposed a two-stage model for dealing with the temporal degradation of credit scoring models, which produced motivating results in a 1-year horizon. The winners first developed a credit scoring method using a set of supervised learning methods, and then calibrated the output based on a projection of the evolution of default. This adjustment considered both the evolution of default and the evolution of macroeconomic factors, echoing potential changes in the population of the model, in the economy, or in the market. In so doing, the resulting adjusted scores translated a combination of the customers' specific risk with systemic risk. The winning team (Sousa, Gama, & Gonçalves) concluded that the performance of the models did not significantly differ among classification models, like logistic regression (LR), AdaBoost, and Generalized Additive Models (GAM). However, after training on several window lengths, they observed that the model based on the longest window produced the best performing model over the long run, among all competitors. This finding allowed them to realize that some specifics of the credit portfolios and macroeconomic environments may remain quite stable along time. For those cases, a model built with a static learning setting may seem appropriate, if tested during stable phases. The questions yet to be answered were: in which conditions do credit risk models degrade? And, when so, is there any alternative modeling technique to the prevailing credit scoring models? The aim of this study is to reach a clearer understanding of which type of modeling framework allows a rapid adaptation to changes, and in which circumstances a static learning setting still delivers well-performing models.
With this in view, we implemented a dynamic modeling framework and two types of windows for model training, which enable testing our research questions: (a) In which conditions can a dynamic modeling outperform a static model? (b) Is the recent information more relevant to improve forecasting accuracy? (c) Does older information always improve forecasting accuracy?

This paper introduces a new dynamic modeling framework for credit risk assessment, imported from the emerging techniques of concept drift adaptation in streaming data mining and artificial intelligence. The proposed model is able to produce more robust predictions in stable conditions, but also in the presence of changes, while the prevailing methods cannot. This is a promising tool both for academics and practitioners because, unlike the traditional models, it has the ability to adjust the predictions in the presence of changes, like inversions in the economic cycles, major crises, or intrinsic behavioral circumstances (e.g. divorce, unemployment and financial distress). Besides the goal of enhancing the prediction of default in credit, the new modeling framework also enables developing a more comprehensive understanding of the evolution of credit rating systems over time and anticipating unexpected events. Furthermore, we study the implications for credit risk assessment of keeping a long-term memory, or forgetting older examples, which has not been done so far.

Few authors have explicitly tried a dynamic modeling framework in credit risk assessment, or connected concepts. Based on a national sample of a credit reporting agency, Avery et al.
(2004) show that traditional modeling often fails to consider situational circumstances, such as local economic conditions and individual trigger events, affecting the ability of scoring systems to accurately quantify individuals' credit risk. We can trace the few existing contributions in this arena over the most recent years. Sun and Li (2011) formally define financial distress concept drift and build a dynamic modeling based on instance selection. Saberi et al. (2013) worked on the concept of granularity for selecting the optimum size of the testing and training groups, with a sample of credit cards of a bank operating in Germany. Pavlidis, Tasoulis, Adams, and Hand (2012) proposed a methodology for the classification of credit applications with the potential of adapting to population drifts.

This paper follows in Section 2 with a brief description of the main settings and concepts of the supervised learning problem and score formulation. It also presents an overview of the methods typically used in supervised learning, and specifically in credit score modeling. In Section 3, we introduce the topic of concept drift in credit default and some adaptation methods that can be promising for dynamic modeling of credit risk. In Section 4 we present a case study, where we employ a set of these adaptation methods on a real-world financial dataset. First, we characterize the database and provide some intuition on the background of the problem. Then, we explain the methodology of this research. Section 5 provides the fundamental experimental results. Conclusions and future applications of the new dynamic modeling framework are traced in Section 6.

2. Settings and concepts

In this work we import some of the emerging techniques in concept drift adaptation into credit risk assessment models.
This is a field of research that has been receiving much attention in machine learning over the last decade, as an answer for suitably shaping models and processes to a reality that is ever-changing over contexts and time. The settings and definitions adopted in this paper replicate the general nomenclature surveyed by Gama, Žliobaitė, Bifet, Pechenizkiy, and Bouchachia (2014).

2.1. Supervised learning problem

Credit risk assessment can be addressed as a classification problem, a subset of supervised learning. The aim is to predict the default y ∈ {good, bad}, given a set of input characteristics x. The term attribute refers to each of the possible values that a characteristic can assume; the term bin denotes a set of attributes or an interval of values in a continuous characteristic; the term example, or record, is used to refer to one pair (x, y). Supervised learning classification methods try to determine a function that best separates the individuals of each of the classes, good and bad, in the space of the problem.

The model building is carried out on a set of training examples – the training set – collected from the past history of credit, for which both x and y are known. The best separation function can be achieved with a classification method. These methods include, among others, well-known classification algorithms such as decision trees (DT), support vector machines (SVM), artificial neural networks (ANN), and Generalized Additive Models (GAM). Hands-on software packages are available to the user, for example in R, SAS, Matlab, and Model Builder for Predictive Analytics. In credit scoring models, the accuracy of such functions is typically assessed on separate sets of known examples – validation or out-of-sample data sets. The idea behind this procedure is to mimic the accuracy of that function in future predictions of new examples where x is known, but y is not.
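The out-of-sample assessment just described can be made concrete. One common summary of a scorecard's discriminative power on a validation set is the area under the ROC curve (AUC), often reported in credit scoring as the Gini coefficient, Gini = 2·AUC − 1; the choice of this metric and the toy scores below are our illustration, not something prescribed by the text.

```python
import numpy as np

def auc(scores, y_bad):
    """AUC of a score that should rank goods above bads.

    Computed via the Mann-Whitney U statistic: the probability that a
    randomly chosen good receives a higher score than a randomly chosen
    bad (ties count one half).
    """
    scores = np.asarray(scores, dtype=float)
    y_bad = np.asarray(y_bad, dtype=bool)
    good, bad = scores[~y_bad], scores[y_bad]
    # Pairwise comparison; O(n_good * n_bad) memory is fine for a sketch.
    wins = (good[:, None] > bad[None, :]).sum()
    ties = (good[:, None] == bad[None, :]).sum()
    return (wins + 0.5 * ties) / (len(good) * len(bad))

# Toy out-of-sample check with synthetic, partially separated scores.
rng = np.random.default_rng(0)
s_good = rng.normal(1.0, 1.0, 500)   # goods tend to score higher
s_bad = rng.normal(-1.0, 1.0, 100)   # bads tend to score lower
scores = np.concatenate([s_good, s_bad])
y_bad = np.concatenate([np.zeros(500, bool), np.ones(100, bool)])
print(round(auc(scores, y_bad), 3))  # well above 0.5 for a useful scorecard
```

An AUC of 0.5 corresponds to random ranking; a perfect scorecard on the validation set would reach 1.0.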
According to Bayesian Decision Theory (Duda, Hart, & Stork, 2001), a classification can be described by the prior probabilities of the classes p(y) and the class conditional probability density function p(x|y) for the two classes, good (G) and bad (B). The classification decision is made according to the posterior probabilities of the two classes, which for class B can be represented as:

p(B|x) = p(x|B)p(B)/p(x)    (1)

where p(x) = p(B)p(x|B) + p(G)p(x|G). Here, it is assumed that the costs for misclassifying a bad customer are the same as for the opposite situation, the equal costs assumption. It is worth recalling that, in real-world financial environments, the costs of failing the prediction on a real bad are by far superior to failing on a real good. In the first case, there is essentially a loss of the exposure at default, the Loss Given Default (LGD), possibly mitigated with collateral. The second case affects the business, as it translates into a loss of margin. Sousa and da Costa (2008) show several possibilities to overcome this practical issue, by adapting the output of standard classification methods under the equal costs assumption to imbalanced misclassification costs, associated with the decision and prediction tasks. It is worth discussing the related issue of class imbalance in credit scoring datasets. Quite often, these datasets contain a much smaller number of observations in the class of defaulters than in that of the good payers (Brown & Mues, 2012; Marqués, García, & Sánchez, 2012a). A large class imbalance is therefore present, which some techniques may not be able to successfully handle. Baseline methods to handle class imbalance include oversampling the minority class or undersampling the majority class; SMOTE is an example of the former, Tomek links of the latter (Chawla, Bowyer, Hall, & Kegelmeyer, 2002).
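The unequal-costs point above can be stated as the standard minimum-expected-cost rule of Bayesian decision theory (a textbook result, not a method proposed in the paper): with a cost cG for rejecting a truly good applicant (lost margin) and a cost cB for accepting a truly bad one (credit loss), rejecting minimizes expected cost exactly when p(B|x) > cG/(cG + cB). A minimal sketch, with purely hypothetical cost figures:

```python
def bad_threshold(cost_reject_good, cost_accept_bad):
    """Posterior cutoff minimizing expected misclassification cost.

    Predict 'bad' (reject) iff
        p(B|x) * cost_accept_bad > p(G|x) * cost_reject_good,
    i.e. iff p(B|x) > cost_reject_good / (cost_reject_good + cost_accept_bad).
    """
    return cost_reject_good / (cost_reject_good + cost_accept_bad)

def decide(p_bad, cost_reject_good=1.0, cost_accept_bad=9.0):
    # Hypothetical 9:1 costs: accepting a bad is far more expensive.
    return "bad" if p_bad > bad_threshold(cost_reject_good, cost_accept_bad) else "good"

print(bad_threshold(1.0, 1.0))  # 0.5 -- equal costs recover the usual cutoff
print(bad_threshold(1.0, 9.0))  # 0.1 -- less evidence of default suffices to reject
print(decide(0.2))              # 'bad' under 9:1 costs, though below the 0.5 cutoff
```

The same logic underlies cost-sensitive wrappers such as MetaCost, which relabel or reweight so that an ordinary classifier ends up implementing this threshold.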
Another established approach to correct imbalance adopts a cost-sensitive classifier, with the misclassification cost of the minority class greater than that of the majority class. Within this approach, it is worth mentioning MetaCost, a general method for making classifiers cost-sensitive (Domingos, 1999). All these methodologies, implicitly or explicitly, optimize the decision process for a specific business objective. In other words, the optimization is made for a specific trade-off between the error committed in identifying someone as a defaulter when one is in fact a non-defaulter, and the opposite type of error of diagnosing someone as a non-defaulter when one is in fact a defaulter. This individualization is unconnected with our study, and any of these methods can be incorporated in the methodology under research.

2.2. Score formulation

A credit scoring model is a simplification of the reality. The output is a prediction of a given entity, actual or potential borrower, entering in default in a given future period. Having decided on the default concept, conventionally a borrower being in arrears for more than 90 days in the following 12 months, those matching the criteria are considered bad and the others are good. Other approaches may consider a third status, the indeterminate, between the good and the bad classes, e.g. 15 to 90 days overdue, for which it may be unclear whether the borrower should be assigned to one class or to the other. This status is usually removed from the modeling sample, although the model can be used to score those cases. For simplicity, in this paper we will consider the problem of two classes, although the proposed methodology can easily be adapted to the other case.

The output is a function of the input characteristics x, most commonly referred to as the score, s(x). We also consider that this function has a monotonic decreasing relationship with the probability of entering in default (i.e. reaching the bad status).
A robust scorecard enables an appropriate differentiation between the good and the bad classes. It is achieved by capturing an adequate set of information for predicting the probability of the default concept (i.e. belonging to the bad class), based on previously known default occurrences. The notation of such probability, Pr{bad | score based on X}, is:

p(B|s(x)) = p(B|s(x), x) = p(B|x), ∀x ∈ X    (2)

Since p(G|x) + p(B|x) = 1, the probability of the complementary class naturally follows:

p(G|s(x)) = p(G|x) = 1 − p(B|x), ∀x ∈ X    (3)

Among researchers and real-world applications, a usual written form of the score is the log odds score:

s(x) = ln [p(G|x)/p(B|x)], with p(G|x) + p(B|x) = 1.    (4)

In so saying, the score may vary from −∞, when p(G|x) = 0, to +∞, when p(G|x) = 1, i.e. s(x) ∈ R. The probability of the default event can be written in terms of the score:

p(B|x) = 1/(1 + e^{s(x)}), ∀x ∈ X

The most conventional way to produce a log odds score is based on logistic regression. However, other classification algorithms can also be used, adjusting the output to the scale of that function. In so saying, we assume that, independently of the method used to determine the best separation between the two classes, good and bad, the resulting scorecard has the same property as the log odds score. Although a grounded mathematical treatment may be tempting to tackle this problem, it goes beyond the scope of this work. Notwithstanding, we provide some intuitions on the technical material to survey. The basics of credit scoring and the most common approaches to build a scorecard are further detailed in the operational research literature (Anderson, 2007; Crook, Edelman, & Thomas, 2007; McNab & Wynn, 2000; Thomas, 2009; Thomas et al., 2002).
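Eq. (4) and the inverse relation p(B|x) = 1/(1 + e^{s(x)}) can be checked numerically; a minimal sketch (the function names are ours):

```python
import math

def log_odds_score(p_bad):
    """Eq. (4): s(x) = ln(p(G|x) / p(B|x)), with p(G|x) = 1 - p(B|x)."""
    return math.log((1.0 - p_bad) / p_bad)

def prob_bad(score):
    """Inverse relation from the text: p(B|x) = 1 / (1 + e^{s(x)})."""
    return 1.0 / (1.0 + math.exp(score))

# The relationship is monotonically decreasing: higher score, lower default risk.
for p in (0.5, 0.1, 0.01):
    s = log_odds_score(p)
    print(f"p(B|x)={p:>5} -> s(x)={s:+.3f} -> back to p(B|x)={prob_bad(s):.3f}")
```

For instance, p(B|x) = 0.5 maps to a score of 0, and p(B|x) = 0.1 to s(x) = ln 9 ≈ 2.197, round-tripping exactly through the two formulas.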
Recent advances in the area also deliver methods to build risk-based pricing models (Thomas, 2009) and methodologies towards the optimization of the profitability to the lenders (Einav, Jenkins, & Levin, 2013).

2.3. Supervised classification methods

The first approach to differentiate between groups took place in Fisher's original work (1936) for general classification problems of varieties of plants. The objective was to find the best separation between two groups, searching for the best combination of variables such that the groups were separated the most in the subspace. Durand (1941) brought this methodology to finance, for distinguishing between good and bad consumer loans.

Discriminant analysis was the first method used to develop credit scoring systems. Altman (1968) introduced it in the prediction of corporate bankruptcy. First applications in retail banking were mainly focused on credit granting in two categories of loans: consumer loans and commercial loans (for an early review and critique of the use of discriminant analysis in credit scoring see Eisenbeis (1978)). The boom of credit cards demanded the automation of the credit decision task and the use of better credit scoring systems, which became doable due to the growth of computing power. The value of credit scoring became noticed, and it was recognized as a much better predictor than any judgmental scheme. Logistic regression (Steenackers & Goovaerts, 1989) and linear programming (see Chen, Zhong, Liao, and Li, 2013 for a review) were introduced in credit scoring, and they turned out to be the most used in the financial industry (Anderson, 2007; Crook et al., 2007).
The use of artificial intelligence techniques imported from statistical learning theory, such as classification trees (Breiman, Friedman, Olshen, & Stone, 1984; Quinlan, 1986) and neural networks (Desai, Crook, & Overstreet Jr, 1996; Jensen, 1992; Malhotra & Malhotra, 2002; West, 2000), has arisen in credit scoring systems. The Support Vector Machine (SVM) is another method based on optimization and statistical learning that received increased attention over the last decade in finance research, either to build credit scoring systems for consumer finance or to predict bankruptcy (Li, Shiue, & Huang, 2006; Min & Lee, 2005; Wang, Wang, & Lai, 2005). Evolutionary computing (Marqués, García, & Sánchez, 2013), including genetic algorithms (Chen & Huang, 2003; Ong, Huang, & Tzeng, 2005) and ant colony optimization (Martens et al., 2007), was also considered for credit scoring. Regression and multivariate adaptive regression splines (Lee & Chen, 2005) and clustering (Wei, Yun-Zhong, & Ming-shu, 2014) techniques have also been tailored to the problem.

The choice of a learning algorithm is a difficult problem, and it is often based on which algorithms happen to be available, or best known to the user (Jain, Duin, & Mao, 2000). The number of learning algorithms is vast. Many frameworks, adaptations to real-life problems, and intertwinings of base algorithms were, and continue to be, proposed in the literature, ranging from statistical approaches to state-of-the-art machine learning algorithms, from parametric models to non-parametric procedures (Abdou & Pointon, 2011; Baesens et al., 2003).
As an alternative to using a single method, a trend that is still evolving relates to the use of hybrid systems (Hsieh, 2005; Lee, Chiu, Lu, & Chen, 2002), and ensembles of classifiers, whose outputs are combined by a predefined sequence or rule, or a voting scheme (Marqués, García, & Sánchez, 2012b; Wang, Hao, Ma, & Jiang, 2011).

New concepts for adapting to changes (Adams, Tasoulis, Anagnostopoulos, & Hand, 2010; Pavlidis et al., 2012; Sousa et al., 2013b; Yang, 2007) and modeling the dynamics (Crook & Bellotti, 2010; Saberi et al., 2013) in populations are starting to be exploited in credit risk assessment.

3. Dynamic modeling for credit default

3.1. Concept drift in credit default

Credit default is mostly a consequence of financial distress. A person, or a company, is in financial distress when experiencing individual financial constraints or being exposed to external disturbances. In private individuals, financial constraints may result from abrupt or intrinsic circumstances. In the first case, distress is usually an outcome of sorrowful events like unemployment, pay cuts, divorce, and disease. The second is most commonly related to overexposure, low assets, erratic behavior, or bad management performance. In this paper we tackle the phenomenon of concept drift in credit default, which we now briefly explain.

In the existing literature, concept drift is generally used to describe changes in the target concept, which are activated by transformations in the hidden context (Schlimmer & Granger Jr, 1986; Widmer & Kubat, 1996) in dynamically changing and non-stationary environments. As a result of these transformations, the target concept can shift suddenly, or just cause a change in the underlying data distribution of the model. This means that, with time, optimal features may drift significantly from their original configuration, or simply lose their ability to explain the target concept.
For example, a reduction of the minimum LTV (loan to value) tightens the space of possible values, which is noticed as a change in the distribution, and eventually in the credit default concept. When such drifts happen, the robustness of the model may significantly decrease, and in some situations it may no longer be acceptable.

Some authors distinguish real concept drift from virtual drift (Gama et al., 2014; Sun & Li, 2011; Tsymbal, 2004). The former refers to changes in the conditional distribution of the output (i.e., the target variable) given the input features, while the distribution of the input may remain unchanged. The latter refers to gradual changes in the underlying data distribution as new sample data flow in, whereas the target concept does not change (Sun & Li, 2011).

Real concept drift refers to changes in p(y|x), and it happens when the target concept of credit default evolves in time. Such changes can occur either with or without a change in p(x). This type of drift may happen directly as a result of new rules for defining the target classes, good or bad, as those settled by regulators when new criteria for default are demanded of the banks. Examples of these include the guidelines for the minimum number of days past due, or for the materiality threshold for the amount of credit in arrears, issued with the previous Basel II Accord. Another understanding of real concept drift in credit default is associated with indirect changes in the hidden context. In this case, credit default changes when evolving from one stage of delinquency to another.
For example, most of the people ith credit until five days past due tend to pay before the following nstallment, as most of them are just delayers. Yet, the part of debtors n arrears that also fail the next installment are most likely to be in nancial distress, possibly as a result of an abrupt or intrinsic circum- tance, and therefore they require more care from the bank. When rrears exceed three installments, the debtor is most certainly with erious financial constraints, and is likely to fail his credit obligations. ore extreme delays commonly translate into hard stages of credit efault, which require intensive tracking labor or legal actions. Virtual drifts happen when there are changes in the distribu- ion of the new sample data flowing without affecting the posterior robability of the target classes, p(y|x). With time, virtual drifts may ove to real concept drifts. Other interpretations can also be found n literature, for describing an incomplete representation of the data Widmer & Kubat, 1993), and changes in the data distribution lead- ng to changes in the decision boundary (Tsymbal, 2004). Accord- ng to some authors, other events can also be seen as virtual drifts, ike sampling shift (Salganicoff, 1997), temporary drifts (Lazarescu, enkatesh, & Bui, 2004), and feature change (Salganicoff, 1997). As n example of virtual drift, we might consider the credit decision- aking along the recent financial crisis. The lenders had to anticipate f a borrower would enter in default in the future (i.e. being bad). Al- hough the macroeconomic factors have worsened, employed people ith lower debt to income remained good for the lenders, and so they ontinued to have access to credit. Although we are mostly interested to track and detect changes in he real target concept, p(y|x), the methodology introduced in this aper attempts to cover both real concept and virtual drifts applied o the default concept drift detection and model rebuilding. .2. 
3.2. Methods for adaptation

Traditional methods for building a scorecard consider a static learning setting. In so doing, this task is based on learning from a predefined sample of past examples, with the resulting model then used to predict an actual or a potential borrower in the future. This is an offline learning procedure, because the whole training data set must be available when building the model. The model can only be used for predicting after the training is completed, and it is not re-trained alongside its utilization. In other words, once the best separation function is achieved for a set of examples of the past, it is not updated for a while, possibly for years, independently of the changes in the hidden context or in the surrounding environment. New perspectives on model building arise together with the possibility of learning online. The driving idea is to process new incoming data sequentially, so that the model may be continuously updated.

One of the most intuitive ideas for handling concept drift by instance selection is to keep rebuilding the model from a window that moves over the latest batches and to use the learned model for prediction on the immediate future. This idea assumes that the latest instances are the most relevant for prediction and that they contain the information of the current concept (Klinkenberg, 2004). A framework connected with this idea consists in collecting the new incoming data in sequential batches at predefined time intervals, e.g. year by year, month by month, or every day. The accumulation of these batches generates a panel data flow for dynamic modeling.

In finance, it remains unclear whether it is best to have a long memory or to forget old events. If, on the one hand, a long memory is desirable because it allows recalling a wide range of different occurrences, on the other, many of those occurrences may no longer adjust to the present situation.
A rapid adaptation to changes is achieved with a short window, because it reflects the current distribution of default more accurately. However, for the contrary reason, the performance of models built upon shorter windows worsens in stable periods. In credit risk assessment modeling, this matter has been indirectly discussed by practitioners and researchers when trying to figure out the pros and cons of using a through-the-cycle (TTC) or a point-in-time (PIT) schema to calibrate the output of the scorecards to the current phase of the economic cycle. For years, a PIT schema was the only option, because banks did not have sufficiently long historical data series. Since the implementation of the Basel II Accord worldwide, banks are required to store the data of default for a minimum 7-year period and to consider a minimum 5-year period for calibrating the scorecards.

An original idea of Widmer and Kubat (1996) uses a sliding window of fixed length with a first-in-first-out (FIFO) data processing structure. Each window may consist of a single batch or multiple sequential batches, instead of single instances. At each new time step, the model is updated following two processes. In the first process, the model is rebuilt based on the training data set of the most recent window. Then, a forgetting process discards the data that move out of the fixed-length window.

Incremental algorithms (Widmer & Kubat, 1996) are a less extreme hybrid approach that allows updating the prediction models to the new contexts. They are able to process examples batch-by-batch, or one-by-one, and to update the prediction model after each batch, or after each example. Incremental models may rely on random previous examples or on representative selected sets of examples, called incremental algorithms with partial memory (Maloof & Michalski, 2004). The challenge is to select an appropriate window size.
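The fixed-length sliding window with FIFO processing described above can be outlined as follows. This is a minimal illustrative sketch, not the authors' implementation; `fit` stands for any batch-learning routine, and each batch could hold, for instance, a month of applications:

```python
from collections import deque

def sliding_window_rebuilds(batches, window_len, fit):
    """Rebuild a model at every step from a fixed-length FIFO window.

    batches    -- iterable of sequential data batches (e.g. monthly samples)
    window_len -- number of most recent batches kept for training
    fit        -- function mapping a list of batches to a trained model
    """
    window = deque(maxlen=window_len)     # FIFO: appending past capacity
    models = []                           # silently discards the oldest batch
    for batch in batches:
        window.append(batch)              # new data enters the window
        models.append(fit(list(window)))  # forget-and-rebuild at each step
    return models
```

Choosing `window_len` trades adaptation speed against stability, which is exactly the window-size selection challenge noted above.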
4. Case study

This research evolves from a one-dimensional analysis, where we come across the financial outlook underlying the problem, to a multidimensional analysis along several points in time. The former, described in Sections 4.1, 4.2, and 4.3, is tailored to gain intuition on the default predictors and the main factors ruling the context of the problem. The latter, in Section 4.4, is designed to gradually develop and test a new dynamic framework to model credit risk.

4.1. Dataset and validation environment

The research summarized here was conducted on a real-life financial dataset, comprising 762,966 records, from a financial institution in Brazil along two years of operation, from 2009 to 2010. Each entity in the modeling dataset is assigned a delinquency outcome - good or bad. In this problem, a person is assigned to the bad class if she had a payment in delay for 60 or more days along the first year after the credit was granted. The delinquency rate in the modeling dataset is 27.3%, which is in line with the high default rates in credit cards in Brazil, one of the countries with the highest default rates in this product. The full list of variables in the original data set is available on the BRICS 2013 official website. It contains 39 variables, categorized in Table 1, and one target variable with value 1 identifying a record in the bad class and 0 for the good class.

4.2. Data analysis and cleansing

Some important aspects of the dataset were considered, because they can influence the performance of the models. These aspects regard the following:

Table 1
Predictive variables summary.

Type                 #    Information
Numerical            6    Age, monthly income, time at current address, time at current employer, number of dependents, and number of accounts in the bank.
Treated as nominal   13   Credit card bills due day, 1st to 4th zip digit codes, home (state, city, and neighborhood), marital status, income proof type, long distance dialing code, occupation code, and type of home.
Binary               16   Address type proof, information of the mother's and father's names, input from credit bureau, phone number, bills at the home address, previous credit experience, other credit cards, tax payer and national id, messaging phone number, immediate purchase, overdraft protection agreement, lives and work in the same state, lives and work in the same city, and gender.
Date                 1    Application date.
ID                   3    Customer, personal reference, and branch unique identifiers.

Fig. 1. Cumulative frequency of the monthly income for 2009 and 2010.

Table 2
Information values for the tested combinations.

Combination                  IV
Age × income                 0.315
Age × occupation             0.009
Income × marital status      0.208
Income × occupation          0.334
Income × proof of income     0.123
Age × income × occupation    0.007

• Significant percent of zero or missing values. With the exception of the variables 'lives and work in the same state' and 'previous credit experience', binary flags have 95% to 100% of their values concentrated in one of the classes, which turns them practically unworkable. The same occurs for the numerical variables number of dependents and number of accounts in the bank, both with more than 99% zeroes. The remaining variables were reasonably or completely populated.
• Outliers and unreasonable values. The variable age presents 0.05% of applications assigned to customers with ages between 100 and 988 years. A small percent of values out of the standard ranges is observable in the variables credit card bills due day, monthly income, and time at current employer.
Unreasonable values are detected in the first semester of 2009, suggesting that the data were subjected to corrections from the second semester of 2009 onwards.
• Unreliable and informal information. The low reliability of socio-demographic data is amplified by specific conditions in the background of this problem. This type of scorecard is usually based on verbal information that the customer provides, and in most cases no certification is made available. In 85% of the applications, no certification for the income was provided, and 75% do not have proof of the address type. Customers have little or no concern about providing accurate information. The financial industry is aware of this kind of limitation. However, in highly competitive environments there is little chance to amend it while keeping in the business. Hence, other than by regulatory imperatives, no player is able to efficiently overcome this kind of data limitation. As currently there are no such imperatives in the Brazilian financial market, databases attached to this type of models are likely to keep lacking reliability in the near future.
• Bias in the distributions of modeling examples. The most noticeable bias is in the variable monthly income, where values shift from one year to another, as exhibited in Fig. 1. This is most likely related to increases in the minimum wage and to inflation. Slight variations are also observable in the geographical variables, which are possibly related to the geographical expansion of the institution. In the remaining characteristics, the correlation between the frequency distributions of 2009 and 2010 ranges from 99% to 100%, suggesting a very stable pattern during the analyzed period.

4.3. Data transformation and new characteristics

4.3.1. Data cleansing and new characteristics

We focused the data treatment on the characteristics that were reasonably or fully populated.
Fields state, city, and neighborhood contain free text, and were subjected to manual cleansing. Attributes with 100 or fewer records were assigned to a new class "Other". We observed that there may be neighborhoods with the same name in different cities; hence we concatenated these newly cleansed fields, state and city, into the same characteristic.

4.3.2. Data transformation

Variables were transformed using the weights of evidence (WoE) in the complete modeling dataset, which is a typical measure in credit score modeling (FICO, 2006): WoE = ln((g/G)/(b/B)), where g and b are respectively the number of goods and the number of bads in the attribute, and G and B are respectively the total number of goods and bads in the population sample. The larger the WoE, the higher the proportion of good customers in the bin. For the nominal and binary variables we calculated the WoE for each class. Numerical variables were first binned using SAS Enterprise Miner, and then manually adjusted to reflect domain knowledge. In so doing we aim to achieve a set of characteristics less exposed to overfitting. Cases where the calculation of the WoE was impossible - one of the classes without examples - were given an average value. The same principle was applied to values out of the expected ranges (e.g. credit card bills due day higher than 31).

4.3.3. One-dimensional analysis

The strength of each potential characteristic was measured using the information value (IV) in the period, IV = Σ_{i=1}^{n} (g_i/G − b_i/B) WoE_i, where n is the number of bins in the characteristic. The higher the IV, the higher the relative importance of the characteristic. On a one-dimensional basis, for the entire period, the most important characteristics are age, occupation, time at current employer, monthly income, and marital status, with information values of 0.368, 0.352, 0.132, 0.117, and 0.116, respectively. The remaining characteristics have 0.084 or less.
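The WoE and IV computations above can be illustrated with a short sketch (a minimal illustration only; it assumes every bin contains both goods and bads, i.e. it omits the average-value fallback described in the text):

```python
import math

def woe_iv(goods, bads):
    """WoE per bin and the information value (IV) of one characteristic.

    goods, bads -- per-bin counts of good and bad customers (equal length).
    """
    G, B = sum(goods), sum(bads)          # totals in the population sample
    woe = [math.log((g / G) / (b / B))    # WoE = ln((g/G)/(b/B)) per bin
           for g, b in zip(goods, bads)]
    iv = sum((g / G - b / B) * w          # IV = sum over bins of (g/G - b/B)*WoE
             for g, b, w in zip(goods, bads, woe))
    return woe, iv
```

A characteristic whose bins separate the classes strongly, e.g. `woe_iv([90, 10], [10, 90])`, yields a large positive IV, while identical class proportions in every bin give an IV of zero.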
4.3.4. Interaction terms

Using the odds in each attribute of the variables, we calculated new nonlinear characteristics using interaction terms between variables to model the joint effects. We tested six combinations, for which we present the information values in Table 2.

4.3.5. Time series descriptive analysis

Fig. 2a shows the real concept drift along 2009–2010. The highest default rates are noticed in the first quarter of 2009 and at the end of 2010. Fig. 2b displays the evolution of the business in the same period. It exhibits two features of the business. First, we can see that the credit cards business follows an annual seasonality, increasing along each year. Second, the credit cards business is rising over time, which is related to the expansion of the branch network of the financial institution. The decrease of the default rate during 2009 suggests that the decision-making process might have been slightly enhanced, when comparing to the beginning of the period.

4.4. Dynamic modeling framework

The dynamic modeling framework presented in this research considers that data are processed batch-by-batch. Sequentially, at each monthly window, a new model is learned from a previously selected window, including the most recent month. To mimic the time evolution, we assumed that the current month gradually shifts from 2009 until the third quarter of 2010.

Each learning unit for the model building was grounded on a static setting. The training of each unit consists of a supervised classification procedure, executed in three steps. First, characteristics are binned.
Second, the classification model is designed with Generalized Additive Models (GAM) and 10-fold cross-validation, upholding the classification algorithm used to develop the winning model in the BRICS 2013 competition in data mining and finance (BRICS-CCI&CBIC, 2013; Sousa et al., 2013b). Concurrently, the best set of characteristics is selected until no other characteristic in the training dataset adds a contribution to the information value (IV) of the model. In this application the threshold was set to a minimum increment of 0.03. Third, the performance of the model is measured based on the Gini coefficient, equivalent to considering the area under the ROC curve (AUC), which is a typical evaluation criterion among researchers and in the industry (Řezáč & Řezáč, 2011). This coefficient refers to the global quality of the credit scoring model, and ranges between −1 and 1. The perfect scoring model fully distinguishes the two target classes, good and bad, and has a Gini coefficient equal to 1. A model with a random output has a Gini coefficient equal to zero. If the coefficient is negative, then the scores have a reverse meaning. The extreme case −1 would mean that all examples of the good class are being predicted as bad, and vice versa. In this case, the perfect model can be achieved just by switching the prediction.

At each month, instances for modeling are selected from all previously available batches, according to a selection mechanism. We use instance selection methods to test the hypothesis under investigation. Two methods for tackling default concept drift were implemented - a full memory time window, and a fixed short memory time window with a forgetting mechanism.

The full memory time window assumes that the learning algorithm generates the model based on all previous instances (Fig. 3a). The process is incremental, so every time a new instance arises, it is added to the training set, and a new model is built.
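The two instance-selection mechanisms just introduced can be sketched as follows. This is an illustrative outline, not the authors' code; `fit` stands for the supervised training step described above:

```python
def select_training_set(batches_so_far, memory=None):
    """Pick the batches used to rebuild the model at the current month.

    batches_so_far -- list of all monthly batches available up to now
    memory         -- None for the full memory time window (keep everything);
                      an integer k for the fixed short memory time window
                      (keep only the k most recent batches, forgetting the rest)
    """
    if memory is None:
        return batches_so_far           # full memory: all previous instances
    return batches_so_far[-memory:]     # short memory: FIFO forgetting

# At month t, a new model would be rebuilt on the selected data, e.g.:
#   model_t = fit(select_training_set(batches[:t + 1], memory=k))
```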
The full memory schema should be appropriate to detect mild concept drifts, but it is unable to rapidly adapt to major changes. Models of this schema should perform suitably in stable environments. A shortcoming of this incremental schema is that the training dataset quickly expands, which may require a huge storage capacity and may constrain the use of some classification algorithms, which must be able to process the expanding dataset.

In the fixed short memory time window, the model development uses the most recent window. With this schema, illustrated in Fig. 3b, a new model is built for each new batch, by forgetting past examples. The fundamental assumption is that past examples have low correlation with the current default concept. Under this setting, the dynamic modeling should quickly adapt to changes. The most extreme case of the short memory time window is when only the current example is considered to train the new model, which corresponds to online learning without any memory of the past. A deficiency of this method is that it often lacks generalization ability in stable conditions, a problem that is amplified with extremely short windows.

These modeling frameworks enable comparing the two configurations between themselves, and also comparing them with the model reached with a static learning setting. The research questions of this study should be answered following this reasoning:

• If the full memory time window outperforms the other schema, then more recent data are not fundamental for the prediction; the environment of the decision-making should be in a stable phase. Otherwise, the default concept is drifting, and so the most recent data are more relevant for the prediction.
• If a model built with static learning in the first window of the period has the best performance, then older data can improve the prediction. This may happen, for example, when a new credit product is launched, and the credit decision-making criteria are adjusted afterwards.
In such a case, the oldest data are more representative, as they can illustrate a more diverse range of risk behaviors. Otherwise, over the long run, dynamic modeling should outperform the model learnt with a static setting.

5. Experimental results

We assessed the performance of the sequential models built with the dynamic modeling framework introduced in the previous section, through the period 2009–2010. The experimental design was drawn for assessing the performance in the modeling period, in the short-term, and in the farthest-term. In each model rebuilding, the performance in the modeling period was assessed in the test set. Additionally, using two out-of-sample windows, we measured the short-term performance of the model in the month following the development, and the farthest-term performance was measured in the last quarter of 2010. Although we have considered monthly windows for developing the model, for the long-run assessment we chose a quarterly window instead of a single month. In so doing, potential atypical properties of the decision-making process at the end of the year were smoothed.

In this section, we provide further evidence on the temporal degradation of static credit scoring. Then, we challenge the robustness of the new concept of dynamic modeling against a static model developed with a traditional framework. We finally present and discuss the results for the two sliding-window configurations - full memory and short memory.

5.1. Temporal degradation of static credit scoring

The temporal degradation of the credit scoring is detected when measuring the performance of each model in the sequence generated with the dynamic modeling. Fig. 4a and b exhibit the Gini coefficient for each model, measured in the modeling test set and in two different out-of-sample windows: one month after rebuilding the model, and in the farthest quarter of the period (2010 Q4).

Fig. 4a shows the performance along the entire period with the short memory configuration.
One month after rebuilding the model, the performance curve is always below the performance measured in the modeling period, showing that the performance consistently decreases one month after rebuilding the model. When evaluating the performance with the full memory configuration, in Fig. 4b, the extent of degradation within a month is not consistent over the period. During the first semester of 2009, performance measured in the month after rebuilding the model is slightly superior to the one measured in the modeling period, and from that point onwards, it is marginally inferior. This may suggest that the short-term

Fig. 2. Default rate and new contracts in the period 2009–2010: (a) default rate; (b) number of new credit card contracts.

Fig. 3. Configurations for tackling concept drift in credit default: (a) full memory time window; (b) fixed short memory time window.

Fig. 4.
Gini coefficient of the sequence of models produced with the dynamic modeling: (a) short memory; (b) full memory.

performance is more similar to the performance in the modeling period when using the full memory configuration.

The extent of degradation is higher when the performance of the model is measured at the end of the period (2010 Q4). The farther the prediction point is from the development point, the higher the extent of the degradation of the performance. These effects are consistently perceived in the two windowing configurations - short memory (Fig. 4a) and full memory (Fig. 4b).

Considering the real performance of the models one month after they were built, the average degradation of the models sequentially constructed, shown in Table 3, is 0.02 in the short memory and 0.01 in the full memory configuration. In the farthest quarter of the period (2010 Q4) the degradation reaches 0.07 in the short memory configuration and 0.06 in the full memory schema.

Table 3
Average degradation of the sequence of models produced with the dynamic modeling.

              Gini index                                              Degradation
Memory type   Modeling   Month after   Farthest quarter              Month after   Farthest quarter
              period     rebuilding    (2010 Q4)                     rebuilding    (2010 Q4)
Short         0.40       0.38          0.33                          −0.02         −0.07
Full          0.39       0.38          0.33                          −0.01         −0.06

Although degradation can be observed in all models of the sequence, updating the model always yields the best discrimination between the target classes - goods and bads.

5.2. Dynamic versus static

The proposed dynamic modeling framework enables a major improvement over the initial static model, which was trained with the sample from the first month of 2009 (2009 M1). Fig.
5a shows the immediate performance achieved with the dynamic modeling – full and short memory – versus the static model, measured in the month following the development. Fig. 5b shows the performance of the models at each point in time, measured in the farthest quarter of the period. Consistently, for both memory configurations and both performance criteria, immediate or in the farthest quarter, the static and the dynamic modeling performances improve until the third quarter of 2009, which might reflect the enhancement of the set of characteristics x that was partially corrected over that period.

In Fig. 5a, we observe a certain overlap between the immediate performance achieved with the two types of memory configurations - short and full. For all the periods, the short-term performance increases until the third quarter of 2009, and slightly decreases from that point onwards. The extent of improvement with the dynamic modeling reaches 0.05 in 2010.

Fig. 5b shows that the farthest-term performance of the first model in the sequence of the dynamic modeling, the same as the static model, is significantly improved with the sequential rebuilding until the third quarter of 2009, possibly as a consequence of the enhancement of the set of characteristics. In this period, performance increases from 0.28 to 0.36, meaning that the risk assessment is enhanced with the new dynamic modeling rather than with the static one. From that quarter onwards, the long-run predictions given by the dynamic modeling slightly improve, and always outperform the static frame. This suggests that the new incoming data allow a better knowledge of the new context. Although we know beforehand that the increase in performance is somewhat a consequence of the training being nearer to the out-of-sample validation window, we can still see that using the newest data improves the initial prediction given by the static model (2009 Q1).
5.3. Memory - keep or lose it

The new dynamic modeling framework enables investigating whether it is preferable to keep a long-term memory or to forget older observations, or whether they are equivalent in some contexts. From the second semester of 2009 onwards, the best results in the farthest-term (2010 Q4) are reached with the full memory configuration. However, we realize that there is a certain overlap between the performances of the sequences of models resulting from the two types of memory configurations, both for the short-term and for the farthest-term. This suggests that, in the period, the information contained in the older examples remains appropriate for the default target, and that the context is not drifting as a result of particular changes in the set of characteristics. Hence, drifts in particular characteristics, like income, translate into virtual drifts, because they did not have an impact on the distribution of the target concept, p(y|x). To some extent, the immediate performance, exhibited in Fig. 5a, decreases during 2010 from 0.44 to 0.38, which could be interpreted as the presence of a drift. However, as the timeframe is small, it remains uncertain whether it is a transitory outcome or a persistent drift in the context, potentially caused by changes in features that are not represented in the set of characteristics available for modeling in this application, like macroeconomic data.

6. Conclusions

This research presents a new modeling framework for credit risk assessment that extends the prevailing credit scoring models built upon static settings of historical data. Our framework mimics the principle of films, by composing the model with a sequence of snapshots, rather than a single picture. Within the new modeling schema, predictions are made upon the sequence of temporal data, and are suitable for adapting to the occurrence of real concept drifts, translated by changes in the population, in the economy, or in the market.
It also enables improving the existing models based on the newest incoming data.

We present an empirical simulation using a real-world financial dataset of 762,966 credit cards, from a financial institution in Brazil along two years of operation. A first conclusion is that monthly updates avoid the degradation of the model following the development. Secondly, the newest data consistently improve the forecasting accuracy, when compared to the previous models in the sequence of dynamic modeling, both in a short-term and in a full-term memory configuration. In particular, the static model available at the beginning of the period is outperformed by every succeeding model, suggesting that the dynamic modeling framework has the ability to improve the prediction by integrating new incoming data. Third, a slight dominance is achieved with the full-term memory, suggesting that older information remains meaningful for predicting the default target within the analyzed period.

In the banking industry, prevailing credit scoring models are developed from static windows and kept unchanged, possibly for years. In this setting, the two basic mechanisms of memory, short-term and long-term memory, are fundamental to learning, but are still overlooked in current modeling frameworks. As a consequence, these models are insensitive to changes, like population drifts or financial distress. The usual outcomes are rising default rates and abrupt credit cuts, as those observed in the U.S. in the aftermath of the last global crisis (as documented by Sousa, Gama, and Brandão (2015)). This problem could be overcome with the proposed framework, since it would allow gradually relearning along time and changes.

Still, there are some real business problems with rebuilding models over time. First, lenders have little incentive to enhance the existing rating systems frameworks, because there is a recurring idea that it is expensive and time-consuming to build new scorecards. They then need to be internally tested and validated, and then regulators need to approve them. Second, regulators still promote models whose coefficients do not change over time. This is one area where practice is far distant from the technical advances, and new thoughts, like simplifying current decision layers, need to be encouraged.

Fig. 5. Performance with the dynamic modeling – full and short memory – versus the static model (2009 M1): (a) short-term performance; (b) farthest-term performance, 2010 Q4.

There are some important topics in default concept drift that we did not consider, which we defer to future research. While this paper provides convincing results, some additional simulations using real-world datasets from highly stressed economic environments and longer time frames would be valuable. Second, modeling the delinquency presents a specificity, since a window of time is required in order to measure the outcome, i.e. the true class, before the new model is built. Therefore, for forecasting, it turns out that there will be a time gap of the same length between the values of the predictor variables used in the model and the first possible forecast period in the future. Although this is not a problem of the proposed methodology, future research should bring new insights to overcome this issue, with a view on practicality.
Third, some good alternatives to using windows of data blocks are encouraged, which may be based on using ensembles of the models learned in the past, possibly combining the two components of memory, short-term and long-term memory, or a forgetting factor method. There is some material on this going back to Adams et al. (2010). Fourth, our empirical study considered a set of fixed predictors. Therefore, future research should consider sets of predictors of variable length. This is important for detecting concept drift, because the set of predictors being used may be too limited to exhibit signs of change, even if changes are occurring in the environment. Finally, performance is reported in this paper, but the conditions leading to differences in performance are not explored. This is another future research direction.

References

Abdou, H. A., & Pointon, J. (2011). Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance and Management, 18(2-3), 59–88.
Adams, N. M., Tasoulis, D. K., Anagnostopoulos, C., & Hand, D. J. (2010). Temporally-adaptive linear classification for handling population drift in credit scoring. In Y. Lechevallier, & G. Saporta (Eds.), Proceedings of COMPSTAT'2010 (pp. 167–176). Physica-Verlag HD.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609.
Anderson, R. (2007). The credit scoring toolkit: Theory and practice for retail credit risk management and decision automation. OUP Oxford.
Avery, R. B., Calem, P. S., & Canner, G. B. (2004). Consumer credit scoring: Do situational circumstances matter? Journal of Banking & Finance, 28(4), 835–856.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6), 627–635.
BCBS (2006). International convergence of capital measurement and capital standards: A revised framework – comprehensive version. Bank for International Settlements.
Bellotti, T., & Crook, J. (2013). Forecasting and stress testing credit card default using dynamic models. International Journal of Forecasting, 29(4), 563–574.
BIS (2004). Implementation of Basel II: Practical considerations. Bank for International Settlements.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont, California: Wadsworth International Group.
BRICS-CCI&CBIC (2013). CI algorithms competition (CIAC): Credit risk assessment system robustness against degradation and seasonal variation. http://brics-cci.org/ci-algorithms-competition-ciac/ Accessed 19.06.13.
Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453. http://dx.doi.org/10.1016/j.eswa.2011.09.033.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 321–357.
Chen, D., Zhong, Y., Liao, Y., & Li, L. (2013). Review of multiple criteria and multiple constraint-level linear programming. Procedia Computer Science, 17(0), 158–165.
Chen, M.-C., & Huang, S.-H. (2003). Credit scoring and rejected instances reassigning through evolutionary computation techniques. Expert Systems with Applications, 24(4), 433–441.
Crook, J., & Bellotti, T. (2010). Time varying and dynamic models for default risk in consumer loans. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(2), 283–305.
Crook, J. N., Edelman, D. B., & Thomas, L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447–1465.
Crook, J. N., Thomas, L. C., & Hamilton, R. (1992).
The degradation of the scorecard over the business cycle. IMA Journal of Management Mathematics, 4(1), 111–123.
Desai, V. S., Crook, J. N., & Overstreet Jr, G. A. (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1), 24–37.
Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD '99 (pp. 155–164). New York, NY, USA: ACM.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. John Wiley & Sons.
Durand, D. (1941). Risk elements in consumer installment financing. National Bureau of Economic Research, Inc.
Einav, L., Jenkins, M., & Levin, J. (2013). The impact of credit scoring on consumer lending. The RAND Journal of Economics, 44(2), 249–274.
Eisenbeis, R. A. (1978). Problems in applying discriminant analysis in credit scoring models. Journal of Banking & Finance, 2(3), 205–219.
FICO (2006). Introduction to scorecard for FICO model builder.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Gama, J. (2010). Knowledge discovery from data streams. London: Chapman & Hall/CRC.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 44:1–44:37. doi:10.1145/2523813.
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 21(1), 30–34.
Hsieh, N.-C. (2005). Hybrid mining approach in the design of credit scoring models. Expert Systems with Applications, 28(4), 655–665.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4–37.
Jensen, H. L. (1992). Using neural networks for credit scoring. Managerial Finance, 18(6), 15–26.
Klinkenberg, R. (2004).
Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis, 8(3), 281–300.
Lazarescu, M. M., Venkatesh, S., & Bui, H. H. (2004). Using multiple windows to track concept drift. Intelligent Data Analysis, 8(1), 29–59.
Lee, T.-S., & Chen, I.-F. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications, 28(4), 743–752.
Lee, T.-S., Chiu, C.-C., Lu, C.-J., & Chen, I.-F. (2002). Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications, 23(3), 245–254.
Li, S.-T., Shiue, W., & Huang, M.-H. (2006).
The evaluation of consumer loans using support vector machines. Expert Systems with Applications, 30(4), 772–782.
Lucas, A. (2004). Updating scorecards: Removing the mystique. In Readings in credit scoring: Foundations, developments, and aims (pp. 93–109). New York: Oxford University Press.
Malhotra, R., & Malhotra, D. K. (2002). Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research, 136(1), 190–211.
Maloof, M. A., & Michalski, R. S. (2004). Incremental learning with partial instance memory. Artificial Intelligence, 154(1), 95–126.
Marqués, A. I., García, V., & Sánchez, J. S. (2012a). On the suitability of resampling techniques for the class imbalance problem in credit scoring. Journal of the Operational Research Society, 64(7), 1060–1070.
Marqués, A. I., García, V., & Sánchez, J. S. (2012b). Two-level classifier ensembles for credit risk assessment. Expert Systems with Applications, 39(12), 10916–10922.
Marqués, A. I., García, V., & Sánchez, J. S. (2013). A literature review on the application of evolutionary computing to credit scoring. Journal of the Operational Research Society, 64(9), 1384–1399.
Martens, D., De Backer, M., Haesen, R., Vanthienen, J., Snoeck, M., & Baesens, B. (2007). Classification with ant colony optimization. IEEE Transactions on Evolutionary Computation, 11(5), 651–665.
McNab, H., & Wynn, A. (2000). Principles and practice of consumer credit risk management. CIB Publishing.
Min, J. H., & Lee, Y.-C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), 603–614.
Ong, C.-S., Huang, J.-J., & Tzeng, G.-H. (2005). Building credit scoring models using genetic programming. Expert Systems with Applications, 29(1), 41–47.
Pavlidis, N., Tasoulis, D., Adams, N., & Hand, D. (2012). Adaptive consumer credit classification. Journal of the Operational Research Society, 63(12), 1645–1654.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Řezáč, M., & Řezáč, F. (2011). How to measure the quality of credit scoring models. Finance a Uver: Czech Journal of Economics & Finance, 61(5), 486–507.
Saberi, M., Mirtalaie, M. S., Hussain, F. K., Azadeh, A., Hussain, O. K., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122(0), 100–115.
Salganicoff, M. (1997). Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artificial Intelligence Review, 11(1-5), 133–155.
Schlimmer, J. C., & Granger Jr, R. H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354.
Sousa, M. R., & da Costa, J. P. (2008). A tripartite scorecard for the pay/no pay decision-making in the retail banking industry. Frontiers in Artificial Intelligence and Applications, 45.
Sousa, M. R., Gama, J., & Brandão, E. (2015). Links between scores, real default and pricing: Evidence from the Freddie Mac's loan-level dataset. Journal of Economics, Business and Management, 3(12), 1106–1114.
Sousa, M. R., Gama, J., Brandão, E., et al. (2013a). Introducing time-changing economics into credit scoring. Technical Report. Universidade do Porto, Faculdade de Economia do Porto.
Sousa, M. R., Gama, J., & Gonçalves, M. J. S. (2013b). A two-stage model for dealing with temporal degradation of credit scoring. In Proceedings of BRICS-CCI & CBIC.
Steenackers, A., & Goovaerts, M. (1989). A credit scoring model for personal loans. Insurance: Mathematics and Economics, 8(1), 31–34.
Sun, J., & Li, H. (2011). Dynamic financial distress prediction using instance selection for the disposal of concept drift. Expert Systems with Applications, 38(3), 2566–2576.
Thomas, L. C. (2009). Consumer credit models: Pricing, profit and portfolios. Oxford University Press.
Thomas, L. C. (2010). Consumer finance: Challenges for operational research.
Journal of the Operational Research Society, 61, 41–52.
Thomas, L. C., Edelman, D. B., & Crook, J. N. (2002). Credit scoring and its applications. Philadelphia: Society for Industrial and Applied Mathematics.
Tsymbal, A. (2004). The problem of concept drift: Definitions and related work. Dublin: Computer Science Department, Trinity College.
Wang, G., Hao, J., Ma, J., & Jiang, H. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38(1), 223–230.
Wang, Y., Wang, S., & Lai, K. (2005). A new fuzzy support vector machine to evaluate credit risk. IEEE Transactions on Fuzzy Systems, 13(6), 820–831.
Wei, G., Yun-Zhong, C., & Ming-shu, C. (2014). A new dynamic credit scoring model based on the objective cluster analysis. In Practical applications of intelligent systems, Advances in Intelligent Systems and Computing: 279 (pp. 579–589). Springer Berlin Heidelberg.
West, D. (2000). Neural network credit scoring models. Computers & Operations Research, 27(11), 1131–1152.
Widmer, G., & Kubat, M. (1993). Effective learning in dynamic environments by explicit context tracking. In Machine learning: ECML-93 (pp. 227–243). Springer.
Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101.
Yang, Y. (2007). Adaptive credit scoring with kernel learning methods. European Journal of Operational Research, 183(3), 1521–1536.