A new dynamic modeling framework for credit risk assessment

Expert Systems With Applications 45 (2016) 341–351

Maria Rocha Sousa a,∗, João Gama a,b, Elísio Brandão a

a School of Economics and Management, University of Porto, Portugal
b Laboratory of Artificial Intelligence and Decision Support of the Institute for Systems and Computer Engineering, Technology and Science, Portugal

Keywords: Credit risk modeling; Credit scoring; Dynamic modeling; Temporal degradation; Default concept drift; Memory

Abstract: We propose a new dynamic modeling framework for credit risk assessment that extends the prevailing credit scoring models built upon historical data in static settings. The driving idea mimics the principle of films, composing the model from a sequence of snapshots rather than a single photograph. In doing so, dynamic modeling consists of sequential learning from the new incoming data. A key contribution is provided by the insight that different amounts of memory can be explored concurrently, where memory refers to the amount of historical data used for estimation. This is important in the credit risk area, which often undergoes shocks: during a shock, limited memory is important; at other times, a larger memory has merit. An application to a real-world financial dataset of credit cards from a financial institution in Brazil illustrates our methodology, which is able to consistently outperform the static modeling schema.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

In banking, credit risk assessment often relies on credit scoring models, so-called PD models (Probability of Default models).1 These models output a score that translates the probability of a given entity, a private individual or a company, becoming a defaulter in a future period. Nowadays, PD models are at the core of the banking business: in credit decision-making, in price settlement, and in determining the cost of capital. Moreover, central banks and international regulation have evolved dramatically toward a setting where the use of these models is favored, to achieve soundness standards for credit risk valuation in the banking system.

Since 2004, with the worldwide implementation of the regulations issued by the Basel Committee on Banking Supervision within the Basel II Accord, banks were encouraged to strengthen their internal model frameworks to reach the A-IRB (Advanced Internal Rating Based) accreditation (BCBS, 2006; BIS, 2004). To achieve this certification, banks had to demonstrate that they were capable of accurately evaluating their risks, complying with Basel II requirements, by using their internal risk model systems, and of keeping their soundness. Banks owning A-IRB accreditation gained an advantage over the others, because they were allowed to use lower coefficients to weight the exposure of credit at risk, the risk weighted assets, and so benefit from lower capital requirements.

∗ Corresponding author. Tel.: +351 967139811. E-mail addresses: 100427011@fep.up.pt, jsc@inescporto.pt (M.R. Sousa), jgama@fep.up.pt (J. Gama), ebrandao@fep.up.pt (E. Brandão).
1 Other names can be used to refer to PD models, namely: credit scoring, credit risk models, scorecards, credit scorecards, rating systems, or rating models, although some have different meanings.

http://dx.doi.org/10.1016/j.eswa.2015.09.055
A lot of improvements have been made in the existing rating frameworks, extending the use of data mining tools and artificial intelligence. Yet, this may have been bounded by a certain unwillingness to accept less intuitive algorithms or models going beyond the standard solutions implemented in the banking industry, settled in-house or delivered through analytics providers.

Developing and implementing a credit scoring model can be time and resource consuming, easily ranging from 9 to 18 months from data extraction until deployment. Hence, it is not rare that banks use unchanged credit scoring models for several years. Bearing in mind that models are built using a sample file frequently comprising 2 or more years of historical data, in the best case scenario the data used in the models are shifted 3 years away from the point at which they will be used. Should conditions remain unchanged, this would not significantly affect the accuracy of the models; otherwise, their performance can greatly deteriorate over time. The recent financial crisis confirmed that the financial environment fluctuates greatly, and in an unexpected manner, drawing renewed attention to models built upon time-frames that are by far outdated. By 2007–2008, many financial institutions were using stale credit scoring models built with historical data of the early decade. The degradation of stationary credit scoring models is an issue with empirical evidence in the literature (Avery, Calem, & Canner, 2004; Crook, Thomas, & Hamilton, 1992; Lucas, 2004; Sousa, Gama, & Gonçalves, 2013b); however, research is still lacking more realistic solutions.

Dominant approaches rely on static learning models.
However, as the economic conditions evolve in the economic cycle, either deteriorating or improving, so varies the behavior of an individual, and his ability to repay his debt. Furthermore, the default evolution echoes trends of the business cycle and, related with this, regulatory movements and interest rate fluctuations. In good times, banks and borrowers tend to be overoptimistic about the future, whilst in times of recession banks are swamped with defaulted loans and high provisions, and tightened capital buffers turn them highly conservative. The former leads to more liberal credit policies and lower credit standards; the latter promotes sudden credit cuts. Hence, default needs to be regarded as time-changing.

Traditional systems that are one-shot, fixed-memory-based, trained from fixed training sets in static settings are not prepared to process evolving data. And so, they are not able to continuously maintain an output model consistent with the actual state of the environment, or to quickly react to changes (Gama, 2010). These are some of the features of classic approaches that evidence the constraints of the existing credit scoring systems. As the processes underlying credit risk are not strictly stationary, consumers' behavior and default can change over time in unpredictable ways.
A few limitations of the existing approaches, idealized in the classical supervised classification paradigm, can be traced in the published literature:

• The static models usually fail to adapt when the population changes. Static and predefined sample settings often lead to an incomplete examination of the dynamics influencing the problem (Gama, 2010; Hand, 2006).
• Certain assumptions that are implicit to the methods often fail in real-world environments (Yang, 2007). These assumptions relate to:
– Representativeness – the standard credit scoring models rely on supervised classification methods that run on 2-year-old static samples, in order to determine which individuals are likely to default in a future fixed period, 1 year for PD models (Thomas, 2010; Thomas, Edelman, & Crook, 2002). Such samples are supposed to be representative of the potential borrowers of the future, the through-the-door population. They should also be sufficiently diverse to reflect different types of repayment behavior. However, a wide range of research is conducted on samples that are not representative.
– Stability and non-bias – the distribution from which the design points and the new points are drawn is the same; classes are perfectly defined, and definitions will not change. Not infrequently, there are selection biases over time. Simple examples of this occurrence can be observed when a bank launches a new product or promotes a brand new segment of customers. It can also occur when macroeconomics shifts abruptly from an expansion to a recession phase, or vice versa.
– Misclassification costs – these methods assume that the costs of misclassification are accurately known, but in practice they are not.
• The methods that are most widely used in the banking industry, logistic regression and discriminant analysis, are associated with some instability with high-dimensional data and small sample sizes.
Other limitations regard the intensive variable selection effort and the incapability of efficiently handling non-linear features (Yang, 2007).
• Static models are usually focused on assessing the specific risk of applicants and obligors. However, a complete picture can only be achieved by looking at the return alongside risk, which requires the use of dynamic rather than static models (Bellotti & Crook, 2013).

There is a new emphasis on running predictive models with the ability to sense themselves and learn adaptively (Gama, 2010). Advances in the concepts for knowledge discovery from data streams suggest alternative perspectives to identify, understand and efficiently manage the dynamics of behavior in consumer credit in changing, ubiquitous environments. In a world where events are not preordained and little is certain, what we do in the present affects how events unfold in unexpected ways. So far, no comprehensive set of research dealing with time-changing default has had much impact on practice. In credit risk assessment, a great deal of sophistication is needed to introduce economic factors and market conditions into current risk-assessment systems (Thomas, 2010).

The study presented in this paper is a large extension of previous research that delivered the winning model in the BRICS 2013 competition in data mining and finance (Sousa, Gama, Brandão et al., 2013a; Sousa et al., 2013b). This competition, open to academics and practitioners, was focused on the development of a credit risk assessment model, tilting between the robustness of a static modeling sample and the performance degradation over time, potentially caused by gradual market changes along a few years of business operation. Participants were encouraged to use any modeling technique, under a temporal degradation or concept drift perspective.
In the research attached to the winning model, Sousa, Gama, and Gonçalves (2013b) proposed a two-stage model for dealing with the temporal degradation of credit scoring models, which produced motivating results in a 1-year horizon. The winners first developed a credit scoring method using a set of supervised learning methods, and then calibrated the output based on a projection of the evolution of default. This adjustment considered both the evolution of default and the evolution of macroeconomic factors, echoing potential changes in the population of the model, in the economy, or in the market. In so doing, the resulting adjusted scores translated a combination of the customers' specific risk with systemic risk. The winning team (Sousa, Gama, & Gonçalves) concluded that the performance of the models did not significantly differ among classification models, like logistic regression (LR), AdaBoost, and Generalized Additive Models (GAM). However, after training on several window lengths, they observed that the model based on the longest window produced the best performing model over the long run, among all competitors. This finding allowed them to realize that some specifics of the credit portfolios and macroeconomic environments may remain quite stable along time. For those cases, a model built with a static learning setting may seem appropriate, if tested during stable phases. The questions yet to be answered were: in which conditions do credit risk models degrade? And, when so, is there any alternative modeling technique to the prevailing credit scoring models? The aim of this study is to reach a clearer understanding of which type of modeling framework allows a rapid adaptation to changes, and in which circumstances a static learning setting still delivers well-performing models.
With this in view, we implemented a dynamic modeling framework and two types of windows for model training, which enable testing our research questions: (a) In which conditions can a dynamic modeling outperform a static model? (b) Is the recent information more relevant to improve forecasting accuracy? (c) Does older information always improve forecasting accuracy?

This paper introduces a new dynamic modeling framework for credit risk assessment, imported from the emerging techniques of concept drift adaptation in streaming data mining and artificial intelligence. The proposed model is able to produce more robust predictions in stable conditions, but also in the presence of changes, while the prevailing methods cannot. This is a promising tool both for academics and practitioners because, unlike the traditional models, it has the ability to adjust the predictions in the presence of changes, like inversions in the economic cycles, major crises, or intrinsic behavioral circumstances (e.g. divorce, unemployment and financial distress). Besides the goal of enhancing the prediction of default in credit, the new modeling framework also enables developing a more comprehensive understanding of the evolution of credit rating systems over time and anticipating unexpected events. Furthermore, we study the implications for credit risk assessment of keeping a long-term memory, or forgetting older examples, which has not been done so far.

Few authors have explicitly tried a dynamic modeling framework in credit risk assessment, or connected concepts. Based on a national sample of a credit reporting agency, Avery et al.
(2004) show that traditional modeling often fails to consider situational circumstances, such as local economic conditions and individual trigger events, affecting the ability of scoring systems to accurately quantify individuals' credit risk. We can trace the few existing contributions in this arena over the most recent years. Sun and Li (2011) formally define financial distress concept drift and build a dynamic modeling based on instance selection. Saberi et al. (2013) worked on the concept of granularity for selecting the optimum size of the testing and training groups, with a sample of credit cards of a bank operating in Germany. Pavlidis, Tasoulis, Adams, and Hand (2012) proposed a methodology for the classification of credit applications with the potential of adapting to population drifts.

This paper follows in Section 2 with a brief description of the main settings and concepts of the supervised learning problem and score formulation. It also presents an overview of the methods typically used in supervised learning, and specifically in credit score modeling. In Section 3, we introduce the topic of concept drift in credit default and some adaptation methods that can be promising for dynamic modeling of credit risk. In Section 4 we present a case study, where we employ a set of these adaptation methods on a real-world financial dataset. First, we characterize the database and provide some intuition on the background of the problem. Then, we explain the methodology of this research. Section 5 provides the fundamental experimental results. Conclusions and future applications of the new dynamic modeling framework are traced in Section 6.

2. Settings and concepts

In this work we import some of the emerging techniques in concept drift adaptation into credit risk assessment models.
This is a field of research that has been receiving much attention in machine learning over the last decade, as an answer for suitably shaping models and processes to a reality that is ever-changing over contexts and time. The settings and definitions adopted in this paper replicate the general nomenclature surveyed by Gama, Žliobaitė, Bifet, Pechenizkiy, and Bouchachia (2014).

2.1. Supervised learning problem

Credit risk assessment can be addressed as a classification problem, a subset of supervised learning. The aim is to predict the default y ∈ {good, bad}, given a set of input characteristics x. The term attribute refers to each of the possible values that a characteristic can assume; the term bin denotes a set of attributes or an interval of values in a continuous characteristic; the term example, or record, is used to refer to one pair (x, y). Supervised learning classification methods try to determine a function that best separates the individuals of each of the classes, good and bad, in the space of the problem.

The model building is carried out on a set of training examples – the training set – collected from the past history of credit, for which both x and y are known. The best separation function can be achieved with a classification method. These methods include, among others, well-known classification algorithms such as decision trees (DT), support vector machines (SVM), artificial neural networks (ANN), and Generalized Additive Models (GAM). Hands-on software packages are available to the user, for example in R, SAS, Matlab, and Model Builder for Predictive Analytics. In credit scoring models, the accuracy of such functions is typically assessed on separate sets of known examples – validation or out-of-sample data sets. The idea behind this procedure is to mimic the accuracy of that function in future predictions of new examples where x is known, but y is not.
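The out-of-sample assessment just described can be made concrete. One common summary of a scorecard's discriminative power on a validation set is the area under the ROC curve (AUC), often reported in credit scoring as the Gini coefficient, Gini = 2·AUC − 1; the choice of this metric and the toy scores below are our illustration, not something prescribed by the text.

```python
import numpy as np

def auc(scores, y_bad):
    """AUC of a score that should rank goods above bads.

    Computed via the Mann-Whitney U statistic: the probability that a
    randomly chosen good receives a higher score than a randomly chosen
    bad (ties count one half).
    """
    scores = np.asarray(scores, dtype=float)
    y_bad = np.asarray(y_bad, dtype=bool)
    good, bad = scores[~y_bad], scores[y_bad]
    # Pairwise comparison; O(n_good * n_bad) memory is fine for a sketch.
    wins = (good[:, None] > bad[None, :]).sum()
    ties = (good[:, None] == bad[None, :]).sum()
    return (wins + 0.5 * ties) / (len(good) * len(bad))

# Toy out-of-sample check with synthetic, partially separated scores.
rng = np.random.default_rng(0)
s_good = rng.normal(1.0, 1.0, 500)   # goods tend to score higher
s_bad = rng.normal(-1.0, 1.0, 100)   # bads tend to score lower
scores = np.concatenate([s_good, s_bad])
y_bad = np.concatenate([np.zeros(500, bool), np.ones(100, bool)])
print(round(auc(scores, y_bad), 3))  # well above 0.5 for a useful scorecard
```

An AUC of 0.5 corresponds to random ranking; a perfect scorecard on the validation set would reach 1.0.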
According to Bayesian Decision Theory (Duda, Hart, & Stork, 2001), a classification can be described by the prior probabilities of the classes p(y) and the class conditional probability density function p(x|y) for the two classes, good (G) and bad (B). The classification decision is made according to the posterior probabilities of the two classes, which for class B can be represented as:

p(B|x) = p(x|B)p(B)/p(x)    (1)

where p(x) = p(B)p(x|B) + p(G)p(x|G). Here, it is assumed that the costs for misclassifying a bad customer are the same as for the opposite situation, the equal costs assumption. It is worth recalling that, in real-world financial environments, the costs of failing the prediction on a real bad are by far superior to failing on a real good. In the first case, there is essentially a loss of the exposure at default, the Loss Given Default (LGD), possibly mitigated with collateral. The second case affects the business, as it translates into a loss of margin. Sousa and da Costa (2008) show several possibilities to overcome this practical issue, by adapting the output of standard classification methods under the equal costs assumption to imbalanced misclassification costs, associated with the decision and prediction tasks. It is worth discussing the related issue of class imbalance in credit scoring datasets. Quite often, these datasets contain a much smaller number of observations in the class of defaulters than in that of the good payers (Brown & Mues, 2012; Marqués, García, & Sánchez, 2012a). A large class imbalance is therefore present, which some techniques may not be able to successfully handle. Baseline methods to handle class imbalance include oversampling the minority class or undersampling the majority class; SMOTE is an example of the former, Tomek links of the latter (Chawla, Bowyer, Hall, & Kegelmeyer, 2002).
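The unequal-costs point above can be stated as the standard minimum-expected-cost rule of Bayesian decision theory (a textbook result, not a method proposed in the paper): with a cost cG for rejecting a truly good applicant (lost margin) and a cost cB for accepting a truly bad one (credit loss), rejecting minimizes expected cost exactly when p(B|x) > cG/(cG + cB). A minimal sketch, with purely hypothetical cost figures:

```python
def bad_threshold(cost_reject_good, cost_accept_bad):
    """Posterior cutoff minimizing expected misclassification cost.

    Predict 'bad' (reject) iff
        p(B|x) * cost_accept_bad > p(G|x) * cost_reject_good,
    i.e. iff p(B|x) > cost_reject_good / (cost_reject_good + cost_accept_bad).
    """
    return cost_reject_good / (cost_reject_good + cost_accept_bad)

def decide(p_bad, cost_reject_good=1.0, cost_accept_bad=9.0):
    # Hypothetical 9:1 costs: accepting a bad is far more expensive.
    return "bad" if p_bad > bad_threshold(cost_reject_good, cost_accept_bad) else "good"

print(bad_threshold(1.0, 1.0))  # 0.5 -- equal costs recover the usual cutoff
print(bad_threshold(1.0, 9.0))  # 0.1 -- less evidence of default suffices to reject
print(decide(0.2))              # 'bad' under 9:1 costs, though below the 0.5 cutoff
```

The same logic underlies cost-sensitive wrappers such as MetaCost, which relabel or reweight so that an ordinary classifier ends up implementing this threshold.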
Another established approach to correct imbalance adopts a cost-sensitive classifier, with the misclassification cost of the minority class greater than that of the majority class. Within this approach, it is worth mentioning MetaCost, a general method for making classifiers cost-sensitive (Domingos, 1999). All these methodologies, implicitly or explicitly, optimize the decision process for a specific business objective. In other words, the optimization is made for a specific trade-off between the error committed in identifying someone as a defaulter when one is in fact a non-defaulter, and the opposite type of error of diagnosing someone as a non-defaulter when one is in fact a defaulter. This individualization is unconnected with our study, and any of these methods can be incorporated in the methodology under research.

2.2. Score formulation

A credit scoring model is a simplification of the reality. The output is a prediction of a given entity, actual or potential borrower, entering in default in a given future period. Having decided on the default concept, conventionally a borrower being in arrears for more than 90 days in the following 12 months, those matching the criteria are considered bad and the others are good. Other approaches may consider a third status, the indeterminate, between the good and the bad classes, e.g. 15 to 90 days overdue, for which it may be unclear whether the borrower should be assigned to one class or to the other. This status is usually removed from the modeling sample, although the model can be used to score those cases. For simplicity, in this paper we will consider the problem of two classes, although the proposed methodology can easily be adapted to the other case.

The output is a function of the input characteristics x, most commonly referred to as the score, s(x). We also consider that this function has a monotonic decreasing relationship with the probability of entering in default (i.e. reaching the bad status).
A robust scorecard enables an appropriate differentiation between the good and the bad classes. It is achieved by capturing an adequate set of information for predicting the probability of the default concept (i.e. belonging to the bad class), based on previously known default occurrences. The notation of such probability, Pr{bad | score based on X}, is:

p(B|s(x)) = p(B|s(x), x) = p(B|x), ∀x ∈ X    (2)

Since p(G|x) + p(B|x) = 1, the probability of the complementary class naturally follows:

p(G|s(x)) = p(G|x) = 1 − p(B|x), ∀x ∈ X    (3)

Among researchers and real-world applications, a usual written form of the score is the log odds score:

s(x) = ln [p(G|x)/p(B|x)], with p(G|x) + p(B|x) = 1.    (4)

In so saying, the score may vary from −∞, when p(G|x) = 0, to +∞, when p(G|x) = 1, i.e. s(x) ∈ R. The probability of the default event can be written in terms of the score:

p(B|x) = 1/(1 + e^{s(x)}), ∀x ∈ X

The most conventional way to produce a log odds score is based on logistic regression. However, other classification algorithms can also be used, adjusting the output to the scale of that function. In so saying, we assume that, independently of the method used to determine the best separation between the two classes, good and bad, the resulting scorecard has the same property as the log odds score. Although a grounded mathematical treatment may be tempting to tackle this problem, it goes beyond the scope of this work. Notwithstanding, we provide some intuitions on the technical material to survey. The basics of credit scoring and the most common approaches to build a scorecard are further detailed in the operational research literature (Anderson, 2007; Crook, Edelman, & Thomas, 2007; McNab & Wynn, 2000; Thomas, 2009; Thomas et al., 2002).
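Eq. (4) and the inverse relation p(B|x) = 1/(1 + e^{s(x)}) can be checked numerically; a minimal sketch (the function names are ours):

```python
import math

def log_odds_score(p_bad):
    """Eq. (4): s(x) = ln(p(G|x) / p(B|x)), with p(G|x) = 1 - p(B|x)."""
    return math.log((1.0 - p_bad) / p_bad)

def prob_bad(score):
    """Inverse relation from the text: p(B|x) = 1 / (1 + e^{s(x)})."""
    return 1.0 / (1.0 + math.exp(score))

# The relationship is monotonically decreasing: higher score, lower default risk.
for p in (0.5, 0.1, 0.01):
    s = log_odds_score(p)
    print(f"p(B|x)={p:>5} -> s(x)={s:+.3f} -> back to p(B|x)={prob_bad(s):.3f}")
```

For instance, p(B|x) = 0.5 maps to a score of 0, and p(B|x) = 0.1 to s(x) = ln 9 ≈ 2.197, round-tripping exactly through the two formulas.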
Recent advances in the area also deliver methods to build risk-based pricing models (Thomas, 2009) and methodologies towards the optimization of the profitability to the lenders (Einav, Jenkins, & Levin, 2013).

2.3. Supervised classification methods

The first approach to differentiate between groups took place in Fisher's original work (1936) for general classification problems of varieties of plants. The objective was to find the best separation between two groups, searching for the best combination of variables such that the groups were separated the most in the subspace. Durand (1941) brought this methodology to finance, for distinguishing between good and bad consumer loans.

Discriminant analysis was the first method used to develop credit scoring systems. Altman (1968) introduced it in the prediction of corporate bankruptcy. First applications in retail banking were mainly focused on credit granting in two categories of loans: consumer loans and commercial loans (for an early review and critique of the use of discriminant analysis in credit scoring see Eisenbeis (1978)). The boom of credit cards demanded the automation of the credit decision task and the use of better credit scoring systems, which became doable due to the growth of computing power. The value of credit scoring became noticed, and it was recognized as a much better predictor than any judgmental scheme. Logistic regression (Steenackers & Goovaerts, 1989) and linear programming (see Chen, Zhong, Liao, and Li, 2013 for a review) were introduced in credit scoring, and they turned out to be the most used in the financial industry (Anderson, 2007; Crook et al., 2007).
The use of artificial intelligence techniques imported from statistical learning theory, such as classification trees (Breiman, Friedman, Olshen, & Stone, 1984; Quinlan, 1986) and neural networks (Desai, Crook, & Overstreet Jr, 1996; Jensen, 1992; Malhotra & Malhotra, 2002; West, 2000), has arisen in credit scoring systems. The Support Vector Machine (SVM) is another method based on optimization and statistical learning that received increased attention over the last decade in finance research, either to build credit scoring systems for consumer finance or to predict bankruptcy (Li, Shiue, & Huang, 2006; Min & Lee, 2005; Wang, Wang, & Lai, 2005). Evolutionary computing (Marqués, García, & Sánchez, 2013), including genetic algorithms (Chen & Huang, 2003; Ong, Huang, & Tzeng, 2005) and ant colony optimization (Martens et al., 2007), was also considered for credit scoring. Regression and multivariate adaptive regression splines (Lee & Chen, 2005) and clustering (Wei, Yun-Zhong, & Ming-shu, 2014) techniques have also been tailored to the problem.

The choice of a learning algorithm is a difficult problem, and it is often based on which algorithms happen to be available, or best known to the user (Jain, Duin, & Mao, 2000). The number of learning algorithms is vast. Many frameworks, adaptations to real-life problems, and intertwinings of base algorithms were, and continue to be, proposed in the literature, ranging from statistical approaches to state-of-the-art machine learning algorithms, from parametric models to non-parametric procedures (Abdou & Pointon, 2011; Baesens et al., 2003).
As an alternative to using a single method, a trend that is still evolving relates to the use of hybrid systems (Hsieh, 2005; Lee, Chiu, Lu, & Chen, 2002), and ensembles of classifiers, whose outputs are combined by a predefined sequence or rule, or a voting scheme (Marqués, García, & Sánchez, 2012b; Wang, Hao, Ma, & Jiang, 2011).

New concepts for adapting to changes (Adams, Tasoulis, Anagnostopoulos, & Hand, 2010; Pavlidis et al., 2012; Sousa et al., 2013b; Yang, 2007) and modeling the dynamics (Crook & Bellotti, 2010; Saberi et al., 2013) in populations are starting to be exploited in credit risk assessment.

3. Dynamic modeling for credit default

3.1. Concept drift in credit default

Credit default is mostly a consequence of financial distress. A person, or a company, is in financial distress when experiencing individual financial constraints or being exposed to external disturbances. In private individuals, financial constraints may result from abrupt or intrinsic circumstances. In the first case, distress is usually an outcome of sorrowful events like unemployment, pay cuts, divorce, and disease. The second is most commonly related to overexposure, low assets, erratic behavior, or bad management performance. In this paper we tackle the phenomenon of concept drift in credit default, which we now briefly explain.

In the existing literature, concept drift is generally used to describe changes in the target concept, which are activated by transformations in the hidden context (Schlimmer & Granger Jr, 1986; Widmer & Kubat, 1996) in dynamically changing and non-stationary environments. As a result of these transformations, the target concept can shift suddenly, or just cause a change in the underlying data distribution of the model. This means that, with time, optimal features may drift significantly from their original configuration, or simply lose their ability to explain the target concept.
For example, a reduction of the minimum LTV (loan to value) tightens the space of possible values, which is noticed as a change in the distribution, and eventually in the credit default concept. When such drifts happen, the robustness of the model may significantly decrease, and in some situations it may no longer be acceptable.

Some authors distinguish real concept drift from virtual drift (Gama et al., 2014; Sun & Li, 2011; Tsymbal, 2004). The former refers to changes in the conditional distribution of the output (i.e., the target variable) given the input features, while the distribution of the input may remain unchanged. The latter refers to gradual changes in the underlying data distribution as new sample data flow in, whereas the target concept does not change (Sun & Li, 2011).

Real concept drift refers to changes in p(y|x), and it happens when the target concept of credit default evolves in time. Such changes can occur either with or without a change in p(x). This type of drift may happen directly as a result of new rules for defining the target classes, good or bad, as those settled by regulators when new criteria for default are demanded of the banks. Examples of these include the guidelines for the minimum number of days past due, or for the materiality threshold for the amount of credit in arrears, issued with the previous Basel II Accord. Another understanding of real concept drift in credit default is associated with indirect changes in the hidden context. In this case, credit default changes when evolving from one stage of delinquency to another.
For example, most of the people ith credit until five days past due tend to pay before the following nstallment, as most of them are just delayers. Yet, the part of debtors n arrears that also fail the next installment are most likely to be in nancial distress, possibly as a result of an abrupt or intrinsic circum- tance, and therefore they require more care from the bank. When rrears exceed three installments, the debtor is most certainly with erious financial constraints, and is likely to fail his credit obligations. ore extreme delays commonly translate into hard stages of credit efault, which require intensive tracking labor or legal actions. Virtual drifts happen when there are changes in the distribu- ion of the new sample data flowing without affecting the posterior robability of the target classes, p(y|x). With time, virtual drifts may ove to real concept drifts. Other interpretations can also be found n literature, for describing an incomplete representation of the data Widmer & Kubat, 1993), and changes in the data distribution lead- ng to changes in the decision boundary (Tsymbal, 2004). Accord- ng to some authors, other events can also be seen as virtual drifts, ike sampling shift (Salganicoff, 1997), temporary drifts (Lazarescu, enkatesh, & Bui, 2004), and feature change (Salganicoff, 1997). As n example of virtual drift, we might consider the credit decision- aking along the recent financial crisis. The lenders had to anticipate f a borrower would enter in default in the future (i.e. being bad). Al- hough the macroeconomic factors have worsened, employed people ith lower debt to income remained good for the lenders, and so they ontinued to have access to credit. Although we are mostly interested to track and detect changes in he real target concept, p(y|x), the methodology introduced in this aper attempts to cover both real concept and virtual drifts applied o the default concept drift detection and model rebuilding. .2. 
3.2. Methods for adaptation

Traditional methods for building a scorecard consider a static learning setting. In so doing, this task is based on learning from a predefined sample of past examples, with the resulting model then used to predict an actual or a potential borrower in the future. This is an offline learning procedure, because the whole training data set must be available when building the model. The model can only be used for predicting after the training is completed, and it is not re-trained alongside its utilization. In other words, once the best separation function is achieved for a set of examples of the past, it is not updated for a while, possibly for years, independently of the changes in the hidden context or in the surrounding environment. New perspectives on model building arise together with the possibility of learning online. The driving idea is to process new incoming data sequentially, so that the model may be continuously updated.

One of the most intuitive ideas for handling concept drift by instance selection is to keep rebuilding the model from a window that moves over the latest batches and to use the learned model for prediction on the immediate future. This idea assumes that the latest instances are the most relevant for prediction and that they contain the information of the current concept (Klinkenberg, 2004). A framework connected with this idea consists in collecting the new incoming data in sequential batches at predefined time intervals, e.g. year by year, month by month, or every day. The accumulation of these batches generates a panel data flow for dynamic modeling.

In finance, it remains unclear whether it is best to have a long memory or to forget old events. If, on the one hand, a long memory is desirable because it allows recalling a wide range of different occurrences, on the other, many of those occurrences may no longer adjust to the present situation.
A rapid adaptation to changes is achieved with a short window, because it reflects the current distribution of default more accurately. However, for the contrary reason, the performance of models built upon shorter windows worsens in stable periods. In credit risk assessment modeling, this matter has been indirectly discussed by practitioners and researchers when trying to figure out the pros and cons of using a through-the-cycle (TTC) or a point-in-time (PIT) schema to calibrate the output of the scorecards to the current phase of the economic cycle. For years, a PIT schema was the only option, because banks did not have sufficiently long historical data series. Since the implementation of the Basel II Accord worldwide, banks are required to store the data of default for a minimum 7-year period and to consider a minimum 5-year period for calibrating the scorecards.

An original idea of Widmer and Kubat (1996) uses a sliding window of fixed length with a first-in-first-out (FIFO) data processing structure. Each window may consist of a single batch or multiple sequential batches, instead of single instances. At each new time step, the model is updated following two processes. In the first process, the model is rebuilt based on the training data set of the most recent window. Then, a forgetting process discards the data that move out of the fixed-length window.

Incremental algorithms (Widmer & Kubat, 1996) are a less extreme hybrid approach that allows updating the prediction models to the new contexts. They are able to process examples batch-by-batch, or one-by-one, and to update the prediction model after each batch, or after each example. Incremental models may rely on random previous examples or on representative selected sets of examples, called incremental algorithms with partial memory (Maloof & Michalski, 2004). The challenge is to select an appropriate window size.
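The fixed-length sliding window with FIFO processing described above can be outlined as follows. This is a minimal illustrative sketch, not the authors' implementation; `fit` stands for any batch-learning routine, and each batch could hold, for instance, a month of applications:

```python
from collections import deque

def sliding_window_rebuilds(batches, window_len, fit):
    """Rebuild a model at every step from a fixed-length FIFO window.

    batches    -- iterable of sequential data batches (e.g. monthly samples)
    window_len -- number of most recent batches kept for training
    fit        -- function mapping a list of batches to a trained model
    """
    window = deque(maxlen=window_len)     # FIFO: appending past capacity
    models = []                           # silently discards the oldest batch
    for batch in batches:
        window.append(batch)              # new data enters the window
        models.append(fit(list(window)))  # forget-and-rebuild at each step
    return models
```

Choosing `window_len` trades adaptation speed against stability, which is exactly the window-size selection challenge noted above.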
4. Case study

This research evolves from a one-dimensional analysis, where we come across the financial outlook underlying the problem, to a multidimensional analysis along several points in time. The former, described in Sections 4.1, 4.2, and 4.3, is tailored to gain intuition on the default predictors and the main factors ruling the context of the problem. The latter, in Section 4.4, is designed to gradually develop and test a new dynamic framework to model credit risk.

4.1. Dataset and validation environment

The research summarized here was conducted on a real-life financial dataset, comprising 762,966 records, from a financial institution in Brazil along two years of operation, from 2009 to 2010. Each entity in the modeling dataset is assigned a delinquency outcome - good or bad. In this problem, a person is assigned to the bad class if she had a payment in delay for 60 or more days along the first year after the credit was granted. The delinquency rate in the modeling dataset is 27.3%, which is in line with the high default rates in credit cards in Brazil, one of the countries with the highest default rates in this product. The full list of variables in the original data set is available on the BRICS 2013 official website. It contains 39 variables, categorized in Table 1, and one target variable with value 1 identifying a record in the bad class and 0 for the good class.

4.2. Data analysis and cleansing

Some important aspects of the dataset were considered, because they can influence the performance of the models. These aspects regard the following:

Table 1
Predictive variables summary.

Type                 #    Information
Numerical            6    Age, monthly income, time at current address, time at current employer, number of dependents, and number of accounts in the bank.
Treated as nominal   13   Credit card bills due day, 1st to 4th zip digit codes, home (state, city, and neighborhood), marital status, income proof type, long distance dialing code, occupation code, and type of home.
Binary               16   Address type proof, information of the mother's and father's names, input from credit bureau, phone number, bills at the home address, previous credit experience, other credit cards, tax payer and national id, messaging phone number, immediate purchase, overdraft protection agreement, lives and work in the same state, lives and work in the same city, and gender.
Date                 1    Application date.
ID                   3    Customer, personal reference, and branch unique identifiers.

Fig. 1. Cumulative frequency of the monthly income for 2009 and 2010.

Table 2
Information values for the tested combinations.

Combination                  IV
Age × income                 0.315
Age × occupation             0.009
Income × marital status      0.208
Income × occupation          0.334
Income × proof of income     0.123
Age × income × occupation    0.007

• Significant percent of zero or missing values. With the exception of the variables 'lives and work in the same state' and 'previous credit experience', binary flags have 95% to 100% of their values concentrated in one of the classes, which turns them practically unworkable. The same occurs for the numerical variables number of dependents and number of accounts in the bank, both with more than 99% zeroes. The remaining variables were reasonably or completely populated.
• Outliers and unreasonable values. The variable age presents 0.05% of applications assigned to customers with ages between 100 and 988 years. A small percent of values out of the standard ranges is observable in the variables credit card bills due day, monthly income, and time at current employer.
Unreasonable values are detected in the first semester of 2009, suggesting that the data were subjected to corrections from the second semester of 2009 onwards.
• Unreliable and informal information. The low reliability of socio-demographic data is amplified by specific conditions in the background of this problem. This type of scorecard is usually based on verbal information that the customer provides, and in most cases no certification is made available. In 85% of the applications, no certification for the income was provided, and 75% do not have proof of the address type. Customers have little or no concern about providing accurate information. The financial industry is aware of this kind of limitation. However, in highly competitive environments there is little chance to amend it while keeping in the business. Hence, other than by regulatory imperatives, no player is able to efficiently overcome this kind of data limitation. As currently there are no such imperatives in the Brazilian financial market, databases attached to this type of models are likely to keep lacking reliability in the near future.
• Bias in the distributions of modeling examples. The most noticeable bias is in the variable monthly income, where values shift from one year to another, as exhibited in Fig. 1. This is most likely related to increases in the minimum wage and to inflation. Slight variations are also observable in the geographical variables, which are possibly related to the geographical expansion of the institution. In the remaining characteristics, the correlation between the frequency distributions of 2009 and 2010 ranges from 99% to 100%, suggesting a very stable pattern during the analyzed period.

4.3. Data transformation and new characteristics

4.3.1. Data cleansing and new characteristics

We focused the data treatment on the characteristics that were reasonably or fully populated.
Fields state, city, and neighborhood contain free text, and were subjected to manual cleansing. Attributes with 100 or fewer records were assigned to a new class "Other". We observed that there may be neighborhoods with the same name in different cities; hence we concatenated these newly cleansed fields, state and city, into the same characteristic.

4.3.2. Data transformation

Variables were transformed using the weights of evidence (WoE) in the complete modeling dataset, which is a typical measure in credit score modeling (FICO, 2006): WoE = ln((g/G)/(b/B)), where g and b are respectively the number of goods and the number of bads in the attribute, and G and B are respectively the total number of goods and bads in the population sample. The larger the WoE, the higher the proportion of good customers in the bin. For the nominal and binary variables we calculated the WoE for each class. Numerical variables were first binned using SAS Enterprise Miner, and then manually adjusted to reflect domain knowledge. In so doing we aim to achieve a set of characteristics less exposed to overfitting. Cases where the calculation of the WoE was impossible - one of the classes without examples - were given an average value. The same principle was applied to values out of the expected ranges (e.g. credit card bills due day higher than 31).

4.3.3. One-dimensional analysis

The strength of each potential characteristic was measured using the information value (IV) in the period, IV = Σ_{i=1}^{n} (g_i/G − b_i/B) WoE_i, where n is the number of bins in the characteristic. The higher the IV, the higher the relative importance of the characteristic. On a one-dimensional basis, for the entire period, the most important characteristics are age, occupation, time at current employer, monthly income, and marital status, with information values of 0.368, 0.352, 0.132, 0.117, and 0.116, respectively. The remaining characteristics have 0.084 or less.
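The WoE and IV computations above can be illustrated with a short sketch (a minimal illustration only; it assumes every bin contains both goods and bads, i.e. it omits the average-value fallback described in the text):

```python
import math

def woe_iv(goods, bads):
    """WoE per bin and the information value (IV) of one characteristic.

    goods, bads -- per-bin counts of good and bad customers (equal length).
    """
    G, B = sum(goods), sum(bads)          # totals in the population sample
    woe = [math.log((g / G) / (b / B))    # WoE = ln((g/G)/(b/B)) per bin
           for g, b in zip(goods, bads)]
    iv = sum((g / G - b / B) * w          # IV = sum over bins of (g/G - b/B)*WoE
             for g, b, w in zip(goods, bads, woe))
    return woe, iv
```

A characteristic whose bins separate the classes strongly, e.g. `woe_iv([90, 10], [10, 90])`, yields a large positive IV, while identical class proportions in every bin give an IV of zero.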
4.3.4. Interaction terms

Using the odds in each attribute of the variables, we calculated new nonlinear characteristics using interaction terms between variables to model the joint effects. We tested six combinations, for which we present the information values in Table 2.

4.3.5. Time series descriptive analysis

Fig. 2a shows the real concept drift along 2009–2010. The highest default rates are noticed in the first quarter of 2009 and at the end of 2010. Fig. 2b displays the evolution of the business in the same period. It exhibits two features of the business. First, we can see that the credit cards business follows an annual seasonality, increasing along each year. Second, the credit cards business is rising over time, which is related to the expansion of the branch network of the financial institution. The decrease of the default rate during 2009 suggests that the decision-making process might have been slightly enhanced, when comparing to the beginning of the period.

4.4. Dynamic modeling framework

The dynamic modeling framework presented in this research considers that data are processed batch-by-batch. Sequentially, at each monthly window, a new model is learned from a previously selected window, including the most recent month. To mimic the time evolution, we assumed that the current month gradually shifts from 2009 until the third quarter of 2010.

Each learning unit for the model building was grounded on a static setting. The training of each unit consists of a supervised classification procedure, executed in three steps. First, characteristics are binned.
Second, the classification model is designed with Generalized Additive Models (GAM) and 10-fold cross-validation, upholding the classification algorithm used to develop the winning model in the BRICS 2013 competition in data mining and finance (BRICS-CCI&CBIC, 2013; Sousa et al., 2013b). Concurrently, the best set of characteristics is selected until no other characteristic in the training dataset adds a contribution to the information value (IV) of the model. In this application the threshold was set to a minimum increment of 0.03. Third, the performance of the model is measured based on the Gini coefficient, equivalent to considering the area under the ROC curve (AUC), which is a typical evaluation criterion among researchers and in the industry (Řezáč & Řezáč, 2011). This coefficient refers to the global quality of the credit scoring model, and ranges between −1 and 1. The perfect scoring model fully distinguishes the two target classes, good and bad, and has a Gini coefficient equal to 1. A model with a random output has a Gini coefficient equal to zero. If the coefficient is negative, then the scores have a reverse meaning. The extreme case −1 would mean that all examples of the good class are being predicted as bad, and vice versa. In this case, the perfect model can be achieved just by switching the prediction.

At each month, instances for modeling are selected from all previously available batches, according to a selection mechanism. We use instance selection methods to test the hypothesis under investigation. Two methods for tackling default concept drift were implemented - a full memory time window, and a fixed short memory time window with a forgetting mechanism.

The full memory time window assumes that the learning algorithm generates the model based on all previous instances (Fig. 3a). The process is incremental, so every time a new instance arises, it is added to the training set, and a new model is built.
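The two instance-selection mechanisms just introduced can be sketched as follows. This is an illustrative outline, not the authors' code; `fit` stands for the supervised training step described above:

```python
def select_training_set(batches_so_far, memory=None):
    """Pick the batches used to rebuild the model at the current month.

    batches_so_far -- list of all monthly batches available up to now
    memory         -- None for the full memory time window (keep everything);
                      an integer k for the fixed short memory time window
                      (keep only the k most recent batches, forgetting the rest)
    """
    if memory is None:
        return batches_so_far           # full memory: all previous instances
    return batches_so_far[-memory:]     # short memory: FIFO forgetting

# At month t, a new model would be rebuilt on the selected data, e.g.:
#   model_t = fit(select_training_set(batches[:t + 1], memory=k))
```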
The full memory schema should be appropriate to detect mild concept drifts, but it is unable to rapidly adapt to major changes. Models of this schema should perform suitably in stable environments. A shortcoming of this incremental schema is that the training dataset quickly expands, which may require a huge storage capacity and may constrain the use of some classification algorithms, which must be able to process the expanding dataset.

In the fixed short memory time window, the model development uses the most recent window. With this schema, illustrated in Fig. 3b, a new model is built for each new batch, by forgetting past examples. The fundamental assumption is that past examples have low correlation with the current default concept. Under this setting, the dynamic modeling should quickly adapt to changes. The most extreme case of the short memory time window is when only the current example is considered to train the new model, which corresponds to online learning without any memory of the past. A deficiency of this method is that it often lacks generalization ability in stable conditions, a problem that is amplified with extremely short windows.

These modeling frameworks enable comparing the two configurations between themselves, and also comparing them with the model reached with a static learning setting. The research questions of this study should be answered following this reasoning:

• If the full memory time window outperforms the other schema, then more recent data are not fundamental for the prediction; the environment of the decision-making should be in a stable phase. Otherwise, the default concept is drifting, and so the most recent data are more relevant for the prediction.
• If a model built with static learning in the first window of the period has the best performance, then older data can improve the prediction. This may happen, for example, when a new credit product is launched, and the credit decision-making criteria are adjusted afterwards.
In such a case, the oldest data are more representative, as they can illustrate a more diverse range of risk behaviors. Otherwise, over the long run, dynamic modeling should outperform the model learnt with a static setting.

5. Experimental results

We assessed the performance of the sequential models built with the dynamic modeling framework introduced in the previous section, through the period 2009–2010. The experimental design was drawn for assessing the performance in the modeling period, in the short-term, and in the farthest-term. In each model rebuilding, the performance in the modeling period was assessed in the test set. Additionally, using two out-of-sample windows, we measured the short-term performance of the model in the month following the development, and the farthest-term performance was measured in the last quarter of 2010. Although we have considered monthly windows for developing the model, for the long-run assessment we chose a quarterly window instead of a single month. In so doing, potential atypical properties of the decision-making process at the end of the year were smoothed.

In this section, we provide further evidence on the temporal degradation of static credit scoring. Then, we challenge the robustness of the new concept of dynamic modeling against a static model developed with a traditional framework. We finally present and discuss the results for the two sliding-window configurations - full memory and short memory.

5.1. Temporal degradation of static credit scoring

The temporal degradation of the credit scoring is detected when measuring the performance of each model in the sequence generated with the dynamic modeling. Fig. 4a and b exhibit the Gini coefficient for each model, measured in the modeling test set and in two different out-of-sample windows: one month after rebuilding the model, and in the farthest quarter of the period (2010 Q4).

Fig. 4a shows the performance along the entire period with the short memory configuration.
One month after rebuilding the model, the performance curve is always below the performance measured in the modeling period, showing that the performance consistently decreases one month after rebuilding the model. When evaluating the performance with the full memory configuration, in Fig. 4b, the extent of degradation within a month is not consistent over the period. During the first semester of 2009, performance measured in the month after rebuilding the model is slightly superior to the one measured in the modeling period, and from that point onwards, it is marginally inferior. This may suggest that the short-term

Fig. 2. Default rate and new contracts in the period 2009–2010: (a) default rate; (b) number of new credit card contracts.

Fig. 3. Configurations for tackling concept drift in credit default: (a) full memory time window; (b) fixed short memory time window.

Fig. 4.
Gini coefficient of the sequence of models produced with the dynamic modeling: (a) short memory; (b) full memory.

performance is more similar to the performance in the modeling period when using the full memory configuration.

The extent of degradation is higher when the performance of the model is measured at the end of the period (2010 Q4). The farther the prediction point is from the development point, the higher the extent of the degradation of the performance. These effects are consistently perceived in the two windowing configurations - short memory (Fig. 4a) and full memory (Fig. 4b).

Considering the real performance of the models one month after they were built, the average degradation of the models sequentially constructed, shown in Table 3, is 0.02 in the short memory and 0.01 in the full memory configuration. In the farthest quarter of the period (2010 Q4) the degradation reaches 0.07 in the short memory configuration and 0.06 in the full memory schema.

Table 3
Average degradation of the sequence of models produced with the dynamic modeling.

              Gini index                                              Degradation
Memory type   Modeling   Month after   Farthest quarter              Month after   Farthest quarter
              period     rebuilding    (2010 Q4)                     rebuilding    (2010 Q4)
Short         0.40       0.38          0.33                          −0.02         −0.07
Full          0.39       0.38          0.33                          −0.01         −0.06

Although degradation can be observed in all models of the sequence, updating the model always yields the best discrimination between the target classes - goods and bads.

5.2. Dynamic versus static

The proposed dynamic modeling framework enables a major improvement over the initial static model, which was trained with the sample from the first month of 2009 (2009 M1). Fig.
5a shows the immediate performance achieved with the dynamic modeling – full and short memory – versus the static model, measured in the month following the development. Fig. 5b shows the performance of the models at each point in time, measured in the farthest quarter of the period. Consistently, for both memory configurations and both performance criteria, immediate or in the farthest quarter, the static and the dynamic modeling performances improve until the third quarter of 2009, which might reflect the enhancement of the set of characteristics x that was partially corrected over that period.

In Fig. 5a, we observe a certain overlap between the immediate performance achieved with the two types of memory configurations - short and full. For all the periods, the short-term performance increases until the third quarter of 2009, and slightly decreases from that point onwards. The extent of improvement with the dynamic modeling reaches 0.05 in 2010.

Fig. 5b shows that the farthest-term performance of the first model in the sequence of the dynamic modeling, the same as the static model, is significantly improved with the sequential rebuilding until the third quarter of 2009, possibly as a consequence of the enhancement of the set of characteristics. In this period, performance increases from 0.28 to 0.36, meaning that the risk assessment is enhanced with the new dynamic modeling rather than with the static one. From that quarter onwards, the long-run predictions given by the dynamic modeling slightly improve, and always outperform the static frame. This suggests that the new incoming data allow a better knowledge of the new context. Although we know beforehand that the increase in performance is somewhat a consequence of the training being nearer to the out-of-sample validation window, we can still see that using the newest data improves the initial prediction given by the static model (2009 Q1).
5.3. Memory - keep or lose it

The new dynamic modeling framework enables investigating whether it is preferable to keep a long-term memory or to forget older observations, or whether they are equivalent in some contexts. From the second semester of 2009 onwards, the best results in the farthest-term (2010 Q4) are reached with the full memory configuration. However, we realize that there is a certain overlap between the performances of the sequences of models resulting from the two types of memory configurations, both for the short-term and for the farthest-term. This suggests that, in the period, the information contained in the older examples remains appropriate for the default target, and that the context is not drifting as a result of particular changes in the set of characteristics. Hence, drifts in particular characteristics, like income, translate into virtual drifts, because they did not have an impact on the distribution of the target concept, p(y|x). To some extent, the immediate performance, exhibited in Fig. 5a, decreases during 2010 from 0.44 to 0.38, which could be interpreted as the presence of a drift. However, as the timeframe is small, it remains uncertain whether it is a transitory outcome or a persistent drift in the context, potentially caused by changes in features that are not represented in the set of characteristics available for modeling in this application, like macroeconomic data.

6. Conclusions

This research presents a new modeling framework for credit risk assessment that extends the prevailing credit scoring models built upon static settings of historical data. Our framework mimics the principle of films, by composing the model with a sequence of snapshots, rather than a single picture. Within the new modeling schema, predictions are made upon the sequence of temporal data, and are suitable for adapting to the occurrence of real concept drifts, translated by changes in the population, in the economy, or in the market.
It also enables improving the existing models based on the newest incoming data.

We present an empirical simulation using a real-world financial dataset of 762,966 credit cards, from a financial institution in Brazil along two years of operation. A first conclusion is that monthly updates avoid the degradation of the model following the development. Secondly, the newest data consistently improve the forecasting accuracy, when compared to the previous models in the sequence of dynamic modeling, both in a short-term and in a full-term memory configuration. In particular, the static model available at the beginning of the period is outperformed by every succeeding model, suggesting that the dynamic modeling framework has the ability to improve the prediction by integrating new incoming data. Third, a slight dominance is achieved with the full-term memory, suggesting that older information remains meaningful for predicting the default target within the analyzed period.

In the banking industry, prevailing credit scoring models are developed from static windows and kept unchanged, possibly for years. In this setting, the two basic mechanisms of memory, short-term and long-term memory, are fundamental to learning, but are still overlooked in current modeling frameworks. As a consequence, these models are insensitive to changes, like population drifts or financial distress. The usual outcomes are rising default rates and abrupt credit cuts, as those observed in the U.S. in the aftermath of the last global crisis (as documented by Sousa, Gama, and Brandão (2015)). This problem could be overcome with the proposed framework, since it would allow gradually relearning along time and changes.

Still, there are some real business problems with rebuilding models over time. First, lenders have little incentive to enhance the existing rating systems frameworks, because there is a recurring idea that it is expensive and time-consuming to build new scorecards. They then need to be internally tested and validated, and then regulators need to approve them. Second, regulators still promote models whose coefficients do not change over time. This is one area where practice is far distant from the technical advances, and new thoughts, like simplifying current decision layers, need to be encouraged.

Fig. 5. Performance with the dynamic modeling – full and short memory – versus the static model (2009 M1): (a) short-term performance; (b) farthest-term performance, 2010 Q4.

There are some important topics in default concept drift that we did not consider, which we defer to future research. While this paper provides convincing results, some additional simulations using real-world datasets from highly stressed economic environments and longer time frames would be valuable. Second, modeling the delinquency presents a specificity, since a window of time is required in order to measure the outcome, i.e. the true class, before the new model is built. Therefore, for forecasting, it turns out that there will be a time gap of the same length between the values of the predictor variables used in the model and the first possible forecast period in the future. Although this is not a problem of the proposed methodology, future research should bring new insights to overcome this issue, with a view on practicality.
Third, some good alternatives to using windows of data blocks are encouraged, which may be based on using ensembles of the models learned in the past, possibly combining the two components of memory, short-term and long-term memory, or a forgetting factor method. There is some material on this going back to Adams et al. (2010). Fourth, our empirical study considered a set of fixed predictors. Therefore, future research should consider sets of predictors of variable length. This is important for detecting concept drift, because the set of predictors being used may be too limited to exhibit signs of change, even if changes are occurring in the environment. Finally, performance is reported in this paper, but the conditions leading to differences in performance are not explored. This is another future research direction.

References

Abdou, H. A., & Pointon, J. (2011). Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance and Management, 18(2-3), 59–88.
Adams, N. M., Tasoulis, D. K., Anagnostopoulos, C., & Hand, D. J. (2010). Temporally-adaptive linear classification for handling population drift in credit scoring. In Y. Lechevallier, & G. Saporta (Eds.), Proceedings of COMPSTAT'2010 (pp. 167–176). Physica-Verlag HD.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609.
Anderson, R. (2007). The credit scoring toolkit: Theory and practice for retail credit risk management and decision automation. OUP Oxford.
Avery, R. B., Calem, P. S., & Canner, G. B. (2004). Consumer credit scoring: Do situational circumstances matter? Journal of Banking & Finance, 28(4), 835–856.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6), 627–635.
BCBS (2006). International convergence of capital measurement and capital standards: A revised framework – comprehensive version. Bank for International Settlements.
Bellotti, T., & Crook, J. (2013). Forecasting and stress testing credit card default using dynamic models. International Journal of Forecasting, 29(4), 563–574.
BIS (2004). Implementation of Basel II: Practical considerations. Bank for International Settlements.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont, California: Wadsworth International Group.
BRICS-CCI&CBIC (2013). CI algorithms competition (CIAC): Credit risk assessment system robustness against degradation and seasonal variation. http://brics-cci.org/ci-algorithms-competition-ciac/ Accessed 19.06.13.
Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453. http://dx.doi.org/10.1016/j.eswa.2011.09.033.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 321–357.
Chen, D., Zhong, Y., Liao, Y., & Li, L. (2013). Review of multiple criteria and multiple constraint-level linear programming. Procedia Computer Science, 17(0), 158–165.
Chen, M.-C., & Huang, S.-H. (2003). Credit scoring and rejected instances reassigning through evolutionary computation techniques. Expert Systems with Applications, 24(4), 433–441.
Crook, J., & Bellotti, T. (2010). Time varying and dynamic models for default risk in consumer loans. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(2), 283–305.
Crook, J. N., Edelman, D. B., & Thomas, L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447–1465.
Crook, J. N., Thomas, L. C., & Hamilton, R. (1992).
The degradation of the scorecard over the business cycle. IMA Journal of Management Mathematics, 4(1), 111–123.
Desai, V. S., Crook, J. N., & Overstreet Jr, G. A. (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1), 24–37.
Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD '99 (pp. 155–164). New York, NY, USA: ACM.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. John Wiley & Sons.
Durand, D. (1941). Risk elements in consumer installment financing. National Bureau of Economic Research, Inc.
Einav, L., Jenkins, M., & Levin, J. (2013). The impact of credit scoring on consumer lending. The RAND Journal of Economics, 44(2), 249–274.
Eisenbeis, R. A. (1978). Problems in applying discriminant analysis in credit scoring models. Journal of Banking & Finance, 2(3), 205–219.
FICO (2006). Introduction to scorecard for FICO model builder.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Gama, J. (2010). Knowledge discovery from data streams. London: Chapman & Hall/CRC.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 44:1–44:37. doi:10.1145/2523813.
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 21(1), 30–34.
Hsieh, N.-C. (2005). Hybrid mining approach in the design of credit scoring models. Expert Systems with Applications, 28(4), 655–665.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4–37.
Jensen, H. L. (1992). Using neural networks for credit scoring. Managerial Finance, 18(6), 15–26.
Klinkenberg, R. (2004).
Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis, 8(3), 281–300.
Lazarescu, M. M., Venkatesh, S., & Bui, H. H. (2004). Using multiple windows to track concept drift. Intelligent Data Analysis, 8(1), 29–59.
Lee, T.-S., & Chen, I.-F. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications, 28(4), 743–752.
Lee, T.-S., Chiu, C.-C., Lu, C.-J., & Chen, I.-F. (2002). Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications, 23(3), 245–254.
Li, S.-T., Shiue, W., & Huang, M.-H. (2006).
The evaluation of consumer loans using support vector machines. Expert Systems with Applications, 30(4), 772–782.
Lucas, A. (2004). Updating scorecards: Removing the mystique. In Readings in credit scoring: Foundations, developments, and aims (pp. 93–109). New York: Oxford University Press.
Malhotra, R., & Malhotra, D. K. (2002). Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research, 136(1), 190–211.
Maloof, M. A., & Michalski, R. S. (2004). Incremental learning with partial instance memory. Artificial Intelligence, 154(1), 95–126.
Marqués, A. I., García, V., & Sánchez, J. S. (2012a). On the suitability of resampling techniques for the class imbalance problem in credit scoring. Journal of the Operational Research Society, 64(7), 1060–1070.
Marqués, A. I., García, V., & Sánchez, J. S. (2012b). Two-level classifier ensembles for credit risk assessment. Expert Systems with Applications, 39(12), 10916–10922.
Marqués, A. I., García, V., & Sánchez, J. S. (2013). A literature review on the application of evolutionary computing to credit scoring. Journal of the Operational Research Society, 64(9), 1384–1399.
Martens, D., De Backer, M., Haesen, R., Vanthienen, J., Snoeck, M., & Baesens, B. (2007). Classification with ant colony optimization. IEEE Transactions on Evolutionary Computation, 11(5), 651–665.
McNab, H., & Wynn, A. (2000). Principles and practice of consumer credit risk management. CIB Publishing.
Min, J. H., & Lee, Y.-C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), 603–614.
Ong, C.-S., Huang, J.-J., & Tzeng, G.-H. (2005). Building credit scoring models using genetic programming. Expert Systems with Applications, 29(1), 41–47.
Pavlidis, N., Tasoulis, D., Adams, N., & Hand, D. (2012). Adaptive consumer credit classification. Journal of the Operational Research Society, 63(12), 1645–1654.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Řezáč, M., & Řezáč, F. (2011). How to measure the quality of credit scoring models. Finance a Uver: Czech Journal of Economics & Finance, 61(5), 486–507.
Saberi, M., Mirtalaie, M. S., Hussain, F. K., Azadeh, A., Hussain, O. K., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122(0), 100–115.
Salganicoff, M. (1997). Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artificial Intelligence Review, 11(1-5), 133–155.
Schlimmer, J. C., & Granger Jr, R. H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354.
Sousa, M. R., & da Costa, J. P. (2008). A tripartite scorecard for the pay/no pay decision-making in the retail banking industry. Frontiers in Artificial Intelligence and Applications, 45.
Sousa, M. R., Gama, J., & Brandão, E. (2015). Links between scores, real default and pricing: Evidence from the Freddie Mac's loan-level dataset. Journal of Economics, Business and Management, 3(12), 1106–1114.
Sousa, M. R., Gama, J., Brandão, E., et al. (2013a). Introducing time-changing economics into credit scoring. Technical Report. Universidade do Porto, Faculdade de Economia do Porto.
Sousa, M. R., Gama, J., & Gonçalves, M. J. S. (2013b). A two-stage model for dealing with temporal degradation of credit scoring. In Proceedings of BRICS-CCI & CBIC.
Steenackers, A., & Goovaerts, M. (1989). A credit scoring model for personal loans. Insurance: Mathematics and Economics, 8(1), 31–34.
Sun, J., & Li, H. (2011). Dynamic financial distress prediction using instance selection for the disposal of concept drift. Expert Systems with Applications, 38(3), 2566–2576.
Thomas, L. C. (2009). Consumer credit models: Pricing, profit and portfolios. Oxford University Press.
Thomas, L. C. (2010). Consumer finance: Challenges for operational research.
Journal of the Operational Research Society, 61, 41–52.
Thomas, L. C., Edelman, D. B., & Crook, J. N. (2002). Credit scoring and its applications. Philadelphia: Society for Industrial and Applied Mathematics.
Tsymbal, A. (2004). The problem of concept drift: Definitions and related work. Dublin: Computer Science Department, Trinity College.
Wang, G., Hao, J., Ma, J., & Jiang, H. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38(1), 223–230.
Wang, Y., Wang, S., & Lai, K. (2005). A new fuzzy support vector machine to evaluate credit risk. IEEE Transactions on Fuzzy Systems, 13(6), 820–831.
Wei, G., Yun-Zhong, C., & Ming-shu, C. (2014). A new dynamic credit scoring model based on the objective cluster analysis. In Practical applications of intelligent systems, Advances in Intelligent Systems and Computing: 279 (pp. 579–589). Springer Berlin Heidelberg.
West, D. (2000). Neural network credit scoring models. Computers & Operations Research, 27(11), 1131–1152.
Widmer, G., & Kubat, M. (1993). Effective learning in dynamic environments by explicit context tracking. In Machine learning: ECML-93 (pp. 227–243). Springer.
Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101.
Yang, Y. (2007). Adaptive credit scoring with kernel learning methods. European Journal of Operational Research, 183(3), 1521–1536.