Title: 1 Does segmentation always improve model performance in credit scoring? Katarzyna Bijak a * and Lyn C. Thomas a a School of Management, University of Southampton, Southampton SO17 1BJ, UK * Corresponding author. Tel.: +44 23 80598964. E-mail address: K.Bijak@soton.ac.uk. Abstract Credit scoring allows for the credit risk assessment of bank customers. A single scoring model (scorecard) can be developed for the entire customer population, e.g. using logistic regression. However, it is often expected that segmentation, i.e. dividing the population into several groups and building separate scorecards for them, will improve the model performance. The most common statistical methods for segmentation are the two-step approaches, where logistic regression follows Classification and Regression Trees (CART) or Chi-squared Automatic Interaction Detection (CHAID) trees etc. In this research, the two-step approaches are applied as well as a new, simultaneous method, in which both segmentation and scorecards are optimised at the same time: Logistic Trees with Unbiased Selection (LOTUS). For reference purposes, a single-scorecard model is used. The above-mentioned methods are applied to the data provided by two of the major UK banks and one of the European credit bureaus. The model performance measures are then compared to examine whether there is improvement due to the segmentation methods used. It is found that segmentation does not always improve model performance in credit scoring: for none of the analysed real-world datasets, the multi-scorecard models perform considerably better than the single-scorecard ones. Moreover, in this application, there is no difference in performance between the two-step and simultaneous approaches. Keywords Credit scoring; Segmentation; Logistic regression; CART; CHAID; LOTUS mailto:K.Bijak@soton.ac.uk 2 1. Introduction Thomas et al. (2002) define credit scoring as “the set of decision models and their underlying techniques that aid lenders in the granting of consumer credit” (p. 1). These models and techniques are used to assess the credit risk of bank customers (individuals as well as small and medium enterprises). Depending on the data used to build models, there are different types of scoring. Application scoring is based on data from loan application forms while behavioural scoring is based on data on customers’ behaviour stored in bank databases. A special type of the latter is credit bureau scoring. Credit bureaus are institutions that collect and analyse data on loans granted by banks operating in a given country (Anderson, 2007; Van Gestel and Baesens, 2009). Such data enable tracking the credit history of a customer in the banking sector. Credit bureau scoring is based on data on customers’ credit histories. Application scoring can also be enriched with data from a credit bureau. As a rule, using such data increases performance of a scoring model (Van Gestel and Baesens, 2009). A scoring model describes the relationship between customer’s characteristics (independent variables) and his or her creditworthiness status (a dependent variable). A customer’s status can be either “good” or “bad” (and sometimes also “indeterminate” or “other”). The most common form of scoring models is referred to as a scorecard. According to Mays (2004), the scorecard is “a formula for assigning points to applicant characteristics in order to derive a numeric value that reflects how likely a borrower is, relative to other individuals, to experience a given event or perform a given action” (p. 63). Scorecards are used to calculate scores and/or probabilities of default (PD). They are sometimes scaled to obtain a required relationship between scores and PD. A scoring model can consist of one or more scorecards. In the latter case, it can be referred to as a suite of scorecards. In order to develop such a multi-scorecard model, segmentation has to be applied. It is commonly expected that segmentation will improve the model performance. Segmentation is often carried out using the two-step approaches, where logistic regression follows Classification and Regression Trees (CART) or Chi-squared 3 Automatic Interaction Detection (CHAID) trees. In this research, these approaches were applied as well as Logistic Trees with Unbiased Selection (LOTUS). The latter is a new, simultaneous method, in which both segmentation and scorecards are optimised at the same time. A single-scorecard logistic regression model was used as a reference. All these methods were applied to the data provided by two of the major UK banks and one of the European credit bureaus. Once the models were developed, the obtained results were analysed to examine whether there is improvement in the model performance due to the segmentation methods used. Moreover, the segmentation contribution was assessed. The paper is structured as follows. In the next section, the theoretical background of segmentation is presented as well as segmentation methods and other researchers’ findings on its impact on the model performance. In the third section, the basics of logistic regression, CART, CHAID and LOTUS are introduced. In the fourth section, the datasets are described. The fifth section is on the research results. The sixth section is a discussion and the last section includes the research findings and conclusions. 2. Segmentation 2.1. Theoretical background In credit scoring, segmentation can be defined as “the process of identifying homogeneous populations with respect to their predictive relationships” (Makuch, 2001, p. 140). The identified populations are treated separately in the process of a scoring model development, because of possible unique relationships between customer’s characteristics and a dependent variable. Nowadays segmentation is widely used in banking. There are various segmentation drivers, i.e. factors that can drive the division of a scoring model into two or more scorecards. Anderson (2007) classifies them into: marketing, customer, data, process and model fit factors. The first four factors reflect, respectively, the special treatment of some market segments, or customer groups, data issues (such as data availability) and business process requirements (e.g. different definitions of a dependent variable). 4 The model fit relates to interactions within the data and using segmentation to improve the model performance. In this research, the focus is on segmentation which is driven by the model fit factors. As far as segmentation is concerned, there are two key concepts: a segmentation basis and a segmentation method. A segmentation basis is a set of variables that allow for the assignment of potential customers to homogeneous groups. Segmentation bases can be classified as either general or product-specific, and either observable or unobservable (Wedel and Kamakura, 2000). As far as scorecard segmentation is concerned in this research, there is an unobservable product-specific basis. Once the segmentation is implemented, customers are grouped on the basis of their unobservable behavioural intentions to repay their loans or the relationship between their intentions and characteristics. On the date of grouping, it is not known whether the customers are going to repay or not. According to Wedel and Kamakura (2000), there are six criteria for effective segmentation. It seems that three of them are especially important in credit scoring: identifiability (customers can be easily assigned to segments), stability and responsiveness (segments differ from each other in their response/behaviour). Unobservable product-specific bases, which contain behavioural intentions, are characterised by good identifiability, moderate stability and very good responsiveness (Wedel and Kamakura, 2000). The above-mentioned advantages make these bases promising as far as scorecard segmentation is concerned. Segmentation methods can be classified as either associative (descriptive) or regressive (predictive) approaches (Aurifeille, 2000; Wedel and Kamakura, 2000). Since the ultimate goal is to assess the credit risk, the latter are applied in this research. There are two types of regressive approaches: two-step (a-priori) and simultaneous (post-hoc) methods (Aurifeille, 2000; Wedel and Kamakura, 2000). In the two-step approaches, segmentation is followed by the development of a regression model in each segment. In the simultaneous methods, both segmentation and regression models are optimised at the same time. 5 The two-step approaches are not designed to yield optimal results in terms of the prediction accuracy but rather to aid the understanding of overall strategy. On the other hand, the simultaneous methods give priority to a low, tactical level rather than to a high, strategic level of decision: the optimisation objective is to obtain the most accurate prediction, and not necessarily a meaningful and easily understandable segmentation (Desmet, 2001). 2.2. Segmentation methods There is not much literature on segmentation methods in credit scoring. According to Siddiqi (2005), segmentation methods can be classified as either experience-based (heuristic) or statistical. As far as the experience-based methods are concerned, one approach is to define segments that are homogeneous with respect to some customers’ characteristics. This allows for the development of segment-specific variables. For example, creating a segment of customers, who have a credit card, enables construction of such characteristics as credit limit used. Another approach is to define segments that are homogeneous with respect to the length of customers’ credit history (cohorts) or data availability (thin/thick credit files). For instance, creating a segment of established customers allows building behavioural variables based on the data from the last 12 months, the last 24 months etc. Furthermore, if there is a group (e.g. mortgage loan owners or consumer finance borrowers) that is expected to behave differently from other customers, or for whom the previous scoring model turned out to be inefficient, it is worth creating a separate segment for such a group. Moreover, customers can be grouped into segments in order to make it easier for a bank to treat them in different ways, e.g. by setting different cut-offs, i.e. score thresholds used in the decision making (Thomas, 2009). Finally, segmentation can be based on variables (e.g. age) that are believed to have strong interactions with other characteristics (Thomas, 2009). This is a heuristic approach but it has been developed into statistical methods based on interactions. An alternative to segmentation based on a selected variable is to include all its interactions with the other variables in a single-scorecard model (Banasik et al., 1996). 6 However, such a model has a large number of parameters and is less understandable than a multi-scorecard one. The experience-based segmentation methods can help achieve various goals such as improving the model performance for a certain group of customers or supporting the decision making process. The experience-based segmentation may also allow for better risk assessment for the entire population of customers. However, there is no guarantee that segmentation, which intuitively seems reasonable, will increase the model performance (Makuch, 2001). As far as statistical methods are concerned, segmentation is obtained using statistical tools as well as data mining and machine learning techniques. One approach is to do the cluster analysis (Siddiqi, 2005). The cluster analysis can be conducted using hierarchical clustering, the k-means algorithm or Self-Organizing Maps (SOMs). Regardless of the algorithm applied, clustering is based on customers’ characteristics. Therefore, customers with different demographic or behavioural profiles are classified into different segments. The resulting groups are homogeneous with respect to the characteristics but, since the customers’ status is not used in segmentation, they do not need to differ in risk profiles. Another approach is to use tree-structured classification methods such as CART or CHAID (VantageScore, 2006). In this approach, grouping is based on the customers’ status, and thus segments differ in risk profiles. Both the cluster analysis and classification trees can constitute the first step in the two-step regressive approaches. However, the classification trees often yield sub-optimal results (VantageScore, 2006). In 2006 VantageScore introduced a new, multi-level segmentation approach: combining experience-based segmentation (at higher levels) and segmentation based on a dedicated score (at lower levels). This score must be calculated using an additional scoring model which has to be built first. The split points on the score are determined using CART. Using the score enables dividing customers in such a way that in each segment, customers are similar to one another as far as their risk profile is concerned. There is an assumption that different risk profiles are associated with different relationships between a dependent variable and customer’s characteristics. 7 The VantageScore approach makes it easier for a bank to treat subprime and prime customers in different ways, but it seems that this approach does not have to be always optimal in terms of the model performance. There were also some attempts to develop methods that would allow for the optimal segmentation, i.e. a segmentation that would maximise the model performance. Their results can be classified as the simultaneous methods. Hand at el. (2005) suggested a method for the optimal division into two segments. In both segments, the same set of variables is used to develop a scorecard. The optimal division into the two groups is found using exhaustive search (each possible split point is examined on each variable or the linear combination of variables). For each possible pair of segments, two logistic regression models are built. The fit of the two-scorecard model is assessed using its overall likelihood, i.e. a product of likelihoods of the scorecards, and that division is chosen which gives the highest overall likelihood. However, the adopted assumptions (only two segments, the same variables) result in limited usefulness of the suggested method. In banking practice, customers are usually divided into at least a few segments, in which different sets of variables are used. Another approach to the optimal segmentation is Fair Isaac’s Adaptive Random Trees (ART) technology (Ralph, 2006). In this approach, the trees are not built level by level as in most tree-structured classification methods. In the beginning, the trees are randomly created using some predefined split points on the possible splitting variables. Then a genetic algorithm is applied to find the best tree, i.e. the tree that gives the highest divergence in the system of scorecards in its leaves, where the scorecards are naïve Bayes models. In all of them, there is the same set of characteristics as in the parent scorecard which is built on the entire sample. The ART technology has fewer drawbacks than other methods. It should allow for the maximisation of the model performance (measured using divergence). The number of segments is not predetermined. The use of the genetic algorithms avoids the exhaustive search that is both expensive and time-consuming. However, there is still a serious disadvantage, since – as in Hand at el. (2005) – the same set of variables is used in all scorecards. 8 2.3. Impact of segmentation It is commonly asserted by scorecard developers that a suite of scorecards allows for better risk assessment than a single scorecard used for all customers. According to Makuch (2001), segmentation usually increases performance by 5 to 10 percent in comparison with a single-scorecard system. It is also believed that segmentation can significantly contribute to performance of a scoring model. Impact of segmentation on the model performance measures can be assessed using simulated results of random scorecards applied to the identified segments (Thomas, 2009). The segmentation contribution to the model performance can also be assessed using difference between a performance measure of the model and the weighted average among the scorecards. This average is calculated using weights equal to percentages of customers classified to the segments. Banasik et al. (1996) analysed impact of some experience-based divisions on discrimination of a model. They set a few cut-offs and measured the discrimination in terms of errors that occur on a holdout sample. As a result, they found that “it is not the case that creating scorecards on separate subpopulations is necessarily going to give better discrimination than keeping to one scorecard on the full population”. For a suite of scorecards, it is difficult to choose cut-offs that are independent, good and robust at the same time. However, if cut-offs are chosen in the same way for all models, multi-scorecard models reject less applicants than single-scorecard ones. This may also be considered an advantage of segmentation. 3. Models 3.1. Logistic regression Logistic regression is the most commonly used method for developing scoring models. Since there is a binary dependent variable (either good or bad), binomial logistic regression is applied. In binomial logistic regression, a dependent variable y is equal to the cumulative distribution function F of a logistic distribution: 9 βx βx    e Fy 1 1 )( , where x is a vector of independent variables (covariates) and  is a vector of model parameters (Greene, 2000, p. 815). The parameters are usually estimated using the maximum likelihood (ML) method. The estimated value of a dependent variable lies between 0 and 1. Thus, it can be interpreted as probability of a dependent variable being equal to one. In credit scoring, this is probability of a customer being bad (probability of default). In scorecards, covariates are often used in the form of Weights of Evidence (WoE). If a discrete or discretised variable X takes K values, then the Weight of Evidence for its nth value (n ≤ K) is computed using the following formula (Anderson, 2007, p. 192):                            K k kn K k knn BBGG BnP GnP WoE 11 ln )|( )|( ln , where Gn (Bn) is a number of goods (bads) for whom X takes the nth value. A ratio of goods to bads is referred to as the odds in credit scoring. The population odds is a ratio of the proportion of goods pG to the proportion of bads pB in the population. It is often assumed that there is a linear relationship between the score and the log odds (Mays, 2004). Using the Bayes’ rule, it can be shown that the log odds sn among customers, for whom X takes the nth value, are equal to a sum of the population log odds spop and the Weight of Evidence for the nth value of X (Thomas, 2009, p. 33): npop B G B G n WoEs BnP GnP p p pBnP pGnP nBP nGP s                              )|( )|( lnln )|( )|( ln )|( )|( ln . Weights of Evidence allow for the assessment and comparison of the relative credit risk associated with different values of a variable (attributes of a characteristic). There is sometimes no theory that would support the choice of covariates. Therefore, the best set of covariates is often identified using the stepwise selection of variables (Hosmer and Lemeshow, 2000). The stepwise selection is a procedure of alternate inclusion and exclusion of variables from a model based on the statistical significance of their coefficients that is measured with a p-value. In logistic regression, the 10 likelihood ratio test or the Wald test are used to assess significance of the coefficients. In both cases there are the chi-square test statistics. In a forward selection step, the variable is included that, once added to the model, has the most significant coefficient. In a backward elimination step, the variable, which has the least significant coefficient, is excluded from the model. The stepwise selection is especially useful in case of a large number of possible covariates. Therefore, it is popular in behavioural scoring. The goodness-of-fit of a logistic regression model can be measured e.g. using the deviance. In logistic regression, the deviance plays the same role as the residual sum of squares in linear regression. It is calculated according to the following formula:                            n i i i i i i i y p y y p yD 1 1 ˆ1 ln)1( ˆ ln2 , where i y is the dependent variable value and i p̂ is the estimated probability of 1 i y for the ith observation, i = 1, …, n (Hosmer and Lemeshow, 2000, p. 13). In credit scoring, it is important how well the model fits the data but it is even more important how effectively it separates the goods and the bads. The separating ability is often referred to as the discriminatory power. There is a wide selection of discriminatory power measures (Thomas, 2009), with the Gini coefficient and the Kolmogorov-Smirnov (KS) statistic being the most commonly used ones. Both the Gini coefficient and the KS statistic can be calculated using the cumulative distribution functions (CDFs) of scores, computed separately for goods and bads (Thomas, 2009). The KS statistic is equal to the maximum difference between these CDFs. In order to calculate the Gini coefficient, the Receiver Operating Characteristic (ROC) curve is usually constructed. The ROC curve can be drawn by plotting the above-mentioned CDFs against each other. The Gini coefficient is equal to the double area under the ROC curve (AUROC) less one. Similarly to the KS statistic, it takes values between 0 and 1 with higher values meaning the stronger discriminatory power. Among other discriminatory power measures, there are the Somers D-concordance statistic and the Mann-Whitney U-statistic. The relationship between these statistics, the Gini coefficient and AUROC is as follows (Thomas, 2009, p. 113 and p. 120): 11 121AUROC2GINI  BG S nn U D , where nG and nB are numbers of good and bad customers, respectively. 3.2. CART Classification and Regression Trees (CART) are a popular nonparametric statistical method (Breiman et al, 1998). In this research, the focus is on classification trees, i.e. trees with a nominal dependent variable. In CART, predictors can be both continuous and categorical while splits are binary. All possible splits on all variables are examined and assessed. In order to measure quality of a split, the impurity function values are calculated for both child nodes. The impurity is often assumed to take the form of the entropy: )1log()1(log)( ppppNI  or the Gini index: )1(2)( ppNI  , where p is a fraction of observations with a positive response in the node N (Izenman, 2008, p. 288). Once all splits are assessed, such a split of the node N into N1 and N2 is selected that results in the largest decrease in impurity (Breiman et al, 1998, p. 32): )()()(),( 2 2 1 1 21 NI N N NI N N NINNIG  . The tree is grown using the recursive partitioning, i.e. each child node is split in the same way (Berk, 2008). The growing process continues until no more nodes can be split. In order to avoid excessively large structures and overfitting, the tree is then pruned back. The pruning process consists in minimising the cost-complexity measure that is defined as follows: TTRTR   )()( , where R(T) is an estimate of the misclassification cost of the tree T and  is the complexity parameter while |T| denotes the number of leaves (Breiman et al, 1998, p. 66). For each value of the complexity parameter, the smallest tree can be identified that minimises the cost-complexity measure. As a result, there is a sequence of nested subtrees. The best subtree is selected using a test sample or cross-validation. In this research, test samples were used. The trees were created in SAS Enterprise Miner and 12 served as the first step in the two-step approach. Splits were selected using the Gini index as the impurity function. The CART method is often compared to the C4.5 algorithm, another popular method for building classification trees (Hand et al, 2001; Larose, 2005). However, there are some important differences between them, e.g. the latter allows splitting into three or more child nodes (multi-way splits). Moreover, in the C4.5 algorithm, the split selection is always based on the information gain, i.e. reduction in entropy. 3.3. CHAID Chi-Square Automatic Interaction Detection (CHAID) is also a tree-structured classification method (Kass, 1980). It belongs to a family of methods known as Automatic Interaction Detection (AID). As its name suggests, the AID allows for the detection of interactions between variables. Thus, the segmentation is based on the interactions. The AID requires that predictors are categorical, i.e. either discrete or discretised (if originally continuous). The original categories of the predictors are grouped into a number of classes using a stepwise procedure that includes both merging and splitting steps. In a merging step, all categories or classes are compared to one another using some tests. The least significantly different ones are then grouped into a new class. In a splitting step, all possible binary divisions of a class are analysed and such a division is selected that leads to the most significantly different classes. Only classes, which consist of 3 or more categories, can be divided. The resulting grouping is then selected to split the node. There can be multi-way splits (Hawkins and Kass, 1982). In CHAID, the dependent variable has to be nominal, and the split selection is based on the chi-square tests of independence between the grouped predictors and the dependent variable. In order to account for multiple testing, the Bonferroni correction is used (Hawkins and Kass, 1982). The Bonferroni correction adjusts the test significance level for many tests that are performed at the same time. 13 Once a node is split, the grouping and testing process is repeated for each child node. Growing the tree continues until there are no more nodes that can be split. No pruning is carried out. However, in this research, CHAID was used as the first step in the two- step approach. Thus, manual pruning was performed to ensure that in each leaf, there are enough bads to build a logistic regression model. The trees were produced in SAS Enterprise Miner. Classification trees, including CART and CHAID, can be used not only for segmenting customers but also for developing scoring models (Thomas et al., 2002; Yobas et al., 2004). They can be applied instead of e.g. logistic regression. In such an application, each customer can be assigned probability of default equal to the bad rate in the leaf that he or she falls into. 3.4. LOTUS Chan and Loh (2004) noticed that there is selection bias in CART (but not CHAID) and in all other methods where exhaustive search is used for variable selection: if all possible splits based on all variables are considered, then variables with more unique values are more likely to be selected to split the node. The selection bias problem is overcome in the Logistic Tree with Unbiased Selection (LOTUS) algorithm (Chan and Loh, 2004; Loh, 2006). This algorithm allows for the development of classification trees with logistic regression models in their leaves. Since the trees are built together with the models, this is a simultaneous method. The algorithm starts with a regression model developed using the entire training sample (at the root). Once a node is split, new models are built in the child nodes. In order to avoid the bias, the split selection is divided into two separate steps: variable selection and split point selection (Chan and Loh, 2004). For all variables, which are analysed in the first step, the chi-square statistics are computed. The statistic used depends on whether the analysed variable serves as a regressor in the parent node, i.e. the node to be split. For non-regressors the ordinary chi-square statistic is calculated while for regressors the trend-adjusted chi-square statistic is computed. The latter tests whether there are any nonlinear effects after adjusting for a linear trend (Armitage, 1955). The variable with the lowest p-value is selected to split the node. In the second 14 step, the split point is selected that minimises the total deviance, i.e. the sum of deviances of regression models built in the child nodes. The algorithm stops when there are too few observations to split a node or to develop a model. The CART pruning method is then used to prune the tree. The cost- complexity measure is based on the total deviance (summed over all leaves). Finally, the subtree with the lowest total deviance is selected (Chan and Loh, 2004). The LOTUS algorithm is implemented in the LOTUS software (Chan, 2005). In this research, the LOTUS software was used with the following options: logistic regression models with stepwise selection were built in all nodes, and the pruning process was based on test samples. 4. Data In this research, three real-world datasets are used. The data describes individual customers. There are two datasets containing application data and one dataset with behavioural (credit bureau) data. The datasets are referred to as A1, A2 and B, respectively. In order to get unbiased results, each dataset was randomly divided into training, validation and test samples. In all these samples, the bad rate is the same as in the original dataset. The datasets A2 and B were divided into the samples that contain ca 50, 30 and 20 per cent of customers, respectively. The samples, which were created as a result of the dataset A1 division, include ca 50, 25 and 25 per cent of customers (there would be an insufficient number of bads in a smaller test sample). The training samples were used to develop models. The validation samples served as holdout ones, i.e. they were not used in the model development. Once a model was built, its stability was evaluated through the comparison of its discriminatory power on the training and validation samples. The smaller the difference, the more stable the model. The test samples were only used to prune the trees. 15 4.1. Dataset A1 The dataset A1 was provided by one of the major UK banks. There is data on 7,835 applicants, of whom 6,440 were goods and 1,395 were bads. Originally, there was also data on some rejected applications but they were then excluded from the dataset. The applications were made between April and September 1994. Customers applied for personal loans for different purposes. Loan amounts ranged from £500 to £50,000 while repayment periods varied from 6 months to 5 years. The characteristics are listed in Appendix A. They describe both a customer and a loan that he or she applied for. There are also some credit bureau variables in the dataset. 4.2. Dataset A2 The dataset A2 was provided by another major UK bank. There is data on 39,858 customers, including 38,135 goods and 1,723 bads. Originally, there were also some indeterminates who were then eliminated from the dataset. The loans were opened between May 1994 and August 1996. Loan amounts ranged from £300 to £15,000 while loan terms (durations) varied from 6 months to 10 years. In the original dataset, there were 111,946 customers. There was not only application but also credit bureau data (see Appendix A). However, the additional data was provided only for a part of the dataset. There are reasons to assume that the bank had such data for other customers, too. In order to account for this, the bad rate should be the same among customers with and without the credit bureau data (4.32%). All goods and bads, for whom there is the additional data, are included in the dataset. As far as customers without the credit bureau data are concerned, all bads are included as well as such a number of randomly sampled goods that the bad rate is equal to 4.32%. The resulting numbers of goods and bads are mentioned above. 16 4.3. Dataset B The dataset B was provided by one of the European credit bureaus. There is data on 186,574 customers, of whom 179,544 were goods and 7,030 were bads. In the original dataset, there was also data on some indeterminates but they were then excluded. Since the data was sampled from the credit bureau database, the customers had different credit products with different banks. There are 324 characteristics based on the customer’s credit history. However, they cannot be listed since this is proprietary information. Some examples include: worst payment status within the last 12 months, number of credit inquiries within the last 12 months, number of open accounts, number of past loans, total credit limit etc. The characteristics are as of the 1st of July 2008 (observation point) and the customer’s status is as of the 1st of July 2009 (outcome point). Thus, the outcome period length is exactly equal to twelve months. 5. Results In this research, suites of scorecards were developed based on the above-mentioned datasets. Both the two-step and simultaneous approaches were adopted. In the two- step approaches, segmentation was performed using CART and CHAID, and scorecards were built for the identified segments. In the simultaneous approach, the LOTUS algorithm was used to develop both segmentation and scorecards. For reference purposes, a single-scorecard model was estimated based on each dataset. All the scorecards were built using logistic regression with stepwise selection. No interaction variables were allowed in the scorecards. The variable grouping process was performed in the Interactive Grouping node in SAS Enterprise Miner. Categories of discrete variables were grouped into classes while continuous variables were discretised (binned) first. For each variable, such a division was selected that maximises reduction in entropy on the entire training sample. No more than five classes were allowed. The groupings were sometimes modified manually to put them in line with the banking experience. 17 In all the adopted approaches, only grouped variables and those original ones, which are categorical, were allowed to split the nodes. If necessary, the CART and CHAID trees were pruned back manually until there were at least a minimum number of bads in each leaf. This number was assumed to be equal to 100 for the datasets A1 and A2 and 500 for the dataset B. The same minimum numbers of bads were set as an option in the LOTUS algorithm. The CART, CHAID and LOTUS trees are presented in Appendix B. In each leaf, numbers represent: the number of bads and the bad rate in the leaf, as well as the number of all customers and their share in the training sample. In the CHAID tree for the dataset B, there is one leaf with only 16 bads (marked with an asterisk). It was not possible to prune the tree more because this leaf is a child node of the root. However, with such a number of bads, it was not possible to build a scorecard, either. Therefore, in this leaf all customers were assigned the same probability of default that is equal to the bad rate (0.3%). As a result, there is no separating ability and both the Gini coefficient and the KS statistic are equal to 0 in this leaf. For each dataset, there is at least one variable that was selected to split nodes in most trees based on this dataset. Time with Bank was used in all trees for the dataset A1. For the dataset A2, all nodes were split using either Loan Amount or Loan Purpose. For the dataset B, Var2 was used in both the CART and the LOTUS trees. The variables Var1, Var2 and Var3 are based on the payment statuses of customer’s loans (describing delinquencies etc.). In all the developed scorecards, characteristics were used in the form of WoE (based on the entire training sample). It was assumed that no scorecard could consist of more than 10 characteristics since in a credit scoring application, there are usually between 6 and 15 best variables (Anderson, 2007). In Appendix A, the characteristics, which were used in the reference logistic regression models based on the datasets A1 and A2, are marked with a bold font. In the reference scorecard based on the dataset B, there are, among other variables, Var1, number of credit inquiries within the last 9 months and age of the oldest loan. Some variables were used both in the reference models and in the trees: Time with Bank and Insurance (A1), Loan Amount and Loan Purpose (A2) as well as Var1 (B). 18 In each suite, the scorecards are consistent in terms of scale, i.e. there is the same relationship between scores and PD. This enables the calculation of discriminatory power measures for the entire model. The Gini coefficients and KS statistics are presented in Tables 1 and 2, respectively. There are values obtained on the training, test and validation samples. Only for the dataset A1, do the multi-scorecard models perform slightly better than the reference logistic regression model on a training sample: both the Gini coefficients and the KS statistics are higher by 2-3 percentage points. For the other datasets, the differences in the Gini coefficient do not exceed one percentage point, what makes them negligible. All the models for the dataset B are perfectly stable: the Gini coefficients and the KS statistics are very similar on the training and validation samples. The perfect stability is probably due to the size of a training sample and the power of credit bureau variables. The models for A2 are still stable while those for A1 cannot be considered stable: the Gini coefficients are lower by more than 10 percentage points on the validation sample as compared to the training sample. For both A1 and A2, logistic regression models are the most stable, probably due the smallest number of parameters and the simplest structure. The Gini coefficients and the KS statistics, which were obtained on the validation samples for the datasets A2 and B, are similar for single- and multi-scorecard models. However, on the validation sample for the dataset A1, the discriminatory power measures are higher by 3-5 percentage points for the logistic regression than for the CART- and CHAID-based models. For each approach, the segmentation contribution to the model performance was assessed using difference between the Gini coefficient or the KS statistic of the model and the weighted average among the scorecards on the training sample (see Tables 3 and 4). For comparison purposes, the discriminatory power measures were also calculated for the CART, CHAID and LOTUS trees. In order to compute these measures, it was assumed that each customer was assigned a probability of default equal to the bad rate in his or her segment. The results are presented in Tables 3 and 4. There are the Gini coefficients and the KS statistics of the entire models (“Model”) 19 and scorecard averages calculated using weights equal to percentages of customers classified to the segments (“Scorecards”). There are also differences between the former and the latter (“Difference”) as well as the discriminatory power measures of the trees (“Tree”). For the dataset A1, the trees are much weaker than the scorecards, the segmentation contribution does not exceed 9 percentage points and the scorecards are comparable to the logistic regression. As a result, the multi-scorecard models slightly outperform the single-scorecard one. For the datasets A2 and B, both the Gini coefficients and the KS statistics of the trees are high, often higher than those of the scorecards. The segmentation contribution is up to even 20 percentage points. However, the scorecards, which were built for the identified segments, are much weaker than the logistic regression models developed on the entire training samples. Therefore, there is no difference in performance between the single- and multi-scorecard models. 6. Discussion It can be surprising that segmentation does not improve the model performance, especially on the credit bureau dataset. As far as the credit bureau is concerned, the population is heterogeneous because there are customers of different banks, using different products etc. It could be expected that segmentation would bring an improvement in risk assessment for this population. It is worth seeing, in what situations segmentation can improve the model performance and the simultaneous approach can perform better than the two-step approaches. In order to show an example of such a situation, an artificial dataset was constructed. It is assumed that there is a random variable X and two simple logistic regression models based on this variable. In the first model, the parameter coefficient is equal to  while in the second model it is equal to –. It means that the relationship between X and a binary dependent variable Y is positive in the former and negative in the latter model. Values of Y are randomly generated using these two models. As a result, there are two groups of customers: G1 and G2. Their sizes do not have to be equal but should not differ much. In G1, the bad rate is higher than in G2. Subsequently, G1 is 20 split into G11 and G12 so that G12 is similar to G2 in terms of the bad rate. Ultimately, there are three groups of customers: G11 (the first model, high bad rate), G12 (the first model, low bad rate) and G2 (the second model, low bad rate). In order to distinguish them from one another, a new variable Z is created. For different groups, Z takes random values from different, non-overlapping intervals, e.g. (a, b) for G11, (b, c) for G12 and (c, d) for G2. It is determined for each customer separately. The artificial dataset contains three variables (X, Y and Z). There are training, validation and test samples having at least a few thousand customers each. The two-step approaches based on CART and CHAID as well as the LOTUS algorithm and logistic regression were applied to an artificial dataset that was constructed in the above-described way. The results (Gini coefficients and KS statistics) are presented in Tables 5 and 6. The single-scorecard model performance is relatively poor since both X and Z are weak variables on the entire sample. CART and CHAID produced the same segmentation: the sample is split on Z equal to b so that G11 is in one node and G12 and G2 are in another node. The high-bad-rate group was separated from the low-bad-rate ones (this is how the classification trees work). However, it was difficult to build a good scorecard for the node, which contains both G12 and G2, since the data were generated using the completely different models. As a result, the entire model performs only slightly better than the single-scorecard one. The LOTUS algorithm split the sample on Z equal to c so that G11 and G12 are in one node and G2 is in another node. The groups, whose data were generated using the different models, were separated from each other. This allowed for the development of good scorecards in both nodes. Therefore, the simultaneous approach outperforms the two-step approaches on the artificial dataset. This is an example of a situation in which segmentation improves the model performance and the simultaneous approach outperforms the two-step approaches. However, it seems rather unusual in banking practice that the same characteristic affects the score positively in one group and negatively in another. Provided that there 21 is such a characteristic in a real-world application, will it make a difference in a ten- or-more-characteristic scorecard? 7. Conclusions For none of the analysed real-world datasets, the multi-scorecard models perform considerably better than the logistic regression. Thus, the first and most important finding is that segmentation does not always improve model performance in credit scoring. The performance improvement is not necessary to occur even if it is going to be the only goal of segmentation, as in this research. This is in line with findings of Banasik et al. (1996) which were confirmed here also for the statistical methods of segmentation. Secondly, there is no difference in performance between the two-step and simultaneous approaches. Classification trees (CART and CHAID) followed by logistic regression in their leaves yield similar results to the LOTUS algorithm, in which both segmentation and scorecards are optimised at the same time. The LOTUS algorithm had seemed promising as a method for the optimal segmentation. However, it outperforms neither the two-step approaches nor the logistic regression. Thirdly, for a large sample including strong characteristics, all the models have the same separating ability and are equally stable. In this case, the two-step and simultaneous approaches as well as the logistic regression perform very similarly. For smaller samples and/or weaker characteristics, the logistic regression models are the most stable since they have less parameters and a simpler structure than the multi- scorecard models. Fourthly, segmentation contribution can be up to 20 percentage points. The discriminatory power measures of the trees, which are used for segmentation, can be even higher than those of the scorecards developed in their leaves. This means that segmentation itself can be a very powerful tool. However, it seems that such strong segmentation does not leave much space for the scorecards to further discriminate customers. Thus, the scorecards on average are weaker than the single-scorecard model. 22 Fifthly, it is possible to show an example of a situation in which segmentation improves the model performance and the simultaneous approach outperforms the two- step approaches on an artificial dataset. However, such a situation as in the example seems rather unusual in credit scoring practice. Building more than one scorecard requires more time and resources to be allocated to development, implementation, maintenance, monitoring and validation of the model. These additional costs should be compensated for by the improvement in performance, if it is the goal of segmentation. As this research shows, such improvement is not necessary to occur. If it does not occur, it makes sense to use a single-scorecard model. In banking practice it is common not to compare the developed multi-scorecard model with a single-scorecard one. Building the latter is usually considered a waste of time since there is a strong belief that segmentation allows for better risk assessment. However, maintaining several scorecards, which perform like a single one, seems to be a much greater waste of resources. In light of this research, it is strongly recommended to develop a single-scorecard model for comparison purposes. Although the model performance is very important, it is not the only criterion of the model choice. It is possible that a multi-scorecard model is similar to a single- scorecard one in terms of the performance but e.g. the ROC curve has a better shape for the former than for the latter. Then it makes sense to choose the multi-scorecard model since there are better cut-off levels. In this research, the focus is on segmentation, which is driven by the model fit factors, but it should not be forgotten that segmentation is sometimes driven by other factors. Then the model performance improvement is not the goal. Further analysis of segmentation in credit scoring could also include using other simultaneous approaches, e.g. Logistic Model Trees (Landwehr et al., 2005). 23 Training sample Test sample Validation sample Dataset A1 CART 0.527 0.374 0.359 CHAID 0.531 0.392 0.351 LOTUS 0.520 0.425 0.386 Logistic regression 0.499 0.404 0.397 Dataset A2 CART 0.663 0.623 0.618 CHAID 0.664 0.621 0.622 LOTUS 0.664 0.634 0.634 Logistic regression 0.657 0.640 0.635 Dataset B CART 0.807 0.813 0.808 CHAID 0.807 0.814 0.805 LOTUS 0.805 0.817 0.803 Logistic regression 0.801 0.818 0.807 Table 1. The Gini coefficient values for training, test and validation samples Training sample Test sample Validation sample Dataset A1 CART 0.389 0.296 0.267 CHAID 0.386 0.320 0.283 LOTUS 0.379 0.344 0.298 Logistic regression 0.362 0.317 0.316 Dataset A2 CART 0.516 0.479 0.477 CHAID 0.520 0.469 0.489 LOTUS 0.502 0.491 0.487 Logistic regression 0.497 0.505 0.485 Dataset B CART 0.705 0.704 0.701 CHAID 0.705 0.712 0.696 LOTUS 0.702 0.710 0.700 Logistic regression 0.692 0.708 0.698 Table 2. The KS statistic values for training, test and validation samples 24 Model (1) Scorecards (2) Difference (1) – (2) Tree Dataset A1 CART 0.527 0.442 0.086 0.328 CHAID 0.531 0.453 0.077 0.295 LOTUS 0.520 0.485 0.036 0.164 Dataset A2 CART 0.663 0.502 0.161 0.567 CHAID 0.664 0.499 0.165 0.563 LOTUS 0.664 0.554 0.110 0.397 Dataset B CART 0.807 0.671 0.136 0.634 CHAID 0.807 0.635 0.172 0.619 LOTUS 0.805 0.608 0.197 0.572 Table 3. The Gini coefficient values of models, scorecards and trees Model (1) Scorecards (2) Difference (1) – (2) Tree Dataset A1 CART 0.389 0.353 0.036 0.261 CHAID 0.386 0.355 0.031 0.234 LOTUS 0.379 0.370 0.009 0.164 Dataset A2 CART 0.516 0.395 0.121 0.443 CHAID 0.520 0.389 0.130 0.443 LOTUS 0.502 0.433 0.070 0.384 Dataset B CART 0.705 0.514 0.190 0.615 CHAID 0.705 0.496 0.209 0.595 LOTUS 0.702 0.546 0.156 0.517 Table 4. The KS statistic values of models, scorecards and trees 25 Training sample Test sample Validation sample Artificial Dataset CART/CHAID 0.528 0.519 0.517 LOTUS 0.636 0.635 0.633 Logistic regression 0.482 0.479 0.469 Table 5. The Gini coefficient values for training, test and validation samples (artificial dataset) Training sample Test sample Validation sample Artificial Dataset CART/CHAID 0.392 0.388 0.380 LOTUS 0.486 0.497 0.499 Logistic regression 0.335 0.344 0.330 Table 6. The KS statistic values for training, test and validation samples (artificial dataset) 26 References Anderson, R. (2007). The Credit Scoring Toolkit. New York: Oxford University Press. Armitage, P. (1955). Tests for Linear Trends in Proportions and Frequencies. Biometrics, 11(3), 375-386. Aurifeille, J.M. (2000). A bio-mimetic approach to marketing segmentation: Principles and comparative analysis. European Journal of Economic and Social Systems, 14(1), 93-108. Banasik, J.L., Crook, J.N. and Thomas, L.C. (1996). Does scoring a subpopulation make a difference. The International Review of Retail, Distribution and Consumer Research, 6(2), 180-195. Berk, R.A. (2008). Statistical Learning from a Regression Perspective. New York: Springer. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1998). Classification and regression trees. Boca Raton, Florida: Chapman & Hall/CRC. Chan, K.-Y. and Loh, W.-Y. (2004). LOTUS: An algorithm for building accurate and comprehensible logistic regression trees. Journal of Computational and Graphical Statistics, 13, 826-852. Chan, K.-Y. (2005). LOTUS User Manual (version 2.2). Available at: http://www.stat.wisc.edu/~kinyee/Lotus/manual.pdf. (Accessed: 15 January 2010). Desmet, P. (2001). Buying behavior study with basket analysis: pre-clustering with a Kohonen map. European Journal of Economic and Social Systems, 15(2), 17-30. Greene, W.H. (2000) Econometric Analysis. Upper Saddle River: Prentice Hall. Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. Cambridge, Massachusetts: MIT Press. Hand, D.J., Sohn, S.Y. and Kim, Y. (2005). Optimal bipartite scorecards. Expert Systems with Applications, 29(3), 684-690. Hawkins, D.M. and Kass, G.V. (1982). Automatic Interaction Detection. In: D.M. Hawkins (ed) (1982). Topics in Applied Multivariate Analysis. Cambridge: Cambridge University Press, 269-302. Hosmer, D.W. and Lemeshow, S. (2000). Applied Logistic Regression. New York: Wiley. http://www.stat.wisc.edu/~kinyee/Lotus/manual.pdf 27 Izenman, A.J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer. Kass, G.V. (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, 29(2), 119-127. Landwehr N., Hall M. and Frank E. (2005). Logistic Model Trees. Machine Learning, 59(1-2), 161-205. Larose, D.T. (2005). Discovering knowledge in data: an introduction to data mining. Hoboken, New Jersey: Wiley. Loh, W.-Y. (2006). Logistic Regression Tree Analysis. In: H. Pham (ed) (2006) Handbook of Engineering Statistics. London: Springer, 537-549. Makuch, W.M. (2001). The Basics of a Better Application Score. In: E. Mays (ed) (2001) Handbook of Credit Scoring. Chicago: Glenlake Publishing Company, 127-148. Mays, E. (2004). Credit Scoring for Risk Managers. The Handbook for Lenders. Mason, Ohio: Thomson South-Western. Ralph, C. (2006). Using Adaptive Random Trees (ART) for optimal scorecard segmentation. A Fair Isaac White Paper. Available at: http://www.computerworlduk.com/cmsdata/whitepapers/5126/UsingAdaptiveRan domTreesARTforoptimalscorecardsegmentationwp0406.pdf. (Accessed: 15 January 2010). Siddiqi, N. (2005). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. New York: Wiley. Thomas, L.C., Edelman, D.B. and Crook, J.N. (2002). Credit Scoring and Its Applications. Philadelphia: SIAM. Thomas, L.C. (2009). Consumer Credit Models: Pricing, Profit, and Portfolios. New York: Oxford University Press. Van Gestel, T. and Baesens, B. (2009). Credit Risk Management. Basic concepts: financial risk components, rating analysis, models, economic and regulatory capital. New York: Oxford University Press. VantageScore (2006). Segmentation for Credit Based Delinquency Models White Paper. Available at: http://www.vantagescore.com/docs/ segmentation.pdf. (Accessed: 22 January 2010). Wedel, M. and Kamakura, W.A. (2000). Market Segmentation: Conceptual and Methodological Foundations. New York: Springer. http://www.computerworlduk.com/cmsdata/whitepapers/5126/UsingAdaptiveRandomTreesARTforoptimalscorecardsegmentationwp0406.pdf http://www.computerworlduk.com/cmsdata/whitepapers/5126/UsingAdaptiveRandomTreesARTforoptimalscorecardsegmentationwp0406.pdf http://www.vantagescore.com/docs/segmentation.pdf http://www.vantagescore.com/docs/segmentation.pdf 28 Yobas, M.B., Crook, J.N. and Ross, P. (2004). Credit scoring using neural and evolutionary techniques. In: Thomas, L.C., Edelman, D.B. and Crook, J.N. (eds) (2004) Readings in Credit Scoring: Foundations, Developments, and Aims. New York: Oxford University Press, 277-293. 29 Appendix A. Customer’s characteristics Dataset A1 Dataset A2 Age Age a Marital Status Marital Status Residential Status Number of Children MOSAIC Classification Residential Status Time at Current Address Time at Current Address Time at Previous Address Home Phone Home Phone Time with Current Employer Occupation Gross Income Time with Current Employer FiNPiN Classification Time with Previous Employer Loan Type Net Income Loan Amount Pension Scheme Loan Purpose Time With Bank Insurance Number of Credit Cards Payment Frequency Amex / Diners Card Holder Number of Searches for Exact Name (Current Address) Loan Amount Time since Last CCJ for Exact Name (Current Address) Loan Term Number of Write-offs for Exact Name (Current Address) Loan Purpose Time since Last CCJ for Similar Name (Current Address) Total Cost of Goods Number of Write-offs for the Same Surname (Current Address) Insurance Number of Bad Events for the Same Surname (Current Address) Payment Frequency Number of Bad Events at the Postal Code (Current Address) Payment Method Number of Bad Events Which Have Turned Good at the Postal Code (Current Address) Number of Searches in the Last 6 Months Percentage of Bad Events Which Have Turned Good at the Postal Code (Current Address) Value of CAIS (Bad Debts, Same Surname, Other Initial, Current and Previous Address) Number of Dormant Events at the Postal Code (Current Address) Value of CAIS (Bad Debts, Same Surname, Same Initial, Current and Previous Address) Electoral Roll Status for the Same Surname (Current Address) Value of CCJ (Same Surname, Other Initial, Current and Previous Address) Time on Electoral Roll (Current Address) Value of CCJ (Same Surname, Same Initial, Current and Previous Address) Number of Searches for Exact Name (Previous Address) Time since Most Recent CAIS (Bad Debt, Same Surname, Other Initial, Current and Previous Address) Time since Last CCJ for Exact Name (Previous Address) 30 Dataset A1 Dataset A2 Time since Most Recent CAIS (Bad Debt, Same Surname, Same Initial, Current and Previous Address) Number of Write-offs for Exact Name (Previous Address) Time since Most Recent CCJ (Same Surname, Other Initial, Current and Previous Address) Time since Last CCJ for Similar Name (Previous Address) Time since Most Recent CCJ (Same Surname, Same Initial, Current and Previous Address) Number of Write-offs for the Same Surname (Previous Address) Number of CAIS (Bad Debts, Same Surname, Other Initial, Current and Previous Address) Number of Bad Events for the Same Surname (Previous Address) Number of CAIS (Bad Debts, Same Surname, Same Initial, Current and Previous Address) Number of Bad Events at the Postal Code (Previous Address) Number of CCJ (Same Surname, Other Initial, Current and Previous Address) Number of Bad Events Which Turned Good at the Postal Code (Previous Address) Number of CCJ (Same Surname, Same Initial, Current and Previous Address) Percentage of Bad Events Which Have Turned Good at the Postal Code (Previous Address) Number of Dormant Events at the Postal Code (Previous Address) Electoral Roll Status for the Same Surname (Previous Address) Time on Electoral Roll (Previous Address) a The characteristics, which were used in the reference logistic regression models, are marked with a bold font. Table 7. Customer’s characteristics 31 Appendix B. Tree structures Time with Bank Insurance Bads: 152 (9.4%) All: 1612 (41.0%) Time at Current Add. Bads: 146 (16.1%) All: 908 (23.1%) Bads: 185 (23.5%) All: 786 (20.0%) Bads: 216 (34.5%) All: 626 (15.9%) < 3 years >= 3 years Yes No < 5.2 years >= 5.2 years Figure 1. The CART tree for the dataset A1 Time with Bank Bads: 216 (34.5%) All: 626 (15.9%) Bads: 168 (21.6%) All: 777 (19.8%) Bads: 193 (14.7%) All: 1314 (33.4%) Bads: 122 (10.0%) All: 1215 (30.9%) < 3 years 3-6 years 6-12 years >= 12 years Figure 2. The CHAID tree for the dataset A1 Time with Bank Bads: 577 (21.2%) All: 2717 (69.1%) Bads: 122 (10.0%) All: 1215 (30.9%) < 12 years >= 12 years Figure 3. The LOTUS tree for the dataset A1 32 Loan Amount Loan Amount Bads: 104 (1.1%) All: 9904 (49.7%) Loan Purpose Bads: 202 (3.6%) All: 5625 (28.2%) Bads: 110 (7.8%) All: 1402 (7.0%) Loan Purpose Loan Amount Bads: 145 (9.2%) All: 1577 (7.9%) Bads: 116 (16.2%) All: 714 (3.6%) Bads: 185 (26.1%) All: 708 (3.6%) < £ 3400 >= £ 3400 < £ 5200 >= £ 5200 < £ 1400 >= £ 1400 RF RF other other Figure 4. The CART tree for the dataset A2 Loan Amount Bads: 104 (1.1%) All: 9904 (49.7%) Bads: 154 (3.4%) All: 4512 (22.6%) Bads: 158 (6.3%) All: 2515 (12.6%) Loan Purpose Bads: 120 (8.9%) All: 1350 (6.8%) Bads: 118 (18.1%) All: 652 (3.3%) Bads: 208 (20.9%) All: 997 (5.0%) < £ 1400 £ 1400 - £ 2500 £ 2500 - £ 3400 £ 3400 - £ 5200 >= £ 5200 other RF or HI Figure 5. The CHAID tree for the dataset A2 Loan Amount Bads: 416 (2.5%) All: 16931 (85.0%) Loan Purpose Bads: 149 (9.9%) All: 1502 (7.5%) Bads: 297 (19.8%) All: 1497 (7.5%) < £ 3400 >= £ 3400 other RF, MP or HI Figure 6. The LOTUS tree for the dataset A2 33 Var1 Var2 Bads: 1085 (1.3%) All: 84040 (90.1%) Bads: 500 (11.7%) All: 4279 (4.6%) Bads: 1929(38.8%) All: 4967 (5.3%) Figure 7. The CART tree for the dataset B Var3 Bads*: 16 (0.3%) All: 6156 (6.6%) Bads: 1090 (1.4%) All: 76595 (82.1%) Bads: 525 (9.4%) All: 5614 (6.0%) Bads: 1883(38.3%) All: 4921 (5.3%) Figure 8. The CHAID tree for the dataset B Number of Loans Var2 Bads: 969 (1.3%) All: 72181 (77.4%) Bads: 1423(28.0%) All: 5077 (5.4%) Bads: 1122 (7.0%) All: 16028 (17.2%) < 3 >= 3 Figure 9. The LOTUS tree for the dataset B