Self-Adaptive Attribute Weighting for Naive Bayes Classification

Jia Wu (a,b), Shirui Pan (b), Xingquan Zhu (c), Zhihua Cai (a), Peng Zhang (b), Chengqi Zhang (b)

(a) School of Computer Science, China University of Geosciences, Wuhan 430074, China. (b) Quantum Computation & Intelligent Systems (QCIS) Centre, Faculty of Engineering & Information Technology, University of Technology Sydney, NSW 2007, Australia. (c) Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA.

Abstract

Naive Bayes (NB) is a popular machine learning tool for classification, due to its simplicity, high computational efficiency, and good classification accuracy, especially for high dimensional data such as texts. In reality, the pronounced advantage of NB is often challenged by the strong conditional independence assumption between attributes, which may deteriorate its classification performance. Accordingly, numerous efforts have been made to improve NB, using approaches such as structure extension, attribute selection, attribute weighting, instance weighting, and local learning. In this paper, we propose a new Artificial Immune System (AIS) based self-adaptive attribute weighting method for Naive Bayes classification. The proposed method, namely AISWNB, uses immunity theory in artificial immune systems to search for optimal attribute weight values, where the self-adjusted weight values alleviate the conditional independence assumption and help calculate the conditional probabilities in an accurate way. One noticeable advantage of AISWNB is that its unique immune system based evolutionary computation process, including initialization, clone, selection, and mutation, ensures that AISWNB can adjust itself to the data without explicit specification of the functional or distributional form of the underlying model. As a result, AISWNB can obtain good attribute weight values during the learning process. Experiments and comparisons on 36 machine learning benchmark data sets and six image classification data sets demonstrate that AISWNB significantly outperforms its peers in classification accuracy, class probability estimation, and class ranking performance.

Keywords: Naive Bayes, Self-Adaptive, Attribute Weighting, Artificial Immune Systems, Evolutionary Computing

1. Introduction

Naive Bayes (NB) (Friedman et al., 1997), a special Bayesian network, is a Bayes' theorem oriented learning model particularly useful for learning tasks involving high dimensional data (Hernández-González et al., 2013), such as text classification (Kim et al., 2006; Chen et al., 2009) and web mining (Zhang et al., 2009). In general Bayesian models, the classification is derived by using the dependency (or conditional dependency) between random variables. This process is typically time consuming because examining the relationships among all random variables is a combinatorial optimization task. Alternatively, NB relaxes the restriction on the dependency structures between attributes by simply assuming that attributes are conditionally independent, given the class label. As a result, examining relationships between attributes is no longer needed and the derivation of an NB model scales linearly with the training data. In reality, attributes in many learning tasks are correlated with each other, so NB's conditional independence assumption may impair its classification performance (Webb et al., 2012).
Email addresses: jia.wu@student.uts.edu.au (Jia Wu), shirui.pan@student.uts.edu.au (Shirui Pan), xzhu3@fau.edu (Xingquan Zhu), zhcai@cug.edu.cn (Zhihua Cai), peng.zhang@uts.edu.au (Peng Zhang), chengqi.zhang@uts.edu.au (Chengqi Zhang)

In order to relax the conditional independence assumption and simultaneously retain NB's efficiency, many approaches have been proposed, using solutions in five main categories: (1) structure extension (Liu et al., 2011; Jiang et al., 2012a); (2) attribute weighting (Zaidi et al., 2013; Wu et al., 2013a); (3) attribute selection; (4) instance weighting; and (5) instance selection. In this paper, we propose to use attribute weighting to mitigate NB's primary weakness (the conditional independence assumption of the attributes) by assigning a weight value to each individual attribute. Because weight values force attributes to play different roles in classification, the corresponding Weighted Naive Bayes (WNB) helps relax the conditional independence assumption and makes NB effective for data (Wu et al., 2007) with strong attribute correlations.

Indeed, the objective of assigning different weight values to attributes shares a striking similarity with feature selection, where the latter intends to discover a subset of features (with equal importance) to train a classification model. Assuming the weight values of all attributes are suitably determined, feature selection can be achieved by using a subset of features ranked in a preference order. Therefore, feature weighting can be considered a generalization of feature selection, and many methods have used feature selection to help improve NB classification. For example, Langley & Sage (1994) proposed the Selective Bayes Classifier (SBC), which uses feature selection to accommodate redundant attributes in the prediction process and to augment Naive Bayes with the ability to exclude attributes that introduce dependencies. Meanwhile, in order to discover proper weight values for weighted NB classification, researchers have proposed many useful methods to evaluate the importance of attributes, including gain ratio (Zhang & Sheng, 2004), the correlation-based algorithm (Hall, 2000), mutual information (Jiang et al., 2009, 2012b), and the ReliefF attribute ranking algorithm (Robnik-Šikonja & Kononenko, 2003). Zhang & Sheng (2004) investigated the gain ratio based weighting scheme and several wrapper based methods for finding attribute weights in order to improve the Area Under Curve (AUC), which is a common metric used to compare algorithms with respect to different parameter settings. Hall (2007) proposed a new attribute weighting method to improve the AUC value, where the weights assigned to the attributes are inversely proportional to the minimum depth at which the attributes are first tested in an unpruned decision tree.

The above methods for weighted Naive Bayes classification have achieved good performance on domain specific problems through the employment of some external criterion, such as gain ratio, to determine suitable weight values for attributes. By doing so, the assessment of the attributes and the derivation of the NB models are separated into two steps, with the attribute weights being determined without taking the NB objective function into consideration.
In order to address this problem and seamlessly integrate attribute weighting and NB learning into a single process, in this paper we first carry out a systematic experimental analysis of existing algorithms that improve naive Bayes via attribute weighting (WNB), and then propose a new method to automatically calculate optimal attribute weight values for WNB by directly working on WNB's objective function. To this end, we employ an evolutionary computation based method, namely the Artificial Immune System (AIS) (Er et al., 2012; Cuevas et al., 2012; Haktanirlar Ulutas & Kulturel-Konak, 2012), to assign proper weight values for NB classification. In our previous study (Wu et al., 2013b), we successfully proposed an AIS based method to automatically and self-adaptively select optimal terms and values for probability estimation. By employing evolutionary computation for attribute weighting, our method in this paper further advances weighted Naive Bayes to ensure that attribute weighting can automatically adapt to different learning tasks.

In order to enable adaptive attribute weighting for NB classification, we propose to use an AIS mechanism to adaptively determine attribute weights, where an automated search strategy is used to find optimal attribute weight values for each data set. The unique immune system computation processes, including initialization, clone, mutation, and selection, ensure that our method can adjust itself to the data without any explicit specification of the functional or distributional form of the underlying model. Experiments and comparisons on 36 UCI machine learning benchmark data sets (Bache & Lichman, 2013) and six image classification data sets (Li & Wang, 2008) demonstrate that the proposed artificial immune system based weighting scheme for Naive Bayes classification (AISWNB) can successfully find optimal weight combinations for different learning tasks, and that it consistently outperforms other state-of-the-art NB algorithms. The corresponding superiority is demonstrated through three major performance metrics, including classification accuracy, class probability estimation, and class ranking performance (Zhang & Su, 2004; Jiang et al., 2009, 2012a).

AISWNB is a self-learning algorithm that utilizes immunological properties such as the memory property and clonal selection. In contrast to the conventional statistical probabilistic evaluation in NB, the niche and advantages of AISWNB can be understood from the following four aspects:

• AISWNB is a data-driven self-adaptive method because it does not require explicit specification of the functional or distributional form of the underlying model.

• AISWNB is a nonlinear model and is flexible in modeling complex real-world relationships.

• AISWNB inherits the memory property of human immune systems and can recognize the same or similar antigen quickly at different times.

• AISWNB can self-adaptively select suitable affinity functions to meet different types of learning tasks.

The remainder of the paper is organized as follows. Section 2 reviews related work. Preliminary concepts and problem statements are addressed in Section 3. Section 4 introduces our new AISWNB framework, followed by the experiments in Section 5. We conclude the paper in Section 6.

2. Related Work
By proposing an artificial immune system based method to search for optimal weight values for weighted naive Bayes classification, our method is related to attribute weighting in machine learning and to AIS based evolutionary computation.

2.1. Attribute Weighted Methods

In real-world learning tasks, attributes often play different roles for classification. Therefore, assigning different weight values to attributes can potentially help improve the classification performance. During the whole process, the way of learning the attribute weights plays an essential role. In this subsection, we review existing work on attribute weighting by separating it into two categories: methods which consider each single attribute's correlation with the class, and methods which consider multiple attributes' joint correlations with the class.

2.1.1. Attribute Weighting via Single Attribute Correlation

Mutual Information (MI) between two random variables provides a quantitative measure of the mutual dependence of the two variables. A high MI value indicates a large reduction in uncertainty and a low MI value reflects a small reduction, while a zero MI value between two random variables means that the variables are independent. Friedman et al. (1997) provide a complete definition of mutual information between a pair of variables. Mutual information has a long history of being used for measuring the correlation between attributes and the class variable for classification. For instance, Jiang et al. (2012b) applied mutual information to help improve the accuracy of AODE (Averaged One-Dependence Estimators). Jiang et al. (2009) proposed a Hidden Naive Bayes (HNB) classifier, which uses an MI based attribute weighting method to weight one-dependence estimators. Han et al. (2001) proposed a new algorithm called AWKNN (Attribute Weighted K-Nearest-Neighbor). In our experiments, we will apply mutual information to calculate the weight value between each attribute and the class attribute for WNB, and will use this approach, MIWNB, as a baseline for comparisons.

Information Gain (IG), originally used by Quinlan (1993) in the decision tree learning algorithm, is a commonly used measure to evaluate the correlation of an attribute to the class. A notable drawback of IG is that the resulting score is biased toward attributes with a large number of distinct values. Accordingly, the information gain ratio (Quinlan, 1993) was proposed to address this drawback by dividing each attribute's IG score by the information encoded in the attribute itself. Zhang & Sheng (2004) argued that an attribute with a higher gain ratio value deserves a larger weight in WNB. In their studies, they proposed a gain ratio weighted method that calculates the weight of an attribute from a data set.

2.1.2. Attribute Weighting via Multiple Attribute Correlation

Correlation-based Feature Selection (CFS) for attribute weighting uses a correlation-based heuristic evaluation function as an attribute quality measure (Hall, 2000) to calculate the weight value of each attribute. It uses a best-first search to traverse the feature space. CFS starts with an empty set and generates all possible single feature expansions. The subset with the highest evaluation is selected and expanded in the same manner by adding new features. If expanding a subset results in no improvement, the search drops back to the next best unexpanded subset and continues from there. The best subset found is returned after the search terminates.
The core of CFS is the heuristic process that evaluates the worth, or "merit", of a feature subset. Hall (2007) employed this method to evaluate the importance of attributes according to the heuristic "merit" value.

Relief is a feature selection method based on attribute estimation (Kira & Rendell, 1992). Relief assigns a grade of relevance to each feature by examining the change of the feature values with respect to instances within the same class (i.e., the nearest hit) and instances between classes (i.e., the nearest miss). If a feature's values remain relatively stable for instances within the same class, the feature will receive a higher weight value. The original Relief only handles binary classification problems. Its extension, Relief-F, can be applied to multi-class classification (Kononenko, 1994). In addition, Tucker et al. (2010) applied the Relief-F attribute weighting approach to a top-down product engineering optimization problem.

Attribute Correlation-based Weighting is a method which explicitly considers the correlation of each attribute to all other attributes to calculate the attribute's weight value (Hall, 2007). A large weight value is assigned to attributes with strong dependencies on other attributes. In order to estimate each attribute's dependence, an unpruned decision tree is constructed from the training instances, and the minimum depth at which an attribute is first tested in the tree indicates the strength of its dependence. The weight assigned to each attribute is inversely proportional to the minimum depth at which it is first tested in the unpruned decision tree. Attributes that do not appear in the tree receive zero weight values.

Figure 1: A conceptual view of immune response in immune systems: A B-cell contains the antibody (the middle rings on the left) that allows it to recognize the antigen (triangle), which denotes pathogenic material invading the system. The binding between B-cell and antigen can be evaluated by a certain affinity (i.e., degree of binding). In a learning system, this resembles the assessment of how well a solution (i.e., antibody) recognizes/resolves the training data (i.e., antigen). After the recognition, the system will respond, resulting in the proliferation, differentiation, and maturation of the B-cell into secondary antibodies. Secondary antibodies with high affinity become memory cells, and the others become plasma cells. Memory cells are retained in the system to allow a faster response to the same (or similar) attacks in the future (if the body is re-infected by the same pathogenic materials).

2.2. Artificial Immune Systems

2.2.1. Human Immune System

The human immune system contains two major parts: (1) humoral immunity, which deals with infectious agents in the blood and body tissues, and (2) cell-mediated immunity, which deals with body cells that have been infected. In general, the humoral system is managed by B-cells (with help from T-cells), and the cell-mediated system is managed by T-cells. Each cell (B or T) has a unique type of molecular receptor (a location in shape space), which allows for the binding of antigens (shown as triangles in Fig. 1). A higher affinity between the receptor and antigens indicates a stronger binding.
In immunology, the immune system contains two types of lymphocyte cells (B- and T-cells), each of which has a unique type of molecular receptor that allows specific molecules to bind to it. When pathogens (i.e., biological agents that may cause diseases or illness) invade the body, antibodies produced by B-cells are responsible for the detection/binding of a foreign protein or antigen (i.e., pathogenic material). Once the binding between B-cells and antigens is established, B-cells undergo a series of processes including proliferation, differentiation, and maturation, eventually resulting in memory cells. The memory cells are retained in the system to allow a faster response to the same (or similar) attacks in the future (if the body is re-infected by the same pathogenic materials). This response process can be explained by clonal selection theory, and a conceptual view is shown in Fig. 1. In this paper, only the humoral immunity of the natural immune system is considered, and the action of T-cells is not discussed. The clonal selection carried out by the B-cells of the human immune system is the fundamental mechanism on which Artificial Immune Systems (AIS) are modeled.

2.2.2. AIS: Artificial Immune Systems

Artificial Immune Systems denote a class of evolutionary computation methods which intend to exploit and simulate the functions and behaviors of the mammalian immune system's learning and memorization capability to solve a learning task. The theme of an AIS is to resemble a biological immune system's ability to distinguish foreign molecules (or elements) which can attack/damage the body, and to provide a learning system with the capability of distinguishing between self and non-self. This capability eventually leads to the assessment of the fitness scores of candidates with respect to the underlying system.

More specifically, when dealing with learning algorithms, AIS consists of three major components: representation, recognition, and clonal selection. The representation, known as the shape-space problem, focuses on how to model antibodies and antigens. When the immune system is attacked by an antigen, antibodies try to neutralize the infection by binding to the antigen through the recognition process. The binding strength, also regarded as affinity, is used as a threshold for the immune system to respond to the antigen. Clonal selection corresponds to an affinity maturation process, which means that immune individuals with high affinity will gradually increase in number during the clone and mutation process. At the same time, some immune individuals will develop into memory individuals.

Similar to AIS, evolutionary algorithms (EAs), such as Genetic Algorithms (GA) (Park & Ryu, 2010), Evolution Strategies (ES) (Huang et al., 2011) and Differential Evolution (DE) (Storn & Price, 1997), are all designed based on the basic idea of biological evolution to control and optimize artificial systems. Evolutionary computation shares many concepts with AIS, such as a population, genotype-phenotype mapping, and proliferation of the fittest. On the other hand, AIS models based on immune networks resemble the structures and interactions of neural network models. The key advantages of AIS over neural networks are the benefits of a population of solutions and the evolutionary selection pressure and mutation. Meanwhile, the underlying mechanisms are fundamentally different in many aspects.
First and foremost, the immune system is highly distributed, highly adaptive, and self-organising; it maintains a memory of past encounters and has the ability to continuously learn about new encounters. AIS is the system developed around the current understanding of the immune system. Second, AIS is a general framework for a distributed adaptive system and could, in principle, be applied to many domains. Compared to most other evolutionary algorithms, AIS is much simpler and more straightforward to implement, which is important for practitioners from other fields. In addition, because AIS is self-organizing, it requires far fewer system parameters than other evolutionary computation methods. Some works have also pointed out the similarities and the differences between AIS and other heuristics (Zheng et al., 2010; Aickelin et al., 2013; Castro & Timmis, 2002).

In recent years, there has been considerable interest in exploring and exploiting the potential of AIS for applications in computer science and engineering, including pattern recognition (Yuan et al., 2012), clustering (de Mello Honorio et al., 2012), optimization (Woldemariam & Yen, 2010), and remote sensing (Zhong & Zhang, 2012). However, the advantage of AIS for Bayesian classification has received very little attention. In this paper, we propose a new AIS based attribute weighting method for Naive Bayes classification. The performance of this design is validated through numerous performance metrics, including classification accuracy, class probability estimation, and class ranking performance. It is worth noting that some works exist to improve AIS for domain specific problems, such as an improved artificial immune system for seeking the Pareto front of the land-use allocation problem in large areas (Huang et al., 2013). However, in this paper, we do not consider improved AIS variants for WNB. This is mainly because we aim to propose a self-adaptive attribute weighting framework based on the immune system for WNB, and our designs can be easily generalized to any AIS based algorithm.

3. Preliminaries and Problem Definition

Given a training set D = {x_1, ..., x_N} with N instances, each of which contains n attribute values and a class label, we use x_i = {x_{i,1}, ..., x_{i,j}, ..., x_{i,n}, y_i} to denote the ith instance x_i in the data set D. x_{i,j} denotes the jth attribute value of x_i and y_i denotes the class label of x_i. The class space Y = {c_1, ..., c_k, ..., c_L} denotes the set of labels that each instance belongs to, and c_k denotes the kth label of the class space. For ease of understanding, we use (x_i, y_i) as a shorthand to represent an instance and its class label, and simply write x_i for the instance. We also use a_j as a shorthand to represent the jth attribute.

For an instance (x_i, y_i) in the training set D, its class label satisfies y_i ∈ Y, whereas a test instance x_t only contains attribute values and its class label y_t needs to be predicted by a weighted naive Bayes classification model, which can be formally defined as

c(x_t) = \arg\max_{c_k \in Y} P(c_k) \prod_{j=1}^{n} P(x_{t,j} | c_k)^{w_j}    (1)

In Eq. (1), P(c_k) represents the prior probability of class c_k in the whole training set, P(x_{t,j} | c_k) denotes the conditional probability of attribute value x_{t,j} given the class c_k, and w_j denotes the weight value of the jth attribute.

In this paper, we focus on the calculation of the conditional probability P(x_{i,j} | c_k)^{w_j} by finding optimal attribute weight values w_j, j = 1, ..., n.
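For illustration, the prediction rule in Eq. (1) can be written in a few lines of code. The sketch below is not taken from the paper; it assumes categorical attributes whose prior and conditional probabilities have already been estimated (e.g., with Laplace smoothing), and it works in log space so that the weights w_j appear as multipliers of log-probabilities rather than as exponents.

```python
import numpy as np

def wnb_predict(x, priors, cond_prob, weights):
    """Weighted naive Bayes prediction following Eq. (1).

    x:         attribute values of the test instance x_t
    priors:    dict mapping class c_k -> P(c_k)
    cond_prob: dict mapping (c_k, j, x_j) -> P(x_j | c_k)
    weights:   per-attribute weights w_j in (0, 1]
    """
    best_class, best_score = None, -np.inf
    for c, p_c in priors.items():
        # log P(c_k) + sum_j w_j * log P(x_{t,j} | c_k)
        score = np.log(p_c)
        for j, v in enumerate(x):
            score += weights[j] * np.log(cond_prob[(c, j, v)])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Setting every weight to 1 recovers standard NB, while smaller weights shrink the influence of the corresponding attributes.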
While all existing attribute weighting approaches define the weights without considering the uniqueness of the underlying training data, we intend to resolve the optimal w value selection problem as an optimization process. Assuming that the calculation of each conditional probability value P(x_{i,j} | c_k)^{w_j} has an optimal w_j value, there are n such w_j values needed for NB classification. As a result, WNB classification can be transformed into an optimization problem as follows:

w^* = \arg\max_{w_j \in w} f(x_t, w),  s.t.  0 ≤ w_j ≤ 1    (2)

where w = {w_1, ..., w_j, ..., w_n} denotes the attribute weight vector for WNB, and f(x_t, w) is calculated by Eq. (1).

Figure 2: A conceptual view of the self-adaptive weighting strategy for AISWNB: An initial population contains many antibodies (i.e., weight vectors w) that allow themselves to recognize antigens (i.e., training instances Da) with a certain affinity (i.e., classification performance on Db). After recognition, the system will respond and select the weight vector w_c^t (in the tth iteration) with the best affinity (a), and then clone it (b) to replace the weight vectors with low affinity (c). After that, the mutation strategy is adopted to maintain the diversity of the weight vectors (d). The mutated population will further replace the old population (e) to reselect the best weight vector as the memory antibody (a). Through the evolutionary process, the global optimal weight vector w_c will be obtained (f).

4. Self-Adaptive Attribute Weighted Naive Bayes

4.1. AIS Symbol Definitions and Overall Framework

4.1.1. AIS Symbol Definitions

In this paper, we propose to use AIS to learn optimal attribute weight values for NB classification. In our solution, antigens in AISWNB are simulated as training instances which are presented to the system during the training process. Antibodies represent attribute weight vectors w with different sets of values (i.e., candidates). The binding of the antibodies and antigens resembles the fitness of a specific weight vector with respect to the given training data. This can be evaluated by using the affinity score.

During the learning process, the antibodies with good affinity will experience a form of clonal expansion after being presented with the training data sets (analogous to antigens). When antibodies are cloned they will undergo a mutation process, in which a specific mutation function will be designed (and deployed). The evolving optimization process of the AIS system will help discover the optimal w vector with the best classification performance.

Before introducing the algorithm details, we briefly define the following key notations, which will help in understanding the learning of the weight values using the AIS principle. In Table 1, we also summarize the mapping of the symbols between immune systems and the AIS based weighting scheme for Naive Bayes.

• Antibodies: W represents the set of antibodies, W = {w_1, ..., w_L}, where L represents the number of antibodies. w_i = {w_{i,1}, ..., w_{i,j}, ..., w_{i,n}} represents a single antibody (i.e., an attribute weight vector), so w_{i,j} represents the jth value of the ith antibody w_i.
• Antigens: Da represents the set of antigens, Da = {x_1^a, ..., x_{Na}^a}, where Na represents the number of antigens. x_i^a represents a single antigen. In AISWNB, x_i^a denotes an instance in the data set Da.

• Affinity: A measure of closeness between antibodies and antigens. In the current implementation, this value is calculated as the accuracy (ACC), the area under the ROC curve (AUC), or the conditional log likelihood (CLL) on a given data set, when dealing with classification accuracy, ranking, and probability estimation learning tasks, respectively.

• Memory Cell: w_c represents the memory cell for the antibody which has the best affinity (i.e., the best classification performance on the test data set Db = {x_1^b, ..., x_{Nb}^b}).

• Clone rate: An integer value used to determine the number of mutated clones for a given antibody (weight vector). Specifically, a selected antibody is allowed to produce up to clone-rate mutated clones after responding to a given antigen set.

• Mutation rate: A parameter between 0 and 1 that indicates the probability of an antibody being mutated. For a given antibody, its mutation rate is equal to 1 minus its affinity. By doing so, an antibody with high affinity will have a low probability of being mutated.

Table 1: Symbol mapping between Immune System and AISWNB.
Immune systems -- AISWNB
Antibody -- Attribute weight vector w
Antigens -- Training instances in Da
Shape-space -- Possible values of the data vectors
Affinity -- The fitness of the weight vector w on the testing data sets
Clonal Expansion -- Reproduction of weight vectors that are well matched with antigens
Affinity Maturation -- Specific mutation of the w vector and removal of the lowest stimulated weight vectors
Immune Memory -- Memory set of mutated weight vectors

4.1.2. AISWNB Overall Framework

A conceptual view of the proposed self-adaptive weighting strategy for AISWNB is shown in Fig. 2. In our settings, we use antibodies to simulate the weight vectors w of the naive Bayes models, so an initial population of random antibodies, which corresponds to a set of random weight value vectors W, is selected. The antibodies will recognize the antigens (which correspond to the training instances Da) with a certain affinity (i.e., classification performance on Db). This recognition process resembles the assessment of how well the weight value solutions fit the underlying training data.

After the recognition process, the system will respond and select the weight vector w_c^t with a good affinity, which corresponds to step (a) in Fig. 2, and then clone some highly promising weight vectors to replace the weight vectors with low affinity values. After that, the mutation strategy is carried out to maintain the diversity of the weight vectors, as shown in step (d) of Fig. 2. The mutated population will further replace the old population to reselect the best weight vector as the memory antibody. Through the repetitive evolutionary process, the final optimal weight vector w_c will be obtained, as shown in step (f) of Fig. 2.

The performance of a classifier is often measured by classification accuracy (ACC), so one can use ACC to calculate the affinity in the above process. In reality, many data mining applications also require the calculation of the class distributions and the ranking of the classes, in addition to the classification accuracy (Zhang & Su, 2004; Jiang et al., 2009; Wu & Cai, 2014).
In recent years, the area under the ROC curve (AUC) has been widely used by the machine learning and data mining communities, and researchers believe (Ling et al., 2003) that AUC is a more discriminating evaluation measure than error rate for learning algorithms that also produce class probability estimates. For class probability estimation, the conditional log likelihood (CLL) has also been used to evaluate the quality of the class probabilities produced by a classifier (Grossman & Domingos, 2004). Meanwhile, some existing works (Zhang & Sheng, 2004; Hall, 2007) have proposed to use attribute weighting to improve the AUC performance of NB. However, attribute weighting for Naive Bayes targeting accuracy performance or class probability estimation has received very little attention. According to the experimental analysis in Section 5.2.1, most existing attribute weighting approaches cannot work well on all three of the above mentioned learning tasks. This is mainly because the attribute subset evaluation used in traditional attribute weighted Naive Bayes (e.g., SBC and CFS) intends to maximize the classification accuracy, which may lead to a mismatch between the learning process and the other learning goals (Jiang et al., 2012a). In order to address this challenge, the affinity function in AISWNB can be dynamically adjusted to match the learning process and the learning goal. The details related to the affinity function are addressed in Section 4.2.2.

4.2. AISWNB: AIS based Attribute Weighted Naive Bayes

The proposed AISWNB is achieved through the following two major steps: (1) the AIS algorithm is used to train models from the training instances, with the purpose of obtaining optimal attribute weight values; and (2) the test instances are classified by the AISWNB classifiers with the learned attribute weight values. Algorithm 1 reports the details of the proposed AISWNB framework, which is described as follows.

4.2.1. Initialization

During the initialization process, we generate a set of L weight candidates: W = {w_1, ..., w_L}, where each individual w_i = {w_{i,1}, ..., w_{i,j}, ..., w_{i,n}} represents an antibody (i.e., a weight value vector with w_{i,j} representing the weight value for the jth attribute). To generate random weight values for all candidates, we set each w_{i,j} value as a uniformly distributed random variable within the range (0, 1]. In our experiments (detailed in Section 5), we use 80% of the instances in a given data set D as the antigen set Da to learn the optimal weight values w_c, and the remaining instances are used as the test set Db.

4.2.2. AISWNB Evaluation

The AISWNB evaluation process intends to resemble the recognition and evolution process of the immune system to find good antibodies (i.e., weight vectors), as shown in Fig. 2. In a weighted NB learning context, the above process corresponds to finding and selecting good weight vectors, and then applying clone and mutation actions to the selected weight vectors to generate new candidates. Some newly generated good candidates will further be retained to train weighted naive Bayes networks.
In the following, we briefly explain the actions in each individual step:

• Calculation of the affinity function: For learning tasks concerned with maximizing classification accuracy (ACC), the affinity of the ith individual of the tth generation, w_i^t, can be obtained by applying the current attribute weight vector w_i^t to the WNB model and then evaluating its affinity function as follows:

f[w_i^t] = (1 / N_b) \sum_{i=1}^{N_b} \delta[c(x_i^b), y_i^b]    (3)

In Eq. (3), c(x_i^b) is the classification result of the ith instance in a test data set Db with N_b instances, obtained by an AISWNB classifier with attribute weight values w_i^t, and y_i^b is the actual class label of the ith instance. δ[c(x_i^b), y_i^b] is one if c(x_i^b) = y_i^b and zero otherwise. For learning tasks with other learning goals, such as the ranking of the classes or the estimation of class probability distributions, the affinity function can be changed by using the corresponding evaluation criterion, AUC or CLL, so that the attribute weight values can be learned in line with the underlying learning goals. The corresponding details are addressed in Section 5.1.3.

• Antibody Selection: We sort the individuals in the initial antibody population according to the affinity of each individual, and choose the individual w_c^t with the best affinity performance in the tth generation as the memory antibody.

• Antibody Clone: To ensure that the population size of every generation is fixed, the best individual w_c^t will be cloned under the clone factor c. After that, we use the clone set to replace the individuals with low affinity according to the same rate c.

• Antibody Mutation: The mutation operation is applied to the individuals of the tth generation W^t, producing an intermediate generation composed of new variation individuals derived from the parent generation. For any individual w_i^t from the tth generation, the new variation individual v_i^{t+1} can be generated as follows:

v_i^{t+1} = w_i^t + F * N(0, 1) * (w_c^t - w_i^t)    (4)

Here, N(0, 1) is a normally distributed random variable with zero mean and unit variance. F, the variation factor during the process of evolution, can be adaptively obtained according to the different clones (Zhong & Zhang, 2012):

F = 1 - f[w_i^t]    (5)

where f[w_i^t] denotes the affinity of the ith individual of the tth generation.

Algorithm 1 AISWNB (Weighted Naive Bayes by AIS)
Input: Clone factor c; threshold T; maximum generation MaxGen; antibody population W; antigen population Da; test affinity set Db.
Output: The target class label c(x_t) of the test instance x_t.
1: W ← the w_{i,j} value of each individual w_i is initialized using a random number distributed in (0, 1].
2: while t ≤ MaxGen and f[w_c^{t+1}] − f[w_c^t] ≤ T do
3:   f[w_i^t] ← apply the antigen population Da and the test affinity set Db w.r.t. antibody w_i^t, and calculate the affinity of w_i^t.
4:   w_c^t ← rank the whole antibody population W^t by f[w_i^t] and find the w_c^t with the best affinity.
5:   (W_r)^t ← select the temporary antibody set with the lowest affinity according to clone factor c.
6:   (W_c)^t ← clone w_c^t with clone factor c and obtain the clone antibody set.
7:   W^t ← [W^t − (W_r)^t] ∪ (W_c)^t
8:   for each w_i^t in W^t do
9:     v_i^{t+1} ← apply w_c^t and a normally distributed random variable N(0, 1) to w_i^t and obtain the mutated individual.
10:    w_i^{t+1} ← apply v_i^{t+1} to w_i^t and obtain the new individual of the (t+1)th generation.
11:   end for
12: end while
13: c(x_t) ← apply w_c to the instance x_t to predict the underlying class label.
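To make the evaluation, selection, clone, and mutation steps concrete, the following is a minimal sketch of one generation in the spirit of Algorithm 1, not a reproduction of the authors' implementation. The function names (evolve_one_generation, affinity) and the use of per-dimension Gaussian noise in the mutation are illustrative assumptions, and the greedy replacement rule described in Section 4.2.3 is folded in for completeness.

```python
import numpy as np

def evolve_one_generation(population, affinity, clone_factor, rng):
    """One AISWNB generation: affinity evaluation, selection of the memory
    antibody, clone-based replacement of the weakest antibodies, and mutation
    following Eqs. (4)-(5) with a greedy update (Sec. 4.2.3).

    population: array of shape (L, n), weight vectors w_i in (0, 1]
    affinity:   callable mapping a weight vector to a fitness in [0, 1]
                (e.g., WNB accuracy on the affinity set D_b, Eq. (3))
    """
    fitness = np.array([affinity(w) for w in population])
    best = population[np.argmax(fitness)].copy()      # memory antibody w_c

    # Replace the clone_factor weakest antibodies with clones of w_c.
    worst = np.argsort(fitness)[:clone_factor]
    population[worst] = best
    fitness[worst] = fitness.max()

    # Mutate each antibody: v = w + F * N(0,1) * (w_c - w), with F = 1 - f[w].
    for i, w in enumerate(population):
        F = 1.0 - fitness[i]
        noise = rng.standard_normal(w.shape)           # per-dimension noise (assumption)
        v = np.clip(w + F * noise * (best - w), 1e-6, 1.0)
        if affinity(v) > fitness[i]:                   # greedy replacement
            population[i] = v
    return population, best
```

A full run would repeat this generation loop until MaxGen is reached or the threshold T is met, and then build the final WNB classifier with the returned memory antibody.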
4.2.3. AISWNB Updating

To determine whether a variation individual v_i^{t+1} can replace a target individual vector w_i^t as a new individual w_i^{t+1} for the (t+1)th generation, AISWNB adopts a greedy search strategy. The target individual w_i^t is replaced by v_i^{t+1} if and only if v_i^{t+1}'s affinity is better than that of w_i^t. In addition, the system also chooses the individual w_c^{t+1} with the best affinity performance in the (t+1)th generation as the new memory antibody.

A complete evolutionary process for the population includes Evaluation and Updating, which are continuously repeated until (1) the algorithm surpasses the pre-set maximum number of generations MaxGen, or (2) the affinity difference between two consecutive iterations is less than the threshold T. After obtaining the best individual w_c (i.e., the attribute weight value vector), we use the weight values to build a WNB classifier to classify the test data.

4.3. Time Complexity

The time complexity of AISWNB is mainly attributed to the following two processes: (1) the evaluation of AISWNB, and (2) the updating of the weight values.

Prior to the evaluation of the AISWNB model, an NB classifier needs to be trained from Da with Na instances, which takes O(Na · n), where n is the number of attributes (NB needs to scan the whole training set and build prior probabilities for all classes and conditional probabilities for all n attributes). For the weight population W in each generation, the calculation of the affinity function for each weight individual w ∈ W is similar to testing an NB classifier on a test set Db with Nb instances, which takes O(Nb · n · L), where L is the size of the weight population (i.e., the number of weight vectors). The remaining four operations (selection, clone, mutation, and update) are all based on weight vectors, and the corresponding time complexity is O(L · log L). Assuming the average number of evolution generations is M, the total time complexity U is given by Eq. (6).

U = O(Na · n) + M × [O(Nb · n · L) + O(L · log L)]    (6)

Because Na + Nb = N, where N is the total number of training data, Eq. (6) can be rewritten as

U = O[(N − Nb) · n] + O(Nb · n · L · M) + O(L · log L · M)
  ≤ O(N · n) + O(Nb · n · L · M) + O(L · log L · M)
  ≤ O(N · n · L · M) + O(L · log L · M)
  ≤ O(N · n · L^2 · M)    (7)

Eq. (7) shows that the total time complexity of AISWNB is bounded by four important factors: (1) the total number of training samples N; (2) the number of attributes n; (3) the size of the weight population L; and (4) the average number of evolution generations M. In our experiments, we use a threshold T to automatically determine termination, following the principle that if the affinity difference between two consecutive iterations is less than T, the algorithm terminates. This further reduces the number of iterations and saves computational costs.

5. Experiments

5.1. Experimental Settings

5.1.1. Benchmark Data and Parameters

We implement the proposed method using the WEKA (Witten & Frank, 2005) data mining tool and validate its performance on 36 benchmark data sets from the UCI data repository (Bache & Lichman, 2013) and six image classification data sets from the Corel Image repository (Li & Wang, 2008). Because Bayesian classifiers are designed for categorical attributes, in our experiments we first replace all missing attribute values using the unsupervised attribute filter ReplaceMissingValues in WEKA. Then, we apply the unsupervised filter Discretize in WEKA to discretize numeric attributes into nominal attributes.
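For readers reproducing this preprocessing outside WEKA, the sketch below is a rough pandas approximation rather than the exact WEKA filters: it assumes mean/mode imputation and equal-width binning, which only approximates the default behaviour of ReplaceMissingValues and the unsupervised Discretize filter.

```python
import pandas as pd

def preprocess(df, n_bins=10):
    """Approximate the paper's preprocessing: impute missing values, then
    discretize numeric attributes into nominal ones (equal-width bins)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
            out[col] = pd.cut(out[col], bins=n_bins, labels=False, duplicates='drop')
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out.astype(str)  # treat every attribute as nominal afterwards
```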
The similar data preprocessing could also be found in previous works (Jiang et al., 2009, 2012a). The three parameters L, M and T in Algorithm 1 are set to 50, 50, and 0.001, respectively. All results are obtained via 10 runs of 10-fold cross validation, and our algorithm is carried out on the same training data sets and evaluated on the same testing data. Moreover, all experiments are conducted on a Linux clus- ter node with an Interl(R) Xeon(R) @3.33GHZ CPU and 3GB fixed memory size. 5.1.2. Baseline Methods For comparison purposes, we compare AISWNB with the following baseline methods: • NB: A standard Naive Bayes classifier with conditional at- tribute independence assumption (Friedman et al., 1997); • CFSWNB: An attribute weighted Naive Bayes based on correlation-based feature selection (Hall, 2000); • GRWNB: An attribute weighted Naive Bayes using gain ratio based feature selection (Zhang & Sheng, 2004); • MIWNB: An attribute weighted Naive Bayes using mu- tual information weighted method for feature selec- tion (Jiang et al., 2012b); • ReFWNB: An attribute weighted Naive Bayes using a feature selection method based on attribute estima- tion (Robnik-Šikonja & Kononenko, 2003); • TreeWNB: An attribute weighted Naive Bayes with the weighting method according to the degree to which they depend on the values of other attributes (Hall, 2007); • SBC: A bagged decision-tree based attribute selection fil- ter for Naive Bayes (Langley & Sage, 1994); Table 2: Detailed information of the 36 UCI benchmark data sets Data set Instances Attributes Classes Missing Numeric anneal 898 39 6 Y Y anneal.ORIG 898 39 6 Y Y audiology 226 70 24 Y N autos 205 26 7 Y Y balance-scale 625 5 3 N Y breast-cancer 286 10 2 Y N breast-w 699 10 2 Y N colic 368 23 2 Y Y colic.ORIG 368 28 2 Y Y credit-a 690 16 2 Y Y credit-g 1000 21 2 N Y diabetes 768 9 2 N Y Glass 214 10 7 N Y heart-c 303 14 5 Y Y heart-h 294 14 5 Y Y heart-statlog 270 14 2 N Y hepatitis 155 20 2 Y Y hypothyroid 3772 30 4 Y Y ionosphere 351 35 2 N Y iris 150 5 3 N Y kr-vs-kp 3196 37 2 N N labor 57 17 2 Y Y letter 20000 17 26 N Y lymph 148 19 4 N Y mushroom 8124 23 2 Y N primary-tumor 339 18 21 Y N segment 2310 20 7 N Y sick 3772 30 2 Y Y sonar 208 61 2 N Y soybean 683 36 19 Y N splice 3190 62 3 N N vehicle 846 19 4 N Y vote 435 17 2 Y N vowel 990 14 11 N Y waveform-5000 5000 41 3 N Y zoo 101 18 7 N Y • RMWNB: An attribute weighted Naive Bayes with the at- tribute weights randomly selected from (0, 1]; 5.1.3. Evaluation Criterion In our experiments, the selected algorithms are evaluated us- ing three performance metrics, including classification accu- racy (measured by ACC), class ranking performance (measured by AUC), and class probability estimation (measured by CLL). The ACC of each method is calculated by the percentage of correctly predicted samples in the test set. In some data mining applications, learning a classifier with accurate class ranking or class probability distributions is also desirable (Zhang & Su, 2004). For example, in direct market- ing, the limitation of the resources only allows promotion of the top x% customers during gradual roll-out, or different promo- tion strategies are deployed for customers with different likeli- hood of buying certain products. To accomplish these learning tasks, ranking customers according to their likelihood of buy- ing is more useful than simply classifying customers as: buyer or non-buyer (Jiang et al., 2012a). 
To evaluate the classifier performance in terms of class ranking and class probability dis- tributions, we use AUC and CLL, where AUC of the classifier is calculated as follows: 8 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB N B (a) AISWNB vs. NB 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB S B C (b) AISWNB vs. SBC 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB C F S W N B (c) AISWNB vs. CFSWNB 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB G R W N B (d) AISWNB vs. GRWNB 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB M IW N B (e) AISWNB vs. MIWNB 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB R e F W N B (f) AISWNB vs. ReFWNB 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB T re e W N B (g) AISWNB vs. TreeWNB 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 0.9 1 AISWNB R M W N B (h) AISWNB vs. RMWNB Figure 3: AISWNB vs. competing algorithms: classification accuracy (ACC). Table 3: Detailed experimental results on classification accuracy (ACC) and standard deviation %. Data set AISWNB NB SBC CFSWNB GRWNB MIWNB ReFWNB TreeWNB RMWNB anneal 96.99±2.02 94.32±2.23 • 94.03±2.37 • 93.35±2.19 • 94.78±2.29 • 86.01±3.97 • 94.28±1.99 • 89.73±1.94 • 93.31±2.74 • anneal.ORIG 91.08±2.92 88.16±3.06 • 84.66±3.74 • 87.81±2.55 • 85.21±4.23 • 77.45±4.33 • 76.54±0.84 • 88.11±2.53 • 84.16±3.63 • audiology 75.49±8.00 71.40±6.37 70.91±7.09 69.84±6.46 • 71.58±6.76 70.82±6.99 61.22±6.63 • 71.92±6.67 62.67±9.22 • autos 70.13±11.42 63.97±11.35 • 70.10±9.38 68.41±10.53 65.14±10.92 65.52±11.44 65.58±9.57 67.76±10.51 64.01±11.24 • balance-scale 91.41±1.32 91.44±1.30 91.44±1.30 82.13±3.32 • 90.27±1.89 • 90.27±1.91 • 87.71±3.11 • 90.03±1.99 • 81.01±7.58 • breast-cancer 72.91±7.98 72.94±7.71 73.25±7.60 71.73±7.40 71.07±6.30 70.63±8.76 70.30±1.37 72.39±7.47 72.60±6.61 breast-w 97.24±1.68 97.30±1.75 97.30±1.75 97.21±1.84 97.28±1.78 97.33±1.77 97.25±1.91 97.34±1.81 96.94±2.00 colic 81.46±6.07 78.86±6.05 82.28±5.86 83.45±5.35 81.77±6.12 83.56±5.79 84.07±5.27 83.64±5.47 78.78±6.22 colic.ORIG 73.99±7.22 74.21±7.09 74.57±5.85 74.61±6.62 75.29±5.92 71.78±6.70 66.20±1.37 • 76.00±6.53 73.04±6.39 credit-a 85.13±3.90 84.74±3.83 85.75±4.16 86.14±4.06 85.51±3.96 85.51±3.96 85.67±3.97 86.46±3.85 82.32±5.85 credit-g 75.80±3.59 75.93±3.87 72.43±3.61 • 76.13±3.64 74.06±2.85 70.43±4.55 • 70.00±0.00 • 76.14±3.62 73.23±3.63 diabetes 75.86±4.87 75.68±4.85 75.93±5.07 77.02±4.87 75.03±3.95 75.94±5.49 65.16±0.47 • 76.91±5.07 73.70±5.30 glass 57.74±10.16 57.69±10.07 56.22±10.36 56.70±9.79 57.87±9.28 56.63±9.84 55.56±8.67 57.44±9.37 55.49±9.63 heart-c 82.41±6.66 83.44±6.27 84.20±6.37 83.64±6.37 84.14±6.19 83.50±6.60 83.20±6.83 83.57±6.04 81.48±7.15 heart-h 82.42±5.98 83.64±5.85 82.59±6.40 83.31±6.46 83.15±6.77 81.75±6.58 80.04±5.91 83.34±6.28 81.47±7.66 heart-statlog 83.52±6.19 83.78±5.41 84.19±6.07 84.22±5.99 83.59±6.52 83.41±6.08 81.44±5.97 84.04±5.90 81.85±6.24 hepatitis 84.52±9.61 84.06±9.91 83.60±9.77 84.52±9.22 82.97±9.89 85.15±9.45 82.28±4.54 83.35±8.24 85.22±8.87 hypothyroid 93.42±0.62 92.79±0.73 • 93.52±0.48 93.60±0.51 93.33±0.45 75.98±2.04 • 93.30±0.44 93.58±0.50 93.37±0.53 ionosphere 90.69±4.05 90.86±4.33 90.89±4.72 92.20±3.94 92.02±4.19 92.00±4.08 91.08±4.18 92.00±4.06 90.66±4.74 iris 94.87±6.28 94.33±6.79 96.87±4.29 95.33±5.40 95.93±4.73 95.93±4.73 96.07±4.65 95.53±5.19 91.87±7.38 kr-vs-kp 95.84±1.55 87.79±1.91 • 92.38±1.56 • 91.22±1.45 • 89.79±1.63 • 90.83±1.72 • 90.51±1.60 • 94.21±1.29 • 82.63±4.99 • labor 95.80±8.73 96.70±7.27 
84.97±12.91 • 90.20±11.28 92.30±9.98 90.57±11.18 88.63±12.10 88.10±12.66 92.17±11.15 letter 67.75±2.17 65.80±2.04 • 67.32±2.22 66.23±2.07 • 68.03±2.10 68.33±2.15 68.50±1.99 65.99±2.08 • 60.17±3.54 • lymph 84.94±8.42 85.97±8.88 82.72±9.39 83.00±9.35 81.03±9.15 82.85±8.96 78.78±8.66 • 82.20±10.01 83.33±9.51 mushroom 98.32±0.99 93.58±2.03 • 98.12±1.14 98.28±1.04 98.33±0.98 98.21±1.01 97.83±1.25 97.82±1.10 93.54±3.18 • primary-tumor 46.52±6.28 47.20±6.02 45.78±6.84 43.78±5.05 46.77±6.08 45.64±6.93 24.78±1.47 • 45.54±5.39 42.60±6.44 segment 91.54±1.82 89.03±1.66 • 90.46±2.10 • 89.92±1.76 • 85.82±2.06 • 86.23±1.89 • 85.57±2.03 • 90.48±1.56 • 85.56±3.95 • sick 97.27±0.79 96.78±0.91 • 96.81±0.89 97.30±0.88 96.47±0.93 • 96.26±0.95 • 96.64±0.71 • 96.94±0.92 95.58±1.38 • sonar 76.76±10.78 76.35±9.94 73.54±9.45 75.71±9.58 74.99±9.54 76.14±9.55 76.39±9.45 75.36±8.81 75.22±9.93 soybean 92.94±2.54 92.20±3.23 91.00±3.31 • 92.61±2.82 90.61±3.40 • 91.74±3.28 88.58±3.46 • 92.85±2.90 89.90±3.56 • splice 95.71±1.07 95.42±1.14 95.84±1.03 95.39±1.28 94.08±1.28 • 94.50±1.21 • 88.30±2.44 • 96.14±1.03 92.61±2.17 • vehicle 62.81±3.81 61.03±3.48 56.32±4.01 • 60.59±3.57 60.39±3.35 58.91±3.35 • 61.16±3.31 61.75±3.44 60.11±3.42 vote 94.35±3.67 90.21±3.95 • 94.46±2.81 93.49±3.55 93.03±3.57 93.01±3.54 95.24±2.91 94.83±3.01 89.86±4.56 • vowel 67.26±4.86 66.09±4.78 62.75±5.10 • 65.32±4.59 64.82±4.63 64.65±4.66 66.14±4.67 66.65±4.73 57.41±7.79 • waveform-5000 82.60±3.52 79.80±2.97 • 79.71±2.97 77.78±2.92 • 78.71±3.05 • 78.65±3.05 • 80.43±3.38 79.56±3.00 • 78.38±3.52 • zoo 95.95±5.62 94.37±6.79 91.51±7.68 • 95.15±4.98 95.55±5.30 94.39±5.96 94.25±4.93 90.65±7.29 • 92.58±7.25 • :Statistically significant degradation. E = P0 − t0(t0 + 1)/2 t0t1 (8) where t0 and t1 are the number of negative and positive in- stances, repressively. P0 = ∑ ri, with ri denoting the rank of the ith negative instance in the ranked list. It is clear that AUC 9 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB N B (a) AISWNB vs. NB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB S B C (b) AISWNB vs. SBC 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB C F S W N B (c) AISWNB vs. CFSWNB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB G R W N B (d) AISWNB vs. GRWNB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB M IW N B (e) AISWNB vs. MIWNB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB R e F W N B (f) AISWNB vs. ReFWNB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB T re e W N B (g) AISWNB vs. TreeWNB 0.6 0.7 0.8 0.9 1 0.6 0.7 0.8 0.9 1 AISWNB R M W N B (h) AISWNB vs. RMWNB Figure 4: AISWNB vs. competing algorithms: area under the ROC curve (AUC). Table 4: The detailed experimental results on area under the ROC curve (AUC) and standard deviation %. 
Data set AISWNB NB SBC CFSWNB GRWNB MIWNB ReFWNB TreeWNB RMWNB anneal 98.88±1.67 98.76±1.84 98.27±2.54 98.66±1.89 98.63±1.90 98.82±1.73 98.77±1.77 98.58±1.96 98.74±1.81 anneal.ORIG 97.90±3.22 96.79±5.42 95.68±6.71 96.61±6.55 97.59±3.91 97.32±4.28 95.21±7.68 96.58±6.09 96.46±4.59 • audiology 84.32±1.56 83.85±1.44 84.05±1.61 84.06±1.59 83.75±1.45 84.26±1.56 83.98±1.58 83.99±1.59 83.43±1.37 • autos 94.68±3.27 91.96±3.32 • 94.26±3.13 93.78±3.40 92.98±3.68 93.27±3.73 93.50±3.48 94.44±3.70 92.39±3.85 balance-scale 89.76±4.24 85.00±4.03 • 85.00±4.03 • 76.84±4.78 • 76.29±3.84 • 83.64±3.79 • 74.67±3.75 • 82.40± 4.07 • 71.81±6.47 • breast-cancer 67.02±14.05 71.32±13.81 ◦ 69.45±14.71 69.93±13.90 70.10±14.49 70.61±13.08 67.13±12.81 70.98±13.89 ◦ 66.31±15.36 breast-w 99.13±0.94 99.23±0.83 99.23±0.83 99.19±0.87 99.20±0.88 99.16±0.89 99.19±0.90 99.29±0.76 98.89±1.21 colic 87.08±5.05 84.42±5.45 • 86.91±7.08 87.72±5.92 88.35±5.47 88.23±6.13 88.72±5.32 88.06±5.56 84.75±4.97 colic.ORIG 82.93±7.23 81.70±7.23 81.19±5.32 84.58±5.29 84.75±4.77 82.10±4.97 83.15±4.33 85.11±5.78 78.96± 6.56 credit-a 91.74±3.72 91.97±3.14 91.66±3.45 92.34±3.28 92.22±3.64 91.95±3.31 91.83±3.12 92.07±3.33 91.62±3.43 credit-g 79.22±5.20 79.42±4.52 74.26±5.55 • 79.13±5.01 77.90±5.62 78.15±5.84 77.91±6.15 79.64±4.85 78.45± 4.12 diabetes 84.44±4.68 82.74±4.94 • 84.08±5.04 84.10±4.66 84.00±4.80 83.81±4.80 83.79±4.64 83.67±4.81 78.49±4.81 • glass 88.43±3.18 82.63±6.07 • 83.33±7.29 84.25±4.60 • 81.84±5.59 • 84.97±6.12 85.63±4.36 • 82.90±5.23 • 81.11±5.19 • heart-c 84.17±0.66 84.17±0.50 84.05±0.61 84.13±0.54 84.07±0.57 84.05±0.57 84.15±0.56 84.14±0.55 83.79±0.67 heart-h 83.97±0.51 83.92±0.63 84.07±0.68 83.91±0.68 83.97±0.72 83.99±0.72 83.91±0.65 83.82±0.72 83.56±0.81 heart-statlog 91.00±4.80 91.33±5.15 89.61±5.01 91.39±4.17 91.28±4.84 91.28±4.13 91.56±3.94 91.39±4.26 88.28± 6.60 hepatitis 88.11±10.12 89.90±8.24 86.29±11.02 88.07±9.17 87.79±9.55 88.46±9.07 87.35±8.74 86.79±9.78 86.61±10.60 hypothyroid 89.05±9.74 87.53±9.20 • 85.43±8.19 • 86.96±8.70 • 86.30±9.11 • 88.46±9.98 83.68±7.65 • 87.03±8.80 • 85.60±8.07 • ionosphere 95.93±2.75 93.90±3.21 93.14±4.49 94.99±2.82 94.86±2.97 94.54±3.05 94.39±3.31 95.55±2.51 94.44± 2.89 iris 99.20±1.80 98.93±2.16 99.40±1.27 99.20±1.80 99.20±1.80 99.20±1.80 99.20±1.80 99.20±1.80 97.87±4.08 kr-vs-kp 99.29±0.36 95.20±1.28 • 96.94±0.83 • 97.75±0.92 • 98.10±0.77 • 97.95±0.81 • 98.48±0.69 • 98.72±0.67 • 87.00±2.22 • labor 98.75±3.95 98.75±3.95 80.63±33.13 100.00±0.00 100.00±0.00 92.50±23.72 90.00±27.51 96.25±8.44 95.00±12.08 letter 96.18±0.62 95.68±0.73 • 96.13±0.57 95.44±0.71 • 96.10±0.61 96.20±0.59 95.41±0.67 • 95.60±0.71 • 94.47±0.81 • lymph 94.88±4.92 95.01±4.87 94.64±4.87 95.30±5.32 94.67±4.71 94.81±4.92 92.99±4.15 94.10±4.65 94.68±4.74 mushroom 99.98±0.23 99.59±0.17 • 98.88±0.80 • 99.82±0.15 • 99.81±0.16 • 99.91±0.07 • 99.87±0.11 99.92±0.10 99.67±0.28 • primary-tumor 85.07±2.67 85.05±2.96 84.81±2.92 85.44±2.17 85.76±2.21 85.40±2.53 85.20±1.82 85.34±2.54 84.33± 2.64 segment 99.30±0.27 98.37±0.52 • 98.72±0.55 • 98.51±0.55 • 97.97±0.62 • 97.95±0.64 • 97.90±0.63 • 98.59±0.46 • 98.16±0.53 • sick 97.33±1.41 95.92±2.48 94.10±2.89 • 95.88±2.66 95.67±2.72 95.32±3.09 96.04±2.66 96.11±2.66 91.82±2.61 • sonar 87.86±10.08 86.79±9.83 81.11±11.83 • 84.52±10.52 84.61±10.17 85.27±10.16 85.47±10.10 83.05±9.80 86.03±10.79 soybean 99.97±0.07 99.90±0.07 • 99.87±0.11 99.93±0.07 99.91±0.07 99.92±0.06 99.89±0.07 99.92±0.06 99.68±0.19 • splice 99.50±0.21 99.41±0.22 • 99.45±0.26 99.45±0.27 99.27±0.29 • 99.32±0.27 • 99.40±0.27 
99.54±0.22 98.34±0.34 • vehicle 84.41±3.74 80.85±3.73 • 78.53±4.19 • 80.57±4.24 • 79.21±4.19 • 78.91±3.95 • 80.64±4.60 • 81.50±3.92 • 80.46±3.73 • vote 98.93±1.10 96.79±1.95 • 98.25±1.64 98.02±1.44 97.82±1.58 97.94±1.47 98.53±1.35 98.56±1.12 95.15±2.45 • vowel 96.57±0.63 96.19±0.72 • 95.33±0.83 • 96.18±0.90 95.99±0.99 95.80±1.08 96.20±0.92 96.31±0.78 94.81±0.95 • waveform-5000 95.79±1.50 95.41±1.36 95.38±1.43 95.44±1.45 94.97±1.46 94.94±1.50 96.28±1.16 95.85±1.28 94.12±1.86 • zoo 98.57±1.66 98.57±1.66 97.86±2.37 98.57±1.66 98.57±1.66 99.05±1.23 98.57±1.66 97.86±2.37 98.57±1.66 ◦,•: Statistically significant upgradation and degradation, respectively. is essentially a measure of the quality of ranking. The above measure can only deal with two-class problem. For multiple classes, Hand & Till (2001) proposes an improved AUC calcu- lating measure: E′ = 2 g(g − 1) ∑ i< j