A redundancy-removing feature selection algorithm for nominal data

Zhihua Li 1,2,3,4 and Wenqu Gu 2,3

1 Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, JiangSu, China
2 Engineering Research Center of Internet of Things Technology Application, Ministry of Education, JiangSu, China
3 Department of Computer Science Engineering, School of Internet of Things Engineering, Jiangnan University, JiangSu, China
4 Department of Computer Science, Georgia State University, Atlanta, GA, United States of America

Submitted 4 June 2015. Accepted 10 September 2015. Published 14 October 2015.
Corresponding author: Zhihua Li, zhli@jiangnan.edu.cn. Academic editor: Feiping Nie.
DOI 10.7717/peerj-cs.24. Copyright 2015 Li and Gu. Distributed under Creative Commons CC-BY 4.0. OPEN ACCESS.

ABSTRACT

No order correlation or similarity metric exists in nominal data, and a nominal dataset typically contains considerable redundancy, which makes an efficient mutual information-based feature selection method for nominal data relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information and requiring no data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By introducing several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related quantities of nominal data directly. Furthermore, by creating a new evaluation function that considers both relevance and redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes the commonly used MIFS-like form, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate an average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of feature selection and classification accuracy, which indicates that the proposed method has promising performance.

Subjects: Data Mining and Machine Learning, Data Science
Keywords: Nominal data, Feature selection, Redundancy-removing, Mutual information

INTRODUCTION

There are two main feature reduction approaches in data analysis: feature extraction and feature selection (Jain, Duin & Mao, 2000). Feature extraction creates new features from transformations or combinations of the raw feature set, whereas feature selection picks a group of the most effective features from a dataset according to evaluations of the goodness of each feature, with the purpose of decreasing the feature dimensionality (Jain, Duin & Mao, 2000; Tesmer & Estévez, 2004; John, Kohavi & Pfleger, 1994). Feature selection is one of the major methods
of feature reduction for high-dimensional data. The evaluation basis includes (Tesmer & Estévez, 2004) various distance measurements (Kira & Rendel, 1992), a dependency measurement (Modrzejejew, 1993), a consistency measurement (Almuallim & Dietterich, 1991), a probability density measurement (Battiti, 1994), a rough set measurement (Hu, Xie & Yu, 2007), an information measurement (Kwak & Ch, 2002; Kwak & Choi, 2002; Torkkola, 2003) and other derivations such as those based on optimization strategies (Hou, Nie & Li, 2014). Regardless of which evaluation basis is taken, the goal is to keep the number of selected features as small as possible, to avoid increasing the computational cost of the learning algorithm as well as the classifier complexity (Tesmer & Estévez, 2004).

Numerous studies in the literature have examined feature selection algorithms built on different evaluation bases. Among them, information theory-based feature selection, which operates directly on the selected features and the raw dataset, involves comparatively little data processing: features are chosen by maximizing the mutual information (MI) between the features and the class labels. The mutual information-based feature selection algorithm MIFS (Battiti, 1994) is built on this basis; it uses greedy selection to guarantee that the candidate features ranked by its evaluation function yield the final set of effective features. After studying the imbalance of the evaluation function in the MIFS algorithm, the MIFS-U algorithm (Kwak & Choi, 2002) was proposed. The adaptive feature selection criterion was then studied, and the AMIFS algorithm (Tesmer & Estévez, 2004) was presented to cater to feature selection in high-dimensional data. Considering max-dependency and min-redundancy as a whole, the mRMR algorithm (Peng, Long & Ding, 2005) was given. By reconstructing the evaluation function, the NMIFS algorithm (Estévez, Tesmer & Perez, 2009) was then proposed. Both the mRMR and NMIFS algorithms can distinctly decrease the redundancy in the selected features. However, these algorithms also have their disadvantages. For example, MIFS (Battiti, 1994) and MIFS-U (Kwak & Choi, 2002) fail to consider the mutual information between the candidate features combined with the selected subset and the class labels, as well as its influence on the classification results. Moreover, the MI approximations used in the aforementioned algorithms can handle only continuous-attributed data.

Nominal data exist in a broad range of practical applications. This kind of data is typically characterized by having no order information, being discrete, and carrying semantics (Chow, Wang & Ma, 2008). No similarity metric or order correlation (Chow, Wang & Ma, 2008; Tang & Mao, 2005; Minho & Ramakrishna, 2009) exists in nominal data. The "distance" used in pattern recognition is hard to define for such data, which makes measuring similarity or dissimilarity difficult. Given these characteristics of nominal data, several problems arise in feature selection.
Due to having no order information and discrete, non-metric data distributions, the features of different classes may even intersect with one another (Chow, Wang & Ma, 2008; Li, Yang & Gu, 2013; Minho & Ramakrishna, 2009). Thus, most well-known feature selection algorithms are unsuitable for nominal-data feature selection or nominal-data feature extraction.

Considering the above disadvantages and aiming at nominal-data feature selection and its specificity, this paper presents a new MI-based scheme, More Relevance Less Redundancy (MRLR), through the redefinition of the features' information amounts, the relevance degree between features and the conditional MI, as well as the construction of corresponding new approximate computation methods for MI with respect to nominal data. On the other hand, after studying the evaluation function, and in particular the insufficient consideration of redundancy between features in most evaluation functions, this paper also creates a new evaluation function for nominal data. The new evaluation function not only considers the correlation between the features and the class labels but also accounts for the mutual correlation between features. In this way, the computation of MI for nominal-data features can be solved, and at the same time an overly high redundancy of the selected subset caused by redundant features can be avoided. Combining these innovations into a new method, the Redundancy-removing MRLR (RedremovingMRLR) algorithm is proposed. Several experiments were arranged: three benchmarking nominal datasets are employed to compare the effectiveness and efficiency of the naive Bayes classifier and the decision tree classifier on the subsets selected by the RedremovingMRLR, MRLR (Gu & Li, 2013) and NMIFS algorithms. The experimental results show that the newly proposed scheme delivers promising results.

The remainder of this paper is organized as follows. The related work and the new definitions are introduced in 'Notation and Related Studies.' In 'The Proposed Algorithms,' we derive the framework of the proposed feature selection algorithm RedremovingMRLR. Promising experimental results on benchmarking datasets are presented in 'Results and Discussion,' followed by the concluding remarks in 'Conclusions.'

NOTATION AND RELATED STUDIES

The related studies are introduced in 'Related work,' and some new definitions and necessary terminology are presented in 'Notation and definitions.'

Related work

In this paper, we also use MI, taking it as a measure of the relevance and redundancy among features, to study nominal-data feature selection methods. A number of MI-based feature selection methods have been published, and the references (Tesmer & Estévez, 2004; Battiti, 1994; Kwak & Choi, 2002; Peng, Long & Ding, 2005; Estévez, Tesmer & Perez, 2009; Chow, Wang & Ma, 2008) are benchmarks.

MIFS (Battiti, 1994) selects the features that maximize the information about the classes, corrected by subtracting a quantity proportional to the average MI with the previously selected features. When there are many irrelevant and redundant features, the performance of MIFS degrades because it penalizes redundancy too heavily.
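For reference, the greedy criterion that MIFS maximizes at each step is commonly written as follows, where C denotes the class variable, S the set of already-selected features, and β the user-defined redundancy parameter mentioned below; this MIFS-like form is the template that the evaluation functions in later sections modify:

$$J_{\mathrm{MIFS}}(f_i) = I(C; f_i) - \beta \sum_{f_s \in S} I(f_i; f_s)$$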
MIFS-U (Kwak & Choi, 2002) is an enhancement of the MIFS algorithm that makes a better estimate of the MI between the input features and the output classes. However, although MIFS-U is usually better than MIFS, its performance also degrades in the presence of many irrelevant and redundant features (Estévez, Tesmer & Perez, 2009).

AMIFS (Tesmer & Estévez, 2004) is an enhancement of MIFS and MIFS-U that overcomes their limitations in high-dimensional feature selection. An adaptive selection criterion is proposed in such a way that the trade-off between discarding redundancy or irrelevance is adaptively controlled (Estévez, Tesmer & Perez, 2009), which eliminates the need for a user-predefined, fixed parameter.

By deriving an equivalent form of the maximal statistical dependency criterion, called the minimal-redundancy-maximal-relevance criterion, for first-order incremental feature selection, mRMR (Peng, Long & Ding, 2005) proposes a framework to minimize redundancy and uses a series of intuitive measures of relevance and redundancy to select the most promising features. The mRMR algorithm can be combined with other wrapper feature selection methods to first select good features according to the maximal statistical dependency criterion based on MI. It can select promising features for both continuous and discrete datasets (Peng, Long & Ding, 2005; Estévez, Tesmer & Perez, 2009).

NMIFS (Estévez, Tesmer & Perez, 2009) takes the average normalized MI as a measure of the redundancy among features (Tesmer & Estévez, 2004); it is an enhancement of the MIFS, MIFS-U and mRMR methods, and it outperforms them without requiring a predefined parameter. In contrast, UFSN (Chow, Wang & Ma, 2008) can directly handle nominal-data feature selection and thereby avoids the shortcomings of converting data from nominal into binary form. Nevertheless, UFSN must rely on a clustering algorithm at the beginning, and the clustering process has a high complexity for large datasets.

This paper focuses on three issues that have not been covered in earlier work, highlighted as follows. First, the evaluation function in NMIFS (Estévez, Tesmer & Perez, 2009) is rewritten, so the newly presented algorithm gives a stronger penalty to redundancy than NMIFS does. Second, the new algorithm considers the redundancy and relevance of the features as a whole, which NMIFS does not: the proposed algorithm achieves less redundancy and more relevance with respect to the relationships between the features as well as between the features and the class labels. Third, aiming at MI-based feature selection for nominal data and simplifying the computation of MI between nominal features, several new definitions are given. The experimental results show that the RedremovingMRLR algorithm is effective for nominal-data feature selection.

Notation and definitions

To realize nominal-data feature selection efficiently, several new definitions and the corresponding computing methods are first given, as follows.

Definition 1. Given the n values of the ith feature f_i, i.e., {a_1, a_2, ..., a_n} ∈ f_i, the information amount of f_i can be expressed as

$$I(f_i) = -\sum_{i=1}^{n} p_i \log_2 (p_i) \qquad (1)$$

where p_i represents the frequency of each value in f_i, namely p_i = a_i / |f_i|.
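As a minimal sketch of Definition 1, the information amount of a nominal feature can be computed directly from its value counts. The code below assumes the feature is given as a plain Python list of category labels; the function name `information_amount` is ours, not the paper's.

```python
import math
from collections import Counter

def information_amount(feature_values):
    """Information amount I(f_i) of a nominal feature (Definition 1):
    the entropy of the empirical frequencies of its values."""
    n = len(feature_values)
    counts = Counter(feature_values)  # occurrences of each distinct nominal value
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example: a nominal feature with three categories.
print(information_amount(["red", "red", "blue", "green"]))  # 1.5
```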
Definition 2. The conditional mutual information between two different features f_i and f_j can be expressed as

$$E(f_i; f_j) = -\sum_{j=1}^{m} p_j I(f_{ij}) \qquad (2)$$

where E(f_i; f_j) represents the dependence degree of the ith feature f_i on the jth feature f_j when f_j is determined. Here, m denotes the number of values in f_j.

Definition 3. According to the above definitions, the relevance degree between f_i and f_j can be expressed as

$$G(f_i; f_j) = I(f_i) - E(f_j; f_i) = I(f_j) - E(f_i; f_j). \qquad (3)$$

It follows that the relevance degree between f_i and f_j is symmetric. I(f_i) can be obtained from Formula (1), and E(f_j; f_i) from Formula (2).

Definition 4. The preliminary evaluation function of the feature selection for nominal data on each f_i can be expressed as

$$J(f_i) = G(S \cup f_i; C) - \frac{1}{|S|}\sum_{f_s \in S} G(f_s; f_i) \qquad (4)$$

where G(S ∪ f_i; C) represents the relevance degree between the class labels C and the selected subset S after the candidate feature f_i has been added. The penalty factor β used in MIFS and MIFS-U is a user-predefined parameter that is difficult to determine; to overcome this limitation, this paper replaces it with 1/|S|.

THE PROPOSED ALGORITHMS

Considering the specificity of nominal-data feature selection, this paper performs the following research.

Basic idea of the algorithms

Based on the above, the algorithm should select features with maximal MI with the class labels. More concretely, the algorithm should also consider the MI between different features to avoid an overlarge feature redundancy. In this way, the uncertainty of the remaining features can be resolved to the maximum extent. The feature selection algorithm first chooses the feature that has the largest relevance to the class labels. Then, the relevance degrees between each candidate feature and the selected features, as well as the class labels, are computed. Finally, the feature that has more relevance to the class labels and less redundancy with the selected features is selected. After several iterations, a selected subset that satisfies the conditions is obtained.

Redundancy-removing feature selection algorithm

Inspired by NMIFS (Estévez, Tesmer & Perez, 2009), the MI between a candidate feature f_i in the dataset and a selected feature f_s in the subset S is shown in Formula (5):

$$MI(f_i; f_s) = H(f_i) - H(f_i|f_s) = H(f_s) - H(f_s|f_i) \qquad (5)$$

where H(f_i) and H(f_s) represent the entropies of the features, and H(f_i|f_s) and H(f_s|f_i) represent the corresponding conditional entropies. From Formula (5), it can be seen that Formula (6) is satisfied:

$$0 \le MI(f_i, f_s) \le \min\{H(f_i), H(f_s)\}. \qquad (6)$$

Furthermore, the concept of a redundancy evaluation operator (Peng, Long & Ding, 2005; Estévez, Tesmer & Perez, 2009) is introduced here, as shown in Formula (7); it aims to evaluate the degree of redundancy between the candidate feature f_i and a selected feature f_s:

$$NMI(f_i; f_s) = \frac{MI(f_i; f_s)}{\min\{H(f_i), H(f_s)\}} \qquad (7)$$

From Formula (7), it can be seen that NMI(f_i; f_s) ∈ [0,1]. When NMI(f_i; f_s) = 0, the two features are mutually independent, whereas NMI(f_i; f_s) = 1 means that there is great redundancy between the candidate feature f_i and the selected features.
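Because nominal features are discrete, the quantities in Formulas (5)-(7) can be computed directly from value counts. A minimal sketch follows, reusing `information_amount` from the earlier sketch as the entropy H(·); the function names are ours, and Definition 2's conditional term corresponds, up to the paper's sign convention, to the conditional entropy computed here.

```python
def conditional_entropy(x, y):
    """H(x | y) for two aligned nominal features:
    sum over values v of y of p(y = v) * H(x restricted to samples with y = v)."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * information_amount(g) for g in groups.values())

def mutual_information(x, y):
    """Formula (5): MI(x; y) = H(x) - H(x | y)."""
    return information_amount(x) - conditional_entropy(x, y)

def normalized_mi(x, y):
    """Formula (7): redundancy evaluation operator, bounded in [0, 1]."""
    denom = min(information_amount(x), information_amount(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0
```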
On the basis of MIFS (Battiti, 1994), NMIFS (Estévez, Tesmer & Perez, 2009) evolved a new redundancy measure for the evaluation function of feature selection; the resulting NMIFS algorithm selects the candidate feature f_i with the maximum evaluation function value as the preferred feature and adds it to the selected subset S. The greatest contribution of the NMIFS algorithm is that, during the selection process, it automatically prevents features that have high redundancy with the selected subset from being chosen (Estévez, Tesmer & Perez, 2009). However, the NMIFS algorithm is suited only to continuous-attributed data, not to nominal data. Therefore, inspired by the redundancy-removing idea in the NMIFS algorithm, Formula (4) is modified into Formula (8):

$$J(f_i) = G(S \cup f_i; C) - \frac{1}{|S|}\sum_{f_s \in S} \frac{G(f_s; f_i)}{\min\{H(f_i), H(f_s)\}}. \qquad (8)$$

It is clear that the redundancy between the candidate features and the selected features is governed by Formula (8). Thus, Formula (8) can always be used to evaluate whether a candidate feature should finally be selected; that is, Formula (8) can be used as an evaluation function. Two distinct advantages follow: (1) it is suitable for nominal data; and (2) it can prevent features that have more redundancy from being selected.

To illustrate the second advantage, consider an extreme case in which f_i has high redundancy with only one of the features in subset S, whereas it has little or even no redundancy with the other features. In this case, the value of the redundancy term $\frac{1}{|S|}\sum_{f_s \in S} \frac{G(f_s; f_i)}{\min\{H(f_i), H(f_s)\}}$ is small, and the candidate feature f_i will certainly be selected into the subset S. As a result, the subset S still retains considerable redundancy. To overcome this extreme case, the following countermeasure is proposed: by adjusting the penalty factor in the evaluation function, the second term of Formula (8) can be replaced with a stronger penalty, such as the second term of Formula (9):

$$J(f_i) = G(S \cup f_i; C) - \max_{f_s \in S}\left\{\frac{G(f_s; f_i)}{\min\{H(f_i), H(f_s)\}}\right\}. \qquad (9)$$

Obviously, this extreme case can then be overcome. Based on the above, the Redundancy-removing MRLR (RedremovingMRLR) algorithm is summarized below.

Algorithm: RedremovingMRLR
Step 1. Initialization: let F be the universal set of all features and S the empty set; initialize the value of k, which represents the dimension of the feature subset to be selected by the feature selection algorithm.
Step 2. For each feature f_i ∈ F, compute the relevance degree G(f_i; C) according to Formula (3).
Step 3. According to the computational results of Step 2, select the feature f_i with the maximum relevance degree G(f_i; C), and set F ← F − {f_i}, S ← {f_i}.
Step 4. For each f_i among the candidate features, compute the preliminary evaluation value by Formula (4). If the preliminary evaluation value of f_i is less than or equal to the average, compute the evaluation value of f_i according to Formula (8); if the preliminary evaluation value of f_i is greater than the average, compute the evaluation value of f_i according to Formula (9).
Step 5. Select the feature f_i with the maximum evaluation value as the next valid feature, and set F ← F − {f_i}, S ← S ∪ {f_i}.
Step 6. If |S| = k is not yet satisfied, go to Step 4.
Step 7. Output the subset S.
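A compact sketch of Steps 1-7 is given below, reusing `mutual_information` and `normalized_mi` from the earlier sketches. Two points are our assumptions rather than the paper's: plain mutual information stands in for the relevance degree G, and G(S ∪ f_i; C) is approximated by the MI between the class labels and the compound nominal feature formed by the tuples of selected values plus the candidate. The choice of k follows the stopping rule described immediately below.

```python
def joint_feature(columns):
    """Combine several aligned nominal columns into one compound nominal feature:
    each sample becomes the tuple of its values across the columns."""
    return list(zip(*columns))

def redremoving_mrlr(features, labels, k):
    """Sketch of the RedremovingMRLR loop. `features` maps feature names to lists
    of nominal values, `labels` is the class column, `k` the target subset size."""
    remaining = dict(features)

    # Steps 2-3: seed S with the feature most relevant to the class labels.
    first = max(remaining, key=lambda f: mutual_information(remaining[f], labels))
    selected = [first]
    remaining.pop(first)

    while len(selected) < k and remaining:
        sel_cols = [features[s] for s in selected]

        def relevance(col):
            # Stand-in for G(S ∪ f_i; C): MI between the class labels and S plus the candidate.
            return mutual_information(joint_feature(sel_cols + [col]), labels)

        # Step 4: preliminary scores in the spirit of Formula (4), then refine with (8)/(9).
        rel = {f: relevance(c) for f, c in remaining.items()}
        prelim = {f: rel[f] - sum(mutual_information(c, s) for s in sel_cols) / len(sel_cols)
                  for f, c in remaining.items()}
        threshold = sum(prelim.values()) / len(prelim)

        scores = {}
        for f, col in remaining.items():
            pens = [normalized_mi(col, s_col) for s_col in sel_cols]
            if prelim[f] <= threshold:      # Formula (8): average normalised penalty
                scores[f] = rel[f] - sum(pens) / len(pens)
            else:                           # Formula (9): strongest normalised penalty
                scores[f] = rel[f] - max(pens)

        # Steps 5-6: take the candidate with the largest evaluation value and repeat.
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.pop(best)

    return selected                         # Step 7
```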
The determining process of k in RedremovingMRLR is as follows. At the beginning, the algorithm computes R(F)/|F| on the raw dataset as the initial reference value for determining k. Whenever an inflection point appears in the classification accuracy of the RedremovingMRLR algorithm on the selected subset S before and after the next candidate feature is added, RedremovingMRLR computes R(S)/|S|. As long as R(S)/|S| ≥ R(F)/|F| is satisfied, the cardinality |S| is taken as the value of k; that is, k = |S|, and |S| = k is the stopping condition of RedremovingMRLR. Here, R(X) denotes the classification accuracy of the employed classifier on a selected subset or dataset X.

The time cost of the RedremovingMRLR algorithm consists mainly of two parts. One is the time to compute the relevance degrees between the features, whose time complexity can be noted as mn log_2 n. The other is the time to obtain the final k for the subset S, which requires k iterations; its time complexity is therefore kmn log_2 n, and, with k treated as a constant, the total time complexity of RedremovingMRLR is O(mn log_2 n). This is the same complexity as MIFS and MIFS-U, which clearly shows that the RedremovingMRLR algorithm realizes nominal-data feature selection without increasing the time complexity. However, the RedremovingMRLR algorithm is not always the perfect approach: if an extreme feature f_i is selected as the first feature in subset S, the algorithm may fail to obtain a prime result. Overall, it can be seen that the new evaluation function in the algorithm has strong application flexibility.

The RedremovingMRLR algorithm also employs the methods in MIFS (Battiti, 1994) and in Hu, Xie & Yu (2007) to compute H(f_i) and H(f_s) in the related formulas. MIFS (Battiti, 1994) and Hu, Xie & Yu (2007) focused mainly on feature selection for continuous-attributed data: equal-probability discretization is applied to the continuous features, and the information entropy of each feature is obtained by summation after discretization. Because nominal data are already discrete, it is feasible to compute H(f_i) and H(f_s) directly on the nominal features and to replace them equivalently in the formulas. Therefore, the formulas can be taken directly as the evaluation methods for the redundancy between the candidate features and the selected nominal-data features.

RESULTS AND DISCUSSION

Experimental data sets

In this section, experiments are performed on three benchmarking nominal datasets (Table 1) from the UCI machine learning repository (Blake & Merz, 2013). In each problem, all of the patterns that have missing feature values are removed beforehand. The dataset king-rook vs. king-pawn is abbreviated as krvs.

Table 1: The benchmarking nominal datasets employed in this manuscript.

Dataset    Number of features    Number of patterns    Classes
Soybean    35                    307                   19
Vehicle    18                    946                   4
krvs       36                    3,196                 2

Experimental results

In this section, we arrange two experiments to test the feature selection performance, the redundancy-removing capability and the robustness of RedremovingMRLR on nominal data. The decision tree classifier (Brodley & Utgoff, 1995) and the naive Bayes classifier (Liu, 2003) are employed to evaluate the nominal-data subsets.
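The stopping rule for k described in 'The proposed algorithms' relies on exactly this classifier accuracy R(·). The sketch below is one possible reading of that rule, under our assumptions: a hypothetical `accuracy(subset)` helper returns R(·) for the chosen classifier on the given feature subset, and the ratio R(S)/|S| is checked at every step rather than only at accuracy inflection points.

```python
def choose_k(ranked_features, all_features, accuracy):
    """Grow the subset in the order ranked by the evaluation function and stop
    once the accuracy-per-feature ratio R(S)/|S| would fall below R(F)/|F|."""
    baseline = accuracy(all_features) / len(all_features)   # R(F) / |F|
    subset = []
    for f in ranked_features:
        candidate = subset + [f]
        if accuracy(candidate) / len(candidate) >= baseline:
            subset = candidate            # ratio still acceptable, keep growing
        else:
            break                         # inflection point reached, stop here
    return len(subset)                    # k = |S|
```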
A comparative study between the different algorithms is also performed in terms of three aspects: (1) the number of finally selected features; (2) the classification accuracy of the selected feature subset under the different employed classifiers; and (3) the performance of the classification models built by the different classifiers, considering both the feature selection complexity and the classification accuracy. Starting from the feature subset selected by each compared algorithm, candidate features are added one by one until the entire raw dataset is covered, and a classification experiment is conducted to evaluate the performance of the algorithm. For NMIFS, the best parameter is selected.

Experiment 1. This experiment aims to quantitatively evaluate the applicability and effectiveness of RedremovingMRLR; a comparative study among RedremovingMRLR, MRLR and the classical NMIFS is also performed. Several high-dimensional datasets (Table 1) with redundant features (Chow, Wang & Ma, 2008; Tang & Mao, 2007; Li, Yang & Gu, 2013; Chert & Yang, 2009) from UCI (Blake & Merz, 2013) are employed to evaluate the feature selection capability of the RedremovingMRLR, MRLR and NMIFS algorithms, and the naive Bayes classifier and the decision tree classifier are used to test the effectiveness of the subsets selected by the different algorithms. This experiment thus illustrates the performance of RedremovingMRLR, MRLR and NMIFS on redundant-featured datasets. The experimental results are listed in Tables 2, 3 and 4.

Table 2: The subsets selected by the RedremovingMRLR, MRLR and NMIFS algorithms (number of features).

Dataset    Full set    MRLR    NMIFS    RedremovingMRLR
Soybean    35          28      24       18
Vehicle    18          13      16       14
Krvs       36          10      9        8

Table 3: The classification accuracy on the selected subsets of the datasets in Table 1, obtained by the decision tree classifier.

Dataset    NMIFS (%)            MRLR (%)             RedremovingMRLR (%)
Soybean    90.4255 ± 0.0835     91.1032 ± 0.1132     92.0213 ± 0.0653
Vehicle    70.8034 ± 0.0584     74.1135 ± 0.0452     74.3499 ± 0.0621
Krvs       99.2028 ± 0.1441     99.0257 ± 0.2167     99.2028 ± 0.1366

Table 4: The classification accuracy on the selected subsets of the datasets in Table 1, obtained by the naive Bayes classifier.

Dataset    NMIFS (%)            MRLR (%)             RedremovingMRLR (%)
Soybean    91.1348 ± 0.1567     91.8149 ± 0.0203     90.9574 ± 0.245
Vehicle    45.7447 ± 0.0326     48.5814 ± 0.1095     45.9811 ± 0.0109
Krvs       89.2826 ± 0.2912     93.4455 ± 0.0712     95.9256 ± 0.296

Table 2 lists the final feature subsets selected by the RedremovingMRLR, MRLR and NMIFS algorithms. The results show that the RedremovingMRLR, NMIFS and MRLR algorithms all have a strong feature-selecting capability for high-dimensional, redundant-featured datasets. From these results we obtain nine new subsets, Y_i ∈ {Y_1, Y_2, ..., Y_9}, i ∈ [1,9], corresponding to the RedremovingMRLR, MRLR and NMIFS algorithms on the three datasets; we call them the basic feature-selected subsets in what follows. On the vehicle dataset, the final subset selected by the RedremovingMRLR algorithm has one more dimension than that of the MRLR algorithm. On the whole, however, Table 2 illustrates that the performance of RedremovingMRLR is superior to MRLR and NMIFS in terms of feature-selecting capability.
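One way to realize the subset evaluation (and the `accuracy(·)` helper sketched earlier) is with scikit-learn; the sketch below is ours, not the authors' code. Nominal values are one-hot encoded for a decision tree, and the accuracy is estimated with 10-fold cross-validation repeated three times, matching the protocol reported for Tables 3 and 4; the file and column names in the usage comment are placeholders.

```python
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

def evaluate_subset(frame, subset, label_column):
    """Mean classification accuracy of a decision tree on the given nominal
    feature subset, estimated by 10-fold cross-validation repeated 3 times."""
    X, y = frame[subset], frame[label_column]
    model = make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),   # nominal values -> indicator columns
        DecisionTreeClassifier(random_state=0),
    )
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()

# Hypothetical usage (file and column names are placeholders):
# data = pd.read_csv("soybean.csv")
# print(evaluate_subset(data, ["leaf-shape", "stem-cankers"], "class"))
```

For the naive Bayes runs, the decision tree step could be swapped for sklearn.naive_bayes.CategoricalNB, with an OrdinalEncoder in place of the one-hot step.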
Table 3 lists the final classification accuracies obtained by the decision tree classifier on the nine basic feature-selected subsets described above, and Table 4 lists the final classification accuracies obtained by the naive Bayes classifier on the same nine subsets. For Tables 3 and 4, to make the comparative study fair, the classification accuracy is estimated as the average over 10 sets of classification results, using 10-fold cross-validation repeated 3 times.

From Table 3, for the decision tree classifier, the classification accuracy of the RedremovingMRLR algorithm on the different basic subsets is the best among the three compared algorithms. RedremovingMRLR thus demonstrates its advantage over the MRLR and NMIFS algorithms in terms of feature-selecting capability and effectiveness. Furthermore, the results show that the decision tree classifier is appropriate for classifying nominal data.

For the naive Bayes classifier, the classification accuracy of RedremovingMRLR on the krvs subset is higher than that of the other two compared algorithms; the main reason is that there exists not only redundancy but also strong relevance between the features of the krvs dataset (Tang & Mao, 2005; Tang & Mao, 2007). In addition, the classification accuracy of RedremovingMRLR on the soybean subset is lower than that of the other two compared algorithms, whereas on the vehicle subset it is higher than that of the NMIFS algorithm and lower than that of the MRLR algorithm. From this analysis, we find that the main reason for these distinct differences is the classification principle implemented in the classifiers, namely the different theoretical bases underlying the different classifiers. The decision tree classifier operates primarily by choosing classification features according to the information they provide, for example by selecting one or a couple of key features as the root node, and then classifies the data items into different classes along the branches of the tree built from the dataset. Moreover, when each feature in the selected subsets is essentially independent of the others, which one acts as the root node of the tree should not affect the final classification results; the more independent the selected features are, the less influence this choice has on the classification results. The naive Bayes classifier, on the other hand, classifies the data items according to the probability densities of the values of the different features; because the probability density of the features differs between the raw dataset and the selected subset, before and after feature selection, the classification results are diverse. As a result, the difference in the results is due to the different classification principles of the classifiers rather than to the feature selection mechanism. The results therefore reflect two factors: (1) the specificity of nominal data, with its non-metric, unordered and disparate characteristics; and (2) the decision tree classifier being more suitable for classifying nominal data.

Experiment 2. This experiment aims to evaluate the efficiency and robustness of RedremovingMRLR; a comparative study among RedremovingMRLR, MRLR and the classical NMIFS is also performed.
The same data as in Experiment 1, i.e., the basic feature-selected subsets Y_i, i ∈ [1,9], are the subject of experimentation. Here, the experimental examples at each step are generated from one of the basic feature-selected subsets (from Experiment 1) by adding the remaining features one at a time, in descending order of their evaluation-function values from Formula (8) or (9). A series of new subsets is thus formed, one after each feature is added; to describe the context clearly, we call them temporary subsets. For each temporary subset, the decision tree classifier and the naive Bayes classifier are trained, so two classification results are obtained at each step, until all of the features have been added. The experimental results are shown in Figs. 1-3. Furthermore, to make the comparative study fair, the average of 10 instances of classification results is taken as the estimate of the classification accuracy, using 10-fold cross-validation on each temporary subset for each algorithm.

Figure 1: The generalization classification accuracy on soybean with different classifiers. (A) Accuracy results on soybean with the decision tree classifier; (B) accuracy results on soybean with the naive Bayes classifier.

Figures 1A and 1B show the classification results on the soybean dataset for the decision tree classifier and the naive Bayes classifier, respectively. Both classifiers approximately achieve their highest classification accuracy on the basic selected subset itself, before any other features are added. From these figures, we can readily see that RedremovingMRLR has the best classification accuracies in most cases and remains competitive with NMIFS and/or MRLR even when it cannot achieve the best classification results. The classification accuracy of RedremovingMRLR also remains stable throughout. This indicates the advantage of RedremovingMRLR over NMIFS and MRLR in terms of robustness, in an average sense, and shows that RedremovingMRLR can resist outer interference.

Figure 2: The generalization classification accuracy on vehicle with different classifiers. (A) Accuracy results on vehicle with the decision tree classifier; (B) accuracy results on vehicle with the naive Bayes classifier.

Figure 2A shows the generalization accuracy of the decision tree classifier using as inputs the temporary subsets of features selected by RedremovingMRLR, NMIFS and MRLR. It can be seen that the best results are obtained with RedremovingMRLR for 14 or more features, and each algorithm achieves its best classification accuracy near the size of its basic selected subset; RedremovingMRLR outperforms NMIFS and MRLR for any number of features. Figure 2B shows the generalization accuracy of the naive Bayes classifier for the RedremovingMRLR, NMIFS and MRLR algorithms; here MRLR outperforms NMIFS and RedremovingMRLR for any number of features. Figure 2 thus presents the whole accuracy curves on the vehicle dataset for the three algorithms with the two classifiers. It can easily be seen that the variation tendency of the classification accuracy is the same or similar, and after reaching the full feature set the classification accuracy of the different algorithms is the same. From these experimental results, we find that (1) RedremovingMRLR, MRLR and NMIFS all have redundancy-distinguishing capabilities; (2) the theoretical basis of the classifiers certainly affects the classification results, but only to a limited extent; and (3) little interdependency exists in the vehicle dataset besides the redundancy.
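A sketch of the incremental protocol described at the start of this experiment: starting from a basic feature-selected subset, the remaining features are appended one at a time in descending order of their evaluation value, and each temporary subset is re-scored. The `evaluate_subset` helper from the earlier scikit-learn sketch (or any equivalent scorer) is passed in, so the names here are ours.

```python
def accuracy_curve(frame, basic_subset, ranked_rest, label_column, evaluate_subset):
    """Return a list of (number of features, accuracy) pairs as the remaining
    features are appended one by one to the basic selected subset."""
    subset = list(basic_subset)
    curve = [(len(subset), evaluate_subset(frame, subset, label_column))]
    for feature in ranked_rest:               # descending evaluation-function order
        subset.append(feature)
        curve.append((len(subset), evaluate_subset(frame, subset, label_column)))
    return curve
```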
Figure 3: The generalization classification accuracy on krvs with different classifiers. (A) Accuracy results on krvs with the decision tree classifier; (B) accuracy results on krvs with the naive Bayes classifier.

Figure 3 shows the best results obtained around the sizes of the basic feature-selected subsets for each compared algorithm. For the two classifiers, the classification results using the 8 features selected by RedremovingMRLR are even better than those using the entire dataset. Figure 3A shows that the changing trend of the classification accuracy of RedremovingMRLR is similar to that of the MRLR algorithm, and it retains a relatively high classification rate. Although NMIFS behaves better for fewer than ten features at the beginning, it also achieves relatively good classification rates after ten features; in the interval from 10 to 20 features, however, NMIFS performs badly because it selects some irrelevant or redundant features earlier than the relevant ones. Figure 3B shows that the variation trend of the classification accuracy of the three compared algorithms is similar; on the whole, they are in the desired descending order. After 14 features, the classification accuracy of the NMIFS algorithm is far lower than that of the other compared algorithms, which indicates that the performance of NMIFS is influenced by the order in which the relevant, redundant and irrelevant features are selected.

On the whole, from Tables 2, 3 and 4 and Figs. 1-3, apart from some wavering in Fig. 3B on the krvs dataset with the naive Bayes classifier for the RedremovingMRLR algorithm, RedremovingMRLR outperforms MRLR and NMIFS with and without mutations, finding the best solution with a smaller number of features. The classification accuracy using the 8 features selected by RedremovingMRLR is even better than that using the entire dataset. From these experimental results, we find that (1) RedremovingMRLR always selects the features in the ideal selection order: first the relevant features, in the desired descending order, second the redundant features, and last the irrelevant features, rather than the converse order; in some krvs-like datasets, both MRLR and NMIFS selected some irrelevant features earlier than the redundant features because they penalize redundancy too heavily; (2) the experimental results
here indicate that RedremovingMRLR can be applied effectively to nominal datasets with high-dimensional features and has a relatively strong redundancy-recognizing capability; and (3) the feature selection strategy utilized in RedremovingMRLR is practical, and the redundancy measure operator expressed in Formula (7), together with its modifications in Formulas (8) and (9), makes it robust.

CONCLUSIONS

In this paper, the novel algorithm RedremovingMRLR, a method that aims to select features for nominal data, is proposed. The virtues of the proposed algorithm can be summarized as follows. (1) By forming several new information-related definitions for nominal data, such as the information amount, the conditional mutual information and the relevance degree, a series of corresponding improvements in their computation methods are presented; with these, the RedremovingMRLR algorithm takes the commonly used MIFS-like form, which enhances its feature selection performance and effectiveness. (2) A reasonable evaluation function makes the proposed algorithm well suited to selecting features from nominal data; moreover, the computational complexity does not increase, and feature selection for nominal data becomes easier. (3) By considering relevance and redundancy globally and rewriting the evaluation function of NMIFS (Estévez, Tesmer & Perez, 2009) for use in RedremovingMRLR, its redundancy-removing capability and robustness are enhanced. (4) Our experimental results demonstrate the average advantage of RedremovingMRLR over the MRLR and NMIFS algorithms in terms of the size of the selected feature subset, the feature efficiency and the classification accuracy.

Improvements on the proposed methods will require further study. An estimation method for the MI of nominal data should be developed in the future rather than employing methods from other work. The feature selection capability for nominal data with noisy and mixed features, as well as the improvement of the corresponding algorithms, will be investigated in succeeding studies.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work is supported by the Future Research Projects Funds for the Science and Technology Department of Jiangsu Province (Grant No. BY2013015-23) and the Fundamental Research Funds for the Ministry of Education (Grant No. JUSRP211A 41). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
Science and Technology Department of Jiangsu Province: BY2013015-23.
Fundamental Research Funds for the Ministry of Education: JUSRP211A 41.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Zhihua Li contributed reagents/materials/analysis tools, wrote the paper, and reviewed drafts of the paper.
• Wenqu Gu conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, and performed the computation work.

Data Availability
The following information was supplied regarding data availability:
UCI repository of machine learning database: http://www.ics.uci.edu/~mlearn/MLRepository.
Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.24#supplemental-information.

REFERENCES

Almuallim H, Dietterich TG. 1991. Learning with many irrelevant features. In: Proceedings of the 9th national conference on artificial intelligence. Palo Alto: AAAI Press, 547–552.
Battiti R. 1994. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5(4):537–550 DOI 10.1109/72.298224.
Blake C, Merz C. 2013. UCI repository of machine learning database [EB/OL]. Available at http://www.ics.uci.edu/~mlearn/MLRepository.
Brodley CE, Utgoff PE. 1995. Multivariate decision trees. Machine Learning 19(1):45–77.
Chert J, Yang Z. 2009. An incremental clustering with attribute unbalance considered for categorical data. In: Computational intelligence and intelligent systems: 4th international symposium, ISICA 2009, Huangshi, China. Berlin, Heidelberg: Springer, 420–433.
Chow TWS, Wang P, Ma EWM. 2008. A new feature selection scheme using a data distribution factor for unsupervised nominal data. IEEE Transactions on Systems, Man, and Cybernetics, Part B 38(2):499–509 DOI 10.1109/TSMCB.2007.914707.
Estévez PA, Tesmer M, Perez CA et al. 2009. Normalized mutual information feature selection. IEEE Transactions on Neural Networks 20(2):189–201 DOI 10.1109/TNN.2008.2005601.
Gu W, Li Z. 2013. Mutual information-based feature selection algorithm for nominal data. Computer Engineering and Applications, online. Available at http://jsgg.chinajournal.net.cn/WKC/WebPublication/paperDigest.aspx.
Hou C, Nie F, Li X et al. 2014. Joint embedding learning and sparse regression: a framework for unsupervised feature selection. IEEE Transactions on Cybernetics 44(6):793–804 DOI 10.1109/TCYB.2013.2272642.
Hu Q, Xie Z, Yu D. 2007. Hybrid attribute reduction based on a novel fuzzy rough model and information granulation. Pattern Recognition 40(12):3509–3521 DOI 10.1016/j.patcog.2007.03.017.
Jain AK, Duin RPW, Mao J. 2000. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22:4–34 DOI 10.1109/34.824819.
John GH, Kohavi R, Pfleger K. 1994. Irrelevant features and the subset selection problem. In: Machine learning: proceedings of the 11th international conference, 121–129.
Kira K, Rendel LA. 1992. The feature selection problem: traditional methods and a new algorithm. In: AAAI-92 proceedings. Palo Alto: AAAI Press, 129–134.
Kwak N, Choi C-H. 2002. Input feature selection for classification problems. IEEE Transactions on Neural Networks 13(1):143–159 DOI 10.1109/72.977291.
Kwak N, Choi C-H. 2002. Input feature selection by mutual information based on Parzen window. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1667–1671 DOI 10.1109/TPAMI.2002.1114861.
Liu Z. 2003. Construction of Bayesian networks based on mutual information. Dissertation. Shanghai: Fudan University, 23–25.
Li Z, Yang X, Gu W et al. 2013. Kernel-improved support vector machine for semanteme data. Applied Mathematics and Computation 219:8876–8880 DOI 10.1016/j.amc.2013.03.069.
Minho K, Ramakrishna RS. 2009. Projected clustering for categorical datasets. Pattern Recognition Letters 27:1405–1417 DOI 10.1016/j.patrec.2006.01.011.
Modrzejejew M. 1993. Feature selection using rough sets theory. In: Proceedings of the European conference on machine learning. Berlin, Heidelberg: Springer, 213–216.
Peng H, Long F, Ding C. 2005. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8):1226–1238 DOI 10.1109/TPAMI.2005.159.
Tang W, Mao K. 2005. Feature selection algorithm for data with both nominal and continuous features. In: Advances in knowledge discovery and data mining: proceedings of the 9th Pacific-Asia conference, PAKDD 2005, Hanoi, Vietnam, May 18–20, 2005. Berlin, Heidelberg: Springer, 683–688.
Tang W, Mao K. 2007. Feature selection algorithm for mixed data with both nominal and continuous features. Pattern Recognition Letters 28(5):563–571 DOI 10.1016/j.patrec.2006.10.008.
Tesmer M, Estévez PA. 2004. AMIFS: adaptive feature selection by using mutual information. In: Proceedings of the 2004 IEEE international joint conference on neural networks, vol. 1. 303–308.
Torkkola K. 2003. Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research 3:1415–1438.