key: cord-0045469-fm0u1zm7
authors: Lazo-Cortés, Manuel S.; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús A.; Almanza-Ortega, Nelva N.
title: Towards Selecting Reducts for Building Decision Rules for Rule-Based Classifiers
date: 2020-04-29
journal: Pattern Recognition
DOI: 10.1007/978-3-030-49076-8_7
sha: 6728459031be3cf442f2a5e1bf2be39995537a4f
doc_id: 45469
cord_uid: fm0u1zm7

In rule-based classifiers, calculating all possible rules of a learning sample consumes many resources due to its exponential complexity. Therefore, finding ways to reduce the number and length of the rules without affecting the efficacy of a classifier remains an interesting problem. Reducts from rough set theory have been used to build rule-based classifiers because of their conciseness and understandability. However, the accuracy of the classifiers based on these rules depends on the selected rule subset. In this work, we focus on analyzing three different options for using reducts to build decision rules for rule-based classifiers.

Rule-based classification is a Data Mining technique that consists in, given a set of training instances, identifying certain characteristics of the instances to construct rules that are later used for classifying new instances. Rule-based classifiers are easy to interpret, easy to generate, and can correctly classify new instances. They are also extremely expressive, since they are symbolic and operate on data attributes without any transformation.

Calculating all possible rules from a training sample requires many computational resources due to its exponential complexity. So, finding ways to reduce the number of rules without affecting the accuracy of a classifier is still an open research problem; see for example [7].

On the other hand, feature selection is a significant task in supervised classification and other pattern recognition problems, focused on eliminating irrelevant and/or redundant attributes [10]. It consists in selecting subsets of the whole set of attributes in order to reduce the dimension of the representation space according to certain criteria. The objective of reducing the dimension is to find a minimum (or almost minimum) set of attributes that retains all the essential information of the training sample for further classification or description tasks. Reducing the dimension can also help to reduce the number of generated rules, as well as their length; such rules are simpler and easier to interpret. Hence, in practical applications, minimum length descriptions are preferred (see for example [1, 5, 6, 16]).

Reducts have been used to build rule-based classifiers [8, 9]. A reduct is a minimal subset of attributes that retains the same capacity to discern between objects belonging to different classes as the whole set of attributes [12]. Decision rules derived from reducts are useful in practice because of their conciseness and understandability. Nevertheless, the number of reducts is usually too high and, consequently, so is the number of rules.

In this work, we focus our effort on analyzing the following three questions about using reducts for building rules in rule-based classifiers: Should we use all the reducts? Is it enough to use a single reduct? Is it enough to use only the shortest reducts? Here, we present a controlled experiment as a first approach to discussing these questions.

The rest of the document is organized as follows. Section 2 provides some preliminary concepts. In Sect. 3, we present the experiments and discuss the results. Our conclusions are summarized in Sect. 4.

In this section, we present some definitions and notations to make the paper more understandable. In Rough Set Theory [12], the main data representation is a decision table, which is a special case of an information system in which the attributes are split into a set of condition attributes A_t and a decision attribute d that assigns each object of the universe U to a class. It is important to introduce the definition of the indiscernibility relation. For a subset of attributes B ⊆ A_t,

IND(B) = {(u, v) ∈ U × U : a(u) = a(v) for every a ∈ B},

where a(u) denotes the value of the attribute a for the object u. We write IND(B|d) for the relative (decision-aware) version of this relation, which keeps only the pairs of objects belonging to different classes that B fails to discern.

We can find several definitions of reduct (see for example [11]); nevertheless, according to the aim of this paper, we refer to reducts assuming the classical definition of discerning decision reduct [13], as follows. A subset of attributes R ⊆ A_t is a reduct if and only if:

(i) IND(R|d) = IND(A_t|d);
(ii) for any a ∈ R, IND((R − {a})|d) ≠ IND(A_t|d).

This definition ensures that a reduct has no lower ability to distinguish objects belonging to different classes than the whole set of attributes, while being minimal with regard to inclusion; i.e., a reduct does not contain redundant attributes or, equivalently, a reduct does not properly contain any other super-reduct. The original idea of reduct is based on inter-class comparisons. If in a decision table M we keep only the columns belonging to a reduct, one can easily see that the classes remain distinguishable and that each column is essential for that purpose; M_1, M_2, M_3 and M_4 denote the decision tables resulting from reducing the representation space to each one of the reducts of M, respectively.
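To make the two conditions of the definition concrete, the following is a minimal brute-force sketch (not part of the original paper); the toy decision table and attribute names are illustrative assumptions.

```python
from itertools import combinations

def undiscerned(table, attrs, decision):
    """Pairs of objects from different classes that attrs fails to discern,
    i.e., the relative indiscernibility relation IND(attrs|d)."""
    n = len(table)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if table[i][decision] != table[j][decision]
            and all(table[i][a] == table[j][a] for a in attrs)}

def is_reduct(table, subset, all_attrs, decision):
    full = undiscerned(table, all_attrs, decision)
    # (i) same discernibility capacity as the whole attribute set
    if undiscerned(table, subset, decision) != full:
        return False
    # (ii) dropping any single attribute loses that capacity (minimality)
    return all(undiscerned(table, [a for a in subset if a != x], decision) != full
               for x in subset)

# Toy decision table: condition attributes a1..a3, decision attribute d
table = [{"a1": 0, "a2": 1, "a3": 0, "d": "yes"},
         {"a1": 0, "a2": 0, "a3": 1, "d": "no"},
         {"a1": 1, "a2": 1, "a3": 0, "d": "no"}]
attrs = ["a1", "a2", "a3"]
print([set(c) for k in range(1, len(attrs) + 1)
       for c in combinations(attrs, k)
       if is_reduct(table, list(c), attrs, "d")])  # the reducts: {a1, a2} and {a1, a3}
```

Note that the full attribute set itself fails condition (ii) here: it discerns all cross-class pairs, but a3 can be dropped without losing that capacity, so it is a super-reduct rather than a reduct.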
The generation of effective rules is essential for the development of any classifier that is easily understandable by the user. Any mechanism used for rule generation must maintain the underlying semantics of the feature set. Typically, rule-based classifiers are sensitive to the dimensionality of the dataset, since a large number of superfluous or redundant rules may appear. This makes it advisable to try to reduce the dimensionality of the data and/or the length or complexity of the rules, so that the resulting set of learned rules becomes manageable and can surpass the classification results obtained by using rules containing too many attributes.

To build the sets of decision rules used in our rule-based classifiers, we used the tools included in the software RSES ver. 2.2.2 [4], which has been widely used in the literature; see for example [3, 14, 15]. In RSES, once the reducts of a decision table have been computed, each object in the training sample is matched against each reduct. This matching produces a rule having, in its conditional part, the attributes of the reduct, each one associated with its value in the currently considered object, and, in its decision part, the class of this training object.

When classifying an unseen object through the generated rule set, it may happen that several rules suggest different decision values. In such conflict situations, a strategy to reach a final decision is needed. RSES provides a conflict resolution strategy based on voting. In this method, when the antecedent of a rule matches the unseen object, a vote in favor of the decision value of its consequent is cast. Votes are counted, and the decision value reaching the majority of the votes is chosen as the class for the unseen object. This simple method may be extended by assigning weights to rules. In RSES, this method (known as Standard Voting) assigns as the weight of a rule the number of training objects matching the antecedent of this rule. Then, each rule votes with its weight, and the decision value reaching the highest weight sum is assigned as the class of the object.
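The weighted voting scheme can be illustrated with the following minimal sketch (our own illustration, not RSES code; the rule representation and the toy data are assumptions):

```python
from collections import defaultdict

def weight(antecedent, training):
    # Standard Voting: a rule weighs as much as the number of
    # training objects that match its antecedent
    return sum(all(obj[a] == v for a, v in antecedent) for obj in training)

def standard_voting(obj, rules, training):
    votes = defaultdict(int)
    for antecedent, cls in rules:
        if all(obj[a] == v for a, v in antecedent):   # the rule fires
            votes[cls] += weight(antecedent, training)
    return max(votes, key=votes.get) if votes else None

# Toy rules: (antecedent as (attribute, value) pairs, decision class)
training = [{"a1": 0, "a2": 1, "d": "yes"},
            {"a1": 0, "a2": 0, "d": "no"},
            {"a1": 1, "a2": 1, "d": "no"}]
rules = [((("a1", 0), ("a2", 1)), "yes"),
         ((("a1", 1),), "no"),
         ((("a2", 0),), "no")]
print(standard_voting({"a1": 0, "a2": 1}, rules, training))  # -> yes
```

With unit weights this reduces to the plain majority voting described first; the weighting simply lets rules with broader support in the training fold count for more.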
Obviously, we can generate a set of rules from any subset of attributes or from any collection of attribute subsets. For our study, we use, as such collections, the set of all the reducts, each reduct individually, and the set of all the shortest reducts.

In this section, we show our experimentation under controlled conditions, as a first approach to studying three different options for using reducts to build decision rules in rule-based classifiers. For having controlled conditions in our experiments, we used four datasets (see Table 1) taken from the UCI Machine Learning Repository [2]. We selected these datasets because they are small, they have a small number of reducts, not all of their reducts have the same length, and the minimum length is reached by more than one reduct. The datasets Glass and Heart (Statlog) were previously discretized.

All datasets were split into two folds, one for generating rules and the other for testing, using a ratio of 0.5. For each dataset, the whole set of reducts for the training fold was computed using RSES [18]. Table 1 shows the characteristics of the reducts: the third column contains the number of reducts; the next three columns contain the maximum, minimum and average length, respectively; and the last column contains the number of shortest reducts. Figure 1 shows a screenshot of RSES for one of the projects executed in the study of the selected databases.

After computing the reducts, and again using RSES, the aforementioned sets of rules were generated: first using all the reducts, then using each reduct separately, and finally using only the shortest reducts. To include an external selection criterion, the CAMARDF algorithm [19], which generates a single minimum length reduct, was also applied, and the set of rules obtained from that reduct was considered separately. Each testing fold was then classified using the rule-based algorithm Standard Voting, taking into account four different cases (a sketch comparing these variants follows the list):

1. the rules generated by all the reducts;
2. the rules generated by each reduct individually;
3. the rules generated by all the shortest reducts; and
4. the rules generated by the single shortest reduct obtained by the CAMARDF algorithm.
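Putting the pieces together, the following runnable sketch (our own reconstruction, not RSES or CAMARDF output) builds the rules RSES-style, one per (training object, reduct) pair, and compares the variants under Standard Voting; the toy folds and the hard-coded attribute subsets standing in for reducts are illustrative assumptions.

```python
from collections import defaultdict

def rules_from_reducts(train, reducts, decision="d"):
    # RSES-style rule generation: one rule per (training object, reduct) pair
    return {(tuple(sorted((a, obj[a]) for a in r)), obj[decision])
            for r in reducts for obj in train}

def accuracy(rules, train, test, decision="d"):
    def weight(ant):  # Standard Voting weight
        return sum(all(o[a] == v for a, v in ant) for o in train)
    hits = 0
    for obj in test:
        votes = defaultdict(int)
        for ant, cls in rules:
            if all(obj[a] == v for a, v in ant):
                votes[cls] += weight(ant)
        hits += bool(votes) and max(votes, key=votes.get) == obj[decision]
    return hits / len(test)

# Toy folds and "reducts" (in the paper, RSES computed the real ones)
train = [{"a1": 0, "a2": 1, "a3": 0, "d": "yes"},
         {"a1": 0, "a2": 0, "a3": 1, "d": "no"},
         {"a1": 1, "a2": 1, "a3": 1, "d": "no"}]
test = [{"a1": 0, "a2": 1, "a3": 1, "d": "yes"},
        {"a1": 1, "a2": 0, "a3": 1, "d": "no"}]
reducts = [["a1", "a2"], ["a3"]]
shortest = [r for r in reducts if len(r) == min(map(len, reducts))]

print("all:     ", accuracy(rules_from_reducts(train, reducts), train, test))
print("each:    ", [accuracy(rules_from_reducts(train, [r]), train, test) for r in reducts])
print("shortest:", accuracy(rules_from_reducts(train, shortest), train, test))
```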
Table 2 shows the obtained results. The columns headed all, shortest and CAMARDF contain the accuracies obtained for variants 1, 3 and 4, respectively. The remaining columns contain the maximum and minimum accuracy values obtained when applying variant 2.

As we can see from Table 2, for the second and third databases, the best result is achieved when considering the rules obtained from all reducts. In the case of Pima-diabetes, this result is also obtained if the rules generated by the shortest reducts are considered. For the Glass dataset, the best accuracy was achieved by the classifier built with the shortest reducts. However, in the case of the Zoo dataset, neither the rule-based classifier built with all the reducts nor the classifier built with the shortest ones obtained the highest result. For this dataset, the best result was obtained twice, each time by a classifier based on an individual reduct, but in neither case was it obtained by using one of the shortest reducts.

Taking into account that this dataset has 34 reducts, of which only those two yield a classifier of maximum accuracy, if we randomly choose a reduct to generate the rules for a classifier, the probability of building a classifier of maximum accuracy is 2/34 ≈ 0.06. If we decide to choose one of the minimum length reducts, then this probability is 0. Sil and Das [17] apparently report a different result for the Zoo dataset using the 10-fold cross-validation technique, although the authors' use of the term minimum length reducts is confusing: they state that they obtain the minimum length reducts by eliminating the redundant attributes of each of the reducts, which is unclear, since by definition a reduct contains no redundant attributes. Finally, if we decide to select just one shortest reduct to build the classifier, using the CAMARDF algorithm, we would never achieve the best result for any of the four datasets.

The problem of how to choose the best rules when building a rule-based classifier remains unsolved. Although some authors dismiss the task of computing all the reducts of a dataset (perhaps deterred by its excessive resource requirements, given its exponential complexity), our preliminary experiments allow us to conclude that, in certain cases, computing a single reduct, or considering only the reducts of minimum length, can yield results inferior to those obtained from all the reducts. These results do not lead us to suggest that all reducts should be computed in every problem. More than anything else, our purpose is to establish that this is an open matter and to emphasize that it is worth continuing to investigate effective strategies for the selection of rules, especially when their construction relies on reducts, given the properties of reducts with respect to the ability to discern between objects of different classes.

References
Binary butterfly optimization approaches for feature selection
Rough set based segmentation and classification model for ECG
The rough set exploration system
A data sampling and attribute selection strategy for improving decision tree construction
Model selection and the principle of minimum description length
A data reduction strategy and its application on scan and backscatter detection using rule-based classifiers
Class-specific reducts vs. classic reducts in a rule-based classifier: a case study
On the use of constructs for rule-based classification: a case study
Computational Methods of Feature Selection
Reducts in consistent and inconsistent decision tables of the Pawlak rough set model
Rough sets
Rough Sets: Theoretical Aspects of Reasoning About Data
A rough set theory approach for rule generation and validation using RSES
A comparative study based on rough set and classification via clustering approaches to handle incomplete data to predict learning styles
Association rules mining among interests and applications for users on social networks
Variable length reduct vs. minimum length reduct: a comparative study
Rough Set Exploration System
Research on complete algorithms for minimal attribute reduction