1 Introduction

A classification problem [1] belongs to the research field of data mining [2] and can be tackled in two different ways. One approach is known as supervised learning, where a function (classifier) is generated from the available labeled data (classes). Then, when a new example needs to be classified, the learned classifier performs the prediction.

In the literature it is possible to find several methods that cope with these problems using supervised learning, such as Support Vector Machines (SVMs) [3], decision trees [4] and neural networks [5]. Here, the focus is on Fuzzy Rule-Based Classification Systems (FRBCSs) [6], because they provide the user with interpretable models by using linguistic labels [7] in their rules. Another reason is their accuracy and versatility, as shown by the many different fields where they have been applied, such as health [8], security [9], economy [10] and the food industry [11].

An important role in any FRBCS is played by the Fuzzy Reasoning Method (FRM) [12]. This method is responsible for classifying new examples, making use of the information available in the rule base and the database. To perform the classification, this mechanism applies an aggregation operator to combine, by classes, the information provided by the fuzzy rules fired by the new example.

A widely used FRM considers the Maximum function as aggregation operator. Using this aggregation function, for each class, the FRM selects the best fired rule, that is, the one having the highest compatibility with the example [13]. The issue with this inference method is that the information provided by the remaining fired fuzzy rules is ignored. The Maximum is an averaging aggregation operator, since the obtained result lies within the range between the minimum and the maximum of the aggregated values (in this case, obviously, the result is always the maximum).

To avoid the problem of ignoring information, an FRM was proposed that applies the normalized sum [12] to aggregate the available information given by the fired rules. In this way, for each class, all the information is taken into account in the aggregation step. This aggregation operator is considered non-averaging, since its result can fall outside the range between the minimum and the maximum.

In [14], the authors introduced an FRM based on the Choquet integral (CI) [15], which is an averaging operator. This approach combines the characteristics of the previous FRMs, considering an averaging operator that uses the information provided by all the fired rules of the system. Moreover, the CI is defined in terms of a fuzzy measure, which gives it the nice property of taking into account the interaction among the data to be aggregated [15].

The objective of this paper is to discuss different methodologies that change the aggregation step performed in the FRM by considering different generalizations of the CI, all supported by solid theoretical studies [16]: the generalizations by t-norms (\(C_T\)-integrals) [17], by copulas (CC-integrals [18,19,20]) and by functions F (\(C_F\)-integrals [21] and \(C_{F_1F_2}\)-integrals [22, 23]). Moreover, for each generalization, a discussion of the main obtained results is provided (we highlight that our focus here is on the main conclusions and not on the specific results of each approach).

This paper is organized as follows. Section 2 presents the main components of an FRBCS, showing an example of how the aggregation function is used in this context. Sections 3–6 discuss the theoretical and applied contributions of different generalizations of the CI. Section 7 presents the detailed results and Section 8 concludes the paper.

2 The Role of Aggregation Functions in the FRM

Fuzzy Rule-Based Classification Systems (FRBCSs) [6] are extensions of rule-based systems that use fuzzy sets in the antecedents of the rules. The best-known FRBCSs are those of Takagi-Sugeno-Kang (TSK) [24] and Mamdani [25]; the latter is the one adopted here. The standard architecture of the Mamdani method is presented in Fig. 1.

Fig. 1. A structure of an FRBCS of the Mamdani type.

The Knowledge Base (KB) is composed of:

Data Base (DB) – Stores the membership functions associated with the linguistic labels considered in the fuzzy rules.

Rule Base (RB) – Is composed of a collection of linguistic fuzzy rules joined by a connective (the and operator). Here we consider that a classification problem is composed of t training patterns \(x_p = (x_{p1}, \ldots , x_{pm}), p = 1, 2, \ldots , t,\) where \(x_{pi}\) is the i-th attribute, with the rules having the following structure:

$$\begin{aligned} \text{ Rule } \; R_j : \text{ If } \, x_1\, \text{ is }\, A_{j1} \, \text{ and } \,\ldots \, \text{ and } \, x_n \, \text{ is } \, A_{jn} \,&\\ \text{ then } \, \text{ Class } \text{ is } \, C_j \, \text{ with }\, RW_j, \nonumber \end{aligned}$$
(1)

where \(R_j\) is the label of the j-th rule, \(A_{ji}\) is a fuzzy set modeling a linguistic term (here, by a triangular membership function), \(C_j\) is the class label and \(RW_j \in [0, 1]\) is the rule weight [26].

The fuzzification interface converts the inputs (real values) into fuzzy values. In the case of categorical variables, each value is modeled by a singleton and, consequently, its membership degree is either 1 or 0. Once the input is fuzzified, the inference process is the mechanism responsible for using the information stored in the KB to determine the class to which the example will be assigned. The generalizations discussed in this paper are applied at this point.
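As an illustration of the fuzzification step, consider the sketch below with triangular membership functions; the linguistic labels and their breakpoints are hypothetical values chosen for the example, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic labels for one attribute normalized to [0, 1].
labels = {
    "Low":    (-0.5, 0.0, 0.5),
    "Medium": (0.0, 0.5, 1.0),
    "High":   (0.5, 1.0, 1.5),
}

def fuzzify(x):
    """Return the membership degree of x to each linguistic label."""
    return {name: triangular(x, *params) for name, params in labels.items()}

print(fuzzify(0.3))  # Low: 0.4, Medium: 0.6, High: 0.0
```

A categorical value would instead be mapped to a singleton, returning 1 for its own label and 0 for the others.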

Once the knowledge has been learnt and a new example \(x_p = (x_{p1}, \ldots , x_{pn})\) has to be classified, the FRM [27] is applied to perform this task, where M is the number of classes of the problem and L is the number of rules that compose the RB. The stages of the FRM are:

Matching Degree: It represents the strength of activation of the if-part of the rules for the example to be classified, \(x_p\), computed using a t-norm as conjunction operator:

$$\begin{aligned} \mu _{A_{j}}(x) = T(\mu _{A_{j1}}(x_{1}), \ldots , \mu _{A_{jn}}(x_{n})). \end{aligned}$$
(2)

with \(j = 1,\ldots , L\), where \(\mu _{A_{ji}}(x_{i})\) is the membership degree of the i-th attribute to the fuzzy set \(A_{ji}\).
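A minimal sketch of Eq. (2), assuming the product t-norm as the conjunction operator; the membership degrees are illustrative values.

```python
from functools import reduce

def product_tnorm(degrees):
    """n-ary extension of the product t-norm: T(x1, ..., xn) = x1 * ... * xn."""
    return reduce(lambda a, b: a * b, degrees, 1.0)

def matching_degree(memberships, tnorm=product_tnorm):
    """Matching degree of an example with the if-part of a rule (Eq. 2).

    `memberships` holds mu_{A_ji}(x_i) for each antecedent of rule R_j.
    """
    return tnorm(memberships)

# Example: a rule with three antecedents and illustrative membership degrees.
mu = [0.8, 0.5, 0.9]
print(matching_degree(mu))  # 0.36 with the product t-norm
```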

Association Degree: For each rule, the matching degree is weighted by its rule weight:

$$\begin{aligned} b^{k}_{j} (x)= \mu _{A_{j}}(x)\cdot RW^{k}_{j}, \end{aligned}$$
(3)

with \(k = Class(R_{j})\) and \(j=1, \ldots , L\).

Example Classification Soundness Degree for All Classes: For each class k, the positive information \(b_j^k(x) > 0\), given by the fired fuzzy rules of the previous step, is aggregated by an aggregation function \(\mathbb {A}\):

$$\begin{aligned} S_{k} (x)= \mathbb {A}_{{k}} \left( b^{k}_{1}(x), \ldots , b^{k}_{L}(x) \right) , \end{aligned}$$
(4)

with \(k = 1, \ldots , M.\)

In what follows, three different well-known FRMs are presented. Observe that their main difference lies in the aggregation function used to combine the information provided by the rules:

Winning Rule (WR) – For each class, it only considers the rule having the maximum compatibility with the example.

$$\begin{aligned} S_{k} (x) = \max _{R_{j_k} \in RB} b^{k}_{j}(x). \end{aligned}$$
(5)

Additive Combination (AC) – It aggregates all the fired rules, for each class k, by using the normalized sum.

$$\begin{aligned} S_{k} (x)= \displaystyle \frac{\sum _{R_{j_k} \in RB}{b^{k}_{j}(x)}}{f_{1_{max}}}, \end{aligned}$$
(6)

where \(f_{1_{max}} = \max _{k=1,\ldots ,M} \sum _{R_{j_k} \in RB}{b^{k}_{j}(x)}\).

The Choquet integral (CI) – It is the function \(\mathfrak {C}_{\mathfrak {m}} : [0,1]^n \rightarrow [0,1]\), defined, for all \(\boldsymbol{x} \in [0,1]^n\), by:

$$\begin{aligned} \mathfrak {C}_{\mathfrak {m}} (\boldsymbol{x}) = \sum _{i=1}^{n} \left( x_{(i)} - x_{(i-1)} \right) \cdot \mathfrak {m}\left( A_{(i)} \right) , \end{aligned}$$
(7)

where \(N=\{1,\ldots , n\}\), \(\mathfrak {m}: 2^{N} \rightarrow [0,1]\) is a fuzzy measure (Footnote 1), \(\left( x_{(1)}, \ldots , x_{(n)}\right) \) is an increasing permutation of the input \(\boldsymbol{x}\), that is, \(0 \le x_{(1)} \le \ldots \le x_{(n)}\), with \(x_{(0)} = 0\), and \(A_{(i)} = \{(i), \dots , (n) \}\) is the subset of indices corresponding to the \(n-i+1\) largest components of \(\boldsymbol{x}\). Then:

$$\begin{aligned} S_{k} (x)= \mathfrak {C}_{\mathfrak {m}}\left( b^{k}_{1}(x), \ldots , b^{k}_{L}(x)\right) , \end{aligned}$$
(8)

where \(\mathfrak {C}_{\mathfrak {m}}\) is the standard CI computed with respect to the fuzzy measure \(\mathfrak {m}\).

Classification: For the final decision, the class with the largest example classification soundness degree is selected, using the function \(F:[0,1]^M \rightarrow \{1, \ldots , M\} \):

$$\begin{aligned} F (S_{1}, \dots , S_{M}) = \arg \max _{k=1,\ldots ,M}(S_{k}). \end{aligned}$$
(9)
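The standard CI of Eq. (7) can be implemented directly. The sketch below assumes the cardinality measure \(\mathfrak{m}(A) = |A|/n\), the fuzzy measure used in the worked example of this section.

```python
def choquet(x, measure=None):
    """Discrete Choquet integral of x in [0,1]^n (Eq. 7).

    `measure(size, n)` returns m(A) for a subset A with `size` elements;
    by default the cardinality measure m(A) = |A| / n is used.
    """
    n = len(x)
    if measure is None:
        measure = lambda size, total: size / total
    xs = sorted(x)                     # increasing permutation x_(1) <= ... <= x_(n)
    total, prev = 0.0, 0.0
    for k, xi in enumerate(xs):
        total += (xi - prev) * measure(n - k, n)   # |A_(i)| = n - i + 1
        prev = xi
    return total

# Averaging behavior: the result stays between min(x) and max(x).
print(round(choquet([0.94, 0.10, 0.25]), 2))   # 0.43
```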

To exemplify the role of different aggregation operators in the FRM, consider a classification problem composed of three classes (\(C_1\), \(C_2\) and \(C_3\)). For each one, three generic fuzzy rules, \(R_a\), \(R_b\) and \(R_c\), are fired when classifying a new example (they can be different for each class). The information about this problem is presented in Table 1; the numbers in this table represent the positive association degrees (Step 2 of the FRM) obtained for each fired rule. Taking into account that three fuzzy rules are fired for each class, three aggregations have to be computed (one per class, by columns).

Table 1. Association degrees for each class.

Since the CI is defined with respect to a fuzzy measure, in this example the standard cardinality measure (see [28]) is considered. The values computed for each class using these three FRMs are the following:

  • C\(_1\):

    – WR = 0.94

    – AC = \(\frac{0.94\,+\,0.1\,+\,0.25}{2.62}\) = 0.49

    – Choquet = \((0.1 - 0)\cdot \frac{3}{3} + (0.25 - 0.1)\cdot \frac{2}{3} + (0.94 - 0.25)\cdot \frac{1}{3}\) = 0.43

  • C\(_2\):

    – WR = 0.4

    – AC = \(\frac{0.15\,+\,0.4\,+\,0.1}{2.62}\) = 0.24

    – Choquet = \((0.1 - 0)\cdot \frac{3}{3} + (0.15 - 0.1)\cdot \frac{2}{3} + (0.4 - 0.15)\cdot \frac{1}{3}\) = 0.21

  • C\(_3\):

    – WR = 0.89

    – AC = \(\frac{0.89\,+\,0.88\,+\,0.85}{2.62}\) = 1.0

    – Choquet = \((0.85 - 0)\cdot \frac{3}{3} + (0.88 - 0.85)\cdot \frac{2}{3} + (0.89 - 0.88)\cdot \frac{1}{3}\) = 0.87

Once the example classification soundness degree for each class has been computed, the predicted class is the one associated with the largest value (step 4 of the FRM):

  • WR = \(\arg \max \)[0.94, 0.4, 0.89] = C\(_1\)

  • AC = \(\arg \max \)[0.49, 0.24, 1.0] = C\(_3\)

  • Choquet = \(\arg \max \)[0.43, 0.21, 0.87] = C\(_3\)

It is observable that the usage of the maximum as aggregation operator predicts class 1, since it only considers the information provided by a single fuzzy rule (the one with the maximum compatibility). However, looking in detail at the association degrees presented in Table 1, this prediction may not be ideal: class 1 has one rule with high compatibility, whereas class 3 has three rules with high compatibilities (only slightly less than that of class 1). Then, class 3 seems to be the most appropriate option. This fact is taken into account by the CI and the AC, since the information given by all the fuzzy rules, and not only by the best one, is considered; consequently, both predict class 3.

In this example, the non-averaging behavior of AC is noticeable: its result for class C\(_3\) is greater than the maximum aggregated value, which cannot occur for averaging functions. In the case of WR, the result is always the maximum, while for the CI the result is a value between the minimum and the maximum. Another interesting point raised by this example is that the aggregation function used in the FRM is directly related to the performance of the classifier.
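The whole example can be reproduced with a short script. The association degrees are those of Table 1 and, as stated above, the CI uses the cardinality measure; the rounding of the printed scores is for display only.

```python
degrees = {                       # association degrees from Table 1
    "C1": [0.94, 0.10, 0.25],
    "C2": [0.15, 0.40, 0.10],
    "C3": [0.89, 0.88, 0.85],
}

def wr(b):
    """Winning Rule: take the maximum association degree."""
    return max(b)

f1max = max(sum(b) for b in degrees.values())    # AC normalizer (2.62 here)

def ac(b):
    """Additive Combination: normalized sum of the association degrees."""
    return sum(b) / f1max

def ci(b):
    """Standard Choquet integral with the cardinality measure m(A) = |A|/n."""
    n, xs = len(b), sorted(b)
    total, prev = 0.0, 0.0
    for k, xi in enumerate(xs):
        total += (xi - prev) * (n - k) / n
        prev = xi
    return total

for name, agg in (("WR", wr), ("AC", ac), ("Choquet", ci)):
    scores = {c: agg(b) for c, b in degrees.items()}
    print(name, max(scores, key=scores.get))   # WR -> C1, AC and Choquet -> C3
```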

3 The \(C_T\)-integral and Pre-aggregations

This study was originally based on [14], where the authors modified the FRM of the Chi et al. algorithm [29] by applying the CI to aggregate all the available information for each class. Furthermore, they introduced a learning method, based on a genetic algorithm, to compute the most suitable fuzzy measure for each class. We highlight that this fuzzy measure is considered in all the applications of the generalizations of the Choquet integral discussed here.

For the first proposed generalization, the product operator of the standard CI was replaced by other t-norms [30]. In this way, the information is aggregated in a different manner, leading to different FRMs that could achieve even more accurate performance. The Choquet integral generalized by a t-norm T, known as the \(C_T\)-integral [17], is defined as:

Definition 1

 [17] Let \(\mathfrak {m}: 2^{N} \rightarrow [0,1]\) be a fuzzy measure and \(T: [0,1]^2 \rightarrow [0,1]\) be a t-norm. A \(C_T\)-integral is the function \(\mathfrak {C}_{\mathfrak {m}}^{T}: [0,1]^n \rightarrow [0,1]\), defined, for all \({\boldsymbol{x}} \in [0,1]^n\), by

$$\begin{aligned} \mathfrak {C}_{\mathfrak {m}}^T ({\boldsymbol{x}}) = \sum _{i=1}^{n}T\left( x_{(i)} - x_{(i-1)} , \mathfrak {m}\left( A_{(i)} \right) \right) , \end{aligned}$$
(10)

where \( x_{(i)} \) and \(A_{(i)}\) are defined as in Eq. (7).
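Eq. (10) can be sketched as follows, instantiated with the Hamacher product t-norm (the best-performing choice in the experiments summarized below) and, as an illustrative choice, the cardinality fuzzy measure. With \(T(x,y) = xy\), the standard CI is recovered.

```python
def hamacher_product(x, y):
    """Hamacher product t-norm: T(x, y) = xy / (x + y - xy), with T(0, 0) = 0."""
    return 0.0 if x == y == 0.0 else (x * y) / (x + y - x * y)

def ct_integral(x, tnorm=hamacher_product):
    """C_T-integral (Eq. 10) with the cardinality measure m(A) = |A|/n."""
    n = len(x)
    xs = sorted(x)                               # increasing permutation
    total, prev = 0.0, 0.0
    for k, xi in enumerate(xs):
        total += tnorm(xi - prev, (n - k) / n)   # m(A_(i)) = (n - i + 1)/n
        prev = xi
    return total

# Sanity check: with the product t-norm the standard Choquet integral returns.
print(round(ct_integral([0.94, 0.10, 0.25], lambda a, b: a * b), 2))  # 0.43
```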

Observe that some \(C_T\)-integrals are not aggregation functions. For example, take the minimum t-norm \(T_{M} (x, y) = \min (x, y)\) and the cardinality measure (see [17, 28]), and consider \(\mathbf {x_1} = (0.05, 0.2, 0.7, 0.9)\) and \(\mathbf {x_2} = (0.05, 0.1, 0.7, 0.9)\), where \(\mathbf {x_1} \ge \mathbf {x_2}\). However, \(\mathfrak {C}_{\mathfrak {m}}^{T_M} (\mathbf {x_1}) = 0.7\) and \(\mathfrak {C}_{\mathfrak {m}}^{T_M} (\mathbf {x_2}) = 0.8\). Thus, the increasingness condition required of any aggregation function is not fulfilled by \(\mathfrak {C}_{\mathfrak {m}}^{T_M}\).

However, full monotonicity is not crucial for a fusion function to be useful. Take, for example, a well-known statistical tool, the mode: it is not considered an aggregation function, since it is not monotone, although it is clearly useful. In [31], Bustince et al. introduced the notion of directional monotonicity, which requires monotonicity only along some fixed ray. With this in mind, the concept of pre-aggregation functions was introduced in [17]: these functions satisfy the boundary conditions of aggregation functions but are only required to be directionally increasing:

Definition 2

[31] Let \(\vec {r}=(r_1,\dots ,r_n)\) be a real n-dimensional vector, \(\vec {r} \ne \vec {0}\). A function \(F:[0,1]^n \rightarrow [0,1]\) is directionally increasing with respect to \(\vec {r}\) (\(\vec {r}\)-increasing, for short) if for all \((x_1,\dots ,x_n) \in [0,1]^n\) and \(c>0\) such that \((x_1+cr_1,\dots ,x_n+cr_n) \in [0,1]^n\) it holds that

$$\begin{aligned} F(x_1+cr_1,\dots ,x_n+cr_n) \ge F(x_1,\dots ,x_n).\end{aligned}$$
(11)

Similarly, one defines an \(\vec {r}\)-decreasing function.
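Directional increasingness can be checked numerically by sampling. The sketch below uses the truncated difference \(\max(0, x - y)\), a standard example of a function that is (1, 1)-increasing although it is not monotone (it decreases in its second argument); the sampling-based checker is an illustration, not a proof.

```python
import random

def is_r_increasing(f, r, trials=2000, seed=1):
    """Numerically test r-increasingness of f on [0,1]^n by random sampling."""
    rng = random.Random(seed)
    n = len(r)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        c = rng.random()
        y = [xi + c * ri for xi, ri in zip(x, r)]
        if any(not 0.0 <= yi <= 1.0 for yi in y):
            continue                    # shifted point must stay in [0,1]^n
        if f(y) < f(x) - 1e-12:
            return False                # found a counterexample
    return True

# Truncated difference: decreases in its second argument, yet (1,1)-increasing,
# because g(x + c, y + c) == g(x, y) for any shift c along the diagonal.
g = lambda v: max(0.0, v[0] - v[1])

print(is_r_increasing(g, (1, 1)))   # True
print(is_r_increasing(g, (0, 1)))   # False
```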

Now, since the Chi et al. algorithm is not a state-of-the-art fuzzy classifier, the \(C_T\)-integrals were applied in the FRM of a powerful fuzzy classifier, FARC-HD [32]. The quality of the proposal was analyzed by applying these generalizations to cope with 27 classification problems, whose datasets are available in the KEEL [33] dataset repository. When comparing the different generalizations among themselves, the one based on the Hamacher t-norm was superior to the remaining ones. This occurred with four out of the five considered fuzzy measures, and the best accuracy was obtained when combining the Hamacher product with the power measure. To evaluate the quality of this best generalization, the study compared it against the classical FRM of WR, since both FRMs apply averaging aggregation functions. In this comparison, it was empirically demonstrated that this generalization is statistically superior to WR and to the standard CI.

4 Copulas and CC-integrals

The usage of the generalizations of the CI in a powerful fuzzy classifier produced satisfactory results in classification problems. However, these generalizations were pre-aggregation functions, that is, monotonicity is not satisfied in general. With this in mind, generalizations that are idempotent and averaging aggregation functions were developed. To that end, in Eq. (7), the product is first distributed over the subtraction and then its two instances are replaced by copulas [30], obtaining the CC-integrals [18]:

Definition 3

Let \(\mathfrak {m}: 2^{N} \rightarrow [0,1]\) be a fuzzy measure and \(C: [0,1]^2 \rightarrow [0,1]\) be a bivariate copula. The CC-integral is defined as a function \(\mathfrak {C}_{\mathfrak {m}}^C : [0,1]^n \rightarrow [0,1]\), given, for all \({\boldsymbol{x}} \in [0,1]^n\), by

$$\begin{aligned} \mathfrak {C}_{\mathfrak {m}}^C ({\boldsymbol{x}}) = \sum _{i=1}^{n} \left[ C \left( x_{(i)}, \mathfrak {m}\left( A_{(i)} \right) \right) - C \left( x_{(i-1)}, \mathfrak {m}\left( A_{(i)} \right) \right) \right] , \end{aligned}$$
(12)

where \( x_{(i)} \) and \(A_{(i)}\) are defined as in Eq. (7).
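Eq. (12) can be sketched as follows; taking the minimum copula yields the CMin-integral discussed below, while the product copula recovers the standard CI. The cardinality measure is used as an illustrative fuzzy measure.

```python
def cc_integral(x, copula=min):
    """CC-integral (Eq. 12) with the cardinality measure m(A) = |A|/n.

    The default copula is the minimum, giving the CMin-integral.
    """
    n = len(x)
    xs = sorted(x)                      # increasing permutation of the input
    total, prev = 0.0, 0.0
    for k, xi in enumerate(xs):
        m = (n - k) / n                 # m(A_(i)) = (n - i + 1)/n
        total += copula(xi, m) - copula(prev, m)
        prev = xi
    return total

# Averaging and idempotent: the result stays between min(x) and max(x).
print(round(cc_integral([0.94, 0.10, 0.25]), 4))   # 0.3333 with the CMin-integral
```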

To demonstrate the efficiency of the CC-integrals in tackling classification problems, an experimental study considering 30 numerical datasets was conducted in two different ways. The first focused on comparisons per family of copulas (t-norms, overlap functions [34, 35] and specific copulas), in order to find the function providing the best generalization. Then, this best generalization was compared with: 1) the classical FRM of WR (considering that both functions are averaging); 2) the standard CI; and 3) the best pre-aggregation function achieved in the previous study (the \(C_T\)-integral based on the Hamacher t-norm). The best CC-integral is the CMin-integral, constructed with the minimum copula (Footnote 2). The obtained results showed that the CMin-integral is statistically equivalent to the CI and the \(C_T\)-integral and superior to the WR.

5 \(C_F\)-integrals

The knowledge acquired from the previous studies shows that the function used to generalize the CI is very important. Up to this point, only generalizations with averaging characteristics had been presented. With this in mind, the CI was generalized by special functions in order to produce more competitive, non-averaging integrals. To achieve this, a family of left 0-absorbing aggregation functions F is used, satisfying: (LAE) \(\forall y \in [0,1]: F(0,y) = 0\). Moreover, the following two basic properties are also important:

(RNE) Right Neutral Element: \(\forall x \in [0,1]: F(x,1) = x\);

(LC) Left Conjunctive Property: \(\forall x,y \in [0,1]: F(x,y)\le x\).

Any bivariate function \(F:[0,1]^2 \rightarrow [0,1]\) satisfying both (LAE) and (RNE) is called a left 0-absorbent (RNE)-function.

Then, the so-called \(C_F\)-integral [21] is defined as:

Definition 4

[21] Let \(F:[0,1]^2 \rightarrow [0,1]\) be a bivariate function and \(\mathfrak {m}: 2^{N} \rightarrow [0,1]\) be a fuzzy measure. The \(C_F\)-integral is the function \(\mathfrak {C}_{\mathfrak {m}}^{F} : [0,1]^n \rightarrow [0,1]\), defined, for all \({\boldsymbol{x}} \in [0,1]^n\), by

$$\begin{aligned} \mathfrak {C}_{\mathfrak {m}}^{F} ({\boldsymbol{x}}) = \min \left\{ 1, \sum _{i=1}^{n} F\left( x_{(i)} -x_{(i-1)}, \mathfrak {m}\left( A_{(i)} \right) \right) \right\} , \end{aligned}$$
(13)

where \( x_{(i)} \) and \(A_{(i)}\) are defined as in Eq. (7).
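Eq. (13) can be sketched as below. As an illustrative non-averaging instance, F is taken as the geometric mean, which satisfies (LAE), \(F(1,1)=1\) and (1, 0)-increasingness; this is not the \(F_{NA2}\) function of the paper, whose definition is not reproduced here.

```python
import math

def cf_integral(x, F):
    """C_F-integral (Eq. 13) with the cardinality measure m(A) = |A|/n."""
    n = len(x)
    xs = sorted(x)                      # increasing permutation of the input
    total, prev = 0.0, 0.0
    for k, xi in enumerate(xs):
        total += F(xi - prev, (n - k) / n)
        prev = xi
    return min(1.0, total)              # truncate the sum at 1

# Geometric mean: satisfies (LAE), F(1,1) = 1 and (1,0)-increasingness,
# but not (RNE), so the resulting C_F-integral is non-averaging.
gm = lambda a, b: math.sqrt(a * b)

# Non-averaging behavior: the result exceeds the largest input (0.15).
print(round(cf_integral([0.05, 0.10, 0.15], gm), 4))   # 0.5353
```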

In [21, Theorems 1 and 2], the conditions under which a \(C_F\)-integral is a pre-aggregation function were established: either (LAE) and (RNE) (Theorem 1), or (LAE), \(F(1,1)=1\) and (1, 0)-increasingness (Theorem 2). Moreover, for the \(C_F\)-integral to be averaging, F must satisfy (RNE) and (LC). This means that there exist many non-averaging \(C_F\)-integrals.

The quality of the \(C_F\)-integrals in coping with classification problems was tested considering 33 different datasets. The experimental study was conducted considering \(C_F\)-integrals with and without averaging characteristics; among the non-averaging functions, six \(C_F\)-integrals were studied. To support the quality of this approach, a comparison of the best non-averaging \(C_F\)-integral with the FRM of AC and an FRM considering the probabilistic sum (PS, an operator with non-averaging characteristics) is provided. The results showed that the non-averaging \(C_F\)-integrals, as expected, offer superior performance to the averaging ones, and the best \(C_F\)-integral, based on the function \(F_{NA2}\) (Footnote 3), provides results that are statistically superior to all the classical averaging FRMs and very competitive with the classical non-averaging FRMs like AC or PS.

6 \(C_{F_1F_2}\)-integrals

The previous study demonstrated that generalizing the standard Choquet integral by functions F yields satisfactory results. This study combines the ideas of the previous approaches: it takes the same idea as the CC-integrals, generalizing each of the two copula instances by a pair of functions F, called \(F_1\) and \(F_2\), consequently obtaining the \(C_{F_1F_2}\)-integrals [22]:

Definition 5

Let \(\mathfrak {m}: 2^{N} \rightarrow [0,1]\) be a symmetric fuzzy measure and \(F_1, F_2:[0,1]^2 \rightarrow [0,1]\) be two fusion functions fulfilling:

  • (i) \(F_1\)-dominance (or, equivalently, \(F_2\)-Subordination): \(F_1 \ge F_2\);

  • (ii) \(F_1\) is (1, 0)-increasing.

A \(C_{F_1F_2}\)-integral is defined as a function \(\mathfrak {C}_{\mathfrak {m}}^{(F_1,F_2)} : [0,1]^n \rightarrow [0,1]\), given, for all \({\boldsymbol{x}} \in [0,1]^n\), by

$$\begin{aligned} \mathfrak {C}_{\mathfrak {m}}^{(F_1,F_2)} ({\boldsymbol{x}}) = \min \left\{ 1, x_{(1)} + \sum _{i=2}^{n} \left[ F_1\left( x_{(i)}, \mathfrak {m}\left( A_{(i)} \right) \right) - F_2\left( x_{(i-1)}, \mathfrak {m}\left( A_{(i)} \right) \right) \right] \right\} , \end{aligned}$$
(14)

where \( x_{(i)} \) and \(A_{(i)}\) are defined as in Eq. (7).
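A hedged sketch of a \(C_{F_1F_2}\)-integral, assuming the expanded form \(\min \{1, x_{(1)} + \sum _{i=2}^{n} [F_1(x_{(i)}, \mathfrak {m}(A_{(i)})) - F_2(x_{(i-1)}, \mathfrak {m}(A_{(i)}))]\}\) from [22], with the illustrative pair \(F_1 = \min\) (which dominates, since \(\min (x,y) \ge xy\) on \([0,1]^2\) and the minimum is (1, 0)-increasing) and \(F_2 =\) product, and the cardinality measure:

```python
def cf1f2_integral(x, F1=min, F2=lambda a, b: a * b):
    """Sketch of a C_{F1F2}-integral with the cardinality measure m(A) = |A|/n.

    Assumes the expanded form from [22]; F1 = minimum dominates F2 = product.
    """
    n = len(x)
    xs = sorted(x)                      # increasing permutation of the input
    total = xs[0]                       # x_(1)
    for k in range(1, n):
        m = (n - k) / n                 # m(A_(i)) with i = k + 1
        total += F1(xs[k], m) - F2(xs[k - 1], m)
    return min(1.0, total)

print(round(cf1f2_integral([0.94, 0.10, 0.25]), 4))   # 0.5333
```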

In this study, twenty-three different functions F were considered. As a consequence, 201 different pairs of functions could be combined as \(F_1\) and \(F_2\) while respecting the dominance property. An important question is how to choose which function should act as \(F_1\) and which as \(F_2\). Therefore, a methodology to reduce the scope of the study was proposed, based on the concepts of Dominance and Subordination Strength degrees, DSt and SSt respectively.

Definition 6

Let \(\mathcal {F} = \{F_1, \ldots , F_m\}\) be a set of m fusion functions. The dominance and subordination strength degrees, DSt and SSt, of a fusion function \(F_i \in \mathcal {F}\) are defined, respectively, for \(j \in \{1, \ldots , m\}\), as follows:

$$\begin{aligned} DSt (F_i)= & {} \frac{1}{m} \sum _{j=1}^m \left\{ \begin{array}{ll} 1 &{} \text{ if } \; F_i \ge F_j, \\ 0 &{} \text{ otherwise } \end{array} \right. \cdot 100\% \\ SSt (F_i)= & {} \frac{1}{m} \sum _{j=1}^m \left\{ \begin{array}{ll} 1 &{} \text{ if } \; F_i < F_j, \\ 0 &{} \text{ otherwise. } \end{array} \right. \cdot 100\% \end{aligned}$$
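These degrees can be estimated numerically by testing pointwise dominance on a grid. In the sketch below, \(F_i < F_j\) is interpreted as \(F_j\) dominating \(F_i\) without the converse holding, and the three t-norms chosen satisfy \(\min \ge \text{product} \ge \text{Łukasiewicz}\) pointwise.

```python
def dominates(f, g, steps=21, tol=1e-12):
    """True if f(x, y) >= g(x, y) at every grid point of [0,1]^2 (up to tol)."""
    pts = [i / (steps - 1) for i in range(steps)]
    return all(f(x, y) >= g(x, y) - tol for x in pts for y in pts)

def strength_degrees(funcs):
    """Dominance (DSt) and subordination (SSt) strength degrees, in percent."""
    m = len(funcs)
    dst = {name: 100.0 * sum(dominates(f, g) for g in funcs.values()) / m
           for name, f in funcs.items()}
    sst = {name: 100.0 * sum(dominates(g, f) and not dominates(f, g)
                             for g in funcs.values()) / m
           for name, f in funcs.items()}
    return dst, sst

funcs = {
    "min":  lambda x, y: min(x, y),
    "prod": lambda x, y: x * y,
    "luka": lambda x, y: max(0.0, x + y - 1.0),   # Lukasiewicz t-norm
}
dst, sst = strength_degrees(funcs)
print(dst)   # min dominates all three functions, prod two, luka only itself
```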

The generalizations provided in this study are non-averaging. Moreover, they satisfy the boundary conditions of any (pre-)aggregation function. However, regarding monotonicity, we observed that these functions are neither increasing nor directionally increasing. In fact, they are Ordered Directionally (OD) monotone functions [36], which are monotone along different directions according to the ordinal size of the coordinates of each input.

The \(C_{F_{1}F_{2}}\)-integrals were used to cope with classification problems in 33 different datasets. When analyzing the obtained results, it is noticeable that combining a function with high dominance as \(F_1\) with a function with high subordination as \(F_2\) gave the best results of this study (from the top ten global accuracies among the 81 pairs, eight have this characteristic). We also observed the converse: for each function \(F_2\), its best results are achieved when using an \(F_1\) with a high dominance.

The performance of this proposal was analyzed by comparing it against distinct state-of-the-art FRBCSs, namely: FARC-HD [32], FURIA [37], IVTURS [38], a classical non-averaging aggregation operator like the probabilistic sum (\(P^*\)) and the best \(C_F\)-integral selected from the previous study (\(F_{NA2}\)). In this comparison, FURIA was the fuzzy classifier that achieved the highest mean accuracy; however, the new approach achieved a close classification rate. Furthermore, the number of datasets where the performance of this generalization is the worst among all compared methods is smaller than that of FURIA. The \(C_F\)-integral represented by \(F_{NA2}\) also achieved good results, while the remaining methods (IVTURS, \(P^*\) and FARC-HD) were inferior and similar among themselves.

The 81 pairs of combinations considered to construct \(C_{F_{1}F_{2}}\)-integrals were compared against IVTURS, \(P^*\), FARC-HD and \(F_{NA2}\). The results highlighted the quality of the new method, since an equal or greater average result was obtained by 39, 36, 34 and 12 different combinations in these comparisons, respectively.

Finally, from the considered pairs, five different \(C_{F_{1}F_{2}}\)-integrals were selected as control variables in the statistical tests in which all methods, including FURIA, are compared. The last generalization only presented statistical differences with respect to FARC-HD. However, every remaining pair is statistically equivalent to FURIA and \(F_{NA2}\) and superior to IVTURS, \(P^*\) and FARC-HD.

7 Detailed Results

In this section, the results obtained by the usage of the different aggregation operators are shown. We highlight that these results consider the same 33 datasets as in [21, 22] and [28]. The results are related to the power measure, as mentioned previously, use the 5-fold cross-validation technique [2] and are obtained with the FRM of the FARC-HD [32] fuzzy classifier (Footnote 4).

The results are provided in Table 2, where each cell corresponds to the mean accuracy over all folds, rows are the considered datasets and columns are the results obtained by the classical FRMs, namely the Additive Combination (AC), the Probabilistic Sum (PS), the Winning Rule (WR) and the Choquet integral (CI), and by the generalizations (due to lack of space, "integral" is abbreviated to "int" in all columns): the \(C_T\)-integral defined by the Hamacher product t-norm, the CC-integral using the minimum copula, the \(C_F\)-integral considering the \(F_{NA2}\) function and the \(C_{F_{1}F_{2}}\)-integral using the pair GMFBPC.

Table 2. Detailed results achieved in test by different generalizations of the CI.

From the detailed results, we can notice that the classic FRM of the WR achieved the lowest global mean, indicating that using all the information related to the problem is an interesting alternative. Moreover, it is also observable that all non-averaging generalizations (AC, PS, \(C_F\)-integrals and \(C_{F_{1}F_{2}}\)-integrals) present superior results when compared against the averaging ones (WR, CI, \(C_T\)-integral and CC-integral).

The results also showed that the generalizations of the CI (\(C_T\)-, CC-, \(C_F\)- and \(C_{F_{1}F_{2}}\)-integrals) provided a superior performance in comparison to the standard CI. Finally, as mentioned before, the best performance is obtained when the \(C_{F_{1}F_{2}}\)-integral is used to cope with classification problems.

8 Conclusions

The application of the Choquet integral (CI) in the Fuzzy Reasoning Method (FRM) of Fuzzy Rule-Based Classification Systems (FRBCSs) modified the way in which the information was used and enhanced the system quality. After that, many generalizations of the CI were proposed and also applied in FRM, obtaining success as well. In this paper the main contributions, theoretical and applied, of the generalizations are summarized and discussed.

The first generalization was built by replacing the product operator of the standard CI with different t-norms. These generalizations were supported by an important theoretical concept known as pre-aggregation functions. Differently from an aggregation function, a pre-aggregation function is only required to be monotone along a given direction. This first generalization produced averaging functions, and its application to classification problems showed that the generalization by the Hamacher product t-norm was superior to the FRM of the Winning Rule (WR) and to the CI.

The second step aimed at generalizations of the CI that produce aggregation functions. To do so, the CI was used in its expanded form and generalized by copula functions, introducing the concept of Choquet-like copula-based aggregation functions, the so-called CC-integrals. These functions also present averaging characteristics. The results of their application demonstrated that the classical WR was statistically overcome.

It is observable that up to this point only generalizations with averaging characteristics had been presented. On the other hand, state-of-the-art fuzzy classifiers take into account the usage of non-averaging functions. Thus, to produce more competitive generalizations, a family of fusion functions F was introduced. The generalization of the Choquet integral by functions F introduced the concept of \(C_F\)-integrals. This generalization has averaging or non-averaging characteristics, depending on the considered function. It was observed that the application of any non-averaging function statistically overcomes any averaging one. Also, the developed operators outperform the classical WR and Additive Combination (AC).

The generalization of the expanded CI by two functions F, \(F_1\) and \(F_2\), introduced the concept of \(C_{F_1F_2}\)-integrals. These functions are Ordered Directionally (OD) increasing and, therefore, represent a different level of aggregation operators. The summit of the classification performance was reached with this generalization. To achieve it, a methodology to select the functions to act as \(F_1\) and \(F_2\) was presented, based on the concept of degrees of dominance and subordination. For the considered \(C_{F_1F_2}\)-integrals, in five different cases the generalizations are equivalent, or even superior, to fuzzy classifiers found in the literature.

Taking as basis the analysis provided in this paper, some interesting research directions emerge, for example, the application of these generalizations in the FRM of different fuzzy classifiers. Also, considering that the generalizations are based on the Choquet integral, the usage of a different operator, such as the Sugeno integral, could produce even more powerful operators. Finally, the combination with different fuzzy measures is an alternative with great potential.