1 Introduction

In a classifier ensemble, the individual classifiers work together to create a more robust and accurate system for classifying patterns. These individual classifiers, known as ensemble members, operate in parallel, each one receiving the same input pattern and independently producing its own output. A combination method receives the members' outputs and provides the global output of the system [1].

A classifier ensemble is a two-step classification structure in which the main parameters are the individual classifiers (first step) and the combination method (second step). In these systems, one important aspect is the definition of the ensemble structure.

Several studies have proposed different ways to define the ensemble structure, such as optimization techniques and meta-learning, among others [2, 3]. However, most of these studies are related to the selection of individual classifiers. Some studies investigate efficient methods for combining classifiers in ensemble systems, such as [4,5,6,7].

The selection of ensemble parameters (classifiers and/or combination methods) can be static or dynamic. In static selection, the ensemble structure is defined during the training phase and remains fixed throughout. In contrast, dynamic selection adapts the ensemble structure for each test instance, often enhancing predictive performance [4, 8].

Dynamic selection can be applied to both classifiers and combination methods, with most studies focusing on the dynamic selection of classifiers [6, 7, 9]. This is often based on techniques such as Region of Competence, Hyper-boxes, or Meta-learning. However, very little has been done to define a fully dynamic selection, making the selection of the ensemble parameters an automatic process.

To advance the design of efficient classifier ensembles, this paper conducts an exploratory analysis of integrating dynamic selection into the main ensemble parameters. In this investigation, dynamic selection will be applied to one ensemble parameter (classifiers or combination method) and to both parameters. The main aim of this analysis is to assess the impact of dynamic selection on the two most important parameters of a classifier ensemble. In other words, we analyze whether dynamic selection leads to more efficient ensembles when applied to the individual classifiers, to the combination method, or to both parameters at the same time.

In this analysis, three different scenarios will be evaluated. In the first scenario, the individual classifiers will be selected dynamically while the combination method will be selected statically. For this scenario, three well-known DES (dynamic ensemble selection) methods are used: KNORA-Eliminate (KNORA-E) [4], FH-DES [10], and META-DES [5]. In the second scenario, the combination method is selected dynamically while the individual classifiers are selected statically. For this scenario, a dynamic fusion method is presented.

Finally, in the last scenario, both the individual classifiers and the combination method will be selected dynamically. As a baseline, a fully static ensemble structure will also be investigated in order to assess the impact of dynamic selection on the performance of classifier ensembles. All ensemble structures will be evaluated using 20 classification datasets.

This paper is divided into 6 sections and is organized as follows. Section 2 describes the theoretical concepts and related work of this paper, while Sect. 3 describes in more detail the ensemble structures to be used in the exploratory analysis. Section 4 presents the experimental methodology of the empirical analysis, while its results are presented in Sect. 5. Finally, Sect. 6 presents the final remarks of this paper.

2 Theoretical Concepts and Related Work

2.1 State of the Art

There are several studies that investigate the dynamic selection of ensemble structure, mainly for ensemble members [6, 7], for features [11], and for both of them [9, 12].

Regarding ensemble members, in [6], for instance, a new method for dynamic ensemble member selection is presented. In this method, the confidence of the base classifiers during classification and their general credibility are used as the selection criteria.

Another interesting approach is to use the region of competence as the selection criterion, making it possible to improve the combination of classifiers by selecting the most competent ones in a certain region. The use of the region of competence as a selection criterion helps to maximize results by focusing only on the most competent classifiers; examples can be found in KNORA-E [4] and META-DES [5].

In terms of dynamic feature selection, in [11], a dynamic feature selection approach was proposed. The main aim of this approach is to select a different subset of features for each instance or group of instances, exploring the full potential of all instances in a classification problem.

In [9], an initial study on how to combine these two dynamic selection techniques was performed. According to the authors, an improvement in performance was detected with the use of this integrated dynamic selection technique. In [13], the authors presented an initial method of classifier fusion using K-nearest neighbors (KNN). Although the results are promising, there is no general comparison with static ensembles.

Although several studies propose the dynamic selection of ensemble members and features, very little has been done to propose an efficient dynamic selection of combination methods. This paper tries to bridge this gap by proposing a dynamic selection method based on the region of competence.

2.2 Classifier Ensembles

It is well-known that there is not a single classifier which can be considered optimal for all problem domains [1]. Therefore, it is difficult to select a good single classifier which provides the best performance in practical pattern classification tasks [14].

In this context, classifier ensembles have emerged as an efficient classification structure since they combine the advantages and overcome the limitations of the individual classifiers. Studies have shown that classifier ensembles provide better generalization ability and performance when compared to individual classifiers [14, 15].

In a classifier ensemble, an input pattern is presented to all individual classifiers [16, 17], and a combination method combines their outputs to produce the overall output of the system [1]. The Machine Learning literature has established that diversity plays an important role in the design of ensembles, contributing to their accuracy and generalization [1].
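As a minimal illustration of this parallel-members-plus-combiner structure (the function names are ours, and the combination rule shown is a simple majority vote over hard labels):

```python
from collections import Counter

def majority_vote(member_outputs):
    """Combine the hard labels produced in parallel by the ensemble members."""
    return Counter(member_outputs).most_common(1)[0][0]

# Three members independently classify the same input pattern;
# the combination method resolves their outputs into one global output.
print(majority_vote(["cat", "dog", "cat"]))  # -> cat
```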

One important issue regarding the design of classifier ensembles involves the appropriate selection of its structure (individual classifiers and combination methods) [18]. As previously mentioned, there are basically two main selection approaches, static and dynamic. In this paper, we will focus on the dynamic approach. The next subsection will describe some existing dynamic selection methods that will be used in this paper.

2.3 Dynamic Ensemble Member Selection

Dynamic Ensemble Selection (DES) methods perform the dynamic selection of a subset of classifiers to classify each test instance. The selection of the classifier subset is done through a selection procedure, and each DES method has its own procedure. Several DES methods have been proposed in the literature. In this paper, we will use three well-known DES methods: KNORA-E, META-DES and FH-DES.

KNORA-E. KNORA [4] is a well-known DES method that seeks to find the best subset of classifiers for a given test instance. It applies a k-Nearest Neighbors procedure: the neighbors of a testing instance are selected from the validation set and the competence of each classifier on them is calculated. Based on a certain selection criterion, the classifier subset is selected.

KNORA-E is a Knora-based method, and the selection criterion is to select a set of classifiers formed only by the classifiers that correctly classify all k neighbors of a testing instance. In the case where no classifier can correctly classify all k neighbors, the k value is decremented by one and this is done until at least one classifier can be selected [4].
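The selection criterion above can be expressed as a minimal sketch, assuming classifiers are callables that map an instance to a hard label (the names and data layout are ours):

```python
import numpy as np

def knora_e_select(classifiers, X_val, y_val, x_query, k):
    """KNORA-E-style selection: keep the classifiers that correctly classify
    all k nearest validation neighbors of the query; if none qualify,
    decrement k until at least one classifier can be selected."""
    order = np.argsort(np.linalg.norm(X_val - x_query, axis=1))
    while k > 0:
        neigh = order[:k]
        selected = [c for c in classifiers
                    if all(c(x) == y for x, y in zip(X_val[neigh], y_val[neigh]))]
        if selected:
            return selected
        k -= 1                # no classifier is perfect on all k neighbors
    return list(classifiers)  # fallback: use the whole pool
```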

META-DES. META-DES [5] is a DES method that applies the idea of selection using meta-learning. In this method, a meta-problem is created to determine whether a classifier is competent for a given test instance. According to [12], the META-DES method uses five criteria for extracting the meta-features that define this meta-problem.

After that, a meta-classifier is trained, based on the defined meta-features. This meta-classifier is then used to identify whether a classifier is competent or not to classify a testing instance. Classifiers that are labeled as competent will be selected to compose the ensemble to classify the test instance.
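A highly simplified sketch of this meta-learning idea follows, using a single illustrative meta-feature (local accuracy) rather than the five META-DES criteria, and a Naive Bayes meta-classifier; all names and values are ours:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def meta_feature(clf, X_neigh, y_neigh):
    """One illustrative meta-feature: the classifier's local accuracy in
    the neighborhood of a query (META-DES itself uses five criteria)."""
    return np.mean(clf.predict(X_neigh) == y_neigh)

# The meta-classifier is trained on (meta-feature, competent?) pairs
# collected from a meta-training set, then filters the pool at test time.
meta_clf = GaussianNB()
meta_clf.fit([[0.2], [0.4], [0.8], [0.9]], [0, 0, 1, 1])
print(meta_clf.predict([[0.85]]))  # a high local accuracy is labeled competent
```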

FH-DES. FH-DES is also a DES method, but it is based on fuzzy hyperboxes [19], designed to address the local sensitivity problem of KNN-based selection. Hyperboxes represent a group of samples using maximum and minimum corners. They are formed from regions where classifiers work well (areas of competence), but they can also be built from regions with poor classifications (areas of incompetence).

In the latter case, the classifiers whose hyperboxes have a lower degree of relevance are further away from the query sample and are therefore more competent to classify it [10]. The method can be applied in three ways: weighting, using a sum rule based on hyperbox weights; selection, performing only the selection of competent classifiers; and hybrid, combining the weights and the selection of classifiers.
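The hyperbox idea can be sketched with a fuzzy membership function over minimum/maximum corners (a sketch of the fuzzy min-max notion underlying FH-DES; the decay parameter gamma and all names are ours):

```python
import numpy as np

def hyperbox_membership(x, v_min, w_max, gamma=1.0):
    """Fuzzy membership of sample x in a hyperbox with corners v_min/w_max.
    Membership is 1 inside the box and decays (controlled by gamma) with
    the distance by which x falls outside it, per dimension."""
    below = np.maximum(0.0, v_min - x)  # how far x falls below the box
    above = np.maximum(0.0, x - w_max)  # how far x falls above the box
    return float(np.mean(np.maximum(0.0, 1.0 - gamma * (below + above))))

box = (np.array([0.2, 0.2]), np.array([0.6, 0.6]))
print(hyperbox_membership(np.array([0.4, 0.5]), *box))  # inside -> 1.0
```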

3 Proposal

3.1 Dynamic Selection Scenarios

In order to carry out the exploratory analysis of dynamic selection in classifier ensembles, three scenarios are defined, which are described as follows.

  1. Full static ensemble selection (FSES): In the first scenario, we will have fully static ensembles. In other words, all ensemble parameters are selected statically.

  2. Partially dynamic ensemble selection (PDES): In the second scenario, we will have partially dynamic ensembles, with dynamic selection in only one parameter: ensemble members or fusion, but not both. This leads to two sub-scenarios:

     1. Partial - Dynamic Member Selection (P-DMS): In this case, only the ensemble members (individual classifiers) will be dynamically selected.

     2. Partial - Dynamic Fusion Selection (P-DFS): In this case, only the combination method will be dynamically selected. The Dynamic Fusion Selection method is presented in Sect. 3.2.

  3. Full dynamic ensemble selection (FDES): In this scenario, we will have fully dynamic ensembles, in which both parameters (ensemble members and combination method) will be chosen dynamically for each new test instance.

These three scenarios were selected because they gradually increase the dynamicity of the selection of the ensemble parameters. In this way, we aim to evaluate the impact of dynamic selection on the design of robust classifier ensembles.

3.2 The Dynamic Fusion Selection Method

In this paper, in order to achieve dynamicity in the selection of combination methods, we present the Dynamic Fusion Selection (DFS) method, an algorithm that dynamically selects the combination method from a set of candidate methods.

In other words, for each test instance, the most appropriate combination method is selected. The selection is carried out in the testing phase: DFS calculates the competence of each combination method with respect to the presented test instance. Algorithm 1 presents the main steps of DFS.

As can be observed, DFS has two main parameters, which come from the algorithm input: the number of neighbors and the set of combination methods. The number of neighbors determines the size of the neighborhood used to calculate the selection criterion, while the set of combination methods defines the methods that take part in the dynamic selection.

Algorithm 1. The main steps of the DFS method.

Competence is calculated based on the accuracy of the combination methods on the neighbors of the test instance (line 5). If there is a tie in the local accuracy, the number of neighbors is increased by 1 (k = k + 1) (line 2) until a single combination method is selected. If a tie still remains, all combination methods with accuracy equal to the maximum accuracy are selected (line 12).
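A minimal Python sketch of this idea follows (the structure and names are ours and do not mirror Algorithm 1's line numbering; fusion methods are modeled as callables that map an instance to a predicted label):

```python
import numpy as np

def dfs_select(fusions, X_val, y_val, x_query, k, k_max=None):
    """Pick the fusion method(s) with the highest local accuracy on the
    k validation neighbors of the query, growing k while ties remain."""
    k_max = len(X_val) if k_max is None else k_max
    order = np.argsort(np.linalg.norm(X_val - x_query, axis=1))
    while True:
        neigh = order[:k]
        accs = [np.mean([f(x) == y for x, y in zip(X_val[neigh], y_val[neigh])])
                for f in fusions]
        best = [f for f, a in zip(fusions, accs) if a == max(accs)]
        if len(best) == 1 or k >= k_max:
            return best  # a single winner, or every tied method at max accuracy
        k += 1
```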

Finally, DFS can be applied to a pool of classifiers selected either statically or dynamically. In this paper, this method will be used in two dynamic selection scenarios: the partially dynamic selection scenario and the full dynamic selection one.

4 The Experimental Methodology

In this section, the main aspects of the empirical analysis are described, namely the datasets used, as well as the methods and materials.

4.1 Datasets

This paper uses datasets extracted from the UCI Machine Learning repository. Table 1 presents some characteristics of these datasets, including the number of instances (Inst), the number of attributes (Att) and the number of classes (Class).

Each dataset is divided into training, Validation1, Validation2 and Testing sets, in a proportion of 50%, 16.7%, 16.7%, and 16.6%, respectively. The training set is used to train the pool of classifiers (ensemble members). The testing set is used to assess the performance of the classifier ensembles. The Validation2 set is used to train the trainable combination methods (Neural Networks and Naive Bayes) while the Validation1 set is used to obtain the selection criteria of the presented dynamic member selection methods.
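Assuming a 1000-instance dataset, the split above can be reproduced with scikit-learn (the random seed and exact instance counts are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(1000).reshape(-1, 1), np.zeros(1000)

# 50% training; the remaining half becomes Validation1 (16.7%),
# Validation2 (16.7%) and Testing (16.6%) of the full dataset.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, train_size=0.5, random_state=0)
X_v1, X_rest, y_v1, y_rest = train_test_split(X_rest, y_rest, train_size=167, random_state=0)
X_v2, X_te, y_v2, y_te = train_test_split(X_rest, y_rest, train_size=167, random_state=0)
print(len(X_tr), len(X_v1), len(X_v2), len(X_te))  # 500 167 167 166
```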

Table 1. Description of the used datasets.

This division is performed 30 times, and the presented results of each ensemble configuration represent the average values over these 30 runs.

4.2 Methods and Materials

In this paper, all classifier ensembles used decision trees as individual classifiers, generated through the Bagging method. In all analyzed scenarios, 6 different pool sizes are used, which are: 5, 10, 15, 20, 25, and 30 individual classifiers. The remaining parameter values were defined through extensive grid search experimental evaluation.
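The pool generation described above can be sketched with scikit-learn, whose BaggingClassifier defaults to decision trees as base estimators (the toy dataset and seed are ours):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# One Bagging pool of decision trees per analyzed pool size.
pools = {n: BaggingClassifier(n_estimators=n, random_state=0).fit(X, y)
         for n in (5, 10, 15, 20, 25, 30)}
print({n: len(p.estimators_) for n, p in pools.items()})
```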

For the FSES scenario, the classifier ensembles will be evaluated using 12 different combination methods: Majority Vote, Sum, Max, Min, Geometric Mean, Naive Bayes, Weighted Sum, Weighted Vote, Edge, and three Multilayer Perceptron (MLP) versions: Hard, Soft, and Soft-Class. The three neural network (NN) versions differ in the input information received from the ensemble members. In the Hard version, each ensemble member provides only the winner class for the testing instance; in other words, this MLP version is trained and tested using only the winner class of each ensemble member.

In the other two MLP versions, the prediction probability for each class is used; in this sense, the prediction probability for each class is provided to both MLP versions. Additionally, the Weighted Sum and Weighted Vote methods use weights in their functioning. The weight used is 1/(distance-of-classes), and it is applied to the voting procedure in Weighted Vote as well as to the outputs of the classifiers in Weighted Sum.

For the first case of the PDES scenario (P-DMS), three well-known methods are used: KNORA-E [4], META-DES [5] and FH-DES [19]. In the second case of the PDES scenario (P-DFS), the dynamic fusion method presented in Sect. 3.2 is used to dynamically select the most suitable combination method for each testing instance. For the FDES scenario, all three methods used in the P-DMS case are combined with the method of the P-DFS case, leading to 3 FDES variations (KNORA-E with dynamic fusion selection, META-DES with dynamic fusion selection, and FH-DES with dynamic fusion selection).

It is important to highlight that the DFS method and some of the DMS methods (KNORA-E and META-DES) use the idea of region of competence. In this sense, the same number of neighbors is used in all cases, both in the selection of the combination method (DFS) and in the dynamic member selection (KNORA-E and META-DES), being defined in each iteration through a grid search over the values 3, 7 and 11, using KNN on the Validation1 set. For FH-DES, there is no notion of a region of competence, but rather the use of hyperboxes, which were created based on areas of incompetence, as presented by the authors in [10].

The results of all analyzed methods will be evaluated using the Friedman statistical test [20]. The Friedman test is used to test the hypothesis that the k related observations derive from the same population (similar performance) or not (superiority in performance). In this test, the significance level was set to 0.05.

Hence, if the p-value is less than the established value, the null hypothesis is rejected, with a confidence level greater than 95%. In cases where a statistically significant difference is detected, the Nemenyi post-hoc test is applied [21]. In order to present the results obtained by the post-hoc test, the critical difference (CD) diagram [21] is used. This diagram was selected to provide a visual illustration of the statistical test, making the results easier to interpret.
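The Friedman step can be sketched with SciPy on toy accuracy vectors (the values are illustrative; SciPy does not ship the Nemenyi post-hoc, which would follow once the null hypothesis is rejected):

```python
from scipy.stats import friedmanchisquare

# Toy accuracies of three methods over five datasets (one list per method).
method_a = [0.90, 0.85, 0.88, 0.92, 0.91]
method_b = [0.80, 0.78, 0.75, 0.81, 0.79]
method_c = [0.85, 0.82, 0.80, 0.86, 0.84]

stat, p = friedmanchisquare(method_a, method_b, method_c)
print(p < 0.05)  # True here: reject H0 and proceed to the Nemenyi post-hoc
```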

5 The Obtained Results

This section presents the results obtained by the empirical analysis, in terms of accuracy levels. First, all Full Static configurations are evaluated. Then, we evaluate separately the two cases of the PDES scenario, along with the best full static configuration. Finally, the FDES scenario is evaluated, along with the best full static configuration, the best dynamic member selection configuration, and the best dynamic fusion selection configuration. In order to define the best configuration, the critical difference diagram is used. If there is no statistical difference between the best techniques, the one with the highest average is selected.

5.1 Full Static Ensembles

Tables 2, 3, 4, 5 and 6 present the accuracy results of all analyzed methods. As mentioned previously, 6 pool-size configurations were used, and each configuration was executed 30 times. Therefore, the values in Tables 2, 3, 4, 5 and 6 represent the average of all 180 results. Additionally, the last row of each table represents the overall accuracy for all 20 datasets. Finally, the numbers in bold represent the highest accuracy for each dataset.

For the FSES scenario (Table 2), it can be seen that all methods (except Edge fusion) provided the highest accuracy level in at least three datasets. From a general perspective, it can be observed that the ensembles combined by Sum (second column) presented the highest overall accuracy (89.09%) and the highest accuracy in 8 out of 20 datasets. They were followed by all three NNs (6 datasets) and Majority Vote (5 datasets).

In order to evaluate the obtained results from a statistical point of view, the Friedman test [20] was applied to verify whether there are statistical differences among all ensemble classifiers. The test was applied to all 12 FSES configurations and detected statistically significant differences among the analyzed methods, with a p-value < 0.05. In this sense, the post-hoc test was applied, and the results are presented in the Critical Difference Diagram [21], depicted in Fig. 1.

Table 2. Score of fusion methods in full static ensemble.
Fig. 1. Critical Difference Diagram for Full Static Ensemble.

As can be seen in this figure, the CD diagram detected no statistically significant difference in accuracy among Sum, MLP Soft, and MLP Soft Class. These three methods provided superior performance over the remaining configurations, as detected by the statistical test. Since no single method stood out, the SUM-combined ensemble is selected for the remaining tests, since it achieved the highest overall accuracy level.

5.2 The PDES Scenario

In this subsection, the results for the PDES scenario are presented. For a comparative analysis, in each table, the accuracy of the best FSES configuration (FSES-SUM) is also presented. As mentioned previously, three well-known dynamic member selection methods are used: Tables 3, 4 and 5 present the values of KNORA-E, FH-DES and META-DES, respectively. For each method, the result of the P-DFS case is also presented, leading to a total of 14 analyzed methods.

KNORA-E. For KNORA-E (Table 3), it can be seen that the P-DFS method presented the best overall accuracy level (89.88%) and obtained the best results in 4 out of 20 datasets. However, the KNORA-E method combined by Majority Vote achieved the best results in 9 datasets, presenting the same average accuracy as KNORA-E combined by Sum (89.85%), which had 8 best results. The weighted fusion methods showed the worst results. The Friedman test was then applied, identifying statistically significant differences with a p-value < 0.05.

Table 3. Score of fusion methods for partial dynamic ensembles in KNORA-E

When applying the post-hoc test, the results of the CD Diagram, in Fig. 2, showed that KNORA-E Sum and KNORA-E Vote provided the best performance, with no statistically significant difference between them. The statistical test detected the superiority, in terms of accuracy, of these two methods over the remaining ones.

The P-DFS and KNORA-E Edge methods also showed good results, being superior to the remaining methods from a statistical point of view. As there was not just one method that stood out, the KNORA-E Vote method is selected as the best KNORA-E configuration. When comparing the P-DFS method and the best KNORA-E (P-DMS) method, although the P-DFS method provided the best overall accuracy, the statistical test showed that the use of dynamic selection in the ensemble members provided more robust ensembles.

Fig. 2. Critical Difference Diagram for Partial Dynamic Ensembles in KNORA-E.

FH-DES. For FH-DES, in Table 4, it can be seen that FH-DES - Sum presented the best overall result (90.51%) and obtained the best results in 4 of the 20 datasets. It was followed by the FH-DES - Majority Vote method (90.49%), which obtained the best results in 3 datasets.

Table 4. Score of fusion methods for partial dynamic ensembles in FH-DES.

The FH-DES - MLP Hard and Weighted Sum methods also presented the best results in 4 datasets; however, their results were slightly lower than those of the best FH-DES methods. Fusion by Min and Geometric Mean provided the worst results. In order to check whether the accuracy levels derive from the same population, the Friedman test was applied, identifying statistically significant differences with a p-value < 0.05.

In the post-hoc test, the Critical Difference Diagram (Fig. 3) showed no statistically significant difference in accuracy among FH-DES - Sum, Majority Vote, and MLP Soft Class. FH-DES - Min and Geometric Mean presented the worst results, as detected by the statistical test. Once again, as no single FH-DES method stood out in the statistical test, the FH-DES - Sum method was selected as the best FH-DES method.

Fig. 3. Critical Difference Diagram for Partial Dynamic Ensembles in FH-DES.

When comparing the P-DFS method and the best FH-DES (P-DMS) method, we can observe a superiority of the P-DMS case, showing that the use of dynamic selection in the ensemble members provided more robust ensembles.

META-DES. In Table 5, for META-DES, we can observe that META-DES - Sum and Majority Vote presented the best overall results (90.33%). META-DES - Sum obtained the best results in 8 of the 20 datasets, while META-DES - Vote obtained the best results in 7 datasets, presenting a slightly lower accuracy than Sum. The Friedman test was then applied, identifying statistically significant differences with a p-value < 0.05.

When applying the post-hoc test, the results of the CD Diagram (Fig. 4) once again showed no statistically significant difference in accuracy between META-DES - Sum and Majority Vote, although both provided higher accuracy levels than the other methods, from a statistical point of view.

Table 5. Score of fusion methods for partial dynamic ensembles in META-DES.
Fig. 4. Critical Difference Diagram for Partial Dynamic Ensembles in META-DES.

When comparing the P-DFS method and the best META-DES (P-DMS) method, once again, we can observe a superiority of the P-DMS case, showing that the use of dynamic selection in the ensemble members provided more robust ensembles.

5.3 The FDES Scenario

For the FDES scenario, the Dynamic Fusion methods were combined with META-DES, KNORA-E and FH-DES, leading to 3 FDES configurations. For comparison purposes, the best 3 PDES configurations (one for META-DES, one for KNORA-E and one for FH-DES) are also presented, along with P-DFS and the best FSES configuration. Table 6 presents the results of all evaluated methods.

Table 6. Score of fusion methods in full dynamic ensembles.

From Table 6, we can see that FDES (FH-DES) achieved the best overall accuracy level (90.86%), closely followed by FDES (META-DES) (90.75%) and FDES (KNORA-E) (90.52%). These methods delivered the best result in 5 out of 20 datasets each. The FH-DES SUM method is the best PDES case (90.51%), followed by META-DES VOTE and KNORA-E VOTE, and then the P-DFS method. Finally, the worst result was obtained by the FSES - SUM method. As can be seen, the use of dynamic selection on both ensemble members and combination methods provides the most robust classifier ensembles.

Figure 5 presents the CD Diagram of the post-hoc test on the results of Table 6. From this figure, it is possible to see that the three FDES configurations produced the most accurate classifier ensembles. Additionally, the statistical test detected the superiority, in terms of accuracy, of the FDES configurations over the remaining analyzed methods.

The results obtained in Fig. 5 corroborate the results of Table 6, in which the use of dynamic selection on both ensemble members and combination methods provides the most robust classifier ensembles. When using dynamicity in the selection of only one parameter, the dynamic selection of ensemble members provided the best results. Finally, the static selection delivered the worst results, as detected by the statistical test.

Fig. 5. CD Diagram for Full Dynamic Ensemble.

6 Final Remarks

This paper proposed an exploratory analysis of the dynamic selection of the most important parameters of a classifier ensemble. In order to perform this analysis, three dynamic selection scenarios are defined, which are: FSES (Full Static Ensemble Selection); PDES (Partial Dynamic Ensemble Selection) with two cases: P-DFS (Partial - Dynamic Fusion Selection) and P-DMS (Partial - Dynamic Member Selection); and FDES (Full Dynamic Ensemble Selection). The main aim is to assess the impact of dynamic selection in the performance of classifier ensembles.

Through this exploratory analysis, it can be observed that the use of dynamic selection on both ensemble members and combination methods provides the most robust classifier ensembles. When using dynamicity in the selection of one parameter, the dynamic selection of ensemble members provided the best results. Finally, the static selection delivered the worst results, detected by the statistical test.

As future work, it is necessary to expand this analysis using different dynamic selection approaches that are not based on the region of competence. A statistical analysis comparing methods across different strategies is also needed. Finally, this empirical study was limited to 20 classification datasets; it is important to extend the analysis to larger and more challenging datasets.