Binary Flying Squirrel Optimizer for Feature Selection

de Oliveira Sementille, Luiz Fernando Merli; Rodrigues, Douglas; de Souuza, André Nunes; Papa, João Paulo

doi:10.1007/978-3-031-45392-2_4

Luiz Fernando Merli de Oliveira Sementille⁹,
Douglas Rodrigues⁹,
André Nunes de Souuza⁹ &
…
João Paulo Papa⁹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14197))

Included in the following conference series:

Brazilian Conference on Intelligent Systems

532 Accesses

Abstract

Bio-inspired optimization algorithms aim to address the most diverse problems without the need for derivatives, and they are independent of the shape of the search space. The Flying Squirrel Optimizer belongs to the family of bio-inspired algorithms and simulates the movement of flying squirrels from tree to tree in search of food. This paper proposes a binary version of the flying squirrel optimizer for feature selection problems. To elucidate the performance of the proposed algorithm, we employed six other well-known bio-inspired algorithms for comparison purposes in sixteen benchmark datasets widely known in the literature. Furthermore, we employ the binary flying squirrel optimizer in selecting gas concentrations to identify faults in power transformers. The results expressed that Binary Flying Squirrell Optimizer can either find compact feature sets or improve classification effectiveness, corroborating its robustness.

Access provided by University of Notre Dame Hesburgh Library. Download conference paper PDF

A Differential Squirrel Search Algorithm

Eight Bio-inspired Algorithms Evaluated for Solving Optimization Problems

Squirrel Search Optimizer for Solving Economic Load Dispatch Problem

1 Introduction

Finding the most cost-efficient route for transporting goods, the ideal amount of raw material to manufacture a product, or even the best way to drill oil wells are just a few examples of decision-making many companies face. In these cases and several other complex problems, mathematical optimization can be used to find feasible and optimal solutions.

Through mathematical models, optimization aims for the optimal solution among the feasible candidates in a search space guided by the objective function and the problem’s constraints. Given the nature of the objective function, classical mathematical optimization algorithms can be inefficient due to the function’s dependence on being continuous and differentiable in the search space. Furthermore, classical algorithms may lose performance in cases where the function is multimodal.

Bio- or nature-inspired algorithms emerge as an elegant alternative for solving complex optimization problems, mitigating the disadvantages of classical algorithms. Through the idea that natural processes are essentially optimal, many algorithms were created following metaphors of nature, such as Particle Swarm Optimization [10], Bat Algorithm [19], Flower Pollination Algorithm [18], Whale Optimization Algorithm [13], Butterfly Optimization Algorithm [2], and Jellyfish Search [4], among others. This group of bio-inspired algorithms offers near-optimal solutions in a reasonable time, an alternative to solving NP-hard combinatorial problems.

Artificial intelligence has benefited from using bio-inspired algorithms to optimize neural networks, support vector machines, or optimum-path forest models, among many others. In addition to hyperparameter adjustments to minimize misclassification, the dimensionality of datasets can negatively affect the classifier’s performance during training, increasing the computational burden and even decreasing accuracy.

One way to solve this problem is through dimensionality reduction techniques, i.e., selecting the most relevant features for the classification task. In other words, we want to remove irrelevant and redundant features to reduce the computational cost required during training and maximize the classifier’s hit rate. Over the last few years, bio-inspired algorithms have successfully addressed feature selection problems, for they can obtain good solutions in a reasonable time, even if the problem is complex.

The literature is vast and with promising results. Nakamura et al. [14] proposed the Binary Bat Algorithm (BBA) for feature selection. Although BA is effective for such a purpose, it lacks efficiency compared to other metaheuristics. Rodrigues et al. [16] introduced the Binary Cuckoo Search (BCS) to the same context, and Rodrigues et al. [16] designed a binary-valued Flower Pollination Algorithm (BFPA) to select the most critical sensors to person identification using electroencephalogram signals.

This paper proposes a binary version of the Flying Squirrel Optimizer (FSO) for feature selection, called BFSO. The FSO was proposed by Azizyan et al. [3] inspired by flying squirrels moving from one tree to another in search of food. This work’s main contributions lie in using a wrapper-based approach employing BFSO and the Naive Bayes classifier for the feature selection task. However, any other supervised classifier can be used. We considered 16 benchmark datasets to evaluate the proposed approach’s performance. The performance of the BFSO was also validated to select relevant gas concentrations to identify faults in power transformers.

In short, this paper figures the following contributions:

To propose the Binary Flying Squirrel Optimizer;
To evaluate the proposed approach for feature selection; and
To employ BFSO for fault diagnosis in power transformers using gas concentration.

The remainder of this paper is organized as follows. Section 2 presents the theoretical background concerning the flying squirrel optimizer, while Sects. 3 and 4 discuss the methodology and experimental results, respectively. Section 5 states conclusions and future works.

2 Flying Squirrel Optimizer

Flying Squirrel Optimizer [3] is a flying squirrel swarm-based algorithm whose population is given by $\mathcal{X} = \{\textbf{x}_1,\textbf{x}_2 \ldots , \textbf{x}_m \}$, where m is the number of squirrels and $\textbf{x}_i\in \Re ^n$ denotes a single possible solution. The proposed model follows two movements squirrels perform in their pursuit of food. The first simulates the awkward gait of squirrels when they are on the ground. This motion is modeled as random walks following a normal distribution $\textbf{r} \sim \mathcal {N}(\mu ,\,\sigma ^{2})$ in which $\mu $ is the mean position of all flying squirrels and $\sigma $ is given as follows:

$$\begin{aligned} \sigma = -\ln \left( 1 - \frac{1}{\sqrt{t + 2}} \right) ^2, \end{aligned}$$

(1)

where t corresponds to the current iteration.

In the second movement, the squirrels fly from one tree to another through a Lévy flight distribution, as follows:

$$\begin{aligned} L(\lambda , s, \alpha ) = \frac{\lambda \cdot \varGamma (\lambda )\cdot \text{ sin }(\lambda )}{\pi }\cdot \frac{\alpha }{s ^{1+\lambda }},\quad |s|\rightarrow \infty , \end{aligned}$$

(2)

where $\varGamma (\lambda )$ stands for the gamma function with index $\lambda $, $\alpha $ is a control parameter for the tail distribution ($\alpha = 1$), and s is the step size. According to Mantegna [12], for large steps $s \gg s_0 > 0$, s can be computed through a linear transformation as follows:

$$\begin{aligned} s = \frac{\psi }{\eta ^{\lambda -1}}, \end{aligned}$$

(3)

where $\psi $ and $\eta $ are drawn from a Gaussian distribution with zero mean and standard deviations $\sigma _\psi $ and $\sigma _\eta $ computed as follows

$$\begin{aligned} \sigma _\psi = \left[ \frac{\varGamma (1+\lambda )}{\lambda \varGamma ((1+\lambda )/2)}\frac{\sin (\frac{\pi \lambda }{2})}{2^{(\lambda -1)/2}}\right] , \sigma _\eta = 1. \end{aligned}$$

(4)

In this algorithm, $\lambda $ grows linearly at each iteration, increasing the length of steps of the Lévy flight. Moreover, $\lambda $ is computed as follows:

$$\begin{aligned} \lambda = \beta + (2 - \beta ) * ((t + 1) / T), \end{aligned}$$

(5)

in which $\beta $ is a user-configured parameter, and T corresponds to the maximum number of iterations. The position of each squirrel is updated as follows:

$$\begin{aligned} \textbf{x}_i(t+1) = \textbf{x}_i(t) + L(\lambda ,s,\alpha ) \cdot \textbf{r} \otimes (\textbf{x}_i(t) - \textbf{x}_{best}), \end{aligned}$$

(6)

where $\textbf{x}_{best}$ is the best solution found so far, and $\otimes $ denotes the pointwise multiplication.

2.1 Algorithmic Analysis

Algorithm 1 describes how FSO works in more detail. Lines 1–4 initialize each possible solution (squirrel) with random values uniformly distributed within [0, 1]. The fitness value of each solution is set to a great value in Line 4. This loop takes $\theta (mn)$ operations. The main loop in Lines 6–24 controls the FSO mechanism, taking $\theta (T)$ steps. As mentioned earlier, squirrels follow two main movements. The parameter $\sigma $ of the first (Eq. 1) is computed in Line 7, and variable $\lambda $, which concerns the second movement, is obtained in Line 8.

The inner loop in Lines 10–19 computes a new position for each squirrel (Lines 13–14), evaluates its fitness value (Line 15), and updates its position if it has a better solution (Lines 16–19). These steps take $\theta (mn)$ calculations. Last but not least, Lines 20–24 update the best solution whithin the swarm. This loop takes O(mn) operations. The overall complexity is given by $\theta (Tmno)$ operations in which $\theta (o)$ represents the complexity to compute the fitness function.

3 Methodology

Table 1 presents the datasets used to evaluate the robustness of BFSO in the context of feature selection. Note that for this work, we used publicly available datasets in the UCI repository^{Footnote 1} and six datasets, i.e., the last six items presented in Table 1, which will be used to validate the BFSO in the context of identifying faults in power transformers. The data used to form these six data sets were obtained from IEC TC10 [1] and scientific papers [5]. One can observe the different scenarios, i.e., datasets with varying size, from different domains, and with a wide range concerning the number of features. Each dataset was split into training, validation, and testing sets using a 2 : 1 : 1 ratio. The pipeline of the proposed approach is depicted in Fig. 1, in which Fig. 1a ilustrates the optimization process to find the best subswt of features and Fig. 1b ilustrates the final evaluation step, where the selected features are employed to classify the test subset.

Table 1. Datasets used in the comparative study.

Full size table

At each iteration, the agent’s position in the search space is updated, and the fitness value is computed by evaluating the objective function. Given that agents move through the search space with real values, we propose to use a transfer function to convert real to binary values before evaluating the objective function. The transfer function is given as follows:

$$\begin{aligned} \textbf{x}_i & = & \left\{ \begin{array}{ll} 1\quad &{} \text{ if } T(\textbf{x}_i)>\phi \text{, } \\ 0&{} \text{ otherwise } \end{array}\right. \end{aligned}$$

(7)

in which $\textbf{x}_{i}$ represents the i-th agent, $\phi \sim \mathcal {U}(0, 1)$, and $T(\cdot )$ stands for a transfer function described as follows:

$$\begin{aligned} T(\textbf{z}) = \frac{1}{1 + e^{-\textbf{z}}}. \end{aligned}$$

(8)

After applying the transfer function, a binary vector will originate a new subset, selecting only the representative characteristics of the original set. This process is accomplished by multiplying the binary vector that represents the agent’s position by the feature vector of the data set, i.e., the position of the vector that has the value 1 indicates that the characteristic corresponding to that position in the feature vector will be selected to compose the new subset. On the other hand, a 0 value indicates the feature will not be selected.

Therefore, with a new training and validation subsets in hand, the classifier is trained on the new training set and classifies the samples of the new validation set. The classifier’s hit rate is a fitness value to guide the optimization process. The feature subset that achieves the highest accuracy in this process will be stored and used later to train the classifier once again and classify the test subset to determine the real accuracy of the model. We used Naïve Bayes classifier in this paper, for it is parameterless and fast for training^{Footnote 2}.

Regarding FSO running parameters, we used the same recommended by Azizyan et al. [3], i.e., $\beta =0.5$, and the number of agents and iterations were set to 30 and 60, respectively. Once metaheuristics are stochastic processes, we performed 20 independent runs to allow the calculation of the mean and standard deviation. Besides, different seeds were generated for each run so that distinct samples constituted the division of training, validation, and test subsets.

To provide robust analysis, we chose five widely known bio-inspired algorithms^{Footnote 3} for comparison purposes with the BFSO:

Binary Aquila Optimizer (BAO) [11];
Binary Bat Algorithm (BBA) [14];
Binary Firefly Algorithm (BFA) [6];
Binary Flower Pollination Algorithm (BFPA) [16]; and
Binary Particle Swarm Optimization (BPSO) [9].

It is worth noting that the values of the hyperparameters of each algorithm are the same proposed by their respective authors.

4 Experimental Results

Table 2 presents the F1-score obtained using the methodology described in Sect. 3 with regard to the final evaluation, as illustrated in Fig. 1b. The values are the mean and standard deviation for each of the seventeen benchmark datasets. Note that the values highlighted in bold represent the best results for each dataset. BFSO achieved the highest mean values in twelve datasets, followed by BFA, which obtained the highest mean values in Arcene and BASEHOCK datasets. BAO obtained the highest average value only in the Wine set. The highest mean value in the PCMAC set was obtained in the original set, i.e., without feature selection, indicated in the table using the Naïve Bayes (NB) classifier.

Table 2. Results concerning the test set over all datasets.

Full size table

Figure 2 illustrates the average number of features selected for the Arcene, BASEHOCK, Coil20, ORL, Segment, and Spambase datasets. BFSO selected the lowest number of features in the Spambase set compared to the other bio-inspired algorithms and was the second-best technique in the ORL and Coil20 datasets, second only to BAO. Furthermore, it is worth the caveat of the size of the standard deviation of the BAO in these datasets. In the Arcene and BASEHOCK datasets, BFSO was not among the best feature selection algorithms.

Moreover, we performed the Friedman test [7, 8] with a significance of 0.05 ($5\%$) to assess whether the average F1-score obtained using each bio-inspired algorithm is similar, i.e., null hypothesis $H_0$ for this test. Then we employed the Nemenyi post-hoc test [15]. Figure 3c illustrates the test performed for the Coil20 dataset, where it can be seen that BFSO performed similarly to BFPA. In the Segment set, illustrated in Fig. 3e, we notice the similarity in the performance of BFSO with BFPA and BAO. Finally, in the Spambase set, illustrated in Fig. 3f, note the similarity between BFSO, BFPA, BAO, and BPSO.

Additionally, the Wilcoxon Signed-Rank test [17] with a significance of 0.05 ($5\%$) was performed to validate the data obtained more robustly. Table 3 presents the test result considering F1-score metric obtained during the 20 independent runs. The symbol $=$ indicates that the result obtained after feature selection using the bio-inspired algorithms performed statistically similar to that obtained using the original dataset, i.e., we accept the null hypothesis $H_0$. In contrast, the symbol $\ne $ indicates that the results were statistically different, i.e., we rejected the null hypothesis. One may observe that BFSO performed statistically similar to the NB only in the Caltech101 set. Based on the BFSO results shown in Table 2 together with the Wilcoxon test obtained, BFSO obtained the best result by significantly increasing the classifier’s hit rate when removing degrading features.

Table 3. Results concerning the Wilcoxon Signed-Rank test set over all datasets.

Full size table

Following, we tested the performance of bio-inspired algorithms in Dissolved Gas Analysis (DGA) datasets for the task of gas concentration selection for fault identification in power transformers. Table 4 presents the mean values and the standard deviation for each of the six datasets. Also, note that the top performers are highlighted in bold. NB obtained the highest average values in the datasets 1069_5gt, 1069_7gt, 1143_5gte, and 1143_7gte. BBA and BFA obtained the highest mean values in the 1086_5ge and 1086_7ge datasets, respectively.

Figure 4 illustrates the mean of the selected features for the datasets 1069_5gt, 1069_7gt, 1086_5ge, 1086_7g, and 1143_5gte, and 1143_7gte. It is important to note that the bio-inspired algorithms selected, on average, half of the features of the original set, i.e., even if there is a little loss in the hit rate, the computational cost for training can compensate in these cases, as well as the cost for feature extraction.

Table 5 present the Wilcoxon Signed-Rank test on DGA datasets. All bio-inspired algorithms performed similarly to NB statistically except in the 1143_5gte set and the BPSO in the 1143_7gte set. The difference in the hit rate obtained by the NB to the bio-inspired algorithms was insignificant, which further emphasizes the advantage of training in reduced datasets.

Table 4. Average F1-score values considering the comparison among the bio-inspired algorithms on DGA datasets.

Full size table

Table 5. Wilcoxon Signed-Rank test for the bio-inspired algorithms in comparison to Naíve Bayes classifier on DGA datasets.

Full size table

5 Conclusions

In this work, we propose the binary version of the Flying Squirrel optimizer. Considering the feature selection task, we validated its robustness and performance on sixteen benchmark datasets. Next, we employ the BFSO to select gaseous concentrations for fault identification in power transformers. For BFSO performance comparison purposes, we used the bioinspired algorithms BAO, BBA, BCS, BFA, BFPA, and BPSO.

The results showed that the BFSO can greatly reduce the set of features in all used datasets. Furthermore, in some cases, it achieved better predictive performance than the other bioinspired algorithms used for comparison.

Thus, the performance of the BFSO demonstrated in the feature selection task makes it a viable tool for its performance and low complexity. For future work, one idea would be to use chaotic maps for initializing the flying squirrels and employ opposition-based learning further to improve the exploration and exploitation of the algorithm.

Notes

1.
https://archive.ics.uci.edu/.
2.
It is worthy to say that any other supervised classifier can be used. We recommend models that figure a reasonably efficient training step, for the fitness function might be evaluated several times during the optimization process.
3.
The algorithms used for comparison purposes and FSO are part of Opytimizer library, which contains several implementations of metaheuristics in Python. The Opytimizer library is available in: https://github.com/gugarosa/opytimizer.

References

IEC 60599:2022 Mineral oil-filled electrical equipment in service - Guidance on the interpretation of dissolved and free gases analysis. IEC, Geneva, Switzerland, 4 edn. (2022)
Google Scholar
Arora, S., Singh, S.: Butterfly optimization algorithm: a novel approach for global optimization. Soft. Comput. 23(3), 715–734 (2019). https://doi.org/10.1007/s00500-018-3102-4
Article Google Scholar
Azizyan, G., Miarnaeimi, F., Rashki, M., Shabakhty, N.: Flying squirrel optimizer (FSO): A novel SI-based optimization algorithm for engineering problems. Iranian J. Optimiz. 11(2), 177–205 (2019)
Google Scholar
Chou, J.S., Truong, D.N.: A novel metaheuristic optimizer inspired by behavior of jellyfish in ocean. Appl. Math. Comput. 389, 125535 (2021)
MathSciNet MATH Google Scholar
Equbal, M.D., Khan, S.A., Islam, T.: Transformer incipient fault diagnosis on the basis of energy-weighted dga using an artificial neural network. Turk. J. Electr. Eng. Comput. Sci. 26(1), 77–88 (2018)
Article Google Scholar
Falcón, R., Almeida, M., Nayak, A.: Fault identification with binary adaptive fireflies in parallel and distributed systems. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1359–1366. IEEE (2011)
Google Scholar
Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
Article MATH Google Scholar
Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 11(1), 86–92 (1940)
Article MathSciNet MATH Google Scholar
Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4104–4108 (1997)
Google Scholar
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN 1995-International Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE (1995)
Google Scholar
Li, L., Pan, J.S., Zhuang, Z., Chu, S.C.: A novel feature selection algorithm based on aquila optimizer for covid-19 classification. In: Shi, Z., Zucker, J.D., An, B. (eds.) Intelligent Information Processing XI, pp. 30–41. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-03948-5_3
Chapter Google Scholar
Mantegna, R.N.: Fast, accurate algorithm for numerical simulation of lévy stable stochastic processes. Phys. Rev. E 49, 4677–4683 (1994)
Article Google Scholar
Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016). https://doi.org/10.1016/j.advengsoft.2016.01.008
Article Google Scholar
Nakamura, R.Y.M., Pereira, L.A.M., Costa, K.A., Rodrigues, D., Papa, J.P., Yang, X.S.: BBA: a binary bat algorithm for feature selection. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 291–297 (2012)
Google Scholar
Nemenyi, P.: Distribution-free Multiple Comparisons. Princeton University (1963)
Google Scholar
Rodrigues, D., et al.: BCS: a binary cuckoo search algorithm for feature selection. In: IEEE International Symposium on Circuits and Systems, pp. 465–468 (2013)
Google Scholar
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)
Article Google Scholar
Yang, X.S.: Flower pollination algorithm for global optimization. In: International conference on Unconventional Computing and Natural Computation, pp. 240–249. Springer (2012). https://doi.org/10.1007/978-3-031-03948-5_3
Yang, X.S., Gandomi, A.H.: Bat algorithm: a novel approach for global engineering optimization. Eng. Comput. (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

São Paulo State University, Bauru, SP, Brazil
Luiz Fernando Merli de Oliveira Sementille, Douglas Rodrigues, André Nunes de Souuza & João Paulo Papa

Authors

Luiz Fernando Merli de Oliveira Sementille
View author publications
Search author on:PubMed Google Scholar
Douglas Rodrigues
View author publications
Search author on:PubMed Google Scholar
André Nunes de Souuza
View author publications
Search author on:PubMed Google Scholar
João Paulo Papa
View author publications
Search author on:PubMed Google Scholar

Corresponding author

Correspondence to Douglas Rodrigues .

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, Brazil
Murilo C. Naldi
Centro Universitario da FEI, São Bernardo do Campo, Brazil
Reinaldo A. C. Bianchi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

de Oliveira Sementille, L.F.M., Rodrigues, D., de Souuza, A.N., Papa, J.P. (2023). Binary Flying Squirrel Optimizer for Feature Selection. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14197. Springer, Cham. https://doi.org/10.1007/978-3-031-45392-2_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-45392-2_4
Published: 12 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45391-5
Online ISBN: 978-3-031-45392-2
eBook Packages: Computer ScienceComputer Science (R0)

Keywords

Publish with us

Policies and ethics

Binary Flying Squirrel Optimizer for Feature Selection

Abstract

Similar content being viewed by others

A Differential Squirrel Search Algorithm

Eight Bio-inspired Algorithms Evaluated for Solving Optimization Problems

Squirrel Search Optimizer for Solving Economic Load Dispatch Problem

Explore related subjects

1 Introduction

2 Flying Squirrel Optimizer

2.1 Algorithmic Analysis

3 Methodology

4 Experimental Results

5 Conclusions

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Keywords

Publish with us