title: A hybrid feature selection model based on butterfly optimization algorithm: COVID-19 as a case study
authors: EL-Hasnony, Ibrahim M.; Elhoseny, Mohamed; Tarek, Zahraa
date: 2021-07-29
journal: Expert Syst
DOI: 10.1111/exsy.12786

The need for a novel feature selection (FS) approach is motivated by the demand for robust FS systems, the time-consuming exhaustive search of traditional methods, and the favourable swarming behaviour of various optimization techniques. In many problems, datasets are high-dimensional even though not all features are relevant, which reduces an algorithm's accuracy and efficiency. This article presents a hybrid feature selection approach that addresses the low precision and slow convergence of the butterfly optimization algorithm (BOA). The proposed method combines BOA with particle swarm optimization (PSO) as a search methodology within a wrapper framework. In the proposed approach, BOA is initialized with a one-dimensional cubic map, and a non-linear parameter control technique is also implemented. To boost the basic BOA for global optimization, the PSO algorithm is hybridized with the butterfly optimization algorithm (BOAPSO). The proposed BOAPSO is evaluated on 25 datasets using three metrics: classification precision, the number of selected features, and computational time. A COVID-19 dataset has also been used to evaluate the proposed approach. Compared to previous approaches, the findings show the supremacy of BOAPSO in enhancing precision and minimizing the number of chosen features. Regarding accuracy, the experimental outcomes demonstrate that the proposed model converges rapidly and outperforms PSO, BOA, and GWO, with average accuracies of 91.07%, 87.2%, 87.8%, and 87.3%, respectively. Moreover, the proposed model selects 5.7 features on average, compared to 22.5, 18.05, and 23.1 for PSO, BOA, and GWO, respectively.

The feature selection procedure consists of two main steps: a search strategy followed by an evaluation of subset quality. The search strategy selects candidate subsets of features in the first stage; the second stage assesses the quality of each subset produced by the search strategy module using a classifier. Feature selection strategies fall into three classes: wrapper, filter, and embedded techniques. The exhaustive search used in traditional methods is impractical for large datasets and takes a long time, which limits the search for the best combination of features. For example, if the number of features is $d$, it is hard to pick the required subset out of $2^d$ alternatives. The wrapper method for FS requires an internal classifier to identify a more relevant subset of features, which impacts its efficiency, particularly for massive datasets. There are also backward and forward strategies to incorporate or eliminate features that do not meet a broader range of specifications. Owing to these issues, the efficiency of FS processes is improved through metaheuristic algorithms (MA). FS is an optimization problem that aims to improve classification precision and reduce the number of features simultaneously. Metaheuristic methods are therefore promising search alternatives for FS algorithms.
These approaches are widely used to overcome various optimization problems (Faris et al., 2020). There is thus great potential for similar performance when a near-optimal subset of features is found. Recently, MA have mainly been designed to mimic the collective behaviour of organisms, and they have brought significant progress to several fields of optimization. MA can be the better choice because they can attain the best outcomes in a reasonable time, and they are often a suitable alternative for reducing time-consuming search constraints. Conversely, many MA suffer from a high degree of locality, a lack of diversity, and an imbalance between exploration and exploitation. MA are divided into two groups, namely single-solution-based and population-based metaheuristics. Evolutionary algorithms (EAs) are a type of population-based metaheuristic. Within EAs, feature selection begins with a thorough search over the feature subsets and uses a specific evaluation criterion to discover the most attractive subset among the major possible candidates. If the feature set includes $n$ features, an examination of feature subsets using an efficient feature selection procedure is needed to decide the best subset. Since evolutionary computing offers a global search option, it is used to solve these problems with good outcomes and is an alternative to classical search methods. Feature selection has been widely addressed with particle swarm optimization (PSO), genetic algorithms (GA), genetic programming (GP), and ant colony optimization (ACO). MA are suitable for a wide range of applications, including FS. Some classic methods, such as GA, PSO, and differential evolution (DE), have been used to efficiently solve the FS problem. Moreover, modern MA such as the competitive swarm optimizer (CSO), the grasshopper optimization algorithm (GOA), the gravitational search algorithm (GSA), and others have also been employed for FS. Although FS can be viewed as an optimization problem, no single MA can manage all FS difficulties. This follows from the no-free-lunch (NFL) theorem; consequently, the exploration of new alternative MA must continue (Yousri et al., 2020). Many researchers have also tried stochastic methodologies to solve feature selection problems, such as PSO, GA (Kabir et al., 2011; Bello et al., 2007), the artificial bee colony (ABC) (Wang et al., 2010), and simulated annealing (SA) (Jensen & Shen, 2004). The dragonfly algorithm (DA) (Tawhid & Dsouza, 2018) and the grey wolf optimizer (GWO) (Emary et al., 2016) are among the latest algorithms efficiently utilized to solve feature selection problems. BOA, a recently developed optimization algorithm, has attracted researchers' enthusiasm because of its reliability, simplicity, and robustness in addressing real-world and engineering problems. To solve global optimization problems, BOA mimics the food search and mating behaviour of butterflies. Compared with other optimization algorithms, BOA shows excellent efficiency (Arora & Singh, 2019). This population-based metaheuristic can avoid local optima stagnation to some extent and converges well towards the optimum. Arora and Singh (2017) utilized BOA to fix node locations in wireless sensor networks and compared the results with the firefly algorithm (FA) and PSO. Singh and Anand (2018) suggested a new adaptive butterfly optimization algorithm that adjusts the original BOA's sensory modality.
This paper's significant contributions are summarized in five folds. Firstly, a binary version of a new hybrid model (BOAPSO) is proposed for feature selection. The proposed hybrid model combines the functionality of BOA and PSO for exploration and exploitation capabilities, respectively. With its exploration capability over the search area, BOA has better global convergence capacity than other optimization algorithms, while PSO empowers BOA by preserving the search agents' experience. Secondly, the proposed BOAPSO is transformed into a binary version using the sigmoid transfer function, which has yielded many enhancements according to the literature. The binary version of the proposed BOAPSO is utilized to select feature subsets using a wrapper framework with the K-nearest neighbour (KNN) classifier for the evaluation process. To evaluate the proposed binary BOAPSO, the model is applied to 25 standard feature subset selection datasets from the UCI machine learning repository and a COVID-19 dataset. The proposed model achieves better results according to three performance metrics: classification accuracy, selected feature set size, and computational time. The proposed binary BOAPSO is compared to GWO, PSO, and BOA, and the outcomes demonstrate its supremacy. Thirdly, MA-based feature selection algorithms begin with an initial random population, and initialization techniques depend on randomness or compositionality. The cubic map is used in this work because it is one of the popular maps for chaotic sequence generation in many applications. Chaotic movement is characterized by randomness, regularity, and ergodicity. These features prevent the algorithm from becoming locked in a local optimum when solving feature optimization problems, sustain population diversity, and enhance global search capabilities. Fourthly, a nonlinear parameter control approach is used in the proposed model's position-updating process, since linear parameters do not reflect the nature of the optimization process during convergence to the optimal solution. Lastly, the proposed BOAPSO is compared to recent works on most of the utilized datasets, and its supremacy is confirmed in terms of classification precision, chosen features, and computing time. The main contributions of this paper can be outlined as follows:
1. A new hybrid metaheuristic algorithm (BOAPSO) based on BOA and PSO.
2. The cubic map is used for initial population generation.
3. Nonlinear parameters are utilized instead of the linear parameters in the native BOA.
4. The proposed binary BOAPSO is evaluated on 25 datasets, confirming its supremacy compared to PSO, GWO, BOA, and some of the most recent related works.
5. The proposed BOAPSO is applied to the COVID-19 dataset.
The remainder of the paper is arranged as follows: Section 2 introduces some of the previous works. Section 3 provides a background on the main concepts of the paper. Section 4 explains the proposed binary BOAPSO in detail, and Section 5 illustrates the outcomes and different comparisons. Section 5.4 presents some future research directions. Finally, the future work and conclusions are discussed in Section 6.
Because of its importance, many studies have tried to enhance the feature selection process. Arora and Anand (2019) introduced binary variants of BOA to pick the optimum feature subset for classification in a wrapper procedure.
The suggested binary algorithms were compared over 21 datasets from the UCI repository against four high-performance optimization algorithms and five other approaches. Tubishat et al. (2020) suggested the dynamic butterfly optimization algorithm (DBOA) as an enhanced version for feature selection issues. Two significant changes were made to the basic BOA: introducing a local search algorithm based on mutation (LSAM) to prevent local optima problems, and using LSAM to increase the diversity of BOA solutions. Twenty UCI repository benchmark datasets were included, and the experiments showed that DBOA significantly outperforms comparable algorithms. Rodrigues et al. (2020) suggested single- and multi-objective binary variants of artificial butterfly optimization for feature selection. The trials were performed on eight common databases. The findings revealed that the binary single-objective variant is superior to the other metaheuristic approaches, with a minimum number of chosen features; regarding multi-objective feature selection, both suggested methods performed better than their single-objective metaheuristic equivalents. Abualigah et al. (2018) presented a strategy for selecting features using the PSO algorithm (FSPSOTC) to address feature selection by generating a new subgroup of informative features. Experiments were performed using six standard text datasets with a variety of features. The findings demonstrated that the suggested approach enhanced the text clustering strategy by identifying a new subgroup of descriptive textual features. Yong Zhang et al. (2019) performed a feature selection process based on an unsupervised PSO algorithm, named the filter-based bare-bones particle swarm optimization algorithm (FBPSO). Two filter-based techniques were suggested to improve the algorithm's convergence: the first was a space-reduction method based on average mutual information, and the second was a local filter search approach for feature redundancy. Experimental findings on standard datasets demonstrated the supremacy and efficacy of the presented FBPSO. Qasim and Algamal (2018) suggested PSO combined with the logistic regression method; in addition, a fitness function based on the Bayesian information criterion (BIC) was suggested. Experimental findings on various datasets show the utility of the proposed approach in dramatically boosting classification efficiency with few features. Furthermore, the outcomes confirmed that the recommended strategies had competitive efficiency relative to other known fitness functions. Too et al. (2019) addressed the problem of feature selection for electromyography (EMG) signal categorization; a personal-best-guided binary particle swarm optimization (PBPSO) was suggested for solving this issue. Sadeghian et al. (2021) suggested a binary butterfly optimization algorithm based on information gain (IG-bBOA) to overcome the constraints of the S-shaped binary butterfly optimization algorithm (S-bBOA). The outcomes were based on six routine UCI repository datasets. The results demonstrated the efficacy of the suggested approach in enhancing classification precision and choosing the optimal feature subset with minimal features in most situations. Li et al. (2019) developed BOA further by incorporating the cross-entropy (CE) approach into the initial algorithm.
The suggested solution's efficiency was assessed on 19 common benchmark functions and three widespread engineering design problems. The test function results indicated the supremacy of the proposed algorithm, as it could deliver promising results in local optima avoidance, enhanced exploration, and controlled exploitation. Abualigah and Khader (2017) suggested a PSO algorithm with genetic operators for the FS problem; the k-means clustering approach was used to determine the utility of the obtained feature subsets. The results were obtained by analysing eight standard text datasets with varying features. Ibrahim et al. (2019) suggested a hybrid optimization approach for the feature selection issue, coupling the salp swarm algorithm with particle swarm optimization (SSAPSO). To test the efficacy of the proposed algorithm, it was examined across two experimental ranges: firstly, it was compared with other related methods; secondly, SSAPSO was utilized to find the optimal feature set on separate UCI benchmark datasets. Tawhid and Dsouza (2018) suggested a hybrid binary dragonfly and enhanced particle swarm optimization (HBDESPO) algorithm for handling the feature selection issue. According to the NFL theorem (Wolpert & Macready, 1997), there is no single algorithm suitable for all forms of FS problems: an algorithm's success in solving a specific feature selection problem does not ensure comparable results when applied to other FS issues. From this view, there are several possibilities for developing more efficient FS systems by introducing novel algorithms or developing derivatives of existing ones. The principles used in this article include the feature selection procedure, the particle swarm optimization algorithm, the butterfly optimization algorithm, and a comparison between BOA and different MA, which are covered in detail in the following subsections.
Feature selection is among the most common methods proposed in machine learning. It aims to eliminate redundant features and choose the most appropriate features from among the original features to enhance the effectiveness of learning algorithms. Feature selection and feature construction are two of the most important tasks in machine learning (ML); both are generally very time-consuming and complex, as the characteristics need to be manually designed. Attributes are aggregated, merged, or separated to generate features from raw data (Moslehi & Haeri, 2020). It is typically challenging, in terms of computing costs, to perform a comprehensive search to locate the best features. The reduction of features has therefore been a significant problem in machine learning and pattern recognition. This technique enjoys great attention in several applications, including regression and classification, since these applications typically involve many features, most of which decrease precision or make the model inefficient. Deleting these features reduces computational complexity and increases accuracy (Jović et al., 2015). Feature selection techniques aim to find the most useful subset among the $2^N$ possible subsets of $N$ features. In each approach, a subset is chosen as the answer such that the evaluation mechanism can be refined depending on the application and form of description.
While each method attempts to identify the most critical features, in terms of the extent of potential answers, seeking an optimal solution is challenging and relatively expensive on medium and large datasets. Three key types, namely wrapper, filter, and embedded versions, can be used to classify feature selection methods, as seen in Figure 1. In the filter process, a statistical methodology is employed to evaluate the feature set (Moradi & Gholampour, 2016). Filter modes assess and pick the significant features utilizing a rating system that eliminates unnecessary features. Filter approaches have been demonstrated to be rapid, scalable, computationally simple, and independent of the classifier. These methods are classified into two types: multivariate and univariate filter methods. Wrapper approaches rely on a particular machine learning algorithm when selecting features. The chosen feature subset is used to train the learner directly in the screening process, and the merit of the feature subset is determined from the learner's results on the test collection. The approach is not as fast as the filter approaches, but the chosen feature subset is comparatively small. In this approach, a generation procedure develops each new feature subset, and the search process determines this output. In general, the wrapper method is more effective than the filter approach, but it is more computationally complicated (Tang et al., 2014). What differentiates most feature selection methods is the manner of interaction between learning and feature selection. Filter methods do not integrate learning. Wrapper approaches use a machine learning model to measure the quality of feature subsets without incorporating knowledge of the specific structure of the classification or regression function, and can thus be combined with any learning machine. Unlike filter and wrapper methods, embedded approaches do not separate learning from feature selection; the class structure of the functions plays a fundamental role (Lal et al., 2006).
The concept of the particle swarm optimization (PSO) algorithm is derived from the social foraging behaviour of certain species, such as the schooling behaviour of fish and the flocking behaviour of birds. The PSO algorithm is made up of particles; each particle has its own velocity and position. The objective function is evaluated after each position update. Particle clusters converge over time around single or multiple optima, using a mixture of known locations in the search space (Brownlee, 2011). PSO is a stochastic approach that improves a problem by iteratively trying to enhance a candidate solution with respect to a given quality metric (Golbon-Haghighi et al., 2018). PSO has many similarities with evolutionary programming methods such as genetic algorithms. PSO's main strength is its fast convergence, distinguishing it from global optimization algorithms such as simulated annealing, genetic algorithms, and other optimization methods (Umarani & Selvi, 2010). The simplest version of the PSO algorithm operates by providing a population, or swarm, of candidate solutions (named particles). PSO improves the problem by generating a population of particles and moving them around the search space using simple mathematical formulas to calculate each particle's location and velocity. Each particle's movement is guided by its local best-known position.
Each particle is also driven towards the best-known positions in the search space, which are updated as better positions are found by other particles. This is expected to move the population towards good solutions to the assigned problem (Yudong Zhang et al., 2015). Particle motion relies on the local best and global best in each iteration: each particle has its own local best (the best location obtained by that particle), and the swarm has a global best (the best position among all local bests) (Mathiyalagan et al., 2010). The parameters of the optimization method are presented in Table 1.

TABLE 1 Parameters of the PSO algorithm
Parameter    Denotation
X_i^k        Current position of particle i at iteration k
X_i^{k+1}    Position of particle i at iteration k + 1
V_i^k        Velocity of particle i at iteration k
V_i^{k+1}    Velocity of particle i at iteration k + 1
W            Inertia weight, between 0.9 and 0.1
c_j          Positive acceleration coefficients; j = 1, 2
rand_i       Random number between 0 and 1; i = 1, 2
pbest_i      Best position of particle i
gbest        Position of the best particle in the population

An n-dimensional vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ represents the location of the $i$th particle in the population. The n-dimensional vector $V_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ represents the velocity of that particle, and $P_i = (p_{i1}, p_{i2}, \ldots, p_{in})$ denotes the best position previously visited by the $i$th particle. The index $g$ denotes the best particle in the whole population. Equation (1) is used to update the velocity of the $i$th particle:

$$V_i^{k+1} = V_i^k + c_1 r_1 \left(pbest_i - X_i^k\right) + c_2 r_2 \left(gbest - X_i^k\right) \quad (1)$$

and the location of this particle is calculated using Equation (2):

$$X_i^{k+1} = X_i^k + V_i^{k+1} \quad (2)$$

for $i = 1, 2, \ldots, S$, where $S$ is the swarm's size and $c_1$ and $c_2$ are constant cognitive and social scaling factors. With the inertia weight $w$, Equation (3) for the velocity update becomes:

$$V_i^{k+1} = w V_i^k + c_1 r_1 \left(pbest_i - X_i^k\right) + c_2 r_2 \left(gbest - X_i^k\right) \quad (3)$$

The PSO algorithm considered in this paper follows (Sarangi & Thankchan, 2012). The pseudo-code of PSO is given in Algorithm 1.

Algorithm 1 PSO
Input: population size (S), particle positions (X), inertia weight (W), learning parameters {c1, c2}, solution dimension (d), and maximum number of iterations (Tmax).
Output: optimum solution (gbest).
1. Start
2. While t < Tmax
3.   Evaluate each particle's fitness
4.   For i = 1 : S
5.     Find pbest_i (the best value found so far by particle i)
6.     Find gbest (the overall best value)
7.   For j = 1 : d
8.     Update velocities using Equations (1) and (3)
9.     Update positions using Equation (2)
10. End while

There was no inertia weight in the initial PSO; it was added later by researchers to boost performance, and efficiency was further improved through various initialization methods. Researchers are still working on helping the global best particle escape local minima; for this reason, different mutation operators have been added to boost the efficiency of PSO (Imran et al., 2013). The flowchart of PSO is presented in Figure 2.
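As a concrete illustration of the update rules above, the following is a minimal Python sketch of the PSO loop of Equations (1)-(3) with the inertia-weight variant. It is an illustrative sketch, not the authors' code: the search bounds, the random seed, the parameter values, and the sphere objective in the usage line are assumptions.

```python
import numpy as np

def pso(objective, dim, swarm_size=30, w=0.9, c1=2.0, c2=2.0, t_max=100):
    """Minimal PSO loop following Equations (2) and (3)."""
    rng = np.random.default_rng(42)
    x = rng.uniform(-1.0, 1.0, (swarm_size, dim))    # particle positions X_i (assumed bounds)
    v = np.zeros((swarm_size, dim))                  # particle velocities V_i
    pbest = x.copy()                                 # best position of each particle
    pbest_fit = np.apply_along_axis(objective, 1, x)
    gbest = pbest[pbest_fit.argmin()].copy()         # best position in the swarm

    for _ in range(t_max):
        r1 = rng.random((swarm_size, dim))
        r2 = rng.random((swarm_size, dim))
        # Equation (3): velocity update with inertia weight w
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        # Equation (2): position update
        x = x + v
        fit = np.apply_along_axis(objective, 1, x)
        better = fit < pbest_fit                     # particles that improved their local best
        pbest[better], pbest_fit[better] = x[better], fit[better]
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest, pbest_fit.min()

# Usage: minimize the sphere function on a 5-dimensional search space.
best_pos, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=5)
```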
The butterfly optimization algorithm is inspired by the behaviour of butterflies in nature (Saccheri et al., 1998). Butterflies have senses such as sight, smell, touch, taste, and hearing, which they use to locate food and mating partners; these senses also help them hide from predators, move from one location to another, and lay eggs in suitable places. Smell is the most significant of these senses, allowing butterflies to locate food, usually nectar, often from a long distance. Figure 3 displays some images of butterflies. Nature-inspired MA have drawn a great deal of interest from numerous researchers in the past (Yang, 2010). In BOA, each butterfly emits a fragrance associated with its fitness, measured using the problem's objective function. This means that when a butterfly moves in the search space from one location to another, its fitness updates. The various butterflies in the neighbourhood can sense the fragrance produced by a butterfly. If a butterfly senses the fragrance of the best butterfly in the search space, it works its way towards it; this stage is referred to as the global search stage of BOA. In the second case, if a butterfly cannot identify another butterfly's scent in the search field, it takes random steps, referred to as the local search stage. The fragrance in BOA is formed as a function of the physical strength of the stimulus, as seen in Equation (4):

$$pf_i = c I^a \quad (4)$$

where $pf_i$ is the perceived magnitude of the fragrance, that is, how strongly other butterflies in the region perceive the fragrance of the $i$th butterfly; $c$ denotes the sensory modality; $I$ is the stimulus intensity; and $a$ is the power exponent that varies with modality and accounts for the degree of absorption. In BOA, the position of an artificial butterfly is modified during the optimization procedure, as shown in Equation (5):

$$x_i^{t+1} = x_i^t + F_i^t \quad (5)$$

where $x_i^t$ represents the solution vector of the $i$th butterfly at iteration $t$, and $F_i^t$ describes the fragrance-based step that the $i$th butterfly uses to update its location across iterations. The algorithm includes two key steps: local and global search. During the global search stage, the butterfly moves towards the best solution $g^*$, as illustrated in Equation (6):

$$x_i^{t+1} = x_i^t + \left(r^2 \times g^* - x_i^t\right) \times pf_i \quad (6)$$

where $g^*$ is the best solution among all solutions in the current iteration and $pf_i$ represents the $i$th butterfly's perceived fragrance. Equation (7) describes the local search phase:

$$x_i^{t+1} = x_i^t + \left(r^2 \times x_j^t - x_k^t\right) \times pf_i \quad (7)$$

where $x_j^t$ and $x_k^t$ are the solutions of the $j$th and $k$th butterflies from the same swarm, and $r$ is a random number in the range [0, 1], so Equation (7) amounts to a random local walk. BOA employs a switch probability $p$ to transition between global and local search. The pseudo-code of BOA is given in Algorithm 2.

Algorithm 2 The butterfly optimization algorithm
1. Initialize the population, switch probability p, sensory modality c, and power exponent a
2. For i = 1 : S
3.   For j = 1 : d
4.     Update the fragrance of the current search agent by Equation (4)
5.   End for
6. End for
7. Find the best solution f*
8. For i = 1 : S
9.   Set r as a random number in [0, 1]
10.  If r < p, move closer towards the best location by Equations (5) and (6)
11.  Else, move with random steps using Equations (5) and (7)
12. End for
13. Update the value of c and the value of a using Equations (11) and (12)
14. Repeat from step 2 until the stopping criterion is met
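The following Python sketch shows one schematic reading of the BOA loop of Equations (4)-(7). The search bounds, seed, and parameter values are illustrative assumptions, as is the use of the absolute fitness value as the stimulus intensity $I$; it is not the authors' implementation.

```python
import numpy as np

def boa(objective, dim, n=30, p=0.8, c=0.01, a=0.1, t_max=100):
    """Schematic BOA loop following Equations (4)-(7)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, (n, dim))             # butterfly positions (assumed bounds)
    fit = np.apply_along_axis(objective, 1, x)
    g_star = x[fit.argmin()].copy()                  # best butterfly g*

    for _ in range(t_max):
        # Equation (4): fragrance pf_i = c * I^a, taking |fitness| as I (assumption)
        pf = c * np.abs(fit) ** a
        for i in range(n):
            r = rng.random()
            if rng.random() < p:
                # Equations (5)-(6): global search towards g*
                x[i] += (r ** 2 * g_star - x[i]) * pf[i]
            else:
                # Equations (5) and (7): random local walk between two butterflies
                j, k = rng.integers(0, n, size=2)
                x[i] += (r ** 2 * x[j] - x[k]) * pf[i]
        fit = np.apply_along_axis(objective, 1, x)
        if fit.min() < objective(g_star):
            g_star = x[fit.argmin()].copy()
    return g_star
```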
This section provides the steps and the sequence of the proposed model in detail; the block diagram arranging the proposed model's processes is displayed in Figure 4. As seen in Figure 4, the model is first initialized with a set of parameters and random solutions using the cubic map sequence. The next step involves the objective function evaluation of the population initialized by the cubic map. Finally, the optimization process, or position updating, is performed for every candidate solution using the hybrid of the butterfly optimization algorithm and the particle swarm optimization algorithm (BOAPSO). The proposed hybrid combines the advantages of the three improvement strategies presented in this paper: the cubic map for the initial population, a nonlinear control strategy for the power exponent a, and the hybridization of the PSO and BOA algorithms. These steps are provided in Algorithm 3 and discussed in detail in the following subsections.

FIGURE 4 The block diagram for the proposed feature selection model

The first step in the proposed model is the method by which n butterflies, or search agents, are initialized in random form. Each search agent is a feasible alternative whose length D equals the number of features in the initial dataset and represents a candidate solution. An example of a potential solution for a dataset with d features is shown in Figure 5. To this end, the data of M records and D features are first loaded.

FIGURE 5 Problem-making method for the proposed model

The aim is to identify the chosen features among the D features so as to reduce the problem dimension, provided that the main objective is not damaged. It is therefore essential to decide which of these D features maximize the classification accuracy. The feature selection problem is thus summarized as choosing the specific subset of features that maximizes the classification accuracy. Initially, binary values (0 and 1) are set in each solution: the relevant features take the value one, and the others are ignored with the value zero. There are several random initialization methods, such as distributed sampling (DS), chaotic maps, and others. Recently, chaotic sequences have been used instead of random number sequences in many applications. Chaotic movement is characterized by regularity, randomness, and ergodicity. These features help the algorithm avoid local optima when addressing function optimization problems, maintain population diversity, and improve global search capabilities. Chaotic maps take many forms, such as the logistic map, tent map, circle map, cubic map, Gauss map, ICMIC map, and sinusoidal iterator (Lu et al., 2014). In nonlinear systems, chaos is a relatively common phenomenon, and the cubic map is one of the most widely used maps for generating chaotic sequences in several applications. This map is defined formally by Equation (8) (Rogers & Whitley, 1983):

$$x_{k+1} = \rho x_k \left(1 - x_k^2\right) \quad (8)$$

where $\rho$ denotes the control parameter and the cubic map sequence lies in (0, 1). When $\rho = 2.595$, the chaotic variable $x_{k+1}$ has better ergodicity. A graphical presentation of the cubic map is given in Figure 6.

FIGURE 6 The cubic map sequence
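For illustration, a minimal Python sketch of cubic-map initialization per Equation (8) follows; the seed value x0 is an assumption, since the paper does not state one.

```python
import numpy as np

def cubic_map_population(n_agents, dim, rho=2.595, x0=0.3):
    """Initial population in (0, 1) generated by the cubic map of Equation (8).
    The seed x0 is an illustrative assumption."""
    seq = np.empty(n_agents * dim)
    x = x0
    for k in range(seq.size):
        x = rho * x * (1.0 - x * x)   # x_{k+1} = rho * x_k * (1 - x_k^2)
        seq[k] = x
    return seq.reshape(n_agents, dim)

# Usage: 10 search agents over a 20-feature dataset.
population = cubic_map_population(n_agents=10, dim=20)
```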
The proposed BOAPSO binary feature selection model, a combination of PSO and BOA, is discussed in this section. The most significant gap between PSO and BOA is the generation of new individuals. The PSO algorithm's disadvantage is that it covers a small space when resolving high-dimensional optimization problems. To consolidate the two algorithms' benefits, both algorithms' functionality is combined rather than used one after the other; in other words, the method used to produce the results of these two algorithms is heterogeneous. Equation (9) establishes the way new position values are generated:

$$x_i^{t+1} = x_i^t + \left(r^2 \times g^* - x_i^t\right) \times f_i \quad (9)$$

The fragrance $f_i$ can be formulated as in Equation (10):

$$f_i = c I^a \quad (10)$$

where $c$ represents the sensory modality, $f_i$ represents the perceived magnitude of the fragrance, $I$ defines the stimulus intensity, and $a$ represents the power exponent based on the degree of fragrance absorption.

Algorithm 3 The proposed BOAPSO feature selection model
Input: agent positions X, total number of iterations Tmax, population size (N), feature dimension d, switch probability p, sensory modality c, the initial value of power exponent a, inertia weight W, and learning factors c1, c2.
Output: the best solution g*.
1. Initialize the population using the cubic map of Equation (8)
2. Evaluate the fitness of each search agent and find the best solution g*
3. While t < Tmax
4.   For each search agent
5.     Compute the fragrance by Equation (10)
6.     If rand < p, update the position by Equation (9)
7.     Else, update the velocity using Equation (13) and the position by Equation (14)
8.   End for
9.   Update a according to Equation (11)
10.  Update c according to Equation (12)
11.  Update W according to Equation (15)
12. End while

The role of the power exponent a is essential to BOA's ability to find the best solution. A value of a = 1 indicates that no scent is absorbed, that is, other butterflies fully perceive the scent issued by a particular butterfly, thus narrowing the search range and enhancing local exploitation. With a = 0, the fragrance is not perceivable by any butterfly, which expands the search range, that is, improves the algorithm's global exploratory capability. However, a fixed value such as a = 0.1 cannot effectively balance the search capabilities of the basic BOA. Consequently, we propose Equation (11):

$$a(t) = a_s + \left(a_f - a_s\right) \left(\frac{t}{T_{max}}\right)^{\mu} \quad (11)$$

where $a_s$ and $a_f$ represent the initial and final values of the parameter $a$, $\mu$ is the tuning parameter, and $T_{max}$ represents the maximum number of iterations. Theoretically, the sensory modality $c$ can take a value in the range [0, 1]; in practice, however, its value depends on the characteristics of the optimization problem during the iterative BOA process. The sensory modality $c$ is updated as in Equation (12) during the search phase of the algorithm:

$$c_{t+1} = c_t + \frac{0.025}{c_t \times T_{max}} \quad (12)$$

where $T_{max}$ is the maximum number of iterations of the algorithm, and the initial value of parameter $c$ is set to 0.01. In nature, a butterfly may search for food globally or locally, or seek a mating partner. A switch probability $p$ is therefore set to switch between global search and intensive local search. BOA generates a number in [0, 1] on a random basis and compares it with the value of $p$ to determine whether a global or a local search is performed. If the random number is less than $p$, the position is updated according to Equation (9); otherwise, the position is updated according to Equations (13) and (14):

$$v_i^{t+1} = w v_i^t + c_1 r_1 \left(pbest_i - x_i^t\right) + c_2 r_2 \left(gbest - x_i^t\right) \quad (13)$$

$$x_i^{t+1} = x_i^t + v_i^{t+1} \quad (14)$$

where $v_i^t$ and $v_i^{t+1}$ represent the velocity of the $i$th particle at iterations $t$ and $t + 1$. Usually, $c_1 = c_2 = 2$, and $r_1$ and $r_2$ are random numbers in (0, 1). The inertia weight $w$ is calculated as in Equation (15):

$$w = w_{max} - \left(w_{max} - w_{min}\right) \frac{t}{T_{max}} \quad (15)$$

where $w_{max} = 0.9$, $w_{min} = 0.2$, and $T_{max}$ represents the maximum number of iterations. Max and Min denote the maximum and minimum values in the continuous feature vector, respectively.

Feature selection can also be seen as a multi-objective optimization problem: the best solution in BOAPSO includes the minimum number of features with the highest classification accuracy. The fitness function has therefore been formulated as in Equation (16) (Abdel-Basset et al., 2020), designed to balance the objectives in assessing solutions:

$$Fitness = \alpha \gamma_R(D) + \beta \frac{|S|}{|D|} \quad (16)$$

where $|S|$ represents the cardinality of the selected feature set, $\gamma_R(D)$ is the error rate of the classifier, and $|D|$ represents the total feature cardinality of the original dataset. $\alpha$ and $\beta$ are measurement parameters that reflect the weight of the classification accuracy and the selected feature set, with $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$; these values have been determined based on the evaluation function. The Euclidean distance (Gou et al., 2019) used in KNN to evaluate the K neighbours adjacent to a sample is given by Equation (17):

$$d(Q, P) = \sqrt{\sum_{i=1}^{d} \left(Q_i - P_i\right)^2} \quad (17)$$

where $Q_i$ and $P_i$ represent given records in the dataset for specific attributes, and $i$ runs from 1 to $d$.
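To tie Equations (11), (15), and (16) together, here is a hedged Python sketch of the wrapper fitness evaluation and the two parameter schedules. The scikit-learn KNN wrapper, the schedule endpoints a_s and a_f, and the tuning parameter mu are assumptions for illustration; the paper does not fix these values.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.9):
    """Equation (16): alpha * error_rate + beta * |S| / |D|, with beta = 1 - alpha.
    `mask` is a binary vector marking the selected features."""
    beta = 1.0 - alpha
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 1.0                                    # empty subset: worst possible fitness
    knn = KNeighborsClassifier(n_neighbors=5)         # K = 5, as in the experiments
    accuracy = cross_val_score(knn, X[:, selected], y, cv=10).mean()
    return alpha * (1.0 - accuracy) + beta * selected.size / mask.size

def power_exponent(t, t_max, a_s=0.1, a_f=0.3, mu=2.0):
    """Equation (11): nonlinear schedule from the initial a_s to the final a_f."""
    return a_s + (a_f - a_s) * (t / t_max) ** mu

def inertia_weight(t, t_max, w_max=0.9, w_min=0.2):
    """Equation (15): inertia weight decreasing from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max
```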
A common method is to hold out part of the data as a validation set and use the rest for classifier training. However, doing so risks the overfitting problem, in which a classifier's accuracy on the training data is much higher than on the test data. Cross-validation is a popular way to reduce overfitting, and K-fold cross-validation with K = 10 is implemented in this paper. The samples are divided into K folds, or partitions, of roughly the same size. The classifier is trained on K − 1 folds and then tested on the remaining partition, predicting the class label of each of its samples. The proportion of incorrect class label estimates, known as the classification error rate, is then evaluated, and the results of the different rounds are averaged (Wong & Yeh, 2019).
The values of the search agent positions generated above are continuous. Since this conflicts with the standard binary formulation of feature selection, they are not directly applicable. According to the feature selection problem with values (0 or 1), the best features are chosen to improve a specific classification algorithm's performance and accuracy. By transforming values from continuous to binary, the resulting search space is changed. As seen in Figure 7, the sigmoid function is an example of an S-shaped function (Abdel-Basset et al., 2020). Any continuous value can be translated into binary by the sigmoid function using Equations (18) and (19):

$$S(x_{si}) = \frac{1}{1 + e^{-x_{si}}} \quad (18)$$

$$x_{binary} = \begin{cases} 1 & \text{if } R < S(x_{si}) \\ 0 & \text{otherwise} \end{cases} \quad (19)$$

where $x_{si}$ is a continuous value (feature) of the S-shaped search agent, $i = 1, \ldots, d$, and the binary value $x_{binary}$ is set to 0 or 1 by comparing a random number $R \in [0, 1]$ with $S(x_{si})$.
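A minimal sketch of this continuous-to-binary mapping, assuming NumPy and following Equations (18) and (19) directly, could look as follows:

```python
import numpy as np

def to_binary(position, rng=None):
    """Equations (18)-(19): map a continuous search agent position to a
    binary feature mask via the S-shaped sigmoid transfer function."""
    rng = rng or np.random.default_rng()
    s = 1.0 / (1.0 + np.exp(-position))                   # Equation (18)
    return (rng.random(position.shape) < s).astype(int)   # Equation (19)

# Usage: strongly negative components are unlikely to be selected,
# strongly positive components are likely to be selected.
mask = to_binary(np.array([-4.0, 0.0, 3.5]))
```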
To validate the proposed model's performance, including for COVID-19 detection, a series of experiments are performed. The proposed algorithm (BOAPSO) was implemented on 25 datasets obtained from the UCI repository to evaluate the potency of this approach (Dheeru & Taniskidou, 2017); Table 2 introduces these datasets. The datasets contain various numbers of attributes, classes, and instances so as to provide an overall and broad survey of the proposed and compared feature selection approaches. The primary reason these datasets are chosen is that their range of attributes and instances covers a variety of problems on which the proposed binary approach can be tested. Moreover, to assess the performance of the proposed BOAPSO in high-dimensional search areas, a set of high-dimensional datasets is also selected. Each dataset is cross-validated for evaluation purposes: the dataset is divided into K folds, K − 1 folds are used for training, and the remaining fold is used for testing. This is repeated M times, so each optimization algorithm is evaluated K × M times on each dataset. The data are distributed into parts for training, testing, and validation. The training portion is devoted to classifier training during the optimization process, while the validation portion is used to evaluate the classifier's performance during optimization. The test fraction is used to assess the selected features of the trained classifier. Three performance metrics are used:

1. Classification accuracy: an indicator of how accurate the classification is given the set of chosen features. The classification accuracy in this study is determined by Equation (20):

$$AvgAcc = \frac{1}{M} \sum_{j=1}^{M} \frac{1}{N} \sum_{i=1}^{N} match(C_i, L_i) \quad (20)$$

where $M$ is the number of times the optimization algorithm is run, $N$ is the number of points in the test set, $C_i$ is the output class label of data point $i$, $L_i$ is the reference class label of data point $i$, and $match$ is the comparison function that outputs 1 if the two labels are the same and 0 otherwise.

2. Average selection size: the average number of selected features over the M runs, evaluated as in Equation (21):

$$AvgSel = \frac{1}{M} \sum_{i=1}^{M} size(x_i) \quad (21)$$

where $size(x_i)$ denotes the size of the feature set selected in run $i$, as evaluated on the testing dataset.

3. Average computational time: the overall runtime of an individual optimization algorithm in seconds over the different runs, calculated using Equation (22):

$$AvgTime_o = \frac{1}{M} \sum_{i=1}^{M} RunTime_{o,i} \quad (22)$$

where $M$ is the number of runs of optimization algorithm $o$, and $RunTime_{o,i}$ is the actual computational time of optimization algorithm $o$ at run $i$.
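As a small companion to Equations (20)-(22), the following Python helpers compute the three reported metrics; the array layouts are assumptions chosen for illustration.

```python
import numpy as np

def average_accuracy(pred_runs, labels):
    """Equation (20): accuracy averaged over M runs. `pred_runs` is an
    M x N array of predicted labels; `labels` holds the N true labels."""
    return float((np.asarray(pred_runs) == np.asarray(labels)).mean())

def average_selection_size(masks):
    """Equation (21): mean number of selected features over M runs,
    given a list of M binary feature masks."""
    return float(np.mean([np.sum(m) for m in masks]))

def average_time(run_times):
    """Equation (22): mean wall-clock time (seconds) over M runs."""
    return float(np.mean(run_times))
```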
In Table 7 , the proposed BOAPSO has the best computing performance on all datasets, while the general BOA was second in most data sets with higher performance and third is the PSO. Compared to state-of-the-art techniques, the proposed BOAPSO has shown competitive calculation speed. Consequently, the BOAPSO has been well performed relative to state-of-the-art methods in general. All results that include the classification accuracy, the selected features, and the computational time are visualized in Figures 10 and 11, respectively. The convergence speed is the other factor in discussing, testing, and evaluating this recommended BOAPSO algorithm. The convergence curve based on the best fitness function and mean convergence curves for the proposed BOAPSO has been generated for three data set with high dimensionality to illustrate the effectiveness of the recommended BOAPSO, as seen in Figure 12 . The proposed BOAPSO algorithm shows highly qualified performance from Figure 13 by inspecting the minimum fitness functions' convergence curves. Compared with state-of-the-art approaches in terms of classification accuracy, the average number of attributes selected, and computational time, the proposed BOAPSO shows superior performance. BOAPSO is compared to better validate the performance of the proposed approach, with some newly developed techniques called fractional-order cuckoo search using heavy-tailed distributions (FO-CS) (Yousri et al., 2020) and the native Binary butterfly optimization approaches (Arora & Anand, 2019) . Table 8b presents and visualizes the classification accuracy provided by the proposed model's selected features and comparative methods (b). It is easy to remember that in most of the standard data sets used in this research, the proposed (BOAPSO) over-performed all the other approaches. This finding demonstrates the ability of the BOAPSO to explore the search space and locate the ideal feature sub-set with the highest classification accuracy. The superior performance of the proposed BOAPSO can be found in Table 8a in terms of selecting the ideal function subset. The proposed approach outperformed all data sets by other algorithms, as seen in Figure 14a . The World Health Organization (WHO) declared in 2020 that the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), known as COVID-19, had begun to strike China and spread exponentially worldwide. Also, COVID-19 has caused the deaths of over 600,000 individuals across the world since August 2020. Artificial intelligence has recently become the breakthrough in current technologies and can be used to the fight against COVID-19 for diagnosis, detection, and prevention (Too & Mirjalili, 2020) . Feature selection is an essential task for healthcare. In COVID-19, the feature selection is necessary to determine the main attributes and features that provide an efficient decision for manipulating the patients. In this section, the proposed BOAPSO is employed for COVID-19 patient health prediction. The dataset of COVID-19 patients was collected and used from the GitHub data store (Novel Corona Virus 2019 Dataset, 2020 . This dataset comprises 15 features, and 864 cases, the description of the dataset is shown in Table 9 , and a sample of the transformed dataset is presented in Table 10 . This study intends to predict the death and recovery conditions depending on the given factors. All the features are converted into numeric form. 
Figures 15 and 16 present the classification accuracy and the selected features for the COVID-19 dataset, respectively. In this study, a binary version of BOAPSO is proposed and used to solve the feature selection problem in wrapper mode, hybridizing the butterfly optimization algorithm (BOA) and the particle swarm optimization algorithm (PSO). Following an analysis of the solutions in the literature that most resemble the kind of innovations carried out here, various open issues reported in the research papers can be mentioned:
1. Evolutionary algorithms (EAs) are usually population-based stochastic search techniques that share one algorithmic step, called population initialization. The role of this stage is to produce an initial set of solutions. These initially assumed solutions are then iteratively modified during the optimization process until the stopping criteria are met. Generally, strong initial guesses make it easier for EAs to find the optimum; conversely, starting from poor guesses can preclude EAs from finding it. This concern becomes more critical when solving large-scale optimization problems with a finite-size population: as population size is often small, the opportunity for a population to reach promising areas of the search space shrinks as the size of the search space increases (Kazimipour et al., 2014).
2. The exploration-exploitation trade-off is a well-known dilemma that arises in situations where a learning system must regularly make decisions with unknown payoffs. Exploration makes it possible, on the one hand, to identify new places in the search space; exploitation, on the other hand, makes it possible to retain better options by searching the local search space. Among the metaheuristic search strategies listed above, some emphasize exploration, while others rely on exploitation for better returns. Consequently, the output of a search algorithm can be advanced by applying hybrid methods: hybridization incorporates the positive features of at least two processes, thereby improving the yield of each procedure.
3. The values of the search agent locations created by the algorithm are continuous. Since this violates the common binary feature selection construction, it cannot be applied directly to our problem. Based on the feature selection formulation with values (0 or 1), the most suitable features are picked to improve the accuracy and efficiency of the classification algorithm, so the calculated search space must be converted into binary form.
4. Linear modifications of the optimization algorithm parameters cannot represent the real optimization search process of the algorithm, whereas the control parameters can easily be adjusted nonlinearly with the number of iterations. The optimization findings on typical test functions demonstrate that the nonlinear strategy outperforms linear strategy optimization.
This paper presented a hybrid metaheuristic algorithm based on the standard butterfly optimization algorithm (BOA) and the standard particle swarm optimization algorithm (PSO) for the feature selection process. Three enhancement strategies are recommended to globally optimize the basic BOA: initialization with the cubic map model, the nonlinear control parameter for the power exponent, and the use of PSO to enhance the search capability of BOA.
To analyse the proposed model's effectiveness, it was compared with other swarm algorithms, namely PSO, GWO, and BOA, and with other recent works, using 25 datasets and a COVID-19 dataset. The initial BOAPSO population was generated using a cubic map sequence, and the tests showed that the initial fitness value was better than that of BOA and the other algorithms. Furthermore, the experimental results confirmed that one-dimensional chaotic maps can boost the standard BOA and improve its performance. The results supported the proposed model's superiority in improving the classification process through the classification accuracy, the features selected, and the computational time. Future work involves improving the efficiency of the proposed algorithm and improving BOA by adapting its control parameters to maximize performance. The proposed model can also address other real-world problems, such as proportional-integral-derivative (PID) control problems, engineering problems, regional economic activity research analysis, and implementation problems of wireless sensor networks (WSN). Moreover, the butterfly optimization algorithm can be hybridized with other MA, such as the salp swarm optimization algorithm. Besides, it is suggested that more clinical features be obtained for accurate COVID-19 patient health prediction.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

FIGURE 15 COVID-19 dataset classification accuracy
FIGURE 16 COVID-19 dataset selected features

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

ORCID
Mohamed Elhoseny https://orcid.org/0000-0001-6347-8368

REFERENCES
A new fusion of Grey wolf optimizer algorithm with a two-phase mutation for feature selection
Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering
A new feature selection method to improve the document clustering using particle swarm optimization algorithm
Particle swarm optimization for total operating cost minimization in electrical power system
Binary butterfly optimization approaches for feature selection
Node localization in wireless sensor networks using butterfly optimization algorithm
Butterfly optimization algorithm: A novel approach for global optimization
A modified butterfly optimization algorithm for mechanical design optimization problems
Two-step particle swarm optimization to solve the feature selection problem
Clever algorithms: Nature-inspired programming recipes
UCI machine learning repository
Binary Grey wolf optimization approaches for feature selection
Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: A case from the Spanish market
Pattern synthesis for the cylindrical polarimetric phased array radar (CPPAR)
A generalized mean distance-based k-nearest neighbor classifier
Improved Salp swarm algorithm based on particle swarm optimization for feature selection
An overview of particle swarm optimization variants
Semantics-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches
A review of feature selection methods with applications
A new local search based hybrid genetic algorithm for feature selection
A review of population initialization techniques for evolutionary algorithms
Embedded methods
An improved butterfly optimization algorithm for engineering design problems using the cross-entropy method
The effects of using chaotic map on improving the performance of multiobjective evolutionary algorithms
Grid scheduling using enhanced PSO algorithm
A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy
A novel hybrid wrapper-filter approach based on genetic algorithm, particle swarm optimization for feature subset selection
Novel Corona Virus 2019 Dataset
Feature selection using particle swarm optimization-based logistic regression model (Chemometrics and Intelligent Laboratory Systems)
A multi-objective artificial butterfly optimization approach for feature selection
Chaos in the cubic mapping
Inbreeding and extinction in a butterfly metapopulation
A hybrid feature selection method based on information theory and binary butterfly optimization algorithm
A novel routing algorithm for wireless sensor network using particle swarm optimization
A novel adaptive butterfly optimization algorithm
Feature selection for classification: A review
Hybrid binary dragonfly enhanced particle swarm optimization algorithm for solving feature selection problems
EMG feature selection and classification using a Pbest-guide binary particle swarm optimization
A hyper learning binary dragonfly algorithm for feature selection: A COVID-19 case study (Knowledge-Based Systems)
Particle Swarm Optimization-Evolution, Overview and Applications
A real time IDSs based on artificial Bee Colony-support vector machine algorithm
No free lunch theorems for optimization
Reliable accuracy estimates from K-fold cross validation
Nature-inspired metaheuristic algorithms
COVID-19 X-ray images classification based on enhanced fractional-order cuckoo search optimizer using heavy-tailed distributions
A chaotic hybrid butterfly optimization algorithm with particle swarm optimization for high-dimensional optimization problems
A filter-based bare-bone particle swarm optimization algorithm for unsupervised feature selection