key: cord-1054805-pycm42gp
authors: Bulaghi, Zohre Arabi; Navin, Ahmad Habibizad; Hosseinzadeh, Mehdi; Rezaee, Ali
title: World competitive contest-based artificial neural network: A new class-specific method for classification of clinical and biological datasets
date: 2020-09-28
journal: Genomics
DOI: 10.1016/j.ygeno.2020.09.047
sha: a7d810b9dbf7c52e6872f1d3e3a31404c9f90da8
doc_id: 1054805
cord_uid: pycm42gp

Many data mining methods have been proposed to generate computer-aided diagnostic systems, which may detect diseases in their early stages by categorizing the data into proper classes. Considering the importance of a suitable classifier, the present study introduces an efficient approach based on the World Competitive Contests (WCC) algorithm and a multi-layer perceptron artificial neural network (ANN). Unlike previously introduced methods, each of which develops a universal model for all the different data classes, our proposed approach generates a single specific model for each individual class of data. The experimental results show that the proposed method (ANNWCC), which can be applied to both balanced and unbalanced datasets, yields an average five-fold cross-validation accuracy of more than 76% (without feature selection) and more than 90% (with feature selection) on the 13 clinical and biological datasets. The findings also indicate that, under different conditions, our proposed method can produce better results than some state-of-the-art meta-heuristic algorithms and methods in terms of various statistical and classification measurements.

To classify the clinical and biological data, a multi-layer ANN and the WCC algorithm were combined. It was shown that developing a specific model for each individual class of data may yield better results compared with creating a universal model for all of the existing data classes.
Besides, some efficient algorithms proved to be essential to generate acceptable biological results, and the methods' performance was found to be enhanced by fuzzifying or normalizing the biological data.

Classification is the process of placing samples in their related groups. Clinical data classification, which is an important step in clinical research, can be used to diagnose life-threatening diseases such as heart disease or diabetes [1]. Based on the number of samples in each class, clinical datasets are divided into two types [2]: balanced and unbalanced. In balanced datasets, each class has an almost identical number of samples, whereas in unbalanced datasets the number of samples differs from class to class. On unbalanced data, a classifier tends to improve its overall results by assigning samples to the class with the largest number of samples. Since the resulting model is unsuitable for classes with a small number of samples, learners have problems with such classes. To deal with this limitation, our proposed algorithm, ANNWCC, not only generates a specific model for each of the existing classes but also applies to both balanced and unbalanced data with several different class sizes.

Many methods and algorithms can be applied to the classification problem. A learner such as an ANN, for example, can be used to find the connection among the samples of a specific class. Researchers have also proposed many new approaches to training artificial neural networks and have shown the usefulness of these approaches in data classification. Leema et al. used the particle swarm optimization (PSO) algorithm to train neural networks. They first applied their method to three datasets obtained from the University of California Irvine (UCI) repository and then compared their results with those delivered by previously introduced methods.
Although their proposed method seemingly performed better than the other existing methods, it appears that state-of-the-art meta-heuristic optimization algorithms could enhance its performance [3]. Masoudi-Sobhanzadeh et al. introduced a new optimization algorithm and showed that it can outperform other algorithms [4].

In the present study, the works related to clinical data classification fall into several main categories:

i) The Supervised machine learning approaches: These methods aim to develop a model based on machine learning methods such as ANNs [5], support vector machines [6], and Bayesian networks [7]. To generate the model, the data are first divided into training and testing sets; the machine is then trained on the training set and validated on the testing set. Although these methods produce remarkable results, state-of-the-art techniques can enhance their performance. Our proposed method, which is a combination of an ANN and the WCC algorithm, belongs in this class.

ii) The Unsupervised machine learning approaches: In this class of related works, unlike the first group, the existing data do not include any labels, and they are classified into different categories based on the relationships among the data. In this category, the K-nearest neighbour (KNN) is one of the most popular techniques used for clustering [8]. Our proposed method also falls into this class.

iii) The Semi-supervised machine learning approaches: In the third group of machine learning techniques, some of the existing data include labels whereas others do not. A combination of supervised and unsupervised methods is usually employed for generating a model [9]. Ensemble techniques may also function properly in this category [10].
iv) The Heuristic-based approaches: This class of related works searches for connections among the existing data and arranges the data in groups using a heuristic algorithm. An efficient algorithm based on Gauss-Newton [11] [12], a combination of expectation maximization (EM) and principal component analysis (PCA) [13], and an improved ReliefF algorithm [14] are several instances of works that belong to the fourth group. Although these methods have big advantages over the other ones and produce acceptable results, they have low performance compared with the meta-heuristic methods [15].

To present a suitable computer-aided diagnostic model, a 3-layer architecture based on artificial intelligence has been presented [16]. In this architecture, the data received from the sensors are classified into two categories, healthy and suspicious; if a case is suspicious, the doctor is informed for further examinations [16]. The suggested architecture requires efficient methods for data classification, and its efficiency in diagnosing suspicious cases could be enhanced with ANNWCC.

To ease the identification of disease-causing genes, a tool named GPS has been introduced, whose main job is to prioritize and detect the patients' genes [17]. To detect the most important disease-causing genes, this tool implements functional linkage networks. The procedure first uses HIPPIE datasets and various algorithms to examine different genes, and then the candidate genes are prioritized. Finally, the prioritized genes are classified based on scoring schemes [17]. A TRIZ-inspired method has also been presented for choosing promising genes, which is able to detect the genes that have the most biological relevance to the disease [18]. For its formulation, the TRIZ-inspired method makes use of optimization search operators along with a modified bat algorithm.
In this method, rMRMR is used as a filter approach, while SVM and MBA are used for classification. Hence, the method not only gives wide coverage of the gene search space but also finds the interactions between genes [18].

Treating cancer patients requires studies that classify cancer subgroups. In [19], optimization algorithms were used in addition to SVM algorithms to classify the molecular subtypes of cancer. The findings show efficient subsets obtained from miRNA and mRNA, which are considered diagnostic biomarkers and can classify molecular cancer groups [19]. To carefully study the structure of proteins, which are among the most important biological macromolecules, a step-by-step classification algorithm has been presented [20]. Instead of classes related to the primary structure of proteins, this algorithm uses the spatial configuration of proteins by predicting the structural classes of their secondary structure. Extracting features from secondary structure series thus leads to a more accurate classification than previous methods [20].

Since the collected data are of the utmost importance in the field of biology, and so are the studies which should be conducted based on these data [21, 22], some related works have made efforts to generate clinical and biological datasets [23] [24] [25] [26]. Despite the existence of many datasets in the field of biology, data pre-processing and missing values [27] [28] [29] remain a challenge [30]. In addition to the cited works, some other studies have investigated different methods from various perspectives [31, 32] and discussed their advantages and disadvantages as well as their limitations [33, 34]. Such studies give a few new directions for future reference. The method suggested in this article is a method to classify clinical and biological datasets.
Employing the presented classification method in the above-mentioned studies could lead to a more reliable detection of probable biomarkers than other methods. The biggest weakness of the methods introduced in the above-mentioned studies is their low performance on the classification criteria. To address this limitation, the current study proposes a method, named ANNWCC, which combines a multi-layer ANN with the WCC algorithm.

The proposed method uses a multi-layer artificial neural network to generate a classifier model. ANNs, inspired by biological neural network systems, are computational models that consist of several neurons connected to other neurons via synapses. The role of the synapses is to transmit signals from one neuron to another. An ANN is a framework, or an algorithm, which can be applied to machine learning tasks such as biological data classification [35]. Figure 1 shows an example of an ANN with two hidden layers, that is, the layers between the input and output layers. As can be seen from Fig. 1 (where B represents the bias), there are two neurons in each hidden layer. The output of a neuron can be calculated using Eq. 1, where $O_{i,j}$ is the output of the jth neuron in the ith hidden layer, $B$ and $W$ are the bias and the edge weights, respectively, and $W_{k,j}$ is the weight of the edge (synapse) between the kth neuron in the (i-1)th hidden layer and the jth neuron in the ith hidden layer.

Training an ANN, in which the values of the synapses or edges are determined, is regarded as a nondeterministic polynomial (NP) problem. For such problems, finding an optimal solution in polynomial time is not considered feasible, and for this reason heuristic and metaheuristic methods such as optimization algorithms are applied to ANN training [4].
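The neuron output described for Eq. 1 can be illustrated with a short sketch. It assumes the usual form $O_{i,j} = f\left(B + \sum_k W_{k,j}\, O_{i-1,k}\right)$; the sigmoid activation $f$, the single shared bias per layer, and the function names `layer_output` and `forward` are assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic activation; the paper's activation function is not stated (assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(prev_outputs, weights, bias, activation=sigmoid):
    """Outputs of one layer.

    weights[k][j] plays the role of W_{k,j}: the edge from neuron k of the
    previous layer to neuron j of this layer. bias plays the role of B
    (one shared value per layer, which is an assumption).
    """
    n_out = len(weights[0])
    outputs = []
    for j in range(n_out):
        total = bias + sum(
            prev_outputs[k] * weights[k][j] for k in range(len(prev_outputs))
        )
        outputs.append(activation(total))
    return outputs


def forward(inputs, layers, activation=sigmoid):
    """Propagate inputs through a list of (weights, bias) pairs, one pair per layer."""
    out = list(inputs)
    for weights, bias in layers:
        out = layer_output(out, weights, bias, activation)
    return out
```

In this layout, the flat list of all `weights` entries is what an optimizer would tune during training, which is the role the edge weights play in the text.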
WCC, which is inspired by the rules of human sports, is an optimization algorithm [36] with a metaheuristic search strategy and shows a high level of performance compared to other optimization algorithms. The current study formulates the WCC algorithm for ANN training.

The introduced method, ANNWCC, can be applied in the third layer of the above-mentioned architecture. ANNWCC, whose framework is depicted in Figure 2, is applied in several stages. In the first step, data pre-processing is carried out on a biological dataset. Two techniques are used at this stage: normalization, in which the data of the groups are mapped into the same range using Eq. 2, and fuzzification, which determines the degree to which each datum belongs to a group (Eq. 3). In Eq. 2, $a$ and $b$ are the lower and upper bounds of the range $[a, b]$ into which the groups' data should be converted, and $v_i$, $v_{min}$, $v_{max}$, and $nrv_i$ are the current value that should be normalized, the minimum value of the related group, the maximum value of the related group, and the normalized value, respectively. In Eq. 3, which is a fuzzification equation, $AVG$, $v_{min}$, $v_i$, and $fv_i$ are the average value of a group, the minimum value of a group, the current value which should be converted into a fuzzy value, and the fuzzified value, respectively. A group's data are mapped into the range [0, 1] using Eq. 3.

In the next step, ANNWCC divides a clinical or biological dataset into several groups based on the total number of class labels and puts every sample in its relevant group. After dividing ... where $N$, $nh$, and $nn_i$ are the number of neurons in the input layer, the number of hidden layers, and the number of neurons in the ith hidden layer, respectively.

The ANNWCC training, which is the final step toward generating the desired model, is an NP problem that cannot be solved exactly within a reasonable period of time. Studies have shown that using optimization algorithms to train ANNs yields comparatively more acceptable results.
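The pre-processing and grouping steps can be sketched as follows. The normalization uses the standard min-max form $nrv_i = a + \frac{(v_i - v_{min})(b - a)}{v_{max} - v_{min}}$, which matches the symbols defined for Eq. 2 but is an assumption about its exact shape. The function names and the handling of a constant group are hypothetical, and fuzzification (Eq. 3) is not sketched because its form cannot be recovered from the text.

```python
from collections import defaultdict


def min_max_normalize(values, a=0.0, b=1.0):
    """Map values into [a, b] with min-max scaling (assumed form of Eq. 2)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant group is mapped to the lower bound.
        return [a for _ in values]
    scale = (b - a) / (v_max - v_min)
    return [a + (v - v_min) * scale for v in values]


def group_by_class(samples, labels):
    """Put every sample into the group of its class label, one group per class."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return dict(groups)
```

Each group returned by `group_by_class` would then be used to train its own class-specific model, in line with the per-class design described in the text.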
We use WCC for ANN training due to its remarkable performance on NP problems. In the approach employed in the present study, each team is considered a potential model for clinical data classification. The number of players is computed using Eq. 6, where $np$ is the number of players on a team. The score function, which determines the suitability of a potential answer, is defined by Eq. 7. As this equation shows, a model for a given class performs better than others if it maps the input samples to output values closer to the value assigned to that class. Eq. 7 is known as the root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y - p_i\right)^2} \qquad (7)$$

where $n$, $y$, and $p_i$ are the number of samples in a class, the value assigned to a group's samples, and the predicted value of a sample, respectively.

ANNWCC generates an individual model for each existing class. After the samples of a dataset are evaluated by every one of the created models, a test sample is labelled with the class number whose model output is closest to the mean of its class. Eq. 8 shows the way a test sample is assigned to a class:

$$L_i = \{\, k : |o_{i,k} - avg_k| \text{ is a minimum for all } 1 \le k \le m \,\} \qquad (8)$$

where $o_{i,k}$, $avg_k$, $L_i$, and $m$ are the kth model's output for the ith test sample, the average value of the training samples obtained from the kth model in the kth class, the label determined for the ith test sample, and the number of classes, respectively.

In the proposed method, the parameters, including the total number of teams, groups, and attempts that improve and fulfil the scoring function, should be defined at the beginning of the algorithm. Like other optimization algorithms, WCC begins with a first population of potential answers, named teams. Every team, which is generated randomly, is considered a candidate solution that determines the weights of the ANN's edges. The generated teams are then randomly placed in groups that contain equal numbers of teams.
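The score function of Eq. 7 and the labelling rule of Eq. 8 can be sketched in a few lines. The function names are hypothetical, class indices are 0-based here rather than running from 1 to $m$, and ties are broken in favour of the lowest index, which is an assumption the text does not address.

```python
import math


def rmse(target, predictions):
    """Root mean squared error between a class's assigned value y and the
    model's predictions p_i for the n samples of that class (Eq. 7)."""
    n = len(predictions)
    return math.sqrt(sum((target - p) ** 2 for p in predictions) / n)


def assign_label(model_outputs, class_means):
    """Label one test sample in the spirit of Eq. 8.

    model_outputs[k] is o_{i,k}, the k-th class model's output for the sample;
    class_means[k] is avg_k, that model's mean output on its own training samples.
    Returns the 0-based index k with the smallest |o_{i,k} - avg_k|.
    """
    distances = [abs(o - avg) for o, avg in zip(model_outputs, class_means)]
    return min(range(len(distances)), key=distances.__getitem__)
```

For example, with two class models whose training means are both 1.0, a sample with outputs `[0.9, 0.2]` would be given label 0, since the first model's output is closest to its class mean.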
Teams compete with each other and improve their scores (RMSE) through four operations: shooting, attacking, passing, and crossing. These operations change the candidate solution, that is, the weights of an ANN, and introduce new potential answers. In this study, the shooting, attacking, passing, and crossing operations have been formulated using Eqs. 9, 10, 11, and 12, respectively, where $k$ is a random integer value between 1 and $n$ (the total number of the ANN's edges) and $Team$ is a candidate solution that determines the weights of an ANN. $v1$ and $v2$ are two integer values between 1 and $n$. $RNG$ determines the range of values assigned to the variables, or edges, of the ANN. $k$ and $m$ ($k > m$) are two integer values between 1 and $n$.

In the proposed algorithm, there is a referee who evaluates the effect of an action on a team and keeps the scores. If an operation enhances the team's score, the new changes are accepted by that team; otherwise, they are ignored. Matches are held in two stages: 1) group matches; and 2) elimination matches. At the end of the elimination stage, a champion is selected as the answer to the problem. This scenario, in which individual iterations are called seasons, is repeated until an acceptable response is produced. The above-mentioned operations, which change a team's values, are illustrated in Figure 3. After the group competitions, the teams with higher scores advance to the elimination stage and continue their competition. Like the group stage, the elimination stage is held based on the mentioned operations. At the end of the elimination phase, the remaining team is considered to be the answer to the problem [36].

Figure 3. (a) An ANN with one hidden layer and 8 edges is considered. (b) The first population consists of two teams whose numbers of players are equal to the number of the ANN's edges.
Every player determines a weight for the specified edge. (c) The status of the teams before the attacking operator is shown. Team #2 is the attacker team and selects two players randomly. (d) The attacker team adds random values to its selected values and sends them toward two randomly selected players in Team #1. (e) Team #2 is the shooter and selects two players randomly. (f) The shooter team sends its selected values toward Team #1. (g) The status of a team before crossing. In this operator, the permutation of values in a specified range is changed. (h) The status of Team #1 after crossing. (i) The status of a team before passing. In this operator, two players who are chosen randomly exchange their values. (j) The status of Team #1 after passing.

We implemented a multi-layer perceptron ANN in the MATLAB programming language on a system with 12 GB of RAM, a Core i7 CPU, and the Windows 10.1 operating system. To train the ANN, we used the WCC algorithm, the discrete symbiotic organisms search (DSOS) algorithm [37], and the PSO algorithm [38]. To carry out the experiments, we used 13 clinical and biological datasets obtained from the UCI Machine Learning Repository [39], which are listed in Table 1.

In the first experiment, we investigated the effect of the input data type on the ANN training stage and then compared the three algorithms' performance on the CHD2, the normalized CHD2 (NCHD2), and the fuzzified CHD2 (FCHD2) datasets. An optimization algorithm was considered the most suitable when it had a faster convergence speed and produced similar results across its different executions. Convergence means that an optimization algorithm can yield an optimal answer as the dedicated time or the number of iterations increases. Since optimization algorithms usually have operators that change the candidate solutions randomly, the produced answers may vary to some extent. However, different executions may yield similar answers.
To address such a problem, it is recommended that optimization algorithms be executed at least 30 individual times [52]. The criteria considered include the correlation between actual and predicted labels and RMSE [53]. Based on the mentioned criteria, the WCC, DSOS, and PSO algorithms are evaluated in Figures 4 and 5 and Table 2. Every algorithm was executed 30 times, with 50 seasons per execution. The best-obtained results are illustrated in Figure 4, which shows the convergence of the algorithms on the PID dataset and demonstrates that all three algorithms converge successfully on the non-pre-processed, fuzzified, and normalized PID datasets. The lower the value of RMSE, the more effective the algorithm. Although the model generated by each algorithm is different from the ones produced by the other two algorithms, every algorithm's performance on the PID datasets is remarkable, since their RMSE values are almost zero. As indicated in Figure 4, when data pre-processing is not carried out, the performance of the algorithms is low in terms of accuracy and correlation. The WCC algorithm achieves 100% accuracy and correlation on both the fuzzified and normalized datasets, whereas the DSOS and PSO algorithms reach 100% classification accuracy and correlation only on the normalized datasets. Figure 5 shows the stability of the algorithms over 30 individual executions.

To investigate the performance of the algorithms on all three types of PID datasets in detail, the algorithms were evaluated on training and testing datasets of different sizes. To this end, the data were divided into various parts, and the methods were applied to them. The results are shown in Table 2.

In the third experiment, we evaluated the algorithms' performance on all the datasets. For this purpose, the datasets were divided into five folds. In each of the 5 iterations, one of the folds was used as the testing set and the other ones as the training set.
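The five-fold protocol just described can be sketched as an index generator. The shuffling, the fixed seed, and the name `k_fold_indices` are assumptions for illustration; the paper does not state how the folds were drawn or whether they were stratified by class.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each fold serves as the testing set exactly once, while the remaining
    k - 1 folds together form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random, unstratified folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Averaging a model's accuracy over the five `(train, test)` pairs gives the kind of five-fold cross-validation accuracy reported in the text.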
The algorithms were run in 30 individual executions on the three types of data: non-pre-processed, fuzzified, and normalized. The results are shown in Table 3. As indicated in Table 3, the performance of WCC, which is approximately the same for both the normalized and fuzzified datasets, is better than that of the other two algorithms. Besides, PSO and DSOS deliver better results only on the normalized datasets. Also, as evidenced by Table 3, the algorithms demonstrate distinct performance on different data types: while DSOS produces the best results on the CHD5 and HEP datasets, PSO yields the most desirable results on the PAR dataset, and WCC delivers the most appropriate results on the CHD2, LUND, and WDBC datasets. In the case of the SHD, PID, and LIV datasets, each algorithm performs in much the same way as the other two. In general, although all of the algorithms function properly, WCC seems to produce the most desirable results, compared to DSOS and PSO, in terms of CI, P-value, and elapsed time. Table 4 summarizes and compares the three algorithms based on the above-mentioned criteria, which have been computed using the "anova1" function (MATLAB) with the default parameters. As shown in Table 4, although the results produced by the other two algorithms are suitable, once again WCC delivers the most favourable results on all the datasets. DSOS and PSO produce similar results, although DSOS's performance is slightly better than PSO's.

Since feature selection (FS) is one of the primary data pre-processing steps that may help produce better results, the effect of FS on the performance of the proposed method was also investigated. For this purpose, the desired features were selected using the FeatureSelect software application [15] and passed to the methods, and the final models were generated using the selected features.
The obtained datasets, called the FS-based datasets, together with their average results, are shown in Table 4.

The receiver operating characteristic (ROC) curve is a graphical method for determining the ability of the models to separate the samples of the classes [54]. An algorithm with a curve close to the top left-hand corner of the graph has a high ability to separate the data. Figure 6 is based on the average of the best 30 individual executions of the algorithms on all the datasets. As shown in Figure 6, although the ROC curves of all three algorithms show remarkable separation ability, WCC outperforms DSOS and PSO in its ability to separate the data. Moreover, the separation ability of DSOS is better than that of PSO. The vertical axis of the ROC curves marks the sensitivity of the algorithms, which is computed using Eq. 16, and the horizontal axis presents the false positive rate (FPR), which is equal to (1 − SPC), where SPC is the specificity, and is calculated by Eq. 16. The purple, blue, red, and yellow curves show the ROC curves of the random guess and the WCC, DSOS, and PSO algorithms, respectively. Although the ROC curves of all three algorithms are approximately in agreement, the area under WCC's curve (AUC) is larger than the areas under the other two algorithms' curves.

The PID, CHD, and SHD datasets, whose attributes and properties, such as the total number of samples, were presented in Table 1, have been frequently used in various studies. In the final experiment, we compared the results of some previous studies in the literature with those of the three algorithms described in this paper. The research references, publication dates, authors' names, methods' names, and the best accuracy value obtained by each method on the above-mentioned datasets are listed in Table 5. In Table 5, the five-fold cross-validation results have been acquired under similar conditions.
The best values for each group are presented in bold print. The findings revealed that the FELM method, as well as the WCC, PSO, and DSOS algorithms, yielded the best results on the PID and CHD5 datasets. WCC produced the best results on the CHD2 dataset. Finally, the results delivered by WCC, PSO, and DSOS on the SHD dataset were identical. In most cases, based on the findings, WCC, PSO, and DSOS outperformed the other previously proposed approaches.

In this part, we first compare ANNWCC with the other clinical data classification approaches and then discuss its capabilities from different perspectives:

1-The Applied method aspect: The proposed method, employed for classification, is different from the other introduced methods. Unlike ANNWCC, which is a combination of the designed ANN and the WCC algorithm, a majority of classification methods are based on either an ANN that uses classical or modified training methods [71] [72] [73] [74] [75] [76] [77] or an optimization algorithm that determines a weighting method for the classification of data [78] [79] [80] [81]. In contrast to the other approaches, which introduce a universal model for all the different kinds of data [82] [83] [84], ANNWCC generates a specific model for each class of data.

2-The Dataset types: In contrast to previously introduced works, which have focused on a specific dataset and based their evaluation on a single data type [85] [86] [87], ANNWCC's performance has been investigated on 13 different medical and biological datasets. ANNWCC is also operable on both balanced and unbalanced datasets, while a majority of the other approaches have focused only on balanced data [88] [89] [90]. In addition to the above-mentioned findings, it was revealed that there is a connection between the number of hidden layers and an increase in the computational time.
3-The Scalability aspect: Unlike the other approaches, which have only one specific application, ANNWCC's results indicate that it can be used for various practical purposes [91] [92] [93] [94]. ANNWCC can also be employed as a reference in future studies.

In this paper, we proposed a new method based on a combination of an ANN and the WCC algorithm to classify clinical datasets. For the purpose of this classification, we employed several real datasets obtained from the UCI Machine Learning Repository and from the literature. We separated each group's data and generated a model for each one. Then we evaluated the performance of the algorithms in various experiments and described their functionality. Convergence, stability, error, the correlation between predicted and actual labels, accuracy, sensitivity, and specificity were the main criteria used for the comparison of the algorithms. Based on these criteria, the results produced by the WCC algorithm seemed to be better than those delivered by the other ones. We also compared the results of the current study on the PID, CHD, and SHD datasets with the results reported by previous studies on these datasets, and found that the proposed method can achieve remarkable results in terms of the average accuracy for the PID and SHD datasets.
[1] Hybrid approach using fuzzy sets and extreme learning machine for classifying clinical datasets
[2] Medical decision support system for extremely imbalanced datasets
[3] Neural network classifier optimization using differential evolution with global information and back propagation algorithm for clinical datasets
[4] Trader as a new optimization algorithm predicts drug-target interactions efficiently
[5] Benchmarking human epithelial type 2 interphase cells classification methods on a very large dataset
[6] Towards an autonomous Human Chromosome Classification System Using Competitive Support Vector Machines Teams (CSVMT)
[7] Classification of clinically useful sentences in clinical evidence resources
[8] Unsupervised Extraction of Phenotypes from Cancer Clinical Notes for Association Studies
[9] Semi-Supervised Learning with Ensemble Self-Training for Cancer Classification
[10] Sequence-based bacterial small RNAs prediction using ensemble learning strategies
[11] Optimal breast cancer classification using Gauss-Newton representation based algorithm
[12] Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction
[13] A knowledge-based system for breast cancer classification using fuzzy logic method
[14] An optimization of ReliefF for classification in large datasets
[15] FeatureSelect: a software for feature selection based on machine learning approaches
[16] SENET: A novel architecture for IoT-based body sensor networks
[17] GPS: Identification of disease genes by rank aggregation of multi-genomic scoring schemes
[18] A TRIZ-inspired bat algorithm for gene selection in cancer classification
[19] Masoudi-Nejad, mRNA and microRNA selection for breast cancer molecular subtype stratification using meta-heuristic based algorithms
[20] A step-by-step classification algorithm of protein secondary structures based on double-layer SVM model
[21] DrugR+: A comprehensive relational database for drug repurposing, combination therapy, and replacement therapy
[22] Computational-based drug repurposing methods in COVID-19
[23] Bacterial clinical infectious diseases ontology (BCIDO) dataset, Data in brief
[24] Drug databases and their contributions to drug repurposing
[25] Comprehensive expression-based isoform biomarkers predictive of drug responses based on isoform co-expression networks and clinical data
[26] Synthetic repurposing of drugs against hypertension: a datamining method based on association rules and a novel discrete algorithm
[27] Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework
[28] Unmapped reads from cattle RNAseq data: a source for missing and misassembled sequences in the reference assemblies and for detection of pathogens in the host
[29] Accurate haplotype imputation with individualized ancestry-adjusted reference panels
[30] Clinical calculators in hospital medicine: Availability, classification, and needs, Computer methods and programs in biomedicine
[31] Collaborative learning based on associative models: Application to pattern classification in medical datasets
[32] An efficient hybrid feature selection method to identify potential biomarkers in common chronic lung inflammatory diseases
[33] Using classification models for the generation of disease-specific medications from biomedical literature and clinical data repository
[34] Advances in methods and software for RNA cytosine methylation analysis
[35] Disease global behavior: A systematic study of the human interactome network reveals conserved topological features among categories of diseases
[36] World Competitive Contests (WCC) algorithm: A novel intelligent optimization algorithm for biological and non-biological problems
[37] Discrete Symbiotic Organisms Search Algorithm for Travelling Salesman Problem
[38] Particle swarm optimization
[39] UCI machine learning repository
[40] International application of a new probability algorithm for the diagnosis of coronary artery disease
[41] Towards understanding stacking: studies of a general ensemble learning scheme, na
[42] Using the ADAP learning algorithm to forecast the onset of diabetes mellitus
[43] Assistant 86: A knowledge-elicitation tool for sophisticated users
[44] Diagnosing a disorder in a classification benchmark
[45] Optimal discriminant plane for a small number of samples and design method of classifier on the plane, pattern recognition
[46] Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection
[47] Nuclear feature extraction for breast tumor diagnosis
[48] Feature extraction: foundations and applications
[49] Analyzing authors and articles using keyword extraction, self-organizing map and graph algorithms
[50] A comparative analysis of speech signal processing algorithms for Parkinson's disease classification and the use of the tunable Q-factor wavelet transform
[51] The cancer genome atlas pan-cancer analysis project
[52] Exploration and exploitation in evolutionary algorithms: a survey
[53] Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
[54] Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine
[55] Feature generation using genetic programming with comparative partner selection for diabetes classification
[56] Hybrid prediction model for type-2 diabetic patients, Expert systems with applications
[57] Heart disease diagnosis utilizing hybrid fuzzy wavelet neural network and teaching learning based optimization algorithm
[58] A computational intelligence approach for a better diagnosis of diabetic patients
[59] A hybrid intelligent system for medical data classification
[60] Design and implementation of fuzzy expert system using fuzzy assessment methodology
[61] A swarm optimization approach for clinical knowledge mining, Computer methods and programs in biomedicine
[62] Fuzzy neuro genetic approach for predicting the risk of cardiovascular diseases
[63] AGFS: Adaptive Genetic Fuzzy System for medical data classification
[64] Medical dataset classification: a machine learning paradigm integrating particle swarm optimization with extreme learning machine classifier
[65] SVM based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features
[66] Design and application of a generic clinical decision support system for multiscale data
[67] Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules
[68] Rule selection for coronary artery disease diagnosis based on rough set
[69] Design of a hybrid system for the diabetes and heart diseases, Expert systems with applications
[70] Knowledge mining from clinical datasets using rough sets and backpropagation neural network, Computational and mathematical methods in medicine
[71] Artificial intelligence based clinical data management systems: A review
[72] Computer based prognosis model with dimensionality reduction and validation of attributes for prolonged survival prediction
[73] Multimodal detection of concealed information using Genetic-SVM classifier with strict validation structure
[74] An Artificial Neural Network and Bayesian Network model for liquidity risk assessment in banking
[75] Intra-specific competitive co-evolutionary artificial neural network for data classification
[76] Structural learning in artificial neural networks using sparse optimization
[77] Self-Organizing and Error Driven (SOED) Artificial Neural Network for Smarter Classifications
[78] High dimensional data classification and feature selection using support vector machines
[79] Accuracy of rule extraction using a recursive-rule extraction algorithm with continuous attributes combined with a sampling selection technique for the diagnosis of liver disease
[80] A general feature-weighting function for classification problems
[81] New feature weighting approaches for speech-act classification
[82] An ensemble predictive modeling framework for breast cancer classification
[83] Data-driven modeling of clinical pathways using electronic health records
[84] Clinicopathological findings in sick dogs naturally infected with Leishmania infantum: Comparison of five different clinical classification systems, Research in veterinary science
Type 2 diabetes mellitus prediction model based on data mining Molecular classification of hepatocellular adenoma in clinical practice Molecular Signatures for Tumor Classification: An Analysis of The Cancer Genome Atlas Data Mining balance disorders' data for the development of diagnostic decision support systems Extreme learning machines with heterogeneous data types Thyroid lesion classification in 242 patient population using Gabor transform features from high resolution ultrasound images, Knowledge-Based Systems Mobile health architecture for obesity management using sensory and social data Computer-aided detection system for nerve identification using ultrasound images: a comparative study Three-way decision support for diagnosis on focal liver lesions, Knowledge-Based Systems Deep learning for freezing of gait detection in Parkinson's disease patients in their homes using a waist-worn inertial measurement unit, Knowledge-Based Systems Zohre Arabi designed the project, prepared the manuscript, and performed the analysis. Ali Rezaee and Ahmad Habibizad Navin reviewed the manuscript and supervised the project.Mehdi Hosseinzadeh designed the project, prepared data for the analysis, reviewed the manuscript and supervised the project. All authors read and approved the final version of the manuscript. The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.