title: A CNN-LSTM network with multi-level feature extraction-based approach for automated detection of coronavirus from CT scan and X-ray images
authors: Naeem, Hamad; Bin-Salem, Ali Abdulqader
date: 2021-09-27
journal: Appl Soft Comput
DOI: 10.1016/j.asoc.2021.107918

Auto-detection of diseases has become a prime issue in medical sciences as population density grows rapidly. An intelligent framework for disease detection helps physicians identify illnesses, gives reliable and consistent results, and reduces death rates. Coronavirus (Covid-19) has recently been one of the most severe and acute diseases in the world, so an automatic detection framework should be introduced as the fastest diagnostic alternative to limit Covid-19 spread. In this paper, automatic Covid-19 identification in CT scan and chest X-ray images is achieved with a combined deep learning and multi-level feature extraction methodology. The multi-level feature extraction approach comprises GIST, the Scale Invariant Feature Transform (SIFT), and a Convolutional Neural Network (CNN), which together extract features from CT scans and chest X-rays. The objective of multi-level feature extraction is to reduce the training complexity of the CNN network, which significantly assists accurate and robust Covid-19 identification. Finally, a Long Short-Term Memory (LSTM) network alongside the CNN is used to detect the extracted Covid-19 features. The Kaggle SARS-CoV-2 CT scan dataset and the Italian SIRM Covid-19 CT scan and chest X-ray dataset were employed for testing. Experimental outcomes show that the proposed approach obtained 98.94% accuracy on the SARS-CoV-2 CT scan dataset and 83.03% accuracy on the SIRM Covid-19 CT scan and chest X-ray dataset. The proposed approach helps radiologists and practitioners detect and treat Covid-19 cases effectively during the pandemic.

The coronavirus outbreak has placed all industries on lockdown. According to the World Health Organization (WHO), more than 12 million people had been infected, with roughly 552,050 deaths, as of 9 July 2020 [1]. Health services have reached a point of decline even in advanced nations, leaving intensive care units in short supply. Two earlier coronaviruses, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), preceded the strain that emerged in Wuhan, China [2]. Covid-19 signs range from a cold and fever to shortness of breath and acute respiratory disease [3]. Compared with SARS, this coronavirus affects not only the respiratory system but also damages the kidneys and liver [4]. No definite vaccine is available to prevent Covid-19; the only way to avoid transmitting the infection to healthy individuals is to isolate infected ones. RT-PCR is a method that can identify Covid-19 from a respiratory sample [5]. Because of the restricted supply and limitations of RT-PCR, alternative diagnostic routes are needed. Deep learning has been widely employed for automatically detecting different human diseases, thus ensuring intelligent healthcare [21,22]. The deep learning approach acts as a feature extractor that improves classification precision [23]. Examples of deep learning contributions [24,25] include identification of lung tumors, bone suppression in radiography, diabetic retinopathy, prostate segmentation, skin lesions, and myocardium involvement in coronary CT scans.
Our objective is to detect and classify Covid-19 in CT scan and X-ray images. To do this, the main challenges in artificial intelligence-based Covid-19 detection must be addressed:

• The majority of recent solutions focus on Covid-19 identification, which can only help doctors distinguish Covid-19 patients from other pneumonia patients. They are unable to determine the severity of Covid-19 cases, so radiologists and physicians need more time to finish this task.
• Collecting a large labeled dataset is the most challenging task in automated Covid-19 detection. The number of non-severe cases is much larger than that of severe cases, as per the recent Covid-19 literature. Overfitting caused by imbalanced classes in a Covid-19 dataset significantly reduces the accuracy of automatic detection [26].
• CNN networks can extract features from large datasets of high-dimensional images and videos. However, training these networks is complex, time-consuming, and resource-consuming due to the high-dimensional input. As a result, such networks become impractical, especially in IoT-based medical solutions where the available computing resources are minimal compared to CNN computational complexity [27,28].

In this paper, a combined deep learning and multi-level feature extraction-based approach is designed to detect and classify Covid-19 in CT scans and chest X-rays. The significant contributions of this article are:

• We propose a combined deep learning and multi-level feature extraction-based approach to automatically identify the soft tissues in CT scan and chest X-ray images of Covid-19. Our approach supports radiologists and doctors in detecting Covid-19 automatically and categorizing its severity.
• A multi-level feature extraction strategy comprising GIST, the Scale Invariant Feature Transform (SIFT), and a Convolutional Neural Network (CNN) is proposed to reduce CNN training complexity. It helps to minimize parameter sizes and input dimensions for the CNN network. A combined CNN-LSTM deep learning strategy is proposed for automatic Covid-19 detection with severity classification. Comparison with other state-of-the-art deep learning methodologies confirms the feasibility of the proposed approach.
• The fused feature set is visualized using t-distributed Stochastic Neighbor Embedding (t-SNE) to confirm that the multi-level features extracted by the proposed strategy are not sparse. The visualization results show that the fused feature set is dense and can effectively support deep learning tasks.
• Overfitting and imbalanced data significantly affect the accuracy and loss values of a deep learning model. Dropout layers and the activation function are used in the proposed strategy to address overfitting, whereas the imbalanced data problem is solved with Synthetic Minority Oversampling (SMOTE).

The remaining sections of the paper are organized as follows: related works are given in Section 2, the proposed methodology is explained in Section 3, results and discussions are provided in Section 4, and finally, the conclusion is given in Section 5.

Many studies have applied deep learning-based methods to achieve high Covid-19 detection and severity classification performance. For example, Sarker et al. [29] used DenseNet-121 with transfer learning for Covid-19 identification. They designed a platform that analyzes radiological images and displays the affected areas.
This method obtained 87% accuracy. Shan et al. [30] developed a deep learning method to classify infected lung regions automatically. The methodology was evaluated on 300 coronavirus-infected cases and reached 91% accuracy, but it could not determine the severity of the pneumonia. Zhang et al. [31] utilized DenseNet for diagnosing coronaviral disease; case identification sensitivities for Covid-19 and non-Covid-19 were 96% and 70.65%, respectively. Wang et al. [32] employed pre-trained deep learning methods to recognize lung images of Covid-19. This method was tested on 1266 patients in six cities and had an accuracy of 87%. Kumar et al. [33] proposed a deep learning approach for detecting Covid-19 in chest X-ray images. Nine pre-trained deep learning models were employed to extract features from Covid-19 X-rays, and an SVM was used to classify them, yielding an accuracy of 95.38%. Aayush et al. [34] classified Covid-19 infected patients using DenseNet201-based deep transfer learning, reporting 96.25% detection accuracy (see Table 4). Soares et al. [37] built a public SARS-CoV-2 dataset based on CT scan images of suspected patients and further proposed an eXplainable deep learning strategy (xDNN) to identify Covid-positive and Covid-negative patients. They used direct input images for their pre-trained network. The baseline experiment on their public dataset achieved an excellent accuracy of 97.38% with minimal accuracy loss.

Covid-19 severity detection is one of the most challenging tasks in assessing the condition of a Covid-positive patient. Deep learning-based severity assessment has been found in several studies to be more effective and quantitative than report-based radiologist assessment. Despite the promising results of deep learning methods on Covid-19 severity diagnosis, few studies on computer vision-based severity assessment have been published. For example, Yue et al. [38] used 729 CT scan images of Covid-19 patients to measure severity. They used a pre-trained deep neural network to classify severity and achieved an overall detection accuracy of 95.34%, feeding direct input images to their pre-trained network. Carvalho et al. [39] categorized Covid patients into mild, moderate, and severe groups to perform severity detection. Their artificial neural network (ANN) achieved an overall detection accuracy of 82% on 300 CT scan images of Covid-positive patients, using direct input images for network training. Xiao et al. [40] used a residual CNN (ResNet34) model to optimize their deep neural network for Covid-19 disease severity progression, again using direct input images for their pre-trained network. The empirical analysis of 408 Covid-19 positive patients yielded an overall detection accuracy of 81.9%.

Our proposed work fills the following research gaps.

• CNN is usually quite promising for extracting features from high-dimensional input such as images. Even though the deep learning methodologies described in published works have proven excellent [29-33,39], the training period for such approaches is exceptionally long, as their CNN networks require large-scale optimization processes, owing to the large input size of Covid-19 images. Fine-tuning or transfer learning is used to tackle this problem in published research [34-38,40]. The ImageNet dataset was used for pre-training most of these methods; therefore, the training complexity of the CNN network rises during fine-tuning if the training and testing scenes belong to entirely different applications.
This limitation poses a substantial challenge for IoT-based medical applications, whose processing resources are limited compared to CNN computational complexity. Our paper presents a new way to decrease CNN training complexity by reducing parameter sizes and input dimensions. The results reported in Tables 4 and 5 and Fig. 9 justify this claim.
• The majority of published research focuses on detecting Covid-19, which can only aid physicians in identifying Covid-19 patients among other patients. Covid-19 severity assessment can also assist physicians in prioritizing patient treatment. Usually, a lengthy evaluation time is required for radiologists and physicians to fulfill both tasks, whereas deep learning-based solutions can provide fast, efficient, and accurate identification of Covid-19 together with severity assessment. The few studies in the literature that have focused on the severity assessment of Covid-19 attained less accurate results due to the insufficient number of Covid-19 samples. Automatic Covid-19 severity assessment is therefore still under-examined and requires additional research. Our work addresses both the detection and the severity classification of Covid-19; the results can be viewed in Table 2.

The proposed approach is designed to detect and classify Covid-19 in CT scans and chest X-rays. We collected the SARS-CoV-2 CT scan dataset (www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset) from the Kaggle database for experimental purposes. The dataset contains a total of 2390 CT scans, 1229 of which are positive for SARS-CoV-2 infection and the remaining 1161 negative. 34 of the Covid-19 cases were male and 28 were female. The Italian Society of Medical and Interventional Radiology (SIRM) Covid-19 database provided the second dataset for this study (https://www.sirm.org/category/senza-categoria/Covid-19/). It includes 220 CT scan and X-ray images of the lungs from 79 actual Covid-19 cases; we analyzed these 79 out of 115 real Covid-19 cases. Senior radiologists and specialists of SIRM, with several years of expertise in analyzing CT scans and chest X-rays, provided a brief history of each Covid-19 case in the database. The second dataset was used in this investigation to diagnose the severity of Covid-19. Fig. 1 shows a sample of lung CT scans and chest X-rays of Covid-19 cases.

According to the National Health Commission of China's directions on Covid-19, opacity and lung disease involvement provide legitimate indicators of Covid-19 disease severity [41]. The severity level of Covid-19 was evaluated using the lung severity score of Warren et al. [42]. Each lung sample was grouped according to its particular lung involvement score (0-4). Based on the illness stage and lung severity score, we divided the 79 real Covid-19 cases into four groups (Mild, Moderate, Severe, and Critical). Score 0 represents no lung involvement and opacity. A minimal grouping function is sketched after this list.
• Mild Case (Score 1) = pneumonia affects up to 25% of the lung region (opacity is present in the lower and middle lung sections). The period of illness ranges from 1 to 10 days.
• Moderate Case (Score 2) = pneumonia affects 25%-50% of the lung region (opacity is present in the lower and middle lung sections). The period of illness ranges from 11 to 20 days.
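For concreteness, the grouping rule above can be written as a small lookup. This is an illustrative sketch only, not the authors' code: the text also factors in illness duration, and since the Severe and Critical definitions are truncated in the source, mapping scores 3 and 4 to those groups is our assumption.

```python
def covid_severity_group(lung_score: int) -> str:
    """Map a lung involvement score (0-4) to the four study groups.

    Scores 0-2 follow the definitions quoted above; assigning scores
    3 and 4 to Severe and Critical is an assumption, as those two
    definitions are truncated in the source text.
    """
    groups = {0: "No involvement", 1: "Mild", 2: "Moderate",
              3: "Severe", 4: "Critical"}
    return groups[lung_score]
```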
Fig. 2 shows the flow diagram of the proposed approach. The proposed approach uses a combined CNN-LSTM strategy with a multi-level feature extraction strategy for Covid-19 detection and classification. Local and global features are extracted, selected, and fused from CT scans and chest X-rays using conventional image descriptors, i.e., SIFT and GIST, in the level 1 feature extraction phase. The local and global features are then merged into a single feature vector by serial feature fusion. t-SNE is used to transform the high-dimensional fused feature set into a low-dimensional space. Synthetic Minority Oversampling (SMOTE) [43] is used to handle the imbalanced data problem. Deep features are extracted from the fused feature set using the CNN network in the level 2 feature extraction phase; the CNN extracts more valuable features with further reduced dimensions. Lastly, the combined CNN-LSTM model uses the derived features for Covid-19 detection and classification. Details on each stage are provided in the subsections below.

The CNN network is quite effective for a variety of high-dimensional data such as images and videos. It requires a high-dimensional optimization procedure whose training time is long because the input size is large. In addition, if testing samples differ drastically from training samples, the CNN network must be retrained or fine-tuned for identification [27,28]. Level 1 feature extraction is used to reduce the input dimensions of the data for the CNN network. Consequently, it increases CNN training efficiency without loss of accuracy and minimizes input data dimensions and parameter sizes. Three significant steps are used to extract SIFT and GIST features from the CT scan and chest X-ray images, namely feature extraction, selection, and fusion. Each step is described below.

The SIFT descriptor extracts local image features and uses an appropriate key point to describe them. Since CT scans and chest X-rays contain many textural, corner, and edge features, it is necessary to extract local features to minimize detection errors. The conventional SIFT descriptor [44] often fails to extract enough local features from entire image patches. Due to certain key advantages, the proposed strategy chooses the Dense SIFT descriptor (D-SIFT) for local feature extraction from Covid-19 images. D-SIFT uses a dense grid for local feature extraction over whole image patches. D-SIFT is 30-60 times faster than the conventional SIFT descriptor [45]. D-SIFT provides more high-dimensional local characteristics of an image, and it is more accurate than other local feature descriptors such as SURF [46] and LBP [47]. Several studies have previously preferred D-SIFT over other feature descriptors for extracting local features from medical image datasets [48]. The main workflow of the D-SIFT feature descriptor is presented below, followed by a short code sketch.
• In the first stage, D-SIFT selects the bounding box and then slides a patch over the CT scan or chest X-ray. The patch visits all areas within the bounding box to describe the features. Each patch size is 4η_x × 4η_y, where η_x and η_y denote the cell size within the patch. The patch is known as the descriptor sampling region.
• In the second stage, the gradient histogram of the pixels is calculated in eight directions for each cell. Each patch yields 128-dimensional characteristics, and all patch features are combined to deliver the D-SIFT features.
• In the third stage, Eq. (1) defines the total dimensionality of the D-SIFT features:

N = ((W − 4η_x)/δ_x + 1) × ((W − 4η_y)/δ_y + 1) × 128, (1)

where W denotes the width of the CT scan or chest X-ray (the height equals the width for our 200 × 200 × 3 inputs), δ_x and δ_y denote the horizontal and vertical steps, and η_x and η_y denote the width and height of each cell. Consequently, the overall D-SIFT dimensionality is 191 × 191 × 128. This stage of the strategy is implemented in MATLAB.
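Although the authors implemented this stage in MATLAB, dense-grid extraction is easy to approximate in Python with OpenCV by computing SIFT descriptors at fixed grid keypoints instead of detected ones. This is a minimal sketch under assumed parameter values (step 1, patch size 10, which only approximates the 191 × 191 grid above); `ct_scan.png` is a placeholder filename.

```python
import cv2
import numpy as np

def dense_sift(gray: np.ndarray, step: int = 1, patch: int = 10) -> np.ndarray:
    """Compute SIFT descriptors on a dense grid (D-SIFT approximation).

    `step` plays the role of the strides (delta_x, delta_y) and `patch`
    the descriptor sampling-region size (4 * eta).
    """
    h, w = gray.shape
    # One keypoint per grid position; the KeyPoint size sets the patch scale.
    kps = [cv2.KeyPoint(float(x), float(y), float(patch))
           for y in range(patch // 2, h - patch // 2, step)
           for x in range(patch // 2, w - patch // 2, step)]
    sift = cv2.SIFT_create()
    _, desc = sift.compute(gray, kps)   # desc shape: (num_patches, 128)
    return desc

img = cv2.imread("ct_scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
img = cv2.resize(img, (200, 200))       # input size used in the paper
features = dense_sift(img)              # on the order of 191 x 191 descriptors
```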
The conventional Bag of Features (BOF) selection approach [49] is used to reduce the dimensionality of the D-SIFT features and boost CNN training efficiency. The D-SIFT features are grouped into arbitrarily built clusters to minimize overall dimensionality, and their frequency serves as the weight. The overall dimension of the D-SIFT features is reduced to 768. The stages of the BOF feature selection approach are presented below.
• In the first level, the most illustrative D-SIFT features are clustered into arbitrary clusters using the K-means approach [50]. K-means clustering initially selects the cluster centers arbitrarily and then uses the Euclidean distance to assign each D-SIFT feature to its closest cluster center. Each cluster center corresponds to a visual codeword. Eq. (2) is used to construct the final dictionary for the CT scan and chest X-ray datasets:

V = Σ_{j=1}^{k} Σ_{i=1}^{n} ||x_i − c_j||², (2)

where k is the number of visual codewords (in our case, k = 768), n is the number of feature points x_i, and c_j is the centroid of cluster j. Eq. (3) defines the learned dictionary:

D = {c_1, c_2, ..., c_{D_size}}, (3)

where D_size represents the dictionary size.
• In the second level, each D-SIFT feature is quantized to the closest visual codeword in the dictionary. The histogram values lie within the [0,1] scale. The D-SIFT feature vector is computed after calculating the Euclidean distance between each local feature and the visual dictionary, as shown in Eq. (4):

q(x_i) = argmin_{j ∈ {1,...,k}} ||x_i − c_j||, (4)

A 768-dimensional D-SIFT feature vector is obtained in the final level of feature selection.

The extraction of image texture is referred to as global feature extraction. Many works suggest the GIST descriptor for extracting global features from CT scans and chest X-rays due to its high classification performance; therefore, we pick the GIST descriptor for global feature extraction. The GIST descriptor first applies multi-scale, multi-directional Gabor filtering and then extracts the image's 512-dimensional texture feature vector. Global feature extraction for the CT scan and chest X-ray is performed using Eq. (5):

g_mn(x, y) = a^(−m) g(x′, y′), with x′ = a^(−m)(x cos θ + y sin θ) and y′ = a^(−m)(−x sin θ + y cos θ), (5)

where the Gabor filter is indicated by g(x, y), the scale factor of the wavelet expansion by a^(−m), the number of scales by m, the number of directions by n, and the filter direction by θ. The weighted combination of the outputs of the multi-scale oriented filters gives the GIST feature vector, which is computationally expensive. A regularization technique [51] is selected to reduce the dimensionality of the GIST feature vector, and a 256-dimensional GIST feature vector is obtained in the final level of feature selection.

D-SIFT and GIST descriptors concentrate on edge, corner, and textural feature points that define each feature's surrounding area. As a result, both descriptors preserve the most relevant information inside the Covid-19 samples. Fig. 3 represents the 10, 50, 100, and 300 most essential D-SIFT and GIST features as red spots. The D-SIFT and GIST feature descriptors produce two vectors of lengths 1 × 768 and 1 × 256, respectively. Eqs. (6) and (7) define their serial fusion:

DSIFT_fv(1×768) = [d_1, ..., d_768], GIST_fv(1×256) = [g_1, ..., g_256], (6)
Fused_fv(1×1024) = [DSIFT_fv, GIST_fv], (7)

where Fused_fv(1×1024) represents the fused feature set. A sketch of the dictionary construction and fusion steps follows.
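The dictionary construction, quantization, and serial fusion above can be sketched with scikit-learn's K-means. The variables `all_dsift` (descriptors stacked over the training images), `image_dsift` (descriptors of one image), and `gist_vec` (its 256-dimensional GIST vector) are hypothetical placeholders; the paper provides no reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

K = 768  # dictionary size used in the paper

# Eq. (2): learn the visual dictionary from all training D-SIFT descriptors.
kmeans = KMeans(n_clusters=K, n_init=4, random_state=0).fit(all_dsift)

def bof_histogram(image_dsift: np.ndarray) -> np.ndarray:
    """Eq. (4): quantize each descriptor to its nearest codeword and
    return a 768-bin histogram normalized into [0, 1]."""
    words = kmeans.predict(image_dsift)                # nearest-centroid labels
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / max(hist.sum(), 1.0)

# Eqs. (6)-(7): serial fusion of the 768-D BOF vector and the 256-D GIST vector.
fused_fv = np.concatenate([bof_histogram(image_dsift), gist_vec])  # shape (1024,)
```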
t-SNE feature visualization is used to determine whether the fused feature set contains essential or sparse information. Van der Maaten and Hinton introduced t-SNE [52] in 2008 as a popular approach for high-dimensional data visualization; the algorithm transforms high-dimensional data into a low-dimensional space. t-SNE has several tunable parameters, such as the number of iterations and the perplexity. The perplexity value balances the local and global aspects of the data in the resulting plot. The balance between local and global aspects of the fused feature vector for various perplexity values is shown in Fig. 4.

We conducted two t-SNE visualization experiments using the R language. In the first experiment, we approximated the lowest perplexity value required to separate the Covid-19 (+) and Covid-19 (−) clusters; in the second, we visualized the finest cluster separation at optimal perplexity values. t-SNE employs iterations to distinguish between sample types; to visualize the separate Covid-19 (+) and Covid-19 (−) clusters, we used 300 iterations with each perplexity value. The first row of Fig. 4 displays the dimension-reduced t-SNE plots of the fused feature set at perplexity values 1, 2, 3, and 4. Covid-19 (+) samples are indicated by an orange cluster and Covid-19 (−) samples by a light blue cluster. At perplexity 1, the first row of Fig. 4 displays no visible cluster, with overlapping samples. At perplexity values 2, 3, and 4, the first t-SNE experiment separates the Covid-19 (+) and Covid-19 (−) samples into visual clusters, but a few outlier data points remain. Therefore, the second experiment was conducted for a clearer separation of the Covid-19 (+) and Covid-19 (−) clusters. We chose four higher perplexity values (5, 6, 7, and 8) for the second t-SNE experiment, as shown in the second row of Fig. 4. The second row shows t-SNE plots with clearly distinct Covid-19 (+) and Covid-19 (−) clusters, demonstrating that the low-dimensional fused feature set can categorize samples into their respective categories. The commonly recommended perplexity range is 5 to 50.

All the observations in Fig. 4 indicate that the fused feature set is dense with no missing values. The density of the dataset dramatically affects the absolute accuracy of prediction: higher density generally leads to more accurate predictions because more expressive data is available for learning [53]. The isolated t-SNE visual clusters improve classification performance. A dataset that can be separated at an appropriate perplexity value can also be categorized with suitable hyperparameters, as observed in the classification performance of the proposed CNN-LSTM model in Tables 2-6. A minimal reproduction script is sketched below.
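The two perplexity experiments were run in R, but the same sweep can be reproduced with scikit-learn's TSNE. A minimal sketch, assuming `fused_features` is the (n_samples, 1024) fused matrix and `labels` the binary Covid-19 (+)/(−) vector:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

fig, axes = plt.subplots(2, 4, figsize=(16, 8))
for ax, perp in zip(axes.ravel(), [1, 2, 3, 4, 5, 6, 7, 8]):
    # 300 iterations per perplexity value, as in the paper's experiments
    # (the keyword is `max_iter` in newer scikit-learn releases).
    emb = TSNE(n_components=2, perplexity=perp, n_iter=300,
               random_state=0).fit_transform(fused_features)
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="coolwarm", s=5)
    ax.set_title(f"perplexity = {perp}")
plt.show()
```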
In deep learning, we generally need adequate training data to avoid overfitting. Data augmentation can be used to enlarge the sample size, reducing the impact of unbalanced data, and a proper augmentation strategy improves the model's robustness. Several recent studies have proposed new data augmentation methods. For example, Jayadeva et al. [54,55] introduced a non-iterative method for adding samples to small datasets. This technique estimates the data in a sub-space of its eigenvectors, clusters them in the subspace, and produces additional samples within the clusters. In further work, they also proposed a Twin Neural Network (Twin NN) to learn from large unbalanced datasets.

Although many recent articles have published X-ray datasets, their Covid-19 portions contain at most a few hundred images. To build a Covid-19 dataset with enough images in each class, X-rays of confirmed and unconfirmed Covid-19 cases must be collected, which are difficult to obtain from reliable and authentic sources. The challenge is compounded because the collected data must be correctly annotated [56]. For experimentation purposes, the SARS-CoV-2 CT scan dataset has 1229 Covid-19 samples, whereas the SIRM Covid-19 CT scan and chest X-ray dataset contains just 220 Covid-19 samples. When the samples in one class outnumber those in another, the CNN and LSTM training may face imbalanced data problems. SMOTE oversampling on the classification models addresses this issue in this analysis; the experimental study with and without SMOTE data augmentation is discussed in Section 4. SMOTE is based on K-nearest-neighbor clustering. The Euclidean distance determines the K nearest neighbors for each sample of the minority class. A random neighbor x_j is then selected from the K nearest neighbors of the minority sample x_i, and copies of new samples are generated using Eq. (9):

x_new = x_i + (x_j − x_i) × δ, (9)

where δ ∈ [0, 1] represents a random number between 0 and 1. A minimal implementation is sketched below.
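Eq. (9) can be implemented directly; the sketch below is ours, not the authors' code. In practice, the equivalent `SMOTE` class from the imbalanced-learn package does the same job.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples per Eq. (9): each new
    point lies on the segment between a sample x_i and a random one of
    its k nearest minority-class neighbors x_j."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbors because each sample is its own nearest neighbor.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(minority).kneighbors(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        j = idx[i, rng.integers(1, k + 1)]     # skip self at position 0
        delta = rng.random()                   # delta in [0, 1]
        synthetic.append(minority[i] + delta * (minority[j] - minority[i]))
    return np.asarray(synthetic)
```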
The CNN network is quite effective for processing high-dimensional data such as images and videos. However, in the proposed approach, we use the matrix formed by the handcrafted feature vectors, rather than a direct image, as the input of the CNN network. Image pixels are usually correlated locally and globally; similarly, the fused feature vectors represent locally and globally correlated features [27,28]. Therefore, we choose the fused feature set as the input of the CNN network. The fused feature vectors of length 1 × 1024 greatly reduce the training complexity of the CNN network, which then extracts more meaningful features with further dimension reduction. The proposed approach utilizes a one-dimensional CNN network consisting of two convolution layers, two pooling layers, three dropout layers, and one fully connected layer, as shown in Fig. 5. The CNN network is developed using the Python TensorFlow library (1.9). The convolution-layer filters slide over the fused feature vectors and extract optimal deep features; the features derived from each filter are grouped into a new feature set known as a feature map. The optimal length and number of filters are selected by hyper-parameter tuning, and the non-linear activation function ReLU is applied to each resulting element. In the proposed CNN network, the two convolution layers use 64 and 128 filters, a kernel of size four, and 'same' padding to keep the output shape equal to the input shape. Max pooling usually decreases the spatial size, the number of features, and the computational complexity; the two max-pooling layers use a pool of size two and strides of size one to obtain a feature map containing the most prominent features of the previous one. The proposed CNN network also includes a fully connected classification layer. Overfitting in the proposed CNN network is addressed using the Softmax function and dropout layers. The output of the one-dimensional CNN network is expressed by Eq. (10):

t_k^l = f( c_k^l + Σ_i conv1D( X_ik^(l−1), t_i^(l−1) ) ), (10)

where c_k^l denotes the scalar bias of the kth neuron at layer l, t_i^(l−1) is the output of the ith neuron of layer l−1, X_ik^(l−1) is the kernel weight from the ith neuron at layer l−1 to the kth neuron at layer l, and f() represents the activation function.

The features extracted by the max-pooling layers are usually passed to a fully connected layer for classification in CNN networks. In the proposed network, however, the sequence of deep features passes to the LSTM layer rather than directly to the fully connected layer. The CNN network efficiently extracts and recognizes the image's local and global structures in the pixel series, while the LSTM network captures long- and short-term dependencies [57]. To benefit from the characteristics of the two models, the proposed approach introduces a combined CNN-LSTM network for automatic identification and classification of Covid-19. The proposed CNN-LSTM model contains two phases, as shown in Fig. 5. Phase one includes the convolution and max-pooling layers, whereas phase two consists of the LSTM layer. The local and global information of the fused feature set is encoded by the convolution layers, while the LSTM layer decodes the encoded information. The information is then flattened and passed into a fully connected layer for classification. The workflow of the first phase is covered in Section 3.5.

The LSTM model comprises one memory unit and three interactive gates: the input, forget, and output gates. The memory cell preserves the state from the previous step. The input gate defines how much input data from the network must be stored in the unit state at the current time t. The forget gate determines whether the data will pass or be refused entry at time t − 1. The output gate specifies the information for the output. Eq. (11) defines the functionality of the LSTM model:

i_t = σ(w_i x_t + v_i h_(t−1) + b_i)
f_t = σ(w_f x_t + v_f h_(t−1) + b_f)
o_t = σ(w_o x_t + v_o h_(t−1) + b_o)
c_t = f_t ⊙ c_(t−1) + i_t ⊙ tanh(w_c x_t + v_c h_(t−1) + b_c)
h_t = o_t ⊙ tanh(c_t), (11)

where x_t denotes the input at time t, v_* and w_* are weight matrices, b_* and h_* denote the biases and hidden states, respectively, and σ and tanh represent the activation functions. The input gate, forget gate, output gate, and memory cell are indicated by i_t, f_t, o_t, and c_t, respectively.

In Fig. 6, we introduce two deep learning models, namely CNN2-LSTM and CNN1-LSTM. CNN2-LSTM consists of two convolution layers of lengths 1023 and 512, two max-pooling layers of lengths 512 and 256, one LSTM layer of 128 units, and a fully connected layer. CNN1-LSTM consists of one convolution layer of length 1023, one max-pooling layer of length 512, one LSTM layer of 64 units, and a fully connected layer.

We chose four evaluation metrics, namely precision, recall, f1-score, and accuracy, to assess performance. These metrics are computed from the numbers of True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) instances, as in Eq. (12):

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 × (Precision × Recall) / (Precision + Recall), Accuracy = (TP + TN) / (TP + TN + FP + FN). (12)

2492 CT scans from the SARS-CoV-2 dataset (Dataset-1) and 220 CT scans and chest X-rays from the SIRM Covid-19 dataset (Dataset-2) were used for the experiments. We randomly selected 50% of the CT scans and chest X-rays for training and the rest for testing of each proposed CNN-LSTM model. Table 1 reports the detailed layer configurations of both proposed CNN-LSTM networks. A Keras sketch of the CNN2-LSTM model is given below.
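The layer configuration described above (and detailed in Table 1) can be sketched in Keras as follows. This is our reconstruction, not the released code: the SGD optimizer and the softmax output are assumptions (the paper mentions both sigmoid and Softmax), and the exact feature-map lengths may differ from Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn2_lstm(input_len: int = 1024, n_classes: int = 2) -> tf.keras.Model:
    """Sketch of CNN2-LSTM: two Conv1D blocks (64/128 filters, kernel 4,
    'same' padding, ReLU), max pooling (pool 2, stride 1), dropout 0.1,
    an LSTM of 128 units, and a fully connected classifier."""
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),          # fused 1x1024 feature vector
        layers.Conv1D(64, 4, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2, strides=1),
        layers.Dropout(0.1),
        layers.Conv1D(128, 4, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2, strides=1),
        layers.Dropout(0.1),
        layers.LSTM(128),
        layers.Dropout(0.1),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Learning rate 0.01 as reported in Section 4; SGD is our assumption.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```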
These experiments were run for 50 epochs with a batch size of 32, a 0.01 learning rate, two fully connected layers, and the sigmoid activation function. The dynamic graphs of accuracy, loss, and the receiver operating characteristic (ROC) for each proposed CNN-LSTM model are shown in Fig. 7. The blue and red colors display the training and testing accuracy lines for both CNN-LSTM models.

Table 2 shows the overall and class-wise performance of the proposed CNN2-LSTM on both Covid-19 datasets. The recall percentage indicates the cases correctly classified into a particular class in each dataset's outcome. The results suggest that the true positive ratio for both classes of Dataset-1 was high; as a result, the proposed CNN2-LSTM had the highest detection rate (98.94%). The mild class had a lower recall percentage (68%) than the other classes in Dataset-2, perhaps due to greater confusion between mild and moderate class samples, as illustrated in Fig. 8. The proposed CNN2-LSTM achieved a maximum classification rate of 83.03% with only the 220 Covid-19 samples of Dataset-2. Table 2 shows that the proposed CNN2-LSTM could assist radiologists and practitioners in detecting Covid-19 and its severity with a low error rate.

The confusion matrices of both Covid-19 datasets for the proposed CNN2-LSTM model are presented in Fig. 8. The actual and predicted labels for each class are given vertically and horizontally, respectively, and the prediction accuracies of instances are reported in four normalized confusion matrices. The proposed CNN2-LSTM model was first tested on the imbalanced data, with the normalized detection and classification rates displayed in the confusion matrices. For the first dataset, the Covid-19 (−) detection rate (0.93) was lower than the Covid-19 (+) detection rate (0.98), showing that the sample distribution between the two classes was biased. For the second dataset, the moderate family classification rate (0.35) was the lowest compared to the critical, severe, and mild families. Similarly, the classification rate for the critical family was 0.4, lower than those for the severe and mild families. The sample ratio between these four families was imbalanced. The mild family was classified at the highest rate (0.84) due to having the most samples; a classifier usually learns more from the class with the most samples than from the other classes during training. The overall detection and classification rates were therefore affected.

SMOTE subsequently resolved the imbalanced data problem. After using SMOTE, the resulting confusion matrices for the first dataset showed a detection rate of 0.98 for Covid-19 (−) and 1.00 for Covid-19 (+). For the second dataset, the classification rates were 0.87 for the moderate family, 0.89 for the severe family, 0.89 for the critical family, and 0.68 for the mild family. The mild family classification rate dropped from 0.84 to 0.68; the few samples in the moderate class showed a pulmonary region with less pronounced pneumonia anomalies. A training and evaluation sketch with SMOTE is given below.
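Putting the pieces together, the 50/50 split, SMOTE balancing, 50-epoch training, and the normalized confusion matrices of Fig. 8 can be reproduced along these lines. `X` and `y` are hypothetical arrays of fused features and labels; applying SMOTE only to the training half is our choice, as the paper does not state where in the pipeline it was applied.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE   # reference SMOTE implementation

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)      # 50/50 split

# Balance the classes in the training half only, to avoid leaking
# synthetic samples into the test set.
X_train, y_train = SMOTE(random_state=0).fit_resample(X_train, y_train)

model = build_cnn2_lstm(n_classes=2)                      # from the earlier sketch
model.fit(X_train[..., None], y_train, epochs=50, batch_size=32,
          validation_data=(X_test[..., None], y_test))

y_pred = model.predict(X_test[..., None]).argmax(axis=1)
print(confusion_matrix(y_test, y_pred, normalize="true")) # cf. Fig. 8
print(classification_report(y_test, y_pred, digits=4))    # Eq. (12) metrics
```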
To assess the impact of various training data sizes on the detection accuracy of different deep learning models, we randomly divided the SARS-CoV-2 CT scans into seven different training and test ratios. LSTM, GRU, RNN, DNN, CNN, CNN1-RNN, CNN1-GRU, CNN2-RNN, CNN2-GRU, CNN1-LSTM, and CNN2-LSTM were used for the comparison of detection accuracy, as shown in Table 3. The highest detection accuracies attained were 99.74% for the proposed CNN-LSTM models, 85.36% for the CNN-GRU models, 98.33% for the CNN-RNN models, 98% for the DNN model, 96.95% for the RNN model, 97.73% for the GRU model, and 98.31% for the LSTM model. When the training data split ratios were between 30% and 80% of the total dataset, the best detection accuracies of the proposed CNN-LSTM models were between 98.94% and 99.74%. Despite the excellent detection performance of the proposed CNN-LSTM models with data augmentation, the models still overfitted at a few training split ratios. At a 50% training split ratio, the performance curves of the proposed CNN-LSTM models showed stability with no overfitting; we therefore regard the 50% training split as the setting where the proposed models perform at their best. In conclusion, the proposed CNN-LSTM models outperformed the other deep learning approaches at every training data split ratio.

To determine the impact of different numbers of layers on CNN training efficiency, we compared the proposed CNN2-LSTM model with previous works [34-40]. Detection and classification performance was measured with indicators such as precision, recall, f-measure, and accuracy. The SARS-CoV-2 dataset was used for the Covid-19 detection comparison. As shown in Table 4, the derived fused feature set of the proposed methodology reduces the training complexity of the CNN network in contrast to direct input images. The previous work [37] used pre-trained xDNN, VGG-16, ResNet, AlexNet, and GoogleNet models for Covid-19 detection, obtaining detection accuracies between 91.73% and 97.38%. The previous works [34-36] exhibited their best detection accuracies (96.25%, 87.68%, and 99.4%, respectively) with pre-training and 121, 18, and 201 layers, respectively. The proposed CNN2-LSTM model achieved 98.91%-99.74% detection accuracy without pre-training and with only 4 layers, as shown in Table 4. Besides this, we compared the classification performance of the proposed CNN2-LSTM model with previous works [38-40], which used 729, 300, and 408 Covid-19 images for severity classification, whereas the proposed methodology used only 220. The previous works [38-40] displayed their best classification accuracies (95.34%, 82%, and 81.9%, respectively) with pre-training and 201, 1, and 34 layers, respectively. Compared to them, the proposed CNN2-LSTM model achieved 83.03% classification accuracy without pre-training and with 5 layers, as shown in Table 4. Compared to previous works [34-40], the fused feature set of the proposed methodology significantly decreased the training complexity of the CNN network.

We also compared the proposed CNN-LSTM model with the state-of-the-art VGG 16 and VGG 19 models to show the impact of different numbers of layers and parameter sizes on network training performance. In Table 5, the models are measured by their overall detection accuracy, loss, and training time. The proposed CNN-LSTM model used the 1024-dimensional fused feature vectors as input, whereas VGG 16 and VGG 19 used the full images (200 × 200 dimensions) of the SARS-CoV-2 CT scan dataset as input.
Compared to the conventional VGG 16 and VGG 19 models, which needed entire images as input, the proposed CNN-LSTM model required fewer input parameters and layers for network training (as shown in Table 5). The proposed CNN-LSTM model obtained 99.1% overall detection accuracy, 0.02 loss, and a training time of 182.731 s with only 4 training layers and 156,802 trainable parameters. The traditional VGG 16 and VGG 19 models achieved lower overall detection accuracy (49.2% and 54%, respectively) and higher loss (0.69 and 0.69, respectively), indicating that both models required additional trainable parameters and training layers to learn the whole-image input dataset properly. Overall, the proposed CNN-LSTM model outperforms the conventional VGG 16 and VGG 19 models, which take the entire image as input. All experiments were performed for 40 epochs with 16 GB of main memory and an NVIDIA GeForce RTX 2060 6 GB GPU.

Fig. 9 shows a brief comparison of training time between the proposed CNN-LSTM model and the state-of-the-art deep learning models (VGG 16 and VGG 19). We divided the 40 epochs into eight batches to ensure clear graph visibility; the value of each batch indicates the average time of 5 epochs. In the initial 5 epochs, the average training time for CNN-LSTM was high and then gradually declined over the remaining epochs. The average training time of both VGG 16 and VGG 19 across all 40 epochs was high compared to the proposed CNN-LSTM model. In general, models requiring the entire input image dataset take longer to learn because of their larger numbers of input parameters and layers. The average times given in Fig. 9 show that the proposed CNN-LSTM model handles sizeable medical datasets with fewer training parameters, fewer layers, and less training time.

The CNN2-LSTM contains two convolution layers with 64 and 128 filters, two max-pooling layers with a pool size of 2, one LSTM layer with 128 units, three dropout layers with a rate of 0.1, and one fully connected layer. In comparison, the CNN1-LSTM contains one convolution layer with 64 filters, one max-pooling layer with a pool size of 2, two dropout layers with a rate of 0.1, one LSTM layer with 128 units, and one fully connected layer (33,410 parameters). From Table 6, we can infer that the proposed CNN2-LSTM model significantly outperforms the others in precision, recall, f1-score, accuracy, loss, and ROC. However, it is impossible to assess the overall quality of deep learning models with just an accuracy indicator, as the models can be over-fit; the loss values of all deep learning models must therefore be observed in Table 6, Fig. 7, and Fig. 12. The proposed CNN1-LSTM obtained excellent performance at the 25th epoch, after which its performance decreased due to overfitting of the network, as shown in Fig. 7. The overall loss of CNN1-LSTM was 0.14. The curves for the proposed CNN2-LSTM and the conventional deep learning models are shown in Fig. 12; the loss values of those conventional models in Table 6 were accordingly high. The deep features of the CT scans and X-rays obtained by two convolution layers are distinct from typical time-series data features, and the inclusion of more convolution layers and kernels also increases the model's accuracy.

The dynamic graphs of accuracy, loss, and the receiver operating characteristic (ROC) for each proposed CNN-LSTM model and the state-of-the-art deep learning techniques are shown in Figs. 7 and 12. The blue and red colors display the training and testing accuracy lines for all models.
The green and yellow colors indicate the training and testing loss lines for all models. In Fig. 7, each proposed CNN-LSTM model's accuracy begins at a 0.45 graph scale but quickly increases to a 0.98 graph scale within only 5-25 epochs, while the loss of each proposed model declines accordingly. The network with the reduced number of layers thus took more epochs to achieve optimum efficiency. CNN2-GRU and CNN1-GRU reached their optimum efficiency at the 50th epoch but performed poorly, as these networks overfitted. Table 5 and Fig. 7 demonstrate that the CNN can reliably extract deep characteristics and the LSTM can capture long-term data dependencies from the result. The layered CNN-LSTM design is thus more capable than traditional deep learning approaches for this image recognition task. The test data was passed through all the saved models, and we estimated precision, recall, f1-score, and accuracy, as shown in Table 6. The results demonstrate that each proposed CNN-LSTM model significantly outperformed the conventional deep learning techniques in precision, recall, f1-score, and accuracy. A detailed comparison of the proposed CNN-LSTM model with state-of-the-art deep learning approaches, based on the precision and recall indicators, is provided in Figs. 10 and 11, which show that the proposed CNN-LSTM model achieved the best precision, recall, and weighted average value for each class (Covid and Non-Covid) among the deep learning approaches.

In this paper, a combined deep learning and multi-level feature extraction methodology was proposed to identify Covid-19 in CT scans and chest X-rays. The multi-level feature extraction approach was used to extract features from CT scans and chest X-rays, significantly improving the training efficiency of the CNN network. The imbalanced data issue was resolved using the popular SMOTE algorithm, and t-SNE visualized the high-dimensional features of the level 1 feature description. The LSTM deep learning model was used for Covid-19 detection in the proposed methodology. The Kaggle SARS-CoV-2 CT scan dataset and the SIRM Covid-19 CT scan and chest X-ray dataset were employed for experimentation. Experimental outcomes indicated that the proposed approach attained 98.94% accuracy on the SARS-CoV-2 CT scan dataset and 83.03% accuracy on the SIRM Covid-19 CT scan and chest X-ray dataset. State-of-the-art deep learning models were used to assess the efficiency of the proposed automated Covid-19 detection approach. The proposed approach has a high success rate with a limited number of Covid-19 samples, which speeds up the treatment of Covid-19 cases, and it helps doctors and radiologists in robust Covid-19 detection and in treating severe cases. There are some limitations to this study that can be addressed in future research: cross-validation of the deep learning models with divergent folds is very important, but the proposed approach randomly separated the data due to computational complexity. Moreover, the present research primarily focuses on Covid-19 pneumonia and could be applied to evaluate other types of pneumonia in the future.

Hamad Naeem: Conceptualization, Methodology, Data curation, Software, Visualization, Investigation, Writing - original draft. Ali Abdulqader Bin-Salem: Writing - review & editing.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
[1] About Worldometer Covid-19 data - Worldometer.
[2] Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.
[3] Everything about the corona virus - medicine and health.
[4] Coronavirus disease 2019. A&A Pract 2020.
[5] Detection of SARS-CoV-2 in different types of clinical specimens.
[6] Chest CT for typical 2019-nCoV pneumonia: Relationship to negative RT-PCR testing.
[7] Classification of Covid-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks.
[8] Deep learning approach for microarray cancer data classification.
[9] Three-stage network for age estimation.
[10] Multiobjective differential evolution based random forest for e-health applications.
[11] Detection of abnormal heart conditions based on characteristics of ECG signals.
[12] Abd El-Latif et al., A novel blood pressure estimation method based on the classification of oscillometer waveforms using machine-learning methods.
[13] Classification of diabetic retinopathy types based on convolution neural network (CNN).
[14] Classification of corneal pattern based on convolutional LSTM neural network.
[15] Survey on computer vision for assistive medical diagnosis from faces.
[16] Prediction of breast cancer using support vector machine and K-nearest neighbors.
[17] Performance evaluation of random forests and artificial neural networks for the classification of liver disorder.
[18] Mathematical model development to detect breast cancer using multigene genetic programming.
[19] Diabetes prediction: a deep learning approach.
[20] Coronary artery heart disease prediction: a comparative study of computational intelligence techniques.
[21] Developing IoT based smart health monitoring systems: a review.
[22] Development of smart healthcare monitoring system in IoT environment.
[23] Feature extraction for image recognition and computer vision.
[24] Optimal deep learning model for classification of lung cancer on CT images.
[25] Bone suppression of chest radiographs with cascaded convolutional networks in wavelet domain.
[26] Abd El-Latif et al., Deploying machine and deep learning models for efficient data-augmented detection of Covid-19 infections.
[27] SurfCNN: A descriptor enhanced convolutional neural network.
[28] SurfCNN: A descriptor accelerated convolutional neural network for image-based indoor localization.
[29] Covid-DenseNet: A deep learning architecture to detect Covid-19 from chest radiology images.
[30] Lung infection quantification of Covid-19 in CT images with deep learning.
[31] Covid-19 screening on chest X-ray images using deep learning-based anomaly detection.
[32] A fully automatic deep learning system for Covid-19 diagnostic and prognostic analysis.
[33] Detection of coronavirus disease (Covid-19) based on deep features.
[34] Classification of the Covid-19 infected patients using DenseNet201 based deep transfer learning.
[35] Covid-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis.
[36] Explainable Covid-19 detection using chest CT scans and deep learning.
[37] Soares, Eduardo, et al., SARS-CoV-2 CT-scan dataset: A large dataset of real patients' CT scans for SARS-CoV-2 identification.
[38] Rapid identification of COVID-19 severity in CT scans through classification of deep features.
[39] COVID-19 chest computed tomography to stratify severity and disease extension by artificial neural network computer-aided diagnosis.
[40] Development and validation of a deep learning-based model using computed tomography imaging for predicting disease severity of coronavirus disease.
[41] Diagnosis and treatment protocol for novel coronavirus pneumonia.
[42] Severity scoring of lung edema on the chest radiograph is associated with clinical outcomes in ARDS.
[43] An empirical study of oversampling and undersampling for instance selection methods on imbalance datasets.
[44] Object recognition from local scale-invariant features.
[45] A Bayesian hierarchical model for learning natural scene categories.
[46] Speeded-up robust features (SURF).
[47] Identification of malicious code variants based on image visualization.
[48] Bag-of-frequencies: A descriptor of pulmonary nodules in computed tomography images.
[49] Descriptive visual words and visual phrases for image applications.
[50] PCA-SIFT: A more distinctive representation for local image descriptors.
[51] Detection of malicious activities in internet of things environment based on visualization images and machine intelligence.
[52] Visualizing data using t-SNE.
[53] Evaluating prediction accuracy for collaborative filtering algorithms in recommender systems.
[54] Eigensample: A non-iterative technique for adding samples to small datasets.
[55] Twin neural networks for the classification of large unbalanced datasets.
[56] Deep learning-driven automated detection of Covid-19 from radiography images: A comparative analysis.
[57] A CNN-LSTM model for tailings dam risk prediction.

The authors wish to thank Dr. Naeem Azam (Livestock Department of the Government of Punjab, Pakistan), Dr. Iqra Naeem (Nishtar Hospital Multan, Pakistan), and Dr. Sana Naeem (University of Veterinary & Animal Sciences, Lahore, Pakistan) for their constructive reviews and suggestions that significantly contributed to improving the quality of the publication.