key: cord-0689037-xk73952z
authors: Aslan, Muhammet Fatih; Unlersen, Muhammed Fahri; Sabanci, Kadir; Durdu, Akif
title: CNN-based transfer learning–BiLSTM network: A novel approach for COVID-19 infection detection
date: 2020-11-18
journal: Appl Soft Comput
DOI: 10.1016/j.asoc.2020.106912
sha: c30bc877088dc6ba48159b04d8ca9006afffd184
doc_id: 689037
cord_uid: xk73952z

Coronavirus disease 2019 (COVID-19), which emerged in Wuhan, China in 2019 and has spread rapidly all over the world since the beginning of 2020, has infected millions of people and caused many deaths. In response to this ongoing pandemic, countries all over the world have mobilized, and various restrictions and precautions have been introduced to prevent the spread of the disease. In addition, infected people must be identified in order to control the infection. However, because the number of Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests is inadequate, chest computed tomography (CT) has become a popular tool to assist the diagnosis of COVID-19. In this study, two deep learning architectures are proposed that automatically detect positive COVID-19 cases using chest CT X-ray images. Lung segmentation (preprocessing) of the images given as input to these architectures is performed automatically with Artificial Neural Networks (ANN). Since both architectures contain the AlexNet architecture, the recommended method is a transfer learning application. The second proposed architecture, however, is a hybrid structure, as it also contains a Bidirectional Long Short-Term Memory (BiLSTM) layer, which takes temporal properties into account. While the COVID-19 classification accuracy of the first architecture is 98.14%, this value is 98.70% for the second, hybrid architecture. The results show that the proposed architecture is highly successful in infection detection; this study therefore contributes to previous studies in terms of both deep architectural design and high classification success.

Long short-term memory (LSTM) [27] is a kind of RNN architecture that effectively addresses the vanishing gradient problem. Moreover, it can learn long data sequences. Bidirectional long short-term memory (BiLSTM) [28] is a bidirectional LSTM; that is, the signal propagates backward as well as forward in time. In contrast to BiLSTM, LSTM uses only historical context. Therefore, BiLSTM can solve sequential modeling tasks better than LSTM [29].
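The difference between one-directional and bidirectional context can be illustrated with a minimal Python sketch. It deliberately uses a toy recurrent cell (an exponential running average), not an actual LSTM cell, and the names `run_direction` and `bidirectional` as well as the decay value are assumptions made only for this illustration. The forward state at each step depends only on past inputs, while the backward state depends only on future inputs.

```python
from typing import List, Tuple


def run_direction(seq: List[float], decay: float = 0.5) -> List[float]:
    """Toy recurrent pass: h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    states = []
    h = 0.0
    for x in seq:
        h = decay * h + (1.0 - decay) * x
        states.append(h)
    return states


def bidirectional(seq: List[float], decay: float = 0.5) -> List[Tuple[float, float]]:
    """Pair the forward state (past context) with the backward state (future context)."""
    forward = run_direction(seq, decay)
    # Run the same cell over the reversed sequence, then restore the time order.
    backward = run_direction(seq[::-1], decay)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    for t, (f, b) in enumerate(bidirectional([1.0, 0.0, 0.0, 4.0])):
        print(t, f, b)
```

In a real BiLSTM, each direction would use a gated LSTM cell with learned weights; only the way the two directions are combined per time step is shown here.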

In this research, deep learning architectures for COVID-19 infection detection are proposed.

The COVID-19 Radiography Database [30] is used as the dataset. First, ANN-based segmentation is applied to the raw chest CT X-ray images before training to improve classification accuracy. As a result of the segmentation, the lung region of the raw image is cropped. To provide data diversity, the number of segmented images is increased with data augmentation.

Then, 85% of these images are given as training input to the designed architectures. Both architectures include the pre-trained AlexNet architecture (transfer learning). The first architecture is a version of AlexNet modified for chest CT images. The second architecture extends the first with a BiLSTM layer, which takes into account the temporal features in the image. This study contributes to previous studies by providing ANN-based lung segmentation, proposing a hybrid structure that combines a BiLSTM layer with transfer learning, and achieving high classification success.

The contributions of this research are summarized below. The proposed model is uncomplicated and can detect COVID-19 fully automatically.

The rest of the paper is organized as follows. Section 2 surveys the related work regarding COVID-19. The methodology and dataset are described in Section 3. Performance evaluation and results are presented in Section 4. The results are discussed and compared with previous studies in Section 5. Finally, Section 6 concludes the article and outlines future work.

In the literature, there are many studies in which artificial intelligence is employed for purposes such as Alzheimer's disease diagnosis, cancer estimation, and biopsy and dermoscopy analysis [31] [32] [33] [34] [35]. Recently, the COVID-19 pandemic has placed a heavy workload on health workers. Therefore, any support that artificial intelligence gives to physicians can make their work easier and their decisions more reliable. In this context, various methods have been proposed in the literature to interpret X-ray or computed tomography images in terms of COVID-19. Some of the previous studies can be summarized as follows:

Khan and Aslam [36] proposed a deep-learning-based tool for the diagnosis of COVID-19 using X-ray images. In that study, several deep-learning models were investigated, and the VGG-16 and VGG-19 models were reported to perform better than the others. Aman Jaiswal [37] used X-ray images to diagnose COVID-19 with various deep learning algorithms. In that study, the performances of CNN architectures were compared. Additionally, a majority rule was suggested as a novel approach. The best performance in the paper was reported as 98.96%. Nour, Cömert and Polat [38] proposed an automatic CNN-based diagnosis system for detecting positive COVID-19 cases using X-ray images. The suggested CNN model, a serial network with five convolution layers, was trained from scratch. The CNN model [...]. In [39], a DCNN-based automatic COVID-19 diagnosis system using X-ray images was proposed. The X-ray images were fed to Inception-V3, a DCNN-based model, with transfer learning and without any preprocessing. The classification accuracy of the diagnosis system was 96%. Toğaçar, Ergen and Cömert [40] used a deep learning model to detect COVID-19 from X-ray images. In that study, the classes were restructured using the Fuzzy Color technique as a preprocessing step, and the structured images were stacked with the original images. The stacked dataset was then used to train the MobileNetV2 and SqueezeNet deep learning models. The Support Vector Machine (SVM) method was employed for image classification. The classification rate obtained with the proposed approach was 99.27%. Ucar and Korkmaz [41] demonstrated an artificial-intelligence-based structure that uses chest X-ray images to detect COVID-19. SqueezeNet was tuned for COVID-19 diagnosis with Bayesian optimization. Additionally, the dataset was augmented. The resulting COVID-19 classification accuracy was reported as 98.3%.
Ozturk, Talo, Yildirim, Baloglu, Yildirim and Acharya [42] suggested an automatic COVID-19 detection system based on deep learning. The DarkCovidNet model was designed for the automatic detection of COVID-19 using chest X-ray images. In that study, no preprocessing such as augmentation or segmentation was applied to the X-ray images. For the classification with three classes, the accuracy was reported as 87.02%. Khan, Shah and Bhat [43] introduced a CNN architecture (CoroNet) to detect COVID-19 using CT and X-ray scans. This model was based on the Xception architecture and pre-trained on the ImageNet dataset. The proposed architecture was shown to provide 89.6% and 95% accuracy for four classes and three classes, respectively. Sharma, Rani and Gupta [44] created deep learning models to identify COVID-19 patients from X-ray images. To increase the number of X-ray images, they performed 25 different types of augmentation on the original images. A better performance than that of previous studies was reported. Narin, Kaya and Pamuk [45] performed COVID-19 diagnosis from chest X-ray images using the ResNet50, Inception-ResNetv2 and InceptionV3 deep learning models. In that study, binary classification was performed, and the results were validated with 5-fold cross-validation. An average accuracy of 98% was achieved with the ResNet50 model. Ahuja, Panigrahi, Dey, Rajinikanth and Gandhi [46] developed a three-step implementation to perform COVID-19 detection using CT images. In the first step, they increased the amount of data by using three-level stationary wavelet decomposition. In the second step, they performed classification based on transfer learning using pre-trained models. In the last step, abnormalities in the CT image were localized.
Finally, in a different study, Singh, Bansal, Ahuja, Dubey, Panigrahi and Dey [47], after the image augmentation and preprocessing step, fine-tuned the VGG16 architecture and used the model to extract features from lung CT scan images. Many more studies could be added to those mentioned above. In the diagnosis of COVID-19 infection, both the images given to the network and the architecture of the network strongly affect the results. As seen above, methods such as CNN, transfer learning, and machine learning have been used frequently for the diagnosis of COVID-19. In addition, most of the works apply operations such as image augmentation, cropping, and image size reduction to the raw images and give the resulting image to the deep network. According to Liu and Guo [29], BiLSTM has a greater effect on classification accuracy than the convolutional layer. However, BiLSTM, which is a relatively recent method and is reported to achieve higher classification success than CNN, has not so far been used in previous studies on the diagnosis of COVID-19. What distinguishes this study from others is that ANN-based segmented lung images are given to the CNN-based transfer learning-BiLSTM network. The results show that the proposed method provides a successful and easy-to-apply COVID-19 diagnosis.

In this section, detailed information will be given about the COVID-19 Radiography Database, lung segmentation, data augmentation and finally the architectures used for classification. A general block diagram of the study is given in Fig. 1 . The information about each block in Fig 

In this study, an open-access database that contains posterior-to-anterior chest X-ray images is used [30]. As shown in Table 1, this dataset contains a total of 2905 images in three classes.

Image segmentation is an important issue for artificial intelligence, because noise or irrelevant patterns in the image can lead to false predictions. The raw X-ray images in the dataset contain different kinds of noise, as seen in Fig. 3. The segmentation process is applied so that this noise is not taken into account by the artificial intelligence algorithm.

Rotation has been applied to images belonging to the COVID-19 class, which has far fewer samples than the other classes (see Table 1). After the data augmentation step, in which the COVID-19 class images are augmented four times, the number of COVID-19 class samples reaches 1095. The cropped chest X-ray images are rotated counterclockwise by a randomly generated angle between 0° and 359° (see Fig. 6).
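The rotation-based augmentation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it only draws the random angles and checks the resulting sample count. It assumes 219 original COVID-19 images with four rotated copies each, which is consistent with the reported total of 1095; the function name `plan_rotations`, the image identifiers and the seed are hypothetical. Rotating the actual pixel data would require an image library and is not shown.

```python
import random
from typing import Dict, List


def plan_rotations(image_ids: List[str], copies_per_image: int = 4,
                   seed: int = 0) -> Dict[str, List[int]]:
    """Draw a random counterclockwise angle in [0, 359] for every augmented copy."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    return {
        image_id: [rng.randint(0, 359) for _ in range(copies_per_image)]
        for image_id in image_ids
    }


if __name__ == "__main__":
    # Assumed number of original COVID-19 images (hypothetical identifiers).
    originals = [f"covid_{i:03d}" for i in range(219)]
    plan = plan_rotations(originals)
    total = len(originals) + sum(len(angles) for angles in plan.values())
    print("samples after augmentation:", total)
```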

Nowadays, deep learning-based artificial intelligence studies provide state-of-the-art solutions in computer vision. CNN, which is a deep learning method, is now preferred for image recognition applications in different disciplines. Small details that people cannot notice can be easily distinguished using CNN. CNNs recognize visual patterns directly from pixel images with minimal preprocessing. The CNN structure was introduced by the LeNet architecture [52], and AlexNet [...] Rectified Linear Unit (ReLU) activation function [53]. After the first, second and fifth convolution layers, AlexNet applies max pooling to reduce the spatial size of the feature maps. After the last convolutional layer, there are two fully-connected layers with 4096 outputs each. Finally, a layer is added after the fully-connected layers to classify the given data. This last layer classifies 1000 objects using the Softmax function [54] [55] [56].

AlexNet has been used in many different classification studies to date.

Recently, however, transfer learning has come to be preferred in many deep learning studies over designing a network from scratch or using an existing network directly, because using and modifying a pre-trained CNN model is much easier and faster than training a new CNN model with randomly initialized weights. In CNN-based architectures, visual features are usually extracted and learned in the first layers. Therefore, to take advantage of an existing architecture, the first layers are left unchanged and modifications are made to the last layers.

Using transfer learning, architectures trained on large data sets are used directly. Thus, previously learned parameters, especially weights, are transferred to the modified new model [57]. In Table 2, the mAlexNet layers and the parameters of these layers are given.

Table 2. mAlexNet layers and layer parameters.

Layer     Output size  Filter size  Stride  Padding  Filters/outputs  Activation
conv1     55x55        11x11        4       0        96               relu
maxpool1  27x27        3x3          2       0        96               -
conv2     27x27        5x5          1       2        256              relu
maxpool2  13x13        3x3          2       0        256              -
conv3     13x13        3x3          1       1        384              relu
conv4     13x13        3x3          1       1        384              relu
conv5     13x13        3x3          1       1        256              relu
maxpool5  6x6          3x3          2       0        256              -
fc6       -            -            -       -        4096             relu
fc7       -            -            -       -        4096             relu
fc8       -            -            -       -        25               relu
fc9       -            -            -       -        2
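The spatial sizes listed in Table 2 can be checked with the standard output-size relation for convolution and pooling layers, out = floor((in + 2 * padding - kernel) / stride) + 1. The short Python sketch below applies this relation layer by layer. The input size of 227x227 is an assumption (it is not stated in the text above, but it reproduces the 55x55 output of conv1), and the names `out_size`, `LAYERS` and `trace_sizes` are introduced only for this example.

```python
from typing import Dict


def out_size(in_size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for the spatial layers listed in Table 2
LAYERS = [
    ("conv1", 11, 4, 0),
    ("maxpool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("maxpool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("maxpool5", 3, 2, 0),
]


def trace_sizes(input_size: int = 227) -> Dict[str, int]:
    """Propagate the (assumed) square input size through all spatial layers."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes().items():
        print(f"{name}: {size}x{size}")
```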

According to the training parameters, the mini-batch size, which determines how the training data are divided into small groups, is 60, and the optimization algorithm used to reduce the training error is Stochastic Gradient Descent with Momentum (SGDM). The parameter values of this algorithm are also given in Table 2.

(1)

The parameter update equation of the SGDM algorithm is given in Eq. 1. By using the parameter values in Table 2 in Eq. 1, the classification errors on the chest X-ray images are minimized during training. As a result, by using the parameters shown in Table 2, the network shown in Fig. 8 is trained with the SGDM algorithm. The accuracy values obtained by classifying the test images after training are given in the Results and Discussion section.
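As an illustration of how an SGDM update behaves, the sketch below uses a common formulation of the method, theta_{t+1} = theta_t - lr * grad(theta_t) + momentum * (theta_t - theta_{t-1}). This is a minimal sketch under stated assumptions and is not claimed to match the exact notation of Eq. 1: the symbol names, the learning rate, the momentum value and the toy objective are all chosen for the example and are not the values of Table 2.

```python
from typing import Callable, List


def sgdm_minimize(grad: Callable[[float], float], theta0: float,
                  lr: float = 0.1, momentum: float = 0.9,
                  steps: int = 200) -> List[float]:
    """Iterate theta_{t+1} = theta_t - lr * grad(theta_t) + momentum * (theta_t - theta_{t-1})."""
    prev, theta = theta0, theta0  # the momentum term is zero on the first step
    history = [theta]
    for _ in range(steps):
        new_theta = theta - lr * grad(theta) + momentum * (theta - prev)
        prev, theta = theta, new_theta
        history.append(theta)
    return history


if __name__ == "__main__":
    # Toy objective E(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
    path = sgdm_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
    print("final parameter:", path[-1])
```

In network training, `grad` would be the mini-batch gradient of the loss with respect to all weights; a scalar parameter is used here only to keep the example short.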

The second architecture designed for the detection of COVID-19 fully includes the first architecture. The convolutional structure in this architecture is exactly the same as in the previous architecture. The most important feature that distinguishes the hybrid structure from the previous architecture is the BiLSTM layer. Since the BiLSTM layer is suitable for sequential data, features are first extracted from the images using the convolutional architecture. Therefore, the new architecture includes both convolutional layers and BiLSTM, as shown in Fig. 9. Parameter values and training parameters of the designed mAlexNet-BiLSTM hybrid architecture are given in Table 3. As seen in Table 3 and Fig. 9, two consecutive BiLSTM layers (BiLSTM-1, BiLSTM-2) are used.

The temporal features obtained from these layers are given as input to the fully connected layer (fc9), and the classification is completed using Softmax. In the proposed architecture, parameter values such as the numbers of hidden layer neurons and the activation functions are found by trial and error.

During the training phase, the Adam optimization algorithm is used to reduce the error in each iteration. Adam is an adaptive learning rate algorithm designed specifically for training deep neural networks. Adam outperforms other optimization algorithms thanks to its relatively low memory requirement [58]. As an adaptive learning rate method, it calculates individual learning rates for different parameters. Adaptive learning rates are adjustments made to the learning rate during training by reducing the learning rate according [...] The parameter values in Table 3 are used in the Adam optimization algorithm shown in Eq. 2-4, and thus the weights are updated in each iteration of the training phase. After these updates reach the specified iteration limit, the accuracy values calculated using the test data can be seen in Fig. 11 and Table 4.
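The per-parameter adaptation performed by Adam can be illustrated with a short sketch. It follows the widely used formulation with exponential moving averages of the gradient and the squared gradient plus bias correction; this may differ in detail from Eq. 2-4, and the hyperparameter values and the toy objective below are assumptions for the example, not the values of Table 3.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], theta0: float,
                  lr: float = 0.05, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 2000) -> float:
    """Minimize a scalar objective with Adam, given its gradient function."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # moving average of the gradient
        v = beta2 * v + (1.0 - beta2) * g * g    # moving average of the squared gradient
        m_hat = m / (1.0 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)           # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


if __name__ == "__main__":
    # Toy objective E(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
    print("final parameter:", adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0))
```

Because the step is divided by the square root of the second-moment estimate, the first update has a magnitude close to `lr` regardless of the gradient scale, which is the sense in which each parameter receives its own effective learning rate.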

Using the architectural parameters and training parameters of the two architectures described above, the performance of both algorithms is determined on the test data. The training graphs of mAlexNet and the hybrid architecture obtained in the experimental studies are shown in Fig. 10. In Fig. 10, both the training and test loss approach the minimum at the end of the graph. The training time of the CNN and BiLSTM networks is 139 seconds and 85 seconds, respectively. However, these times are directly related to the number of epochs and iterations. In the training step, the number of iterations for mAlexNet is 1520, the number of epochs is 5, and the training duration is 139 seconds. In the hybrid architecture training, the number of iterations is 1150 and the number of epochs is 50; the training takes 224 (139+85) seconds in total. The proposed method is uncomplicated and easy to implement, since it performs segmentation automatically and, owing to its end-to-end learning architecture, does not include a separate feature extraction step. The confusion matrices obtained from the classification are shown in Fig. 11. In addition, different performance metrics, namely accuracy, recall, specificity, precision, F1-score, MCC, Kappa and Area Under Curve (AUC), are calculated for performance measurement. Receiver operating characteristic (ROC) curves are shown in Fig. 12. The formula for each metric is defined in Eq. 5 to Eq. 12 [61]. More detailed information on the metrics can be found in reference [62]. The performance values obtained with these formulas are shown in Table 4. The results in Table 4 show that both architectures are successful. However, the classification success of the hybrid structure formed by adding BiLSTM to the mAlexNet model is higher. The calculated accuracy rates are 98.14% and 98.70% for mAlexNet and mAlexNet+BiLSTM, respectively. In addition, as shown in Table 4, the Precision, Recall, F1-Score, Specificity and MCC values are also higher for the hybrid architecture. This shows that the hybrid architecture achieves better overall performance and unbiased classification.
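Most of the metrics named above can be computed directly from a confusion matrix. The following Python sketch uses the standard textbook definitions for the binary case (one class treated as positive, the rest as negative); it is an illustration under that assumption, the function name `binary_metrics` is hypothetical, and the counts in the usage example are invented for demonstration and are not results from Table 4. AUC is omitted because it requires the classifier scores rather than the confusion matrix.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only
    for name, value in binary_metrics(tp=90, fp=5, fn=10, tn=95).items():
        print(f"{name}: {value:.4f}")
```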


The main goal of this study is to show that the CNN-based transfer learning-BiLSTM hybrid structure is highly effective for the diagnosis of COVID-19. Like previous studies, our study is based on deep learning; however, it differs in its methodological contribution. Previous deep learning studies that propose different methods are compared in Table 5 in terms of accuracy. The comparison shows that the accuracy of the proposed method is comparable with that of previous studies. As can be seen in Table 5, many studies based on CNN architectures have been carried out so far.

The biggest advantage of these architectures is that they have an end-to-end learning structure, i.e. there is no handcrafted feature extraction step. In addition, transfer learning-based CNN architectures are the new trend, as they improve classification accuracy. Therefore, combinations of different pre-trained models, or of pre-trained models with machine learning methods, have frequently been proposed recently. However, the approach suggested in this study differs from the previous ones.

The proposed method owes its success to lung segmentation and the hybrid architecture. Most of the deep learning-based studies for the diagnosis of COVID-19 are only CNN-based, as shown in Table 5. In addition, most studies give raw images as input to the CNN without lung segmentation. As a result, the features extracted from the X-ray image represent the image class less well. Since the proposed study performs the segmentation process automatically, it provides both high classification accuracy and convenience. Also, according to the study by Liu and Guo [29], BiLSTM has a greater effect on the classification accuracy than the convolutional layer. In this study, CNN is used for feature extraction and BiLSTM is used to classify COVID-19 according to these features. This provides a high classification success compared to most previous studies.

Moreover, the proposed architecture gives the features extracted by the CNN directly to the BiLSTM layer. Therefore, it is simple to apply.

The general disadvantage of deep learning studies is that the ability to generalize is largely 

Early detection of COVID-19 is crucial to preventing the disease from spreading to other people. This study uses chest X-ray images to diagnose COVID-19 easily. First, ANN-based segmentation is applied to the raw images, so that only the lung area is evaluated for COVID-19 detection. Then, in order to provide data diversity in the images, images belonging to the COVID-19 [...] Since the proposed model has an end-to-end learning structure, it detects COVID-19 automatically from chest X-ray images without requiring any handcrafted feature extraction technique. In this way, a fast and stable system can assist expert radiologists as a decision support system, reducing their workload and helping to prevent misdiagnosis.

Although the proposed method is successful, different deep learning-based methods will be proposed for the detection of COVID-19 in future studies. The first planned study aims to increase the success by increasing the size of the dataset. As is known, the success of deep learning depends largely on the amount of labelled data. Therefore, a generative adversarial network (GAN) combined with a deep neural network (DNN) structure will be developed. Another planned study is to develop a stronger CNN-based lung segmentation.

Authors are grateful to the RAC-LAB (www.rac-lab.com) for training and support.

Conflict of Interest: The authors declare that they have no conflict of interest

COVID-19 situation update worldwide, as of

Features, evaluation and treatment coronavirus (COVID-19), in: Statpearls [internet

Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection

Frequency and distribution of chest radiographic findings in COVID-19 positive patients

Enhancement of soft-tissue contrast in cone-beam CT using an anti-scatter grid with a sparse sampling approach

Multi-mounted X-ray cone-beam computed tomography

Breast cancer detection and classification using thermography: a review

Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier

A feasibility study of pulmonary nodule detection by ultralow-dose CT with adaptive statistical iterative reconstruction-V technique

Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review

A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection

Binary optimization using hybrid grey wolf optimization for feature selection

Application of neural network and weighted improved PSO for uncertainty modeling and optimal allocating of renewable energies along with battery energy storage

A novel hybrid PSO-GWO algorithm for optimization problems

Satin bowerbird optimizer: A new optimization algorithm to optimize ANFIS for software development effort estimation

Biogeography-based optimization

Bat algorithm applied to continuous constrained optimization problems

Multi-verse optimizer: a nature-inspired algorithm for global optimization

Memetic firefly algorithm for combinatorial optimization

Convolutional capsnet: A novel artificial neural network approach to detect COVID-19 disease from X-ray images using capsule networks

Deep learning workflow in radiology: a primer, Insights into Imaging

Transfer Deep Learning Along With Binary Support Vector Machine for Abnormal Behavior Detection

Very deep convolutional networks for large-scale image recognition

Learning Long-Term Temporal Features With Deep Neural Networks for Human Action Recognition

Advanced deep-learning techniques for salient and category-specific object detection: a survey

Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks

Long Short-Term Memory

Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Bidirectional LSTM with attention mechanism and convolutional layer for text classification

Can AI help in screening viral and COVID-19 pneumonia?

Deep Learning Approaches Towards Skin Lesion Segmentation and Classification from Dermoscopic Images-A Review

Use of multimodality imaging and artificial intelligence for diagnosis and prognosis of early stages of Alzheimer's disease

Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study

Determining cervical cancer possibility by using machine learning methods

Breast cancer diagnosis by different machine learning methods using blood analysis data

A Deep-Learning-Based Framework for Automated Diagnosis of COVID-19 Using X-ray Images

Analysis of Deep Learning algorithms on COVID-19 Radiography

A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization

Automatic Detection of COVID-19 Using X-ray Images with Deep Convolutional Neural Networks and Machine Learning, medRxiv

COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches

COVIDiagnosis-Net: Deep Bayes-SqueezeNet based Diagnostic of the Coronavirus Disease 2019 (COVID-19) from X-Ray Images

Automated detection of COVID-19 cases using deep neural networks with X-ray images

Coronet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images

Artificial Intelligence-Based Classification of Chest X-Ray Images into COVID-19 and Other Infectious Diseases

Automatic detection of coronavirus disease (covid-19) using xray images and deep convolutional neural networks

Deep transfer learning-based automated detection of COVID-19 from lung CT scan slices

Transfer learning based ensemble support vector machine model for automated COVID-19 detection using lung computerized tomography scan data

COVID-19 Image Data Collection, ArXiv, abs

A survey on image data augmentation for deep learning

Data augmentation for improving deep learning in image classification problem

Learning algorithms for classification: A comparison on handwritten digit recognition

Imagenet classification with deep convolutional neural networks

Utilizing AlexNet deep transfer learning for ear recognition

Application of deep learning in neuroradiology: Brain haemorrhage classification using transfer learning

Stacked denoising autoencoder and dropout together to prevent overfitting in deep neural network

Transfer learning based histopathologic image classification for breast cancer detection, Health information science and systems

Adam: A method for stochastic optimization

Adam deep learning with SOM for human sentiment classification

PIndroid: A novel Android malware detection system using ensemble learning methods

Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images

Covid-caps: A capsule network-based framework for identification of covid-19 cases from x-ray images

Covid-resnet: A deep learning framework for screening of covid19 from radiographs

Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks

A deep learning system to screen novel coronavirus disease 2019 pneumonia, Engineering

InstaCovNet-19: A deep learning classification model for the detection of COVID-19 patients using Chest X-ray

Detection of coronavirus disease (covid-19) based on deep features

Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images

A New Modified Deep Convolutional Neural Network for Detecting COVID-19 from X-ray Images