key: cord-0106961-fz8gk12f authors: Degerli, Aysen; Ahishali, Mete; Kiranyaz, Serkan; Chowdhury, Muhammad E. H.; Gabbouj, Moncef title: Reliable COVID-19 Detection Using Chest X-ray Images date: 2021-01-28 journal: nan DOI: nan sha: 59ee21c55d93d157969f223427a819c9028ffd30 doc_id: 106961 cord_uid: fz8gk12f

Coronavirus disease 2019 (COVID-19) has created a need for computer-aided diagnosis with automatic, accurate, and fast algorithms. Recent studies have applied Machine Learning algorithms to COVID-19 diagnosis from chest X-ray (CXR) images. However, the data scarcity in these studies prevents a reliable evaluation, carries a risk of overfitting, and limits the performance of deep networks. Moreover, these networks usually discriminate COVID-19 pneumonia from healthy subjects only or, occasionally, from a limited set of pneumonia types. Thus, there is a need for a robust and accurate COVID-19 detector evaluated over a large CXR dataset. To address this need, in this study, we propose a reliable COVID-19 detection network, ReCovNet, which can discriminate COVID-19 pneumonia from 14 different thoracic diseases and healthy subjects. To accomplish this, we have compiled the largest COVID-19 CXR dataset, QaTa-COV19, with 124,616 images including 4603 COVID-19 samples. The proposed ReCovNet achieved a detection performance of 98.57% sensitivity and 99.77% specificity.

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome Coronavirus-2 (SARS-CoV-2), was declared a pandemic by the World Health Organization in March 2020. The disease seriously affects people in high-risk groups (especially the elderly), leading to hospitalization, intubation, and even death [1]. In order to prevent the spread of the disease, the detection and isolation of infected patients are of the utmost importance. However, the diagnosis of COVID-19 is challenging because its symptoms, such as fever, cough, fatigue, and breathlessness, are similar to those of other viral infections [2].
Therefore, reliable detection of the disease is of significant importance. Current diagnostic tools for COVID-19 are nucleic acid detection with real-time polymerase chain reaction (RT-PCR), computed tomography (CT), and chest X-ray (CXR) imaging. RT-PCR has become the gold standard for COVID-19 diagnosis. However, RT-PCR tests suffer from instability and a high false alarm rate [3]. On the other hand, CT imaging has higher sensitivity than the RT-PCR test; thus, it is recommended for suspected cases [4]. However, CT imaging has limited sensitivity in early COVID-19 cases [4]. Thus, CXR imaging is widely used for the diagnosis of COVID-19, mainly because of its faster acquisition, lower radiation exposure, and easier accessibility compared to the aforementioned tools [5].

Many studies have utilized Deep Learning (DL) algorithms for COVID-19 detection [6-8]. However, the reliability of these models is in question due to their hidden decision-making process. In fact, the activation maps of the deep models reveal the unreliability of their decision-making process: irrelevant areas on the CXRs outside of the lung area, such as bones, background, or text, affect the decision of the network. Therefore, several studies [9-11] attempted to prevent deep models from learning from these irrelevant areas with a two-stage approach for COVID-19 detection. In the first stage, the lungs are segmented; in the second stage, only the segmented lung area of the CXR is given to the deep model as the input. Although these studies have achieved good performance for COVID-19 detection, data scarcity is their main drawback, since it can lead to overfitting and hinders an accurate evaluation.
Moreover, the datasets used in these studies include no or only a few other thoracic diseases, i.e., viral and bacterial pneumonia against COVID-19 pneumonia, which makes the resulting models unreliable in real-case scenarios for COVID-19 diagnosis.

In this study, to address the aforementioned issues, we propose ReCovNet: a reliable COVID-19 detection network, which is an end-to-end network solution. Instead of detecting COVID-19 directly from the CXR image or from the segmented lung area on the CXR, we embed the lung-area information into the ReCovNet model by transfer learning from a segmentation network. For this purpose, we first train the segmentation network and then detach its encoder block to construct the ReCovNet model for COVID-19 detection. Additionally, in this work, we extend the QaTa-COV19 dataset that was introduced in our previous study [12]. The extended version of QaTa-COV19 is the largest COVID-19 dataset, with 124,616 images including 4603 COVID-19 samples. The control group CXRs consist of 14 different thoracic diseases and healthy subjects. Moreover, QaTa-COV19 contains a subset of 1065 early COVID-19 cases showing no or limited signs of COVID-19 pneumonia, which makes the diagnosis more challenging. Accordingly, the proposed ReCovNet trained over the QaTa-COV19 dataset achieves outstanding performance with a reliable diagnosis compared to state-of-the-art deep models. Lastly, the benchmark QaTa-COV19 dataset is publicly shared with the research community.

The rest of the paper is organized as follows. In Section 2, we introduce the QaTa-COV19 dataset and give the details of our proposed ReCovNet model along with the state-of-the-art deep models. In Section 3, we report the experimental results, and we conclude the paper in Section 4.

In this section, we first introduce the benchmark QaTa-COV19 dataset. Then, the state-of-the-art deep models for COVID-19 diagnosis are introduced. Lastly, we propose the ReCovNet model for reliable COVID-19 detection.
The benchmark QaTa-COV19 dataset, compiled by researchers of Qatar University and Tampere University, is so far the largest COVID-19 dataset, including 4603 COVID-19 and 120,013 control group CXRs. The detection task on this dataset is especially challenging since QaTa-COV19 contains 1065 samples from early COVID-19 cases that show no or limited signs of COVID-19 pneumonia. COVID-19 samples have been collected from publicly available datasets and repositories [10, 13-18], and were preprocessed by excluding low-quality images and any duplicates. The control group images were collected from several datasets: ChestX-ray14 [19], X-rays from pediatric patients [20], and Chest X-rays (Indiana University) [21]. We have only used the bacterial and viral pneumonia CXRs from pediatric patients to increase the number of pneumonia samples for a challenging diagnosis. Additionally, we included only the lateral-view CXRs from the Chest X-rays (Indiana University) dataset since all other samples in the control group are frontal-view CXRs, whereas COVID-19 samples include CXRs from both lateral and frontal views. Table 1 shows the number of samples in the QaTa-COV19 dataset.

COVID-19 detection is performed against the control group images, which consist of 14 different thoracic diseases and healthy subjects. Therefore, the task is a binary classification problem. Since the train and test sets of the ChestX-ray14 dataset are predefined, we have randomly split the Chest X-rays (Indiana University), bacterial and viral pneumonia, and COVID-19 CXRs with the same train/test ratio as in [19]. The CXRs in the dataset are resized to 224×224 pixels. We have augmented the images, except for the ChestX-ray14 samples, using the ImageDataGenerator in Keras. The images are randomly shifted by up to 10% both horizontally and vertically, and randomly rotated within a 10-degree range. Lastly, the 'nearest' mode is selected to fill the blank sections.
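As an illustration of the augmentation described above, the following NumPy sketch applies a random shift of up to 10% and a random rotation of up to 10 degrees to a 224×224 image, filling blank areas with the nearest edge pixel. It is not the Keras implementation used in the study; in Keras, the corresponding `ImageDataGenerator` arguments would presumably be `rotation_range=10`, `width_shift_range=0.1`, `height_shift_range=0.1`, and `fill_mode='nearest'`. The function name `random_shift_rotate` and the nearest-neighbour resampling are assumptions made here for brevity.

```python
import numpy as np


def random_shift_rotate(image, max_shift=0.10, max_angle_deg=10.0, rng=None):
    """Randomly shift and rotate a 2-D image (hypothetical helper).

    Blank areas are filled by repeating the nearest edge pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape

    # Draw the random transform parameters.
    dy = rng.uniform(-max_shift, max_shift) * h
    dx = rng.uniform(-max_shift, max_shift) * w
    theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))

    # Output pixel grid, expressed relative to the image centre.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y0, x0 = ys - cy - dy, xs - cx - dx

    # Inverse mapping: for every output pixel, locate its source pixel.
    src_y = np.cos(theta) * y0 - np.sin(theta) * x0 + cy
    src_x = np.sin(theta) * y0 + np.cos(theta) * x0 + cx

    # Nearest-neighbour sampling; clipping to the border gives the 'nearest' fill.
    src_y = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    return image[src_y, src_x]


if __name__ == "__main__":
    cxr = np.random.default_rng(0).random((224, 224))  # stand-in for a resized CXR
    augmented = random_shift_rotate(cxr, rng=np.random.default_rng(1))
    print(augmented.shape)
```

Clipping the source coordinates to the image border is what reproduces the 'nearest' behaviour: any output pixel whose source falls outside the image takes the value of the closest border pixel.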
DL algorithms have achieved state-of-the-art results on many computer vision tasks, including COVID-19 detection. Especially during the pandemic, recent studies concluded that DL algorithms with Convolutional Neural Networks can achieve outstanding performance for COVID-19 diagnosis. Nevertheless, the major issue in DL is that supervised deep models require a large amount of data to generalize well over unseen data. Thus, when subjected to data scarcity, such models fail in the testing phase due to overfitting. In this study, our first objective is to investigate the performance of state-of-the-art deep models with transfer learning on the largest COVID-19 dataset, QaTa-COV19. The state-of-the-art networks are selected as follows:
• DenseNet-121 [22] is a 121-layer deep network that achieves maximum information flow by connecting the layers through additional inputs from preceding layers.
• ResNet-50 [23] is a deep network with 50 layers that introduces residual blocks to prevent vanishing gradients in deep model structures; its shortcut connections merge the input and output of the stacked layers.
• Inception-v3 [24] is a deep network with low computational complexity compared to other state-of-the-art deep models. The reduced complexity is ensured by pruning and factorizing operations inside the network.
• Inception-ResNet-v2 [25] unites the structure of the inception model [24] with residual blocks [23] to achieve state-of-the-art results in computer vision tasks at a lower computational cost.
In order to utilize the deep models in the COVID-19 detection task, we modify their output layers by inserting a global average pooling layer, a fully connected layer with 2 neurons, and a softmax activation function. Transfer learning is performed by initializing the models with ImageNet weights.

DL algorithms are often considered black boxes since their decision-making process is latent.
To shed light on this decision-making process, the authors in [26] proposed the Grad-CAM method, which computes activation maps indicating the areas of the input image that the deep model considers during the classification task. In the COVID-19 detection task, our observations on the Grad-CAM activation maps show that the state-of-the-art deep models tend to learn and perform the classification from irrelevant areas on the CXRs, such as bones, background, or text. Therefore, the decisions of these models may be considered unreliable for COVID-19 detection.

In order to overcome the unreliability issue, this study proposes ReCovNet: an end-to-end network for reliable COVID-19 detection. ReCovNet is a deep network that considers the lung areas on the input CXR images to detect COVID-19 pneumonia. The structure of the proposed ReCovNet is given in Fig. 1. Accordingly, to construct ReCovNet, a segmentation network is trained in phase-I. The lung segmentation network is a convolutional autoencoder that maps the input image, $X$, to its corresponding output mask, $M$: $M \leftarrow P_{\theta,\phi}(X)$. Any deep model can be used as the encoder block of the network, $\varepsilon_\theta$. On the other hand, the decoder block of the segmentation network is similar to that of the U-Net [27] model except for the u-shaped architecture, in which the low-level features at the encoder block are concatenated with the high-level features at the decoder level. The u-shaped architecture is excluded by removing the skip connections, which perform the concatenation operation. The reason for constructing an encoder-decoder network without skip connections is that the contributions from the initial layers are avoided; therefore, the network makes its decisions from the high-level features, which are closer to the segmentation mapping of the input image. Based on our observations, this approach improves the performance of ReCovNet in terms of the reliability observed in the activation maps.
The decoder block of the segmentation network is parameterized by $\phi \in \{b_j, w_j\}_{j=1}^{L}$ with $L$ layers organized in five stages. Each stage consists of a ×2 upsampling layer followed by two consecutive blocks of a convolutional layer, batch normalization, and a Rectified Linear Unit (ReLU) activation function. The output of the last stage is connected to a convolutional layer with a sigmoid activation function to reconstruct the segmentation mask at the output. In order, the numbers of convolutional filters are {256, 128, 64, 32, 16, 1}, with a kernel size of k = (3 × 3). Lastly, training is performed over $N$ samples $\{x_{s,\mathrm{train}}^{j}, M^{j}\}_{j=1}^{N}$, where $x_s$ and $M$ are the training data and the ground-truth segmentation masks, respectively. The loss function used in training is a hybrid function, which is the sum of the binary focal and dice loss functions.

During phase-II of the training, we construct the convolutional layers of ReCovNet from $\varepsilon_\theta$, which generates the latent features $f \leftarrow \varepsilon_\theta(\cdot)$. Then, $f$ is vectorized and downsampled by attaching a global average pooling layer and a fully connected layer with 2 neurons and a softmax activation function. We perform the classification task with the categorical cross-entropy loss function by training ReCovNet over $N$ samples $\{x_{\mathrm{train}}^{j}, y_{\mathrm{train}}^{j}\}_{j=1}^{N}$, where $x$ and $y$ are the training data and ground-truth labels, respectively. During this training phase, $\varepsilon_\theta$ is not frozen; therefore, the latent features $f$ are further adjusted to the benchmark QaTa-COV19 dataset. Overall, during inference, ReCovNet does not require prior lung segmentation to provide reliable COVID-19 detection. Finally, we propose two versions of the model: ReCovNet-v1 is formed with a DenseNet-121 encoder due to its good performance in the COVID-19 detection task, and ReCovNet-v2 is formed with a ResNet-50 encoder.

In this section, the experimental setup is presented.
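To make the decoder layout and the hybrid segmentation loss described above more concrete, here is a minimal NumPy sketch. It traces the feature-map sizes through the five ×2 upsampling stages, assuming a 7×7 encoder output (which is what DenseNet-121 and ResNet-50 produce for a 224×224 input) and size-preserving convolutions, and it evaluates a focal-plus-dice loss on a toy mask. The focal parameters `gamma=2.0` and `alpha=0.25`, the mean reduction, and all function names are assumptions; the paper does not state them.

```python
import numpy as np


def decoder_shapes(encoder_hw=7, filters=(256, 128, 64, 32, 16)):
    """Trace (height, width, channels) through the five decoder stages."""
    shapes, hw = [], encoder_hw
    for f in filters:
        hw *= 2                     # x2 upsampling layer
        shapes.append((hw, hw, f))  # two conv + BN + ReLU blocks, size kept (assumed)
    shapes.append((hw, hw, 1))      # final conv with sigmoid: one-channel mask
    return shapes


def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - Dice coefficient between a binary mask and predicted probabilities."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)


def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (gamma and alpha are assumed values)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))


def hybrid_loss(y_true, y_pred):
    """Sum of the binary focal and dice losses."""
    return binary_focal_loss(y_true, y_pred) + dice_loss(y_true, y_pred)


if __name__ == "__main__":
    print(decoder_shapes())             # spatial size doubles at every stage
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0
    print(hybrid_loss(mask, mask))      # near zero for a perfect prediction
    print(hybrid_loss(mask, 1 - mask))  # large for a fully wrong prediction
```

Under these assumptions, five doublings take the 7×7 encoder output back to 224×224, so the predicted mask has the same size as the input image.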
The experimental results on the benchmark QaTa-COV19 dataset are then given.

The performance metrics are calculated on the test (unseen) set of the QaTa-COV19 dataset. We consider COVID-19 CXRs as the positive class and control group samples as the negative class. Accordingly, we form the confusion matrix (CM) elements as follows: true positive is the number of correctly classified COVID-19 samples, false positive is the number of control group samples misclassified as positive class members, true negative is the number of correctly detected control group samples, and false negative is the number of COVID-19 samples misclassified as negative class members. The performance metrics are defined as follows: sensitivity is the rate of correctly detected COVID-19 samples among all positive samples, specificity is the rate of correctly classified control group samples among all negative samples, precision is the rate of correctly classified positive samples among all the samples detected as positive class members, and accuracy is the rate of correctly detected samples among all the data. Moreover, we define the F-score as follows:

$$F_\beta = (1 + \beta^2) \, \frac{\text{Precision} \times \text{Sensitivity}}{\beta^2 \times \text{Precision} + \text{Sensitivity}},$$

where the F1-score, with β = 1, is the harmonic mean of precision and sensitivity. On the other hand, to emphasize the effect of false negatives over false positives, the F2-score is defined with β = 2. The major performance metric in COVID-19 detection is sensitivity since any misdetection of the disease threatens global health. Hence, minimizing the false alarm rate (1 − specificity) is our target.

The networks are implemented using the TensorFlow library on an NVIDIA® GeForce RTX 2080 Ti GPU card. The optimizer is Adam with its default momentum parameters. The ReCovNet models are trained for 15 epochs with a learning rate of α = 10^−5 and a batch size of 64. The segmentation networks are trained for 15 epochs with a learning rate of α = 10^−4 and a batch size of 32.
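The metric definitions given earlier in this section can be sketched in a few lines of Python. The function name `detection_metrics` and the confusion-matrix counts in the example are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn, beta=1.0):
    """Compute the detection metrics from confusion-matrix counts.

    tp, fn: COVID-19 samples classified correctly / incorrectly.
    tn, fp: control group samples classified correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)                # detected positives among all positives
    specificity = tn / (tn + fp)                # correct negatives among all negatives
    precision = tp / (tp + fp)                  # true positives among predicted positives
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct samples among all data
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 positive and 900 negative test samples.
    counts = dict(tp=90, fp=5, tn=895, fn=10)
    print(detection_metrics(**counts, beta=1.0))  # f_beta is the F1-score
    print(detection_metrics(**counts, beta=2.0))  # f_beta is the F2-score
```

With β = 2, a false negative lowers the score more than a false positive does, which matches the emphasis on sensitivity in this task.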
We have utilized the Montgomery County X-ray Set [28] and the Japanese Society of Radiological Technology (JSRT) [29] datasets to train the segmentation models. All the images are frontal-view CXRs and have corresponding ground-truth masks, except for those in JSRT. Thus, the segmentation masks provided by [30] are used as ground-truths for JSRT [29]. Overall, the lung segmentation dataset contains 385 CXRs. For the performance evaluation, we split this data into 80% training and 20% test sets. Then, the training set is augmented up to 1000 samples.

In this section, the performance of the segmentation networks is investigated first. On the test set of the segmentation dataset, the segmentation model with the DenseNet-121 encoder has achieved 96.12% sensitivity and 98.59% specificity, and the model with the ResNet-50 encoder 97.12% sensitivity and 98.22% specificity for the lung segmentation task.

The COVID-19 detection performance of the state-of-the-art and ReCovNet models is presented in Table 2. We have observed that every model detects COVID-19 successfully, with > 94% sensitivity. The best of the state-of-the-art deep models is DenseNet-121, with 97.43% sensitivity and 99.97% specificity. The performance of ReCovNet-v1 is very close to that of DenseNet-121. However, the best sensitivity in COVID-19 detection, 98.57%, is achieved by ReCovNet-v2, which is an outstanding performance for diagnosis on the largest COVID-19 dataset. Moreover, ReCovNet-v2 also maintains a high specificity level of 99.77%. Table 3 shows the confusion matrices of the best performing models, which are DenseNet-121 from the state-of-the-art deep models and ReCovNet-v2 from the proposed networks. The best detection (sensitivity) rate is achieved by ReCovNet-v2, which misses only 15 COVID-19 samples among 1050 images.
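As a quick consistency check, the reported ReCovNet-v2 sensitivity follows directly from the 15 missed COVID-19 samples among 1050 test images, and the false alarm rate follows from the reported specificity:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{1050 - 15}{1050} = \frac{1035}{1050} \approx 98.57\%,
\qquad
\text{False alarm rate} = 1 - \text{Specificity} = 1 - 0.9977 = 0.23\%
```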
The results on the largest COVID-19 dataset, which includes many CXR images from different thoracic diseases, show that deep models can achieve strong COVID-19 detection performance. However, the activation maps extracted by the Grad-CAM [26] approach reveal the contribution of irrelevant regions, which is a major issue of these models in COVID-19 diagnosis. To exemplify this issue, we have compared the proposed ReCovNet-v1 and ReCovNet-v2 models with the deep models, as shown in Fig. 2. The activation maps show that the DenseNet-121 and ResNet-50 models clearly draw information from irrelevant regions on the CXRs, while the proposed models focus on the relevant regions.

The diagnosis of COVID-19 is a crucial task to prevent the further spread of the disease. This study investigates the limitations of the state-of-the-art deep models that are trained for COVID-19 detection directly from CXRs. To address these problems, we propose an end-to-end reliable COVID-19 detection network with pre-trained convolutional layers. We have compiled and publicly shared the largest COVID-19 dataset, QaTa-COV19, which includes 4603 COVID-19 samples and 120,013 CXRs from 14 different thoracic diseases and normal samples. The experimental results over this benchmark dataset have shown that the proposed approach achieves the highest sensitivity level among the competing methods. We have also demonstrated that the proposed models properly focus their analysis on the relevant regions of the CXR, in contrast to the irrelevant activations observed in the competing models. In our future work, more CXR images will be used to train the lung segmentation models to further increase the reliability of our approach in COVID-19 detection.
World Health Organization
A review of coronavirus disease-2019 (covid-19)
Real-time rt-pcr in covid-19 detection: issues affecting the results
Chest ct findings in coronavirus disease-19 (covid-19): relationship to duration of infection
Computed tomography-an increasing source of radiation exposure
Pdcovidnet: a parallel-dilated convolutional neural network architecture for detecting covid-19 from chest x-ray images
Classification of covid-19 chest x-rays with deep learning: new models or fine tuning?
Can ai help in screening viral and covid-19 pneumonia?
Covid mtnet: Covid-19 detection with multi-task deep learning approaches
Covid-cxnet: Detecting covid-19 in frontal chest x-ray images using deep learning
Covid-19 classification of x-ray images using deep neural networks
Covid-19 infection map generation and detection from chest x-ray images
Bimcv covid-19+: a large annotated dataset of rx and ct images from covid-19 patients
COVID-19 Image Repository
COVID-19 DATABASE
Covid-19 image data collection
Chest Imaging
Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
Identifying medical diagnoses and treatable diseases by image-based deep learning
Densely connected convolutional networks
Deep residual learning for image recognition
Rethinking the inception architecture for computer vision
Inception-v4, inception-resnet and the impact of residual connections on learning
Grad-cam: Visual explanations from deep networks via gradient-based localization
U-net: Convolutional networks for biomedical image segmentation
Two public chest x-ray datasets for computer-aided screening of pulmonary diseases
Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules
Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database