title: A Deep Convolutional Neural Network for COVID-19 Detection Using Chest X-Rays
authors: Bassi, Pedro R. A. S.; Attux, Romis
date: 2020-04-30

We present an image classifier based on CheXNet and a transfer learning stage to classify chest X-Ray images according to three labels: COVID-19, viral pneumonia and normal. CheXNet is a DenseNet121 that has been trained twice: first on ImageNet and then, to classify pneumonia and 13 other chest diseases, on a large chest X-Ray database (ChestX-ray14). The proposed network reached a test accuracy of 97.8% and, for the COVID-19 class, a recall and precision of 98.3%. To clarify the modus operandi of the network, we used Layer Wise Relevance Propagation (LRP) to generate heat maps, indicating an analytical path for future research on diagnosis.

In 2020, COVID-19 became a pandemic, affecting both developed and developing countries around the world. By 04/23/2020, the virus had already infected more than 2,600,000 people and caused more than 180,000 deaths (Hopkins (2020)). The most commonly used method for COVID-19 diagnosis is reverse transcriptase-polymerase chain reaction (RT-PCR). It has high specificity, but is also expensive, slow and currently in high demand. Chest X-Rays are commonly available, faster and cheaper, but signs of COVID-19 in the lungs can be hard to detect.

Researchers have already suggested the use of deep neural networks (DNNs) to help detect the disease in chest X-Ray images (Chowdhury et al. (2020), Wang and Wong (2020)). In Wang and Wong (2020), the authors achieved good results, with 92.6% test accuracy, 96.4% recall and 87% precision on the COVID-19 images. In Chowdhury et al.
(2020), a larger COVID-19 dataset was reported, and the authors obtained a maximum test accuracy of 98.3%, with 96.7% recall and 100% precision for SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). Neural networks have been successful at identifying pneumonia from X-Rays, achieving performances better than those of radiologists (Rajpurkar et al. (2017)).

In this work, we used CheXNet (Rajpurkar et al. (2017)), a 121-layer Dense Convolutional Network or DenseNet (Huang et al. (2016)) that had been trained on ImageNet (Deng et al. (2009)) and then on a large chest X-Ray dataset, ChestX-ray14 (Wang et al. (2017)). Using the open COVID-19 X-Ray dataset assembled in Chowdhury et al. (2020), we performed a further transfer learning stage on the DNN, training it to differentiate between images of normal lungs, viral pneumonia and COVID-19.

After training, we applied Layer Wise Relevance Propagation (LRP) (Bach et al. (2015)) to the network, generating heat maps of the X-Rays along with the probabilities of COVID-19, viral pneumonia and healthy lungs. These heat maps show the regions of the image that most influenced the network's classification, as well as regions that were more representative of other classes. LRP gives us a better understanding of how the DNN operates, and can also help a radiologist identify the effects of COVID-19 in the X-Ray. An application of this method in the context of neuroimaging can be seen in Thomas et al. (2019).

In this study, we used the open dataset reported in Chowdhury et al. (2020). The database is composed of 219 COVID-19 chest X-Ray images, as well as 1341 normal lung images and 1345 viral pneumonia images. It is available on Kaggle, and is one of the largest collections of COVID-19 images. As described in Chowdhury et al. (2020), this dataset was created with posterior-to-anterior chest X-Ray images.
The COVID-19 data was taken from different databases: 63 images from the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 DATABASE (SIRM (2020)), 60 from the Novel Corona Virus 2019 Dataset (Cohen et al. (2020)) and 60 images collected (by the authors of Chowdhury et al. (2020)) from 43 recently published articles. The normal and viral pneumonia images were taken from the Kaggle database Chest X-Ray images (pneumonia) (Kermany et al. (2018)). More information about the dataset can be found in Chowdhury et al. (2020).

First, we divided the dataset from Chowdhury et al. (2020) into three parts: training, validation and testing. To create the test dataset, we randomly took 60 images of each class (normal, viral pneumonia and COVID-19). The test dataset in Chowdhury et al. (2020) has the same configuration. After removing the 180 test images, we took 80% of the remaining images for training and 20% for validation. This was also done randomly, but preserving the same class proportions in the two datasets. Our training dataset ended up with 127 images positive for COVID-19.

Many of the images had letters or words on them, and some of these words were exclusive to certain classes. For example, some COVID-19 images (from SIRM (2020)) had the word "SEDUTO" (Italian for "seated") written on the upper left corner. We were concerned that this could affect the network's classification performance, so we manually edited our test dataset images, removing the words or letters. They were simply covered with black rectangles and, as they were not over the lungs, no relevant information was lost. The idea was to test only the network's ability to analyze the lungs and, by editing only the test dataset, there would be no risk of teaching the DNN to identify our black rectangles.

We decided to apply data augmentation for two reasons: it improves DNN performance on small datasets (like our COVID-19 database, as the authors found in Chowdhury et al.
(2020)), and it would balance our datasets. As we would also benefit from a balanced validation dataset, we applied augmentation to both training and validation. All images were resized from 1024x1024 pixels to 224x224 to match the CheXNet input format. We used common image augmentation methods: rotations (between -40 and 40 degrees), translations (up to 40 pixels left and right and up to 28 pixels up and down) and horizontal flipping. These transformations augment our data and also make the DNN more robust to input translations and rotations. We augmented our normal and viral pneumonia image database 8 times (2 random rotations, followed by 2 random translations and flipping), and our COVID-19 images 72 times (6 random rotations, then 6 random translations and flipping). We ended up with a training dataset of 9144 COVID-19, 8128 viral pneumonia and 8128 normal lung images. All augmentation was done online.

DNNs tend to overfit when trained on small datasets, and transfer learning is a technique that can mitigate this problem. It consists of taking a network that was already trained on a large dataset and training it again on the smaller database. In doing so, we hope that the representations learned by the DNN on the first set help the model generalize on the second (Goodfellow et al. (2016)).

CheXNet (Rajpurkar et al. (2017)) is a 121-layer Dense Convolutional Network (Huang et al. (2016)) trained to estimate the probability of 14 thoracic diseases (including pneumonia) in frontal chest X-Rays from the ChestX-ray14 dataset (Wang et al. (2017)), a large database with 112,120 images. Being a deep convolutional network trained on a large dataset, on a task very similar to ours, CheXNet seemed like a very good option for transfer learning.
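The augmentation factors quoted earlier in this section (8 times for normal and viral pneumonia, 72 times for COVID-19) can be reproduced with a short sketch. It assumes that each random rotation is combined with each random translation and that every result is kept both unflipped and horizontally flipped; the text does not spell out this composition, and the function name `augmentation_plan` is hypothetical. The sketch only samples transformation parameters within the stated ranges and does not touch any image.

```python
import random

MAX_ANGLE = 40    # degrees, rotation range given in the text
MAX_SHIFT_X = 40  # pixels, left/right translation range given in the text
MAX_SHIFT_Y = 28  # pixels, up/down translation range given in the text


def augmentation_plan(n_rotations, n_translations, rng):
    """Return one parameter dict per augmented copy of a single image.

    Assumed composition (not stated explicitly in the text): every random
    rotation is followed by n_translations random shifts, and each result
    is kept both unflipped and horizontally flipped.
    """
    plan = []
    for _ in range(n_rotations):
        angle = rng.uniform(-MAX_ANGLE, MAX_ANGLE)
        for _ in range(n_translations):
            dx = rng.randint(-MAX_SHIFT_X, MAX_SHIFT_X)
            dy = rng.randint(-MAX_SHIFT_Y, MAX_SHIFT_Y)
            for flip in (False, True):
                plan.append({"angle": angle, "dx": dx, "dy": dy, "flip": flip})
    return plan


rng = random.Random(0)
per_normal = len(augmentation_plan(2, 2, rng))  # 2 * 2 * 2 copies
per_covid = len(augmentation_plan(6, 6, rng))   # 6 * 6 * 2 copies
covid_train_augmented = 127 * per_covid         # 127 COVID-19 training images

print(per_normal, per_covid, covid_train_augmented)
```

Under this assumption the factors come out as 8 and 72, and 127 COVID-19 training images yield the 9144 augmented images reported above.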
CheXNet was very successful at classifying pneumonia, achieving an F1 score (harmonic mean of precision and recall) of 0.435, while the average score for 4 radiologists was 0.330 (Rajpurkar et al. (2017)). CheXNet was itself created with transfer learning: the authors in Rajpurkar et al. (2017) began with a network pretrained on ImageNet (Deng et al. (2009)) and trained it on ChestX-ray14 (Wang et al. (2017)). Thus, as we trained this network again on the COVID-19 dataset (Chowdhury et al. (2020)), we can say our network was trained with transfer learning twice.

To create our network, we downloaded a pretrained PyTorch version of CheXNet (Zech (2018)). We then removed its last layer (the only fully connected one, with 14 outputs) and added a fully connected layer with only 3 outputs (one for each of our classes), preceded by a dropout of 50% (to improve generalization on our smaller dataset). We kept the sigmoid activation function. The training process was carried out in PyTorch, with binary cross entropy loss, stochastic gradient descent with momentum of 0.9 and minibatches of 9 images. We trained on two NVidia GTX 1080 GPUs.

We began the training stage by freezing all the network parameters except for our added output layer. We trained it in this configuration for 10 epochs, with a learning rate of 0.001, weight decay of 0.01 and early stopping with patience equal to 5. After this stage, we had a test accuracy of 89.5%. We then unfroze all network parameters, but used discriminative learning rates (Howard and Ruder (2018)): the last layer had a rate of 10^-5 and each dense block had a rate 3 to 10 times smaller than that of the block following it. The DNN was trained in this configuration for 96 epochs, with early stopping patience of 20 epochs and weight decay of 0.01. Our test accuracy increased to 96.1%.
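As an illustration of the discriminative learning rates described above, the following sketch derives one rate per layer group, starting from the output layer and dividing by a factor at each step towards the input. The group names, the constant divisor of 3 and the choice to make the last dense block slower than the output layer are all assumptions for illustration; the text only states the output-layer rate and the 3 to 10 range.

```python
def discriminative_rates(last_layer_lr, divisors):
    """Return learning rates ordered from the earliest group to the output layer.

    divisors[k] is how many times smaller the rate of group k is than the
    rate of group k + 1; the text gives a range of 3 to 10 for this ratio.
    """
    rates = [last_layer_lr]
    # Walk backwards from the output layer towards the input.
    for d in reversed(divisors):
        if not 3 <= d <= 10:
            raise ValueError("divisor outside the 3 to 10 range given in the text")
        rates.append(rates[-1] / d)
    return list(reversed(rates))


# Hypothetical grouping: the four dense blocks of a DenseNet121 plus the
# new 3-output layer, each block 3 times slower than the group after it.
group_names = ["denseblock1", "denseblock2", "denseblock3", "denseblock4", "classifier"]
rates = discriminative_rates(1e-5, divisors=[3, 3, 3, 3])

for name, lr in zip(group_names, rates):
    print(f"{name}: {lr:.2e}")
```

In PyTorch, rates like these would typically be supplied as per-parameter-group `lr` options when constructing the optimizer, so that earlier, more generic layers change more slowly than the task-specific output layer.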
We then removed the weight decay and trained for 48 more epochs, again with early stopping patience of 20 and with the learning rates of all layers reduced by a factor of 10, which gave us a test accuracy of 96.7%. Finally, because our training error seemed to have stagnated, we set all our learning rates to 10^-5, kept the weight decay at zero and trained the network for 48 more epochs (with early stopping patience of 20). That gave us our final network, with a test accuracy of 97.8%.

Layer Wise Relevance Propagation (LRP) aims to make DNNs (complex and nonlinear structures with millions of parameters and connections) interpretable by humans. It decomposes the network prediction, showing in a heat map how each input variable contributed to the output (Bach et al. (2015)). Analyzing our network with LRP allowed us to identify problems in the DNN's classification method, and also to generate a heat map of the X-Ray image showing where in the lungs the network identified problems. This map could be given to radiologists along with the network predictions, helping them verify the classifier's analysis, providing insights about the X-Rays and allowing a more productive cooperation between human experts and artificial intelligence.

To apply the technique, we used the Python library iNNvestigate (Alber et al. (2019)), which already implements LRP for DNNs like DenseNet and has parameter presets that work well for these networks. This library works with Keras and TensorFlow, but we trained our DNN in PyTorch, so we used the library pytorch2keras (Malivenko (2018)) to convert our model. After the conversion, we tested the model again and obtained the same accuracies we had in PyTorch, showing that the conversion worked well.

We reached a test accuracy of 97.8%, and we started the analysis of our DNN by creating a confusion matrix, shown in Table 1. Of the 60 COVID-19 images, 59 were classified correctly and one was classified as normal lungs. In Table 2, we show network metrics for our DNN.
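The COVID-19 class metrics can be recomputed from confusion-matrix counts. The sketch below uses the counts stated in the text (59 of 60 COVID-19 test images classified correctly, one classified as normal); the single false positive is an assumption inferred from the 98.3% precision reported for this class, not a value read from Table 1.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 score for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# From the text: 59 COVID-19 images classified correctly, 1 classified as normal.
tp, fn = 59, 1
# Assumption: one non-COVID image was predicted as COVID-19, inferred from the
# reported COVID-19 precision of 98.3%.
fp = 1

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these counts, precision, recall and F1 all equal 59/60, which rounds to the 0.983 quoted for the COVID-19 class.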
We can compare these results to those obtained by the networks the authors trained in Chowdhury et al. (2020) with the same database (albeit with a different treatment). Our network outperformed all DNNs (including AlexNet, ResNet18 and DenseNet201) in accuracy and F1 score, except for the SqueezeNet with data augmentation, which had an F1 score and accuracy of 0.983. Comparing the proposed CheXNet more closely to the SqueezeNet within the COVID-19 class, we see that the SqueezeNet reached a better recall (1) but a worse precision (0.967), as the latter network classified 58 COVID-19 images correctly and 2 as normal. In this class, the two networks have the same F1 score.

The authors in Chowdhury et al. (2020) also trained a DenseNet201 (with 201 layers), and it is interesting to compare it with our CheXNet, which is also a DenseNet but smaller, with 121 layers. The DenseNet201 obtained 0.967 test accuracy and 0.971 F1 score, both worse than the 0.978 accuracy and F1 score that the proposed network reached.

We tested different LRP presets in iNNvestigate and got the clearest heat maps with "LRP-PresetAFlat". Figure 1 shows a COVID-19 X-Ray test image and its heat map. The redder a region on the map, the more important it was for the DNN's classification as COVID-19. The bluer a region, the more it is related to other classes (like a healthy part of the lung). We can see a region in the center of the image that contributed significantly to the diagnosis.

Figure 1: COVID-19 positive X-Ray from the test dataset and the DNN's heat map.

We also decided to analyze the effect of words on the X-Ray images. We used the same COVID-19 test image shown in Figure 1, but without removing the word "SEDUTO" from its upper right corner. The resulting heat map is shown in Figure 2. The dark red color on the map makes it clear that the network learned to associate this word with the COVID-19 class.
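To make the heat maps discussed above more concrete, the sketch below shows the basic relevance redistribution step behind LRP for a single bias-free dense layer, using the epsilon rule. It is not the "LRP-PresetAFlat" configuration used in this work, since presets in iNNvestigate combine several rules across layer types; the function name, weights and activations are hypothetical and chosen only for illustration.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of one bias-free dense layer.

    activations:   input activations a_i
    weights:       weights[i][j] connects input i to output j
    relevance_out: relevance R_j already assigned to each output neuron
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # eps stabilizes the division and keeps the sign of z_j.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            # Input i receives a share of R_j proportional to its contribution.
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


a = [1.0, 2.0, 0.5]
w = [[0.2, -0.1], [0.4, 0.3], [-0.6, 0.8]]
r_out = [1.0, 0.0]  # explain only the first output, e.g. one class score

r_in = lrp_epsilon_dense(a, w, r_out)
print(r_in, sum(r_in))
```

The input relevances sum (up to the small epsilon term) to the relevance of the explained output. Inputs with positive relevance supported the explained class and would appear red in a map like the ones above, while inputs with negative relevance would appear blue.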
To measure the effect of such words on classification, we evaluated the DNNs we had trained on the unedited version of our test dataset (with the words and letters it originally had). This changed the networks' test accuracies, but not significantly. A DNN with about 94% test accuracy had its accuracy increased by almost 1%, and our best network had its accuracy decreased from 97.8% to 97.3%.

Another test was to try to "fool" our network by adding the word "SEDUTO" to a normal lung X-Ray test image that had been classified correctly. Interestingly, the probability the network assigned to Normal only changed from 98.51% to 98.45%, and the probability of COVID-19 increased from 0.4471% to 0.4595%. Figure 3 shows that the word negatively influenced the DNN's decision for the normal class.

With 97.8% test accuracy and, in the COVID-19 class, 98.3% recall and precision, our classifier is on par with the best DNNs we could find for classifying COVID-19 from chest X-Rays (Chowdhury et al. (2020), Wang and Wong (2020)). Our network (CheXNet, based on a DenseNet with 121 layers) outperformed a DenseNet201 trained on the same dataset (Chowdhury et al. (2020)). This indicates that applying transfer learning to a DNN already trained on a large chest X-Ray dataset (Wang et al. (2017)) was beneficial in terms of performance and probably training time.

LRP showed promising results, highlighting the details of the X-Rays that most influenced the network's classification. We hope that this indicates a way to help radiologists and to provide better interaction between experts and machines. We also discovered that words and letters slightly influence the DNN's classifications. In the future, we think that it would be useful to create an automatic method to edit the images, which could also be applied to the training dataset without compromising the neural network's classification.

A performance comparable to that of an expert had already been achieved by deep networks in pneumonia classification using radiography (Rajpurkar et al. (2017)).
This study and other initiatives (Wang and Wong (2020), Chowdhury et al. (2020)) show that DNNs have the potential to make chest X-Ray a fast, accurate, cheap and easily available auxiliary method for COVID-19 diagnosis. The trained network proposed here is open source and available for download at Bassi and Attux (2020). We hope DNNs can be further tested in clinical studies and help in the creation of tools to fight the COVID-19 pandemic.

On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation
Covid-19 dnns
Can ai help in screening viral and covid-19 pneumonia?
Covid-19 image data collection
ImageNet: A Large-Scale Hierarchical Image Database
Deep learning
Coronavirus resource center
Universal language model fine-tuning for text classification
Identifying medical diagnoses and treatable diseases by image-based deep learning
Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
Covid-19 database
Analyzing neuroimaging data through recurrent deep learning models
Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images
Detection of sars-cov-2 in different types of clinical specimens
Chestx-ray14: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases