key: cord-0500796-52vsaj1v
authors: Chowdhury, Muhammad E. H.; Rahman, Tawsifur; Khandakar, Amith; Mazhar, Rashid; Kadir, Muhammad Abdul; Mahbub, Zaid Bin; Islam, Khandakar R.; Khan, Muhammad Salman; Iqbal, Atif; Al-Emadi, Nasser; Reaz, Mamun Bin Ibne
title: Can AI help in screening Viral and COVID-19 pneumonia?
date: 2020-03-29
sha: 2a0f2a76df5af569a9d85d97d2a81ae10f02fccf
doc_id: 500796
cord_uid: 52vsaj1v

Coronavirus disease (COVID-19) is a pandemic disease that has already infected more than half a million people and caused more than 30,000 deaths. The aim of this paper is to automatically detect COVID-19 pneumonia patients from digital chest X-ray images, maximizing detection accuracy through image pre-processing and deep-learning techniques. The authors created a public database by combining three public databases with images collected from recently published articles. The database contains a mixture of 190 COVID-19, 1345 viral pneumonia, and 1341 normal chest X-ray images. An augmented training set with 2500 images of each category was created for training and validating four different pre-trained deep Convolutional Neural Networks (CNNs). These networks were tested on two classification schemes: normal vs. COVID-19 pneumonia, and normal vs. viral vs. COVID-19 pneumonia. The classification accuracy, sensitivity, specificity and precision were 98.3%, 96.7%, 100% and 100% for the first scheme and 98.3%, 96.7%, 99% and 100% for the second. The high accuracy of this computer-aided diagnostic tool can significantly improve the speed and accuracy of diagnosing COVID-19 cases. This would be highly useful in this pandemic, where disease burden and the need for preventive measures are at odds with available resources.

images with an accuracy of 83.5%.
Ayrton 40 used a small dataset of 339 images for training and testing a ResNet50-based deep transfer learning model and reported a validation accuracy of 96.2%. However, that study only distinguished COVID-19 images from normal images, and on a small dataset. The authors of this paper have prepared a comparatively large database of normal, viral pneumonia and COVID-19 positive pneumonia X-ray images and made it publicly available so that other researchers can benefit from it. Moreover, four different pre-trained deep learning networks (AlexNet, ResNet18, DenseNet201, and SqueezeNet) were trained, validated and tested for two different classification schemes in this study: one model was trained to classify COVID-19 and normal X-ray images, while the other was trained to classify normal, viral pneumonia and COVID-19 pneumonia images. Deep convolutional neural networks typically perform better with a larger dataset than with a smaller one, and transfer learning can be useful in CNN applications where the dataset is not large. In transfer learning, a model trained on a large dataset such as ImageNet 41 is reused for an application with a comparatively small dataset. This reduces the need for a large dataset and also shortens the long training period required when a deep learning model is developed from scratch 42, 43. Although a large number of COVID-19 patients have been infected worldwide, the number of chest X-ray images publicly available online is small and the images are scattered. Therefore, in this work, the authors have assembled a comparatively large dataset of COVID-19 positive chest X-ray images, while the normal and viral pneumonia images used in this study are readily available publicly. The authors created a Kaggle database to make the data publicly available to researchers worldwide, and the trained models were also made available so that others can benefit from this study 44.
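The transfer-learning idea can be illustrated with a deliberately tiny sketch in plain Python. Here a fixed function with hypothetical weights `BACKBONE_W` stands in for layers pretrained on a large dataset, and only a new logistic-regression head is trained on four toy samples. This is an illustration of the concept only: real transfer learning would reuse the convolutional layers of a network such as SqueezeNet, and this excerpt does not state which layers the authors updated during training.

```python
import math

# Hypothetical frozen backbone: fixed weights standing in for layers that were
# pretrained on a large dataset. Nothing below ever modifies them.
BACKBONE_W = ((1.0, 1.0), (1.0, -1.0))


def extract_features(x):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, a * x[0] + b * x[1]) for a, b in BACKBONE_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.5, epochs=200):
    """Fit only a new logistic-regression head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]  # computed once: backbone is frozen
    w, bias = [0.0] * len(BACKBONE_W), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + bias) - y
            # Stochastic gradient step on the log-loss, head parameters only.
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            bias -= lr * err
    return w, bias


def predict(head, x):
    w, bias = head
    score = sum(wi * fi for wi, fi in zip(w, extract_features(x))) + bias
    return int(score > 0)


# Hypothetical "small dataset" of four labelled samples.
samples = [(2, 1), (3, 0), (-2, -1), (-3, 0)]
labels = [1, 1, 0, 0]
head = train_head(samples, labels)
print([predict(head, x) for x in samples])  # [1, 1, 0, 0]
```

Because the backbone is frozen, its features are computed only once and only the few head parameters are learned, which is the source of the reduced data and training-time requirements in this simplified setting.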
Database Description: In this study, we used posterior-to-anterior (PA) chest X-ray images, as this view is widely used by radiologists in pneumonia diagnosis. Four different sub-databases were combined into one database: two were developed by the authors, and the other two were already publicly available on Kaggle and GitHub. The following summarizes how the dataset was created:

GitHub database has encouraged the authors to look into the literature, and more than 1200 articles had been published in a period of less than two months. The authors observed that the GitHub database contained only a small number of the published X-ray and CT images rather than most of them. Moreover, the images in the SIRM and GitHub databases vary in size, depending on the X-ray machine resolution and the article from which they were taken. The authors therefore undertook the tedious task of collecting and indexing the X-ray and CT images from all recently published articles and online sources that were publicly available. These articles and radiographic images were then compared with the GitHub database to avoid duplication. The authors collected 60 COVID-19 positive chest X-ray images from 43 recently published articles 44 that were not listed in the GitHub database.

Chest X-Ray Images (pneumonia): The Kaggle chest X-ray database is a very popular database containing 5247 chest X-ray images of normal, viral pneumonia and bacterial pneumonia cases, with resolutions varying from 400p to 2000p 47. Of these 5247 images, 3906 are from different subjects affected by pneumonia (2561 bacterial pneumonia and 1345 viral pneumonia images) and 1341 are from normal subjects. The normal and viral pneumonia chest X-ray images from this database were used to create the new database.
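The text does not say how the collected images were compared with the GitHub database. Exact file hashes would miss copies that were resized or re-encoded when placed in an article, so one common option is a perceptual hash. The sketch below implements a simple average hash (aHash) on grayscale images given as nested lists of values from 0 to 255; the function names, the 8×8 hash size and the 5-bit distance threshold are illustrative assumptions, not the authors' procedure.

```python
def average_hash(pixels, size=8):
    """Average hash (aHash) of a grayscale image given as a list of rows.

    The image (at least size x size pixels) is block-averaged down to a
    size x size grid; each cell becomes 1 if brighter than the grid mean.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def is_duplicate(img_a, img_b, max_distance=5):
    """Flag two images as likely duplicates if their hashes nearly match."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= max_distance


# Hypothetical images: a diagonal gradient, a 2x upscaled copy, and its negative.
img = [[(r + c) * 4 for c in range(32)] for r in range(32)]
upscaled = [[(r // 2 + c // 2) * 4 for c in range(64)] for r in range(64)]
negative = [[248 - v for v in row] for row in img]
print(is_duplicate(img, upscaled), is_duplicate(img, negative))  # True False
```

The hash tolerates resizing and uniform brightness changes, but a match should still be confirmed visually before an image is discarded as a duplicate.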
Figure 1 shows sample normal, COVID-19 pneumonia and viral pneumonia chest X-ray images from the database. In this study, MATLAB 2019a was used to train, evaluate and test four well-known pre-trained deep learning CNNs, AlexNet 24, ResNet18 48, DenseNet201 48 and SqueezeNet 49, on the two classification problems mentioned earlier. Figure 2 gives an overview of the methodology of this study. The models were trained on a computer with an Intel® Core i7 processor at 3.6 GHz, 16 GB RAM and a graphics processing unit (GPU) with 2 GB of memory, running 64-bit Windows 10. ResNet and DenseNet have several variants; ResNet18 and DenseNet201 were used because they were readily available in MATLAB 2019a.
(Figure 2: Block diagram of the overall system.)
Images with and without augmentation were used in this study. Table 1 summarizes the images from the different sub-datasets. The total number of COVID-19 images, including the training and test sets, was 190. Therefore, 190 X-ray images each were randomly selected from the normal and viral pneumonia images to match the number of COVID-19 images and balance the database. In addition, image augmentation was applied to expand the 130 COVID-19 training images 20-fold, resulting in 2600 COVID-19 X-ray images. Accordingly, 1300 normal and 1300 viral pneumonia images were randomly selected from the database and augmented two-fold, creating 2600 training images for each category. The four algorithms were trained and evaluated using five-fold cross-validation on the X-ray images with and without image augmentation. Mini-batch gradient descent with a momentum of 0.9, 5 mini-batches and a learning rate of 0.0003 was used. The four algorithms were then tested on 60 non-augmented images from each category to evaluate the performance of each algorithm.
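The training bookkeeping above can be sketched in plain Python. The k-fold split and the classical momentum update (v ← 0.9·v − 0.0003·gradient, then w ← w + v) are standard techniques; splitting the 130 source COVID-19 images before augmenting them, so that augmented copies of one image never fall into both a training and a validation fold, is an assumption about good practice, not a detail this text reports. All function names are hypothetical.

```python
import random


def k_fold_splits(n, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds; each fold is used
    for validation exactly once while the remaining k-1 folds train.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def sgdm_step(w, v, grad, lr=3e-4, momentum=0.9):
    """Classical momentum update: v <- momentum*v - lr*grad, then w <- w + v."""
    v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    return [wi + vi for wi, vi in zip(w, v)], v


# Bookkeeping from the text: both routes give 2600 training images per class.
assert 130 * 20 == 2600 and 1300 * 2 == 2600

# Split the 130 source COVID-19 training images (not their augmented copies),
# so that all copies of one source image land in the same fold.
for train, val in k_fold_splits(130):
    print(len(train) * 20, len(val))  # augmented training images, validation images
```

With five folds, each split trains on 104 source images (2080 after 20-fold augmentation) and validates on 26.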
Preprocessing: An important data preprocessing step was resizing the X-ray images, because the networks require different input sizes. For AlexNet and SqueezeNet, the images were resized to 227×227 pixels, whereas for ResNet18 and DenseNet201 they were resized to 224×224 pixels. All images were normalized according to the pre-trained model standards.
Image augmentation: As discussed earlier, the authors used three augmentation strategies (rotation, scaling, and translation) to expand the COVID-19 training set 20-fold and the normal and viral pneumonia training sets 2-fold, as shown in Supplementary Figure S1. Rotation was performed in the clockwise and counterclockwise directions by angles of 15, 30, 45, 60, 75 and 90 degrees. Scaling magnifies or reduces the frame size of the image; magnifications of 2.5% to 10% were used in this work. Translation shifted the images horizontally and vertically by 5% to 20%.
We investigated the learned image features by observing which areas of the convolutional layers activated on an image and comparing them with the matching regions in the original image. Because activation maps can take different ranges of values, they were normalized to the range 0 to 1. The strongest activation channel was identified and compared with the original image; this channel activates on edges, positively on light-left/dark-right edges and negatively on dark-left/light-right edges. Convolutional neural networks learn to detect features such as color and edges in their first convolutional layer. In deeper convolutional layers, the network learns to detect more complex features, which later layers build up by combining the features of earlier layers.
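The augmentation ranges described above can be turned into concrete transform parameters as in the following sketch. It assumes that each augmented copy combines one rotation, one magnification and one translation (the text does not say whether the operations were combined or applied separately), that translations may go in either direction, and that rotation and scaling are performed about the image centre. It only builds 2×3 affine matrices; warping the pixels would be left to an image library. All names are hypothetical.

```python
import math
import random

# Rotation angles from the text, clockwise and counterclockwise (degrees).
ANGLES = [sign * a for a in (15, 30, 45, 60, 75, 90) for sign in (1, -1)]


def sample_params(rng):
    """Draw one (angle, scale, tx, ty) tuple within the ranges in the text."""
    angle = rng.choice(ANGLES)
    scale = 1 + rng.uniform(0.025, 0.10)                 # 2.5% to 10% magnification
    tx = rng.choice((1, -1)) * rng.uniform(0.05, 0.20)   # fraction of image width
    ty = rng.choice((1, -1)) * rng.uniform(0.05, 0.20)   # fraction of image height
    return angle, scale, tx, ty


def affine_matrix(angle, scale, tx, ty, width, height):
    """2x3 matrix that rotates and scales about the image centre, then shifts
    by (tx * width, ty * height) pixels."""
    t = math.radians(angle)
    a, b = scale * math.cos(t), scale * math.sin(t)
    cx, cy = width / 2, height / 2
    return [[a, -b, cx - a * cx + b * cy + tx * width],
            [b, a, cy - b * cx - a * cy + ty * height]]


def transform_point(m, x, y):
    """Map pixel coordinate (x, y) through the 2x3 affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def augmentation_plan(n_images, fold, seed=0):
    """One list of `fold` parameter tuples per source image."""
    rng = random.Random(seed)
    return [[sample_params(rng) for _ in range(fold)] for _ in range(n_images)]


covid_plan = augmentation_plan(130, 20)   # 130 x 20 = 2600
other_plan = augmentation_plan(1300, 2)   # 1300 x 2 = 2600 per class
print(sum(map(len, covid_plan)), sum(map(len, other_plan)))  # 2600 2600
```

Whether each original image counts as one of its own 20 (or 2) copies is not stated in the text; the sketch simply generates the requested number of transforms per image.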
Figure 3 shows the activation maps of the early convolutional layers, the deep convolutional layer and the strongest activation channel for each of the models.
(Figure 3: Activation maps for the different network models: (i) first convolutional layer, (ii) strongest activation channel, (iii) deep-layer image set, and (iv) deep convolutional layer for that specific image.)
To evaluate how well the deep learning algorithms classify the X-ray images under the two schemes, four different algorithms were trained and then validated using 5-fold cross-validation. The performance of the networks on the test dataset was evaluated using six performance metrics: accuracy, sensitivity (recall), specificity, precision (positive predictive value, PPV), area under the curve (AUC), and F1 score. SqueezeNet showed the best performance both in classifying normal vs. COVID-19 images and in classifying normal, viral pneumonia and COVID-19 images. The training and testing accuracies of the CNNs for the two classification schemes are compared in Supplementary Figure S2, and their ROC curves are compared in Supplementary Figure S3. SqueezeNet produced the highest accuracy, 98.3%, for both classification schemes in both training and testing. ResNet18 and SqueezeNet both performed excellently on the training dataset; however, SqueezeNet outperformed ResNet18 and the other algorithms on the test dataset. Table 2 summarizes the performance metrics of the CNN algorithms tested on the two classification schemes without image augmentation; in that setting, DenseNet201 outperformed the other models on the different performance indices in both schemes. Performance improved significantly with image augmentation, and in this case SqueezeNet outperformed the other models in both classification schemes, as shown in Table 3.
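The metrics listed above all follow from a confusion matrix. The sketch below treats one class at a time as positive (one-vs-rest). The two matrices are reconstructions, not figures copied from the paper: they assume 60 test images per class, consistent with the methodology, together with the errors described for Figure 4 in the next paragraph (two COVID-19 images misclassified as normal in both schemes, plus one viral pneumonia image misclassified as normal in the three-class scheme). AUC requires classifier scores, which are not reported here, so it is shown with made-up scores.

```python
# Reconstructed confusion matrices (an assumption: 60 test images per class).
# Rows are true classes, columns are predicted classes.
CM_2 = [[60, 0],      # normal
        [2, 58]]      # COVID-19
CM_3 = [[60, 0, 0],   # normal
        [2, 58, 0],   # COVID-19
        [1, 0, 59]]   # viral pneumonia
COVID = 1             # index of the COVID-19 class in both matrices


def one_vs_rest(cm, k):
    """Return (TP, FN, FP, TN) with class k as positive and all others negative."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp     # other classes predicted as k
    return tp, fn, fp, total - tp - fn - fp


def class_metrics(cm, k):
    tp, fn, fp, tn = one_vs_rest(cm, k)
    sens = tp / (tp + fn)                   # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    prec = tp / (tp + fp)                   # precision (PPV)
    f1 = 2 * prec * sens / (prec + sens)
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def accuracy(cm):
    """Fraction of all test images on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def auc(pos_scores, neg_scores):
    """ROC AUC: probability that a random positive outscores a random negative
    (ties count 1/2), i.e. the normalised Mann-Whitney U statistic."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


for name, cm in (("two-class", CM_2), ("three-class", CM_3)):
    m = class_metrics(cm, COVID)
    print(name, f"accuracy={accuracy(cm):.3f}",
          {key: round(v, 3) for key, v in m.items()})

print(auc([0.9, 0.8], [0.1, 0.8]))  # 0.875 with these made-up scores
```

For the COVID-19 class this gives sensitivity 0.967, specificity 1.000, precision 1.000 and an overall accuracy of 0.983 in both schemes, in line with the reported 96.7%, 100%, 100% and 98.3%. The text does not say how the three-class specificity of 99% was aggregated across classes, so the sketch does not attempt to reproduce that figure.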
Figure 4 shows the confusion matrices of SqueezeNet for the two-class and three-class problems with image augmentation. Figure 4 makes clear that accuracy fell short of 100% because two COVID-19 images were misclassified as normal; without these errors, the two-class classifier would have reached 100% accuracy. These two misclassifications occurred in both the two-class and three-class problems, and in the three-class problem one viral pneumonia image was also misclassified as normal. Notably, no COVID-19 image was misclassified as viral pneumonia, and vice versa. This is very important, as a computer-aided diagnosis (CAD) system should not classify any COVID-19 patient as having viral pneumonia or vice versa; nevertheless, it is important to understand why the classifier failed for two COVID-19 patients and classified them as normal.
([...] and Viral Pneumonia using SqueezeNet.)
The difference between normal and COVID-19 X-ray images can be observed in the deep convolutional layer of the pre-trained CNN model. Figure 5 shows that the 14th layer of the pre-trained SqueezeNet model detects features that distinguish normal, COVID-19 and viral pneumonia images. This helps explain SqueezeNet's success in detecting COVID-19 X-ray images and distinguishing them from normal and viral pneumonia images, which several research groups had earlier reported is not reliably possible with X-ray images. Figure 6 shows the two COVID-19 images with false-negative findings that were missed by both the two-class and three-class classifiers. The main reason these two COVID-19 images were missed is lower opacity in the left and right upper lobes and suprahilar regions on the posterior-to-anterior X-ray images, which makes them very similar to normal X-ray images (see Supplementary Figure S4).
This work presents a deep-CNN-based transfer learning approach for the automatic detection of COVID-19 pneumonia.
Four popular CNN-based deep learning algorithms were trained and tested for classifying normal and pneumonia patients using chest X-ray images. SqueezeNet outperformed the other three deep CNNs. The classification accuracy, sensitivity, specificity and precision were 98.3%, 96.7%, 100% and 100% for normal vs. COVID-19 images, and 98.3%, 96.7%, 99% and 100% for normal, COVID-19 and viral pneumonia images. COVID-19 has already become a threat to the world's healthcare systems and economy, and thousands of people have already died. Deaths are initiated by respiratory failure, which leads to the failure of other organs. Since a large number of patients attend outpatient and emergency departments, doctors' time is limited, and computer-aided diagnosis can save lives through early screening and proper care. Moreover, there is a large degree of variability in the input images from X-ray machines due to variations in the expertise of radiologists. SqueezeNet exhibits excellent performance in classifying COVID-19 pneumonia, having trained effectively on a comparatively small collection of images. We believe that this computer-aided diagnostic tool can significantly improve the speed and accuracy of diagnosing COVID-19 cases. This would be highly useful in this pandemic, where disease burden and the need for preventive measures are at odds with available resources.

References:
"WHO Director-General's opening remarks at the media briefing on COVID-19 -11."
"Global COVID-19 report."
"Coronavirus COVID-19 Global Cases by the"
"Detection of SARS-CoV-2 in Different Types of Clinical Specimens," JAMA.
"Point-of-Care RNA-Based Diagnostic Device for COVID-19."
"India's poor testing rate may have masked coronavirus cases."
"Bangladesh scientists create $3 kit. Can it help detect COVID-19"
"Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China," JAMA.
"Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study."
"Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia."
"Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China," The Lancet.
"Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR," Eurosurveillance.
"Molecular diagnosis of a novel coronavirus (2019-nCoV) causing an outbreak of pneumonia."
"Recent advances in the detection of respiratory virus infection in humans."
"CT imaging features of 2019 novel coronavirus (2019-nCoV)," Radiology.
"COVID-19): lessons from severe acute respiratory syndrome and Middle East respiratory syndrome."
"COVID-19): A Systematic Review of Imaging Findings in 919 Patients."
"A systematic approach to the design and characterization of a smart insole for detecting vertical ground reaction force (vGRF) in gait analysis."
"Wearable real-time heart attack detection and warning system to reduce road accidents."
"Real-Time Smart-Digital Stethoscope System for Heart Diseases Monitoring," Sensors.
"How far have we come? Artificial intelligence for chest radiograph interpretation," Clinical Radiology.
"ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems.
"Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network."
"Deep Learning-based Image Conversion of CT Reconstruction Kernels Improves Radiomics Reproducibility for Pulmonary Nodules or Masses," Radiology.
"Identifying medical diagnoses and treatable diseases by image-based deep learning."
"Application of artificial neural networks for automated analysis of cystoscopic images: a review of the current status and future prospects."
"Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy."
"A Novel Transfer Learning Based Approach for Pneumonia Detection in Chest X-ray Images."
"The data that transformed AI research-and possibly the world."
"Classification of bacterial and viral childhood pneumonia using deep learning in chest radiography."
"Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases," in IEEE CVPR.
"Convolutional networks for biomedical image segmentation."
"Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists."
"Multiple feature integration for classification of thoracic disease in chest radiography."
"Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks."
"COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images."
"A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)," medRxiv.
"Using Deep Learning to detect Pneumonia caused by NCOV-19 from X-Ray Images."
"ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition.
"Convolutional neural networks for medical image analysis: Full training or fine tuning?"
"A survey on transfer learning."
"COVID-19 CHEST RADIOGRAPHY DATABASE."
"COVID-19 Database."
"COVID-Chestxray Database."
"Chest X-Ray Images (Pneumonia)."
"Inception: Understanding various architectures of Convolutional Networks."
"Convolutional networks and applications in vision," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

Acknowledgments: The authors would like to thank the Italian Society of Medical and Interventional Radiology for sharing the X-ray images of COVID-19 patients publicly, and J. P. Cohen for taking the initiative to gather images from articles and online resources. Last but not least, the authors would like to acknowledge the Chest X-Ray Images (Pneumonia) database on Kaggle, which helped significantly to make this work possible; without it, the normal and viral pneumonia images would not have been accessible to the team. The authors declare no conflict of interest.