key: cord-0784445-38yh7sxy
authors: Schlotterbeck, J. N.; Montoya, C. E.; Bitar, P.; Fuentes, J. A.; Dinamarca, V.; Rojas, G. M.; Galvez, M.
title: Automatic analysis system of COVID-19 radiographic lung images (XrayCoviDetector)
date: 2020-08-23
journal: nan
DOI: 10.1101/2020.08.20.20178723
sha: 7a0f7d94cf2f18491b4cf51a623c2d3b19cd7418
doc_id: 784445
cord_uid: 38yh7sxy

COVID-19 is a pandemic infectious disease caused by the SARS-CoV-2 virus that has reached more than 210 countries and territories. It produces symptoms such as fever, dry cough, dyspnea, and fatigue, as well as pneumonia and radiological manifestations. The most commonly reported X-ray (RX) and CT findings include lung consolidation and ground-glass opacities. In this paper, we describe a machine-learning-based system (XrayCoviDetector; www.covidetector.net) that automatically estimates the probability that a chest radiological image contains COVID-19 lung patterns. XrayCoviDetector has an accuracy of 0.93, a sensitivity of 0.96, and a specificity of 0.90.

COVID-19 (an acronym for coronavirus disease 2019), also known as coronavirus disease or simply coronavirus, is an infectious disease caused by the SARS-CoV-2 virus. It was first detected in the city of Wuhan, China, in December 2019. Having reached more than 210 countries, areas, and territories (WHO, 2020), it was declared a pandemic by the World Health Organization on March 11, 2020 (WHO, 2020). It produces flu-like symptoms including fever, dry cough, dyspnea, myalgia, and fatigue. In severe cases, it is characterized by pneumonia, acute respiratory distress syndrome, sepsis, and septic shock. COVID-19 has radiological manifestations, even in asymptomatic patients and, in certain cases, before a positive real-time reverse transcription-polymerase chain reaction (RT-PCR) test. Radiological features have already been used to identify high-risk patients early, improving prognosis (and reducing the need for invasive mechanical ventilation) by allowing monitoring and early intensive management to be established.

A total of 3,805 radiographs of normal, non-COVID-19 pneumonia, and COVID-19 pneumonia cases was used. The database is made up of three sources (Table 1). To minimize false-positive errors, the images from these three databases were reviewed, validated, and selected by expert radiologists (yielding the numbers of images detailed in Table 1), since not all patients show distinguishable patterns on their chest radiographs. It should be noted that the first set (item 1 in Table 1) consists of RSNA images validated for a pre-pandemic Kaggle competition (patients without COVID-19).

Being a small data set, it was randomly split into two sets: one used to train the model and the other to validate it (no separate test set was formed). See Table 2.

The training set contains roughly four times fewer COVID-19 cases, so different weights are introduced for each category when measuring the error during training (in addition, the data are passed in large batches of 100 images each, so that every training step contains multiple COVID-19 images); a minimal sketch of this weighting is given below. To measure the precision of the model (correct results over the total), note that the validation set is balanced so that there are equal numbers of positive and negative cases (Table 2).
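To make this weighting concrete, the following is a minimal sketch of inverse-frequency class weights in the form Keras accepts. The label counts, variable names, and training call are illustrative assumptions, not the authors' exact code or the counts of Table 2.

```python
import numpy as np

# train_labels: 1 = COVID-19, 0 = normal / non-COVID-19 pneumonia.
# The real counts come from Table 2 (not reproduced here); the text only
# states that COVID-19 cases are roughly four times less frequent.
def make_class_weights(train_labels: np.ndarray) -> dict:
    """Inverse-frequency weights so both classes contribute equally to the loss."""
    n_total = len(train_labels)
    n_pos = int(train_labels.sum())
    n_neg = n_total - n_pos
    return {0: n_total / (2.0 * n_neg), 1: n_total / (2.0 * n_pos)}

# With a 4:1 imbalance the COVID-19 class receives roughly four times the weight:
# make_class_weights(np.array([0] * 800 + [1] * 200))  ->  {0: 0.625, 1: 2.5}
#
# Keras accepts such a dict directly; batch_size=100 follows the text:
# model.fit(train_images, train_labels, batch_size=100,
#           class_weight=make_class_weights(train_labels), epochs=...)
```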
To avoid bias by date or origin of the images, annotations are removed from the images using an automatic segmentation system: a U-Net-type AI architecture (Ronneberger et al., 2015) was trained to segment only the lungs before the classification is performed (Figures 1, 2, and 3). The data were grayscale-normalized before segmentation.

Figure 2: Different RX images of segmented lungs using the U-Net-type AI model.

The U-Net architecture is shown in Figure 3. It consists of a contraction path (encoder network) and an expansion path (decoder network). The 256x256x1 input image is passed through two 3x3 convolutional layers and then downsampled using 2x2 max-pooling layers. This process is repeated until the feature map has a size of 16x16x512. Then, instead of downsampling, the feature map is sent through a 2x2 deconvolutional layer (up-conv 2x2; Fig. 3), concatenated with a cropped version of the corresponding earlier feature map, and again passed through 3x3 convolutional layers. The process is repeated until a feature map of size 256x256x2 is obtained (Fig. 3), and a 1x1 convolutional layer is applied to produce a 256x256x1 output (one class: lung).

The data set was expanded by randomly applying different transformations: Table 3 lists the transformations for the segmentation model and Table 4 those for the classification model.

Table 3: Keras augmentation parameters for the transformations applied to each radiological image for the segmentation model (Fig. 3).

Table 4: Transformations applied to each radiological image to augment the data set for the classification model (Fig. 5).

The transformations (Tables 3 and 4) add variability to the data and help the model generalize better (so that it responds well to new data with different distributions, making it more robust and reliable). See Figure 4.

Given that the data sets used are relatively small, it was decided to use transfer learning from a pretrained classic VGG-16 model (a convolutional neural network that is 16 layers deep; Fig. 5), keeping the first convolutional layers fixed with weights obtained on ImageNet images (http://www.image-net.org/). ImageNet contains many categories of everyday, non-medical images, so the first fixed convolutional layers already contain filters for characteristics relevant to any image (for example, edge detection). For the trainable part of the model, the last dense layers are replaced so that the output has two categories; a minimal sketch of this setup is given below.
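As an illustration of this setup, here is a minimal Keras sketch of a VGG-16 base with frozen ImageNet weights and a new two-category dense head (with dropout, which the next paragraph mentions as a way of limiting overfitting). The input size, head width, dropout rate, and optimizer are illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG-16 convolutional base pretrained on ImageNet; the original dense top is discarded.
# The ImageNet weights expect 3 channels, so grayscale X-rays would be replicated to RGB.
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base.trainable = False  # keep the generic ImageNet filters (edges, textures) fixed

# New classification head with two categories (COVID-19 / non-COVID-19).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # illustrative head width
    layers.Dropout(0.5),                    # randomly drops neurons to reduce overfitting
    layers.Dense(2, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Under this assumption, the staged fine-tuning described next would train this head (and any later-unfrozen layers) first on the RSNA pneumonia images and finally on the COVID-19 data.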
The trainable layers are first tuned on a slightly different but related problem from the medical domain: pneumonia detection on chest radiographs, using over twelve thousand images from a Kaggle competition (RSNA Pneumonia Detection Challenge; https://www.kaggle.com/c/rsna-pneumonia-detection-challenge). These images are very similar to those of the final problem but far more numerous, and since COVID-19 can cause a type of pneumonia, the two problems are similar enough to help the development of the system. Finally, the model is trained on the final problem. During training, some neurons are randomly removed (dropout), given the lack of data, to avoid overfitting the training set and poor generalization.

XrayCoviDetector, both the website and the complete neural network, was implemented using Amazon Web Services (AWS; https://aws.amazon.com): Amazon EC2 (https://aws.amazon.com/es/ec2/) and Amazon S3 (https://aws.amazon.com/es/s3/) for storage and the website, and Amazon SageMaker (https://aws.amazon.com/es/sagemaker/) with TensorFlow on AWS (https://aws.amazon.com/es/tensorflow/) to create and deploy the neural networks used in this project. The resulting website is www.covidetector.net (XrayCoviDetector; Fig. 6).

To log in to XrayCoviDetector, the user must enter an email address and password (Fig. 7A); to register a new user, the complete name, email address, and password are required, and a checkbox must be ticked to acknowledge and accept the conditions of use (Fig. 7B). To upload a case, the age, sex, and a PNG or JPG chest image of the patient must be selected (Figure 8), and then the "ENVIAR" ("send") button is pressed.

Figure 8: Web page for uploading a case. Age, sex, and a PNG or JPG chest image file must be selected and uploaded by pressing the "ENVIAR" button.

Later, the user receives an email from cis@clinicalascondes.cl with the result of the analysis of the patient's RX images. Two possible sentences appear in the received email: "se detectan posibles patrones de COVID-19" ("possible COVID-19 patterns were detected") or "no se detectaron patrones de COVID-19" ("no COVID-19 patterns were detected"). See Figure 9 and Figure 10.

The results on the validation set, even with the images distorted, are summarized in a confusion matrix (Table 5). Accuracy refers to how close the measured value is to the actual value: high accuracy means there is only a small difference between XrayCoviDetector's prediction and the actual condition (a chest X-ray with COVID-19 pneumonia) (Griffiths, 2009). From the confusion-matrix counts,

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives. Using the data in Table 5, the accuracy is 0.93, the sensitivity 0.96, and the specificity 0.90.
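To make the metric computation explicit, the following minimal sketch derives the three values from confusion-matrix counts. The counts shown are placeholders chosen to reproduce the reported figures under the balanced-validation-set assumption; they are not the actual entries of the paper's confusion matrix.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }

# Placeholder counts for a balanced validation set (100 COVID-19, 100 non-COVID-19),
# not the paper's actual table:
print(classification_metrics(tp=96, fn=4, tn=90, fp=10))
# -> {'accuracy': 0.93, 'sensitivity': 0.96, 'specificity': 0.9}
```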
To verify that the system does not have any bias related to the origin of the data, it was also tested using 66 images originating only from CLC (Clínica Las Condes), obtaining an accuracy of 0.92.

In this paper, an automatic detection system for COVID-19 lung patterns in chest X-ray (RX) images was described. XrayCoviDetector is an easy-to-use web system, accessible worldwide from a computer or a mobile device, that sends the results by email within a couple of seconds. To train and validate the system, images from databases of different origins, and probably different technical characteristics, were used (Table 1). With these data a high accuracy (0.93) was obtained. To assess whether this accuracy is maintained, XrayCoviDetector was also evaluated on images acquired at CLC, obtaining an accuracy of 0.92. We can therefore conclude that the trained neural network is robust and that the results do not depend significantly on the characteristics of the images or on the X-ray equipment used.

XrayCoviDetector is a fast and fully automatic chest X-ray analysis web system that detects COVID-19 pneumonia patterns. X-ray equipment is common in medical centers and hospitals worldwide, whereas PCR (the gold-standard COVID-19 diagnostic exam) is not as widely available. Chest X-ray can therefore be considered a complementary supporting exam, and XrayCoviDetector can compensate for the absence of a radiologist with COVID-19 expertise at the medical center.

XrayCoviDetector may be less effective at detecting lung disease patterns in early stages than in more advanced stages, because there are fewer early-stage images in the training set. It is also important to note that many of the COVID-19 images used for this system correspond to the same patient on different days, so some images look similar although they are not identical. To reduce this problem, the images are randomly shuffled and passed in separate groups during training (in addition to the random distortions applied during augmentation).

References:
Chung, M., et al. (2020). CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV). Radiology.
Griffiths, D. (2009). Head First Statistics: A Brain-Friendly Guide. O'Reilly Media.
Huang, C., et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet.
Maguolo, G., & Nanni, L. (2020). A Critic Evaluation of Methods for COVID-19 Automatic Detection from X-Ray Images.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).
World Health Organization (2020). Coronavirus disease (COVID-19) Pandemic.
Wu, F., et al. (2020). A new coronavirus associated with human respiratory disease in China. Nature.