key: cord-0606231-o7y3p9c1 authors: Maganaris, Constantine; Protopapadakis, Eftychios; Bakalos, Nikolaos; Doulamis, Nikolaos; Kalogeras, Dimitris; Angeli, Aikaterini title: Evaluating Transferability for Covid 3D Localization Using CT SARS-CoV-2 segmentation models date: 2022-05-04 journal: nan DOI: nan sha: d5b981aaf5c78e7077d65db14ab236cf82a30d90 doc_id: 606231 cord_uid: o7y3p9c1

Recent studies indicate that detecting radiographic patterns on CT scans can yield high sensitivity and specificity for Covid-19 localization. In this paper, we investigate the transferability of deep learning models for semantic segmentation of pneumonia-infected areas in CT images. Transfer learning allows fast initialization and reuse of detection models when large volumes of training data are not available. Our work explores the efficacy of using U-Net architectures, pre-trained on a specific CT data set, for identifying Covid-19 side-effects in images from different datasets. Experimental results indicate an improvement in the segmentation accuracy of Covid-19 infected regions.

The coronavirus pandemic (COVID-19), with its variants, has infected more than 427 million people and caused more than 5.9 million deaths around the globe, according to data from Johns Hopkins University (22 February 2022) [1]. Since the World Health Organization (WHO) declared a Public Health Emergency of International Concern on January 30, 2020 [2], there has been a surge of research in the field in response to the emergency. Much of this research has focused on fast detection of COVID using CT or X-ray scans of the thorax, in parallel with other methods such as antigen and Reverse Transcription-Polymerase Chain Reaction (RT-PCR) testing.
Currently, the WHO recommends the use of rapid tests [3] in the general population for primary case detection in symptomatic individuals suspected to be infected and in asymptomatic individuals at high risk of COVID-19, for contact tracing, during outbreak investigations, and to monitor trends of disease incidence in communities, even though their sensitivity is poor [4]. RT-PCR tests are also known to have relatively high false negative rates [5]. This made it crucial to devise methodologies that detect COVID infection quickly and reliably, and that also estimate its severity or how fast it spreads within a patient so that it can be treated better; this information is not available from PCR or antigen testing. This makes CT and X-ray scans an essential tool for clinicians working with COVID patients, and it also suggests the need for new tools based on Artificial Intelligence that assist the experts in their work and make detection of the effects of COVID faster.

* All authors contributed equally to this research.

It was in 2016, at a conference in Toronto, that Geoffrey Hinton said: "[...] People should stop training radiologists now, it is just completely obvious that in 5 years Deep learning is going to do better than radiologists because this can be able to hit a lot more experience. It might be 10 years but we got plenty of radiologists already" [6]. Nevertheless, things progressed differently. During the pandemic it became clear that AI tools cannot replace radiologists, who were much in need and on whom the pandemic had a great impact, especially in places such as Northern Italy [7]. Despite all that, it is crucial to continue developing automated decision-making tools that assist healthcare personnel and overcome the issues that come with the analysis of Computed Tomography imaging. One of the main setbacks in creating accurate models is the lack of publicly available datasets with chest scans of infected people.
In addition, much of the data used so far has been poorly cleansed, which undermines the results produced by deep learning models trained to detect COVID [8]. Moreover, scans coming from different equipment differ in how COVID areas appear on the Hounsfield scale, which makes the training of appropriate models even more difficult because of the significant differences in the training data.

arXiv:2205.02152v4 [eess.IV] 20 May 2022

In our research we use two different datasets in order to assess the transferability of deep learning models between them and to estimate their accuracy in segmenting COVID areas (i.e. markings of ground glass opacities, consolidation and pleural effusion) [9]. These datasets include 3410 slices of CT scans with annotated COVID and lung segments, as well as masks of the aforementioned COVID-19 findings. Our results provide evidence of such transferability after retraining the model on a portion of the second dataset for a small number of epochs. Notably, the retrained model scores well on precision, recall and F1, reaching an average F1 score of 0.8279 on the test set of the second dataset.

To present our results, we visualize 3D reconstructions of the lungs using data from the CT scans. This gives a better understanding of how the model performs over the complete CT scan, rather than by comparing separate slices of it. In our 3D reconstruction we use only the lung areas of the slices, the annotations of the COVID areas made by the radiologists (ground truth) and the COVID areas predicted by our model. With these representations in hand we can see the bigger picture of how the model performs for a patient. This makes it easier for a user of the model (e.g. a doctor) to extract information from the results about how serious the illness is, or how the infection responds to medication etc.
by comparing different scans of the same patient at different points in time. Another finding of our research is that, even in our data, there were COVID areas (GGOs) that the radiologists had not annotated, and our model managed to annotate these COVID-affected areas correctly. Additionally, for areas falsely marked as COVID by the radiologists, the model produces the correct output (i.e. non-Covid) even though it was trained on them.

Deep learning methodologies using various types of images are common for identification, detection or segmentation in medical imaging [10] and in biomedical applications [11]. In this context, researchers are already investigating several approaches to assist medical professionals with Covid-19 detection. An initial approach was to classify multiple CT slices using a convolutional neural network variation [12]. The adopted methodology is able to identify a viral infection with a ROC AUC score of 0.95 (a score of 1 indicates a perfect classifier). However, despite the high detection rates, the authors indicated that it was extremely difficult to distinguish among different types of viral pneumonia based solely on CT analysis. Convolutional Neural Network (CNN) variations for distinguishing coronavirus from non-coronavirus cases have been proposed in [13]. That approach distinguishes among Covid-19, other types of viral infections, and non-infection cases. Results indicate adequate detection rates, higher than those of RT-PCR testing. In the same direction, CNN structures have been combined with Long Short Term Memory (LSTM) networks to further improve the classification accuracy of CNNs [14]. Additionally, the work of [15] introduced a parallel partial decoder, called Inf-Net, which aggregates high-level features to generate a global map. This is achieved through the use of convolutional hierarchies.
A U-Net-based model, named U-Net++, was applied to high-resolution CT images for Covid-19 detection in [16]. Furthermore, [17] proposes a system for the detection of Covid-19 in CT images using 10 variants of CNNs: AlexNet, VGG-16, VGG-19, SqueezeNet, GoogleNet, MobileNet-V2, ResNet-18, ResNet-50, ResNet-101, and Xception. ResNet-101 and Xception outperformed the remaining ones. AlexNet and Inception-V4 were also used for Covid-19 detection in CT scans in [18]. The framework presented in [19] used a CNN and an Artificial Neural Network Fuzzy Inference System (ANNFIS) to detect Covid-19, whereas a Stack Hybrid Classification (SHC) scheme based on ensemble learning is proposed in [20]. Focusing on segmentation, a type 2 fuzzy clustering system combined with a Super-pixel based Fuzzy Modified Flower Pollination Algorithm is proposed in [21] for Covid-19 CT image segmentation. Finally, the experimental results in [22] indicate that transfer learning outperforms training without transfer learning for the Covid-19 classification task in chest X-ray images, using deep classification models such as convolutional neural networks (CNNs).

The U-Net and the data transformation scripts were developed in Python 3 using the TensorFlow and Keras libraries. The models were trained in a VM running Unix MATE OS with an 8-core CPU and 64 GB of RAM, provided by the GRNet Synefo service. Figure 1 presents the architecture of the U-Net model.

To extract our results, we used two datasets of Covid-infected lungs. From the Covid-19 CT segmentation dataset [23] we used only Segmentation dataset nr. 2, which includes 9 DICOM files of continuous lung CT scans. From the 20th April update [24], which contains another 20 labeled Covid-19 CT scans, we used only the 10 files marked as Coronacases and not the Radiopaedia ones.
The reason is that all the DICOM files we used contain data in the Hounsfield scale [25], whereas the Radiopaedia set of DICOM files contains pixel values in the range 0-255, so we could not use it since it is incompatible with our normalization procedure. The first set of 9 DICOM files (we refer to this set as CT 1-9) contains 829 CT slices with dimensions of 630x630 pixels, together with hand-annotated lung and Covid masks for each slice. Similarly, the Coronacases dataset contains 10 CT scans (we refer to this set as CT 10) with 2581 slices in total, with dimensions of 512x512 pixels, and also includes lung and Covid masks annotated by radiologists. In both sets, each DICOM file holds continuous slices of a complete lung CT scan of a single patient, not slices of different patients.

To construct our dataset we used only the slices that include lung areas, to achieve better results by removing the extra information of slices without lung areas. Table 1 shows that the CT 1-9 dataset is split into 440 train + 60 validation + 214 test slices, a total of 714 slices, whereas the slices with lung areas number 713 in total. The extra slice marks a tiny area of a few pixels as Covid even though it contains no lung or Covid, as in Figure 6 (b). This is due to a human error in the annotation procedure.

For the normalization process, we resized the images to 320x320 pixels using nearest-neighbor interpolation and kept only the Hounsfield values in the range of -970 to -150. All the information needed for Covid segmentation in lung areas lies in this part of the Hounsfield scale. To achieve this, we normalize each pixel value $x$ by min-max scaling over this range:

$$x_{norm} = \frac{x - (-970)}{-150 - (-970)} = \frac{x + 970}{820}$$

Every resulting value greater than 1 is set to 1 and every value less than 0 is set to 0. This way our dataset includes only values in the range [0, 1].
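The resize, windowing and clipping steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' script: the function names `resize_nearest` and `normalize_slice` are hypothetical, and the index-based nearest-neighbor resize is an assumption that may pick slightly different source pixels than the resize routine of TensorFlow or another imaging library.

```python
import numpy as np

HU_MIN, HU_MAX = -970.0, -150.0  # Hounsfield window stated in the text
TARGET = 320                     # target side length stated in the text


def resize_nearest(img, size=TARGET):
    """Nearest-neighbor resize of a 2D array to (size, size).

    Assumption: source indices are chosen by integer scaling of the
    target indices; library implementations may round differently.
    """
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols[None, :]]


def normalize_slice(hu_slice):
    """Resize one slice of Hounsfield values and map the window to [0, 1]."""
    resized = resize_nearest(np.asarray(hu_slice, dtype=np.float32))
    scaled = (resized - HU_MIN) / (HU_MAX - HU_MIN)
    # Values outside the window are clipped to 0 or 1, as in the text.
    return np.clip(scaled, 0.0, 1.0)
```

Calling `normalize_slice` on a 512x512 or 630x630 array of Hounsfield values returns a 320x320 float array in [0, 1]. Binary masks could be resized with `resize_nearest` alone, since nearest-neighbor interpolation does not create intermediate label values.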
The radiologists have also marked, in separate files, the lung masks and the Covid masks for each slice of the CT scans we used. We therefore arranged our training input in the form n x 320 x 320 x 2: the first channel holds the normalized values of the CT slices and the second channel holds the lung mask of each slice as binary values (1 for pixels marked as lung, 0 elsewhere). The output has the form n x 320 x 320 x 1 with binary values (1 for pixels marked as Covid, 0 elsewhere).

In order to objectively evaluate our results, four different metrics are considered: accuracy, precision, recall, and the F1-score, which is directly calculated from the precision and recall values. Accuracy (ACC) is defined as:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where the numerator contains the true positive (TP) and true negative (TN) samples, while the denominator contains the TP and TN as well as the false positives (FP) and false negatives (FN). Precision, recall and F1-score are given as:

$$Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$

With both datasets prepared as described, we trained a U-Net with 4 encoding/decoding layers on the CT 1-9 data. The input is 440 x 320 x 320 x 2, where the 1st channel holds the data of the CT scan and the 2nd channel holds the lung mask of the specific CT slice. This way the model focuses only on the lung areas of the CT scans during learning. The output is 440 x 320 x 320 x 1, whose single channel holds the masks of the Covid areas. We also used a validation set of 60 x 320 x 320 x 2 as input and 60 x 320 x 320 x 1 as output. In the U-Net (see Figure 1) we used the rectified linear unit (ReLU) activation for the 3x3 conv layers and the sigmoid activation in the 1x1 conv layer, to get output values in the range (0,1), with a learning rate of 0.0001, a batch size of 45 and shuffling enabled. The maximum number of epochs was 200, of which only 111 were used before early stopping engaged.
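As an illustration of the four evaluation metrics defined in this section, the sketch below computes them for one predicted mask. It is a hedged example, not the authors' evaluation code: the function name `segmentation_metrics` is hypothetical, the 0.5 threshold applied to the sigmoid output is an assumption (the text does not state one), and returning 0 when a denominator is zero is a convention chosen here.

```python
import numpy as np


def segmentation_metrics(y_true, y_prob, threshold=0.5):
    """Pixel-wise accuracy, precision, recall and F1 for a binary mask.

    y_true: ground-truth Covid mask (0/1).
    y_prob: model output in (0, 1); `threshold` is an assumed cut-off.
    """
    truth = np.asarray(y_true).astype(bool).ravel()
    pred = (np.asarray(y_prob) >= threshold).ravel()

    tp = int(np.sum(truth & pred))    # Covid pixels predicted as Covid
    tn = int(np.sum(~truth & ~pred))  # background predicted as background
    fp = int(np.sum(~truth & pred))   # background predicted as Covid
    fn = int(np.sum(truth & ~pred))   # Covid pixels that were missed

    def safe_div(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }
```

Because most pixels in a slice are background, accuracy is dominated by true negatives and stays high even when the Covid region is segmented poorly. The F1 score ignores true negatives and is therefore the more sensitive indicator of segmentation quality.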
We then computed the F1 score, accuracy, precision and recall on the test data of the CT 1-9 set (i.e. 214 slices; see Table 2), obtaining an average accuracy of 0.9973 and an average F1 score of 0.7832. Figure 3 shows that the model performs well on the test set of the 1st dataset, even in a bad scenario where the F1 score is 0.4164 for the specific image. We then extracted the same metrics using all slices of the second dataset (CT 10), obtaining an average accuracy of 0.995 but an F1 score of 0.6137. Because of the low F1 score, we re-trained our model using a small portion of this dataset. From the CT 10 dataset we took train and validation sets of the same sizes as before (440 and 60) and continued training the previous model, now with the maximum number of epochs reduced to 50. Of these, training used 46 before early stopping engaged. We then applied the re-trained model to the test set of the 2nd dataset, which includes 1656 slices, to obtain our metrics. Note that all the slices used from both datasets had an area marked as lung in the accompanying mask files from the radiologists. If a slice did not include a lung area, it was not included in any of the train, validation or test sets. With the retrained model we obtained an average accuracy of 0.9974 and an average F1 score of 0.8279, an improvement over the F1 score of 0.6137 before retraining. Figure 5 shows an example of how the re-trained model performs: even in the bad scenario where the F1 score is 0.0136, we cannot clearly see an actual GGO on the CT image, even though the radiologists mark one in the Covid mask.

In the datasets we used, we spotted some inconsistencies, since the annotation was made by hand.
Specifically, in our pre-training analysis we found slices in the DICOM files with falsely marked Covid areas. As Figure 6 (b) shows, the radiologist has marked a Covid area in a section where there is no lung at all. We also spotted a case where our model predicted a Covid area that the radiologist had not marked. Figure 7 shows a GGO in the CT scan, predicted by our model, even though there is no mask in the ground truth. Given these examples, we cannot be sure of the extent of the faulty annotations, since the only way to find them is to have a different radiologist re-evaluate all annotations and compare the results.

To assist the medical personnel who work with tools like Computed Tomography scans, we wrote a Python script that exports a 3D representation of the CT scan data and the Covid segments produced by our model. First, for every slice in a CT scan we extracted the color value of each pixel. We created an array of color values for the lung areas only, each entry having the form (x position, y position, z position, color value). The z position starts from 0 and increases by a constant value for every new slice exported. We followed the same procedure for the Covid masks predicted by our model and for the ground truth, but there the color value is binary (i.e. 0 or 1). We then exported these arrays to comma-separated files, which we imported into the ParaView visualizer [26] using the Table To Points filter. With the 3D reconstruction, evaluating a patient's status is easier for the medical personnel, as they can see the whole lung instead of evaluating separate CT slices, which can be numerous. With this tool it is also easier to estimate how a treatment performs over time, using different 3D representations of the same patient.
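The export described above, one (x, y, z, value) row per lung pixel written to a comma-separated file, could look roughly like the following. This is a sketch under stated assumptions rather than the authors' script: the function names `slices_to_points` and `write_points_csv`, the column names, and the default `z_step` (the constant added per slice) are all hypothetical.

```python
import csv

import numpy as np


def slices_to_points(volume, lung_masks, z_step=1.0):
    """Build (x, y, z, value) rows for the pixels inside the lung masks.

    volume:     array of shape (n_slices, height, width) with pixel values
                (CT intensities, or a binary Covid mask).
    lung_masks: array of the same shape, non-zero where a pixel is lung.
    z_step:     assumed constant spacing added for every new slice.
    """
    volume = np.asarray(volume, dtype=np.float32)
    lung = np.asarray(lung_masks).astype(bool)
    z_idx, y_idx, x_idx = np.nonzero(lung)
    values = volume[z_idx, y_idx, x_idx]
    return np.column_stack([x_idx, y_idx, z_idx * z_step, values])


def write_points_csv(points, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y", "z", "value"])  # assumed column names
        writer.writerows(points.tolist())
```

The same two functions can be applied to the normalized CT values, to the predicted Covid masks and to the ground-truth masks, producing one file per representation. In ParaView, the Table To Points filter then lets the x, y and z columns be selected as point coordinates, with the remaining column available for coloring.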
In Fig 8 we can compare the model prediction with the ground truth, and we can also see that the ground truth contains faulty Covid annotations, made by the radiologists, outside the lung segments (marked in the ground truth).

Emergency Committee regarding the coronavirus disease (COVID-19) pandemic.
Antigen-detection in the diagnosis of SARS-CoV-2 infection.
Agnès Georges-Walryck, and Patrick Dehail. 2021. Accuracy of COVID-19 rapid antigenic tests compared to RT-PCR in a student population: The StudyCov study.
Satu Kurkela, and Eliisa Kekäläinen. 2021. Real-life clinical sensitivity of SARS-CoV-2 RT-PCR test in symptomatic patients.
Impact of the COVID-19 outbreak on the profession and psychological wellbeing of radiologists: a nationwide online survey.
Hundreds of AI tools have been built to catch covid. None of them helped.
Ground-glass opacity (GGO): a review of the differential diagnosis in the era of COVID-19.
Deep Learning for Computer Vision: A Brief Review.
Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles.
COVID-19): Role of Chest CT in Diagnosis and Management.
Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy.
A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images.
Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images.
Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography.
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks.
Deep Learning Model for Diagnosis of Corona Virus Disease from CT Images.
Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks.
A Novel Approach of CT Images Feature Analysis and Prediction to Screen for Corona Virus Disease (COVID-19).
SuFMoFPA: A superpixel and meta-heuristic based fuzzy image segmentation approach to explicate COVID-19 radiological images.
Transfer Learning for COVID-19 Pneumonia Detection and Classification in Chest X-ray Images. Technical report. medRxiv.
Zhu Qiongjie, Dong Guoqiang, and He Jian. 2020. COVID-19 CT Lung and Infection Segmentation Dataset. Dataset.

This research has been co-financed by the European Union's Horizon 2020 research and innovation programme under grant agreement No 883441 for the STAMINA Innovation action.