key: cord-0981639-tsovcx41
authors: Alshazly, H.; Linse, C.; Abdalla, M.; Barth, E.; Martinetz, T.
title: COVID-Nets: Deep CNN Architectures for Detecting COVID-19 Using Chest CT Scans
date: 2021-04-27
journal: nan
DOI: 10.1101/2021.04.19.21255763
sha: e7c7593e7bf766da0182abd28566065a8c5585d6
doc_id: 981639
cord_uid: tsovcx41

This paper introduces two novel deep convolutional neural network (CNN) architectures for automated detection of COVID-19. The first model, CovidResNet, is inspired by the deep residual network (ResNet) architecture. The second model, CovidDenseNet, exploits the power of densely connected convolutional networks (DenseNet). The proposed networks are designed to provide fast and accurate diagnosis of COVID-19 using computed tomography (CT) images for the multi-class and binary classification tasks. The architectures are utilized in a first experimental study on the SARS-CoV-2 CT-scan dataset, which contains 4173 CT images from 210 subjects structured in a subject-wise manner for three different classes. First, we train and test the networks to differentiate between COVID-19, non-COVID-19 viral infections, and healthy subjects. Second, we train and test the networks on binary classification with three different scenarios: COVID-19 vs. healthy, COVID-19 vs. other non-COVID-19 viral pneumonia, and non-COVID-19 viral pneumonia vs. healthy. Our proposed models achieve up to 93.96% accuracy, 99.13% precision, 94% sensitivity, 97.73% specificity, and a 95.80% F1-score for binary classification, and up to 83.89% accuracy, 80.36% precision, 82% sensitivity, 92% specificity, and an 81% F1-score for the three-class classification tasks. The experimental results reveal the validity and effectiveness of the proposed networks in automated COVID-19 detection. The proposed models also outperform the baseline ResNet and DenseNet architectures while being more efficient.

COVID-19 pneumonia [20, 31, 47, 51, 53]. These systems were constructed through a combination of segmentation and classification models. In the first stage, the lung region or the lesion region is first segmented

Ensemble learning and deep ensembles were also explored in COVID-19 detection to improve the performance of single models. The authors in [42] proposed an ensemble based on three deep networks, namely VGGNet [40], ResNet [21], and DenseNet [22], which were pretrained on natural images. The networks were used to extract features from the CT images, and a set of fully connected layers was added on top to perform the classification task. Experiments were conducted on a dataset of CT scans collected from different sources for patients with COVID-19, patients with other lung diseases, and healthy subjects. The proposed ensemble achieved better performance than any single model among the ensembled networks. Tao et al. [60] proposed an ensemble of three pretrained deep CNN models, namely AlexNet [30], GoogleNet [45], and ResNet [21], to improve the classification accuracy of

In the following subsections we describe our proposed CovidResNet and CovidDenseNet models for automated COVID-19 detection on the SARS-CoV-2 CT-scan dataset. Inspired by the outstanding performance of the well-designed ResNet [21] and DenseNet [22] architectures, we build our networks by following similar construction patterns to obtain the benefits of both architectures.

Our CovidResNet architecture is based on deep residual networks (ResNets) [21]. ResNet is a very deep CNN architecture and the winner of the 2015 ImageNet challenges [39]. The main problems addressed by the ResNet models are vanishing gradients and performance degradation, which occur when training deep networks. A residual learning framework was proposed, which encourages the layers to learn residual functions with respect to the layer input. While conventional network layers are assumed to learn a desired underlying function y = f(x) by some stacked layers, the residual layers attempt to approximate y via f(x) + x. The residual layers start with the input x and evolve to a more complex function as the network learns. This type of residual learning allows training very deep networks and attains improved performance from the increased depth.
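The residual formulation y = f(x) + x can be illustrated with a minimal sketch in plain Python. The function names (`residual_layer`, `zero_branch`, `small_correction`) are hypothetical and only serve the illustration; they are not part of the proposed networks. The sketch shows that a residual layer whose learned branch outputs zero reduces to the identity mapping, which is the starting point from which the layer can evolve.

```python
def residual_layer(x, f):
    """Return f(x) + x element-wise for a vector given as a list of floats."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input dimension")
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    """Hypothetical residual branch that has learned nothing yet."""
    return [0.0 for _ in x]


def small_correction(x):
    """Hypothetical residual branch that adds a small correction to the input."""
    return [0.1 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_layer(x, zero_branch)        # equals the input x
corrected_out = residual_layer(x, small_correction)  # input plus a correction
```

In a real network, f would be the stack of convolutions, batch normalization, and ReLU layers rather than a scalar multiple.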

The basic building block for CovidResNet is the bottleneck residual module depicted in Figure 1. The input signal to the module passes through two branches. The left branch is a stack of three convolutional layers. The first 1 × 1 convolution is used for reducing the depth of the feature maps before the costly 3 × 3 convolutions, whereas the second 1 × 1 is used for increasing the depth to match the input dimensions. The convolutions are followed by batch normalization (BN) [24] and rectified linear unit (ReLU) [34] activation.

The right branch is a shortcut connection that connects the module's input with the output of the stacked layers, which are summed up before applying a final ReLU activation.
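To make the benefit of the 1 × 1 reduction concrete, the following sketch counts convolution weights for a bottleneck with the 64, 64, 256 filter configuration that the paper gives for the first block, and compares it with a single 3 × 3 convolution applied directly at 256 channels. The comparison baseline and the assumption that the module input already has 256 channels are illustrative choices, and biases and batch normalization parameters are ignored.

```python
def conv_weights(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution (bias omitted)."""
    return c_in * c_out * kernel * kernel


def bottleneck_weights(c_in, c_mid, c_out):
    """Weights of the 1x1 reduce -> 3x3 -> 1x1 expand stack."""
    return (
        conv_weights(c_in, c_mid, 1)     # 1x1 convolution reduces the depth
        + conv_weights(c_mid, c_mid, 3)  # 3x3 convolution on the reduced depth
        + conv_weights(c_mid, c_out, 1)  # 1x1 convolution restores the depth
    )


# Assumed example: a module whose input already has 256 channels.
bottleneck = bottleneck_weights(256, 64, 256)
plain_3x3 = conv_weights(256, 256, 3)
ratio = plain_3x3 / bottleneck
```

Under these assumptions the whole bottleneck stack needs 69,632 weights, while a single full-width 3 × 3 convolution needs 589,824, which is roughly 8.5 times as many.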

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

CovidResNet is a deep model that consists of 29 layers. The first layer is made of 7 × 7 convolutional filters with a stride of 2. It is followed by a max pooling layer that downsamples the spatial dimensions. The architecture continues with a stack of four ResNet blocks, where each block contains between one and three bottleneck residual modules. When moving from one ResNet block to the next, the spatial dimension is reduced by max pooling and the number of learned filters is doubled.
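The stated depth of 29 layers can be checked against the block layout given in the next paragraph (three, three, two, and one bottleneck modules). The sketch below assumes that only the first convolution, the three convolutions of each bottleneck module, and the final fully connected layer are counted, which is a counting convention chosen here because it reproduces the stated depth. The filter widths of the third and fourth blocks are extrapolated from the doubling rule and are likewise an assumption.

```python
MODULES_PER_BLOCK = [3, 3, 2, 1]  # bottleneck modules in blocks one to four
CONVS_PER_MODULE = 3              # 1x1, 3x3, 1x1 convolutions per module


def covid_resnet_depth(modules_per_block=MODULES_PER_BLOCK):
    """Count weighted layers: first conv + bottleneck convs + classifier."""
    first_conv = 1
    classifier = 1
    return first_conv + CONVS_PER_MODULE * sum(modules_per_block) + classifier


def block_filters(block_index, base=64, expansion=4):
    """Filters of the three convolutions in a block, doubling per block."""
    width = base * 2 ** block_index
    return (width, width, width * expansion)


depth = covid_resnet_depth()
filters = [block_filters(i) for i in range(len(MODULES_PER_BLOCK))]
```

With this convention the count is 1 + 3 × 9 + 1 = 29, and the first two filter tuples match the 64, 64, 256 and 128, 128, 512 configurations given in the text.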

In the first block, we stack three modules, each having three convolutional layers with 64, 64 and 256 filters, respectively. After another max pooling layer, we stack three more bottleneck modules with a configuration of 128, 128 and 512 filters, which forms the second block. The same procedure is repeated for the third and fourth blocks, where the former has two stacked modules and the latter has only one. The network ends with an adaptive average pooling step and a fully connected layer. Table 1 summarizes the CovidResNet architecture and a visualization is given in Figure 2. As can be seen in the diagram, the

Figure 2. A schematic diagram for the ensemble prediction process for the three-class problem. Both networks accept the same input CT image and each network outputs an independent class probability vector. The probability vectors are then averaged, and the class with the highest averaged probability is the final prediction.

The copyright holder for this preprint this version posted April 27, 2021. ; https://doi.org/10.1101/2021.04.19.21255763 doi: medRxiv preprint

The basic building block for the CovidDenseNet model is the DenseNet block. A simplified form of the dense connectivity of a dense block is shown in Figure 3. The block has three layers and each layer performs a series of batch normalization, ReLU activation, and 3 × 3 convolution operations. The concatenated feature maps from all preceding layers are the input to the subsequent layer. Each layer generates k feature maps, where k is the growth rate. So, if k_0 is the number of feature maps input to layer x_0, then there are 3k + k_0 feature maps at the end of the 3-layer dense block. However, two main issues arise as the network depth increases. First, as each layer generates k feature maps, the number of inputs to layer l will be (l − 1)k + k_0, and with deep networks this number can grow rapidly and slow down computation. Second, when the network gets deeper, we need to reduce the feature map size to increase the kernel's receptive field. So, when concatenating feature maps of different sizes we need to match the dimensions. The first issue is addressed by introducing a bottleneck layer of 1 × 1 convolution and 4 × k filters after every concatenation.
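The feature-map bookkeeping of a dense block can be sketched as follows. The values k_0 = 64 and k = 32 are illustrative assumptions and are not taken from the description of CovidDenseNet; the function name `dense_block_plan` is hypothetical. The sketch applies the formula (l − 1)k + k_0 for the number of inputs to layer l and records the 4 × k width of the 1 × 1 bottleneck.

```python
def dense_block_plan(k0, k, num_layers):
    """List per-layer channel counts and the block's total output channels."""
    plan = []
    for layer in range(1, num_layers + 1):
        plan.append({
            "layer": layer,
            "inputs": (layer - 1) * k + k0,  # concatenation of all earlier maps
            "bottleneck": 4 * k,             # width of the 1x1 bottleneck
            "outputs": k,                    # new feature maps (growth rate)
        })
    total = num_layers * k + k0              # maps at the end of the block
    return plan, total


# Assumed example values: 64 input maps and a growth rate of 32.
plan, total = dense_block_plan(k0=64, k=32, num_layers=3)
```

For the three-layer block of the text, the layers receive 64, 96, and 128 feature maps under these assumptions, and the block ends with 3k + k_0 = 160 feature maps.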

The second issue is addressed by adding a transition layer between the dense blocks. The layer includes batch normalization and 1 × 1 convolution followed by an average pooling operation.

and the CT images have variable sizes, the dataset is challenging. Figure 4 shows 12 CT images from the

Wide variations in the CT image sizes in the SARS-CoV-2 CT-scan dataset call for a strategy to resize the images to a consistent input dimension for the network. The most frequently used approach to unify images with different aspect ratios involves stretching, which can result in images that look unnatural or distorted. Therefore, we opt for a different procedure that preserves the aspect ratio by embedding the image into a fixed-size canvas. We apply padding with the average color of the ImageNet dataset [13] when necessary to match the target shape. We empirically tried different input sizes and found that a

To conduct our experiments and analysis we split the dataset into training and test sets. We follow the subject-wise structure of the dataset, such that the sets of persons in the training and test sets are disjoint. Hence, it is assured that we evaluate our models on unseen persons. However, the number of CT images per person varies. We choose 59.5% of the subjects for training and 40.5% for testing, such that the training and test sets contain 60% and 40% of the images, respectively. The same ratio of persons is used for both the multi-class and binary classification scenarios. Within one scenario we choose the same split for each architecture for the sake of consistency and comparability.
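The aspect-ratio-preserving embedding described above can be sketched in plain Python. Several details are assumptions: the image is scaled to fit the canvas before padding, the padding is centered, the target size of 224 is a hypothetical example, and the fill color (124, 116, 104) is the commonly used approximate ImageNet RGB mean. The helper names are hypothetical as well.

```python
# Approximate ImageNet mean color, i.e. 255 * (0.485, 0.456, 0.406).
IMAGENET_MEAN_RGB = (124, 116, 104)


def letterbox_geometry(width, height, target):
    """Return (new_w, new_h, left, top) for a centered, undistorted fit."""
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    left = (target - new_w) // 2
    top = (target - new_h) // 2
    return new_w, new_h, left, top


def pad_to_canvas(pixels, target, fill=IMAGENET_MEAN_RGB):
    """Embed an already resized image (rows of RGB tuples) into a square canvas."""
    height, width = len(pixels), len(pixels[0])
    if height > target or width > target:
        raise ValueError("image must fit into the canvas before padding")
    top = (target - height) // 2
    left = (target - width) // 2
    canvas = [[fill] * target for _ in range(target)]
    for r, row in enumerate(pixels):
        canvas[top + r][left:left + width] = row
    return canvas


geometry = letterbox_geometry(400, 300, 224)  # hypothetical 400 x 300 slice
tiny = [[(0, 0, 0)] * 4 for _ in range(2)]    # toy 4 x 2 black image
canvas = pad_to_canvas(tiny, 4)
```

A 400 × 300 image is scaled to 224 × 168 and receives 28 rows of padding above and below, so the anatomy is not stretched.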

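A subject-wise split as described above can be sketched as follows. The data layout (a dictionary from subject identifier to that subject's image names), the function name, and the toy data are hypothetical. The sketch only guarantees that no subject appears in both sets; it does not reproduce the additional selection that makes the image ratio come out at 60% and 40%.

```python
import random


def subject_wise_split(images_by_subject, train_fraction, seed=0):
    """Split images so that each subject is entirely in train or in test."""
    subjects = sorted(images_by_subject)
    rng = random.Random(seed)              # fixed seed for a reproducible split
    rng.shuffle(subjects)
    n_train = round(train_fraction * len(subjects))
    train_subjects = subjects[:n_train]
    test_subjects = subjects[n_train:]
    train = [img for s in train_subjects for img in images_by_subject[s]]
    test = [img for s in test_subjects for img in images_by_subject[s]]
    return train, test, set(train_subjects), set(test_subjects)


# Toy data: ten hypothetical subjects with a varying number of images each.
toy = {f"s{i:02d}": [f"s{i:02d}_img{j}" for j in range(1 + i % 4)] for i in range(10)}
train, test, train_subjects, test_subjects = subject_wise_split(toy, 0.6)
```

Reusing the same seed for every architecture keeps the split identical across models, which is what makes the reported comparisons consistent.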

In CovidResNet, all weights except those of the last layer are pretrained. In CovidDenseNet, the adapter layers and the last layer are randomly initialized and all other weights are copied from the DenseNet121 model that was pretrained on ImageNet.

We empirically found that it is not necessary to adjust all weights to the COVID-19 detection problem. We assume that the first layers of a computer vision network provide somewhat generic filters that can be used for the SARS-CoV-2 CT-scan dataset. The idea is to reduce the risk of overfitting

The models are initialized using pretrained weights that have been optimized for the ImageNet dataset.

Then, we train the models using the LAMB optimizer [57], an initial learning rate of 0.0003 and cross-

Table 5 provides the performance metrics, which are computed for each specific class, and the macro-

we report the results of the best two ensembles in Table 5. We can see that in both cases, the ensemble models achieve better performance with respect to the macro average metrics compared to any individual

cases.

We also plot the ROC curves and compute the AUC to investigate the diagnostic accuracy of the proposed models for the multi-class problem in Figure 6. Our CovidResNet and CovidDenseNet models


Figure 5. Confusion matrices generated by the different models for the three-class classification task.


DenseNet models and their baselines increases the classification accuracy for all classes. The superiority of the ensembles over single models is also reflected in the ROC curves and their corresponding AUC scores. When combining CovidDenseNet and CovidResNet, which we refer to as Ensemble 1, we notice that the AUC score for the Healthy class increases from 87.9% to 92.5%, whereas an increment within 1% in the AUC score is attained for the COVID-19 and Others classes. Similar results are achieved when we combine CovidDenseNet and its deeper baseline DenseNet121, which we refer to as Ensemble 2, even though the models were trained on the same training split of the dataset.
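The ensemble prediction, in which the class probability vectors of the member networks are averaged and the class with the highest averaged probability is chosen (as stated in the Figure 2 caption), can be sketched as follows. The probability values are invented for illustration and are not results from the paper; the function name and the class order are assumptions.

```python
def ensemble_predict(prob_vectors, class_names):
    """Average per-model class probabilities and return the winning class."""
    if not prob_vectors:
        raise ValueError("at least one probability vector is required")
    n_models = len(prob_vectors)
    averaged = [
        sum(p[i] for p in prob_vectors) / n_models
        for i in range(len(class_names))
    ]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best], averaged


classes = ["COVID-19", "Healthy", "Others"]
# Hypothetical outputs of two networks for the same CT image.
p_first = [0.5, 0.4, 0.1]
p_second = [0.2, 0.7, 0.1]
label, averaged = ensemble_predict([p_first, p_second], classes)
```

In this toy case the first model alone would predict COVID-19, but the averaged vector favors Healthy, which shows how a confident second model can overrule a less certain first one.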

We have trained and tested our proposed architectures on binary classification tasks to investigate their ability to distinguish between CT images of all possible classes as well as to investigate the difficulty of these subtasks on the given dataset. We investigate three experimental scenarios. First, we train and test our models to differentiate patients with COVID-19 from healthy individuals (COVID-19 vs. Healthy). Then, we train and test the models to distinguish COVID-19 cases from non-COVID-19 patients infected by other lung diseases (COVID-19 vs. Others). Finally, we train and test our models to differentiate non-COVID-19 patients infected by other pulmonary diseases from healthy subjects (Others vs. Healthy). Table 6 presents the results obtained by each model under each of these scenarios.

In the first scenario (COVID-19 vs. Healthy) we used 866 CT images of COVID-19 and 309 CT images from the healthy class for testing. As we can see from Table 6, all four models achieve very competitive performance under this scenario, with accuracy above 93% and F1-score above 95%. The

For detailed class-wise results, the confusion matrix for each model under the considered scenario is presented in Figure 7.
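The reported metrics follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are toy values chosen for easy arithmetic and do not correspond to any confusion matrix in the paper. The function name is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0   # avoid division by zero

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)       # recall of the positive class
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy counts (not from the paper): 100 positive and 50 negative test images.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=45)
```

Because the test set in the first scenario contains far more COVID-19 images than healthy ones, accuracy alone can hide weak performance on the smaller class, which is why sensitivity and specificity are reported alongside it.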

In the second scenario (COVID-19 vs. Others) we investigate the effectiveness of our models in differentiating the CTs of COVID-19 from others with viral lung infections. It is worth mentioning that this is a challenging task due to the potential overlap of findings on CT images between COVID-19 and other viral lung infections. The obtained results in Table 6

We also compare the performance of the different models under this scenario by plotting the ROC curve and computing the AUC for each model. Figure 10

Figure 11 shows the confusion matrix for each of the tested models. We can observe that all the

Figure 11. Confusion matrices for Others vs. Healthy classification.


We proposed two deep CNN architectures (CovidResNet and CovidDenseNet) for the automated detection of COVID-19 using chest CT scans. The models were developed and validated on the large multi-class SARS-CoV-2 CT-scan dataset, which has more than 4000 CT scans. We conducted extensive experiments to evaluate our models in multi-class and binary classification tasks. First, we trained our models to differentiate COVID-19 cases from other non-COVID-19 infections as well as from healthy subjects.

infections from non-infected healthy subjects. The obtained results demonstrate the superior performance of our proposed models over the baseline models.

To our knowledge, this is the first experimental study on the SARS-CoV-2 CT-scan dataset that considers subject-wise splits for training and testing. Therefore, our models and results can be used as a baseline benchmark for any future experiments conducted on this dataset. Although our experimental results are promising, there is still room for improvement. We assume that experiments conducted on even larger datasets of CT scans will improve the diagnostic accuracy and provide a more reliable estimate of the models' performance. Collecting more CT scans and subjects for all classes, particularly the Healthy and Others categories, can further improve the diagnostic performance of the proposed models.


Computer-aided detection of COVID-19 from X-ray images using Recognition (CVPR)

Using Deep Learning from Chest X-ray Images During COVID-19

Batch normalization: Accelerating deep network training by reducing internal covariate shift

COVID-19 infected patients using DenseNet201 based deep transfer learning

Development and evaluation of an artificial intelligence system for COVID-19 diagnosis

Chest CT findings in 2019 novel coronavirus (2019-nCoV) infections from Wuhan, China: Key points for the radiologist

CoVNet-19: A Deep Learning model for the detection and analysis of COVID-19 patients

Diagnostic Performance of CT and Reverse Transcriptase Polymerase Chain Reaction for Coronavirus Disease 2019: A Meta-Analysis

ImageNet classification with deep convolutional neural networks

From community-acquired pneumonia to COVID-19: A deep learning-based method for quantitative analysis of COVID-19 on thick-section CT scans

Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?

V-Net: Fully convolutional neural networks for volumetric medical image segmentation

Rectified linear units improve restricted Boltzmann machines

Dual-Sampling Attention Network for Diagnosis of COVID-19 From Community Acquired Pneumonia

A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks

Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning?

U-Net: Convolutional networks for biomedical image segmentation

Very deep convolutional networks for large-scale image recognition

Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks

Densely connected convolutional networks-based COVID-19 screening model

SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification

Covid CT-net: A deep learning framework for COVID-19 prognosis using CT images

Going deeper with convolutions

Computer Vision and Pattern Recognition

Convolutional CapsNet: A novel artificial neural network approach to detect COVID-19 disease from X-ray images using capsule networks

AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system

Residual attention network for image classification

COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images

Contrastive Cross-site Learning with Redesigned Net for COVID-19 CT Classification

COVID-AL: The diagnosis of COVID-19 with deep active learning

Accurately Differentiating Between Patients With COVID-19

Multimodal Late Fusion Learning Approach

A deep learning system to screen novel coronavirus disease 2019 pneumonia. Engineering

Predicting Covid-19 From Chest CT Images Using Attentional Convolutional Network

Chest CT manifestations of new coronavirus disease 2019 (COVID-19): A pictorial review

Tomography Findings of Novel Coronavirus Disease 2019 (COVID-19) Pneumonia. Archives of

Large batch optimization for deep learning: Training BERT in 76 minutes

RLDD: An Advanced Residual