key: cord-0947581-jm6xx1cs authors: Kumar Singh, Vivek; Abdel-Nasser, Mohamed; Pandey, Nidhi; Puig, Domenec title: LungINFseg: Segmenting COVID-19 Infected Regions in Lung CT Images Based on a Receptive-Field-Aware Deep Learning Framework date: 2021-01-22 journal: Diagnostics (Basel) DOI: 10.3390/diagnostics11020158 sha: 985727ff76a25ce4ec792a90182babdac05114f7 doc_id: 947581 cord_uid: jm6xx1cs

COVID-19 is a fast-growing disease all over the world, but facilities in hospitals are restricted. Due to the unavailability of an appropriate vaccine or medicine, early identification of patients suspected to have COVID-19 plays an important role in limiting the extent of the disease. Lung computed tomography (CT) imaging is an alternative to the RT-PCR test for diagnosing COVID-19. Manual segmentation of lung CT images is time-consuming and has several challenges, such as the high disparities in texture, size, and location of infections. Patchy ground-glass opacities and consolidations, along with pathological changes, limit the accuracy of the existing deep learning-based CT slice segmentation methods. To cope with these issues, in this paper we propose a fully automated and efficient deep learning-based method, called LungINFseg, to segment the COVID-19 infections in lung CT images. Specifically, we propose the receptive-field-aware (RFA) module that can enlarge the receptive field of the segmentation models and increase the learning ability of the model without information loss. RFA includes convolution layers to extract COVID-19 features, dilated convolution consolidated with learnable parallel-group convolution to enlarge the receptive field, frequency-domain features obtained by discrete wavelet transform, which also enlarge the receptive field, and an attention mechanism to promote COVID-19-related features. Large receptive fields could help deep learning models to learn contextual information and COVID-19 infection-related features that yield accurate segmentation results. In our experiments, we used a total of 1800+ annotated CT slices to build and test LungINFseg. We also compared LungINFseg with 13 state-of-the-art deep learning-based segmentation methods to demonstrate its effectiveness. LungINFseg achieved a dice score of 80.34% and an intersection-over-union (IoU) score of 68.77%, higher than the scores of the other 13 segmentation methods. Specifically, the dice and IoU scores of LungINFseg were more than 10% better than those of the popular biomedical segmentation method U-Net.

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is still threatening humans worldwide. The World Health Organization (WHO) declared COVID-19 (the novel coronavirus disease) a global pandemic on 11 March 2020 [1]. Due to the unavailability of an appropriate vaccine or medicine, the early diagnosis of COVID-19 is crucial to saving many people's lives and protecting frontline workers. One of the gold-standard COVID-19 detection methods is RT-PCR (reverse transcription-polymerase chain reaction); however, the RT-PCR test is time-consuming and has low sensitivity [2].
Besides, RT-PCR testing capacity is not sufficient in all countries, and the required material is limited.

A growing number of research groups across the globe have shown that medical image segmentation algorithms based on deep learning have a tremendous capacity to help detect and segment COVID-19 infections in lung CT images. In [8], a deep learning-based method was suggested to segment COVID-19 infection by aggregating residual transformations and employing soft attention techniques to learn significant feature representations from lung CT images. In [9], an encoder-decoder network with feature variation and progressive atrous spatial pyramid pooling blocks was proposed to segment the infected region. A total of 21,658 annotated chest CT images (861 confirmed COVID-19 patients) were used to train the segmentation model. With CT images of 130 patients, a dice score of 72.60% was achieved. The authors of [10] investigated the effectiveness of deep learning models for segmenting pneumonia-infected areas in CT images for the detection of COVID-19. Specifically, they studied the efficacy of U-Net and a fully convolutional neural network (FCN) with CT images. With a dataset of 10 axial volumetric CT scans of confirmed COVID-19 pneumonia patients, the FCN model achieved an F1-score (dice score) of approximately 57%. In [11], a COVID-19 pneumonia lesion segmentation network, called COPLE-Net, was proposed to handle lesions with various scales and appearances. In this model, a noise-robust dice loss (a generalization of dice loss) was introduced. This segmentation model was trained and evaluated on images of 558 COVID-19 patients collected from 10 different hospitals, achieving a dice score of 80.29%. Fan et al. [12] employed a parallel partial decoder to aggregate features from high-level layers to generate coarse representations. Then, they used recurrent reverse attention and edge attention guidance approaches to model the boundaries of infected areas. In [12], Fan et al. also proposed a semi-supervised segmentation framework, Semi-Inf-Net, based on a randomly selected propagation strategy that needs only a few labeled pieces of data for training. The Semi-Inf-Net model achieved a dice score of 59.70% on nine real CT volumes with 638 slices. Muller et al. [13] used different preprocessing methods and on-the-fly data augmentation techniques to train the 3D U-Net architecture on a small CT image dataset. They achieved a dice score of 76.10% with 20 CT volumes.

As mentioned above, patchy ground-glass opacities and consolidations, along with pathological changes, limit the accuracy of the existing segmentation methods. The receptive field (field-of-view), the region of neurons in a particular layer that affects a neuron in the next layer, is a vital concept in designing CNNs. Large receptive fields could help deep learning models to learn contextual information and COVID-19 infection-related features that yield accurate segmentation results. The most common ways to enlarge the receptive field of a CNN are to increase the depth of the network, use pooling operations, and enlarge the sizes of the filters. Increasing the network depth or enlarging the filter sizes significantly increases the computational cost, and pooling operations yield information loss. Dilated convolution [14] is also employed to enlarge the receptive fields of CNNs by inserting zeros into the filters, which adds no computational cost.
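To make this concrete, the following minimal PyTorch sketch (our own illustration, not code from the original implementation) shows that a dilated 3 × 3 convolution covers a wider area than a standard one while keeping the same number of parameters:

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions with the same number of weights: dilation spreads
# the 9 taps over a wider area instead of adding parameters.
standard = nn.Conv2d(1, 1, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 1, 64, 64)
print(standard(x).shape, dilated(x).shape)  # both torch.Size([1, 1, 64, 64])

# The effective kernel size grows linearly with the dilation rate d:
# k_eff = k + (k - 1) * (d - 1), so a 3x3 filter acts on a (2d+1)x(2d+1) area.
for d in (1, 2, 3, 4):
    print(f"dilation={d}: effective kernel {2 * d + 1}x{2 * d + 1}")
```

Both layers hold nine weights per channel, yet the dilated one sees a 5 × 5 region, which is the property the RFA module described below exploits.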
In an attempt to address the problems stated above, we propose a fully automated and efficient deep learning-based method, called LungINFseg, to segment the COVID-19 infection in lung CT images. Specifically, we propose the receptive-field-aware (RFA) module that can enlarge the receptive field of segmentation models and increase the learning ability of the model without information loss. RFA comprises convolution layers to extract COVID-19 features, dilated convolution consolidated with learnable parallel-group convolution to enlarge the receptive field, frequency-domain features obtained by discrete wavelet transform (DWT), which also enlarge the receptive field, and an attention mechanism to promote COVID-19-related features. We compared LungINFseg with 13 state-of-the-art deep learning-based segmentation methods to demonstrate its effectiveness. The main contributions of this article are listed below:

- We propose a fully automated and efficient deep learning-based method to segment the COVID-19 infection in lung CT images.
- We propose the RFA module that can enlarge the receptive field of the segmentation models and increase the learning ability of the model without information loss.
- We present a comprehensive comparison with 13 state-of-the-art segmentation models, namely, FCN [15], UNet [16], SegNet [17], FSSNet [18], SQNet [19], ContextNet [20], EDANet [21], CGNet [22], ERFNet [23], ESNet [24], DABNet [25], Inf-Net [12], and MIScnn [26].
- Extensive experiments were performed to provide ablation studies that add a thorough analysis of the proposed LungINFseg (e.g., the effect of resolution size and variation of the loss function).

To reproduce the results, the source code of the proposed model is publicly available at https://github.com/vivek231/LungINFseg. This article is structured as follows: Section 2 explains the proposed LungINFseg model. Section 3 presents experimental results with an ablation study about the features of the proposed model. Finally, Section 4 concludes the article.

Figure 2 presents the framework of the proposed LungINFseg model, which includes encoder and decoder networks. LungINFseg receives CT images as input and produces binary masks highlighting the infected regions. The features of each encoder block are bypassed to the corresponding decoder block to preserve the spatial feature information. In the following sections, we explain each part in detail.

Figure 2. Framework of the proposed LungINFseg.
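As a structural illustration of this encoder-decoder layout with additive skip connections, here is a hedged PyTorch sketch; the class name, channel widths, and block internals are our own placeholders, with plain strided convolutions standing in for the RFA blocks described below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LungINFsegSkeleton(nn.Module):
    """Structural sketch of the Figure 2 layout (illustrative only):
    four encoder blocks, four decoder steps, additive skip connections."""

    def __init__(self, chs=(16, 32, 64, 128)):
        super().__init__()
        ins = (1,) + chs[:-1]
        # Stand-ins for the four RFA blocks, each halving the resolution.
        self.enc = nn.ModuleList([
            nn.Sequential(nn.Conv2d(i, o, 3, stride=2, padding=1), nn.ELU())
            for i, o in zip(ins, chs)])
        # Each decoder step: 1x1 convolution to shrink channels, then upsample x2.
        outs = chs[-2::-1] + (1,)  # (64, 32, 16, 1)
        self.dec = nn.ModuleList([
            nn.Conv2d(i, o, kernel_size=1) for i, o in zip(chs[::-1], outs)])

    def forward(self, x):
        skips = []
        for block in self.enc:
            x = block(x)
            skips.append(x)
        skips.pop()  # the deepest features feed the first decoder step directly
        for conv in self.dec:
            x = F.interpolate(conv(x), scale_factor=2, mode="bilinear",
                              align_corners=False)
            if skips:
                x = x + skips.pop()  # additive skip from the matching encoder block
        return torch.sigmoid(x)  # probability map for the infected regions

model = LungINFsegSkeleton()
probs = model(torch.randn(1, 1, 256, 256))
mask = (probs > 0.5).float()  # binary mask, same size as the input CT slice
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```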
Figure 3 shows the encoder network, which comprises four RFA blocks. As one can see, the CT images are fed into the encoder, and DWT is used to obtain a multi-band, multi-scale decomposition of the input lung CT images. The resulting DWT representations of the CT images serve as inputs to the RFA blocks. As the human visual system has unequal sensitivity to frequency components, inserting frequency information into deep learning-based COVID-19 infection segmentation models can significantly improve their performance. In this study, DWT was utilized to extract COVID-19 infection-relevant contextual information, enlarge the receptive field, and preserve image contextual and spatial information. The use of DWT can enlarge the receptive field of CNNs and also increase the amount of data, which enhances the training process. DWT uses filter banks to capture both time and frequency resolutions at the same time [27].

In this work, we use 2D DWT with four Haar filters, namely,

$$f_{LL} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad f_{LH} = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}, \quad f_{HL} = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}, \quad f_{HH} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix},$$

to decompose a particular lung CT image $x$ into four sub-bands, i.e., $x_{LL}$, $x_{LH}$, $x_{HL}$, and $x_{HH}$, as shown in Figure 4. The decomposition process can be expressed as follows [28]:

$$x_{LL} = (f_{LL} \otimes x)\downarrow_2, \quad x_{LH} = (f_{LH} \otimes x)\downarrow_2, \quad x_{HL} = (f_{HL} \otimes x)\downarrow_2, \quad x_{HH} = (f_{HH} \otimes x)\downarrow_2, \qquad (1)$$

where $\otimes$ denotes convolution and $\downarrow_2$ denotes downsampling by a factor of 2. As shown in Figure 4, the input CT image is convolved with low-pass and high-pass filters. While the output of each filter contains half the frequency content, it has the same size as the input CT image. Therefore, the outputs of the low and high branches together comprise the same frequency content as the input CT image; however, the amount of data is doubled, which improves the training process of the proposed model (a kind of data augmentation). Figure 5 shows a zoomed-in visualization of the decomposition of a CT image into four sub-bands using DWT.

It should be noted that DWT is related to the pooling operation and to dilated filtering [29]. Assume that we apply average pooling with a factor of 2 to an input image $x$; we get

$$x_{\text{pooling}}(i, j) = \big(x(2i-1, 2j-1) + x(2i-1, 2j) + x(2i, 2j-1) + x(2i, 2j)\big)/4.$$

As one can see from Equation (1), the DWT decomposition is connected to average pooling: for example, the only difference between $x_{LL}$ and $x_{\text{pooling}}$ is the fixed coefficient 1/4. In turn, the decomposition of an image into sub-images using DWT is closely connected to dilated filtering.
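A minimal PyTorch sketch of Equation (1), assuming a single-channel input (the function name haar_dwt is ours), is shown below; the final assertion checks the stated relation between $x_{LL}$ and average pooling:

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One-level 2D Haar DWT of a batch of single-channel images.

    Implements Equation (1): each sub-band is the input filtered with one
    of the four Haar filters and downsampled by a factor of 2. Note that
    PyTorch's conv2d computes cross-correlation; for these 2x2 filters
    this matches the Haar analysis up to a sign in the detail bands.
    Input x: (B, 1, H, W); output: (B, 4, H/2, W/2), ordered LL, LH, HL, HH.
    """
    f_ll = torch.tensor([[1., 1.], [1., 1.]])
    f_lh = torch.tensor([[-1., -1.], [1., 1.]])
    f_hl = torch.tensor([[-1., 1.], [-1., 1.]])
    f_hh = torch.tensor([[1., -1.], [-1., 1.]])
    filters = torch.stack([f_ll, f_lh, f_hl, f_hh]).unsqueeze(1)  # (4, 1, 2, 2)
    return F.conv2d(x, filters, stride=2)

x = torch.randn(1, 1, 256, 256)
bands = haar_dwt(x)
print(bands.shape)  # torch.Size([1, 4, 128, 128])

# x_LL differs from 2x2 average pooling only by the fixed coefficient 1/4.
assert torch.allclose(bands[:, :1] / 4.0, F.avg_pool2d(x, 2), atol=1e-6)
```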
Figure 6 shows the structure of the RFA module, which includes convolutional layers obtained from ResNet-18 pre-trained on ImageNet [30], a learnable parallel dilated group convolutional (LPDGC) block, and a feature attention module (FAM). The RFA encoding layers can learn low-level features from lung CT images, such as spatial information (e.g., shape, edge, intensity, and texture), in the training phase. As one can see in Figure 6, the RFA block receives two inputs. Input 1 represents the features extracted in the previous layers (except in the first RFA block, where input 1 represents the input CT images). Input 2 represents the DWT decompositions of the input CT image. Input 1 is fed into a convolution layer with a kernel of size 3 × 3 and a stride of 1. The resulting features are summed with input 2 and then fed into the LPDGC and FAM modules. Note that the DWT features are resized to the size of input 1 using bilinear interpolation before the summation.

Figure 7 illustrates how receptive fields with varying dilation rates can capture the small and relevant regions in CT images. In this work, we propose the use of the LPDGC block, in which the conventional convolutional filters employed in the parallel dilated group convolutional (PDGC) block are replaced by a fully learnable group convolution mechanism [31]. Figure 8 shows the architecture of the LPDGC block, which comprises four group convolution (G-conv) layers with different dilation rates (1, 2, 3, and 4), each followed by an exponential linear unit (ELU) activation function. The kernel size of each G-conv layer is 3 × 3. The main goal of learnable group convolution methods is to design a dynamic and efficient mechanism for group convolution, in which the input channels and filters of each group are learned during the training phase. In general, the grouping structure can be expressed as two binary selection matrices for channels ($S_k$) and filters ($T_k$), as follows:

$$S_k \in \{0, 1\}^{C \times G}, \qquad (2)$$

$$T_k \in \{0, 1\}^{N \times G}, \qquad (3)$$

Figure 8. Illustration of the LPDGC block. Here, p and d refer to the padding and dilation rates, respectively. ELU refers to the exponential linear unit activation function.

The size of $S_k$ is $C \times G$, and the size of $T_k$ is $N \times G$, where $C$, $N$, and $G$ refer to the numbers of channels, filters, and groups, respectively. It should be noted that the elements of $S_k$ and $T_k$ are set to 1 or 0 during the training process, where $s_k(i, j) = 1$ indicates that the $i$th channel is assigned to the $j$th group. Similarly, $t_k(i, j) = 1$ indicates that the $i$th filter is assigned to the $j$th group. The elements of $S_k$ and $T_k$ are learned during the training process of the CNN. As shown in Figure 8, the outputs of the four dilated convolutions are aggregated through an element-wise sum operation. Consequently, the size of the receptive field is increased and multi-scale spatial dependencies are considered without resorting to fully connected layers, which would be computationally infeasible. The LPDGC block helps capture the global context in CT images without reducing the resolution of the segmentation map.
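The following simplified PyTorch sketch captures the parallel dilated branches and the element-wise sum; note that it substitutes PyTorch's static grouped convolution for the fully learnable grouping of [31] (in which $S_k$ and $T_k$ are trained), so it is an approximation rather than the paper's exact block:

```python
import torch
import torch.nn as nn

class LPDGCBlock(nn.Module):
    """Simplified parallel dilated group convolution block.

    Four 3x3 grouped convolutions with dilation rates 1-4 run in parallel;
    padding equals the dilation rate so every branch keeps the spatial size,
    and the branch outputs are fused by an element-wise sum. Standard
    grouped convolution with fixed groups stands in for the fully learnable
    grouping (the matrices S_k and T_k) used in the paper.
    """
    def __init__(self, channels, groups=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, groups=groups),
                nn.ELU(),
            )
            for d in (1, 2, 3, 4)
        ])

    def forward(self, x):
        out = self.branches[0](x)
        for branch in self.branches[1:]:
            out = out + branch(x)  # element-wise sum over the four branches
        return out

block = LPDGCBlock(channels=64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32]) - resolution is preserved
```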
Feature attention modules (FAMs) [32] have recently been used to encourage CNNs to learn and focus on task-relevant information instead of learning non-useful information (background, non-desired objects, etc.). As one can see in Figure 9, FAM computes the final feature of each channel as a weighted sum of the features of all channels and the original features, which helps boost COVID-19-relevant information and learn semantic dependencies between all feature maps. As shown in Figure 9 (the lower branch), the input feature tensor $Y \in \mathbb{R}^{C \times H \times W}$ is reshaped and multiplied by its transpose $Y^T$, and the resulting matrix is fed into a softmax layer to obtain the channel attention map $X \in \mathbb{R}^{C \times C}$. The final output $O$ is obtained as follows:

$$O = \beta \, R^{-1}\big(X \, R(Y)\big) \oplus Y, \qquad (4)$$

where $\beta$ is the weight factor, $\oplus$ refers to the element-wise sum operation, $R(\cdot)$ refers to reshaping $Y$ to $\mathbb{R}^{C \times N}$ (with $N = H \times W$), and $R^{-1}(\cdot)$ reshapes the result back to $\mathbb{R}^{C \times H \times W}$.

Figure 9. Diagram of our feature attention module (FAM).
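A compact PyTorch sketch of this channel attention (our own rendering of Equation (4) in the style of [32]; the weight factor β is implemented here as a learnable scalar initialized to zero) could look like this:

```python
import torch
import torch.nn as nn

class FAM(nn.Module):
    """Channel-wise feature attention in the style of [32].

    The input Y (C x H x W) is reshaped to C x N with N = H * W; the
    product Y Y^T followed by a softmax gives the C x C attention map X.
    The attended features are scaled by a learnable beta and added to Y.
    """
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))  # learnable weight factor
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, y):
        b, c, h, w = y.shape
        y_flat = y.view(b, c, -1)                                # R(Y): B x C x N
        attn = self.softmax(torch.bmm(y_flat, y_flat.transpose(1, 2)))  # B x C x C
        out = torch.bmm(attn, y_flat).view(b, c, h, w)           # X R(Y), reshaped
        return self.beta * out + y                               # Equation (4)

fam = FAM()
print(fam(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```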
Figure 10 shows the architecture of the decoder network, which consists of four main decoding blocks. The fully convolutional approach proposed in [15] is employed. The first convolutional layer uses a 1 × 1 kernel to decrease the overall computational cost. Upsampling layers with a factor of 2 are used to upsample the resulting features, which are then added to the features coming from the corresponding encoder layers via skip connections (as shown in Figure 2). A threshold of 0.5 is employed to convert the output to binary masks. The segmented binary mask has the same size as the input image. Table 1 describes the architecture of LungINFseg. We present the layers of the encoder and decoder, including the input and output feature maps with the number of strides, kernel size, and padding. It should be noted that the input of each encoder block is bypassed to the output of its corresponding decoder block to recover the spatial feature information [33].

In this work, we used block-wise loss (BWL) and total loss (TL) functions. In the case of the BWL function, we used the dice loss function to compare the features extracted by each RFA block. The BWL function can be formulated as follows:

$$\mathcal{L}_{BWL} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{H} \sum_{c=1}^{H} \big(1 - \text{dice}(y_{ci}, \hat{y}_{ci})\big), \qquad (5)$$

where $N$ is the number of RFA blocks, $H$ is the number of channels generated by RFA block $i$, $y$ represents the ground truth, $y_{ci}$ represents the corresponding feature maps, $\hat{y}$ is the predicted mask, $\hat{y}_{ci}$ represents the feature maps generated by the RFA blocks, and dice is the Dice coefficient, which can be expressed as follows:

$$\text{dice}(y, \hat{y}) = \frac{2 \sum y \hat{y}}{\sum y + \sum \hat{y}}. \qquad (6)$$

Regarding the TL function, we calculated the loss of the whole network as follows:

$$\mathcal{L}_{TL} = \mathcal{L}_{BCE}(y, \hat{y}) + \big(1 - \text{dice}(y, \hat{y})\big), \qquad (7)$$

where $\mathcal{L}_{BCE}$ denotes the binary cross-entropy loss. The overall loss (OL) function used for training the proposed model is formulated as:

$$\mathcal{L}_{OL} = \mathcal{L}_{TL} + \mathcal{L}_{BWL}. \qquad (8)$$
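Below is a hedged PyTorch sketch of Equations (5)-(8); the exact pairing of the ground truth with each RFA block's feature maps in Equation (5) is not fully specified in the text, so downsampling the mask to each block's resolution is our assumption (the helper names are ours as well):

```python
import torch
import torch.nn.functional as F

def dice_coeff(pred, target, eps=1e-6):
    """Soft Dice coefficient (Equation (6)) between prediction and ground truth."""
    inter = (pred * target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def total_loss(pred, target):
    """TL (Equation (7)): binary cross-entropy plus Dice loss.
    `pred` is expected to contain probabilities in [0, 1]."""
    return F.binary_cross_entropy(pred, target) + (1.0 - dice_coeff(pred, target))

def block_wise_loss(block_feats, target):
    """BWL (Equation (5)): Dice loss averaged over the channels of each RFA
    block. Resizing the mask to each block's resolution is our assumption."""
    loss = 0.0
    for feats in block_feats:  # feats: (B, H_i, h, w) from one RFA block
        y = F.interpolate(target, size=feats.shape[-2:], mode="nearest")
        channel_losses = [1.0 - dice_coeff(torch.sigmoid(feats[:, c]), y[:, 0])
                          for c in range(feats.shape[1])]
        loss = loss + torch.stack(channel_losses).mean()
    return loss / len(block_feats)

def overall_loss(pred, block_feats, target):
    """OL (Equation (8)): the sum of the total loss and the block-wise loss."""
    return total_loss(pred, target) + block_wise_loss(block_feats, target)

# Example: a predicted mask plus feature maps from four hypothetical RFA blocks.
pred = torch.rand(2, 1, 256, 256)
target = (torch.rand(2, 1, 256, 256) > 0.5).float()
feats = [torch.randn(2, 8, 256 // 2 ** i, 256 // 2 ** i) for i in range(1, 5)]
print(overall_loss(pred, feats, target))
```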
To assess the performance of the segmentation models, five evaluation metrics were used: accuracy (ACC), dice coefficient (DSC), intersection over union (IoU), sensitivity (SEN), and specificity (SPE). The formulations of these metrics are given in Table 2.

Table 2. Metrics used to evaluate the segmentation methods.

In this section, the experimental details, the ablation study, the results of the proposed model, and the comparisons with state-of-the-art models are provided. To evaluate the efficacy of the proposed model, we employed the publicly available dataset provided in [34], which contains 20 labeled COVID-19 CT scans (1800+ annotated slices). This dataset can be found at https://zenodo.org/record/3757476#.X-T7P3VKhhE. The left lung, right lung, and infections were marked by two radiologists and confirmed by an experienced radiologist. The dataset was divided (patient-wise) into three subsets: 70% for training, 10% for validation, and 20% for testing.

Data augmentation techniques were applied during the training phase to improve the performance and robustness of the model. To augment the CT dataset, we conducted the following procedures: (1) we scaled the images by varying the scaling variable from 0.5 to 2.0 with a step size of 0.25, (2) we employed gamma correction on the CT slices by changing the gamma scaling constant from 0.5 to 1.5 with a step size of 0.5, and (3) we performed flipping operations (horizontal and vertical) with a probability of 0.5 and rotated the images by various angles, such as 15°. Besides, the lung CT images were resized to 256 × 256 pixels. Finally, we normalized each wavelet sub-band to [0, 1] to obtain the input of the corresponding binary segmentation network. It should be noted that LungINFseg processes each CT volume slice by slice.

The hyperparameters of the model were tuned empirically. We examined numerous optimizers, such as SGD, AdaGrad, Adadelta, RMSProp, and Adam, while varying the learning rate; we obtained the best outcomes with the Adam optimizer with β1 = 0.5, β2 = 0.999, and a learning rate of 0.0002 with a batch size of four. We trained all segmentation models from scratch for 100 epochs. The experiments were carried out on an NVIDIA GeForce GTX 1070Ti with 8 GB of video RAM. The operating system was Ubuntu 18.04 running on a 3.4 GHz Intel Core-i7 with 16 GB of RAM. The main required packages include Python 3.6, CUDA 9.1, cuDNN 7.0, and PyTorch 0.4.1. To reproduce the results, the source code of the proposed model is publicly available at https://github.com/vivek231/LungINFseg.

To demonstrate the impact of each block on the performance of the proposed model, an ablation study was conducted. We first trained a baseline model without the discrete wavelet transform (DWT), the learnable parallel dilated group convolutional (LPDGC) block, or the feature attention module (FAM). Next, we added DWT to the baseline model (baseline + DWT). Besides, the LPDGC block was added separately to the baseline model (baseline + LPDGC). Apart from this, we also added FAM to each encoding layer of the baseline model (baseline + FAM). Several configurations were investigated, such as baseline + DWT + LPDGC and baseline + DWT + FAM. Finally, we studied the performance of the proposed model with and without data augmentation.

Table 3 presents the results of the different configurations of the examined models. The baseline model yielded DSC and IoU scores of 75.56% and 61.96%, respectively. From this initial check, there was room for improvement in model performance. Instead of feeding a single gray-scale channel from the lung CT images, we substituted the encoder input with the DWT decomposition (baseline + DWT); note that DWT produces four channels that carry multi-scale (multi-band) features. Baseline + DWT achieved gains of 1% and 1.5% in DSC and IoU scores, respectively, when compared to the baseline model. Furthermore, the LPDGC block was added to the baseline model to expand the receptive field with varying dilation rates and kernel sizes, allowing dense feature extraction in the encoder. Figure 11 reveals that the LPDGC block can help capture some small infected regions in lung CT images. Baseline + LPDGC yielded clear improvements of 1.5% and 2% in the DSC and IoU scores, respectively, when compared to the baseline model. Baseline + FAM yielded an enhancement in all evaluation metrics, as it achieved 1.5-2% improvements in DSC, IoU, and SEN scores, meaning that the FAM block helps improve the feature discriminability between a given COVID-19-infected region and neighboring healthy pixels. Based on the significant contribution of each block, the DWT and LPDGC blocks were combined with the baseline model, which led to improvements of more than 2% in DSC, IoU, and SEN scores, and a decrease in the standard deviation of 1%. Besides, we added DWT and FAM to the baseline model, which allowed us to create descriptive features that highlight the infected region in poor-contrast or fuzzy-boundary CT images. The experiments revealed that this configuration yields small increases in the evaluation metrics compared to the previous results. Using the proposed LungINFseg model, we experimented with varying configurations with and without applying data augmentation during the training procedure. Without data augmentation (w/o augmentation), LungINFseg obtained encouraging 3% improvements in DSC and IoU scores when compared to the baseline model. Finally, we utilized data augmentation (with augmentation) with LungINFseg. The performance of LungINFseg was improved by 5-6% in DSC, IoU, and SEN scores, and its standard deviation was reduced from ±0.12 to ±0.10. These effects reveal that LungINFseg provides more precise and robust segmentation than the baseline model.

Table 4 presents the effect of the input image resolution (512 × 512, 384 × 384, 256 × 256, and 128 × 128) on the performance of the proposed model. With an image resolution of 512 × 512, a 16 × 16 feature map was produced at the final encoding layer, which extracts infected-region features from the CT images. However, the use of higher-resolution images retains some artifacts in the segmented masks, leading to DSC and IoU scores of 78.74% and 66.48%, respectively. With an image resolution of 384 × 384, a 12 × 12 feature map was produced at the final encoding layer. This image resolution did not improve the results. In turn, the image size of 256 × 256 yielded an 8 × 8 feature map at the final encoding layer. This feature map preserves infected-area-associated features and discards the irrelevant ones. Lastly, we examined an input size of 128 × 128; we found that it yielded unclear boundaries in the segmented masks.

Table 5 presents the performance of the proposed model with different combinations of loss functions: BCE (i.e., TL without dice loss, Equation (7)), BCE + IoU-binary, BCE + SSIM [35], BCE + dice loss (i.e., TL, Equation (7)), and TL + BWL (i.e., OL, Equation (8)). As shown, all loss functions achieved a dice score higher than 73%. The IoU-binary and SSIM loss functions did not achieve promising IoU scores (60.22% and 58.18%, respectively); the convergence of these two loss functions is not strong enough to reach optimal performance. The best dice and IoU scores were achieved with OL, and it has therefore been utilized with the proposed model.

To segment the COVID-19 infection in lung CT images, LungINFseg was compared with state-of-the-art segmentation models: the FCN [15], UNet [16], SegNet [17], FSSNet [18], SQNet [19], ContextNet [20], EDANet [21], CGNet [22], ERFNet [23], ESNet [24], DABNet [25], Inf-Net [12], and MIScnn [26] models. All these models were assessed both quantitatively and qualitatively. For the quantitative study, segmentation accuracy was computed using ACC, DSC, IoU, SEN, and SPE. For a fair comparison, the number of trainable parameters of each evaluated model is also provided. In turn, for the qualitative study, the predictions and their corresponding ground-truth binary masks were compared visually.

As shown in Table 6, LungINFseg achieved the highest DSC of 80.34% and the highest IoU of 68.77%. In terms of IoU, LungINFseg improved significantly over the best competitor, FCN, from 60.87% to 68.77% on the test set. Besides, the second-best competitor, DABNet, obtained DSC and IoU scores of 74.03% and 60.03%, respectively; its depth-wise asymmetric bottleneck module generates a sufficient receptive field and densely utilizes the contextual information. In comparison with the very popular baseline biomedical segmentation model UNet, LungINFseg exceeds it by more than 10% in both DSC and IoU scores. Additionally, SegNet achieved notably poor outcomes in all metrics, as it produced a large number of false positives and thus could not segment accurately. In turn, FSSNet has very few parameters (0.17 M); it yielded a 67.89% DSC score but failed to restore the infected region's spatial information at the output level. Similarly, SQNet did not perform well: LungINFseg yielded more than 22% improvements in DSC and IoU scores compared to it. Besides, ContextNet produced poor results, as it fails to retain the global context information efficiently, and LungINFseg shows 10% gains in DSC and IoU scores over it. Nevertheless, EDANet performed slightly better, with a 70.32% DSC score, because its dilated convolutions and dense connectivity help attain a better result. Further, CGNet showed some improvement due to its ability to jointly learn local features and the surrounding context. However, it fails to capture enough global information to form effective segmentations. This model yields 71.21% DSC and 56.83% IoU scores, while LungINFseg achieves promising increases of 9% and 12% in DSC and IoU scores, respectively.
Two models, ERFNet and ESNet, employ residual 1-D factorized convolutions in their encoding layers to extract important features and help decrease the computational cost (2.06 M and 1.65 M parameters for ERFNet and ESNet, respectively). However, the extracted features do not contribute significantly to feature learnability, and LungINFseg achieved better results by around 7%, 9%, and 12% in DSC, IoU, and SEN scores, respectively. Moreover, we compared the results of our model with the Inf-Net model [12]. As one can see, LungINFseg yields a significant 12% improvement in the DSC score. Additionally, we trained MIScnn [26] from scratch and then compared it with LungINFseg, finding that our model outperforms MIScnn in terms of all evaluation metrics. Unlike the models mentioned above, LungINFseg has a great generalization ability for segmenting infected areas in lung CT images, thanks to the RFA and DWT modules that enlarge the receptive field of the segmentation model and increase its learning ability without information loss.

To demonstrate the ability of LungINFseg, we present illustrative statistics of the Dice and IoU scores. In Figure 13, we show the boxplots of the Dice and IoU scores of the proposed model, FCN, UNet, SegNet, FSSNet, SQNet, ContextNet, EDANet, CGNet, ERFNet, ESNet, and DABNet. As shown in Figure 13, among the tested models, the proposed model has the highest mean DSC and IoU scores and the smallest standard deviation, with few outliers. In turn, the rest of the models show multiple outliers, with lower means and higher standard deviations than LungINFseg.

In Figures 14 and 15, for example, the first, third, and sixth rows contain single confirmed infected regions on both sides of the lung, including a very small area on the left side. We can clearly see that LungINFseg is capable of correctly segmenting both sides of the lungs, whereas the other models produce larger false-positive regions and thus inaccurate segmentations. Furthermore, the fourth and fifth rows, for example, show widespread infection on both sides of the lung. LungINFseg segmented these cases quite properly, and FCN also produced an acceptable segmentation. However, UNet, FSSNet, ContextNet, EDANet, CGNet, and ESNet generated very poor predictions due to the lack of detail contained in their low-level context information. Moreover, the second-row predictions present single small areas of infection, where LungINFseg shows its promising ability to properly segment infected areas while the other compared methods produce larger false-positive predictions.

Figure 16 presents a qualitative comparison of the segmentation results of the LungINFseg, Inf-Net, and MIScnn models. As one can see in the examples of the second, third, fourth, and sixth columns, LungINFseg can accurately segment the COVID-19 infection and has fewer false positives than the Inf-Net and MIScnn models. The proposed model is especially useful for the segmentation of infections with indefinite boundaries and small targets.

Figure 16. Qualitative comparison of the segmentation results of LungINFseg, Inf-Net, and MIScnn. Here, the left and right numbers on each example refer to the dice and IoU scores, respectively. The colors represent the segmentation results as follows: TP (orange), FP (green), FN (red), and TN (black).

In this article, we have introduced an efficient deep learning-based model, LungINFseg, to segment the COVID-19 infection in lung CT images.
Specifically, we have proposed the RFA module that can enlarge the receptive field of the segmentation models and increase the learning ability of the model without any information loss. We conducted extensive experiments that used 1800+ annotated CT slices to build and test LungINFseg. Further, we compared LungINFseg with 13 state-of-the-art deep learning-based segmentation methods to demonstrate its effectiveness. LungINFseg achieved a dice score of 80.34% and an IoU score of 68.77%, which are higher than those of the other 13 segmentation methods. Our experiments revealed that the RFA module, which enlarges receptive fields and encourages learning contextual information and COVID-19 infection-related features, yields accurate segmentation results. We found that LungINFseg can segment infected regions in CT images accurately and may have promising clinical potential. In future work, we will integrate our proposed model into a fully automated CAD system for making accurate predictions of the severity of COVID-19. Besides, we will apply LungINFseg to different medical image segmentation problems, such as lung lobe segmentation, skin lesion segmentation, and breast tumor segmentation in ultrasound images.

The dataset used in this study can be found at https://zenodo.org/record/3757476#.X-T7P3VKhhE.

References

[1] World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard; World Health Organization.
[2] CT imaging and differential diagnosis of COVID-19.
[3] Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases.
[4] Coronavirus disease 2019 (COVID-19): A systematic review of imaging findings in 919 patients.
[5] Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments.
[6] A critic evaluation of methods for COVID-19 automatic detection from X-ray images.
[7] Chest CT in patients with a moderate or high pretest probability of COVID-19 and negative swab.
[8] Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19.
[9] You, Z. COVID-19 Chest CT Image Segmentation: A Deep Convolutional Neural Network Solution. arXiv 2020.
[10] Deep learning models for COVID-19 infected area segmentation in CT images.
[11] A Noise-robust Framework for Automatic Segmentation of COVID-19 Pneumonia Lesions from CT Images.
[12] Automatic COVID-19 Lung Infection Segmentation from CT Scans. arXiv 2020.
[13] Automated Chest CT Image Segmentation of COVID-19 Lung Infection based on 3D U-Net.
[14] Multi-scale context aggregation by dilated convolutions. arXiv 2015.
[15] Fully convolutional networks for semantic segmentation.
[16] U-Net: Convolutional networks for biomedical image segmentation.
[17] SegNet: A deep convolutional encoder-decoder architecture for image segmentation.
[18] Fast semantic segmentation for scene perception.
[19] Speeding up semantic segmentation for autonomous driving.
[20] ContextNet: Exploring context and detail for semantic segmentation in real-time.
[21] Efficient dense modules of asymmetric convolution for real-time semantic segmentation.
[22] CGNet: A light-weight context guided network for semantic segmentation.
[23] ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation.
[24] ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation.
[25] Depth-wise asymmetric bottleneck for real-time semantic segmentation.
[26] A Framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning. arXiv 2019.
[27] The wavelet transform, time-frequency localization and signal analysis.
[28] A theory for multiresolution signal decomposition: The wavelet representation.
[29] Multi-level wavelet convolutional neural networks.
[30] ImageNet: A large-scale hierarchical image database.
[31] Fully learnable group convolution for acceleration of deep neural networks.
[32] Dual attention network for scene segmentation.
[33] Exploiting encoder representations for efficient semantic segmentation.
[34] COVID-19 CT Lung and Infection Segmentation Dataset.
[35] Image quality assessment: From error visibility to structural similarity.

Acknowledgments: The Spanish Government partly supported this research through project PID2019-105789RB-I00. The authors declare no conflict of interest.