title: Dual-Consistency Semi-Supervised Learning with Uncertainty Quantification for COVID-19 Lesion Segmentation from CT Images
authors: Li, Yanwen; Luo, Luyang; Lin, Huangjing; Chen, Hao; Heng, Pheng-Ann
date: 2021-04-07

The novel coronavirus disease 2019 (COVID-19), characterized by atypical pneumonia, has caused millions of deaths worldwide. Automatically segmenting lesions from chest computed tomography (CT) is a promising way to assist doctors in COVID-19 screening, treatment planning, and follow-up monitoring. However, voxel-wise annotations are extremely expertise-demanding and scarce, especially for novel diseases, while an abundance of unlabeled data may be available. To tackle the challenge of limited annotations, in this paper we propose an uncertainty-guided dual-consistency learning network (UDC-Net) for semi-supervised COVID-19 lesion segmentation from CT images. Specifically, we present a dual-consistency learning scheme that simultaneously imposes image transformation equivalence and feature perturbation invariance to effectively harness the knowledge in unlabeled data. We then quantify the segmentation uncertainty in two forms and employ them together to guide the consistency regularization toward more reliable unsupervised learning. Extensive experiments show that our proposed UDC-Net improves upon the fully supervised method by 6.3% in Dice and outperforms other competitive semi-supervised approaches by significant margins, demonstrating high potential for real-world clinical practice.

By the end of 2020, the coronavirus disease 2019 [36], characterized by atypical pneumonia, had spread over 220 countries and areas, infected more than 81 million people, and caused nearly 1.8 million deaths. For early screening of COVID-19, chest computed tomography (CT) plays a vital role as a noninvasive and fast technique, reported to have high sensitivity for detecting COVID-19-related abnormal findings [6, 1, 13, 7]. To improve screening efficiency and alleviate radiologists' reading burden, various automatic COVID-19 chest CT analysis methods have been proposed, ranging from whole-volume classification and triage [20, 27, 8, 17, 4] and weakly-supervised lesion localization [16, 31] to accurate segmentation of lesion regions [5, 26]. Among previous studies, segmentation of COVID-19 lesions often provides a more accurate description of the disease, with significant potential for assisting doctors in diagnosis, treatment planning, and follow-up monitoring.

Currently, advanced segmentation methods are often fully supervised and rely heavily on pixel-wise or voxel-wise annotations. For novel diseases like COVID-19, acquiring such annotations is extremely expertise-demanding and time-consuming, while unlabeled data are often abundant due to increasing positive cases. Therefore, semi-supervised learning (SSL), which utilizes both labeled and unlabeled data, is of great value for developing robust and accurate COVID-19 lesion segmentation algorithms.

Thus far, many SSL approaches have been developed and successfully applied to various tasks [25]. Many works [23, 19, 2, 9, 14] adopt the smoothness assumption that two data samples that are close in the input space share the same label. This assumption has been further extended to the deep feature space, where similarities of feature maps are used for cluster assignment [28, 21, 29].
Despite these achievements, such approaches do not ensure that the model learns robustly from samples with low uncertainty. To reduce the influence of uncertain samples, uncertainty guidance has been introduced into the SSL literature [34, 33, 30, 15]. Nevertheless, semi-supervised segmentation of COVID-19 lesions remains challenging: annotations are extremely scarce, and the lesions often have irregular and ambiguous contours.

To tackle the above challenges, we propose a novel deep neural network with an uncertainty-guided dual-consistency learning scheme for COVID-19 lesion segmentation from chest CT volumes. Specifically, we impose image-level transformation equivalence based on the observation that the prediction for a sample should undergo the same transformation as the input. Meanwhile, we enforce feature-level perturbation invariance in a multi-decoder V-Net, where auxiliary decoder paths take perturbed features as inputs and are required to be consistent with a main decoder. Dual consistency thus enforces the smoothness assumption on the SSL model in both the input space and the feature space, so the network can learn representations that are invariant to diverse input and feature variations. Moreover, deep neural networks can memorize and easily overfit noisy and uncertain contour points of COVID-19 lesions [35], which leads to poor generalization in real-world clinical practice. Hence, we further introduce a novel uncertainty guidance into the consistency learning process. In particular, we quantify both the confidence uncertainty and the consensus uncertainty based on the multi-decoder structure. The estimated uncertainties are then used together in an indicator function to filter out uncertain samples during training.

Fig. 1: Overview of UDC-Net. Feature-level consistency (in green) is formed between the main decoder's prediction $p_U$ and the auxiliary decoders' predictions $\{q_U^1, \dots, q_U^K\}$. Image-level consistency (in blue) is formed between $p_U$ and the prediction $\tilde{p}_U$ for the transformed image. The confidence uncertainty $u^m$ and the consensus uncertainty $u^s$ are quantified by the mean and standard deviation of the multi-decoder predictions, which are then used to guide the consistency learning (in red). A supervised loss is also applied to the labeled data (in orange).

The proposed uncertainty-guided dual-consistency network (UDC-Net) is evaluated on a large-scale COVID-19 dataset with 852 whole-volume chest CT scans. Extensive experiments show that our approach outperforms other competitive SSL-based segmentation approaches, yielding state-of-the-art performance on semi-supervised COVID-19 lesion segmentation.

As shown in Fig. 1, UDC-Net uses a modified 3D multi-decoder V-Net [18] as its backbone. Apart from the supervised loss, our method makes full use of the unlabeled data through both feature-level and image-level consistency modules. Moreover, both the confidence uncertainty and the consensus uncertainty are estimated to guide more robust consistency learning.

Image-level consistency learning via transformation equivalence requires that, for a deep segmentation model $f_{seg}$ and a transformation $T(\cdot)$ applied to an input image $x$, $f_{seg}(T(x)) = T(f_{seg}(x))$ [32]. We apply a random transformation to each image to obtain the perturbed version $T(x)$ as input to the network. The V-Net then produces the corresponding prediction $f(T(x))$, and applying the inverse transformation yields $T^{-1}(f(T(x)))$, which should be consistent with the prediction $f(x)$ for the untransformed input. Let $p = f(x)$ and $\tilde{p} = T^{-1}(f(T(x)))$; we introduce an image-level consistency regularization by minimizing the L2 loss between the two outputs:

$$\mathcal{L}_{ic} = \frac{1}{N} \sum_{i=1}^{N} \left( p_i - \tilde{p}_i \right)^2,$$

where $i$ and $N$ are the voxel index and the total number of voxels, respectively.
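For concreteness, the following is a minimal PyTorch sketch of this image-level consistency term, assuming a flip along one axis as the transformation $T$ and a generic `model` that outputs voxel-wise probabilities; the function name and the choice of flip are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def image_consistency_loss(model, x):
    """L_ic sketch: penalize disagreement between f(x) and T^-1(f(T(x))).

    T is a flip along the last spatial axis; a flip is its own inverse,
    so T^-1 is simply a second flip of the prediction.
    """
    p = model(x)                          # f(x): prediction on the original volume
    x_t = torch.flip(x, dims=[-1])        # T(x): transformed input
    p_t = model(x_t)                      # f(T(x))
    p_tilde = torch.flip(p_t, dims=[-1])  # T^-1(f(T(x)))
    return F.mse_loss(p_tilde, p)         # voxel-wise L2 consistency
```

Any invertible spatial transformation (rotations, scaling) could be substituted for the flip, provided its inverse is applied to the prediction before comparison.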
Feature-level consistency learning via perturbation invariance can also enrich the learned representations of the model [19]. In particular, different perturbed versions of the same feature map should yield the same prediction. Following [21], we append several auxiliary decoders to the V-Net and inject various types of perturbations into the shared encoder's output. Each auxiliary decoder receives a different perturbed version of the feature map, while the main decoder receives the unperturbed feature map. Denoting the prediction from the main decoder as $p$ and the prediction from the $k$-th auxiliary decoder as $q^k$, feature-level consistency is achieved by regularizing $p$ against each $q^k$:

$$\mathcal{L}_{fc} = \frac{1}{KN} \sum_{k=1}^{K} \sum_{i=1}^{N} \left( p_i - q_i^k \right)^2,$$

where $K$ is the total number of auxiliary decoders. Following [21], seven types of feature perturbations, i.e., feature noise, feature dropout, object masking, context masking, guided cutout, intermediate VAT, and random dropout, were applied to seven auxiliary decoders, respectively. Detailed descriptions of each perturbation strategy can be found in the supplementary material (Table 4). All auxiliary decoders are required to generate predictions consistent with the main decoder.

Perturbing the hidden representations during consistency learning can amplify feature noise and the uncertainty caused by the difficulty of accurately delineating COVID-19 lesion contours. We therefore propose to quantify both the confidence uncertainty and the consensus uncertainty of the multi-decoder outputs to guide more robust unsupervised learning.

Confidence uncertainty indicates whether the model generates confident predictions. Previous works [34, 15] used the entropy of the mean prediction over multiple perturbed inputs to self-ensembling models to estimate prediction uncertainty. In our case, this form of uncertainty can be quantified directly from the main decoder and the $K$ auxiliary decoders:

$$\mu_i = \frac{1}{K+1} \Big( p_i + \sum_{k=1}^{K} q_i^k \Big), \qquad u_i^m = -\mu_i \log \mu_i - (1 - \mu_i) \log (1 - \mu_i),$$

where $i$ is the voxel index, $K$ is the total number of auxiliary decoders, $\mu$ is the mean prediction, and $u^m$ is the estimated uncertainty. The higher $u_i^m$ is, the less confident the model is in its prediction.

Consensus uncertainty indicates whether the model generates consistent predictions over multiple runs with perturbed data [11, 9]. Suppose the average prediction for a suspicious infection area is high but the outputs of the different branches vary severely; the area is then sensitive to perturbation. By the smoothness assumption [3], predictions for a target should be robust to perturbation, so such sensitivity strongly suggests a noisy sample. Hence, we quantify the consensus uncertainty $u^s$ as the standard deviation of the multi-decoder predictions:

$$u_i^s = \sqrt{ \frac{1}{K+1} \Big( (p_i - \mu_i)^2 + \sum_{k=1}^{K} (q_i^k - \mu_i)^2 \Big) }.$$

Here, $u^s$ indicates the consensus among the decoders, which is complementary to $u^m$, which measures the model's confidence. The quantified uncertainties are used to filter out uncertain voxels and thereby guide the model to learn from more reliable unlabeled data. Denoting by $i$ the voxel index of the prediction volume, reliable voxels are selected into the set $\Omega = \{\, i \mid u_i^s < \tau_s \ \text{and} \ u_i^m < \tau_m \,\}$, where $\tau_s$ and $\tau_m$ are two thresholds. The cross-consistency loss among decoders is then guided by:

$$\mathcal{L}_{fc} = \frac{1}{K \, |\Omega|} \sum_{k=1}^{K} \sum_{i \in \Omega} \left( p_i - q_i^k \right)^2.$$

Here, the uncertainty guidance is applied to feature-level consistency learning, since the uncertainties are generated from the feature perturbations. The total loss for our uncertainty-guided dual-consistency learning (UDC-Net) for semi-supervised lesion segmentation is:

$$\mathcal{L} = \mathcal{L}_S + \alpha \mathcal{L}_{ic} + \beta \mathcal{L}_{fc},$$

where $\mathcal{L}_S$ is the supervised loss, consisting of a Dice loss and a cross-entropy loss, and $\alpha$ and $\beta$ are two hyper-parameters weighing the contributions of the different losses.
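A minimal sketch of the two uncertainty measures and the guided feature-level loss, assuming binary foreground probabilities from the main decoder (`p`) and a list of K auxiliary decoders (`qs`); the default thresholds mirror the values reported in the implementation details below, and all names are illustrative, not the authors' code.

```python
import torch

def uncertainty_guided_fc_loss(p, qs, tau_m=0.34, tau_s=0.12, eps=1e-8):
    """Uncertainty-guided feature-level consistency (sketch).

    p  : main-decoder foreground probabilities, shape (B, 1, D, H, W)
    qs : list of K auxiliary-decoder probabilities, same shape
    """
    preds = torch.stack([p] + list(qs), dim=0)    # (K+1, B, 1, D, H, W)
    mu = preds.mean(dim=0)                        # mean prediction mu
    # Confidence uncertainty u_m: binary entropy of the mean prediction
    u_m = -(mu * (mu + eps).log() + (1 - mu) * (1 - mu + eps).log())
    # Consensus uncertainty u_s: spread of the decoders' predictions
    u_s = preds.std(dim=0)
    # Indicator over reliable voxels: confident AND consistent
    mask = ((u_m < tau_m) & (u_s < tau_s)).float()
    # Masked L2 between the main prediction and each auxiliary one
    loss = sum(((q - p) ** 2 * mask).sum() for q in qs)
    return loss / (len(qs) * mask.sum().clamp(min=1.0))
```

The indicator mask implements the set $\Omega$ above: a voxel contributes to the loss only if both uncertainties fall below their thresholds.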
During training, we first trained a supervised V-Net and then added the auxiliary decoders for fine-tuning with uncertainty-guided consistency learning. Training was terminated when the Dice coefficient on the validation set stagnated. Adam [10] was used as the optimizer with an initial learning rate of 0.001 and a learning-rate decay of 0.95 per epoch. As widely adopted in SSL works [24, 21], $\alpha$ and $\beta$ were set to sigmoid-shaped, monotonically increasing functions of the training step with a maximum of 1. The thresholds $\tau_m$ and $\tau_s$ were set to 0.34 and 0.12 after tuning on the validation set. For testing, we carried out sliding-window inference and took only the main decoder's prediction. All implementation was done with PyTorch [22] on an NVIDIA TITAN X GPU.
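As one plausible realization of the sigmoid-shaped schedule for $\alpha$ and $\beta$, here is a short sketch using the ramp-up commonly paired with [24]; the exact ramp length is not stated in the text, so `ramp_steps` is an assumption.

```python
import numpy as np

def sigmoid_rampup(step, ramp_steps, maximum=1.0):
    """Sigmoid-shaped, monotonically increasing weight in (0, maximum].

    Uses the common exp(-5 * (1 - t)^2) ramp popularized by Mean
    Teacher-style training [24]; the weight saturates at `maximum`
    once `step` reaches `ramp_steps`.
    """
    t = float(np.clip(step / ramp_steps, 0.0, 1.0))
    return maximum * float(np.exp(-5.0 * (1.0 - t) ** 2))

# Usage (the ramp length of 4000 steps is illustrative only):
# alpha = beta = sigmoid_rampup(global_step, ramp_steps=4000)
```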
Datasets. In total, 852 chest CT volumes acquired from December 2019 to April 2020 were collected and enrolled in this study, of which 144 were voxel-wise annotated by four experienced radiologists. The labeled data were divided into (1) 65 cases as the labeled training set, (2) 9 cases as the validation set, and (3) 70 cases as the testing set. The remaining 708 chest CT scans were used as unlabeled training data.

Evaluation metrics. We adopted the Dice score (DSC), the Jaccard similarity coefficient (Jaccard), and the average symmetric surface distance (ASD) to evaluate segmentation performance.

We conducted ablation studies to analyze the contributions of the proposed components; quantitative results are given in Table 1. On the testing set, image-level consistency (IC) improves over the 3D V-Net by 2.4% in DSC and 2.5% in Jaccard and reduces ASD by 3.7. Feature-level consistency (FC) brings a larger improvement of 4.5% in DSC and 5.5% in Jaccard and reduces ASD by 6.0. Unifying the dual consistencies further improves DSC and Jaccard by about 1%, demonstrating the effectiveness of learning from unlabeled data. Furthermore, introducing either the confidence or the consensus uncertainty guidance consistently benefits learning from the unlabeled data. Finally, the model with dual uncertainty achieves better DSC and Jaccard with comparable ASD relative to the single-uncertainty models, further showing that the two uncertainties are complementary in guiding more robust learning.

We compared our method against other state-of-the-art semi-supervised segmentation approaches: Mean Teacher (MT) [24], the uncertainty-aware mean teacher [34], the transformation-consistent self-ensembling model (TCSM) [12], and cross-consistency training (CCT) [21]. For a fair comparison, all methods were implemented with the 3D V-Net backbone, and each method was run four times with different random seeds. Quantitative results are reported in Table 2. As observed, UDC-Net outperforms all other methods by at least 1.8% in Dice and 2.2% in Jaccard while reducing ASD by at least 2.2, showing outstanding unsupervised learning efficacy. A qualitative comparison is given in Figure 2: UDC-Net delineates more accurate lesion contours than the other methods across diverse lesion shapes and sizes. Visualizations of the two uncertainties can be found in the supplementary material.

Table 2: Quantitative comparison with state-of-the-art semi-supervised methods (mean ± std over four runs).

  Method                 DSC (%)        Jaccard (%)    ASD
  (method name missing)  74.0 ± 0.11    60.1 ± 0.15    9.2 ± 0.9
  TCSM [12]              72.9 ± 0.46    58.9 ± 0.58    9.1 ± 1.4
  CCT [21]               75.6 ± 0.11    62.3 ± 0.19    6.1 ± 0.7
  UDC-Net (ours)         77.4 ± 0.14    64.5 ± 0.15    3.9 ± 0.5

We further evaluated UDC-Net under varying ratios of labeled and unlabeled training data. Table 3 shows that UDC-Net consistently improves over the baseline V-Net by significant margins in DSC, Jaccard, and ASD, whether 32 or 65 labeled scans are provided. Moreover, the proposed approach consistently outperforms CCT [21] (the best-performing compared method) in all scenarios. Notably, with less data, UDC-Net achieves comparable or even better results than CCT given more data: UDC-Net reaches 75.0% DSC, 61.5% Jaccard, and 4.8 ASD with 32 labeled and 140 unlabeled scans (3rd row), comparable to CCT with twice as many labeled scans (7th row); and with 65 labeled and 140 unlabeled scans, UDC-Net (8th row) surpasses CCT trained with five times as much unlabeled data (9th row). These findings demonstrate that our method enables more efficient unsupervised learning, suggesting high potential for real-world clinical application.

In this paper, we presented an uncertainty-guided dual-consistency learning method for semi-supervised COVID-19 lesion segmentation from chest CT scans. Image-level transformation equivalence and feature-level perturbation invariance are both introduced to form dual-consistency learning from unlabeled data, and the dual-uncertainty mechanism further improves the learning process with more reliable and robust guidance. Extensive experiments on a large COVID-19 dataset demonstrate the effectiveness of our method in real-world scenarios. Future work includes improving the method with more robust knowledge distillation and generalizing it to other semi-supervised learning tasks.

Table 4: List of perturbations used in feature-level consistency learning [21].

  Perturbation       Description
  Feature noise      A noise tensor $N$ is applied to the encoder output $z$ to get $\tilde{z} = z * N + z$.
  Feature dropout    A random dropout mask $M_{drop}$ is generated to obtain $\tilde{z} = z * M_{drop}$.
  Object masking     An object mask $M_{obj}$ is generated from the main decoder's output to get $\tilde{z} = z * M_{obj}$.
  Context masking    A context mask $M_{con} = 1 - M_{obj}$ is generated to obtain $\tilde{z} = z * M_{con}$.
  Guided cutout      A random crop within each object's bounding box is zeroed out in the feature map $z$.
  Intermediate VAT   Virtual adversarial training [19] is used to push the distribution to be isotropically smooth: the adversarial perturbation $r_{adv}$ that alters the prediction the most is found and injected into $z$ to obtain $\tilde{z} = r_{adv} + z$.
  Random dropout     Spatial dropout is applied to $z$ as a random perturbation.
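To ground the first two rows of Table 4, here is a hedged sketch of feature noise and feature dropout as they are commonly realized in CCT-style training [21]; the uniform-noise range and dropout quantile are assumptions, not the paper's reported settings.

```python
import torch

def feature_noise(z, uniform_range=0.3):
    """Feature noise: z_tilde = z * N + z with N ~ U(-r, r)."""
    noise = torch.empty_like(z).uniform_(-uniform_range, uniform_range)
    return z * noise + z

def feature_dropout(z, drop_quantile=0.7):
    """Feature dropout: zero out the most strongly activated regions.

    A channel-averaged attention map is thresholded per sample, and
    high-activation voxels are masked out of the feature map.
    """
    attention = z.mean(dim=1, keepdim=True)           # (B, 1, D, H, W)
    thresh = attention.flatten(start_dim=1).quantile(drop_quantile, dim=1)
    thresh = thresh.view(-1, *([1] * (z.dim() - 1)))  # broadcastable shape
    mask = (attention < thresh).float()
    return z * mask
```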
References
1. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
2. MixMatch: a holistic approach to semi-supervised learning
3. Semi-supervised learning
4. Hypergraph learning for identification of COVID-19 with CT imaging
5. Inf-Net: automatic COVID-19 lung infection segmentation from CT images
6. Sensitivity of chest CT for COVID-19: comparison to RT-PCR
7. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
8. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis
9. Dual Student: breaking the limits of the teacher in semi-supervised learning
10. Adam: a method for stochastic optimization
11. Robust training with ensemble consensus
12. Transformation-consistent self-ensembling model for semi-supervised medical image segmentation
13. Early triage of critically ill COVID-19 patients using deep learning
14. Semi-supervised medical image classification with relation-driven self-ensembling model
15. Deep mining external imperfect data for chest X-ray disease screening
16. Active contour regularized semi-supervised learning for COVID-19 CT infection segmentation with limited annotations
17. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
18. V-Net: fully convolutional neural networks for volumetric medical image segmentation
19. Virtual adversarial training: a regularization method for supervised and semi-supervised learning
20. Deep learning COVID-19 features on CXR using limited training data sets
21. Semi-supervised semantic segmentation with cross-consistency training
22. PyTorch: an imperative style, high-performance deep learning library
23. Deep co-training for semi-supervised image recognition
24. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results
25. A survey on semi-supervised learning
26. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images