key: cord-0223547-5opu6jrl authors: Yao, Qingsong; Xiao, Li; Liu, Peihang; Zhou, S. Kevin title: Label-Free Segmentation of COVID-19 Lesions in Lung CT date: 2020-09-08 journal: nan DOI: nan sha: 581f849115cbba1f533e4e75abaf1707aab00486 doc_id: 223547 cord_uid: 5opu6jrl

Scarcity of annotated images hampers the building of automated solutions for reliable COVID-19 diagnosis and evaluation from CT. To alleviate the burden of data annotation, we herein present a label-free approach for segmenting COVID-19 lesions in CT via pixel-level anomaly modeling that mines out the relevant knowledge from normal CT lung scans. Our modeling is inspired by the observation that the parts of tracheae and vessels, which lie in the high-intensity range to which lesions belong, exhibit strong patterns. To facilitate the learning of such patterns at a pixel level, we synthesize 'lesions' using a set of surprisingly simple operations and insert the synthesized 'lesions' into normal CT lung scans to form training pairs, from which we learn a normalcy-converting network (NormNet) that turns an 'abnormal' image back to normal. Our experiments on three different datasets validate the effectiveness of NormNet, which conspicuously outperforms a variety of unsupervised anomaly detection (UAD) methods.

Recently, deep learning based methods have been proposed for COVID-19 lesion screening [2] and some of them have proved successful for COVID-19 segmentation [12]-[14]. Despite such success, they all rely on large-scale well-labeled datasets. However, obtaining such datasets is very difficult due to two related concerns. On the one hand, labeling a 3D CT volume is costly and time-consuming. It often needs experienced radiologists, who are busy fighting the COVID-19 pandemic and hence lack time for lesion labeling. On the other hand, COVID-19 lesions not only have a variety of complex appearances such as Ground-Glass Opacity (GGO), reticulation, and consolidation [15], but also vary widely in texture, size, and position. These diversities raise an even greater demand for richly annotated datasets. Accordingly, large-scale well-labeled COVID-19 datasets are scarce, which limits the use of Artificial Intelligence (AI) to help fight COVID-19. As reported in Table I, most public COVID-19 datasets focus on diagnosis and carry only classification information, while only a few provide semantic segmentation labels. While research attempts [16]-[18] have been made to address the challenges, these works nevertheless still need annotated images for training purposes.

In this paper, we present a label-free approach requiring no lesion annotation. Although it is very difficult to build a large well-labeled COVID-19 dataset, collecting a large-scale normal CT volume dataset is much easier. It is also interesting to notice that the patterns of normal lungs are regular and easy to model. The thorax of a normal person consists of large areas of air and a few tissues (such as tracheae and vessels [8]), which can be clearly distinguished by CT intensity [8].
As shown in Fig. 1, the air region is usually displayed as black background, with its Hounsfield unit (HU) value around -1000 [8]. Meanwhile, the tissue (with HU > -500 [8]) has intensity values similar to those of lesions, but it exhibits a regular pattern, which makes it amenable to modeling, say by a deep network.

TABLE I: Public COVID-19 imaging datasets.
Dataset                 Modality       Size   Task
Chestxray [19]          X-rays         434    Diagnosis
COVID-CT [20]           CT image       342    Diagnosis
Patients Lungs [21]     X-rays         70     Diagnosis
Radiography [22]        X-rays         219    Diagnosis
SIRM-COVID [23]         2D CT image    340    Diagnosis
POCOVID-Net [24]        Ultrasound     37     Diagnosis
SIRM-Seg [23], [25]     CT image       110    Segmentation
Radiopedia [25], [26]   CT volume      9      Segmentation
Coronacase [27], [28]   CT volume      20     Segmentation
Mosmed [29]             CT volume      50     Diagnosis
BIMCV [30]              X-rays         10     Segmentation
BIMCV [30]              CT / X-rays    5381   Diagnosis

This fact motivates us to formulate lesion segmentation as a pixel-level anomaly modeling problem. We hypothesize that if all the normal signals are captured at a pixel level, then the remaining abnormal pixels are localized automatically and can be grouped together as lesions. To facilitate pixel-level anomaly modeling, we propose to synthesize 'lesions' and insert them into normal CT images, forming pairs of normal and 'abnormal' images for training. Surprisingly, such a 'lesion' synthesis procedure consists of a few simple operations, such as random shape generation, random noise generation within the shape, and traditional filtering. Using these training pairs, we train a deep image-to-image network such as 3D U-Net [31] that converts an 'abnormal' image into a normal one. We call our network a normalcy-converting network (NormNet). The NormNet essentially learns a decision boundary between normal tissues (particularly the tissues in a high-intensity range) and synthetic 'lesions'. We validate the effectiveness of NormNet on two different datasets. Empirically, it clearly outperforms various competing label-free approaches, and its performance is even comparable to that of a supervised method by some metrics.

It should be noted that our approach differs from a research line called unsupervised anomaly detection (UAD) [32]-[36], which aims to detect out-of-distribution (OOD) data by memorizing and integrating anomaly-free training data and has been successfully applied in many image-level holistic classification scenarios. However, when the UAD methods are applied to pixel-level image segmentation, their performances are rather limited [35], which we confirm in our experiments. Further, our method differs from methods for the inpainting [37] task, whose images in both training and testing sets are contaminated by masks (noises) from the same domain. Finally, our method is different from synthetic data augmentation [38], which manually generates images according to the labeled lesion area. In contrast, we do not need any image with labeled COVID-19 lesions.

In summary, we make the following contributions:
• We propose the NormNet, a pixel-level anomaly modeling network, to distinguish COVID-19 lesions from healthy tissues in the thorax area. The training procedure only needs a large-scale healthy CT lung dataset, without any labeled COVID-19 CT volume.
• We design an effective strategy for generating synthetic 'lesions' using surprisingly simple operations such as random shape generation, noise generation, and image filtering.
• Experiments show that our NormNet achieves better performance than various competing label-free methods on two different COVID-19 datasets.
II. RELATED WORK

A. COVID-19 screening and segmentation for chest CT

Deep learning based methods for chest CT greatly help COVID-19 diagnosis and evaluation [2], [7]. Wang et al. [39] propose a weakly-supervised framework for COVID-19 classification at the beginning of the pandemic, which achieves high performance. Wang et al. [40] exploit prior-attention residual learning for more discriminative COVID-19 diagnosis. Ouyang et al. [41] solve the imbalance problem of COVID-19 diagnosis with a dual-sampling attention network. However, the COVID-19 segmentation task is more difficult due to the lack of well-labeled data [18], lesion diversity [15], and noisy labels [17]. Researchers have made attempts to address the above challenges. For example, to tackle the problem of labeled data scarcity, Ma et al. [28] annotate 20 CT volumes from coronacases [27] and radiopedia [26]. Fan et al. [18] propose a semi-supervised framework called Inf-Net. Zhou et al. [16] address the same issue by fitting the dynamic change of real patient data measured at different time points. However, all of these models depend on data with semantic labels. In this work, we propose an unsupervised anomaly modeling method called NormNet, which achieves comparable performance but needs no labeled data.

B. Anomaly detection

Anomaly detection or outlier detection is a long-standing yet active research area in machine learning [42] and a key technique for overcoming the data bottleneck [43]. A natural choice for handling this problem is one-class classification methods, such as OC-SVM [44], SVDD [45], Deep SVDD [46], and 1-NN. These methods detect anomalies by fitting a discriminative hyperplane surrounding the normal samples in the embedding space. However, these methods can only detect anomalies at the image level. In medical image analysis, it is also important to find the abnormal area [43], [47]. Recently, CNN-based generative models such as Generative Adversarial Networks (GANs) [48] and Variational Autoencoders (VAEs) [49] have proved essential for unsupervised anomaly segmentation [50]. These methods first capture the normal distribution by learning a mapping between the normal data and a low-dimensional latent space via a reconstruction loss. They assume that if this process is trained only on normal distributions, a lesion area with abnormal shape and context cannot be correctly mapped and reconstructed, resulting in a high reconstruction error, which helps to localize the lesion area. The f-AnoGAN method [50], [51] learns the projection by solving an optimization problem, while the VAE [49] tackles the same problem by penalizing the evidence lower bound (ELBO). Several extensions such as the context encoder [52], constrained VAE [53], adversarial autoencoder [53], GMVAE [54], Bayesian VAE [55], and anoVAEGAN [56] improve the accuracy of the projection. Based on the pretrained projection, You et al. [54] restore the lesion area by running an optimization on the latent manifold, while Zimmerer et al. [43] locate the anomaly with a term derived from the Kullback-Leibler (KL) divergence.

Despite the success of these methods for classification tasks [57], [58], their segmentation performance is insufficient [35]. The assumptions used by these reconstruction-based methods have been shown to be problematic [34], [59]. Firstly, the calibrated likelihoods of the decoder may not be precise enough [60]. Out-of-distribution data can sometimes be successfully reconstructed [61], which raises false negatives.
Furthermore, the reconstruction is far from perfect [60], [62]. The decoder cannot reconstruct all the details of normal data precisely, which may cause false positives. As a result, these anomaly segmentation methods have limited segmentation performance, as indicated in the brain tumor segmentation task [35]. Moreover, specifically in lung CT, some of the tissues are very small and appear irregularly, so their information is easily lost during the down-sampling process of the encoder [63], which causes more segmentation errors.

Our method is designed to alleviate these issues. Firstly, we choose a 3D U-Net [31] as our encoder-decoder structure and use the skip connections of the U-Net to alleviate the loss of information. Next, to avoid inaccurate modeling, we generate a segmentation map from the original healthy CT and compute the loss on it directly. Finally, to encourage our NormNet to learn a decision boundary for healthy signals, we use synthetic lesions as anomalies.

III. METHOD

In this section, we first introduce the overall framework of our NormNet. Then we illustrate how to generate diverse 'lesions' within a given lung mask. Finally, we clarify how to post-process the lesion results predicted by our NormNet to obtain the final lesion mask for an unseen test image.

A. Overall framework

Let {R_1, R_2, ..., R_T} be a set of T healthy lung CT images. We clip each raw image R_i to the HU range [-800, 100] and scale the clipped image to [0, 1], obtaining R̂_i. As shown in Fig. 2, our method first uses a CNN-based lung segmentation method to obtain the lung masks {M_1, M_2, ..., M_T} and the thorax areas H_i = R̂_i ⊙ M_i, where ⊙ stands for pixel-wise multiplication. It is worth noting that, because no segmentation model can achieve 100% accuracy, there are always some edges caused by segmentation errors left in the thorax area H_i; we therefore introduce a simple pre-processing step (Section III-B) to remove erroneous edges and generate a new lung mask M̂_i. The thorax areas are then updated to H_i = R̂_i ⊙ M̂_i.

Then we use the synthetic 'lesion' generator described in Section III-C to synthesize various 'lesions' B within the lung masks M̂_i, with diverse shapes G and textures, and inject them into the thorax area H_i to form the input A_i. Because the tissue patterns in the high-intensity range (say HU ≥ T with the threshold T = -500) in normal images are rather distinguishable from those of lesions, we concentrate on processing within this range and compute the ground truth as

GT_i = π(H_i ≥ τ),

where π(·) is an indicator function that produces a binary mask. Note that the value of τ in H_i is equivalent to the HU threshold; for example, T = -500 means τ = 0.33. Our NormNet learns to predict the healthy part from A_i by encouraging its output to be close to GT_i (i.e., minimizing the Dice loss and cross-entropy loss). In this procedure, our NormNet learns to capture the context of healthy tissues quickly and precisely. When our NormNet is applied to an unseen COVID-19 CT volume, it recognizes the healthy part of the volume with high confidence and the lesion part with low confidence. The confidence scores can thus serve as a decision boundary separating the healthy parts from the lesions. Because our training process is random, we train 5 models under the same setting to form an ensemble; a majority vote for healthy parts is taken as the final prediction. As our method is trained with ground truth whose HU ≥ T, a small number of lesion pixels whose HU < T are not taken into consideration and might get missed. So, we grow the localized lesion areas to bring them back, following the post-processing step in Section III-D.
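To make these intensity conventions concrete, the following is a minimal NumPy sketch (our own illustrative code, not the authors' implementation) of the clipping and normalization, the mapping from an HU threshold T to the normalized threshold τ, and the ground-truth mask GT_i = π(H_i ≥ τ):

```python
import numpy as np

HU_MIN, HU_MAX = -800, 100   # clipping window used in the paper
T_HU = -500                  # HU threshold bounding the tissue/lesion range

def preprocess(raw_hu: np.ndarray) -> np.ndarray:
    """Clip a raw CT volume to [-800, 100] HU and scale it to [0, 1]."""
    clipped = np.clip(raw_hu, HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)

def hu_to_tau(t_hu: float = T_HU) -> float:
    """Map an HU threshold T to the normalized threshold tau.
    For T = -500: (-500 + 800) / 900 = 0.33, matching the paper."""
    return (t_hu - HU_MIN) / (HU_MAX - HU_MIN)

def ground_truth(thorax: np.ndarray, tau: float = None) -> np.ndarray:
    """Binary 'healthy high-intensity' mask: GT_i = pi(H_i >= tau)."""
    tau = hu_to_tau() if tau is None else tau
    return (thorax >= tau).astype(np.uint8)
```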
B. Pre-processing

As mentioned above, this step separates the erroneous edges caused by segmentation errors from the lung mask M_i. For a pair of inputs {M_i, H_i}, we select all the connected areas in the thorax area H_i for which most of the pixels lie on the edges of the lung segmentation mask M_i, and mark them as the erroneous edges E_i. To avoid injecting noise into those edges, we use the lung mask without them, formulated as M̂_i = M_i - E_i. Note that we only apply this process in the training phase, leveraging the fact that no lesion occurs inside a healthy volume.

C. Synthetic 'lesion' generator

As shown in Fig. 3, the generator consists of a set of simple operations, following two steps: (i) generating lesion-like shapes and (ii) generating lesion-like textures. Below, we elaborate each step.

1) Generating lesion-like shapes: Multiple COVID-19 lesions may exist in a CT scan and they have various shapes. To obtain multiple lesion-like shapes within a CT, we propose the following pipeline. Below, U[a, b] denotes a uniform distribution over the range [a, b].

• For each lung mask M_i with a shape of size [32, 512, 512], compute a factor λ = |M_i| / max_j |M_j| as the fraction of the volume of the lung mask M_i relative to the one with maximum volume. This factor controls the number of ellipsoids being generated, with a larger λ likely yielding more ellipsoids.
• Create several ellipsoids as follows: (1) …; (2) …; (3) generate a large ellipsoid with a probability of 0.2λ and with its radius ∼ U[32, 64].
• For each generated ellipsoid, deform it using an elastic transformation [64] with random parameters and rotate it to align with random axes, yielding a blob C. Then position this blob at a random center inside the lung H_i.

At this stage, we have a set of blobs {C_1, C_2, ...}. We then merge connected blobs and obtain several non-adjacent blobs {G_1, G_2, ...} with varying shapes. For each blob G_j, we synthesize a patch of lesion B_j by the following steps.

2) Generating lesion-like textures: The texture pattern of lesions varies; thus it is challenging to generate lesion-like textures. Below we outline our attempt to do so using a set of simple operations (a code sketch of the whole texture procedure follows the lists below). It should be noted that our method is far from perfect; nevertheless, we find it empirically effective. We follow a series of three steps, namely noise generation, filtering, and scaling/clipping, to generate the lesion-like textures.

• Noise generation. For each pixel denoted by x, generate salt noise

s_j(x) = 1 with probability a(x), and s_j(x) = 0 otherwise,

where the pixel-dependent probability function a(x) will be defined later.
• Filtering. Smooth the noise using a Gaussian filter g with a standard deviation σ_b:

f_j(x) = (s_j ⊗ g)(x),

where ⊗ is the standard image filtering operator. The standard deviation σ_b is randomly sampled as follows: σ_b ∼ U[…] with a probability of 0.7, and σ_b ∼ U[2, 5] with a probability of 0.3.
• Scaling and clipping. Scale f_j(x) by a factor β and clip the result to [0, 1], yielding the lesion-like pattern B_j(x). The scaling factor β is obtained by

β = µ_0 / mean_t(f_j(x)),

where µ_0 ∼ U[0.4, 0.8] and mean_t(f(x)) is the mean intensity of the image f(x) over the pixels that pass the threshold t.

Now, we describe how to obtain the pixel-dependent probability function a(x), again using a series of noise generation, filtering, and scaling operations.

• Noise generation. Generate a random noise image.
• Filtering. Smooth it using a Gaussian filter g with a standard deviation σ_a, where σ_a ∼ U[2, 20].
• Scaling. Rescale the filtered noise to the range [a_L, a_U]. This yields the desired function a(x), where a_U ∼ U[0, 0.3], a_L ∼ U[0, 0.3], and a_U − a_L > 0.15.
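The texture pipeline above maps to a few lines of NumPy/SciPy. The sketch below is purely illustrative: all names are ours, and where the source text is garbled (the smaller σ_b range) we substitute an assumed U[0.5, 2], marked in a comment.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def probability_map(shape):
    """a(x): uniform noise -> Gaussian smoothing (sigma_a ~ U[2, 20])
    -> linear rescaling into [a_L, a_U] with a_U - a_L > 0.15."""
    m = gaussian_filter(rng.uniform(0.0, 1.0, size=shape),
                        sigma=rng.uniform(2, 20))
    while True:  # resample the bounds until the paper's constraint holds
        a_l, a_u = sorted(rng.uniform(0.0, 0.3, size=2))
        if a_u - a_l > 0.15:
            break
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    return a_l + (a_u - a_l) * m

def lesion_texture(blob_mask, t=0.0):
    """Salt noise -> Gaussian filtering -> scaling/clipping for one blob G_j."""
    a = probability_map(blob_mask.shape)
    s = (rng.uniform(size=blob_mask.shape) < a).astype(float)  # salt noise s_j(x)
    # The smaller sigma_b range is lost in the source; U[0.5, 2] is our assumption.
    sigma_b = rng.uniform(0.5, 2) if rng.uniform() < 0.7 else rng.uniform(2, 5)
    f = gaussian_filter(s, sigma=sigma_b)                      # f_j(x)
    mu0 = rng.uniform(0.4, 0.8)
    above = f[f > t]
    beta = mu0 / above.mean() if above.size else 1.0           # scaling factor
    return np.clip(beta * f, 0.0, 1.0) * blob_mask             # B_j(x)
```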
Finally, we inject the synthetic lesions B_j into the various blobs G_j and place these blobs at random centers inside the lung area H_i. Mathematically, the image A_i with synthetic 'lesions' is generated by taking, at each pixel, the maximum of the lung area H_i and the synthetic lesions B_j:

A_i(x) = max(H_i(x), max_j B_j(x)).

Our goal is to learn a network that takes A_i as input and outputs GT_i.

D. Post-processing

A post-processing procedure is designed to obtain the final lesion prediction based on the difference between the original CT volume and the predicted healthy areas. As illustrated in Fig. 4, the final prediction is obtained with the following steps (a code sketch follows the list):

• Compute the lung mask (Fig. 4(b)) and predict the healthy part by NormNet (Fig. 4(c));
• Compute the lesion region by subtracting the predicted healthy part from the lung mask, obtaining Fig. 4(d). Considering that only bright pixels ≥ τ are in the lung mask, the full-pixel raw lesion areas (Fig. 4(f)) are also calculated, aiming to 'recover' less bright lesions;
• Mean filtering F with kernel size k is then applied to Figs. 4(d) and 4(f) to smooth the lesion region, and the background noise is removed via thresholding, which yields the results in Figs. 4(e) and 4(g), respectively;
• A region growing algorithm is applied to obtain the final predicted regions: it first expands the lesion regions of Fig. 4(f) and then removes the pixels outside the full-pixel lesion regions defined by Fig. 4(g).

(Fig. 4 caption: Illustration of the post-processing, which removes the healthy part from the COVID-19 CT volume and generates the final prediction by mean filtering and growing.)
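As a rough rendering of Section III-D, the sketch below chains the subtraction, mean filtering, thresholding, and growing steps. The exact region-growing procedure is not fully specified in the text, so we approximate it by iterative binary dilation constrained to the full-pixel lesion region; that choice, and all names, are our assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter, binary_dilation

def postprocess(ct, lung_mask, healthy_pred, tau=0.33,
                k_bright=9, k_full=7, thr_bright=0.2, thr_full=0.15):
    """Post-processing sketch following Fig. 4.

    ct:           normalized CT volume in [0, 1]
    lung_mask:    boolean lung mask
    healthy_pred: boolean mask of pixels the NormNet predicts as healthy
    """
    bright = (ct >= tau) & lung_mask
    raw_bright_lesion = bright & ~healthy_pred     # Fig. 4(d)
    raw_full_lesion = lung_mask & ~healthy_pred    # Fig. 4(f), keeps darker pixels
    # Mean filtering + thresholding removes isolated noise pixels.
    seed = uniform_filter(raw_bright_lesion.astype(float), size=k_bright) > thr_bright
    allowed = uniform_filter(raw_full_lesion.astype(float), size=k_full) > thr_full
    # Region growing, approximated by dilating the bright seeds while
    # staying inside the full-pixel lesion region (our assumption).
    grown = seed & allowed
    while True:
        nxt = (binary_dilation(grown) & allowed) | grown
        if np.array_equal(nxt, grown):
            return grown
        grown = nxt
```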
IV. EXPERIMENTS

Below we first provide a brief description of the various CT lung datasets used in our experiments. Then we present our experimental settings and the baseline approaches we implement and compare against. Finally, we show our main experimental results and an ablation study.

A. Datasets

One distinguishing feature of the paper lies in unleashing the power embedded in existing datasets. Rather than using a single dataset, we seamlessly integrate multiple CT lung datasets, covering three different tasks (healthy lung modeling, COVID-19 lesion segmentation, and general-purpose lung segmentation), into one working solution.

1) CT datasets for healthy lung modeling: LUNA16 [65] is a grand challenge on lung nodule analysis. The images are collected from The Lung Image Database Consortium image collection (LIDC-IDRI) [66], [67], [69], and each image is labeled by 4 experienced radiologists. As half of the images are healthy and clean except for the nodule areas, we select 453 CT volumes from LUNA16 and remove the slices with nodules to form our healthy lung CT dataset.

2) CT datasets for COVID-19 lesion segmentation: To measure the performance of our method on COVID-19 segmentation, we choose the two public COVID-19 CT segmentation datasets in Table I with semantic labels. It is worth noting that our method segments the COVID-19 lesions in an unsupervised setting, and thus the labeled datasets are only used for testing. All of the CT slices have been resized to 512 × 512.

• Coronacases: There are 10 public CT volumes on coronacases.org [27], uploaded from patients diagnosed with COVID-19. These volumes are first delineated by junior annotators, then refined by two radiologists with 5 years of experience, and finally all the annotations are verified and refined by a senior radiologist with more than 10 years of experience in chest radiology diagnosis [28].
• Radiopedia: Another 8 axial volumetric CTs are released by Radiopaedia [26]; they have been evaluated by a radiologist as positive and segmented [25].

3) CT datasets for lung segmentation: To obtain an accurate lung area in the CT volume, we choose nnU-Net [68] as our lung segmentation method, which has proved to be a state-of-the-art segmentation framework in medical image analysis. We use two lung CT datasets with semantic labels for the lung region, including the annotations from [28]. We choose 2D U-Net as the backbone. The model is trained by nnU-Net [68] with 5-fold cross-validation and segments the lung region very precisely, with Dice scores larger than 0.98 on both the Coronacases and Radiopedia datasets.

B. Experimental settings

1) Evaluation metrics: We use several metrics widely used to measure the performance of segmentation models in medical image analysis, including the precision score (PSC), sensitivity (SEN), and Dice coefficient (DSC), which are formulated as follows:

PSC = tp / (tp + fp),  SEN = tp / (tp + fn),  DSC = 2tp / (2tp + fp + fn),

where tp, fp, and fn refer to the numbers of true positives, false positives, and false negatives, respectively.

2) Pre-processing: All of the images in the training and testing sets are first segmented for the lung region. Then we unify their spacing to 0.8 × 0.8 × 1 mm³, as well as their orientation. Next, all of the images are clipped to the window range [-800, 100] and normalized to [0, 1]. Finally, the lung regions are centered and padded to 512 × 512 with 0.

3) Training and inference details: We choose 3D U-Net [31] as the backbone for NormNet, implemented with MONAI. As all of the volumes in both the training and testing phases are well aligned, no further augmentation is needed. The NormNet is trained on a TITAN RTX GPU and optimized by the Adam optimizer with default settings. We train our network for 3500 iterations with a batch size of 8 and set the learning rate to 3e-4. In the testing phase, as the contexts of healthy signals are precisely captured by our NormNet, these signals are predicted with high probability. Therefore, we select the pixels with probability > 0.95 as healthy parts in the COVID-19 CT volume. For the mean filtering in the post-processing, we set the kernel sizes k to 9 and 7 and the thresholds to 0.2 and 0.15 for the lesion parts with bright pixels (Fig. 4(d)) and full pixels (Fig. 4(f)), respectively. We obtain these values by hyperparameter search, and they are fixed across both COVID-19 datasets.
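For reference, the three metrics defined in Section IV-B.1 can be computed from binary masks as follows (a straightforward NumPy sketch; the function name is ours):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """Precision (PSC), sensitivity (SEN), and Dice (DSC) from boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    psc = tp / (tp + fp + 1e-8)
    sen = tp / (tp + fn + 1e-8)
    dsc = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return psc, sen, dsc
```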
C. Baselines

We compare our method with existing deep learning based unsupervised anomaly detection (UAD) methods in medical image analysis to evaluate the effectiveness of our approach. To eliminate the influence of irrelevant factors, we use images with only the lung regions as the training and testing sets for all experiments (except for VAE Original). The encoder-decoder based methods are trained with a learning rate of 3e-4 and a batch size of 16 for 6000 iterations. To obtain the best performance for each method, we perform a greedy search up to two decimals to find the threshold with the best Dice score for each COVID-19 dataset.

• AE: An autoencoder with a dense bottleneck z ∈ R^128.
• VAE [49]: As reconstruction is more difficult for lung CT images, we set the weight α of the KL loss to 1e-6 to ease reconstruction.
• VAE Spatial [56]: A variational autoencoder with a spatial bottleneck z ∈ R^(8×8×128).
• VAE Original: A variational autoencoder trained on the raw lung CT images without lung segmentation.
• Context VAE [52]: Forces the encoder to capture more information by inpainting a cropped input. We set the crop size to 32.
• Constrained VAE [53]: Maps the reconstructed image to the same point as the input in the latent space.
• GMVAE [54]: Replaces the mono-modal prior of the VAE with a Gaussian mixture [35].
• Bayesian VAE [55]: Aggregates a consensus reconstruction via Monte Carlo dropout. The dropout rate is 0.2.
• KL Grad [43]: Uses the gradient map of the KL loss to segment anomalies.
• VAE Restoration [54]: Restores the abnormal input to decrease the evidence lower bound (ELBO). The restored region is marked as the detected abnormal area.
• f-AnoGAN [50]: To keep the training process of f-AnoGAN stable, we resize the lung image to 64 × 64 after center cropping.

In order to reveal the top-line performance for each dataset, we train nnU-Net [68] with 5-fold cross-validation. Furthermore, to test the performance of a supervised model on unseen datasets, we train nnU-Net on two COVID-19 datasets and test on the remaining one, called nnU-Net-Unseen.

D. Results

Our NormNet first votes for the healthy tissues in the CT volumes with COVID-19 lesions. To test the performance of our NormNet, we collect all bright pixels (intensity ≥ τ = 0.33) of the CT volumes. As shown in Table II, our method successfully distinguishes the COVID-19 lesion parts from the healthy parts with an AUC larger than 85%. When we choose the prediction threshold as 0.95, the high specificity ensures that most of the lesions are treated as anomalies. Then, the post-processing procedure grows the lesion areas to recover lesion pixels that are less bright (intensity < τ = 0.33). We also use mean filtering in the post-processing to remove isolated healthy pixels that are segmented as anomalies, as shown in Fig. 5(c). As a result, our method reaches Dice scores of 68.7%, 59.4% (we remove CT volume #6 from the Radiopedia dataset as it has only about 70 positive pixels in 42 slices), and 69.7% (shown in Table III) on the different COVID-19 datasets, significantly ahead of the other unsupervised anomaly detection methods.

The visual results in Fig. 5 show that most of the COVID-19 lesions (green area) are successfully segmented by our NormNet. On the contrary, the other unsupervised anomaly detection methods have limited power to segment COVID-19 lesions. As shown in Fig. 6, due to inaccurate reconstructions, reconstruction-based methods such as VAE [49] and f-AnoGAN [50] cannot reconstruct the tissues precisely. Such reconstruction errors greatly affect the segmentation performance on COVID-19 lesions. On the other hand, the encoder is not guaranteed to treat the COVID-19 lesion as an anomaly and to suppress it in the reconstruction results; thus KL Grad [43] and VAE Restoration [54] are also less effective. These two serious shortcomings result in the low COVID-19 segmentation performances reported in Table III.

E. Ablation study

1) Voting: To explore the effects of randomness in the training process, we evaluate the performance of the five models and their voting results at different numbers of iterations. As shown in Table IV, the performance of each of the five models oscillates as the number of iterations increases, while the NormNet greatly alleviates this problem through the voting mechanism over the 5 models.
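The 5-model majority vote used above, with each model's output thresholded at probability 0.95, can be sketched as follows (a hypothetical NumPy rendering; names are ours):

```python
import numpy as np

def vote_healthy(prob_maps, p_thr=0.95):
    """Majority vote over an ensemble of NormNets: a pixel is kept as
    'healthy' if more than half of the models predict it with
    probability > p_thr (0.95 in the paper)."""
    votes = np.stack([p > p_thr for p in prob_maps])  # (n_models, ...)
    return votes.sum(axis=0) > len(prob_maps) // 2
```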
2) Modules of the synthetic 'lesion' generator: The steps of the synthetic 'lesion' generator can be roughly divided into three parts: generating shapes (G_j in Section III-C.1), probability maps (a(x)), and salt noise (B_j). To investigate the influence of each part, we train a new NormNet without the corresponding diversity. To eliminate the diversity of shapes, we generate 5 ellipsoids with radius = 12 for every lung area H_i, without any deformation. For the probability maps, we set probability = 2. Finally, we set σ_b = 2 and µ_0 = 150 for the synthetic salt noise, yielding the same texture everywhere. As shown in Table V, the loss of diversity harms the accuracy of the decision boundary and the segmentation performance. In particular, the salt noise filtered and scaled with fixed parameters has limited context variety, which is easily learned by the NormNet, resulting in an extremely inaccurate decision boundary. Thus, our diverse synthetic 'lesions' force the NormNet to learn a decision boundary for healthy tissues, which can then be used to segment COVID-19 lesions.

(Fig. 7 caption: Visualization of masks under different HU thresholds. Many noisy pixels with complex contexts occur when setting the threshold to T = -700. We use a colormap for better visualization of the nuances.)

3) Hyperparameter analysis: The HU threshold is important in our method, since it filters out background noise while keeping the pattern complexity at a level that can be effectively handled by the network. On the one hand, if the threshold is too high, our NormNet only segments healthy pixels in a small-scale set, which causes more abnormal pixels to be missed. On the other hand, if the threshold is too low, some noisy pixels with complex contexts (as shown in Fig. 7) must be segmented by the NormNet. This raises the difficulty and pushes the NormNet to capture the features of the synthetic 'lesions' instead of healthy tissues; since we cannot ensure that the contexts of the synthetic 'lesions' match those of COVID-19 lesions, the NormNet overfits the synthetic 'lesions' and fails to segment COVID-19 lesions successfully. As shown in Table VI, the performance drops rapidly at the HU threshold T = -700.

V. CONCLUSION

In this paper, we propose the NormNet, a pixel-level anomaly modeling network that turns an 'abnormal' volume back to normal. A decision boundary for the normal parts is learned by segmenting healthy tissues from the diverse synthetic 'lesions', and it can then be used to segment COVID-19 lesions without training on any labeled data. Experiments on two different COVID-19 datasets validate the effectiveness of the NormNet. Despite the improvement over existing unsupervised anomaly detection methods, there is still a gap between our method and supervised methods such as nnU-Net [68]. After exploring the failure predictions of our method, we find that they fall into three categories: 1) Some anomalies, such as pulmonary fibrosis (the first row of Fig. 8), are treated as COVID-19 lesions. 2) Gaps between datasets: for example, most of the slice thicknesses in the LUNA16 dataset are around 1 mm, whereas in the Radiopedia dataset slices are padded together, which generates different contexts. The unseen contexts are treated as anomalies by our NormNet, which accounts for most of the false positives on the Radiopedia dataset. 3) Our method is only sensitive to pixels with values larger than τ. Although most lesions are successfully detected, a small portion of lesions whose pixels fall below τ are still missed (as shown in the right column of Fig. 8). These small lesions also pose a difficult problem for both supervised methods [17] and anomaly detection. In the future, we plan to extend our method to address the above limitations and to explore the possibility of applying the 'lesion' generator for segmentation in non-thoracic regions.
REFERENCES

A novel coronavirus outbreak of global health concern
Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19
Coronavirus disease (COVID-19) Situation Report 164
Coronavirus disease 2019 (COVID-19): A perspective from China
Sensitivity of chest CT for COVID-19: Comparison to RT-PCR
Imaging profile of the COVID-19 infection: Radiologic findings and literature review
The role of chest imaging in patient management during the COVID-19 pandemic: A multinational consensus statement from the Fleischner Society
Computed Tomography Studies of Lung Mechanics
Deep learning for medical image analysis
Medical Image Recognition, Segmentation and Parsing: Machine Learning and Multiple Object Approaches
Multi-stage learning for robust lung segmentation in challenging CT volumes
Serial quantitative chest CT assessment of COVID-19: Deep-learning approach
Lung infection quantification of COVID-19 in CT images with deep learning
Longitudinal assessment of COVID-19 using a deep learning-based quantitative CT pipeline: Illustration of two cases
Imaging profile of the COVID-19 infection: Radiologic findings and literature review
A Rapid, Accurate and Machine-agnostic Segmentation and Quantification Method for CT-based COVID-19 Diagnosis
A Noise-robust Framework for Automatic Segmentation of COVID-19 Pneumonia Lesions from CT Images
Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images
COVID-19 image data collection
COVID-CT-Dataset: a CT scan dataset about COVID-19
COVID-19 Patients Lungs X Ray Images 10000
Can AI help in screening Viral and COVID-19 pneumonia
Italian Society of Medical and Interventional Radiology COVID-19 dataset
POCOVID-Net: Automatic Detection of COVID-19 From a New Lung Ultrasound Imaging Dataset (POCUS)
COVID-19 CT segmentation dataset
Towards Efficient COVID-19 CT Annotation: A Benchmark for Lung and Infection Segmentation
BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients
3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
Anomaly detection: A survey
Anomaly Detection in Medical Image Analysis
Deep autoencoding gaussian mixture model for unsupervised anomaly detection
Autoencoders for Unsupervised Anomaly Segmentation in Brain MR Images: A Comparative Study
Normal Appearance Autoencoder for Lung Cancer Detection and Segmentation
Context Encoders: Feature Learning by Inpainting
Synthetic data augmentation using GAN for improved liver lesion classification
A Weakly-supervised Framework for COVID-19 Classification and Lesion Localization from Chest CT
Prior-Attention Residual Learning for More Discriminative COVID-19 Screening in CT Images
Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia
Deep Learning for Anomaly Detection: A Review
Unsupervised anomaly localization using variational auto-encoders
One-class SVM for learning in image retrieval
Support Vector Data Description
Deep One-Class Classification
Exploiting Epistemic Uncertainty of Anatomy Segmentation for Anomaly Detection in Retinal OCT
Generative Adversarial Networks
Auto-Encoding Variational Bayes
f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks
Unsupervised anomaly detection with generative adversarial networks to guide marker discovery
Context-encoding variational autoencoder for unsupervised anomaly detection
Unsupervised detection of lesions in brain MRI using constrained adversarial auto-encoders
Unsupervised lesion detection via image restoration with a normative prior
Unsupervised lesion detection in brain CT using Bayesian convolutional autoencoders
Deep Autoencoding Models for Unsupervised Anomaly Segmentation in Brain MR Images
Deep anomaly detection with deviation networks
Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection
Do Deep Generative Models Know What They Don't Know? (ICLR)
Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings
Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection
Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders
Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields
Analysis and Recognition of Medical Images: 1. Elastic Deformation Transformation
LUNA16
Data From LIDC-IDRI
The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans
Automated Design of Deep Learning Methods for Biomedical Image Segmentation
The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository
Data from the thoracic volume and pleural effusion segmentations in diseased lungs for benchmarking chest CT processing pipelines
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach