title: Zero-Shot Domain Adaptation in CT Segmentation by Filtered Back Projection Augmentation
authors: Saparov, Talgat; Kurmukov, Anvar; Shirokikh, Boris; Belyaev, Mikhail
date: 2021-07-18

Domain shift is one of the most salient challenges in medical computer vision. Due to immense variability in scanners' parameters and imaging protocols, even images obtained from the same person and the same scanner can differ significantly. We address the variability in computed tomography (CT) images caused by different convolution kernels used in the reconstruction process, a critical domain shift factor in CT. The choice of convolution kernel affects pixel granularity, image smoothness, and noise level. We analyze a dataset of paired CT images, where smooth and sharp images were reconstructed from the same sinograms with different kernels, thus providing identical anatomy but different style. Though identical predictions are desired, we show that the consistency, measured as the average Dice score between predictions on pairs, is just 0.54. We propose Filtered Back-Projection Augmentation (FBPAug), a simple and surprisingly efficient approach to augmenting CT images in sinogram space that emulates reconstruction with different kernels. We apply the proposed method in a zero-shot domain adaptation setup and show that it boosts the consistency from 0.54 to 0.92, outperforming other augmentation approaches. FBPAug requires no specific preparation of either source or target domain data, so our publicly released implementation can be used as a plug-and-play module for zero-shot domain adaptation in any CT-based task.

Computed tomography (CT) is a widely used method for medical imaging. CT images are reconstructed from the raw acquisition data, represented in the form of a sinogram. Sinograms are two-dimensional profiles of tissue attenuation as a function of the scanner's gantry angle. One of the most common reconstruction algorithms is Filtered Back Projection (FBP) [14]. This algorithm has an important free parameter called the convolution kernel. The choice of convolution kernel defines a trade-off between image smoothness and noise level [13]. Reconstruction with a high-resolution kernel yields sharp pixels and a high noise level. In contrast, a lower-resolution kernel results in smoother pixels and a lower noise level. Depending on the clinical purpose, radiologists use different kernels for image reconstruction.

Modern deep neural networks (DNNs) are successfully used to automate the computation of clinically relevant anatomical characteristics and to assist with disease diagnosis. However, DNNs are sensitive to changes in data distribution, a problem known as domain shift. Domain shift typically harms models' performance even for simple medical images such as chest X-rays [16]. In CT images, factors contributing to domain shift include slice thickness and inter-slice interval, radiation dose, and reconstruction parameters such as the FBP convolution kernel [5]. The latter is the subject of our interest. Recently, several studies have reported a drop in the performance of convolutional neural networks (CNNs) trained on sharp images while being tested on smooth images [1, 7, 8]. The authors of [12] proposed using generative adversarial networks (GANs) to generate realistic CT images imitating arbitrary convolution kernels.
A more straightforward approach, simultaneously proposed in [8], [1], and [7], uses a CNN to convert images reconstructed with one kernel into images reconstructed with another. Such image-to-image networks can later be used either as an augmentation during training or as a preprocessing step during inference. These approaches are very intuitive: the convolution operations used in CNNs are the same as in FBP. However, these seemingly similar convolutions process different inputs: FBP operates on sinograms, whereas CNN layers work on reconstructed CT images. Moreover, local convolutions in sinogram space become global in image space after reconstruction. In other words, to emulate a convolution kernel applied to sinograms, one needs a full CNN with a sufficiently wide receptive field instead of a single convolutional layer.

Alternatively, several papers assess the impact of different, more straightforward augmentation techniques, including windowing [5, 6], gamma correction [15], image normalization [2], and image filtering [10]. The two latter methods were specifically proposed to address the differences in convolution kernels using images in the pixel domain. Although these methods can be applied to any domain adaptation problem in a zero-shot setup, their simplicity limits the possible generalization effect.

In this study, we aim to take the best of both approaches, achieving a high level of generalization with a physics-driven augmentation procedure. We propose FBPAug, a new augmentation method based on the FBP reconstruction algorithm. This augmentation mimics the processing steps used in proprietary manufacturers' reconstruction software. We initially apply the Radon transform to all training CT images to obtain their sinograms. Then we reconstruct the images using FBP but with different, randomly selected convolution kernels. To show the effectiveness of our method, we compare segmentation masks obtained on a set of paired images reconstructed from the same sinograms but with different convolution kernels. These paired images are perfectly aligned; the only difference is their style: smooth or sharp. We make our code and results publicly available, so the augmentation can easily be embedded into any CT-based CNN training pipeline to increase its generalizability to smooth-sharp domain shift.

Fig. 1: Bland-Altman plot showing prediction agreement using FBPAug (the proposed augmentation, red) and the next best competitor (Gaussian noise, blue). Agreement is measured between predictions on paired images reconstructed with soft and sharp convolution kernels from the Covid-private dataset. The difference in each image pair is always computed as $\text{Volume}_{soft} - \text{Volume}_{sharp}$.

In this section, we detail our augmentation method, describe the quality metrics, and describe the datasets we use in our experiments. First, we give the background on the discrete version of the inverse Radon transform, the Filtered Back-Projection algorithm. FBP consists of two sequential operations: generation of filtered projections and image reconstruction by the Back-Projection (BP) operator. Projections of the attenuation map have to be filtered before being used as input to the BP operator. The ideal filter in the continuous noiseless case is the ramp filter $\kappa$, whose Fourier transform is $\mathcal{F}(\kappa)(w) = |w|$. The image $I(x, y)$ can be derived as follows:

$$I(x, y) = \int_0^{\pi} (p_\theta * \kappa)(t) \, d\theta, \quad (1)$$

where $*$ is the convolution operator, $p_\theta$ is the projection acquired at angle $\theta$, $t = t(x, y) = x \cos\theta + y \sin\theta$, and $\kappa(t)$ is the aforementioned ramp filter.
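To make the discrete pipeline behind Eq. (1) concrete, below is a minimal sketch using NumPy and a recent scikit-image (assuming a version where `iradon` accepts `filter_name`). The ramp filtering is applied explicitly in Fourier space, without zero-padding, so `iradon` is used for back-projection only; this illustrates the algorithm, not any manufacturer's reconstruction software.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

def ramp_filter(sinogram):
    """Multiply each projection by |w| in Fourier space (the kappa of Eq. 1)."""
    n = sinogram.shape[0]
    freqs = np.abs(np.fft.fftfreq(n))              # discrete frequency axis
    projections_fft = np.fft.fft(sinogram, axis=0) # FFT along the detector axis
    return np.real(np.fft.ifft(projections_fft * freqs[:, None], axis=0))

image = shepp_logan_phantom()                      # test attenuation map
theta = np.linspace(0., 180., image.shape[0], endpoint=False)
sinogram = radon(image, theta=theta)               # p_theta, one column per angle

# Back-project the filtered projections; filter_name=None disables skimage's
# own filtering because we applied the ramp filter explicitly above.
reconstruction = iradon(ramp_filter(sinogram), theta=theta, filter_name=None)
```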
Assume that a set of filtered projections $p_\theta(t)$ is available at angles $\theta_1, \theta_2, \ldots, \theta_n$, such that $\theta_i = \theta_{i-1} + \Delta\theta$ for $i = 2, \ldots, n$ and $\Delta\theta = \pi/n$. In that case, the BP operator transforms a function $f_\theta(t) = f(x \cos\theta + y \sin\theta)$ as follows:

$$\mathcal{B}[f_\theta](x, y) = \Delta\theta \sum_{i=1}^{n} f_{\theta_i}(x \cos\theta_i + y \sin\theta_i). \quad (2)$$

In fact, the $\kappa(t)$ that appears in (1) is a generalized function and cannot be expressed as an ordinary function, because the integral of $|w|$ in the inverse Fourier transform does not converge. However, we can utilize the convolution theorem, which states that $\mathcal{F}(f * g) = \mathcal{F}(f) \cdot \mathcal{F}(g)$, and then use the fact that the BP operator is a finite weighted sum and the Fourier transform is a linear operator:

$$I(x, y) = \mathcal{B}\left[\mathcal{F}^{-1}\big(\mathcal{F}(p_\theta) \cdot \mathcal{F}(\kappa)\big)\right](x, y). \quad (3)$$

However, in the real world, CT manufacturers use different filters that enhance or weaken the high or low frequencies of the signal. We propose a family of convolution filters $k_{a,b}$ that allows us to obtain a smooth-filtered image given a sharp-filtered image and vice versa. The Fourier transform of the proposed filter is expressed as follows:

$$\mathcal{F}(k_{a,b})(w) = |w| \left(1 + a |w|^{b}\right). \quad (4)$$

Thus, given a CT image $I$ obtained from a set of projections using one kernel, we can simulate the usage of another kernel as follows:

$$I_{a,b} = \mathcal{B}\left[\mathcal{F}^{-1}\big(\mathcal{F}(R(I)) \cdot \mathcal{F}(k_{a,b})\big)\right], \quad (5)$$

where $a$ and $b$ are the parameters that control the sharpness or smoothness of the output image and $R(I)$ is the Radon transform of the image $I$; the output of the Radon transform is a set of projections. Fig. 2 shows an example of applying the sharpening augmentation to a soft-kernel image (Fig. 2(a) to (c)) and vice versa: applying the softening augmentation to a sharp-kernel image (Fig. 2(b) to (d)).

We compare the proposed method with three standard augmentations: gamma transformation (Gamma), additive Gaussian noise (Noise), and random windowing (Windowing), the technique proposed in [5]. As a baseline, we train a network without any intensity augmentations (Baseline). Gamma augments images using the gamma transformation:

$$I_\gamma(x, y) = \left(\frac{I(x, y) - m}{M - m}\right)^{\gamma} (M - m) + m, \quad (6)$$

where $M = \max I(x, y)$ and $m = \min I(x, y)$, with a parameter $\gamma$ whose logarithm we randomly sample from the $\mathcal{N}(0, 0.2)$ distribution. Noise is additive Gaussian noise sampled from the $\mathcal{N}(0, 0.1)$ distribution. Windowing makes use of the fact that different tissues have different attenuation coefficients. We uniformly sample the center of the window $c$ from $[-700, -500]$ Hounsfield units (HU) and the width of the window $w$ from $[1300, 1700]$ HU. Then we clip the image to the $[c - w/2, c + w/2]$ range using the following formula:

$$I_{c,w}(x, y) = \min\big(\max(I(x, y), \, c - w/2), \, c + w/2\big). \quad (7)$$

FBPAug parameters $a$ and $b$ are sampled uniformly from fixed ranges, with separate ranges producing sharpening and smoothing kernels; the exact values are provided in our released code. In all experiments, we zoom images to a 1 × 1 mm pixel size and use additional rotation and flip augmentations: with probability 0.5, we rotate an image by a multiple of 90 degrees and flip it horizontally or vertically. Minimal code sketches of FBPAug and of the comparison augmentations follow.
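A minimal sketch of FBPAug itself, combining Eqs. (4) and (5) with the scikit-image `radon`/`iradon` pair from the previous snippet; the example parameter values at the end are hypothetical, and the released implementation remains the reference.

```python
import numpy as np
from skimage.transform import radon, iradon

def fbp_aug(image, a, b):
    """Re-reconstruct `image` as if a different convolution kernel was used."""
    theta = np.linspace(0., 180., max(image.shape), endpoint=False)
    sinogram = radon(image, theta=theta, circle=False)   # R(I), Eq. (5)

    n = sinogram.shape[0]
    freqs = np.abs(np.fft.fftfreq(n))
    k_ab = freqs * (1. + a * freqs ** b)                 # F(k_ab), Eq. (4)

    projections_fft = np.fft.fft(sinogram, axis=0)
    filtered = np.real(np.fft.ifft(projections_fft * k_ab[:, None], axis=0))

    # Pure back-projection: the filtering step of FBP was already done above.
    return iradon(filtered, theta=theta, circle=False,
                  output_size=max(image.shape), filter_name=None)

# Hypothetical parameter values: a > 0 sharpens, a < 0 smooths the image.
# augmented = fbp_aug(ct_slice, a=30., b=3.)
```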
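For completeness, the three comparison augmentations admit short NumPy implementations. The parameter distributions follow the text above; the noise scale assumes intensity-normalized images.

```python
import numpy as np

def gamma_aug(image):
    """Gamma transformation, Eq. (6)."""
    gamma = np.exp(np.random.normal(0., 0.2))   # log(gamma) ~ N(0, 0.2)
    m, M = image.min(), image.max()
    return ((image - m) / (M - m)) ** gamma * (M - m) + m

def noise_aug(image):
    """Additive Gaussian noise; sigma = 0.1 assumes normalized intensities."""
    return image + np.random.normal(0., 0.1, size=image.shape)

def windowing_aug(image):
    """Random windowing, Eq. (7); `image` is assumed to be in HU."""
    c = np.random.uniform(-700., -500.)          # window center, HU
    w = np.random.uniform(1300., 1700.)          # window width, HU
    return np.clip(image, c - w / 2., c + w / 2.)
```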
We report our results on two datasets: Mosmed-1110 and a private collection of CT images with COVID-19 cases (Covid-private). Both datasets include chest CT series (3D CT images) of healthy subjects and subjects with COVID-19 infection.

Mosmed-1110. The dataset consists of 1110 CT scans from Moscow clinics collected from March 1, 2020 to April 25, 2020 [9]. The original images have a 0.8 mm inter-slice distance; however, the released studies contain every 10th slice, so the effective inter-slice distance is 8 mm. Mosmed-1110 contains only 50 CT scans annotated with binary masks of ground-glass opacity (GGO) and consolidation. We additionally asked three experienced radiologists to annotate another 46 scans, preserving the methodology of the original annotation process. Further on, we use the total of 96 annotated cases from the Mosmed-1110 dataset.

Covid-private. All images from the Covid-private dataset are stored in DICOM format, thus providing information about the corresponding convolution kernels. The dataset consists of paired CT studies (189 pairs in total) of patients with COVID-19. In contrast to many other datasets, all studies contain two series (3D CT images); the overall number of series is 378. Most importantly, every pair of series was obtained from a single physical scanning with different reconstruction algorithms. This means the slices within these images are perfectly aligned, and the only difference is the style of the image caused by the different convolution kernels applied. The data include paired images with the following kernels (the number in brackets is the pair count): SOFT, LUNG (40); FC07, FC55 (40); B31f, B70f (38); FC07, FC51 (27); B31s, B60s (23); LUNG, STANDARD (21). The first kernel in each pair is soft, the second is sharp. Images were obtained from CT scanners of the following manufacturers: GE (LUNG, STANDARD, SOFT), Toshiba (FC07, FC51, FC55), and Siemens (B31f, B31s, B60s, B70f). Covid-private does not contain ground-truth masks of GGO or consolidation, so we only use it to test prediction agreement.

For the comparison, we use the standard segmentation metric, the Dice Score. The Dice Score (DSC) of two volumetric binary masks $X$ and $Y$ is computed as $DSC = \frac{2|X \cap Y|}{|X| + |Y|}$, where $|X|$ is the cardinality of the set $X$. Furthermore, we perform statistical analysis to ensure the significance of the results. We use the one-sided Wilcoxon signed-rank test, as we consider the DSC scores of two methods to be paired samples. The null hypothesis of the Wilcoxon test is $H_0: \mathbb{P}(X > Y) = \mathbb{P}(X < Y)$ and the alternative is $H_1: \mathbb{P}(X > Y) > \mathbb{P}(X < Y)$, where $\mathbb{P}(X > Y)$ is the probability of an observation from population $X$ exceeding an observation from population $Y$. To adjust for multiple comparisons, we use the Bonferroni correction.

To evaluate our method, we conduct two sets of experiments on COVID-19 segmentation. First, we train five separate segmentation models (Baseline with no intensity augmentations, FBPAug, Gamma, Noise, and Windowing) on the Mosmed-1110 dataset to check whether any augmentation results in significantly better performance. Mosmed-1110 is stored in NIfTI format and does not contain information about the kernels; thus, we use it to estimate the in-domain accuracy for the COVID-19 segmentation problem. Second, we use the trained models from the previous experiment to make predictions on the paired Covid-private dataset. We compare the masks within each pair of sharp and soft images using the Dice score to measure prediction agreement under isolated domain shift, as the only difference between the images within each pair is their smooth or sharp style (see Fig. 3).

For all our experiments, we use a slightly modified 2D U-Net [11]. We prefer a 2D model to a 3D one, since images in the Mosmed-1110 dataset have an 8 mm inter-slice distance, while the inter-slice distance of Covid-private images ranges from 0.8 mm to 1.25 mm. Furthermore, a 2D model shows performance almost equal to that of a 3D model for COVID-19 segmentation [3]. In all cases, we train the model for 100 epochs with a learning rate of $10^{-3}$. Each epoch consists of 100 iterations of the Adam algorithm [4]. At each iteration, we sample a batch of 2D images with a batch size of 32. Training was conducted on a computer with a 40 GB NVIDIA Tesla A100 GPU; the experiments take approximately 5 hours to complete.
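Putting the metrics together, the comparison protocol can be sketched in a few lines of Python (SciPy's `wilcoxon` provides the one-sided signed-rank test; the per-pair Dice values below are hypothetical placeholders, not measured results).

```python
import numpy as np
from scipy.stats import wilcoxon

def dice_score(x, y):
    """DSC = 2|X and Y| / (|X| + |Y|) for boolean masks x, y."""
    return 2. * np.logical_and(x, y).sum() / (x.sum() + y.sum())

# Hypothetical per-pair agreement scores for FBPAug and one competitor.
dsc_fbpaug = np.array([0.93, 0.91, 0.95, 0.90, 0.92, 0.94, 0.89, 0.96])
dsc_noise = np.array([0.60, 0.55, 0.58, 0.49, 0.63, 0.52, 0.57, 0.61])

# One-sided Wilcoxon signed-rank test on paired samples.
stat, p_value = wilcoxon(dsc_fbpaug, dsc_noise, alternative='greater')

n_comparisons = 4                               # FBPAug vs. each other method
print('significant:', p_value < 0.05 / n_comparisons)  # Bonferroni correction
```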
On the paired (smooth and sharp) Covid-private images, we observe a significant disagreement in predictions for all methods except FBPAug (the p-values of the Wilcoxon test for FBPAug versus every other method are all less than $10^{-16}$). For FBPAug and its best competitor, we plot a Bland-Altman plot comparing GGO volume estimates (Fig. 1). We can see that the predictions of the FBPAug model agree regardless of the GGO volume.

We propose a new physics-driven augmentation method that eliminates the domain shift related to the usage of different convolution kernels. It outperforms existing augmentation approaches in our experiments. We release our code, so this flexible and ready-to-use approach can be easily incorporated into any existing deep learning pipeline to ensure zero-shot domain adaptation.

References

[1] Deep learning-based image conversion of CT reconstruction kernels improves radiomics reproducibility for pulmonary nodules or masses.
[2] Normalizing computed tomography data reconstructed with different filter kernels: effect on emphysema quantification.
[3] CT-based COVID-19 triage: deep multitask learning improves joint identification and severity quantification.
[4] Adam: a method for stochastic optimization.
[5] Domain-specific cues improve robustness of deep learning-based segmentation of CT volumes.
[6] Practical window setting optimization for medical image deep learning.
[7] CT image conversion among different reconstruction kernels without a sinogram by using a convolutional neural network.
[8] Simulation of CT images reconstructed with different kernels using a convolutional neural network and its implications for efficient CT workflow.
[9] MosMedData: data set of 1110 chest CT scans performed during the COVID-19 epidemic.
[10] Image filtering as an alternative to the application of a different reconstruction kernel in CT imaging: feasibility study in lung cancer screening.
[11] U-Net: convolutional networks for biomedical image segmentation.
[12] Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks.
[13] Spatial domain filtering for fast modification of the tradeoff between image sharpness and pixel noise in computed tomography.
[14] Image reconstruction: part 1 - understanding filtered back projection, noise and image acquisition.
[15] Improving CT image tumor segmentation through deep supervision and attentional gates.
[16] Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study.