key: cord-0331671-z8qgbmnc
authors: Georgiadis, Antonios; Babbar, Varun; Silavong, Fran; Moran, Sean; Otter, Rob
title: ST-FL: Style Transfer Preprocessing in Federated Learning for COVID-19 Segmentation
date: 2022-03-25
journal: nan
DOI: 10.1117/12.2611096
sha: 91fbae39bfa1b9c424feac856faae2bd023d18a3
doc_id: 331671
cord_uid: z8qgbmnc

Chest Computed Tomography (CT) scans offer low cost, speed and objectivity for COVID-19 diagnosis, and deep learning methods have shown great promise in assisting the analysis and interpretation of these images. Most hospitals or countries can train their own models using in-house data; however, empirical evidence shows that those models perform poorly when tested on new unseen cases, surfacing the need for coordinated global collaboration. Due to privacy regulations, medical data sharing between hospitals and nations is extremely difficult. We propose a GAN-augmented federated learning model, dubbed ST-FL (Style Transfer Federated Learning), for COVID-19 image segmentation. Federated learning (FL) permits a centralised model to be learned in a secure manner from heterogeneous datasets located in disparate private data silos. We demonstrate that the widely varying data quality on FL client nodes leads to a sub-optimal centralised FL model for COVID-19 chest CT image segmentation. ST-FL is a novel FL framework that is robust in the face of highly variable data quality at client nodes. The robustness is achieved by a denoising CycleGAN model at each client of the federation that maps images of arbitrary quality to the same target quality, counteracting the severe data variability evident in real-world FL use cases. Each client is provided with the target style, which is the same for all clients, and trains its own denoiser. Our qualitative and quantitative results suggest that this FL model performs comparably to, and in some cases better than, a model that has centralised access to all the training data.
We hereby attest that this work has not been submitted for publication or presentation in any other conference.

We developed a noise-agnostic flavour of federated learning by applying style transfer preprocessing prior to federation. The ultimate goal is to achieve better COVID-19 segmentation results when the models are trained with FL. The target style is shared in the form of a dataset (e.g. 20-50 images), and each client trains its own CycleGAN transformation, which maps its local data to the common style. The benefit of this approach is that it reduces image noise in the CT scans prior to sending information via the federation. We used two publicly available COVID-19 segmentation datasets, in addition to artificial noise patterns, and demonstrate statistically significant improvements in lesion segmentation of between 5% and 40%, depending on the noise pattern.

Due to stringent privacy laws, sharing of confidential data between institutions and countries is fraught with difficulties, and is generally considered impossible. Federated learning provides a solution to this data sharing dilemma, allowing globally distributed data to remain private while still permitting a centralised neural network model to be learnt using information from all of the images held across institution and country boundaries. It solves the problem of how to learn a single model from data that is locked away in data silos without revealing per-client private data to other clients or to the central server. The clients and the aggregator share the same neural network architecture. Clients train on their local data and send gradient updates to the aggregator. The aggregator combines these updates, potentially in a cryptographically secure manner [1], updates the central model weights with the aggregated gradients, and distributes the resulting weights to all clients at the same time.
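The aggregation round described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it averages client weights rather than gradients, the function name `fedavg` and the toy parameter arrays are hypothetical, and weighting each client by its dataset size is an assumption borrowed from the usual FedAvg convention (with equally sized client datasets it reduces to a plain mean).

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Combine per-client parameter lists into one global parameter list.

    client_weights: one list of arrays per client, all with matching shapes.
    client_sizes:   number of local training samples per client (assumed weighting).
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=np.float64)
        for weights, size in zip(client_weights, client_sizes):
            # Each client contributes in proportion to its share of the data.
            acc += (size / total) * weights[layer]
        averaged.append(acc)
    return averaged

# Toy example: two clients, each holding a two-"layer" model.
clients = [
    [np.array([1.0, 2.0]), np.array([[0.0]])],
    [np.array([3.0, 4.0]), np.array([[2.0]])],
]
global_weights = fedavg(clients, client_sizes=[30, 30])
# The aggregator would now broadcast global_weights back to every client.
```

In a real deployment the arrays would be the layers of the shared segmentation network, and the averaging could be wrapped in a secure aggregation protocol so that the server never sees an individual client's update.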
Prior research has explored the benefits of federated learning for leveraging disparate datasets for the purpose of COVID-19 chest CT scan segmentation [2]. However, no previous research accounts for the differing factors of variation of CT images that are distributed across client nodes. In practice, CT images arising from different generations of CT machine can differ vastly in brightness, detail and noise level, as well as in whether a contrast-enhancing agent was used prior to the scan (contrast-enhanced vs non-contrast images). To address this issue, we assume that a small representative dataset exhibiting the most commonly encountered style can be shared with the clients, who then learn an unpaired domain mapping between the local and target domains using a CycleGAN.

Our contribution to the state-of-the-art is two-fold:
• Mixed CT image data quality & the effect on FL: Through experimentation with synthetic and semi-synthetic datasets of varying structural and stylistic features, we highlight the negative effect that images of differing quality on client nodes have on the accuracy of a federated U-Net [3] for CT image segmentation.
• Noise agnostic FL for varying noise patterns: We present ST-FL, a federated learning framework that incorporates a denoising CycleGAN at each client node, standardising image quality per client and increasing the robustness of federated learning to the mixed data quality observed in practice.

For normalising the image quality on client nodes with a CycleGAN [4], we propose two approaches:
i) Universal CycleGAN: only one denoiser is trained, at the aggregator level, and is then shared with the clients.
ii) Client-specific CycleGAN: multiple client-specific denoisers are trained at the client level.

Experimental evaluation shows that ST-FL leads to higher quality segmentation models for chest CT scan images.
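For readers unfamiliar with the unpaired mapping, the standard CycleGAN objective from the cited work on cycle-consistent adversarial networks is reproduced here as a reference point. The paper does not state which loss variant or weighting it uses, so the notation is an assumption made for illustration: X denotes a client's local image domain, Y the shared target style, G maps X to Y, F maps Y back to X, D_X and D_Y are the discriminators, and the weight λ is left unspecified.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_Y}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_X}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

Under this reading, the client-specific variant trains one pair (G, F) per client on that client's own X, whereas the universal variant trains a single pair intended to serve all clients.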
We used a number of publicly available COVID-19 segmentation datasets, which include segmentation masks generated by radiologists. We extracted a small amount of data to be our target style and used the rest for training and testing. In addition to the noise patterns already present in the datasets (e.g. discolouration, blurring, contrast), we added artificial noise (e.g. contrast enhancement, contrast inversion, Gaussian noise, mixed noise). We experimented with the Universal CycleGAN and Client-specific CycleGAN approaches and compared the thresholded segmentation results against the FedAvg scheme and a centralised model trained on style-transferred datasets.

The CycleGANs consist of a U-Net generator and a PatchGAN [5] discriminator. Prior to the actual federated training, we train both Universal and Client-specific CycleGANs for 100 epochs. For federated training, we concatenate the original and style-transferred images as the input to each client's segmentation U-Net, ensuring that client models can learn salient information from each channel. This also ensures that the performance of the model is at least comparable to FedAvg, because in the worst case the model weights can adapt to use only the original input channel. For client datasets that serve as style targets, we concatenate two copies of the same image as the input to the local segmentation model. We then train these models in a federated setting for 35 epochs. At the end of each training epoch, we aggregate their weights in a server model and broadcast them back to each client.

To test the efficacy of our scheme, we perform experiments with two different types of client datasets:
• Synthetic Dataset: We use the Coronacases [6] dataset of COVID-19 patient chest scans, both in its vanilla form and in an augmented form wherein each client dataset represents a different noise pattern added to the dataset (inversion, Gaussian, contrast enhanced, mixed, etc.).
This scenario models a situation where client institutions may have chest scans with similar structural characteristics but differing style characteristics.
• Semi-Synthetic Dataset: We use the Coronacases and MedSeg [7] datasets as client datasets and augment them with noise patterns similar to those above to create additional client datasets. Compared to the Coronacases dataset, the MedSeg dataset was seen to have noisy labels and some structural and stylistic differences that can potentially hinder effective training.

In this paper, we consider the scenario where client datasets are of similar size. Because the Coronacases and MedSeg datasets are of different sizes (30 and 100 images respectively), we fix $|D_k| = 30$ for all clients. For both approaches, we add random warping to all client images, not only for data augmentation but also to add some variability between client datasets. This also ensures that the CycleGAN is able to learn unpaired mappings between the original and target styles and becomes agnostic to any structural similarity between images. After warping, we keep aside 20% of each client dataset as a validation set $D_k^{val}$ and calculate the average Dice and IoU scores on the union of all client validation sets, $D^{val} = \bigcup_{k=1}^{N} D_k^{val}$. To understand which noise patterns the CycleGAN-based approaches perform well on, we calculate the average performance improvement on the union of training and validation sets for each client (Figure 1).

Table 1 shows metric scores of the different schemes tested across clients and dataset types. For all dataset types and numbers of clients, we observe that the client-specific CycleGAN preprocessing scheme outperforms its federated learning counterparts.
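The evaluation protocol described above, Dice and IoU averaged over the pooled client validation sets, can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names, the 0.5 threshold on predicted probabilities, the smoothing constant `eps` and the choice to average per image are all assumptions.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    """Intersection over union between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

def evaluate_union(client_val_sets, threshold=0.5):
    """Average Dice and IoU over the union of all client validation sets.

    client_val_sets: one list per client of (probability_map, mask) pairs.
    """
    pooled = [pair for client in client_val_sets for pair in client]
    dice = [dice_score(prob > threshold, mask) for prob, mask in pooled]
    iou = [iou_score(prob > threshold, mask) for prob, mask in pooled]
    return float(np.mean(dice)), float(np.mean(iou))

# Toy example: two clients with one validation image each.
mask = np.array([[1, 0], [0, 0]])
client_a = [(np.array([[0.9, 0.8], [0.1, 0.2]]), mask)]  # one false positive
client_b = [(np.array([[0.9, 0.1], [0.1, 0.2]]), mask)]  # exact match
mean_dice, mean_iou = evaluate_union([client_a, client_b])
```

Pooling before averaging means every validation image counts equally, which matches the equal client dataset sizes used in the experiments.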
For synthetic datasets with a high degree of structural similarity, the centralised training scheme can be viewed as an upper bound on segmentation performance relative to all federated learning schemes, with the client-specific CycleGAN scheme coming closest to this bound. For semi-synthetic datasets, on the other hand, we observe no discernible pattern in the segmentation performance of centralised training, with the client-specific CycleGAN scheme outperforming it in all cases. Intuitively, this is because the centralised model is trained on datasets of varying structural similarity and with noisy labels, making it harder for the weights to generalise across datasets. The client-specific CycleGAN is more robust to this, leading to improved performance across all noise patterns tested. We also see that universal CycleGAN preprocessing offers lower and less consistent performance gains on average than the client-specific CycleGAN scheme. This is because the style-transferred output from the Universal CycleGAN was often of poor quality, especially in situations involving large numbers of clients, as the model is unable to adapt to differing client distributions.

Table 1: Performance metrics for the methods tested for differing numbers of clients. We report the best metric averaged over 5 trials and its associated 95% confidence interval.

Figure 1 shows a bar plot of the performance gains of all the schemes tested relative to the vanilla FedAvg scheme. Here, we averaged the percentage performance improvement for each noise pattern across the differing numbers of clients tested, over 5 runs. We note that the performance of the CycleGAN-related schemes varies across the noise patterns tested. Specifically, for both the semi-synthetic and synthetic datasets, applying style transfer preprocessing to images corrupted by Gaussian noise does not produce a meaningful improvement in Dice scores.
This is likely due to information loss in the images that is not necessarily corrected by unpaired style transfer. Conversely, we see significant performance gains for the inversion and mixed noise patterns in both dataset types, though these gains are larger in the synthetic case. Intuitively, this is because averaging the weights of models trained on structurally dissimilar client datasets likely limits the benefits of style transfer, which can only correct for noise distribution shifts and not structural shifts.

This paper presents a novel preprocessing method for performing federated learning in a noise-agnostic manner, with a focus on the segmentation of lesions in COVID-19 patient chest scans. Medical datasets in a federated learning setup tend to vary in contrast, noise, brightness and detail, motivating the need for a common normalisation scheme that renders federated systems agnostic to noise. We explored the idea of using style-transfer-based preprocessing on client datasets in two scenarios: a) varying noise patterns but common structure, and b) varying noise patterns and varying structure. Our work suggests that style transfer preprocessing leads to higher Dice scores in downstream segmentation tasks on average in both cases. We characterised the performance of our method on some common noise patterns in medical datasets and found disparities in performance, with some noise patterns showing much greater improvement in segmentation performance than others. Future work could explore this technique in settings where client datasets are unbalanced and/or of unequal size, and further characterise its noise-specific performance.
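The disparity between noise patterns can be made concrete with a toy example. The sketch below is only an illustration of the information-loss argument, not a reproduction of the paper's augmentation pipeline: the function names, the noise level `sigma` and the random stand-in image are assumptions. It shows that contrast inversion is exactly reversible by a fixed pixel-wise map, whereas additive Gaussian noise leaves a random residual that no fixed map can remove.

```python
import numpy as np

def invert(img):
    """Contrast inversion for an image scaled to [0, 1]."""
    return 1.0 - img

def add_gaussian_noise(img, sigma, rng):
    """Additive Gaussian noise, clipped back to the valid intensity range."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)

rng = np.random.default_rng(0)
scan = rng.random((8, 8))  # stand-in for a CT slice scaled to [0, 1]

# Inversion: applying the same map again recovers the original image.
inverted = invert(scan)
restored = invert(inverted)
inversion_error = float(np.abs(restored - scan).max())

# Gaussian noise: the corruption differs per pixel and per draw, so the
# clean intensities cannot be read back from the noisy image alone.
noisy = add_gaussian_noise(scan, sigma=0.2, rng=rng)
noise_error = float(np.abs(noisy - scan).max())
```

A learned style mapping can in principle approximate the inverse of a deterministic transform such as inversion, which is consistent with the larger gains reported for that pattern, while for Gaussian noise it can at best estimate the underlying image.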
1. Practical secure aggregation for privacy-preserving machine learning. ACM SIGSAC Conference on Computer and Communications Security.
2. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan.
3. U-Net: Convolutional networks for biomedical image segmentation.
4. Unpaired image-to-image translation using cycle-consistent adversarial networks.
5. Image-to-image translation with conditional adversarial networks.
6. COVID-19 CT lung and infection segmentation dataset.
7. COVID-19 CT scan image data and segmentation dataset. Free to download.