Impact of Spherical Coordinates Transformation Pre-processing in Deep Convolution Neural Networks for Brain Tumor Segmentation and Survival Prediction

Carlo Russo, Sidong Liu, Antonio Di Ieva

Date: 2020-10-27

Abstract

Pre-processing and data augmentation play an important role in Deep Convolutional Neural Networks (DCNN). While several methods aim at standardizing and augmenting the dataset, we here propose a novel method that feeds the DCNN with input data transformed into spherical space, which could better facilitate feature learning compared to standard Cartesian-space images and volumes. In this work, the spherical coordinates transformation is applied as a pre-processing method that, used in conjunction with normal MRI volumes, improves the accuracy of brain tumor segmentation and of patient overall survival (OS) prediction on the Brain Tumor Segmentation (BraTS) Challenge 2020 dataset. The LesionEncoder framework was then applied to automatically extract features from the DCNN models, achieving an OS prediction accuracy of 0.586 on the validation dataset, one of the best results on the BraTS 2020 leaderboard.

Introduction

Magnetic Resonance Imaging (MRI) is used in everyday clinical practice to assess brain tumors. However, manually segmenting each volume to delineate the extension of the tumor is time-demanding and operator-dependent, as it is often non-reproducible and depends upon the neuroradiologist's expertise. Several automatic or semi-automatic segmentation algorithms have been introduced to help segment brain tumors, and Deep Convolutional Neural Networks (DCNN) have recently shown very promising results. To further improve the accuracy of automatic methods, the Multimodal Brain Tumor Segmentation (BraTS) challenge [1-3] is organized annually within the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The BraTS 2020 challenge includes a task for the automatic segmentation of the total area containing the tumor (Whole Tumor, WT), as well as of the necrosis and active tumor cell areas (Tumor Core, TC; Enhancing Tumor, ET; necrosis and ET are contained in TC). Furthermore, glioma patients often have a dire survival prognosis following surgical resection and radiochemotherapy [4]. Thus, a further task to predict patient overall survival (OS) has been added to the challenge, aimed at improving the prediction of patient survival outcome in order to add information that is relevant to the decision-making process.

DCNNs are data-driven algorithms: they require huge amounts of data to obtain good results. In medical imaging, such big datasets are often unavailable, so pre-processing and data augmentation play an important role. While pre-processing methods are usually used to standardize input data, they can also be used to enhance meaningful data inside the original input images; an example is cropping the region of interest when the input data include large amounts of redundant and misleading information. Therefore, we propose a novel spherical space transformation method to enhance information at specific points of the tumor, as well as to make the DCNN learning process invariant to rotation and scaling of the input images.
Furthermore, we extended the use of lesion features extracted from the latent space of the segmentation models through the LesionEncoder framework, which replaces the classic imaging/radiomic features (such as volumetric, intensity, morphologic, histogram-based and textural features) that have shown high predictive power in patient OS prediction.

Dataset

The dataset consists of four MRI sequences used to determine the segmentation and extract survival features, namely T1-weighted, post-contrast T1, T2-weighted and FLAIR images. The training dataset contains 336 four-channel volumes with ground-truth segmentation. The validation dataset is composed of data from 125 patients [5, 6], and the testing dataset of an additional 166 patients.

The DCNN that we chose as the baseline for our method is derived from Myronenko [7] and is based on a Variational Auto-Encoder (VAE) U-Net, with the input shape and loss function adjusted according to the type of transformation used in the pre-processing phase. The VAE proposed by Myronenko is composed of a U-Net with two decoder branches: a segmentation decoder branch, used to obtain the final segmentation, and an additional decoder branch that reconstructs the original volumes, used to regularize the shared encoder. Following [7], the loss function is given by:

L = L_dice + 0.1 * L_L2 + 0.1 * L_KL

where L_L2 is the L2 loss on the VAE branch and L_KL is the KL divergence penalty term.

We trained different models by changing the pre-processing method (Cartesian or spherical) and some layer hyperparameters. Although the models share the same VAE structure proposed by Myronenko, there are a few differences. More specifically, Cartesian_v1 includes standard Dropout with rate 0.2, 3x3x3 kernels in the convolution layers outside the Green Blocks, and an additional 3x3x3 convolution layer before the Blue Block, while Cartesian_v2 uses SpatialDropout3D with the same rate, 1x1x1 convolution filters, and no additional layers. The Spherical model has the same structure as Cartesian_v1, but with the spherical transformation pre-processing applied to its inputs. The spherical model based on the Cartesian_v2 structure had not been trained by the 2020 challenge deadline. A bisCartesian model was also trained using the structure of the Cartesian_v1 model but with a lower coefficient on the KL loss, set to 0.0001. This model did not give better segmentation results, although it showed improved results on the OS task.

Our team previously presented the spherical coordinate transformation pre-processing as a method to improve segmentation results [8]. A spherically transformed volume is shown in Figure 1.

Figure 1. Example of the representation of a radiologic volume in a spherical coordinate system. A) A brain MRI volume with its 3D segmentation of the tumor, and B) the same volume transformed into a spherical coordinate system using the center of the volume as the origin.

Each pre-processed volume uses an origin point. Thus, to achieve good training performance, it is important to correctly select origin points located within the tumor. For this reason, we used a cascade of DCNNs: the first one predicts a coarse segmentation, which is then refined using origin points selected within the previous model's segmentation. The first-pass model of the cascade could also be a model trained on non-transformed input (a Cartesian model), but using the Spherical model already in the first pass of the cascade allowed its weights to be reused as pre-trained weights for the next training steps.
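To make this pre-processing step concrete, the sketch below shows one way to resample a single MRI channel from Cartesian to spherical coordinates around a given origin point. This is a minimal illustration, not the authors' implementation: the output grid shape, the equiangular sampling, the choice of maximum radius and the function name are all our own assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def cartesian_to_spherical(volume, origin, out_shape=(128, 128, 128)):
    """Resample a 3D volume onto a regular (r, theta, phi) grid
    centered at `origin` (given in voxel coordinates).

    Grid shape and sampling scheme are illustrative assumptions.
    """
    origin = np.asarray(origin, dtype=float)
    # Radius large enough to reach the farthest corner of the volume.
    r_max = np.linalg.norm(np.maximum(origin, np.array(volume.shape) - origin))
    r = np.linspace(0.0, r_max, out_shape[0])
    theta = np.linspace(0.0, np.pi, out_shape[1])       # polar angle
    phi = np.linspace(0.0, 2.0 * np.pi, out_shape[2])   # azimuthal angle
    R, T, P = np.meshgrid(r, theta, phi, indexing="ij")
    # Spherical -> Cartesian voxel coordinates relative to the origin.
    x = origin[0] + R * np.sin(T) * np.cos(P)
    y = origin[1] + R * np.sin(T) * np.sin(P)
    z = origin[2] + R * np.cos(T)
    # Trilinear interpolation; voxels outside the volume default to 0.
    return map_coordinates(volume, np.stack([x, y, z]),
                           order=1, mode="constant", cval=0.0)
```

In the cascade described above, each of the four MRI channels would be transformed independently, with the origin point chosen inside the tumor, e.g., within the coarse segmentation produced by the first pass.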
The spherical coordinate transformation also acts as an extreme augmentation. This is beneficial, as it adds rotation and scaling invariance to the DCNN model. However, such invariance also has a drawback, especially for WT segmentation: WT segmentation works better with a Cartesian model, whereas spherical pre-processing adds many false-positive regions to the WT. Thus, we used a Cartesian model to filter out the false-positive regions found with spherical pre-processing, as shown in Figure 2. We used this proposed method for the first time in a BraTS challenge, and the improvement in accuracy of the model trained on transformed input over the baseline model was similar to that reported in our original paper. We also tested intersecting the segmentations on all three classes (Spherical-Cartesian intersection 3CH) instead of filtering only the WT class. Finally, we ensembled the best segmentations from Cartesian_v2 and the intersection method to improve the results further.

After filtering the spherical segmentation with the Cartesian filter, we used a post-processing method to improve ET segmentation. We noticed that many false-positive ET segmentations are due to isolated voxels. For this reason, we applied a binary opening operator to strip thin branches from ET spots and then filtered out the spots with fewer than 30 voxels (a sketch of this filter is given at the end of this section). If any ET segmentation remains after these filters, the original ET segmentation is restored and used as the final one; otherwise, the ET segmentation is completely erased, meaning that no ET is present in the current volume.

Table 1 summarizes the segmentation results on the validation dataset. The most promising methods tested so far were the Cartesian_v2 and Spherical models: used alone, without post-processing, Cartesian_v2 gave the best results for WT and TC segmentation, while the Spherical model worked better for ET segmentation. The ensemble of the models further improved the segmentation of the ET class, while the best results for the WT and TC classes were still obtained with the Cartesian_v2 model alone. The largest overall improvement on ET was obtained by post-processing the segmentation of the intersected Cartesian and Spherical models on the three channels, even though the TC Dice score decreased and the WT score did not improve further. The results of the final model on the testing dataset, shown in Table 2, seem to confirm good accuracy, above all for ET segmentation, although a comparison with our other models is not possible, since the challenge allows only one method to be tested on the test dataset.
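As a concrete illustration of the ET post-processing filter described above, the following is a minimal sketch using SciPy. The 30-voxel threshold comes from the text; the 3x3x3 structuring element and the function name are our own assumptions.

```python
import numpy as np
from scipy import ndimage

def postprocess_et(et_mask, min_voxels=30):
    """Suppress false-positive ET predictions made of isolated voxels.

    If any ET component survives the opening + size filter, the original
    (unfiltered) mask is restored; otherwise ET is erased entirely.
    """
    # Morphological opening strips thin branches and isolated voxels.
    # NOTE: the 3x3x3 structuring element is an assumption.
    opened = ndimage.binary_opening(et_mask, structure=np.ones((3, 3, 3)))
    # Measure the size of each remaining connected component.
    labels, n_components = ndimage.label(opened)
    sizes = ndimage.sum(opened, labels, index=np.arange(1, n_components + 1))
    # Keep the original ET mask only if a large-enough spot survives.
    if n_components > 0 and sizes.max() >= min_voxels:
        return et_mask
    return np.zeros_like(et_mask)
```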
Our team also participated in Task 2 of the BraTS challenge: the prediction of patient overall survival (OS) from pre-operative MRI scans. Instead of using pre-defined imaging/radiomic features, such as volumetric, intensity, morphologic, histogram-based and textural features, we used features automatically extracted from the MRI scans using the novel LesionEncoder (LE) framework [9]. The LE features were further processed using Principal Component Analysis (PCA) to reduce dimensionality, and then used as input to a generalized linear model (GLM) [10] to predict patient OS.

The LE framework was proposed in a recent work on COVID-19 severity assessment and progression prediction [9]. The original LE adopted the U-Net structure [11], which consists of an encoder and a decoder, based on EfficientNet [12]. While the encoder learns and captures the lesion features in the input images, the decoder maps the lesion features back to the original image space and generates the segmentation maps. The features learnt by the encoder in the latent space encapsulate rich information about the lesions and can therefore be used for lesion segmentation, as well as for other tasks such as classification and prediction.

In this study, we used the VAE as the backbone of the LE. As described in the previous section, three different configurations were applied to the VAE model, resulting in three different lesion encoders: LE_Cartesian (Cartesian_v2), LE_Spherical (Spherical) and LE_bisCartesian (bisCartesian). The latent variables of the input MRI scans extracted by the individual lesion encoders were then used as the features to predict patient OS. For each MRI scan, a high-dimensional feature vector (d = 256) was derived. As the high-dimensional feature space tended to lead to overfitting, we used PCA to control the feature dimensionality by setting different numbers of principal components (d' in [2, 60] in this study) for further analysis.

Figure 3 shows the joint age and OS distribution of the patients in the training cohort. The age distribution, shown at the top of the figure, appears approximately normal. The OS distribution, on the right side of the figure, is heavily skewed, with the majority of cases having an OS of less than 400 days. To model this long-tailed distribution of OS values, we used a Tweedie distribution [13], a special case of the exponential dispersion models whose skewness can be controlled by a power parameter (p in [1.1, 1.9] in this study). A GLM [10] based on the Tweedie distribution, i.e., a Tweedie regressor [13], was built to predict the OS values; it was implemented using scikit-learn (v0.23.2). As resection status and age are essential predictors of OS, both were merged with the LesionEncoder features as input to the Tweedie regressor for OS prediction.

Two evaluation schemes were used to assess the prediction performance. The results were first evaluated based on the accuracy of classifying subjects as long-survivors (>15 months / 450 days), short-survivors (<10 months / 300 days), and mid-survivors (survival between 10 and 15 months / 300-450 days). In addition, a pairwise error analysis between the predicted and actual OS (in days) was performed, evaluated using the following metrics: mean square error (MSE), median square error (median SE), standard deviation of the square errors (std SE), and the Spearman correlation coefficient (Spearman R).

Among the 235 patients in the training set, 118 underwent surgical gross total resection (GTR) and 10 underwent subtotal resection (STR); in 107 cases, no information about the resection status is available. All 29 subjects in the validation set had a GTR resection status. The extent of resection was considered in the model as it has been shown to correlate with post-surgical outcome [14].

Cross-Validation on the Training Set

We used 5-fold cross-validation to train and validate the proposed method. In each fold, an internal validation set (20%) was split from the dataset, with the remaining 80% used as the training set. For each of the three lesion encoders, i.e., LE_Cartesian, LE_Spherical and LE_bisCartesian, this process was repeated 5 times, leading to 5 different sub-models.
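To illustrate this OS prediction pipeline, the sketch below implements the cross-validated PCA + Tweedie GLM stage with scikit-learn. It assumes the LE features are already extracted into an array of shape (n_patients, 256); the numeric encoding of resection status, the fold seed and the helper names are our own assumptions, and we plug in d' = 10 and p = 1.6, the optimal values reported for LE_Cartesian in the discussion below.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import KFold

def cross_validate_os(le_features, age, resection, os_days,
                      n_components=10, power=1.6):
    """5-fold CV: PCA-reduced LesionEncoder features, merged with age
    and (numerically encoded) resection status, feed a Tweedie GLM
    that predicts OS in days."""
    preds = np.zeros(len(os_days))
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, val_idx in kfold.split(le_features):
        # Fit PCA on the training fold only, to avoid leakage.
        pca = PCA(n_components=n_components).fit(le_features[train_idx])
        def design(idx):
            return np.column_stack([pca.transform(le_features[idx]),
                                    age[idx], resection[idx]])
        glm = TweedieRegressor(power=power, max_iter=1000)
        glm.fit(design(train_idx), os_days[train_idx])
        preds[val_idx] = glm.predict(design(val_idx))
    return preds

def survival_class(days):
    """Map OS in days to BraTS classes: 0 short (<300), 1 mid, 2 long (>450).
    Exact-boundary handling follows np.digitize's convention."""
    return np.digitize(days, [300, 450])
```

Accuracy under the first evaluation scheme is then the fraction of patients for whom survival_class(predicted) matches survival_class(actual).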
Figure 4 illustrates the projected feature space of the features extracted using LE_Spherical (a), and the scatter plots (b, c) of the predicted vs. actual OS of the training samples. There was high variance in the performance of the sub-models (from 0.362 to 0.574 in accuracy). The prediction results of the 5 sub-models were further aggregated, and the results are summarized in Table 3. LE_Cartesian achieved the highest accuracy (0.494) and Spearman R (0.429), while LE_Spherical had the lowest MSE, median SE and std SE. These two models outperformed LE_bisCartesian; however, the differences were not substantial (<0.034 in accuracy).

Prediction Performance on the Validation Set

The 5 sub-models of each configuration were then applied to the official validation set (n=29), and the 5 predictions for each case were averaged to derive the final prediction. The results of the three models with different configurations are summarized in Table 4. The LE_bisCartesian model achieved the highest accuracy (0.552); however, its MSE and std SE were higher, and its Spearman R lower, than those of the other models. The LE_Cartesian model had the lowest MSE and the highest Spearman R, showing a better representation of the overall distribution of the OS values. These findings indicate a complementary nature of the different models; therefore, we combined their outputs to test whether the prediction performance could be improved further. Four combinations were tested, all showing equal or better accuracy (between 0.552 and 0.586) compared to the individual models.

Our final submission for OS prediction on the validation dataset, based on the M1&M2 model, ranked 4th in accuracy among the 42 participating teams. It also ranked 5th in both MSE and Spearman R, 8th in median SE, and 10th in std SE (checked on 23 October 2020). We further applied the M1&M2&M3 model to the official test dataset (n=107). The model's performance, shown in Table 5, was lower on the test dataset than on the validation dataset, implying a marked difference between the two datasets and overfitting of the model. However, without knowing the results of other models, either our own or those of other participating teams, it is difficult to determine whether this performance drop is caused by a less representative training dataset, a less generalizable model, or both.

Spherical Coordinate Transformation Pre-processing

Spherical coordinate transformation pre-processing of the input dataset contributes to exploring the data in a different way, thus changing the learning process and yielding different features compared to the classical DCNN learning process. These different features can help improve the segmentation process, as well as contribute to the deep feature extraction used in patient OS prediction. Even though the spherical pre-processing method improves the baseline model's results, simple post-processing methods also have a strong impact on segmentation accuracy. However, the overall segmentation results obtained by this method are not among the best compared to those of other teams on the BraTS 2020 leaderboard, and additional effort is needed to fine-tune both the Cartesian and spherical training phases. The LesionEncoder framework extends the use of lesion features beyond conventional lesion segmentation.
There is a wealth of information in brain tumors, including the shape, texture, location, extent and distribution of involvement of the abnormality, that can be extracted by the lesion encoder. While the LE framework has previously been demonstrated in COVID-19 progression prediction [9] and severity assessment [15], here we demonstrated a new application of LE in patient OS prediction. It may have strong potential in a wide range of other clinical and research applications, e.g., brain tumor pseudo-progression detection [16] and ophthalmic disease screening [17].

Various dimensionality reduction methods were tested in this study, including PCA, Independent Component Analysis (ICA) and t-distributed stochastic neighbor embedding (t-SNE). In the training phase, PCA was found to have lower variability in accuracy than the other methods; as a result, it was chosen to process the high-dimensional features. We used a linear search strategy to optimize the two most important parameters of the OS prediction model: the number of principal components in PCA (d' in [2, 60]) and the power of the Tweedie distribution (p in [1.1, 1.9]). The optimal parameters were (d' = 10, p = 1.6) for LE_Cartesian, and (d' = 3, p = 1.6) for both LE_Spherical and LE_bisCartesian. In addition, it will be important to demonstrate the scale invariance of the Tweedie regressor across different datasets in our future work.

In conclusion, we have introduced a novel and very promising method to pre-process brain tumor MR images by means of a spherical coordinates transformation, to be used in DCNN models for brain tumor segmentation. The LesionEncoder framework has been applied to automatically extract imaging features from the DCNN models, demonstrating good performance on the survival prediction task.

References

[1] The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS).
[2] Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.
[3] Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge.
[4] The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary.
[5] Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-GBM collection.
[6] Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection.
[7] 3D MRI Brain Tumor Segmentation Using Autoencoder Regularization. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (BrainLes).
[8] Spherical Coordinates Transform Pre-processing in Deep Convolution Neural Networks for Brain Tumor Segmentation in MRI.
[9] Severity Assessment and Progression Prediction of COVID-19 Patients based on the LesionEncoder Framework and Chest CT. medRxiv.
[10] Generalized Linear Models, Second Edition.
[11] U-Net: Convolutional Networks for Biomedical Image Segmentation.
[12] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
[13] The Theory of Exponential Dispersion Models and Analysis of Deviance. Monografias de Matemática.
[14] Evidence for Improving Outcome Through Extent of Resection.
[15] Severity Assessment of COVID-19 based on Clinical and Imaging Data. medRxiv.
[16] A Deep Learning Methodology for Differentiating Glioma from Radiation Necrosis using Multimodal MRI: Algorithm Development and Validation.
[17] A Deep Learning based Algorithm Identifies Glaucomatous Discs using Monoscopic Fundus Photos.