key: cord-023038-p9w9fwak authors: Fang, Mengjie; He, Bingxi; Li, Li; Dong, Di; Yang, Xin; Li, Cong; Meng, Lingwei; Zhong, Lianzhen; Li, Hailin; Li, Hongjun; Tian, Jie title: CT radiomics can help screen the Coronavirus disease 2019 (COVID-19): a preliminary study date: 2020-04-15 journal: Sci DOI: 10.1007/s11432-020-2849-3 sha: doc_id: 23038 cord_uid: p9w9fwak

The Coronavirus disease 2019 (COVID-19) is raging across the world. Radiomics, which extracts large numbers of features from medical images for disease diagnosis, may help screen for COVID-19. In this study, we aim to develop a radiomic signature to screen for COVID-19 from CT images. We retrospectively collect 75 pneumonia patients from Beijing Youan Hospital, including 46 patients with COVID-19 and 29 patients with other types of pneumonia. These patients are randomly divided into a training set (n = 50) and a test set (n = 25). We segment the lung lesions from the CT images and extract 77 radiomic features from the lesions. Then unsupervised consensus clustering and multiple cross-validation are used to select the key features associated with COVID-19. In the experiments, twenty-three radiomic features are found to be highly associated with COVID-19, and four key features are selected as the inputs of a support vector machine to build the radiomic signature. We use the area under the receiver operating characteristic curve (AUC) and the calibration curve to assess the performance of our model. It yields AUCs of 0.862 and 0.826 in the training set and the test set, respectively. We also perform a stratified analysis and find that its predictive ability is not affected by gender, age, chronic disease, or degree of severity. In conclusion, we investigate the value of radiomics in screening for COVID-19, and the experimental results suggest that the radiomic signature could be a potential tool for the diagnosis of COVID-19.
The novel coronavirus disease 2019 (COVID-19) broke out in Wuhan at the end of 2019 [1, 2]. It has rapidly spread to other provinces in China and to other countries [3, 4]. COVID-19 causes typical viral pneumonia with severe acute respiratory syndrome, with a much higher mortality rate than the flu [5, 6]. During screening, it is difficult to discriminate COVID-19 from other types of pneumonia such as flu-related pneumonia, bacterial pneumonia, and mycoplasma pneumonia [7]. Mildly ill patients are common among those infected with COVID-19, making screening even more challenging [8, 9]. In clinical practice, nucleic acid testing of sputum or other clinical samples from suspected patients is used to screen for COVID-19 [10, 11]. However, false negatives in nucleic acid testing lead to many missed diagnoses, which raises the risk of spreading [12]. CT imaging is also applied for the diagnosis of COVID-19 in patients with pneumonia [13, 14]. The radiological finding of ground-glass opacity is the typical sign on CT [15, 16]. However, a radiological finding is a subjective evaluation that relies on the diagnostic experience of the radiologist. Meanwhile, COVID-19 shows different radiological signs as it progresses [16]. Therefore, a rapid quantitative method would be helpful for the screening of COVID-19. Radiomics, a quantitative analysis technology based on medical imaging, has been widely used in oncology research [17, 18, 19]. Previous radiomics studies showed that quantitative radiomic features could represent changes at the pathological and genetic levels, and thus had encouraging performance in cancer diagnosis, treatment outcome prediction, and prognosis prediction [20, 21, 22, 23].
Besides its application in cancer, radiomics has also been used in heart diseases, such as hypertrophic cardiomyopathy [24] and coronary heart disease [25]. Furthermore, it has been applied to differentiate primary progressive pulmonary tuberculosis from community-acquired pneumonia in children [26]. Radiomics might therefore provide a potential tool for the screening of COVID-19. To our knowledge, there is no reported research on radiomics in COVID-19. In this retrospective study, we intend to apply CT radiomics to the screening of COVID-19. We focus on discriminating the pneumonia caused by COVID-19 from other pneumonias.

The remainder of this paper is organized as follows. We describe the dataset and method in Section 2. In Section 3, we present the results of our method and the stratification analysis. In Section 4, we discuss the clinical value of our method and future research directions.

The radiomics workflow of this study includes retrospective data collection, lung lesion segmentation, image preprocessing, feature extraction, feature selection and signature construction, and performance evaluation.

Ethical approval is obtained for this retrospective analysis, and the informed consent requirement is waived. The data are collected from Beijing Youan Hospital. The COVID-19 pneumonia cases were collected between December 25, 2019 and February 5, 2020. The other types of pneumonia were collected before October 2019. In total, 46 COVID-19 pneumonia cases and 29 cases of other types of pneumonia are enrolled in this study. The exclusion criteria include (a) substantial motion artifacts in CT images; (b) small or inconspicuous lesions that could not be identified by CT; (c) missing baseline clinicopathological data; (d) a large interval (> 1 week) between CT scan and pathological diagnosis. All patients are randomly divided into a training set (n = 50) and a test set (n = 25).
Chest CT examinations are performed for all patients with a 256-section scanner (Philips Brilliance iCT; Philips, the Netherlands). The CT protocol is as follows: 120 kV; automatic tube current (100-400 mA); slice thickness, 0.9-5 mm; collimation, 0.625 mm; pitch, 0.914; matrix, 512 × 512; breath hold at full inspiration. The images are photographed at lung (window width, 1500 Hounsfield units (HU); window level, −500 HU) and mediastinal (window width, 350 HU; window level, 50 HU) settings. The scanning range is from the thoracic inlet to the posterior costal angle. We retrieve CT images from the picture archiving and communication system (PACS).

To reduce the impact of the different CT slice thicknesses, each thin-slice scan is transformed into a simulated thick-slice scan by fusing adjacent slices. The lung region is automatically extracted with a threshold of HU = −300. Then, the slice with the largest lesion area is selected, and the 2-dimensional region of interest (ROI) covering the entire lesion area is segmented manually using ITK-SNAP software (version 3.6.0; see footnote 1). We use bicubic resampling to standardize the in-slice image scale, resulting in a pixel size of 0.5 mm × 0.5 mm.

Firstly, 3 morphological features are calculated based on the ROI. Then, 14 intensity-based statistical features, 12 gray-level co-occurrence matrix (GLCM) features and 11 gray-level run-length matrix (GLRLM) features are generated from the original CT image and from the smoothed image, giving 3 + 2 × (14 + 12 + 11) = 77 features in total. We refer to the image biomarker standardisation initiative (IBSI) to standardize the feature extraction algorithms. The radiomic features are normalized to z-scores based on the parameters calculated in the training set. The feature extraction is performed in MATLAB 2017a (Mathworks, Natick, MA, USA) using an in-house developed tool-box. The training set is utilized for feature selection and signature building.
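As an illustration of the normalization step, the following minimal Python sketch computes the z-score parameters on the training set only and reuses them for the test set. It is not the in-house MATLAB tool-box used in this study: the function names `fit_zscore` and `apply_zscore`, the use of the sample standard deviation (`ddof=1`), and the synthetic feature matrices are all assumptions made for illustration.

```python
import numpy as np

# Feature count stated in the text: 3 morphological features plus
# (14 intensity + 12 GLCM + 11 GLRLM) features from two images.
N_FEATURES = 3 + 2 * (14 + 12 + 11)  # 77

def fit_zscore(train_features):
    """Per-feature mean and standard deviation, estimated on the training set."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0, ddof=1)  # ddof=1 is an assumption
    std = np.where(std == 0, 1.0, std)        # guard against constant features
    return mean, std

def apply_zscore(features, mean, std):
    """Normalize any set (training or test) with the training-set parameters."""
    return (features - mean) / std

# Synthetic stand-in data with the set sizes reported in the paper.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(50, N_FEATURES))
test = rng.normal(loc=5.0, scale=2.0, size=(25, N_FEATURES))

mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)  # the test set never influences mean/std
```

Estimating the parameters on the training set alone keeps information from the test set out of the model-building process, which is the reason for separating the fit and apply steps.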
Firstly, we implement consensus clustering, an unsupervised process widely used in sample grouping, to reduce the redundancy of the radiomic feature set. We increase the cluster number starting from two. For each cluster number, the feature set is bootstrapped 10000 times with a sampling proportion of 80%, and hierarchical clustering is implemented repeatedly to obtain the consensus indexes of the features. The distance metric is set to (1 − Spearman correlation). In each cluster, the feature yielding the highest intra-cluster average consensus index is identified as the medoid feature. The similarities between the medoid and the other intra-cluster features are estimated using the Spearman correlation and are called the intra-cluster correlation coefficients. In this study, we select the cluster number at which the intra-cluster correlation coefficients of all the clusters are larger than 0.8.

Then, we construct the support vector machine (SVM) with a radial basis function kernel using the medoid features as the candidate inputs. To reduce the potential risk of overfitting, we conduct a forward selection step to determine the final feature combination, whose size is limited to within 1/5 of the number of patients in the smallest group. The evaluation criterion is set to the average accuracy in multiple 3-fold cross-validation. The optimal hyperparameters of the SVM (i.e., the penalty factor and the kernel function parameter) are also selected. Finally, based on the whole training set, the SVM model is trained as the radiomic signature.

We utilize the radiomic heatmap, in which unsupervised clustering is implemented on the radiomic features and on the patients respectively, to identify the obvious radiomic expression patterns and to reveal the association between the patterns and the pneumonia types. Univariate analysis is used to assess the relationship between patients of different sets.
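The forward selection step described above can be sketched in Python with scikit-learn. This is a hedged illustration rather than the authors' implementation: the function name `forward_select`, the fixed hyperparameters (`C=1.0`, `gamma="scale"`), the number of cross-validation repeats, the stopping rule (stop when accuracy no longer improves), and the synthetic data are assumptions. In the study the penalty factor and kernel parameter are tuned as well, which is omitted here for brevity.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

def forward_select(X, y, max_features, n_repeats=5, seed=0):
    """Greedy forward selection scored by mean accuracy over repeated 3-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=n_repeats, random_state=seed)
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            model = SVC(kernel="rbf", C=1.0, gamma="scale")  # fixed values: assumption
            scores[j] = cross_val_score(
                model, X[:, cols], y, cv=cv, scoring="accuracy"
            ).mean()
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no candidate improves the cross-validated accuracy
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score

# Synthetic stand-in for 17 z-scored medoid features of 50 training patients.
rng = np.random.default_rng(0)
y = np.array([1] * 30 + [0] * 20)  # hypothetical class sizes
X = rng.normal(size=(50, 17))
X[:, 0] += 3.0 * y                 # two informative features, for illustration
X[:, 3] += 2.0 * y

# Cap the signature size at 1/5 of the smallest group, as stated in the text.
max_features = int(np.bincount(y).min() // 5)
selected, best_score = forward_select(X, y, max_features)
```

The cap on the number of features ties the model complexity to the size of the smaller class, which is a common rule of thumb for limiting overfitting in small cohorts.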
Differences between the groups are assessed using the independent t-test or Mann-Whitney U test for continuous variables and Fisher's exact test or chi-square test for categorical variables. Receiver-operating characteristic (ROC) curves are plotted for the features and the radiomic signature to assess their predictive performance and are compared using the Delong test. The area under the curve (AUC) of the ROC curve is obtained. The calibration curve is plotted to assess the calibration of the radiomic signature and is accompanied by the Hosmer-Lemeshow test. The stratification analysis is performed on gender, age, with/without chronic disease, and degree of severity to evaluate the association of the radiomic signature with COVID-19 in different clinical subgroups.

1) www.itksnap.org.

Figure 1 (Color online) Radiomic heatmap on the overall set. Unsupervised clustering of patients (n = 75) and radiomic features (n = 77) reveals clusters of patients with similar radiomic expression patterns.

Demographic data in the training and test sets are listed in Table 1. The median (range) ages of the two sets are 45 (3-89) years and 56 (1-86) years respectively, and the proportions of females are 48.0% and 44.0% respectively. There is no significant difference in age or gender between the two sets (p-values = 0.322 and 0.935, respectively).

We evaluate the predictive ability of the radiomic features based on univariate analysis, and find that 23 features differ significantly between COVID-19 and other types of pneumonia (p-values < 0.05). Meanwhile, significant differences in radiomic expression between the two groups are revealed in the radiomic heatmap (Figure 1), indicating that there are intrinsic relations between the CT phenotype of the lesion and the pneumonia type. In the consensus clustering step, 17 distinct clusters are obtained (Figure 2).
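The consensus clustering procedure that produces such feature clusters can be illustrated with a simplified Python sketch using SciPy. Several choices here are assumptions rather than details given in the paper: average linkage, sampling 80% of the features without replacement, a reduced number of resampling rounds (200 instead of 10000), the final partition obtained by clustering on (1 − consensus index), and the function names `consensus_matrix` and `medoid_features`. The search over cluster numbers with the 0.8 intra-cluster correlation criterion is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def consensus_matrix(X, k, n_boot=200, frac=0.8, seed=0):
    """Fraction of resampling rounds in which two features share a cluster."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    rho, _ = spearmanr(X)   # feature-by-feature Spearman correlation matrix
    dist = 1.0 - rho        # distance metric named in the text
    together = np.zeros((n_feat, n_feat))
    sampled = np.zeros((n_feat, n_feat))
    m = int(round(frac * n_feat))
    for _ in range(n_boot):
        idx = rng.choice(n_feat, size=m, replace=False)
        sub = dist[np.ix_(idx, idx)]
        Z = linkage(squareform(sub, checks=False), method="average")
        labels = fcluster(Z, t=k, criterion="maxclust")
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return together / np.maximum(sampled, 1.0)

def medoid_features(consensus, k):
    """Final clusters and, per cluster, the feature with the highest mean consensus."""
    Z = linkage(squareform(1.0 - consensus, checks=False), method="average")
    labels = fcluster(Z, t=k, criterion="maxclust")
    medoids = {}
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        block = consensus[np.ix_(members, members)]
        medoids[int(c)] = int(members[np.argmax(block.mean(axis=1))])
    return labels, medoids

# Synthetic example: 12 features generated from 3 latent factors (4 features each).
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 3))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(50, 12))

M = consensus_matrix(X, k=3)
labels, medoids = medoid_features(M, k=3)
```

In this toy example each group of four correlated features collapses into one cluster, and one medoid per cluster would be carried forward as a candidate input for the SVM.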
We retain the medoid features, which are described in Table 2, to obtain the candidate feature set for building the radiomic signature. To control for overfitting, these features are further reduced in the forward selection step combined with the multiple cross-validation. The best SVM model, which achieves an average accu-

Distributions of the radiomic signature scores and pneumonia types in the training set and test set are shown in Figure 3. The radiomic signature shows strong predictive ability in both the training set and the test set, with AUCs of 0.862 (95% CI, 0.756-0.967) and 0.826 (95% CI, 0.655-0.998) respectively (Figure 4(a)). The Delong test reveals that the difference between the AUCs on the two sets is not statistically significant (p-value = 0.733). The calibration curves demonstrate that the predicted probability of COVID-19 obtained from the radiomic signature matches the frequency of actual observation (Figure 4(b)). The results of the Hosmer-Lemeshow test (p-values = 0.356 and 0.460 on the training set and test set respectively) suggest that there is no significant departure between the calibration curves and the diagonal line, which represents perfect prediction.

We perform stratification analysis on the subgroups of gender, age, with/without chronic disease, and degree of severity based on all patients. We use the ROC curve and AUC to evaluate the performance of the radiomic signature on these subpopulations. The results of the Delong test imply that it generalizes across various kinds of patients (all p-values > 0.05) (Figure 5).

In conclusion, CT radiomics for the diagnosis of COVID-19 is considered for the first time. The experimental results demonstrate that many radiomic features of the pneumonia lesions are highly associated with COVID-19 infection. In addition, the model based on these features could discriminate COVID-19 pneumonia well from other pneumonias such as flu pneumonia, bacterial pneumonia, and mycoplasma pneumonia.
Although our results are encouraging, our study still has some limitations. Our preliminary study is based on data from a single hospital. Further large-scale validation with data from multiple hospitals and multiple regions should be performed. Meanwhile, as COVID-19 patients could have different disease progression and treatment outcomes, quantitative models that facilitate the identification of finer subtypes and the prediction of prognosis would be very valuable for improving clinical management. Besides, a 2D ROI and 2D radiomic features are used in our analysis. In the future, we will introduce 3D radiomic features to further improve the performance of our model.

[1] A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
[2] A novel coronavirus from patients with pneumonia in China
[3] Coronavirus infections-more than just the common cold
[4] First case of 2019 novel coronavirus in the United States
[5] Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
[6] Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
[7] Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis
[8] Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan
[9] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[10] Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
[11] Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
[12] COVID-19 pneumonia: what has CT taught us?
[13] Time course of lung changes on chest CT during recovery from 2019 novel coronavirus (COVID-19) pneumonia. Radiology
[14] Imaging features of 2019 novel coronavirus pneumonia
[15] CT manifestations of two cases of 2019 novel coronavirus (2019-nCoV) pneumonia
[16] Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
[17] Development and validation of a novel MR imaging predictor of response to induction chemotherapy in locoregionally advanced nasopharyngeal cancer: a randomized controlled trial substudy (NCT01245959)
[18] Radiomics: extracting more information from medical images using advanced feature analysis
[19] Development and validation of an individualized nomogram to identify occult peritoneal metastasis in patients with advanced gastric cancer
[20] Artificial intelligence in cancer imaging: clinical challenges and applications
[21] A new approach to predict progression-free survival in stage IV EGFR-mutant NSCLC patients with EGFR-TKI therapy
[22] Quantitative biomarkers for prediction of epidermal growth factor receptor mutation in non-small cell lung cancer
[23] Prognostic value of deep learning PET/CT-based radiomics: potential role for future individual induction chemotherapy in advanced nasopharyngeal carcinoma
[24] LGE-CMR-derived texture features reflect poor prognosis in hypertrophic cardiomyopathy patients with systolic dysfunction: preliminary results
[25] Radiomic features are superior to conventional quantitative computed tomographic metrics to identify coronary plaques with napkin-ring sign
[26] Computed tomography-based predictive nomogram for differentiating primary progressive pulmonary tuberculosis from community-acquired pneumonia in children