key: cord-0930666-1wap3sk4 authors: Zhang, Bin; Ni-jia-Ti, Ma-yi-di-li; Yan, Ruike; An, Nan; Chen, Lv; Liu, Shuyi; Chen, Luyan; Chen, Qiuying; Li, Minmin; Chen, Zhuozhi; You, Jingjing; Dong, Yuhao; Xiong, Zhiyuan; Zhang, Shuixing title: CT-based radiomics for predicting the rapid progression of coronavirus disease 2019 (COVID-19) pneumonia lesions date: 2021-06-01 journal: Br J Radiol DOI: 10.1259/bjr.20201007 sha: 310c22971cd17c13e66cfaae64a027e17d639587 doc_id: 930666 cord_uid: 1wap3sk4 OBJECTIVES: To develop and validate a radiomic model to predict rapid progression (defined as volume growth of pneumonia lesions > 50% within seven days) in patients with coronavirus disease 2019 (COVID-19). METHODS: Patients with laboratory-confirmed COVID-19 who underwent longitudinal chest CT between January 01 and February 18, 2020 were included. A total of 1316 radiomic features were extracted from the lung parenchyma window for each CT. The least absolute shrinkage and selection operator (LASSO), Relief, Las Vegas Wrapper (LVW), L1-norm-Support Vector Machine (L1-norm-SVM), and recursive feature elimination (RFE) were applied to select the features associated with rapid progression. Four machine learning classifiers were used for modeling: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT). Accordingly, 20 radiomic models were developed on the basis of 296 CT scans and validated in 74 CT scans. Model performance was assessed using the receiver operating characteristic curve. RESULTS: A total of 107 patients (median age, 49.0 years, interquartile range, 35–54) were evaluated. The patients underwent a total of 370 chest CT scans with a median interval of 4 days (interquartile range, 3–5 days). 
The combination of L1-norm SVM feature selection and an SVM classifier with 17 radiomic features yielded the highest performance in predicting the likelihood of rapid progression of pneumonia lesions on the next CT scan, with an AUC of 0.857 (95% CI: 0.766–0.947), sensitivity of 87.5%, and specificity of 70.7%. CONCLUSIONS: Our radiomic model based on longitudinal chest CT data could predict the rapid progression of pneumonia lesions, which may help to rationalize CT follow-up intervals and reduce radiation exposure. ADVANCES IN KNOWLEDGE: Radiomic features extracted from the current chest CT have potential in predicting the likelihood of rapid progression of pneumonia lesions on the next chest CT, which would improve clinical decision-making regarding timely treatment. The rapid spread of coronavirus disease 2019 (COVID-19), a potentially fatal disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major and urgent threat to global health. 1 As of August 14, 2020, more than 21.05 million confirmed cases and 752,378 deaths had been reported by the World Health Organization (WHO). 2 Since the outbreak of COVID-19, chest CT has played an indispensable role in the detection, diagnosis, and follow-up of COVID-19 pneumonia. 3 Chest CT not only presents the clinical course of COVID-19 infection and the disease severity but also predicts poor patient outcomes. [4] [5] [6] However, multiple CT scans within a short time during the COVID-19 pandemic arouse great concern about the radiation burden on patients and healthcare workers. It is widely accepted that ionizing radiation increases the lifetime likelihood of developing cancer. 7 Some previous studies have tried low-dose chest CT in the diagnosis of COVID-19 pneumonia to reduce the radiation dose. 
[8] [9] [10] [11] [12] However, low-dose CT may miss some key signs of COVID-19 pneumonia compared with standard-dose CT. In the present study, for the first time, we aimed to develop a CT-based radiomic model to predict the probability of rapid progression of COVID-19 pneumonia and thereby guide the follow-up interval of chest CT scans, which may reduce the ionizing radiation dose and the estimated cancer risk. This study was approved by the institutional review board and the need for written informed consent was waived. A total of 118 COVID-19 patients from two designated hospitals were retrospectively included between January 8, 2020 and February 25, 2020. All adult patients had laboratory-confirmed COVID-19, established by real-time reverse transcription-polymerase chain reaction (RT-PCR) assay of throat swab samples (at least two samples were taken, at least 24 h apart) according to the protocol established by the WHO. A total of 118 CT scans from 11 patients were excluded because there was no follow-up CT scan or the interval between two adjacent CT scans was >7 days. Finally, a total of 370 CT scans from 107 patients were analyzed in this study. The whole dataset was randomly divided into two subsets, 80% for training (with 10-fold cross-validation) and the remaining 20% for validation. A representative case and the flowchart of patient and CT scan inclusion are shown in Figure 1. The distribution of the number of CT scans and patients by time interval between two adjacent CT scans is illustrated in Figure 2. Patients underwent chest CT scans on a CT 64 scanner (GE Medical System), a Siemens Emotion 16 scanner (Siemens Healthineers; Erlangen, Germany), or an ICT 128 scanner (Philips Healthcare, Netherlands). No contrast agent was administered. 
CT acquisition parameters of the three CT scanners are shown in Table 1. We used a 3D U-Net++ model that had previously been trained on 2000 COVID-19 pneumonia cases to pre-segment the COVID-19 pneumonia lesions in the present study. Several days later, an experienced radiologist (with more than 15 years' experience in chest imaging) edited and verified the pre-segmentation results, removed false positives, and segmented the missed lesions.
Radiomic feature extraction
All raw CT images were preprocessed by resampling to 1 mm × 1 mm × 1 mm voxels. Radiomic features were extracted from the lung window (window width: 1500 Hounsfield Units [HU], window level: -600 HU) in Python (v.3.7.0, Beaverton, Ore; https://www.python.org/) by using the Pyradiomics package (v.3.0; https://github.com/Radiomics/pyradiomics). The parameters used for the different transforms are presented in Table 2. A total of 1316 radiomic features were extracted under seven image types: Original, Wavelet, LoG, Square, SquareRoot, Exponential, and Logarithm. The classes and corresponding numbers of radiomic features are presented in Table 3. All radiomic features were normalized by min-max scaling. Because the high-dimensional radiomic features may contain redundant information, five feature selectors, namely the least absolute shrinkage and selection operator (LASSO), 13 Relief, 14 Las Vegas Wrapper (LVW), 15 L1-norm-Support Vector Machine (L1-norm-SVM), 16 and recursive feature elimination (RFE), 17 were used to reduce the dimensionality of the features before machine learning was applied to train the models.
Machine-learning-based radiomic model construction
Rapid progression of pneumonia lesions was defined as volume growth >50% within seven days, calculated from the ratio of the pneumonia volume on the next CT scan (V2) to the pneumonia volume on the current CT scan (V1). 
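The labelling rule just described can be illustrated with a short sketch. It assumes that per-scan lesion volumes and scan dates are already available. The function name `label_rapid_progression`, its parameter names, and the example volumes are hypothetical and are not taken from the study. Volume growth > 50% is expressed as a ratio V2/V1 > 1.5, and scan pairs more than seven days apart are skipped, mirroring the exclusion criterion.

```python
from datetime import date
from typing import Optional


def label_rapid_progression(v1_ml: float, v2_ml: float,
                            d1: date, d2: date,
                            max_interval_days: int = 7,
                            ratio_threshold: float = 1.5) -> Optional[int]:
    """Label the current scan: 1 = rapid progression, 0 = not, None = pair unusable.

    v1_ml / v2_ml are lesion volumes on the current / next scan (hypothetical units: mL).
    """
    interval = (d2 - d1).days
    # Pairs outside the seven-day window are not labelled at all.
    if interval <= 0 or interval > max_interval_days:
        return None
    # The ratio is undefined when the current scan shows no lesion volume.
    if v1_ml <= 0:
        return None
    return 1 if (v2_ml / v1_ml) > ratio_threshold else 0


# Hypothetical examples, not study data.
example_labels = [
    label_rapid_progression(40.0, 70.0, date(2020, 1, 20), date(2020, 1, 24)),  # ratio 1.75
    label_rapid_progression(40.0, 60.0, date(2020, 1, 20), date(2020, 1, 24)),  # ratio exactly 1.5
    label_rapid_progression(40.0, 90.0, date(2020, 1, 20), date(2020, 1, 30)),  # 10-day gap
]
print(example_labels)
```

Because the rule uses a strict inequality, a ratio of exactly 1.5 counts as a negative sample in this sketch.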
The threshold of 50% was chosen according to the COVID-19 guidelines (trial version 6) released by the National Health Commission of China. 18 A ratio V2/V1 > 1.5 marks the current CT scan as a positive sample; otherwise, it is a negative sample. For unbiased estimates of diagnostic accuracy, our dataset was randomly split into training and validation datasets with a ratio of 4:1. The split preserved the proportions of positive and negative samples in the training and validation datasets. Four common machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT), were applied to predict the occurrence of rapid progression. To select the optimal model and hyperparameters for each model, we conducted 10-fold cross-validation on each training dataset. The hyperparameters of the 20 combination models are shown in Supplementary Material 1. The model with the highest area under the curve (AUC) was considered the optimal model. All model building was performed in the Python environment (v.3.7.0, https://www.python.org/) by using the Scikit-learn package (v.0.23.1; https://scikit-learn.org/). Categorical variables are expressed as counts and percentages, while continuous variables are shown as medians and interquartile ranges. All statistical analyses were performed using Python, v.3.7.0 (Beaverton, Ore; https://www.python.org/). The packages were used as follows: "mlr" for LR, "randomForest" for RF, and "e1071" for SVM. Receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the different models in predicting the occurrence of rapid progression. The AUCs of different models were compared using the DeLong test. 19 A p value < 0.05 was considered significant. A total of 107 patients with COVID-19 were included in this study, with a median age of 49.0 years (interquartile range, 35-54); 60 (56.6%) were male. 
Eleven (10.3%) had hypertension, 5 (4.7%) had diabetes, 3 (2.8%) had coronary heart disease, 2 (1.9%) had chronic liver disease, 2 (1.9%) had chronic lung disease, and 5 (4.7%) had previous surgery. The patients underwent a total of 370 chest CT scans with a median interval of 4 days (interquartile range, 3-5 days). The median time from the initial to the last CT scan was 15 days (interquartile range, 12-22). After selection by LASSO, Relief, LVW, L1-norm SVM, and RFE, 11, 18, 2-24, 17, and 44 features were identified, respectively. Among the 20 combined models, L1-norm SVM + SVM with 17 radiomic features achieved the optimal predictive performance, with an AUC of 0.885 (95% CI: 0.845-0.924), sensitivity of 93.7%, specificity of 63.5%, and accuracy of 69.9% in the training dataset and an AUC of 0.857 (95% CI: 0.766-0.947), sensitivity of 87.5%, specificity of 70.7%, and accuracy of 74.3% in the internal-validation dataset (Figure 3).
Table abbreviations: GLCM, gray-level co-occurrence matrix; GLDM, gray-level dependence matrix; GLRLM, gray-level run length matrix; GLSZM, gray-level size zone matrix; NGTDM, neighboring gray tone difference matrix.
Figure 3. The ROC curves of the training dataset and validation dataset with the optimal model. The left panel shows the mean ROC curve and the 95% CI for the training dataset (a). The right panel shows the mean ROC curve and the 95% CI for the validation dataset (b).
In the present study, we developed and validated radiomic models based on the current chest CT scan to predict the probability of rapid progression of COVID-19 pneumonia on the next CT scan. We combined five feature selection methods and four classification methods, and the results showed that the combination of L1-norm SVM + SVM outperformed the other combinations, yielding an AUC of 0.857 (95% CI: 0.766-0.947), sensitivity of 87.5%, specificity of 70.7%, and accuracy of 74.3%. 
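As a worked check of how the validation figures quoted above relate to one another, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts themselves (14 true positives, 2 false negatives, 41 true negatives, 17 false positives) are not reported in the study. They are an assumption, chosen because they form an integer combination over the 74 validation scans that reproduces the reported 87.5%, 70.7%, and 74.3%. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy


# Assumed counts (not reported in the paper); they sum to the 74 validation scans.
sens, spec, acc = binary_metrics(tp=14, fn=2, tn=41, fp=17)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Under these assumed counts, the gap between sensitivity and accuracy comes from the class imbalance: only 16 of the 74 validation scans would be positive, so the 17 false positives weigh heavily on accuracy.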
Radiomics is a quantitative tool for medical imaging that enhances the data available to clinicians by means of advanced mathematical analysis from the field of machine learning. This approach has been widely used in diagnosing and staging cancers and in predicting their treatment response and prognosis. Since the outbreak of COVID-19, some studies have applied CT-based radiomics to cope with this emergent infectious disease. By extracting large numbers of features from chest CT images of COVID-19 pneumonia in a high-throughput manner, radiomics can reflect underlying information associated with disease heterogeneity. Fang et al developed a radiomic nomogram with high performance in differentiating COVID-19 from other types of viral pneumonia. 20 Recent studies also used radiomics to diagnose and predict the outcomes of COVID-19. For example, researchers proposed a non-invasive and quantitative radiomic model using CT to predict poor outcomes in advance among COVID-19 patients. [21] [22] [23] Wei et al 24 Current evidence demonstrates that radiomics has potential in the clinical management of COVID-19. Unlike previous studies, this study provided a radiomic tool to identify COVID-19 patients at high risk of rapid progression of pneumonia within seven days, and the results were promising. The performance of a radiomic model is affected by each step of the radiomic workflow. Differences in noise and resolution of CT images from different CT systems may impair the reproducibility of radiomics, for instance by producing different values of radiomic features. 29 We used image pre-processing to reduce the bias caused by different scanners and imaging protocols. In line with previous studies, we extracted three common classes of radiomic features (first-order statistics, shape-based features, and texture features) by using the Pyradiomics package. 
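One workflow step named in the Methods, min-max normalization of the radiomic features, can be sketched in a few lines. This is a minimal illustration and not the study's code: the helper names `fit_min_max` and `apply_min_max` and the toy feature values are hypothetical. Estimating the scaling range on the training scans only, and reusing it for the validation scans, is an assumption, since the text does not say how the range was estimated.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each feature column, estimated from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows: Sequence[Sequence[float]],
                  ranges: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Scale each feature to (x - min) / (max - min); constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, ranges)
        ])
    return scaled


# Toy feature matrix: rows are scans, columns are two hypothetical features.
train = [[10.0, 0.0], [20.0, 2.0], [30.0, 4.0]]
valid = [[25.0, 1.0]]

ranges = fit_min_max(train)                  # assumption: fit on training scans only
train_scaled = apply_min_max(train, ranges)
valid_scaled = apply_min_max(valid, ranges)
print(train_scaled, valid_scaled)
```

With a range fitted on the training scans, validation values can fall slightly outside [0, 1] when a validation scan exceeds the training range; the sketch leaves such values unclipped.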
Regarding the various choices of feature selection and modelling methodologies, the identification of optimal machine learning methods for radiomic applications is a crucial step towards stable and clinically relevant decision support systems; thus, multiple machine-learning methods should be employed and compared. In this study, we chose five feature selectors and four modeling methods to identify the best combination and found that L1-norm SVM + SVM achieved the highest performance in the specific task of predicting rapid progression of COVID-19. Zhang et al 30 evaluated six feature selection methods and nine classifiers to predict recurrence and distant metastasis in patients with advanced nasopharyngeal carcinoma and found that the combination Random Forest (RF) + RF performed best. In this study, we identified 17 radiomic features that were most strongly related to the prediction outcome, consisting of 3 first-order statistics and 14 texture features. All are associated with image uniformity and heterogeneity. COVID-19 pneumonia lesions at high risk of rapid progression were more heterogeneous (e.g. mixed ground-glass opacity and consolidation) than those at low risk of rapid progression. Previous studies have shown that radiomics or texture analysis can characterize tumor phenotypes and reflect tumor heterogeneity. 31, 32 This study also demonstrated that radiomic features can serve as an effective biomarker of COVID-19 pneumonia by reflecting the heterogeneity of lesions. This study has some limitations. First, the study was retrospective. Second, the clinical and laboratory variables were not integrated into the prediction model because they were not matched with each chest CT examination. Third, the effect of treatment on COVID-19 pneumonia was not considered because there was no specific treatment for COVID-19. 
The drugs used for COVID-19 in the clinical setting were mixed, although guideline recommendations existed. Finally, this model lacks external validation, and its generalizability needs to be tested in other institutions. In conclusion, we proposed a CT-based radiomic model to predict the rapid progression of COVID-19 pneumonia, which may rationalize the chest CT follow-up intervals of COVID-19 patients and would benefit their clinical management.
1. transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19
2. World Health Organization. Coronavirus disease 2019 (COVID-19) situation report - 207
3. The indispensable role of chest CT in the detection of coronavirus disease 2019 (COVID-19
4. Use of chest imaging in the diagnosis and management of COVID-19: a WHO rapid advice guide
5. Role of computed tomography in predicting critical disease in patients with COVID-19 pneumonia: a retrospective study using a semiautomatic quantitative method
6. Predictors of coronavirus disease 19 (COVID-19) pneumonitis outcome based on computed tomography (CT) imaging obtained prior to hospitalization: a retrospective study
7. Computed tomography and patient risk: facts, perceptions and uncertainties
8. A low-dose chest CT protocol for the diagnosis of COVID-19 pneumonia: a prospective study
9. Low-dose chest CT for the diagnosis of COVID-19: a systematic, prospective comparison with PCR
10. False-negative nasopharyngeal swab RT-PCR assays in typical COVID-19: role of ultralow-dose chest CT and bronchoscopy in diagnosis
11. Dose-optimised chest computed tomography for diagnosis of coronavirus disease 2019 (COVID-19): evaluation of image quality and diagnostic impact
12. Application of CareDose 4D combined with Karl 3D technology in the low dose computed tomography for the follow-up of COVID-19
13. Regression shrinkage and selection via the LASSO
14. The feature selection problem: traditional methods and a new algorithm
15. Feature selection and classification: a probabilistic wrapper approach
16. Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson's disease using voice recordings
17. Gene selection for cancer classification using support vector machines
18. National Health Commission of the People's Republic of China. Guidelines for the diagnosis and treatment of novel coronavirus infection (trial version 6)
19. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves
20. Radiomics nomogram for the prediction of 2019 novel coronavirus pneumonia caused by SARS-CoV-2
21. Radiomics analysis of computed tomography helps predict poor prognostic outcome in COVID-19
22. Computed tomography radiomics can predict disease severity and outcome in coronavirus disease 2019 pneumonia
23. CT quantification and machine-learning models for assessment of disease severity and prognosis of COVID-19 patients
24. Identification of common and severe COVID-19: the value of CT texture analysis and correlation with clinical characteristics
25. Severity assessment of COVID-19 using CT image features and laboratory indices
26. A novel machine learning-derived radiomic signature of the whole lung differentiates stable from progressive COVID-19 infection: a retrospective cohort study
27. Machine learning-based CT radiomics method for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: a multicenter study
28. A model based on CT radiomic features for predicting RT-PCR becoming negative in coronavirus disease 2019 (COVID-19) patients
29. Radiomics: the bridge between medical imaging and personalized medicine
30. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma
31. Radiomics: images are more than pictures, they are data
32. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology
Thanks to all the medical workers for their fight against COVID-19, and to the people of the country and the world for their contributions to this campaign.