key: cord-0742635-2s4ifz7i authors: Qi, Xiaolong; Jiang, Zicheng; YU, QIAN; Shao, Chuxiao; Zhang, Hongguang; Yue, Hongmei; Ma, Baoyi; Wang, Yuancheng; Liu, Chuan; Meng, Xiangpan; Huang, Shan; Wang, Jitao; Xu, Dan; Lei, Junqiang; Xie, Guanghang; Huang, Huihong; Yang, Jie; Ji, Jiansong; Pan, Hongqiu; Zou, Shengqiang; Ju, Shenghong title: Machine learning-based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: A multicenter study date: 2020-03-03 journal: nan DOI: 10.1101/2020.02.29.20029603 sha: b8bb4db131a25b1bbb30d4205b16dc9ef988d22f doc_id: 742635 cord_uid: 2s4ifz7i

Abstract

Objectives: To develop and test machine learning-based CT radiomics models for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection.

Design: Cross-sectional.

Setting: Multicenter.

Participants: A total of 52 patients with laboratory-confirmed SARS-CoV-2 infection and their initial CT images were enrolled from 5 designated hospitals in Ankang, Lishui, Zhenjiang, Lanzhou, and Linxia between January 23, 2020 and February 8, 2020. As of February 20, patients who remained in hospital or had no findings on CT were excluded. Therefore, 31 patients with 72 lesion segments were included in the final analysis.

Intervention: CT radiomics models based on logistic regression (LR) and random forest (RF) were developed on features extracted from pneumonia lesions in the training and inter-validation datasets. Predictive performance was further evaluated in the test dataset at the lung lobe and patient levels.

Main outcomes: Short-term hospital stay (≤10 days) and long-term hospital stay (>10 days).

Results: The CT radiomics models based on 6 second-order features were effective in discriminating short- and long-term hospital stay in patients with pneumonia associated with SARS-CoV-2 infection, with areas under the curve of 0.97 (95% CI 0.83-1.0) and 0.92 (95% CI 0.67-1.0) for LR and RF, respectively, in the test dataset.
The LR model showed a sensitivity and specificity of 1.0 and 0.89, and the RF model showed similar performance, with a sensitivity and specificity of 0.75 and 1.0, in the test dataset.

Conclusions: The machine learning-based CT radiomics models showed feasibility and accuracy for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection.

submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

The imaging or algorithm data used in this study are available upon request.

The lead author and the manuscript's guarantor affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned have been explained.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.02.29.20029603
Coronavirus disease 2019 (COVID-19), which emerged in Wuhan, China, has been a global challenge since December 2019 [1-3]. Clinical characteristics of patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection have been reported, and the median hospital stay of forty-seven discharged patients was 10 days [1]. Studies showed that the condition of patients in Wuhan worsened on the 10th day after illness onset [2]. However, patients outside Wuhan with symptoms lasting longer than 10 days were less severely ill than those in Wuhan [3]. Therefore, hospital stay is one of the prognostic indicators in patients with SARS-CoV-2 infection, and a non-invasive tool for predicting it is important for assessing patients' clinical outcomes.

Chest CT is recommended as a routine test in the diagnosis and monitoring of COVID-19, since ground-glass opacities and consolidation are the most relevant imaging features in pneumonia associated with SARS-CoV-2 infection [4-6]. On the basis of our previous work utilizing quantitative CT for COVID-19 [7], we hypothesized that the high-throughput information hidden in CT images [8] had potential for discriminating hospital stay. The study aimed to develop and test machine learning-based CT radiomics models for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection.
The multicenter study was conducted according to the principles of the Declaration of Helsinki and approved by all institutional review boards. The need for written informed consent from the participants was waived. Patients with laboratory-confirmed SARS-CoV-2 infection and their initial CT images were enrolled from 5 designated hospitals between January 23, 2020 and February 8, 2020, with final follow-up on February 20, 2020 (Figure 1). Most patients received antiviral treatment with interferon inhalation, lopinavir and ritonavir, combined with probiotics. Patients were discharged once the results of two real-time fluorescence polymerase-chain-reaction tests taken 24 hours apart were negative for SARS-CoV-2 nucleic acid. Patients without pneumonia findings and those who remained in hospital were excluded. Sample size considerations are given in the supplementary file.

In this study, the cut-off value for hospital stay was set at 10 days on the basis of previous studies [1-3], by which patients were classified into short-term hospital stay (≤10 days) and long-term hospital stay (>10 days).

The pipeline of the radiomics model is shown in Figure 1; feature extraction and model building were performed at the lung lobe level. Image preprocessing is described in the supplementary file. Images containing lesions were segmented using Python (3.6, https://www.python.org) and 3D Slicer (version 4.10.0; https://www.slicer.org/) in two steps.
First, the lung lobes of each patient were segmented automatically using algorithms based on U-net [9], and the results were checked and modified by one radiologist (Q.Y). Next, lesions in each lung lobe were labeled semi-automatically using several seeds placed within the lesion region to generate the contours. Three radiologists (S.H, Q.Y and X.P.M) evaluated the segments of each lesion and reached a consensus. All imaging processes were blinded to clinical data.

In total, 1218 features were calculated per lesion patch. First-order, shape and second-order features were extracted from the original images and the wavelet-filtered images using pyradiomics [10]. Two supervised learning algorithms, logistic regression (LR) and random forest (RF), were used to build the models and verify the robustness of the features (supplementary file) [11]. We applied 5-fold cross-validation on the training dataset to assess model performance. The cutoff point was defined on the receiver operating characteristic (ROC) curves of the training data by maximizing the sum of sensitivity and specificity.

Model performance was evaluated using the test dataset at the lung lobe level. Areas under the ROC curve (AUC), sensitivities, specificities, positive predictive values (PPV), and negative predictive values (NPV) were recorded. At the patient level, a patient was defined as long-term hospital stay once more than one lung lobe lesion was labeled as a long-term stay lesion, and otherwise as short-term hospital stay.

Test values such as areas under the receiver operating characteristic curves (95% confidence interval), sensitivity, and specificity were calculated in SPSS and Python. A P-value < 0.05 was considered statistically significant.

This cross-sectional study did not involve patients in the study design, outcome measures, or writing or editing of this study.
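The cutoff selection and the patient-level rule described in the methods above can be illustrated with a short sketch. It is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the toy scores, and the convention that label 1 means long-term stay are all assumptions, and the patient-level rule reads "more than one" literally as at least two lesions (exposed as the hypothetical `min_lesions` parameter).

```python
from collections import defaultdict


def youden_cutoff(scores, labels):
    """Return (threshold, sensitivity + specificity) maximizing the sum.

    labels: 1 = long-term stay (>10 days), 0 = short-term stay (assumed coding).
    A lesion is predicted long-term when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required to define a cutoff")
    best_threshold, best_sum = None, -1.0
    # Every observed score is a candidate operating point on the ROC curve.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        total = tp / positives + tn / negatives
        if total > best_sum:  # ties keep the lowest threshold
            best_threshold, best_sum = threshold, total
    return best_threshold, best_sum


def patient_level_labels(patient_ids, lesion_scores, threshold, min_lesions=2):
    """Aggregate lobe-level predictions into one label per patient.

    min_lesions=2 encodes "more than one" long-term lesion (an assumption
    about how the rule in the text should be read).
    """
    counts = defaultdict(int)
    for pid, score in zip(patient_ids, lesion_scores):
        counts[pid] += int(score >= threshold)
    return {
        pid: "long-term" if n >= min_lesions else "short-term"
        for pid, n in counts.items()
    }


# Toy training scores (hypothetical, not study data).
train_scores = [0.1, 0.2, 0.3, 0.65, 0.6, 0.7, 0.9]
train_labels = [0, 0, 0, 0, 1, 1, 1]
cutoff, sens_plus_spec = youden_cutoff(train_scores, train_labels)

# Toy lobe-level scores for three hypothetical patients.
ids = ["A", "A", "A", "B", "B", "C"]
lobe_scores = [0.7, 0.9, 0.2, 0.65, 0.3, 0.1]
patient_labels = patient_level_labels(ids, lobe_scores, cutoff)
print(cutoff, sens_plus_spec, patient_labels)
```

Maximizing sensitivity plus specificity is equivalent to maximizing Youden's J statistic (J = sensitivity + specificity - 1), so the returned threshold is the Youden-optimal point on the training ROC curve.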
At the patient level, in the training and inter-validation datasets, 6 of 6 patients were correctly classified as short-term stay by both models, and 20 of 20 and 16 of 20 patients were correctly identified as long-term stay by the RF and LR models, respectively. In the test dataset, 1 of 2 patients was correctly classified as short-term stay, and 3 of 3 were correctly identified as long-term stay by both the RF and LR models.

As of February 28, we followed up a prospective dataset of 6 newly discharged patients with 24 lesions from the designated hospitals (comorbidities, symptoms and laboratory findings are described in supplementary Table 2). All patients were correctly recognized as long-term stay by both the RF and LR models developed on the original 52 patients.

In this study, machine learning-based CT radiomics models were developed and tested for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection. CT radiomics features hidden within lesions (ground-glass opacities and consolidation) were extracted, and the machine learning models demonstrated robust performance using multicenter cohorts for training and inter-validation, and an independent dataset and a prospective dataset for testing.
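To relate the patient-level test counts reported in the results (1 of 2 short-term and 3 of 3 long-term patients correctly classified) to the summary metrics used at the lobe level, the arithmetic can be sketched as below. This is an illustration only: treating long-term stay as the positive class is an assumption, and `diagnostic_metrics` is a hypothetical helper rather than part of the authors' code.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, PPV and NPV from confusion counts.

    A metric whose denominator is zero is returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Patient-level test counts from the text, assuming long-term stay is positive:
#   3 of 3 long-term patients correct  -> tp = 3, fn = 0
#   1 of 2 short-term patients correct -> tn = 1, fp = 1
test_metrics = diagnostic_metrics(tp=3, fn=0, tn=1, fp=1)
print(test_metrics)
```

With five test patients, a single reclassified short-term patient changes specificity by 0.5, so these values should be read with that sample size in mind.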
Although there were slight differences in CT scan parameters among centers, the key features included in the models were second-order features focused on the distribution, correlation and variance of gray-level intensities, which describe the relationship between voxels and hold quantitative information on the spatial heterogeneity of pneumonia lesions [11, 12]. Compared with first-order features, second-order features are not sensitive to absolute values and are thus more robust. Moreover, the models showed satisfactory AUCs above 0.90 in both training and testing, which indicates that the models could be applied in general settings. The similarity in AUCs, sensitivity and specificity between the RF and LR models also demonstrated robustness, given a prior study showing that the classification method was the dominant source of variability in model performance [11].

The study was limited by its small sample size. The percentage of short-term hospital stay was low in our multicenter cohorts, and semi-automated lesion segmentation might result in selection bias. A larger prospective multicenter cohort is needed to tune and test the machine learning-based CT radiomics models.

In summary, the machine learning-based CT radiomics models showed feasibility and accuracy for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection.

The condition of patients in Wuhan worsens on the 10th day after illness onset.
However, patients outside Wuhan with symptoms lasting longer than 10 days were less severely ill than those in Wuhan. Hospital stay is one of the prognostic indicators, and a non-invasive model for predicting it based on CT radiomics features is important for assessing patients' clinical outcomes.

In this multicenter study with patients from 5 designated hospitals in China, the CT radiomics models using the logistic regression and random forest methods showed satisfactory diagnostic performance, with sensitivities of 1.0 and 0.75 and specificities of 0.89 and 1.0, respectively, in the independent test dataset. The machine learning-based CT radiomics models showed feasibility and accuracy for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection.

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series
COVID-19): A Perspective from China
Time Course of Lung Changes On Chest CT During Recovery From 2019 Novel Coronavirus (COVID-19) Pneumonia
CT Imaging of the 2019 Novel Coronavirus (2019-nCoV) Pneumonia. Radiology
CT imaging of coronavirus disease 2019 (COVID-19): from the qualitative to quantitative
Radiomics: Images Are More than Pictures, They Are Data
Automatic lung segmentation in routine imaging is a data diversity problem, not a methodology problem
Computational Radiomics System to Decode the Radiographic Phenotype
Multicenter study demonstrates radiomic features derived from magnetic resonance perfusion images identify pseudoprogression in glioblastoma