key: cord-0730909-vwbwrwmb authors: Garrafa, E.; Vezzoli, M.; Ravanelli, M.; Farina, D.; Borghesi, A.; Calza, S.; Maroldi, R. title: Early Prediction of In-Hospital Death of COVID-19 Patients: A Machine-Learning Model Based on Age, Blood Analyses, and Chest X-Ray Score date: 2021-06-13 journal: nan DOI: 10.1101/2021.06.10.21258721 sha: 8e378fbfb7149ed82f907a02e76260be038b512f doc_id: 730909 cord_uid: vwbwrwmb Background: To develop and validate an early-warning model to predict in-hospital mortality on admission of COVID-19 patients at an emergency department (ED). Methods: In total, 2782 patients were enrolled between March 2020 and December 2020, including 2106 patients (first wave) and 676 patients (second wave) in the COVID-19 outbreak in Italy. The first wave patients were divided into two groups with 1474 patients used to train the model, and 632 to validate it. The 676 patients in the second wave were used to test the model. Age, 17 blood analytes and Brescia chest X-ray score were the variables processed using a Random Forests classification algorithm to build and validate the model. ROC analysis was used to assess the model performances. A web-based death-risk calculator was implemented and integrated within the Laboratory Information System of the hospital. Results: The final score was constructed by age (the most powerful predictor), blood analytes (the strongest predictors were lactate dehydrogenase, D-dimer, Neutrophil/Lymphocyte ratio, C-reactive protein, Lymphocyte %, Ferritin std and Monocyte %), and Brescia chest X-ray score. The areas under the receiver operating characteristic curve obtained for the three groups (training, validating and testing) were 0.98, 0.83 and 0.78, respectively. Conclusions: The model predicts in-hospital mortality on the basis of data that can be obtained in a short time, directly at the ED on admission. It functions as a web-based calculator, providing a risk score which is easy to interpret. 
It can be used in the triage process to support the decision on patient allocation.
Starting from late February 2020, the COVID-19 outbreak struck the north of Italy, causing more than 30,000 deaths in Lombardy alone up to the end of March 2021. At the beginning of the outbreak, the Spedali Civili di Brescia (SCBH), the university hospital of one of the hardest hit cities in Europe, was faced with a 'flash flood' of severely ill patients seeking admission to the Emergency Department (ED). For several weeks, their number exceeded the available resources, obliging a continuous organizational restructuring of the hospital wards (Garrafa et al., 2020b).
In those weeks, given the limited evidence of clinically proven predictors (Marengoni et al., 2021) (Wynants et al., 2020) (Sperrin et al., 2020), prioritizing hospital admission of non-critical patients was an arduous task. Essentially, the criteria were based on the presence of fever, respiratory symptoms and the level of blood oxygenation. A significant drawback of this approach was that patients presenting to the ED with very similar clinical findings underwent inconsistent assessments. In this scenario, the availability of predictors would have been extremely beneficial, not only to triage patients, but also to monitor hospitalized patients and warn of exacerbation of the outbreaks.
Starting from March 2020, all patients referred to EDs underwent a chest X-ray at admission or within a few hours. With the purpose of grading pulmonary involvement and tracking changes objectively over time, a chest X-ray severity score was developed (Brescia X-ray score).
The descriptive statistics for all variables in the dataset are presented in Table S2 and were computed and stratified by the two waves (MA vs MD) and by outcome (Alive vs Dead). The two subsets were similar for most variables.
The correlations between the 17 analytes and the Brescia X-ray score were investigated using Spearman correlation coefficients and visualized using a correlation plot (Figure S1). The Brescia X-ray score was positively correlated with Neutrophil to Lymphocyte ratio, CRP, LDH, standardized Ferritin, and D-Dimer, and was negatively correlated with Lymphocyte %, Monocyte %, and Basophil %.
A machine-learning model (BS-EWM) was developed using a dataset of 2782 COVID-19 patients admitted to the ED and hospitalized at SCBH from March to December 2020. The majority of patients (2106/2782, 75.70%) belonged to the first wave (MA), the remaining fraction (676/2782, 24.30%) to the second wave (MD). The outcome of the machine-learning model was the condition Dead/Alive, and the covariates were age, Brescia X-ray score and 17 blood sample analytes.
Figure 1 reports the flow chart that describes how data were divided for training, validating and testing the BS-EWM.
The SMOTE procedure, rebalancing the Dead/Alive ratio (50% vs. 50%) from the original 20.09%, improved accuracy, specificity, and sensitivity of the Random Forest applied on it (see Table S3, which compares performance metrics with/without the SMOTE method).
The rel VIM and PDP were extracted from the Random Forests (Figure 2, panels A and B respectively). In panel A1, the rel VIM of the BS-EWM based on age, Brescia X-ray score and 17 blood analytes are reported on a bar plot. Since age was strongly associated with the risk of death, it masked the role of the other covariates. For completeness, the relevance of the 17 analytes and Brescia X-ray score was estimated in an additional EWM, in which the covariate 'age' was excluded. In the resulting bar plot (Figure 2, panel A2), 9/17 analytes and the Brescia X-ray score were noted as being important in predicting the risk of death (rel VIM>60).
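The rebalancing step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority-class (Dead) record is obtained by interpolating between a real minority record and one of its nearest minority neighbours. The function name `smote_oversample`, the neighbour count `k`, the seed and the toy feature values are assumptions made for illustration only; the text does not detail the authors' implementation or its settings.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values, not study data): two features per patient.
dead = [[82.0, 510.0], [77.0, 430.0], [90.0, 620.0], [85.0, 480.0]]
alive_count = 16  # 4 of 20 rows are Dead, i.e. a 20% prevalence

# Add enough synthetic Dead rows to reach a 50% vs. 50% ratio.
new_rows = smote_oversample(dead, n_new=alive_count - len(dead), k=3)
balanced_dead = dead + new_rows
print(len(balanced_dead), alive_count)  # prints: 16 16
```

In a sketch like this, only the training portion would be oversampled, so that validation and test prevalence stay untouched; this matches the Figure 1 description, in which the rebalancing is applied to the training patients.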
The effects of changes in covariate values on the risk-of-death threshold of the EWM were reported by means of a PDP (a 2D plot in the x-y plane) (Figure 2, panel B). Only Fibrinogen was excluded from this graphical representation since in Table 1
When compared to other models such as GBM and Logistic Regression, the Random Forest showed better performance in terms of AUC, sensitivity, and specificity. The in-sample sensitivity (0.93) yielded by the model was the highest; out of sample, sensitivity remained high in validation (0.82) and decreased to 0.73 when testing on the MD subgroup (see Table 2, which contains details on all the metrics extracted from the ROC analysis). ROC curves are visualized in Figure 3 where, for each model (Random Forest, GBM and Logistic Regression), the performances in Training, Validating and Testing are compared in a single graph.
The dataset for the development, validation and testing of the BS-EWM originated entirely from an Italian region, potentially limiting the generalizability of the risk score to other areas of the world. Additional validation studies from different geographic areas are welcomed. Furthermore, though the BS-EWM has been validated using blood sample values obtained by instruments that satisfy internal and external quality control, different equipment could lead to divergent results (Martens et al., 2021) (Lippi et al., 2020). Therefore, it would be appropriate to harmonize the results. Another limitation could have been the presence of missing values, though the BS-EWM also performed adequately in this condition since it used a multiple imputation technique to overcome the problem. Finally, it is important to point out that the BS-EWM risk score should not be used for asymptomatic COVID-19 patients or for the pediatric population.
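The metrics taken from the ROC analysis can be made concrete with a small sketch: sensitivity and specificity at a fixed risk threshold, and the AUC computed as the probability that a randomly chosen deceased patient receives a higher score than a randomly chosen survivor (ties count one half). The function names and the toy scores are hypothetical and do not come from the study; the default of 0.5 mirrors the risk threshold the paper quotes for the web calculator.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """labels: 1 = Dead, 0 = Alive. A score >= threshold is flagged as high risk."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of Dead vs Alive scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions (hypothetical): 3 Dead and 5 Alive patients.
scores = [0.91, 0.72, 0.35, 0.64, 0.20, 0.10, 0.55, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels)
print(round(sens, 3), round(spec, 3), round(auc(scores, labels), 3))
```

The pairwise AUC is quadratic in the number of patients, which is fine for illustration; a rank-based computation gives the same value more efficiently on larger cohorts.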
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted June 13, 2021.
Though the BS-EWM has been developed on a cohort of 2106 patients belonging to the first wave, the model also demonstrated a sensitivity greater than 70% in the early prediction of high risk in patients in the second wave, when in-hospital mortality was 40% lower.
Several predictive models have recently been applied to COVID-19 cohorts with variable results, some of them previously developed to predict mortality for community-acquired pneumonia, such as the Pneumonia Severity Index, CURB-65, qSOFA, and MuLBSTA (Yavuz et al., 2021) (Lazar Neto et al., 2021) (Artero et al., 2021), NEWS2 criteria (Myrstad et al., 2020) (Gidari et al., 2020), and SCAP score (Anurag and Preetam, n.d.). Novel early-warning scores have been specifically built on COVID-19 patient series using different techniques such as parametric and non-parametric tests (Linssen et al., 2020) or artificial intelligence techniques such as the COVID-GRAM score (Liang et al., 2020).
While these models are mostly based on age and a set of vital (clinical) parameters, the BS-EWM depends, in addition to age, on blood parameters. It is conceivable that blood analytes capture a snapshot at hospital admission signaling a specific bodily reaction to viral infection in terms of hyperinflammation, immune response and thrombophilia. On the other hand, the other models are more influenced by the general status of the patient, which may be determined by concomitant and pre-existing diseases.
The present study is not unique in encompassing radiological findings combined with blood analysis. The study by Schalekamp et al. 
(Schalekamp et al., 2020) integrated blood analysis parameters and radiological information derived by grading chest X-rays (0-8 scale points). Unlike the cited study, in the BS-EWM the radiological score did not reach a high relevance (rel VIM) in predicting high risk. This difference can be explained by the different approaches used to build the model (Logistic regression vs Random Forests) and by the high degree of correlation of the X-ray score with multiple blood analytes: "collinearity" thus could have "stolen" importance from the information provided by imaging. Nevertheless, at admission, the chest X-ray score of patients who subsequently died was significantly higher than for patients who survived. Furthermore, the chest X-ray score may provide additional stability to the model, playing an important role in the case of missing data in the blood sample.
An important and pragmatic aspect offered by the BS-EWM is that the biomarkers employed may be obtained by the emergency laboratory in less than an hour (Garrafa et al., 2020a) and, unlike other biomarkers (Kyriazopoulou et al., 2021, p. 19), they are inexpensive and frequently used also in developing countries. It is important to note that the same methodology could be applied to other infections and be practical to triage people.
Most laboratories, including the small or peripheral ones, may provide results in a short time. At the Spedali Civili of Brescia, the BS-EWM is integrated within the Laboratory Information System. It works as a web-based calculator and is easy to interpret. It provides a risk threshold of 0.5, above which patients are graded as having a potentially high death-risk, thus supporting closer clinical observation or
admission to a high-intensive care ward. In patients yielding a low risk (score 0 to 0.49), the decision by clinicians to allocate them to a low-intensive care ward or to monitoring is further supported.
Finally, the need to regularly update models and closely monitor their performances over time and geographically should be underlined, given the rapidly changing nature of the disease and its management.
Tables
Table 1: Descriptive statistics on all variables in the dataset stratified by outcome (Alive vs Dead). Comparison between first (March-April) and second (May-December) wave.
Comparison between the performances of three methods: Random Forest, GBM and Logistic Regression model applied on the rebalanced dataset obtained with SMOTE methodology. Logistic Regression predictions are computed using the 10-fold cross-validation in order to be comparable with Random Forest and GBM predictions (which use out-of-bag and 10-fold cross-validation, respectively).
Figure 1: Flow-chart of the data used in the empirical analyses. The BS-EWM was trained with a Random Forest on 70% of first wave patients (rebalanced with the SMOTE procedure) and (i) validated on the remaining 30% of first wave patients, (ii) tested on 676 second wave patients.
In detail, the 2106 patients were randomly divided into training and validating groups, maintaining the same death prevalence as in the first wave.
Table 1 <0.05. PDPs are displayed from the most to the least important variable.
In this table we compare the performance of two RFs applied on (i) a dataset rebalanced with the SMOTE methodology and (ii) the original dataset. This analysis suggests the use of SMOTE methodology before applying RF, since the performances in the Training and Validating groups (especially in terms of sensitivity) are better than those obtained from the RF grown on the original dataset.
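The division described in the Figure 1 caption (a random 70/30 split of first-wave patients that preserves death prevalence) can be sketched as a stratified split, in which each outcome class is shuffled and cut separately. This is a minimal illustration with hypothetical names (`stratified_split`, `outcome`), an arbitrary seed and a toy cohort; the original split procedure is not specified beyond what the caption states.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)


def prevalence(labels, idx):
    """Fraction of Dead (label 1) among the selected rows."""
    return sum(labels[i] for i in idx) / len(idx)


# Toy cohort (hypothetical): 1000 patients, 20% deaths (1 = Dead, 0 = Alive).
outcome = [1] * 200 + [0] * 800
train, valid = stratified_split(outcome)

print(len(train), len(valid))  # prints: 700 300
print(prevalence(outcome, train), prevalence(outcome, valid))
```

Because the cut is made within each class, both subsets keep the cohort's death prevalence up to rounding, which is the property the caption describes.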
The relationships between the 17 analytes and the Brescia chest X-ray score are inspected with the Spearman correlation coefficients, r_s, which are represented in this correlation plot by means of blue and red circles (positive and negative correlation, respectively). The diameter of each circle is proportional to the magnitude of r_s, and black crosses on the circles identify correlations not significantly different from zero (p-values>0.05). The correlation matrix is reordered according to the hierarchical cluster analysis on the quantitative variables.
Median (Q1, Q3). * Wilcoxon rank-sum test; ** Fisher's exact test
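For readers who want to reproduce the kind of coefficient shown in the correlation plot, the sketch below computes Spearman's r_s as the Pearson correlation of average ranks, so that tied values are handled. The function names and the toy values are hypothetical and are not taken from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy values (hypothetical): an X-ray score against two analytes.
xray_score = [2, 5, 8, 11, 14]
crp = [10.0, 35.0, 30.0, 90.0, 120.0]
lymphocyte_pct = [30.0, 22.0, 18.0, 9.0, 6.0]

print(round(spearman(xray_score, crp), 3))             # positive association
print(round(spearman(xray_score, lymphocyte_pct), 3))  # negative association
```

The significance test behind the black crosses in the plot is not covered here; this sketch only yields the coefficient itself.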