key: cord-0778647-0avdml1r
authors: Fernandez, Ancy; Obiechina, Nonyelum; Koh, Justin; Hong, Anna; Nandi, Angela; Reynolds, Timothy M.
title: Survival prediction algorithms for COVID‐19 patients admitted to a UK district general hospital
date: 2021-01-18
journal: Int J Clin Pract
DOI: 10.1111/ijcp.13974
sha: 2916a5623b6c35c4b8496027e681f4f653129f98
doc_id: 778647
cord_uid: 0avdml1r

OBJECTIVE: To collect and review data from consecutive patients admitted to Queen’s Hospital, Burton on Trent for treatment of Covid‐19 infection, with the aim of developing a predictive algorithm that can help identify those patients likely to survive. DESIGN: Consecutive patient data were collected from all admissions to hospital for treatment of Covid‐19. Data were manually extracted from the electronic patient record for statistical analysis. RESULTS: Data, including outcome data (discharged alive/died), were extracted for 487 consecutive patients admitted for treatment. Overall, patients who died were older, had very significantly lower oxygen saturation (SpO2) on admission, required a higher inspired oxygen concentration (IpO2) and had higher CRP (significant at the Bonferroni‐corrected threshold, P < .0056). Evaluated individually, platelet and lymphocyte counts were not statistically significant, but when used in a logistic regression to develop a predictive score, platelet count did add predictive value. The 5‐parameter prediction algorithm we developed was: [Formula: see text] CONCLUSION: Age, IpO2 on admission, CRP, platelets and number of lungs consolidated were effective marker combinations that helped identify patients who would be likely to survive. The area under the ROC curve (AUC) was 0.8129 (95% confidence interval 0.773‐0.853; P < .001).

prediction of hospital mortality in patients from the UK, but that UK data are combined with data from China. 3 That report was based on 653 patients of whom the outcome was known in just 58 patients.

We also note a BMJ Editorial that stated that all models are wrong but better reporting and data sharing could improve this. 4 Therefore, we reviewed the case notes of patients with known outcomes seen at Queen's Hospital Burton (QHB). To ensure consistency of data collection, only QHB data were used because the UHDB Trust has only been recently formed by merger of two individual Trusts and the two parts of the group use different patient records software.

We then identified an algorithm that may help to predict which patients will survive Covid-19 infection, based on their initial investigation results.

Data were manually extracted from the Meditech hospital computer system into an Excel spreadsheet for 487 consecutive patient admissions for Covid-19 infection at Queen's Hospital, Burton-on-Trent, UK. Anonymised data were used for this evaluation project.

We carried out two rounds of algorithm development. The first was based on laboratory data only and the second included clinical information.

Demographics (age, gender, ethnicity) and initial investigation results (oxygen saturation on admission (SpO2), platelets, total white cell count, neutrophil count, lymphocyte count, CRP, ALT, ALP, bilirubin and d-dimers) were collated. Since the data were not normally distributed, statistical analysis was non-parametric, using the Mann-Whitney rank test for unpaired data (http://vassarstats.net/index.html). Because of the multiple possible correlations for each set of blood results, a Bonferroni correction was used (to P < .0056) for blood results. For age and initial SpO2, the standard threshold (P < .05) was used.
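The Bonferroni threshold quoted here is consistent with dividing the standard .05 level by the nine blood tests listed (0.05 / 9 ≈ 0.0056); that count is an inference from the list rather than something the text states. The Python sketch below shows that arithmetic alongside a bare-bones Mann-Whitney U test. It uses a plain normal approximation with no tie or continuity correction, the function names are illustrative, and it is not the VassarStats implementation.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def mann_whitney_u(x, y):
    """Return (U, two-tailed p) for two unpaired samples.

    Tied values receive their average rank. The p value comes from a
    plain normal approximation (no tie or continuity correction), so it
    is only approximate, especially for small samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Assumed count of nine blood tests (see the lead-in).
threshold = bonferroni_threshold(0.05, 9)
```

A blood result would then be called significant only if its Mann-Whitney p value fell below `threshold`, whereas age and SpO2 would be compared against .05.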

Gender, ethnicity, ALT, ALP, bilirubin and d-dimers were all excluded from further analysis because the Mann-Whitney statistic showed no significant differences between those who survived and those who died.

Multivariate logistic regression of age, admission SpO2, admission CRP, admission platelets and admission lymphocytes (X variables) against survival (0)/death (1) (Y variable) was carried out using an internet calculator (http://stats.blue/Stats_Suite/logistic_regression_calculator.html) and IBM SPSS. Platelets and lymphocytes were included in this analysis because, although they were non-significant at the Bonferroni-corrected P threshold, they met the standard significance threshold, and because the stepwise logistic regression process itself assesses whether each variable contributes significantly and excludes non-significant contributors.

Lymphocytes were then excluded because they were shown to be non-significant in the logistic regression, leading to a 4-parameter regression model. In both stages of data analysis, logistic regression was attempted using a variety of different normalisation transforms (Square root, inverse, and logarithm) for all variables and the transform that worked best for each analyte was used for the final version of the algorithm.

For age, SpO2, IpO2, platelets and CRP, the appropriate transform was the natural logarithm.
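A fitted model of this kind turns the log-transformed inputs into a probability through the logistic function. The Python sketch below shows the general form only. The intercept, coefficients and patient values are made-up placeholders, not the values the authors derived, and the function and variable names are illustrative assumptions.

```python
import math

def death_probability(intercept, coefficients, values):
    """Logistic model on natural-log-transformed predictors.

    coefficients and values are dicts keyed by predictor name; every
    value must be > 0 because it is log-transformed. Outcome coding
    follows the text: survival = 0, death = 1.
    """
    linear = intercept + sum(
        coefficients[name] * math.log(values[name]) for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Placeholder numbers for illustration only (NOT the published coefficients).
example_coefficients = {"age": 1.0, "crp": 0.5, "platelets": -0.5}
example_patient = {"age": 70.0, "crp": 100.0, "platelets": 200.0}
p_death = death_probability(-5.0, example_coefficients, example_patient)
p_survival = 1.0 - p_death
```

Any predictor left untransformed would enter the linear term directly rather than through `math.log`.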

• Covid-19 is a novel viral infection that causes a significant risk of death in those patients who need hospital treatment.

• Many prediction algorithms have already been written but more data are required to further improve algorithm performance.

• We have developed a survival prediction algorithm that uses admission blood results, and two clinical factors to identify risk of death from Covid-19.

• We used data from a single location to remove the confounding effects of different policies in different healthcare systems.

The second regression model gave a significantly greater area under the ROC curve. Therefore, validation analysis was carried out only on the 5-parameter regression model. This analysis was carried out by boot-strapping using 20 replicates. For each replicate, the data were separated into two roughly equal groups (regression and validation) by assigning a random number (0-1) to each row of data, with values <0.5 assigning the data to the regression group and ≥0.5 to the validation group. The data were then regressed using a 5-parameter model and the AUC was estimated using the validation data.
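The validation procedure described here amounts to a repeated random split into regression and validation halves: rows are assigned by a random number rather than resampled with replacement. A minimal Python sketch of that loop follows. The `fit` argument is a hypothetical stand-in for whatever regression routine is used, the row layout `(features, label)` is an assumption, and the AUC is computed by direct pairwise comparison rather than by the authors' software.

```python
import random
from statistics import mean, stdev

def auc(scores, labels):
    """Chance that a random death (1) scores above a random survivor (0).

    Ties count as half. Raises ZeroDivisionError if either class is absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_rows(rows, rng):
    """Assign each row by a random number: <0.5 regression, >=0.5 validation."""
    regression, validation = [], []
    for row in rows:
        (regression if rng.random() < 0.5 else validation).append(row)
    return regression, validation

def repeated_split_auc(rows, fit, n_replicates=20, seed=0):
    """Return (mean, SD) of the validation AUC over the replicates.

    rows: list of (features, label) pairs with label 0 or 1.
    fit:  hypothetical callable that takes the regression rows and
          returns a function mapping features to a risk score.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_replicates):
        regression, validation = split_rows(rows, rng)
        model = fit(regression)
        scores = [model(features) for features, _ in validation]
        labels = [label for _, label in validation]
        aucs.append(auc(scores, labels))
    return mean(aucs), stdev(aucs)
```

Collecting the fitted coefficients from each replicate in the same loop would give the per-coefficient mean and SD needed to compare the split-sample fits with the full-dataset fit.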

Data were collected retrospectively, and the study had no impact on the care of patients during their admission. Furthermore, the data were anonymised before statistical analysis. The work therefore met the definition of service evaluation and did not require review by a research ethics committee (http://www.hra-decisiontools.org.uk/research/).

After exclusion of patients for whom the full dataset was not available, there were 166 patients who had died and 250 who had survived whose data were used to derive the 4-parameter prediction algorithm (416 patients in total). During the collation of the extra clinical data, nine extra sets of patient data were collected, resulting in 259 patients who had survived and 166 who had died being used for the second analysis round (425 patients in total). Ethnicity was not shown to be significant but this may be due to the population distribution (88.6% white, 2.26% Indian sub-continent, 0.4% Black, 9.97% unspecified).

Table 1 shows ages and initial blood results for patients who left the hospital alive and those who died, with the two-tailed P value for difference. Overall, patients who died were older, had very significantly lower SpO2 on admission and higher CRP. No other differences met Bonferroni-corrected statistical significance. Platelets and lymphocyte counts did meet the "standard" statistical significance threshold of P < .05.

Table 2 shows the blood results after 6 days. In the patients who died, the platelets, neutrophils and CRP were statistically significantly higher. Further analysis of blood results of patients for whom paired data at days 0 and 6 were available did not reveal any significant differences in the changes in results between the dead and alive groups (data not shown).

After testing different transforms in the first data analysis round, it was found that optimum normalisation was achieved using the natural logarithm of all X variables. On the first pass of the logistic regression using normalised variables, age, SpO2, CRP and platelet count were statistically significant contributors but lymphocyte count was shown to be non-significant (P = .435), so a second pass excluding lymphocytes was completed. This gave the following prediction algorithm, which was statistically significant (Goodness of fit by Hosmer and Lemeshow Test P = .018).

where LN is natural logarithm.

For this 4-parameter algorithm, the ROC curve AUC was 0.737 (95% confidence interval 0.689-0.784; P < .001).

In the second data analysis round, IpO2 was shown to be statistically significant but this rendered SpO2 non-significant (P = .3913), so SpO2 was excluded. The resultant algorithm was as follows:

Coefficients from full dataset and split datasets for 5-parameter model

The difference between the AUCs of the 4- and 5-parameter algorithms was tested (www.vassarstats.net/roc_comp.html) and was shown to be significant (P = .021 two-tailed). Table 3 shows the coefficients and AUC estimated using the full 5-parameter dataset, and the mean ± SD derived from the 20 replicates of split regression/validation datasets. The deviation of the full dataset coefficient from the mean of the split dataset means was evaluated by calculating the number of standard deviations between them (Z score). In all cases, the Z score was within ±0.4, indicating no statistically significant difference. Figure 1 shows a ROC plot for the 4- and 5-parameter prediction algorithms.
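The Z score used for this comparison is simply the distance between the full-dataset estimate and the split-dataset mean, expressed in units of the split-dataset SD. A small Python sketch follows. The function name and the example numbers are hypothetical and are not taken from Table 3.

```python
def z_score(full_value, split_mean, split_sd):
    """Number of split-sample SDs between the full-dataset estimate
    and the mean of the split-dataset estimates."""
    if split_sd <= 0:
        raise ValueError("split_sd must be positive")
    return (full_value - split_mean) / split_sd

# Hypothetical coefficient values, for illustration only.
example_z = z_score(full_value=1.20, split_mean=1.00, split_sd=0.50)
within_limit = abs(example_z) < 1.96  # conventional two-tailed 5% cut-off
```

With the conventional cut-off of |Z| < 1.96, values within ±0.4 sit well inside the range expected from sampling variation alone.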

Covid-19 represents a new threat to health and we are still learning to deal with it. The only way to improve our knowledge about how to deal with this threat quickly is to share the data we have as openly and rapidly as possible. 4 A large number of algorithms evaluating Covid-19 patients have already been published but we were unable to identify any that use admission data from a single source to predict outcome. Furthermore, many algorithms used data from multiple centres in different countries which may have significantly different healthcare systems.

The two algorithms we have developed use parameters that have been reported to work in other predictive algorithms 2 but only use four parameters (Age, Admission SpO2, CRP and platelet count) or five parameters (age, admission inspired pO2, CRP, platelet count and number of consolidated lungs) because other data that we evaluated were shown not to have significant differences between those who survived admission for Covid-19, and those who did not. All of the parameters used are simple tests that should be available within 60-90 minutes of arrival at hospital. The importance of having a tool that helps predict survival is that it can also be used to predict which patients may need more complex interventions to assist them, that is, patients with a lower survival probability may benefit from earlier consideration for intensive care.

There are clearly limitations to the data we have presented.

• Our dataset is relatively small (416 patients, of whom 166 died) but represents the total data available from the first wave of Coronavirus patients passing through our doors, so is as complete a dataset as we can collect.

• We were unable to get data from any independent source to verify the algorithm, but boot-strapping analysis shows that the overall estimates for the better 5-parameter algorithm are robust.

• The population served by our hospital is predominantly white British, so data on other ethnicities were too limited to be useful.

Despite the limitations, our 5-parameter model has similar effectiveness to other published algorithms:

The PANDEMYC score 5 showed that oxygen saturation was an important discriminatory factor in predicting survival or in-hospital deterioration.

We note that our dataset and the other datasets described above are all relatively small. We would be very happy to share our data with any other researcher to allow the development of better clinical tools for survival prediction. Also, having derived our algorithm during the first wave of the epidemic, further studies to identify whether survival is improved in second and subsequent waves would be very useful.

We have developed an algorithm to predict the probability of sur- 

The authors would like to thank Alexandra Timperley, Moomena Chowdhury, Faisal Al-khalidi, Maceij Rusilowicz, Rachel Garnett, Surekha Amonker, Dylan Parmar, Cindy Cleto Rodrigues and Alice Gwyn Jones for assisting with data collection.

TMR is currently in receipt of project grants from Genzyme Therapeutics, Oxford, UK (now Sanofi Genzyme, Oxford, UK); Shire Pharmaceuticals, Basingstoke, UK (now Takeda Pharmaceutical Ltd); and Synageva BioPharma, Watford, UK (now Alexion Pharma UK, Uxbridge, UK).

Data Collection: AF/AH/JK. Statistical Analysis: TR. Paper writing/ Approval: AF/AN/NO/TR.

All data are available from the authors and have been uploaded as supplemental files with this submission.

Timothy M. Reynolds https://orcid.org/0000-0002-9729-4775

1. Clinical characteristics of coronavirus disease 2019 in China

2. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal

3. Risk prediction for poor outcome and death in hospital in-patients with COVID-19: derivation in Wuhan, China and external validation in London

4. Prediction models for diagnosis and prognosis in Covid-19: all models are wrong but data sharing and better reporting could improve this

5. An easily applicable and interpretable model for predicting mortality associated with COVID-19

6. Clinical risk score to predict in-hospital mortality in COVID-19 patients: a retrospective cohort study

7. Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: an observational cohort study
