key: cord-0700960-2peklnbv
authors: Knight, Stephen R; Gupta, Rishi K; Ho, Antonia; Pius, Riinu; Buchan, Iain; Carson, Gail; Drake, Thomas M; Dunning, Jake; Fairfield, Cameron J; Gamble, Carrol; Green, Christopher A; Halpin, Sophie; Hardwick, Hayley E; Holden, Karl A; Horby, Peter W; Jackson, Clare; Mclean, Kenneth A; Merson, Laura; Nguyen-Van-Tam, Jonathan S; Norman, Lisa; Olliaro, Piero L; Pritchard, Mark G; Russell, Clark D; Shaw, Catherine A; Sheikh, Aziz; Solomon, Tom; Sudlow, Cathie; Swann, Olivia V; Turtle, Lance C W; Openshaw, Peter J M; Baillie, J Kenneth; Docherty, Annemarie; Semple, Malcolm G; Noursadeghi, Mahdad; Harrison, Ewen M
title: Prospective validation of the 4C prognostic models for adults hospitalised with COVID-19 using the ISARIC WHO Clinical Characterisation Protocol
date: 2021-11-21
journal: Thorax
DOI: 10.1136/thoraxjnl-2021-217629
sha: f508be8919c06952be0743d87c13060f7721857d
doc_id: 700960
cord_uid: 2peklnbv

PURPOSE: To prospectively validate two risk scores to predict mortality (4C Mortality) and in-hospital deterioration (4C Deterioration) among adults hospitalised with COVID-19. METHODS: Prospective observational cohort study of adults (age ≥18 years) with confirmed or highly suspected COVID-19 recruited into the International Severe Acute Respiratory and emerging Infections Consortium (ISARIC) WHO Clinical Characterisation Protocol UK (CCP-UK) study in 306 hospitals across England, Scotland and Wales. Patients were recruited between 27 August 2020 and 17 February 2021, with at least 4 weeks of follow-up before final data extraction. The main outcome measures were discrimination and calibration of models for in-hospital deterioration (defined as any requirement of ventilatory support or critical care, or death) and mortality, incorporating predefined subgroups. RESULTS: 76 588 participants were included; among those with an available outcome, 27 352 (37.4%) deteriorated and 12 581 (17.4%) died. Both the 4C Mortality (pooled C-statistic 0.78 (95% CI 0.77 to 0.78)) and 4C Deterioration scores (0.76 (0.75 to 0.77)) demonstrated consistent discrimination across all nine National Health Service regions, with performance metrics similar to those of the original validation cohorts. Calibration remained stable (4C Mortality: pooled slope 1.09, pooled calibration-in-the-large 0.12; 4C Deterioration: 1.00, −0.04), with no need for temporal recalibration during the second UK pandemic wave of hospital admissions. CONCLUSION: Both 4C risk stratification models demonstrated consistent performance in predicting clinical deterioration and mortality in a large prospective second-wave validation cohort of UK patients. Despite recent advances in the treatment and management of adults hospitalised with COVID-19, both scores can continue to inform clinical decision making. TRIAL REGISTRATION NUMBER: ISRCTN66726260.

Disease resulting from infection with SARS-CoV-2 has a high mortality rate, with deaths predominantly caused by respiratory failure. 1 We previously reported two prognostic scores for in-hospital mortality 2 and deterioration 3 from the ISARIC Coronavirus Clinical Characterisation Consortium (ISARIC4C) study, derived and validated in large UK cohorts during the first pandemic wave. As hospitals around the world faced a continued influx of patients with COVID-19, these easy-to-use risk stratification tools have facilitated early identification of patients infected with SARS-CoV-2 who are at the highest risk of deterioration and death, guiding management and optimising resource allocation. 4 Both scores use clinical and biochemical parameters that are readily available at the time of assessment, and predicted risk has been used to guide antiviral treatment across the UK. 5 Furthermore, independent external validation has demonstrated consistent performance worldwide. 6-10 Management and treatment of patients admitted or diagnosed in hospital with COVID-19 have changed markedly over the past year, notably with the introduction of corticosteroids for people requiring supplemental oxygen or ventilatory support. 11 Ongoing prospective validation is therefore necessary to ensure adequate performance and to inform the need for temporal recalibration. 12 In this article, we extend our previous work by prospectively validating these scores among adults recruited during the second pandemic wave, using the wide geographical coverage of the ISARIC4C study cohort across England, Wales and Scotland.

What is the key question? ► The clinical characteristics and management of hospitalised patients with confirmed or highly suspected COVID-19 have changed over time; ongoing prospective validation of risk prediction scores is therefore required.

► Both 4C prediction scores performed well in a large prospective UK cohort, with stable validation metrics across National Health Service regions and ethnicity, despite reduced overall deterioration and mortality risk compared with the original derivation and validation cohorts.

► This is the first large prospective revalidation of these risk stratification tools, which demonstrated stable performance in over 75 000 hospitalised UK patients.

The International Severe Acute Respiratory and emerging Infections Consortium (ISARIC)-WHO Clinical Characterisation Protocol UK (CCP-UK) study is an ongoing prospective multicentre cohort study being conducted by ISARIC4C in 306 hospitals across England, Scotland and Wales (National Institute for Health Research (NIHR) Clinical Research Network Central Portfolio Management System ID 14152). The study was part of a suite of 'sleeping' protocols established and approved in the UK before the pandemic and was activated on 17 January 2020. CCP-UK is a version of a standardised open-source international research protocol (the ISARIC/WHO CCP) created in 2012, which has been employed worldwide for harmonised observational studies of COVID-19. 13-15 The CCP-UK protocol and further study details are available online. 16 In this analysis, we included consecutive adults (aged ≥18 years) who had highly suspected or PCR-confirmed COVID-19. As before, we included patients with suspected COVID-19 because both models were intended for use at the point of initial evaluation for COVID-19, when virological confirmation might not be available. We also included patients with nosocomial (hospital-acquired) COVID-19 to test the hypothesis that acquisition of infection in hospital might be associated with differential risk. Community-acquired infection was defined as symptom onset or first positive SARS-CoV-2 PCR result within 7 days of admission; participants who did not meet these criteria and had either symptom onset or first positive SARS-CoV-2 PCR result more than 7 days after admission were classified as nosocomial cases. 17 Among nosocomial cases, patients who met the deterioration outcome before the onset of COVID-19 were excluded. Northern Ireland was excluded from model validation because of the small number of patients recruited.
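The community-acquired versus nosocomial rule above can be written as a small classification function. This is a minimal sketch, not the study's code: it assumes symptom onset and first positive PCR are available as whole days counted from admission (day 0, negative values before admission), and all function and argument names are hypothetical. The exclusion check additionally assumes that "onset of COVID-19" means the earliest available of those two days.

```python
from typing import Optional

# Hypothetical sketch of the acquisition rule; not taken from the ISARIC4C codebase.
# Days are counted from hospital admission (day 0); negative values fall before admission.
CUTOFF_DAYS = 7


def classify_acquisition(onset_day: Optional[int], first_pcr_day: Optional[int]) -> str:
    """Return 'community', 'nosocomial' or 'unknown' for one participant."""
    available = [d for d in (onset_day, first_pcr_day) if d is not None]
    if not available:
        return "unknown"
    # Community-acquired: symptom onset OR first positive PCR within 7 days of admission.
    if any(d <= CUTOFF_DAYS for d in available):
        return "community"
    # Otherwise every available date falls more than 7 days after admission.
    return "nosocomial"


def excluded_nosocomial(onset_day: Optional[int], first_pcr_day: Optional[int],
                        deterioration_day: Optional[int]) -> bool:
    """True for nosocomial cases who met the deterioration outcome before COVID-19 onset."""
    if classify_acquisition(onset_day, first_pcr_day) != "nosocomial":
        return False
    if deterioration_day is None:
        return False
    # Assumption: COVID-19 onset is the earliest available of onset and first PCR.
    covid_onset = min(d for d in (onset_day, first_pcr_day) if d is not None)
    return deterioration_day < covid_onset
```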

For both scores, we included eligible hospitalised participants with confirmed or highly suspected COVID-19 between 27 August 2020 and 17 February 2021. All patients had at least four weeks of follow-up to reduce selection bias (final data extraction date: 23 April 2021). Patients included in the original derivation or validation cohorts for either score were excluded from this analysis. Participants who were still receiving hospital care at the end of follow-up (the point at which a final outcome was recorded in the case record form) were classified as not meeting the endpoint, because the risk of deterioration declines with time since admission. 3 Hospitals that exclusively recruited participants admitted to the intensive care unit (ICU) were not included, since these participants had already met the 4C Deterioration outcome by definition. Both scores are summarised in table 1, with additional information on how coefficients were transformed into points systems in online supplemental appendix 1.
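The recruitment window and follow-up requirement above reduce to simple date arithmetic. The sketch below is illustrative only: it assumes eligibility is judged on recruitment date and age at recruitment, and the helper names are hypothetical. It confirms that even a participant recruited on the last eligible day (17 February 2021) had 65 days, comfortably more than four weeks, before final data extraction on 23 April 2021.

```python
from datetime import date

# Dates taken from the text; helper names are hypothetical.
RECRUITMENT_START = date(2020, 8, 27)
RECRUITMENT_END = date(2021, 2, 17)
FINAL_EXTRACTION = date(2021, 4, 23)
MIN_FOLLOW_UP_DAYS = 28  # "at least four weeks"


def is_eligible(recruitment: date, age_years: int) -> bool:
    """Adults (>=18 years) recruited within the second-wave validation window."""
    return age_years >= 18 and RECRUITMENT_START <= recruitment <= RECRUITMENT_END


def follow_up_days(recruitment: date) -> int:
    """Days available between recruitment and final data extraction."""
    return (FINAL_EXTRACTION - recruitment).days


# The last eligible recruitment date still leaves more than four weeks of follow-up.
assert follow_up_days(RECRUITMENT_END) >= MIN_FOLLOW_UP_DAYS
print(follow_up_days(RECRUITMENT_END))  # 65
```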

The study is reported in accordance with Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidance. 18 Demographic, clinical and outcome data were collected by research nurses and medical student volunteers using a publicly available standardised case record form and uploaded to a Research Electronic Data Capture (REDCap) database (Vanderbilt University, US; hosted by the University of Oxford, UK). Comorbidities were defined by a modified Charlson comorbidity index, and obesity was clinician-defined. 16 Consent was not required for the use of routinely collected clinical data from medical records. The Control of Patient Information notice 2020 for urgent public health research makes provision for this in England and Wales. A waiver for consent was obtained from the Public Benefit and Privacy Panel for Health and Social Care in Scotland.

For the 4C Deterioration model, we used a composite primary outcome comprising any of the following during hospital admission, with each component weighted equally: initiation of ventilatory support (non-invasive ventilation, invasive mechanical ventilation or extracorporeal membrane oxygenation); admission to a high-dependency unit (HDU) or ICU; or death (all-cause), as reported previously. 3
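The composite outcome, and the first chronological component through which a participant met it, can be sketched as follows. The record layout (one optional day-from-admission per component) and all names are assumptions for illustration, not the case record form's variables. Ties on the same day are broken in the order the components are listed, and a participant with no recorded component (including one still in hospital at the end of follow-up) is coded as not deteriorating.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Episode:
    # Day from admission on which each component first occurred; None if it never did.
    # Field names are hypothetical.
    ventilatory_support_day: Optional[int] = None  # NIV, IMV or ECMO
    hdu_or_icu_day: Optional[int] = None
    death_day: Optional[int] = None

    def components(self) -> List[Tuple[str, Optional[int]]]:
        return [("ventilatory support", self.ventilatory_support_day),
                ("HDU/ICU admission", self.hdu_or_icu_day),
                ("death", self.death_day)]


def deteriorated(ep: Episode) -> bool:
    """Composite outcome: any component counts, all equally weighted."""
    return any(day is not None for _, day in ep.components())


def first_component(ep: Episode) -> Optional[str]:
    """First chronological component met; ties resolved in listed order."""
    events = [(day, order, name)
              for order, (name, day) in enumerate(ep.components()) if day is not None]
    return min(events)[2] if events else None
```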

All validation analyses were conducted as described in our previous publication. 3 Briefly, we assessed model discrimination (how well predictions differentiated participants who experienced the outcomes from those who did not, quantified as the C-statistic) and calibration (agreement between predicted and observed risk, assessed using calibration slopes, calibration-in-the-large and calibration plots). 19 20 To assess calibration for the 4C Mortality score, we transformed points scores to the probability scale using the observed mortality proportion for each distinct total point score in the original reported validation cohort. 2

We used multiple imputation with chained equations to account for missing data, using the mice package in R, 21 as previously described. 3 We included all predictors (including restricted cubic spline transformations) and the outcome in the imputation models. Analyses were done in each of the 10 multiply imputed datasets and pooled using Rubin's rules. 22

Our primary analyses were stratified by National Health Service (NHS) region to examine evidence of between-region heterogeneity in model performance. We visualised C-statistics, calibration-in-the-large and slope estimates across regions in forest plots and calculated pooled estimates using random-effects meta-analysis, as previously recommended. 23

Decision curve analysis assesses clinical utility by quantifying the trade-off between correctly identifying true positives and incorrectly identifying false positives, weighted according to the threshold probability. 24 The threshold probability is the risk cut-off above which a given treatment or intervention might be considered, and it reflects the perceived risk:benefit ratio for that intervention. We used decision curve analysis to quantify the net benefit of implementing the model in clinical practice 24 compared with treat-all and treat-none approaches. All decision curves were smoothed by locally weighted smoothing (LOESS) from stacked multiply imputed datasets.
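Net benefit, the quantity plotted in a decision curve, has a standard closed form: true positives per patient minus false positives per patient weighted by the threshold odds p_t/(1 - p_t). The sketch below computes it for a model and for treat-all; treat-none has net benefit 0 at every threshold. It is a minimal illustration with hypothetical function names, assumes predictions at or above the threshold count as "treat", and does not reproduce the pooling across imputed datasets or the LOESS smoothing described above.

```python
from typing import List, Sequence, Tuple


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y: Sequence[int], p: Sequence[float],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Rows of (threshold, model, treat-all, treat-none) net benefit."""
    return [(t, net_benefit(y, p, t), net_benefit_treat_all(y, t), 0.0)
            for t in thresholds]


# Toy data, purely illustrative.
for row in decision_curve([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.1], [0.2, 0.5]):
    print(row)
```

A model is clinically useful at a given threshold only where its curve lies above both the treat-all and treat-none lines.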

A sensitivity analysis was also performed, with stratification of the validation cohort by ethnic group and month of admission, in view of previously reported differences in COVID-19 outcomes by ethnicity and over time. 25 26 All analyses were done in R (V.3.6.3).

This was an urgent public health research study in response to a Public Health Emergency of International Concern. Patients or the public were not involved in the design, conduct or reporting of this rapid response research.

The funder of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Between 27 August 2020 and 17 February 2021, 76 588 eligible adults were recruited to the ISARIC4C study and included in the current analysis, of whom 69 260 (90.4%) were known to have PCR-confirmed COVID-19. Baseline demographic, physiological and laboratory characteristics are shown stratified by outcome (table 2). The median age of patients in the cohort was 72 years (IQR 57-83); 35 231 (46.0%) were female and 52 704 (72.6%) had at least one comorbidity. The temporal distribution of participant admissions, stratified by NHS region, is shown in figure 1. For patients with nosocomial infections, the median time from admission to recruitment was 15 days (IQR 10-29). A summary of missingness for predictor variables included in both scores is shown in online supplemental appendix 2.

For the deterioration outcome, 73 078 (95.4%) participants had an outcome available, and in-hospital clinical deterioration occurred in 27 352 (37.4%), with a median time to deterioration of 5 days (IQR 1-12). For mortality, 72 481 (94.6%) participants had an outcome available, with in-hospital death occurring among 12 581 (17.4%). The median time to death was 11 days (IQR 6-18).

Forest plots showing model discrimination (C-statistic) and calibration metrics (slope and calibration-in-the-large) for both 4C Mortality and 4C Deterioration scores are shown in figure 2. C-statistics were consistent across NHS regions for 4C Mortality (point estimates 0.77-0.81; pooled random-effects meta-analysis estimate 0.78 (95% CI 0.77 to 0.78), I²=35%) and 4C Deterioration (point estimates 0.74-0.78; pooled random-effects meta-analysis estimate 0.76 (0.75 to 0.77), I²=57%). Calibration slopes were also consistent across regions for 4C Mortality (point estimates 0.97-1.31; pooled estimate 1.09 (1.03 to 1.16), I²=65%) and 4C Deterioration (point estimates 0.94-1.04; pooled estimate 1.00 (0.98 to 1.03), I²=30%; figure 3).
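For readers who want to see how the per-region metrics are defined, the sketch below computes a C-statistic (the proportion of event/non-event pairs in which the event has the higher predicted risk, ties counting one half), a calibration slope (the coefficient on the linear predictor logit(p) when a logistic model is refitted) and calibration-in-the-large (the intercept of a logistic model with logit(p) as an offset). These are common definitions in the validation literature; the analyses themselves were done in R, and this pure-Python version, with hand-rolled Newton-Raphson fitting and step halving, is an illustrative assumption rather than the authors' code. On these definitions a slope above 1, as for 4C Mortality, indicates predictions that are slightly too moderate, and a positive calibration-in-the-large indicates risk that is underestimated on average.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    # Numerically stable inverse logit.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _log_likelihood(y: Sequence[int], eta: Sequence[float]) -> float:
    # Bernoulli log-likelihood sum(y*eta - log(1 + exp(eta))), computed stably.
    total = 0.0
    for yi, e in zip(y, eta):
        softplus = e + math.log1p(math.exp(-e)) if e > 0 else math.log1p(math.exp(e))
        total += yi * e - softplus
    return total


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance over all event/non-event pairs (O(n^2); fine for a sketch)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(non_events))


def calibration_in_the_large(y, p, max_iter=100, tol=1e-10) -> float:
    """Intercept a in logit(P(y=1)) = a + offset(logit(p))."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(max_iter):
        mu = [expit(a + x) for x in lp]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
        current = _log_likelihood(y, [a + x for x in lp])
        # Step halving keeps Newton-Raphson from overshooting.
        while abs(step) > tol and _log_likelihood(y, [a + step + x for x in lp]) < current:
            step /= 2
        a += step
        if abs(step) < tol:
            break
    return a


def calibration_slope(y, p, max_iter=100, tol=1e-10) -> float:
    """Slope b in logit(P(y=1)) = a + b * logit(p), fitted by Newton-Raphson."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from the published model
    for _ in range(max_iter):
        mu = [expit(a + b * xi) for xi in x]
        w = [m * (1 - m) for m in mu]
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        h00, h01 = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        da, db = (h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det
        current = _log_likelihood(y, [a + b * xi for xi in x])
        t = 1.0
        while t > tol and _log_likelihood(
                y, [a + t * da + (b + t * db) * xi for xi in x]) < current:
            t /= 2
        a, b = a + t * da, b + t * db
        if max(abs(t * da), abs(t * db)) < tol:
            break
    return b
```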

Calibration-in-the-large showed heterogeneity across NHS regions. For 4C Mortality, point estimates ranged from −0.15 to 0.41 (pooled estimate 0.12 (−0.01 to 0.25), I²=97%); for 4C Deterioration, point estimates ranged from −0.15 to 0.19 (pooled estimate −0.04 (−0.13 to 0.04), I²=93%). The sensitivity, specificity, positive predictive value and negative predictive value across the full range of probability thresholds are shown in online supplemental appendices S3 and S4. Decision curve analysis to further examine clinical utility showed higher

Sensitivity analyses that used complete case data showed discrimination and performance metrics similar to those from the imputed dataset (online supplemental appendices S6 and S7). Despite changes in unadjusted mortality by month of admission for patients aged ≥60 years (online supplemental appendix S10), discrimination (4C Mortality range 0.77-0.80, I²=74%; 4C Deterioration 0.75-0.79, I²=82%) and calibration slopes (4C Mortality point estimates 1.05-1.16, I²=6%; 4C Deterioration 0.97-1.09, I²=49%; online supplemental appendices 11-13) remained stable for both scores. However, 4C Mortality showed heterogeneity in calibration-in-the-large for December 2020 (0.27) compared with other months of admission (range −0.06 to 0.14; pooled estimate 0.10 (−0.02 to 0.22), I²=96%; online supplemental appendix S11), corresponding with an increase in unadjusted mortality for patients aged ≥70 years (online supplemental appendix S10). The calibration plot for patients recruited during December showed a corresponding underestimation of mortality risk (online supplemental appendix S12). For 4C Deterioration, calibration-in-the-large was similar across admission months (pooled estimate −0.06 (−0.11 to −0.01), I²=81%).
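The pooled estimates and I² values reported above come from random-effects meta-analysis across strata (NHS regions or admission months). The between-stratum variance estimator is not specified here; the sketch below assumes the common DerSimonian-Laird estimator with a normal-approximation 95% CI, and its function name is hypothetical. I² is the share of the variability across strata (Cochran's Q) beyond what within-stratum sampling error alone would explain. C-statistics are often pooled on the logit scale and back-transformed; that step is omitted for brevity.

```python
import math
from typing import Sequence, Tuple


def random_effects_pool(estimates: Sequence[float], std_errors: Sequence[float]
                        ) -> Tuple[float, Tuple[float, float], float, float]:
    """DerSimonian-Laird pooling; returns (pooled, (lower, upper), tau2, I2)."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-stratum variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, ci, tau2, i2
```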

Table 2 notes: Data are median (IQR) or n (%), calculated from non-missing data. Participants are shown by the first chronological deterioration category through which they met the composite primary outcome (HDU or ICU admission, ventilatory support or death). HDU, high-dependency unit; ICU, intensive care unit; NHS, National Health Service.

Across the four 4C Mortality risk groups, the corresponding mortality rates were: low risk (score 0-3, 1.5%); intermediate risk (score 4-8, 9.5%); high risk (score 9-14, 32.8%); and very high risk (score ≥15, 63.9%). These mortality rates were similar to those in the original validation cohort (online supplemental appendices 14 and 15). A stepwise increase in oxygen requirement, deterioration, mortality and duration of hospital stay was seen for both scores as predicted outcome risk increased (figures 4 and 5). Patients at the lowest risk of mortality (score 0-3) or deterioration (first and second deciles) were the least likely to receive oxygen or to deteriorate, and had shorter in-patient stays.
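The risk-group cut-offs above translate directly into a lookup. The sketch below (hypothetical names) maps an integer 4C Mortality score to its group and to the observed in-hospital mortality reported for that group in this validation cohort; those percentages describe this cohort only and are not individual predicted probabilities.

```python
from typing import Tuple

# (lowest score, highest score or None if open-ended, group, observed mortality % in this cohort)
RISK_GROUPS = [
    (0, 3, "low", 1.5),
    (4, 8, "intermediate", 9.5),
    (9, 14, "high", 32.8),
    (15, None, "very high", 63.9),
]


def mortality_risk_group(score: int) -> Tuple[str, float]:
    """Return (risk group, observed mortality %) for a 4C Mortality score."""
    if score < 0:
        raise ValueError("4C Mortality scores are non-negative")
    for low, high, group, mortality_pct in RISK_GROUPS:
        if score >= low and (high is None or score <= high):
            return group, mortality_pct
    raise AssertionError("unreachable: groups cover all non-negative scores")


print(mortality_risk_group(6))  # ('intermediate', 9.5)
```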

We prospectively validated our previously reported risk stratification tools in a cohort of 76 588 UK patients hospitalised with confirmed or highly suspected COVID-19 during the second pandemic wave. Both scores (4C Mortality and 4C Deterioration) showed consistent discrimination and calibration across NHS regions and ethnic groups. Performance metrics were similar to those of the original first-wave validation cohorts, despite systematic changes in patient management since the first wave, particularly the routine use of corticosteroids. Robust prognostic models that predict outcomes among patients with COVID-19 have been urgently needed to support clinical decision making regarding hospital admission and treatment. Our 4C scores capture patient comorbidity, abnormal physiology and inflammation using routinely measured demographics, bedside observations and laboratory tests to facilitate objective, evidence-based assessments. We previously noted that our prognostic tools should be interpreted in the context of the standard treatment at the time the models were developed and validated. 12 Hospital-based management of adults with COVID-19 has evolved with the accrual of new evidence and increasing clinical experience, including dexamethasone use and patient proning. 11 27-29 Temporal assessment of model performance is therefore critical to ensure consistent performance and to inform the need for temporal recalibration, by updating model intercepts, slopes or coefficients as required. 12 Our findings suggest that recalibration is currently not required for either the 4C Mortality score or the 4C Deterioration model. We will continue to monitor model performance prospectively in ISARIC4C and can perform temporal recalibration to update the models if required.
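Temporal recalibration of the kind mentioned above is often done by logistic recalibration: each patient's original linear predictor, logit(p), is kept, and only an intercept a (recalibration-in-the-large, slope fixed at 1) or an intercept and slope b are re-estimated. The sketch below applies such updated parameters to an existing prediction. The function name is hypothetical, and the a and b values in the example are illustrative, not estimates from this study, which found recalibration unnecessary.

```python
import math


def recalibrate(p: float, a: float = 0.0, b: float = 1.0) -> float:
    """Updated risk expit(a + b * logit(p)); a=0, b=1 leaves p unchanged."""
    z = a + b * math.log(p / (1.0 - p))
    # Numerically stable inverse logit.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative only: an intercept of log(9) turns a predicted risk of 10% into 50%.
print(round(recalibrate(0.10, a=math.log(9)), 6))  # 0.5
```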

A key aim of risk stratification is to support clinical management decisions as part of daily routine care and to inform stratification of patients on the basis of clinical severity. The combination of both scores could be included in the programmatic standard of care adopted by hospitals to identify clinical pathways for patients with COVID-19. As well as improving clinical management, this should also encourage better allocation of human and economic resources. 30 We demonstrate that risk classes identified in the original score derivation and validation cohorts continue to perform well, with a stepwise elevation in oxygen requirement, risk of deterioration and death as risk class increased. In addition, both scores performed well across the full range of risk. Patients identified as low risk by both scores were less likely to require oxygen and more likely to have a short hospital admission, suggesting that these patients could be managed in the community if supported by a clinician's overall assessment, while patients in the intermediate-risk group (4C Mortality) or the third to fifth deciles (4C Deterioration) might initially be suitable for ward-level monitoring. Meanwhile, identifying patients at high risk of mortality or deterioration may prompt aggressive treatment and early escalation to critical care if appropriate. However, these risk stratification tools should guide, but not replace, clinical decision making. Furthermore, care should be taken when interpreting predicted risk, as it does not reflect the outcome risk in the absence or presence of a particular intervention. 30 Unfortunately, predicting risk with and without intervention is difficult. 31

Figure 2 Validation metrics for 4C Mortality and 4C Deterioration scores. Random-effects meta-analysis was performed across NHS regions for each metric. CITL, calibration-in-the-large; NHS, National Health Service.

Figure 3 Predicted vs observed outcome probability shown as calibration plots across all NHS regions for 4C Mortality (calibration slope 1.09, CITL 0.12) and 4C Deterioration score (calibration slope 1.00, CITL −0.04). Scores were fitted using the original derivation cohorts, with predictions from 10 multiply imputed validation datasets pooled and a LOESS curve fitted through the predictions. CITL, calibration-in-the-large; NHS, National Health Service.

Consistent performance of both models is perhaps surprising, particularly as the introduction of corticosteroids is likely to have reduced mortality among people receiving supplementary oxygen. 11 32 In this cohort, 38 417 (54%) patients received systemic corticosteroids, compared with 14% and 12% in the original 4C Deterioration and Mortality cohorts, respectively. Conversely, the B.1.1.7 variant was reported to be the dominant circulating SARS-CoV-2 strain during the second wave 33 and may be associated with higher mortality. 34 In the present study, mortality (17%) and deterioration (37%) rates were lower than in the cohorts initially used for model derivation and validation (mortality 32%, deterioration 43%), despite similar patient age, comorbidities and time to outcome. Nonetheless, both models performed well across the included time period. Using the 4C Mortality and 4C Deterioration scores together therefore still provides a validated, evidence-based approach for clinicians to predict the relevant outcome and inform clinical management decisions.

This is the largest prospective validation study for prognostic risk scores among hospitalised patients with COVID-19. As for our original reports, we adhered to TRIPOD reporting standards, 18 used multiple imputation to deal with missing data and examined heterogeneity in detail by NHS region, ethnicity and month of admission. We have demonstrated the ability to temporally validate both scores during subsequent admission waves and, if required, can temporally recalibrate both scores if performance decreases in future. In addition, both scores were able to identify both low-risk (rule-out) and high-risk patients (rule-in) for mortality and deterioration, which corresponded with oxygen requirement and duration of in-hospital stay.

There are, however, some limitations. First, the patient cohort comprised hospitalised patients with confirmed or highly suspected COVID-19 who were seriously ill (mortality rate 17.4%) and of advanced age (median age 72 years), similar to the cohorts used to derive each score. These models are not intended for use in the community and may perform differently in populations at lower risk of death. In addition, pooled estimates demonstrated high heterogeneity, particularly for calibration-in-the-large. This may reflect clinical differences across regions, including differences in patient population and in the medical management of hospitalised patients with COVID-19. Nevertheless, external validation in Brazil, 6 Canada, 7 France, 8 the Netherlands 9 and Pakistan 10 has demonstrated consistent performance for the 4C Mortality Score.

Second, a proportion of recruited patients had incomplete episodes for deterioration (4.5%) or mortality (5.4%). We handled missing outcome data using multiple imputation in the primary analysis, assuming data were missing at random, and performed a complete case sensitivity analysis, with consistent findings. Furthermore, using all-cause mortality as a primary outcome measure, rather than COVID-19-related mortality, may reduce interpretability in patients with nosocomial infection.
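Rubin's rules, used above to combine results across the 10 imputed datasets, average the per-imputation estimates and add the between-imputation spread to the average within-imputation variance. A minimal sketch follows; the function name is hypothetical, and it assumes the estimates are on a scale where approximate normality is reasonable (for C-statistics this commonly means the logit scale).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Return (pooled estimate, pooled standard error) across m imputations."""
    m = len(estimates)
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return q_bar, math.sqrt(total)
```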

We have prospectively validated easy-to-use risk scores that enable accurate stratification of hospitalised adults with community-acquired or hospital-acquired COVID-19 for clinical deterioration or mortality. The performance of both scores has remained consistent despite temporal changes in management and treatment during the second wave. Application within the validation cohorts showed that these tools could guide clinician decisions, including treatment escalation. Although the models showed consistent performance across England, Wales and Scotland, validation in other countries should be prioritised to enable their clinical implementation internationally. The ongoing performance of both scores will need to be assessed in the context of increasing deployment of immunomodulatory agents 35 and COVID-19 vaccines, as well as emerging SARS-CoV-2 variants.

Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study

Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score

Development and validation of the ISARIC 4C deterioration model for adults hospitalised with COVID-19: a prospective cohort study

NHS. C0860-clinical-commissioning-policy-remdesivir-for-people-hospitalised-withcovid-19-v2-.pdf

NHS England guidelines for remdesivir therapy in COVID-19 patients

Community-acquired pneumonia severity assessment tools in patients hospitalized with COVID-19: a validation and clinical applicability study

External validation of the 4C mortality score among COVID-19 patients visiting the emergency department or admitted to hospital in Ontario

Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients

Performance of prediction models for short-term outcome in COVID-19 patients in the emergency department: a retrospective study

ISARIC 4C mortality score as a predictor of in-hospital mortality in COVID-19 patients admitted in Ayub Teaching Hospital during first wave of the pandemic

Dexamethasone in hospitalized patients with Covid-19

Prediction models for covid-19 outcomes

Open source clinical science for emerging infections

Global outbreak research: harmony not hegemony

Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study

Surveillance definitions for COVID-19

Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement

Assessing the performance of prediction models: a framework for traditional and novel measures

Towards better clinical prediction models: seven steps for development and an ABCD for validation

mice: Multivariate Imputation by Chained Equations in R

Multiple Imputation for Nonresponse in Surveys

Individual participant data (IPD) meta-analyses of diagnostic and prognostic modeling studies: guidance on their use

A simple, step-by-step guide to interpreting decision curve analysis

Changes in UK hospital mortality in the first wave of COVID-19: the ISARIC WHO Clinical Characterisation Protocol prospective multicentre observational cohort study

Ethnicity and outcomes from COVID-19: the ISARIC CCP-UK prospective observational cohort study of hospitalised patients

ISARIC4C Investigators

Overview | COVID-19 rapid guideline: managing COVID-19 | guidance | NICE

Severe covid-19 pneumonia: pathogenesis and clinical management

Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations

Improving clinical management of COVID-19: the role of prediction models

Prediction meets causal inference: the role of treatment in clinical prediction models

WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group. Association between administration of systemic corticosteroids and mortality among critically ill patients with COVID-19: a meta-analysis

SARS-CoV-2 - increased circulation of variants of concern and vaccine rollout in the EU/EEA, 14th update

Risk of mortality in patients infected with SARS-CoV-2 variant of concern 202012/1: matched cohort study

C1143-interim-clinical-commissioning-policy-tocilizumab-rps-v2.pdf

Ethics approval: Ethical approval was given by the South Central Oxford C research ethics committee in England (reference 13/SC/0149) and by the Scotland A research ethics committee (reference 20/SS/0028).

Provenance and peer review: Not commissioned; externally peer reviewed.

Data availability statement: Access to all data and samples collected by ISARIC4C is controlled by an Independent Data and Materials Access Committee composed of representatives of research funders, academia, clinical medicine, public health and industry. The application process for access to the data is available on the ISARIC4C website (https://isaric4c.net/sample_access/).

J Kenneth Baillie http://orcid.org/0000-0001-5258-793X
Malcolm G Semple http://orcid.org/0000-0001-9700-0418
Mahdad Noursadeghi http://orcid.org/0000-0002-4774-0853