key: cord-0703755-zcipha0f authors: McDonald, Samuel A.; Medford, Richard J.; Basit, Mujeeb A; Diercks, Deborah B.; Courtney, D. Mark title: Derivation with Internal Validation of a Multivariable Predictive Model to Predict COVID‐19 Test Results in Emergency Department Patients date: 2020-11-28 journal: Acad Emerg Med DOI: 10.1111/acem.14182 sha: 7b199eed88348851477106400857ddf95c561a0d doc_id: 703755 cord_uid: zcipha0f

OBJECTIVES: The COVID‐19 pandemic has placed acute care providers in the demanding position of predicting disease amid clinical variability, the desire to cohort patients, and highly variable testing availability. An approach to stratify patients by likelihood of disease based on rapidly available emergency department (ED) clinical data would offer significant operational and clinical value. The purpose of this study was to develop and internally validate a predictive model to aid in the discrimination of patients undergoing investigation for COVID‐19. METHODS: All patients older than 18 years presenting to a single academic ED who were tested for COVID‐19 during the index ED evaluation were included. The outcome was defined as the result of COVID‐19 PCR testing during the index visit or any positive result within the following 7 days. Variables included chest radiograph interpretation, disease‐specific screening questions, and laboratory data. Three models were developed with a split‐sample approach to predict the outcome of the PCR test using logistic regression, random forest, and gradient boosted decision‐tree methods. Model discrimination was evaluated by comparing AUC and point statistics at a predefined threshold. RESULTS: A total of 1026 patients, collected between March and April 2020, were included in the study. Overall, disease prevalence in the study population during this time frame was 9.6%.
The logistic regression model had an AUC of 0.89 (95% CI 0.84‐0.94) when including four features: exposure history, temperature, WBC, and chest radiograph result. The random forest model had an AUC of 0.86 (95% CI 0.79‐0.92), and gradient boosting had an AUC of 0.85 (95% CI 0.79‐0.91). At a fixed negative predictive value, the logistic regression model had a positive predictive value of 0.29 (0.20‐0.39), compared with 0.20 (0.14‐0.28) for random forest and 0.22 (0.15‐0.30) for the gradient boosted method. CONCLUSION: The derived predictive models offer good discriminating capacity for COVID‐19 disease and provide interpretable and usable methods for providers caring for these patients at the important crossroads of the community and the health system. The logistic regression model using exposure history, temperature, WBC, and chest radiograph result had the greatest discriminatory capacity and was the most interpretable model. Integrating a predictive model‐based approach into COVID‐19 testing decisions, patient care pathways, and patient placement could add efficiency and accuracy and decrease uncertainty.

The narrative of COVID-19 for the acute care provider in the United States has been dominated by testing and the question of likelihood of disease for individual patients 1. In a matter of months, providers have gone from relying on classic symptoms reported anecdotally, to developing their own gestalt from personal experience, to having SARS-CoV-2 tests available.
Despite the very rapid development of viral RNA polymerase chain reaction (PCR) testing in many locations, the pre-test probability (the probability of disease before a test) remains critical for several reasons: testing throughout the US is still not fully developed or immediately available 2; risk of disease may be significantly overestimated, causing clinician fear and affecting personal protective equipment (PPE) use; COVID-19 is expanding to at-risk countries and settings with no testing 3, 4; and pre-test probability is needed to establish a post-test probability once the test result is known 5. This last point has become critical for emergency clinicians, hospital administrators, and epidemiologists trying to limit both community and health care-associated spread and to follow the disease trajectory. The post-test probability is the probability of disease after a test and is a function of two things: a) the pre-test probability and b) the diagnostic performance and result of the test. Despite the variety of tests in use and the uncertainty about their diagnostic performance, the post-test probability is driven largely by the pre-test probability. Efforts to quantify pre-test probability more scientifically would aid individual and health system decision making 6, 7. The greatest concern for limiting spread, both within the community and, importantly, within health systems and care facilities, is the reported nontrivial rate of false-negative testing 8. For high-risk patients, many are recommending repeat testing, recognizing that the quality of nasopharyngeal swab technique, early-stage illness, and handling of media may contribute to false-negative tests 9, 10. For these reasons, many systems are adopting efforts to risk stratify patients based on gestalt, early internal data, or home-grown scoring systems.
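The dependence of post-test probability on pre-test probability can be made concrete with Bayes' theorem. The minimal Python sketch below is illustrative only: the sensitivity (0.80) and specificity (0.99) are hypothetical placeholders, not estimates for any particular SARS-CoV-2 assay, and 0.096 simply reuses the study prevalence as an example pre-test probability.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Probability of disease after a test result, by Bayes' theorem.

    pre_test    -- probability of disease before the test (0 < p < 1)
    sensitivity -- P(test positive | disease)
    specificity -- P(test negative | no disease)
    positive    -- True for a positive result, False for a negative one
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    if positive:
        diseased = sensitivity * pre_test                 # true positives
        healthy = (1.0 - specificity) * (1.0 - pre_test)  # false positives
    else:
        diseased = (1.0 - sensitivity) * pre_test         # false negatives
        healthy = specificity * (1.0 - pre_test)          # true negatives
    if diseased + healthy == 0.0:
        raise ValueError("this result cannot occur under the given test characteristics")
    return diseased / (diseased + healthy)


# Hypothetical assay characteristics, for illustration only.
for pre in (0.096, 0.5):
    post = post_test_probability(pre, sensitivity=0.80, specificity=0.99, positive=False)
    print(f"pre-test {pre:.3f} -> post-test after a negative result {post:.3f}")
# pre-test 0.096 -> post-test after a negative result 0.021
# pre-test 0.500 -> post-test after a negative result 0.168
```

With the same hypothetical assay, a negative result leaves roughly a 2% probability of disease at the study prevalence but roughly 17% at a pre-test probability of 50%, which is the sense in which the pre-test estimate dominates the post-test result.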
These approaches attempt to meet a significant operational need, particularly in the ED, as decisions about disposition and early management of suspected COVID-19 patients typically fall to the emergency physician. However, minimal work has been done to provide a data-driven method for COVID-19 risk stratification. Once developed, such a predictive approach would further our ability to provide appropriate recommendations for mitigating disease spread in and out of the hospital, identify patients who warrant repeat testing if the initial test is negative, and inform future research. The purpose of this study was to develop and internally validate a predictive model to aid in the discrimination and management of patients undergoing investigation for COVID-19 in the emergency department. Specifically, our objective was to use a robust electronic health record (EHR) that captures laboratory, clinical, vital sign, comorbidity, and radiographic data, together with traditional and novel modeling techniques, to predict SARS-CoV-2 PCR test results in ED patients.
Accepted Article

This is a retrospective cohort study of consecutive patients who presented to the emergency department and were tested for COVID-19 between March 6 and April 24, 2020, at a single academic quaternary care facility with a typical yearly volume of 50,000 patients. Given the pandemic environment and reduced ED presentations, the study period included about 4000 unique patient encounters. This quality improvement initiative was prioritized early in our response to COVID-19 to evaluate how best to deploy testing strategies in a supply-constrained environment, and it was formally approved by the Department of Emergency Medicine Quality Improvement Committee. Reporting of this derivation follows the guidelines of the TRIPOD statement 11.
We included all patients older than 18 years presenting to our ED who were tested for COVID-19 during the index ED visit and had not previously been tested in our health system. Testing decisions during the study period were protocolized by the health system using a CDC guideline-driven strategy that required either a high-risk exposure or a high-prevalence location, in addition to associated symptoms. Testing was also performed when the treating clinician had high suspicion that fell outside the protocolized guidelines. No asymptomatic testing was performed during this period. Patients with repeat visits after the index visit were not included in the analysis, as the prior encounter data available within the EHR could differentially affect both pre-test probability estimation and the work-up of disease relative to an encounter without previous ED testing. The primary outcome of interest was the presence of COVID-19 infection, evaluated by PCR testing of a nasopharyngeal swab obtained at the index visit or by any positive test resulting within the following week. A positive PCR test was defined as any positive PCR result reported by laboratory technicians without awareness of other clinical data. If initial testing was negative but repeat testing within 1 week was positive, a positive result was attributed as the final outcome. This was done to account for possible initial false-negative testing in early presenters. If no additional testing was performed, the initial result was considered the final outcome. Initially, 44 candidate variables that are routinely available during an emergency department encounter for influenza-like illness were collected from the clinical data warehouse.
This article is protected by copyright. All rights reserved.
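The outcome rule described above can be written as a small labeling function. This is a minimal Python sketch under assumptions not stated in the text: the 7-day window is measured from the index specimen's collection time, and `PcrResult` and its fields are hypothetical names rather than the study's data schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class PcrResult:
    collected: datetime  # specimen collection time (hypothetical field name)
    positive: bool


def final_outcome(index: PcrResult, repeats: Sequence[PcrResult],
                  window: timedelta = timedelta(days=7)) -> bool:
    """Label an encounter positive if the index PCR is positive, or if any
    repeat PCR collected within `window` after the index specimen is positive.
    Otherwise the (negative) index result stands as the final outcome."""
    if index.positive:
        return True
    window_end = index.collected + window
    return any(r.positive and index.collected < r.collected <= window_end
               for r in repeats)


index = PcrResult(datetime(2020, 4, 1, 10, 0), positive=False)
print(final_outcome(index, [PcrResult(datetime(2020, 4, 5, 9, 0), True)]))   # True
print(final_outcome(index, [PcrResult(datetime(2020, 4, 10, 9, 0), True)]))  # False
print(final_outcome(index, []))                                              # False
```

The second call returns False because the repeat positive falls nine days after the index specimen, outside the one-week window.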
The dataset consisted of patient-specific information, triage screening questions, initial vital signs, specific laboratory results, and chest x-ray interpretation. Patient-specific information included age at encounter, gender, race, and specific comorbid conditions: hypertension, diabetes, end-stage renal disease, asthma, and COPD. The comorbidities are extracted from SNOMED concept-based patient registries, defined by a standardized vocabulary of clinical concepts, which are automatically populated within the EHR whenever a clinician adds a diagnosis or problem list item that maps to the specific SNOMED concept 12. Because the number of patients available for our analysis limited our capacity to model individual comorbid conditions, we summarized the number of comorbid conditions per patient as a single ordinal variable ranging from 0 to 5. The screening questions performed at triage are a standardized set of general risk stratification questions for emerging pathogens, with increased specificity for COVID-19. These questions assessed the presence of recent fever, travel to a location of high disease prevalence, exposure to someone with COVID-19, and associated viral symptoms, and they were treated as dichotomous variables. Continuous variables collected for evaluation consisted of laboratory values and vital signs. Laboratory tests selected for evaluation were total WBC, absolute lymphocyte count and percentage, alanine aminotransferase (ALT), aspartate aminotransferase (AST), C-reactive protein (CRP), ferritin, lactate dehydrogenase (LDH), high-sensitivity troponin T (hs-TnT), and NT-proBNP. These were selected based on emerging literature reporting their value for disease specificity and severity 13, 14. Vital signs obtained during the visit were collected as additional covariates: blood pressure, pulse, respiratory rate, oxygen saturation, and temperature. To ensure a uniform collection method, the first vital signs obtained during the ED visit were used for analysis.
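Two of the preprocessing steps above, collapsing the five registry flags into a single 0-to-5 ordinal count and keeping the first recorded vital sign, are sketched below in Python. The flag dictionary, its key names, and the (timestamp, value) reading format are assumptions made for illustration, not the structure of the study's data warehouse.

```python
from datetime import datetime
from typing import Mapping, Optional, Sequence, Tuple

# The five registry-tracked conditions named in the text; key names are hypothetical.
COMORBIDITIES = ("hypertension", "diabetes", "esrd", "asthma", "copd")


def comorbidity_count(registry_flags: Mapping[str, bool]) -> int:
    """Collapse individual comorbidity flags into one ordinal value from 0 to 5."""
    return sum(1 for name in COMORBIDITIES if registry_flags.get(name, False))


def first_value(readings: Sequence[Tuple[datetime, float]]) -> Optional[float]:
    """Return the earliest-recorded value in the encounter, or None if absent."""
    if not readings:
        return None
    return min(readings, key=lambda reading: reading[0])[1]


print(comorbidity_count({"hypertension": True, "copd": True, "asthma": False}))  # 2
temps = [(datetime(2020, 4, 1, 11, 30), 37.4), (datetime(2020, 4, 1, 9, 5), 38.6)]
print(first_value(temps))  # 38.6
```

Selecting by timestamp rather than by list position keeps the "first value" rule correct even if readings arrive out of order.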
For consistency and availability, each of these covariates took the first value obtained during the encounter. Lastly, chest x-ray (CXR) interpretation was included as a categorical variable with three levels: consistent with viral pneumonia, findings with low likelihood of viral pneumonia, and negative. This was stored as a structured result within the EHR and classified by the interpreting radiologist, a new process implemented specifically for COVID-19. A subset of images had been classified by the study group before implementation of that process; these were tagged in a similar fashion by the interpreting radiologist. List-wise deletion was performed for any observation without an associated COVID-19 order result, and also for any observation without vital sign or screening questionnaire data. This was done given the concern that these data were missing not at random (MNAR), as they should be present in almost all emergency department visits. Variables with very low variance, more than 70% missing values, or high correlation with other features were removed before modeling. After initial variable selection and division of the data, imputation was performed separately on the training and validation data sets using a random forest-based nonparametric method that can impute both numeric and categorical variables, implemented in the R package missRanger 15, 16. Many encounters likely had missing chest radiograph or laboratory data because the patients were not ill-appearing and were unlikely to be admitted. Clinically, we did not mandate laboratory tests or CXR in all patients tested for COVID-19, so the resulting data likely reflect what would be found in a real-world setting. Data were randomly divided with a split-sample approach, allocating 75% of the data for training each model and 25% for internal validation.
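The variable screening and split-sample steps above can be sketched in standard-library Python. The 70% missingness cut-off comes from the text; the variance floor (1e-8), the correlation cut-off (|r| > 0.90), the greedy keep-first ordering, and the random seed are assumptions, since the study does not report them. Imputation is deliberately omitted: the study imputed each partition separately with the R package missRanger after splitting, and that step is not reproduced here.

```python
import math
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Column = Sequence[Optional[float]]  # None marks a missing value


def pearson(x: Column, y: Column) -> float:
    """Pearson correlation on rows where both values are present (0.0 if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return 0.0
    mx = statistics.fmean(a for a, _ in pairs)
    my = statistics.fmean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_features(data: Dict[str, Column], max_missing: float = 0.70,
                    min_variance: float = 1e-8, max_abs_corr: float = 0.90) -> List[str]:
    """Drop features that are mostly missing, near-constant, or highly
    correlated with a feature already kept (greedy, in column order)."""
    kept: List[str] = []
    for name, col in data.items():
        observed = [v for v in col if v is not None]
        # Too few observed values, or missing fraction above the cut-off.
        if len(observed) < 2 or 1 - len(observed) / len(col) > max_missing:
            continue
        if statistics.pvariance(observed) < min_variance:
            continue
        if any(abs(pearson(col, data[other])) > max_abs_corr for other in kept):
            continue
        kept.append(name)
    return kept


def split_indices(n: int, train_fraction: float = 0.75,
                  seed: int = 2020) -> Tuple[List[int], List[int]]:
    """Random split of row indices into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


data = {
    "wbc": [5.0, 7.0, None, 12.0, 9.0],
    "wbc_doubled": [10.0, 14.0, None, 24.0, 18.0],  # redundant copy of wbc
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],          # zero variance
    "mostly_missing": [None, None, None, None, 3.0],  # 80% missing
    "temp": [37.0, 38.5, 36.8, 39.1, 37.2],
}
print(filter_features(data))  # ['wbc', 'temp']
train, valid = split_indices(1026)
print(len(train), len(valid))  # 770 256
```

Splitting before imputation, as the study did, keeps information from the validation rows from leaking into the values imputed for the training rows.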
Using the training data set, three distinct models were developed to predict COVID-19 test results, to maximize our ability to arrive at an optimal method for prediction. A logistic regression model was developed using a backward stepwise variable selection method, with Wald chi-square statistics used to evaluate the reduced model. This was performed using the R RMS package 17. Two ensemble decision tree-based approaches were used: random forest, using the R Ranger package 18, and a gradient boosted decision tree method, using the R XGBoost package 19, both with the default recommended tuning parameters. These models combine the predictions of a group of base models to balance the bias and variance of the prediction, improving on a standard single decision tree. While many model types could be used for prediction, our goal was a model that could be easily implemented within our EHR to provide automated alerts to providers. At this time, our capacity to implement models more complex than generalized linear models or tree-based models is limited, and we felt the operational value of testing other model types was low. We present central tendency as means with associated 95% confidence intervals. Categorical data are presented as totals and percentages. Model performance was evaluated by examining the area under the receiver operating characteristic curve (AUC) for each model, with associated 95% confidence intervals, in the validation data. To examine the specific threshold that would be used for clinical decision making, we compared the positive predictive values (PPV) at a fixed negative predictive value (NPV) of 99%. This cut-point was chosen to preferentially minimize false negatives. All analyses were performed using R version 3.6.3.

Over the study period, 1060 encounters met the inclusion criteria.
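The evaluation described above, an AUC plus point statistics at the cut-point where NPV reaches 99%, can be sketched as follows. The study used R and does not say exactly how the cut-point was located; here, as an assumption, it is the highest observed score that still keeps NPV at or above the target, which tends to favor PPV. The toy scores are invented for illustration, and the confidence intervals reported in the study are not computed.

```python
from typing import Dict, Optional, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney statistic: the share of (positive, negative)
    pairs in which the positive case scores higher, counting ties as half.
    Quadratic in cohort size, which is acceptable at this scale."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> Optional[float]:
    return num / den if den else None


def metrics_at(scores: Sequence[float], labels: Sequence[int],
               threshold: float) -> Dict[str, Optional[float]]:
    """Point statistics when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            tp += y
            fp += 1 - y
        else:
            fn += y
            tn += 1 - y
    return {"sensitivity": _ratio(tp, tp + fn), "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp), "npv": _ratio(tn, tn + fn)}


def threshold_for_npv(scores: Sequence[float], labels: Sequence[int],
                      target_npv: float = 0.99) -> Optional[float]:
    """Highest observed score whose cut-point keeps NPV >= target_npv."""
    for t in sorted(set(scores), reverse=True):
        npv = metrics_at(scores, labels, t)["npv"]
        if npv is not None and npv >= target_npv:
            return t
    return None


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(round(auc(scores, labels), 3))  # 0.933
cut = threshold_for_npv(scores, labels)
print(cut, metrics_at(scores, labels, cut))
# 0.6 {'sensitivity': 1.0, 'specificity': 0.8, 'ppv': 0.75, 'npv': 1.0}
```

Fixing NPV and then comparing PPV, as the study does, puts the models on equal footing for ruling out disease and asks which one wastes fewer positive calls.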
Eleven observations were removed because they had no associated outcome value, and 23 observations were removed because of missing triage screening data or vital signs. Our final study sample consisted of 1026 unique patients. The prevalence of COVID-19 in this population was 9.6% (95% CI 9.9-11.6%), with 99 patients testing positive. The cohort had a mean age of 52 years (SD 19), and 56% were female. Other demographic details are found in Table 1. ALT had the most frequent missing data, with 29% missing in the training and 31% in the validation data subsets, followed by CXR (16% and 18%), lymphocytes (16% and 14%), and WBC (15% and 13%), as visualized in Figure 1. The remaining predictors had less than 3% missingness in both training and validation data. This dataset reflects all subjects who were tested. The final selected covariates are presented in Table 2. See Figure 2. To contextualize performance with respect to clinical decision making, sensitivity, specificity, and the resulting PPV are reported in Table 3 at the prespecified threshold at which NPV was 99%. The logistic regression model offered a higher positive predictive value of 0.29 (0.20-0.39), compared with 0.20 (0.14-0.28) for the random forest and 0.22 (0.15-0.30) for the gradient boosted method. The selected logistic regression model is presented in Table 4 for review.

Using data collected during the ED encounters of 1,026 patients presenting for evaluation of COVID-19, we developed three statistical learning models to aid in discriminating patients when COVID-19 testing is unavailable, delayed, or of questionable result. When the derived models were compared, all had similar discriminative capacity for predicting a positive COVID-19 PCR result, but the logistic regression model offered identical performance at the predefined test threshold (NPV > 99%), the primary aim of this study, with a greater capacity to identify patients positive for COVID-19 as well (PPV 29%). We believe this model offers great value to the bedside clinician: it is relatively simple and easy to interpret, with enough discriminative strength to risk stratify patients presenting to the ED for COVID-19 (Figure 3). The value of the derived model lies in its accessibility and relevance to the acute care clinician: the key features identified by the feature selection process (abnormal x-ray, positive contact with disease, elevated temperature, and WBC) are clinically meaningful and regularly obtained during an ED encounter. While the ensemble methods had slightly lower predictive capacity, they also attributed significant importance to these same variables, further justifying their use in our final model.

As front-line clinicians continue to care for all patients presenting to the emergency department, the threat of COVID-19 adds the cognitive burden of making the accurate diagnosis needed, and expected of us, to recommend appropriate measures that prevent further spread of this disease. Clinicians in the ED and at the point of care are inundated daily with the question "does this patient have COVID?" while also being charged with preserving PPE, assigning patients to the right team, and conserving test reagents. Models that offer some discrimination among patients provide an easy method for pre-test risk stratification. Where testing is scarce or not easily available, this may offer enough discrimination to reduce the need for testing in low-risk patients. Even when testing is still performed, it could offer a data-driven mechanism to reduce PPE consumption.
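A logistic regression model of this kind turns the four features into a probability through the inverse-logit of a weighted sum. The Python sketch below shows only that mechanism, in its simplest linear form. The coefficient values, the centring of temperature at 37 °C, the WBC units, and the three-level CXR coding against a negative reference are placeholders and assumptions; they are not the fitted model in Table 4, whose exact functional form is not reproduced here.

```python
import math
from typing import Mapping

# PLACEHOLDER coefficients for illustration only. They are NOT the study's
# fitted estimates, and their signs and sizes should not be read as findings.
PLACEHOLDER_COEFFS = {
    "intercept": -3.0,
    "exposure": 1.0,       # 1 if exposure history reported, else 0
    "temperature_c": 0.5,  # per degree Celsius above 37 (assumed centring)
    "wbc": -0.1,           # per 10^3 cells/uL (assumed units)
    "cxr_viral": 1.5,      # CXR "consistent with viral pneumonia" vs negative
    "cxr_low": 0.5,        # CXR "low likelihood" findings vs negative
}


def predicted_probability(coeffs: Mapping[str, float], exposure: bool,
                          temperature_c: float, wbc: float, cxr: str) -> float:
    """Inverse-logit of a linear predictor over the four features."""
    if cxr not in ("viral", "low", "negative"):
        raise ValueError("cxr must be 'viral', 'low' or 'negative'")
    linear = (coeffs["intercept"]
              + coeffs["exposure"] * float(exposure)
              + coeffs["temperature_c"] * (temperature_c - 37.0)
              + coeffs["wbc"] * wbc
              + coeffs["cxr_viral"] * float(cxr == "viral")
              + coeffs["cxr_low"] * float(cxr == "low"))
    return 1.0 / (1.0 + math.exp(-linear))


p = predicted_probability(PLACEHOLDER_COEFFS, exposure=True,
                          temperature_c=38.4, wbc=6.2, cxr="viral")
print(f"{p:.2f}")
```

Because each term adds to the log-odds independently, a model like this can also be rendered as a points-based bedside score, which is the interpretability advantage the text attributes to logistic regression.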
Some of the tests most accessible to emergency providers reportedly have poorer sensitivity, raising concern about false-negative rates 20, 21. Although not studied specifically within this cohort, this model could be used to further lower the estimated risk of COVID-19 after an initial negative point-of-care test. This could reassure care teams, minimize PPE consumption for patients who are admitted, and reduce likely unnecessary and unwarranted self-isolation for patients being discharged. While none of these models offers near-perfect discrimination, there is a clear separation between the two distributions in our results (Figures 2 and 3). This offers some flexibility in choosing an appropriate test threshold to divide the model prediction into two classes without significant loss of discrimination, allowing the model to be used for more than ruling out disease. Increasing the classification threshold could shift the focus to positive predictive value. While not typically a focus in emergency medicine, using the model in this way could identify a high-risk cohort early in the course of care and aid in better allocation of scarce hospital resources such as restricted treatment regimens, negative pressure rooms, and even personal protective equipment. A model of this type could become even more useful in the future, when COVID-19 is no longer at the forefront of acute care providers' thoughts. If implemented appropriately, a provider caring for a patient with a high pre-test probability of disease could receive an alert within the EHR recommending testing and appropriate cohorting to minimize unnecessary staff exposure. With a post-test probability of approximately 30% derived from routinely available emergency department data, our model could be useful toward these efforts.
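Adding a second, higher cut-point to flag a high-risk cohort turns the binary rule-out into a three-tier stratification. In this Python sketch the cut-points (0.05 and 0.50) and the toy predictions are hypothetical; in practice both cut-points would be chosen on validation data against target NPV and PPV values, as the study did for its single rule-out threshold.

```python
from typing import Dict, Optional, Sequence

TIERS = ("low", "intermediate", "high")


def stratify(probability: float, rule_out: float, rule_in: float) -> str:
    """Assign a risk tier from a predicted probability and two cut-points."""
    if not 0.0 <= rule_out < rule_in <= 1.0:
        raise ValueError("need 0 <= rule_out < rule_in <= 1")
    if probability < rule_out:
        return "low"
    if probability >= rule_in:
        return "high"
    return "intermediate"


def observed_positivity(probs: Sequence[float], labels: Sequence[int],
                        rule_out: float, rule_in: float) -> Dict[str, Optional[float]]:
    """Fraction of PCR-positive patients in each tier (None for an empty tier)."""
    positives = {tier: 0 for tier in TIERS}
    totals = {tier: 0 for tier in TIERS}
    for p, y in zip(probs, labels):
        tier = stratify(p, rule_out, rule_in)
        positives[tier] += y
        totals[tier] += 1
    return {t: positives[t] / totals[t] if totals[t] else None for t in TIERS}


probs = [0.02, 0.03, 0.20, 0.30, 0.60, 0.80]
labels = [0, 0, 0, 1, 1, 1]
print(observed_positivity(probs, labels, rule_out=0.05, rule_in=0.50))
# {'low': 0.0, 'intermediate': 0.5, 'high': 1.0}
```

The observed positivity in the "low" tier plays the role of 1 - NPV and the positivity in the "high" tier plays the role of PPV, so the two cut-points can be tuned separately for ruling out and for early cohorting.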
This is one of the first models to estimate the likelihood of COVID-19 in a population specific to emergency providers in the United States. Limited prior work from China, still in preprint, attempted to predict COVID-19 with the goal of minimizing unnecessary testing, specifically CT scans, which were used as a primary decision point in China early in the pandemic 22, 23. Unfortunately, many of the features in some of these early predictive models were laboratory tests that are not easily available or would not result during the normal course of an emergency department visit in the United States. Healthcare researchers increasingly use novel techniques for prediction, as the consensus seems to be that novel "AI" machine learning techniques offer improved prediction and discrimination 24, 25. Although these newer methods are alluring, the benefit of a logistic regression model, in addition to its better predictive performance for this use case, is its added interpretability 26. It is operationally more attractive than black-box machine learning methods because it provides the basis for a simple decision tool that clinicians can use to risk stratify patients when a health system lacks the capacity to implement a regression model within an EHR. Unlike a single decision tree, the ensemble methods cannot easily be reduced to a single decision rule, because their prediction is a composite of many trees. Without implementation within an EHR for automated scoring, ensemble methods provide little value to the average clinician.

There are several limitations to our study. First and foremost, this was a retrospective analysis, which is prone to bias. The retrospective design also resulted in considerable missing data, although very few observations had a substantial amount of missingness.
Uniquely, our institution's testing capacity has been available to ED physicians since COVID-19 arrived in the North Texas area. Our broad inclusion criteria, institutional approach of liberal testing, and test availability likely helped reduce the spectrum bias typical of retrospective derivation studies, which tend to oversample more severely ill patients. Another limitation is that this was a single-site model derived in a unique patient population with a presumed inherently high-risk baseline. While the model's discriminatory capacity would presumably decrease when generalized, the simplicity of the logistic regression model makes it seem less prone to overfitting than a model with many more predictors. One feature that may vary more elsewhere is the chest radiograph classification, which our radiologists recorded as a discrete result following a system-wide protocol; if left to emergency physicians or other radiologists without strict guidance, the predictive value of the CXR may differ. Additionally, with more data, a better-fitting model could be constructed by adding other high-value predictors. Future work intends to externally validate the model on data from multiple other institutions. It is possible that the information lost through the necessary discretization of continuous variables within the tree-based methods affected their performance relative to the logistic regression model. Additionally, these ensemble prediction methods are typically "data-hungry" 25 and could have benefited from additional training observations that were unavailable in a single-site study. Future work with more observations might find that these methods offer better discrimination than we found in our cohort. Lastly, defining an outcome for modeling is difficult when gold standards are not well validated.
The PCR test has been used as the gold standard for diagnosing COVID-19, but some of these assays have been found to have a nontrivial false-negative rate 20, 21. While we could not expect all patients to receive both COVID-19 PCR testing and chest CT in normal US emergency department operations, we attempted to mitigate this false-negative rate by counting repeat testing performed within a reasonable time horizon after the initial visit. Regardless, at present and for the foreseeable future, the COVID-19 PCR test is clinically used as the source of truth.

Based on the analysis of a consecutive population of patients presenting to an emergency department for evaluation of COVID-19, the derived and internally validated model has good discriminatory capacity for COVID-19 disease using 4 easily obtained variables: exposure history, temperature, WBC, and chest radiograph result. This model is easily interpretable and immediately usable by the average emergency clinician, from a rule-of-thumb method to implementation of the model within the EHR, to help drive high-quality patient- and system-level care for COVID-19 disease.

The COVID-19 Pandemic in the US: A Clinical Update
Preliminary Results of Initial Testing for Coronavirus (COVID-19) in the Emergency Department
Multi-tiered screening and diagnosis strategy for COVID-19: a model for sustainable testing capacity in response to pandemic
COVID-19: Are Africa's diagnostic challenges blunting response effectiveness?
Improved Molecular Diagnosis of COVID-19 by the Novel, Highly Sensitive and Specific COVID-19-RdRp/Hel Real-Time Reverse Transcription-PCR Assay Validated In Vitro and with Clinical Specimens
Pretest Probability Estimates: A Pitfall to the Clinical Utility of Evidence-based Medicine?
COVID-19 Testing: The Threat of False-Negative Results
Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
Real-time RT-PCR in COVID-19 detection: issues affecting the results
Negative Nasopharyngeal and Oropharyngeal Swabs Do Not Rule Out COVID-19
Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement
SNOMED CT Concept Hierarchies for Sharing Definitions of Clinical Conditions Using Electronic Health Record Data
Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): a meta-analysis
Unique epidemiological and clinical features of the emerging novel coronavirus pneumonia (COVID-19) implicate special control measures
Fast Imputation of Missing Values
MissForest-non-parametric missing value imputation for mixed-type data
Regression Modeling Strategies
A Fast Implementation of Random Forests for High Dimensional Data in C++ and R
methods for the detection of SARS-CoV-2 from nasopharyngeal and nasal swabs from individuals diagnosed with COVID-19
Comparison of Abbott ID Now and Abbott m2000 methods for the detection of SARS-CoV-2 from nasopharyngeal and nasal swabs from symptomatic patients
A Novel Triage Tool of Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 pneumonia In Fever Clinics
Development and utilization of an intelligent application for aiding COVID-19 diagnosis
Machine learning and medical education