key: cord-1014445-uzmcoqa6 authors: Bhakta, Shivang; Sanghavi, Devang K.; Johnson, Patrick W.; Kunze, Katie L.; Neville, Matthew R.; Wadei, Hani M.; Bosch, Wendelyn; Carter, Rickey E.; Shah, Sadia Z.; Pollock, Benjamin D.; Oman, Sven P.; Speicher, Leigh; Siegel, Jason; Libertin, Claudia R.; Matson, Mark W.; Franco, Pablo Moreno; Cowart, Jennifer B. title: Clinical and Laboratory Profiles of SARS-CoV-2 Delta Variant Compared to Pre-Delta Variants date: 2022-04-26 journal: Int J Infect Dis DOI: 10.1016/j.ijid.2022.04.050 sha: b52d0ad383e795eebb5e5d5b87df4bd8eefa8e11 doc_id: 1014445 cord_uid: uzmcoqa6

Background: The emergence of SARS-CoV-2 variants of concern has led to significant phenotypic changes in transmissibility, virulence, and the effectiveness of public health measures. Our study used clinical data to compare the characteristics of hospitalized patients between a Delta variant wave and a pre-Delta variant wave.

Methods: This single-center retrospective study defined a wave as a period in which the number of COVID-19 hospitalizations increased, peaked, and later decreased. Data from the United States Department of Health and Human Services were used to identify each wave's predominant variant. Wave 1 (08/08/20-04/01/21) was characterized by heterogeneous variants, while Wave 2 (06/26/21-10/18/21) was predominantly Delta variant. Descriptive statistics, regression techniques, and machine learning approaches supported the comparisons between waves.

Results: Of the cohort (n=1318), Wave 2 patients (n=665) were more likely to be younger, have fewer comorbidities, require ICU care, and show an inflammatory profile with higher C-reactive protein, lactate dehydrogenase, ferritin, fibrinogen, prothrombin time, activated thromboplastin time, and INR than Wave 1 patients. The gradient boosting model showed an area under the ROC curve of 0.854 (sensitivity 86.4%; specificity 61.5%; positive predictive value 73.8%; negative predictive value 78.3%).
Conclusions: Clinical and laboratory characteristics can be used to estimate the COVID-19 variant regardless of genomic testing availability. This finding has implications for variant-driven treatment protocols and further research.

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has led to a significant global health crisis, resulting in more than 5.8 million deaths as of February 14, 2022 (Coronavirus Resource Center, 2022). High prevalence and transmissibility of SARS-CoV-2 in a population allow mutations to accumulate in the viral genome, most of them mildly deleterious or neutral. A small number of these mutations may produce a phenotypically significant virus with increased transmissibility, increased virulence, or decreased effectiveness of public and social health measures (Khateeb et al., 2021). The World Health Organization (WHO, 2021) designates such variants as variants of concern (VOCs). As a result, most countries have experienced multiple waves of this viral illness, generally coinciding with new variant strains (Davies et (Paul et al., 2021), the emergence of a new VOC wave may lag its identification in a given geographic locale. Consequently, a surrogate means of identifying the emergence of a new wave is needed. Several laboratory markers may predict prognosis, including lymphocytopenia, inflammatory markers (e.g., C-reactive protein, ferritin), lactate dehydrogenase, high-sensitivity troponin I, and abnormal coagulation parameters, among others, which are commonly associated with poor outcomes (Sui et al., 2021; Poggiali et al., 2020; Henry et al., 2020). In this retrospective study, we aimed to compare the characteristics of patients hospitalized during the Delta variant surge with those of patients hospitalized during the pre-Delta variant surge at a single-center hospital in Florida, in order to define and distinguish the variants clinically.
This study was conducted at the Mayo Clinic Hospital in Jacksonville, Florida (MCF), and was deemed exempt from review by the Institutional Review Board (IRB 21-002944). Hospitalized patients with a positive nasopharyngeal polymerase chain reaction (PCR) or antigen test for SARS-CoV-2 on admission or during their hospital stay were reviewed. Vaccination status was assessed from our electronic medical record (EMR), which is updated every two weeks from the Florida State Health Online Tracking System (SHOTS) for all Florida residents older than 5 years. Vaccine breakthrough was defined as a positive PCR or antigen test for SARS-CoV-2 obtained more than 14 days after complete vaccination (after 2nd dose Wave 2 to minimize carry-over effects from previous variants.

Data were analyzed using a combination of standard descriptive statistics, regression techniques, and machine learning approaches to compare patient characteristics and hospital outcomes between the two waves. First, differences between waves were assessed using descriptive statistics. Absolute standardized mean differences greater than 10% were considered relevant differences between the waves, and Kruskal-Wallis rank sum tests were used to test these differences more formally. A second goal was to determine whether more general, or clustered, combinations of variables could be associated with the waves. To address this, a gradient boosting machine (GBM) was estimated using baseline comorbidities, patient characteristics, and the laboratory values obtained closest to hospital admission. Rather than predicting a clinical outcome such as 28-day mortality, the GBM was trained to predict the COVID-19 variant type predominant in each patient's wave. In this way, the GBM was used to explore fundamental differences between the patient cohorts, under the hypothesis that the variants were distinct, while allowing for interactions and non-linear associations in the model.
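The standardized mean difference screen described above can be sketched in a few lines of Python. This is a minimal sketch assuming the conventional definitions (pooled standard deviation of the two waves for continuous variables, pooled Bernoulli variance for binary variables); the text does not state which formula was used, and the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def smd_continuous(x1, x2):
    """SMD for a continuous variable: mean difference over sqrt((s1^2 + s2^2) / 2)."""
    pooled_sd = math.sqrt((stdev(x1) ** 2 + stdev(x2) ** 2) / 2)
    return (mean(x1) - mean(x2)) / pooled_sd


def smd_binary(p1, p2):
    """SMD for a binary variable, given the proportion with the trait in each wave."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


def is_relevant(smd, cutoff=0.10):
    """Flag a between-wave difference as relevant when |SMD| exceeds 10% (0.10)."""
    return abs(smd) > cutoff


# Hypothetical ages (not study data) for a handful of patients per wave.
wave1_age = [70, 65, 80, 72, 68]
wave2_age = [55, 60, 48, 62, 58]
age_smd = smd_continuous(wave1_age, wave2_age)
print(round(age_smd, 2), is_relevant(age_smd))
print(round(smd_binary(0.6, 0.4), 3), is_relevant(smd_binary(0.6, 0.4)))
```

Because the SMD is scale-free, the same 0.10 cut-off can be applied to ages, laboratory values, and comorbidity proportions alike; the formal test (Kruskal-Wallis) is a separate step and is not shown. The sketch does not guard against a zero pooled standard deviation.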
Lab values had differing rates of missingness in the two waves. To prevent missingness itself from being used to predict wave membership, missing data were imputed using the MissForest algorithm prior to modeling. Data were split in a traditional manner into a training set (80%) and a test set (20%). The GBM was then constructed using a Cartesian grid search to find optimal tuning parameters (learning rate, column sampling, row sampling, tree depth, and number of trees), with 5-fold cross-validation used to assess model performance. The model with the highest mean AUC across folds in the training data set was selected as the final model. In the held-out test data set, a receiver operating characteristic (ROC) curve was generated, and traditional binary classification summaries such as sensitivity and specificity were used to assess model performance and misclassification in data not used during model development and selection. Classification summaries are also reported at a wave-classification threshold of 0.028, which was selected to optimize F1 performance in the training data.

Given that baseline laboratory markers of clotting, inflammation, and other assays were identified as important variables in the GBM fit, longitudinal analyses were conducted using censored mixed models with a random intercept. Data were winsorized at the 0 and 95th percentiles. For these models, the parameters of primary interest were the wave indicator, which quantified the difference in laboratory values at admission (time = 0), and the hospital day by wave interaction term, which quantified differences between waves in the rate of change of laboratory values throughout the hospital stay. Statistical analyses and graphics were produced using R version 4.0.3 (Vienna, Austria). Where p-values are reported for interpretation, a two-sided threshold of p<0.05 was used to denote statistical significance.
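The threshold step can be illustrated as follows: a cut-off is chosen on training-set predicted probabilities by maximizing F1, and sensitivity, specificity, PPV, and NPV are then computed on held-out test data at that fixed cut-off. This is a minimal, dependency-free sketch rather than the authors' R code; scanning every observed training score as a candidate threshold is an assumption, labels use 1 for Wave 2 (Delta) and 0 for Wave 1, and the function names and toy data are hypothetical.

```python
def confusion(y_true, y_score, threshold):
    """Return (TP, FP, TN, FN), calling a case positive (Wave 2) when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_score):
    """Try each observed training score as a cut-off; keep the first with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(y_score)):
        tp, fp, _, fn = confusion(y_true, y_score, threshold)
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


def holdout_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity, PPV and NPV at a fixed threshold (NaN if a denominator is 0)."""
    tp, fp, tn, fn = confusion(y_true, y_score, threshold)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy predictions: 1 = Wave 2 (Delta), 0 = Wave 1 (pre-Delta).
train_y = [0, 0, 0, 1, 1, 1, 0, 1]
train_p = [0.1, 0.3, 0.6, 0.4, 0.8, 0.9, 0.2, 0.7]
cutoff, train_f1 = best_f1_threshold(train_y, train_p)  # cutoff is 0.4 on this toy data
print(cutoff, holdout_metrics([1, 0, 1, 0, 1], [0.5, 0.45, 0.9, 0.1, 0.3], cutoff))
```

The key design point mirrored from the text is that the threshold is fixed using training data only, so the test-set metrics reflect data not used in model development or threshold selection. Ties in F1 are resolved in favor of the lowest threshold.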
The final sample (N=1318) included 653 cases in the pre-Delta variant Wave 1 and 665 cases in the Delta-dominant Wave 2. Figure 1 provides an overview of hospital admissions and identifies the cases included in each wave, as well as a buffer period in which we observed overlap between pre-Delta and Delta variants. Descriptive statistics and tests for differences in baseline comorbidities and patient characteristics between waves are shown in Table 1. Several differences in baseline comorbidities and patient characteristics were observed, including age, race, ethnicity, and comorbidities such as hypertension, chronic kidney disease, chronic obstructive pulmonary disease, coronary artery disease, and congestive heart failure. Patients from Wave 2 were significantly younger, with fewer comorbidities and less immunosuppression, than those from Wave 1. Wave 2 patients were more likely to require intensive care but had lower unadjusted mortality than those from Wave 1 (Figure 2). A comparison of laboratory assay values between the two waves is provided in Table 2.

Throughout the COVID-19 pandemic, much attention has been directed to the SARS-CoV-2 virus as if it were a single entity. With the emergence of the Delta and now Omicron variants, it has become clear that variants can differ in patient trajectories and characteristics in ways that warrant further consideration and clarification. Genomic surveillance would be the most precise means of identifying the spread of new viral variants, but this technology has limitations. As of mid-2021, the United States ranked 33rd worldwide in genomic surveillance (Crawford and Williams, 2021), and even when genomic testing is performed, results are often significantly delayed. Test results may also not be traceable back to the individual patient.
One goal of this study was to better quantify how a constellation of factors shifted in hospitalized patients as the predominant variant changed in the community, without access to individual patient-level genomic testing. The ability to predict the phenotype of the predominant viral variant has implications for individual patient care, for hospitals, and at the population level. The Delta variant predominant Wave 2 was characterized by more inflammation and higher intensive care needs than the pre-Delta variant Wave 1. Knowing that a patient presents with Delta-like characteristics will allow better prognostication for that individual. If hospital cases start to rise with Delta-like characteristics, this information will allow hospitals to plan internally and across networks for high intensive care utilization of equipment, space, and staffing. At the population level, early identification of a rise in Delta-like cases would allow early adoption of public health measures to suppress spread. In contrast, a non-Delta-like wave of cases in a highly vaccinated population would warrant a different public health response. As the SARS-CoV-2 virus continues to mutate, variants of concern will likely continue to spread. Public health strategies should be adaptable not only to rising case counts but also to the possibility that a variant will cause more severe disease.

We used a machine learning technique, gradient boosting, to explore differences between waves. The gradient boosting machine model identified multiple inflammatory and clotting factor variables that shifted meaningfully between the two waves. Certain markers were significantly higher in Wave 2, such as coagulation parameters, segmented neutrophils, fibrinogen, LDH, ferritin, and CRP (Table 2).
CRP has served as a surrogate marker for the degree of cytokine release in COVID-19, with higher levels usually associated with a hyperinflammatory

Over the 2 years of the COVID-19 pandemic, the health care system has recognized that it needs to adapt to the various SARS-CoV-2 variants based on the clinical characteristics of each strain, the population it affects, and the likelihood of morbidity or mortality. Vaccine effectiveness also plays a major role in the outcomes of these patients. Earlier in the pandemic, the health care system's ability to meet patients' needs other than COVID-19 was negatively affected. With prediction modeling of future variants, decisions to maintain health care throughput can be

One of the central weaknesses of the study is the lack of definitive classification of the variants that infected the hospitalized patients over the study period. This was in part due to the retrospective nature of the study and the evolving adoption of genetic sequencing for variant identification. We therefore leveraged HHS.gov prevalence data to classify which SARS-CoV-2 variants predominated during given time periods. Those data reflect variants sequenced from a region comprising multiple states; variant proportions may differ within a region, but these differences were presumed to be insignificant. Selection bias due to the study's single-center design and misclassification bias of the variants in each wave could exist. We attempted to address these limitations by including all patients within the study period. Furthermore, owing to the retrospective nature of this study, we relied on records documented directly in the electronic health record. Although the gradient boosting machine was evaluated using data partitioned off for model testing ("validation"), it was trained at a single site without a separate prospective validation study.
Also, the model is currently limited in that it was trained only to discriminate predominantly pre-Delta variants from a predominantly Delta strain. Lastly, traditional diagnostic summary metrics do not account for misclassification bias; additional thresholds could be explored to minimize metrics such as the false negative rate. Further research will be required to determine whether simple patient characteristics can readily identify changes in the predominant variant. Of note, shifts in the predominant SARS-CoV-2 variant would likely be associated with model drift and decreased model performance, necessitating a classification model that is not strictly binary.

The principal finding of our study is that a selection of readily obtainable laboratory studies and patient characteristics can be used to differentiate individuals with SARS-CoV-2 hospitalized during a pre-Delta predominant wave from those hospitalized during a Delta predominant wave. This finding is important because it may provide an approach to developing simple statistical monitoring systems for future waves of infection, which may address the real-world challenges of sequencing the variant and adapting treatment accordingly. If a shift in patient characteristics is detected using readily obtainable data, genetic sequencing of variants could be expedited and prioritized. A significant genotypic shift resulting in a variant of concern may first become apparent as increased cases in the community and later be seen as a wave of hospital admissions. As in-hospital COVID-19 cases decrease over time due to increased vaccine effectiveness or natural immunity, less clinically impactful genomic mutations, and improvements in outpatient treatments, a predictive model like ours may compare a patient's phenotypic characteristics with those of known variants so that the patient can be treated accordingly.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The left panel shows a receiver operating characteristic (ROC) curve along with associated diagnostic metrics. The blue circle represents the selected threshold, which was determined by the optimal F1 score in the training data. The right panel is a confusion matrix displaying true and false positives and negatives and the associated metrics of specificity, sensitivity, negative predictive value (NPV), and positive predictive value (PPV). In all cases, metrics are computed on test data that were not used during model development or selection.

Laboratory assay values from the first test during each patient's admission. Values below the lower limit of detection were imputed as half the lower limit (i.e., halfway between 0 and the lower limit). Values above the upper limit of detection were winsorized at the upper limit. † P-values arise from Kruskal-Wallis rank sum tests.
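The footnote's handling of out-of-range laboratory results can be expressed as a small helper. This sketch assumes results arrive as strings in which a leading "<" or ">" marks a value reported below or above the assay's detection limits; that reporting format, the function name, and the limits in the example are hypothetical placeholders, not the study's actual assay limits.

```python
def clean_lab_value(raw: str, lower_limit: float, upper_limit: float) -> float:
    """Return an analyzable number for one lab result, following the footnote rules.

    Hypothetical input format: "12.3", "<0.5" (below detection), or ">300" (above detection).
    """
    text = raw.strip()
    if text.startswith("<"):
        # Below the lower limit of detection: impute halfway between 0 and the limit.
        return lower_limit / 2
    if text.startswith(">"):
        # Above the upper limit of detection: winsorize at the upper limit.
        return upper_limit
    value = float(text)
    if value < lower_limit:
        # Assumption: a numeric result under the limit is handled like a "<" result.
        return lower_limit / 2
    # Numeric results above the upper limit are capped at that limit.
    return min(value, upper_limit)


# Example with placeholder limits of 0.5 and 300.0 (not real assay limits).
results = ["<0.5", "12.3", ">300", "450"]
print([clean_lab_value(r, 0.5, 300.0) for r in results])
```

For example, `clean_lab_value("<0.5", 0.5, 300.0)` returns 0.25, half the lower limit, so below-detection results stay distinguishable from true zeros without being dropped.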
Global variation in sequencing impedes SARS-CoV-2 surveillance
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Evaluation of the relative virulence of novel SARS-CoV-2 variants: a retrospective cohort study in Ontario
SARS-CoV-2 variants of concern and variants under investigation in England: technical briefing 14
Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): a meta-analysis
Emerging SARS-CoV-2 variants of concern and potential intervention approaches
Inflammatory and Prothrombotic Biomarkers Associated with the Severity of COVID-19 Infection
Infection with the SARS-CoV-2 Delta Variant is Associated with Higher Recovery of Infectious Virus Compared to the Alpha Variant in both Unvaccinated and Vaccinated Individuals
Biomarkers and outcomes of COVID-19 hospitalisations: systematic review and meta-analysis
Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario
Clinical and virological features of SARS-CoV-2 variants of concern: a retrospective cohort study comparing B.1.1.7 (Alpha), B.1.315 (Beta), and B.1.617.2 (Delta)
Genomic Surveillance for SARS-CoV-2 Variants Circulating in the United States
Lactate dehydrogenase and C-reactive protein as predictors of respiratory failure in CoVID-19 patients
Public Health Scotland and the EAVE II Collaborators. SARS-CoV-2 delta VOC in Scotland: demographics, risk of hospital admission, and vaccine effectiveness
BNT162b2 and ChAdOx1 nCoV-19 Vaccine Effectiveness against Death from the Delta Variant
SARS-CoV-2 variants of concern are emerging in India
Elevated Plasma Fibrinogen Is Associated With Excessive Inflammation and Disease Severity in COVID-19 Patients
Lenzilumab in hospitalised patients with COVID-19 pneumonia (LIVE-AIR): a phase 3, randomised
Hospital admission and emergency care attendance risk for SARS-CoV-2 delta (B.1.617.2) compared with alpha (B.1.1.7) variants of concern: a cohort study
Coronavirus (COVID-19) Update: FDA Takes Additional Actions on the Use of a Booster Dose for COVID-19 Vaccines
Multicellular immunological interactions associated with COVID-19 infections
World Health Organization. Tracking SARS-CoV-2 variants. Accessed 10

The authors did not receive financial support for the authorship of this article. The authors have submitted the disclosure of potential conflicts of interest. Approval was not required.