key: cord-0334991-9fpbwm8k authors: Hu, Z.; van der Ploeg, K.; Chakraborty, S.; S Arunachalam, P.; Mori, D. M.; Jacobson, K. B.; Bonilla, H.; Parsonnet, J.; Andrews, J. R.; Hedlin, H.; de la Parte, L.; Dantzler, K.; Ty, M.; Tan, G. S.; Blish, C. A.; Takahashi, S.; Rodriguez-Barraquer, I.; Greenhouse, B.; Butte, A. J.; Singh, U.; Pulendran, B.; Wang, T. T.; Jagannathan, P. title: Early immune responses have long-term associations with clinical, virologic, and immunologic outcomes in patients with COVID-19 date: 2021-08-29 journal: nan DOI: 10.1101/2021.08.27.21262687 sha: 3fc485ff98878377ca46361fb19347398a040f34 doc_id: 334991 cord_uid: 9fpbwm8k The great majority of SARS-CoV-2 infections are mild and uncomplicated, but some individuals with initially mild COVID-19 progressively develop more severe symptoms. Furthermore, mild to moderate infections are an important contributor to ongoing transmission. There remains a critical need to identify host immune biomarkers predictive of clinical and virologic outcomes in SARS-CoV-2-infected patients. Leveraging longitudinal samples and data from a clinical trial of Peginterferon Lambda for treatment of SARS-CoV-2 infected outpatients, we used host proteomics and transcriptomics to characterize the trajectory of the immune response in COVID-19 patients within the first 2 weeks of symptom onset. We define early immune signatures, including plasma levels of RIG-I and the CCR2 ligands (MCP1, MCP2 and MCP3), associated with control of oropharyngeal viral load, the degree of symptom severity, and immune memory (including SARS-CoV-2-specific T cell responses and spike (S) protein-binding IgG levels). We found that individuals receiving BNT162b2 (Pfizer-BioNTech) vaccine had similar early immune trajectories to those observed in this natural infection cohort, including the induction of both inflammatory cytokines (e.g. MCP1) and negative immune regulators (e.g. TWEAK). 
Finally, we demonstrate that machine learning models using 8-10 plasma protein markers measured early within the course of infection are able to accurately predict symptom severity, T cell memory, and the antibody response post-infection.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.27.21262687; this version posted August 29, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

We first examined antibody levels and transcriptomic profiles at day 0 and day 5 after enrollment in patients randomized to either Peginterferon Lambda or placebo. Based on the subject-reported symptom start date, these samples were collected -1 to 20 days after symptom onset, with most collected within the first 2 weeks of symptom onset (Figure 2A). As expected, we observed a positive correlation between S protein-binding IgG levels at enrollment and the time since symptom onset [17] (Figure 2B). We performed principal component analysis (PCA) of the transcriptomic data and calculated the correlation between the first two principal components (PCs) and other clinical variables. We found that PC1 had the strongest association with the time since symptom onset and the IgG titer, suggesting that these transcriptomic profiles capture the progression of the immune response in COVID-19 patients (Figure 2C-E). We also performed PCA on the Olink data. Similar to the transcriptomic results, the Olink data were associated with disease progression, as indicated by the high correlation between PC2 and the time since symptom onset (Figure 2F-H). We also observed an association between PC1 and age, which captures the impact of age on the plasma protein landscape in COVID-19 patients.
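The PCA-then-correlate step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy `profiles` matrix, the `days` vector, and the helper names `first_pc_scores` and `pearson` are hypothetical, and power iteration is used here only as a dependency-free stand-in for a full PCA routine.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_pc_scores(rows, n_iter=200):
    """Project samples onto the first principal component via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Deterministic start; assumes it is not orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]

# Hypothetical toy data: rows are samples, columns are three measured features.
days = [1, 3, 5, 7, 9, 11]            # days since symptom onset (made up)
profiles = [
    [1.1, 9.8, 0.2],
    [2.9, 8.1, -0.1],
    [5.2, 6.0, 0.1],
    [6.8, 3.9, -0.2],
    [9.1, 2.2, 0.0],
    [10.9, 0.1, 0.1],
]
scores = first_pc_scores(profiles)
r = pearson(scores, days)
print(f"correlation between PC1 score and days since onset: {r:.3f}")
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful when asking which clinical variable a component tracks.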
We previously reported that Peginterferon Lambda treatment neither shortened the duration of SARS-CoV-2 viral shedding nor improved symptoms in outpatients with COVID-19 [16]. PCA revealed that transcriptional and proteomic profiles at day 5 post-treatment were not affected by Peginterferon Lambda treatment (Figure 2D, 2G, and Supplemental Figure 1). In addition, we found no significant differences in T cell responses (at day 28 after enrollment) or antibody responses (at day 28 and month 7 after enrollment) between the two treatment arms (Supplemental Figure 1), as reported previously [17]. Taken together, Peginterferon Lambda treatment did not show noticeable effects on the immune response in COVID-19 outpatients. Therefore, we combined the data from the control and treatment arms for all downstream analyses.

Trajectory analysis reveals sequential activation of immune pathways in COVID-19 patients

We next characterized the trajectory of early transcriptomic and proteomic responses using the RNA-seq and Olink data as a function of time since symptom onset. To reduce the dimensionality and improve interpretability, we calculated enrichment scores for immune pathways (based on Gene Ontology [18]) from the RNA-seq data. We then combined the pathway enrichment scores and Olink measurements into a single dataset for downstream statistical analysis. We fitted the data with quadratic regression to capture the non-linear dynamics of the pathways and proteins. [...] Table 2). Among them, 16 immune pathways or proteins showed nonlinear dynamics, as indicated by significant coefficients of the quadratic term (Supplemental Table 2).

We performed clustering analysis and identified four clusters based on the trajectories of the significant pathways and proteins (Figure 3A-B).
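As an illustration of the quadratic regression used for the trajectories, the sketch below fits y = b0 + b1*t + b2*t^2 by least squares and reads off the time of peak expression, -b1 / (2*b2), when the fitted curve is concave. The function names and the toy trajectory are hypothetical, and the sketch omits the significance testing of the quadratic coefficient that the text refers to.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_quadratic(t, y):
    """Least-squares fit of y = b0 + b1*t + b2*t^2; returns (b0, b1, b2)."""
    design = [[1.0, ti, ti * ti] for ti in t]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))

def peak_time(b1, b2):
    """Time of the fitted maximum, or None if the parabola is not concave."""
    if b2 >= 0:
        return None
    return -b1 / (2.0 * b2)

# Hypothetical trajectory that peaks 3 days after symptom onset.
days = [0, 1, 2, 3, 4, 5, 6]
level = [5.0 - (d - 3) ** 2 for d in days]
b0, b1, b2 = fit_quadratic(days, level)
t_peak = peak_time(b1, b2)
print(f"b2 = {b2:.3f}, fitted peak at day {t_peak:.2f}")
```

A fitted peak that falls outside the sampled window would correspond to a trajectory that is monotonic over the observed days, which is one way to read the monotonically decreasing and increasing clusters.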
Cluster 1 contains interferon-related pathways, natural killer cell activation pathways, and proteins known to be activated by interferon signaling, including MCP-1, MCP-2, CXCL10 and CXCL11 [19-22]. The trajectories in cluster 1 had already reached their peak at the time of symptom onset and decreased monotonically over time. The trajectories in cluster 2 peaked 1-5 days after symptom onset and include Interferon-γ and pathways related to T cell activation. Interestingly, this cluster also contains several myeloid cell-attracting chemokines (CXCL1 and CXCL6) and the innate cell response pathway. Cluster 3 peaked between 10 and 14 days after symptom onset and is characterized by pathways related to B cell activation. Cluster 4 trajectories increased monotonically after symptom onset and are characterized by the increasing S protein-binding IgG level and related B cell differentiation pathways. The trajectory analysis revealed the sequential activation of interferon signaling, NK cells, myeloid cells, Interferon-γ, T cells, B cells and antibody production within the first 15 days of symptom onset.

To characterize how the composition of blood immune cells changes over time, we used a previously established tool, xCell, to estimate the enrichment scores of the major immune cell types [23]. As a positive control, we compared the neutrophil score with the neutrophil count data obtained from clinical lab tests and found a high correlation between them (Figure 3C).
Quadratic regression did not find significant associations between the major cell types and the time since symptom onset (Figure 3D). The results suggest that the trajectory of different immune pathways (Figure 3A [...]

[...] regression. The grey areas represent the 95% confidence intervals. (C) We estimated the neutrophil enrichment score using xCell. The plot shows the Spearman correlation between the xCell score and the measured neutrophil percentage in whole blood. (D) The relationship between xCell enrichment score and days after symptom onset.

Variations in early immune responses are associated with disease severity in COVID-19 patients

We next sought to identify immune pathways and plasma proteins associated with symptom severity in COVID-19 outpatients. At the time of sample collection (day 0 and day 5 after enrollment), the majority of subjects either showed mild to moderate symptoms that subsequently resolved (n=100) or were asymptomatic (n=8). However, 8 patients later developed progressive and more severe symptoms and were hospitalized or presented to the emergency department (median 2 days to progression, range 1-13 days). We defined these individuals as having severe COVID-19 and used regression models to identify immune pathways and plasma proteins that distinguished these participants from those who did not seek care at the hospital (mild/moderate COVID-19), while controlling for days after symptom onset.
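The idea of comparing severity groups while controlling for days after symptom onset can be sketched as a linear model with a severity indicator and a time covariate. The exact model specification is not given in the text, so ordinary least squares is an assumption here, and the `ols` helper, the variable names and the toy values are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(design, y):
    """Ordinary least squares coefficients via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)

# Hypothetical toy data: 1 = later progressed to severe disease, 0 = mild/moderate.
days = [1, 2, 3, 4, 5, 6, 7, 8]
severe = [0, 0, 1, 0, 1, 0, 0, 1]
# Protein level built so that severity adds 1.5 units and each day removes 0.2.
protein = [2.0 + 1.5 * s - 0.2 * d for s, d in zip(severe, days)]

# Columns: intercept, severity indicator, days since symptom onset.
design = [[1.0, float(s), float(d)] for s, d in zip(severe, days)]
intercept, severity_effect, day_effect = ols(design, protein)
print(f"severity effect adjusted for days since onset: {severity_effect:.2f}")
```

The coefficient on the severity indicator is the difference between groups at a fixed time since onset, which is the quantity of interest when samples are collected at different points in the disease course.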
As two positive controls, we confirmed the well-documented findings that lymphocyte percentages were negatively correlated with symptom severity and neutrophil percentages were positively correlated with symptom severity (Figure 4A) [24]. In addition, our regression analysis identified 17 immune pathways and 24 plasma proteins that were significantly associated with symptom severity (FDR<0.05, Figure 4B-C, and Supplemental Table 3). The proteins and pathways from cluster 1 (as identified above in Figure 3A) were significantly enriched (Fisher's exact test, p < 0.001), including pathways related to interferon response, RIG-I signaling, NK cell activation and multiple protein markers known to be induced by interferon signaling (MCP-1, MCP-2 and CXCL11). This result highlights the association between early immune responses and symptom severity. Our regression analysis excluded the asymptomatic individuals, as their symptom onset date was unknown. To include the asymptomatic individuals, we performed one-way ANOVA without adjusting for symptom onset time. The results from the ANOVA were consistent with the regression analysis (Figure 4D).

We examined associations between plasma proteins measured early in the course of infection and oropharyngeal viral load (measured by the area under the Ct curve from day 0 to 14 post-enrollment).
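An area-under-the-curve summary of serial Ct (cycle threshold) values can be computed with the trapezoidal rule, as in the sketch below. The swab schedule, the Ct values, the use of 40 as the value for undetectable virus, and the function name are all hypothetical; the text does not state how the area was integrated.

```python
def area_under_curve(days, values):
    """Trapezoidal area under a curve sampled at strictly increasing time points."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two paired (day, value) observations")
    area = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("days must be strictly increasing")
        # Area of one trapezoid: mean of the two heights times the interval width.
        area += 0.5 * (values[i] + values[i - 1]) * width
    return area

# Hypothetical swab schedule (days post-enrollment) and Ct values for one participant.
swab_days = [0, 1, 3, 5, 7, 10, 14]
ct_values = [22.0, 24.0, 28.0, 31.0, 35.0, 40.0, 40.0]  # 40 assumed for undetectable

ct_auc = area_under_curve(swab_days, ct_values)
print(f"area under the Ct curve, day 0-14: {ct_auc}")
```

Because Ct is inversely related to the amount of viral RNA, a larger area under the Ct curve corresponds to a lower cumulative viral load, so the direction of any association should be read with that in mind.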
We identified 36 plasma proteins significantly associated with oropharyngeal viral load (top 10 significant proteins shown in Figure [...] Table 3 [...] CoV-2-specific antibodies measured at 28 days and 7 months post-enrollment. We identified 87 plasma proteins that were significantly associated with SARS-CoV-2-specific T cell responses at day 28, and 91 and 13 plasma proteins significantly associated with S protein-binding IgG at day 28 and month 7, respectively (top 10 significant proteins shown in Figure [...] Table 3).

We further examined immune pathways and proteins associated with multiple clinical, virologic, and immune outcomes in COVID-19 patients. We identified 21 plasma proteins and 15 immune pathways that were correlated with three out of four aspects of patient outcomes (Figure 5C and D). These include expected direct links (e.g. the correlation between the immunoglobulin production pathway and S protein-binding IgG) and more indirect links (e.g. the correlation between proinflammatory cytokines and S protein-binding IgG) between immune measurements. To establish a sequential order of pathway activation and protein expression, we fit a quadratic regression for each measurement and then identified the time at which the measurement reached maximal expression (Supplemental Figure 2A). This revealed 4 plasma proteins and 12 immune pathways whose expression was significantly associated with time. [...] In addition, these variations were predicted to increase the expression of IFNAR2 and CCR2 [26].
While the exact causal relationship cannot be established from our observational data, together our results suggest that the early interferon-related response and downstream CCR2 signaling shape later adaptive responses and have a long-term impact on the clinical, virological and immunological outcomes in COVID-19 patients.

Interestingly, plasma levels of RIG-I (gene symbol DDX58) were significantly associated with all examined virologic and immunologic outcomes (Figure 5F), as well as symptom severity (Figure 4B). Higher levels of plasma RIG-I were associated with lower oropharyngeal viral load, more severe symptoms, increased SARS-CoV-2-specific T cell responses, and increased levels of S protein-binding IgG to SARS-CoV-2. Since RIG-I is a cytosolic pattern recognition receptor (PRR) that, upon recognition of short viral double-stranded RNA during a viral infection, leads to upregulation of interferon signaling [27], we explored associations between plasma RIG-I levels and related immune measurements, including the mRNA-level and protein-level expression of RIG-I and interferons, as well as RIG-I and interferon-related pathways. We found that plasma RIG-I levels were modestly correlated with mRNA-level expression of RIG-I (correlation = 0.23, p value = 0.004, Figure 5G), as well as with RIG-I signaling and interferon-related pathways (Figure 5C). Interestingly, we found a strong correlation between the plasma level of RIG-I and the plasma level of DFFA, an intracellular protein known to be involved in apoptosis (Figure 5H [...]

[...] Interestingly, we found that the vaccine also induced two proteins that are negatively associated with T cell and antibody responses following natural infection, TWEAK and DNER (Figure 6D). TWEAK is known to attenuate adaptive immunity by inhibiting STAT-1 and NF-κB [35], suggesting that its induction could have a negative impact on protective immunity against COVID-19. Taken together, our comparative analysis shows that the proteomic response to the BNT162b2 vaccine mirrors in many ways the proteomic response after SARS-CoV-2 infection. At the same time, we found important distinctions, including fast activation of adaptive immunity, the absence of a neutrophil response to the second dose of vaccine, and the induction of TWEAK, which may negatively affect the adaptive response.

We performed predictive modeling to test whether plasma proteins measured early following infection can accurately predict symptom severity, oropharyngeal viral load, and SARS-CoV-2-specific memory T cell and antibody responses manifested later in the study. We adopted a computational pipeline to select a small subset of predictive biomarkers from the 184 proteins measured by Olink assays. We used a leave-one-out cross-validation strategy to iteratively evaluate the model performance. We used Random Forest for feature selection and for building the final model (Figure 7A).
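The leave-one-out cross-validation loop mentioned above can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the Random Forest, so this is explicitly not the model used in the study; the helper names and the two-marker toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [row for row, lab in zip(train_x, train_y) if lab == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out(xs, ys, predict):
    """Hold out each sample once, train on the rest, and collect predictions."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        # The held-out sample never contributes to the model that scores it.
        preds.append(predict(train_x, train_y, xs[i]))
    return preds

# Hypothetical toy data: two plasma markers per participant.
xs = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
      [5.0, 5.2], [4.8, 5.1], [5.2, 4.9], [5.1, 5.0]]
ys = ["mild"] * 4 + ["severe"] * 4

preds = leave_one_out(xs, ys, nearest_centroid_predict)
accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

In a pipeline that also selects features, the selection step would need to be repeated inside each fold on the training samples only, otherwise the held-out sample leaks into the model and the performance estimate becomes optimistic.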
Based on results from cross-validation, we selected between 8 and 10 protein [...] severity, oropharyngeal viral load, memory T cell activity, day 28 spike protein-binding IgG levels and month 7 spike protein-binding IgG levels, respectively (Figure 7B).

We compared the final models to baseline models that use only demographic data (age and gender). The selected protein markers substantially improved the prediction of symptom severity, spike protein-binding IgG levels at day 28 and month 7, and memory T cell responses at day 28. On the other hand, protein markers did not improve the prediction of oropharyngeal viral load.

We further tested whether our model can accurately predict symptom severity in an independent dataset. We identified a published dataset that characterized the plasma proteins from 58 COVID-19 patients (26 moderate cases and 34 severe cases) [36]. Our model accurately identified severe cases in the independent dataset, achieving an AUC of 0.96. The individuals in the test dataset had already manifested severe symptoms, whereas our training dataset was collected before severe symptoms appeared, potentially explaining the higher model performance in the test dataset than in the training dataset.

[...] after enrollment.
We also observed that the immune response after the first dose of SARS-CoV-2 mRNA vaccination largely recapitulates the trajectory of the immune response after SARS-CoV-2 infection, while the response to the second dose of vaccine is characterized by fast upregulation of both early and late protein markers and the absence of a neutrophil response. Finally, we demonstrate that a machine learning model is able to predict symptom severity, T cell memory response and antibody response accurately using 8-10 plasma protein markers.

We observed that high plasma RIG-I levels were associated with greater disease severity, T cell activity, and the antibody response, suggesting that plasma RIG-I is a biomarker for increased immune activity in COVID-19 patients. High plasma RIG-I levels were also associated with lower SARS-CoV-2 viral loads, suggesting a potential role of this protein in restricting early virus replication. RIG-I has been shown to be critically important in the response to several RNA viruses, including influenza virus, typically via interactions with the adapter protein mitochondrial antiviral-signaling protein (MAVS) and downstream Type I and Type III interferon upregulation. RIG-I was recently shown to play an important role in both sensing SARS-CoV-2 RNA and inhibiting SARS-CoV-2 replication in human lung cells, but not via downstream MAVS induction [37]. Rather, interactions between the RIG-I helicase domain and SARS-CoV-2 RNA induced an inhibitory effect on viral replication, independent of downstream interferon upregulation [37]. This may explain the rather modest correlations observed between plasma RIG-I and RIG-I signaling and interferon-related pathways.
In contrast, we observed significant correlations between plasma RIG-I levels and plasma levels of DFFA, an intracellular protein known to be involved in cell death [29], as well as other intracellular proteins, suggesting that plasma RIG-I levels may reflect increased cellular apoptosis. This hypothesis is consistent with a recent report that observed significant associations between gene expression signatures of apoptosis in plasmacytoid dendritic cells and increased disease severity [9].

Our analysis also suggests that variations in the early immune response shape the long-term outcome of COVID-19 patients. However, the cause of variations in the early responses is not fully understood. We observed that higher expression of three CCR2 ligands (MCP1, MCP2 and MCP3) was associated with multiple patient outcomes in COVID-19 patients, including increased disease severity and higher T cell activity and S protein-binding IgG levels. In addition, MCP1 and MCP3 were negatively associated with oropharyngeal viral load. Our result is consistent with previous studies showing that CCR2 signaling is associated with symptom severity in multiple viral infections, including SARS-CoV-2 and influenza [38,39]. A previous GWAS has also identified an association between COVID-19 symptom severity and genetic variations that lead to increased expression of CCR2 (the receptor for the interferon-induced ligands MCP1, MCP2 and MCP3) [26]. On the other hand, mouse studies show that CCR2 is essential for the survival of mice after pathogen challenge [40-42].
Our study also shows that the CCR2 ligands are [...] unclear whether a neutrophil response may be beneficial or detrimental following vaccination. In addition, we found that the vaccine also induced proteins that are negatively associated with T cell and antibody responses following natural infection, including TWEAK [35]. Further studies will be required to determine whether inhibiting TWEAK could potentially improve vaccine efficacy.

Our study has some limitations. First, while we identified multiple associations between early immune measures and the outcomes of COVID-19 patients, we did not establish causal relationships between them. Future studies are needed to perturb key immune pathways in the early immune response and test their effect on patient outcomes. Second, our study measured the immune response during the first 2 weeks after symptom onset in COVID-19 patients. Earlier immune responses between the initial infection and symptom onset have not been characterized, owing to the difficulty of detecting pre-symptomatic infection. Routine SARS-CoV-2 monitoring in a select cohort will be required to acquire samples prior to and immediately after infection in order to assess whether pre-infection signatures predict outcomes in COVID-19 patients. Third, our analysis focused on individual plasma proteins (based on Olink data) and immune-related Gene Ontology pathways (based on RNA-seq data). We used the Gene Ontology-based pathways to provide a high-level overview of the immune response in COVID-19 patients. Caution should be taken when interpreting the Gene Ontology pathway results, as the pathways are manually curated gene lists from the literature and are subject to publication bias, curation errors and over-simplification of biological processes. We encourage others to investigate the immune response of individual genes of interest using our shared RNA-seq data. Finally, we have created machine learning models to predict multiple outcomes in COVID-19 patients, including symptom severity, T cell response and antibody responses. While we were able to validate the model for predicting symptom severity in an independent dataset, additional datasets are needed to validate our models for predicting T cell and antibody responses.

In this study, we identified multiple biomarkers for predicting clinical and immunological outcomes in COVID-19 patients, including plasma levels of RIG-I and the CCR2 ligands (MCP1, MCP2 and MCP3). In addition, we demonstrate that machine learning models using 8-10 biomarkers are highly effective in predicting these outcomes. The models can potentially be used to identify high-risk COVID-19 patients who will develop life-threatening symptoms, and to predict the degree of immune memory development. In addition, these biomarkers and models could help explain variations in the response to COVID-19 vaccines and further identify differences between natural infection and vaccine-induced immunity.

[...] individuals were eligible if infection was the initial diagnosis of SARS-CoV-2 infection.
Exclusion criteria included current or imminent hospitalization, respiratory rate >20 breaths per minute, room air oxygen saturation <94%, pregnancy or breastfeeding, history of decompensated liver disease, recent use of interferons, antibiotics, anticoagulants or other investigational and/or immunomodulatory agents for treatment of COVID-19, and prespecified lab abnormalities. Full eligibility criteria are provided in the study protocol. The protocol was amended on June 16th, 2020, after 54 participants were enrolled but before results were available, to include adults up to 75 years of age and to eliminate exclusion criteria for low white blood cell and lymphocyte count. The trial was registered at ClinicalTrials.gov (NCT04331899). The study was performed as an [...] guidelines identify oropharyngeal swabs as acceptable upper respiratory specimens to test for [...]

[...] COVID-19 patients
Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19
Longitudinal analyses reveal immunological misfiring in severe COVID-19
Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell [...]
Multi-omic profiling reveals widespread dysregulation of innate immunity and hematopoiesis in COVID-19
Single-cell multi-omics analysis of the immune response in COVID-19
Longitudinal immune dynamics of mild COVID-19 define signatures of recovery and persistence
Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine
Peginterferon Lambda-1a for treatment of outpatients with uncomplicated COVID-19: a randomized placebo-controlled trial