key: cord-0272484-byn0omi8 authors: Buckley, Paul R.; Lee, Chloe H.; Pinho, Mariana Pereira; Babu, Rosana Ottakandathil; Woo, Jeongmin; Antanaviciute, Agne; Simmons, Alison; Ogg, Graham; Koohy, Hashem title: HLA-dependent variation in SARS-CoV-2 CD8+ T cell cross-reactivity with human coronaviruses date: 2021-07-20 journal: bioRxiv DOI: 10.1101/2021.07.17.452778 sha: 60091e612bda4df141a3dfa63a2d005a8d4d22d1 doc_id: 272484 cord_uid: byn0omi8

Pre-existing T cell immunity to SARS-CoV-2 in individuals without prior exposure to SARS-CoV-2 has been reported in several studies. While emerging evidence hints toward prior exposure to common-cold human coronaviruses (HCoV), the extent of, and conditions for, cross-protective immunity between SARS-CoV-2 and HCoVs remain open. Here, by leveraging a comprehensive pool of publicly available functionally evaluated SARS-CoV-2 peptides, we report 126 immunogenic SARS-CoV-2 peptides with high sequence similarity to 285 MHC-presented target peptides from at least one of four HCoV, thus providing a map describing the landscape of SARS-CoV-2 shared and private immunogenic peptides with functionally validated T cell responses. Using this map, we show that while SARS-CoV-2 immunogenic peptides in general exhibit a higher level of dissimilarity to both the self-proteome and self-microbiomes, there exist several SARS-CoV-2 immunogenic peptides with high similarity to various human protein-coding genes, some of which have been reported to have elevated expression in severe COVID-19 patients. We then combine our map with SARS-CoV-2-specific TCR repertoire data from COVID-19 patients and healthy controls and show that whereas the public repertoire for the majority of convalescent patients is dominated by TCRs cognate to private SARS-CoV-2 peptides, for a subset of patients more than 50% of the public repertoire that shows reactivity to SARS-CoV-2 consists of TCRs cognate to shared SARS-CoV-2-HCoV peptides.
Further analyses suggest that the skewed distribution of TCRs cognate to shared and private peptides in COVID-19 patients is likely to be HLA-dependent. Finally, by utilising the global prevalence of HLA alleles, we provide 10 peptides with known cognate TCRs that are conserved across SARS-CoV-2 and multiple human coronaviruses and are predicted to be recognised by a high proportion of the global population. Overall, our work indicates the potential for HCoV-SARS-CoV-2 reactive CD8+ T cells, which is likely dependent on differences in HLA-coding genes among individuals. These findings may have important implications for COVID-19 heterogeneity and vaccine-induced immune responses as well as robustness of immunity to SARS-CoV-2 and its variants.

Results

Curation of functionally evaluated SARS-CoV-2 peptides

To investigate the potential for T cell cross-reactivity against SARS-CoV-2 as conferred by common-cold HCoVs, we curated a comprehensive pool of SARS-CoV-2 class I and II peptides from three datasets (see Methods), which have been functionally evaluated for CD4+ and CD8+ T cell responses (see Figure 1 for study overview). The data comprise 1799 immunogenic and 1005 non-immunogenic SARS-CoV-2 peptides (Fig 2A). Many of these peptides were tested for T cell reactivity in the context of multiple HLA alleles and/or by multiple assays (IFNg, IL-5 production etc). Furthermore, some peptides are described by qualitative labels corresponding to varying response magnitude (Positive-high, Positive-low etc). Taking these combinations into account, we found 3979 immunogenic and 2427 non-immunogenic complexes (Fig 2B). For immunogenic complexes, the most common lengths are 9-mers, followed by 15- and 10-mers (Fig 2C); 36.0% are presented by class I MHC, 32.9% by class II (Fig 2D), and for 31% the MHC type is unknown (Fig S1A).
For non-immunogenic complexes, 36.1% are presented by class I, 26.4% by class II and for 37.51% the MHC is unknown. At the gene level, HLA-allele specific information was available for 934 (56.5%) and 607 (42.2%) of immunogenic class I and II complexes respectively (Fig S1A).

Given the high proportion of missing MHC information, we employed netMHCpan 4.1 and netMHCIIpan to predict presenting class I and class II alleles respectively for immunogenic peptides (see Methods). Here, we were able to identify 98% of known MHC molecules, providing confidence in predictions for unknown alleles (Fig S1B).

We next sought to examine whether HLAs exhibit preferences towards presenting peptides from certain SARS-CoV-2 proteins. By employing a similar methodology to Karnaukhov et al 17, we gauged the enrichment and depletion of HLA ligands arising from these proteins (see Methods). Indeed, we observed differential antigen presentation by HLAs: for example, HLA-C*07:02 appears to be the most consistently enriched in presenting 9mers from the examined proteins (Fig 2E), while HLA-A*02:01 is enriched in presenting 9mers from ORFs but depleted for 10mers across most assessed proteins. This disparity may be due to a known preference of HLA-A*02:01 for 9-mers 18. Furthermore, despite the prevalence of HLA-A*02:01 in the global population and in MHC presentation experiments, this allele appears to be depleted for presenting ligands from SARS-CoV-2 proteins that have been the focus of intense experimental work, e.g., spike and nucleocapsid phosphoprotein.

These patterns of HLA preferences in presenting SARS-CoV-2 peptides appear to differ for 9- and 10-mers. For example, whereas HLA-C*07:02 is enriched for presenting 9mers, this allele appears to be a poor presenter of 10mers from each examined protein.
It is unclear why substantially fewer 10-mer HLA-C*07:02 ligands are predicted than 9mers; however, it is plausible that this allele may prefer 9mers, as appears to be the case with HLA-A*02:01, -A*11:01 and -B*40:01 18, or that this may be a SARS-CoV-2 specific effect.

Although it is of great interest to reveal the rate at which SARS-CoV-2 MHC-bound peptides are immunogenic in humans 19, it cannot be examined directly with existing data because not all MHC-bound SARS-CoV-2 peptides have been evaluated for immunogenicity. Nevertheless, we explored the pool of MHC-bound peptides in our dataset that have been examined for a T cell response, to gauge the proportion of SARS-CoV-2 pMHC that are immunogenic. Overall, we observed low rates of immunogenic pMHC (Fig 2F), although ligands of HLA-B*40:01 appear to be commonly immunogenic. Interestingly, we observed that HLA-C*07:02 does not present any 10-mers in our dataset. This apparent preference for 9-mers is consistent with the availability of HLA-C*07:02 ligands tested for T cell response in humans from the IEDB, where there exist only 121 unique peptides, of which 73% are 9mers and only 12% are 10mers. In summary, these data suggest length and source protein preferences for HLA alleles presenting SARS-CoV-2 peptides and that HLA-B*40:01 SARS-CoV-2 ligands are commonly immunogenic.

Identification of Shared and Private Immunogenic SARS-CoV-2 peptides

To discriminate SARS-CoV-2-HCoV shared (hereby referred to as 'sCoV-2-HCoV') peptides, we compared immunogenic SARS-CoV-2 peptides to HCoV protein sequences. For this, we define a metric that considers 1) sequence homology, 2) physicochemical similarities (MatchScore 20) and 3) presentation status, for which the source peptide from SARS-CoV-2 and the target peptide from one of the HCoVs are required to be presented by the same HLA.
A source peptide is defined as shared if it fulfils all three conditions; otherwise it is considered a private peptide (see Methods).

Using our metric, we identified 126 unique SARS-CoV-2 (immunogenic) peptides pointing to 285 highly similar peptides in HCoVs (Supplementary Data File 1). Hence, we provide a comprehensive map of private and shared SARS-CoV-2 functionally evaluated immunogenic peptides, and for sCoV-2-HCoV peptides, their matches from each HCoV.

Out of the HLAs tested (see Methods), 33 class I and 28 class II HLAs were predicted to present the target HCoV pMHCs (Fig 3A). HLA-A*02:01 and HLA-B*27:05 were the most and least common class I presenters respectively. For class II, DRB1-1501 and DRB5-0101 were the most common presenters, while DRB1-0301 and DRB1-1303 were the least. Most shared class I and II peptides were predicted to bind multiple HLA allelic variants (Fig S2A). Compared with private peptides, it appears that sCoV-2-HCoV peptides are presented by fewer HLAs, although this was not significant (Fig S2C). Nevertheless, the range of predicted alleles for these peptides suggests recognition in broad geographical and ethnic settings 21.

For the 126 SARS-CoV-2 peptides with high similarity to HCoV, we also observed binding to multiple HLAs (Fig S2B). In addition, we found that 9mers comprise 54% of the 126 SARS-CoV-2 peptides with high-similarity matches to HCoV, followed by 15mers (19%) and 10mers (17.5%) (Fig S2C). Consistent with previous reports 22, the betacoronaviruses HKU1 and OC43 were most enriched in target matches (Fig 3B), perhaps due to higher total sequence homology among betacoronavirus strains 23. We next examined the extent to which immunogenic SARS-CoV-2 peptides exhibit homology to multiple HCoV strains. Surprisingly, we found that 42 SARS-CoV-2 immunogenic peptides exhibit matches to at least three strains (Fig 3C).
However, we observed small clusters of peptides that only possess homology with one strain, e.g. OC43 or HKU1. ORF1ab protein and spike surface glycoprotein produced the highest quantity of shared SARS-CoV-2-HCoV peptides in both strains, and the protein regions from which these peptides were found are similar in both HKU1 and OC43 (Fig S2E-H).

Of particular note, our map of shared and private peptides is subject to the thresholds used in our metric. The sequence homology threshold used here is 50%, and most peptides had greater than or equal to 70% sequence homology (Fig S2E). Although a more stringent sequence homology parameter results in a map containing fewer shared peptides (Fig 3D), our main conclusions in this manuscript remain the same even with a sequence homology threshold of 70% (data not shown).

Lastly, we compared the amino acid distribution between shared and private SARS-CoV-2 peptides for 9-mers, which is the most common peptide length in our dataset (Fig 3E). We observed some moderate differences, e.g., increased prominence of valine at position 9 within shared peptides.

We have therefore identified a pool of 126 SARS-CoV-2 immunogenic peptides, exhibiting high similarity to 291 peptides in HCoV strains, which are likely to be presented by an array of class I and II HLA molecules. This array of presenting alleles suggests the potential for broad global population coverage, which is explored later. We propose that this pool of experimentally confirmed immunogenic SARS-CoV-2 peptides and their counterpart high similarity matches be considered as potential targets for T cell cross-reactivity, therefore warranting investigation into pre-existing immune memory from HCoV or a role in protection from SARS-CoV-2 variants.
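The three-part shared/private criterion used in this section can be illustrated with a minimal sketch. It assumes that sequence homology is the fraction of identical residues between two equal-length peptides, that the physicochemical MatchScore has already been computed elsewhere and scaled to [0, 1], and that predicted presenting HLAs are available as sets. The class name `Candidate`, the function names and the toy sequences are hypothetical; the 50% and 75% defaults are the thresholds quoted in the text. This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source: str             # SARS-CoV-2 peptide
    target: str             # candidate HCoV peptide of the same length
    match_score: float      # physicochemical similarity in [0, 1], computed elsewhere
    source_hlas: frozenset  # HLAs predicted to present the source peptide
    target_hlas: frozenset  # HLAs predicted to present the target peptide


def sequence_homology(a: str, b: str) -> float:
    """Fraction of positions with identical residues (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_shared(c: Candidate, min_homology: float = 0.50,
              min_match_score: float = 0.75) -> bool:
    """Shared only if all three conditions hold; otherwise the peptide is private."""
    return (
        sequence_homology(c.source, c.target) >= min_homology
        and c.match_score >= min_match_score
        and bool(c.source_hlas & c.target_hlas)  # at least one common presenting HLA
    )


# Toy sequences and scores, invented for illustration only.
pair = Candidate("ACDEFGHIK", "ACDEFGHIR", 0.90,
                 frozenset({"HLA-A*02:01"}),
                 frozenset({"HLA-A*02:01", "HLA-B*07:02"}))
print(is_shared(pair))
```

Raising `min_homology` to 0.70 corresponds to the stricter threshold that the text reports as leaving the main conclusions unchanged.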
Identification of peptides with high similarity to self and self-microbiomes

To prevent aberrant T cell mediated inflammation and tissue damage, the immune system has evolved several checkpoint mechanisms. These include thymic negative selection and peripheral tolerance. Indeed, dissimilarity to self is increasingly recognised as a component of peptide immunogenicity 25, which may assist in calibrating a balance between immunogenicity and inflammatory pathogenesis.

To evaluate the extent to which dissimilarity to self and self-microbiomes contributes to SARS-CoV-2 peptide immunogenicity, we took a similar approach and used our metric to compare SARS-CoV-2 peptides to the human self-proteome and to microbiomes that include 457 gut and 50 airway microbiota (see Methods). Here, for SARS-CoV-2 HLA class I presented 9- and 10-mer peptides, we observed that immunogenic SARS-CoV-2 peptides were significantly more dissimilar to the human proteome than their non-immunogenic counterparts (Fig 4A, S3A). Using this approach, we could not detect any significant difference between immunogenic and non-immunogenic class II peptides in their dissimilarity to the self-proteome (Fig S3B).

Interestingly however, for peptides of both lengths 9 and 10, we identified several immunogenic SARS-CoV-2 peptides with considerable sequence similarity to the human proteome (Fig 4A-B, Table S1). For the top 10% of these peptides with highest similarity to self, the mean amino acid conservation (the proportion of the amino acid sequence which is exactly conserved) between these peptides and corresponding self matches is 72.1%.

To investigate the potential association of these peptides with immunopathology further, we predicted MHC presentation by a set of class I HLA alleles (see Methods) for the top 10% of peptides most similar to the human proteome for 9mers and 10mers.
We observed that these peptides with high similarity to self are predicted to bind multiple HLAs (Fig 4C), and interestingly, we found that in most cases, the SARS-CoV-2 immunogenic peptide and the match from the human proteome are predicted to be presented by the same allele (Fig 4C).

Next, we examined the list of genes with high sequence similarities to these SARS-CoV-2 immunogenic peptides ( [...] post-acute sequelae of COVID-19 36 (often referred to as 'long-COVID'). Of further interest, we found SMPD4 and SLC1A4, which together with CCL3 and CCL3L1 are involved in the response to TNF, which is part of the cytokine storm following COVID-19 disease.

By comparing SARS-CoV-2 peptides to human microbiomes, we observed subtly higher dissimilarity of SARS-CoV-2 immunogenic peptides to the gut (Fig S3C) and airway (Fig S3D) microbiomes, which may suggest a link between the diversity of both microbiota and heterogeneity of the disease in populations, although this warrants further investigation.

Given the magnitude of the global pandemic and the widespread vaccination required to combat it, future virus-induced autoimmune disease and immunopathology is of concern. Overall, this analysis suggests dissimilarity of viral peptides to self-proteins as a correlate of peptide immunogenicity. Furthermore, we present candidate genes and peptides with high similarity to SARS-CoV-2 T cell targets, which we suggest as prime targets for further investigations into their role in autoimmune disease and immunopathology following SARS-CoV-2 infection and/or vaccination.

CD8+ T cell cross-reactivity and common-specificity within SARS-CoV-2

A valuable characteristic of our map of SARS-CoV-2 shared and private peptides is that for 245 of these (out of 1279 class I immunogenic peptides), cognate TCRs at the beta chain resolution are available in the IEDB.
We therefore set out to map the TCR landscape through a network approach to explore the potential for cross-reactivity among SARS-CoV-2 specific CD8+ T cells, and their common-specificity. Here, to avoid overestimating connectivity, any peptides of different lengths which share starting positions in the SARS-CoV-2 proteome and are recognised by identical sets of TCRs are considered as one peptide.

Through a two-mode (bipartite) network-graph illustrating the connectivity of SARS-CoV-2 immunogenic peptides with their cognate TCRs, amongst a highly connected topography we observed considerable connectivity for some sCoV-2-HCoV peptides, e.g. "FLN.." (Fig S4A). Exploring this further, we projected the bipartite network-graph into a one-mode graph where nodes represent peptides and an edge between two nodes requires the existence of a TCR recognising both peptides (Fig S4B). The clustering around a small set of hubs suggests that many experimentally assessed TCRs target a small set of SARS-CoV-2 peptides. Indeed, we found that in this dataset, 80% of the TCRs are reported to recognise only 40 (16%) peptides, of which 4 are sCoV-2-HCoV shared peptides and 36 are SARS-CoV-2 private (Fig S4C). This dominant set of peptides may be due to experimental biases, e.g., research may be heavily biased toward several protein regions. However, this may also reflect a selection bias by SARS-CoV-2 specific TCRs. In this regard, amongst these 80% of TCRs, we observed high usage of V gene TRBV20-1 37 and J gene TRBJ2-1 38 (Fig S4D), which have been previously reported to have implications in COVID-19 patients.

Similarly, we examined the extent of common specificity in SARS-CoV-2 specific T cells by a one-mode graph in which nodes represent TCRs and an edge indicates that two nodes (TCRs) recognise the same peptide (Fig S4E).
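The two one-mode projections described above can be sketched in a few lines of standard-library Python. The input format, a list of (peptide, TCR) pairs, and the function name `project` are assumptions made for illustration, and the sketch does not implement the merging of same-start peptides mentioned earlier; it is not the authors' pipeline.

```python
from collections import defaultdict
from itertools import combinations


def project(pairs, onto="peptide"):
    """Project a bipartite peptide-TCR graph onto one node type.

    pairs: iterable of (peptide, tcr) recognition records.
    onto="peptide": two peptides are linked if some TCR recognises both.
    onto="tcr":     two TCRs are linked if they recognise the same peptide.
    Returns a set of (node_a, node_b) edges with node_a < node_b.
    """
    if onto not in ("peptide", "tcr"):
        raise ValueError("onto must be 'peptide' or 'tcr'")
    # Group the nodes we keep by the opposite-mode node they attach to.
    groups = defaultdict(set)
    for peptide, tcr in pairs:
        if onto == "peptide":
            groups[tcr].add(peptide)
        else:
            groups[peptide].add(tcr)
    edges = set()
    for members in groups.values():
        edges.update(combinations(sorted(members), 2))
    return edges


# Hypothetical records: T1 recognises two peptides, P2 is recognised by two TCRs.
pairs = [("P1", "T1"), ("P2", "T1"), ("P2", "T2"), ("P3", "T3")]
print(project(pairs, "peptide"))  # peptide-peptide edges (cross-reactivity)
print(project(pairs, "tcr"))      # TCR-TCR edges (common specificity)
```

In this toy input, P3 and T3 remain isolated in both projections, which mirrors the TCRs in the text that recognise only a single unique peptide.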
Interestingly, this graph reveals a set of highly connected hubs reflecting levels of common specificity; however, there are many TCRs which recognise only a single unique peptide. Comparing these two sets of TCRs, we did not observe considerable differences in their CDR3b sequences (Fig S4F-G); however, we observed differences in V and V-J gene usage (Fig S4H-J).

In summary, we employed peptides with known cognate TCRs in the IEDB database, although limited in number, to explore SARS-CoV-2 CD8+ T cell cross-reactivity. Our network approach demonstrates that SARS-CoV-2 CD8+ T cells can cross-react and exhibit common-specificities.

Presence of public TCRs recognising sCoV-2-HCoV peptides in COVID-19 convalescents and healthy subjects

We next integrated our map of SARS-CoV-2 shared and private peptides with a recently published dataset known as 'MIRA' 39 to track the patterns of public TCRs recognising sCoV-2-HCoV peptides in convalescents and/or healthy subjects. Here, Nolan et al. employed the 'Multiplex Identification of Antigen-Specific T cell receptors' (MIRA) assay to identify SARS-CoV-2 specific TCRs from PBMCs and naïve T cells. These data include more than 160k high confidence SARS-CoV-2-specific TCRs mapped to target peptides from 39 healthy controls (HC) (defined as unexposed to SARS-CoV-2) and 90 COVID-19 convalescent patients. These data consist of 792 unique SARS-CoV-2 peptides, 54 of which are sCoV-2-HCoV shared peptides.

To elucidate the landscape of public TCRs in HC and convalescent patients, we generated a bipartite graph comprising all public TCRs (defined as CDR3b + V + J gene(s) present in at least two subjects) cognate for SARS-CoV-2 private and sCoV-2-HCoV shared peptides (Fig 5A). This graph revealed two clear hubs.
In the first (green nodes), we observed that healthy subjects were connected to public TCRs which recognise both sCoV-2-HCoV and SARS-CoV-2-private peptides. In the second hub (red nodes), comprising convalescent patients, we observed that generally their public TCR repertoires predominantly recognise SARS-CoV-2-private peptides. Indeed, it appears that cognate TCRs of sCoV-2-HCoV peptides are more pronounced in HC ( [...]

Given that in these healthy donors the TCRs are generally from naïve CD8+ T cells which are expanded and stimulated with SARS-CoV-2 peptide pools and analysed with the 'MIRA' assay, the presence of cognate TCRs recognising sCoV-2-HCoV peptides in HC as well as COVID-19 patients may not necessarily translate into pre-existing T cell immunity. Rather, due to the high similarity between the cognate SARS-CoV-2 antigens and (predicted) HCoV presented peptides, we suggest it is plausible that these SARS-CoV-2 specific TCRs are cross-reactive with HCoV peptides. Indeed, consistent with Francis et al. 15, who demonstrate pre-existing memory CD8+ T cells to the SPR* peptide in 80% of unexposed individuals, we found a set of public TCRs, observed in both convalescent and unexposed individuals, recognising this sCoV-2-HCoV peptide. In this light, we reveal candidate public TCRs and corresponding SARS-CoV-2 peptides with high similarity to HCoVs, which should be examined further for cross-reactive potential.

From these two bipartite graphs, we observed that healthy individuals respond to a balance of SARS-CoV-2 private and sCoV-2-HCoV peptides, although it appears that infection primarily dictates a dominant recognition of private SARS-CoV-2 peptides (Fig S5B). For convalescent patients, we observed that public TCR repertoires of the majority (51/86) of patients are almost entirely (>=99%) occupied by TCRs recognising SARS-CoV-2 private peptides (Fig 5C).
However, in a subset of convalescent patients, public TCRs recognising sCoV-2-HCoV peptides comprise a substantial fraction of the public repertoire. In fact, for 12 convalescent patients, >50% of their public TCRs recognise sCoV-2-HCoV peptides.

Comparing these two groups of patients, we did not find evidence of a link with biological sex or age. To explore potential correlates, we first gathered the 12 patients whose public TCRs most dominantly (>50%) recognise sCoV-2-HCoV peptides (labelled PubTCR-SharedEp). Then, by sampling 12 patients 10 times from the set of 51 patients whose public TCRs almost entirely recognise SARS-CoV-2 private peptides (labelled PubTCR-Private), we compared the HLA coding genes of these two groups. We observed that the PubTCR-SharedEp group is statistically enriched for carrying HLA-B*07:02, HLA-C*07:02 and HLA-A*03:01, whereas the latter group includes a broader set of HLAs among which HLA-A*01:01 was more pronounced (Fig 5D).

Taken together, we report the existence of a set of CD8+ TCRs in both HC and COVID-19 convalescent patients that recognise SARS-CoV-2 peptides with high sequence similarity to a pool of predicted HCoV pMHC. This high sequence similarity indicates cross-reactive potential of these TCRs. Primarily however, we observed that COVID-19 patients develop public TCR responses to private peptides, many of which are not observed in unexposed individuals, indicating that any cross-reactive potential is limited. For the subset of COVID-19 patients whose public TCRs are directed towards sCoV-2-HCoV peptides, and are observed in HC, we found distinct HLA profiles. Therefore, in agreement with recent data from Francis et al., we suggest that CD8+ T cell HCoV-CoV-2 cross-reactive potential is apparent, although likely conditioned by patient HLA genotype.
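As a rough illustration of the bookkeeping behind this analysis, the sketch below identifies public TCRs (seen in at least two subjects), computes the fraction of each subject's public TCRs that recognise shared peptides, and draws repeated size-matched subsamples from the larger group. The record format, the function names and the fixed random seed are assumptions; the group sizes (12 patients, 10 draws, 51 candidates) follow the text. It is a sketch under those assumptions rather than the procedure actually used.

```python
import random
from collections import defaultdict


def public_tcrs(records):
    """records: iterable of (subject, tcr, peptide_class), where tcr is a
    (CDR3b, V gene, J gene) tuple and peptide_class is 'shared' or 'private'.
    A TCR is public if it is observed in at least two subjects."""
    seen_in = defaultdict(set)
    for subject, tcr, _ in records:
        seen_in[tcr].add(subject)
    return {tcr for tcr, subjects in seen_in.items() if len(subjects) >= 2}


def shared_fraction(records):
    """Per subject: fraction of public TCRs whose peptide is 'shared'."""
    public = public_tcrs(records)
    per_subject = defaultdict(set)
    for subject, tcr, peptide_class in records:
        if tcr in public:
            per_subject[subject].add((tcr, peptide_class))
    return {
        subject: sum(cls == "shared" for _, cls in hits) / len(hits)
        for subject, hits in per_subject.items()
    }


def subsample(group, size=12, repeats=10, seed=0):
    """Draw `repeats` random subsets of `size` subjects without replacement."""
    rng = random.Random(seed)
    ordered = sorted(group)
    return [rng.sample(ordered, size) for _ in range(repeats)]


# Hypothetical records for three subjects.
records = [
    ("S1", ("CASSA", "V1", "J1"), "shared"),
    ("S2", ("CASSA", "V1", "J1"), "shared"),
    ("S1", ("CASSB", "V2", "J1"), "private"),
    ("S3", ("CASSB", "V2", "J1"), "private"),
    ("S3", ("CASSC", "V3", "J2"), "private"),  # seen in one subject: not public
]
print(shared_fraction(records))
```

Subjects whose fraction exceeds 0.5 would fall into the PubTCR-SharedEp group, and `subsample` gives size-matched draws from the PubTCR-Private group for the HLA comparison.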
It is plausible that these patients may exhibit more robust protection against SARS-CoV-2 and its variants.

Potential conserved coronavirus CD8+ T cell targets with broad population coverage

Given the emergence of new SARS-CoV-2 variants and concern over the theoretical capacity of future mutants to evade current vaccine strategies 1, conserved CD8+ T cell targets across multiple coronavirus strains with the potential to elicit T cell responses in a large percentage of global populations are of interest. We therefore searched our peptide map for SARS-CoV-2 peptides with 'high-similarity' matches to multiple HCoVs, and with cognate TCRs in the MIRA dataset. To select only the top 'high-similarity' SARS-CoV-2-HCoV matches for this analysis, we applied a more stringent sequence homology threshold. In addition to the 'MatchScore' and peptide presentation criteria outlined previously (see Methods: Discriminating shared and private SARS-CoV-2 peptides), we only retained matches with at least 70% sequence conservation (i.e. allowing up to 30% amino acid substitution).

We found 86 peptides that match these criteria, 84 of which are recognised by TCRs in both convalescent patients and HC (Fig 6A-B). We next focused on SARS-CoV-2 peptides with high similarity matches in >=3 HCoV strains (Table 2, Supplementary Data File 7). Of these SARS-CoV-2-HCoV matches, the number of amino acid substitutions ranged from 0 to 3, with a mean of 1.79 and standard deviation of 0.78. Additionally, while each of these peptides exhibited a high similarity match to either MERS or SARS-CoV, the majority exhibited homology with both of these viruses (Fig S6A).
As well as high conservation across many coronavirus strains, collectively these SARS-CoV-2 peptides are predicted to bind multiple HLA alleles (Fig 6C), raising the possibility that this set of peptides may elicit T cell responses in a substantial proportion of the global population.

We next sought to determine the extent to which these CD8+ T cell targets may elicit T cell responses in global and regional populations, individually and cumulatively. We therefore used the IEDB population coverage tool 41, which employs global HLA allele prevalence data to predict the percentage of individuals in a regional population expected to respond to a given epitope set. Starting with each SARS-CoV-2 peptide and predicted HLAs individually, we find considerable coverage of 55.32% for "LLLD*", while "VQID*" exhibits the lowest predicted coverage of 7.09% (Fig 6D).

Similar to a previous approach by Ahmedid et al 42, we set out to predict the accumulated global population coverage of the set. We found that 8 peptides collectively produce >90% global coverage, while the entire set is predicted to elicit T cell responses in 92.93% of the global population (Fig 6E). Regionally, Europe and North America exhibited the highest predicted coverage (Fig 6F). Of note, Africa and Asia also exhibited high predicted coverage. Central America (defined as Guatemala and Costa Rica) exhibited low coverage of 7%. It is unclear why, and further investigation is necessary to produce a peptide set with high coverage in these countries.

Overall, we identified a set of 10 SARS-CoV-2 immunogenic peptides, each highly conserved across coronavirus strains, which collectively provide global population coverage of ~93%. We believe that this is an encouraging insight in the search for pan-coronavirus T cell targets, and additionally propose these as top candidates for cross-protective immunity.
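The idea of accumulated coverage can be illustrated with a simplified calculation. The sketch below is not the IEDB tool's algorithm: it assumes Hardy-Weinberg proportions within each HLA locus, independence between loci, and that an individual responds if they carry at least one presenting allele. The allele frequencies, peptide names and function names are hypothetical placeholders.

```python
from math import prod


def coverage(alleles, freq):
    """Fraction of individuals carrying at least one allele in `alleles`.

    freq maps allele name (e.g. 'HLA-A*02:01') to its population frequency.
    Per locus, the chance that neither chromosome carries a covering allele
    is (1 - p)^2, where p is the summed frequency of covering alleles.
    """
    per_locus = {}
    for allele in alleles:
        locus = allele.split("*")[0]
        per_locus[locus] = per_locus.get(locus, 0.0) + freq.get(allele, 0.0)
    return 1.0 - prod((1.0 - min(p, 1.0)) ** 2 for p in per_locus.values())


def cumulative_coverage(peptide_alleles, freq):
    """Greedily add the peptide that raises coverage most; return order and curve."""
    remaining = dict(peptide_alleles)
    covered, order, curve = set(), [], []
    while remaining:
        best = max(sorted(remaining),
                   key=lambda p: coverage(covered | remaining[p], freq))
        covered |= remaining.pop(best)
        order.append(best)
        curve.append(coverage(covered, freq))
    return order, curve


# Hypothetical allele frequencies and peptide-to-HLA predictions.
freq = {"HLA-A*02:01": 0.25, "HLA-A*03:01": 0.15, "HLA-B*07:02": 0.10}
peptide_alleles = {
    "pepA": {"HLA-A*02:01"},
    "pepB": {"HLA-A*03:01", "HLA-B*07:02"},
    "pepC": {"HLA-A*02:01", "HLA-A*03:01"},
}
order, curve = cumulative_coverage(peptide_alleles, freq)
print(order, [round(c, 4) for c in curve])
```

A greedy ordering like this shows how a few peptides with complementary HLA profiles can account for most of the achievable coverage, which is the pattern the text reports for 8 of the 10 peptides.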
Discussion

Our work demonstrates that T cells specific to SARS-CoV-2 peptides with high similarity to HCoV pMHC can be expanded from naïve individuals, and that these cognate public TCRs are also observed in a subset of recovered COVID-19 patients. This finding firstly suggests that SARS-CoV-2-unexposed individuals could mount T cell responses to HCoVs that, due to peptide similarity, could be cross-reactive with SARS-CoV-2 antigens. Furthermore, we propose that while COVID-19 disease appears to primarily direct responses against SARS-CoV-2-private peptides, patients with certain HLA alleles (e.g. HLA-B*07:02, -C*07:02, -A*03:01) may be more likely to possess sCoV-2-HCoV cross-reactive CD8+ T cells. It is therefore plausible that SARS-CoV-2 naïve individuals with certain HLAs may be at lower risk of severe disease, or experience augmented vaccine responses, if previously exposed to endemic coronaviruses; however, a direct link to pre-existing immunity requires further investigation.

Indeed, our analysis indicates that after SARS-CoV-2 infection, a subset of individuals has memory T cells that primarily recognise sCoV-2-HCoV peptides. In these convalescent patients, it is unclear whether infection itself and/or prior exposure to HCoVs is driving this subset of individuals to select for these peptides. There is conflicting evidence surrounding the existence of memory SARS-CoV-2 cross-reactive CD8+ T cells in unexposed individuals 15,16,40, and a limitation of our work is that we could not provide a direct link to pre-existing immunity, because from healthy donors the MIRA dataset only evaluated expanded naïve T cells and did not examine anti-viral efficacy of the responding T cells. Although we cannot determine the cause or timeframe of this selection of sCoV-2-HCoV peptides in this subset of individuals, the potential implications are interesting.
It is plausible that these patients may exhibit more robust protection against SARS-CoV-2 variants, HCoVs or even future emerging coronavirus strains. Future work should explore any immunity benefit of infection-induced cross-reactive T cell responses, and in addition, it will be interesting to examine whether vaccination against SARS-CoV-2 can induce T cell memory that is cross-reactive with SARS-CoV-2 variants and/or wider coronaviruses in such individuals. Furthermore, given our identification of a set of 10 potentially cross-reactive peptides with broad population coverage, it is possible that these peptides could be employed to test which patients exhibit cross-reactive phenotypes, e.g., after vaccination with relevant antigens.

More broadly, data are beginning to demonstrate distinct vaccine-induced responses linked to differential patient exposure to SARS-CoV-2 3,4. In turn, it is possible that COVID-19 vaccine boosted cross-reactive immune responses may influence vaccine-induced protection 7. Indeed, it will be important to explore whether COVID-19 vaccination can boost any infection-induced cross-reactive T cell memory, and whether this affects robustness of protection from SARS-CoV-2 variants or wider coronaviruses.

[...] to memory vs naïve responses, we build upon existing work by proposing additional alleles which may be carried by individuals who possess cross-reactive T cells, as well as those which appear depleted or absent in these individuals. Few studies have examined associations between HLA type and COVID disease or its severity 15,44,45. Nevertheless, the emerging picture indicates that HCoV-SARS-CoV-2 cross-reactivity is conditioned by HLA genotype.
Together, we provide a landscape of TCR-pMHC interactions (all TCR-pMHC interactions used in the analyses are found in Supplementary Data File 8) which may be involved in HCoV-SARS-CoV-2 cross-reactivity and provide a framework for further anti-viral mechanistic studies.

Although our study provides a map of shared and private SARS-CoV-2 peptides to date and indicates the extent to which one may expect CD8+ T cell cross-reactivity between HCoVs and SARS-CoV-2, a limitation is that for cross-reactivity insights we had to limit ourselves to CD8+ T cells for which both peptides and their cognate TCR information were available. Additionally, our approach for identifying homologous sequences seems to work better for MHC class I peptides, which are considerably shorter than their class II counterparts. With a more suitable metric for longer peptides, one may substantiate our insights for class II.

Our metric for discriminating shared and private peptides is based on three factors: 1) sequence homology at 50%, 2) physicochemical similarity of 75% and 3) that both source and target peptides must be presented by the same HLA. Of these three, 50% sequence homology may seem too relaxed. In support of our use of this threshold we note that: a) factors 2 and 3 are additionally applied to compensate for this, b) we have checked our results with 70% sequence homology and observed that the main conclusions are robust, and c) as this map is suggested for further functional validation, we favour minimising false negatives at the cost of potential false positives.

Through examining the potential for cross-reactivity between SARS-CoV-2 and HCoV strains, we have predicted that a set of 10 highly conserved immunogenic peptides could mount CD8+ T cell responses in 99% of the global population.
These peptides have been reported previously in in silico and experimental work [46-50]; however, to our knowledge, their large accumulated global population coverage has not yet been reported. Some of these peptides exhibit similar population coverage although with different HLA profiles; therefore, it may be possible to tailor a smaller set of peptides to specific regions of interest (based on local HLA frequency), thus maximising coverage with a minimal set of peptides. Firstly, our work identifies these peptides as top candidates for cross-reactivity. Secondly, we propose that their high conservation across strains may be of interest as pan-coronavirus targets, to assist ongoing work in search of mitigation strategies to reduce the threat from mutant variants or emerging coronaviruses [51-53].

A complex facet of severe COVID-19 disease and its diverse clinical manifestation is immunopathogenesis. Indeed, exacerbated immune responses including cytokine storm are a primary clinical characteristic in severe COVID-19 patients. Aberrant transcriptional programming has been observed in response to SARS-CoV-2 [54], characterised by a failure of type-1 and -3 interferon responses and simultaneous high induction of chemoattractants. While the growing evidence for pre-existing HCoV cross-reactive memory T cell responses may simply translate into an immunity benefit in some patients, in concert with data from MERS and SARS-CoV-1, there is considerable evidence that cross-reactive T and B cell responses may, on the other hand, be involved in immunopathology with SARS-CoV-2.

Venkatakrishnan et al. [55] identified peptides that are identical between SARS-CoV-2 and the human proteome. Their work demonstrates that the genes giving rise to these peptides are expressed in tissues implicated in COVID-19 pathogenesis.
Our work expands their insights by identifying SARS-CoV-2 peptides that are experimentally confirmed to be immunogenic and have high similarity to the human proteome. Consistent with their conclusions, we find similarity of immunogenic SARS-CoV-2 peptides to human genes, e.g., CCL3, CCL31 and CD163. These insights are of particular interest given the elevated cytokine and chemokine responses in severe COVID-19 patients. More broadly, there is evidence that viral antigens that are structurally similar to self-antigens can be involved in inducing autoimmunity via molecular mimicry [8]. For these reasons, we propose these peptides as candidates which may exhibit immunopathological or autoimmune associations.

In conclusion, we have employed an in silico approach to examine the evidence surrounding cross-reactive SARS-CoV-2 CD8+ T cell responses. We observed a set of SARS-CoV-2 candidates with high similarity to the human proteome and suggest investigation into whether they provoke immunopathology. We have also provided evidence of CD8+ T cell cross-reactivity: not only is its extent sufficient to indicate that naïve individuals could mount cross-reactive responses to SARS-CoV-2 and common-cold coronaviruses, but SARS-CoV-2 infection also induces CD8+ T cell responses against peptides with high similarity to HCoV in some COVID-19 patients. We build upon existing evidence that such cross-reactivity is conditioned by the presence of specific HLA alleles, and envision that the insights presented here will be leveraged to explore whether these potentially cross-reactive T cells and cognate pMHCs influence COVID-19 disease heterogeneity and vaccine- or infection-induced protection from SARS-CoV-2 and its emerging variants of concern.

Acknowledgments:
We gratefully acknowledge conversations with and guidance from Dr Mikhail Shugay (ITM, Moscow), and Dr Giorgio Napolitani (KCL, London).
Declaration of interest
The authors declare no competing interests.

All data processing and analysis was performed using the R plugin for PyCharm 2020, in either R […]

Immunogenic peptides not observed in either the IEDB or VIPR were also gathered from the 'MIRA' dataset, which maps cognate TCRs and SARS-CoV-2 peptides.

Antigen presentation by MHC class I was predicted using NetMHCpan v4.1 against HLA-A*0101, […]

To provide reasonable statistical inference, we only examined proteins longer than 100 amino acids. To compute enrichment or depletion, we followed the approach by Karnaukhov et al. […]

[…] was recorded.

Three metrics, all of which must be satisfied, were used to determine whether a peptide is considered shared with HCoV or private to SARS-CoV-2. Below, we describe each metric and then explain the three thresholds which must all be achieved for a peptide to be classified as 'shared'.

Firstly, once all peptides of length N from the proteome of interest are generated, a similarity index we call the 'MatchScore' is calculated for each pairwise comparison. This metric assesses physicochemical similarity between two peptides of interest. For each SARS-CoV-2 peptide, the highest 'MatchScore' against each HCoV protein is retained and the rest are discarded. To calculate the 'MatchScore', we employ the method designed by Bresciani et al. [24], which yields a similarity score for two peptides a and b of length N. The MatchScore function produces a score where 1 reflects an exact match, i.e., no mismatches between the two sequences, and 0 reflects high dissimilarity.

Criterion 1: A shared peptide and its HCoV match must have a MatchScore of > 0.75.

The second metric is based on sequence homology between two sequences, essentially reflecting the proportion of mismatched positions:

$$\text{ProportionMismatched} = \frac{\text{HammingDistance}}{\text{Length}}$$

where 'HammingDistance' is the Hamming distance between two peptides of interest, which counts the number of positions at which they differ, and 'Length' is the length of the compared peptides.
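As an illustration only, the second metric and the way the three conditions for a 'shared' peptide combine (MatchScore > 0.75, ProportionMismatched < 0.5, and at least one common predicted HLA allele) can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in the study, which was written in R. The function names, argument names and example peptides are hypothetical. The MatchScore is taken as a precomputed input because its formula is not reproduced in this text, and the HLA allele sets are assumed to come from a separate presentation predictor.

```python
from typing import Iterable


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x != y for x, y in zip(a, b))


def proportion_mismatched(a: str, b: str) -> float:
    """HammingDistance divided by Length, as defined for the second metric."""
    if not a:
        raise ValueError("peptides must be non-empty")
    return hamming_distance(a, b) / len(a)


def is_shared(
    match_score: float,            # precomputed MatchScore (assumed input)
    sars_peptide: str,
    hcov_peptide: str,
    sars_hlas: Iterable[str],      # alleles predicted to present the SARS-CoV-2 peptide
    hcov_hlas: Iterable[str],      # alleles predicted to present the HCoV match
    match_threshold: float = 0.75,
    mismatch_threshold: float = 0.5,
) -> bool:
    """True only if all three conditions hold; otherwise the peptide is 'private'."""
    passes_match_score = match_score > match_threshold
    passes_homology = (
        proportion_mismatched(sars_peptide, hcov_peptide) < mismatch_threshold
    )
    passes_hla = bool(set(sars_hlas) & set(hcov_hlas))
    return passes_match_score and passes_homology and passes_hla


if __name__ == "__main__":
    # Hypothetical 10-mers differing at 4 of 10 positions (proportion 0.4).
    print(is_shared(0.9, "AAAAAAAAAA", "AAAAAACCCC",
                    {"HLA-A*02:01"}, {"HLA-A*02:01", "HLA-B*07:02"}))
```

Both thresholds are strict inequalities in this sketch, so a pair differing at exactly half of its positions is not classified as shared.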
Criterion 2: The ProportionMismatched between a shared peptide and its HCoV match must be < 0.5 (50%). Equivalently, more than 50% of amino acids must be conserved between a SARS-CoV-2 peptide and its HCoV match for the peptide to be considered 'shared'.

The third metric is based on predicted presentation by HLA of the SARS-CoV-2 peptide and its HCoV match.

Criterion 3: Both the SARS-CoV-2 peptide and its HCoV match must be predicted to bind at least one common HLA allele.

All three criteria must be satisfied for a SARS-CoV-2 peptide to be classified as a shared peptide and also for a match from HCoV to be considered a homologous match. The doParallel and foreach packages were used to parallelise the processing.

Here, the same similarity criteria were employed as in the previous HCoV section. However, in contrast with the HCoV comparison, due to the size of the human proteomes and microbiomes, only the best match against the whole proteome is retained. The doParallel and foreach packages were used to parallelise the processing.

Gathering human proteome sequence
The human proteome was downloaded in FASTA format from UniProt […]

Human gene sets with sequence similarity to SARS-CoV-2 immunogenic peptides
The SARS-CoV-2 peptides of lengths 9 and 10 with a similarity score to the human proteome in the top 10 percentile were gathered. Only predicted binders (see MHC presentation prediction) were retained.

The entire IEDB receptor data for SARS-CoV-2 peptides was downloaded. Bipartite graphs were generated using the iGraph and Matrix libraries in R. Bipartite graphs were projected into one-mode graphs using the bipartite_projection function. All graphs were exported from iGraph into Cytoscape v3.82 using the R function createNetworkFromIgraph from the RCy3 package. From Cytoscape, '.graphml' files were exported and opened with Gephi. Gephi was used to finalise the diagrams and improve visual aesthetics.
Either 'ForceAtlas' or 'Fruchterman-Reingold' templates were used. Gravity and repulsion parameters were altered to improve visual aesthetics.

Figure 1: Overview of the study. A) Functionally evaluated SARS-CoV-2 peptides are gathered from three online repositories. Data are cleaned and integrated, thus curating a comprehensive pool of SARS-CoV-2 class I and II peptides. Exploratory data analysis followed. B) Next, each immunogenic SARS-CoV-2 peptide of length l is compared to each possible linear peptide of length l from four common-cold-causing human coronavirus strains. Based on similarity criteria and following confirmation that the target hit from HCoV is predicted to bind HLA, peptides are classified as 'shared' SARS-CoV-2-HCoV peptides. Those which do not meet the criteria are classified as SARS-CoV-2 private. C) The entire set of SARS-CoV-2 shared and private, immunogenic and non-immunogenic peptides is compared to the human proteome and the gut and airway microbiomes. D) 245 peptides from our SARS-CoV-2 peptide dataset have known cognate TCRs in the IEDB. These peptide-TCR associations were examined to explore the extent of cross-reactivity and common specificity within SARS-CoV-2. E) Both shared and private immunogenic SARS-CoV-2 peptides are integrated with the COVID-19 MIRA TCR repertoire dataset and employed to examine the presence of TCRs recognizing shared/private SARS-CoV-2 peptides in health and/or disease. F) The entire set of 126 shared SARS-CoV-2 peptides is searched for those most highly conserved across coronaviruses. This resulted in 17 peptides, which we used to predict global and regional population coverage, given predicted HLA alleles.

[…] Size of the dot represents the MatchScore. B) The frequency of cognate TCRs which recognize these peptides from the COVID-19 convalescent or healthy cohorts. C) The HLA alleles predicted to present SARS-CoV-2 peptides with high similarity matches to 3 or 4 HCoV strains.
D) Global population coverage as calculated by the 'IEDB population coverage tool' for each individual SARS-CoV-2 peptide with high similarity matches to 3 or 4 HCoV strains. E) Accumulated global population coverage predicted by the IEDB population coverage tool. F) Regional population coverage for the entire set of 10 SARS-CoV-2 peptides with matches to 3 or 4 HCoV.

Table S1: Peptides identified with high similarity to the human proteome. HitSeq shows the match from the self proteome. SharedEpitope shows whether the peptide is a SARS-CoV-2-HCoV shared peptide or not. SYMBOL shows the gene from which the hit peptide is derived. HitSeq_Binder shows whether the hit peptide is predicted to bind an HLA allele. AASeq_Similarity shows the proportion of amino acids conserved between the SARS-CoV-2 peptide and the human proteome match.

3: Data file containing a summary of the peptides recognized by public TCRs in the PubTCR-SharedEp and PubTCR-Private groups, supplemented with those recognized in a sampled set of healthy patients. The file contains a 1 or a 0 indicating whether a public TCR in each group of patients recognizes the peptide.
4: Data file containing the information from data file 3, with additional information on whether key class I HLA alleles are observed in a patient with a public TCR recognizing each peptide.
5: Data file containing cohort information regarding the peptides most commonly recognized by private TCRs in the MIRA dataset.
6: Data file reporting the private TCRs which recognize the SARS-CoV-2-HCoV shared peptides shown in data file 5.
7: Data file containing information from Figure 6, showing the peptides with hits to >=3 HCoV strains. The file provides detailed information regarding each SARS-CoV-2 peptide and the corresponding hit to HCoV.
8: Data file containing every TCR from the IEDB and the MIRA datasets which we used in this analysis, mapped to the recognized SARS-CoV-2 peptide.
Table S2: Two previously reported private TCRs are identified in additional HLA-B*07:02+ individuals at beta chain resolution, indicating these are cross-reactive public TCRs.

COVID-19: immunopathology and its implications for therapy
Increased Serum Levels of sCD14 and sCD163 Indicate a Preponderant Role for Monocytes in COVID-19
Single-cell analysis reveals macrophage-driven T cell dysfunction in severe COVID-19 patients. medRxiv
Cytokine storm and leukocyte changes in mild versus severe SARS-CoV-2 infection: Review of 3939 COVID-19 patients in China and emerging pathogenesis and therapy concepts
Longitudinal profiling of respiratory and systemic immune responses reveals myeloid cell-driven lung inflammation in severe COVID-19
Severe COVID-19 associated variants linked to chemokine receptor gene control in monocytes and macrophages. bioRxiv preprint
CCR5 inhibition in critical COVID-19 patients decreases inflammatory cytokines, increases CD8 T-cells, and decreases SARS-CoV2 RNA in plasma by day 14
Persistence of SARS CoV-2 S1 Protein in CD16+ Monocytes Post-Acute Sequelae of COVID-19 (PASC) Up to 15 Months Post-Infection
Comprehensive analysis of TCR repertoire in COVID-19 using single cell sequencing
Single-cell landscape of immunological responses in patients with COVID-19
A large-scale database of T-cell receptor beta (TCRβ) sequences and binding associations from natural and synthetic exposure to SARS-CoV-2
CD8+ T cells specific for an immunodominant SARS-CoV-2 nucleocapsid epitope cross-react with selective seasonal coronaviruses
Predicting population coverage of T-cell epitope-based diagnostics and vaccines
Cross-serotypically conserved epitope recommendations for a universal T cell-based dengue vaccine
Robust T Cell Immunity in Convalescent Individuals with Asymptomatic or Mild COVID-19
Association of HLA Class I Genotypes With Severity of Coronavirus Disease-19
Personalized workflow to identify optimal T-cell epitopes for peptide-based vaccines against
Prediction of T and B cell epitopes in the proteome of SARS-CoV-2 for potential use in diagnostics and vaccine design
In silico studies suggest T-cell cross-reactivity between SARS-CoV-2 and less dangerous coronaviruses
Expected immune recognition of COVID-19 virus by memory from earlier infections with common coronaviruses in a large part of the world population
Identification and characterization of an immunodominant SARS-CoV-2-specific CD8 T cell response
Epitopes Vaccine Prediction against Severe Acute Respiratory Syndrome (SARS) Coronavirus Using Immunoinformatics Approaches
Rational Design of a Pan-Coronavirus Vaccine Based on Conserved CTL Epitopes
A universal coronavirus vaccine. Science
Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
Benchmarking evolutionary tinkering underlying human-viral molecular mimicry shows multiple host pulmonary-arterial peptides mimicked by SARS-CoV-2
immunomind/immunarch: 0.6.5: Basic single-cell support

Gathering clinical and TCR repertoire data for COVID-19 patients and healthy subjects
The COVID-19 MIRA dataset (>160k high-confidence SARS-CoV-2-specific TCRs) was downloaded from https://clients.adaptivebiotech.com/pub/covid-2020 with corresponding sample metadata. These data contain TCR repertoire data mapped to SARS-CoV-2 epitopes from 5 patient cohorts, including COVID-19 convalescent patients and healthy subjects with no known exposure to SARS-CoV-2. Only convalescent patients and healthy subjects were used in the analysis due to low numbers of subjects in the other cohorts.

A public TCR is defined as a combination of CDR3 sequence, V gene and J gene that is observed in more than one patient in the MIRA dataset. All graphs were first generated using iGraph in R, then exported to Cytoscape using the createNetworkFromIgraph function in the RCy3 package.
From Cytoscape, all graphs were exported as .graphml files and read into Gephi. In Gephi, either 'ForceAtlas' or 'Fruchterman-Reingold' templates were used. In almost all cases, gravity and repulsion parameters were adjusted to improve visual aesthetics.

Estimating population coverage of SARS-CoV-2 peptides with high conservation to three or more HCoV

[…] showing the common specificity of SARS-CoV-2-specific TCRs. Each node is a TCR, and an edge indicates that two TCRs recognize the same peptide. Node size reflects the number of peptides recognized by a TCR. F) A sequence logo plot visualizing the position weight matrix for CDR3b 5-mers in the IEDB dataset, for "Hub TCRs", those with considerable common specificity. G) A sequence logo plot visualizing the position weight matrix for CDR3b 5-mers in the IEDB dataset, for "Singletons", those recognizing only one unique SARS-CoV-2 peptide. H) A barplot contrasting the k-mer distribution for "Hub" and "Singleton" TCRs. The count is normalized by the number of TCRs in each group (Hub or Singleton). I) A barplot contrasting the V gene usage of "Hub" and "Singleton" TCRs. The y axis shows the frequency with which each V gene is used in each group. J) A barplot contrasting the V-J gene usage of "Hub" and "Singleton" TCRs. The y axis shows the frequency with which each V-J gene combination is used in each group.