key: cord-296007-1gsgd22t authors: Mohseni, Amir Hossein; Taghinezhad-S, Sedigheh; Su, Bing; Wang, Feng title: Inferring MHC interacting SARS-CoV-2 epitopes recognized by TCRs towards designing T cell-based vaccines date: 2020-09-12 journal: bioRxiv DOI: 10.1101/2020.09.12.294413 sha: doc_id: 296007 cord_uid: 1gsgd22t

The coronavirus disease 2019 (COVID-19) is caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and was declared by the WHO a major international public health concern. While worldwide efforts are being made towards vaccine development, the structural modeling of TCR-pMHC (T Cell Receptor-peptide-bound Major Histocompatibility Complex) for SARS-CoV-2 epitopes and the design of an effective T cell vaccine based on these antigens remain unresolved. Here, we present both pMHC and TCR-pMHC interfaces to infer peptide epitopes of the SARS-CoV-2 proteins. Accordingly, significant TCR-pMHC templates (Z-value cutoff > 4) along with interatomic interactions within the SARS-CoV-2-derived hit peptides were clarified. We also applied structural analysis of the hit peptides from different coronaviruses to highlight a feature of evolution in SARS-CoV-2, SARS-CoV, bat-CoV, and MERS-CoV. Peptide-protein flexible docking between each hit peptide and its corresponding MHC molecule was performed, and a multi-hit peptides vaccine against the S and N glycoprotein of SARS-CoV-2 was designed. Filtering pipelines, including antigenicity, and the physicochemical properties of the designed vaccine were then evaluated with different immunoinformatics tools. Finally, vaccine-structure modeling and immune simulation of the designed vaccine were performed, aiming to create robust T cell immune responses. We anticipate that our design based on the T cell antigen epitopes, together with the framework of the immunoinformatics analysis, could provide valuable support for the development of a COVID-19 vaccine.
Designing of multi-hit peptides vaccine sequence

A set of highly immunogenic hit peptides derived from the N and S proteins of SARS-CoV-2, with high binding to HLA-A0201, HLA-B0801, HLA-B3501, HLA-B3508, HLA-B4405, and HLA-E, was selected on the basis of their solvent-exposed residues and hydrophobicity scales. The AAY and GPGPG linkers were used to join the candidate N and S hit peptides, respectively. Additionally, human beta-defensin 3, which acts as an adjuvant to improve the immunogenicity of the multi-hit peptides vaccine, was joined to the N-terminus of the vaccine construct using an EAAAK linker. The SOPMA web server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html) was

identified for them, respectively. According to our data, five and ten hit peptide antigen candidates (Z-value cutoff > 4), and 23 and 40 homologous peptide antigens in 16 and 25 organisms, were inferred in the ORF3a and S proteins, respectively, by using the HLA-A0201-peptide-TCR templates [PDB entries 2p5e and 1oga] and the experimental peptide database. No hit peptide antigen candidates with Z-value cutoff > 4 were detected for the ORF6, ORF7a, and ORF8 queries by using the HLA-A0201-peptide-TCR template (Figure S1A). Using the HLA-B0801 (Figure S1D), HLA-B3501 (Figure S1B), and HLA-B3508 (Figure S1E) templates, TCR-pMHC models with Z-value cutoff > 4 were predicted only for the S (5 hit peptide antigen candidates and 91 homologous peptide antigens in 57 organisms; PDB entry 1mi5), N (3 hit peptide antigen candidates and 3 homologous peptide antigens in 2 organisms; PDB entry 2nx5), and S (1 hit peptide antigen candidate and 1 homologous peptide antigen in 1 organism; PDB entry 2ak4) proteins, respectively.
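The linker scheme described under "Designing of multi-hit peptides vaccine sequence" can be illustrated with a minimal Python sketch. The function name `build_construct`, the placeholder sequences, and the linker placed between the N block and the S block (`block_linker`) are assumptions made for illustration only; the text specifies just that the N hit peptides are joined with AAY, the S hit peptides with GPGPG, and the adjuvant with EAAAK at the N-terminus.

```python
from typing import Sequence


def build_construct(
    adjuvant: str,
    n_peptides: Sequence[str],
    s_peptides: Sequence[str],
    adjuvant_linker: str = "EAAAK",  # joins the adjuvant to the peptide region
    n_linker: str = "AAY",           # joins N-protein hit peptides
    s_linker: str = "GPGPG",         # joins S-protein hit peptides
    block_linker: str = "GPGPG",     # ASSUMPTION: junction between N and S blocks
) -> str:
    """Assemble adjuvant + EAAAK + N block + S block as one sequence."""
    n_block = n_linker.join(n_peptides)
    s_block = s_linker.join(s_peptides)
    # Skip empty blocks so that no dangling linker is emitted.
    blocks = [block for block in (n_block, s_block) if block]
    return adjuvant + adjuvant_linker + block_linker.join(blocks)


if __name__ == "__main__":
    # Hypothetical placeholders, NOT the adjuvant or hit peptides of the study.
    construct = build_construct(
        adjuvant="MKKK",
        n_peptides=["NNNNNNNNN", "QQQQQQQQQ"],
        s_peptides=["SSSSSSSSS", "TTTTTTTTT"],
    )
    print(construct, len(construct))
```

Keeping the linkers as keyword arguments makes it easy to test alternative junctions without touching the assembly logic.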
In contrast, only the M and S proteins could be used to predict the TCR-pMHC complex with HLA-B4405 (Figure S1C) and PDB entry 3dxa (15 homologous peptide antigens in 12 organisms). Likewise, only the N, ORF10, and S proteins could be used to predict the TCR-pMHC complex with HLA-E (Figure S1F) and PDB entry 2esv (84 homologous peptide antigens in 41 organisms) at Z-value cutoff > 4. No hit peptide antigen candidates with Z-value cutoff > 4 were detected for the M, E, and ORF3a queries by using the HLA-E-peptide-TCR template (Figure S1F). Moreover, no model was predicted for the ORF1ab query because its sequence length was ≥ 300. The 3D structure of the TCR-pMHC complex for each hit peptide derived from each protein was illustrated by SWISS-MODEL. Detailed information about the derived hit peptides for SARS-CoV-2, along with bat-CoV, MERS-CoV, and SARS-CoV, is tabulated in Table 1.

The binding events among hit peptides derived from N proteins related to HLA-A0201

Our TCR-pMHC models predicted that position 2 of the homologous peptide antigens (Figure 1A) related to the TCR-pMHC complex of the SARS-CoV-2, SARS-CoV, and bat-CoV N proteins prefers hydrophobic amino acid residues (e.g. Ile, Leu, Met, and Phe), and the second position of these hit peptides is the hydrophobic amino acid residue Pro, forming five strong VDW forces with residues Y99, V67, M45, Y7, and F9 and two H-bonds with residues K66 and E63 on the MHC molecule (Figure S2A, left). By contrast, the second position of the MERS-CoV N protein-derived hit peptide is the charged residue Arg, forming three strong VDW forces with residues M45, F9, and V67 and two H-bonds with residues K66 and E63 (the same as in SARS-CoV-2) on the MHC molecule. Surprisingly, position 9 of all homologous peptide antigens (Figure 1A) prefers hydrophobic amino acid residues (e.g.
Leu, Ile, Val, and Met), and position 9 of these hit peptides is the hydrophobic amino acid residue Leu (in SARS-CoV-2, SARS-CoV, and bat-CoV) or Gly (in MERS-CoV). Our results showed that Leu attaches to the MHC through three strong VDW forces with residues L81, I124, and W147 and three H-bonds with residues D77, Y84, and T143, while Gly forms three strong VDW forces with residues L81, V95, and W147 and two H-bonds with residues D77 and V95 on the MHC molecule (Figure S2A, left). Moreover, positions 4, 6, and 8 of the hit peptides in SARS-CoV-2, SARS-CoV, and bat-CoV form one H-bond with residues Q52, Q52 and D32 in chain E of TCR, with residue S100 in chain D of TCR (Table S1). Visualization of interactions in the atomic-level structure of a TCR-pMHC complex for the hit peptide of the SARS-CoV-2 N protein for HLA-A0201 (Figure S3A) within 20 and 8 Å was generated on-the-fly using PyMOL.

Accordingly, position 1 of these hit peptides forms two H-bonds with residues Y170 and Y158, and position 2 of these hit peptides forms an H-bond with residue E62 and four strong VDW forces with residues A66, Y6, W96, and M44. Moreover, position 7 of these hit peptides forms an H-bond with residue N76 and three

hit peptides lacks any contacts, although positions 5 and 6 of the hit peptides form H-bonds and/or strong VDW forces on both the MHC molecule and the TCR (Figure S2A and S2C, center) (Table S2). Visualization of interactions in the atomic-level structure of a TCR-pMHC complex for the hit peptide of the SARS-CoV-2 N protein for HLA-E (Figure S3C) within 20 and 8 Å was generated on-the-fly using

of the homologous peptide antigens of all queries has no detectable binding to either MHC or TCR. Additionally, position 6 of these hit peptides connects to residues Q96 and Y96 on the TCR through an H-bond and strong VDW forces, respectively (Figure S2D, right).
The hit peptide correlates well with the amino acid profile at the conserved position 7 (Tyr), which forms one strong VDW force with residue W147 on the MHC (Figure S2B, right) and four strong VDW forces with residues A97, H47, H32, and L90 on the TCR (Figure S2D, right, and Table S6). Visualization of interactions in the atomic-level structure of a TCR-pMHC complex for the hit peptide of the SARS-CoV-2 S protein for HLA-B0801 (Figure S3D) within 20 and 8 Å was generated on-the-fly using PyMOL.

CoV and SARS-CoV ORF3 proteins. Our data emphasized that the hit peptide derived from the SARS-CoV-2 S protein by using HLA-A0201 had lower immunogenicity than those of bat-CoV, MERS-CoV, and SARS-CoV.

revealed a high degree of immunogenicity between SARS-CoV-2, SARS-CoV, and bat-CoV but a more limited immunogenicity with MERS-CoV.

Based on the hypothesis that solvent-exposed residues, by increasing TCR binding, can provide appropriate evidence about peptide immunogenicity, we measured the solvent-exposed area (SEA) for each hit peptide. Our results showed that the SEA > 30 Å² values for the hit peptides related to HLA-B0801, HLA-B3501, and HLA-B3508 were 5.08, 7.76, and 8.28, respectively. Indeed, the highest solvent accessibility of amino acids was found in the M (SEA > 30 Å²: 5.99), S (SEA > 30 Å²: 6.75), and ORF10 (SEA > 30 Å²: 6) hit peptides for HLA-A0201, HLA-B4405, and HLA-E, respectively.

Over the past few months, studies in humans have begun to unravel the underlying relationship between hydrophobicity scales and eradication of immune responses.
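A position-wise hydrophobicity comparison between two peptides can be illustrated with a short calculation. The sketch below assumes the Kyte-Doolittle hydropathy scale (the scale used in this study is not stated in this passage) and uses hypothetical 9-mer sequences rather than the hit peptides of Table 1; the function names are likewise illustrative.

```python
# Kyte-Doolittle hydropathy values (ASSUMPTION: the study's scale may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide: str) -> list[float]:
    """Per-residue hydropathy; raises KeyError on non-standard residues."""
    return [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]


def gravy(peptide: str) -> float:
    """Mean hydropathy of the peptide (grand average of hydropathy)."""
    profile = hydropathy_profile(peptide)
    if not profile:
        raise ValueError("peptide must not be empty")
    return sum(profile) / len(profile)


def differing_positions(first: str, second: str) -> list[int]:
    """1-based positions at which two equal-length peptides differ."""
    if len(first) != len(second):
        raise ValueError("peptides must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(first, second)) if a != b]


if __name__ == "__main__":
    # Hypothetical 9-mers differing at positions 6 and 9.
    peptide_a, peptide_b = "AAAAALAAL", "AAAAAKAAG"
    print(differing_positions(peptide_a, peptide_b))
    print(round(gravy(peptide_a), 2), round(gravy(peptide_b), 2))
```

Reporting the differing positions alongside the mean score shows which substitutions drive a difference in overall hydrophobicity.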
Currently, identification of peptide regions exposed at the surface has gained much attention in the field of immunogenicity of peptides to

HLA-E-peptide-TCR (Figure 2E) and S protein related to HLA-B4405 peptide-TCR (Figure 2D) had greater hydrophobicity than the ORF10 and M proteins due to differences at positions 6 and 9, and 5,

predicted as a single domain without disorder. Among them, the first model had a better quality in most cases, with C-score = -3.25, estimated TM-score = 0.35±0.12, and estimated RMSD = 11.7±4.5 Å. However, because of modelling errors, such as irregular angles and bonds, and the unavailability of an appropriate template, the generated 3D models may not reach the accuracy level required for some biological purposes, especially where experimental data are rare. As such, refinement of the 3D structure of the vaccine is vital for correcting local errors, bringing the 3D model of the vaccine closer to native structures, and increasing the accuracy of the primary 3D model, particularly for further in-silico studies [20]. Therefore, the refinement of the model was performed by using the 3Drefine tool. On the basis of the overall quality of the refined models, model 5 exhibited the best results, with RMSD 0.549 Å (Figure 5B). The quality of the best model of the multi-hit peptides vaccine construct was validated by ProSA-web. It is well known that the Z-score is related to the length of the protein, and that negative Z-scores are more appropriate for a trustworthy model. In fact, the Z-score shows the overall quality and

(range −20 to 10) and was also located within the space of proteins determined by X-ray, suggesting that the obtained model is reliable and close to an experimentally determined structure.
Evaluation of the overall quality of the finalized model of the vaccine construct by Ramachandran plot analysis showed that 77.5% (93/120) of all residues in the finalized model (Figure 5F), compared to 54.2% (65/120) in the initial model (Figure 5E), were in favored (98%) regions; likewise, 90.0% (108/120) of all residues in the finalized model (Figure 5F), compared to 78.3% (94/120) in the initial model (Figure 5E), were in allowed (>99.8%) regions. After that, protein structure and visualization of the measured interactions between atoms

To the best of our knowledge, from the standpoint of immunoinformatics approaches, the concept discussed in this study is the first structural modeling to investigate both the TCR and pMHC interfaces for the SARS-CoV-2 proteins. Our results provide a blueprint for inferring SARS-CoV-2-derived hit peptides with high accuracy towards vaccine development. Therefore, it is tempting to speculate that the aforementioned models will offer a valuable framework for identifying specific peptides with the potential to activate T cell-

Hit peptide; VDW: van der Waals (VDW) forces

The C-ImmSim immune simulator web server was used to determine the ability of the vaccine to induce T cell immunity. This server yielded results consistent with actual immune responses, as evidenced by a general marked increase in the generation of secondary responses. To better follow the effect of the final vaccine construct on the stimulation of T cell immunity, a construct with point mutations on key residues, replacing hydrophobic amino acids with charged amino acids, was built (Figure 6A). According to the data, after vaccination with the native vaccine, there was a consistent rise in Th (helper) cell