key: cord-351472-ch004jxy
authors: Vashi, Yoya; Jagrit, Vipin; Kumar, Sachin
title: Understanding the B and T cells epitopes of spike protein of severe respiratory syndrome coronavirus-2: A computational way to predict the immunogens
date: 2020-04-10
journal: bioRxiv
DOI: 10.1101/2020.04.08.013516
sha:
doc_id: 351472
cord_uid: ch004jxy

The outbreak of the 2019 novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a large number of deaths and thousands of confirmed cases worldwide. The present study used computational approaches to identify B- and T-cell epitopes of the SARS-CoV-2 spike (S) glycoprotein and to characterize their interactions with human leukocyte antigen (HLA) alleles. We identified twenty-four peptide stretches on the SARS-CoV-2 spike protein that are well conserved among the reported strains. The S protein structure further validated the presence of the predicted peptides on the surface; twenty of them are surface exposed and predicted to have reasonable epitope binding efficiency. This work could be useful for understanding the immunodominant regions of the SARS-CoV-2 surface protein and could potentially help in designing peptide-based diagnostics.

binding domains of S proteins of SARS-CoV-2 and SARS-CoV bind with similar affinities to human ACE2 11,12.

As the situation worsens, there is a growing need for suitable therapeutics and alternative diagnostics against SARS-CoV-2 to support effective disease management. Peptide-based diagnostic assays have become increasingly important, even indispensable, because of their advantages over conventional methods 13. The present study aimed to locate epitopes within a protein antigen that can elicit an immune response and could therefore be selected for synthesis as immunogenic peptides. Using a computational approach, we explored the SARS-CoV-2 S glycoprotein to identify immunodominant epitopes for the development of diagnostics.
The results could also help us understand how T and B cells respond to the SARS-CoV-2 surface protein.

Collection of targeted protein sequences
We downloaded the amino acid sequences of the SARS-CoV-2 S protein that were available at the time of the study (n=98) from the National Center for Biotechnology Information (NCBI) database.

Identification of potential peptides
To identify immunodominant regions, it is essential to select regions that are conserved within the S protein of SARS-CoV-2. All sequences were compared with one another for variability using the Protein Variability Server with the Shannon entropy method 14. The average solvent accessibility (ASA) profile of each sequence was predicted using the SABLE server 15. The BepiPred 1.0 linear epitope prediction module 16-18, incorporated in the Immune Epitope Database (IEDB) 19, was used to predict potential epitopes within the S protein, with the FASTA sequence of the target protein as input and all parameters at their default values.

The potential epitopes are represented by blue peaks, while green slopes represent non-epitopic regions (Figure 2).

The existence of B-cell linear and discontinuous (conformational) epitopes within the identified segments could help us identify peptides that can elicit an immune response 28. Using ElliPro (IEDB), we identified 18 linear epitopes, which together contain regions from 19 of our selected peptides (highlighted in red in Table 2). These B-cell linear epitopes are listed by position and score; epitopes with higher scores have greater potential for antibody binding. Five of our selected peptides (peptide numbers 3, 5, 19, 23, and 24 in Table 1) were not predicted to be linear B-cell epitopes.
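The Shannon-entropy variability screen described in the methods above can be sketched as follows. This is a minimal illustration, not the Protein Variability Server's implementation: it assumes the S protein sequences have already been aligned to equal length, computes the entropy H = -Σ p log2 p of the residues in each alignment column, and reports runs of fully conserved columns. The helper names, the zero-entropy cutoff, and the minimum stretch length are hypothetical choices; the study does not state the parameters it used to call a stretch conserved, and the toy sequences are made up.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    n = len(column)
    # H = -sum(p * log2 p), written as sum(p * log2(1/p)) so a conserved column gives 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def column_entropies(aligned_seqs):
    """Entropy at every position of a set of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [shannon_entropy([s[i] for s in aligned_seqs]) for i in range(length)]


def conserved_stretches(entropies, max_entropy=0.0, min_len=5):
    """0-based half-open (start, end) runs where every column has entropy <= max_entropy.

    max_entropy and min_len are hypothetical cut-offs chosen for illustration.
    """
    stretches, start = [], None
    for i, h in enumerate(entropies):
        if h <= max_entropy:
            if start is None:
                start = i  # a conserved run begins here
        else:
            if start is not None and i - start >= min_len:
                stretches.append((start, i))
            start = None
    # close a run that reaches the end of the alignment
    if start is not None and len(entropies) - start >= min_len:
        stretches.append((start, len(entropies)))
    return stretches


if __name__ == "__main__":
    # Toy alignment of made-up sequences (not real S protein data)
    toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDVFGHIKL", "ACDEFGHIKL"]
    h = column_entropies(toy)
    print([round(x, 3) for x in h])
    print(conserved_stretches(h, min_len=3))  # → [(0, 3), (4, 10)]
```

The single variable column (E in three sequences, V in one) has an entropy of about 0.81 bits and splits the alignment into two conserved stretches; loosening `max_entropy` would let mildly variable positions stay inside a stretch.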
Using the same module, we predicted B-cell discontinuous epitopes, which gave 16 epitope regions containing regions from 18 of our selected peptides (highlighted in red in Table 3). Six peptides (peptide numbers 3, 5, 14, 19, 23, and 24 in Table 1) were not predicted to be discontinuous B-cell epitopes. For further confirmation, we used the ABCpred server to detect B-cell epitopes with its default threshold of 0.51. It identified epitopes of various lengths and scores; the regions that contain our selected peptides are highlighted in red (Table 4). A higher score indicates a higher predicted likelihood of being an epitope, and most of our peptides scored above 0.7 and were predicted to be linear B-cell epitopes.

We used the IEDB server to predict the binding affinity of our selected peptides (Table 1) for human leukocyte antigen (HLA) alleles. As recommended by the IEDB server, reference HLA allele sets were used to predict MHC-I and MHC-II T-cell epitopes, as they provide broad population coverage. All predictions followed IEDB-recommended procedures, and our selected peptides showed good predicted binding affinities. Binding affinities for MHC-I T-cell epitopes are listed in Table 5, where a lower percentile rank indicates higher binding affinity; epitopes with a percentile rank below 1 were selected as very high-affinity binders. Regions from all but one of our selected peptides were predicted to be T-cell epitopes with high binding affinity for HLA-A and HLA-B alleles. Similarly, binding affinities for MHC-II T-cell epitopes are listed in Table 6, with regions from our selected peptides highlighted in red. About half of our selected peptides were predicted to be T-cell epitopes with high binding affinity for HLA-DRB and HLA-DP/DQ alleles.
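The cross-check between the two ElliPro predictions reported above (peptides 3, 5, 19, 23, and 24 absent from the linear epitopes; peptides 3, 5, 14, 19, 23, and 24 absent from the discontinuous epitopes) reduces to set arithmetic over the 24 peptide numbers of Table 1. The sketch below only restates those reported counts. The `overlaps` helper shows one plausible rule for deciding whether an epitope "contains regions from" a peptide (at least one shared residue); that rule is an assumption, since the study does not state its criterion, and the residue positions in the example are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based, inclusive residue ranges share at least one residue.

    Assumed rule for 'an epitope contains a region of a peptide'; not stated in the study.
    """
    return a_start <= b_end and b_start <= a_end


ALL_PEPTIDES = set(range(1, 25))            # peptide numbers 1-24 from Table 1
NOT_LINEAR = {3, 5, 19, 23, 24}             # not linear B-cell epitopes (ElliPro)
NOT_DISCONTINUOUS = {3, 5, 14, 19, 23, 24}  # not discontinuous B-cell epitopes (ElliPro)

linear = ALL_PEPTIDES - NOT_LINEAR
discontinuous = ALL_PEPTIDES - NOT_DISCONTINUOUS
both = linear & discontinuous                # supported by both ElliPro predictions
only_linear = linear - discontinuous         # supported by the linear prediction alone
neither = ALL_PEPTIDES - (linear | discontinuous)

if __name__ == "__main__":
    print(len(linear), len(discontinuous), len(both))  # → 19 18 18
    print(sorted(only_linear), sorted(neither))        # → [14] [3, 5, 19, 23, 24]
    # Hypothetical ranges: an epitope at residues 100-120 vs a peptide at 118-130
    print(overlaps(100, 120, 118, 130))                # → True
```

The counts of 19 and 18 match those reported for the linear and discontinuous predictions; every peptide supported by the discontinuous prediction is also supported by the linear one, and peptide 14 is the only peptide supported by the linear prediction alone.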
Overall, the regions identified in Table 1 not only showed good predicted B-cell and T-cell affinities, but most of them also overlapped with discontinuous epitopes (Table 3). The peptide segments identified from the 98 SARS-CoV-2 S glycoprotein sequences therefore appear to have reasonable potential to act as immunogens. Peptide-based success 36.

In our study, we predicted both B-cell and T-cell epitopes, as these confer immunity in different ways. We speculate that the identified epitopes with good predicted binding efficiency could act as immunodominant peptides. Sensitive and rapid peptide-based diagnostic kits are considered a better alternative to conventional serological tests that use the whole antigenic protein 13. This study could therefore support the use of the predicted peptides as immunogens for the development of diagnostics against SARS-CoV-2.

Figure 3.

Table 3. IEDB ElliPro predicted discontinuous epitopes for the spike protein of SARS-CoV-2. Sequences that match our selected peptides are marked in red.

Table 5. IEDB prediction of binding affinity for MHC-I alleles. Only our selected peptides with a percentile rank below 1.00 are shown; a lower percentile rank indicates higher binding affinity. Sequences that match our selected peptides are marked in red.
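The selection rules stated above (ABCpred scores compared against the default threshold of 0.51, and MHC predictions kept only when the percentile rank is below 1.00, as in the Table 5 caption) amount to simple filters over predictor output. The sketch below assumes the predictions have already been parsed into records; the record types, field names, and example values are hypothetical, it does not call the ABCpred or IEDB services, and counting a score exactly equal to 0.51 as a hit is an assumed boundary convention.

```python
from typing import List, NamedTuple


class BCellHit(NamedTuple):
    sequence: str
    start: int    # 1-based start position in the S protein
    score: float  # ABCpred-style score: higher means more likely to be an epitope


class MHCHit(NamedTuple):
    allele: str
    sequence: str
    percentile_rank: float  # IEDB-style percentile rank: lower means stronger binding


def b_cell_epitopes(hits: List[BCellHit], threshold: float = 0.51) -> List[BCellHit]:
    """Keep predictions scoring at or above the threshold, best score first."""
    kept = [h for h in hits if h.score >= threshold]
    return sorted(kept, key=lambda h: h.score, reverse=True)


def strong_mhc_binders(hits: List[MHCHit], max_rank: float = 1.0) -> List[MHCHit]:
    """Keep predictions with percentile rank strictly below max_rank, lowest rank first."""
    kept = [h for h in hits if h.percentile_rank < max_rank]
    return sorted(kept, key=lambda h: h.percentile_rank)


if __name__ == "__main__":
    # Made-up records for illustration only (not results from the study)
    b_hits = [BCellHit("PEPTIDE_A", 10, 0.74), BCellHit("PEPTIDE_B", 50, 0.40),
              BCellHit("PEPTIDE_C", 90, 0.91)]
    mhc_hits = [MHCHit("ALLELE_1", "PEPTIDE_A", 0.35), MHCHit("ALLELE_2", "PEPTIDE_B", 2.6),
                MHCHit("ALLELE_3", "PEPTIDE_C", 1.0)]
    print([h.sequence for h in b_cell_epitopes(b_hits)])       # → ['PEPTIDE_C', 'PEPTIDE_A']
    print([h.sequence for h in strong_mhc_binders(mhc_hits)])  # → ['PEPTIDE_A']
```

The two filters are kept separate because they point in opposite directions: a higher ABCpred score is better, whereas a lower percentile rank is better, so a record exactly at rank 1.0 is excluded.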
T1105, B:Q1106–B:Y1138 (all consecutive residues); C:A1087–C:I1130 (all consecutive residues)

Table 6. IEDB prediction of binding affinity for MHC-II alleles. Only our selected peptides with a percentile rank below 1.00 are shown; a lower percentile rank indicates higher binding affinity. Sequences that match our selected peptides are marked in red.