key: cord-0329345-8a4vhfqa authors: Swathika, R S; Vimal, S; Bhagyashree, E; Elumalai, Elakkiya; Gupta, Krishna Kant title: Peptide-based epitope design on non-structural proteins of SARS-CoV-2 date: 2021-12-28 journal: bioRxiv DOI: 10.1101/2021.12.27.474315 sha: 2feaac683f5125f767c0d389af432aac3927b021 doc_id: 329345 cord_uid: 8a4vhfqa

The SARS-CoV-2 virus has caused the severe COVID-19 pandemic, and since then it has been critical to produce a potent vaccine to curb rapid transmission and prevent further deaths. Among all types of vaccines, peptide-based epitope designs tend to stand out for their low production cost and greater efficacy. Therefore, we obtained the necessary SARS-CoV-2 protein sequences from the NCBI database and filtered them for antigenicity, virulence, pathogenicity and non-homology with the human proteome using available online tools and servers. The promising proteins were checked for common B- and T-cell epitopes. The structures of these proteins were modelled with the I-TASSER server, followed by refinement and validation. The predicted common epitopes were mapped onto the modelled protein structures using the Pepitope server. The surface-exposed epitopes were docked with the most common allele, DRB1*0101, using the GalaxyPepDock server. The epitopes ELEGIQYGRS from the leader protein (nsp1), YGPFVDRQTA from the 3C-like proteinase (nsp5), DLKWARFPKS from nsp9 and YQDVNCTEVP from the surface glycoprotein (spike protein) formed the most hydrogen bonds. Hence, these four epitopes could be considered the most promising and can be used in future studies.

autoimmune reactions. They need storage at a cold temperature and have low stability. Therefore, peptide-based vaccines may be a promising strategy, involving minimal microbial components to stimulate humoral and adaptive immunity against a microorganism.
These vaccines are able to target very specific epitopes, removing the risks associated with allergic and autoimmune responses [5].

SARS-CoV-2 contains four major structural proteins: the membrane protein, the spike protein, the envelope protein and the nucleocapsid protein. Early work on the new coronavirus has highlighted the importance of the surface glycoprotein (spike protein), also called the S protein, which acts as the key mediator of viral entry into the host cell [6]. In both SARS-CoV and SARS-CoV-2, the spike protein binds to a receptor, angiotensin-converting enzyme 2 (ACE2), which helps the virus enter host cells. It has been reported that the spike protein of the novel virus binds ACE2 more efficiently than that of SARS-CoV. Therefore, it spreads more easily and is more contagious. Many are working hard to find a potential vaccine to curb the global pandemic [7, 8].

Peptides have become one of the most important vaccine candidates given their comparatively easy production and design, chemical stability and lack of infectious potential[9,10,11

The exo-membrane topology of the proteins was analysed using the Pepitope server (http://pepitope.tau.ac.il/), which identified the position of each epitope in the predicted protein structure model. Under the epitope mapping algorithm, the "Combined" option was selected. This was done to ensure that the targeted epitopes are exposed on the surface and not hidden within the globular protein structure.

2.9. Epitope binding mode and interaction analysis:

The primary genomic sequence data of SARS-CoV-2 were fetched from the NCBI database. From the genomic data obtained, 38 protein sequences were collected in FASTA format.
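As a rough illustration of the collection step, the sketch below parses protein records in FASTA format and counts them. It is not the authors' code: the function name `parse_fasta` and the two toy records are hypothetical placeholders, not SARS-CoV-2 sequences.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]      # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


# Toy input for illustration only (hypothetical identifiers and residues).
example = """>toy_protein_1
MKTAYIAK
QRQISFVK
>toy_protein_2
MSDNGPQN
"""

proteins = parse_fasta(example.splitlines())
print(len(proteins))  # number of records parsed
```

In practice the input would be the FASTA file downloaded from NCBI, read line by line, and the record count could be checked against the 38 sequences reported above.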
Virulence is considered one of the main factors in studying pathogenesis, and the MP3 tool uses a combined SVM-HMM approach to predict pathogenic proteins with increased efficiency and reliability. Out of 38 proteins, 11 were found to be virulent (Table 1 & Supplementary Table 1).

For the candidate protein sequences, antigenicity was evaluated using the VaxiJen server. Antigenicity is the ability of a chemical structure to bind specifically to a group of products of adaptive immunity. The VaxiJen 2.0 server uses an
Protein sequences with an antigenicity value >0.4 were identified as antigenic and were subjected to further study. Similarly, 11 proteins were predicted to be antigenic. The nsp9 protein was the most antigenic (Table 1 & Supplementary Table 1).

It is necessary to assess similarity to the human proteome, because any similarity between human and viral proteins may lead to side effects, including autoimmunity. Hence, all 38 protein sequences were compared with the human proteome using the BLASTp tool with default parameters. Sequences returning "No significant similarity found" were considered non-homologous proteins. All 11 proteins were found to be non-homologous to the human proteome (Table 1 & Supplementary Table 1).

Whether these protein sequences lie on a transmembrane helix or outside the membrane was determined using the TMHMM server 2.0. This is necessary because proteins located in the core are less exposed and are not viable targets. Hence, only proteins located outside the membrane or in the transmembrane region were considered. In Table 1

After obtaining the final set of 11 proteins, B-cell epitopes were predicted for each of these sequences. A B-cell epitope is the part of a protein that binds to the antibody. Hence, it is necessary to find the B-cell epitopes for these 11 proteins.
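The protein screening steps above (virulence, antigenicity above 0.4, no human homolog, membrane or outside localisation) can be sketched as a simple filter. This is a minimal sketch, assuming the criteria are applied conjunctively. The class `ProteinScreen`, its field names and the toy entries are hypothetical and do not reproduce the values in Table 1; only the 0.4 cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

ANTIGENICITY_CUTOFF = 0.4  # threshold stated in the text


@dataclass
class ProteinScreen:
    name: str
    virulent: bool        # assumed to come from an MP3-style prediction
    antigenicity: float   # assumed to come from a VaxiJen-style score
    human_homolog: bool   # True if BLASTp reported a significant human hit
    exposed: bool         # True if outside or transmembrane per TMHMM


def passes_screen(p: ProteinScreen) -> bool:
    """A protein is kept only if it satisfies every criterion."""
    return (
        p.virulent
        and p.antigenicity > ANTIGENICITY_CUTOFF
        and not p.human_homolog
        and p.exposed
    )


def shortlist(proteins: Iterable[ProteinScreen]) -> List[str]:
    """Names of the proteins that pass the screen, in input order."""
    return [p.name for p in proteins if passes_screen(p)]


# Hypothetical entries for illustration only.
candidates = [
    ProteinScreen("toy_A", True, 0.62, False, True),
    ProteinScreen("toy_B", True, 0.35, False, True),   # below cut-off
    ProteinScreen("toy_C", False, 0.71, False, True),  # not virulent
    ProteinScreen("toy_D", True, 0.55, True, True),    # human homolog
]
print(shortlist(candidates))
```

Keeping each criterion as a separate field makes it easy to report which filter removed a given protein.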
The B-cell epitopes predicted for each of the 11 proteins are given in Supplementary Table 2

The common epitopes were checked for their inhibitory concentration (IC50) values, which predict epitope binding to major histocompatibility complexes (MHCs). For this, the MHCPred tool was used. On this server, each common epitope was checked for its binding affinity to the allele DRB1*0101, as it is considered an immunodominant allele. Common epitopes with an IC50 value <200 nM were selected and considered for epitope mapping on the protein structure. Of these, we considered only the epitopes whose predicted IC50 value was less than 100 nM (Table 2).

Table 3) and used for epitope mapping. The final epitopes were found on six proteins, namely the leader protein (nsp1) (Figure 1

After refinement, the selected models were validated using the SAVES server, which helps in evaluating the reliability of the protein model. The SAVES version 5.0 server from the NIH MBI laboratory is widely used for validation analysis. Model quality was evaluated using VERIFY3D and PROCHECK (Supplementary Table 3). From this server, the Ramachandran plot was also visualized for each protein (Figure 1-6). If a protein model had a large share of residues in disallowed regions, it was not considered further.

3.11 Peptide mapping

Out of the 11 protein sequences, after validating the structures from the Ramachandran plot, only 6 proteins (nsp1, nsp5, nsp8, nsp9, nsp9 (isoform) and nsp10) had final epitopes. The epitopes were mapped, and the antigenicity of those epitopes was predicted again. The eight surface-exposed, highly antigenic epitopes (Fig 7 a-h) were selected for the docking study with "DRB1*0101" (PDB ID "1AQD").

The best peptide-protein complexes were ranked based on interaction parameters, which include hydrogen-bonding pattern, similarity score and accuracy score.
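The epitope selection described earlier in this section (epitopes common to the B-cell and T-cell predictions, then an IC50 cut-off) can be sketched as follows. This is an illustrative sketch only: it assumes "common" means the identical peptide string appears in both prediction lists. The helper names, the placeholder peptides and the IC50 values are hypothetical; only the 200 nM and 100 nM cut-offs are taken from the text.

```python
from typing import Dict, Iterable, List


def common_epitopes(b_cell: Iterable[str], t_cell: Iterable[str]) -> List[str]:
    """Peptides present in both prediction lists, sorted alphabetically."""
    return sorted(set(b_cell) & set(t_cell))


def filter_by_ic50(
    ic50_nm: Dict[str, float], peptides: Iterable[str], cutoff_nm: float
) -> List[str]:
    """Keep peptides whose predicted IC50 (nM) is strictly below the cut-off."""
    return [p for p in peptides if p in ic50_nm and ic50_nm[p] < cutoff_nm]


# Placeholder peptides and IC50 values, for illustration only.
b_cell = ["AAAAAAAAAA", "GGGGGGGGGG", "SSSSSSSSSS"]
t_cell = ["GGGGGGGGGG", "SSSSSSSSSS", "TTTTTTTTTT"]
ic50 = {"GGGGGGGGGG": 150.0, "SSSSSSSSSS": 42.0}

shared = common_epitopes(b_cell, t_cell)
below_200 = filter_by_ic50(ic50, shared, 200.0)  # first cut-off in the text
below_100 = filter_by_ic50(ic50, shared, 100.0)  # stricter cut-off in the text
print(below_200, below_100)
```

Passing the cut-off as a parameter keeps both thresholds mentioned in the text available without duplicating the filter logic.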
To interpret the binding mode of the complexes, the LigPlot server was used; the results are shown in Supplementary Figure file 1 (Fig 8 a-g).

Among the hydrogen-bonding interactions obtained, the complexes with the maximum number of hydrogen bonds were taken into consideration, as more hydrogen bonds result in greater stability and more energetically favourable interactions. The epitopes with the most hydrogen bonds (ELEGIQYGRS and YGPFVDRQTA) can be considered the more promising epitopes (Table 3).

Initially, 38 protein sequences were considered from the whole genome of SARS-CoV-2; after applying the necessary constraints (antigenicity, virulence, non-homology to the human proteome, presence of the epitope on the membrane and essentiality of the protein), only 11 protein sequences remained. For these 11 proteins, linear B-cell and T-cell epitope prediction was performed and the common epitopes were identified. The protein structures for these sequences were modelled using I-TASSER, and the models were subjected to refinement and validation, after which only 6 proteins remained. For the spike protein, an available model from the PDB was used. For the 7 protein models, the antigenicity of their epitopes was evaluated, and the epitope with the highest value for each protein was taken forward to docking. These epitopes were docked with the DRB1*0101 allele using the GalaxyPepDock server, and the docked structures were analysed in LigPlot for hydrogen-bonding interactions, similarity and accuracy score. The hydrogen-bonding pattern between the epitopes and the allele was analysed, and the epitopes with more hydrogen bonds were considered more stable and energetically favourable. These epitopes could be used for further studies.

We sincerely acknowledge SASTRA Deemed to be University for providing the computational facility.
None

Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol
Chloroquine and hydroxychloroquine in coronavirus disease 2019 (COVID-19). Facts, fiction and the hype: a critical appraisal
Hydroxychloroquine-azithromycin for COVID-19 - Warranted or dangerous
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro
Vaccines (Basel)
Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19
Exoproteome and secretome derived broad spectrum novel drug and vaccine candidates in Vibrio cholerae targeted by Piper betel derived compounds
Designing a multi-epitope peptide based vaccine against SARS-CoV-2. Sci Rep
Designing a multi-epitope peptide based vaccine against SARS-CoV-2
the epitope-based peptide vaccine design strategy and studies against COVID19
Design of novel multiepitope constructs-based peptide vaccine against the structural S, N and M proteins of human COVID-19 using immunoinformatics analysis
Epitope-based peptide vaccine design and target site depiction against Ebola viruses: an immunoinformatics study
Editorial: Reverse Vaccinology. Front Immunol
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
DEG: a database of essential genes
Predicting linear B-cell epitopes using string kernels
I-TASSER: a unified platform for automated protein structure and function prediction
GalaxyWEB server for protein structure prediction and refinement
GalaxyPepDock: a protein-peptide docking tool based on interaction similarity and energy optimization
LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions