key: cord-0847870-0yqyclxk authors: Rahman, M. Shaminur; Hoque, M. Nazmul; Islam, M. Rafiul; Akter, Salma; Rubayet-Ul-Alam, A. S. M.; Siddique, Mohammad Anwar; Saha, Otun; Rahaman, Md. Mizanur; Sultana, Munawar; Hossain, M. Anwar title: Epitope-based chimeric peptide vaccine design against S, M and E proteins of SARS-CoV-2 etiologic agent of global pandemic COVID-19: an in silico approach date: 2020-03-31 journal: bioRxiv DOI: 10.1101/2020.03.30.015164 sha: 9d3a4c7e8c660f772f98ead796ab87aa6772d298 doc_id: 847870 cord_uid: 0yqyclxk

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing pandemic of coronavirus disease 2019 (COVID-19), a public health emergency of international concern declared by the World Health Organization (WHO). An immuno-informatics approach along with comparative genomics was applied to design a multi-epitope-based peptide vaccine against SARS-CoV-2 combining the antigenic epitopes of the S, M and E proteins. The tertiary structure was predicted, refined and validated using advanced bioinformatics tools. The candidate vaccine showed an average of ≥ 90.0% world population coverage for different ethnic groups. Molecular docking of the chimeric vaccine peptide with the immune receptors (TLR3 and TLR4) predicted efficient binding. Immune simulation predicted a significant primary immune response with increased IgM and a secondary immune response with high levels of both IgG1 and IgG2. The simulation also predicted increased proliferation of T-helper cells and cytotoxic T-cells, along with increased IFN-γ and IL-2 cytokines. Codon optimization and mRNA secondary structure prediction indicated that the chimera is suitable for high-level expression and cloning. Overall, the constructed recombinant chimeric vaccine candidate demonstrated significant potential and can be considered for clinical validation to fight against this global threat, COVID-19.
Linear epitope prediction based on solvent accessibility and flexibility revealed that chain A of the S protein possessed 15 epitopes, of which three epitopes at different residue positions (395-514, 58-194, 1067-1146) were highly antigenic (score > 0.8). The B and C chains had 18 and 19 epitopes, respectively; of these, three epitopes in chain B (residue positions 1067-1146, 89-194, 58-87) and two epitopes in chain C (residue positions 56-194, 1067-1146) appeared to be highly antigenic (score > 0.8) (Table 1). Amino acid residues 395-514 and 56-194 of the detected epitopes belonged to the RBD and NTD regions of the S protein, respectively. These regions were considered as the potential epitope candidates in the IEDB [...] protein possessed only one highly antigenic epitope (57-[...], antigenicity score = 0.449), which might be potentially functional in host cell binding (Table 2).

Furthermore, the Kolaskar and Tongaonkar antigenicity profiling found five highly antigenic epitopes in the RBD region with an average antigenicity score of 1.042 (minimum = 0.907, maximum = 1.214), and seven highly antigenic epitopes in the NTD with an average antigenicity score of 1.023 (minimum = 0.866, maximum = 1.213) (Supplementary Fig. 4, Supplementary Table 2). The average Kolaskar scores for the envelope protein B-cell epitope (EBE) and the membrane protein B-cell epitope (MBE) were 0.980 and 1.032, respectively (Supplementary Table 2). In addition, ABCPred analysis identified 18 and 11 B-cell epitopes in the RBD and NTD regions, with average antigenicity scores of 0.775 and 0.773, respectively (Supplementary Table 3).
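The score cut-off used above to call an epitope "highly antigenic" (score > 0.8) can be illustrated with a minimal Python sketch. The `Epitope` record, the `highly_antigenic` helper and the example values are hypothetical and purely illustrative; they are not taken from Table 1 or from any IEDB output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    """One predicted linear epitope (hypothetical record layout)."""
    chain: str
    start: int
    end: int
    score: float


def highly_antigenic(epitopes, threshold=0.8):
    """Return epitopes whose score is strictly above the threshold, best first."""
    kept = [e for e in epitopes if e.score > threshold]
    return sorted(kept, key=lambda e: e.score, reverse=True)


# Placeholder records for illustration only; these are NOT values from the paper.
example = [
    Epitope("A", 10, 25, 0.91),
    Epitope("A", 40, 52, 0.63),
    Epitope("B", 100, 130, 0.85),
]

selected = highly_antigenic(example)
for e in selected:
    print(f"chain {e.chain}: residues {e.start}-{e.end}, score {e.score:.3f}")
```

With real predictions, the same filter would be applied per chain to the score column of the prediction table before comparing the retained residue ranges across chains.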
The IEDB analysis resource tool was employed to identify T-cell epitopes in RBD [...] the IEDB MHC-II prediction tool generated 124 peptides (13-mer) from the RBD and 73 peptides (10-mer) from the NTD segments of the S protein that showed interaction with many different and/or common MHC-II alleles, with IC50 values ranging from 1.4 to 49.9 nM (Supplementary Data 1). Furthermore, for MHC-I and MHC-II processing, the IEDB analysis tool generates an overall score for each epitope's intrinsic potential of being a T-cell epitope based on proteasomal processing, TAP transport, and MHC-binding efficiency (Supplementary Data 1). The outcomes of these tools are informative because they draw on a vast number of HLA (human leukocyte antigen) alleles during computation.

From the epitopes selected from the RBD and NTD segments, the top five based on IC50 score were used in molecular docking analysis with their respective HLA allele binders using the GalaxyWeb server, in which they showed significantly favorable molecular interactions for binding. The docking complexes thus formed had significantly negative binding energies, and most of the aa residues of the epitopes were involved in molecular interactions with their respective HLA alleles (Supplementary Data 1). The epitope-HLA docking complexes were further refined with GalaxyRefineComplex, and their binding affinity was analyzed through the PRODIGY web server. All of the selected epitopes showed significantly negative binding free energies (ΔG ≤ -8.2 kcal mol⁻¹ in every case, average = -9.94 kcal mol⁻¹; Fig. 4, Supplementary Data 1).

The IFNepitope program suggested that both the target RBD and NTD regions of the S protein and the B-cell linear epitope (MBE) had a high probability of inducing IFN-γ release, with positive scores.
A total of 56 potential positive IFN-γ inducing epitopes (15-mer) were predicted for the RBD domain, with an average epitope prediction score of 0.255 and a maximum SVM score of 0.625. A total of 33 potential positive epitopes were predicted for the NTD domain, with an average epitope prediction score of 0.312 and a maximum SVM score of 0.811. Moreover, the M protein also possessed several IFN-γ inducing epitopes, with an average epitope prediction score of 0.980 (Supplementary Table 4 [...]

[...] (Fig. 5a, Supplementary Data 2). In addition to geographical distribution, ethnic group was also found to be an important determinant for good coverage of the CTL and HTL epitopes (Fig. 5b).

The CoV-RMEN peptide was predicted to contain 43.2% alpha helix, 67.4% beta sheet, and 12% turns (Fig. 6b, Supplementary Fig. 5) using the CFSSP (Chou and Fasman secondary structure prediction) server. In addition, with regard to solvent accessibility of aa residues, 34% were predicted to be exposed, 30% medium exposed, and 34% buried. Only 2 aa residues (0.0%) were predicted to be located in disordered domains by the RaptorX Property server (Supplementary Fig. 6). The Phyre2 server predicted the tertiary structure model of the designed chimeric protein using 5 templates (c5x5bB, c2mm4A, c6vsbB, c5x29B and c6vybB), selected by heuristics to maximize confidence, percent identity and alignment coverage. In the final model, 82% of the CoV-RMEN peptide was modelled with more than 90% confidence (Fig. 6c). Moreover, 65 residues were modelled ab initio.

The immune-stimulatory ability of the predicted vaccine CoV-RMEN was assessed using the C-ImmSim server. The analysis predicts the generation of adaptive immunity in the target host species (human) using position-specific scoring matrices (PSSM) and machine learning techniques for the prediction of epitopes and immune interactions 28.
The cumulative results of immune responses after three antigen exposures at four-week intervals revealed that the primary immune response against the antigenic fragments was elevated, as indicated by a gradual increase in IgM level after each antigen exposure (Fig. 7a). Besides, the secondary [...]

The ClusPro server was used to determine the protein binding and hydrophobic interaction sites on the protein surface. The immune responses of TLR3 and TLR4 against the vaccine construct (CoV-RMEN) were estimated by analyzing the overall conformational stability of the vaccine protein-TLR docked complexes. The active interface aa residues of the refined complexes of CoV-RMEN and TLRs were predicted (Fig. 8, Table 3). The relative binding free energies (ΔG) of the protein-TLR complexes were significantly negative (Table 3) [...] content is between 30% and 70% (Fig. 9a,b,c).

The minimum free energy of 25 structures of the chimeric mRNA was evaluated for the optimized sequences using the 'Mfold' server. The results showed that the best predicted structure for the optimized construct had ΔG = -386.50 kcal/mol. The first nucleotides at the 5' end did not form a long stable hairpin or pseudoknot. Therefore, the binding of ribosomes to the translation initiation site and the subsequent translation process can be readily accomplished in the target host. These outcomes were in agreement with data obtained from the 'RNAfold' web server (Fig. 9d,e), where the free energy was -391.37 kcal/mol.

After codon optimization and mRNA secondary structure analysis, the sequence of the [...] chimeric peptide vaccine production does not involve virus replication and therefore reduces the cost of production. Hence, a low-cost strategy should be adopted for developing a vaccine that is in high demand for mankind.
Heterologous expression of any vaccine candidate protein has very promising scope for developing such a low-cost vaccine, provided that all essential properties for antigenicity, immunogenicity and functional configuration are conserved to mimic the structural and functional properties of the actual antigen 34. Construction of a vaccine candidate with multiple potential epitopes can potentiate the multi-valency of the antigen to develop an immune response against a number of epitopes of any pathogen. Also, rational engineering of epitopes for increased potency and magnitude, the ability to enhance the immune response to conserved epitopes, increased safety, the absence of unnecessary viral materials, and cost effectiveness cumulatively constitute the potential benefits of multi-epitope recombinant protein-based vaccines 20. This study was designed to assist with the initial phase of multi-epitope vaccine candidate selection, and thereby to support safe and effective vaccine development by recommending epitopes that may potentially be considered for incorporation in vaccine design for SARS-CoV-2.

Vaccine design is improved through the use of specialized spacer sequences 39. In designing CoV-RMEN (the vaccine candidate), GG and EGGE linkers were incorporated between the predicted epitopes to produce sequences with minimized junctional immunogenicity, thereby allowing the rational design and construction of a potent multi-epitope vaccine 21,38. The molecular weight of our vaccine candidate, CoV-RMEN, is 46.8 kDa with a predicted theoretical pI of 8.71, indicating that the protein is basic in nature. Also, the predicted instability index indicates that the protein will be stable upon expression, further strengthening its potential for use. The aliphatic index showed that the protein contains aliphatic side chains, indicating potential hydrophobicity.
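For readers unfamiliar with the aliphatic index mentioned above, the following Python sketch shows how it is commonly computed, assuming the widely used definition of Ikai (1980): AI = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name and the toy peptide are hypothetical; the CoV-RMEN sequence and its reported index are not reproduced here, and this is not claimed to be the exact tool or code path used in the study.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index of a protein sequence, assuming Ikai's (1980) definition.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X(aa) is the mole percent (0-100) of residue aa in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        # Share of this residue among all residues, expressed in percent.
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Toy peptide for illustration only (hypothetical, not part of CoV-RMEN).
toy_peptide = "AVILGGEGGE"
print(f"Aliphatic index of toy peptide: {aliphatic_index(toy_peptide):.1f}")
```

A higher value reflects a larger relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu), which is why the index is often read as a rough indicator of thermostability.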
All these parameters indicate that the recombinant protein is thermally stable and hence would be well suited for use in different endemic areas worldwide 6,21.

Knowledge of the secondary and tertiary structures of the target protein is essential in vaccine design 39,40. Secondary structure analysis of CoV-RMEN indicated that the protein consisted of 43.2% alpha helix, 67.4% beta sheet, and 12% turns, with only 2 residues disordered.

[...] all of which showed significant antigenic properties compared to any other viral proteins. This chimera also includes potential CTL, HTL and B-cell epitopes to ensure humoral as well as cellular immune responses, and the optimal expression and stability of the chimera were validated. Given the multiple limitations and high cost requirements of attenuated vaccine preparation for contagious agents like SARS-CoV-2, this chimeric peptide vaccine candidate offers hope of an available and relatively cheap option that can reach the entire world. CoV-RMEN could be a very effective measure against COVID-19 globally. Hence, it could be cloned, expressed and tried for in vivo validation and animal trials at the laboratory level.

A total of 250 partial and complete genome sequences of SARS-CoV-2 were retrieved from NCBI (Supplementary Table 5 [...]

We employed both structure- and sequence-based methods for B-cell epitope prediction. Conformational B-cell epitopes on the S protein were predicted by ElliPro (Antibody Epitope Prediction tool; http://tools.iedb.org/ellipro/), available in the IEDB analysis resource 51, with the minimum score value set at 0.4 and the maximum distance set at 6 Å. ElliPro allows the prediction and visualization of B-cell epitopes in a given protein sequence or structure. The ElliPro method is based on the location of the residue in the protein's three-dimensional (3D) structure.
ElliPro implements three algorithms to approximate the protein shape as an ellipsoid, calculate the residue protrusion index (PI), and cluster neighboring residues based on their PI values. A PI of 0.9 is assigned to residues lying outside the ellipsoid that covers 90% of the inner core residues of the protein 23. The antigenicity of the full-length S (spike glycoprotein), M (membrane protein) and E (envelope protein) proteins was predicted using [...]

The GalaxyRefine server was further used to improve the local structural quality of the CoV-RMEN model according to the CASP10 assessment, and ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) was used to calculate an overall quality score for the input structure, which is displayed in the context of all known protein structures. The ERRAT server (http://services.mbi.ucla.edu/ERRAT/) was also used to analyze non-bonded atom-atom interactions in comparison with reliable high-resolution crystallographic structures. A Ramachandran plot was obtained through the RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php). The server uses the PROCHECK principle to validate a protein structure by means of a Ramachandran plot, and produces separate plots for glycine and proline residues 64.

To further characterize the immunogenicity and immune response profile of CoV-RMEN, in silico immune simulations were conducted using the C-ImmSim server [...]

The red, cyan, and yellow colored regions represent the potential antigenic domains predicted by the IEDB analysis resource ElliPro analysis.

[...] GalaxyWEB-GalaxyPepDock-server followed by the refinement using GalaxyRefineComplex and free energy (ΔG) of each complex was determined in PRODIGY server. Ribbon structures represent HLA alleles and stick structures represent the respective epitopes. Light color represents the templates to which the alleles and epitopes structures were built.
Further information on molecular docking analysis is also available in Supplementary Data 1.

Novel Coronavirus (2019-nCoV) situation reports - World Health Organization (WHO)
Epitopes Vaccine Prediction against Severe Acute Respiratory Syndrome (SARS)
A new coronavirus associated with human respiratory disease in China
In Silico Prediction of a Novel Universal Multi-epitope Peptide Vaccine in the Whole Spike Glycoprotein of MERS CoV
Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen
Epitope based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: an immune-informatics study
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
Design of multi epitope-based peptide vaccine against E protein of human 2019-nCoV: An immunoinformatics approach
Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
The outbreak of SARS-CoV-2 pneumonia calls for viral vaccines
Return of the coronavirus: 2019-nCoV
Structural definition of a neutralization epitope on the N-terminal domain of MERS-CoV spike glycoprotein
Structural Definition of a Neutralization-sensitive Epitope on the MERS-
The expression of membrane protein augments the specific responses induced by SARS-CoV nucleocapsid DNA immunization
Coronavirus envelope protein: current knowledge
Genomic characterization of the 2019 novel human-pathogenic
Coronavirus infections and immune responses
Epitope-Based Vaccine Target Screening against Highly Pathogenic MERS-CoV: An In Silico Approach Applied to Emerging Infectious Diseases
In-silico design of a multi-epitope vaccine candidate against onchocerciasis and related filarial diseases
Recent Advances in the Vaccine Development Against Middle East
Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-The latest 2019 novel coronavirus outbreak in Wuhan
Expression of SARS-coronavirus nucleocapsid protein in Escherichia coli and Lactococcus lactis for serodiagnosis and mucosal vaccination
Potential rapid diagnostics, vaccine and therapeutics for 2019 novel -ncoV): a systematic review
SARS coronavirus and innate immunity
Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis
Structural modeling of 2019-novel coronavirus (nCoV) spike protein reveals a proteolytically-sensitive activation loop as a distinguishing feature compared to SARS-CoV and related SARS-
CD4(+) Th1 cells promote CD8(+) Tc1 cell survival, memory response, tumor localization and therapy by targeted delivery of interleukin 2 via acquired pMHC I complexes
PROCHECK: a program to check the stereochemical quality of protein structures
iPBAvizu: a PyMOL plugin for an efficient 3D protein structure superimposition approach
Protein identification and analysis tools on the ExPASy server
Reliable B cell epitope predictions: Impacts of method development and improved benchmarking
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
Improved method for predicting linear B-cell epitopes
Prediction of continuous B cell epitopes in an antigen using recurrent neural network
A semi empirical method for prediction of antigenic determinants on protein antigens
PRODIGY: a web server for predicting the binding affinity of protein-protein complexes
Predicting population coverage of T-cell epitope-based diagnostics and vaccines
Resistome diversity in bovine clinical mastitis microbiome, a signature concurrence
Designing of interferon-gamma inducing MHC class-II binders
Prototype Alzheimer's disease vaccine using the immunodominant B cell epitope from β-amyloid and promiscuous T cell epitope pan HLA DR-binding peptide
Co-expression of the C-terminal domain of Yersinia enterocolitica invasin enhances the efficacy of classical swine-fever-vectored vaccine based on human adenovirus
RaptorX server: a resource for template-based protein structure modeling
Improving the physical realism and structural accuracy of protein models by a two-step atomic-level energy minimization
Structure validation by Cα geometry: φ, ψ and Cβ deviation
The HADDOCK web server for data-driven biomolecular docking
Synonymous codon usage pattern in glycoprotein gene of rabies virus

The authors declare no competing interests.