key: cord-303069-ss6g3jkg authors: Jakhar, Renu; Gakhar, S.K title: An Immunoinformatics Study to Predict Epitopes in the Envelope Protein of SARS-COV-2 date: 2020-05-26 journal: bioRxiv DOI: 10.1101/2020.05.26.115790 sha: doc_id: 303069 cord_uid: ss6g3jkg

COVID-19 is a newly emerged human viral disease caused by a novel strain of coronavirus. The virus has affected millions of people worldwide. We aimed to design a peptide vaccine for COVID-19 targeting the envelope protein, using computational methods to predict epitopes that stimulate the immune system and could later be used to create a peptide vaccine as an alternative to conventional vaccines. A total of 370 available SARS-CoV-2 sequences were retrieved from NCBI for bioinformatics analysis, and the Immune Epitope Database (IEDB) was used to predict B and T cell epitopes. We then docked the best predicted CTL epitopes with HLA alleles. The CTL epitopes interacted with MHC class I alleles, and we propose them as candidates for a universal peptide-based vaccine against COVID-19. Potentially continuous B cell epitopes were predicted using tools from IEDB. The allergenicity of the predicted epitopes was analyzed with the AllerTOP tool, and population coverage was determined worldwide. We found these CTL epitopes to be T helper epitopes as well. The B cell epitope SRVKNL and the T cell epitope FLAFVVFLL are proposed as universal candidates for a peptide-based vaccine against COVID-19. We hope to confirm our findings with complementary in vitro and in vivo studies that support these predicted candidates.

The coronavirus has brought much of the world to a standstill. The virus is taking the lives of thousands of people every day and affecting millions of people around the globe.
The disease was first reported in the city of Wuhan, China, where the virus was isolated from a patient with respiratory symptoms in December 2019 [1, 2]; it was later named COVID-19 [3]. The World Health Organization (WHO) declared the disease a pandemic after it spread from China to more than a hundred countries. By May 25, 2020, the disease had struck more than a million people, thousands of whom died from COVID-19 infection; the majority of deaths were reported from China, Italy, the United States of America, Britain and Spain. Coronaviruses are a large group of viruses belonging to the family Coronaviridae and the order Nidovirales that are common among animals [4]. The Coronaviridae family is divided into four genera based on genetic properties: Alpha-, Beta-, Gamma- and Deltacoronavirus [5]. The 2019-nCoV is an enveloped, positive-sense RNA Betacoronavirus with a genome of 29.9 kb [6]. Coronaviruses are zoonotic, transmitted from animals to humans [7]. COVID-19 affects the respiratory system (lungs and airways). Most COVID-19 patients developed severe acute respiratory illness with symptoms of fever, cough, and shortness of breath. Most reported cases of COVID-19 have been linked to travel to or residence in countries in this region [8, 9].

Presently there are no clinically approved vaccines available for this disease. A new vaccine for this emergent strain, developed through therapeutic and preventive approaches, could be readily applied to save human lives. The use of peptides or epitopes as therapeutics is a good strategy because of advances in their design, stability, and delivery [11, 12]. Moreover, peptides are of growing importance in vaccine design, where immunogenic CTL, HTL and B cell epitopes are predicted from tissue-specific proteins of organisms [13, 14].
Among the structural proteins of SARS-CoV-2, the envelope (E) protein is a small integral membrane protein involved in the life cycle of the virus. It takes part in envelope formation as well as in assembly, budding, and pathogenesis. It is therefore considered a promising target for effective COVID-19 vaccine design [15]. More importantly, T-cell-based cellular immunity is essential for clearing SARS-CoV-2 infection because it is memory based [16, 17]. In addition, the E protein has a low mutation rate; a highly conserved protein that can elicit both cellular immunity and neutralizing antibodies against COVID-19 is necessary for efficient vaccine development [18, 19]. Therefore, in this study, an immunoinformatics-based approach was adopted to identify candidate epitopes in the envelope protein of SARS-CoV-2 that could activate a significant cellular and humoral immune response [20, 21]. The aim of this study is to analyze envelope protein sequences using in silico approaches, looking for conservancy, and then to predict all potential epitopes that could be used, after in vitro and in vivo confirmation, as a therapeutic peptide vaccine [22, 23, 24].

The envelope protein sequence of the severe acute respiratory syndrome coronavirus 2 Indian isolate (SARS-CoV-2/166/human/2020/IND), accession no. QIA98585.1, was retrieved from the NCBI. The antigenicity of this sequence was predicted with the VaxiJen v2.0 server [25] using default parameters. In the present study, the envelope protein was found to be a potential antigenic protein with a good antigenicity score. A total of 370 envelope protein sequences deposited in the NCBI database up to 12 April 2020 were retrieved. These 370 sequences were collected from different parts of the world; the sequences and their accession numbers are listed in the supplementary file.
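The conservancy screen mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the study; it only assumes that conservancy is measured as the fraction of retrieved sequences containing an epitope as an exact substring. The function name `conservancy`, the toy sequence set, and the reference E protein string (typed in here as an assumption, to be replaced by the retrieved FASTA record) are illustrative.

```python
from typing import Iterable

# Assumed 75-residue reference E protein sequence; replace with the record
# actually retrieved from NCBI before drawing any conclusions.
E_PROTEIN = (
    "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYS"
    "RVKNLNSSRVPDLLV"
)


def conservancy(epitope: str, sequences: Iterable[str]) -> float:
    """Return the fraction of sequences containing the epitope exactly."""
    seqs = list(sequences)
    if not seqs:
        raise ValueError("no sequences supplied")
    hits = sum(1 for s in seqs if epitope in s)
    return hits / len(seqs)


# Hypothetical toy set: two reference copies and one variant carrying a
# substitution inside the B cell epitope.
toy_sequences = [
    E_PROTEIN,
    E_PROTEIN,
    E_PROTEIN.replace("SRVKNL", "SRVKNF"),
]

if __name__ == "__main__":
    for epitope in ("FLAFVVFLL", "SRVKNL"):
        print(epitope, round(conservancy(epitope, toy_sequences), 3))
```

Exact matching is the strictest possible criterion; a real screen would usually run on the aligned sequences and might tolerate a defined identity threshold instead.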
Multiple sequence alignment of the envelope protein sequences was carried out with ClustalW. The 3D structure of the envelope protein was obtained with SWISS-MODEL, which uses homology detection methods to build 3D models [26]. UCSF Chimera was used to visualize and energy-minimize the 3D structures [27], and structure validation was carried out with SAVES [28]. Homology modelling was performed to enable conformational B cell epitope prediction, to verify the surface accessibility and hydrophilicity of the predicted B lymphocyte epitopes, and to visualize all predicted T cell epitopes at the structural level.

A B cell epitope is the portion of an immunogen that interacts with B lymphocytes. As a result of this interaction, the B lymphocyte differentiates into antibody-secreting plasma cells and memory cells. The IEDB resource was used for this analysis. The envelope protein was subjected to the BepiPred linear epitope prediction [29], Emini surface accessibility [30], Kolaskar and Tongaonkar antigenicity [31], Parker hydrophilicity [32], Chou and Fasman beta turn [33] and Karplus and Schulz flexibility [34] prediction methods in IEDB, which predict the probability of specific regions of the protein binding to the B cell receptor, being on the surface, being immunogenic, being in a hydrophilic region, being in a beta turn region and being in a flexible region, respectively. Potentially continuous B cell epitopes were predicted using the ElliPro tool from the IEDB resource [35]. The allergenicity of the predicted epitopes was analyzed with the AllerTOP tool [36]. The ToxinPred server was used to assess the toxicity of the epitopes [37].

T cell epitopes were predicted with the NetCTL server [38]. The parameter was set at 50, giving specificity and sensitivity of 0.94 and 0.89, respectively, and all supertypes were included when the protein sequence was submitted.
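Several of the B cell propensity methods named above (surface accessibility, hydrophilicity, flexibility) share one idea: a per-residue scale is averaged over a sliding window along the sequence. The sketch below shows only that idea. It swaps in the Kyte-Doolittle hydropathy scale (positive values are hydrophobic) as a stand-in because its values are widely tabulated; it is not the Parker scale or any other scale used by the IEDB tools, and `window_profile` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy values (stand-in scale, not the Parker scale).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(
    seq: str, scale: Dict[str, float], window: int = 7
) -> List[Tuple[int, float]]:
    """Average a residue scale over a sliding window.

    Returns (position, mean) pairs, where position is the 1-based index of
    the window's centre residue (left-of-centre for even window sizes).
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [scale[residue] for residue in seq]
    half = (window - 1) // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Whole-peptide averages for the two epitopes proposed in the text.
    for peptide in ("SRVKNL", "FLAFVVFLL"):
        _, mean = window_profile(peptide, KYTE_DOOLITTLE, len(peptide))[0]
        print(peptide, round(mean, 2))
```

On this stand-in scale the B cell epitope averages below zero (hydrophilic) and the T cell epitope well above zero (hydrophobic), which is the kind of contrast a hydrophilicity profile is meant to expose.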
A combined algorithm of Major Histocompatibility Complex class I (MHC-I) binding, Transporter associated with Antigen Processing (TAP) transport efficiency and proteasomal cleavage efficiency was used to predict the overall scores [39]. On the basis of the combined score, the five best epitopes were selected for further testing as putative epitope vaccine candidates. MHC-I binding T cell epitopes were predicted in IEDB using the stabilized matrix method (SMM) for each peptide [40]. Prior to prediction, all epitope lengths were set to 9-mers, and conserved epitopes that bind to many HLA alleles with a percentile rank of 1.0 or less were selected. For further analysis, alleles with an IC50 below 200 nM were selected. Peptides with higher immunogenicity are more likely to be CTL epitopes than those with lower immunogenicity. Therefore, the IEDB immunogenicity prediction tool was used to predict the immunogenicity of the candidate epitopes [41].

Peptide binding to MHC class II molecules was assessed with the IEDB MHC-II prediction tool, using the SMM-based NetMHCIIpan 3.0 server [42]. It covers all HLA class II alleles, including HLA-DR, HLA-DQ, and HLA-DP [43]. An IC50 below 200 nM indicates the strongest interaction between an HTL epitope and an MHC-II allele [44]. Accordingly, the five top epitopes were selected. The predicted HTL epitopes were submitted to the IFN epitope server to check whether the MHC-II binding epitopes had the ability to induce IFN-γ [45].

All potential MHC-I and MHC-II binders from the envelope protein were assessed for population coverage against the whole world population in which COVID-19 cases had been reported. Calculations were carried out for the selected MHC-I and MHC-II interacting alleles with the IEDB population coverage calculation tool [46].

Epitopes predicted to bind MHC-I alleles with a percentile rank below 0.5 were selected as ligands and modeled using the PEP-FOLD online peptide modeling tool [47].
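The selection thresholds described above (9-mers, percentile rank of 1.0 or less, IC50 below 200 nM, preference for peptides that bind many alleles) can be sketched as a simple filter. This is an illustration under stated assumptions, not the IEDB tool itself: the `BindingPrediction` record, the helper names, and every numeric value in the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class BindingPrediction:
    """One peptide-allele prediction row (hypothetical record layout)."""
    peptide: str
    allele: str
    percentile_rank: float
    ic50_nm: float


def select_binders(
    predictions: Iterable[BindingPrediction],
    max_rank: float = 1.0,
    max_ic50_nm: float = 200.0,
    length: int = 9,
) -> List[BindingPrediction]:
    """Keep 9-mers with rank <= max_rank and IC50 < max_ic50_nm."""
    kept = [
        p for p in predictions
        if len(p.peptide) == length
        and p.percentile_rank <= max_rank
        and p.ic50_nm < max_ic50_nm
    ]
    return sorted(kept, key=lambda p: p.ic50_nm)


def alleles_per_peptide(
    predictions: Iterable[BindingPrediction],
) -> Dict[str, Set[str]]:
    """Group the alleles each peptide is predicted to bind."""
    grouped: Dict[str, Set[str]] = {}
    for p in predictions:
        grouped.setdefault(p.peptide, set()).add(p.allele)
    return grouped


# Hypothetical example rows; the numbers are made up for illustration.
example = [
    BindingPrediction("FLAFVVFLL", "HLA-A*02:01", 0.2, 15.0),
    BindingPrediction("FLAFVVFLL", "HLA-C*03:03", 0.8, 120.0),
    BindingPrediction("LLFLAFVVF", "HLA-B*15:01", 0.9, 450.0),
    BindingPrediction("VLLFLAFVV", "HLA-A*02:01", 2.5, 90.0),
]

if __name__ == "__main__":
    binders = select_binders(example)
    for peptide, alleles in alleles_per_peptide(binders).items():
        print(peptide, sorted(alleles))
```

Both thresholds are applied jointly here; a peptide that passes the rank cut but exceeds the IC50 cut (or the reverse) is dropped.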
The 3D structure of the receptor MHC-I allele was obtained from the PDB server [48]. The PatchDock program was used for all dockings [49]. PyMOL and Chimera were used for visualization, for determination of binding affinity, and to show the epitopes that bind with the lowest energy.

The envelope protein sequence of the SARS-CoV-2 Indian isolate, retrieved in FASTA format, was screened with the VaxiJen server to predict its immunogenicity. The envelope protein (QIA98585.1) was predicted to be antigenic based on the overall VaxiJen score, indicating that it is an immunogenic protein. A total of 370 envelope protein sequences retrieved from the NCBI database were aligned to check the conservation of the predicted epitopes. B and T cell epitopes were predicted with the IEDB analysis resource, and population coverage was calculated.

The three-dimensional structure of the SARS-CoV-2 envelope protein was modelled with the homology modelling tool SWISS-MODEL (Fig. 1). A good model was obtained using PDB ID 5X29 as the template, which has more than 91% identity and 54% similarity with the query structure. The models were energy minimized using Chimera. The Ramachandran plot and ProSA Z-score validation (Fig. 2) indicated that more than 86% of residues of the modelled envelope protein lie in the favoured region.

Conformational B cell epitopes were also obtained in the five chains of the envelope protein using ElliPro. ElliPro assigns each output epitope a score, which is the Protrusion Index (PI) value averaged over the epitope residues [50]. ElliPro approximates the tertiary structure of the protein with ellipsoids. The highest probability of a conformational epitope was calculated at 76% (PI score: 0.76). The residues involved in conformational epitopes, their number, location and scores are shown in Table 1; residues 60-SRVKNL-65 were found to have the highest PI score.
This epitope is antigenic, nonallergenic, nontoxic, and conserved in SARS-CoV-2. The positions of the epitopes on the 3D structures are also shown.

The envelope protein of SARS-CoV-2 was analyzed with the IEDB MHC-I binding prediction tool to predict T cell epitopes that interact with different MHC class I alleles. Using NetCTL and the SMM-based IEDB MHC-I binding prediction tool, epitopes with high affinity (IC50 below 200 nM) were predicted to interact with different MHC-I alleles. The proteasome score, TAP score, MHC score, processing score and MHC-I binding are summarized as a total score in Table 2. Among these five T cell epitopes, the 9-mer epitope FLAFVVFLL had the highest immunogenicity and a larger number of allelic interactions, with better population coverage, than the other epitopes.

In the same way, T cell epitopes from SARS-CoV-2 were analyzed with the IEDB MHC-II binding prediction method, based on SMM-based NetMHCIIpan, with an IC50 below 200 nM. The top five predicted epitopes (peptide cores) that interact with MHC-II alleles were found to be nonallergenic and antigenic (Tables 2 and 3).

Epitopes predicted to interact with MHC-I and MHC-II alleles (especially high-affinity epitopes and those that can bind to a diverse set of alleles) were selected for population coverage analysis. The population coverage results for all epitopes are listed in Tables 2 and 3. The FLAFVVFLL epitope, which interacts with the most frequent MHC class I and II alleles, gave a high coverage percentage for the whole world population with the IEDB population coverage tool.
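To make the notion of population coverage concrete, the following sketch shows one common simplification, not the IEDB tool's actual algorithm. It assumes Hardy-Weinberg proportions, so that the fraction of individuals carrying at least one epitope-binding allele at a locus is 1 - (1 - f)^2, where f is the summed frequency of those alleles, and it assumes loci are independent when combining them. The function names and the allele frequencies are hypothetical.

```python
from math import prod
from typing import Iterable


def locus_coverage(allele_frequencies: Iterable[float]) -> float:
    """Fraction of individuals with >= 1 binding allele at one locus.

    Assumes Hardy-Weinberg proportions: an individual lacks every binding
    allele with probability (1 - f) ** 2, where f is the summed frequency.
    """
    total = sum(allele_frequencies)
    if not 0.0 <= total <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(locus_coverages: Iterable[float]) -> float:
    """Combine per-locus coverages, assuming the loci are independent."""
    return 1.0 - prod(1.0 - c for c in locus_coverages)


if __name__ == "__main__":
    # Hypothetical frequencies of binding alleles at two loci.
    class_i = locus_coverage([0.10, 0.20])
    class_ii = locus_coverage([0.10])
    print(round(class_i, 4), round(class_ii, 4))
    print(round(combined_coverage([class_i, class_ii]), 4))
```

Under these assumptions, adding a second locus can only raise coverage, which is why combined class I and II figures exceed either class alone.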
The maximum combined class I and II population coverage for this proposed epitope (84.88%) was found in North America (Table 4), followed by Europe (83.87%) and East Asia (81.61%), then South Asia (65.17%) and North Africa (65.31%), and then Northeast Asia (64.8%) and Southeast Asia (61.1%). Table 4 also lists the combined MHC class I and II coverage for the other regions.

Fig. 5. Proposed T cell epitope FLAFVVFLL (green) and B cell epitope SRVKNL (yellow) in the pentamer structure of the E protein of SARS-CoV-2.

The predicted T cell epitope FLAFVVFLL, which interacted with the selected human MHC-I and MHC-II alleles, was used as the ligand (Fig. 6) to examine its interaction with the alleles (receptors) by docking with the online software PatchDock. After docking with PatchDock, the docking results were refined and re-scored with the FireDock server, which generates global (binding) energies for the best solutions. Chimera was used to visualize the best results. The 3D structure of the epitopes was predicted with PEP-FOLD, and energy minimization was carried out using Chimera. The pose with the lowest binding energy (in kcal/mol) was selected as the best binding pose, in order to identify likely CTL and HTL epitopes. The receptors used for the docking studies were reported HLAs: HLA-C*03:03 (PDB ID: 1EFX) for class I and HLA-DRB1*01:01 (PDB ID: 1AQD) for class II. HLA-C*03:03 and HLA-DRB1*01:01 were observed to interact with the FLAFVVFLL epitope with low binding energies of -50.11 kcal/mol and -73.72 kcal/mol, respectively (Figs. 7 and 8). The predicted peptide showed significant binding affinities with all HLAs. The binding energies of the predicted epitopes were also compared with those of experimentally verified peptides and were found to be negative [16].
In this study, we aimed to determine the most promising immunogenic epitopes for B and T cells. Because the T cell immune response is longer lasting than the B cell response, where the antigen can easily escape the antibody memory response [18], and because CD8+ and CD4+ T cell responses play a major role in antiviral immunity [16], designing a vaccine around T cell epitopes is much more promising. The FLAFVVFLL epitope could be used as a potential candidate because it had the maximum combined score and immunogenicity score. Moreover, it bound the largest number of HLA alleles among the CTL and HTL epitopes. This epitope was found to be antigenic, nontoxic and nonallergenic. An ideal epitope should be highly conserved. Conservancy analysis indicated that this epitope is conserved in all SARS-CoV-2 sequences considered in this study. We found these CTL epitopes to be HTL epitopes as well. The overlap between MHC class I and II T cell epitopes suggests that the antigen can be presented to immune cells via both the MHC class I and class II pathways, especially the overlapping sequences.

To conclude, from the E protein, one epitope, SRVKNL, is proposed as a B cell epitope for an international therapeutic peptide vaccine. For T cells, the FLAFVVFLL epitope is highly recommended as a therapeutic peptide vaccine that interacts with both MHC class I and II. We recommend in vitro and in vivo validation of the efficacy and efficiency of these predicted candidate epitopes as a vaccine, as well as their evaluation for use in a diagnostic screening test.

Author contribution: Renu Jakhar conducted the study, performed the in silico analysis and wrote the manuscript. S.K. Gakhar planned the study and revised the manuscript.
Outbreak of Pneumonia of Unknown Etiology in Wuhan China: the Mystery and the Miracle
A new coronavirus associated with human respiratory disease in China
The 2019-new coronavirus epidemic: evidence for virus evolution
Emerging coronaviruses: genome structure, replication, and pathogenesis
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Cross-species transmission of the newly identified coronavirus 2019-nCoV
Recent advances in the detection of respiratory virus infection in humans
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - The latest 2019 novel coronavirus outbreak in Wuhan
Epitope-based vaccine target screening against highly pathogenic MERS-CoV: an in silico approach applied to emerging infectious diseases
Structural basis of development of multi-epitope vaccine against middle east respiratory syndrome using in silico approach
Epitope-based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: an immune-informatics study
Recent Advances in the Vaccine Development Against Middle East Respiratory Syndrome-Coronavirus
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
Coronavirus envelope protein: current knowledge
The membrane protein of severe acute respiratory syndrome coronavirus acts as a dominant immunogen revealed by a clustering region of novel functionally and structurally defined cytotoxic T-lymphocyte epitopes
A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2
Analysis of the genome sequence and prediction of B-cell epitopes of the envelope protein of Middle East respiratory syndrome-coronavirus
The membrane protein of severe acute respiratory syndrome coronavirus functions as a novel cytosolic pathogen-associated molecular pattern to promote beta interferon induction via a Toll-like-receptor-related TRAF3-independent mechanism
Exceptionally potent neutralization of Middle East respiratory syndrome coronavirus by human monoclonal antibodies
Evaluation of candidate vaccine approaches for MERS-CoV
A decade after SARS: strategies for controlling emerging coronaviruses
More than one reason to rethink the use of peptides in vaccine design
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
SWISS-MODEL: homology modelling of protein structures and complexes
UCSF Chimera, a visualization system for exploratory research and analysis
Stereochemistry of polypeptide chain configurations
Prediction of residues in discontinuous B-cell epitopes using protein 3D structures
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide
A semi-empirical method for prediction of antigenic determinants on protein antigens
New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X ray-derived accessible sites
Prediction of the secondary structure of proteins from their amino acid sequence
Prediction of chain flexibility in proteins
Protection from Ebola virus mediated by cytotoxic T lymphocytes specific for the viral nucleoprotein
AllerTOP - a server for in silico prediction of allergens
Open Source Drug Discovery Consortium. In silico approach for predicting toxicity of peptides and proteins
Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction
Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach
The immune epitope database (IEDB) 3.0
Properties of MHC class I presented peptides that enhance immunogenicity
NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes
Protection from Ebola virus mediated by cytotoxic T lymphocytes specific for the viral nucleoprotein
Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools
Novel immunoinformatics approaches to design multi-epitope subunit vaccine for malaria by investigating anopheles salivary protein
Predicting population coverage of T-cell epitope-based diagnostics and vaccines
PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides
The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema
PatchDock and SymmDock: servers for rigid and symmetric docking
A comprehensive analysis of aminopeptidase N1 protein (APN) from Anopheles culicifacies for epitope design using Immuno-informatics models