key: cord-016126-i7z0tdrk authors: Dangi, Mehak; Kumari, Rinku; Singh, Bharat; Chhillar, Anil Kumar title: Advanced In Silico Tools for Designing of Antigenic Epitope as Potential Vaccine Candidates Against Coronavirus date: 2018-10-14 journal: Bioinformatics: Sequences, Structures, Phylogeny DOI: 10.1007/978-981-13-1562-6_15 sha: doc_id: 16126 cord_uid: i7z0tdrk
Vaccines are among the most economical and potent alternatives to available medicines for combating various bacterial and viral diseases. Earlier, killed or attenuated pathogens were used for vaccine development. At present, however, peptide vaccines are increasingly favoured over whole-organism vaccines because of their advantages over conventional vaccines. These vaccines are based either on single proteins or on synthetic peptides comprising several B-cell and T-cell epitopes. Their overall mechanism of action remains the same: they prompt the immune system to activate specific B-cell- and T-cell-mediated responses against the pathogen. Rino Rappuoli and others pioneered a potent, fully computational approach for discovering potential vaccine candidates, popularly known as reverse vaccinology. It is a clear advance in vaccine development: one begins with the genome information of the pathogen and ends with a list of candidate epitopes after applying multiple bioinformatics tools. This book chapter aims to bring the reverse vaccinology approach to readers' attention using coronavirus as an example, which motivated us to apply the well-known reverse vaccinology (RV) approach to the available proteome of coronavirus. The RV approach has been successfully applied to many prokaryotes, but there are very few known applications to eukaryotes and viruses. It is therefore worthwhile to explore the potential of this approach to identify vaccine candidates for coronavirus.
RV performs an in silico examination of the viral proteome to find antigenic, surface-exposed proteins. The approach was first applied successfully to Neisseria meningitidis serogroup B (Kelly and Rappuoli 2005), against which none of the prevailing techniques had been able to produce a vaccine. The present book chapter explores the potential of the RV approach to select probable vaccine candidates against coronavirus and validates the results using docking studies. Traditional approaches to vaccine development have undoubtedly been effective against the alarming pathogenic diseases of their time. However, they suffer from certain limitations: they are very time-consuming, pathogens that cannot be cultivated under laboratory conditions are out of reach, and certain non-abundant proteins are not accessible (Rappuoli 2000). Consequently, a number of pathogenic diseases are still left without a vaccine. Reverse vaccinology overcomes these limitations by using genome sequence information, which is ultimately translated into proteins. Hence, all proteins encoded by the genome are accessible, irrespective of their abundance or the conditions under which they are expressed. Much of the success of reverse vaccinology is owed to advances in sequencing strategies worldwide. Improved sequencing technologies have flooded genome databases with huge amounts of data that can be analysed computationally to reveal crucial aspects of the virulence factors of a given pathogen. Reverse vaccinology analyses the pathogen's genome computationally and proceeds step by step to identify highly antigenic, secreted proteins with high epitope densities. The best epitopes are then selected as potential vaccine candidates (Pizza et al. 2000).
This approach has brought previously intractable pathogens into the spotlight, is evolving into the most promising tool for the precise selection of vaccine candidates and has made peptide vaccines popular (Sette and Rappuoli 2010; Kanampalliwar et al. 2013). Bexsero, the first universal serogroup B meningococcal vaccine, was developed using RV and has received a positive opinion from the European Medicines Agency (Gabutti 2014). Whether in the discovery of pili in gram-positive pathogens, which were thought to lack pili, or in the identification of factor H-binding protein in meningococcus (Alessandro and Rino 2010), reverse vaccinology has succeeded where conventional approaches had not. Most applications of RV target prokaryotes, and very few target eukaryotes and viruses because of the complexity of their genomes. Corynebacterium urealyticum (Guimarães et al. 2015), Mycobacterium tuberculosis (Monterrubio-López et al. 2015), H. pylori (Naz et al. 2015), Acinetobacter baumannii (Chiang et al. 2015), Rickettsia prowazekii (Caro-Gomez et al. 2014), Neospora caninum (Goodswen et al. 2014) and Brucella melitensis (Vishnu et al. 2017) are examples of pathogens recently studied with this in silico technique to identify epitopes with potential as vaccine candidates. Herpesviridae (Bruno et al. 2015) and hepatitis C virus (HCV) (Kolesanova et al. 2015) are examples of viruses addressed using this approach. (Altschul et al. 1990; Okonechnikov et al. 2012; Golosova et al. 2014). Multiple sequence alignment (MSA) was performed with ClustalW, and the phylogenetic tree was constructed using the neighbour-joining (NJ) method in the Unipro UGENE 1.16.1 bioinformatics toolkit (Okonechnikov et al. 2012). The physicochemical properties of the proteins of the seed genome were analysed using the ExPASy portal.
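To illustrate the neighbour-joining (NJ) step behind the phylogenetic tree, the following is a minimal Python sketch of the textbook NJ algorithm, not UGENE's implementation. It assumes that a symmetric pairwise distance matrix has already been derived from the ClustalW alignment; the function name `neighbor_joining` and the toy five-taxon matrix are hypothetical, and UGENE's distance model, tie-breaking and branch-length rounding may differ.

```python
def neighbor_joining(labels, dist):
    """Return a Newick string for an unrooted neighbour-joining tree.

    labels: taxon names; dist: symmetric matrix (list of lists) of pairwise
    distances, assumed to be precomputed from a multiple sequence alignment.
    """
    if len(labels) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(labels)
    d = [list(map(float, row)) for row in dist]  # work on a float copy
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Pick the pair (i, j) that minimises the Q-criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes; the remaining length goes on one edge.
    return f"({nodes[0]}:{d[0][1]:.3f},{nodes[1]}:0.000);"


# Toy 5-taxon matrix (illustrative values, not coronavirus distances).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
# ((c:4.000,(a:2.000,b:3.000):3.000):2.000,(d:2.000,e:1.000):0.000);
```

Because an NJ tree is unrooted, placing the whole final distance on one edge is an arbitrary but valid Newick rendering.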
The aim is to predict the solvent accessibility, instability index, theoretical pI, molecular weight, grand average of hydropathicity (GRAVY), aliphatic index, number of charged residues, extinction coefficient, etc. (http://web.expasy.org/protparam/; Gasteiger et al. 2005). Virus-mPLoc was used to predict the localization of viral proteins in infected host cells (http://www.csbio.sjtu.edu.cn/bioinf/virus-multi/; Hong-Bin Shen and Kuo-Chin Chou 2010). This information is important for understanding the pathogenic role and mechanism of the viral proteins in causing disease. In total, six subcellular locations were covered: host cytoplasm, viral capsid, host plasma membrane, host nucleus, host endoplasmic reticulum and secreted. These predictions could help in formulating better therapeutic options against the virus. Following the RV protocol, secreted and membrane proteins are of special interest and were therefore retained for further analysis. The number of transmembrane helices was predicted with the TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/; Krogh et al. 2001). Signal peptides are known to influence immune responses and to possess high epitope densities, and most known vaccine candidates also possess signal peptides. Hence, it is worthwhile to predict signal peptides before predicting epitopes. The Signal-BLAST web server was used to predict signal peptides (http://sigpep.services.came.sbg.ac.at/signalblast.html; Frank and Sippl 2008). Its prediction options are best sensitivity, balanced prediction, best specificity and detect cleavage site only. We made predictions with each option, and proteins predicted to carry a signal peptide under all four options were preferred for further investigation.
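The TMHMM and Signal-BLAST selection rules can be expressed as a small Python sketch. It assumes TMHMM's one-line-per-protein ("short") output containing a `PredHel=` field, and Signal-BLAST calls transcribed by hand into a dictionary, since no programmatic interface is described in this chapter; the function names and protein IDs are hypothetical. The two-helix cut-off mirrors the transmembrane filter reported in the results.

```python
import re


def parse_tmhmm_short(text):
    """Map protein ID -> predicted helix count from TMHMM one-line output.

    Assumes lines like 'ID len=... ExpAA=... First60=... PredHel=N Topology=...'.
    """
    counts = {}
    for line in text.splitlines():
        match = re.search(r"PredHel=(\d+)", line)
        if match:
            counts[line.split()[0]] = int(match.group(1))
    return counts


def keep_by_helices(counts, max_helices=2):
    """Keep proteins with at most `max_helices` predicted TM helices."""
    return sorted(pid for pid, n in counts.items() if n <= max_helices)


# The four Signal-BLAST prediction options described in the text.
SIGNAL_BLAST_OPTIONS = (
    "best sensitivity",
    "balanced prediction",
    "best specificity",
    "detect cleavage site only",
)


def signal_peptide_consensus(calls):
    """calls: protein ID -> {option name: signal peptide predicted?}.

    Returns the IDs predicted to carry a signal peptide under all four options.
    """
    return sorted(
        pid for pid, by_option in calls.items()
        if all(by_option.get(opt, False) for opt in SIGNAL_BLAST_OPTIONS)
    )


# Hypothetical inputs for illustration only.
tmhmm_text = """protA len=82 ExpAA=45.10 First60=22.31 PredHel=2 Topology=o10-32i39-61o
protB len=220 ExpAA=70.02 First60=20.01 PredHel=3 Topology=o20-42i55-77o90-112i"""
print(keep_by_helices(parse_tmhmm_short(tmhmm_text)))  # ['protA']

calls = {
    "protA": {opt: True for opt in SIGNAL_BLAST_OPTIONS},
    "protB": {opt: opt != "best specificity" for opt in SIGNAL_BLAST_OPTIONS},
}
print(signal_peptide_consensus(calls))  # ['protA']
```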
The most appropriate vaccine targets are proteins with adhesin-like properties, because they not only mediate the adhesion of the pathogen to host cells but also facilitate transmission of the virus. Adhesins are known to be crucial for virulence and are located on the surface, which makes them readily accessible to antibodies. The stand-alone SPAAN program, with a sensitivity of 89% and a specificity of 100%, was used to predict adhesin probabilities, and proteins with an adhesin probability greater than or equal to 0.4 were selected (Sachdeva et al. 2004). BetaWrap motifs are dominant in the virulence factors of pathogens, so proteins predicted to possess such motifs are appropriate subjects for reverse vaccinology studies. The BetaWrap server is the only online web server that makes such predictions; proteins with a P-value lower than 0.1 were considered to contain BetaWraps (http://groups.csail.mit.edu/cb/betawrap/betawrap.html; Bradley et al. 2001). To further assess the antigenicity of the proteins, they were submitted to the VaxiJen server version 2.0. VaxiJen is an alignment-independent method for identifying antigenic proteins, so it can detect antigens that sequence-similarity-based methods miss. This step confirms the antigenicity of the proteins selected in the preceding steps (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html; Doytchinova and Flower 2007). A probable vaccine candidate should not exhibit the characteristics of an allergen, because allergens trigger type I hypersensitivity reactions that cause allergy. To rule out such possibilities, the proteins were also subjected to allergenicity prediction using AllerTOP (http://www.pharmfac.net/allertop; Dimitrov et al. 2014) and AlgPred (http://www.imtech.res.in/raghava/algpred/submission.html; Saha and Raghava 2006a, b).
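The adhesin, BetaWrap, antigenicity and allergenicity criteria can be sketched as per-protein flags. The cut-offs are those quoted in this chapter (SPAAN adhesin probability of at least 0.4, BetaWrap P-value below 0.1, VaxiJen score above 0.4, and no allergen call from either AllerTOP or AlgPred); the scores themselves are assumed to be copied by hand from each server's output, and all names are hypothetical. Each criterion is reported separately rather than combined into one hard filter, because the reported results retained some proteins that were not predicted to be adhesins.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Per-protein tool outputs, entered by hand from each server."""
    spaan_pad: float         # SPAAN adhesin probability
    betawrap_p: float        # BetaWrap P-value
    vaxijen: float           # VaxiJen 2.0 antigenicity score
    allertop_allergen: bool  # AllerTOP allergen call
    algpred_allergen: bool   # AlgPred allergen call


def screen(s: CandidateScores) -> dict:
    """Report each criterion separately, using the chapter's cut-offs."""
    return {
        "adhesin": s.spaan_pad >= 0.4,
        "betawrap": s.betawrap_p < 0.1,
        "antigenic": s.vaxijen > 0.4,
        "non_allergen": not (s.allertop_allergen or s.algpred_allergen),
    }


def failed(flags: dict) -> list:
    """Names of the criteria a protein does not meet."""
    return [name for name, ok in flags.items() if not ok]


# Hypothetical scores for one protein (illustrative only).
example = CandidateScores(spaan_pad=0.62, betawrap_p=0.35, vaxijen=0.51,
                          allertop_allergen=False, algpred_allergen=False)
print(failed(screen(example)))  # ['betawrap']
```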
To check whether the filtered proteins share any similarity with host proteins, standard Blastp (http://blast.ncbi.nlm.nih.gov/blast) searches were performed. If sequence similarity exists, the vaccine could generate immune responses against the host's own cells. Predicting epitopes that bind to MHC class I is the decisive step of RV for making valid vaccine predictions. The predicted epitopes were docked with the receptor HLA-A*0201 using ClusPro (http://cluspro.bu.edu/login.php; Kozakov et al. 2017), an automated protein-protein docking web server. The conserved residues of the receptor site were identified from the literature. Default parameters were used for docking (Comeau et al. 2004a, b; Kozakov et al. 2006). A total of 40 different sequenced strains of coronavirus are available at NCBI, of which 7 are pathogenic to humans. Information regarding the source, host and collection of these strains is presented in Tables 15.1 and 15.2. This information can be obtained from NCBI's genome database, the Virus Pathogen Database and Analysis Resource and the Genomes OnLine Database (Liolios et al. 2006; Pickett et al. 2012). The MERS strain was taken as the seed genome because it is the most prevalent and devastating strain among them. Its proteome consists of 11 proteins, as shown in Table 15.3. The accession numbers and identities of the orthologs identified in the different strains by Blastp are shown in Table 15.4. Sequences with greater than 30% identity are considered homologs. The phylogenetic tree is depicted in Fig. 15.1; MERS-CoV, the seed genome, clustered with several bat coronaviruses. The physicochemical properties of the proteome computed with ExPASy tools are shown in Table 15.5. From the charges on the residues and the theoretical pI values, it is concluded that six of the proteins are basic and positively charged, unlike allergens, which are acidic in nature.
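Blastp reports percent identity directly; the sketch below only illustrates the arithmetic behind the 30% homology rule on a pairwise alignment. It assumes identity is computed as identical columns divided by alignment columns, ignoring columns where both sequences have a gap; that convention, the function names and the toy sequences are assumptions, not BLAST's exact internals.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Identity = identical residue columns / alignment columns; columns where
    both sequences have a gap ('-') are ignored (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def is_homolog(aligned_a: str, aligned_b: str, cutoff: float = 30.0) -> bool:
    """Apply the rule used here: identity greater than 30% counts as homologous."""
    return percent_identity(aligned_a, aligned_b) > cutoff


# Toy aligned sequences (hypothetical).
print(round(percent_identity("MKV-LA", "MKVQLS"), 2))  # 66.67
print(is_homolog("AAAA", "CCCA"))                       # False
```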
However, five proteins are acidic and negatively charged. The negative GRAVY scores of five proteins indicate that they are hydrophilic, with the majority of their residues positioned towards the surface. The remaining six proteins have positive GRAVY scores, which means that they are hydrophobic. Proteins with an instability index below 40 are predicted to be stable, whereas those with higher values are predicted to be unstable. All proteins have a molecular weight below 110 kDa except three (YP_009047202.1, YP_009047203.1 and YP_009047204.1). Low-molecular-weight proteins are attractive targets because they can be purified easily. The protein YP_009047204.1 is reported to be a spike glycoprotein. It is acidic and prominently negatively charged, and its negative GRAVY score suggests that it is hydrophilic. Figure 15.2 depicts the subcellular localization of the proteins of the seed genome, i.e. MERS-CoV. Only one protein was predicted to be localized in the host cytoplasm, four in the host membrane, two in both the host cell membrane and the endoplasmic reticulum (ER), two in the ER only, and two remained unassigned. The known spike protein is predicted to be localized in the host ER. On the basis of these results, we picked the proteins predicted to be located in the host membrane or in both the host membrane and the ER. Two of these are the envelope protein and the membrane protein known from the literature; in addition, the known spike protein was included in the filtered set. Of the filtered proteins, only two (YP_009047210.1 and YP_009047208.1) contain more than two transmembrane helices and were therefore filtered out. The results of the transmembrane helix prediction are given in Table 15.6. Figure 15.3 depicts the subcellular localization of the proteins of all four selected genomes predicted with the Virus-mPLoc tool.
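The GRAVY and aliphatic index values reported by ProtParam, and the interpretation rules used here (negative GRAVY means hydrophilic, instability index below 40 means stable), can be reproduced in a few lines. This sketch uses the Kyte-Doolittle hydropathy scale and the standard aliphatic index formula; the instability index is taken as an input because its dipeptide weight table is too long to reproduce, and the function names and the placeholder value 35.0 are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, the scale ProtParam uses for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def classify(seq: str, instability_index: float) -> dict:
    """Apply the interpretation rules from the text to one sequence."""
    g = gravy(seq)
    return {
        "gravy": g,
        "hydropathy": "hydrophilic" if g < 0 else "hydrophobic",
        "aliphatic_index": aliphatic_index(seq),
        "stability": "stable" if instability_index < 40 else "unstable",
    }


# VVCAITLLV is the top-docking epitope reported in this chapter; the
# instability index passed here is a made-up placeholder.
result = classify("VVCAITLLV", instability_index=35.0)
print(round(result["gravy"], 3), result["hydropathy"])  # 3.144 hydrophobic
print(round(result["aliphatic_index"], 2))              # 237.78
```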
The proteins predicted to possess signal peptides by the Signal-BLAST web server are YP_009047204.1 and YP_009047205.1; the Signal-BLAST results are tabulated in Table 15.7. The next step takes into account the concept of adhesin-based virulence. Adhesins mediate recognition of the pathogen and the initiation of inflammatory responses by the host. SPAAN predicted 2 (YP_009047204.1 and YP_009047205.1) of the 11 proteins of the MERS strain as adhesins (Table 15.8). Only one protein (YP_009047204.1) was predicted to contain BetaWrap motifs (Table 15.8); hence, it is considered virulent and might be responsible for initiating infection in the host. In total, 9 of the 11 proteins of the MERS strain were predicted to be antigenic (prediction values greater than 0.4). The proteins YP_009047206.1 and YP_009047208.1 were among the filtered proteins but were not predicted to be antigenic and were therefore filtered out. As a result, only four proteins (YP_009047204.1, YP_009047205.1, YP_009047207.1 and YP_009047209.1) were kept for further analysis. None of the 11 proteins of MERS-CoV showed any sign of allergenicity according to the AlgPred and AllerTOP predictions, which suggests that epitopes from these proteins are unlikely to trigger allergic responses if adopted as vaccine candidates. None of the proteins of the MERS strain shows similarity to host proteins, which indicates that epitopes from these proteins can elicit the required immune response without the hazard of autoimmunity. In total, 12 different 9-mer epitopes with the potential to bind to both B-cell and T-cell receptors were predicted. The predicted epitopes are listed in Table 15.9 and are specific to the MERS-CoV strain: none of them is conserved in the proteins of other human and non-human pathogenic strains.
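Candidate 9-mers come from overlapping windows along each protein, and conservancy can be checked by searching the proteins of other strains for the same peptide. A minimal sketch, assuming exact-match conservancy and 1-based start positions (the chapter's own site-numbering convention is not stated); the function names, the protein fragment and the strain proteomes are hypothetical.

```python
def kmers(seq: str, k: int = 9):
    """Yield (1-based start position, peptide) for every overlapping k-mer."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


def strains_containing(peptide: str, proteomes: dict) -> list:
    """Names of strains with at least one protein containing the peptide verbatim."""
    return sorted(
        name for name, proteins in proteomes.items()
        if any(peptide in protein for protein in proteins)
    )


# Hypothetical protein fragment and strain proteomes, for illustration only.
fragment = "MSVVCAITLLVCM"
windows = list(kmers(fragment))
print(len(windows), windows[0])  # 5 (1, 'MSVVCAITL')

others = {"strain_X": ["GGVVCAITLLVGG"], "strain_Y": ["MKTAYIAKQR"]}
print(strains_containing("VVCAITLLV", others))  # ['strain_X']
```

An epitope for which `strains_containing` returns an empty list would be "not conserved" in the exact-match sense used in this sketch.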
Docking reveals the binding energy, i.e. the strength of the interaction between an epitope and the receptor in the appropriate orientation. The ClusPro docking server was used to dock the 90 predicted epitopes against HLA-A*0201. The structure of the receptor was obtained from the PDB and was optimized before docking by removing the complexed self-peptide (4U6Y, resolution 1.47 Å, Bouvier et al. 1998). PEPstr (Peptide Tertiary Structure Prediction Server; Kaur et al. 2007) was used to derive the tertiary structures of the predicted peptides. Figure 15.4 depicts the quaternary structure of the receptor HLA-A*0201 with its conserved active site, which is known to form complexes with peptides (Bouvier et al. 1998). The binding energies obtained from the docking analysis are listed in Table 15.9. The 9-mer epitope VVCAITLLV at site 21 of protein YP_009047209.1 docked to the receptor with the lowest binding energy (-951.7) and 12 hydrogen bonds. The next epitope in the list, TLLVCMAFL at site 27, was also from YP_009047209.1. The predicted structures of the top 5 epitopes ranked by docking energy, and snapshots of the docking results, are displayed in Figs. 15.5, 15.6, 15.7, 15.8 and 15.9. The main obstacle to developing a safe and effective vaccine against any virus is identifying the protective antigens. The present study applies the reverse vaccinology approach to a selection of coronavirus proteomes to identify possible vaccine targets. The approach proved to be an efficient way to predict 12 different epitopes from the selected seed genome. These epitopes come from the spike glycoprotein, NS3 protein, NS4B protein and envelope protein. Unfortunately, none of the epitopes is conserved in other strains; all are specific to MERS-CoV. The docking analysis revealed strong binding between the HLA-A*0201 receptor and the epitopes.
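Ranking docked epitopes by ClusPro energy amounts to a sort in which the most negative score comes first. In this sketch, breaking ties by the larger number of hydrogen bonds is an assumption, not a rule stated in the chapter; only the VVCAITLLV values come from the text, and the other rows and all names are hypothetical placeholders.

```python
from typing import NamedTuple


class DockingResult(NamedTuple):
    epitope: str
    energy: float   # ClusPro energy score; more negative is more favourable
    h_bonds: int    # hydrogen bonds counted in the docked complex


def rank_epitopes(results, top=5):
    """Sort by energy (ascending), breaking ties by more hydrogen bonds."""
    return sorted(results, key=lambda r: (r.energy, -r.h_bonds))[:top]


results = [
    DockingResult("VVCAITLLV", -951.7, 12),  # values reported in the text
    DockingResult("PEPTIDE01", -880.0, 9),   # hypothetical placeholder
    DockingResult("PEPTIDE02", -880.0, 11),  # hypothetical placeholder
]
for r in rank_epitopes(results):
    print(r.epitope, r.energy, r.h_bonds)
```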
The conserved residues of the receptor site are also involved in hydrogen bonding with epitope residues. Further, the selected antigenic epitopes must be validated by in vitro and in vivo studies to confirm their potential as vaccine candidates.

References
Review: reverse vaccinology: developing vaccines in the era of genomics
Basic local alignment search tool
The Middle East respiratory syndrome coronavirus - a continuing risk to global health security
Crystal structures of HLA-A*0201 complexed with antigenic peptides with either the amino- or carboxyl-terminal group substituted by a methyl group
BETAWRAP: successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens
Geminiviridae. In: Virus taxonomy: ninth report of the International Committee on Taxonomy of Viruses
Lessons from Reverse Vaccinology for viral vaccine design
Discovery of novel cross-protective Rickettsia prowazekii T-cell antigens using a combined reverse vaccinology and in vivo screening approach
Identification of novel vaccine candidates against Acinetobacter baumannii using reverse vaccinology
ClusPro: a fully automated algorithm for protein-protein docking
ClusPro: an automated docking and discrimination method for the prediction of protein complexes
Middle East respiratory syndrome coronavirus (MERS-CoV): announcement of the coronavirus study group
AllerTOP v.2 - a server for in silico prediction of allergens
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
High performance signal peptide prediction based on sequence alignment techniques
Meningococcus B: control of two outbreaks by vaccination
Protein identification and analysis tools on the ExPASy server
Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses
Discovering a vaccine against neosporosis using computers: is it feasible?
Genome informatics and vaccine targets in Corynebacterium urealyticum using two whole genomes, comparative genomics, and reverse vaccinology
Reverse vaccinology: basics and applications
PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides
Reverse vaccinology and vaccines for serogroup B Neisseria meningitidis
Way to the peptide vaccine against hepatitis C
PIPER: an FFT-based protein docking program with pairwise potentials
The ClusPro web server for protein-protein docking
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
An integrative approach to CTL epitope prediction, a combined algorithm integrating MHC-I binding, TAP transport efficiency, and proteasomal cleavage prediction
Improved method for predicting linear B-cell epitopes
The genomes on line database (GOLD) v.2: a monitor of genome projects worldwide
Identification of novel potential vaccine candidates against tuberculosis based on reverse vaccinology
Identification of putative vaccine candidates against Helicobacter pylori exploiting exoproteome and secretome: a reverse vaccinology based approach
Database resources of the national center for biotechnology information
Unipro UGENE: a unified bioinformatics toolkit
Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains
Virus pathogen database and analysis resource (ViPR): a comprehensive bioinformatics database and analysis resource for the coronavirus research community
Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing
Reverse vaccinology
SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks
AlgPred: prediction of allergenic proteins and mapping of IgE epitopes
Prediction of continuous B-cell epitopes in an antigen using recurrent neural network
Reverse vaccinology: developing vaccines in the era of genomics
Virus-mPLoc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites
ProPred1: prediction of promiscuous MHC class-I binding sites
Identification of potential antigens from non-classically secreted proteins and designing novel multitope peptide vaccine candidate against Brucella melitensis through reverse vaccinology and immunoinformatics approach