key: cord-0683828-iy4knx7j authors: Banerjee, Amrita; Santra, Dipannita; Maiti, Smarajit title: Energetics based epitope screening in SARS CoV-2 (COVID 19) spike glycoprotein by Immuno-informatic analysis aiming to a suitable vaccine development date: 2020-04-05 journal: bioRxiv DOI: 10.1101/2020.04.02.021725 sha: 998ebc942842d9052e95776680d2b6caa0480772 doc_id: 683828 cord_uid: iy4knx7j

The recent outbreak of SARS-CoV-2 has caused chaos in global health and the economy, infecting a large number of people and claiming many lives. Although it closely resembles SARS CoV, the present strain has shown an exceptionally high degree of spreadability, virulence and stability, possibly due to some unidentified mutations. The viral spike glycoprotein is very likely to interact with host Angiotensin-Converting Enzyme 2 (ACE2), allowing the virus to transfer its genetic material and hijack the host machinery with extreme fidelity for self-propagation. Few attempts have been made to develop a suitable vaccine, ACE2 blocker or virus-receptor inhibitor within this short period of time. Here, we attempted to develop therapeutic and vaccination strategies by comparing the spike glycoproteins of SARS-CoV, MERS-CoV and SARS-CoV-2. We verified their structure quality (SWISS-MODEL, Phyre2, PyMOL), topology (ProFunc), motifs (MEME Suite, GLAM2Scan) and gene ontology based conserved domains (InterPro database), and screened several epitopes (SVMTriP) of SARS CoV-2 based on their energetics, IC50 and antigenicity with regard to possible glycosylation and MHC/paratopic binding effects (VaxiJen v2.0, HawkDock, ZDOCK Server). We screened a few pairs of spike protein epitopic regions by their energetics, IC50 and MHC II reactivity and found some of them to be very good targets for vaccination. Possible glycosylation of the epitopic region showed profound effects on epitope recognition.
The present work might be helpful for the urgent development of a suitable vaccination regimen against SARS CoV-2.

An outbreak of a novel coronavirus, Severe Acute Respiratory Syndrome CoV-2 (SARS CoV-2, or COVID-19), has been threatening humanity worldwide since the last week of December 2019. The resulting losses to human health and the global economy are becoming incalculable. As of the current situation, SARS CoV-2 has claimed more than 40,000 lives among more than 800,000 infected persons globally [1]. The outbreak started in the city of Wuhan, China, and has spread to more than 195 countries, with the most adverse effects in China, Italy, Iran, Spain, the United States, France, Germany, Britain and several other countries. The present situation demands therapeutic strategies of every type: blocking of viral entry, inhibition of the association of spike proteins with host ACE-2 (angiotensin converting enzyme type 2), modulation of interfering kinase activity, inactivation of viral genome expression and packaging, and vaccination against this virus. Regarding vaccination strategies, it is assumed that frequent mutation results in anomalies in the surface/spike proteins [2, 3]. Although it mostly resembles the virus behind the global SARS CoV outbreak (2003, https://www.who.int/csr/sars/en/), this virus, unlike SARS CoV, has shown an extremely high grade of virulence, spreading capability and stability across geographical barriers (possibly with a preference for colder places, aged persons or a specific gender; yet to be clarified) [4]. Positive selective pressure could account for the stability and some clinical features of this virus compared with SARS and bat SARS-like CoV [5]. A stabilizing mutation falling in the endosome-associated-protein-like domain of the nsp2 protein could account for the high contagiousness of COVID-2019, while a destabilizing mutation in the nsp3 protein could suggest a potential mechanism differentiating COVID-2019 from SARS CoV [5].
Nevertheless, nutritional and immunological status are also important factors when screening therapeutic strategies for affected and sensitive persons. Possible medication or immunization with existing drugs, or infusion of convalescent plasma, should be administered to COVID-19 patients with the utmost care [6]. Advanced precautionary steps and therapeutic interventions should be formulated taking several personal and community factors into account [7]. Development of a successful and reproducible vaccination protocol and its human trial may take a long time because of mutation and the large glycan shield and epitope masking on the SARS CoV-2 proteins [8]. Among medication regimens, angiotensin II type 1 receptor (AT1R) blockers have been proposed for reducing the severity of and mortality from SARS-CoV-2 infection [9]. Chloroquine and hydroxychloroquine are currently being prescribed in some places to fight COVID-19 [10, 11].

Human coronaviruses and influenza viruses have caused epidemics in different parts of the world over the last two decades. The differences in severity and spread between the site of origin, China, and other parts of the world (European and North American countries) might be informative. Common human CoVs may have annual peaks of circulation in the winter months in the US, and individual human CoVs may show variable circulation from year to year [12]. A colder climate and prior exposure to other human coronaviruses or influenza viruses, or possibly vaccination against them, might lead to antibody dependent enhancement (ADE) of immunological responses during a recent SARS CoV-2 exposure. ADE might modulate the immune response and could elicit sustained inflammation, lymphopenia and/or a cytokine storm [13, 14]. This could be one of the reasons (a longer history of exposure to CoVs in addition to a weaker immune system) why older people are more affected by the present SARS CoV-2.
Moreover, both helper T cells and suppressor T cells in patients with COVID-19 were below normal levels. The novel coronavirus might act mainly on lymphocytes, especially T lymphocytes [15]. Strong inflammatory events could initiate the collapse seen during COVID-19 infection. In most fatal cases of COVID-19, acute respiratory failure is followed by anomalies in other organs such as the kidney. In these cases an inflammatory outburst might have worsened the infection and the post-incubation course [16, 17]. Recent studies in experimentally infected animals strongly suggest a crucial role for virus-induced immunopathological events in causing fatal pneumonia after human CoV infections [18]. Combined anti-viral and anti-inflammatory treatment might therefore be beneficial in these cases [19]. Available SARS-based immune-therapeutic and prophylactic modalities targeting the novel spike protein showed poor efficacy in neutralizing the virus and protecting from infection [20].

Against this background, critical screening of the spike sequence and structure of SARS CoV-2 by energetics and IC50 based immunoinformatics analysis may help to develop a suitable vaccine. In the current study we therefore analyzed the spike proteins of SARS CoV, MERS CoV, SARS CoV-2 and four other human coronavirus strains from earlier outbreaks. We critically compared the spike proteins, domains and motifs of SARS CoV and SARS CoV-2, and screened several epitopes based on their energetics, IC50 and antigenicity using several bio/immuno-informatics software tools, with regard to possible glycosylation and MHC/paratopic binding effects. The present work might be helpful for the urgent development of a suitable vaccination regimen.

Tertiary structures of the selected coronavirus (CoV) spike proteins were predicted and validated using Phyre2 (Protein Homology/analogy Recognition Engine V 2.0) [22] and SWISS-MODEL [23].
In Phyre2, structures were predicted against 100,000 experimentally determined protein folds. Predicted structures were analyzed in SWISS-MODEL to calculate the QMEAN Z-score, which combines the Z-scores of the Cβ, all-atom, solvation and torsion terms. RAMPAGE, the Ramachandran Plot Analysis server [24], was used to assess the quality of the protein 3D structures. The sum of the number of residues in favored regions and in additionally allowed regions was used for the percent (%) quality assessment. Predicted tertiary structures were visualized and aligned using the PyMOL molecular visualization system. PyMOL assigns secondary structure with its "dss" algorithm; for superposition, the sequences of the two structures were aligned first and then the structures were aligned. Molecules were rendered with a high-speed ray-tracing molecular graphics system. Secondary structures and their 3D folding patterns were analyzed in the form of topology using ProFunc, a server that predicts protein function from 3D structure [25]. In protein classification, topology analysis provides an independent and effective alternative to traditional structural prediction. Topological differences between two structures indicate differences in protein folding and flexibility.

Sequences of the selected CoV spike glycoproteins were compared through multiple sequence alignment using Clustal X2 [26]. Conserved motifs were identified using the MEME Suite server (http://meme.sdsc.edu/meme/cgi-bin/mast.cgi). MEME Suite reports ungapped conserved sequences that occur frequently in a group of related sequences. The number of motifs to find was set to 7 in the current study. The GLAM2Scan tool, in contrast, was used to identify gapped motifs within the related sequences. Conserved motifs were represented as LOGOs using the GLAM2Scan tool of the MEME Suite server.
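As an illustration of what a motif LOGO encodes, the sketch below computes the per-column information content (in bits) of a set of aligned, ungapped motif occurrences. It is a minimal sketch, not the MEME Suite implementation: the function name `column_information`, the 20-letter alphabet size and the toy sites are assumptions made for this example, and the small-sample correction used by real logo tools is omitted.

```python
import math
from collections import Counter


def column_information(sites: list[str], alphabet_size: int = 20) -> list[float]:
    """Information content (bits) of each column of aligned, ungapped sites.

    Uses log2(alphabet_size) minus the Shannon entropy of the column,
    without any small-sample correction.
    """
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


# Hypothetical aligned motif occurrences, for illustration only.
toy_sites = ["ACDE", "ACDF", "ACGE", "AHDE"]
bits = column_information(toy_sites)
```

A fully conserved column (the first one in the toy data) reaches the maximum of log2(20) bits, which is why conserved positions appear as the tallest letters in a logo.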
Identified motifs were annotated using protein BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC= blasthome), and functional, gene ontology based conserved domains were finally identified using InterPro, the interactive database for the classification of protein families [27].

Conserved epitopes of the SARS CoV-2 spike glycoprotein were identified using SVMTriP, a tool that predicts linear antigenic epitopes [28]. SVMTriP predicts linear antigenic epitopes by feeding a Support Vector Machine with the tri-peptide similarity and propensity scores of pre-analyzed epitope data. Predicted epitopes were annotated through protein BLAST. SVMTriP achieved 80.1% sensitivity and 55.2% precision in five-fold cross-validation. An epitope length of 20 amino acids was selected for prediction.

The Major Histocompatibility Complex (MHC) binding efficiency of the predicted epitopes was assessed using the Immune Epitope Database (IEDB) and Analysis Resource [29]. A total of 5 DPA, 6 DQA and 662 DRB alleles of MHC class II were screened to detect the best interacting alleles on the basis of the highest consensus percentile rank and the lowest IC50 value. All analyses were performed on human class II alleles, using frequently occurring alleles (frequency > 1%); a peptide length of 9-mers was selected, and a consensus percentile rank ≤ 1 was used for the selection of peptides.

The antigenicity of the predicted epitopes was determined using VaxiJen v2.0, a server for the prediction of protective antigens, tumour antigens and subunit vaccines [30]. VaxiJen v2.0 uses an auto cross covariance (ACC) transformation of the selected protein sequences based on unique amino acid properties. Each model was derived from a set of 100 known antigens and 100 non-antigens. The models were tested by leave-one-out cross-validation and overall external validation. The prediction accuracy was up to 89%.
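The peptide selection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IEDB implementation: the `Prediction` record, its field names and the helper functions are hypothetical, and prediction values would come from the IEDB output rather than from this code. The default cutoff mirrors the consensus percentile rank ≤ 1 criterion, and the 15-residue peptide is epitope 10B as reported in this study.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One MHC class II binding prediction (field names are illustrative)."""
    allele: str
    peptide: str
    percentile_rank: float  # consensus percentile rank
    ic50: float             # predicted IC50


def sliding_windows(sequence: str, length: int = 9) -> list[str]:
    """Return every overlapping window of the given length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def select_binders(predictions: list[Prediction], max_rank: float = 1.0) -> list[Prediction]:
    """Keep predictions within the rank cutoff, ordered by increasing IC50."""
    kept = [p for p in predictions if p.percentile_rank <= max_rank]
    return sorted(kept, key=lambda p: p.ic50)


# 9-mer windows of epitope 10B (sequence taken from the text).
windows = sliding_windows("ITPGTNTSNQVAVLY")
```

Sorting by IC50 after the rank filter puts the peptide-allele pair with the lowest IC50 first, which matches the "lowest IC50 value" criterion used for choosing alleles.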
The structures of the MHC class II HLA-DRA, DRB molecule (PDB ID: 2q6w, 5jlz) and the fully glycosylated COVID 19 spike protein (PDB ID: 6vsb) were retrieved from the Protein Data Bank (PDB), and docking was performed using the HawkDock [31] and ZDOCK [32] servers, generating 100 docking solutions. The best 10 of these were analyzed based on docking scores and calculated binding free energy values.

The high degree of similarity between SARS CoV and SARS CoV-2 might inform therapeutic and vaccination strategies in the current global situation. However, the markedly higher virulence and spread of SARS CoV-2 is of great concern in the present scenario. Based on the alignment pattern, the selected sequences were analyzed for different conserved motifs in the protein sequences. A total of 7 conserved motifs were analyzed.

Epitope design was conducted only with the COVID 19 spike glycoprotein sequence and structure. From the sequence analysis, 10 different locations were found which also showed similarity with the SARS CoV and SARS CoV-2 spike glycoproteins in protein BLAST (Figure 4). The epitope positions within the spike glycoprotein monomer are also shown in Figure 4. Epitopes 1, 4 and 5 are not shown in the COVID 19 spike glycoprotein, as they were found to be embedded within the virus envelope. Among the others, epitopes 2, 3, 6 and 7 were found in the interior of the spike glycoprotein monomer, but epitopes 8, 9 and 10 were found at the surface of the structure and could be used as immunological targets for the proper diagnosis and treatment of COVID 19. Epitope finalization may be complicated by the structural distortion of the spike during the transition between its pre-fusion and post-fusion states. A specific mutant structure has been designed and shown to be resistant to conformational change after ACE2 binding and protease cleavage at the S1/S2 site [34].
This points to the search for a suitable epitope that remains unhindered through the pre- to post-fusion state transition.

The appropriate type of Major Histocompatibility Complex (MHC) for each identified COVID 19 epitope was selected and is listed in Table 3. The threshold value of the highest consensus percentile rank was set to 10 for all. Overall, the highest consensus percentile rank value of 10 was observed for the sequence QQLIRAAEIRASANL (epitope 3A) and the lowest IC50 value of 7.11 was observed for the sequence IIAYTMSLGAENSVA (epitope 8B). The antigenic property of the identified target sequences from the epitopes was also predicted using a threshold value of 0.4. Sequences scoring below the threshold were considered non-antigenic and sequences scoring above it were considered antigenic. A total of 9 antigenic sequences were detected (Table 3); among them, two sequences, AAEIRASANLAATKM (epitope 3B) and ITPGTNTSNQVAVLY (epitope 10B), had higher antigenicity scores of 0.7125 and 0.7193 respectively. As 3B is located more towards the interior, 10B could be used as a potent antigen. According to epitope location (Figure 4) and antigenic nature, other sequences such as 8A and 8B or 9A and 9B could also be targets.

Coronavirus spike proteins are glycosylated, with N-acetyl glucosamine (NAG) as the main component. Glycan shielding and possible epitope masking have been observed for HCoV-NL63, which may be a barrier to proper immunogenic responses [8]. Comparative analysis of the glycosylated and non-glycosylated protein revealed some structural modification at the epitope locations. Among the identified epitopes, 10B, with sequence ITPGTNTSNQVAVLY (598-612), was found to carry N-linked glycosylation at position 603. The structural modification of this epitope was analyzed using the non-glycosylated protein structure of COVID 19 (Acc. No.: NC_045512.2:21563-25384) and the glycosylated COVID 19 protein (PDB ID: 6vsb).
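The glycosylation sites discussed here can be cross-checked at the sequence level. The sketch below scans a peptide for the standard N-linked glycosylation consensus sequon N-X-S/T (X being any residue except proline). This is an illustrative sequence-only check, not the structure-based analysis of 6vsb used in this study; the function name `find_sequons` and the `start` numbering argument are assumptions of this example, while the peptide sequences and residue ranges are taken from the text.

```python
import re

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(peptide: str, start: int = 1) -> list[int]:
    """Return positions of Asn residues that sit in an N-X-S/T sequon.

    `start` is the residue number of the first peptide residue in the
    full-length protein, so positions are reported in protein numbering.
    """
    return [start + m.start() for m in SEQUON.finditer(peptide)]


epitope_10b = find_sequons("ITPGTNTSNQVAVLY", start=598)   # epitope 10B (598-612)
wrapping_seg = find_sequons("IGAEHVNNSYECD", start=651)    # segment near epitopes 8A/8B
epitope_9 = find_sequons("VRDPQTLEILDITPC", start=576)     # epitope 9 (576-590)
```

For these three peptides the scan flags N603 in epitope 10B and N657 in the segment 651-663, and finds no sequon in epitope 9, which agrees with the glycosylation positions reported in the text.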
The effect of glycosylation on protein structure showed that the glycosylated conformation (Figure 5a) was more organised than the non-glycosylated one (Figure 5b). Secondary structural comparison of the two epitope forms showed a more organised structure with the NAG residue attached (Figure 5d), whereas a shorter β-sheet was observed when NAG was removed (Figure 5c). The peptide interactive site of epitope 10B was blocked by NAG attachment, as a result of which antibody binding to the antigen may be hampered. The NAG residue binds directly to the asparagine (N, Asn) residue (Figure 5e), so removing NAG from the spike glycoprotein structure is difficult. Structural distortion between glycosylated and non-glycosylated epitope 10B at the tertiary level indicated that removal of NAG may distort the structure of the epitope (Figure 5f), which again may hamper proper antigen-antibody binding.

In this section, the energetics of epitope attachment to MHC class II HLA-DRA, DRB was determined in the presence and absence of NAG at the 10B epitope structure through molecular docking (Figure 6). Docking results showed that without NAG, the binding efficiency of 10B

Although epitopes 8A and 8B were also present at the surface of the spike glycoprotein, they were found to be wrapped by a short segment, IGAEHVNNSYECD (651-663), carrying a glycosylation at the N residue at position 657 (Figure 7a). As a result, antibody accessibility to this epitope may also be difficult. In contrast, surface epitope 9, with sequence VRDPQTLEILDITPC (576-590), showed the highest antigenicity of 1.1285 (Table 3) can recombine to gain entry into human cells are the points also to be noted [35]. SARS CoV-2 induced severe and often lethal lung failure is caused by its inhibition of ACE-2 expression [36]. So, keeping ACE-2 functioning normally while blocking viral entry is the most challenging issue right now.
The suitable epitopes screened in the current study may be helpful in this global pandemic situation. The history of outbreaks of this type of virus over the last two decades is very much evident. The present situation justifies further advanced studies, with proper infrastructure and funding at a global scale, to eradicate the current outbreak or any possible future one.

Position of Epitope 9 in COVID 19 spike protein with no direct or indirect NAG attachment, PDB ID: 6vsb (b). Among 10 best docking positions, 8 were found in epitope presenting site of MHC II HLA-DRB1, PDB ID: 5jlz (c). The best docking posture of epitope 9 with MHC II HLA-DRB1 (d) and its specific interaction pattern (e).

Genetic diversity and evolution of SARS-CoV-2
Evolutionary Trajectory for the Emergence of Novel Coronavirus SARS-CoV-2
COVID-2019: The role of the nsp2 and nsp3 in its pathogenesis
Potential interventions for novel coronavirus in China: A systematic review
Coronavirus Disease 2019 (COVID-19) Pandemic and Pregnancy
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Angiotensin receptor blockers as tentative SARS-CoV-2 therapeutics
Of chloroquine and COVID-19
Chloroquine and hydroxychloroquine as available weapons to fight COVID-19
Human coronavirus circulation in the United States
Pathogenic human coronavirus infections: causes and consequences of cytokine storm and immunopathology
Seasonality of Respiratory Viral Infections
Dysregulation of immune response in patients with COVID-19 in Wuhan, China
Immune responses in COVID-19 and potential vaccines: Lessons learned from SARS and MERS epidemic
Is COVID-19 receiving ADE from other coronaviruses
Anti-spike IgG causes severe acute lung injury by skewing macrophage responses during acute SARS-CoV infection
Lianhuaqingwen exerts anti-viral and anti-inflammatory activity against novel coronavirus (SARS-CoV-2)
A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence
viruSITE - integrated database for viral genomics
The Phyre2 web portal for protein modeling, prediction and analysis
SWISS-MODEL: homology modelling of protein structures and complexes
Structure validation by Calpha geometry: phi,psi and Cbeta deviation
ProFunc: a server for predicting protein function from 3D structure
ClustalW and ClustalX version 2
The InterPro protein families database: the classification resource after 15 years
SVMTriP: A method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity
The immune epitope database and analysis resource program 2003-2018; reflections and outlook
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
HawkDock: a web server to predict and analyze the protein-protein complex based on computational docking and MM/GBSA
Interactive Docking Prediction of Protein-Protein Complexes and Symmetric Multimers
Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach
Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
A crucial role of angiotensin converting enzyme 2 (ACE2) in SARS coronavirus-induced lung injury

Human Coronavirus NL63 (Acc. No.: NC_005831
Human coronavirus 229E (Acc. No.: NC_002645.1
Human coronavirus OC43 strain ATCC VR-759 (Acc. No.: YP_009555241.1) (Template: 6nzk.1A)
Maximum structural similarity, alignment, Ramachandran plot alikeness and minimum structural distortion are noticed between SARS CoV and COVID 19
SARS coronavirus 2 isolate Wuhan-Hu-1 (Acc. No.:NC_045512

10B epitope position on COVID 2 or COVID 19 spike protein (Acc. No.: NC_045512.2:21563-25384), no NAG attached with N residue at the 603 position (b). Secondary structure of epitope 10B without NAG attachment (c) and with NAG attachment (d). Close view of NAG attachment with N residue in 6vsb at position 603 (e)

With NAG epitope 10B binding to MHC class II HLA-DRA, DRB epitope binding site (c, d) and different molecular interactions of 10B epitope with MHC class II (f). Lower panel of tabulated image describes amino acids responsible for stable binding between epitope 10B and MHC No.