key: cord-336343-qbcb9qi3 authors: Agarwal, Ajay title: in-silica Analysis of SARS-CoV-2 viral strain using Reverse Vaccinology Approach: A Case Study for USA date: 2020-06-16 journal: bioRxiv DOI: 10.1101/2020.06.16.154559 sha: doc_id: 336343 cord_uid: qbcb9qi3 The recent pandemic of COVID19 that has struck the world is yet to be battled by a potential cure. Countless lives have been claimed due to the existing pandemic and the societal normalcy has been damaged permanently. As a result, it becomes crucial for academic researchers in the field of bioinformatics to combat the existing pandemic. The study involved collecting the virulent strain sequence of SARS-nCoV19 for the country USA against human host through publically available bioinformatics databases. Using in-silica analysis and reverse vaccinology, two leader proteins were identified to be potential vaccine candidates for development of a multi-epitope drug. The results of this study can provide further researchers better aspects and direction on developing vaccine and immune responses against COVID19. This work also aims at promoting the use of existing bioinformatics tools to faster streamline the pipeline of vaccine development. The Situation of COVID19 A new infection respiratory disease was first observed in the month of December 2019, in Wuhan, situated in the Hubei province, China. Studies have indicated that the reason of this disease was the emergence of a genetically-novel coronavirus closely related to SARS-CoV. This coronavirus, now named as nCoV-19, is the reason behind the spread of this fatal respiratory disease, now named as COVID-19. The initial group of infections is supposedly linked with the Huanan seafood market, most likely due to animal contact. Eventually, human-to-human interaction occurred and resulted in the transmission of the virus to humans. [13]. Since then, nCoV-19 has been rapidly spreading within China and other parts of World. At the time of writing this article (mid-March 2020), COVID-19 has spread across 146 countries. A count of 164,837 cases have been confirmed of being diagnosed with COVID-19, and a total of 6470 deaths have occurred. The cumulative cases have been depicting a rising trend and the numbers are just increasing. WHO has declared COVID-19 to be a “global health emergency”. [14]. Current Scenario and Objectives Currently, research is being conducted on a massive level to understand the immunology and genetic characteristics of the disease. However, no cure or vaccine of nCoV-19 has been developed at the time of writing this article. Though, nCoV-19 and SARS-CoV are almost genetically similar, the respiratory syndrome caused by both of them, COVID-19 and SARS respectively, are completely different. Studies have indicated that – “SARS was more deadly but much less infectious than COVID-19”. -World Health Organization Currently, research is being conducted on a massive level to understand the immunology and genetic characteristics of the disease. However, no cure or vaccine of nCoV-19 has been developed at the time of writing this article. Though, nCoV-19 and SARS-CoV are almost genetically similar, the respiratory syndrome caused by both of them, COVID-19 and SARS respectively, are completely different. Studies have indicated that -"SARS was more deadly but much less infectious than COVID-19." The spread of SARS epidemic has provided with any useful insights as during that time the virus epidemic was contained only through general prevention means and treatment of the individual symptoms. As a result, there exists only a limited number of tools available to test the coronavirus for their ability to infect humans. This has acted as a major limitation for predicting the next zoonotic viral outbreak. [13] [14] [15] . In response to the current medical crisis, the World Health Organization has activated the R&D Blueprint for the acceleration of the development of diagnostics, therapeutics and vaccines for treatment of this novel coronavirus. The objective of this study lies in compliance to the guidelines for research activated by WHO. This study aims to utilize a reverse-vaccinology approach in order to identify potential vaccine candidates for COVID19 for the country USA. We shall be deploying open-access bioinformatics tools for our analysis of the same. The virulent strain of SARS-CoV-2 was selected for in-silico analysis. The genome of the viral strain is made available by NCBI -National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov.in). The reference identification for SARS-CoV-2 is given by RefSeq NC_045512.2 14534 viral protein sequences of the SARS-CoV-2 were obtained from the ViPR -Virus Pathogen Database and Analysis. [1] . These protein sequences were identified and downloaded in a tabular format for the Host -Humans and the Country -USA. The FASTA-file for all the 14535 protein sequences was downloaded and loaded in R using "SeqinR" and "Biostrings" packages. The different physicochemical properties were analysed using the "Peptides" package in the R. [2] [3] [4] . VaxiJen 2.0 is used to predict the antigenicity of the protein based on the FASTA-files that contain their respective amino acid sequences. The online software predicts the antigenic score for the same. [5] [6] . The T-cell and B-cell epitopes of the selected two leader protein sequences were predicted using the IEDB (The Immune Epitope Database). IEDB is a freely available resource funded by NIAID. It is a catalog that contains the experimental data on antibody and T cell epitopes studied in humans, non-human primates, and other animal species in the context of the infectious disease, allergy, autoimmunity, and transplantation. It also assists in hosting tools for predicting and analysing the epitopes. [7] . For MHC Class-I T-cell epitope prediction for ORF1ab leader proteins, NetMHCpan EL 4.0 method was used. This method was applied for the HLA-A*11-01 allele. For MHC Class-II T-cell epitope prediction for ORF1ab leader proteins, Sturniolo prediction method was used. This method was applied for the HLA DRB1*04-01 allele. For B-cell lymphocyte epitope prediction, Bepipered Prediction Method was used. [8] . Ten epitopes were selected on random for analysis of their antigenicity, allergenicity, topology and conservancy. AllerTOP v2 (https://www.ddg-pharmfac.net/AllerTOP/) was used for determining the allergenicity of selected epitopes. ToxinPred server (https://webs.iiitd.edu.in/raghava/toxinpred/protein.php) was used to determining the toxicity of the selected epitopes. The prediction of transmembrane helices in proteins was determined using the TMHMM Server v2.0 (http://www.cbs.dtu.dk/services/TMHMM/). The conservancy of the selected epitopes was analysed using the conservancy analysis tool of the IEDB server. For this analysis, the parameter for the sequence identity threshold was adjusted to '>=50'. [7] . In order to generate the 3D structure of the epitopes choses, the PEP-FOLD3 tool was used. PEP-FOLD (http://bioserv.rpbs.univ-paris-diderot.r/services/PEP-FOLD3/) uses a de novo approach to predict the peptide sequences from the amino acid sequences. [9] [10] [11] . It utilizes the structural alphabet SA letters to describe the conformations of four consecutive residues. The pre-docking of the selected epitopes was done using the UCSF Chimera. It was also used to perform pre-docking of the selected alleles HLA-A*11-01 (for MHC Class-I) and HLA DRB1*04-01. Later, the docking of the peptide-protein was done using HPEPDOCK. HPEPDOCK is a web server of performing blind peptide-protein utilizing hierarchical algorithm. [12] . Instead of performing length stimulations to refine peptide conformations, HPEPDOCK studies the peptide flexibility through an ensemble of peptide conformations produced by the MODPEP program. The SARS-CoV-2 strain was identified. All the 14534 protein sequences were analysed on the basis of the physicochemical properties to select the top five candidates for further analysis. For the physicochemical analysis, the number of amino acids, instability index, aliphatic index, and the grand average of hydrophobicity (GRAVY) scores of the all the 14,534 proteins were taken using the Peptides package in R. This packages allows the identification, selection and analysis of multiple amino acid sequences in the same FASTA-file. Hence, it was utilized. The physicochemical study unveiled five potential candidates having the lowest score of GRAVY and instability index less than 40 (hence, displaying stability). These five candidates were individually analysed for their molar extinction coefficients and antigenicity. The VaxiJen 2.0 tool was used to analyse the antigenicity of the potential five candidates and the molar extinction coefficient was analysed using the ExPASy's online tool -ProtParam. Out of the five potential candidates, only two were selected for further analysis. This was based on the criteria of having the highest score of predicted antigenicity and highest values of the extinction coefficients. Both the leader proteins were then used for further analysis. The T-cell epitopes of the MHC Class-I for both the leader protein were determined using the NetMHCpan EL 4.0 prediction method of the IEDB server keeping the sequence length at 9. This tool allows the generation of the epitopes and sorts them on the basis of their percentile scores. Randomly, ten potential epitopes were selected randomly for the antigenicity, allergenicity, toxicity, and conservancy tests. For MHC class-II, T-cell epitopes (HLA DRB1*04-01 allele) of the proteins were also determined using the IEDB Analysis tools. The Sturniolo prediction methods were used for the same. Again, ten potential candidates were chosen based on the same criteria as that of MHC Class-I. The B-cell epitopes of the proteins were selected using the Bepipered Linear Epitope Prediction method of the IEDB server. The topology of the chosen epitopes was determined using the TMHMM v2.0 server (http://www.cbs.dtu.dk/services/TMHMM/). The potential T-cell epitopes, whose topology, antigenicity, allergenicity, toxicity and conservancy was analysed, for ORF1ab leader proteins MT326102 and MT326175 are depicted in the following tables. Table 4 Generation. On the analysis of the antigenicity, allergenicity, toxicity, and conservancy analysis of the Tcell epitopes, it was found that most of them were antigenic, simultaneously being nonallergenic, non-toxic and higher values of conservancy. From the ten selected MHC Class-I and MHC Class-II T-cell epitopes, one from each category were selected from both the leader proteins. The criteria for being selected was having the higher antigenic scores, nonallergenicity, non-toxicity, and conservancy value above 90%. The selected epitopes were: RSDARTAPH, VQLNNNNNN, LSLPVLQVR, and VQLSLPVLQ. The B-cell epitopes of the ORF1ab leader proteins are displayed in the following two figures. The following figures depict the PEP-FOLD3 generated 3D structures of the selected T-cell epitopes of MHC Class-I and Class-II: RSDARTAPH, VQLNNNNNN, LSLPVLQVR, and VQLSLPVLQ. leader protein MT326175. ViPR: an open bioinformatics database and analysis resource for virology research SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis Biostrings: Efficient manipulation of biological strings Peptides: a package for data mining of antimicrobial peptides VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines Identification of novel vaccine candidates against campylobacter through reverse vaccinology The immune epitope database (IEDB) 3.0. Nucleic acids research Exploiting the Reverse Vaccinology Approach to Design Novel Subunit Vaccine against Ebola Virus PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides Improved PEP-FOLD approach for peptide and miniprotein structure prediction PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex HPEPDOCK: a web server for blind peptide-protein docking based on a hierarchical algorithm This research didn't receive any specific grant from funding agencies. The author declares that there is no conflict of interest. The study doesn't contain any work/experiment performed with human participants or animals done by the author. HPEPDOCK Server was used to perform docking of the peptide and protein. The purpose of the same was to analyse which of the two selected T-cell MHC Class-I epitope: RSDARTAPH and LSLPVLQVR had the lowest global energy. The epitope having the lowest global energy acts as a better vaccine candidate. The docking was performed against the HLA-A*11-01 allele whose pdb format was obtained through pre-docking using UCSF Chimera.The global energy for the selected Class-I epitopes were: -182.706 and -191.198 respectively for RSDARTAPH and LSLPVLQVR. Out of the two MHC Class-I epitopes selected for leader proteins ORF1ab MT326102 and MT326175, the global energy was lowest for LSLPVLQVR. For the MHC Class-II T-cell epitope, the docking was performed against the HLA DRB1*04-01 allele. Same procedure was followed as mentioned before. The two selected epitopes after the previous analysis were: VQLNNNNNN and VQLSLPVLQ. The global energy for the selected epitopes were -203.369 and -238.196 respectively.Out of the two T-cell MHC Class-II epitopes selected for the ORF1ab leader proteins MT326102 and MT326175, the lowest global energy was of VQLSLPVLQ. The scope of this study involved performing an in-silica analysis of the SARS-CoV-2 viral strain for the country of USA against human host. The ViPR database was used for the same to obtain all the protein sequences. A total of 14534 viral protein sequences were obtained whose extensive physicochemical analysis was done to select a group of two. This extensive analysis was performed using Peptides package in the R Software.It was revealed that the two leader proteins ORF1ab MT326102 and MT326715 had the highest extinction coefficient and the lowest score on the GRAVY. Along with the given parameters, these leader proteins were highly stable and were also antigenic in nature.The FASTA-formatted files of these selected proteins were taken and analysed to obtain the potential T-cell and B-cell epitopes. The T-cell epitopes of MHC Class-1 and MHC Class-II were analysed on the basis of their scores. Ten randomly selected T-cell epitopes from both the classes were taken for further analysis of allergenicity, toxicity, conservancy scores and antigenicity. Only one epitope from both the classes was selected which possessed a higher conservancy score (more than 90%), was non-toxic, non-allergic and antigenic in nature.The two selected epitopes were then docked against their respective alleles to obtain the global energy scores. The epitopes which displayed the lowest global energy score on docking with the alleles were selected and proposed as successful and potential vaccine candidates Funding: