key: cord-1041068-ien7z5gd authors: Gulzar, Sumaira; Husssain, Saqib title: Bioinformatics Study on Structural Proteins of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-COV-2) For Better Understanding the Vaccine Development date: 2020-04-23 journal: bioRxiv DOI: 10.1101/2020.04.21.053199 sha: d5ea77a01a50176b531119836279d718f33655da doc_id: 1041068 cord_uid: ien7z5gd Novel coronavirus 2019 (2019-nCoV), also known as SARS-CoV-2), leads high morbidity and mortality in global epidemics. Four structural proteins (surface glycoprotein (QIQ22760.1), envelop glycoprotein (QIQ22762.1), nucleocapsid phosphoprotein (QIQ22768.1) and membrane glycoprotein (QIQ22763.1)) of SARS-CoV-2 are extracted from the NCBI database and further analyzed with ExPASy ProtParam tool. Lucien is the highest in envelope, surface and membrane glycoprotein that is an optimal environment for rapid virus fixation on host cell's surface to the receptor molecule. Transmembrane region prediction was performed by SOSUI server. For all structural proteins, except nucleocapsid Phosphoprotein, the trans-membrane prediction indicates that the virus can enter the host easily. Domain analysis was done by SMART tool. Domain information helps in the function of the viral protein. Lastly, the 3D structure prediction was carried out by Swiss Model and the result validation was achieved by PROCHECK. Such models are the starting point of the community for structural drug and vaccine designs as well as virtual computational screening. The latest 2019-nCoV, now officially known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was established as the responsible pandemic. Infections of (SARS-CoV-2) have been reported in more than 200 countries including Pakistan [1] . The first genome sequence of SARS-CoV-2 to be published from Pakistan and is now available on NCBI https://www.ncbi.nlm.nih.gov/nuccore/MT240479, GISAID and NEXTRAIN. Coronaviruses, a genus belonging to the Coronaviridae family, contain the largest genome (26.4 kb to 31.7 kb) of RNA viruses with a diameter of 120-160 nm, consisting of a single-stranded positive-sense RNA molecule [2] The broad genome has given this family of viruses extra plasticity in their adaptation and alteration of genes. The G + C content of coronavirus genomes ranges from 32% TO 43% The genome consists of seven genes arranged in sequence [5′replicase ORF1ab, spike (S), envelope (E), membrane (M), nucleocapside (N)-3′] with small untranslated regions in both termini and additional ORFs in each subgroup of coronavirus. [3, 4] 5 'Non-structural protein coding regions consisting of two-thirds genome replicase genes 1 and structural and non-essential gene. 2-7 regions consisting of structural and non-essential protein coding regions [5] . The genes are translated from genomic mRNA 2 -7. Subgenomic RNAs encode the main Surface Proteins (S), Envelope Protein (E), Membranes Protein (M), and Nucleocapside Protein [6] . Surface protein of SARS CoV-2 join ACE2 (angiotensin-converting enzyme2) and to infect cells. After this initial process, Surface Protein must be produced with an enzyme known as protease to complete entry into the cell. SARS CoV-2 uses TMPRSS 2 in the same way as SARS-CoV is used to complete this process [8] Surface Glycoproteins are outside of the virion and give the typical shape to the virion. The S proteins form homotrimers that allow sun-like morphologies to be developed that give the name Coronaviruses via the Cterminal transmembrane regions, S proteins bind to the virion membrane and interact with M proteins. Virions can be attached through the N-terminus of the S proteins to different surface receptors in the host cell's plasma membrane. The S protein is the receptor-binder and viral input in host cells and is therefore a major therapeutic objective [9] . Membrane Glycoproteins proteins inside the Golgi system are glycosylated. The modification of the M protein is essential for the virion to attach into the cell and make antigenic protein [10] . The protein M plays a key role in the cell's regeneration of virions. N protein forms a complex by binding to In this paper, we tried to describe the four structural proteins of SARS-CoV-2 by the use of bioinformatics tools II. MATERIALS AND METHODS Four structural proteins of SARS COV-2 included in this research named as surface glycoprotein (QIQ22760.1), envelop glycoprotein (QIQ22762.1), nucleocapsid phosphoprotein (QIQ22768.1), membrane glycoprotein (QIQ22763.1) available in the NCBI was retrieved. For further study of bioinformatics, the FASTA sequence was selected and used. Using the Protparam method, protein statistics were reviewed. ProtParam method enables the measurement of various physicochemical protein sequence parameters. Parameters included molecular weight, Theoretical Pi (Isoelectric point), Half-life calculation, instability index, aliphatic index and Grand range of hydropathicity (GRAVY) Trans-membrane sequence prediction SOSUI tools from EXPASY repository are used for predicting transmembrane. Results normally return within 1 min. It provides the following production number of TMHs with sequence and protein type One of the main goals of molecular biology is functional assignment to protein. Protein functional sites that are responsible for or perform all of the essential protein functions are more conserved than other regions over the time of evolution. These accessible websites are called domains. SMART Internet server was forecasting domains The SARS COV-2 structural protein amino acid sequences are used as targets for homology modeling using the SWISS-MODEL server. Depending on the Global Model Quality Estimation (GMQE) and QMEAN, the top-ranked models are further analyzed and sorted. GMQE is a quality estimate that combines the targettemplate alignment properties with the template search method. The corresponding GMQE value is given as 0 to 1. The QMEAN Z-score gives a global estimation of the "degree of nativeness" of the structural characteristics observed in the model and is described in Benkert et al. III. RESULTS Phenyl alanine and serine are absolutely missing in all proteins tested as displayed in chemical parameters (Table 1) . Histidine and glutamic acid is absent in envelope protein and cystein lacks in nucleocapsid phosphoprotein. lucien quantity is more in envelop protein, surface and membrane glycoprotein while glycine is more in nucleocapsid phosphoprotein The physical parameters (Table 2) show that Surface protein contain highest amino acids (1273), negatively charged residues (110) positively charged residues (103), EC (148960) and molecular weight (141178.47). Higher molecular weight of the surface glycoprotein suggests that its tertiary structure may contain strong amino acid side chains. All three Proteins are basic in nature, excluding surface glycoprotein, as the isoelectric point value is more than 9.The instability index value greater than 40 is considered unstable [21] This is derived from the Instability index surface, membrane and envelop glycoprotein is stable and nucleocapsid phosphoprotein is highly unstable .the range of aliphatic index from 33.01 -55.09 that suggests a tendency to be sensitive to a wide range of temperatures [22] and GRAVY value shows protein's hydropathicity and whether the nature protein side chains are hydrophilic or hydrophobic [23] .Leaving the surface Glycoprotein and nucleocapsid phosphoprotein all other protein are hydrophobic in nature Analysis of the transmembrane protein reveals ( Table 3) (Table 5 and Figure. 2) shows the evaluation of the 3D model that all four proteins have satisfactory model with QMEAN > 0.Model validity research Shows that the models are at the optimum point, just like in the Ramchandran series. (Figure 3 ). The SWISS-MODEL system was used for protein structure homology modelling and alignments for all structural proteins of SARS COV 2. First, the right protein template structures in PDB were selected using the given measures: should high reporting of the template (i.e. > 60% target associated with the template) and sequence uniqueness > 35%. Instead, as an initial criterion, we used the GMQE and QMEAN4 scoring method to distinguish well from poor ones We also carried out PROCHECK analysis to measures protein structure stereochemical consistency by comparing geometry with residue-by-residue and particularly structural geometry. There should be more than 90 % amino acid for good quality model according to the PROCHECK standard acid resided in the most favored regions. Table 5 and Figure 4 indicate that percentage of modeled proteins residues in most favoured regions (red) is 83%─91.4%, percentage of residues of the modeled proteins in additional allowed regions (yellow) is 8.6─15.6% , percentage of residues of the modeled proteins in generously allowed regions (beige) is 0.0─1.5%, and percentage of residues in disallowed regions (white) is 0.0% These findings show that the models are generated overall stereochemical properties were highly stable, and that future molecular modelling studies can benefit from the models. Of those four proteins, the Ramachandran plots provide more proof of their acceptability ( Figure 3 ). IV. CONCLUSIONS The physical parameters shows that Surface protein contain highest amino acids, negatively charged residues, positively charged residues, EC and molecular weight than other proteins. The chemical factors show in proteins phenyl alanine and serine is totally absent in all structural proteins, there is a lack of histidine and glutamic acid in the protein envelope and a lack of cysteine in the nucleocapside phosphoprotein. The envelope, surface and membrane glycoprotein is rich in Leucine so have more affinity to the host cell receptor surface as stated in Luo et al, 1999. Although glycine is more in nucleocapside phosphoprotein, researchers should be able to isolate the protein with little effort along with these data and iso-electric point value. Higher molecular weight of the surface glycoprotein suggests that its tertiary structure may contain strong amino acid side chains. Instability index surface, membrane and envelop glycoprotein is stable and nucleocapsid phosphoprotein is highly unstable Leaving the surface Glycoprotein and nucleocapsid phosphoprotein all other protein are hydrophobic in nature. The prediction of trans-membrane in all structural proteins except Nucleocapsid Phosphoprotein shows the virus ability to reach the host with ease. This fairly simple method may help us understand how antivirals and vaccines could be produced against it Furthermore, as a beginning for docking studies (small and large scale) we are now offering homology models, Diverse knowledge of the molecular biology of SARS COV_2 is required to learn more. Developing technologies will gain valuable insight into the structure of the protein in order to determine how protein disease induces, and understanding the relationship between protein-protein and protein RNA would greatly enhance our ability to develop vaccines. Meanwhile, methods of molecular simulation provide important solutions to the struggle Severe acute respiratory syndrome-related coronavirus-The species and its viruses, a statement of the Coronavirus Study Group Ninth report of the international committee on taxonomy of viruses Comparative analysis of complete genome sequences of three avian coronaviruses reveals a novel group 3c coronavirus Comparative analysis of twelve genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features The molecular biology of coronaviruses Proliferative growth of SARS coronavirus in Vero E6 cells SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor Cell Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission MERS-CoV spike protein: A key target for antivirals The glycosylation status of the murine hepatitis coronavirus M protein aff ects the interferogenic capacity of the virus in vitro and its ability to replicate in the liver but not the brain Coronavirus particle assembly: primary structure requirements of the membrane protein The membrane M protein of the transmissible gastroenteritis coronavirus binds to the internal core through the carboxyterminus Characterization of the coronavirus mouse hepatitis virus strain A59 small membrane protein E Coronavirus pseudo particles formed with recombinant M and E proteins induce alpha interferon synthesis by leukocytes Nucleo capsidindependent assembly of coronavirus-like particles by co-expression of viral envelope protein genes A severe acute respiratory syndrome coronavirus that lacks the E gene is attenuated in vitro and in vivo The small envelope protein E is not essential for murine coronavirus replication The intracellular sites of early replication and budding of SARS corona virus Toward the estimation of the absolute quality of individual protein structure models Structural diversity of G protein-coupled receptors and significance for drug discovery A simple method for displaying the hydropathic character of a protein A simple method for displaying the hydropathic character of a protein All authors acknowledge and thank their respective Institutes and Universities Research officer at Genome center, International Center for Chemical and Biological Sciences University of Karachi Pakistan