key: cord-307701-fujejfwb authors: Gaurav, Shubham; Pandey, Shambhavi; Puvar, Apurvasinh; Shah, Tejas; Joshi, Madhvi; Joshi, Chaitanya; Kumar, Sachin title: Identification of unique mutations in SARS-CoV-2 strains isolated from India suggests its attenuated pathotype date: 2020-06-07 journal: bioRxiv DOI: 10.1101/2020.06.06.137604 sha: doc_id: 307701 cord_uid: fujejfwb Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), which was first reported in Wuhan, China in November 2019 has developed into a pandemic since March 2020, causing substantial human casualties and economic losses. Studies on SARS-CoV-2 are being carried out at an unprecedented rate to tackle this threat. Genomics studies, in particular, are indispensable to elucidate the dynamic nature of the RNA genome of SARS-CoV-2. RNA viruses are marked by their unique ability to undergo high rates of mutation in their genome, much more frequently than their hosts, which diversifies their strengths qualifying them to elude host immune response and amplify drug resistance. In this study, we sequenced and analyzed the genomic information of the SARS-CoV-2 isolates from two infected Indian patients and explored the possible implications of point mutations in its biology. In addition to multiple point mutations, we found a remarkable similarity between relatively common mutations of 36-nucleotide deletion in ORF8 of SARS-CoV-2. Our results corroborate with the earlier reported 29-nucleotide deletion in SARS, which was frequent during the early stage of human-to-human transmission. The results will be useful to understand the biology of SARS-CoV-2 and itsattenuation for vaccine development. Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is the cause of the novel human Corona Virus Disease COVID-19, first reported on November 17 th , 2019 in Wuhan, China [12] . It has spread rapidly creating a global public health emergency situation in the last few months and the World Health Organization (WHO) on March 11 th , 2020 declared it a pandemic. Till date, SARS-CoV-2 has infected close to five million people and caused more than 328,000 deaths globally. Although started late, India showed ramping of more than 112,000 reports of SARS-CoV-2 infection including more than 3,400 deaths, as of May 20 th , 2020. As compared to the other recent coronavirus epidemics SARS, 2003 and MERS (Middle East Respiratory Syndrome), 2012 having mortality rates of 11% [4] and 35% [8] , respectively, SARS-CoV-2 has a current mortality rate of about 3.4% [7] . However, the transmissibility of SARS-CoV-2 recorded far greater than the previous coronavirus epidemics. SARS-CoV-2 has a positive-sense 29.9 kbp RNA genome, encoding 12 open reading frames (ORFs) [6] . It codes for four structural proteins, namely spike glycoprotein (S), envelope glycoprotein (E), membrane glycoprotein (M), and nucleocapsid protein (N) [12] . S protein is a large membrane protein which helps in endocytosis of the viral particle into the host cell by binding to the angiotensin-converting enzyme 2 (ACE2) receptor [15] . E protein is a small viral transmembrane protein which helps in the assembly of the virions by forming ion channels inside the host cell [5] . M protein is the most abundant protein present on the viral membrane, important for morphogenesis and viral assembly [2] . N protein is responsible for the packaging of the viral genome and plays an important role in viral assembly [13] . The genome also codes for 2 overlapping polyproteins, namely polyprotein 1a (PP1a) and polyprotein 1ab (PP1ab); PP1a being a truncated version of PP1ab. These code for 16 non-structural proteins (NSPs) including proteases (3C-like protease and Papain-like protease) and RNA-processing enzymes like RNAdependent RNA polymerase, Helicase, 3'-5' Exonuclease, Endoribonuclease, Guanine N7methyltransferase, 2'O-ribose methyltransferase and ADP ribose phosphatase [12] . In addition to structural and NSPs, SARS-CoV-2 genome also codes for at least two other viroporin candidates (other than the E protein), namely ORF3a and ORF8 [3] . The function and the possible contribution of other ORFs in the viral infection are largely unknown. There is a global race to sequence the SARS-CoV-2 to understand its variability among the population and identify the unique mutation. Gujarat Biotechnology Research Centre (GBRC), Gujarat India, actively involved in the complete genome sequencing of SAR-CoV-2 strains from India. More than 144 complete genome sequences of SARS-CoV-2 strains from the state have been completed. The samples were collected from the COVID-19 testing center following the guidelines of Indian council of Medical research. The complete genome sequencing of the samples were carried out by the gene specific primer sets using the Ion S5™ next-generation sequencing system (Thermo Fisher Scientific, USA) following manufacturer's protocol. In this study, we are reporting the unique mutations in the SARS-CoV-2 genome isolates from India. The two samples from a male and female aged 66 (husband and wife) has been procured from the COVID-19 testing facility in Gujarat, India. Husband and wife contracted infection from their son who got it from local/community transmission. Both were recovered without having severe symptoms of COVID-19. The complete genome sequence analysis of SARS-CoV-2 isolated from both the patients showed a size of 29902 bp each (accession numbers MT435081 and MT435082), in contrast with 29903 bp size of the Wuhan strain. However, the single nucleotide deletions in the genome of virus isolates from these two patients were at different positions. Furthermore, the sequence revealed a total of ten mutations each across the genome as compared to SARS-CoV-2 strain from Wuhan. The details of the mutations in the Indian SARS-CoV-2 isolates areprovided in Table 1 . The genome sequence of SARS-CoV-2 from the virus isolates of two patients in Gujrat, India, were analyzed and compared with the isolate from Wuhan, China. The single nucleotide change in the viral genome could change the encoding amino acid, which might result in the conformational change in the viral structure protein [11] . The genome of virus isolates from both patients showed a point mutation of C241T corresponding to its genome length and lies in the 5' untranslated region (5' UTR). It is unlikely that this mutation has any substantial effect on viral replication. The nucleotide change of C1059T in the protein ORF1ab has directed threonine, a polar uncharged amino acid, to mutate into a hydrophobic amino acid, isoleucine. Threonine residues are well known to form hydrogen-bond interactions with surrounding polar residues in the interior of protein structure. The amino acid residues around this threonine at position 265 corresponding to the polyproteinORF1ab correspond to 262 K-F-D-T-F-N-G-E-C-P 271 suggesting that threonine might form hydrogen bonds with surrounding polar residues. Perhaps it might also assist in the formation of protein bend caused due to glycine and proline having the position of i, i+3 with respect to each other. Isoleucine being hydrophobic might disturb any conformational protein bend due to its inability to form a hydrogen bond. This mutation might also result in increased hydrophobic interactions with surrounding phenylalanine residues. Since very little is known about NSP2 or any of its homologous structure, no assertion at the structural level could be made. In the native form, NSP12 is bound with the cofactors NSP7 and NSP8 to form the RNA dependent RNA polymerase (RdRP) complex. The mutation of C14408T at NSP12 in Indian isolates of SARS-CoV-2 lead to the substitution of proline with threonine. The proline residue at position 323 corresponding to the polyprotein results in an unusual turn in the alpha-helix in which it resides [14] . Interestingly, the α -helix is one of the sites for the interaction of NSP12 with the cofactor, NSP8. However, this mutation of proline causes significant changes into the structure due to the disturbance of that turn in that α -helix (Fig. 1A) . This would probably change the entire structure of that interacting site, leading to the decreased affinity of RdRP to NSP8 which likely disturbs its functioning. Mutation study in the 3-D structure of NSP12 ( Mutation of A23403G causes the substitution of aspartic acid with glycine. This mutation in the S protein lies in the small loop turn between two anti-parallel beta-strands. These betastrands participate actively in the trimerization of spike protein which is vital for binding of S protein to its receptor ACE2 [14] . Replacing the aspartic acid residue with the much more flexible glycine might end up in disturbing the β -strands. Prediction of effect of the mutation on the secondary structure (Fig. 2 A) suggests that this mutation is likely to disrupt one of the four beta-strands in that region, thus possibly inhibiting the trimerization of spike protein on the surface of the virus. This could significantly decrease the affinity of S protein to ACE2 receptors as well. As a result of the mutation of G25563T, glutamine gets substituted into histidine in the (Fig. 2 B) suggests that the C-terminal of this protein (70-118 amino acids) in the wild-type ORF8 protein was found to have four uniformly-spaced beta-strands. However, this mutation was found to essentially chop off the fourth beta-strand. The highly organized secondary structure of the C-terminal of the protein is likely to have some function in the viral replication or infectivity. This mutation truncating the fourth beta-strand probably disturbs its function. Thus, this mutation might be severely affecting the functionality of the ORF8 protein. We also found this same mutation in the virus isolate from an unrelated individual suggesting that this mutation might be relatively common. This result has a striking similarity with that of an in-vivo experiment reported earlier on SARS CoV-1 where a 29-nucleotide deletion in the ORF8 gene was the most obvious genetic change in the early stage of human-to-human transmission of SARS CoV-1 [9] . Moreover, the 29-nucleotide deleted SARS CoV-1 strain had a 23-fold less viral replication as compared to its wild type, suggesting that this mutation effectively attenuated the virus. The conspicuous similarity with our result might shed light into the attenuation of the Indian strain of SARS-CoV-2 in humans. It would be interesting to further explore its possibility for vaccine development. Interestingly, the mutation in ORF8 was found when A at 28254 th position was found to be deleted. However, another stop codon was formed 12 nucleotides farther due to this can compromise the glycosylation process. Presence of glycine and proline residues around this region is expected to give the structure kink and expose these residues for a potential O-linked glycosylation process. Hence, this mutation of serine into isoleucine is expected to heavily inhibit and compromise this modification in this region. The authors declare no conflict of interest. Zhang H, Penninger JM, Li Y, Zhong N, Slutsky AS (2020) Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target. Intensive Care Med 46:586-590 Comparison between the predicted secondary structure of wild-type ORF8 protein (upper) and E110stop mutated ORF8 protein (lower). The E110stop mutation essentially truncates off the last beta-strand (110 th to 118 th amino acid) in the secondary structure of the protein. The other column includes, protein pertaining to the mutation site (protein annotation), reference Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2): An overview of viral structure and host response Role of Severe Acute Respiratory Syndrome Coronavirus Viroporins E, 3a, and 8a in Replication and Pathogenesis SARS: prognosis, outcome and sequelae In-silico approaches to detect inhibitors of the human severe acute respiratory syndrome coronavirus envelope protein ion channel The Architecture of SARS-CoV-2 Transcriptome Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges Comparative and kinetic analysis of viral shedding and immunological responses in MERS patients representing a broad spectrum of disease severity Attenuation of replication by a 29 nucleotide deletion in SARS-coronavirus acquired during the early stages of human-to-human transmission Formation and transfer of disulphide bonds in living cells A new integrated symmetrical table for genetic codes A new coronavirus associated with human respiratory disease in China Evidence for Gastrointestinal Infection of SARS-CoV-2 Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 We would like to thank the Department of Science and Technology, Government of Gujarat for financial support. nucleotide at the same position in Wuhan SARS-CoV-2 sequence, the mutated nucleotide (allele nucleotide), and the corresponding amino acid change due to the point mutation (amino acid mutation) in the full length genome sequence of the Indian virus isolates. Protein Annotation