key: cord-0842069-qxhd3xlj
authors: Bano, Iqra; Sharif, Mehmoona; Alam, Sadia
title: Genetic drift in the genome of SARS COV-2 and its global health concern
date: 2021-09-23
journal: J Med Virol
DOI: 10.1002/jmv.27337
sha: 28c13e59267755ad7ff863895007f3a9de2ac37d
doc_id: 842069
cord_uid: qxhd3xlj

The outbreak of the current coronavirus disease (COVID-19) occurred in late 2019 and quickly spread all over the world. The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) belongs to a genetically diverse group that mutates continuously, leading to the emergence of multiple variants. Although a few antiviral agents and anti-inflammatory medicines are available, thousands of individuals have died owing to the emergence of new viral variants. Thus, proper surveillance of the SARS-CoV-2 genome is needed for the rapid identification of mutations that develop over time, which are of major concern if they occur specifically in the surface spike proteins of the virus (neutralizing analyte). This article reviews the potential mutations acquired by SARS-CoV-2 since the pandemic began and their significant impact on the neutralizing efficiency of vaccines and the validity of diagnostic assays.

sequence alignment with phylogenetic investigation, SARS-CoV-2 is presently reported as the most recent member of the genus Betacoronavirus (β-CoV) within the Coronaviridae family and Nidovirales order. The Coronaviridae family comprises enveloped viruses with a nonsegmented, positive-sense single-stranded RNA (ssRNA) genome that has a cap at the 5′ end and a poly-A tail at the 3′ end and itself acts directly as mRNA for the synthesis of polyproteins. Based on analysis of the complete genome sequence, the genome of Beta-CoVs encodes several nonstructural proteins and four structural proteins: spike, membrane, envelope, and nucleocapsid. 1 The coronavirus genome is reported to be the largest among known RNA viruses, with 32%-43% GC content.
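The GC content quoted above is simply the fraction of G and C bases in a genome. As a minimal sketch, assuming the sequence is available as a plain string (the function name `gc_content` and the toy fragment are illustrative and not taken from the article), it could be computed as follows:

```python
def gc_content(sequence: str) -> float:
    """Return the GC fraction (0.0 to 1.0) of a nucleotide sequence.

    Ambiguous symbols such as N are ignored, so the denominator counts
    only unambiguous A, C, G, T and U bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T/U bases")
    return gc / total


# Made-up RNA fragment used purely for illustration (not a real viral sequence).
toy_fragment = "AUGGCUAGCUUAACG"
print(f"GC content: {gc_content(toy_fragment):.1%}")
```

For a real genome, the string would typically be read from a FASTA record; how ambiguous bases are treated is a design choice, and here they are simply excluded from the count.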
SARS-CoV-2 genome sequences range in length from 29.8 to 29.9 kilobases and contain 12 open reading frames (ORFs) encoding 27 different proteins. 9 More than 90% of the amino acids in the four structural genes of SARS-CoV-2 are identical to those of SARS-CoV, except for the S gene, which diverges. 10 The SARS-CoV-2 genome does not contain the hemagglutinin-esterase gene found in a few Beta-CoVs. 11 Approximately two-thirds of the SARS-CoV-2 RNA comprises the ORF1a/b region, the largest ORF (pp1ab), which encodes 16 nonstructural proteins (nsp1-16) for transcription and replication of the virus. The remaining one-third of the genome contains ORFs that encode structural and accessory proteins 12 (Figure 1). Evolutionary tree analysis of the complete genome showed a relationship between SARS-CoV-2 and other coronaviruses that originate from bats and are grouped within the subgenus Sarbecovirus and genus Betacoronavirus. The matrix representation with parsimony (MRP) pseudo-sequence supertree identified that RaTG13 (MN996532), bat-SL-CoVZC45 (MG772933), bat-SL-CoVZXC21 (MG772934), and the SARS-CoV-2 genomes constitute one major clade 13 (Figure 2).

FIGURE 1 Schematic description of the morphology and genome of SARS-CoV-2. (A) The virus is covered with S, M, and E proteins. Inside the phospholipid bilayer, the RNA is encased by the phosphorylated N protein. (B) The genome has 29903 nucleotide bases and comprises the 5′-UTR; ORF1a and ORF1b, which encode 16 nonstructural proteins; 4 structural genes encoding the S, M, N, and E proteins; 6 genes that code for the accessory proteins ORF3a, 6, 7a, 7b, 8, and 10; and the 3′-UTR. The vertical red lines with circles of the same color on the genome indicate the positions of 17 high-frequency mutations and co-mutations. 12 ORF, open reading frame; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; UTR, untranslated region
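Percent-identity figures such as those quoted in this section are derived from pairwise sequence alignments. The sketch below shows one common way to compute such a value, assuming two sequences that have already been aligned to equal length with `-` as the gap symbol. The function name and the toy peptides are hypothetical, and published values can differ depending on the alignment method and on how gapped columns are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length). Other conventions,
    such as dividing by the full alignment length, give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up aligned peptides (illustrative only, not real spike sequences).
seq_a = "ACDEFGHIKL"
seq_b = "ACDQFG-IKL"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Excluding gapped columns from the denominator is only one convention; when comparing values across studies, the definition used by each study should be checked.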
In particular, the closest relative of SARS-CoV-2 is RaTG13 (MN996532), which originated from the bat Rhinolophus affinis, as previously reported by phylogenetic analysis of SARS-CoV-2 based on genomic sequences. 14 The MRP pseudo-sequence supertree also showed a civet-sampled coronavirus (AY572035) as the closest relative of the SARS-CoVs. SARS-CoV-2 shares 79% genome sequence similarity with SARS and 50% with Middle East Respiratory Syndrome (MERS). 15 The spike protein of SARS-CoV-2 has 1273 amino acids, making it longer than those of SARS-CoV (1255) and bat SARSr-CoVs (1245-1269). SARS-CoV-2 differs from other members of the subgenus Sarbecovirus in its S protein: 76.7%-77.0% of its amino acid sequence is similar to that of SARS-CoVs from civets and humans, 75%-97.7% to that of bat coronaviruses within the same subgenus, and 90.7%-92.6% to that of pangolin coronaviruses. 10 Another unique feature of the SARS-CoV-2 genome is the presence of four amino acid residues (PRRA) at the junction of the S1 and S2 subunits of the spike protein. These residues create a polybasic cleavage site (RRAR) that permits efficient cleavage by furin and other proteases. Structural studies have confirmed that the furin cleavage site decreases the stability of the SARS-CoV-2 spike protein and promotes receptor binding. SARS-CoV-2 is also more transmissible than SARS-CoV owing to the presence of the furin cleavage site. 16 The genetic diversity of SARS-CoV-2 is critical for its competency, durability, and pathogenesis. One study on the origin of SARS-CoV-2 showed that the major sources of the genetic diversity of the virus are random mutation and recombination. 17 The mutation rate of SARS-CoV-2 is around 8 × 10⁻⁴ nucleotides/genome annually, which is very high for RNA viruses.
18 Analysis of 220 genome sequences in the database revealed that the mutation rate is higher in Europe and North America than in Asia. The SARS-CoV-2 genome has nine putative recombination patterns, comprising six recombinant regions in the S protein and one each in the RNA-dependent RNA polymerase, nsp13, and ORF3a. 19 Furthermore, genome analysis suggested that the receptor-binding element of SARS-CoV-2 might have emerged through recombination between a coronavirus found in pangolins and RaTG13. 20 Mutation in the S protein is a major concern because it might alter the tropism and pathogenicity of the virus. It has been predicted that mutation might enhance ACE-2 binding affinity, which is a key determinant of SARS-CoV-2 infectivity. 21 Mutation is one of the most important mechanisms responsible for the evolution of RNA viruses. 22 Studies of genomic variation in SARS-CoV-2 have revealed different types of genetic variation, including missense, synonymous, insertion, deletion, and noncoding mutations. 23 According to the WHO, among 5775 distinct variants, the most frequent types in SARS-CoV-2 were missense mutations (2969 variants) and synonymous mutations (1965 variants). 24 Genetic analyses in different studies have reported mutations in several regions, including ORF1ab, ORF3a, 6, 7, 8, and 10, as well as the S, N, E, and M genes. Among these, nsp1, nsp2, nsp3, nsp12, and nsp15 of ORF1ab, together with ORF8 and the S gene, carry a particularly large number of mutations. 25 In addition, two insertion mutations with known effects were identified in ORF1ab. 15 The most common mutations are 241C>T in the 5′-untranslated region (UTR), 14408C>T in nsp12, 3037C>T in nsp3, and 23403A>G. 26 In addition, noncoding mutations occur in the 5′-UTR and 3′-UTR and may affect the packaging and titers of SARS-CoV-2.
27 Various studies have found that frame-shift mutations also occur in different regions of the genome, except in the M gene. These deletions alter the 3D structure of the virus, which affects its virulence, pathogenesis, and host innate immune responses. 28 The sequence of the SARS-CoV-2 genome showed more spot mutations in nsp12 as compared to the Asian viral genome. 29 Co-mutations have also been reported, such as 241C>T (in the 5′-UTR) with 3037C>T (F105F), 28144T>C (L84S), and 23403A>G (D614G), as well as 8782C>T (S75S) with 28144T>C (L84S) and 18060C>T (L6L). In addition, the 241C>T leader mutation in the European viral genome coexisted with three mutations, 3037C>T (F105F), 14408C>T (P323L), and 23403A>G (D614G), and was associated with a high COVID-19 infection rate, which suggests that these four mutations play a key role in increasing viral transmission. 30 Similarly, in March 2020, another study showed that SARS-CoV-2 variants carrying G614 in the spike protein replaced the original D614 form and became the globally dominant form. According to the WHO, the largest clade was D614G, which had five subclades associated with it. Moreover, in almost every strain with the D614G mutation, the proteins for viral replication were also altered. As these proteins are targets for antiviral drugs such as remdesivir and favipiravir, strains of SARS-CoV-2 might become resistant to these drugs and multiply quickly. Based on the Global Initiative on Sharing All Influenza Data (GISAID) nomenclature system, the genomes of SARS-CoV-2 were separated into seven major clades such as L to which the reference strain of

FIGURE 2 Phylogenetic supertree illustrating the evolution of SARS-CoV-2 by using a protein source. 14 The MRP (matrix representation with parsimony) pseudo-sequence supertree was constructed from source phylogenetic trees for phylogenetic analysis of nine SARS-CoV-2 along with 5 SARS-CoV, 2 MERS-CoV, and 11 bat coronaviruses as outgroups.
MAFFT (Multiple Alignment using Fast Fourier Transform) was used to align the amino acid sequences, and the PHYLIP file was generated with Clustal W. The MRP supertree was constructed using the published supertree software Clann (version 4. 43

• Seems to spread more easily.
• L452R mutation enhanced attachment to ACE2.
• Three spike missense mutations
• Potential reduction in neutralization by mAb treatments, convalescent, and postvaccination sera. 39, 43
• E484K • D614G • V1176F • ORF1a • L3468V • L3930F
• Mutations in N-protein include: • A119S • R203K • G204R
Non-RBD: • D614G • D215G • D80A • A701V • L18F

9. B.1.525. First detected in the United Kingdom/Nigeria in December 2020. This lineage harbors the following spike mutations: 69del, A67V, 70del, 144del, D614G, E484K, F888L, Q677H.
• Reduced neutralization by convalescent and postvaccination sera. 46

10. B.1.526. First detected in New York in November 2020. Mutations: L5F, D253G, T95I, E484K, S477N, A701V, D614G.
• Reduced neutralization by convalescent and postvaccination sera. 46, 47

11. A.23.1. Uganda. Has 12-17 amino acid mutations (7 in the spike protein).
• Data are scarce, but the presence of E484K can be associated with a major concern of immune escape. 48

12. B.1.617. Most prevalent and common variant in India; emerged in late 2020. It has two prominent mutations in the critical receptor binding domain, i.e., E484Q and L452R.
• Increased transmission, possibly due to enhanced binding efficiency between viral spike proteins and human angiotensin-converting enzyme 2 (hACE2). 49-51
• Reduced sensitivity to vaccine (BNT162b2 mRNA)-elicited antibodies.
• Significant reduction in neutralization by postvaccination sera.

(Table 1). 42 Rees-Spear et al. 52

• D614G • P681R • E484Q • Q1071H

14. B.1.617.2. India in December 2020. Spike mutations include: G142D, T19R, 156del, 157del, L452R, R158G, D614G, D950N, P681R.
• Significant reduction in neutralization by postvaccination sera and EUA monoclonal antibody treatments. 48

• G142D • E484Q • D614G • T19R • L452R • D950N

and infection rate. 55 Diagnostic assays that rely mainly on the SARS-CoV-2 S protein are highly vulnerable, because mutations at this site can escape detection, which leads to an increased rate of false-negative results. In contrast, point mutations are less likely to occur in the N protein of the virus and are less likely to affect its function. Thus, diagnostic tests targeting the N protein are more efficient than those targeting the S protein, owing to its conserved sequence (limited mutations in the N protein) and strong immunogenicity. 56 Although the N protein is less likely to mutate, it is not invulnerable to mutations; hence, in vitro diagnosis and vaccine development must consider potential N-protein mutations. Moreover, diagnostic assays that rely on polyclonal antibodies have a significant advantage over tests that assess a single epitope using a mAb, because polyclonal antibodies recognize multiple epitopes simultaneously and are therefore more likely to report accurate results despite a mutation in any one epitope. 57 None of the novel SARS-CoV-2 variants, including 501Y in South Africa, D796H, H69/V70, and D614G, behaved as an escape variant when detected with polyclonal antibodies directed against the N protein. 58 Even the recent strain B.1.1.7, which has 17 mutations, could be detected using these antibodies and does not seem to drastically affect the Berlin-Charité protocol (98% of sequences can be detected with the present primers and probe), but it may challenge commercially available kits directed against the spike protein.
59 Recently, Vogels et al., 60

1. Origin and evolution of pathogenic coronaviruses
2. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-The latest 2019 novel coronavirus outbreak in Wuhan, China
3. Characteristics of and public health responses to the coronavirus disease 2019 outbreak in China
4. COVID-19 in early 2021: current status and looking forward
5. Effective treatment of severe COVID-19 patients with tocilizumab
6. COVID-19 vaccines: where we stand and challenges ahead
7. A novel coronavirus from patients with pneumonia in China
8. Return of the coronavirus: 2019-nCoV
9. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
10. A pneumonia outbreak associated with a new coronavirus of probable bat origin
11. Complete genome characterization of a novel coronavirus associated with severe human respiratory disease in Wuhan. bioRxiv (Preprint)
12. Genetics and genomics of SARS-CoV-2: a review of the literature with the special focus on genetic diversity and SARS-CoV-2 genome detection
13. Phylogenetic supertree reveals detailed evolution of SARS-CoV-2
14. Characteristics of SARS-CoV-2 and COVID-19
15. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet
16. The proximal origin of SARS-CoV-2
17. Mechanisms of viral emergence
18. Analysis of the mutation dynamics of SARS-CoV-2 reveals the spread history and emergence of RBD mutant with lower ACE2 binding affinity
19. Evolutionary trajectory for the emergence of novel coronavirus SARS-CoV-2
20. The receptor binding domain of SARS-CoV-2 spike protein is the result of an ancestral recombination between the bat-CoV RaTG13 and the pangolin-CoV MP789
21. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus
22. Moderate mutation rate in the SARS coronavirus genome and its implications
23. The establishment of reference sequence for SARS-CoV-2 and variation analysis
24. Large scale genomic analysis of 3067 SARS-CoV-2 genomes reveals a clonal geo-distribution and a rich genetic variations of hotspots mutations
25. Identification of the hypervariable genomic hotspot for the novel coronavirus SARS-CoV-2
26. Variant analysis of SARS-CoV-2 genomes
27. Mutation landscape of SARS-CoV-2 reveals five mutually exclusive clusters of leading and trailing single nucleotide substitutions. bioRxiv (Preprint)
28. Identification of multiple large deletions in ORF7a resulting in in-frame gene fusions in clinical SARS-CoV-2 isolates
29. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
30. Genotyping coronavirus SARS-CoV-2: methods and implications
31. Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology
32. Developing Covid-19 vaccines at pandemic speed
33. D614G spike mutation increases SARS CoV-2 susceptibility to neutralization
34. Evolutionary and structural analyses of SARS-CoV-2 D614G spike protein mutation now documented worldwide
35. Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2
36. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity
37. Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses
38. Sixteen novel lineages of SARS-CoV-2 in South Africa
39. Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil
40. Neutralization of SARS-CoV-2 spike 69/70 deletion, E484K and N501Y variants by BNT162b2 vaccine-elicited sera
41. SARS-CoV-2 variant B.1.1.7 is susceptible to neutralizing antibodies elicited by ancestral Spike vaccines
42. SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
43. Multiple SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity
44. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
45. Imported SARS-CoV-2 Variant P.1 in Traveler Returning from Brazil to Italy
46. SARS-CoV-2 spike E484K mutation reduces antibody neutralization. The Lancet Microbe
47. A novel and expanding SARS-CoV-2 variant, B.1.526, identified in New York. medRxiv
48. Novel SARS-CoV-2 variants: the pandemics within the pandemic
49. Possible link between higher transmissibility of B.1.617 and B.1.1.7 variants of SARS-CoV-2 and increased structural stability of its spike protein and hACE2 affinity
50. Neutralization of variant under investigation B.1.617 with sera of BBV152 vaccinees
51. Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies
52. The effect of spike mutations on SARS-CoV-2 neutralization
53. A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants
54. Phylogenetic analysis and structural modeling of SARS-CoV-2 spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop
55. Mutations strengthened SARS-CoV-2 infectivity
56. The nucleocapsid protein of SARS-CoV-2: a target for vaccine development
57. Overlooked benefits of using polyclonal antibodies
58. Could mutations of SARS-CoV-2 suppress diagnostic detection?
59. Will the emergent SARS-CoV2 B.1.1.7 lineage affect molecular diagnosis of COVID-19?
60. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT-qPCR primer-probe sets
61. A recurrent mutation at position 26340 of SARS-CoV-2 is associated with failure of the E gene quantitative reverse transcription-PCR utilized in a commercial dual-target diagnostic assay

The authors declare that there are no conflicts of interest.

All data presented in the article are included in the manuscript.

http://orcid.org/0000-0003-3102-8252