key: cord-1056109-8ihybuto authors: Sabir, Dana Khdr title: Analysis of SARS-COV2 spike protein variants among Iraqi isolates date: 2021-11-04 journal: Gene Rep DOI: 10.1016/j.genrep.2021.101420 sha: b4d905dd9d40d155d76a456b058c76d692f1cfa1 doc_id: 1056109 cord_uid: 8ihybuto

The ongoing COVID-19 pandemic, caused by the SARS-CoV-2 virus, has led to millions of deaths around the globe. The emergence of several variants of the virus with increased transmissibility, increased disease severity, and the ability to escape the immune system is a cause for concern. Here, we compared the spike protein sequences of 91 human SARS-CoV-2 strains from Iraq with the first reported sequence of the SARS-CoV-2 isolate Wuhan-Hu-1 from China. The strains were isolated between June 2020 and March 2021. Twenty-three distinct mutations at 22 sites were identified within the spike protein: L5F, L18F, T19R, S151T, G181A, A222V, A348S, L452 (Q or M), T478K, N501Y, A520S, A522V, A570D, S605A, D614G, Q675H, N679K, P681H, T716I, S982A, A1020S, and D1118H. The most frequent mutation was D614G (87/91), followed by S982A (50/91) and A570D (48/91). In addition, a distinct shift was observed in the type of SARS-CoV-2 variants present in 2020 compared with the 2021 isolates. In 2020, the B.1.428.1 lineage appeared to be the dominant variant (85%). However, the diversity of the variants increased in 2021, and the majority (73%) of the isolates appeared to belong to the B.1.1.7 lineage (VOC/Alpha variant). To our knowledge, this is the first major genome analysis of SARS-CoV-2 in Iraq. The data from this research could provide insights into SARS-CoV-2 evolution and could potentially be used to identify effective vaccines against the disease.

The highly contagious coronavirus disease 2019 (COVID-19) is an infection of the human respiratory system caused by SARS-coronavirus-2 (SARS-CoV-2).
The disease was first reported in the city of Wuhan, China, in December 2019 (1), and it was declared a pandemic by the WHO on March 11, 2020 (2). Inhalation of and/or direct contact with infected droplets is the main route of disease transmission (3). In Iraq, the first confirmed case of COVID-19 was reported on 24 February 2020 in Al-Najaf city, south of Baghdad (4). As of 30 June 2021, more than 1.6 million people had contracted the disease in Iraq, with more than 18,000 deaths (5). The etiological agent of COVID-19 is SARS-CoV-2, a positive-sense single-stranded RNA virus belonging to the β-genus of the Coronaviridae family (1, 6). The genome is about 29 kb in size and has 14 open reading frames (ORFs) encoding 27 proteins, including four structural proteins: spike (S), envelope (E), membrane (M), and nucleocapsid (N) (6, 7). The spike protein is a key target for diagnosis and vaccine development (8), since it plays an important role in the pathogenicity of the virus by interacting with cellular receptors such as angiotensin-converting enzyme 2 (ACE2) to enter human cells (1). It is made up of 1273 amino acid residues, and its structure was first determined using cryo-electron microscopy (PDB: 6VSB) (8). The protein is composed of two subdomains, S1 and S2. Residues 1 to 685 are located in the S1 subdomain, whereas the rest are located in the S2 subdomain, which is used to fuse with and enter the target cells [13, 14]. The receptor-binding domain (RBD; residues 319 to 541) is an important region of the S1 subdomain that enables the virus to interact with ACE2 on human cells [11, 12]. Several variants of SARS-CoV-2 have been reported around the world, each possessing one or more mutations in the virus genome that can affect the transmissibility of the virus and the immunological response of the host (9).
A variant carrying the D614G mutation in the spike protein, also known as the G clade or B.1, was detected during the early days of the pandemic in North America and then reported in many European countries (10). Another variant, carrying the N439K mutation in the receptor-binding domain (RBD) of the spike protein, has also been reported independently in many countries in Europe and in the USA. The N439K variant has been shown to have increased pathogenicity and can escape neutralizing monoclonal antibodies and reduce the activity of some polyclonal responses (11). The B.1.1.7 lineage (Alpha variant, a Variant of Concern (VOC)) is a variant of SARS-CoV-2 that was first identified in the UK in late 2020 (12). The Delta variant (B.1.617.2) (14, 15) is characterized by several mutations in the spike protein, including L452R, T478K, and P681R (15). A study has shown that this variant is 60% more transmissible than the Alpha variant (B.1.1.7) (15). Other variants include the P.1 (Gamma) lineage (20J/501Y.V3), which has mainly spread in Brazil and carries 17 mutations, three of which are located in the spike protein (K417T, E484K, and N501Y) (16). In this study, we describe the different variants of SARS-CoV-2 that spread in Iraq from June 2020 to March 2021, based on a comprehensive analysis of 91 spike protein sequences of the virus. This study should give insight into the evolution and transmissibility of the virus in the region.

Spike protein sequences of Iraqi SARS-CoV-2 strains were obtained from the publicly accessible Global Initiative on Sharing All Influenza Data (GISAID) database on 18 June 2021. Multiple sequence alignments were created using the online program Clustal Omega at the European Bioinformatics Institute (https://www.ebi.ac.uk/). The alignments were carefully inspected to identify the mutations within the 91 Iraqi isolates relative to Wuhan-Hu-1 (accession number: NC_045512.2) (1).
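The comparison of each aligned isolate against the Wuhan-Hu-1 reference was done by inspection in this study. As a minimal sketch of how that step could be automated, the snippet below walks through a pair of already-aligned sequences and reports substitutions in the usual notation (reference residue, reference position, isolate residue). The function name `call_substitutions` and the toy sequences are assumptions made for illustration; they are not part of the study, and real input would come from the Clustal Omega output.

```python
from typing import List


def call_substitutions(ref_aligned: str, query_aligned: str, gap: str = "-") -> List[str]:
    """List amino acid substitutions of an isolate relative to a reference.

    Both inputs must be rows of the same alignment (equal length).
    Positions are numbered along the reference, so alignment columns
    where the reference has a gap do not advance the residue counter.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")

    substitutions = []
    ref_pos = 0
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res == gap:
            continue  # insertion in the isolate: no reference residue here
        ref_pos += 1
        if query_res == gap or query_res == "X":
            continue  # deletion or unresolved residue: not a substitution
        if query_res != ref_res:
            substitutions.append(f"{ref_res}{ref_pos}{query_res}")
    return substitutions


# Toy 19-residue fragments (hypothetical, for illustration only).
toy_reference = "MFVFLVLLPLVSSQCVNLT"
toy_isolate = "MFVFFVLLPLVSSQCVNFT"

print(call_substitutions(toy_reference, toy_isolate))  # ['L5F', 'L18F']
```

Numbering along the reference rather than along the alignment columns keeps the reported positions comparable between isolates even when some of them carry insertions or deletions.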
The locations of the amino acids within the structure of the spike protein were mapped based on the structural information for the S protein available in the Protein Data Bank (PDB: 6VXX). Visualization of the protein structure was carried out using UCSF Chimera software.

At the time of writing, a total of 91 SARS-CoV-2 genome sequences from Iraq were available in the GISAID database. The GISAID accession numbers, collection dates, and the age and gender of the patients are shown in Table 1. Overall, an increase in genome diversity and in the number of viral lineages was observed in the SARS-CoV-2 isolates from 2021 compared with 2020. Such an increase in viral genome diversity derives from mutations that occur as a result of viral replication (12). Although coronaviruses have a genetic proofreading mechanism to maintain their genome (17) and SARS-CoV-2 has low genetic diversity (18), several variants of the virus have been recorded around the world since the first identification of the virus in December 2019 (19, 20). Knowing which viral variants are circulating in the region is important for the effective use of vaccines and therapeutic agents. Among the 33 genome sequences from 2020, B.1.428.1 appeared to be the dominant lineage (85%, 28 strains). This was followed by B.1.1.7 (Alpha/VOC; 6%, 2 strains) and by the B.4, B.1, and B.1.177.80 lineages (3%, 1 strain each). The B.1.428.1 lineage is suggested to be highly transmissible and to have contributed strongly to the first pandemic waves in Qatar (22). Also, B.1.428 is categorized as Epsilon (a Variant of Interest, VOI) by the WHO (23). In terms of genome diversity, the B.1.428.1 lineage is characterized by several mutations in the genome, including two mutations in the spike protein region, A522V and D614G (https://www.gisaid.org/).
These mutations can increase the transmissibility and infectivity of the virus compared with the wild type (10, 24); this variant is reported to be 10 times more infectious than the original SARS-CoV-2 (20). In addition, these mutations in the spike protein might affect the correct diagnosis of the virus and the severity of the disease (25). However, they neither increase the affinity of the protein for ACE2 nor affect the neutralizing action of antibodies (24). Thus, a vaccine based on the wild-type spike protein should be equally effective against this variant (24). Only one of the 2020 isolates was classified as belonging to the B.4 lineage. The strain was isolated in October 2020 and has the accession number EPI_ISL_907075. Finding the B.4 lineage among Iraqi strains is not surprising, since this lineage has been shown to have originated in Iran (26, 27) and was mainly distributed in Asian countries between March and mid-May 2020 (https://cov-lineages.org/). Additionally, people returning from Iran were thought to be the main factor in initiating the spread of SARS-CoV-2 in Iraq (4). Moreover, since the mutations of this lineage are located in the non-structural proteins (NSPs), which are either enzymes or functional proteins involved in viral replication and methylation, the variant is less likely to affect the pathogenicity of the virus or the host immune response (28). Feghali et al. (2021) (29) also reported that the predominant lineage of SARS-CoV-2 strains isolated between 2 February 2020 and 15 March 2020 in Lebanon was B.1, followed by B.4. Moreover, another of the 2020 isolates was found to belong to the B.1 lineage (or 20 clade). This lineage is characterized by a mutation in an NSP and the D614G mutation in the spike protein region (30). It has increased transmissibility; it first appeared in early 2020 in Europe and then became a predominant variant in many countries around the world (30).
There were three genome sequences of SARS-CoV-2 isolated in December 2020. These strains showed increased genome diversity compared with strains isolated earlier in the same year. Two of the strains belonged to the B.1.1.7 variant, with accession numbers EPI_ISL_1524332 and EPI_ISL_1524344, and the third strain belonged to the B.1.177.80 variant, with accession number EPI_ISL_2383317. The B.1.1.7 variant (Alpha variant/VOC (12, 31)) was first identified in the UK. The variant was also reported among isolates from Karnataka, India, in December 2020 (32). This variant is characterized by higher transmissibility than most other variants (31). However, there was no indication that the B.1.1.7 variant was linked to the severity of illness or mortality (31). On the other hand, the B.1.177.80 lineage, which was identified among the 2020 strains and is also called the Scandinavian lineage, has been reported in several countries around the world, mainly Sweden, Norway, and Denmark, but also Iraq and Switzerland (https://www.isitzen.com/). The variant possesses several mutations in the spike protein, including A222V, D614G, and L18F. To our knowledge, there is no information about the transmissibility of this variant or the severity of the disease it causes; however, it is expected to spread faster than the wild type since it carries the D614G mutation.

SARS-CoV-2 is an RNA virus, and mutations can occur naturally during viral replication, which can lead to variation (12). These mutations, particularly if they occur in the spike protein region, can change the transmissibility and pathogenicity of the virus, as well as the effectiveness of vaccines and therapeutic antibodies against it (31, 33). Non-synonymous mutations in the spike protein region were analysed among the genomes of the 91 SARS-CoV-2 isolates.
At least one mutation was found in every strain except EPI_ISL_907075, whose genome sequence did not have a mutation in the spike protein region. However, the mutations V198I in NSP2, L37F in NSP6, and T113I in NSP14 were recorded in this strain. In total, 23 non-synonymous mutations were distributed over 22 distinct sites of the spike protein. D614G was the most predominant mutation, carried by 96% (87 strains) of the sequenced strains (Figure 4). D614G was also reported to be the dominant mutation (84.20%) among 2634 SARS-CoV-2 genome sequences from Qatar (22). SARS-CoV-2 variants carrying the D614G mutation have been shown to have altered fitness and increased transmissibility compared with the wild type (10, 33-35). However, this mutation does not affect the severity of the disease caused by the variant (10, 33-35). Other mutations that appeared frequently among the Iraqi isolates were S982A (55%), A570D (53%), P681H (52%), and D1118H (51%). Limited research has been carried out on the effects of each mutation. However, P681H, which can be found in the Alpha (20I, V1), Kappa (21B), and Delta (21A) variants, is located near the furin cleavage site, and mutation of P681 to H can increase the transmissibility of the virus (28). Similarly, D1118H, which is found in the Alpha (20I, V1, B.1.1.7) lineage, can increase the transmissibility of the virus (28). Within the spike protein region, six of the mutations were located in the receptor-binding domain (RBD; residues 319-541). These mutations were L452 (Q, M), T478K, N501Y, A520S, and A522V. Residue 452 in the RBD of the spike protein had two variants: L452Q (accession number EPI_ISL_1524379) and L452M (accession number EPI_ISL_1524347). Changes in the RBD region can alter the sensitivity of the variants to neutralizing monoclonal or polyclonal antibodies.
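The prevalence figures quoted for the most frequent mutations follow directly from the isolate counts. As a small worked check, the sketch below recomputes them from the counts given in the abstract (87, 50, and 48 of 91 isolates) and rounds to whole percentages, as the text does. The names `prevalence_percent` and `mutation_counts` are illustrative assumptions, not part of the study's workflow.

```python
TOTAL_ISOLATES = 91  # Iraqi spike sequences analysed in the study

# Isolate counts for the three most frequent mutations, as given in the abstract.
mutation_counts = {"D614G": 87, "S982A": 50, "A570D": 48}


def prevalence_percent(count: int, total: int) -> int:
    """Return the share of isolates carrying a mutation, as a whole percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


prevalence = {
    mutation: prevalence_percent(count, TOTAL_ISOLATES)
    for mutation, count in mutation_counts.items()
}

print(prevalence)  # {'D614G': 96, 'S982A': 55, 'A570D': 53}
```

The same helper reproduces the lineage share reported for 2020: `prevalence_percent(28, 33)` gives 85 for B.1.428.1 among the 33 sequences from that year.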
Previously, the L452R variant was shown to be resistant to the monoclonal antibodies (mAbs) X593 and P2B-2F6 (20, 28); however, mutation of leucine (L) to glutamine (Q) or methionine (M) in the Iraqi strains could have a different effect on the sensitivity of the strain to host and/or vaccine-induced immunity. This is particularly likely for the leucine-to-glutamine mutation, since these two amino acids have different biochemical properties (leucine has a hydrophobic side chain, whereas glutamine has a polar uncharged side chain). The T478K mutation in the RBD can be found in both the Delta and Kappa (B.1.617.2/1) variants of the virus, but not in Alpha (B.1.1.7), Beta (B.1.351), or Gamma (P.1) (28). This mutation has been shown to increase transmissibility and affinity for ACE2. In addition, it helps the virus to escape the host immune response (28). N501Y and A522V were the most frequent mutations in the RBD of the Iraqi strains, at 37% and 53%, respectively (Figure 4). The N501Y mutation can be found in the Alpha, Beta, and Gamma variants, but not in Delta or Kappa. This mutation, like T478K, increases the transmissibility of the virus (28). However, N501T reduces the binding affinity of the spike protein for human ACE2 (36).

In summary, our data emphasise the increased genome diversity of SARS-CoV-2 in Iraq. The B.1.428.1 variant appeared to be the prevalent lineage in the second half of 2020, whereas B.1.1.7 appeared to be the dominant lineage in early 2021. Twenty-three mutations were detected at 22 sites of the spike protein of the Iraqi strains.

World Health Organization. Coronavirus disease 2019 (COVID-19) situation
Impacts of novel pandemic coronavirus (COVID-19) outbreak on dental practice: A review of the current literature
Challenges Facing Iraq to Tackle the Spread of COVID-19: An Overview
World Health Organization.
Coronavirus disease 2019 (COVID-19) Situation Report -59
Children Protection Against COVID-19 at the Pandemic Outbreak
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging microbes & infections
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Genetic variants of SARS-CoV-2-what do they mean?
Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
Detection of a SARS-CoV-2 variant of concern in South Africa
European Centre for Disease Prevention and Control. Threat assessment brief: emergence of SARS-CoV-2 B.1.617 variants in India and situation in the EU/EEA
Reduced sensitivity of SARS-CoV-2 variant Delta to antibody neutralization
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus
Coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics
Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States
The global population of SARS-CoV-2 is composed of six major subtypes
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
One year of SARS-CoV-2: Genomic characterization of COVID-19 outbreak in Qatar. medRxiv. 2021.
World Health Organization. COVID-19 weekly epidemiological update
SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
Centers for Disease Control and Prevention. SARS-CoV-2 variant classifications and definitions
COVID-19 WHO African Region: External situation report 19
An emergent clade of SARS-CoV-2 linked to returned travellers from Iran. Virus evolution
Emerging SARS-CoV-2 variants of concern and potential intervention approaches
Genomic characterization and phylogenetic analysis of the first SARS-CoV-2 variants introduced in Lebanon
Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020
Genomic characteristics and clinical effect of the emergent SARS-CoV-2 B.1.1.7 lineage in London, UK: a whole-genome sequencing and hospital-based cohort study. The Lancet Infectious Diseases
Importation, circulation, and emergence of variants of SARS-CoV-2 in the South Indian State of Karnataka
Spike mutation D614G alters SARS-CoV-2 fitness
Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant
Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity
Structural basis of receptor recognition by SARS-CoV-2

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.