key: cord-0284858-l95s23r2 authors: Al-Rashedi, N. A. M.; Alburkat, H.; Hadi, A. O.; Munah, M. G.; Jasim, A. H.; Hameed, A.; Oda, B. S.; Lilo, K. M.; AlObaidi, L. A. H.; Vapalahti, O.; Sironen, T.; Smura, T. title: High prevalence of an alpha variant lineage with a premature stop codon in ORF7a in Iraq, winter 2020-2021 date: 2021-10-22 journal: nan DOI: 10.1101/2021.10.20.21265042 sha: 009ee542d8aedb5d6b33ba1ea057c6a61a8646be doc_id: 284858 cord_uid: l95s23r2
Background: Since the first reported case of coronavirus disease 2019 (COVID-19) in China, SARS-CoV-2 has been spreading worldwide. Genomic surveillance of SARS-CoV-2 has had a critical role in tracking the emergence, introduction, and spread of new variants, which may affect transmissibility, pathogenicity, and escape from infection- or vaccine-induced immunity. As anticipated, the rapid increase in COVID-19 infections in Iraq in February 2021 is due to the introduction of variants of concern during the second wave of the COVID-19 pandemic.
Aim: To understand the molecular epidemiology of SARS-CoV-2 during the second wave in Iraq (2021).
Methods: We sequenced 76 complete SARS-CoV-2 genomes using NGS technology and identified the genomic mutations and the proportions of circulating variants among them. We also performed an in silico study to predict the effect of the truncation of the NS7a protein (ORF7a) on its function.
Results: We detected nine different lineages of SARS-CoV-2. The B.1.1.7 lineage was predominant (78.9%) from February to May 2021, while only one B.1.351 strain was detected. Interestingly, the phylogenetic analysis showed that multiple strains of the B.1.1.7 lineage clustered closely with those from European countries. A high frequency (88%) of a stop codon mutation (NS7a Q62stop) was detected among the B.1.1.7 lineage sequences. In silico analysis of NS7a with Q62stop found that this stop codon had no significant effect on the function of NS7a.
Conclusion: This work provides molecular epidemiological insights into the spread of SARS-CoV-2 variants in Iraq, which are most likely imported from Europe.
In late December 2019, an outbreak of pneumonia of unknown etiology was announced in Wuhan, China. A relatively unknown coronavirus named "Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)" was then identified as the causative agent of COVID-19 [1]. On March 11, 2020, the WHO declared the COVID-19 outbreak a pandemic [2], affecting most of the human population, especially in developing countries [3]. SARS-CoV-2 is highly infectious and has caused over 234 million confirmed cases of COVID-19 globally, including over 4.8 million deaths, as reported by the WHO as of October 3, 2021. In Iraq, the first SARS-CoV-2 case was diagnosed on February 24, 2020 [4]. Since then, more than 695,489 cases and more than 13,000 deaths had been confirmed by the end of February 2021 [5]. The epidemiological situation improved slightly at the end of the first wave (week 1, 2021). However, the number of cases rose again with the beginning of the second wave in week 5, 2021 [6].
Emerging RNA viruses are a global health concern due to their potentially high transmission rates, high mutation rates, and aggressive competition for host cellular functions. As a result of SARS-CoV-2 mutation dynamics, several variants of concern have emerged, for which there is evidence of increased transmission, more severe disease outcomes, and/or decreased neutralization by antibodies raised through previous infection or vaccination [7]. In particular, amino acid replacements in the spike protein can lead to enhanced binding with the host ACE2 receptor, causing increased transmissibility and potentially higher virulence [8, 5].
Similar to influenza A virus and, to a lesser extent, seasonal human coronaviruses, SARS-CoV-2 can be expected to accumulate adaptive amino acid replacements in its glycoprotein, resulting in antigenic drift [9]. However, because SARS-CoV-2 differs biologically from influenza A virus (IAV), which is a segmented negative-stranded RNA virus with a higher overall evolutionary rate than coronaviruses, and from seasonal coronaviruses, which have been circulating in the human population for a long time, genomic surveillance of SARS-CoV-2 is needed to detect and assess the effect of such mutations [9]. The global effort for SARS-CoV-2 sequencing has led to efficient tracking of circulating lineages as well as tracking of mutations that may lead to changes in vaccine efficacy, PCR detection, and virus transmissibility [10, 11]. Therefore, surveying the molecular epidemiology and spatiotemporal changes of the SARS-CoV-2 genome and understanding its mutations are important. Yet, there is a significant underrepresentation of SARS-CoV-2 sequences from middle- and low-income countries in the global dataset [12]. Recently, four variants have been identified by the WHO to be of particular concern (VOCs): Alpha [13].
Only a limited number of SARS-CoV-2 sequences are currently available from Iraq (https://www.gisaid.org/). The first sequence from Iraqi patients available from the first wave showed the presence of the GH clade with the D614G mutation [14]. In the current study, we sequenced 76 SARS-CoV-2 genomes to produce baseline data for the genomic surveillance of SARS-CoV-2 in Iraq. Our work summarizes the sequences, emerging mutations, and evolutionary relationships of SARS-CoV-2 in Iraq between December 2020 and February 2021.
medRxiv preprint doi: https://doi.org/10.1101/2021.10.20.21265042; this version posted October 22, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Combined naso- and oropharyngeal swabs were collected from 76 patients (46 males and 30 females, aged between 13 and 85 years) in Samawa, Iraq (31.3188° N, 45.2806° E) during the second epidemic wave of COVID-19 in Iraq (between December 27, 2020 and February 28, 2021). Of these patients, five (6.6%) died, eight (10.5%) had severe disease, and the remaining 63 (82.9%) had mild to moderate infections. The samples were analyzed for the presence of SARS-CoV-2 with STAT-NAT COVID-19 MULTI real-time PCR kits (Sentinel, Milano, Italy), which are based on two targets in the RdRP and Orf1b genes. The real-time PCR assay was conducted using the Mx3000P qPCR system (Agilent Technologies, Waldbronn, Germany). A total of 76 samples that had a high copy number of the virus were selected for whole genome sequencing.
RNA was isolated from viral transport medium (VTM) samples using TRIzol reagent (ThermoFisher Scientific, MA, USA) at a 3:1 ratio, according to the manufacturer's procedure. The LunaScript RT Super Mix Kit (New England Biolabs, UK) was used for first-strand cDNA synthesis. A multiplex PCR approach following the ARTIC protocol was used to amplify the viral genome using Q5 High Fidelity DNA Polymerase (New England BioLabs, UK). The NEBNext Ultra II library prep kit was used for Illumina sequencing library preparation. The libraries were quantified using the Qubit 4 with the dsDNA High Sensitivity Kit (ThermoFisher Scientific, MA, USA). High-throughput sequencing was performed using the Illumina NovaSeq 6000 system with a read length of 250 bp, which produced 1,222,270 reads.
Sequencing reads of low quality (quality score <30) and short length (<50 nt) were removed using Trimmomatic [15], followed by assembly using BWA-MEM [16], variant calling using LoFreq [17], and consensus calling using SAMtools [18], as implemented in the HaVoC pipeline [19]. Mutations in the SARS-CoV-2 genomes were analyzed using the GISAID CoVsurver ("CoVsurver enabled by GISAID") [20] and the Coronapp web application [21]. Lineages and clades were assigned using Pangolin (version v.3.1.7) [22] and the Nextstrain web server [23].
The 3D model of the mutant ORF7a was built using the Swiss-model web server, with the crystal structure of ORF7a (pdb:7ci3) as the template. The selected template and the built model were compared structurally to assess their similarity and dissimilarity using TM-align [24] and the FATCAT web tool [25]. The PROCHECK web server was used to validate the best-fit model based on the stereochemical properties and geometry of the structure [26]. The quality of the model was evaluated by plotting the phi and psi angles of the polypeptide residues using the Ramachandran plot server [27]. Subsequently, the model structure was refined using the 3Drefine web server [28]. Finally, the refined model structure was prepared for docking by adding polar hydrogen atoms and Gasteiger charges using AutoDock Tools 1.5.6 [29]. The model, along with the wild-type ORF7a, was submitted to the HADDOCK 2.4 web server to investigate the protein-protein interactions [30].
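The read-filtering thresholds stated above (quality score <30, length <50 nt) can be illustrated with a minimal Python sketch. This is not the Trimmomatic algorithm or the HaVoC implementation: it assumes Phred+33 FASTQ quality encoding and applies the quality threshold to the mean score of each read, which is a simplification. All function names are hypothetical.

```python
from statistics import mean

MIN_MEAN_QUALITY = 30  # quality threshold stated in the text
MIN_LENGTH = 50        # length threshold (nt) stated in the text


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def keep_read(sequence, quality_string):
    """Return True if a read passes the length filter and the mean-quality filter."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean(phred_scores(quality_string)) >= MIN_MEAN_QUALITY


def filter_reads(reads):
    """Keep the (name, sequence, quality) tuples that pass keep_read."""
    return [read for read in reads if keep_read(read[1], read[2])]
```

In practice the pipeline's own trimming step would be used; the sketch only makes the two stated cut-offs concrete.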
Genome sequences were aligned using the MAFFT online service for multiple complete SARS-CoV-2 genomes (online version of April 11, 2020) [31]. To analyze the phylogenetic relatedness of the SARS-CoV-2 genomes derived from the Iraqi patients, a data set of 154 complete SARS-CoV-2 genomes from different countries was collected from GISAID (available on May 25, Table S1). The phylogenetic tree was inferred by maximum likelihood using the best-fit substitution model (ModelFinder) and 1000 ultrafast bootstrap replicates in IQ-TREE (version 1.6.10) [32]. iTOL v6 [33] was used to visualize the phylogenetic tree.
The COVID-19 pandemic caused by SARS-CoV-2 has resulted in significant morbidity and mortality worldwide. During the first wave in Iraq (February-December 2020), the implementation of restrictions (lockdowns) was associated with a significant reduction in daily reported cases and mortality, and was followed by a phased relaxation of restrictions. During the second wave (Figure 1), the cumulative number of cases reached more than a million by late May 2021 [6].
We identified nine different genetic lineages, including two variants of concern: B.
Coronapp [21], the amino acid change D614G in the spike glycoprotein was detected at a high frequency (n = 73, 96%) (Figure 3).
Table S3) and recorded in 77 countries before.
Figure 4, comprising high frequencies of S106, G107, and F108del in NSP6 (n=62, 81.6%), spike H69del (n=53, 69.7%), and Y145del (n=51, 67.1%).
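The frequencies reported above are simple proportions of the 76 sequenced genomes. As an arithmetic check, the following Python sketch recomputes them from the counts given in the text; the dictionary keys and the helper name are illustrative labels, not identifiers from the study.

```python
TOTAL_GENOMES = 76  # number of genomes sequenced in the study

# Counts (n) as reported in the text.
counts = {
    "spike D614G": 73,
    "NSP6 S106/G107/F108del": 62,
    "spike H69del": 53,
    "spike Y145del": 51,
}


def percentage(count, total=TOTAL_GENOMES):
    """Share of genomes carrying a mutation, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


frequencies = {name: percentage(n) for name, n in counts.items()}
```

The three deletion frequencies reproduce the reported 81.6%, 69.7%, and 67.1%; D614G comes out at 96.1%, which the text gives as 96%.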
We found a high prevalence of the premature stop codon Q62stop among the Iraqi Alpha variant strains. SARS-CoV-2 ORF7a is a type I transmembrane protein composed of 122 amino acids (15, 81, 21, and 5 amino acids forming the N-terminal region, luminal domain, transmembrane segment, and cytoplasmic tail, respectively). It has been reported [34] that this accessory protein modulates the immune response of the host by binding to the host lymphocyte function-associated antigen 1 (LFA-1). Previous studies have suggested that the amino acids T39, E41, N43, Q62, A66, and K72 play a key role in the function of ORF7a [34]. Among these six active residues, two (A66 and K72) were truncated by the premature stop codon. Therefore, to predict the potential effect of the stop codon (Q62*) on the function of the ORF7a protein, we constructed a 3D model of the mutant ORF7a. The crystal structure of ORF7a (PDB: 7ci3) was used as the template.
The similarity and dissimilarity between the selected template and the model were assessed using TM-align and FATCAT. The optimal structural similarity was evaluated based on the obtained TM-score (0.70; a TM-score > 0.5 indicates that the two proteins have the same fold). A flexible protein structure comparison between the model and the template was performed using FATCAT, where the obtained p-value (1.55e-15) and RMSD (0.06 Å) indicate that the two protein structures are significantly similar. After validation, quality checking, and the addition of polar hydrogen atoms and Gasteiger charges, the selected model was subjected to molecular docking alongside the wild-type ORF7a to assess the protein-protein interaction between ORF7a and the LFA-1 I-domain (pdb: 7ci3). The latter is located on the cell membrane of human leukocytes and is the target of ORF7a (Figure 5).
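For readers unfamiliar with the TM-score used in the structural comparison above, the following Python sketch evaluates its standard definition, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition. It is not a reimplementation of TM-align, which additionally searches for the superposition that maximizes this sum. The function names and the 0.5 Å floor on d0 are assumptions made here to keep the sketch well defined for short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for chains of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1 / 3) - 1.8
    return max(d0, 0.5)  # assumed floor for very short chains


def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distances (Å) between aligned C-alpha pairs.
    target_length: length of the chain used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1 / (1 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the length of the normalizing chain, the score depends on which of the two structures is chosen for normalization; TM-align reports a value for each choice. In this toy setting, for example, 61 perfectly superposed residues normalized by a 122-residue chain give a score of 0.5.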
The molecular docking results suggested that there was no significant difference in the binding affinity of the mutant and wild-type ORF7a; the HADDOCK scores were -67.0 and -63.6 for the mutant and wild-type ORF7a, respectively (Table 1).
Table 1 (partially recovered). Z-Score: -2.1, -1.7
The phylogenetic tree confirmed the presence of different lineages belonging to multiple clusters (Figure 6). Seventy-six strains from the Iraqi population were distributed to eight different SARS-CoV-
Complete-genome sequencing and phylogenetic analysis of SARS-CoV-2 strains is an essential approach for tracking the evolution of the virus and understanding the circulation of SARS-CoV-2 variants in Iraq. However, there is little genetic information about the SARS-CoV-2 outbreak in Iraq. Therefore, the current study aimed to provide some rudimentary information about the genotypes of SARS-CoV-2 that are circulating in the country.
In this analysis, 76 complete SARS-CoV-2 genomes from Iraq were sequenced. From these genomes, we identified nine lineages. Four of the genome sequences carried four novel mutations (A2853T, T20780A, TG25774AT, and T26159A) that caused the amino acid changes E45V (NSP3), M41K (NSP16), W128M (NS3), and V256D (NS3). As expected, most (n = 73, 96%) of the sequences from the second epidemic wave in Iraq contained the amino acid replacement D614G in the spike protein (Figure 3).
D614G was the first spike glycoprotein mutation to be identified, in Germany in January 2020, and became the dominant mutation in all the circulating strains worldwide by June 2020 [35]. Although the D614G mutation is located outside the receptor binding domain (RBD), it enhanced the function of the S protein and became the common mutation in most of the circulating strains during the second pandemic wave of COVID-19. Residue D614 is in the S1 subunit and interacts via hydrogen bonding and electrostatic interactions with two amino acids in the S2 subunit. Thus, replacement of Asp with Gly disrupts these interactions, resulting in higher virus fitness in the upper respiratory tract [36].
The first SARS-CoV-2 genome sequence from Iraq was reported on June 30, 2020, during the first wave, and belonged to the B.1/GH clade. According to the clade distribution, this clade diminished during the second wave. The B.1.1.7/GR and GRY clades were the most prevalent, which is consistent with the global distribution of SARS-CoV-2 clades in different countries. Here, we report the first confirmed case of the Alpha/B.1.1.7 variant of concern in Iraq (EPI_ISL_1524332), in a sample collected in December 2020, followed by the recording of 59 cases (GISAID). Since then, the number of infections has risen to over seven thousand cases per day. This is likely due to the emergence of this variant, which is characterized by high transmissibility and pathogenicity. The Beta variant (B.1.351) was first reported in South Africa in October 2020, and concerns about this variant are associated with its high transmissibility, pathogenicity, and the limited protection of some vaccines against infection [37].
Interestingly, we identified one strain belonging to B.1.351 for the first time in Iraq in late February 2021. It was collected from a patient without a history of travel, suggesting that this variant had been circulating locally before this date.
showing that there is genomic diversity of this variant in Iraq, which could be attributed to a variety of infection sources.
It has been reported that the deletion 69-70 in the S protein causes negative results in RT-PCR assays that specifically target the S gene [38]. This specific deletion has occurred at high frequency in different countries and is currently geographically widespread. According to our results, this deletion was identified in 53 of the 60 Alpha variant strains (Figure 4). In addition to this deletion, a cluster of aa mutations (Y144del, N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H) was observed in the spike proteins of the strains belonging to the Alpha variant (Supplementary Table S2). Fortunately, most of these aa changes are located outside the RBD; hence,
[34]. This motivated us to use an in silico approach to investigate the effect of Q62* on the function of NS7a using molecular docking scores (Table 1). The results predicted that NS7a was still able to bind to its target (LFA-1). Consistently, some strains with Q62* were derived from patients with severe infections, suggesting that truncated NS7a may not reduce the pathogenicity of the virus. However, further studies on the functional consequences of the stop codon Q62* are required to confirm this finding.
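A nonsense mutation such as Q62* can be located by translating a coding sequence codon by codon and reporting the first stop that precedes the reference stop. The Python sketch below shows this on a short toy sequence, not on the real ORF7a gene; it uses the standard genetic code, and the function names and the `Q3stop`-style label are illustrative assumptions.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X', stops become '*'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def first_stop(cds):
    """1-based codon position of the first stop codon, or None if there is none."""
    position = translate(cds).find("*")
    return position + 1 if position != -1 else None


def describe_nonsense(reference_cds, sample_cds):
    """Return a label such as 'Q3stop' if the sample stops before the reference."""
    reference_protein = translate(reference_cds)
    reference_stop = first_stop(reference_cds)
    sample_stop = first_stop(sample_cds)
    if sample_stop is None or sample_stop > len(reference_protein):
        return None
    if reference_stop is not None and sample_stop >= reference_stop:
        return None
    return f"{reference_protein[sample_stop - 1]}{sample_stop}stop"


# Toy example: a single C-to-T change turns codon 3 (CAA, Gln) into TAA (stop).
toy_reference = "ATGAAACAATTGTAA"
toy_sample = "ATGAAATAATTGTAA"
toy_label = describe_nonsense(toy_reference, toy_sample)
```

Applied to an aligned ORF7a coding sequence, the same logic would flag a stop at codon 62; predicting its functional effect still requires the structural analysis described in the text.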
The phylogenetic tree indicated that the Iraqi B.1.1.7 strains form several subclusters, suggesting multiple introductions followed by local transmission. Most of the Iraqi strains clustered with European strains, which may either reflect true importations or be due to unequal sampling efforts.
In this work, we reported the genome sequences of SARS-CoV-2 from Iraq and tracked the locally distributed viral variants during the second epidemic wave. Sequence analysis showed a shift from the strains that circulated during the first wave to the dominance of Alpha variants, which most likely surged during the second epidemic wave, as in most other countries. In addition, one Beta variant (B.1.351) was detected. Furthermore, we detected a prevalent NS7a Q62stop mutation among the Alpha variant strains in Iraq. In silico analysis suggested that there was no significant difference in the binding affinity of mutant and wild-type NS7a to LFA-1.
All raw sequencing data used in this study are available in the Sequence Read Archive (SRA) under the BioProject accession numbers PRJNA731979, PRJNA735311, and PRJNA738286. The genome sequences were deposited in GISAID and GenBank and are accessible under the numbers listed in Supplementary Tables S1 and S3.
NA designed the study. AJ, AO, and NA collected the strains and epidemiological data. NA, HA, TS, OV, and TS performed laboratory work, RNA extraction, whole-genome sequencing, and bioinformatics analysis. NA, AH, MM, LA analyzed and interpreted data and prepared the figures. NA, MM, KM, and BA drafted the manuscript. All authors discussed the results and contributed to the revision of the final manuscript.
This work was approved by the scientific research-ethics committee of Al Muthanna University within the collaborative protocol of joint work between the College of Science and Public Health Department, Al-Muthanna Directorate (July 30, 2020-8928). All participants provided signed informed consent.
A new coronavirus associated with human respiratory disease in China
General's opening remarks at the media briefing on COVID-19
Crucial contribution of the universities to SARS-CoV-2 surveillance in Ecuador: Lessons for developing countries
COVID-19 Weekly Epidemiological Update 22: Special edition: Proposed working definitions of SARS-CoV-2 Variants of Interest and Variants of Concern
SARS-CoV-2 variants, spike mutations and immune escape
SARS-CoV-2 mutations: The biological trackway towards viral fitness
The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
The evolutionary dynamics of endemic human coronaviruses
Will mutations in the spike protein of SARS-CoV-2 lead to the failure of COVID-19 vaccines?
Why is the S protein the target of most COVID-19
Real-time RT-PCR in COVID-19 detection: Issues affecting the results
Global disparities in SARS-CoV-2 genomic surveillance
Increased transmissibility and global spread of SARS-CoV-2 variants of concern as at
Genome sequencing of a novel coronavirus SARS-CoV-2 isolate from Iraq
Trimmomatic: A flexible trimmer for Illumina sequence
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
A sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets
The sequence alignment/map format and SAMtools
HaVoC, a bioinformatic pipeline for reference-based consensus assembly and lineage assignment for SARS-CoV-2 sequences
Global initiative on sharing all influenza data-from vision to reality
Coronapp: A web application to annotate and monitor SARS-CoV-2 mutations
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
NextStrain: Real-time tracking of pathogen evolution
TM-align: A protein structure alignment algorithm based on the TM-score
FATCAT: A web server for flexible structure comparison and structure similarity searching
PROCHECK: A program to check the stereochemical quality of protein structures
Main-chain conformational tendencies of amino acids
3Drefine: An interactive web server for efficient protein structure refinement
Software news and updates AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility
The HADDOCK2.2 web server: User-friendly integrative modeling of biomolecular complexes
MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization
W-IQ-TREE: A fast online phylogenetic tool for maximum likelihood analysis
Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation
Structural assessment of SARS-CoV2 accessory protein ORF7a predicts LFA-1 and Mac-1 binding potential
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19
The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity
Evidence of escape of SARS-CoV-2 variant B.1.351 from natural and vaccine-induced sera
COG-UK update on SARS-CoV-2 Spike mutations of special interest UK_19-December-2020_SARSCoV-2-Mutations.pdf. Prepared by COG-UK
We would like to thank Dr. Ryiad Abed-Ameer Halfi and Spec. Microbiologist Batool Kadham Salman, Ministry of Health, Iraq, and all the personnel from the unit of Molecular Virology, College of Medicine, Helsinki University, Finland, for their great efforts in this work.
None declared.
The present work was supported by Sawa University, Samawa, Iraq.