key: cord-1032001-8eh4bsrp authors: Sallam, Malik; Mahafzah, Azmi title: Molecular Analysis of SARS-CoV-2 Genetic Lineages in Jordan: Tracking the Introduction and Spread of COVID-19 UK Variant of Concern at a Country Level date: 2021-03-05 journal: Pathogens DOI: 10.3390/pathogens10030302 sha: c4b60d286a2daeec0a6b1531167937c66eac5c30 doc_id: 1032001 cord_uid: 8eh4bsrp The rapid evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is manifested by the emergence of an ever-growing pool of genetic lineages. The aim of this study was to analyze the genetic variability of SARS-CoV-2 in Jordan, with a special focus on the UK variant of concern. A total of 579 SARS-CoV-2 sequences collected in Jordan were subjected to maximum likelihood and Bayesian phylogenetic analysis. Genetic lineage assignment was undertaken using the Pango system. Amino acid substitutions were investigated using the Protein Variation Effect Analyzer (PROVEAN) tool. A total of 19 different SARS-CoV-2 genetic lineages were detected, with the most frequent being the first Jordan lineage (B.1.1.312), first detected in August 2020 (n = 424, 73.2%). This was followed by the second Jordan lineage (B.1.36.10), first detected in September 2020 (n = 62, 10.7%), and the UK variant of concern (B.1.1.7; n = 36, 6.2%). In the spike gene region, the molecular signature for B.1.1.312 was the non-synonymous mutation A24432T resulting in a deleterious amino acid substitution (Q957L), while the molecular signature for B.1.36.10 was the synonymous mutation C22444T. Bayesian analysis revealed that the UK variant of concern (B.1.1.7) was introduced into Jordan in late November 2020 (mean estimate); four weeks earlier than its official reporting in the country. In Jordan, an exponential increase in COVID-19 cases due to B.1.1.7 lineage coincided with the new year 2021. The highest proportion of phylogenetic clustering was detected for the B.1.1.7 lineage. 
The amino acid substitution D614G in the spike glycoprotein was exclusively present in the country from July 2020 onwards. Two Jordanian lineages dominated infections in the country, with continuous introduction/emergence of new lineages. In Jordan, the rapid spread of the UK variant of concern should be monitored closely. The spread of SARS-CoV-2 mutants appeared to be related to the founder effect; nevertheless, the biological impact of certain mutations should be further investigated.
The evolutionary analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is appealing for several reasons. First, this novel virus harbours a ribonucleic acid (RNA) genome, with replication using RNA-dependent RNA polymerase. This replicase enzyme has minimal proofreading activity, the hallmark of rapidly evolving viruses (e.g., influenza virus and hepatitis C virus) [1, 2]. In addition, the pandemic nature of coronavirus disease 2019 (COVID-19), with more than 100 million detected cases so far, translates into a huge pool of susceptible hosts with varying selective pressure on the viral genome [3, 4]. This resulted in rapid divergence of
The molecular signature found consistently in the Spike gene region of the first Jordan lineage B.1.1.312 was the replacement of adenine by thymine at position 24,432 (A24432T) of the reference genome NC_045512 (thymine instead of uracil since the results were those of DNA sequencing). This mutation was non-synonymous, resulting in the replacement of glutamine (Q) by leucine (L) at position 957 of the spike glycoprotein (Q957L). The molecular signature in the Spike gene region for the second Jordan lineage B.1.36.10 was C22444T (a synonymous mutation).
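The relationship between the nucleotide signatures and the spike residue numbering can be checked with a little coordinate arithmetic. The sketch below is illustrative only and is not the authors' workflow; it assumes the spike coding region spans positions 21,563-25,384 of NC_045512 (the coordinates stated in the Materials and Methods), and the helper names `spike_codon_position` and `mutate_codon` are hypothetical.

```python
from typing import Tuple

# Spike (S) coding region on NC_045512 (1-based, inclusive), as stated in the Methods.
SPIKE_START = 21563
SPIKE_END = 25384


def spike_codon_position(genome_pos: int) -> Tuple[int, int]:
    """Return (residue number, position within codon 1-3) for a genome coordinate."""
    if not SPIKE_START <= genome_pos <= SPIKE_END:
        raise ValueError(f"{genome_pos} lies outside the spike coding region")
    offset = genome_pos - SPIKE_START
    return offset // 3 + 1, offset % 3 + 1


def mutate_codon(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Replace one base (1-based position) of a three-letter codon."""
    bases = list(codon)
    bases[pos_in_codon - 1] = new_base
    return "".join(bases)


# Only the codons needed for this illustration (standard genetic code).
CODON_TO_AA = {"CAA": "Q", "CAG": "Q", "CTA": "L", "CTG": "L"}

residue, pos_in_codon = spike_codon_position(24432)
print(residue, pos_in_codon)  # 957 2

# Which glutamine codon the reference carries is not assumed here; both are tried.
for q_codon in ("CAA", "CAG"):
    mutant = mutate_codon(q_codon, pos_in_codon, "T")
    print(q_codon, "->", mutant, CODON_TO_AA[mutant])

print(spike_codon_position(22444))  # (294, 3)
```

Position 24,432 falls on the second base of codon 957, and changing the second base of either glutamine codon to T gives a leucine codon, which is consistent with Q957L. Position 22,444 falls on the third base of codon 294, a position where many changes are synonymous.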
Using the Tamura
To determine sequence clustering among the three most common genetic lineages of SARS-CoV-2 in Jordan, we conducted maximum likelihood (ML) phylogeny construction. Using the Spike gene region, the proportion of phylogenetic clustering was the highest among the B. Figure 2 ). Please refer to the Materials and Methods section for an explanation of the difference in the number of B.1.1.312 sequences between the two sub-genomic regions (Supplementary S1).
For the three major genetic lineages in Jordan (B.1.1.312, B.1.36.10, and B.1.1.7), an assessment of amino acid substitutions in the spike glycoprotein compared to that in the reference sequence (YP_009724390) was undertaken. The amino acid substitution D614G was detected in the vast majority of sequences (n = 566, 97.8%), and the wild type (D614) was last identified in June 2020. The amino acid substitutions N501Y and P681H and the deletion ∆69/70 were consistently found among the lineage B.1.1.7 sequences, while N501I was detected in a single sequence from the first Jordan lineage B.1.1.312.
The following amino acid substitutions were totally absent from the sequences that were analyzed in this study: K417N and E484K. Using the Protein Variation Effect Analyzer (PROVEAN) tool, two amino acid substitutions were predicted to be deleterious for the spike glycoprotein: T716I, detected among B.1.1.7 sequences, and Q957L, found in the first Jordan lineage B.1.1.312 (Table 2).
Table 2. Prediction of amino acid substitution impact in the spike glycoprotein of SARS-CoV-2 stratified by the three major genetic lineages detected in Jordan.
[Table body not recoverable. Columns: Amino Acid Substitution; PROVEAN 1 Score; Prediction. First row group: UK variant of concern (B.1.1.7).]
1 Variants with a score equal to or below −2.5 are considered "deleterious," and variants with a score above −2.5 are considered "neutral" in the Protein Variation Effect Analyzer (PROVEAN) tool.
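The classification rule in the table footnote can be written as a one-line threshold test. The following is a minimal sketch of that rule only, not the PROVEAN software itself; the function name is hypothetical and the scores passed to it are made-up values chosen to exercise the boundary, not results from this study.

```python
PROVEAN_CUTOFF = -2.5  # threshold quoted in the Table 2 footnote


def provean_prediction(score: float, cutoff: float = PROVEAN_CUTOFF) -> str:
    """Label a score 'deleterious' if it is at or below the cutoff, else 'neutral'."""
    return "deleterious" if score <= cutoff else "neutral"


# Hypothetical scores, used only to show the boundary behaviour.
for example_score in (-3.1, -2.5, -2.49, 0.4):
    print(example_score, provean_prediction(example_score))
```

Note that the boundary value itself (a score of exactly −2.5) is classified as deleterious under the wording "equal to or below".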
Bayesian analysis of the UK variant of concern (B.1.1.7) lineage, with 35 SARS-CoV-2 S sequences collected in Jordan between 24 December 2020 and 6 January 2021, revealed that the time to the most recent common ancestor (tMRCA) of this lineage in Jordan was 21 November 2020 (95% highest posterior density interval: 17 November 2020-24 December 2020). Coalescent analysis using a Bayesian skyline plot showed a rapid exponential increase in the number of effective infections between 1 January 2021 and 5 January 2021 (Figure 3).
In this study, we utilized molecular clock and coalescent analyses to describe the timeline of introduction of the genetic lineage B.1.1.7, commonly known as the UK variant of concern, and its spread in Jordan. Additionally, we employed the Pango classification system, which facilitates the classification and nomenclature of SARS-CoV-2 genetic lineages, containing molecular signatures that can be helpful to track its introduction/emergence and spread [5]. This approach can be used to evaluate public health measures including control and mitigation practices [31]. The negative consequences of the current COVID-19 pandemic necessitate such in-depth epidemiologic studies, which can be helpful to plan effective preventive strategies [32, 33].
The major result of this study was that the genetic lineage B.1.1.7 was introduced into Jordan about four weeks earlier than the official reporting of its introduction into the country [27]. Bayesian skyline coalescent analysis showed that the exponential increase in infections as a result of the B.1.1.7 lineage coincided with the new year 2021, following a lag phase of several weeks. It is known that human behavior can drive a surge in infections if a super-spreader event takes place in a large gathering [34, 35]. However, this hypothesis needs further evaluation using contact tracing data together with dense sampling to reconstruct the evolutionary history of this lineage in the country.
Despite the need for further evidence regarding the biological significance of the B.1.1.7 lineage, several studies have reported the rapid dissemination of this lineage in the UK and several other countries [6, 16, 36, 37]. This proposed change in virus behavior can be related to enhanced binding between the spike glycoprotein of this lineage and its receptor, and this enhancement has been proposed to be the result of the N501Y amino acid substitution [18, 38].
Additionally, we used the Pango classification system to describe the molecular epidemiology of COVID-19 in Jordan [5]. Since the first introduction of the novel coronavirus into humans, the expanding genetic diversity of the virus demanded a scheme to classify and name monophyletic clades, which would facilitate the study of epidemiologic features of the virus including its spread. This would also provide a consensus to study the possible biological significance of such lineages [39, 40]. In this study, we adopted the approach conceived by Rambaut et al., which can help in analyzing patterns of introduction and spread of this novel virus in a certain region [5, 12].
Community transmission of SARS-CoV-2 in Jordan became apparent in August 2020, and was dominated by three genetic lineages, starting with the first and second Jordan lineages (B.1.1.312 and B.1.36.10), and it was recently driven by the UK variant of concern (B.1.1.7). The emergence/introduction of the two Jordan lineages can be mostly related to a founder effect, since no discernible advantageous or neutral mutations were detected among the two lineages [41-43]. The molecular signature of the second Jordan lineage (B.1.36.10) was found in earlier sequences collected in Turkey [44].
This might point to a possible introduction of this lineage into Jordan in early September 2020, considering that travelers coming from Turkey (classified as a green country at that time) were not required to be quarantined [45].
One result that should be investigated further is the higher proportion of phylogenetic clustering for the B.1.1.7 lineage compared to the two Jordan lineages. This indicates a higher proportion of domestic transmission, which can be linked to enhanced transmissibility of the virus. However, such a result is pending further evidence to support the current observations linking this genetic lineage with higher transmission [37].
In line with several previous studies, genetic analysis of SARS-CoV-2 in Jordan showed the shift into B lineage, harboring the spike D614G amino acid substitution, with all sequences collected in Jordan harboring this substitution from July 2020 onwards [23, 24]. This amino acid substitution was present in the country as early as March 2020, which hints at the effects of virus genetic changes on its epidemic behavior, despite the need for further evidence to support such a correlation [21, 46-48].
The current study used state-of-the-art phylogenetic inference methods to characterize the molecular epidemiology of SARS-CoV-2 in Jordan. Additionally, to the best of our knowledge, this study can be considered among the first studies in the Middle East and North Africa region utilizing the Pango classification system to characterize the genetic diversity of SARS-CoV-2. Limitations of this study included potential sampling bias in time, which was manifested by variation in sequencing proportion in relation to new cases diagnosed each month, with a 1.3% sequencing rate out of the newly diagnosed cases before October 2020 and 0.1% thereafter.
Another caveat of this study can be the enhanced surveillance of passengers (and their contacts) coming from the UK or other countries where the UK variant of concern was reported. This may have caused the dominance of the B.1.1.7 lineage among sequences collected in December 2020-January 2021.
All SARS-CoV-2 genetic sequences that were collected in Jordan were retrieved from GISAID, as of 30 January 2021 [11]. The Jordanian sequences were aligned together with the reference SARS-CoV-2 sequence Wuhan-Hu-1 (accession number: NC_045512). Multiple sequence alignment was undertaken using MAFFT v.7, a multiple alignment program for amino acid or nucleotide sequences [49]. Data on daily COVID-19 diagnosed cases and deaths in Jordan were retrieved from Coronavirus Source Data, and covered the period from 3 March 2020 to 29 January 2021 [50].
To describe the genetic lineages of the sequences in the SARS-CoV-2 Jordanian dataset, we utilized Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin) [51]. The Pangolin tool follows the 'Pango' nomenclature system for classifying SARS-CoV-2 genomic sequences [5, 12].
The measurement of within-lineage genetic distances was done using MEGA6, which was also used to detect the following amino acid substitutions/deletions in the spike glycoprotein sequence: D614G, E484K, N501Y, P681H, 69-70del, and K417N [52]. Genetic divergence from the reference sequence of SARS-CoV-2 and within-lineage genetic diversity were assessed using the Tamura-Nei model as implemented in MEGA6 [52, 53]. For the three major SARS-CoV-2 lineages circulating in Jordan (B.1.1.312, B.1.36.10, and B.1.1.7), we used the Protein Variation Effect Analyzer (PROVEAN) tool in order to assess the possible functional changes in the spike glycoprotein compared to that in the reference sequence (YP_009724390) [54].
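The screening for named spike substitutions and deletions described above was done in MEGA6. Purely as an illustration of the underlying comparison, and not as a reproduction of that software, the sketch below names differences between a reference protein fragment and an aligned query fragment in the same notation (for example D614G or 69-70del reported residue by residue). The function name and the short toy fragments are hypothetical and are not taken from the study's alignment.

```python
from typing import List


def call_substitutions(ref_aa: str, query_aa: str, start: int = 1) -> List[str]:
    """Name differences between two aligned, equal-length protein strings.

    `start` is the reference residue number of the first aligned column.
    Gaps are '-'; ambiguous query residues ('X') are skipped.
    """
    if len(ref_aa) != len(query_aa):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    pos = start - 1
    for ref_res, query_res in zip(ref_aa, query_aa):
        if ref_res == "-":
            continue  # insertion relative to the reference: not reported here
        pos += 1
        if query_res == "-":
            calls.append(f"{ref_res}{pos}del")
        elif query_res != "X" and query_res != ref_res:
            calls.append(f"{ref_res}{pos}{query_res}")
    return calls


# Toy fragments for illustration only.
print(call_substitutions("QDVNC", "QGVNC", start=613))  # ['D614G']
print(call_substitutions("HV", "--", start=69))         # ['H69del', 'V70del']
```

Skipping ambiguous residues is a design choice of this sketch: an unresolved position is treated as missing data and not as evidence for or against a substitution.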
To conduct the ML phylogenetic analysis, we used two sub-genomic parts of the dataset: (1) ORF1ab (NC_045512 positions: 266-21,555), with the following Jordanian sequences removed for having long (>10) stretches of ambiguous (N) bases: EPI_ISL_429992; EPI_ISL_429995; EPI_ISL_430008; EPI_ISL_430013; EPI_ISL_450189; EPI_ISL_636390; EPI_ISL_730391; EPI_ISL_730473; EPI_ISL_730545; EPI_ISL_755118; EPI_ISL_755120; EPI_ISL_755121; EPI_ISL_755122; EPI_ISL_755123; EPI_ISL_755124; EPI_ISL_755125; EPI_ISL_755126; EPI_ISL_755127; EPI_ISL_755128; EPI_ISL_755129; EPI_ISL_755131; EPI_ISL_755237; EPI_ISL_755238; EPI_ISL_755239; EPI_ISL_755240; EPI_ISL_755243; EPI_ISL_755247; EPI_ISL_755267; EPI_ISL_878495; which yielded a dataset with 550 Jordanian sequences; (2) Spike S (NC_045512 positions: 21,563-25,384), with the following Jordanian sequences removed for having long (>10) stretches of ambiguous (N) bases: EPI_ISL_430013; EPI_ISL_450189; EPI_ISL_755118; EPI_ISL_755120; EPI_ISL_755121; EPI_ISL_755123; EPI_ISL_755125; EPI_ISL_755126; EPI_ISL_755128; EPI_ISL_755131; EPI_ISL_878495; EPI_ISL_430008; EPI_ISL_730543; EPI_ISL_730545; which yielded a dataset with 565 Jordanian sequences.
Phylogeny construction for the two sub-genomic Jordanian datasets using the ML approach was done using PhyML v3 [55]. The Smart Model Selection (SMS) was used for selection of the most appropriate nucleotide substitution model, depending on the Akaike Information Criterion (AIC) [56]. Models that were used for construction of ML trees were GTR + G for ORF1ab and HKY85 + I for the S region.
[57]. The following criteria were used for Bayesian evolutionary analysis by sampling trees (BEAST) analysis: HKY nucleotide substitution model with discrete gamma-distributed rate heterogeneity, uncorrelated relaxed clock model with a uniform rate prior (initial value of 0.0065), and a Bayesian skyline tree density model [23].
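The exclusion rule used to build the two sub-genomic datasets (dropping sequences with runs of more than 10 ambiguous N bases inside the region of interest) can be sketched as follows. This is an assumed implementation for illustration, not the authors' script: the function names are hypothetical, the toy records are made up, and the slicing assumes that alignment columns coincide with NC_045512 coordinates (no insertions relative to the reference).

```python
import re
from typing import Dict


def has_long_n_stretch(seq: str, max_run: int = 10) -> bool:
    """True if seq contains more than `max_run` consecutive N bases."""
    return re.search(rf"N{{{max_run + 1},}}", seq, re.IGNORECASE) is not None


def filter_region(records: Dict[str, str], start: int, end: int,
                  max_run: int = 10) -> Dict[str, str]:
    """Slice each aligned genome to [start, end] (1-based, inclusive) and
    keep only the slices without long ambiguous stretches."""
    kept = {}
    for seq_id, genome in records.items():
        region = genome[start - 1:end]
        if not has_long_n_stretch(region, max_run):
            kept[seq_id] = region
    return kept


# Made-up toy records; real input would be the aligned GISAID genomes.
toy_records = {
    "clean": "ACGT" * 5,
    "eleven_n": "AC" + "N" * 11 + "GT",
    "ten_n": "AC" + "N" * 10 + "GT",
}
print(sorted(filter_region(toy_records, 1, 15)))  # ['clean', 'ten_n']

# Consistency check on the counts reported in the text.
TOTAL_SEQUENCES = 579
REMOVED_ORF1AB = 29  # accession IDs listed for ORF1ab
REMOVED_SPIKE = 14   # accession IDs listed for the S region
print(TOTAL_SEQUENCES - REMOVED_ORF1AB, TOTAL_SEQUENCES - REMOVED_SPIKE)  # 550 565
```

The counts agree with the dataset sizes given above: 579 sequences minus the 29 listed ORF1ab exclusions gives 550, and 579 minus the 14 listed spike exclusions gives 565.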
A single run with a chain length of 200 million was performed, with samples of trees and parameters collected every 20,000 steps after discarding a burn-in of 20%. Convergence was checked using Tracer v1.6.0, with all parameters having effective sample sizes (ESSs) of >200. Construction of the Bayesian skyline plot was done in Tracer, and assembly of the maximum clade credibility (MCC) tree was done using TreeAnnotator available in the BEAST package [57]. Visualization of the trees in this study was undertaken in FigTree [58].
In the current study, molecular characterization of SARS-CoV-2 in Jordan was undertaken for the first time to the best of our knowledge. A recent report by Edyth Parker et al. investigated the emergence of lineage B.1.1.7 in Jordan and revealed the current dominance of this lineage in Jordan [59]. Two Jordan lineages dominated infections in the country, with a recent introduction of the lineage B.1.1.7. This UK variant of concern was present in the country several weeks before its official reporting, with an exponential propagation over the first few days of the new year 2021. The introduction of new lineages in the country appeared to be related to the founder effect; nevertheless, the biological significance of certain mutations should be further evaluated. An important distinction should be made between the epidemiologic and contact tracing value of determining virus lineages and the identification and characterization of novel strains, subtypes, or types of viruses that have distinct biological features. Thus, continuous surveillance of genetic variability of SARS-CoV-2 is recommended to track the emergence of new genetic variants, with subsequent studies of their potential biological significance.
The media hype about the UK variant of concern seems justified considering its rapid spread and the number of amino acid changes detected in the spike glycoprotein of this lineage, which can have important effects on antigenicity and transmissibility. In turn, this can have implications for the current vaccine formulations and resurgence of new waves of infection.
The data authors can be contacted directly via GISAID website: https://www.gisaid.org/. The datasets analysed during the current study (ML analyses files, xml files without sequences, Tracer log files) are available from the corresponding author (M.S.) on a reasonable request and considering the terms of use by GISAID.
We sincerely thank the originating lab (Biolab Diagnostic Laboratories) and the submitting lab (Andersen lab at Scripps research) who have sequenced and shared the full genome data for SARS-CoV-2 in the GISAID database.
The authors declare no conflict of interest.
1. Rates of evolutionary change in viruses: Patterns and determinants
2. Imbert, I. Insights into RNA synthesis, capping, and proofreading mechanisms of SARS-coronavirus
3. COVID-19 Coronavirus Pandemic
4. SARS-CoV-2 and COVID-19: A genetic, epidemiological, and evolutionary perspective
5. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
6. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK
7. Variant analysis of SARS-CoV-2 genomes
8. Virus evolution and transmission in an ever more connected world
9. Phylogenetic and phylodynamic analyses of SARS-CoV-2
10. On the origin and continuing evolution of SARS-CoV-2
11. Data, disease and diplomacy: GISAID's innovative contribution to global health
12. Addendum: A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
13. Covid-19: New coronavirus variant is identified in UK
14. New mutations raise specter of 'immune escape'
15. Genomic Characterisation of an Emergent SARS-CoV-2 Lineage in Manaus: Preliminary Findings. Available online
16. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data
17. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
18. Molecular Mechanism of the N501Y Mutation for Enhanced Binding between SARS-CoV-2's Spike Protein and Human ACE2 Receptor
19. Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant
20. Spike mutation D614G alters SARS-CoV-2 fitness
21. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2
22. COVID-19 Coronavirus spike protein analysis for synthetic vaccines, a peptidomimetic antagonist, and therapeutic drugs, and analysis of a proposed achilles' heel conserved region to minimize probability of escape mutations and drug resistance
23. Temporal increase in D614G mutation of SARS-CoV-2 in the Middle East and North Africa
24. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
25. Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil
26. COVID-19 misinformation: Mere harmless delusions or much more? A knowledge and attitude cross-sectional study among the general public residing in Jordan
27. The Hashemite Kingdom of Jordan. COVID-19 Updates in Jordan
28. Jordan to Close Border with Syria after Spike in COVID-19 Cases
29. High Rates of COVID-19 Vaccine Hesitancy and Its Association with Conspiracy Beliefs: A Study in Jordan and Kuwait among Other Arab Countries
30. Conspiracy Beliefs Are Associated with Lower Knowledge and Higher Anxiety Levels Regarding COVID-19 among Students at the University of Jordan
31. Use of phylogenetics in the molecular epidemiology and evolutionary studies of viral infections
32. Immune System, Nutrition, Respiratory and Cardiovascular Diseases during COVID-19: A Complex Combination
33. A Systematic Review of COVID-19 Epidemiology Based on Current Evidence
34. Mobility network models of COVID-19 explain inequities and inform reopening
35. Assessment of SARS-CoV-2 transmission among attendees of live concert events in Japan using contact-tracing data
36. Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
37. The effect of SARS-CoV-2 variant B.1.1.7 on symptomatology, re-infection and transmissibility
38. Biological and Clinical Consequences of Integrin Binding via a Rogue RGD Motif in the SARS CoV-2 Spike Protein
39. Genetic Variants of SARS-CoV-2-What Do They Mean?
40. The coronavirus is mutating-does it matter?
41. A Founder Effect Led Early SARS-CoV-2 Transmission in Spain
42. On the founder effect in COVID-19 outbreaks: How many infected travelers may have started them all?
43. Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions
44. Identification of the nucleotide substitutions in 62 SARS-CoV-2 sequences from Turkey
45. Updated List of Green, Yellow and Red Countries Eligible to Travel to and from Jordan
46. Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility
47. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
48. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
49. MAFFT-DASH: Integrated protein sequence and structural alignment
50. Our World in Data: Coronavirus Source Data
51. Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin COVID-19 Lineage Assigner)
52. Molecular Evolutionary Genetics Analysis version 6.0
53. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees
54. Predicting the functional effect of amino acid substitutions and indels
55. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0
56. Smart Model Selection in PhyML
57. Bayesian phylogenetics with BEAUti and the BEAST 1.7