key: cord-0903393-i1o1ytnr authors: Wang, Xin; Hu, Mingda; Jin, Yuan; Wang, Boqian; Zhao, Yunxiang; Liang, Long; Yue, Junjie; Ren, Hongguang title: Global Mutational Sweep of SARS-CoV-2: from Chaos to Order date: 2021-11-17 journal: bioRxiv DOI: 10.1101/2021.11.16.468834 sha: 40a8fd93cd158bb35409acf33cefcf908212003c doc_id: 903393 cord_uid: i1o1ytnr

Analysis of large-scale genome sequences demonstrates that the mutation of SARS-CoV-2 has been undergoing significant sweeps. Driven by emerging variants, global sweeps have accelerated and become purer over time. This may prolong the pandemic with repeating epidemics, presenting challenges to the control and prevention of SARS-CoV-2.

The mutation of SARS-CoV-2 undergoes significant sweeps, in which previous variants (mutation combinations) are swept away and replaced by new ones with possible adaptive advantages. Over time, the replacement process in later sweeps may have accelerated, as can be seen from the sharp borders of the later squares in the figure. Furthermore, later squares are darker than earlier ones, suggesting increasing genomic homogeneity across phases, which is due to purer sweeps of variants as the pandemic goes on. The driving forces behind this phenomenon may be related to the enhanced fitness or adaptation of variants to human beings.

We examined the aforementioned observation separately in a number of countries with large-scale SARS-CoV-2 sequence data. Despite regional differences, the conclusion holds in almost all cases (see Supplementary Figures S1-S10), all of which show phases (squares) divided by mutational sweeps. Benefiting from abundant genome sequences, the figure for the United Kingdom is particularly representative (see Figure 1A): it shows several clear phases divided by sweeps dominated by typical SARS-CoV-2 variants. We further compared the mutation spectra among different regions.
Due to differences in both control measures and the prevalence of variants, the heat map shows a variety of shapes. Taking the comparison of the UK and the US as an example (see Supplementary Figures S11-S12), the Alpha and Delta variants persisted longer in the UK than in the US, so regions of similar mutation spectrum appear as rectangles rather than squares. Note that the mutation spectra of these two countries showed fewer similarities in 2020, indicating an early regional genomic difference between the two countries. Nevertheless, the emergence of the Alpha and then the Delta variant in 2021 quickly erased this regional diversity. This implies that the evolution of SARS-CoV-2 has been undergoing selective sweeps both regionally and globally, in which previously predominant local strains can be quickly replaced by imported variants, e.g., the Alpha and Delta variants, which have evolutionary advantages in transmission, in host adaptation, or in both.

At the time of submission of this manuscript, the Delta variant has almost completed its sweep throughout the world and become the globally dominant variant. New SARS-CoV-2 variants with enhanced fitness will surely emerge in the future to replace the former predominant variant, but the possibility of co-circulation of multiple competing variants is low. It seems that SARS-CoV-2, after the host-jumping event, has finished the early stages of adaptation to human beings through chaotic mutations and has evolved into relatively persistent, stabilized adaptations. More or less like the seasonal influenza virus [5], the alternation of epidemic strains of SARS-CoV-2 may become periodic.
We collected 2,487,499 high-quality complete SARS-CoV-2 genome sequences from the GISAID Website (see Fasta ID.csv for detailed information). For each genome, the nucleotide mutations are calculated in comparison with Wuhan-Hu-1 (GenBank accession number NC_045512). We study the mutation spectrum of genomes in a given region, focusing on the whole world and four target countries: Brazil, India, the United Kingdom, and the United States. Nucleotide mutations with a global occurrence of less than 10,000 are considered infrequent and are discarded. Mutations absent in any of the four countries are also excluded. Thus, 475 major nucleotide mutations remain for further study.

We depict each region by its weekly mutation spectrum of genomes, consisting of the weekly proportions of the 475 major mutations from Feb 24, 2020, to Aug 16, 2021, namely a period of 78 weeks. A two-week window is moved along the spectrum and the similarity between windows is calculated, using both the Cosine similarity and the Frobenius similarity. For the Cosine similarity, the matrix of each window is flattened into a row vector; the Cosine similarity between flattened vectors $a$ and $b$ is then calculated as $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. For the matrices of two windows, $A, B \in \mathbb{R}^{2 \times 475}$, we define the Frobenius similarity between windows as the negative Frobenius norm of the difference between $A$ and $B$, that is, $-\lVert A - B \rVert_F$.

[1] A pneumonia outbreak associated with a new coronavirus of probable bat origin.
[2] SARS-CoV-2 variants, spike mutations and immune escape.
[3] Global initiative on sharing all influenza data - from vision to reality.
[4] A selective sweep in the Spike gene has driven SARS-CoV-2 human adaptation.
[5] Comparing SARS-CoV-2 with SARS-CoV and influenza pandemics. The Lancet Infectious Diseases.

None declared.

We downloaded SARS-CoV-2 genome sequences as of Sep 08, 2020, from the GISAID Website [1].
Only high-quality complete sequences are retained, and thus we obtained 2,487,499 genome sequences of SARS-CoV-2. See Supplementary_Fasta_ID.csv for detailed information on those genome sequences.
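
The window comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the weekly mutation spectrum is stored with one row per week and one column per mutation, and that windows overlap with a stride of one week; the text does not state the stride. The function names (`windows_of`, `similarity_heatmap`, and so on), the toy data, and the choice to return 0 when a window is all zeros are hypothetical and introduced only for illustration.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # rows = weeks, columns = mutations (assumed layout)


def windows_of(spectrum: Sequence[Sequence[float]], width: int = 2) -> List[Matrix]:
    """Return every run of `width` consecutive weeks (stride of one week is an assumption)."""
    return [
        [list(row) for row in spectrum[i:i + width]]
        for i in range(len(spectrum) - width + 1)
    ]


def flatten(window: Matrix) -> List[float]:
    """Flatten a window matrix into a single row vector."""
    return [x for row in window for x in row]


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between the flattened windows."""
    u, v = flatten(a), flatten(b)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Returning 0.0 for an all-zero window is a convention chosen for this sketch.
    return dot / norm if norm > 0 else 0.0


def frobenius_similarity(a: Matrix, b: Matrix) -> float:
    """Negative Frobenius norm of the difference between two windows."""
    u, v = flatten(a), flatten(b)
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def similarity_heatmap(
    spectrum: Sequence[Sequence[float]],
    width: int = 2,
    metric: Callable[[Matrix, Matrix], float] = cosine_similarity,
) -> List[List[float]]:
    """Pairwise similarity between all windows, as a square matrix."""
    wins = windows_of(spectrum, width)
    return [[metric(a, b) for b in wins] for a in wins]


# Toy spectrum (made-up values): 4 weeks x 3 mutations, with a complete
# replacement of one mutation by another between week 2 and week 3.
toy = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
heat = similarity_heatmap(toy)
toy_windows = windows_of(toy)
```

In this toy case the first and last windows share no mutations, so their Cosine similarity is zero, while the middle window, which straddles the replacement, is partially similar to both. With real data, the same square matrix computed over 78 weeks would correspond to the heat maps discussed in the main text.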