key: cord-0762332-gfwqog3x
authors: Farkas, Carlos; Fuentes-Villalobos, Francisco; Garrido, José Luis; Haigh, Jody J; Barría, María Inés
title: Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions
date: 2020-04-12
journal: bioRxiv
DOI: 10.1101/2020.04.09.034462
sha: 2aedc6d26bddc8805490e147a0b928ed23ff897b
doc_id: 762332
cord_uid: gfwqog3x

Here we aim to describe early mutational events across samples from publicly available SARS-CoV-2 sequences from the Sequence Read Archive repository. Up until March 27, 2020, we downloaded 53 Illumina datasets, mostly from China, USA (Washington DC) and Australia (Victoria). Of 30 high-quality datasets, 27 (90%) contain at least a single founder mutation, and most of the variants are missense (over 63%). Five point mutations with a clonal (founder) effect were found in USA sequencing samples. Sequencing samples from the USA in GenBank present this signature with 50% allele frequencies among samples. Australian mutation signatures were more diverse than those of USA samples, but clonal events were still found in those samples. Mutations in the helicase and orf1a coding regions of SARS-CoV-2 were predominant, among others, suggesting that these proteins are prone to evolve by natural selection. Finally, we firmly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of reverse transcription quantitative PCR (RT-qPCR) based viral testing.

The COVID-19 pandemic caused by a novel 2019 SARS coronavirus, known as SARS-CoV-2, [...] that SARS-CoV-2 is a close relative of the RaTG13 bat-derived coronavirus (around 88% identity) rather than of SARS-CoV-1 (79% identity) or Middle East respiratory syndrome coronavirus, MERS-CoV (50% identity) (Lu et al. 2020).
Due to this association with bat coronaviruses, it was also argued that the SARS-CoV-2 virus has the potential to spread into other species, as bat coronaviruses do (Hu et al. 2018). Recently, it was demonstrated that SARS-CoV-2 is closely related to a pangolin coronavirus (Pangolin-CoV) found in dead Malayan pangolins, with 91.02% identity, the closest relationship found so far for SARS-CoV-2 (Zhang et al. 2020). In that study, genomic analyses revealed that the S1 protein of Pangolin-CoV is more closely related to SARS-CoV-2 than to the RaTG13 coronavirus. Also, five key amino acid residues involved in the interaction with the human ACE2 receptor are maintained in Pangolin-CoV and SARS-CoV-2, but not in the RaTG13 coronavirus. Thus, it is likely that pangolin species are a natural reservoir of SARS-CoV-2-like coronaviruses and that SARS-CoV-2 will continue to evolve with novel mutations as the pandemic evolves.

[...] Australian SARS-CoV-2 mutations, which were found to be heterogeneous. A mutational signature of USA mutations was, however, found in an Australian sample, suggesting a worldwide spread of this molecular signature consisting of five point mutations. Remarkably, mutations in the helicase and orf1a proteins of the virus were found more frequently than others, suggesting that these proteins are prone to evolve through natural selection. As proof of the latter, a single nucleotide polymorphism (SNP) in an Australian sample causes a bona fide stop codon in the helicase protein, strongly suggesting that this protein will evolve in SARS-CoV-2 in the future. As genetic drift prompts the mutational spectrum of the virus, we recommend frequently [...]

Since SARS-CoV-2 is an RNA virus that rapidly evolves after infection, these evolutionary events will likely affect its fitness over time.
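The distinction drawn above between missense variants and a SNP that introduces a stop codon can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; the function name `classify_snp`, the 0-based coordinate convention, and the toy coding sequence are assumptions made for illustration only, and the translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    cds : coding sequence starting at codon position 1 (hypothetical input)
    pos : 0-based index of the substituted base within cds
    alt : alternative base
    Returns (effect, reference_amino_acid, alternative_amino_acid).
    """
    cds = cds.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if alt not in BASES:
        raise ValueError("alt must be one of A, C, G, T")
    if not 0 <= pos < usable:
        raise ValueError("pos falls outside a complete codon")
    if cds[pos] == alt:
        raise ValueError("alt is identical to the reference base")

    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if ref_codon not in CODON_TABLE:
        raise ValueError("reference codon contains an ambiguous base")

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"   # the SNP creates a premature stop codon
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


# Toy sequence (Met-Gln-Gly), not taken from the SARS-CoV-2 genome.
toy_cds = "ATGCAAGGT"
print(classify_snp(toy_cds, 3, "T"))  # CAA -> TAA
print(classify_snp(toy_cds, 4, "T"))  # CAA -> CTA
print(classify_snp(toy_cds, 5, "G"))  # CAA -> CAG
```

In practice the coding sequence and variant coordinates would come from a genome annotation and a variant call file; the sketch only shows how a single substitution maps to a synonymous, missense, or nonsense outcome.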
In this study, we reveal the early mutational events [...]

Inhibition of RNA Helicases of ssRNA Virus Belonging to Flaviviridae, Coronaviridae and Picornaviridae Families
Quantification of the detrimental effect of a single primer-template mismatch by real-time PCR using the 16S rRNA gene as an example
Keep up with the latest coronavirus research
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
SARS and MERS: recent insights into emerging coronaviruses
Apparent founder effect during the early years of the San Francisco HIV type 1 epidemic (1978-1979)
Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats
Identification of Coronavirus Isolated from a Patient
Genome analyses help track coronavirus' moves
Fast gapped-read alignment with Bowtie 2
and Genome Project Data Processing S. 2009. The Sequence Alignment/Map format and SAMtools
and Tan W. 2020. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Integrative genomics viewer
Genomic diversity of SARS-CoV-2 in Coronavirus Disease 2019 patients
GISAID: Global initiative on sharing all influenza data - from vision to reality
FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science
Chikungunya virus emergence is constrained in Asia by lineage-specific adaptive landscapes
Can we contain the COVID-19 outbreak with the same measures as for SARS? Lancet Infect Dis
Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China
A new coronavirus associated with human respiratory disease in China
Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction
Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak