key: cord-300783-pvn2qq0f
authors: Sadykov, Mukhtar; Mourier, Tobias; Guan, Qingtian; Pain, Arnab
title: Short sequence motif dynamics in the SARS-CoV-2 genome suggest a role for cytosine deamination in CpG reduction
date: 2020-08-07
journal: bioRxiv
DOI: 10.1101/2020.06.19.161687
sha:
doc_id: 300783
cord_uid: pvn2qq0f

RNA viruses use CpG reduction to evade the host cell defense, but the driving mechanisms are still largely unknown. To address this, we used a rapidly growing genomic dataset of SARS-CoV-2 with relevant metadata. Remarkably, by simply ordering SARS-CoV-2 genomes by their date of collection, we find a progressive increase of C-to-U substitutions resulting in 5'-UCG-3' motif reduction, which in turn has reduced the CpG frequency over just a few months of observation. This is consistent with APOBEC-mediated RNA editing resulting in CpG reduction, thus allowing the virus to escape ZAP-mediated RNA degradation. Our results thus link the dynamics of target sequences in the viral genome for two known host molecular defense mechanisms, mediated by the APOBEC and ZAP proteins.

Viruses utilize numerous mechanisms to avoid the host cell defense. One such mechanism is the CpG dinucleotide reduction observed in many single-stranded RNA

fraction of the observed C>U changes represent multiple, independent events (Figure S3). The reported C>U frequency is therefore most likely an underestimate. Over this period of five months, we find a steady increase in C>U substitutions (Figure 1B

5'-UU-3') in viral dinucleotide frequencies for the 5-month period (Figure S4). Among all dinucleotides, UpC showed the largest decrease, while UpU showed the largest increase. CpG, CpA, CpC, CpU and GpA also showed a negative net gain but not as prominent as UpC, CpA or ApG. We find that the majority of dinucleotide losses were due to C>U changes, in agreement with a recent study by Rice et al. (2020).
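The dinucleotide comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `kmer_frequencies` and `net_change` are hypothetical, and the sketch assumes that the net gain of a motif is the difference in relative frequency between a sampled genome and the reference, which the text does not state explicitly. Setting `k=3` gives the corresponding trinucleotide comparison.

```python
from collections import Counter


def kmer_frequencies(seq, k=2):
    """Relative frequency of each overlapping k-mer in an RNA sequence.

    Assumption: sequences may arrive in the DNA alphabet, so T is mapped
    to U; k-mers with ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGU")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def net_change(reference, sample, k=2):
    """Per-motif frequency difference (sample minus reference).

    Assumption: this difference stands in for the 'net gain' of a motif.
    """
    ref = kmer_frequencies(reference, k)
    smp = kmer_frequencies(sample, k)
    return {m: smp.get(m, 0.0) - ref.get(m, 0.0) for m in set(ref) | set(smp)}


# Toy example: a single C>U change inside a 5'-UCG-3' motif.
reference = "AUCGAUCG"
sample = "AUUGAUCG"
change = net_change(reference, sample, k=2)
```

In this toy example, one C>U change lowers the UpC and CpG frequencies and raises UpU and UpG, the same direction of change reported for the genome-wide data.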
When analyzing the context of genomic sites undergoing C>U changes, we noticed an enrichment for 5'-UCG-3' motifs (Table S2). To assess the contribution of C>U changes to CpG loss, we examined the dynamics of [A/C/G/U]CG trinucleotides over time (Figure 1D). The progressive change (~1% over a 5-month period) of 5'-UCG-3' to 5'-UUG-3' is most striking when supported by a larger number of genomes (days 70 to 115), whereas no such pattern is observed for the other trinucleotides (Figure 1D). The association between cytosine deamination and CpG loss is further underlined by the rapid, progressive increase in 5'-UCG-3' > 5'-UUG-3' changes compared to other 5'-

For all di-, tri-, and tetra-nucleotide motifs containing C in the reference genome, the

noted. The ratio between these two measures was compared to the expected ratio, defined as the number of C's with a C-to-U substitution divided by the total number of C's in the genome. The probability was calculated using a binomial distribution. All statistical tests were performed using RStudio v. 1.1.414 (Booth et al. 2018).

Folding potential

The reference sequence was divided into overlapping 500-nucleotide windows, each shifted by 10 nucleotides. These sequences were folded using RNAfold (Lorenz et al.

We thank all laboratories which have contributed sequences to the GISAID database and Zhadyra Yerkesh for her comments and helpful discussions.

The data underlying this article are available in GISAID, at https://gisaid.org. The ID numbers of genomes used are provided in Table S1.
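Two of the analysis steps described in the methods above, the binomial motif test and the overlapping window scheme, can be sketched in Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the test is assumed to be one-sided (probability of observing at least as many substituted motifs as were seen), and the folding step itself is omitted because the exact RNAfold invocation is not given in the text.

```python
from math import exp, lgamma, log


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed in log space.

    Assumption: a one-sided upper-tail probability is the quantity of interest.
    """
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0

    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1.0 - p))

    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def motif_enrichment(motif_sub, motif_total, c_sub, c_total):
    """Compare a motif's C>U ratio with the genome-wide expected ratio.

    motif_sub / motif_total: motif occurrences with a C>U change / all occurrences.
    c_sub / c_total: C's with a C>U change / all C's in the genome.
    Returns (observed ratio, expected ratio, upper-tail probability).
    """
    expected = c_sub / c_total
    observed = motif_sub / motif_total
    return observed, expected, binomial_upper_tail(motif_sub, motif_total, expected)


def sliding_windows(seq, size=500, step=10):
    """Overlapping windows of `size` nucleotides, each shifted by `step`.

    Returns (start, subsequence) pairs; a trailing partial window is dropped,
    which is an assumption about how the genome end was handled.
    """
    return [(start, seq[start:start + size])
            for start in range(0, len(seq) - size + 1, step)]
```

Each window returned by `sliding_windows` would then be passed to a folding program such as RNAfold; the log-space evaluation in `binomial_upper_tail` avoids overflow when motif counts reach the thousands.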
UpA dinucleotide frequencies on RNA virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication
Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
RStudio: Integrated Development for R
Diverse mechanisms used by cellular restriction factors to inhibit virus infections
The Heterogeneous Landscape and Early Evolution of Pathogen-Associated CpG Dinucleotides in SARS-CoV-2
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2
Patterns of evolution and host gene mimicry in influenza and other RNA viruses
Dinucleotide evolutionary dynamics in influenza A virus
Induced Mutation of Human Immunodeficiency Virus Type-1 Contributes to Adaptation and Evolution in Natural Infection
SnapShot: Antiviral Restriction Factors
ViennaRNA Package 2.0
Molecular Mechanism of RNA Recognition by Zinc-Finger Antiviral Protein
Mansky LM. 2020. Deamination hotspots among APOBEC3 family members are defined by both target site sequence context and ssDNA secondary structure
The Zinc Finger Antiviral Protein restricts SARS-CoV-2. bioRxiv
Evidence for strong mutation bias towards, and selection against, U content in SARS-CoV-2: implications for vaccine design
Acute Respiratory Syndrome Coronavirus Sequence Characteristics and Evolutionary Rate Estimate from Maximum Likelihood Analysis
Modeling the Embrace of a Mutator: APOBEC Selection of
other coronaviruses - causes and consequences for their short and long evolutionary trajectories
The Evolutionary Pathway to Virulence of an RNA Virus
Bieniasz PD. 2017. CG dinucleotide suppression enables antiviral defence targeting non-self RNA
Innate immune signaling induces high levels of TC-specific deaminase activity in primary monocyte-derived cells through expression of APOBEC3A isoforms
Human SARS-CoV-2 has evolved to reduce CG dinucleotide in its open reading frames
Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense
Multi-site co-mutations and 5'UTR CpG immunity escape drive the evolution of SARS-CoV-2