key: cord-327520-qj7coqfr authors: Wei, Yulong; Silke, Jordan R.; Aris, Parisa; Xia, Xuhua title: Coronavirus genomes carry the signatures of their habitats date: 2020-06-13 journal: bioRxiv DOI: 10.1101/2020.06.13.149591 sha: doc_id: 327520 cord_uid: qj7coqfr Coronaviruses such as SARS-CoV-2 regularly infect host tissues that express antiviral proteins (AVPs) in abundance. Understanding how they evolve to adapt to or evade host immune responses is important in the effort to control the spread of COVID-19. Two AVPs that may shape viral genomes are the zinc finger antiviral protein (ZAP) and the apolipoprotein B mRNA-editing enzyme-catalytic polypeptide-like 3 protein (APOBEC3). The former binds to CpG dinucleotides to facilitate the degradation of viral transcripts, while the latter deaminates C into U residues, leading to dysfunctional transcripts. We tested the hypothesis that both APOBEC3 and ZAP may act as primary selective pressures that shape the genome of an infecting coronavirus by analyzing a comprehensive set of publicly available genomes for seven coronaviruses (SARS-CoV-2, SARS-CoV, MERS, Bovine CoV, Murine MHV, Porcine HEV, and Canine CoV). We show that coronaviruses that regularly infect tissues with abundant AVPs have CpG-deficient and U-rich genomes, whereas viruses that do not infect tissues with abundant AVPs do not share these sequence hallmarks. In SARS-CoV-2, CpG is most deficient in the S protein region to evade ZAP-mediated antiviral defense during cell entry. Furthermore, over four months of SARS-CoV-2 evolutionary history, we observed a marked increase in C to U substitutions in the 5’ UTR and ORF1ab regions. This suggests that the two regions could be under constant C to U deamination by APOBEC3. The evolutionary pressures exerted by host immune systems on viral genomes may motivate novel strategies for SARS-CoV-2 vaccine development.
sequence region of APOBEC3 and ZC3HAV1 isoforms were extracted in FASTA format along with their ENSEMBL Accession IDs.

To compare gene expressions of APOBEC3 and ZC3HAV1L among tissues, we retrieved publicly available RNA Sequencing and Microarray studies that each sampled at least 10 mammalian tissues. The five mammalian species that have extensive tissue-specific mRNA expressions are Homo sapiens, Bos taurus, Canis lupus familiaris, Mus musculus, and Sus scrofa.

Because the data were extracted from multiple independent sources and are therefore not directly comparable, the relative mRNA expression level designations (high or low) for APOBEC3 and ZAP isoforms in a given tissue were derived from comparisons among AVP expressions in all tissues in each independent source. Specifically, we calculated the proportion of mRNA [...], we calculated the averaged PME value by considering all tissue-specific PME values in each independent source. Finally, for each AVP, tissue-specific PMEs were designated as high if they were greater than the averaged PME value and low if they were less than the averaged PME. In addition, each column in Supplemental Figures S1 and S2 with column title designations "APOBEC3" or "ZC3HAV1" contains the tissue-specific AVP expressions from an individual source, where darkest blue represents the tissue with the highest mRNA expression and darkest red represents the tissue with the lowest mRNA expression.

Retrieving and processing the genomes and regular habitats of coronaviruses infecting five mammalian species

The genome, Accession ID, and Sample [...]

We computed the nucleotide and di-nucleotide frequencies in each viral genome. Among [...] (2)

The index is expected to be 1 with no CpG deficiency or excess, smaller than 1 if CpG is deficient, and greater than 1 if CpG is in excess.
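The equation for the index did not survive in the text above, so the following is only a minimal sketch. It assumes the conventional observed-over-expected ratio, P(CG) / (P(C) * P(G)), which equals 1 when CpG is neither deficient nor in excess, as the text states. The function names `cpg_index` and `u_content` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def _normalize(genome: str) -> str:
    """Upper-case the sequence and write it in the RNA alphabet (T -> U)."""
    return genome.upper().replace("T", "U")


def cpg_index(genome: str) -> float:
    """Observed/expected CpG ratio (assumed form): P(CG) / (P(C) * P(G)).

    Values below 1 indicate CpG deficiency, values above 1 indicate excess.
    """
    seq = _normalize(genome)
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = Counter(seq)
    p_c = mono["C"] / n
    p_g = mono["G"] / n
    if p_c == 0 or p_g == 0:
        raise ValueError("index is undefined when C or G is absent")
    # Count CG dinucleotides over all n - 1 overlapping positions.
    cg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    p_cg = cg_count / (n - 1)
    return p_cg / (p_c * p_g)


def u_content(genome: str) -> float:
    """Proportion of U (T in DNA notation) in the sequence."""
    seq = _normalize(genome)
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count("U") / len(seq)
```

Applied to each genome (or to a sliding window or a gene region such as S), the two values together give the CpG-deficient, U-rich profile the study looks for. Ambiguous bases such as N are counted in the sequence length here, which is a simplification.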
Next, among 2666 complete, high-quality SARS-CoV-2 genomes from CNCB, we [...] records of tissue infection are located in Supplemental File S1).

We determined which human tissues are commonly infected by coronaviruses and whether these tissues express AVPs in abundance. Figure [...] expressions, such as Porcine HEV infecting pig liver (Fig. 2a), Canine CoV infecting dog intestine and lung (Fig. 2b), and Bovine CoV infecting cattle intestine (Fig. 2c). None of these three coronaviruses avoids tissues with high AVP expressions, nor do they display a compelling preference for tissues with low AVP expressions. Lastly, Murine MHV regularly infects mouse brain and liver but rarely infects the lung; however, mouse brain and liver express low levels of [...]

Only coronaviruses targeting tissues with high AVP expressions exhibit decreased CpG and increased U content

In the previous section, we demonstrated that many surveyed host-specific coronaviruses commonly infect tissues that exhibit high levels of AVPs (Fig. 1, 2; Supplemental Fig. S1, S2), but MHV does not conform to this observation (Fig. 2d). Here we compared the CpG and U content of these coronaviruses and found that viruses that regularly infect AVP-rich tissues tend to [...]

Based on global sequence comparison, Figure 4a shows that most SNPs are C->U substitutions. More specifically, local mutation patterns (Fig. 4b) show that among 28475 sequence samples, [...] increases over time, but only at the 5' UTR region (Fig. 5a) and ORF1ab region (Fig. 5b) and not in other regions (Fig. 5c, 5d, Supplemental Fig. S5). It is noteworthy that in the S region, [...] tissues expressing both AVPs in abundance (Fig. 2a, 2b, and 2c). Unsurprisingly, these global trends were absent from Murine MHV genomes (Fig. 3), as this virus does not regularly infect tissues that highly express AVPs (Fig.
2d).
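The observation that most SNPs are C->U substitutions rests on tallying substitution types between a reference genome and each sampled genome. The sketch below shows one way such a tally could be computed. It assumes the two sequences are already aligned to equal length, and the function name `tally_substitutions` is hypothetical. It is not the authors' pipeline.

```python
from collections import Counter


def tally_substitutions(reference: str, sample: str) -> Counter:
    """Count directed substitutions (e.g. 'C->U') between two aligned sequences.

    Assumes both sequences come from the same alignment (equal length).
    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref = reference.upper().replace("T", "U")
    smp = sample.upper().replace("T", "U")
    valid = set("ACGU")
    counts = Counter()
    for r, s in zip(ref, smp):
        # Only unambiguous mismatches count as substitutions.
        if r in valid and s in valid and r != s:
            counts[f"{r}->{s}"] += 1
    return counts
```

Restricting the two inputs to the aligned columns of one region (for example the 5' UTR or ORF1ab) and grouping samples by collection date would give the per-region, over-time counts discussed in the text. The region boundaries would have to come from the reference annotation.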