key: cord-0293616-8wgqvoo8
authors: Aris-Brosou, Stéphane; Parent, Louis; Ibeh, Neke
title: Viral long-term evolutionary strategies favor stability over proliferation
date: 2019-07-12
journal: bioRxiv
DOI: 10.1101/539239
sha: 133656553141552d43f0900d8a1e6dcb8b594d19
doc_id: 293616
cord_uid: 8wgqvoo8

Viruses are known to have some of the highest and most diverse mutation rates found in any biological replicator, with single-stranded (ss) RNA viruses evolving the fastest, and double-stranded (ds) DNA viruses having rates approaching those of bacteria. As mutation rates are tightly and negatively correlated with genome size, selection is a clear driver of viral evolution. However, the role of intragenomic interactions as drivers of viral evolution is still unclear. To understand how these two processes affect the long-term evolution of viruses infecting humans, we comprehensively analyzed ssRNA, ssDNA, dsRNA, and dsDNA viruses to identify which virus types and which functions show evidence of episodic diversifying selection and correlated evolution. We show that selection mostly affects single-stranded viruses, that correlated evolution is more prevalent in DNA viruses, and that both processes, taken independently, mostly affect viral replication. However, the genes that are jointly affected by both processes are involved in key aspects of the viral life cycle, favoring viral stability over proliferation. We further show that both evolutionary processes are intimately linked at the amino acid level, which suggests that it is the joint action of selection and correlated evolution, and not just selection, that shapes the evolutionary trajectories of viruses, and possibly of their epidemiological potential.

Humanity is regularly reminded of the epidemiological toll of viruses, in part due to recent [...]

To date, one of the most salient evolutionary features shared by all viruses is the existence of a negative correlation between mutation rate and genome size (Holmes, 2009).
This is a critical result as it suggests that selection is driving the evolution of mutation [...]

[...] (Tables S1-S4); subtypes / genotypes / clades were treated as independent data sets. Although these viruses differ in structure (segmented or not, with circular or linear genomes, with positive or negative strands, and with or without overlapping reading frames), accounting for these features would have led to smaller and smaller data sets, precluding any statistical analysis, so the data were not split beyond viral type. Each list contained the virus names, the length of their genome, and their number of protein-coding genes (CDSs), and was associated with a reference coding sequence (see query sequences.zip at https://github.com/sarisbro).

To obtain corresponding sequence alignments of orthologous genes, BLASTn searches were performed with blast-2.6.0+ on a custom database limited to viral genes present in the NCBI nucleotide database. For this, all gbvrl*.seq.gz files were downloaded from ftp.ncbi.nih.gov/genbank while querying viruSITE, concatenated into a single GenBank file, and then converted into a FASTA file with readseq (Gilbert, 2003) to specifically extract CDSs. This was done to avoid retrieving 5' and 3' untranslated regions that would cause problems in the downstream codon analyses. BLASTn searches were performed for each viruSITE viral sequence with a stringent E-value threshold of 10^-100, keeping a maximum of 100 sequences with at least 80% coverage of each query; this ensured that subtype / genotype / clade boundaries were not crossed.
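The hit-selection criteria described above (E-value of at most 10^-100, at least 80% query coverage, at most 100 sequences per query) can be sketched as a filter over tabular BLASTn results. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes the search was written with the BLAST+ tabular format `-outfmt "6 qseqid sseqid evalue qcovs"`, that "coverage" corresponds to the `qcovs` column, and that hits are ranked by E-value; the function name `filter_hits` and its parameters are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Thresholds taken from the text; the names are illustrative assumptions.
MAX_EVALUE = 1e-100   # stringent E-value threshold
MIN_COVERAGE = 80.0   # minimum query coverage, in percent
MAX_HITS = 100        # maximum number of sequences kept per query


def filter_hits(handle, max_evalue=MAX_EVALUE, min_coverage=MIN_COVERAGE,
                max_hits=MAX_HITS):
    """Return {query_id: [subject_id, ...]} for hits passing the thresholds.

    Assumes tab-separated rows with columns: qseqid, sseqid, evalue, qcovs.
    """
    best = defaultdict(dict)  # query -> subject -> lowest E-value seen
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        evalue, qcovs = float(row[2]), float(row[3])
        if evalue > max_evalue or qcovs < min_coverage:
            continue  # hit fails the E-value or coverage criterion
        previous = best[qseqid].get(sseqid)
        if previous is None or evalue < previous:
            best[qseqid][sseqid] = evalue
    selected = {}
    for qseqid, subjects in best.items():
        # Rank subjects by E-value (stable sort) and keep the top max_hits.
        ranked = sorted(subjects.items(), key=lambda item: item[1])
        selected[qseqid] = [sseqid for sseqid, _ in ranked[:max_hits]]
    return selected


if __name__ == "__main__":
    demo = "q1\ts1\t1e-150\t95\nq1\ts2\t1e-50\t99\nq1\ts3\t1e-120\t60\n"
    print(filter_hits(io.StringIO(demo)))
```

Queries with no hit passing both thresholds are simply absent from the returned dictionary, so such genes would drop out of any downstream alignment step.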
As the viruses retrieved from viruSITE also included viruses that require a vector (e.g., arboviruses such as the Dengue and Yellow fever viruses), or viruses that circulate in non-human hosts but that can lead to zoonoses (e.g., the Camel alphacoronavirus leading to MERS [...]

[...] Gblocks 0.91b at the codon level using the stringent default settings (Castresana, 2000). Both trees and alignments are available from https://github.com/sarisbro.

To obtain gene annotations, Gene Ontology (GO) terms were retrieved from gene [...]

[...] (Table S1; P < 0.01). In spite of these differences, we note that episodic diversifying selection mostly affects genes involved in viral replication (Table S1). Genes [...] (Table S2). Again, most of these functions and processes are involved in viral replication (Table S2). At the intersection of these evolutionary processes, however, the genes that are jointly affected by selection and [...] Table S3). This suggests that despite key differences in life history strategies adopted [...]

We note, however, that we neglected some aspects of viral structure: viruses can be segmented or not, with a circular or linear genome, with positive or negative strands, and with or without overlapping reading frames. We could not consider these complications here because of the resulting small sample sizes, even though these factors can affect the mode of evolution of viruses (Holmes, 2009). Future work should strive to address these limitations. We also neglected the population genetics context in which different viruses evolve, a context that can often be correlated with structural constraints (Lynch, 2007; Holmes, 2009). Furthermore, as we focused solely on intragenic interactions, and not on intergenic or higher-order correlations, we may have missed higher-level constraints affecting [...]

TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations
Gene set enrichment analysis with topGO.
Bioconductor Improv 27
Viral outbreaks involve destabilized evolutionary networks: evidence from Ebola
The essentials of computational molecular evolution
Deep mutational scanning identifies sites in influenza nucleoprotein that affect viral inhibition by MxA
Forecasting national and regional influenza-like illness for the USA
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
Viruses' life history: towards a mechanistic basis of a trade-off between survival and reproduction among phages
The idiosyncratic drivers of correlated evolution
Why are RNA virus mutation rates so damn high?
Muscle: multiple sequence alignment with high accuracy and high throughput
2016a. Zika virus in the Americas: Early epidemiological and genetic findings
Mobile real-time surveillance of Zika virus in Brazil
A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy
Extent and evolution of gene duplication in DNA viruses
Experimental selection reveals a trade-off between fecundity and lifespan in the coliphage Qβ
Sequence file format conversion with command-line readseq. Appendix 1E
Genomic surveillance elucidates Ebola virus origin and transmission during the
Stability-mediated epistasis constrains the evolution of an influenza protein
Unifying the epidemiological and evolutionary dynamics of pathogens
The evolution and emergence of RNA viruses
What does virus evolution tell us about virus origins?
Both epistasis and diversifying selection drive the structural evolution of the Ebola virus glycoprotein mucin-like domain
Not so different after all: a comparison of methods for detecting amino acid sites under selection
Prevalence of epistasis in the evolution of influenza A surface proteins
The fitness landscape of a tRNA gene
The origins of genome architecture
Mutation and epistasis in influenza virus evolution. Viruses 10
The unsolved challenge to phylogenetic correlation tests for categorical characters
Compensatory evolution in mitochondrial tRNAs navigates valleys of low fitness
Detecting individual sites subject to episodic diversifying selection
Coordinated evolution of influenza A surface proteins
Widespread historical contingency in influenza viruses
HyPhy: hypothesis testing using phylogenies
Spidermonkey: rapid detection of co-evolving sites using Bayesian graphical models
FastTree 2-approximately maximum-likelihood trees for large alignments
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
phytools: an R package for phylogenetic comparative biology (and other things)
Molecular evolution of human species D adenoviruses
Predicting the emergence of H3N2 influenza viruses reveals contrasted modes of evolution of HA and NA antigens
From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses
Why do RNA viruses recombine?
Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic
viruSITE-integrated database for viral genomics
Genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans
Darwinian evolution can follow only very few mutational paths to fitter proteins
Constraints from protein structure and intra-molecular coevolution influence the fitness of HIV-1 recombinants
A procedure for robust estimation and inference in linear regression