key: cord-0986521-55zj33pu authors: Moya, Andrés; Holmes, Edward C.; González-Candelas, Fernando title: The population genetics and evolutionary epidemiology of RNA viruses date: 2004 journal: Nat Rev Microbiol DOI: 10.1038/nrmicro863 sha: 7ed854c45e67069041c0636b66ce3b81b7504b73 doc_id: 986521 cord_uid: 55zj33pu
RNA viruses are ubiquitous intracellular parasites that are responsible for many emerging diseases, including AIDS and SARS. Here, we discuss the principal mechanisms of RNA virus evolution and highlight areas where future research is required. The rapidity of sequence change in RNA viruses means that they are useful experimental models for the study of evolution in general; it also enables us to watch them change in 'real time' and to retrace their spread through populations with molecular phylogenies. An understanding of the mechanisms of RNA virus sequence change is also crucial to predicting important aspects of their emergence and long-term evolution. Central to population genetics is understanding how the five main forces of evolutionary change (mutation, recombination, natural selection, GENETIC DRIFT and migration) interact to shape the genetic structure of populations. These same forces are also central to understanding RNA virus evolution, although their relative strengths differ from those observed for DNA-based organisms. For RNA viruses, most attention has been directed towards mutation, selection and genetic drift. We can understand their importance and interaction by considering four basic properties of RNA virus populations. First, RNA viruses often have very large population sizes, such that the number of viral particles in a given organism might be as high as 10¹². Second, such immense population sizes, which are several orders of magnitude larger than those observed for cellular organisms, are a product of explosive replication. For example, a single infectious particle can produce an average of 100,000 viral copies in 10 hours.
As natural selection is most efficient with large populations, it is no surprise that experiments using RNA viruses have shown that selection is of fundamental importance in controlling their evolutionary dynamics, such that new mutants with increased FITNESS (as measured by their selection coefficient, s) continually appear and out-compete older, inferior alleles 4. Third, owing to the lack of proofreading activity in their polymerase proteins, RNA viruses exhibit the highest mutation rates of any group of organisms, approximately one mutation per genome, per replication 5,6. Finally, the genome sizes of RNA viruses are typically small, ranging from only 3 kb to ~30 kb, with a median size of 9 kb. These last two properties are intimately related because high mutation rates are theoretically expected to limit genome size. In particular, a mutation rate that exceeds a notional ERROR THRESHOLD (set at approximately the reciprocal of the genome size) generates so many deleterious mutations in each replication cycle that even the fittest viral genomes are unable to reproduce, and population size decreases to extinction 7,8. However, RNA viruses that exist close to (but below) the error threshold are also
population that would experience genetic drift in the same way as the actual population. N e can be lower than the census population size (N) owing to various factors, including a history of population bottlenecks and variance in reproductive rates.
A rugged fitness landscape is one with multiple fitness peaks. The more rugged a landscape the lower the average fitness correlations between neighbouring points.
The independent evolution of similar traits in two or more unrelated or distantly related lineages.
CLONAL INTERFERENCE In asexual populations beneficial mutations compete (or 'interfere') with each other as they go to fixation.
Therefore, the fixation of advantageous mutations is sequential. This has important implications for the rate of adaptive evolution. COMPLEMENTATION The cooperative interaction of mutant genes in viral populations resulting in a phenotype closer to the wild type. In a broader sense, it refers to the use of genetic information belonging to another member of the population. The successive build-up of deleterious mutations in finite asexual populations. It has been proposed to be an important reason for the evolution of sexual reproduction. 280 | APRIL 2004 | VOLUME 2 www.nature.com/reviews/micro The quasispecies as a model of RNA virus evolution. The remarkable mutational power of RNA viruses has meant that their evolution has often been considered to be different to DNA-based organisms 1 . Key to this is the concept of the quasispecies, which was first developed by Eigen and Schuster 16 to understand the dynamics of primitive evolutionary systems. RNA viruses are of particular importance in this respect as they might represent biological entities that evolve according to the rules of quasispecies theory. The basis of quasispecies theory is the notion that the target of natural selection is not simply the fastest growing replicator, but rather a broad spectrum of mutants that are produced by erroneous copying of the fittest (or master) sequence 16, 17 . Natural selection acts on the entire quasispecies because mutation rates close to the error threshold mean that individual viral genomes are linked by a mutational coupling -all the possible mutational links between viral genomes are established -so that the whole population evolves as a single unit. One particularly important implication of this special form of group selection is that the fastest replicating RNA viral genomes could be out-competed by those with lower replication rates if the latter have a high probability of being generated by mutation from closely related variants. 
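The implication described above, that a slower replicator can displace a faster one at high mutation rates, can be made concrete with a toy calculation. The sketch below is an illustration added here, not the model used in the cited work. It assumes a simplified two-genotype approximation in which the long-run growth of a genotype is its replication rate multiplied by the fraction of offspring that remain viable, exp(-U(1 - neutrality)), where U is the genomic mutation rate and `neutrality` is a hypothetical fraction of mutations that leave fitness unchanged. All function names and parameter values are assumptions chosen for illustration.

```python
import math

def effective_growth(replication_rate, neutrality, genomic_mutation_rate):
    """Approximate growth rate of a genotype together with its neutral mutants.

    Assumes mutations per replication are Poisson with mean U and that a
    fraction (1 - neutrality) of them are lethal, so a fraction
    exp(-U * (1 - neutrality)) of offspring remain viable.
    """
    viable = math.exp(-genomic_mutation_rate * (1.0 - neutrality))
    return replication_rate * viable

def winner(fast, slow, genomic_mutation_rate):
    """Return 'fast' or 'slow', whichever has the larger effective growth."""
    g_fast = effective_growth(*fast, genomic_mutation_rate)
    g_slow = effective_growth(*slow, genomic_mutation_rate)
    return "fast" if g_fast > g_slow else "slow"

def crossover_rate(fast, slow):
    """Genomic mutation rate above which the slower, more robust genotype wins."""
    (r_fast, n_fast), (r_slow, n_slow) = fast, slow
    return math.log(r_fast / r_slow) / (n_slow - n_fast)

# Hypothetical genotypes: (replication rate, fraction of neutral mutations)
FAST = (2.0, 0.2)
SLOW = (1.5, 0.6)

for U in (0.1, 0.5, 1.0, 2.0):
    print(f"U = {U:.1f}: {winner(FAST, SLOW, U)} replicator dominates")
print(f"crossover at U = {crossover_rate(FAST, SLOW):.3f}")
```

Under these assumed values the faster genotype dominates at low mutation rates and the slower, more mutationally robust genotype dominates once U exceeds the crossover value. The approximation ignores back mutation and mutational flow between the two genotypes, so it captures only the qualitative effect.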
An important question is whether the quasispecies model is an accurate description of RNA virus evolution. Experimental evidence for the quasispecies was first reported for the bacteriophage Qβ 18. Subsequent experiments with mammalian vesicular stomatitis virus (VSV) provided one of the most important supporting observations for the quasispecies: a high-fitness viral variant was suppressed by one of lower fitness 19. However, this can also be explained by genetic drift; the probability that any variant achieves fixation in a population is partially dependent on its initial frequency, so most rare, albeit advantageous, variants are lost by drift in small populations. Indeed, a generic problem of quasispecies theory is that genetic drift is expected to be extremely restricted 17, which might not be the case for viruses in Nature 20,21. More recently, in vitro studies of the evolutionary dynamics of bacteriophage φ6 provided evidence for one aspect of quasispecies theory, namely that viral genomes differ in their mutational spectra and that this affects fitness 22. However, because these experiments used small populations and RUGGED FITNESS LANDSCAPES, and because little is known about fitness landscapes in Nature, the generality of these results is uncertain. Although the quasispecies has a firm theoretical foundation, and there is some evidence for it in laboratory populations, whether it applies to RNA viruses in Nature is less clear. For example, simple observations of high levels of genetic variation in RNA viruses are not sufficient to prove the existence of quasispecies, although this is often the only evidence presented, nor is the existence of an error threshold, which can easily be explained using evolutionary models 8.
Rather, to demonstrate that RNA viruses form quasispecies it is necessary to show that natural selection acts on viral able to produce many beneficial mutations in a short time, thereby enhancing adaptability, provided that their populations are sufficiently large. In the simple situation outlined above, RNA viruses should evolve in a highly deterministic manner, with the process of natural selection working efficiently on a vast array of mutational variants. Although it is true that RNA virus populations are often highly diverse, this is not sufficient to explain the entirety of RNA virus evolution. In particular, deterministic approaches assume that population sizes are universally large, such that the fate of a given mutation can be predicted if its frequency and fitness are known. Although the population sizes of RNA viruses are often very large, factors such as variation in replication potential among variants, differences in generation time among infected cells and POPULATION BOTTLENECKS, most notably during transmission between hosts, might lead to an EFFECTIVE POPULATION SIZE (denoted N e ) that is much smaller than the actual number of infected cells. Theory predicts that in populations where N e is small (such that the compound parameter N e s < 1), genetic drift has an important role in determining the frequency and fate of mutations 9 . Recombination might also have an important role in RNA virus evolution. Although most studies indicate that recombination rates in many RNA viruses are often lower than those in other organisms 10 , there are notable exceptions. Perhaps the most dramatic is HIV, in which the genomic recombination rate exceeds the genomic mutation rate 11 . Frequent recombination seems advantageous because it can create high fitness genotypes more rapidly than by mutation alone. Moreover, recombination might also purge deleterious mutations from virus populations, thereby preventing a dramatic decrease in fitness (see below). 
However, simulation studies have indicated that frequent recombination is more likely to reduce fitness when mutation rates are close to the error threshold 12 . Finally, recombination rates in RNA viruses might not be set by natural selection at all. Rather, they could simply be a passive function of the replication machinery or ecological circumstances of the virus in question. For example, recombination rates seem to be particularly low in negative-sense RNA viruses 13 , which might be a result of the RNA-packaging mechanism. Understanding the causes of variation in recombination rate among RNA viruses is a key area for future study. A final factor to consider in RNA virus evolution is migration. Migration (also referred to as gene flow) must not only be understood at a macroscopic level (that is, among hosts within a population, among populations or between host species), but also within a single infected individual. From the site of inoculation, viruses can be transported to several tissues, generating intra-host spatial variation 14 . However, the effect of a non-uniform population distribution on the spread, fitness and variability of virus populations has been much less studied than other evolutionary factors, although in some experiments a positive correlation between migration rate and the average fitness of the population has been observed 15 . it is also possible to directly estimate changes in fitness, one of the main goals of modern population genetics. A panoply of experiments have highlighted both the mechanisms of RNA virus evolution and what RNA viruses can tell us about evolution in general. Experimental studies have made a substantial contribution towards understanding the processes that govern RNA virus evolution 25 . The main findings of these experiments can be grouped into three general types. 
First, the molecular basis of adaptive evolution in viruses, including the occurrence and frequency of CONVERGENT EVOLUTION 26, 27 , viral attenuation 28 and compensatory mutations 29 . Second, the role of population bottlenecks and the accumulation of deleterious mutations, and how they affect fitness [30] [31] [32] [33] . Finally, the importance of CLONAL INTERFERENCE 34 and COMPLEMENTATION 35 in determining rates of viral adaptation. In brief, these studies make eight conclusions 36 . First, there is extensive convergent and parallel evolution (both in genotype and in phenotype) across lineages replicating in the same host, perhaps reflecting the fact that relatively few sites are free to vary when genome sizes are small. Second, advantageous mutations that are fixed early on when viruses are challenged with new environments confer the largest fitness benefit. Third, phenotypic evolution tends to mirror the evolution of fitness increments, with large changes occurring early in new environments. Fourth, rates of nucleotide substitution remain approximately constant through time. Fifth, overall genetic diversity remains low during the phase of maximum fitness increase and rises once fitness becomes asymptotic. Sixth, evolutionary changes that increase fitness in one host often, although not always, reduce fitness in an alternative host. Seventh, population bottlenecks and spatial heterogeneity lead to an increase in unique nucleotide substitutions. Finally, severe reductions in population size can lead to the accumulation of deleterious mutations and consequent fitness losses. Many fundamental questions in evolutionary biology have been addressed using RNA viruses as model experimental systems. One of the first questions addressed concerned a severe consequence of deleterious mutation accumulation known as MULLER'S RATCHET 37 . 
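Muller's ratchet, the irreversible loss of the least-mutated class in a finite asexual population, can be illustrated with a toy simulation. The following sketch is an illustration added here and is not taken from the cited studies. It assumes a haploid Wright-Fisher population with multiplicative fitness, no recombination and no back mutation, and it simplifies mutation so that each offspring gains at most one deleterious mutation per generation. The function name and all parameter values are hypothetical.

```python
import random

def mullers_ratchet(pop_size, mutation_prob, s, generations, seed=0):
    """Track the least-loaded class in an asexual Wright-Fisher population.

    Each individual is represented by its count of deleterious mutations.
    Fitness is (1 - s) ** count; there is no recombination or back mutation.
    Returns the minimum mutation count in the population at each generation.
    """
    rng = random.Random(seed)
    population = [0] * pop_size  # start with a mutation-free population
    min_load = [0]
    for _ in range(generations):
        # Parents are sampled in proportion to their fitness.
        weights = [(1.0 - s) ** k for k in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Each offspring inherits its parent's load and may gain one mutation.
        population = [k + (rng.random() < mutation_prob) for k in parents]
        min_load.append(min(population))
    return min_load

# Hypothetical parameters: a severely bottlenecked population versus a larger one.
small = mullers_ratchet(pop_size=10, mutation_prob=0.1, s=0.1, generations=300)
large = mullers_ratchet(pop_size=1000, mutation_prob=0.1, s=0.1, generations=300)
print("least-loaded class after 300 generations:")
print("  N = 10:  ", small[-1], "mutations")
print("  N = 1000:", large[-1], "mutations")
```

Because offspring can only inherit or add mutations, the minimum load in this model can never decrease; each increase is one 'click' of the ratchet. With parameters of this kind, small populations tend to click repeatedly whereas large populations tend to retain their least-loaded class, although individual runs vary with the random seed.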
Studies have explored how clonally evolving RNA viruses prevent the excessive build-up of deleterious mutations in populations that are experiencing strong bottlenecks or small effective sizes 38 . One popular suggestion is that sexual reproduction has evolved in RNA viruses, because it allows them to escape Muller's ratchet when effective population sizes are small [39] [40] [41] . In this model, the accumulation of deleterious mutations is expected to be less of a problem for RNA viruses that either recombine or undergo reassortment in the case of viruses with segmented genomes. However, recombination rates are often so low in RNA viruses that it is difficult to hypothesize that they have a direct fitness benefit. In the face of low populations as a unit. Testing this prediction in Nature will be one of the most important future areas of study for those investigating the mechanics of viral evolution. Experimental evolution constitutes a powerful tool for simulating natural evolution 23 and is frequently used to test the basic principles of evolutionary theory 24 . BOX 1 provides a schematic overview of experiments of this type. The strength of the experimental approach is that the phenotypic and molecular changes of RNA viral populations can be monitored in real time. More importantly, under an appropriate experimental setting, This technique (shown in panel b of the figure) requires a mixture of the evolved mutant and the ancestral WT, which is then used to infect cell monolayers in a flask and two additional dishes (with and without the WT inhibitor, respectively). On the following day, a sample is taken from the first flask to infect cell monolayers in a new flask and two new dishes (under the same conditions as used previously). The sample from the second flask is then used to infect two new dishes, with and without the WT inhibitor. Plaque numbers are counted as a measure of virus particle numbers. 
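The fitness estimate obtained from a competition assay of this kind amounts to the slope of a regression of ln(mutant/WT) on time. The sketch below shows that calculation under stated assumptions: the plaque counts are invented for illustration, and the way mutant and WT counts are derived from the dishes with and without the WT inhibitor is not specified here, so the function simply takes the two series of counts as inputs. The function name is hypothetical.

```python
import math

def relative_fitness(times, mutant_plaques, wt_plaques):
    """Least-squares slope of ln(mutant / WT) against time.

    A positive slope means the mutant is increasing relative to the WT.
    The slope is expressed per unit of time used in `times`.
    """
    log_ratio = [math.log(m / w) for m, w in zip(mutant_plaques, wt_plaques)]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_ratio) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_ratio))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Hypothetical plaque counts from three successive passages (days 0, 1 and 2).
days = [0, 1, 2]
mutant = [100, 200, 400]
wild_type = [100, 100, 100]

slope = relative_fitness(days, mutant, wild_type)
print(f"slope of ln(mutant/WT) per day: {slope:.3f}")
print(f"fold change of the mutant/WT ratio per day: {math.exp(slope):.2f}")
```

In this invented example the mutant/WT ratio doubles each day, so the slope equals ln 2. With real data the counts are noisy, and a regression over several passages is more robust than a two-point comparison.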
Calculating the logarithm of the mutant/WT quotient against time gives the relative fitness of the evolved mutant.
Molecular epidemiology was first introduced to the study of infectious disease in the early 1970s 53. Since this time, the analysis of gene sequence variation has become a standard practice for virologists with an interest in epidemiology, especially with the advent of new, high-throughput sequencing technologies. From the typing of viral populations to the study of the origins of a new virus, viral gene sequence variation has been used to answer a wide variety of questions, increasing both the quantity and quality of the epidemiological data available. In this section, we discuss how particular aspects of RNA virus evolution affect the reconstruction of their epidemiological history. In this context, it is important to note that the observed epidemiological patterns of viruses result from their evolution at two different levels: within individual hosts 54 (and vectors 55) and among hosts at the population level. RNA viruses differ greatly in their patterns and processes of intra- and inter-host evolution, as well as in the duration of the infection caused and the type of immune response that is induced. Such factors must be considered when discussing their epidemiological dynamics in a comparative setting 3. Many aspects of the epidemiological history of viruses can be graphically summarized in a phylogenetic tree. The timescale of these trees can vary from a few weeks to many centuries, and depends on the rate of accumulation of variation in the sequences under study and the timescale of sampling. Although graphically very similar, there are important differences between phylogenetic trees and gene genealogies 56. The former are used to analyse the evolutionary history of distinct viral species or genes, usually by sampling one representative of each unit under study.
By contrast, genealogies depict the history of genetic polymorphisms segregating in contemporaneous populations. Gene genealogies have been used extensively in the study of RNA viruses recombination rates, RNA viruses could escape Muller's ratchet because their long-term effective population sizes might not be small enough to allow deleterious mutations to accumulate, or perhaps compensatory mutations are sufficiently frequent to counteract fitness losses 42 . Experimental evolution with RNA viruses has also been crucial for studying the dynamics of natural selection. This has been the case, for instance, in studies of COMPETITIVE EXCLUSION 43 , the RED QUEEN HYPOTHESIS 43 , convergent evolution 26, 27, 44 and the rhythm of adaptive change 45 . We expect RNA viruses to continue to have an important role in this area for many years to come. Finally, one promising area involves using RNA viruses to test theories for the evolution of cooperation. For example, Turner and Chao conducted a series of experiments with bacteriophage φ6 in which they demonstrated that RNA viruses could evolve under PRISONER'S DILEMMA conditions 46 , and also escape from it 47 . Despite the power of experiments, there are still difficulties in estimating some of the important parameters in evolutionary biology, such as the rate of deleterious mutation and fitness values. In recent years, a number of in silico approaches have been used to answer these questions, most notably with computer-generated genomes ('digital organisms') that are designed to behave as living systems. Digital organisms have the ability to create a copy of their own genome, but are subject to copying errors, so that populations of programs evolve in, and adapt to, their environments 48 . Although digital organisms are not as sophisticated as viruses, they are useful study tools because experiments can be easily controlled and repeated. 
Digital organisms have been useful in studying predicted adaptive evolution over short and long timescales 49,50 , the role of epistasis in evolution 51 and testing key aspects of quasispecies theory, particularly whether fast replicators can be less fit than slower replicators at high-mutation rates 52 . In the absence of ecological niche differentiation, only one of a set of competing species can occupy a particular niche, leading to the elimination of other species. A dynamic equilibrium between competing species, in which no species can ever win, and new adversaries continually replace the losers. It therefore depicts an ongoing arms race. A theoretical game that highlights the costs and benefits of the evolution of cooperation. For example, games can be played when the population is composed of 'defectors' and 'cooperators' . The theory that the evolution of gene or protein sequences proceeds at a constant rate. Establishing the source of a viral outbreak is one of the many applications of molecular epidemiology. Occasionally, this work has implications other than epidemiological. For instance, Ou et al. 68 were able to identify an HIV-infected dentist as the inadvertent donor of the virus to some of his patients, and to simultaneously discard him as the source of viruses infecting others. More recently, a similar analysis identified a medical doctor as the source of HIV infection in a former lover 133 . This is the first case in which a molecular epidemiological analysis has been accepted as evidence in a criminal court in the United States. Owing to the rapid rate of RNA virus evolution, they can be useful in forensic medicine, given a rigorous statistical framework. For this, it is necessary to translate the proposals of both the defendant and the prosecutor into testable phylogenetic hypotheses. González-Candelas et al. 
134 used this approach to evaluate the individual likelihoods of a number of hepatitis C virus patients having been infected in a hospital from a common source. In this case, the prosecutor's hypothesis was that each patient was infected at the hospital. Conversely, the defendant's hypothesis was that each patient became separately infected from the general population. These two proposals translate into different phylogenetic tree topologies, the likelihoods of which can be compared directly. The prosecutor's proposal is equivalent to observing a monophyletic group comprising the infected patients, which is significantly distinct from the background population, whereas the defendant's proposal corresponds to the grouping of the sequences from each infected patient with those of the background population. This molecular analysis proved that all the individuals were from a single outbreak, and was the first report of the use of molecular phylogenies to determine the likelihood of a patient sharing the source of infection with other infected patients. essential for the identification of positive 72, 73 , or purifying 74 , natural selection at nucleotide sites, the presence and extent of recombination 13, 75, 76 and for dating important points in the history of epidemics 67,77,78 . Genealogical analysis is especially relevant for reconstructing the recent epidemiological history of viral populations 66,79-81 , such as in forensic studies (BOX 2, FIG. 1) , and therefore have important implications for determining public health policies. Although it is fair to assume that frequent mutation means that long-term rates of nucleotide substitution are usually high in RNA viruses, in reality these rates might vary widely, both within and among genes in the same species and among viral species 82 . 
Indeed, present data indicate that viral substitution rates are much more variable than their underlying mutation rates 5,6, which is most likely a reflection of important differences in replication dynamics. For example, the nucleotide substitution rate in human T-lymphotropic virus type II (HTLV-II) varies from ~1 × 10⁻⁴ substitutions per site per year in epidemics with high rates of transmission and where replication is rapid, such as those in injecting drug users, to ~1 × 10⁻⁷ substitutions per site per year in endemic situations, where viruses are maintained within hosts through the clonal expansion of infected cells rather than by active replication 83. However, in many RNA viruses, substitution rates of 1 × 10⁻³ to 1 × 10⁻⁴ substitutions per site per year are observed 82. The variation in substitution rates across viral genomes has benefits, because it allows different epidemiological questions to be addressed, relating to different temporal scales. So, rapidly evolving (hypervariable) gene regions are informative for studying viral evolution within individuals, or for identifying the source of a particular disease outbreak. More conserved regions are better suited for in-depth phylogenetic inference, from analysing viral genotypes at the species to family levels. In a number of cases it has been proposed that molecular evolution within specific RNA virus genes proceeds at a constant rate. Such MOLECULAR CLOCKS have been proposed for human influenza A virus 84, although the constancy of the evolutionary rate does not hold in many other cases 82,85-87. Non-clocklike evolution can result from a number of evolutionary forces, such as changes in host species, changes in structural and functional constraints 88,89, and the occurrence of positive selection.
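The practical meaning of the substitution rates quoted above can be seen with simple arithmetic. The sketch below is an illustration added here; it assumes a strict molecular clock (a constant rate, which the text notes often does not hold) and a hypothetical sequenced region of 1,000 nucleotides. The rates are those quoted in the text; the function names and region length are assumptions.

```python
def expected_substitutions(rate_per_site_per_year, n_sites, years):
    """Expected substitutions in a region, assuming a constant rate."""
    return rate_per_site_per_year * n_sites * years

def years_per_substitution(rate_per_site_per_year, n_sites):
    """Average waiting time, in years, for one substitution in the region."""
    return 1.0 / (rate_per_site_per_year * n_sites)

# Rates quoted in the text (substitutions per site per year).
RATES = {
    "many RNA viruses (upper value)": 1e-3,
    "HTLV-II, epidemic transmission": 1e-4,
    "HTLV-II, endemic maintenance": 1e-7,
}
N_SITES = 1000  # hypothetical length of the sequenced region

for label, rate in RATES.items():
    per_decade = expected_substitutions(rate, N_SITES, 10)
    wait = years_per_substitution(rate, N_SITES)
    print(f"{label}: {per_decade:g} substitutions per decade, "
          f"one substitution every {wait:g} years on average")
```

Under these assumptions a region evolving at the faster rates accumulates measurable change within the span of an outbreak, whereas a region evolving at the endemic HTLV-II rate would need thousands of years to accumulate a single expected substitution, which is why regions with different rates suit questions on different timescales.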
Although most modern methods of phylogenetic analysis incorporate such rate variation (so that it is unlikely to cause significant error in the reconstruction of tree topologies), it can have an important impact on the analysis of divergence times.
The past 25 years have seen the emergence of several RNA viruses, which are either new to medical science or have increased in prevalence to the extent that they are now a major concern for public health. Agents that fall into this category include HIV, hepatitis C virus and, most recently, severe acute respiratory syndrome
because their rapid evolution means that sequence variation increases over very short periods of time. Indeed, RNA viruses constitute the most important class of measurably evolving populations 57, with evolution even occurring during the infection of a single individual 58. More accurate methods of evolutionary inference have already been designed for these rapidly evolving populations 59-61. In particular, whereas phylogenetic trees generally depict strictly bifurcating patterns of relationships, gene genealogies can take into account recombination, which will lead to interconnected networks of lineages. Moreover, using a statistical approach called coalescent theory 62-64, it is possible to infer demographic processes from genetic polymorphism data, most notably, rates of viral population growth and decline. Coalescent methods can operate under several population genetic models and use gene genealogies as key analytical tools. Coalescent theory therefore provides a crucial conceptual link between phylogenetics and population genetics. Both phylogenies and gene genealogies are relevant for the epidemiological analysis of RNA viruses and are useful for investigating the origin of new viruses 65-67 or identifying the source of an outbreak 68-71.
Used in conjunction with population genetic theory, they are also Figure 1 | Two alternative epidemiological scenarios translate into different phylogenetic tree topologies, the statistical support for which can be compared directly. The tree in panel a depicts a common and close origin for samples 1-3 (node A), which is separate from the control samples 4-7 (node B). Node A might correspond to a single outbreak or a suspected transmission among these patients, whereas node B includes samples suspected, but not related to, the outbreak (4-7) and unrelated population controls (8, 9) . Panel b represents the alternative proposal for sample 1, which is now separated from the former cluster and instead groups with the control samples. Similar proposals can be separately formulated for each of the samples 1-3. possible. Therefore, jumping species boundaries might only be a problem for an RNA virus if it has to adapt to different cellular receptors, although this still does not guarantee that sustained transmission will be established in the new host. An informative example is provided by influenza A virus. Birds are the main species reservoir, and avian influenza A viruses are usually unable to jump directly into humans because they lack the necessary mutations in the haemagglutinin (HA) gene to infect human cells 100 . Even when avian influenza A viruses do infect humans, human-to-human transmission might not be established. More generally, the relationship between the virus and the host cell receptor predicts an association between the number of cells a virus infects and its host range, thereby explaining whether a virus is a host 'specialist' or 'generalist'. Determining whether such a relationship exists should be a key goal in understanding the genetic basis of viral emergence. Case studies in viral emergence. The complex interplay between ecology and genetics in viral emergence can be seen in HIV. 
An important ecological factor in HIV emergence involves the bushmeat trade in west Africa. Not only have a wide range of related simian immunodeficiency viruses (SIVs) been isolated from animal carcasses 101, but the bushmeat trade has increased owing to encroachment by humans on the ranges occupied by non-human primates. The SIV that is found in chimpanzees (SIVcpz) is the closest relative, and therefore the most likely ancestor, of human HIV-1 (REF. 102), whereas SIVsm from sooty mangabeys seems to be the reservoir population for HIV-2 (REF. 103). For both HIV-1 and HIV-2, there have been multiple transfers of virus from their reservoirs into humans, with these viruses most likely establishing themselves in humans during the last century 67,77. Also of importance was the movement of individuals infected early in the epidemic from small, isolated rural populations to cities in Africa 103, which enabled incipient epidemic strains to reach a large number of susceptible hosts. Yet, genetics is also likely to have been important in the emergence of HIV. In particular, phylogenetic studies indicate that SIVs are most easily transmissible among related primate hosts 104, implying that not all possible instances of cross-species viral transmission that could occur do occur, and that adaptive constraints might exist. A more recent and highly publicized example of viral emergence is provided by the SARS-CoV, the agent of a severe form of pneumonia that has killed more than 700 people worldwide since its appearance in China in November 2002. It is unclear whether the epidemic of 2002-2003 was the first appearance of SARS, or whether the virus had sporadically entered human populations previously, but without detrimental consequences. The animal reservoir for SARS-CoV is also a subject for debate.
Phylogenetic analysis reveals that SARS-CoV is equidistant between coronavirus groups 1 and 2, which are usually isolated from mammalian species, and coronavirus group 3, which is currently confined to birds 65, 105, 106 (FIG. 2) . Moreover, the sequence divergence between these three groups coronavirus (SARS-CoV). Given the continuing threat that is posed by viral diseases, it is essential that we determine the factors underlying viral emergence. Hosts acquire RNA viruses by two different mechanisms. First, owing to host-virus co-speciation, host populations might have carried a specific RNA virus for their entire evolutionary history. Although co-speciation has been proposed in some RNA viruses 90 , the process seems to be rare. This is most likely a result of the short infectious periods of many RNA viruses, so that they have limited opportunity for the sustained transmission that is probably needed for co-speciation. By contrast, many DNA viruses establish persistent infections and are therefore expected to be able to follow long-term patterns of host speciation 91 . A more common method by which RNA viruses could enter new host species is through lateral transfer from different reservoir species. Both ecology and genetics seem to be important in determining whether a virus is able to successfully cross species boundaries. In many cases, ecological factors are the most important. Although such factors are diverse, and have been reviewed in detail elsewhere 92, 93 , they usually reflect changes in either the proximity or density of the host and/or reservoir species, which increase the likelihood that humans are exposed to new pathogens and that sustained transmission networks will be established. Far less is known about the possible genetic factors that might affect the ability of viruses to cross species boundaries. 
Although RNA viruses are the group of pathogens that seem most able to cross species boundaries 94 , perhaps because high mutation rates provide them with an increased capacity to adapt to new hosts 95 , not all RNA viruses are equally equipped in this respect. For example, in many cases, RNA viruses (such as rabies virus infection in humans) establish 'dead-end' infections in specific hosts, without subsequent transmission, which reflects imperfect adaptation. This indicates that there are constraints that inhibit viral adaptation to new hosts, perhaps owing to the fitness trade-offs that seem commonplace in viruses that need to infect different hosts or cell types 25, 96 . Therefore, infecting different hosts is likely to represent a major adaptive challenge for RNA viruses, despite their mutation rate. Examples are animal vector-borne viruses, which are less subject to adaptive evolution than their non-vector-borne counterparts, presumably owing to the difficulties that are associated with simultaneous replication in hosts as divergent as invertebrates and vertebrates 97 . If extended over longer periods of time, this will lead to a phylogenetic rule of cross-species transmission, such that the greater the evolutionary distance between hosts, the lower the probability of viral transfer among them 98 . A fundamental aspect of the mechanistic basis of viral emergence is the relationship between virus and host cell receptors 99 . Unless a virus has sequences that are able to recognize the cellular receptors of a potential host species, successful cross-species transmission will not be some basic rules to be established, and phylogenetic methods have been used to successfully predict the future population survival of strains of influenza A virus 109 . First, the larger the population size of the reservoir species, the more viruses it can harbour, including those with shorter durations of infection and increased virulence 110 . 
Consequently, animal species that live at high densities, such as some bats, rodents and birds, are most likely to be reservoirs, particularly those animal populations that already live in close proximity to humans. Less intuitively, if there is a relationship between the breadth of cell tropism and the number of species infected, most attention should be given to those viruses that infect several cell types. More importantly, a comprehensive survey of RNA virus diversity should be undertaken in appropriate animal species. This can be done through the use of degenerate PCR primers that have been designed for several RNA virus families, followed by studies to determine whether the viruses will grow in human cells. Similar approaches have already uncovered a plethora of new virus families from marine environments 111 . and SARS-CoV is so large that SARS-CoV has clearly experienced a long period of independent evolution. Studies of animals sold at Chinese markets have detected antibodies in a number of mammalian species 107 . Most notably, viruses obtained from the Himalayan palm civet (a member of the Viverridae) are closely related to human strains of SARS-CoV. Whether this species represents the main reservoir for SARS-CoV is still unclear. Finally, there have also been suggestions that SARS-CoV is a recombinant of mammalian and avian coronaviruses and that this genetic event might have trigged viral emergence 108 . However, because the sequences involved are so divergent, the phylogenetic incongruence in trees of SARS-CoV seems more likely to be due to variation in the molecular clock than inter-coronavirus recombination. Predicting viral emergence. The 'holy grail' for studies of emerging diseases is to predict which infectious agents are likely to infect human populations in the future. 
Although we are a long way from making accurate predictions, evolutionary genetics does allow Owing to the highly divergent nature of these viruses, the analysis was conducted using an alignment of 12 amino acid sequences that are 1,270 residues in length. The tree was inferred using the maximum likelihood (ML) method available in TREE-PUZZLE 135 . Numbers next to some branches represent quartet puzzling support values, which give an indication of the reliability of that branch. SARS-CoV appears as a distinct lineage. b | Magnified phylogeny of representative SARS-CoV strains isolated from humans and the Himalayan palm civet (Paguma larvata), a putative reservoir species. The tree was constructed using the same region as in part a but using nucleotide sequences (16 sequences, 3,765 bp). The tree was inferred using the ML method available in PAUP* 136 host population is large and extensively mixed. Consequently, advantageous mutations, most notably those that confer antigenic escape, are able to be fixed in the virus population in a regular manner 121 . By contrast, population-level selection seems to be considerably weaker in the sexually transmitted HIV, although some evidence for long-term cytotoxic T lymphocyte (CTL)-mediated selection has been found 122 . This contrasts with intra-host HIV evolution, in which immune-driven natural selection is the dominant evolutionary process 72, 123, 124 . The reduced impact of selection at the population level is most likely to be caused by extensive variation in rates of partner exchange, which in turn reduces the effective population size 125 , and because there is a large bottleneck at transmission 126 . Therefore, for HIV, intra-host and inter-host evolution seem to be largely decoupled. Although fewer studies have compared intra-and inter-host evolution in acute RNA virus infections, a recent analysis of dengue virus indicated that most amino acid changes that arise within hosts are deleterious in the long term 74 . 
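The argument that a reduced effective population size weakens selection at the population level can be illustrated numerically. The following is a minimal sketch, assuming a haploid population and the standard diffusion approximation for the fixation probability of a new mutation; this formula is a textbook population-genetics result and is not taken from the studies cited here. The function name and all parameter values (census size, selection coefficient, effective sizes) are hypothetical and chosen only for illustration.

```python
import math


def fixation_probability(s: float, census_size: int, effective_size: float) -> float:
    """Fixation probability of one new mutant copy in a haploid population.

    Diffusion approximation (assumed here, not from the source text):
        u = (1 - exp(-2*Ne*s*p0)) / (1 - exp(-2*Ne*s)),  with p0 = 1/N.
    Ne (effective size) and N (census size) are passed separately so that
    the effect of Ne < N can be explored. Assumes s >= 0 and small.
    """
    if s < 0:
        raise ValueError("this sketch only covers neutral or beneficial mutations")
    p0 = 1.0 / census_size
    if s == 0.0:
        return p0  # neutral limit: fixation probability equals starting frequency
    # expm1(x) = exp(x) - 1; numerator and denominator are both negative,
    # so their ratio is the positive probability given in the docstring.
    return math.expm1(-2.0 * effective_size * s * p0) / math.expm1(-2.0 * effective_size * s)


if __name__ == "__main__":
    N = 1_000_000  # hypothetical census number of infections
    s = 0.01       # hypothetical 1% selective advantage
    for ne in (N, 10_000, 100):
        u = fixation_probability(s, N, ne)
        print(f"Ne = {ne:>9,}: u = {u:.3e} ({u * N:.1f}x the neutral value 1/N)")
```

Under these assumed values, a mutation in a well-mixed population (Ne equal to N) fixes with probability close to 2s, whereas with Ne of 100 the same mutation fixes only about twice as often as a neutral one, so its fate is governed largely by drift.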
A common view is that if virulence is a selected trait at all, then it is often involved in a trade-off with transmissibility; the balance of these factors that maximizes the BASIC REPRODUCTIVE RATE (R0) of the pathogen is favoured by natural selection 127, although this has recently been questioned 128. For RNA viruses it is therefore important to determine whether virulence is optimized and, if so, how it is linked to transmissibility. Complexities arise because virulence is also likely to vary according to the transmission mode 129, and according to whether there is a long period of intra-host evolution, including superinfection by other strains, which increases intra-host competition and therefore virulence 130. In short, predictions about the long-term evolution of virulence in RNA viruses need to be made on a case-specific basis. However, some aspects of the evolution of virulence reflect those that are associated with drug resistance. For example, if particular mutations confer virulence, then whether they become fixed in populations also depends on the strength of genetic drift, even if they are advantageous. Consequently, in small populations the optimal level of virulence might, by chance, not be attained. Similarly, if the evolutionary process differs greatly within and among hosts, a selectively favoured level of virulence within hosts might be disadvantageous among hosts. The intra-host evolution of HIV tends to result in the production of high-virulence viral strains that preferentially use the CXCR4 chemokine receptor, infect cells faster and cause AIDS to develop more rapidly 131. However, these strains seem to be transmitted less often, indicating that they are selectively disadvantageous in new hosts 132. Understanding the interplay between virulence and transmissibility is clearly central to understanding the evolution of virulence in RNA virus diseases.
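The trade-off view described above can be made concrete with a small numerical sketch. It assumes the common textbook form R0 = beta(alpha) / (alpha + loss), where alpha is virulence (pathogen-induced host mortality), loss is the combined rate of recovery and background host mortality, and beta(alpha) is a transmission rate that rises with virulence but saturates. The saturating curve, the function names and every parameter value are hypothetical choices for illustration and do not come from the studies cited here.

```python
import math


def r0(alpha: float, b: float = 5.0, c: float = 0.2, loss: float = 0.05) -> float:
    """R0 under an assumed trade-off: beta(alpha) = b*alpha / (c + alpha)."""
    beta = b * alpha / (c + alpha)   # transmission rises with virulence, then saturates
    return beta / (alpha + loss)     # mean infectious period is 1 / (alpha + loss)


def optimal_virulence(b: float = 5.0, c: float = 0.2, loss: float = 0.05,
                      alpha_max: float = 2.0, steps: int = 20_000) -> float:
    """Grid search for the virulence level that maximizes R0."""
    grid = (alpha_max * i / steps for i in range(1, steps + 1))
    return max(grid, key=lambda a: r0(a, b, c, loss))


if __name__ == "__main__":
    best = optimal_virulence()
    # For this particular curve the optimum can also be derived by hand:
    # dR0/dalpha = 0 gives alpha* = sqrt(c * loss).
    print(f"numerical optimum: {best:.4f}, analytic: {math.sqrt(0.2 * 0.05):.4f}")
    print(f"R0 at the optimum: {r0(best):.2f}")
```

In this sketch an intermediate level of virulence maximizes R0: lower virulence transmits too poorly, and higher virulence shortens the infectious period. Whether real RNA viruses follow such a curve is exactly the open question raised in the text, and drift or within-host selection can keep populations away from the optimum.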
One aim of studies of RNA virus evolution is to use our understanding of evolutionary processes in the short term, which has often been acquired from experiments, to predict what evolution will do in the long term. Although evolutionary biologists are rightly nervous about predicting future change, the rapid pace of RNA virus evolution means that these predictions can be tested quickly. Of most immediate interest are patterns of drug resistance and viral virulence.
The evolution of drug resistance. Understanding drug resistance is one area in which population biology has a direct impact on public health 112, 113. In the case of RNA viruses, most interest has focused on the potential of drugs to control HIV infection. Despite the optimism that initially surrounded the deployment of highly active antiretroviral therapy (HAART), which involves combinations of drugs 114, antiviral therapy is unlikely to provide a cure for HIV. There are several reasons for this, not least of which is that despite our inability to detect viruses in some patients receiving HAART, viral replication is ongoing, although at greatly reduced levels 115. Early studies predicting the length of time it would take for resistance to arise under multiple drug therapy also underestimated the importance of recombination in HIV, which we now know is extensive 116. Frequent recombination could allow drug resistance to be acquired more rapidly than through mutation alone. There are many factors that influence the evolution of drug resistance, and important results have been obtained, for example regarding the probability of resistance mutations arising before and during treatment, and the optimal time for the onset of drug treatment 117. One important question, which also relates to the mechanics of viral evolution in general, is whether drug-resistance mutations have a fitness cost compared with wild-type alleles in the absence of the drug.
If there is a fitness cost, we would expect that resistance mutations would not reach high frequencies in populations, despite their benefit to the virus in hosts. There is evidence that, in the absence of the drug, HIV strains harbouring drug-resistance mutations are less fit than wild-type HIV strains 118. Unfortunately, in other cases, drug-resistant HIV mutants seem to have greater infectivity, and even replication capacity, than wild-type viruses 119. Not surprisingly, these mutations are increasingly sampled from drug-naive patients 120. Even if drug-resistance mutations are universally advantageous, their long-term success depends on more than their individual fitness. An important mediating factor is the strength of genetic drift at the population level. If drift is strong, which will be the case if effective population sizes are small, the frequencies that mutations eventually attain in populations have a large stochastic component. For RNA viruses, effective population size reflects the mode of transmission. This can be shown by comparing HIV with influenza A virus 3. In the case of the respiratory transmitted influenza A virus, the
BASIC REPRODUCTIVE RATE (R0). For viruses, this is defined as the number of secondary infections that are caused by a single index case in an entirely susceptible population. For an epidemic to proceed, R0 needs to exceed 1.
Establishing the rules of RNA virus evolution is important: not only will this provide information that is essential for understanding the basic mechanisms of evolutionary change, but it will assist in the design of strategies for the control, treatment and eradication of RNA viruses, and perhaps for predicting their emergence. Although it is clear that RNA viruses are unique in the rapidity with which they mutate, their evolution cannot be described in full without a consideration of all the processes of evolutionary change. A particular challenge for the future is to determine whether viral evolution in nature is similar to that established in vitro. The beauty of RNA viruses is that the link between experimental and natural systems can be made simply; few other organisms are as well suited for studying evolutionary processes.
RNA virus mutations and fitness for survival
Origin and Evolution of Viruses
Unifying the epidemiological and evolutionary dynamics of pathogens
Frequency-dependent selection in a mammalian RNA virus
Rates of spontaneous mutation among RNA viruses
A key reference giving figures for the spontaneous rate of mutation in RNA viruses and the difficulties in making these measurements
The rate and character of spontaneous mutation in an RNA virus
Self-organisation of matter and the evolution of biological macromolecules
The Major Transitions of Evolution
Transition between stochastic evolution and deterministic evolution in the presence of selection: general theory and application to virology. Microbiol
Recombination in evolutionary genomics
High rates of human immunodeficiency virus type 1 recombination: near-random segregation of markers one kilobase apart in one round of viral replication
Viral quasispecies and recombination
Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses
Genetic diversity and tissue compartmentalization of the hepatitis C virus genome in blood mononuclear cells, liver, and serum from chronic hepatitis C patients
Effect of population patchiness and migration rates on the adaptation and divergence of vesicular stomatitis virus quasispecies populations
The hypercycle. A principle of natural self-organization.
Part A: emergence of the hypercycle
Steps Towards Life
Nucleotide sequence heterogeneity of an RNA phage population
RNA virus quasispecies populations can suppress vastly superior mutant progeny
An important experimental study showing that a high-fitness variant was suppressed in a low-fitness population. Often interpreted as indicating that RNA viruses form quasispecies
Evidence for the nonquasispecies evolution of RNA viruses
A paper questioning the relevance of the quasispecies model to the evolution of RNA viruses in nature
Evolvability of an RNA virus is determined by its mutational neighborhood
Experimental evolution recapitulates natural evolution
Experimental evolution of parasites
Restrictions to RNA virus adaptation: an experimental approach
Exceptional convergent evolution in a virus
Molecular basis of adaptive convergence in experimental populations of RNA viruses
Evolutionary dynamics of viral attenuation
Experimental genomic evolution: extensive compensation for loss of DNA ligase activity in a virus
Rate of deleterious mutation and the distribution of its effects on fitness in vesicular stomatitis virus
Distribution of spontaneous mutants and inferences about the replication mode of the RNA bacteriophage φ6
Evaluating the impact of population bottlenecks in experimental evolution
Resistance of virus to extinction on bottleneck passages: study of a decaying and fluctuating pattern of fitness loss
Clonal interference and the evolution of RNA viruses
An important experimental demonstration that clonal interference slows down the rate of adaptive evolution in RNA viruses
Contributions of vesicular stomatitis virus to the understanding of RNA virus evolution
Models of experimental evolution: the role of genetic chance and selective necessity
Fitness of RNA virus decreased by Muller's ratchet
A seminal paper on experimental evolution with RNA viruses demonstrating the action of Muller's ratchet
Effect of deleterious mutation-accumulation on the fitness of RNA bacteriophage MS2
Genetic exchange by recombination or reassortment is infrequent in natural populations of a tripartite RNA plant virus
The advantage of sex in the RNA virus φ6
Evolutionary dynamics of fitness recovery from the debilitating effects of Muller's ratchet
Evolution of sex and the molecular clock in RNA viruses
The red queen reigns in the kingdom of RNA viruses
Different trajectories of parallel evolution during viral adaptation
Molecular epidemiology of HIV transmission in a dental practice
Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes
Molecular epidemiology of an outbreak of infection with hepatitis C virus in recipients of anti-D immunoglobulin
Transmission of hepatitis C virus by a cardiac surgeon
Immune-mediated positive selection drives human immunodeficiency virus type 1 molecular variation and predicts disease duration
Likelihood model for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene
Patterns of intra- and interhost nonsynonymous variation reveal strong purifying selection in dengue virus
Evidence of recombination in natural populations of hepatitis A virus
A natural intergenotypic recombinant of hepatitis C virus identified in St Petersburg
Timing the ancestor of the HIV-1 pandemic strains
Origin and evolution of GBV-C/hepatitis G virus and relationships with ancient human migrations
An integrated framework for the inference of viral population history from reconstructed genealogies
Establishes the link between population genetics and mathematical epidemiology through the use of molecular phylogenies to study the population dynamics of infectious diseases
A comparison of the molecular clock of hepatitis C virus in the United States and Japan predicts that hepatocellular carcinoma incidence in the United States will increase over the next two decades
Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis
Different population dynamics of human T-cell lymphotropic virus type II in intravenous drug users compared with endemically infected tribes
Evolution of human influenza A viruses over 50 years: rapid, uniform rate of change in NS gene
Episodic evolution of RNA viruses
Punctuated equilibrium and positive Darwinian evolution in vesicular stomatitis virus
Rates and dates of divergence between AIDS virus nucleotide sequences
Selection forces and constraints on retroviral sequence variation
Illustrates some of the key factors that constrain the adaptability of RNA viruses
Molecular clocks and the puzzle of RNA virus origins
Virus evolution and genetic diversity of hantaviruses and their rodent hosts
Toward a comprehensive phylogeny for mammalian and avian herpesviruses
Emerging infectious diseases of wildlife: threats to biodiversity and human health
Factors in the emergence of infectious diseases
Diseases of humans and their domestic mammals: pathogen characteristics, host range and the risk of emergence
Population biology of multihost pathogens
Error thresholds and the constraints to RNA virus evolution
Reduced positive selection in vector-borne RNA viruses
in Viral Ecology
Evolution of cell recognition by viruses
Analysis of influenza A virus nucleoproteins for the assessment of molecular genetic mechanisms leading to new phylogenetic virus lineages
Risk to human health from a plethora of simian immunodeficiency viruses in primate bushmeat
A remarkable demonstration of the frequency with which primate bushmeat is infected with SIVs
Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes
AIDS as a zoonosis: scientific and public health implications
Preferential host switching by primate lentiviruses can account for phylogenetic similarity with the primate phylogeny
Characterization of a novel coronavirus associated with severe acute respiratory syndrome
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
Mosaic evolution of the severe acute respiratory syndrome coronavirus
Predicting the evolution of human influenza A
Infectious Diseases of Humans
High diversity of unknown picorna-like viruses in the sea
A PCR-based analysis of viral diversity in marine environments
Disease transmission dynamics and the evolution of antibiotic resistance in hospitals and communal settings
Population biology, evolution, and infectious disease: convergence and synthesis
Down or out in blood and lymph?
Recovery of replication-competent HIV despite prolonged suppression of plasma viremia
Recombination: multiply infected spleen cells in HIV patients
Virus dynamics and drug therapy
Reduced replication of 3TC-resistant HIV-1 variants in primary cells due to a processivity defect of the reverse transcriptase enzyme
Infectivity and replication capacity of drug-resistant human immunodeficiency virus type 1 variants isolated during primary infection
Safety and immunogenicity of combinations of recombinant subtype E and B human immunodeficiency virus type 1 envelope glycoprotein 120 vaccines in healthy Thai adults
Positive Darwinian evolution in human influenza-A viruses
Clustering patterns of cytotoxic T-lymphocyte epitopes in human immunodeficiency virus type 1 (HIV-1) proteins reveal imprints of immune evasion on HIV-1 global variation
Evidence for HIV-1 adaptation to HLA-restricted immune responses at the population level
Recent paper showing that the intra-host evolution of HIV-1 is dominated by immune-driven positive selection
Population dynamics of HIV-1 inferred from gene sequences
Homogeneous quasispecies in 16 out of 17 individuals during very early HIV-1 primary infection
Virulence and transmissibility of pathogens: what is the relationship?
Challenging the trade-off model for the evolution of virulence: is virulence management feasible?
Transmission bottlenecks as determinants of virulence in rapidly evolving pathogens
Superinfection and the evolution of parasite virulence
Chemokine coreceptor usage by diverse primary isolates of human immunodeficiency virus type 1
Genotypic and phenotypic characterization of HIV-1 patients with primary infection
Molecular evidence of HIV-1 transmission in a criminal case
Molecular epidemiology and forensic genetics: application to a hepatitis C virus transmission event at a hemodialysis unit
Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies
Phylogenetic Analysis Using Parsimony (*and other methods)
This work has been funded by The Wellcome Trust, the Spanish Ministerio de Ciencia y Tecnologia and Generalitat Valenciana. We would like to thank J. M. Cuevas and R. Sanjuan for help in preparing Box 1. We also thank four referees for useful comments.
A key experimental paper that provides evidence for both frequent adaptive and parallel evolution in RNA viruses. 45