key: cord-0892699-ig0rnbqb authors: Day, Troy; Gandon, Sylvain; Lion, Sébastien; Otto, Sarah P. title: On the evolutionary epidemiology of SARS-CoV-2 date: 2020-06-11 journal: Curr Biol DOI: 10.1016/j.cub.2020.06.031 sha: bde38198f0a804809fcf52c809df2f10077945cd doc_id: 892699 cord_uid: ig0rnbqb
There is no doubt that the novel coronavirus SARS-CoV-2 that causes COVID-19 is mutating and thus has the potential to adapt during the current pandemic. Whether this evolution will lead to changes in the transmission, the duration, or the severity of the disease is not clear. This has led to considerable scientific and media debate, from raising alarms about evolutionary change to dismissing it. Here we review what little is currently known about the evolution of SARS-CoV-2 and extend existing evolutionary theory to consider how selection might be acting upon the virus during the COVID-19 pandemic. While there is currently no definitive evidence that SARS-CoV-2 is undergoing further adaptation, continued, evidence-based analysis of evolutionary change is important so that public health measures can be adjusted in response to substantive changes in the infectivity or severity of COVID-19.
Troy Day, Sylvain Gandon, Sébastien Lion, and Sarah P. Otto
Zoonotic pathogens, which have jumped from animal to human hosts, can result in enormous public health challenges because so little is known about the pathogen during the initial stages of an outbreak. The most important public health intervention for such pathogens is therefore to suppress transmission as much as possible. The current COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, provides a stark example. Wide-scale shifts in human social networks, from restrictions on travel to lockdowns of entire cities or countries, have been critical to the success of slowing the pandemic and reducing the number of deaths. Because zoonotic pathogens are often poorly adapted following a host shift, it is also natural to ask how they will evolve in response to their novel human host and to medical and public health interventions. Examples where some evidence exists for adaptation following host shifts include myxoma virus in rabbits, and avian flu, Ebola, and Zika virus in humans [1]. With SARS-CoV-2, we might also expect further adaptation to its human host. For example, although SARS-CoV-2 is already able to bind the ACE2 receptors critical for entry into human cells, computational models and data have identified additional mutations that might further strengthen binding affinity [2]. In this essay, we explore the evolutionary potential for SARS-CoV-2, guided by available data and evolutionary models. At present, there is a lack of compelling evidence that any existing variants impact the progression, severity, or transmission of COVID-19 in an adaptive manner.
Models, however, indicate that natural selection can be strong and act on diverse aspects of SARS-CoV-2 as it spreads in its new human host. We argue for developing better strategies to detect, verify, and respond to evolutionary changes in COVID-19 that have important effects on human health and disease spread. Doing so will enhance the set of tools at our disposal for implementing effective public health measures.
The growth of the human population has led to an increasing number of human-wildlife interactions, facilitating the movement of pathogens from animal hosts to humans (zoonoses) [3]. Viral spillover to a new species requires either pre-adaptation or rapid evolution of the proteins that dock and allow entry into new host cells. For SARS-CoV-2, six amino acids in the receptor-binding domain of the spike protein are critical for binding the host target receptor ACE2 and allowing infection in humans [4]. These critical spike protein residues are not all present in the most closely related coronavirus identified to date, RaTG13, sampled from the horseshoe bat, Rhinolophus affinis (RaTG13 and SARS-CoV-2 are 96% similar at the nucleotide level), but they are found in coronavirus sampled from pangolin [4]. The SARS-CoV-2 genome shows no evidence of recent recombination, arguing against a recombinant origin involving pangolin [5, 6]. Given the poor sampling of coronaviruses from wildlife and the wide range of animals with similar ACE2 receptors (including pigs, ferrets, cats, and non-human primates [2]), it is likely that we simply have not identified the most closely related animal source [4], making it impossible to know what evolutionary changes happened immediately prior to or during the transition to humans.
Clues to the history of a disease can be obtained from its phylogenetic tree. Within humans, SARS-CoV-2 displays a star-like phylogeny with many long tip branches [7], as expected in a growing population.
Based on genomic sampling over time, the substitution rate is estimated to be 0.00084 per site per year (www.nextstrain.org; [8]; 16 May 2020), 2- to 6-fold lower than the substitution rate for influenza (0.004-0.005 subst/site/yr for influenza A and 0.002 subst/site/yr for influenza B [9]). Across its ≈30,000 basepair genome (Nextstrain), SARS-CoV-2 thus undergoes roughly one genetic change every other week. Genomic tracking tools like Nextstrain [8] allow us to analyze genetic variants very rapidly, as soon as their sequences become available during an outbreak. Figure 1 illustrates nucleotide variation among the 5380 genomes available on May 16, 2020 (Figure 1A). Several of these are nonsynonymous and thereby alter the amino acid composition of viral proteins (Figure 1B).
Although Figure 1 reveals substantial genetic variation in SARS-CoV-2, it is unclear if these changes have any functional significance. Many are likely neutral or slightly deleterious to the virus [5, 10], having risen in frequency by chance when carried to new susceptible hosts. Mutations with no functional significance readily fluctuate in abundance, acting like genomic fingerprints that can be used to track viral geographic spread and to reconstruct epidemiological dynamics (see, e.g., May 8, 2020 Situation Report on international spread from Nextstrain). In the long run, deleterious mutations are expected to be eliminated, as seen in genetic comparisons between more distantly related coronavirus lineages [6]. However, even deleterious mutations can rise in abundance during an epidemic as long as their effective reproduction number remains larger than one.
More controversial is whether any of the nonsynonymous variants circulating among humans increases viral fitness. Box 1 describes four of the most prominent studies about adaptive SARS-CoV-2 mutations and the reasons for caution.
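The substitution-rate arithmetic quoted earlier in this section can be checked with a short calculation. The sketch below uses only the figures given in the text (0.00084 substitutions per site per year, a genome of roughly 30,000 bases, and the quoted influenza rates); the variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the text; all names here are illustrative assumptions.
SARS2_RATE = 0.00084      # substitutions per site per year (Nextstrain, 16 May 2020)
GENOME_LENGTH = 30_000    # approximate genome size in bases
FLU_RATES = {             # substitutions per site per year, as quoted
    "influenza A (low)": 0.004,
    "influenza A (high)": 0.005,
    "influenza B": 0.002,
}


def substitutions_per_year(rate_per_site, genome_length):
    """Expected number of substitutions across the whole genome per year."""
    return rate_per_site * genome_length


def weeks_between_substitutions(rate_per_site, genome_length):
    """Average waiting time, in weeks, between successive substitutions."""
    return 52.0 / substitutions_per_year(rate_per_site, genome_length)


per_year = substitutions_per_year(SARS2_RATE, GENOME_LENGTH)
weeks = weeks_between_substitutions(SARS2_RATE, GENOME_LENGTH)
fold_lower = {name: rate / SARS2_RATE for name, rate in FLU_RATES.items()}

print(f"genome-wide substitutions per year: {per_year:.1f}")
print(f"weeks between substitutions: {weeks:.2f}")
for name, fold in fold_lower.items():
    print(f"{name}: {fold:.1f}-fold higher per-site rate than SARS-CoV-2")
```

Under these inputs the genome-wide rate works out to about 25 substitutions per year, which is where the "one change every other week" figure comes from.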
Even though the adaptive significance of genetic variants remains to be established, we can use evolutionary theory to gain insights about how natural selection might act on disease characteristics. Modeling SARS-CoV-2 is currently challenging because we lack crucial information. For example, the fraction of cases that are asymptomatic, the relative infectiousness of asymptomatic individuals, and how these vary with age, are not yet well understood. We therefore first explore a general model, without specifying the exact parameter values. We then illustrate the dynamics using parameter values consistent with available data.
An extensive body of theory has been developed to understand the short- and long-term evolution of pathogens [11-15]. There are two types of pathogen traits whose evolution is usually distinguished: antigenic traits and disease life-history traits [15]. Antigenic evolution refers to the mutational appearance and spread of viral genotypes that can escape existing immunity in the population [16]. For example, continued antigenic evolution is the reason why seasonal influenza vaccines must be periodically updated: the influenza virus eventually evolves so much that it escapes the immune response induced by the vaccine [17]. In the case of SARS-CoV-2, where no natural immunity previously existed [18] and no vaccine is yet in widespread use, natural selection for antigenic escape mutations will be very weak. Modeling protective immunity and escape mutations will, however, be an important step for future theory if SARS-CoV-2 becomes endemic to the human population.
Disease life-history evolution refers to the mutational appearance and spread of genotypes that cause different disease characteristics [19]. For example, the transmission rate of the virus, the length of the asymptomatic period that it causes, and the mortality induced by the infection are all disease life-history traits.
As will be shown, these traits can be under strong selection, even for emergent diseases that are spreading rapidly in immunologically naive populations.
To make predictions about the evolution of pathogens, we must consider the potentially complex interaction between epidemiological and evolutionary dynamics. An important starting point is therefore to develop an appropriate model for the epidemiology of SARS-CoV-2. Data from multiple countries suggest that the median duration of time between infection and the onset of symptoms (i.e., the incubation period) is approximately 5 days [20]. Also, infectiousness has been inferred to start ~2.5 days before symptom onset, with high transmission prior to the onset of symptoms [21]. Viral loads then drop after symptom onset, halving within ~2-4 days [21]. Together these results suggest that there is both an exposed but non-infectious stage and a pre-symptomatic stage that is highly infectious. Once infected, individuals either recover with relatively little medical intervention, or they progress to more severe disease and suffer a higher mortality rate. The overall case fatality for symptomatic individuals is estimated to be 1-2% [22], and the mean length of time from symptoms until death is approximately 18 days [23]. Finally, some infected individuals remain asymptomatic throughout the course of infection. Based on a systematic review, Buitrago-Garcia et al. estimate that 29% of cases remain asymptomatic (CI: 23-37%) but note that this may be an overestimate due to publication biases and the requirement for at least one asymptomatic individual in many of the studies [24]. Regarding the source of new infections, their review suggests that pre-symptomatic cases account for about 40-60% of new infections, with <10% from asymptomatic individuals and the remainder from individuals who have developed symptoms [24]. As they highlight, these numbers remain highly uncertain.
We illustrate the epidemiological dynamics within a population in Figure 3.
We next use the model in Figure 2 to explore how SARS-CoV-2 might evolve by considering the fate of mutations that alter the disease life history traits (e.g., transmission, disease progression, and/or virulence). Many functionally relevant mutations will be lost through stochasticity when they first arise, even if they are selectively advantageous. Indeed, the probability of a mutation escaping stochastic loss during the initial stages of an outbreak is 1 − 1/R_0, where R_0 is the number of new infections caused by an infected individual in a fully susceptible population (the initial 'reproduction number'). Assuming the mutant R_0 is similar to that of the wildtype R_0 (estimated to be ≈ 2.5 [25]), this gives a probability of escaping stochastic loss of approximately 60%. In the Supplemental Discussion, we describe the evolutionary dynamics of genotypes that avoid stochastic extinction. Figure 2 summarizes the resulting selection on each life history trait (derived analytically without specifying the parameter values, many of which remain uncertain). Of course, viral mutations can alter multiple traits simultaneously through pleiotropy. For example, mutations that increase viral replication rate might affect transmission rate (the β's), the rate of disease progression (the κ's), the fraction of asymptomatic cases (f), and/or virulence (α). The summed effect of selection on all such traits governs the fate of the mutant, as described in the Supplemental Discussion.
Some intuition for the general results derived in the Supplemental Discussion can be obtained by considering the special case where individuals with symptoms are immediately isolated and where asymptomatic individuals do not contribute to new infections (so that β_I and β_A are negligible).
In this case (and assuming a very transient exposed class), evolutionary change is driven by selection on the pre-symptomatic class, as they are then the major source of new infections. We can then describe the change in frequency of a mutation using a single equation, equation (1) (see Supplemental Discussion). Here, q_P is the fraction of pre-symptomatic infections that harbour the new mutation, ∆β_P is the difference in pre-symptomatic transmission rate between the new mutation and wildtype, ∆f is the difference in the fraction of asymptomatic individuals, and ∆κ_P is the difference in the rate at which pre-symptomatic individuals develop symptoms. Given that the 'genetic variance' in (1) is positive, the mutation will increase in frequency as long as its 'selection coefficient' is positive; that is, if it transmits better, leads to fewer asymptomatic cases, remains infectious longer, or some combination of the three.
We next use these results to examine the nature of selection acting on the life history traits of SARS-CoV-2. We start by considering selection acting on each life history trait on its own and then explore how mutations that affect multiple traits will evolve. Throughout, we use the full model, with selection described by equation (S6) in the Supplemental Discussion, using the special case (1) only to help explain the results. The accompanying Mathematica package provides all code and explores a range of parameters to confirm the robustness of the illustrated results (available from DRYAD: https://doi.org/10.5061/dryad.5hqbzkh3g).
Transmission rates - Selection favors genotypes that have a higher transmission rate, when considered as a trait on its own (no pleiotropy), with the strength of selection for increased transmissibility being proportional to the density of susceptible hosts, S. This result holds generally (equation S6) and can be seen most clearly in equation (1).
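The special case behind equation (1) can be sketched numerically. The form used below is an assumption, not the paper's printed equation: it is what one obtains by differencing the pre-symptomatic growth rate β_P(1 − f)S − κ_P between mutant and wildtype and keeping first-order terms, giving dq_P/dt = q_P(1 − q_P)[S((1 − f)∆β_P − β_P∆f) − ∆κ_P]. It matches the verbal description (a 'genetic variance' times a 'selection coefficient' whose transmission and asymptomatic terms scale with S), but the factors (1 − f) and β_P, the function names, and the example parameter values are all hypothetical.

```python
import math


def selection_coefficient(S, beta_P, f, d_beta_P=0.0, d_f=0.0, d_kappa_P=0.0):
    """Assumed first-order selection coefficient for the special case.

    S         : density of susceptible hosts
    beta_P    : wildtype pre-symptomatic transmission rate
    f         : wildtype fraction of asymptomatic cases
    d_beta_P  : mutant minus wildtype transmission rate
    d_f       : mutant minus wildtype asymptomatic fraction
    d_kappa_P : mutant minus wildtype rate of symptom onset
    """
    return S * ((1.0 - f) * d_beta_P - beta_P * d_f) - d_kappa_P


def mutant_frequency(q0, s, t):
    """Logistic solution of dq/dt = s*q*(1-q) when s is held constant."""
    odds = q0 / (1.0 - q0) * math.exp(s * t)
    return odds / (1.0 + odds)


# Illustrative values only: beta_P = 1 per day is a placeholder, f = 0.29 is the
# asymptomatic fraction quoted in the text, and the mutant transmits 20% better.
s_full = selection_coefficient(S=1.0, beta_P=1.0, f=0.29, d_beta_P=0.2)
s_half = selection_coefficient(S=0.5, beta_P=1.0, f=0.29, d_beta_P=0.2)
for label, s in (("S = 1.0", s_full), ("S = 0.5", s_half)):
    print(label, "selection coefficient:", round(s, 4),
          "frequency after 30 days:", round(mutant_frequency(0.01, s, 30.0), 4))
```

Halving S halves the selection coefficient for a pure transmission mutant in this sketch, which is the sense in which selection for transmissibility is proportional to the density of susceptible hosts.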
This means that selection for genotypes that have a higher transmission rate will be strongest in dense populations with a large number of immunologically naive individuals [11, 13, 14], a prediction that has been verified experimentally in other systems [26, 27]. Public health interventions, like social distancing, will typically reduce the rate of contact among individuals and so reduce the transmission rate of all genotypes. As a result, social distancing will decrease the ∆β's (e.g., by 60% in Figure 3E) and so will weaken selection for increased transmission rates. That said, because social distancing will maintain a large population of susceptibles, S, for longer, this epidemiological feedback will increase the average strength of selection over the outbreak. In addition, social distancing spreads infections out over a longer period of time (i.e., it flattens the curve). This, in turn, allows selection to act over a longer time period, resulting in a larger cumulative amount of evolution (Box 2). Overall, adaptive mutations therefore typically reach a higher final frequency with social distancing than without (compare black curves in Figure 3A,E), although the net effect depends on the details. Evolution, in turn, can affect the epidemiology of the disease, both in terms of infection prevalence (compare cumulative effects on the remaining number of susceptibles in Figure 3E with [solid blue] and without [dashed blue] evolution) and the cumulative number of deaths (inset bars in Figure 3E with [red] and without [pink] evolution). The public health importance of any evolutionary change depends also on the mutation's effect size (illustrated is a mutation that increases transmission by 20%). Finally, some mutations may become more strongly selected in the presence of social distancing if they allow for viral transmission despite the intervention (e.g., mutations affecting aerosolization or persistence in the environment).
Other interventions, like rapid contact tracing and testing, essentially eliminate some transmission chains, but we would not expect this to affect the strength of selection among untraced cases, as long as genotypes are equally likely to be traced.
Equation (1) also reveals that the strength of selection on the fraction of asymptomatic cases is proportional to the density of susceptible hosts, just as with selection on transmission. As a result, selection for genotypes that produce a lower fraction of asymptomatic infections will also be strongest in dense populations with a large number of immunologically naive individuals. Public health measures aimed at isolating people once they develop symptoms will weaken this selection, by reducing the reproductive value of infections leading to symptoms. The direction of selection will remain the same, however, as long as more infections arise from pre-symptomatic individuals than asymptomatic individuals.
Disease progression - Natural selection generally favours a lengthening of the pre-symptomatic phase (i.e., smaller values of κ_P), including mutations that reduce the morbidity of the disease so that people remain less aware of their symptoms. Simply put, the longer the pre-symptomatic period, the longer the duration over which the virus can transmit (Figure 3C,G). While the special-case model without any disease class structure (equation (1))
Disease virulence - In general, disease-induced mortality, α, is selected to decrease (Figure 3D,H; Supplemental Discussion), when considered as a trait on its own (no pleiotropy). However, this selection tends to be weak because most individuals with COVID-19 recover and the costs of mortality are paid only late in the course of the disease, after substantial transmission has already occurred (in the pre-symptomatic and early symptomatic phases). This can be seen most clearly in equation (1) where we have assumed that only pre-symptomatic individuals transmit.
In that case, selection on virulence is entirely absent. Again, public health measures change how natural selection acts upon SARS-CoV-2. In particular, the widespread recommendation to self-isolate once symptoms appear has likely weakened direct selection against virulence, although this effect must be small given that direct selection against virulence was already weak (as seen in Figure 3D ,H). Moreover, the benefits of self-isolation in terms of disease containment massively outweigh the small potential costs of reduced selection against virulence (compare inset bar charts in Figure 3D ,H). A crucial question is how virulence will evolve [28] . As discussed above, direct selection on virulence is weak (see Figure 3D ,H). Thus, virulence evolution will be driven largely by the indirect effects of pleiotropy. In Figure 4 , we consider two potential examples. First, consider mutations that couple a higher transmission rate, the β's, with higher mortality, α (positive pleiotropy, Figure 4A ,C), as might occur if mutations increase viral replication rates. In this case, evolution will lead to higher mortality (see inset bars), as an indirect consequence of selection for increased transmission (see Supplemental Discussion and also [12, 29] ). Alternatively, consider a mutation that alters tissue tropism such that the disease tends to preferentially infect cells of the upper respiratory tract, rather than the lower respiratory tract. Such infections could lead to a higher transmission rate but be less virulent (negative pleiotropy) [30] . This would generate indirect selection for lower mortality rates ( Figure 4B,D) . Of course, pleiotropy may link virulence evolution with any of the life history traits of SARS-CoV-2. For example, selection to prolong the infectious stage, the κ's, might reduce virulence as a side consequence (e.g., if a weaker immune reaction is elicited). 
Or selection for more symptomatic cases (lower f) may lead to a pleiotropic increase in virulence. Pleiotropy is hard to predict, which is why it is not possible to say with any confidence whether evolution of COVID-19 will translate into meaningful effects for patients (positive or negative) or for the spread of the disease.
Of course, the absence of evidence is not the same as evidence of absence. At present, we know very little about the scope for adaptation or the functional significance of existing genetic variants. This knowledge gap is exacerbated by the lack of accessible data linking disease outcomes with genetic variants. The fact that the virus already displays effective human-to-human transmission might mean there is little opportunity for further adaptation. Likewise, the low mutation rate of SARS-CoV-2 compared to influenza suggests that it may evolve more slowly in response to selection, although the relative rates of evolution also depend on how many sites are targeted by selection (SARS-CoV-2's genome is double the size of influenza's). On the other hand, we also know that the virus often fails to infect close contacts [31], suggesting that there is ample scope to increase transmission further. Moreover, the large number of viruses circulating within a patient implies that every possible genomic mutation is likely to arise over the course of an infection (median estimate of 100,000 viral copies per mL of saliva [32]). Thus, adaptation in response to strong selection for survival and transmission in human hosts could occur very rapidly and may have done so when it first switched into humans.
This paper is intended to provide a framework for thinking about the potential evolutionary routes that SARS-CoV-2 might take and to dispel some of the current misinformation that is circulating in the media. For example, in the absence of pleiotropy, the mortality rate due to COVID-19 is actually either selectively neutral or selected against.
Furthermore, selection for increased transmission rate is at its strongest during the initial phase of an epidemic, when spread is exponential, even though one might have initially thought that selection would be stronger later, when host immunity is more widespread. Our analysis also shows that selection likely favors viruses that progress slowly towards disease and whose symptoms remain mild for longer, because such viruses will be transmitted more before people are aware that they are infected. This contrasts strongly with Ebola, where a major transmission route was contact with the dead, which can strongly select for greater virulence [33, 34]. On the other hand, if transmission is pleiotropically coupled with faster rates of viral replication, and the latter leads to greater virulence, then selection for more efficient transmission could result in an evolutionary increase in virulence (as illustrated in Figures 4A,C).
As with any theoretical analysis, our predictions rely on some biologically informed simplifications. For example, we have neglected within-host evolution, the roles of age- and spatial structure in the transmission process, and the potential importance of host genetics in disease susceptibility. Likewise, we have largely ignored the influence of stochasticity, despite the fact that chance events like founder effects, genetic drift, and super-spreading can be particularly important in emerging diseases. It is also important to emphasize that much remains unknown about SARS-CoV-2. As our understanding of the virus improves and new data emerge, it will be possible to refine predictions and explore other scenarios for the short- and longer-term evolution of viral traits. Experimental data using cell lines and animal models will help reveal the pleiotropic effects of mutations in SARS-CoV-2. Phylogenetic methods will help reveal which, if any, genetic changes have been driven by positive selection.
Such methods include classic approaches searching for sites that undergo amino acid changes more often than expected (e.g., [35]), as well as newer methods that use the shape of the phylogeny to infer the effect of genetic variants on transmission rates and virulence [36]. A key challenge for future studies is to determine whether inferences are robust to the extreme stochasticity we are seeing with COVID-19.
Within weeks after the initial reports emerging from Wuhan, China, of a new respiratory illness in December 2019, scientists had already started searching for signatures of adaptation to humans within the genomes of SARS-CoV-2. One of the earliest studies to appear identified two strains of SARS-CoV-2 circulating in Wuhan (strains 'L' and 'S') and suggested that they had functional consequences, with the former being more 'aggressive' [5]. This claim was widely picked up in the media and led to considerable speculation that evolutionary change could result in COVID-19 becoming more severe. However, this inference was based solely on the frequencies of the two variants, and the stochastic occurrence of mutations on the basal branches of a star-like phylogeny for a spreading disease can fully account for the frequency data [10]. The original authors have since acknowledged this more parsimonious explanation [5].
Similarly, a recent study [37] found that the non-synonymous mutation D614G in the spike gene (see peak in "S" in Figure 1B) has been increasing in frequency in multiple countries. Although this pattern is expected if the mutation were selectively advantageous, it can also be explained by purely neutral sampling processes.
Even without selection, a parallel increase in the frequency of a mutation across multiple countries is expected if new disease outbreaks are first seeded by travelers from a geographic location with a low mutant frequency (e.g., China) followed by travelers from a location where the mutation is (by chance) at a high frequency (e.g., Italy). Definitive evidence of selection must therefore reject such a neutral explanation, which was not done [37] . On the other hand, [37] did look for corroborating evidence of selection in hospitalization rates but did not find a difference associated with D614G. The authors did find a difference in viral load, but this finding is hard to interpret given that days since symptom onset (a major determinant of viral load [21] ) was not controlled and may have changed over time as testing became more available. Additional data are thus needed before concluding that selection is acting upon D614G. Another genetic change that has received media attention involves a 382 nucleotide deletion in ORF8 found in multiple COVID-19 patients in Singapore. Although no direct evidence exists that this deletion was positively selected, similar deletions have been found in other coronaviruses, including SARS variants that arose during the 2003-2004 outbreak [38] . Cell culturing demonstrated that one of these earlier SARS deletions reduced the rate of viral replication [39] . The repeated appearance of such deletions is intriguing, but direct evidence is needed linking such deletions to disease outcomes and/or transmission rates. Finally, [40] used sister clade comparisons to examine whether any of 31 specific mutations identified in the SARS-CoV-2 genome, including the D614G mutation mentioned above, are associated with an increased transmission rate. Their results show that most mutations are found in clades that, if anything, are associated with reduced transmission, concluding that there is no evidence for positively selected alleles. 
However, sister clade comparisons lack power and are biased against finding derived characters that boost the growth of a lineage. Definitive conclusions await further monitoring and testing, accounting for the null expectations with a rapidly expanding zoonotic disease. The key public health interventions used against COVID-19 focus on reducing contact rates, isolating travelers and sick individuals, and contact tracing. These interventions aim to reduce transmission (the β's) and 'flatten' the trajectory describing the time-course of infections or, even better, squash it to zero. These interventions are essential for health care systems to manage the influx of cases and to save lives. Flattening the epidemic curve has a secondary effect though: by prolonging the duration of the epidemic, it increases the time period over which evolutionary change accumulates. As seen in Figures 3 and 4 (compare top and bottom panels), these interventions often result in a greater total change in gene frequency. This finding is perhaps not as obvious as it first seems. Evolutionary adaptation requires that new infections be generated and old ones lost at rates that differ among genotypes. Thus, it seems reasonable to expect that the cumulative amount of evolutionary change in gene frequency that occurs over an epidemic is positively related to the total number of new infections that occur (i.e., the outbreak size). And since public health interventions typically reduce outbreak size we might expect them to also reduce the total amount of evolution. However, this is not true. The rate of evolution is determined by the difference in growth rate of infections carrying the two different alleles (this is the selection coefficient in equation (1)) whereas the total outbreak size is determined by the values of these growth rates themselves. This is what evolutionary biologists refer to as the distinction between relative versus absolute fitness. 
So interventions like social distancing that flatten the curve can slow the rate at which an epidemic grows without slowing the rate at which evolution occurs. Because such interventions typically extend the duration of the epidemic, all else equal, they will thereby result in a greater total amount of evolutionary change. Of course, all else need not be equal, because social distancing can also affect the strength of selection (see main text). The net outcome will therefore depend on the balance of these effects. In addition, reducing the total number of infections will reduce the input of SARS-CoV-2 mutations, which can slow adaptation, especially if complex mutations underlie fitness gains. Furthermore, even if rapid epidemics end the potential for evolution locally (see 'X's in Figures 3 and 4), evolution will continue globally as long as COVID-19 is still circulating in the human population. To determine the net impact of public health measures on evolutionary change in this context requires models that consider the appearance of mutations and their geographic spread.
• The RNA virus SARS-CoV-2 is genetically variable, but there is no conclusive evidence that existing variants affect viral fitness or disease progression.
• Claims of positive selection must be tested against null models that account for the stochasticity of disease spread and founder events, which can mimic the action of selection.
• Modeling reveals how selection would act on SARS-CoV-2 mutations that alter viral transmission, disease progression, disease severity, or combinations of these traits.
• Direct evidence linking mutations to disease characteristics is needed before evolutionary ideas can guide public health interventions.
• If functional differences are verified, rapid typing would allow limited resources for mitigation measures to be tailored and targeted most appropriately.
Curves show the numbers of infected (red) and susceptible individuals (blue), measured as a fraction of the initial number of susceptibles, as well as the frequency of the mutation (black). Solid curves are with evolution (dashed without for reference). Inset bar chart shows cumulative deaths, with ticks at 1% intervals (pink: without evolution; red: with evolution).

In Brief
Evolutionary theory predicts that selection favors increased transmission, longer presymptomatic periods, fewer asymptomatic cases, and lower disease severity for SARS-CoV-2. However, viral mutations are expected to affect combinations of these traits, making it challenging to predict the direction and disease impact of evolution.

To understand the evolution of SARS-CoV-2, we describe changes over time in the number of individuals for each disease class (S, E, A, P, I, R) and genotype (*) using the following set of differential equations (see Figure 2 for a description of the notation and reference [S1] for an introduction to modeling disease dynamics):

Equations (S1) allow for an arbitrary number of genotypes, but for simplicity we focus on two: the resident (where * = r) and a mutant (* = m). Within each class X of infected individual (e.g., pre-symptomatic, X = P), the frequency of the mutation evolves according to:

To understand the net effect of selection and movement, especially in more complicated cases where mutations affect multiple life-history traits, we must approximate (S1). One approach assumes weak selection and focuses on the early growth phase of the epidemic, assuming that the number of susceptible individuals does not decline fast relative to the transition rates among disease classes.
Then, treating S as a constant in equations (S1), the dynamics of viral genotype * within infected individuals (E*, A*, P*, I*) become a linear system of equations that can be described in matrix form, M*, and analysed using standard approaches for stage-structured models (see, e.g., reference [S1]). The change in frequency of a mutation is then approximated by its effect on the leading eigenvalue (λ). For mutations that have a small effect on disease attributes (z), we can write:

Selection during the growth phase of the disease then favors fewer transitions to the asymptomatic class (negative ∆f), a quicker progression out of the non-infectious exposed class (positive ∆κ E), and a slower progression out of the pre-symptomatic class (negative ∆κ P).

Note that the predictions on the evolution of the asymptomatic class f are sensitive to the reproductive values of asymptomatic individuals relative to pre-symptomatic ones. For example, in Figure 3, we assumed that transmission rates for asymptomatic individuals are only 1/10th that of pre-symptomatic cases, but if transmission rates were equal then selection reverses and favors more asymptomatic cases (higher f, see additional figures in DRYAD package). On the other hand, we also assumed that transmission occurred over a longer period (9 days, on average) for asymptomatic cases versus 6 days for individuals who will eventually show symptoms; if the period of asymptomatic transmission were shorter, selection would again favor a reduction in the number of asymptomatic cases (lower f).
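The eigenvalue argument above can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' code: because equations (S1) are not reproduced here, the matrix is built from an assumed flow structure (E to A with probability f, otherwise E to P, then P to I, with A and I removed at rates γ A and γ I). The parameter names and values are hypothetical. It computes the leading eigenvalue λ with S held constant and estimates the sensitivity of λ to f, κ E, and κ P by central finite differences.

```python
import numpy as np

# Hypothetical per-day rates, for illustration only.
PARAMS = {
    "beta_A": 0.1,       # transmission rate, asymptomatic (1/10 of beta_P)
    "beta_P": 1.0,       # transmission rate, pre-symptomatic
    "beta_I": 0.1,       # transmission rate, symptomatic (mostly isolated)
    "kappa_E": 1.0 / 3,  # rate of leaving the exposed class
    "kappa_P": 0.5,      # rate of leaving the pre-symptomatic class
    "gamma_A": 1.0 / 9,  # removal rate of asymptomatic cases (9 days)
    "gamma_I": 0.25,     # removal rate of symptomatic cases
    "f": 0.3,            # fraction of infections that are asymptomatic
}

def transition_matrix(p, S=1.0):
    """Assumed linearized dynamics of (E, A, P, I) with S held constant."""
    return np.array([
        [-p["kappa_E"], p["beta_A"] * S, p["beta_P"] * S, p["beta_I"] * S],
        [p["f"] * p["kappa_E"], -p["gamma_A"], 0.0, 0.0],
        [(1.0 - p["f"]) * p["kappa_E"], 0.0, -p["kappa_P"], 0.0],
        [0.0, 0.0, p["kappa_P"], -p["gamma_I"]],
    ])

def growth_rate(p):
    """Leading eigenvalue (largest real part) of the transition matrix."""
    return float(np.max(np.linalg.eigvals(transition_matrix(p)).real))

def sensitivity(p, name, h=1e-6):
    """Central finite-difference estimate of d(lambda)/d(parameter)."""
    up = dict(p, **{name: p[name] + h})
    down = dict(p, **{name: p[name] - h})
    return (growth_rate(up) - growth_rate(down)) / (2.0 * h)

if __name__ == "__main__":
    print("growth rate:", growth_rate(PARAMS))
    for name in ("f", "kappa_E", "kappa_P"):
        print(f"d(lambda)/d({name}):", sensitivity(PARAMS, name))
    equal_transmission = dict(PARAMS, beta_A=PARAMS["beta_P"])
    print("d(lambda)/d(f), equal transmission:",
          sensitivity(equal_transmission, "f"))
```

Under these assumed values the signs of the sensitivities agree with the text: negative for f, positive for κ E, and negative for κ P, with the sign for f reversing when asymptomatic and pre-symptomatic transmission rates are set equal. A mutation of small effect is favored during the growth phase when the sum of each sensitivity times the corresponding trait change is positive.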
An alternative approximation is possible when the time spent in the exposed class is short enough that the E class can be ignored (κ E high: newly infected individuals enter directly into the pre-symptomatic or asymptomatic classes). We can then revise (S1) by dropping the exposed class, which gives equations (S7). When self-isolation of symptomatic cases is effective and new infections emerge primarily from the pre-symptomatic class (β P * >> β I *, β A *), the pre-symptomatic class will dominate the seeding of new infections, and evolutionary changes in this class will be inherited by descendant infections. We can thus approximate the evolutionary trajectory by describing the mutant dynamics in this source class, P. Plugging dP*/dt from (S7) into (S3) for the pre-symptomatic class (X* = P*) gives the evolutionary equation (1) discussed in the text (assuming weak selection and dropping lower order terms such as ∆β P ∆f).

Figures 3 and 4 illustrate the disease dynamics given by equations (S1) for parameters chosen to be roughly consistent with the references cited in the main text. All individuals are assumed to be initially susceptible (S = 1, measured in units of the initial population size), into which a small fraction of exposed individuals is introduced (E* = 10⁻⁵). The introduced virus is assumed to be either genetically homogeneous (no evolution case; dashed curves) or to consist of 99% resident and 1% mutant genotypes (allowing evolution; solid curves). Although we consider a deterministic model, the pathogen population is doomed to go extinct locally because the resource (the density of susceptible hosts) is assumed to decrease over the time scale of the pandemic (we neglect human reproduction, migration, and waning immunity).
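The simulation setup described above can be sketched as follows. This is a hedged illustration rather than the code behind Figures 3 and 4: the compartment flows are an assumed stand-in for equations (S1), which are not reproduced here, and all parameter values, the 10% transmission advantage of the mutant, and the function names are hypothetical. Only the initial conditions (S = 1, a seed of 10⁻⁵ exposed individuals, 1% of them mutant) follow the text.

```python
import numpy as np

# Hypothetical per-day rates for the resident genotype (illustration only).
RESIDENT = {"beta_A": 0.1, "beta_P": 1.0, "beta_I": 0.1, "kappa_E": 1.0 / 3,
            "kappa_P": 0.5, "gamma_A": 1.0 / 9, "gamma_I": 0.25, "f": 0.3}
# Assumed mutant: 10% higher pre-symptomatic transmission, otherwise identical.
MUTANT = dict(RESIDENT, beta_P=1.1)

def derivatives(state, genotypes):
    """state = [S, E_r, A_r, P_r, I_r, E_m, A_m, P_m, I_m]."""
    S = state[0]
    out = np.zeros_like(state)
    for g, p in enumerate(genotypes):
        i = 1 + 4 * g
        E, A, P, I = state[i:i + 4]
        new_infections = S * (p["beta_A"] * A + p["beta_P"] * P + p["beta_I"] * I)
        out[0] -= new_infections
        out[i] = new_infections - p["kappa_E"] * E
        out[i + 1] = p["f"] * p["kappa_E"] * E - p["gamma_A"] * A
        out[i + 2] = (1.0 - p["f"]) * p["kappa_E"] * E - p["kappa_P"] * P
        out[i + 3] = p["kappa_P"] * P - p["gamma_I"] * I
    return out

def simulate(genotypes, days=200.0, dt=0.05, seed=1e-5, mutant_share=0.01):
    """Integrate with a fixed-step RK4 scheme; return final state and
    the mutant frequency among all infected individuals at each step."""
    state = np.zeros(9)
    state[0] = 1.0                           # everyone initially susceptible
    state[1] = seed * (1.0 - mutant_share)   # exposed, resident
    state[5] = seed * mutant_share           # exposed, mutant
    frequencies = []
    for _ in range(int(round(days / dt))):
        k1 = derivatives(state, genotypes)
        k2 = derivatives(state + 0.5 * dt * k1, genotypes)
        k3 = derivatives(state + 0.5 * dt * k2, genotypes)
        k4 = derivatives(state + dt * k3, genotypes)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        resident_total = state[1:5].sum()
        mutant_total = state[5:9].sum()
        frequencies.append(mutant_total / (resident_total + mutant_total))
    return state, np.array(frequencies)

if __name__ == "__main__":
    final_state, freq = simulate([RESIDENT, MUTANT])
    print("susceptible fraction remaining:", final_state[0])
    print("final mutant frequency among infected:", freq[-1])
```

Passing `[RESIDENT, RESIDENT]` gives the corresponding no-evolution reference case. Lowering all transmission rates by a common factor mimics an intervention, although in this assumed model doing so also changes the strength of selection on a transmission mutant.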
We thus assume the pathogen goes locally extinct if the frequency of

FIGURE LEGENDS

Figure 1: Variability among SARS-CoV-2 genomes. Genetic diversity segregating among SARS-CoV-2 genomes

We are grateful to four anonymous reviewers, Ailene MacPherson, Leithen M'Gonigle, Arne Mooers, Berni Crespi, and Michael Whitlock for helpful comments on previous drafts.

The authors declare no competing interests.

Supplemental Information includes methods, discussion and references and can be found with this article online at *bxs.