key: cord-0939316-b2s5uu8a
authors: Ruan, Yongsen; Hou, Mei; Li, Jiarui; Song, Yangzi; Wang, Hurng-YI; He, Xionglei; Zeng, Hui; Lu, Jian; Wen, Haijun; Chen, Chen; Wu, Chung-I
title: One viral sequence for each host? – The neglected within-host diversity as the main stage of SARS-CoV-2 evolution
date: 2021-06-21
journal: bioRxiv
DOI: 10.1101/2021.06.21.449205
sha: 56c5488d61a846f0ca0d0aecff110f37866dd7ff
doc_id: 939316
cord_uid: b2s5uu8a

The standard practice of presenting one viral sequence for each infected individual implicitly assumes low within-host genetic diversity. It places the emphasis on viral evolution between, rather than within, hosts. To determine this diversity, we collect SARS-CoV-2 samples from the same patient multiple times. Our own data, in conjunction with previous reports, show that two viral samples collected from the same individual are often very different due to the substantial within-host diversity. Each sample captures only a small part of the total diversity that is transiently and locally released from infected cells. Hence, the global SARS-CoV-2 population is a meta-population consisting of the viruses in all the infected hosts, each of which harbors a genetically diverse sub-population. Advantageous mutations must be present first as within-host diversity before they are revealed as between-host polymorphism. The early detection of such diversity in multiple hosts could be an alarm for potentially dangerous mutations. In conclusion, the main forces of viral evolution, i.e., mutation, drift, recombination and selection, all operate within hosts and should be studied accordingly. Several significant implications are discussed.

In the current practice of studying the evolution of viruses, in particular SARS-CoV-2, each individual host is assumed to harbor one strain, or one highly prevalent strain (CANDIDO et al. 2020; FORSTER et al. 2020; RAMBAUT et al. 2020; TANG et al. 2020). Hence, only one genomic sequence needs to be presented for all the virions within the host. In this mainstream view, the amount of diversity within each individual is negligible. Consequently, the selective advantage driving adaptive evolution would not be within the host. Instead, viral evolution would happen mainly between human hosts and the evolutionary dynamics of the virus would parallel the evolution of its human host. Viral selective advantages may take the form of surviving in the aerosol or altering host behaviors to facilitate transmission.

Despite the mainstream view, viral evolution might happen mainly within each individual host. After common to be transmitted. Virions that proliferate more efficiently within the host would then be transmitted at a higher rate than others. For example, the spike D614G mutation can lead to more efficient replication, infection, and competition, compared with the wild-type virus (HOU et al. 2020; KORBER et al. 2020). Given the large number of virions within a host, generally > 10^9, high genetic diversity seems plausible.

If the evolution first happens within individual hosts and then continues between individuals, the two-stage evolution would add a layer of complexity and present a set of new challenges in experimentation, data collection, modeling and conceptual development. In particular, the transmission of the within-host diversity from one host to another would be a key factor. The transmission is determined by i) the number of virions transmitted in an infection (N0); ii) the number of infections by each host, corresponding roughly to R0 in epidemiology.

In the literature, N0 has often been estimated to be < 10 (and close to 1) for various viruses including SARS-CoV-2 (POPA et al. 2020; BRAUN et al. 2021; LYTHGOE et al. 2021; MARTIN AND KOELLE 2021; WANG et al. 2021), influenza virus (XUE AND BLOOM 2019) and HIVs (KEELE et al. 2008). This conclusion is usually reached by comparing the within-host diversity profiles between the donor and recipient. Such comparisons will be shown to be inappropriate for estimating N0 for SARS-CoV-2.

In this study, we will determine the level of viral diversity within each individual host by sampling the viral populations from the same individual at different times or from different tissues. If the within-host diversity is substantial, we would ask how the diversity is transformed into the between-host polymorphism. In particular, it may be possible to detect the emergence of advantageous mutations when they are still part of the within-host diversity before they spread among hosts. This time-lapse may be used as an early warning system. Knowing the within-host diversity is the first step in studying the long-term viral evolution.

In this study, the collection of viruses within an individual is referred to as a population or, when necessary, a sub-population. The entire collection of viruses from all infected individuals is referred to as the meta-population. Genetic diversity and polymorphism are used, respectively, for within- and between-host variation. Genetic diversity is transmitted during infection via N0 virions, which would proliferate to Nt virions at time t in the new host. We use the branching process to track the genetic drift during the rapid growth phase of the viral population (RUAN et al. 2021a; RUAN et al. 2021b).

We first examine viral samples collected from the same host, either from the same tissue at different times or from different tissues at the same time. Strikingly, many variants are sample-specific and at a high frequency (referred to as SSH sites), as shown in Fig. 1. In these panels, variants that are found in more than one sample are connected by line segments, whereas SSH variants are unconnected. Fig. 1A and Fig. 1C (from a pharyngeal and sputum sample, respectively) show a simple pattern whereby almost all variants are SSH variants with no sharing among samples. Some of these samples were collected only 2-4 days apart. Fig. 1B and 1D show a degree of sharing in addition to the SSH variants. The shared variants, however, are often very different in frequency. For example, variants that are 0.1-0.4 in frequency in the first sample often jump to near fixation in a second sample collected two days later and stay at high frequency in the third sample (Fig. 1B, pharyngeal). Fig. 1D (feces sample) shows a pattern similar to that of Fig. 1B. Such large changes are unlikely to be attributable to natural selection as they involve too many variants that change too rapidly. Furthermore, Fig. 1E-F presents the results of samples collected from the same patient at the same time but from different tissues. The differences among such samples resemble the patterns found between samples collected at different times. These differences between samples of the same individual would cast doubt on the value of comparing samples from two individuals.

The different patterns among samples from the same patient (Fig. 1) may suggest the within-host diversity to be far larger than realized. What would then be the total viral diversity within the same patient? And how is that diversity distributed in space and time within the host? The model presented in Fig. 2 attempts to address the issue. Briefly, each sample collected represents a small slice of the total diversity, consisting of virions recently released by a small subset of cells locally. The main viral population may be actively proliferating inside the cells, unavailable to sampling. In this model, the viral population within a host is started by the N0 founders acquired in an infection. Each virion in the N0 sample subsequently expands into a large genealogy with numerous virions. The genealogy itself is quite different from the conventional bifurcating tree in organismal evolution. Here, the viral evolution within the host is modeled by the branching process, rather than by the conventional Wright-Fisher (WF) model (RUAN et al. 2021a; RUAN et al. 2021b).

The salient feature of Fig. 2 is that each sample represents only a small part of the genealogy, due to the transient releases of virions from the local patches of infected cells. This view of restricted sampling in both space and time may be the main reason for the discordant sample profiles from the same individual. Random sampling from the entire viral population within an individual appears untenable for SARS-CoV-2. (It may be more achievable for other viruses like HIV or HBV that have a high concentration in the peripheral blood than for coronaviruses (KOZIEL AND PETERS 2007).) The observations and modeling suggest very high viral diversity within individual hosts, within which the virus evolves.

From the genealogical tree emerge de novo mutations, some of which become SSH variants shown by the star symbol in Fig. 2. RNA editing, which could account for up to 50% of the de novo mutations, has been suggested to be an anti-viral strategy of the host cells. The signature of these de novo mutations is in accord with that reported for RNA editing (DI GIORGIO et al. 2020; MOURIER et al. 2021), with a preponderance of T-to-C, A-to-G and C-to-T changes (Fig. 3A). It can be further shown that the frequency distributions are the same for all classes of mutations (Fig. 3B), suggesting that these 12 types are comparable in their fitness consequences.

If the virus evolves within each host, what would be the fate of new mutations between their emergence and the time of transmission? Due to the rapid population expansion from a small N0 cohort, each neutral mutation would have a very low probability of being transmitted. Imagine a mutation with a fitness advantage of s within the host. For the neutral mutation, s = 0. Each of the N0 virions would grow to the size of NT ≥ 10^8. If the mutation occurs at time t, its frequency would be 1/Nt at emergence and stay near 1/Nt as the population expands. A mutation is far more likely to be sufficiently frequent to be transmitted if it has a selective advantage. Fitness advantage may be of two kinds: between hosts during transmission (the B-model) or within hosts after transmission (the W-model). An advantageous mutation in the B-model would be assumed neutral within hosts. Hence, the neutral dynamics presented below applies to the advantageous mutations of the B-model.

At transmission, the number of virions transmitted to the next host is N0. Since N0 has never been estimated to be > 10^3, the mutant frequency has to be on the order of 10^-3 or higher to have a chance of being transmitted. The dynamics is shown in Fig. 4A as a function of the time of emergence. Fig. 4B shows that transmission may be possible if the mutation rate is so implausibly high that the first mutant emerges when Nt < 100. In the W-model with s > 0, there is a chance for transmission even if the mutation emerges late, as long as s is sufficiently large (Fig. 4C).

The simulation settings are as follows (see Methods). In a time interval (i.e., one generation), the number of virions produced by each virion is k, with the mean of E(k) and variance of V(k). Let the mutation rate be μ; the mutant would reach a frequency of x when NT = 10^8. We let E(k) = 2 for neutral mutations and E(k) = 3 for s = 0.5 in the W-model. V(k) is usually set at 10E(k) as done before (RUAN et al. 2021a; RUAN et al. 2021b). The adaptive mutation rate is set at either μ = 0.01 or μ = 0.0001 per generation. These rates, artificially high for adaptive mutations, permit us to compare the fates of neutral vs. adaptive mutations.

While the mean x can be obtained analytically, the interest is in the outliers for the occasional mutant reaching a high x (say, > 0.1). The quantitative results are shown in Fig. 4D. For neutral mutations, x is always < 0.05 even when μ is extremely high at 0.01. If μ is set at a lower and more plausible 0.0001, x is always < 0.001 and N0 has to be > 1000 for transmission. For s = 0.5 with μ = 0.01, x is always > 0.3 and often > 0.5; hence, N0 ≥ 3 should be sufficient for the mutation to be transmitted. When μ = 0.0001, there is a 20% chance that x > 0.5 and the mutation is still highly transmissible.

We should note that the parameters of μ = 0.0001 and a selective advantage of s = 0.5 (i.e., a strongly advantageous mutation emerges once among 10,000 virions) are likely too high as well. The high values are used to show how adaptive mutations within hosts would greatly speed up the rate of viral evolution. At a lower mutation rate, there would be almost no chance for neutral mutations to reach a high frequency to be transmitted. For the W-model, the rate of adaptive evolution may still be small but it would be non-zero if s is not too small.
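The link between the mutant frequency x and the bottleneck size N0 quoted above can be made concrete with a short calculation. The following is a minimal sketch, assuming that the N0 transmitted virions are drawn independently from a well-mixed within-host population in which the mutant has frequency x; the function name `prob_transmitted` is hypothetical and is not part of the authors' analysis.

```python
def prob_transmitted(x: float, n0: int) -> float:
    """Chance that at least one of n0 transmitted virions carries the mutant.

    Assumption (not from the paper): each transmitted virion is an
    independent draw from a population with mutant frequency x.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a frequency between 0 and 1")
    if n0 < 0:
        raise ValueError("n0 must be non-negative")
    # Complement of the event that all n0 draws miss the mutant.
    return 1.0 - (1.0 - x) ** n0


if __name__ == "__main__":
    # Frequencies and bottleneck sizes taken from the ranges discussed in the text.
    for x in (0.001, 0.05, 0.3):
        for n0 in (3, 10, 1000):
            print(f"x={x:<6} N0={n0:<5} P(transmitted)={prob_transmitted(x, n0):.3f}")
```

Under this assumption, a mutant at x = 0.3 has roughly a two-in-three chance of passing through a bottleneck of N0 = 3, whereas a mutant at x = 0.001 needs N0 on the order of 1000 to reach a comparable chance.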

From within-host diversity (iSNVs) to between-host polymorphism (SNPs)

The evolution of the virus in the meta-population happens in stages, as illustrated in Fig. 5A, where each circle portrays the viral diversity of an infected individual. Stages 1a and 1b are mainly about within-host evolution while Stages 2a and 2b are about between-host evolution. Multiple circles in a stage comprise the meta-population. The evolution starts with iSNVs (for intra-host nucleotide variation), which may or may not evolve into SNPs (single nucleotide polymorphism in the host population). All SNPs obviously evolve from some iSNVs but most iSNVs probably fail to become SNPs. In the current practice, only the prevalent variant is presented for each infected individual, shown next to the circle.

In stage 1a, a T-mutant emerges from the viral population of A in an infected individual (the red-border box). Stage 1 is likely the most common outcome because the new mutant is unlikely to rise to a high frequency to be transmitted. Occasionally, if T has a selective advantage, it will rise to a "transmissible" frequency, which depends on the size of N0 (see below), and the evolution progresses to stage 1b. Between stages 1a and 1b is the most crucial transition when the mutant comes "out of the gate". From stage 1b on, T would increase in frequency in some individuals while the A/T diversity would continue to infect more individuals as illustrated in Fig. 3. Stage 2a is the only stage when SNPs overlap with iSNVs. Stage 2b is the next stage of evolution but appears to be the mirror image of Stage 1b, neither of which has the SNP polymorphism.

Each quartet of bars represents SNPs of an increasingly higher frequency as marked at the bottom. It is evident that iSNVs and SNPs move in the same direction and the number of samples with iSNVs is substantial. Across all SNP sites (the last set of bars), 8.6% of samples are in the transition from iSNV to SNP whereas fixed SNPs account for only 6.8% of the samples. The mean mutant frequency from all samples for each site, iSNV (mean), is plotted against the SNP frequency (Fig. 5C). The high correlation coefficient (R^2 = 0.75) supports the view that SNP increases in frequency as iSNV becomes more common.

The genetic diversity shaped by events within hosts may or may not be transmitted to other hosts, depending on how often the transmission happens and how many virions are transmitted each time. The former can be roughly estimated since it is correlated with the basic reproduction number (R0). The latter, i.e., the size of N0, has been of substantial interest lately. N0 is usually estimated by comparing the diversity profiles between donors and recipients.

Not unexpectedly, two samples from the same patient shown in Fig. 6A look like samples from the donor-recipient pairs. Fig. 6B presents a typical example of the observed pattern between donors and recipients. This example is a composite of multiple samples from patients of the Beijing area in the period of January to April 2020. Another example (Fig. 6C-6D) is a re-compilation of the published data in Austria (POPA et al. 2020; MARTIN AND KOELLE 2021). In all these panels, variants common in one host are often undetectable in the other (i.e., data points on the X or Y axis).

The large donor-recipient difference has led to the conclusion of a very small N0 during transmission (POON et al. 2016; MCCRONE et al. 2018; XUE AND BLOOM 2019; POPA et al. 2020; BRAUN et al. 2021; LYTHGOE et al. 2021; MARTIN AND KOELLE 2021; WANG et al. 2021). However, only a certain type of difference can be explained by a small N0. In the simulations of Fig. 6E, the donors and recipients are shown to have similar profiles when N0 = 100. With N0 = 1, the donor-recipient difference is large but the variants are either fixed or lost, with the arrows indicating where the data points would be. At N0 = 10, the pattern is closer to N0 = 100 than to N0 = 1 (Fig. 6F). The difference between the expected (Fig. 6D-6F) and the observed (Fig. 6B-6D) would argue against transmission bottlenecks of any size.

It seems obvious that SSH variants have to be discounted in estimating N0. In Fig. 1E, we re-analyzed the data of Popa et al. It can be seen clearly that, by removing SSH variants entirely, the pattern agrees with Popa et al.'s estimates of N0 > 100 for SARS-CoV-2 (POPA et al. 2020; MARTIN AND KOELLE 2021). Nevertheless, the accuracy of estimation based on the donor-recipient comparisons would be low if N0 > 10. At present, a conservative conclusion would simply be that N0 ≫ 10.

The 2-stage evolution of viruses has several implications. Let the average number of virions in a host be N and the number of infected individuals in the meta-population be M. The size of the meta-population is hence MP = N × M. For SARS-CoV-2, MP is easily 10^15 or larger at some point.

First, given the extremely large MP, the rate of adaptive evolution of SARS-CoV-2 may in fact be quite modest. In a 2-stage system, a mutant has to clear the hurdle within the host in order to enter into the meta-population. Thus, moderately advantageous mutations may not be able to rise to a frequency high enough to be transmitted. On the other hand, although strongly advantageous mutations may be able to rise, they are generally uncommon. In a sense, the two stages of viral evolution divided by inter-host transmissions put a brake on the rapid evolution of viruses.

(1)

Here, U is the rate of adaptive mutation and s is the selective advantage (EYRE-WALKER 2006). Eq. (1) is as conventionally understood but, in viral evolution, an increase in R would also lead to an increase in M. In other words, R and M would form a positive feedback loop, as a larger R (more advantageous mutations) would lead to a larger M and vice versa. Thanks to the positive feedback loop, M could be in a runaway process, akin to the mutation accumulation of cancer cells (RUAN et al. 2020). Pandemics are not as common as one may fear because the feedback loop cannot easily get started. Nevertheless, when M is allowed to become very large (due to host behaviors, for example), the loop may be able to get rolling, thus kick-starting the runaway process. Whether COVID-19 has entered the runaway process at some point of the pandemic will be a worthy topic for the future.

Third and fortunately, there may be a way to catch the rise of an advantageous mutation before it is too late. While stage 2a of Fig. 5 Fig. 5A, if we could detect T in several infected individuals as a low frequency iSNV before it progressed to stage 2a, it may be possible to stop T from getting out of the gate. It thus behooves us to have some within-host diversity data. The solution need not be to increase the number of samples, which may be logistically difficult. Instead, all samples could be sequenced to a greater depth and in high fidelity such that all minor variants at 1%, or even lower, can be detected with certainty. The diversity data within hosts would then be immensely more informative with only modest extra efforts. Virion samples of the respiratory tract are of the greatest interest.

In conclusion, the evolution within hosts is the very first step of the long-term evolution of viruses. Figs. 4-5 show how adaptive mutations would spread in the meta-population. This spread is key to our view of the evolutionary dynamics and, hence, the containment strategy. The standard data of one viral sequence per host do not lend themselves easily to within-host analyses. Fortunately, slightly greater efforts in bulk sequencing, data presentation and model building may fill the large gap in studying SARS-CoV-2 as well as many other viruses.

(2) for each site of the SARS-CoV-2 genome, the aligned low-quality bases and indels were excluded to reduce possible false positives and the site depth and strand bias were recalculated; (3) samples with more than 3,000 sites with a sequencing depth ≥100× were selected as candidate samples for iSNVs

A series of criteria were used to ensure high quality iSNVs with Q20 reads support: (1) minor allele frequency of ≥ 5% (a conservative cutoff based on an error rate estimation); (2) depth of the minor allele ≥ 5. After calling variants, we used ANNOVAR software (YANG AND WANG 2015) to annotate the variants and found the count of alternative allele and total depth for each variant using SAMtools (see Data file S1).
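The two iSNV thresholds listed above can be expressed as a small filter. This is a minimal sketch, assuming a biallelic site for which only the alternative-allele count and the total depth are available (so the minor allele is whichever of the two alleles has fewer reads) and assuming that the reads have already been restricted to Q20 support; the function name and arguments are hypothetical and do not reproduce the authors' pipeline.

```python
def passes_isnv_filter(alt_count: int, total_depth: int,
                       min_maf: float = 0.05, min_minor_depth: int = 5) -> bool:
    """Apply the two stated criteria: minor allele frequency >= 5% and
    minor allele depth >= 5 reads.

    Assumption: the site is biallelic, so the reference count is
    total_depth - alt_count and the minor allele is the rarer of the two.
    """
    if total_depth <= 0 or not 0 <= alt_count <= total_depth:
        return False
    ref_count = total_depth - alt_count
    minor_count = min(alt_count, ref_count)
    minor_freq = minor_count / total_depth
    return minor_count >= min_minor_depth and minor_freq >= min_maf


if __name__ == "__main__":
    # (alt_count, total_depth) pairs chosen only for illustration.
    for alt, depth in [(10, 100), (4, 50), (5, 200), (90, 100)]:
        print(alt, depth, passes_isnv_filter(alt, depth))
```

Both conditions are needed: the depth floor guards against calls supported by a handful of reads at shallow sites, while the frequency floor guards against sequencing errors at deep sites.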

We reanalyzed 138 COVID-19 samples with clinical information from Popa's data (POPA et al. 2020), which include 39 transmission pairs. We downloaded the clinical information and vcf files at https://zenodo.org/record/4247401 and https://github.com/m-amartin/sarscov2_nb_reanalysis. We used python scripts to merge the frequency of iSNVs of these 138 samples (see Data file S2). For each transmission pair, we identified the variants at a frequency of ≥ 1% and showed the allele frequency change between donor and recipient (Fig. 6C, 6D). We used the threshold of 100 for the transmission bottleneck (N0), estimated by Martin and Koelle (MARTIN AND KOELLE 2021), to divide the alleles into two groups (see Fig. 6C, 6D).

the genetic drift after a single generation ). Here we expand it to obtain the genetic drift after multiple generations, which can be used to estimate the variance of alternative allele frequency within host. According to , the average and variance of population size at time t are

Assuming there are two kinds of alleles, and their numbers at generation t are , . and will be independent. If there is no selection, then 

Specially, when = 1,

which is the same as Eq. (5) in .
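As a point of reference for the population-size moments used in this derivation, the textbook results for a Galton-Watson branching process can be sketched as follows. This is a minimal sketch, assuming that every virion reproduces independently with the same offspring mean E(k) and variance V(k) in each generation; the function names are hypothetical, and the formulas are the standard ones rather than a transcription of the authors' equations.

```python
def gw_mean(n0: int, mean_k: float, t: int) -> float:
    """Expected population size after t generations: n0 * E(k)^t."""
    return n0 * mean_k ** t


def gw_variance(n0: int, mean_k: float, var_k: float, t: int) -> float:
    """Variance of population size after t generations.

    Standard result: n0 * V(k) * E(k)^(t-1) * (E(k)^t - 1) / (E(k) - 1),
    which reduces to n0 * V(k) * t when E(k) = 1.
    """
    if t == 0:
        return 0.0
    if mean_k == 1.0:
        return n0 * var_k * t
    return n0 * var_k * mean_k ** (t - 1) * (mean_k ** t - 1.0) / (mean_k - 1.0)


if __name__ == "__main__":
    # E(k) = 2 and V(k) = 10 E(k) = 20, the neutral setting used in the text.
    for t in (1, 2, 5, 10):
        print(t, gw_mean(1, 2.0, t), gw_variance(1, 2.0, 20.0, t))
```

For t = 1 the variance is simply n0 V(k), the single-generation case. The recursion Var(N_t) = E(k)^2 Var(N_{t-1}) + V(k) E(N_{t-1}) gives the same values and is a convenient cross-check.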

Assuming there are n alleles with corresponding frequencies x_1, x_2, …, x_n in the donor, we will obtain the expected allele frequency of the recipient under a particular transmission bottleneck size N0 as follows. For the traditional WF model (Fig. 6E), each allele is independent and its allele frequency in the next generation will follow a binomial distribution. Thus, given the transmission bottleneck size N0, for the allele with frequency x_i in the donor, its frequency in the recipient, x_i′, will be sampled from the following binomial distribution: x_i′ ~ Binomial(N0, x_i) / N0.
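The binomial sampling step of the WF bottleneck can be sketched in a few lines. This is a minimal sketch using only the Python standard library, assuming that each of the N0 transmitted virions independently carries the allele with probability equal to the donor frequency; the function name `sample_recipient_frequency` and the example frequencies are hypothetical.

```python
import random


def sample_recipient_frequency(x_donor: float, n0: int, rng: random.Random) -> float:
    """Draw the recipient allele frequency right after a bottleneck of n0 virions.

    Each transmitted virion carries the allele with probability x_donor,
    so the carrier count is Binomial(n0, x_donor) and the frequency is count / n0.
    """
    if n0 <= 0:
        raise ValueError("n0 must be positive")
    carriers = sum(1 for _ in range(n0) if rng.random() < x_donor)
    return carriers / n0


if __name__ == "__main__":
    rng = random.Random(42)
    donor_freqs = [0.05, 0.2, 0.5, 0.8]  # illustrative donor frequencies
    for n0 in (1, 10, 100):
        recipient = [sample_recipient_frequency(x, n0, rng) for x in donor_freqs]
        print(f"N0={n0:<4}", recipient)
```

With N0 = 1 every allele is either fixed or lost in the recipient, while with N0 = 100 the recipient frequencies stay close to the donor frequencies, which is the qualitative contrast described for the simulated panels of Fig. 6.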

After transmission, we assume the virus will grow to a particular number, , before it be sampled and sequenced (Fig. 6F) . During the branching process of virus growth, we assume each virus will generate k number of offspring, where k follows a negative binomial distribution with mean E(k) and variance V(k). Thus, the expected time at which the virus is sampled to determine the recipient allele frequency (denoted by ) is

According to Eq. (A1) and Eq. (A3), given the initial allele frequency ′, we can obtain the mean and variance of when population size grows from 0 to :

Simply, we can assume follows a normal distribution with mean and variance to be ( ) and ( ). Now, we can obtain the expected allele frequency in the recipient by sampling from the normal distribution.

Simulating the fate of new mutations within host under different selection strength. As done before (RUAN et al. 2021a; RUAN et al. 2021b), we simulate mutation accumulation of virus within host by branching process. In the branching process, each virion would produce k descendants in a period of time, say 24 hours, with the mean and variance of E(k) and V(k), respectively. E(k) is the rate of viral proliferation and can be estimated with reasonable accuracy. In the WF model, V(k) would equal E(k), which is unlikely to be correct for viruses. Instead, the population genetic parameters of SARS-CoV-2 (and perhaps many other viruses as well) suggest that V(k) may be far larger than E(k). For example, it has been estimated that only 10^5 - 10^7 cells in a human host are infected at any moment (SENDER et al. 2020). This is a tiny fraction of human cells, given the estimated 10^13 - 10^14 cells in the host. Hence, each infected cell may need to produce a large number of virions. The numbers suggest a very large V(k) for SARS-CoV-2. Here we let V(k) = 10E(k), where k follows a negative binomial distribution. The initial E(k) is 2, which can be increased by new adaptive mutations.

First, we set up a mutation-free viral population (the initial population size is 8 to avoid population extinction) at the initial time. Mutations were accumulated in a Poisson process with a mutation rate of μ = 0.01 or 0.0001 per generation until the population size reached 10^8. In neutral evolution, we assume each new mutation is neutral (s = 0). In adaptive evolution, s = 0.5 for each mutation. For each parameter set, we repeat the simulation 10 times (see Fig. 4D). Finally, we calculated the frequency of the most dominant allele (i.e., the highest frequency of all the accumulating mutations).
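The simulation described above can be illustrated with a scaled-down sketch. It is not the authors' code. The assumptions are: negative binomial offspring numbers are drawn as a gamma-Poisson mixture with V(k) = 10E(k); each wild-type offspring founds a new mutant lineage with probability μ; mutants do not mutate again; a mutant has offspring mean E(k)(1 + s); and the target size is 2,000 rather than 10^8 so that the example runs quickly. All function names are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the modest rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_offspring(mean_k: float, var_k: float, rng: random.Random) -> int:
    """Negative binomial draw via a gamma-Poisson mixture (requires var_k > mean_k)."""
    shape = mean_k * mean_k / (var_k - mean_k)
    lam = rng.gammavariate(shape, mean_k / shape)  # gamma with mean mean_k
    return sample_poisson(lam, rng)


def simulate_top_mutant_frequency(mu: float, s: float, rng: random.Random,
                                  n_start: int = 8, n_target: int = 2000,
                                  mean_k: float = 2.0, var_ratio: float = 10.0) -> float:
    """Grow the population to n_target and return the highest mutant-lineage frequency.

    Returns 0.0 if no mutant lineage survives or the population goes extinct.
    """
    wild, lineages = n_start, []
    mut_mean = mean_k * (1.0 + s)
    while 0 < wild + sum(lineages) < n_target:
        next_wild, founders = 0, 0
        for _ in range(wild):
            for _ in range(sample_offspring(mean_k, var_ratio * mean_k, rng)):
                if rng.random() < mu:
                    founders += 1  # a new mutant lineage starts from one virion
                else:
                    next_wild += 1
        grown = [sum(sample_offspring(mut_mean, var_ratio * mut_mean, rng)
                     for _ in range(n)) for n in lineages]
        wild = next_wild
        lineages = [n for n in grown if n > 0] + [1] * founders
    total = wild + sum(lineages)
    return max(lineages) / total if lineages and total > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(2021)
    for mu, s in [(0.01, 0.0), (0.0001, 0.0), (0.01, 0.5), (0.0001, 0.5)]:
        runs = [simulate_top_mutant_frequency(mu, s, rng) for _ in range(10)]
        print(f"mu={mu}, s={s}:", [round(x, 3) for x in runs])
```

Because the target size here is far below 10^8, the printed frequencies are not comparable in magnitude to those of Fig. 4D; the sketch only shows the mechanics of tracking the most frequent mutant lineage under neutral and adaptive settings.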

To estimate the correlation between SNP and iSNV frequencies, we picked 29 of 44 local patients (101 samples in total) by the following criteria: (1) patients with at least one sputum or pharyngeal sample; (2) the 100X genome coverage ratio of samples in this patient is greater than 80%. For a patient with more than one sample fitting the criteria, we only picked the sample with the highest coverage. Then, we used the 29 local samples (one sample for one patient) to find SNP sites based on the frequency of iSNV sites.

For each iSNV site in the 29 samples, we counted the number of samples whose iSNV frequency is < 0.5 (reference allele), denoted by Nref. Similarly, we denoted the number of samples with iSNV frequency ≥ 0.5 by Nalt (alternative allele). If Nref > 0 and Nalt > 0 (reference allele and alternative allele existed at the same time), the iSNV site corresponds to a consensus SNP site, and the SNP frequency is Nalt / (Nalt + Nref). Based on this, we found 95 SNP sites in the 29 samples. For the 95 SNP sites, we also calculated the average iSNV frequency in the 101 local samples (see Data file S1). The correlation between SNP frequency and iSNV frequency is shown in Fig. 5C.

Fig. 1. The Y-axis shows the frequency of the variant (or AF, for alternative allele vis-à-vis the reference genome). The dashed orange lines show the frequency cutoff at 0.05 or 0.95. (A-D) Allele frequency from samples of the same patient collected at different time points. Variants found in more than one sample are connected by a line. (E-F) Allele frequency of samples collected at the same time but from different tissues of the same patient. X-axis labels indicate the tissue type: S for sputum, P for pharyngeal swab, F for feces. All panels show that most variants, even high-frequency ones, are sample-specific.
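The rule in the Methods for deriving a consensus SNP frequency from per-sample iSNV frequencies (Nref, Nalt and the 0.5 cutoff) can be sketched as follows. This is a minimal sketch, assuming that each site is represented by a list with one iSNV frequency per sample and that samples lacking the variant are recorded as 0.0; the function name is hypothetical.

```python
from typing import List, Optional


def consensus_snp_frequency(isnv_freqs: List[float], cutoff: float = 0.5) -> Optional[float]:
    """Return Nalt / (Nalt + Nref) for one site, or None if the site is not a SNP.

    Nalt counts samples whose iSNV frequency is >= cutoff (alternative allele);
    Nref counts samples below the cutoff (reference allele). A site is a
    consensus SNP only when both counts are positive.
    """
    n_alt = sum(1 for f in isnv_freqs if f >= cutoff)
    n_ref = len(isnv_freqs) - n_alt
    if n_alt == 0 or n_ref == 0:
        return None
    return n_alt / (n_alt + n_ref)


if __name__ == "__main__":
    # Illustrative per-sample frequencies at three hypothetical sites.
    sites = {
        "site_A": [0.10, 0.90, 0.60, 0.00],
        "site_B": [0.10, 0.20, 0.05, 0.00],
        "site_C": [0.97, 0.99, 0.40, 1.00],
    }
    for name, freqs in sites.items():
        mean_isnv = sum(freqs) / len(freqs)
        print(name, consensus_snp_frequency(freqs), round(mean_isnv, 3))
```

Printing the mean iSNV frequency next to the SNP frequency mirrors the two quantities that are compared in Fig. 5C.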

Fig. 2. The model attempts to explain the different diversity profiles between samples of the same patient. In this model, any viral sample (e.g., taken by the nasal-pharyngeal swab) represents only a small part of the genealogical tree that descends from a virion in the N0 cohort. The star symbol represents mutations. These virions are likely extra-cellular particles released from infected cells that have not infected another cell. Each blue triangle depicts the expansion of an ancestral virion in a local patch of tissue. It is assumed that the released virions tend to infect nearby cells. Two samples taken from the same tissue at different times (day X vs. day Y) may come from different parts of the genealogy and display very different mutation profiles. Note that the genealogy represents the descendants of only one virion. The total genealogy should consist of all lineages descending from the original

can be detected or transmitted. Note that most mutations do not rise above the blue line. (D) The mutation frequency is simulated under two different mutation rates and two selective intensities by the branching process (see Methods). The viral population grows from 1 to 10^8. The mutation with the highest frequency at the end of the simulation in each of the 10 simulations is marked on the horizontal line. See the main text for interpretation.

Transmission of SARS-CoV-2 in domestic cats imposes a narrow bottleneck

Evolution and epidemic spread of SARS-CoV-2 in Brazil

MINERVA: A Facile Strategy for SARS-CoV-2 Whole-Genome Deep Sequencing of Clinical Samples

A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations

Antibody evasion by the P.1 strain of SARS-CoV-2

Transmission, infectivity, and neutralization of a spike L452R SARS-CoV-2 variant

Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2

Mean and variance of ratios of proportions from categories of a multinomial distribution

The genomic rate of adaptive evolution

Phylogenetic network analysis of SARS-CoV-2 genomes

SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo

Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection

Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus

Viral hepatitis in HIV infection

Fast gapped-read alignment with Bowtie 2

The Sequence Alignment/Map format and SAMtools

SARS-CoV-2 within-host diversity and transmission

Reanalysis of deep-sequencing data from Austria points towards a small SARS-COV-2 transmission bottleneck on the order of one to three virions

Stochastic processes constrain the within and between host evolution of influenza virus

Host-directed editing of the SARS-CoV-2 genome

Quantifying influenza virus diversity and transmission in humans

Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2

A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology

On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all?

Mutations Beget More Mutations-Rapid Evolution of Mutation Rate in Response to the Risk of Runaway Accumulation

A theoretical exploration of the origin and early evolution of a pandemic

The total number and mass of SARS-CoV-2 virions in an infected person

On the origin and continuing evolution of SARS-CoV-2

Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity

During Human-to-Human Transmission of SARS-CoV-2

A new coronavirus associated with human respiratory disease in China

Reconciling disparate estimates of viral genetic diversity during human influenza infections

Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR

Genetic drift during the proliferation in the host is simulated by the branching process model of Ref