key: cord-321166-nvphu1fm authors: Thomson, Emma C.; Rosen, Laura E.; Shepherd, James G.; Spreafico, Roberto; da Silva Filipe, Ana; Wojcechowskyj, Jason A.; Davis, Chris; Piccoli, Luca; Pascall, David J.; Dillen, Josh; Lytras, Spyros; Czudnochowski, Nadine; Shah, Rajiv; Meury, Marcel; Jesudason, Natasha; De Marco, Anna; Li, Kathy; Bassi, Jessica; O’Toole, Aine; Pinto, Dora; Colquhoun, Rachel M.; Culap, Katja; Jackson, Ben; Zatta, Fabrizia; Rambaut, Andrew; Jaconi, Stefano; Sreenu, Vattipally B.; Nix, Jay; Jarrett, Ruth F.; Beltramello, Martina; Nomikou, Kyriaki; Pizzuto, Matteo; Tong, Lily; Cameroni, Elisabetta; Johnson, Natasha; Wickenhagen, Arthur; Ceschi, Alessandro; Mair, Daniel; Ferrari, Paolo; Smollett, Katherine; Sallusto, Federica; Carmichael, Stephen; Garzoni, Christian; Nichols, Jenna; Galli, Massimo; Hughes, Joseph; Riva, Agostino; Ho, Antonia; Semple, Malcolm G.; Openshaw, Peter J.M.; Baillie, J. Kenneth; Rihn, Suzannah J.; Lycett, Samantha J.; Virgin, Herbert W.; Telenti, Amalio; Corti, Davide; Robertson, David L.; Snell, Gyorgy title: The circulating SARS-CoV-2 spike variant N439K maintains fitness while evading antibody-mediated immunity date: 2020-11-05 journal: bioRxiv DOI: 10.1101/2020.11.04.355842 sha: doc_id: 321166 cord_uid: nvphu1fm

SARS-CoV-2 can mutate to evade immunity, with consequences for the efficacy of emerging vaccines and antibody therapeutics. Herein we demonstrate that the immunodominant SARS-CoV-2 spike (S) receptor binding motif (RBM) is the most divergent region of S, and provide epidemiological, clinical, and molecular characterization of a prevalent RBM variant, N439K. We demonstrate that N439K S protein has enhanced binding affinity to the hACE2 receptor, and that N439K virus has similar clinical outcomes and in vitro replication fitness as compared to wild-type.
We observed that the N439K mutation resulted in immune escape from a panel of neutralizing monoclonal antibodies, including one in clinical trials, as well as from polyclonal sera from a sizeable fraction of persons recovered from infection. Immune evasion mutations that maintain virulence and fitness such as N439K can emerge within SARS-CoV-2 S, highlighting the need for ongoing molecular surveillance to guide development and usage of vaccines and therapeutics.

SARS-CoV-2, the cause of COVID-19, emerged in late 2019 and expanded globally, resulting in over 41 million confirmed cases as of October 2020. Molecular epidemiology studies across the world have generated over 135,000 viral genomic sequences, which have been shared with unprecedented speed via the GISAID Initiative (https://www.gisaid.org/). These data are essential for monitoring virus spread (Meredith et al., 2020) and evolution. Of particular interest is the evolution of the SARS-CoV-2 surface protein, spike (S), which is responsible for viral entry via its interaction with the human angiotensin-converting enzyme 2 (hACE2) receptor on host cells. The S protein is the target of neutralizing antibodies generated by infection or vaccination (Folegatti et al., 2020; Jackson et al., 2020; Keech et al., 2020) as well as monoclonal antibody (mAb) drugs currently in clinical trials (Hansen et al., 2020; Jones et al., 2020; Pinto et al., 2020). A SARS-CoV-2 S variant, D614G, is now dominant in most places around the globe (Callaway, 2020). Studies in vitro indicate that this variant may have greater infectivity, while molecular epidemiology indicates that it spreads efficiently and likely maintains virulence (Hu et al., 2020; Korber et al., 2020; Volz et al., 2020). Amino acid 614 is outside the receptor binding domain (RBD) of S, the domain targeted by 90% of neutralizing antibody activity in serum of SARS-CoV-2 survivors (Piccoli et al., 2020).
Initial studies suggest that D614G actually exhibits increased sensitivity to neutralizing antibodies, likely due to its effects on the molecular dynamics of the spike protein (Hou et al., 2020; Yurkovetskiy et al., 2020). Therefore, this dominant variant is unlikely to escape antibody-mediated immunity. The low numbers of novel mutations reaching high frequency in sequenced SARS-CoV-2 isolates may relate to the moderate intrinsic error rate of the replication machinery of SARS-CoV-2 (Li et al., 2020c; Robson et al., 2020) and to this new human coronavirus requiring no significant adaptation to humans (MacLean et al., 2020). Nevertheless, the increasing number of infected individuals and the large reservoir of individuals susceptible to infection increase the likelihood that novel variants that impact vaccine and therapeutic development will emerge and spread. Moreover, immune selection, which can drive variant selection, has likely not yet had a dominant influence on the pandemic, since herd immunity has not yet been attained. As population immunity increases and vaccines are deployed at scale this might change. The potential for circulating viral variants to derail promising vaccine or antibody-based prophylactics or treatments, even in the absence of selective pressure from the drug or vaccine, is demonstrated by the failure of a Phase III clinical trial of a mAb targeting the respiratory syncytial virus (Simoes et al., 2020), and the need for new influenza vaccines on a yearly basis. It is therefore critical to understand whether and how SARS-CoV-2 may evolve to evade antibody-dependent immunity. Here, we examined the immunodominant SARS-CoV-2 receptor binding motif (RBM), the primary target of the neutralizing Ab response within the RBD (Piccoli et al., 2020), and found it to be less conserved than the RBD or the entire spike protein in circulating viruses.
To understand the implications of this structural plasticity for immune evasion, we defined the clinical and epidemiological impact, the molecular features, and the immune response to an RBM variant, N439K. This variant has arisen independently twice, in both cases forming lineages of more than 500 sequences. As of October 2020, it has been observed in 12 countries and is the second most commonly observed RBD variant worldwide. We find that the N439K mutation is associated with a similar clinical spectrum of disease and slightly higher viral loads in vivo compared with isolates with the wild-type N439 residue, and that it results in immune escape from polyclonal sera from a proportion of recovered individuals and a panel of neutralizing mAbs. N439K provides a sentinel example of immune escape, indicating that RBM variants must be evaluated when considering vaccines and the therapeutic or prophylactic use of mAbs. Long term control of the pandemic will require systematic monitoring of immune escape variants and selection of strategies that address the variants circulating in targeted populations. Competing pressures influence the evolution of the spike RBM. First, the RBM mediates viral entry (Shang et al., 2020; Walls et al., 2020; Wrapp et al., 2020b) and therefore it must maintain sufficient affinity to engage the entry receptor hACE2. Second, it is a major target of neutralizing antibodies (Robbiani et al., 2020; Rogers et al., 2020; Wec et al., 2020) and could be a primary location for the emergence of immune escape mutations. We set out to understand these competing pressures by evaluating the landscape of RBM sequence divergence observed in circulating SARS-CoV-2 variants and in other viruses of the Sarbecovirus lineage. We used published X-ray structures of SARS-CoV and SARS-CoV-2 RBD:hACE2 complexes (Lan et al., 2020; Li et al., 2005) to define the RBM residues using a 6 Å distance cutoff (Figures 1A-C) . 
We evaluated ~130,000 SARS-CoV-2 genomic sequences deposited in GISAID as of October 7, 2020 and observed a high number of variants occurring in the RBM (Figure 1A) . To understand how the divergence of the RBM compares to the divergence of the entire RBD and the whole spike protein, we divided the spike protein into three non-overlapping regions: the RBM, the RBD outside of the RBM, and the full S protein outside of the RBD. We counted individual variants occurring at least ten times, and quantified substitutions of different amino acids at the same position as separate variants. We found that the RBM is the least conserved region of S ( Figure 1B) . To understand this result further, we evaluated a published deep mutational scanning (DMS) data set of the RBD and compared it to sequences of circulating viruses. The DMS data defines the effect of each possible single amino acid change on both expression of the RBD and its capacity to bind hACE2. For each position in the RBM, we compared the DMS results for all amino acid substitutions at that position versus only substitutions that have been observed in circulating SARS-CoV-2 isolates ( Figure 1C) . A subset of residues shows the largest loss of hACE2 binding upon mutation (top ~1/3 of RBM residues in Figure 1C ) and, as would be expected, few natural variants of these residues have been observed to be circulating to date. Surprisingly, these conserved residues each contribute weakly to the RBD:hACE2 total interaction energy (the sum of pair-wise interaction energies for all residues at the binding interface in the X-ray structure; "binding energy" in Figure 1C ). For the majority of the RBM (bottom ~2/3 of RBM residues in Figure 1C ), variation in circulating virus sequences confirms the tolerance to mutation predicted by the DMS data. Notably, several RBM residues forming the strongest interactions with the receptor, e.g. K417 and E484, are not highly conserved despite their predicted importance. 
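The region-by-region variant count described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `variant_density`, the input layout, and the toy observations are hypothetical and are not the authors' pipeline. Normalizing by region size follows the methods description. Each (position, amino acid) pair counts as a separate variant and is called present only if it is seen at least `min_count` times.

```python
from collections import Counter

def variant_density(observations, regions, min_count=10):
    """Count distinct variants per region, normalized by region size.

    observations: iterable of (position, alt_amino_acid), one entry per
                  sequence carrying that substitution (hypothetical layout).
    regions:      dict mapping region name -> set of residue positions;
                  regions are assumed to be non-overlapping.
    """
    counts = Counter(observations)
    density = {}
    for name, positions in regions.items():
        # Different amino acids at the same position are separate variants.
        n_variants = sum(
            1 for (pos, _alt), n in counts.items()
            if pos in positions and n >= min_count
        )
        density[name] = n_variants / len(positions)
    return density

# Toy input: region sets and counts are invented for illustration only.
toy_regions = {"RBM": {439, 477}, "RBD_non_RBM": {330, 331, 332, 333}}
toy_obs = ([(439, "K")] * 12 + [(477, "N")] * 15
           + [(477, "R")] * 3 + [(330, "S")] * 10)
toy_density = variant_density(toy_obs, toy_regions)
```

In the toy input, the substitution seen only three times falls below the threshold and is ignored, so the comparison between regions depends on the chosen `min_count`; the methods describe varying this threshold from 2 to 20.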
These results suggest that the RBM has a degree of structural plasticity whereby it is able to accommodate mutations without disrupting hACE2 binding. Evolutionary analysis of Sarbecoviruses provides further support for RBM plasticity (Li et al., 2020b; Rambaut et al., 2020). The SARS-CoV RBM is highly divergent from the SARS-CoV-2 RBM (Figure S1A-B) while maintaining hACE2 binding affinity. Additionally, there are many sequence changes in the RBM across a panel of related coronaviruses from animal isolates (Figure S1A-B, Table S1). To determine the ability of members of the Sarbecovirus lineage to bind hACE2, we produced nine recombinant RBD proteins corresponding to seven animal isolates, SARS-CoV-2, and SARS-CoV and evaluated their binding to recombinant hACE2 (Figure S1C). We found that three of the RBDs from animal isolates showed strong affinity for hACE2: GD Pangolin, which has a highly similar RBM to SARS-CoV-2, and GX Pangolin and Bat CoV WIV1, which have highly divergent RBMs (Figure S1A-B). This further supports the conclusion that the RBM is structurally plastic, while retaining binding with hACE2 as a receptor.

Given this plasticity, we next considered whether an RBM variant can lead to immune evasion while retaining virulence. The two most commonly observed circulating RBD variants as of October 2020 contain mutations in the RBM (S477N and N439K). We first identified the N439K variant in March 2020, circulating in Scotland from lineage B.1 on the background of D614G (Da Silva Filipe et al., 2020). Using phylogenetic analysis, we determined this variant represented a single lineage (Figure 2A) that increased in frequency to 553 sequences by June 20, 2020 (~10% of the available Scottish viral genome sequences for this time period). Numbers of N439K and all other isolates decreased in Scotland concurrent with control of the pandemic by initiation of stringent public health measures, and this lineage has not been detected in Scotland after June.
However, the N439K variant has been observed in >659 sequences in a second lineage in Europe, first sampled in Romania on May 13, 2020, then Norway on June 23, 2020 and is now circulating in 12 countries, as well as arising independently in the U.S. (Figure 2A-C). As of Oct 6, 2020, all 1201 N439K variants arose from a C-to-A transversion in the third codon position, though these counts are heavily influenced by sampling frequency, which varies widely between countries. As Scotland has a high sampling frequency for its population size (~5.5M), it is possible to calculate a growth rate (Volz and Frost, 2017) based on a comparison of the Scottish lineages. We find that the growth rate is similar to what has already been shown for the D614G background, with no evidence for a faster rate of growth than N439 lineages (Figure S2A). In addition to its frequency and spread, the N439K variant stood out from other circulating RBM variants as having a plausible mechanism for maintenance of viral fitness. The equivalent position to N439K in the SARS-CoV RBM is also a positively-charged amino acid (R426), which forms a salt bridge with hACE2 (Li et al., 2005). We therefore hypothesized that the N439K SARS-CoV-2 variant may form this additional salt bridge at the RBD-hACE2 interface (RBD N439K:hACE2 E329). Structural modeling supported that this salt bridge could form without disrupting the binding interface, including the two original salt bridges (RBD K417:hACE2 D30 and RBD E484:hACE2 K31) (Figure 3A-C). A salt bridge is the strongest type of non-covalent bond and the N439K mutation could plausibly increase the number of salt bridges at the binding interface from two to three, presenting the hypothesis that the N439K variant may have enhanced binding for hACE2. To test this hypothesis, we used surface plasmon resonance (SPR) to evaluate binding of recombinant N439K S or RBD protein to recombinant hACE2.
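The codon-level change behind N439K can be made concrete with a minimal sketch. Under the standard genetic code, asparagine is encoded by AAT/AAC and lysine by AAA/AAG, so a C-to-A change at the third codon position implies AAC to AAA. The wild-type codon used here is inferred from the genetic code rather than quoted from the text, and the helper names are hypothetical.

```python
# Subset of the standard genetic code needed for this example (DNA alphabet).
CODON_TABLE = {"AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K"}

PURINES = {"A", "G"}

def is_transversion(ref_base, alt_base):
    """A transversion swaps a purine for a pyrimidine or vice versa."""
    return (ref_base in PURINES) != (alt_base in PURINES)

def mutate_third_position(codon, alt_base):
    """Return the codon with its third base replaced by alt_base."""
    return codon[:2] + alt_base

wt_codon = "AAC"  # assumption: the only Asn codon with C in third position
mut_codon = mutate_third_position(wt_codon, "A")
wt_aa = CODON_TABLE[wt_codon]
mut_aa = CODON_TABLE[mut_codon]
c_to_a_is_transversion = is_transversion("C", "A")
```

Third-position changes are often synonymous, so this substitution is notable: asparagine and lysine share the first two bases and differ only in whether the third base is a pyrimidine or a purine.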
We also evaluated the N439R and K417V variants, each of which is found in SARS-CoV at these positions. Across multiple assay formats, we found that the N439K and N439R variants exhibited a ~2-fold enhanced binding affinity for hACE2 as compared to the original N439 variant (termed herein WT) ( Figure 3D ). The magnitude of this enhancement was paralleled by a ~2-fold loss of binding affinity for the K417V variant relative to WT. Lastly, we also tested the effect of the N439K/R and K417V mutations in combination. These double variants form the same number of salt bridges at the hACE2 binding interface as compared to WT, but one is at RBD position 439 rather than 417; we found they had an hACE2 affinity similar to the WT ( Figure 3D ). These data indicate that acquisition of the N439K mutation enhances binding affinity, which could have implications in vivo in the context of natural infection. Also, the enhanced affinity could plausibly compensate for other mutations that would otherwise be detrimental (e.g. K417V), further highlighting the plasticity of the RBM. The enhanced hACE2 affinity of the N439K variant, its geographical emergence as independent lineages as well as its prevalence among circulating viral isolates is consistent with maintained viral fitness. We set out to directly examine fitness by evaluating clinical data and outcomes of virus carrying the N439K mutation versus WT N439, as well as by direct in vitro viral growth and competition. We used qPCR to evaluate viral load (as measured by cycle threshold, Ct) in 1,918 Scottish patients whose viral isolates had been sequenced (Figures 4A-B ). Viral isolates were either N439K/D614G (n=406), N439/D614G (n=978) or ancestral (N439/D614) (n=534). Our analysis found strong evidence that the N439K/D614G genotype was associated with marginally lower cycle threshold (Ct) than the N439/D614G genotype (mean Ct value difference between N439K/D614G and N439/D614G: -0.65, 95% CI: -1.22, -0.07) ( Figure 4B ). 
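To put the ~2-fold affinity changes measured by SPR on an energy scale, one can use the standard relation ΔΔG = RT ln(KD,variant / KD,WT). The sketch below is illustrative only: the fold changes are the approximate values quoted above, the temperature is the 25 °C used for the SPR measurements, and treating the two mutations as additive is an assumption made here to show why the double variants might be expected to resemble WT.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_kd_ratio(kd_variant_over_wt, temp_c=25.0):
    """Change in binding free energy (kcal/mol) from a KD ratio.

    Negative values mean tighter binding than WT.
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_variant_over_wt)

ddg_n439k = ddg_from_kd_ratio(0.5)   # ~2-fold tighter binding than WT
ddg_k417v = ddg_from_kd_ratio(2.0)   # ~2-fold weaker binding than WT
ddg_double = ddg_n439k + ddg_k417v   # additivity is an assumption
```

A 2-fold change corresponds to roughly 0.4 kcal/mol at 25 °C, which is small relative to the energy usually attributed to a fully formed salt bridge. Under the additivity assumption, the two effects cancel, consistent with the WT-like affinity reported for the double variants.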
As Ct measurements were carried out in multiple sites, a sub-analysis of viral load using RNA standards was carried out with available samples and showed a near-complete correlation with Ct ( Figure 4B ). D614G has previously been associated with higher viral loads/lower Ct values than D614 (Korber et al., 2020) but we did not detect this difference in this statistical analysis due to the intercept of the model being imprecisely estimated (Table S2) . Clinical outcomes were also obtained for a subset of these patients (n=1,591), who were scored for severity of disease based on oxygen requirement: 1. No respiratory support, 2: Supplemental oxygen, 3: Invasive or non-invasive ventilation or high flow nasal cannulae, 4: Death (Figures 4C and S2B ). Genotype counts for this analysis were N439K/D614G (n=399), D614G (with N439) (n=735) or ancestral (N439/D614) (n=457). Analysis based on our ordinal scale indicated that the N439K/D614G viral genotype was associated with similar clinical outcomes compared to D614G or ancestral genotypes (posterior mean: 0.06, 95% CI: -1.21, 1.33) ( Table S3 ). All other results from the severity analysis were qualitatively similar to a previous analysis of the D614G mutation (Volz et al., 2020) . These clinical data indicate that the N439K virus is not attenuated. We next tested growth of two representative SARS-CoV-2 isolates, GLA1 (WT N439) and GLA2 (N439K), both with the D614G background (Table S4) . Culture was carried out for 72 hours in Vero E6-ACE2 cells either with or without TMPRSS2 expression. There was no significant difference between the growth of these strains after inoculation at multiplicities of infection (MOIs) of 0.005 and 0.01. The N439K strain replicated slightly faster early after inoculation ( Figure 4D ). These data indicate that the N439K mutation does not exhibit dominant negative effects on viral growth, and most likely supports normal replication. 
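The Ct difference reported for N439K/D614G versus N439/D614G can be translated into an approximate fold change in template quantity. This is a rough sketch, not the authors' statistical model: it assumes perfect PCR efficiency (a doubling per cycle), which real assays only approximate, and the function name is hypothetical.

```python
def ct_diff_to_fold(delta_ct, efficiency=1.0):
    """Approximate fold change in template for a Ct difference.

    delta_ct:   Ct(group A) - Ct(group B); negative means A has more template.
    efficiency: amplification efficiency per cycle (1.0 = perfect doubling,
                an idealizing assumption).
    """
    return (1.0 + efficiency) ** (-delta_ct)

fold_mean = ct_diff_to_fold(-0.65)   # reported mean Ct difference
fold_low = ct_diff_to_fold(-0.07)    # upper bound of the reported 95% CI
fold_high = ct_diff_to_fold(-1.22)   # lower bound of the reported 95% CI
```

Under this idealization the mean difference corresponds to roughly 1.6-fold more template, with an interval from about 1.05-fold to 2.3-fold, which is in line with the description of the Ct difference as marginal.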
To further assess fitness for replication in cultured cells, we carried out a cross-competition assay using inoculation of cells at a matched MOI followed by quantitation of N439 and N439K by metagenomic NGS over time (Figure 4E ). The N439K strain demonstrated similar fitness as the WT N439 strain, with a possible fitness advantage for N439K in cells expressing TMPRSS2. Taken together with the clinical outcomes, these results indicate that the N439K mutation results in viral fitness that is similar or possibly slightly improved compared to the wild-type N439. Having established that virus carrying the N439K mutation is fit, we sought to understand whether this mutation evades antibody-mediated immunity by evaluating recognition of the N439K variant by monoclonal antibodies and by polyclonal immune serum from 445 recovered individuals, including 6 donors who were infected by the SARS-CoV-2 N439K variant. 7.4% of the tested sera showed a greater than 2-fold reduction in binding to N439K RBD as compared to WT RBD (Figures 5A-B and S3) . In some individuals the RBD response was diminished to low titers of <1:30 by the N439K mutation. Thus, the response to the RBD is significantly influenced by the N439K mutant within the immunodominant RBM domain (Piccoli et al., 2020) in a significant portion of persons potentially immune to WT SARS-CoV-2. The majority of sera demonstrating loss of binding were those that had overall lower responses to WT RBD, indicating lower Ab titers. The sera from the six individuals known to have recovered from infection with SARS-CoV-2 N439K virus showed no change in binding levels to WT RBD as compared to N439K RBD (Figures 5A-B and S3) . This may reflect a true variant-specific response or that differential binding could not be measured due to the limited number of samples analyzed. 
To understand our results at the level of individual antibodies, we evaluated a panel of 144 mAbs isolated from individuals recovered from SARS-CoV-2 infection early in the pandemic (likely with N439 WT virus) (Piccoli et al., 2020; Tortorici et al., 2020), as well as clinical-stage mAbs REGN10933, REGN10987, LY-CoV555, and S309 (the parent of VIR-7831) (Hansen et al., 2020; Chen et al., 2020; Pinto et al., 2020). 15.5% of these mAbs demonstrated a >2-fold reduction of RBD binding in response to the N439K mutation (Figures 5C-D and S4). For comparison, we also evaluated the K417V mutation, which eliminates one salt bridge at the RBM:hACE2 interface, and the N439K/K417V double mutation. A similar percentage (12.8% for K417V vs 15.5% for N439K) of mAbs lost >2-fold binding to these variants, including several (13.5%) which were not sensitive to either single mutant but were sensitive to the double mutant (Figures 5C-D). The reduced binding of mAbs to these RBD mutants was also confirmed by bio-layer interferometry (BLI) analysis (Figures 5E and S5A). To define the potential biological importance of these mutations for evasion of antibody-mediated neutralization, we tested mAbs against pseudoviruses expressing S variants N439K, K417V or N439K/K417V (Figures 5F-H and S5B). Neutralization of pseudoviruses containing these mutations was significantly diminished for certain mAbs, including some that are in clinical development. As predicted by its non-RBM epitope, S309 was capable of neutralizing each of these variants. Sensitivity of some neutralizing mAbs to mutations at these positions has also been reported in other studies (Greaney et al., 2020; Li et al., 2020a; Weisblum et al., 2020), but combinations of mutations have not typically been evaluated. Overall, our results demonstrate that mutations compatible with viral fitness can result in immune evasion from both monoclonal and polyclonal antibody responses.
The evolution of the SARS-CoV-2 RBM, a critical epitope for vaccine response and therapeutic mAbs, will depend on the fitness of RBM variants. The findings herein describe an example of a naturally-occurring RBM variant which can evade antibody-mediated immunity while maintaining fitness. Fitness of this variant, N439K, was demonstrated by repeated emergence by convergent evolution, spread to multiple countries and significant representation in the SARS-CoV-2 sequence databases, the fact that the N439K RBD retains a high affinity interaction with the hACE2 receptor, efficient viral replication in cultured cells, and no disease attenuation in a large cohort of infected individuals. The fitness of N439K is consistent with our findings that the RBM is the most divergent region of S. This divergence indicates an ability of SARS-CoV-2 to accommodate mutations at the RBM while retaining the functional requirement of hACE2 binding, and is likely to be linked to immune pressure from neutralizing Ab responses. There is precedent for the most immunogenic region of a viral surface protein to be the fastest mutating despite harboring the receptor binding site; for example, the immunogenic globular head domain of the influenza virus hemagglutinin surface protein, which contains the sialic acid receptor binding site, evolves faster than the stalk region (Doud et al., 2018; Kirkpatrick et al., 2018) . The ability to accommodate mutations in the RBM indicates a high likelihood that immune-evading SARS-CoV-2 variants compatible with fitness will continue to emerge, with implications for reinfection, vaccines, and both monoclonal and polyclonal antibody therapeutics. In our profile of immune escape from the N439K variant, we observed resistance to a mAb currently being evaluated in clinical trials as part of a two-mAb cocktail. The promise of using cocktails of mAbs is that they should significantly lower the likelihood of drug-induced selection of resistant viruses . 
However, if circulating viral strains already carry resistant mutations to one antibody in the cocktail, this could reduce the cocktail to a monotherapy. Additionally, considering the high level of plasticity of the RBM demonstrated in the present study, there could be many combinations of RBM mutations compatible with viral fitness while leading to immune escape. This is supported by our result that N439K can compensate for a mutation (K417V) that otherwise decreases receptor binding affinity (Figure 3D). This particular combination of mutations is plausibly compatible with fitness as it parallels SARS-CoV RBM:hACE2 interactions (salt bridge at SARS-CoV RBD position R426 and no salt bridge at V404, Figure 3A). Notably, several mAbs which were not sensitive to these mutations individually were sensitive to them in combination, including the two-mAb cocktail (Figure 5C-H). We propose two approaches that will be critical for minimizing the impact of mAb escape mutations. One is to develop mAbs with epitopes that are highly resistant to viral escape. This may include epitopes outside of the RBM and/or epitopes that are cross-reactive across SARS-CoV and SARS-CoV-2, indicating conserved epitopes with a low tolerance for mutation (Wec et al., 2020; Wrapp et al., 2020a). A comparison of epitopes of RBM-targeting mAbs with the most conserved regions of the RBM (Figure 1C) may also identify RBM mAbs with a higher barrier to escape. The second approach is to screen patients, likely at the population level, for the presence of potential resistance variants prior to drug administration. The availability of multiple different mAb therapeutics in the clinic could provide the opportunity to tailor the choice of therapeutic to local circulating variants.
In general, given that access to therapeutic monoclonal antibodies via clinical trials and emergency use authorization is expanding, and as more people develop immune responses to the wildtype virus, monitoring the evolution of SARS-CoV-2 will be increasingly critical. Although SARS-CoV-2 is evolving slowly and at present should be controllable by a single vaccine (Dearlove et al., 2020), variation accumulating in the RBM could put this at risk, especially for individuals with a moderate Ab response to vaccination or infection. While we only report on evasion of antibody-mediated immunity here, it would be surprising to us if similar changes are not observed to evade T cell immunity and innate immunity.

Wec, A.Z., Wrapp, D., Herbert, A.S., Maurer, D.P., Haslwanter, D., Sakharkar, M., Jangra, R.K., Dieterle, M.E., Lilov, A., Huang, D., et al. (2020). Broad neutralization of SARS-related viruses by human monoclonal antibodies. Science 369, 731-736.
Weisblum, Y., Schmidt, F., Zhang, F., DaSilva, J., Poston, D., Lorenzi, J.C.C., Muecksch, F., Rutkowska, M., Hoffmann, H.-H., Michailidis, E., et al. (2020). Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants.
(2020). A new coronavirus associated with human respiratory disease in China. Nature 579, 265-269.
Yurkovetskiy, L., Wang, X., Pascal, K.E., Tomkins-Tinch, C., Nyalile, T.P., Wang, Y., Baum, A., Diehl, W.E., Dauphin, A., Carbone, C., et al. (2020). Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant. Cell.
Zhang, L., Jackson, C.B., Mou, H., Ojha, A., Rangarajan, E.S., Izard, T., Farzan, M., and Choe, H. (2020). The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. https://www.biorxiv.org/content/10.1101/2020.06.12.148726v1.
Samples from 439 SARS-CoV-2 infected individuals were obtained from the Ticino healthcare workers cohort (Switzerland), described previously (Piccoli et al., 2020), and under study protocols approved by the local Institutional Review Board (Canton Ticino Ethics Committee, Switzerland). All donors provided written informed consent for the use of blood and blood components (such as PBMCs, sera or plasma). In the Ticino region of Switzerland and during the time period of collection (February-March 2020) no N439K SARS-CoV-2 isolates were reported. Samples from six N439K variant infected individuals were obtained from the ISARIC4C consortium (https://isaric4c.net/). Ethical approval was given by the South Central-Oxford C Research Ethics Committee in England (reference 13/SC/0149), and by the Scotland A Research Ethics Committee (reference 20/SS/0028). The study was registered at https://www.isrctn.com/ISRCTN66726260. Residual nucleic acid extracts derived from the nose-throat swabs of 1918 SARS-CoV-2 positive individuals whose diagnostic samples were submitted to the West of Scotland specialist virology centre between 3rd March and 30th June 2020 were sequenced as part of the COG-UK consortium under study protocols approved by the relevant national biorepositories (16/WS/0207NHS and 10/S1402/33) (consortiumcontact@cogconsortium.uk, 2020).

RBM residues were determined based on the RBD:ACE2 complex crystal structures 2AJF for SARS-CoV (Li et al., 2005) and 6M0J for SARS-CoV-2 (Lan et al., 2020). The 2AJF structure was obtained from the PDB-REDO server (pdb-redo.eu) and was subsequently prepared in the molecular modeling software MOE (v2019.0102, https://www.chemcomp.com) using the structure preparation, protonation and energy minimization steps with default settings. RBD residues within 6.0 Å distance of any ACE2 atoms (determined using MOE) were determined for each of the two copies of the complex in the asymmetric unit, and then were combined to obtain the RBM.
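The distance-cutoff definition of the RBM can be illustrated with a small, dependency-free sketch. The authors performed this step in MOE; the code below is a plain-Python stand-in and does not reproduce MOE's interface. The function names, the input layout, and the toy coordinates are all hypothetical, and the residue numbers in the toy data are only labels.

```python
import math

def interface_residues(rbd_atoms, ace2_atoms, cutoff=6.0):
    """Return sorted RBD residue numbers with any atom within `cutoff`
    (in angstroms) of any ACE2 atom.

    rbd_atoms:  list of (residue_number, (x, y, z)) tuples
    ace2_atoms: list of (x, y, z) tuples
    """
    hits = set()
    for resid, xyz in rbd_atoms:
        if resid in hits:
            continue  # residue already flagged by another of its atoms
        if any(math.dist(xyz, other) <= cutoff for other in ace2_atoms):
            hits.add(resid)
    return sorted(hits)

def combine_copies(*residue_lists):
    """Union of contact residues found in each copy of the complex."""
    return sorted(set().union(*residue_lists))

# Toy coordinates, invented for illustration only.
rbd = [(417, (0.0, 0.0, 0.0)), (439, (5.0, 0.0, 0.0)), (500, (20.0, 0.0, 0.0))]
copy_a = interface_residues(rbd, [(0.0, 5.0, 0.0), (9.0, 0.0, 0.0)])
copy_b = interface_residues(rbd, [(20.0, 4.0, 0.0)])
rbm = combine_copies(copy_a, copy_b)
```

Taking the union across copies, as the text describes for the two complexes in the asymmetric unit, means a residue is included if it contacts ACE2 in either copy, which makes the definition tolerant of small conformational differences between copies.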
6M0J was obtained from the Coronavirus Structural Task Force server (https://github.com/thornlab/coronavirus_structural_task_force) and was further refined (using Refmac5 v5.8.0258), manually fitted (using Coot v0.9) and prepared (using MOE, as described above) in multiple iterative cycles. The final structure was analyzed for RBD-ACE2 contact residues with a 6.0 Å cutoff to obtain the RBM (using MOE). The final list of RBM residues (Figure 1C) was arrived at by combining the SARS-CoV and SARS-CoV-2 results. Using MOE, the pairwise binding energy between each residue in SARS-CoV-2 RBD and each residue in ACE2, and the total binding energy for all interactions, was determined at cutoff distances 3.0 Å, 3.5 Å, 4.0 Å, 4.5 Å, 5.0 Å, 5.5 Å, 6.0 Å, 6.5 Å and 7.0 Å. The percentage of the total binding energy for each interacting RBD residue was calculated for each distance cutoff and was then averaged over all cutoffs. The resulting values are shown in green in Figure 1C.

Differential accumulation of amino acid variants in the RBM, RBD or Spike protein was computed taking into account only the presence or absence of a variant at any residue. Each variant called present counts as one. A variant is called present if there are at least X supporting sequences deposited in GISAID, where X varies from 2 to 20. The number of variants is then normalized to the size of the domain (number of residues). DMS data was retrieved from . Variant-level DMS scores were aggregated to residue-level by taking the minimum (most disruptive variant) or the average score across all variants of a residue, except for the reference residue and the stop codon. Alternatively, minimum and average scores are computed only across variants that have been observed as naturally occurring.
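The residue-level aggregation of DMS scores described above can be sketched as follows. The function name, the dictionary layout, and the example scores are hypothetical and invented for illustration; they are not values from the DMS data set. The `"*"` key stands for the stop codon.

```python
def aggregate_dms(scores, reference_aa, observed=None):
    """Aggregate variant-level DMS scores at one position.

    scores:       dict amino_acid -> score ('*' denotes the stop codon)
    reference_aa: the reference residue, excluded from aggregation
    observed:     optional set of amino acids seen in circulating viruses;
                  if given, only those variants are aggregated
    Returns (minimum, mean), or None if no variant qualifies.
    """
    kept = {
        aa: s for aa, s in scores.items()
        if aa != reference_aa and aa != "*"
    }
    if observed is not None:
        kept = {aa: s for aa, s in kept.items() if aa in observed}
    if not kept:
        return None
    values = list(kept.values())
    return min(values), sum(values) / len(values)

# Invented scores for a single position; not real DMS values.
toy_scores = {"N": 0.0, "K": 0.04, "R": -0.1, "P": -2.0, "*": -4.0}
all_min, all_mean = aggregate_dms(toy_scores, reference_aa="N")
obs_min, obs_mean = aggregate_dms(toy_scores, reference_aa="N", observed={"K"})
```

Comparing the two aggregations at each position is what distinguishes residues that are intolerant to most substitutions from residues where only the naturally observed substitutions are well tolerated.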
Data were represented as a heatmap annotated with: frequency of non-reference amino acids in deposited GISAID sequences (n ≈ 130,000, at least 4 sequences were required to call a variant as present), on a log10 scale; number of countries in which a variant was observed; and percentage of total binding energy computed from an X-ray crystal structure (cf. structural analysis methods section).

Prefusion-stabilized SARS-CoV-2 spike protein variants (residues 14-1211, containing the 2P and furin cleavage site mutations, with a mu-phosphatase signal sequence and a C-terminal Avi-8xHis-EPEA-tag) in a pD2610-V5 vector (ATUM Bio) were expressed in Expi293F cells at 37°C and 8% CO2 according to manufacturer's instructions (Thermo Fisher Scientific). Cell culture supernatant was collected after four days and purified over a 5 mL C-tag affinity matrix (Thermo Fisher Scientific). Elution fractions were concentrated and injected on a Superose 6 Increase 10/300 GL column with 1x PBS pH 7.4 as running buffer. SARS-CoV-2 RBD variants (residues 328-531 with a C-terminal thrombin-cleavage site-TwinStrep-8xHis-tag, and N-terminal signal sequence) were expressed in Expi293F cells at 37°C and 8% CO2 in a humidified incubator. Transfection was performed using ExpiFectamine 293 reagent (Thermo Fisher Scientific). Cell culture supernatant was collected three days after transfection and supplemented with 10x PBS to a final concentration of 2.5x PBS (342.5 mM NaCl, 6.75 mM KCl and 29.75 mM phosphates), or 3.2x for RBD N439R. SARS-CoV-2 RBDs were purified using 1 or 5 ml HisTALON superflow cartridges (Takara Bio) and subsequently buffer exchanged into Cytiva 1x HBS-N buffer or PBS. RBDs from other sarbecoviruses were expressed in Expi293F cells at 37°C and 8% CO2. Cells were transfected using PEI MAX. Cell culture supernatant was collected seven days after transfection.
Proteins were purified using a 5 ml Strep-Tactin XT Superflow high capacity cartridge followed by buffer exchange to PBS using HiPrep 26/10 desalting columns. For S binding measurements, recombinant ACE2 (residues 19-615 from Uniprot Q9BYF1 with a C-terminal thrombin cleavage site-TwinStrep-10xHis-GGG-tag, and N-terminal signal sequence) was expressed in Expi293 cells at 37°C and 8% CO2 in a humidified incubator. Transfection was performed using ExpiFectamine 293 reagent (Thermo Fisher Scientific). Cell culture supernatant was collected seven days after transfection, supplemented with buffer to a final concentration of 80 mM Tris-HCl pH 8.0, 100 mM NaCl, and then incubated with BioLock solution for one hour. After filtration through a 0.22 µm filter, ACE2 was purified using a 1 ml StrepTrap HP column (Cytiva) followed by isolation of the monomeric ACE2 by size exclusion chromatography using a Superdex 200 Increase 10/300 GL column pre-equilibrated in PBS (Gibco 10010-023). For binding measurements with surface-captured RBD, recombinant ACE2 (residues 19-615 from Uniprot Q9BYF1 with a C-terminal AviTag-10xHis-GGG-tag, and N-terminal signal sequence) was expressed in HEK293.sus using standard methods (ATUM Bio). Protein was purified via Ni Sepharose resin followed by isolation of the monomeric ACE2 by size exclusion chromatography using a Superdex 200 Increase 10/300 GL column pre-equilibrated with PBS. For binding measurements with surface-captured ACE2, recombinant ACE2 (residues 18-615 with a C-terminal GS-IgG2a-Mm-Fc tag, and N-terminal signal sequence) was stably transfected in CHO-K1 GS knock-down cell line (ATUM Bio). Protein was purified via protein A and buffer exchanged into PBS. SPR binding measurements were performed using a Biacore T200 instrument.
S protein was surface captured via anti-AviTag pAb covalently immobilized on a CM5 chip, RBD protein was surface captured via StrepTactin XT covalently immobilized on a CM5 chip, and ACE2-mFc was surface captured via covalent immobilization of the Cytiva Mouse antibody capture kit on a C1 chip. Running buffer was Cytiva HBS-EP+ (pH 7.4) and all measurements were performed at 25 °C. All experiments were performed as single-cycle kinetics, with a 3-fold dilution series of monomeric ACE2 starting from 300 nM, each concentration injected for 180 sec, or a 3-fold dilution series of RBD starting from 50 nM, each concentration injected for 240 sec. All data were double reference-subtracted and fit to a binding model using Biacore Evaluation software. For one representative replicate, capture levels were normalized to WT for visualization. Binding data with ACE2 as analyte were fit to a 1:1 binding model. Binding data with RBD as analyte were fit to a Heterogeneous Ligand binding model, due to an artifactual kinetic phase with very slow dissociation that arises when RBD is an analyte; the lower affinity of the two KDs reported by the fit is reported as the KD of the RBD-ACE2 interaction (the two reported KDs are separated by at least two orders of magnitude for all fits). The measured KD for ACE2 binding to S is likely influenced by conformational dynamics of the RBDs in the context of the prefusion S trimer. Reported KDs are an average of 3-4 replicates measured on at least two separate days, with error given as SEM. A national sequencing collaboration formed at the start of the epidemic in the UK, the COG-UK consortium (consortiumcontact@cogconsortium.uk, 2020), has facilitated the tracking of SARS-CoV-2 sequences across Scotland since the start of the outbreak in February 2020 (6,825 sequences by Oct 6, 2020) and real-time monitoring of genetic changes in the Spike gene that might be associated with changes in virulence or transmissibility.
Sequencing was carried out using an amplicon-based protocol in real-time at a rate of up to 300 genomes per week. 50% of samples were selected as surveillance samples, representing Scottish health boards proportionately based on population size, while 50% were selected to allow intervention with local issues such as nosocomial infection in hospitals and nursing homes. The N439K polymorphism was noted to become increasingly prevalent during April 2020. It was particularly common in the Greater Glasgow & Clyde NHS health board region but also spread to adjacent Scottish health boards. Sequencing libraries were prepared according to the ARTIC nCoV-2019 protocol described in detail at https://artic.network/ncov-2019. Briefly, PCR amplicons were generated using the nCoV-2019 PrimalSeq sequencing primers using 25-35 cycles of amplification. Generated amplicons were used to prepare either Oxford Nanopore or Illumina sequencing libraries. Oxford Nanopore libraries were prepared as described in the link above and sequenced in a flow cell R9.4.1 (Oxford Nanopore Technologies, Part Number FLO-MIN106D), using MinKNOW version 19.12.6. Raw FAST5 files were basecalled using Guppy version 3.2.10 in high accuracy mode with a minimum quality score of 7. Reads were size filtered, demultiplexed and trimmed with Porechop (https://github.com/rrwick/Porechop), and mapped against reference strain Wuhan-Hu-1 (MN908947). Variants were called using Nanopolish 0.11.3 and accepted if they had a log-likelihood score of greater than 200 and minimum read coverage of 20. For Illumina sequencing, amplicons were used to prepare libraries using the Kapa HyperPrep kit (Kapa Biosystems, Part Number KK8504) and further processed as described in the competition assay sequencing method. Sequencing was carried out on Illumina's MiSeq system (Illumina, Part Number SY-410-1003) using a MiSeq Reagent v2 500 cycle kit (Illumina, Part Number MS-102-2003).
Reads were trimmed with trim_galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and mapped with BWA (Li and Durbin, 2009) to the Wuhan-Hu-1 (MN908947) reference sequence, followed by primer trimming and consensus calling with iVar (Grubaugh et al., 2019) and a minimum read coverage of 10. UK sequences were obtained from the COG-UK consortium, https://www.cogconsortium.uk. Global sequences were obtained from the GISAID Initiative, https://www.gisaid.org on Oct 19 2020. The sequences were mapped using minimap2 and padded against the Wuhan/WH04/2020 reference. The sequences were downsampled with weights that normalise sequence count per epiweek, maximise the number of countries and lineages represented, and enrich for sequences with the N439K mutation. A maximum-likelihood phylogenetic tree was constructed using IQ-TREE with the following parameters: -czb -blmin 0.0000000001 -m HKY --runs 5 and all other parameters set to default. The tree was visualised with custom python code using the baltic library, https://github.com/evogytis/baltic. For the phylodynamic analysis, Scottish "introduction" lineages were identified (Lycett et al., 2020, in prep and see http://sars2.cvr.gla.ac.uk/RiseFallScotCOVID), and the skygrowth package in R was used to estimate the effective population size over time, and the growth rate of the lineage within Scotland (Volz and Frost, 2017). Clinical samples submitted to the West of Scotland Specialist Virology Centre for SARS-CoV-2 diagnostic RT-PCR testing were selected for sequencing as part of the COVID-19 Genomics UK Consortium (COG-UK) project, resulting in 1918 whole genome sequences originating from the NHS Greater Glasgow and Clyde Health Board region. Sequences were linked to electronic patient records and basic metadata including sample date, age, sex, admission to hospital and mortality at 28 days post diagnosis extracted.
The electronic patient records of a subset of 1591 patients underwent full casenote review and clinical severity was recorded based on a 4-level ordinal scale: 1. no requirement for respiratory support, 2. treatment with supplemental oxygen via facemask or low-flow nasal cannulae, 3. intubation and ventilation, non-invasive ventilation or oxygen delivery by high flow nasal cannulae devices, 4. death within the 28 days following diagnosis. We modified the WHO ordinal scale to these 4 points as described previously (Volz et al., 2020) to avoid using hospitalisation as a criterion of severity because 1) many patients in nursing homes had severe infection but were not admitted to hospital, and 2) early in the outbreak, all cases were hospitalised irrespective of the severity of their infection. These data had previously been analysed to test for an effect of the D614G mutation on the severity of disease (Volz et al., 2020) ; we extend that analysis here using the same methodology to test for an effect of the N439K mutation. Additionally, we perform a new analysis using a model with the same structure to test for an effect of both the D614G mutation and the D614G/N439K mutation combination on the viral load of infected patients, as measured by cycle threshold value. In both cases we cannot estimate the marginal effect of the N439K mutation, as we only have the mutation on the 614G genetic background, so the individual effect of N439K cannot be separated from any potential epistatic interactions between the mutations. Briefly, the structure of the model used previously (Volz et al., 2020) and in the present study is a phylogenetic generalised additive model with mutation being the primary predictor of interest. The model controls for biological sex, age and the number of days since the first reported case in the dataset, with the latter two being included as penalised splines with a maximum of 30 knots. 
If the patient was part of a cluster of cases, this was included as a random effect, with individuals not part of clusters being assigned their own levels. Correlations driven by the rest of the genome are controlled for by a phylogenetic random effect using a correlation matrix generated under a Brownian motion assumption from a phylogeny estimated in IQ-TREE 2 v. 2.0.6 (Minh et al., 2020) using a HKY + Γ model, masking the positions recommended by De Maio et al. as of 22/7/2020 (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/13), rooted on the first sequenced SARS-CoV-2 genome (Wu et al., 2020) . The priors for the severity model were those used in the previous analysis of this data. The priors for the model of the viral load were a student-t (mean = 20, scale = 10, degrees of freedom = 3) prior on the model intercept, a Gaussian (mean = 0, standard deviation = 10) prior over the fixed effects, and an exponential (lambda = 0.1) prior over the random effect, penalised spline and residual standard deviations. There are two key structural differences between the model used previously (Volz et al., 2020) and the model used here. Firstly, mutation is a three level rather than two level factor (D614/N439, D614G/N439 and D614G/N439K) with the ancestral D614/N439 being the reference level. Secondly, as we are now interested in two mutations, we estimated the phylogeny used to control for the effect of the rest of the genome excluding both the nucleotide position underlying the D614G mutation and the nucleotide position underlying the N439K mutation (in addition to the sites from De Maio et al mentioned above). The severity model used a cumulative error structure while the model on the CT values used a Gaussian error structure. In both cases, the models were estimated in brms v. 2.13.5 (Bürkner, 2018) . 
The presented models had no divergent transitions, Rhat values less than 1.01, and appropriate bulk and tail effective sample sizes for all parameters. Shortest probability intervals were calculated using the R package SPIn v. 1.1 (Liu et al., 2015). Analysis code is available at https://github.com/dpascall/SARS-CoV-2-mutationanalysis. All samples were tested in duplicate using the 2019-nCoV_N1 RT-qPCR assay (https://www.fda.gov/media/134922/download). Ready-mixed primers and probe were obtained from IDT (Leuven, Belgium). PCR was carried out using NEB Luna Universal Probe One-Step Reaction Mix and Enzyme Mix (New England Biolabs, Herts, UK), primers and probe at 500 nM and 127.5 nM, respectively, and 5 µl of RNA sample in a final volume of 20 µl. No template negative controls were included after every seventh sample. Six ten-fold dilutions of SARS-CoV-2 RNA standards were tested in duplicate in each assay; standards were calibrated using a plasmid containing the N sequence that had been quantified using droplet digital PCR. Thermal cycling was performed on an Applied Biosystems™ 7500 Fast PCR instrument running SDS software v2.3 (ThermoFisher Scientific) under the following conditions: 55 °C for 10 minutes and 95 °C for 1 minute followed by 45 cycles of 95 °C for 10 s and 58 °C for 1 minute. Assays were repeated if the reaction efficiency was <90% or the R2 value of the standard curve was ≤0.998. Where possible, testing of samples was repeated if the %CV of the duplicates was <10%. VeroE6-ACE2 cells (VeroE6 cells induced to overexpress ACE2) either with or without TMPRSS2 overexpression (Rihn et al., 2020 under review) were seeded in a 12-well plate and inoculated at an MOI of 0.01 with either the GLA1 (N439/D614G) or GLA2 (N439K/D614G) virus isolates for 1 h before washing the cells three times in PBS and replacing with 2% DMEM.
100 µl of media was removed at each timepoint, RNA was extracted, and the presence of SARS-CoV-2 determined using 2019-nCOV-N1 assays (IDT) with an NEB Luna Universal Probe One-Step RT-qPCR Kit. A standard curve was used to determine the copy number present per ml of cell culture media. 100 µl of the fresh media was also tested for the presence of virus, which was undetectable in all wells. Three T25 flasks were seeded with VeroE6-ACE2 or VeroE6-ACE2-TMPRSS2 cells and inoculated with either single viruses or both GLA1 and GLA2 virus strains at an MOI of 0.01 for 1 h. The flasks were washed three times with PBS, with 100 µl of the final wash being retained to determine the presence of free virus, before adding 5 ml of fresh 2% DMEM. At 24, 48, and 72 h, 500 µl of media was removed and replaced with 500 µl of fresh media. 300 µl was used for RNA extraction and NGS analysis of the frequencies of the specific positions within the spike protein. The single virus inoculations showed no alterations in the frequency of the amino acid positions, and the final wash showed no free virus in the supernatant. We used an unbiased metagenomic NGS sequencing pipeline to quantify variation across the whole viral genome on the Illumina NextSeq platform. Briefly, extracted nucleic acid was incubated with DNaseI (Thermo Fisher, Part Number AM2222) followed by cDNA synthesis using SuperScript III (Thermo Scientific, Part Number 18080044) and NEBNext Ultra II Non-Directional RNA Second Strand Synthesis Module (New England Biolabs, Part Number E6111L). Samples were further processed using the Kapa LTP Library Preparation Kit for Illumina Platforms (Kapa Biosystems, Part Number KK8232) and indexed with the NEBNext Multiplex Oligos for Illumina 96 Unique Dual Index Primer Pairs (New England Biolabs, Part Number E6442S). Libraries were sequenced on Illumina's NextSeq 550 System (Illumina, Part Number SY-415-1002), generating 10 million pairs of reads per sample.
Human mAbs were isolated from plasma cells or memory B cells of SARS-CoV or SARS-CoV-2 immune donors, as previously described (Corti et al., 2011; Pinto et al., 2020; Tortorici et al., 2020). LY-CoV555 mAb was obtained from Eli Lilly and Company. REGN10933 and REGN10987 mAbs were produced recombinantly based on published sequences (Hansen et al., 2020). A total of 148 human monoclonal antibodies and 445 human sera were tested for binding to RBD WT and mutants. Spectraplate-384 plates with high protein binding treatment (custom made from Perkin Elmer) were coated overnight at 4 °C with 0.5 µg/ml (for mAbs) or 5 µg/ml (for sera) SARS-CoV-2 RBD WT, N439K, K417V or N439K/K417V in phosphate-buffered saline (PBS), pH 7.2. Plates were subsequently blocked with Blocker Casein 1% supplemented with 0.05% Tween 20 (Sigma-Aldrich) for 1 h at room temperature. The coated plates were incubated with serial dilutions of the monoclonal antibodies or of the sera for 1 h at room temperature. The plates were then washed with PBS containing 0.1% Tween-20 (PBS-T), and alkaline phosphatase-goat anti-human IgG (Southern Biotech) was added and incubated for 1 h at room temperature. After 3 washing steps with PBS-T, p-nitrophenyl phosphate (pNPP, Sigma-Aldrich) substrate was added and incubated for 30 min at room temperature. Absorbance at 405 nm was measured with a microplate reader (Biotek). Fitting was performed using a 4-parameter logistic (4PL) model, yielding dose-response curves from which the area under the curve (AUC) between 5 and 500 ng/ml was computed. The AUC captures, in a single metric, shifts of interest in two parameters of the 4PL model: EC50 and upper asymptote. BLI binding measurement was performed on a selection of human monoclonal antibodies tested by ELISA. Antibodies were diluted to 2.7 µg/ml in kinetic buffer (PBS supplemented with 0.05% BSA) and immobilized on Protein A Biosensors of an Octet RED96 system (FortéBio).
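As an aside on the ELISA analysis described above, the following minimal Python sketch shows how an AUC between 5 and 500 ng/ml can be derived from a fitted 4PL curve, and why it responds to both EC50 and upper asymptote. The parameterization, the choice to integrate over log10 concentration, and the names `four_pl` and `auc_log10` are assumptions made for illustration; the source does not state these details.

```python
import math


def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration `conc` (> 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def auc_log10(params, lo=5.0, hi=500.0, steps=200):
    """Trapezoidal area under a 4PL curve between `lo` and `hi` (ng/ml).

    Assumption: the area is taken over log10(concentration); the source
    does not specify the integration scale.
    """
    a, b = math.log10(lo), math.log10(hi)
    h = (b - a) / steps
    ys = [four_pl(10 ** (a + i * h), *params) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# Hypothetical fitted parameters: (bottom, top, EC50 in ng/ml, Hill slope).
reference_curve = (0.0, 2.0, 50.0, 1.0)
weaker_ec50 = (0.0, 2.0, 500.0, 1.0)   # 10-fold higher EC50
lower_plateau = (0.0, 1.0, 50.0, 1.0)  # halved upper asymptote

auc_reference = auc_log10(reference_curve)
auc_weaker = auc_log10(weaker_ec50)
auc_lower = auc_log10(lower_plateau)
```

In this sketch both a higher EC50 and a lower plateau reduce the AUC relative to the reference curve, which is the property the single metric is meant to capture.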
Antibody-coated biosensors were incubated for 5 min with a solution containing 5 µg/ml of SARS-CoV-2 RBD WT, N439K, K417V or N439K/K417V in kinetic buffer. A dissociation step was then performed by incubating the biosensors for 5 min in kinetic buffer. Change in molecules bound to the biosensors caused a shift in the interference pattern that was recorded in real time and plotted using GraphPad Prism 8 software. Replication-defective VSV pseudoviruses (Takada et al., 1997) expressing SARS-CoV-2 spike protein were generated as previously described (Riblett et al., 2016) with some modifications. Plasmids encoding SARS-CoV-2 spike variants were generated by site-directed mutagenesis of the wild-type plasmid, pcDNA3.1(+)-spike-D19 (Giroglou et al., 2004). Lenti-X™ 293T cells (Takara, 632180) were seeded in 10-cm dishes at a density of 1e5 cells/cm² and the following day transfected with 5 µg of spike expression plasmid with TransIT-Lenti (Mirus, 6600) according to the manufacturer's instructions. One day post-transfection, cells were infected with VSV-luc (VSV-G) (Kerafast, EH1020-PM) for 1 h, rinsed three times with PBS, then incubated for an additional 24 h in complete media at 37°C. The cell supernatant was clarified by centrifugation, filtered (0.45 µm), aliquoted, and frozen at -80°C. Vero E6 cells (ATCC CRL-1586) were seeded into clear bottom white 96 well plates (Costar, 3903) at a density of 2e4 cells per well. The next day, mAbs were serially diluted in pre-warmed complete media, mixed at a 1:1 ratio with pseudovirus and incubated for 1 h at 37°C in round bottom polypropylene plates. Media from cells was aspirated and 50 µL of virus-mAb complexes were added to cells and then incubated for 1 h at 37°C. An additional 100 µL of prewarmed complete media was then added on top of complexes and cells incubated for an additional 16-24 h.
Conditions were tested in duplicate wells on each plate and at least six wells per plate contained uninfected, untreated cells (mock) and infected, untreated cells ('no mAb control'). Virus-mAb-containing media was then aspirated from cells and 100 µL of a 1:4 dilution of Bio-glo (Promega, G7940) in PBS was added to cells. Plates were incubated for 10 min at room temperature and then were analyzed on the Envision plate reader (PerkinElmer). The average RLU (relative light unit) value of the mock wells was subtracted from the RLUs of infected wells (background subtraction), and the result was normalized to the average of the background-subtracted "no mAb control" RLU values within each plate. Percent neutralization was calculated by subtracting the normalized mAb infection condition from 1. Data were analyzed and visualized with Prism (Version 8.4.3). IC50 values were calculated from the interpolated value from the log(inhibitor) vs. response - variable slope (four parameters) nonlinear regression with an upper constraint of <100. Each neutralization infection was conducted on three independent days. DMS score is the binding or expression fold change over WT on a Log10 scale. Aggregated DMS data is shown for each residue by taking the minimum (most disruptive variant) or the average score across all possible variants of a residue, except for the reference residue and the stop codon ('mutagenesis' columns). Alternatively, minimum and average scores are computed only across variants that have naturally occurred ('observed variants' columns). When no natural variants have been observed, cells are grey. The heatmap is annotated with frequency of non-reference amino acids in deposited sequences (at least 4 sequences were required to call a variant), in Log10 scale; number of countries in which a variant was observed; and percentage of total binding energy between RBD and hACE2 computed from an X-ray crystal structure. Data were sorted on the leftmost DMS column.
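The background subtraction and normalization used for the pseudovirus neutralization assay can be written out as a short Python sketch. This is an illustration under stated assumptions, not the authors' analysis (which was done in Prism): the function name `percent_neutralization` and the RLU values are hypothetical.

```python
from statistics import mean


def percent_neutralization(sample_rlu, mock_rlu, no_mab_rlu):
    """Convert raw RLU readings from one plate to percent neutralization.

    sample_rlu: RLUs of infected wells treated with mAb
    mock_rlu:   RLUs of uninfected, untreated wells (background)
    no_mab_rlu: RLUs of infected, untreated wells ('no mAb control')
    """
    background = mean(mock_rlu)
    # Average of the background-subtracted 'no mAb control' wells.
    control = mean(r - background for r in no_mab_rlu)
    # Normalized infection, then neutralization = 1 - normalized infection.
    return [100.0 * (1.0 - (r - background) / control) for r in sample_rlu]


# Made-up plate readings for illustration only.
neutralization = percent_neutralization(
    sample_rlu=[600, 100, 1100],
    mock_rlu=[100, 100],
    no_mab_rlu=[1100, 1100],
)
```

With these illustrative readings, a well at background level corresponds to complete neutralization and a well equal to the 'no mAb control' corresponds to none.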
(H) Correlation of ELISA-binding fold change and Neutralization fold change for each variant relative to WT (where a smaller ELISA AUC and therefore a smaller ratio represents loss of binding, and a larger IC50 and therefore a larger ratio represents loss of neutralization). Table S1. Details of the sarbecovirus sequences used for Figure S1. The top 8 sequences shaded in gray were used for the similarity plot and all 69 sequences were used for the entropy plot. Parameter estimates on the link scale from the model estimating the impact of the N439K mutation on the Ct value of patients infected with SARS-CoV-2 in Scotland. Credible intervals represent 95% shortest posterior density intervals. The difference between D614G/N439 and D614G/N439K was estimated by direct subtraction of the Hamiltonian Monte Carlo samples of the D614G/N439K estimate from the D614G/N439 estimate. Ct value did not appear strongly correlated with biological sex or age after controlling for the other factors. Patients infected with related viral genomes had correlated Ct values at testing, potentially implying that there are other undescribed mutations in the genome that are affecting the viral load. Parameter estimates on the link scale from the model estimating the impact of the N439K mutation on the severity of infection of patients infected with SARS-CoV-2 in Scotland. Credible intervals represent 95% shortest posterior density intervals. Thresholds correspond to the positions of the boundaries between the different severity classes. Amino Acid Change Gene Mutation GLA1 C3037T nsp12 P323L C14408T S D614G A23403G E V5A A24388T T26258C GLA2 C3037T nsp12 P323L C14408T nsp15 V35A T19724C S N439K C22879A S D614G A23403G ORF 10 V6F G29573T Table S4. Nucleotide Differences between GLA1 and GLA2. SNPs determined by CoV-GLUE on consensus sequences relative to Wuhan-Hu-1 (NC_045512.2).
Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic Advanced bayesian multilevel modeling with the R package brms An integrated national scale SARS-CoV-2 genomic surveillance network SARS-CoV-2 Neutralizing Antibody LY-CoV555 in Outpatients with Covid-19 A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza A hemagglutinins Genomic epidemiology of SARS-CoV-2 spread in Scotland highlights the role of European travel in COVID-19 emergence A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants How single mutations affect viral escape from broad and narrow antibodies to H1 influenza hemagglutinin Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single-blind, randomised controlled trial Retroviral vectors pseudotyped with severe acute respiratory syndrome coronavirus S protein Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar SARS-CoV-2 D614G Variant Exhibits Enhanced Replication ex vivo and Earlier Transmission in vivo D614G mutation of SARS-CoV-2 spike protein enhances viral infectivity An mRNA Vaccine against SARS-CoV-2 -Preliminary Report Neutralizing Antibodies against SARS-CoV-2 and Other Human Coronaviruses LY-CoV555, a rapidly isolated potent neutralizing antibody, provides protection in a non-human primate model of SARS-CoV-2 infection Phase 1-2 Trial of a SARS-CoV-2 Recombinant Spike Protein Nanoparticle Vaccine The influenza virus hemagglutinin head evolves faster than the stalk domain Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus Structure of 
the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Structure of SARS coronavirus spike receptorbinding domain complexed with receptor Fast and accurate short read alignment with Burrows-Wheeler transform The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity Emergence of SARS-CoV-2 through recombination and strong purifying selection Transmission dynamics and evolutionary history of 2019-nCoV Simulation-efficient shortest probability intervals Natural selection in the evolution of SARS-CoV-2 in bats, not humans Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era REFMAC5 for the refinement of macromolecular crystal structures Mapping Neutralizing and Immunodominant Sites on the SARS-CoV-2 Spike Receptor-Binding Domain by Structure-Guided High-Resolution Serology Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology A Haploid Genetic Screen Identifies Heparan Sulfate Proteoglycans Supporting Rift Valley Fever Virus Infection Convergent antibody responses to SARS-CoV-2 in convalescent individuals Coronavirus RNA Proofreading: Molecular Basis and Therapeutic Targeting Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model Structural basis of receptor recognition by SARS-CoV-2 Suptavumab for the Prevention of Medically Attended Respiratory Syncytial Virus Infection in Preterm Infants Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding A system for functional analysis of Ebola virus glycoprotein Ultrapotent human antibodies protect against SARS-CoV-2 challenge via multiple mechanisms Scalable Relaxed Clock Phylogenetic Dating 
Evaluating the effects of SARS-CoV-2 Spike mutation D614G on transmissibility Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Top - Pairwise similarity to SARS-CoV-2 (sliding window size of 30 amino acids) for seven related Sarbecoviruses (see figure key) across the RBD region of the Spike protein. Bottom - Site-specific entropy plot across the RBD protein alignment of SARS-CoV-2 and 68 related viruses (Data S1). Entropy for each position l (H(l)) was calculated using Shannon's entropy formula with a natural log as $H(l) = -\sum_{a} f(a,l) \ln f(a,l)$, where $f(a,l)$ is the frequency of amino acid $a$ at position $l$. Sites constituting the RBM are annotated in blue; the x-axis refers to absolute positions in the SARS-CoV-2 Spike protein sequence. Right - box plot of site-specific entropy values for the RBM sites (blue) and remaining non-RBM RBD sites (gray). Sequence alignment (left) and identity for RBM and RBD (right) to SARS-CoV-2 of the RBD sequences showing binding to hACE2. RBM residues indicated by blue boxes. (C) Binding of hACE2 to human, pangolin and bat Sarbecovirus RBDs by BLI. Bat CoV RaTG13 We thank all Scottish NHS virology laboratories who provided samples for sequencing and Scott Arkison for HPC maintenance. We thank Chiara Silacci-Fregni from Humabs BioMed, Sandra Jovic, Blanca Fernandez Rodriguez, Federico Mele, from the Institute for Research in Biomedicine in Bellinzona and Tatiana Terrot from Ente Ospedaliero Cantonale in Lugano for the help in collecting sera samples. We thank Cindy Ng for help with protein production. We thank Julia Di Iulio for help with analyzing GISAID sequences. We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GISAID, https://www.gisaid.org, on which much of this research is based. The ISARIC WHO CCP-UK study protocol is available at https://isaric4c.net/protocols; study registry https://www.isrctn.com/ISRCTN66726260. This work uses data provided by patients and collected by the NHS as part of their care and support #DataSavesLives.
We are grateful to the 2648 frontline NHS clinical and research staff and volunteer medical students who collected the data in challenging circumstances; and the generosity of the participants and their families for their individual contributions in these difficult times. We also acknowledge the support of Jeremy J Farrar, Nahoko Shindo, Devika Dixit, Nipunie Rajapakse, Lyndsey