key: cord-0755475-h9ds8qwl
authors: Golubchik, T.; Lythgoe, K. A.; Hall, M. D.; Ferretti, L.; Fryer, H. R.; MacInyre-Cockett, G.; de Cesare, M.; Trebes, A.; Piazza, P.; Buck, D.; Todd, J. A.; The COVID-19 Genomics UK consortium; Fraser, C.; Bonsall, D.
title: Early Analysis of a potential link between viral load and the N501Y mutation in the SARS-COV-2 spike protein
date: 2021-01-15
journal: nan
DOI: 10.1101/2021.01.12.20249080
sha: 0252dcb6b38cd1869db79c6a5c362e4310e6f684
doc_id: 755475
cord_uid: h9ds8qwl
A new variant of SARS-CoV-2 has emerged which is increasing in frequency, primarily in the South East of England (lineage B.1.1.7 (1); VUI-202012/01). One potential hypothesis is that infection with the new variant results in higher viral loads, which in turn may make the virus more transmissible. We found higher (sequence derived) viral loads in samples from individuals infected with the new variant. Median inferred viral loads were three-fold higher in individuals with the new variant (Fig. 1). Most of the new variants were sampled in Kent and Greater London. We observed higher viral loads in Kent compared to Greater London for both the new variant and other circulating lineages. Outside Greater London, the variant has higher viral loads. Within Greater London, the new variant does not have significantly higher viral loads compared to other circulating lineages. Higher variant viral loads outside Greater London could be due to demographic effects, such as a faster variant growth rate compared to other lineages or concentration in particular age-groups. Our analysis does not exclude a causal link between infection with the new variant and higher viral loads.
This is a preliminary analysis and further work is needed to investigate any potential causal link between infection with this new variant and higher viral loads, and whether this results in higher transmissibility, severity of infection, or affects relative rates of symptomatic and asymptomatic infection.
This is an updated version of a report submitted to NERVTAG in December 2020 as part of urgent investigations into the new variant of SARS-CoV-2 (VUI-202012/01). It makes full use of (and is restricted to) all sequence data and associated metadata available to us at the time the original report was submitted and remains provisional. Under normal circumstances more genomes and metadata would be obtained and included before making this report public. We will update this preprint when more genomes and metadata are available and before submitting for peer review.
On 14 December 2020 a new variant of SARS-CoV-2 circulating in the UK was reported (2, 3), characterised by the N501Y mutation in the receptor binding domain (RBD) of Spike, the ΔH69/V70 deletion, and numerous other mutations (1). The rise in frequency of this variant is associated with a sharp increase in reported cases in the South East of England, raising concerns that the variant could be more transmissible. We performed a rapid analysis to investigate whether the new variant is associated with higher viral loads, since higher viral loads may indicate increased transmissibility.
As members of the COG-UK consortium (https://www.cogconsortium.uk/), we sequenced RT-qPCR SARS-CoV-2 positive samples originating from four UK Lighthouse laboratories, which provide Pillar 2 COVID-19 testing services. The samples were sequenced using veSEQ, our quantitative sequencing approach for which the number of unique mapped reads is correlated with, and thus can be used as a proxy for, viral load. For a full description of the sequencing protocol see (4, 5).
We used log10(mapped reads) as a proxy for viral load (see fig. S1 in (5)). Comparisons between distributions of log10(mapped reads) were made using Welch's t-test (two-tailed), with p-values combined using Stouffer's method where appropriate. We also performed a multivariate logistic regression analysis.
Given the known negative correlation between viral load and cycle threshold (Ct) values (6) obtained during PCR testing (7), we first confirmed a strong negative correlation between log10(unique mapped reads) and Ct values for samples that we sequenced from Lighthouse laboratories (linear regression, r^2 = 0.43, p << 0.001, Fig. 1).
Number of uniquely mapped reads per sample can be used as a proxy for viral load. The Ct value shown is the maximum Ct value obtained from Majora (the COG database) from retrospective data for all Lighthouse laboratories that supply Ct data; log10 of uniquely mapped (deduplicated) reads obtained with the veSEQ platform correlates well with Ct. This does not include samples in this report since Ct values were not yet available.
The N501Y mutation is strongly linked with other mutations characterising the new variant (VUI-202012/01) in our dataset, including the ΔH69/V70 deletion, and therefore we used Y501 as a marker of the new variant. The ΔH69/V70 deletion alone is not a specific marker of VUI-202012/01 in our data, while lineage B.1.1.70, which is currently present in Wales and in some cases carries Y501 but never the deletion, was not present in our data. We identified 88 samples that produced consensus sequences with the Y501 variant. All variant samples were taken between 31 Oct 2020 and 13 Nov 2020, and therefore we only considered samples (Y501 and N501) taken during this period, since Ct values have been shown to vary by calendar time (7).
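The two tests named above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for this report: the function names `welch_t` and `stouffer` are hypothetical, the Stouffer function assumes one-sided p-values that all point in the same direction (the report does not state how its two-tailed p-values were converted before combining), and equal weights are assumed unless others are passed in.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    a, b: sequences of log10(mapped reads) for the two groups being compared.
    The p-value would then come from a t distribution with df degrees of freedom.
    """
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def stouffer(p_values, weights=None):
    """Combine one-sided p-values (same direction) with Stouffer's Z method."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weight per stratum
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, p_values))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)
```

In a stratified comparison (for example by sequencing batch), `welch_t` would be applied within each stratum and the resulting per-stratum p-values passed to `stouffer`.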
When comparing the number of unique mapped reads in the Y501 variant samples (median log10(reads) = 4.64, N = 88) with that in the N501 samples (median log10(reads) = 4.16, N = 1299), we found higher counts in the former (Welch's t-test p = 0.014; Fig. 1). This is equivalent to around 3-fold higher median viral loads in the Y501 variant samples compared to N501 samples. This result remained significant when we controlled for batch (Fig. 3a, p = 0.011, combined p-value via Stouffer's method), but not when we controlled for Lighthouse laboratory (Fig. 3b, p = 0.052). The correlation between the new variant and viral load is also associated with a relative paucity of samples with lower (<10^3) mapped reads among Y501 samples (Fig. 1, p = 0.0053, chi-squared test; 10^3 mapped reads is equivalent to a viral load of ~10^4 copies per reaction, max Ct ~28). When comparing samples with just the ΔH69/V70 deletion (without the Y501 variant) to samples without the deletion, we did not find a significant difference in log10(reads) (p = 0.86; controlling for batch p = 0.56, and for Lighthouse lab p = 0.54) (Fig. 1).
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted January 15, 2021.
To test whether the difference in viral loads for samples with the new variant could in part be explained by geographic effects, we considered the sampling location (adm2 district) where this information was available. Of the 88 Y501 variants sampled, 24 were in Greater London, 46 in Kent, and the remainder in lower numbers (N = 1-5) in other areas (Bristol, Essex, Hampshire, Leicestershire, Norfolk, Surrey and West Sussex).
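The roughly three-fold difference in median viral load quoted for Y501 versus N501 samples follows directly from the difference in median log10(reads). A minimal sketch of that arithmetic, assuming (as the report does) that mapped read counts scale with viral load; the function name is illustrative only:

```python
def fold_change_from_log10(log10_a, log10_b):
    """Ratio a/b on the linear scale, given both values on a log10 scale."""
    return 10 ** (log10_a - log10_b)


# Median log10(unique mapped reads) reported in the text
median_y501 = 4.64
median_n501 = 4.16

ratio = fold_change_from_log10(median_y501, median_n501)
print(f"{ratio:.2f}")  # prints 3.02
```

A difference of 0.48 on the log10 scale therefore corresponds to a ratio of medians of about 3 on the linear scale.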
Regardless of variant presence, all samples from Greater London had significantly lower viral loads than those from other locations (p = 0.0016, Welch's t-test), and the association between Y501 and higher viral load was not significant in this region (p = 0.91; Fig. 4). Outside Greater London, viral loads for Y501 were significantly higher than for N501 (p = 0.0068). Within Kent, the location with the greatest number of Y501 samples, Y501 viral loads were not significantly higher than N501 viral loads (p = 0.089). These results indicate a correlation between infection with the new variant (VUI-202012/01) and (inferred) viral load outside Greater London, although we are currently underpowered to draw firm conclusions. The lack of association within Greater London could be due to lack of power, or to demographic or epidemiological differences in London compared with the other locations.
Box and scatter plots of unique mapped reads, stratified by sampling location. Points within each batch are jittered to aid visualisation. Horizontal lines in boxplots represent the median and the interquartile range. Only sampling locations with at least one Y501 sample were included.
In a multivariate logistic regression analysis for variables associated with higher viral load (Table 1), the Y501 variant was associated with a fivefold increase in odds of ≥10^3 mapped reads (p = 0.036). The fitted model with interaction terms suggests a much smaller effect of the variant outside Kent, with the total odds increase reduced to 1.75 for Greater London and 1.24 for other regions, but the interaction term coefficients were not statistically significant (p = 0.27 and p = 0.16, respectively).
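To make the interaction-term arithmetic concrete: in a logistic model with a variant main effect and variant-by-region interaction terms, the total odds ratio for the variant in a given region is the product of the main-effect odds ratio and that region's interaction odds ratio (equivalently, the coefficients add on the log-odds scale). The sketch below assumes Kent is the reference region and uses the rounded figures quoted in the text (about 5, 1.75 and 1.24); the implied interaction odds ratios it prints are back-calculated for illustration and are not values taken from Table 1. Function names are hypothetical.

```python
import math


def total_odds_ratio(main_or, interaction_or=1.0):
    """Variant odds ratio in a region: exp(beta_variant + beta_interaction)."""
    return math.exp(math.log(main_or) + math.log(interaction_or))


def implied_interaction_or(main_or, regional_total_or):
    """Interaction odds ratio implied by a main effect and a regional total."""
    return regional_total_or / main_or


main_or = 5.0  # "fivefold" from the text, rounded; assumed to apply to Kent
regional_totals = {"Greater London": 1.75, "Other regions": 1.24}

implied = {
    region: implied_interaction_or(main_or, total)
    for region, total in regional_totals.items()
}

for region, interaction in implied.items():
    print(region, round(interaction, 3), round(total_odds_ratio(main_or, interaction), 2))
```

An interaction odds ratio below 1 shrinks the variant effect in that region relative to the reference; whether such shrinkage is real cannot be judged when the interaction coefficients are not statistically significant.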
Thus, if the association of the variant with a paucity of low viral load samples is stronger in Kent compared to other areas (e.g. due to epidemiological, demographic, or sampling differences), we lack the necessary power to demonstrate it. No other variables showed evidence of an association.
Table 1. Logistic regression analysis, identifying variables associated with ≥10^3 mapped reads. Odds ratios for each variable and 95% confidence intervals for those ratios are presented. Lighthouse labs are anonymised as in Fig. 2.
This is a preliminary analysis, and other factors could explain the (inferred) higher viral loads in samples with the new variant (VUI-202012/01), in addition to a working hypothesis that there is a causal effect of the new variant on within-host virus abundance. Whether the correlation is causative (infections with the new variant have higher viral loads) or correlative (e.g. due to epidemiological dynamics, demographics of individuals infected with the new variant, and/or sampling) warrants further study.
Individuals contributing samples in this analysis were tested as part of the test and trace program, which is primarily aimed towards individuals seeking a test following the onset of symptoms. We observed a broad spectrum of viral loads among the samples we sequenced. Given known associations between lower viral loads and later infection (8), and higher viral loads at the onset of symptoms, this suggests our full dataset consists of individuals in both early and late stages of symptomatic infection. Whilst we do not a priori expect there to be a systematic difference in the timing of sampling relative to infection, in an exponentially growing population the expectation is to sample relatively more people early in infection (9).
Whether or not early-sampling bias has an effect on inferred viral loads will depend on the relative epidemiological dynamics of the new and other variants. If, for example, VUI-202012/01 is growing faster, this could result in a bias for it to be sampled earlier. This is consistent with the relative paucity of VUI-202012/01 samples with low viral load. In addition, VUI-202012/01 might be circulating within particular demographics (e.g. age groups) that tend to have higher viral loads when sampled. This may explain the apparently different patterns in Greater London and elsewhere. Focussed transmission within a particular demographic group is also more likely during the early stages of epidemic growth of a given lineage, before it disperses into the wider population. We were unable to test these hypotheses as we did not have demographic data relating to the sampled individuals with the new variant. We also cannot rule out other additional confounding effects and recommend that such effects are investigated further.
Future prospects
A number of processes could have caused the rapid growth of the new variant (VUI-202012/01), including founder effects, or biological mechanisms that increase its transmissibility. Higher viral loads are one such potential mechanism: transmissibility of viruses is understood to be higher in individuals who exhibit higher viral loads (10), and in HIV viral load is partly determined by virus genotype (11). Our observation of higher inferred viral loads in individuals infected with the new variant suggests that increased transmissibility of the new variant is plausible, but important caveats remain. We recommend further investigations to evaluate this hypothesis.
We note that we have used Y501 as a marker for the new variant; a large number of other mutations also characterise this new variant lineage (1), and therefore Y501 per se might not be causing the effect (if there is one). We also note that higher viral loads can be associated with higher levels of viral virulence, and therefore links between the new variant and the severity of infection should be monitored carefully (12). Whether the observed higher viral loads associated with this variant are a direct effect of infection with the variant, a consequence of faster epidemic growth, or linked to particular demographics, our data are consistent with rapid growth of this specific lineage.
1. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
2. Recurrent emergence and transmission of a SARS-CoV-2 Spike deletion ΔH69/V70
3. Update on new SARS-CoV-2 variant and how COG-UK tracks emerging mutations. COG-UK Consortium
4. HPTN 071 (PopART) Team, A Comprehensive Genomics Solution for HIV Surveillance and Clinical Monitoring in Low-Income Settings
5. The COVID-19 Genomics UK
6. Cycle threshold (Ct) in SARS-CoV-2 RT-PCR (2020)
7. Pouwels, the COVID-19 Infection Survey team, Viral load in community SARS-CoV-2 cases varies widely and temporally
8. SARS-CoV-2 viral dynamics
9. Estimating epidemiologic dynamics from single cross-sectional viral load distributions. medRxiv
10. Temporal dynamics in viral shedding and transmissibility of COVID-19
11. Virulence and Pathogenesis of HIV-1 Infection: An Evolutionary Perspective
12. PHE investigating a novel strain of COVID-19. GOV.UK (2020)
We are grateful to Lorne Lornie, Angie Green, The Oxford Genomics Centre and The Wellcome Centre for Human Genetics, for all their support in generating the data for this study.