key: cord-0336751-6f91ncsv authors: Johnson-Mackinnon, J. C.; Agius, J. E.; Fong, W.; Gall, M.; Lam, C.; Basile, K.; Kok, J.; Arnott, A.; Sintchenko, V.; Rockett, R. title: SARS-CoV-2 within-host and in-vitro genomic variability and sub-genomic RNA levels indicate differences in viral expression between clinical and in-vitro cohorts. date: 2021-11-24 journal: nan DOI: 10.1101/2021.11.23.21266789 sha: b71ecbc0911a5ed74b6f0b57db5c003639528f05 doc_id: 336751 cord_uid: 6f91ncsv Background: Low frequency intrahost single nucleotide variants (iSNVs) of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) have been increasingly recognised as predictive indicators of positive selection. Particularly as growing numbers of SARS-CoV-2 variants of interest (VOI) and concern (VOC) emerge. However, the dynamics of subgenomic RNA (sgRNA) expression and its impact on genomic diversity and infection outcome remain poorly understood. This study aims to investigate and quantify iSNVs and sgRNA expression in single and longitudinally sampled cohorts over the course of mild and severe SARS-CoV-2 infection benchmarked against an in-vitro infection model. Methods: Two clinical cohorts of SARS-CoV-2 positive cases in New South Wales, Australia collected between March 2020 and August 2021 were sequenced. Longitudinal samples from cases hospitalised due to SARS-CoV-2 infection (severe) were analysed and compared with cases that presented with SARS-CoV-2 symptoms but were not hospitalised (mild). SARS-CoV-2 genomic diversity profiles were also examined from daily sampling of culture experiments for three SARS-CoV-2 variants (Lineage A, B.1.351, and B.1.617.2) cultured in VeroE6 C1008 cells (n = 33). Results: ISNVs were detected in 83% (19/23) of the mild cohort cases and 100% (16/16) of the severe cohort cases. 
SNP profiles remained relatively fixed over time, with an average of 1.66 SNPs gained or lost, and an average of 4.2 and 5.9 low frequency variants per patient were detected in severe and mild infection, respectively. SgRNA was detected in 100% (25/25) of the mild genomes and 92% (24/26) of the severe genomes. Total sgRNA expressed across all genes in the mild cohort was significantly higher than that of the severe cohort. Significantly higher expression levels were detected in the spike and the nucleocapsid genes. There was significantly less sgRNA detected in the culture cohort than in the clinical cohorts. Discussion and Conclusions: The positions and frequencies of iSNVs in the severe and mild infection cohorts were dynamic over time, highlighting the importance of continual monitoring, particularly during community outbreaks where multiple SARS-CoV-2 variants may co-circulate. SgRNA levels can vary across patients and the overall level of sgRNA reads compared to genomic RNA can be less than 1%. The relative contribution of sgRNA to the severity of illness warrants further investigation given the level of variation between genomes. Further monitoring of sgRNAs will improve the understanding of SARS-CoV-2 evolution and the effectiveness of therapeutic and public health containment measures during the pandemic.

complex mechanism involving discontinuous or 'paused' transcription, followed by an RNA-dependent RNA polymerase (RdRp) template switch during negative-strand RNA synthesis (Parker et al., 2021). The resulting nested set of negative sense RNAs serve as templates for the transcription of positive Kim et al., 2020).
Compared to DNA viruses, the replication of RNA viruses is typically associated with a high error rate due to the lack of sufficient proofreading activities during genome replication (Domingo and Holland, 1997). However, coronaviruses employ a highly conserved proofreading exoribonuclease encoded by non-structural protein 14 (nsp14) which enhances the fidelity of RNA synthesis (Graepel et al., 2017). Despite this mechanism, the mutation rate of SARS-CoV-2 is 1-2 mutations per month and is generally higher than DNA viruses (Smith et al., 2014, Day et al., 2020, Nakagawa and Miyazawa, 2020). Additionally, coronaviruses have the propensity to recombine and generate extensive and diverse recombination products, particularly within the spike region of the genome (Wells et al., 2020). At an inter-host level, newly emerging viruses acquire adaptive mutations to enhance replication, modulate the host response, and facilitate effective transmission. However, the intra or within-host variability of RNA viruses is associated with the quasi-species concept, leading to multiple diverse circulating quasi-species of varying frequencies linked through mutation (Ramazzotti et al., 2020, Karamitros et al., 2020). The quasi-species collectively contribute functional characteristics at the population level, and in combination with the genetic profile of the host, can influence viral phenotype and adaptive capabilities (Stone et al., 2006). Since most of the immune escape and adaptive mechanisms of SARS-CoV-2 involve intra-cellular interactions, it is expected that SARS-CoV-2 evolves through intra-host selective pressure (Kumar et al., 2020), highlighting the capacity for the development of genetically different SARS-CoV-2 viruses within the same host.
Higher within-host diversity of viral RNA pathogens can be associated with increasing viral virulence and antigenic variability (Stone et al., 2006), exacerbated disease severity and clinical outcome, immune escape (Nowak et al., 1991), and drug resistance (Johnson et al., 2008). Given these effects, monitoring variants at the within-host level is both important and urgent.

medRxiv preprint doi: https://doi.org/10.1101/2021.11.23.21266789; this version posted November 24, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

E6] (ATCC® CRL-1586™) in Dulbecco's minimal essential medium (DMEM, Lonza Bioscience, Alpharetta, GA, USA), and supplemented with 9% foetal bovine serum (FBS, HyClone, Cytiva, Sydney, Australia). Culture media were changed within 12 hours and contained 1% FBS and 1% antimicrobials including amphotericin B deoxycholate (25 µg/mL), penicillin (10,000 U/mL), and streptomycin (10,000 µg/mL) (Lonza, Basel, Switzerland) to inhibit microbial overgrowth. The plates were inoculated with 200 μL of serially diluted virus stock (1 × 10⁻² to 1 × 10⁻⁶) in triplicate. Cells were incubated at 37°C in 5% CO2 for 4 days (days 0 to 3), and were sealed with AeraSeal™ Film (Excel Scientific, Inc., Victorville, CA, USA) to minimise evaporation, spillage, and well-to-well cross-

The SARS-CoV-2 viral RNA was detected and quantified using a previously described RT-

at 58°C.
The enriched library was purified, and the concentration and fragment size were quantified using the Qubit™ 1x dsDNA High Sensitivity Assay (ThermoFisher Scientific, USA), and Agilent High Sensitivity D1000 ScreenTape assay on the Agilent 4200 Tapestation (Agilent, Germany), respectively. The libraries were sequenced using 2 x 74 bp runs on the Illumina MiniSeq™ or iSeq (Illumina, USA) and multiplexed with the aim of producing 2 x 10⁶ raw reads per library.

Bioinformatic analysis and clustering

The raw sequence reads were subjected to an in-house quality control procedure prior to downstream analysis. The reads were demultiplexed and quality trimmed using Trimmomatic version 0.36 (minimum read quality score of 20, leading/trailing quality of 5). Reference mapping and consensus calling were performed using iVar version 1.2.1. Reads were mapped to the reference SARS-

Statistical significance (p ≤ 0.05) was determined using the Mann-Whitney test for difference between means on variables which contained at least five data points (iSNV/SNP counts and read frequencies, sgRNA counts and read frequencies).

(Table 1). There were no significant differences between age and sex across the mild and severe cohorts; however, a significant difference between the age ranges (p = 0.16) was noted (Table 1). Thirty-three SARS-CoV-2 consensus genomes were sequenced from 34 culture specimens of varying sample dilutions and time intervals (one genome did not pass quality filtering and was excluded from the analyses). High depth genomes were produced across all cohorts and the median depth achieved was not significantly different.
The median depth for the severe cohort was 2,021x, mild 928x, Lineage A 2,964x, Beta VOC 3,408x, and Delta VOC 2,529x (Supplementary Figure S1b).

A range of SARS-CoV-2 viral loads were detected in each cohort (Supplementary Figure S1a). The median severe viral load was 516,643 copies (range: 151,246,026 to 2,512 copies), and the median mild viral load was 457,284 copies (range: 95,727,865 to 668.7 copies). Within the culture cohorts, lineage A median viral load was 1,408,340 copies (range: 18,976,383 to 29.4 copies), Beta median viral load was 560,453.7 copies (range: 9,026,044 to 130.2 copies), and Delta median viral load was 300,232.9 copies (range: 3,107,421 to 424.5 copies) (Supplementary Figure S1a). There was no significant difference between the viral loads across cohorts.

A wide variety of Pango lineages were defined across the clinical cohorts (Table 1)

Low frequency iSNVs were detected in 100% (16/16) of the severe (Figure 2a), 83% (19/23) of the mild (Supplementary Figure S2), and 100% (33/33) of the culture samples (Figure 2b). Longitudinal samples from the same patient were collected over a mean of 6.36 days (range: 0 to 11) post-symptom onset, compared to one to three days after inoculation in cultured specimens.

at day one converted to a low frequency deletion event by day 11 (S, n=1).
The final longitudinal sample, severe case P0615, retained one iSNV and two low frequency deletion events from three days post symptom onset to eight days post symptom onset (ORF1ab, n = 1; S, n = 1; NC, n = 1) with one iSNV lost at day 8 (NC, n = 1).

Our cohort also contained five epi-linked family groups. There were no shared iSNVs between cases in groups 1 and 2 (Lineages B.1 and D.2). Groups three to five were all in lineage B.1.617, and of those groups, groups four and five had no shared iSNVs between cases. In group three there was one iSNV shared amongst all cases and samples (ORF1ab, n = 1), and one iSNV that was present in one

differences except in the N (p = 0.0114) and S (p = 0.0128) genes (Figure 5). Although there were also some interesting trends in sgRNA between lineages, the sample sizes per lineage were insufficient to determine significance (Supplementary Table S1). Within the culture cohort, sgRNA was also present in the majority of genomes (30/33) at low levels with a median of 0.8% of the total reads compared to gRNA (range 0.02% - 10.4% depth/average gene depth). Overall, sgRNA expression was significantly less than in both the severe (p = 0.0002) and mild (p = 0.0001) cohorts (Supplementary Figure S3).
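The sgRNA percentages reported here are expressed relative to gRNA depth, and cohorts are compared with the Mann-Whitney test. The following is a minimal Python sketch of both steps under stated assumptions: the read counts are invented for illustration, the function names are hypothetical, and dividing sgRNA reads by mean gene depth is an assumed reading of "depth/average gene depth" rather than the authors' actual pipeline.

```python
from statistics import median


def sgrna_percent(sgrna_reads, mean_gene_depth):
    """sgRNA reads as a percentage of gRNA depth (assumed definition)."""
    if mean_gene_depth <= 0:
        raise ValueError("mean_gene_depth must be positive")
    return 100.0 * sgrna_reads / mean_gene_depth


def mann_whitney_u(x, y):
    """U statistic for sample x versus y: each pair with x > y counts 1, ties 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical (sgRNA reads, mean gRNA depth) pairs; NOT data from the study.
mild_counts = [(40, 1000), (30, 1200), (55, 900), (25, 1000), (60, 1500)]
severe_counts = [(10, 2000), (12, 1500), (30, 2500), (8, 1600), (20, 2000)]

mild = [sgrna_percent(s, d) for s, d in mild_counts]
severe = [sgrna_percent(s, d) for s, d in severe_counts]

u_mild = mann_whitney_u(mild, severe)
u_severe = mann_whitney_u(severe, mild)

print(f"median mild {median(mild):.2f}%, median severe {median(severe):.2f}%")
print(f"U(mild) = {u_mild}, U(severe) = {u_severe}")
```

The two U values always sum to the product of the sample sizes, which is a quick sanity check. The sketch stops at the test statistic; a p-value would normally come from a statistics library, for example `scipy.stats.mannwhitneyu`.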
(Lythgoe et al., 2021, Valesano et al., 2021). However, there may be implications for transmission and potential emergence of VOCs if iSNVs evolve into SNPs. Our findings indicate that the number of iSNVs tends to increase the longer the infection progresses, particularly in the ORF1ab, S, and N genes. It is therefore possible that the longer an individual remains infectious, the higher the likelihood of the accumulation and transmission of functional iSNVs.

It is important to note that the presence and/or absence of iSNVs and their distribution across

Subgenomic RNA variation

We uncovered low levels of sgRNA expression across all three cohorts, representing, on average, less than 2% of the read depth of gRNA.
However, the relative abundance of the eight sgRNA transcripts was similar to other investigations, where nucleocapsid sgRNA transcripts were the most abundant and ORF10 sgRNA was not detected (Alexandersen et al., 2020, Parker et al., 2021). Interestingly, the pattern of sgRNA detection was similar across cohorts and we did not detect the higher abundance of sgRNA in cultured isolates that was previously reported (Nomburg et al., 2020). Instead, we found a significantly higher level of sgRNA in cases with mild symptoms. This is an interesting result as it has been reported that sgRNA transcripts are reduced in asymptomatic cases of COVID-19 (Wong

It is still unknown how these new transcripts will impact pathology, but it is hypothesized that they could lead to diversification and adaptation to the host (Long, 2021).

We have established significant differences between severe and mild disease cohorts and genes as well as distinct and consistent patterns of sgRNA. Our findings are also consistent with relative abundances of sgRNA described at spike and nucleocapsid genes (Alexandersen et al., 2020). However, this study was limited in the available sample size, which was further complicated by low viral levels in later longitudinal samples. Strict lockdown procedures and border closures also greatly reduced or eliminated the proliferation of SARS-CoV-2 lineages, leading to low numbers of representative genomes per lineage. Further investigations with larger time frames and more longitudinal samples will be required to fully understand the behaviour and contribution of iSNVs to disease and transmission.

Conclusion

We demonstrate that iSNVs in SARS-CoV-2 genomes can accumulate over the course of COVID-19 infection and were predominately sporadic across cases with severe or mild disease. There were lineage-specific hot spots associated with persistent and low level iSNVs within diverse samples.
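The accumulation and loss of iSNVs between longitudinal samples can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the allele-frequency thresholds, the function names, and the variant calls are hypothetical and are not taken from the study.

```python
def classify_variant(freq, low=0.05, high=0.90):
    """Label a variant by allele frequency.

    The 5% and 90% cut-offs are assumed values for illustration only.
    """
    if freq >= high:
        return "SNP"
    if freq >= low:
        return "iSNV"
    return "below_threshold"


def gained_and_lost(earlier, later):
    """Return (gained, lost) variant sets between two time points of one case."""
    return later - earlier, earlier - later


# Hypothetical calls: (gene, position, alternate base) -> allele frequency.
day3 = {("ORF1ab", 1059, "T"): 0.12, ("S", 23403, "G"): 0.98, ("N", 28881, "A"): 0.08}
day8 = {("ORF1ab", 1059, "T"): 0.15, ("S", 23403, "G"): 0.99, ("S", 22000, "T"): 0.07}

isnv_day3 = {key for key, freq in day3.items() if classify_variant(freq) == "iSNV"}
isnv_day8 = {key for key, freq in day8.items() if classify_variant(freq) == "iSNV"}

gained, lost = gained_and_lost(isnv_day3, isnv_day8)
retained = isnv_day3 & isnv_day8

print("retained:", sorted(retained))
print("gained:", sorted(gained))
print("lost:", sorted(lost))
```

Keying each variant on gene, position and alternate base lets plain set operations separate retained, gained and lost iSNVs for a case, which is the kind of per-patient tally summarised in the longitudinal results.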
Within-Host Diversity of SARS-CoV-2 in COVID-19 Patients With Variable Disease Severities
SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication
Intra-Host Diversity of SARS-Cov-2 Should Not Be Neglected: Case of the State of Victoria
Structural insights into SARS-CoV-2 proteins
SARS-CoV-2 Quasispecies Mediate Rapid Virus Evolution and Adaptation. bioRxiv
Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein
On the evolutionary epidemiology of SARS Current biology : CB
RNA virus mutations and fitness for survival
Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evolution, 6
Proofreading-Deficient Coronaviruses Adapt for Increased Fitness over Long-Term Passage without Reversion of Exoribonuclease-Inactivating Mutations
Nextstrain: real-time tracking of pathogen evolution
SARS-CoV-2 variants, spike mutations and immune escape
Minority HIV-1 drug resistance mutations are present in antiretroviral treatment-naïve populations and associate with reduced treatment efficacy
SARS-CoV-2 exhibits intra-host genomic plasticity and low-frequency polymorphic quasispecies
The Architecture of SARS-CoV-2 Transcriptome
Host Immune Response and Immunobiology of Human SARS-CoV-2 Infection
From Low Viral Load Samples.
bioRxiv
Selective pressure on SARS-CoV-2 protein coding genes and glycosylation site prediction
SARS-CoV-2 Subgenomic RNAs: Characterization, Utility, and Perspectives
SARS-CoV-2 within-host diversity and transmission
Genome evolution of SARS-CoV-2 and its virological characteristics
Insights into SARS CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach
Pervasive generation of non-canonical subgenomic RNAs by SARS-CoV-2
Antigenic diversity thresholds and the development of AIDS
Subgenomic RNA identification in SARS-CoV-2 genomic sequencing data
Spike mutation D614G alters SARS-CoV-2 fitness
Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2. Science translational medicine
Interpret with caution: An evaluation of the commercial AusDiagnostics versus in-house developed assays for the detection of SARS-CoV-2 virus
Characterization of intra-host SARS-CoV-2 variants improves phylogenomic reconstruction and may reveal functionally convergent mutations
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
& SINTCHENKO, V. 2020. Revealing COVID-19 transmission in Australia by SARS-CoV genome sequencing and agent-based modeling
Hidden genomic diversity of SARS CoV-2: implications for qRT-PCR diagnostics and transmission. bioRxiv : the preprint server for biology
Replication Fidelity of the Largest RNA Viruses
Continuous and Discontinuous RNA Synthesis in Coronaviruses. Annual review of virology
From SARS to MERS, Thrusting Coronaviruses into the Spotlight. Viruses
Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population
Patterns of within-host genetic diversity in SARS-CoV-2.
eLife
Temporal dynamics of SARS CoV-2 mutation accumulation within and across infected hosts
Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients
The evolutionary history of ACE2 usage within the coronavirus subgenus <
Reduced subgenomic RNA expression is a molecular indicator of asymptomatic SARS-CoV-2 infection
A new coronavirus associated with human respiratory disease in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin
SARS-CoV-2 Subgenomic N (sgN) Transcripts in Oro-Nasopharyngeal Swabs Correlate with the Highest Viral Load, as Evaluated by Five Different Molecular Methods