key: cord-0691920-ygs1rcy0 authors: Samoilov, A. E.; Kaptelova, V. V.; Bukharina, A. Y.; Shipulina, O. Y.; Korneenko, E. V.; Lukyanov, A. V.; Grishaeva, A. A.; Ploskireva, A. A.; Speranskaya, A. S.; Akimkin, V. G. title: Change of dominant strain during dual SARS-CoV-2 infection date: 2020-11-30 journal: nan DOI: 10.1101/2020.11.29.20238402 sha: af080d7fe2d2d4083b16f410931e0b2b10807ba4 doc_id: 691920 cord_uid: ygs1rcy0 Background: The effect of SARS-CoV-2 mutations and viral load on the severity of COVID-19 is not well understood. The possibility of reinfection with SARS-CoV-2 has already been reported, but dual infection with SARS-CoV-2 is poorly described and is currently under discussion. We describe a study of two strains of SARS-CoV-2 detected in the same patient during the same disease presentation. Methods: Two nasopharyngeal swabs were obtained eight days apart from the patient in their 90s, diagnosed with lobar pneumonia (J18.1). Both tests were positive for SARS-CoV-2 with high viral load (Ct = 13). We have performed high-throughput sequencing of SARS-CoV-2 genomes from both swabs. Findings: Genomic analysis of SARS-CoV-2 revealed the presence of two genetically distant strains in both swabs. Detected strains belong to different phylogenetic clades (GH and GR) and differ in the seven nucleotide positions. The relative abundance of strains was 70% (GH) and 30% (GR) in the first swab, and 3% (GH) and 97% (GR). Interpretation: Our findings suggest that the patient was infected by two genetically distinct SARS-CoV-2 strains at the same time. One of the possible explanations is that the second infection occurred in the hospital. Change of the dominant strain ratio during disease manifestation could be explained by the advantage or higher virulence of the strain belonging to the clade GR. Introduction using 10 μL of the RNA samples, random hexanucleotide primers, and Reverta-L kit (AmpliSens, Russia) according to the manufacturer's instructions. Libraries for whole-genome sequencing of SARS-CoV-2 were prepared using single-strand cDNA, and SCV-2000bp protocol described previously [11] . In brief, we amplified four pools of ~2000 bp-long fragments covering the whole SARS-CoV-2 genome. The obtained PCR products were purified, mixed equally, and sheared in microTUBE-50 AFA Fiber Screw-Cap (PN 520166) using Covaris M220 (Covaris, Woburn, MA). Libraries were constructed using Y-shaped adapters compatible with Nextera XT Index Kit and amplified with 8 cycles using oligos from Nextera XT Index Kit and Q5 High-Fidelity DNA Polymerase (New England BioLabs, NEB). Libraries for total RNA sequencing were prepared using the NEBNext® Ultra™ II Raw reads from amplicon libraries and total RNA libraries were processed as described in [11] . We performed adapter and quality trimming with Trimmomatic [12] , removed PCR primers with cutadapt (for amplicon libraries) [13] , and mapped reads to the reference sequence (strain hCoV-19/Wuhan/WIV04/2019, GISAID accession ID EPI_ISL_402124) using bowtie2 [14] . After that, we filtered out reads with low mapping quality using SAMtools [15] , performed base-calling with GATK [16] , filtered gvcf files were with BCFtools [17] . Finally, the consensus sequences were obtained with BEDTools [18] . The validity of the resulting sequence was verified manually by visual inspection of mapped reads. Areas with coverage lower than 5 were masked with NNN. The frequency of nucleotides over the SARS-CoV-2 genome was analyzed using the Rsamtools package. All of the SARS-CoV-2 strains were aligned using mafft [19] , the phylogenetic tree was built using IQtree 2 [20] using the GTR model. We have performed the sequencing of two swab samples obtained from the same patient eight days apart using the SCV-2000bp protocol and Illumina sequencing. The sequencing of sample 1 yielded 1.1 M paired-end raw reads. After quality filtration and PCR primer trimming, 837 thousand reads remained, 99.87% of which mapped to the reference SARS-CoV-2 genome strain hCoV-19/Wuhan/WIV04/2019 (GISAID accession ID EPI_ISL_402124). The mapping of trimmed reads to a reference sequence revealed seven heterogeneous positions (see Fig. 1 for an example). The sequencing of sample 2 yielded 3.9 M paired-end raw reads. After quality filtration and PCR primer trimming, 3.7 M reads remained, 99.93% of which mapped to the reference SARS-CoV-2 genome strain hCoV-19/Wuhan/WIV04/2019 (GISAID accession ID EPI_ISL_402124). We analyzed the mapped reads and found the same heterogeneity at the same positions as in Sample 1, but at a much lower level. We interpreted our observations as the simultaneous presence of two SARS-CoV-2 strains in the same patient's samples. After obtaining consensus genomic sequence from the dominant strain from the less heterogeneous second sample (hereafter called strain 2), it became possible to unambiguously reconstruct genomic sequences of the strain prevalent in the first sample (hereafter called strain 1). Read coverage at the genomic positions differentiating strain 1 from strain 2 varied from 288 to 16000 in sample 1 and from 950 to 64000 in sample 2. The relative abundance of strains 1 and 2 in both time points was assessed by averaging the relative coverage of heterogeneous positions ( Figure 2A ) and amounted to roughly 69% and 31% in the first sample and 3% and 97% in the second sample, respectively ( Fig. 2C ). We found that strain 1 was dominant in sample 1, and strain 2 became dominant in sample 2. The resulting sequences are available at GISAID with accession numbers EPI_ISL_610237 (hCoV-19/Russia/CRIE170668/2020, the dominant strain from Sample 1) and EPI_ISL_610238 (hCoV-19/Russia/CRIE300223/2020, the dominant strain from Sample 2, collected eight days after the first swab). Heterogeneity in the sequence reads can also be explained by sequencing artifacts arising from the polymerase errors, chimeric fragments generation during nucleic acid amplification, and contamination during RNA extraction and library preparation. To exclude the possibility that the observed heterogeneity is a mistake, we have isolated RNA from the original swabs for the second time and performed sequencing of libraries prepared from total RNA without any enrichment for both samples. RNA-seq of sample 1 yielded 12.6 M paired-end 250 bp long reads. After quality filtration and PCR primer trimming, 12.2 M reads remained, 2.29% of which were mapped to the reference SARS-CoV-2 genome strain hCoV-19/Wuhan/WIV04/2019 (GISAID accession ID EPI_ISL_402124). We compared RNA-seq reads with reads obtained using the SCV-2000bp protocol (Fig. 2B ). Roughly the same frequencies of alternative nucleotides prove that All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint observed heterogeneity was not a result of a sequencing artifact and that DNA originating from two different SARS-CoV-2 strains is present in sample 1. Read coverage at the genomic positions differentiating strain 1 from strain 2 varied from 2100 to 4090 in sample 1 after total RNA sequencing. Relative coverage of SARS-CoV-2 genomic variants, which differentiate strains 1 and 2, at the time points 1 and 2. Genomic positions are at the X-axis, their relative frequencies are at Yaxis. Sample 1, sequencing using: (а) genome fragments amplification (SCV-2000bp protocol), (b) total RNA library. Sample 2 (collected eight days later), sequencing using: (c) genome fragments amplification (SCV-2000bp protocol). All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint Comparison of strain 1 genomic sequence to all of the GISAID SARS-CoV-2 database (as of 11.11.2020) revealed that this sequence is unique to GISAID, the closest genomes having at least two mismatches compared to strain 1 genome. Out of 571 closest sequences, only three originated from Russia (EPI_ISL_428905, EPI_ISL_428875, EPI_ISL_428871, all of them were collected in March), most of the other genomes originated from the USA (402), Iceland (28) , and Canada (26) and were collected in March-early April. Comparison of strain 2 genomic sequence to all of the GISAID SARS-CoV-2 database (as of 11.11.2020) revealed 1062 genomic sequences with 100% identity to strain 2, 78 out of which originated from different regions of Russia, 10 of them were collected in Moscow (collection dates of which vary from late March to early April), including eight genomes obtained in our lab, as described in [11] . The latest genomes with 100% identity to strain 2 were collected in Saint-Petersburg in the middle of September (EPI_ISL_602339 and EPI_ISL_602340). Other genomes with 100% identity to strain 2 originated mostly from G25563T, resulting in Q57H in NS3 protein) and three mutations, present only in strain 2 (G28881A and G28882A, resulting in R203K mutation in N protein; G28881C, resulting in G204R mutation in N protein) ( Table 1) . Phylogenetic analysis was performed by building a tree of all of the available SARS-CoV-2 genomes from the samples collected in Russia. It has been revealed that strain 1 (EPI_ISL_610237) belongs to the GH clade, and strain 2 (EPI_ISL_610238) belongs to the GR clade (GISAID classification) (see Fig. 3 ). All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint consist of double peaks, but the authors also present too many electropherograms with low intensity of the minor peak. In conventional Sanger sequencing analysis, it is challenging to interpret the data when there is a low level of the second peak at the same nucleotide position. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint Minor variants can be reliably detected in the 20% mixture analyzed trace; however, in the analyzed traces from the 10% and 5% mixtures, the fragment is indistinguishable from the baseline noise [21] . The sensitivity of NGS is significantly superior to Sanger sequencing. Authors of the work [10] processed publicly available 3939 deeply sequenced SARS-CoV-2 genomes and 749 sequenced samples from Switzerland. The most interesting of their findings is that age was a statistically significant predictor of viral genetic diversity in the analyzed cohort. Authors have also discovered that Spike_D614G exhibits high intra-host diversity, with 29. The presence of two viral variants within the same patient might be associated with nosocomial infection. However, the patient spent only one day in the hospital until the collection of the first swab, in which we have detected two SARS-CoV-2 strains. The probability of getting a positive PCR test result in the early days after infection is rather small [22] . Therefore, our data can be interpreted either as an infection with two strains before admission to the hospital or as a rapid increase of hospital strain viral load to a detectable level due to the patient's weakened immune system. The severity and rapid progression of the disease, along with unchangeably high viral load (Ct = 13), could be associated with either a change in the dominant strain of SARS-CoV-2 or the patient's elderly age [23] . The mutation rate of SARS-CoV-2 inside the same host is a critical parameter for understanding viral evolution because it may become more infectious or more virulent. Choi et al. [24] described the case of persistent infection (over 150 days) accompanied by accelerated evolution of SARS-CoV-2 in an immunocompromised patient. On days 18 and 25, sequencing of SARS-CoV-2 genomes obtained from the patient revealed five amino acid substitutions compared to the reference, but later their number grew to over 20. The largest number of mutations was detected in the Spike protein, especially in the receptor-binding domain. We have not observed the appearance of new mutations, but strain 1, discussed in this work, is closely related to the strain obtained from the immunocompromised patient at day 18 (both of All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint them possess mutations Spike_D614G, NS3_Q57H, NSP2_T85I, and NSP12_P323L, but differ in mutation NSP13_T115I present only in the strain obtained from the immunocompromised patient and NSP3_T1482I present only in the strain 1). Our results show a drastic change in both strains' abundance: the dominant strain from the first sample almost disappeared in the sample obtained over a week later. The strain dominating in the first sample belongs to GH clade, while the strain which prevailed in the second sample belongs to GR clade. Change of the dominant strain in the viral community can be explained by different relative fitness caused by advantageous or disadvantageous mutations that distinguish one strain from another. In our case, potentially disadvantageous mutations (present in strain 1, which decreased its abundance over time) include Q57H in NS3 protein, T85I in NSP2 protein, and T1482I in NSP3 protein. NS3_Q57H demarcates GH clade [25] . As of 11.11.2020, it occurs in 4.6 % (40 out of 874) of Russian SARS-CoV-2 sequences. Possible effects of this mutation were discussed in several papers. In a work by Gupta et al. [26] , who used protein modeling, its effect was predicted as deleterious. Alam et al. discussed that it prevents ion permeability by constricting the channel pore more tightly, possibly reducing viral release and immune response [27] . In work by Wang et al. [28] , the authors suggested that it can make the SARS-CoV-2 more infectious. The second mutation present only in strain 1, NSP2_T85I, is rare in Russia and occurs in 3.4% (30 out of 874) Russian SARS-CoV-2 genomes (as of 11.11.2020). This mutation also has predicted deleterious functional outcome [29] . Wang et al. [28] discussed that this mutation benefits from other mutations like Spike_D614G and NS3_Q57H and could strengthen infectivity. Finally, the mutation NSP3_T1482I is present in only 96 genomes submitted to GISAID (as of 11.11.2020), and it was never discussed in scientific literature to our knowledge. Potentially advantageous mutations (present in the dominant strain in the second sample) include R203K and G204R in N protein. These mutations occur together as a result of the substitution of three consecutive nucleotides. The presence of these mutations demarcates clade GR [25] . Over 85% of Russian SARS-CoV-2 genomes submitted to GISAID (11.11 .2020) belong to the GR clade and possess both of these mutations. According to different protein modeling approaches, these mutations either destabilize N protein [27] or have a neutral effect [26] . Several articles point out the association of this clade with higher mortality [30] or significant prevalence among the group of severe disease or deceased patients and also higher prevalence in females and children compared to other clades [31] . All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted November 30, 2020. ; https://doi.org/10.1101/2020.11.29.20238402 doi: medRxiv preprint Other mutations present in both strains are D614G in spike protein and P323L in NSP12. Spike_D614G is one of the most widely discussed mutations. Its presence defines G clade [25] , it increases infectivity [32] [33] [34] , mortality [35, 36] , alters viral fitness [37] , and, according to protein modeling, strengthens the folding stability of the spike protein [28] . It is present in 99.2 % of SARS-CoV-2 genomes obtained from Russia (as of 11.11.2020). NSP12_P323L is predicted to make the polymerase more rigid, which may increase the replication speed [28] and mutation rate [38, 39] . It is present in 97.9 % of Russian SARS-CoV-2 genomes (as of 11.11.2020). In our work, we present a case of dual SARS-CoV-2 infection, which allowed us to compare the relative fitness of two genetically distant strains. The effect of SARS-CoV-2 dual infection on viral load and the severity or duration of COVID-19 is currently unknown. It is possible that presence of two different SARS-CoV-2 strains was a factor which lead to rapid progression of the patient's disease and death. The importance of research aimed at cases similar to ours can hardly be overestimated because it allows insights into the molecular epidemiology of COVID-19 and can help detectpotentially advantageous mutations, which increase virulence and fitness of SARS-CoV-2. Our study shows the case of dual SARS-CoV-2 infection by two phylogenetically distant strains and the dynamic in the viral community. The strain that became dominant eight days after the first sample was collected belonged to the GR clade, which is most prevalent in Russia, which raises the question of whether strains from this clade have higher fitness compared to strains belonging to the other clades. We have also detected two potentially advantageous mutations (N_R203K and N_G204R, belonging to the strain which increased its abundance over time) and three potentially disadvantageous (NS3_Q57H, NSP2_T85I, and NSP3_T1482I, belonging to the strain which decreased its abundance over time). Dual HIV-1 infection associated with rapid disease progression Update on HIV-1 and HIV-2 Dual Infection Dual infection of novel influenza viruses A/H1N1 and A/H3N2 in a cluster of Cambodian patients. The American journal of tropical medicine and hygiene Dual infections of the central nervous system with Epstein-Barr virus Genomic evidence for reinfection with SARS-CoV-2: a case study Setting the criteria for SARS-CoV-2 reinfection-six possible cases Symptomatic SARS-CoV-2 reinfection by a phylogenetically distinct strain Infection with different strains of SARS-COV-2 in patients with COVID-19 Within-patient genetic diversity of SARS-CoV-2. bioRxiv SCV-2000bp: a primer panel for SARS-CoV-2 full-genome sequencing Trimmomatic: a flexible trimmer for Illumina sequence data Cutadapt removes adapter sequences from high-throughput sequencing reads Fast gapped-read alignment with Bowtie 2 The sequence alignment/map format and SAMtools The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data BCFtools/csq: haplotype-aware variant consequences BEDTools: a flexible suite of utilities for comparing genomic features MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era Improving the limit of detection for Sanger sequencing: a comparison of methodologies for KRAS variant detection Variation in false-negative rate of reverse transcriptase polymerase chain reactionbased SARS-CoV-2 tests by time since exposure Updated understanding of the outbreak of 2019 novel coronavirus (2019-nCoV) in Wuhan Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host Emergence of European and North American mutant variants of SARS-CoV-2 in Southeast Asia Non-synonymous mutations of SARS-CoV-2 leads epitope loss and segregates its variants Evolving Infection Paradox of SARS-CoV-2: Fitness Costs Virulence? Characterizing SARS-CoV-2 mutations in the United States Conserved Genomic Terminals of SARS-CoV-2 as Co-evolving Functional Elements and Potential Therapeutic Targets. bioRxiv Large-scale population analysis of SARS-CoV2 whole genome sequences reveals host-mediated viral evolution with emergence of mutations in the viral Spike protein associated with elevated mortality rates. medRxiv Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. bioRxiv Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate Could the D614 G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality? Spike mutation D614G alters SARS-CoV-2 fitness RdRp mutations are associated with SARS-CoV-2 genome evolution Specific mutations in SARS-CoV2 RNA dependent RNA polymerase and helicase alter protein structure, dynamics and thus function: Effect on viral RNA replication The authors would like to express their gratitude to the staff of the Molecular diagnostic