key: cord-0982069-fqpajsa6
authors: Lamarca, A. P.; Almeida, L. G. P. d.; Francisco, R. d. S.; Lima, L. F. A.; Scortecci, K. C.; Perez, V. P.; Brustolini, O. J.; Sousa, E. S. S.; Secco, D. A.; Santos, A. M. G.; Albuquerque, G. R.; Mariano, A. P. M.; Maciel, B. M.; Gerber, A. L.; Guimaraes, A. P. d. C.; Nascimento, P. R.; Freire Neto, F. P.; Gadelha, S. R.; Porto, L. C.; Campana, E. H.; Jeronimo, S. M. B.; Vasconcelos, A. T. R. d.
title: Genomic surveillance of SARS-CoV-2 tracks early interstate transmission of P.1 lineage and diversification within P.2 clade in Brazil
date: 2021-03-25
journal: nan
DOI: 10.1101/2021.03.21.21253418
sha: 6d8ca4c6cbca4bdb0bc80f8bbb60bc75de25d599
doc_id: 982069
cord_uid: fqpajsa6

The sharp increase of COVID-19 cases in late 2020 has made Brazil the new epicenter of the ongoing SARS-CoV-2 pandemic. The novel SARS-CoV-2 lineages P.1 and P.2, first identified in Manaus and Rio de Janeiro respectively, have been associated with potentially higher transmission rates and escape from antibody neutralization. In this study, we performed whole-genome sequencing of 185 samples isolated from three of the five Brazilian regions, including Amazonas (North region), Rio Grande do Norte, Paraíba and Bahia (Northeast region), and Rio de Janeiro (Southeast region), aiming to identify SARS-CoV-2 mutations that could be involved in the surge of COVID-19 cases in Brazil. Here, we show a widespread dispersion of P.1 and P.2 across Brazilian regions. Except for Manaus, P.2 was the predominant lineage identified across the country. The P.2 lineage was estimated to have originated in February 2020 and has diverged into new clades. Interstate transmission of P.2 has been detected since March, but reached its peak in December 2020 and January 2021. Transmission of P.1 was also high in December. The origin of P.1 was inferred to have occurred in August 2020.
We also confirmed that the variant under investigation (VUI) NP13L, recently described in the southernmost region of Brazil, has spread across the Northeastern states. P.1, P.2 and NP13L are descended from the ancient B.1.1.28 strain, although the presence of the B.1.1.33 strain was also reported during the first phase of the pandemic in Brazil. Here, for the first time, we investigate the possible occurrence of a new variant of concern descending from B.1.1.33 that also carries the E484K mutation. Indeed, the recurrent report of many novel SARS-CoV-2 genetic variants in Brazil could be due to the absence of effective control measures, resulting in high SARS-CoV-2 transmission rates. Altogether, our findings provide a landscape of the critical state of SARS-CoV-2 across Brazil and confirm the need to sustain continuous sequencing of SARS-CoV-2 isolates worldwide in order to identify novel variants of concern early and to monitor vaccine effectiveness.
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted March 25, 2021; doi: https://doi.org/10.1101/2021.03.21.21253418 (medRxiv preprint).

One year after the first case of SARS-CoV-2 infection was identified in Brazil, the country is in a catastrophic situation, with 11 million cases of COVID-19 and 265,000 deaths (https://coronavirus.jhu.edu/map.html). The initially dominant lineages B.1.1.28 and B.1.1.33 [1] have been gradually replaced by the new variant of concern P.1 and the variant of interest P.2, both harboring harmful mutations [2-4]. P.2 was first reported in November 2020 in samples from the state of Rio de Janeiro and was estimated in previous works to have emerged in late July [3]. By December 2020, it was already prevalent in samples from Rio Grande do Sul, which borders Uruguay and Argentina [5]. P.1, on the other hand, was first detected in the city of Manaus, in the North region of Brazil, in early January 2021 [2, 4], with a proposed emergence between April and mid/late December. Both lineages evolved within the B.1.1.28 clade and carry the E484K mutation in the receptor-binding domain (RBD) of the Spike protein. In addition to E484K, P.1 also harbors the N501Y and K417T mutations in the RBD; N501Y is shared with the new variants of concern from the United Kingdom (B.1.1.7) and South Africa (B.1.351), and B.1.351 carries a substitution at the same K417 position (K417N). These three mutations are suggested to allow viral escape from the host's previous immune responses [6-8]. This hypothesis is supported by the explosive spread of P.1 cases in Manaus and by reports of reinfection involving both lineages.
During the first phase of the COVID-19 pandemic in Brazil, national long-distance travel from large urban centers in the Southeast region to the North and Northeast states was identified as the source of the explosion of cases across the country [1]. Since mid-November, there has been an expected and dramatic surge in new COVID-19 cases, supposedly caused by a reduction in social distancing levels due to end-of-the-year holidays and summer vacations (from December to February). This sharp increase in cases is attributed to the emergence of the P.1 lineage, which has already been reported in several cities in Brazil [9, 10]. Unfortunately, lineage pervasiveness and genomic diversity are still unknown or outdated in several Brazilian states. If the aforementioned mutations in P.1 and P.2 indeed promote escape from the host's immune response, this information is crucial for elaborating measures to slow nationwide and worldwide spread. Furthermore, there is also a need to investigate whether there is a relationship between the variants and clinical outcomes, epidemiological patterns, and response to vaccines and novel drugs. Monitoring of the P.1 lineage in Brazil is mostly carried out using targeted screening. Although this strategy is valuable for identifying the occurrence of a chosen variant, it cannot correctly evaluate the relative frequency of that variant in the screened population. Furthermore, the exclusive use of targeted sampling prevents monitoring the evolution of other lineages.
In this context, systematic random sequencing of SARS-CoV-2 samples was decisive, as our work shows, for evaluating the prevalence of P.1 and P.2 across Brazilian states and for monitoring the emergence of new variants of interest within known lineages. In order to reduce this information gap, we performed an epidemiological and genomic survey by sequencing 185 new SARS-CoV-2 genomes from three Brazilian regions, including the states of Amazonas (North region), Rio Grande do Norte, Paraíba and Bahia (all three in the Northeast region) and Rio de Janeiro (Southeast region). Samples originated from randomly selected RT-PCR-positive results and were obtained between December 2020 and February 2021.

Estadual de Santa Cruz - CAAE 39142720.5.0000.5526). All data were analyzed anonymously. cDNA synthesis and viral whole-genome amplification were carried out following the ARTIC Network protocol (https://artic.network/ncov-2019). Amplicon libraries were prepared using the Nextera DNA Flex kit (Illumina, USA). Sequencing was performed on a MiSeq System using the MiSeq Reagent Kit v3 (Illumina, USA). Bioinformatic analysis was performed using an in-house pipeline for NGS data pre-processing, variant calling, and genome assembly, as previously described [3, 5, 12]. The Wuhan-Hu-1 (NC_045512.2) sequence was used as the reference genome in our analysis.
The evolutionary position of the newly sequenced genomes was inferred using 1441 sequences from Brazil and 70 from other countries, obtained from the GISAID database on February 25th, 2021. The Brazilian background sequences were selected following a previously published protocol [13], with the following modifications: clustering aligned sequences at 0.99985 similarity with CD-HIT [14], keeping only the oldest record of each cluster, and removing restrictions by country. Global sequences were added by selecting the oldest occurrence of each lineage into which the newly sequenced samples were classified. The genome sequence of the Wuhan-Hu-1 sample (NC_045512.2) was then added as an outgroup. All sequence alignment steps were conducted using MAFFT [15]. We used IQ-TREE2 [16] to infer the phylogeny of the final alignment. Simultaneously, the substitution model was selected with ModelFinder [16], using the global sequences as a proxy for genomic diversity within the larger alignment. Clade support was estimated using 10,000 bootstrap replicates. To confirm the monophyly of the NP13L clade and of the new lineage proposed in this work, we also reconstructed their phylogenies with an expanded sampling that included all available sequences in GISAID sharing their characteristic mutations. We extracted the P.1 and P.2 clades from the resulting maximum likelihood phylogeny to infer divergence dates and spatial dispersion with BEAST v1.10.4 [17]. After evaluating the correlation between root-to-tip distances and sampling dates with TempEst [18] (Figure S1), we selected the strict clock model to date P.1 divergence and the uncorrelated lognormal clock for P.2 [19].
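The root-to-tip check mentioned above can be illustrated with a minimal sketch. It is not the TempEst implementation and does not search for a best-fitting root; it only fits an ordinary least-squares line of root-to-tip distance against sampling date, which is the core of that diagnostic. The function name `root_to_tip_regression` and the example dates and distances are hypothetical and were not taken from this study.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance (subs/site) on sampling date.

    dates     -- sampling dates as decimal years (e.g. 2020.92)
    distances -- root-to-tip genetic distances for the same tips
    Returns the slope (a rough clock-rate estimate), the intercept, R^2
    (strength of the temporal signal) and the x-intercept (a rough estimate
    of the date of the root), or None for the latter if the slope is not
    positive.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three (date, distance) pairs")
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    syy = sum((y - my) ** 2 for y in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    # The fitted line crosses zero distance at the inferred root date.
    tmrca = -intercept / slope if slope > 0 else None
    return {"rate": slope, "intercept": intercept,
            "r_squared": r_squared, "tmrca": tmrca}


# Hypothetical tips (decimal year, root-to-tip distance); illustration only.
example_dates = [2020.85, 2020.92, 2020.96, 2021.01, 2021.05, 2021.10]
example_distances = [0.00080, 0.00086, 0.00089, 0.00093, 0.00096, 0.00100]
fit = root_to_tip_regression(example_dates, example_distances)
print(f"rate={fit['rate']:.2e} subs/site/year, R^2={fit['r_squared']:.3f}")
```

In such a plot, a high R^2 indicates clock-like accumulation of substitutions, whereas a weak or scattered relationship points to rate variation among branches; how this diagnostic was weighed for each lineage is described in the text and Figure S1, not in this sketch.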
The models used in both analyses were the Cauchy relaxed random walk for geographic coordinates [20, 21] and the exponential growth coalescent tree prior. The MCMC was run for 10,000,000 steps, sampling every 10,000th step, with a burn-in of 10% of the posterior samples. We extracted ancestor coordinates using the SERAPHIM package [22] in R.

We identified 794 single-nucleotide variants (SNVs) across the 185 genomes sequenced, of which 49% were missense substitutions, 45% were synonymous and 6% were in non-coding regions of the genome (Figure S3). We also found three nonsense mutations, in ORF8 (n = 2) and ORF7a (n = 1), in genomes from Rio de Janeiro, Rio Grande do Norte and Paraíba. We observed an elevated accumulation of mutations toward the 3' end of the genome, mainly targeting ORF3 (subunits a, c and d), ORF9 (b and c), ORF8 and ORF7a (Appendix 1 Table 2). The nucleocapsid (N) protein and the S1 subunit of the Spike protein showed the highest accumulation among the structural proteins of the SARS-CoV-2 genome. We found 16 SNVs targeting the receptor-binding domain (RBD) in S1, of which eight were missense variants: K417T, N439K, L452R, S477R, E484K, N501Y, L518I and A522V.

The ongoing surge of SARS-CoV-2 in Brazil since the end of 2020 has turned the country into the epicenter of a very rapid spread of new variants [2, 4]. In the present work, we have conducted genomic surveillance of SARS-CoV-2 spread and evolution in historically undersampled regions of Brazil. Through the reconstruction of phylodynamics from P.1 and P.2
lineages, we have identified possible new variant lineages and past and current interstate transmission routes. We have also inferred that P.1, suggested to be the lineage causing a drastic resurgence in COVID-19 cases [24], originated around August. In contrast, phylogenetic analyses of P.2 indicate that the lineage originated in February 2020, when the virus was first reported in the country, and is evolving into differentiated clades. Our genomic surveillance has evaluated the frequency of lineages currently circulating in each sampled state. As expected, proximity to the state of Amazonas seems to be correlated with the pervasiveness of the P.1 lineage, as exemplified by the variation observed in Rio Grande do Norte, Paraíba and Rio de Janeiro. The relatively low frequency of P.1 and high frequency of P.2 in our sample from the south of the state of Bahia, a region distant from large airports, may shed light on a much more complex relation between travel and viral dynamics than guilt by association (i.e., mere vicinity). Indeed, previous works suggest that viral spread in smaller or distant cities may follow a first-come, first-served dynamic, with one lineage overtaking the population [25-27]. Beyond the cities of south Bahia, this can be seen in the Amazonas samples, all four of which belonged to the P.1 lineage. A significant advantage of low viral diversity is a decreased likelihood of recombination between lineages during coinfection [5].
These results reinforce the importance of both local and international travel restrictions as a preventive measure to slow the spread of the virus [28], measures still not enforced in Brazil and in many other countries. As an alternative, policies such as social distancing and early detection of more pathogenic variants could have curtailed the spread of P.1 and P.2 across states and unburdened the public health system [29-31]. Beyond inferred dispersal routes and spread, some lineages analyzed in this work require attention due to their evolutionary dynamics. First, we have observed that the P.2 lineage differentiated into several subclades between April and September of 2020 (Figure 3), all of them present in many states. The occurrence of P.2 subclades means, in practice, that epidemiological parameters such as transmission rate, lethality, and immune response escape may vary within the lineage, hindering its containment [32]. If uncontrolled, the expected evolutionary course is for these subclades to evolve into whole lineages with exclusive mutations. Secondly, we have confirmed the monophyly of lineage NP13L [5], first described in January in Brazil's southern regions. Since December, NP13L has been transmitted to the states of Paraíba and Bahia, possibly from Rio Grande do Sul. It has also been detected in England, Japan, and the Netherlands. Higher
Convergent mutations seem to play an essential role in the evolutionary dynamics of SARS-CoV-2. Intense selective pressure from the immune system during prolonged infections may promote intrahost variants with higher adaptive value [12, 35-39]. Previous studies have shown that both N501Y and E484K have independently emerged in patients with persistent infection [35, 40]. Indeed, all of the aforementioned convergent mutations are in some way associated with viral escape from the immune response: N439K has been shown to escape both polyclonal and monoclonal antibodies [41, 42]; E484K has been associated with escape from immunity elicited by both vaccines and previous infections [8, 23, 43-45]; and N501Y leads to increased binding specificity to the receptor and is associated with high transmissibility while also escaping the immune response [46, 47]. Altogether, the combination of these mutations raises the variant's fitness even further and increases the chance of the variant sequence becoming a new and dominant lineage [46]. Continuous monitoring of the convergent sequences described here is fundamental to follow their development and prevent their spread in a worst-case scenario. Implementing suitable genomic surveillance approaches through random sampling is a powerful way to monitor known and new variants across the country. It can guide the elaboration of efficient governmental policies that avoid the collapse of the national healthcare system, as is happening now. Targeted screening and random sampling are complementary methods, and both contribute to an adequate evaluation of the current pandemic status. Of note, the analyses conducted here are highly dependent on broad sequence sampling through both time and space, which requires both technical resources and the training of human resources.
Consequently, genomic surveillance is conducted by only a handful of laboratories, far fewer than needed to efficiently cover a continent-sized country such as Brazil. Scientific collaborations such as the one conducted here bypass regional barriers to monitor the advance of new and known lineages across states and foment an integrated

Country-wide genomic surveillance is a significant step toward better understanding the origin and spread of new lineages.

Distribution of single-nucleotide variants (SNVs) found in the 185 genomes sequenced in this study. Each vertical line represents the relative frequency of a variant in the total number of genomes sequenced and its target protein product. The receptor-binding domain (RBD), highlighted in red, shows the main mutations associated with the variant of concern P.1 and the variant of interest P.2. The density plot shows the accumulation of mutations across the SARS-CoV-2 genome.

Tables
Table S1. Sample information.
Evolution and epidemic spread of SARS-CoV-2 in Brazil
Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings
Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil
Phylogenetic relationship of SARS-CoV-2 sequences from Amazonas with emerging Brazilian variants harboring mutations E484K and N501Y in the Spike protein
Pervasive transmission of E484K and emergence of VUI-NP13L with evidence of SARS-CoV-2 co-infection events by two different lineages in Rio Grande do Sul, Brazil
Genomic evidence of a SARS-CoV-2 reinfection case with E484K spike mutation in Brazil
Spike E484K mutation in the first SARS-CoV-2 reinfection case confirmed in Brazil
Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition
Genomics and epidemiology of a novel SARS-CoV-2 lineage in Manaus, Brazil
CDC. CDC 2019-Novel Coronavirus (2019-nCoV) Real-Time RT-PCR Diagnostic Panel
Intra-host evolution during SARS-CoV-2 persistent infection
Multiple Introductions Followed by Ongoing Community Spread of SARS-CoV-2 at One of the Largest Metropolitan Areas of Northeast Brazil
CD-HIT: accelerated for clustering the next-generation sequencing data
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
ModelFinder: fast model selection for accurate phylogenetic estimates
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Relaxed phylogenetics and dating with confidence
Phylogeography takes a relaxed random walk in continuous space and time
Unifying the spatial epidemiology and molecular evolution of emerging epidemics
SERAPHIM: studying environmental rasters and phylogenetically informed movements
23. Voloch CM, da Silva F R, de Almeida LGP, Cardoso CC, Brustolini OJ, Gerber AL, et al. Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence
First Phylogenetic Analysis of Malian SARS-CoV-2 Sequences Provides Molecular Insights into the Genomic Diversity of the Sahel Region
Genomic Epidemiology of SARS-CoV-2 in Madrid, Spain, during the First Wave of the Pandemic: Fast Spread and Early Dominance by D614G Variants. Microorganisms
Tracking the introduction and spread of SARS-CoV-2 in coastal Kenya. bioRxiv. medRxiv
Variation in human mobility and its impact on the risk of future COVID-19 outbreaks in Taiwan
Insufficient social distancing may contribute to COVID-19 outbreak: The case of Ijuí city in Brazil
Impact of self-imposed prevention measures and short-term government-imposed social distancing on mitigating and delaying a COVID-19 epidemic: A modelling study
A model-based evaluation of the efficacy of COVID-19 social distancing, testing and hospital triage policies
Toward Understanding Molecular Bases for Biological Diversification of Human Coronaviruses: Present Status and Future Perspectives
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. bioRxiv. medRxiv
Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host
The emergence and ongoing convergent evolution of the N501Y lineages coincides with a major global shift in the SARS-CoV-2 selective landscape. bioRxiv. medRxiv
A Genomic Perspective on the Origin and Emergence of SARS-CoV-2
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
SARS-CoV-2 genomic analyses in cancer patients reveal elevated intrahost genetic diversity
SARS-CoV-2 evolution during treatment of chronic infection
Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity
Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants
mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants
Evolution of antibody immunity to SARS-CoV-2
Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies

We would like to thank all the authors and administrators of the GISAID database, whose work allowed this genomic epidemiology study to be properly conducted. A full list of acknowledgments is available in Table S3. A list acknowledging those from different institutions that participated in this study follows below: