key: cord-0698820-g9pwxfd2 authors: Muñoz, Marina; Patiño, Luz H.; Ballesteros, Nathalia; Paniz-Mondolfi, Alberto; Ramírez, Juan David title: Characterizing SARS-CoV-2 genome diversity circulating in South American countries: signatures of potentially emergent lineages? date: 2021-02-20 journal: Int J Infect Dis DOI: 10.1016/j.ijid.2021.02.073 sha: 5adfe79458623b8b71e1c1b3ccf370a81445a654 doc_id: 698820 cord_uid: g9pwxfd2 Objectives: To evaluate the genomic diversity and geographic distribution of SARS-CoV-2 lineages currently circulating in South America (SA). Methods: SARS-CoV-2 lineages reported from South American countries were analyzed from a public dataset of 5597 genome assemblies. Polymorphisms in the main open reading frames were identified and compared to those present in the main lineages of epidemiological concern: B.1.1.7 (from the UK) and B.1.351 (from South Africa). Results: A total of 169 circulating lineages were identified across 16 South American countries, revealing a predominant circulation of major lineage B, the one with the greatest diversity and broadest geographic distribution. Sixteen lineages were selected for comprehensive analysis because they are predominant in the region. Notably, two lineages of concern were identified: P.1, recognized as the Brazilian variant (94 genomes), and B.1.1.7, the UK variant (28 genomes). Both displayed 33 polymorphisms compared to the reference genome, the highest number among the analyzed genomes, whereas most other lineages circulating in SA displayed 24 or fewer polymorphisms compared to the reference strain. A high number of polymorphisms was detected in the evaluated lineages, with a limited number of variable positions in common among them, in agreement with the profiles identified in the main lineages of epidemiological concern. Conclusions: The ever-increasing genetic diversity of SARS-CoV-2 continues to lead to the emergence of novel lineages.
Different variants and lineages are now present across SA, most of them belonging to major lineage B. The circulation of variants such as P.1 and B.1.1.7 in different countries, as well as the elevated number of polymorphisms, highlights the importance of continued genomic surveillance to determine introduction events, identify transmission chains, trace emergence, and help implement successful prevention, vaccination and control strategies.

Genomic surveillance, along with real-time monitoring and data-sharing networks, has become a valuable combination of tools to improve understanding of SARS-CoV-2 transmission and epidemic dynamics in developed countries. Since the onset of the pandemic, multiple SARS-CoV-2 variants have arisen, including the emerging lineage B.1.1.7, initially described in the UK and now spreading globally. This lineage is of interest because of its estimated increased transmissibility (Rambaut et al., 2020). Despite substantial advances, implementation of genomic surveillance remains a challenge for most developing countries, where access to whole-genome sequencing is limited. This study aimed not only to characterize the genetic lineages circulating in South America (SA) but also to provide a deeper understanding of the geographical distribution pattern and potential diversification pathways of the virus across the region.

We analyzed a total of 5,583 publicly available high-quality genomes from SA hosted in the GISAID (Global Initiative on Sharing All Influenza Data) database (Hadfield et al., 2018), reported until February 8th, 2021 (Supplementary Table 1). The typing report based on the PANGOLIN lineage assigner nomenclature, available in the metadata file, was downloaded and analyzed using the previously reported scheme (Ramirez et al., 2020b). Results were graphically represented in Microreact (Argimon et al., 2016).
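The lineage tally described above can be illustrated with a minimal sketch. It is not the authors' scheme (Ramirez et al., 2020b), only one plausible way to count genomes per lineage and per country from a tab-separated metadata table. The column names (`country`, `lineage`) and the sample rows are hypothetical placeholders and do not reflect the actual GISAID export or the study data.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy stand-in for a metadata export. Column names and rows are hypothetical
# and are NOT taken from the study dataset.
SAMPLE_TSV = (
    "strain\tcountry\tlineage\n"
    "s1\tBrazil\tP.1\n"
    "s2\tBrazil\tB.1.1.28\n"
    "s3\tColombia\tB.1\n"
    "s4\tBrazil\tP.1\n"
    "s5\tPeru\tB.1.1.7\n"
)


def tally_lineages(handle, country_col="country", lineage_col="lineage"):
    """Count genomes per lineage, overall and within each country."""
    overall = Counter()
    by_country = defaultdict(Counter)
    for row in csv.DictReader(handle, delimiter="\t"):
        lineage = (row.get(lineage_col) or "").strip()
        country = (row.get(country_col) or "").strip()
        if not lineage or not country:
            continue  # skip records lacking a lineage or country
        overall[lineage] += 1
        by_country[country][lineage] += 1
    return overall, by_country


overall, by_country = tally_lineages(io.StringIO(SAMPLE_TSV))

# Report the number of distinct lineages and the per-country breakdown.
print(f"{len(overall)} distinct lineages")
for country in sorted(by_country):
    print(country, dict(by_country[country]))
```

With a real metadata file, the `io.StringIO` handle would be replaced by an opened file, and the column names adjusted to match that file's header.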
A set of lineages of interest was comprehensively analyzed, selected on the basis of a predominant distribution in SA countries (first reporting date identified and frequency within the total data for that lineage). Genetic polymorphisms across the main Open Reading Frames (ORFs) of the lineages of interest were identified from the alignment of all sequences available for each lineage, following the previously published methodology (Ramirez et al., 2020b), and compared to those of the variants of concern reported worldwide (https://cov-lineages.org/index.html).

The total of 5,583 publicly available whole-genome assemblies from 16 South American countries (Supplementary Table 1 has not been reported in the region at the analysis date. Interestingly, P.1, the third lineage of concern, was identified in the region and was part of the comprehensively analyzed dataset, with 74 genomes at the date of analysis. The phylogenetic reconstructions confirm the high diversity of major lineage B and its close relationship with all other major lineages in monophyletic clusters (Fig. 1C). Despite the predominance of these lineages of interest in certain

Identifying the diversity of SARS-CoV-2 is essential to monitor the dispersion dynamics of the pandemic in different regions of the world. A high diversity of lineages was found in the South American region, some of which were initially predominant in specific South American countries and have now spread to other countries. In addition, we highlight the first detection in Brazil and the rapid increase in the number of sequences of the P.1 lineage (a lineage of concern initially described in Manaus, Brazil, with suspected increased transmissibility), as well as the polymorphisms that the comprehensively analyzed lineages share with the variant of concern B.1.1.7 (identified in Brazil, Ecuador, Argentina, Perú and Trinidad and Tobago) across the main ORFs of the viral genome.
These findings provide insights into the existence of potentially emerging lineages across SA, which may result from the changing nature of the viral genome leading to the appearance of new variants, a frequent event in this type of virus (Lauring and Hodcroft, 2021). Although the potential impact of these genomic variants on infectivity and pathogenicity remains to be fully determined, it has been proposed that some of the polymorphisms found in the lineages circulating in the region may have functional significance, mainly those located in the spike gene (i.e. N501Y, E69/70, P681H, 144Y, A570D, E484K, K417N, K417T), which could affect transmission and, most concerning, immunological escape (Rambaut et al., 2020). The broad diversity of SARS-CoV-2 lineages circulating in SA could then exacerbate the impact of the pandemic in the region, affecting the reliability of molecular diagnosis schemes (Ramirez et al., 2021) and even reducing the efficacy of vaccines, because polymorphisms in the spike gene could promote immune evasion (Bouayad, 2020; McCarthy et al., 2021); however, these potential impacts are still under investigation. In general, the phylogenetic diversity of SARS-CoV-2 circulating in SA reflects a dynamic and fast-growing number of lineages in the region. However, the limited number of genomes available for the region in comparison with others prevents a precise view of the overall genetic landscape and reaffirms the need to strengthen genomic-based surveillance systems in the region. Future studies are needed to evaluate the impact of this lineage diversity on the reliability of molecular diagnosis and the success of vaccination strategies.

sites were compared with those found in the variants of concern B.1.1.7 and B.1.351 (bottom). Colors were assigned by major lineage according to the color code of Fig. 1.
The number inside the boxes indicates the proportion of genomes exhibiting the polymorphism compared to the reference strain NC045512-2-Wuhan-Hu-1. The absence of color represents a conserved position. The terms 'Del' and 'Ins' indicate a deletion and an insertion, respectively. The description of lineages was consulted at https://cov-lineages.org/lineages.html. * Lineage with a number of spike mutations with likely functional significance: E484K, K417T, and N501Y. Described in https://virological.org/t/genomic-characterisation-of-an-emergent-sarscov-2-lineage-in-manaus-preliminary-findings/586. ** Lineage of epidemiological concern associated with the N501Y mutation. More information can be found at cov-lineages.org/global_report.html. *** Defined as the new variant 501Y.V2. The description of this lineage is available as a preprint: https://www.medrxiv.org/content/10.1101/2020.12.21.20248640v1.

Microreact: visualizing and sharing data for genomic epidemiology and phylogeography
Innate immune evasion by SARS-CoV-2: Comparison with SARS-CoV
Nextstrain: real-time tracking of pathogen evolution
Genetic Variants of SARS-CoV-2-What Do They Mean?
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
The arrival and spread of SARS-CoV-2 in Colombia
Genetic Diversity Among SARS-CoV2 Strains in South America may Impact Performance of Molecular Detection
Will the emergent SARS 1.1.7 lineage affect molecular diagnosis of COVID-19

We thank the High Computing Cluster (CENTAURO) from Universidad del Rosario for their support during the analyses of the data.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.