key: cord-0805474-yzhi9v7g authors: Pattabiraman, C.; Prasad, P.; George, A. K.; Sreenivas, D.; Rasheed, R.; Reddy, N. V. K.; Desai, A.; Vasanthapuram, R. title: Importation, circulation, and emergence of variants of SARS-CoV-2 in the South Indian State of Karnataka date: 2021-03-20 journal: nan DOI: 10.1101/2021.03.17.21253810 sha: a4cb3fcf963d3fa76b0993f8e32c4537ce46f4c0 doc_id: 805474 cord_uid: yzhi9v7g As the pandemic of COVID-19 caused by the coronavirus SARS-CoV-2 continues, the selection of genomic variants which can influence how the pandemic progresses is of growing concern. Of particular concern, are those variants that carry mutations/amino acid changes conferring higher transmission, more severe disease, re-infection, and immune escape. These can broadly be classified as Variants of Concern (VOCs). VOCs have been reported from several parts of the world- UK (lineage B.1.1.7), South Africa (lineage B.1.351) and, Brazil (lineage P.1/B.1.1.28). The conditions that contribute to the emergence of VOCs are not well understood. International travel remains an important means of spread. To track importation, spread, and the emergence of variants locally; we sequenced whole genomes of SARS-CoV-2 from international travellers (n=75) entering Karnataka, a state in South India, between Dec 22, 2020- Jan 31, 2021, and from positive cases in the city of Bengaluru (n=108), between Nov 22, 2020- Jan 22, 2021. The resulting 176 SARS-CoV-2 genomes could be classified into 34 lineages, that were either imported (73/176) or circulating (103/176) in this time period. The lineage B.1.1.7 (a.k.a the UK variant) was the major lineage imported into the state (24/73, 32.9%), followed by B.1.36 (20/73, 27.4%) and B.1 (14/73, 19.2%). We identified B.1.36 (45/103; 43.7%), B.1 (26/103; 25.2%), B.1.1.74 (5/103; 4.9%) and B.1.468 (4/103; 3.9%) as the major variants circulating in Bengaluru city. A distinct clade within the B.1.36 lineage was associated with a local outbreak. 
Analysis of the complete genomes predicted multiple amino acid replacements in the Spike protein. In total, we identified nine amino acid changes (singly or in pairs) in the Receptor Binding Domain of the Spike protein. Of these, the amino acid replacement N440K was found in 37/65 (56.92%) sequences in the B.1.36 lineage. The E484K amino acid change, which is present in both VOCs B.1.351 and P.1/B.1.1.28, was found in a single circulating virus in the B.1.36 lineage. This study highlights the introduction of VOCs by travel and the local circulation of viruses with amino acid replacements in the Spike protein. These were spread across lineages, suggesting that multiple paths can lead to the emergence of VOCs. This, in turn, highlights the need to sequence and limit outbreaks of SARS-CoV-2 locally. Our data support the use of concentrated and continued genomic surveillance of SARS-CoV-2 to direct public health measures, suggest revisions to vaccines, and serve as an early warning system to prepare for a surge in COVID-19 cases.

The COVID-19 pandemic caused by the coronavirus SARS-CoV-2 has claimed millions of lives and has affected people living in all parts of the globe 1 . The evolution of the virus did not initially alarm public health specialists or those involved in vaccine development 2 . However, the emergence of variants with distinct biological properties, carrying one or more mutations that confer higher infectivity, increased transmission, severe disease, re-infection, and immune escape, is a cause for concern [3] [4] [5] [6] [7] [8] [9] . Such variants may influence the trend of the pandemic and are therefore broadly known as Variants of Concern (VOCs) [3] [4] [5] [6] [7] [8] . In India, the COVID-19 pandemic began with the importation of the virus in January 2020 10 . Only after 11 million cases and over 150,000 deaths did the numbers decline, signalling the end of the first wave of SARS-CoV-2 in the country 1, 10, 11 .
Like other countries, India started vaccination campaigns in January 2021, at about the same time that reports of VOCs were communicated from the UK, Brazil, and South Africa 3, 4, 6, 11 . The primary concern is that these VOCs may herald the second wave of SARS-CoV-2 in the country and/or undermine the vaccination drive. Genomic studies in India have shown that several lineages of SARS-CoV-2 have been introduced, have spread, and have fallen below the limit of detection since January 2020 12, 13, 22, [14] [15] [16] [17] [18] [19] [20] [21] . We have previously performed detailed genomic epidemiology of SARS-CoV-2 in the South Indian state of Karnataka, with a population of 64.1 million (Census 2014) 22 . We found multiple introductions of SARS-CoV-2 into the state, and at least seven distinct lineages were already circulating in the state by May 2020. Detailed analysis of the contact network of COVID-19 cases, undertaken to examine transmission within the state, emphasized the role of symptomatic individuals in spreading the virus 23 . These data have contributed to our understanding of how the virus enters, spreads, and evolves in a population. In the genomic epidemiology study, no particular lineages were associated with disease severity 22 . Studies of sequences from India, juxtaposed with sequences from all over the world, suggest that mutations associated with immune escape and re-infection are already circulating in the population 2,24-26 . Multiple lineages of SARS-CoV-2 have been reported from across the world and in India 12, 13, [15] [16] [17] [19] [20] [21] [22] 27 . There are two ancestral lineages of SARS-CoV-2 in the PANGO classification system, A and B 28 . While viruses of both lineages are circulating across the world, viruses of lineage B are more widespread and more numerous. The viruses responsible for the catastrophic outbreak in Italy in early 2020 carried the D614G amino acid change in the Spike protein and were classified into lineage B.1 28 .
This lineage is now the dominant lineage across the world. Several studies have now shown that viruses in this lineage transmit better, with increased infectivity in cell culture [29] [30] [31] [32] 2, 5, 7, 8 . The spread of these lineages, therefore, has global implications 5, 33 . Early data suggest that some variants may escape neutralization by both therapeutic antibodies and antibodies induced by previous infection and vaccination 8, 9, 34, 35 . This has implications for the efficacy of Spike sequence-based vaccines and suggests that re-infection is possible 7, 36 . Rapid sharing of genomic information enabled the global community to pick up cases of VOCs and implement relevant public health measures 3,4,6 . A concentrated, ongoing, local approach to genomic surveillance is critical for identifying variants and establishing epidemiological links with the trend of the outbreak 5, 7, 12, 22 . This has also proved critical for local outbreak management and has informed policy decisions across the world 5, 7, 37, 38 . It is in this context that we conducted genomic surveillance of COVID-19 positive international travellers to the South Indian state of Karnataka between Dec 22, 2020 and Jan 31, 2021 (n=75). We also sequenced SARS-CoV-2 (n=108, collected between Nov 22, 2020 and Jan 22, 2021) in Bengaluru city (Bengaluru Urban District) to identify and track locally circulating variants and potential VOCs.

is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

The Department of Neurovirology, at the National Institute of Mental Health and Neurosciences

Nucleic acid extraction was performed with an automated magnetic bead-based extraction method, using the Chemagic Viral DNA/RNA special H96 kit (PerkinElmer, CMG-1033-S) following the manufacturer's instructions. SARS-CoV-2 detection was done using ICMR-approved diagnostic kits.
A total of 197 RT-PCR positive samples fulfilling the following criteria were taken for whole genome sequencing: (i) Ct values less than 30 in the case of international travellers (n=75) and the local outbreak (n=14), or (ii) Ct values less than 25 for local cases (n=108). Whole genome sequencing was performed using the amplicon sequencing approach described in the ARTIC Network protocol, with the V3 primer set 39 . The resulting amplicons from 12-24 samples were barcoded using the native barcoding kits (NBD104/114, Oxford Nanopore Technologies (ONT)) and sequencing libraries were prepared using the ligation sequencing kit (SQK-LSK109, ONT). The barcoded library was loaded onto FLO-MIN-106 flow cells and sequenced on the MinION (ONT). A median of 0.12 million sequencing reads were acquired per sample, with a median coverage of 1737x (Supplementary Table 1). Analysis of sequencing reads was performed as described previously 22 . Briefly, sequences were basecalled and demultiplexed using guppy (v3.6). Amplicon sequencing primers were removed from the reads by trimming 25 bp from the read ends using BBDuk (v38.37). Reference-based assembly of the genomes was performed with Minimap2 (v2.17), using NC_045512 as the reference. A consensus genome was generated with a coverage cut-off of 10x and the 0% majority rule. This was then edited and aligned to the reference for annotation. Of the 183 samples from international travellers and local cases, 176 genomes (73/75 imported, 103/108 circulating) could be used for the determination of lineage using the PANGO web application (Pangolin v2.2.2, lineages version 2021-02-12) 28 .
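The consensus step described above (a 10x coverage cut-off combined with a majority rule) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `majority_consensus`, the per-position "pileup column" input, and the use of `N` for masked positions are assumptions made for illustration, and the "0% majority rule" is read here as "take the most frequent base, with no minimum frequency threshold".

```python
from collections import Counter


def majority_consensus(pileup_columns, min_depth=10, mask="N"):
    """Call a consensus sequence from per-position base observations.

    pileup_columns: list of strings, one per reference position, each
    holding the bases seen in the reads covering that position
    (hypothetical input format, chosen for illustration).
    min_depth: positions covered by fewer reads than this are masked.
    """
    consensus = []
    for bases in pileup_columns:
        if len(bases) < min_depth:
            # Too few reads to call this position confidently.
            consensus.append(mask)
        else:
            # Majority rule with no frequency threshold: most frequent base wins.
            most_common_base, _count = Counter(bases).most_common(1)[0]
            consensus.append(most_common_base)
    return "".join(consensus)


# Four toy positions: depth 12, depth 12 (7 A vs 5 G), depth 4, depth 10.
columns = ["A" * 12, "A" * 7 + "G" * 5, "C" * 4, "T" * 10]
print(majority_consensus(columns))  # AANT
```

The third position is masked because only four reads cover it, while the second is called as `A` because the majority base is accepted regardless of how narrow the majority is.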
Of the 176 genomes, 162 were complete (>92% at 1X and >85% at 10X) and were deposited into the GISAID Database 40 ; accession numbers are provided in Supplementary Table 2 . Complete sequences (162) were analysed for SNPs and amino acid replacements against the reference MN908947.3 (Wuhan-Hu-1) using the CoV-GLUE Web Application 41 . A total of 168 genomes, comprising the 162 described above and an additional 6 complete genomes from a local outbreak, were used for phylogenetic analysis with the reference NC_045512 as an outgroup. Multiple sequence alignment was performed using MUSCLE and a maximum likelihood tree was constructed using IQ-TREE 42, 43 . The GTR+F+I+G4 substitution model was found to be the best-fit model (of the 88 models tested) using the Bayesian Information Criterion. The consensus tree was constructed from 1000 bootstraps and bootstrap values over 70 were interpreted.

We sequenced SARS-CoV-2 genomes from 197 SARS-CoV-2 positive individuals, including international travellers (n=75), local cases (n=108), and a local outbreak (n=14). Lineage classification using the PANGO scheme was possible for 176 genomes, which were either imported (73/75) or circulating (103/108) (Fig 1 A, B), and for all 14 genomes from the local outbreak (Supplementary Table 3). The genomic surveillance for the local outbreak was carried out to identify the lineage or lineages responsible for the outbreak (Fig 1C). A total of 34 lineages were detected from the 176 genomes in this study. A complete list of lineages and their frequencies is provided in Supplementary Table 4 (Fig 1A). Of the 176 genomes from travellers and in circulation for which lineage classification was possible, 162 complete genomes (with coverage > 92% at 1X and > 85% at 10X) were used for the analysis of SNPs and amino acid replacements.
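The completeness criterion used above (more than 92% of the genome covered at 1X and more than 85% covered at 10X) is a breadth-of-coverage filter. A minimal sketch follows, assuming per-position read depths are available as a list of integers; the function names and the toy depth profiles are hypothetical and are not taken from the study.

```python
def breadth_of_coverage(depths, min_depth):
    """Fraction of positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def is_complete(depths, min_breadth_1x=0.92, min_breadth_10x=0.85):
    """Apply the >92% at 1X and >85% at 10X completeness thresholds."""
    return (
        breadth_of_coverage(depths, 1) > min_breadth_1x
        and breadth_of_coverage(depths, 10) > min_breadth_10x
    )


# Toy genomes of 100 positions each (illustrative numbers only).
well_covered = [100] * 90 + [5] * 5 + [0] * 5      # 95% at 1X, 90% at 10X
patchy = [100] * 80 + [5] * 15 + [0] * 5           # 95% at 1X, 80% at 10X

print(is_complete(well_covered))  # True
print(is_complete(patchy))        # False
```

Both thresholds must be met: the second toy genome has reads over 95% of positions but fails because too few positions reach 10X depth.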
A total of 968 SNPs (Supplementary Table 5 Table 9 ). We carried out further analysis of the amino acid replacements in the RBD of the Spike protein (Fig 2A, Supplementary Tables 7, 8) and mapped them on the Maximum-Likelihood tree (Fig 2B). We identified mutations leading to nine amino acid replacements in the RBD (Fig 2A). Of these, five (S477N, E484K, E484Q, S494L, S494P) were found in viruses circulating in Bengaluru, and the amino acid replacement V483A was from an imported case. The N501Y change was

Table 4 Table 9 ). The immune escape-associated amino acid change N440K has been reported from the states of Andhra Pradesh, Maharashtra, Telangana, and Karnataka, and is also associated with reinfection 24, 36, 44 . This change was found in 37/65 (56.92%) of the sequences clustering to B.1.36 (Supplementary Table 9).

An outbreak of SARS-CoV-2 occurred in Bengaluru in early Feb 2021, raising concerns about the spread of variants, the threat of a second wave, and a reduction in the efficacy of vaccines. This outbreak, in a college where students were returning from different states within India, was driven by related viruses belonging to the B.1.36 lineage (Fig 1C, Supplementary Table 3). Only one of the six sequences from the outbreak cluster had the mutation resulting in the N440K replacement in the Spike protein (Fig 2C, Supplementary Table 10). This supports the idea that mutations in the gene encoding the Spike protein may arise sporadically, or multiple times, in different clades.
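The replacement labels used above follow the convention reference residue, position, new residue (N440K means asparagine at Spike position 440 replaced by lysine). The sketch below parses such labels and checks whether the position falls in the RBD. The RBD boundaries of Spike residues 319 to 541 are a commonly used definition and are an assumption here, since the text does not state the boundaries it applied; the names `Replacement`, `parse_replacement`, and `in_rbd` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed RBD boundaries (Spike residue numbering); not stated in the text.
RBD_START, RBD_END = 319, 541

_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Replacement(NamedTuple):
    ref: str  # residue in the reference sequence
    pos: int  # 1-based position in the Spike protein
    alt: str  # residue observed in the sample


def parse_replacement(label):
    """Parse a label such as 'N440K' into its three components."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single amino acid replacement: {label!r}")
    return Replacement(match.group(1), int(match.group(2)), match.group(3))


def in_rbd(replacement):
    """True if the replaced position lies within the assumed RBD range."""
    return RBD_START <= replacement.pos <= RBD_END


labels = ["N440K", "S477N", "V483A", "E484K", "E484Q", "S494L", "S494P", "N501Y"]
for label in labels:
    print(label, in_rbd(parse_replacement(label)))

# The reported N440K frequency within B.1.36 can be checked the same way:
print(round(100 * 37 / 65, 2))  # 56.92
```

Under the assumed boundaries, every label listed above falls inside the RBD, consistent with how the text groups them.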
Apart from the introduction and spread of known VOCs, the emergence of variants locally is also a cause for concern. Early in the pandemic, a single mutation in the gene encoding the Spike protein of SARS-CoV-2, resulting in a D614G amino acid change, was found to increase infectivity and transmission 2, 29, 32 . Viruses with this amino acid replacement dominate across the globe 31, 45 . Mutations in the gene encoding the Spike protein are of particular concern due to the role of this protein and its Receptor Binding Domain (RBD) in viral binding and entry 46 . Some of these mutations have been shown to increase infectivity or affinity for the ACE-2 receptor, or to affect neutralization by antibodies in vitro. Viral genomes with these mutations were already circulating by mid-2020 2, 25, 26, 44, 47, 48 . In the sequences from this study, nine amino acid replacements were noted in the RBD of the Spike protein (Fig 2B, Supplementary Tables 7-8). They occurred singly or in pairs (N440K+F490S) (Fig 2). All nine amino acid changes, namely N440K, S477N, V483A, E484K/Q, F490S, S494L/P, and N501Y, are associated with immune escape 24, 25 . Viruses with some of these amino acid changes were already known to be circulating in other parts of India 16, 17, 24 . Mutations in the gene encoding the Spike protein that do not map to the RBD have also been described, particularly near the polybasic cleavage site at the S1/S2 boundary of the Spike protein. Towards the end of 2020, multiple lineages with amino acid replacements at position 677 were noted 49 . Four viruses in our study have mutations resulting in amino acid changes at this position (Q677H (n=3), Q677P (n=1)) (Supplementary Table 6).
phylogenetic relationships between SARS-CoV-2 sequences, especially in the context of outbreaks 50 . In summary, our data highlight an increase in the frequency of the lineage B.1.36 in Bengaluru Urban, in Karnataka, and importation events indicate an underappreciated global burden (Fig 1, Supplementary Table 4 ). Whether this increase is because of epidemiological linkages such as increased travel, continued local transmission chains or super-spreader events remains to be determined. It is beyond the scope of this work to examine whether the lineage, contributing mutations, and amino acid changes impact transmission/infectivity of the virus. Our data emphasize that a consolidated and local approach to genomic surveillance which includes sequencing of SARS-CoV-2 from travellers, circulating variants, and outbreaks, in a continuous manner is necessary to detect VOCs. Rapid identification of such variants can aid in preparing the healthcare system for a surge in cases, suggest revisions to vaccines and diagnostic tests, inform the international community, and guide public health measures.
An interactive web-based dashboard to track COVID-19 in real time
The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
Genomics and epidemiology of a novel SARS-CoV-2 lineage in Manaus
Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence
Escape of SARS-CoV-2 501Y.V2 variants from neutralization by convalescent plasma
Antibody Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7. Nature
Laboratory surveillance for SARS-CoV-2 in India: Performance of testing & descriptive epidemiology of detected COVID-19
Integrated genomic view of SARS-CoV-2 in India
SARS-CoV-2 genomics: An Indian perspective on sequencing viral variants
A Distinct Phylogenetic Cluster of Indian Severe Acute Respiratory Syndrome Coronavirus 2 Isolates
Insights from Genomes and Genetic Epidemiology of SARS-CoV-2 isolates from the state of Andhra Pradesh
Mutational landscape and dominant lineages in the SARS-CoV-2 infections in the state of Telangana
Full-genome sequences of the first two SARS-CoV-2 viruses from India
Initial insights into the genetic epidemiology of SARS-CoV-2 isolates from Kerala suggest local spread from limited introductions
Analysis of the potential impact of genomic variants in global SARS-CoV-2 genomes on molecular diagnostic assays
Genomic variations in SARS-CoV-2 genomes from Gujarat: Underlying role of variants in disease epidemiology
Genomic epidemiology reveals multiple introductions and spread of SARS-CoV-2 in the Indian state of Karnataka
Descriptive epidemiology of SARS-CoV-2 infection in Karnataka state, South India: Transmission dynamics of symptomatic vs. asymptomatic infections
Genetic epidemiology of variants associated with immune escape from global SARS-CoV-2 genomes
Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition
Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next generation sequencing
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Spike mutation D614G alters SARS-CoV-2 fitness
Functional importance of the D614G mutation in the SARS-CoV-2 spike protein
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
The COVID-Genomics UK (COG-UK) consortium, Network for Genomic Surveillance in South Africa (NGS-SA), Brazil-UK CADDE Genomic Network, Houriiyah MUGK. Tracking the international spread of SARS-CoV-2 lineages B
SARS-CoV-2 variants B.1.351 and B.1.1.248: Escape from therapeutic antibodies and antibodies induced by infection and vaccination
SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
Asymptomatic Reinfection in 2 Healthcare Workers From India With Genetically Distinct Severe Acute Respiratory Syndrome Coronavirus 2
Large scale sequencing of SARS-CoV-2 genomes from one region allows detailed epidemiology and enables local outbreak management
Sixteen novel lineages of SARS-CoV-2 in South Africa
Josh Quick 2020. nCoV-2019 sequencing protocol v3 (LoCost). Protocols
CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants
Analysis of RNA sequences of 3636 SARS-CoV-2 collected from 55 countries reveals selective sweep of one virus type
Structural basis of receptor recognition by SARS-CoV-2
Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies
Prospective mapping of viral mutations that escape antibodies used to treat
Emergence in late 2020 of multiple lineages of SARS-CoV-2 Spike protein variants affecting amino acid position 677
Stability of SARS-CoV-2 phylogenies

This work would not have been possible without the support of the Government of Karnataka, State Surveillance team for COVID-19, in particular Ms. Prameela Dinesh, Directorate of Health and Family Welfare Services, Government of Karnataka. We would like to thank all the labs and Primary Health Care centres that collected samples for testing and genomic surveillance. We would like to thank the COVID testing lab in NIMHANS. We would also like to acknowledge Prof.

Supplementary Table 1 - Summary of sequencing results
Supplementary Table 2 - GISAID Accession ID for sequences
Supplementary Table 3 - Details of sequenced samples
Supplementary Table 4

The authors declare that they do not have any financial or non-financial relationships that could present a conflict of interest.