key: cord-0321012-ckqi8m2e authors: Umair, M.; Ikram, A.; Salman, M.; Badar, N.; Haider, S. A.; Rehman, Z.; Ammar, M.; Rana, M. S.; Ali, Q. title: Detection and whole-genome sequencing of SARS-CoV-2 B.1.617.2 and B.1.351 variants of concern from Pakistan during the COVID-19 third wave date: 2021-07-15 journal: nan DOI: 10.1101/2021.07.14.21259909 sha: f454004d2353640cc74cfdcebab391a3a67e5cb7 doc_id: 321012 cord_uid: ckqi8m2e The viral lineages reflecting variants of concern have emerged worldwide and among them B.1.1.7 (Alpha), B.1351 (Beta) and B.1.617.2 (Delta) variants are the most significant ones and merit close monitoring. In Pakistan, very limited information is available on the circulation of these variants and only the alpha variant has been reported as the main circulating lineage. The objective of this study was to detect and explore the genomic diversity of B.1.351 and B.1.617.2 during the third wave in the indigenous population. During the current study, a total of 2274 samples were tested on real-time PCR for the presence of SARS-CoV-2 from May 14 to May 31, 2021, and among these, 17% were spike positive, whereas 83% of samples had the spike gene target failure (SGTF). From these spike positive samples, 22 samples were processed for whole-genome sequencing. Among VOCs, 45.5% (n=10) belonged to B.1.351 followed by B.1.617.2, 36% (n=8). The delta variant cases were reported mostly from Islamabad (n = 5; 63%) followed by Peshawar and Azad Kashmir (n = 1; 13% each). Beta variant cases originated from Islamabad (n=5; 56%), Peshawar (n=2; 22%), Azad Kashmir and Rawalpindi (n=1; 11% each). The mutation profile of delta variant isolates reported significant mutations, L452R, T478K, P681R, and D950N. The beta variant isolates reported characteristic mutations, D215G, K417N, E484K, N501Y, and A701V. Notably, a characteristic mutation, E484Q was also found in delta variant, B.1.617.2. Our current findings suggest detection of these VOCs from the local population and warrants large-scale genomic surveillance in the country. In addition, it provides timely information to the health authorities to take appropriate actions. The COVID-19 Pandemic is still one of the leading causes of infectious mortality worldwide in the advent of the emergence of novel SARS-CoV-2 variants. As of June 20, 2021, the global total of SARS-CoV-2-related infections has surpassed over 179.1 million, with 3.86 million deaths. In recent months, a diversification of SARS-CoV-2 has been observed globally as a result of evolution and adaptation processes. Some emerging mutations may confer a selective advantage to the virus, resulting in the selection of the "variants of concern" (VOCs) with significant epidemiological and pathogenic consequences [1, 2] . Four specific viral lineages reflecting variants of concern have emerged worldwide and warrant close monitoring: B.1.1.7 (alpha), B.1.351 (beta), P.1 (gamma), and B.1.617.2 (delta) [3] . Among them, alpha, beta, and delta variant has been recently the most important Variant of Concern which has contributed significantly to the upsurge of new waves worldwide. The beta variant, B.1.351, is reported in about 95 countries including Pakistan as of June 20, 2021 , and is characterized by seven different lineage defining variants in the spike protein region, with three significant mutations, N501Y, E484K and K417N at particular residue sites called receptor-binding domain (RBD) [4] . There is evidence that South African strains have increased transmissibility as a result of the N501Y substitution [5] due to the increased binding affinity of human angiotensin-converting enzyme 2 (ACE2) in a mouse model, as determined by deep mutation scanning [6, 7] . Additionally, the South African variant may provide an immune escape mechanism from antibodies due to the E484K mutation in the spike protein [8] . It was also recently identified as a variant capable of evading monoclonal and serum antibody responses [9, 10] . The variant B.1.617 originated in Maharashtra and has since spread throughout the state and to a number of other countries. The sub-lineage, B.1.617.1 is defined by the presence of constellation of mutations, L452R, P681R and E484Q in spike region, whereas B.1.617.2 is characterized by the spike mutations, L452R, P681R and T478K. The RBD mutations enhance infectivity due to presence of L452R and T478K by increasing the spike protein's affinity for the human ACE2 receptor [11, 12] . Both mutations reduce the binding affinity of monoclonal antibodies, thereby impairing their neutralizing ability. Additionally, structural analysis of mutations in the furin cleavage site, L452R, and E484Q, as well as P681R, revealed increased ACE2 binding and cleavage rate, resulting in increased transmissibility [13] . Keeping in view the emergence of . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 15, 2021. ; B.1.351 and B.1.617.2 variants of concern, we aimed to detect and explore the genomic diversity of these VOCs in Pakistan. Oropharyngeal samples (n = 2274) were collected from the COVID-19 suspected patients received at the National Institute of Health's Department of Virology in Islamabad. As the spike gene target failure (SGTF) was used as a proxy for the detection of the B.1.1.7 variant, on TaqPath TM kit (ThermoFisher Scientific, Waltham, US) [13] . This detection method can also be used the other way around for surveillance of lineages other than B.1.1.7. Based on this strategy, for whole-genome sequencing, we initially screened spike positive samples having low cycle threshold (Ct) values (< 27 of S gene) followed by selection based on their geographical location i.e., representing the whole country. The viral RNA was isolated from patient samples according to the standard protocol using a QIAamp Viral RNA Mini Kit (Qiagen, Germany). The cDNA synthesis and amplification were performed according to the Primal-Seq Nextera XT protocol (version 2) using SuperScript™ IV VILO™ Master Mix (Invitrogen, USA) and Q5® High-Fidelity 2X Master Mix (New England BioLabs, USA), with the ARTIC nCoV-2019 Panel (Integrated DNA Technologies, Inc, USA) [14] . The paired-end sequencing library (2x150 bp) was prepared from the generated amplicons using the Illumina DNA Prep Kit (Illumina, Inc, USA) by following the standard protocol. The prepared libraries were pooled and subjected to sequencing on Illumina platform, iSeq using sequencing reagent, iSeq 100 i1 Reagent v2 (300-cycle) (Illumina, Inc, USA) at Department of Virology, National Institute of Health, Islamabad, Pakistan. The FastQC tool (v0.11.9) was used to assess the read quality of sequenced files [16] . Trimmomatic (v0.39) [15] was employed to remove Illumina adapter sequences and low-quality . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint base calls with scores less than 30 to eliminate technical biases and artifacts. The filtered reads were assembled by aligning with the available reference genome of SARS-CoV-2 (Accession number: NC_045512.2) using the Burrows-Wheeler Aligner's (BWA, v0.7.17) default settings [16] . Picard Tools was used to remove PCR duplicates from aligned reads (v2.25.4) [17] . The variants were called and consensus sequences of all genomes were generated as per Centers for Diseases Control and Prevention (CDC, USA) guidelines [18] . The assembled genomes were classified into PANGO lineages using the Pangolin v3.1.5 and pangoLEARN model dated 15-06-2021 [19] . Nextstrain's standard protocol for analyzing SARS-CoV-2 genomes was used for the phylogenetic analysis. To begin, BLAST searches were conducted on the GISAID database against all SARS-CoV-2 genome sequences for each of the current study's beta and delta isolates. This resulted in a total of 78 sequences including the current study's sequences. The sequences were clustered using Augur, Nextstrain's phylodynamic pipeline [20] . Alignment of the sequences to the Wuhan reference genome was performed using MAFFT v7.470 [21] . The initial phylogenetic tree was constructed using IQTREE v1.6.12 [22] , which implements the Augur tree using the generalized time reversible (GTR) model and performs bootstrapping to ensure a high degree of confidence in the tree topology. Using the Wuhan reference genome (GISAID ID: EPI_ISL_406798), the raw tree was rooted. TreeTime v0.8.1 was used to further process the tree to generate a time-resolved phylogeny based on maximum likelihood [23] . The resulting tree was visualized using Auspice. Table 1 . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint To determine the origin of Pakistani isolates, we constructed a maximum likelihood phylogenetic tree using 79 SARS-CoV-2 virus full-length genomes. In the phylogenetic tree, eighteen viruses from the Pakistani population were dispersed across 2 SARS-CoV-2 clades, GH (Beta) and G (Delta) (Figure 4 ) depicting close similarities with isolates originating from Asia, Europe, and North America. The beta-variant infected patients (GISAID ID: EPI_ISL_2438665 and EPI_ISL_2438683) having a travel history of the Middle East, clustered mostly with USA, India, and England viral isolates ( Figure 5 ). The close similarities of one delta variant patient (travel history of Saudi Arabia) were observed mostly with Bangladesh followed by India and England viral isolates (GISAID ID: EPI_ISL_2434174). Also, another case (GISAID: EPI_ISL_2438666) with a Middle East travel history showed close association with Indian isolates (Figure 6 ). This result suggests the introduction of VOCs through the importation of cases from different countries. The COVID-19 pandemic has affected every continent, resulting in a global health crisis that has been aggravated by the recent emergence of different variants of the virus. Amongst the SARS-CoV-2 variants of concern, B.1.1.7 (Alpha) became the predominant lineage worldwide after the initial detection from the UK in September 2020. The first case of B.1.1.7 from Pakistan was reported in December 2020 [24] from 77 countries and linked to the recent surge in COVID-19 cases in regional countries such as Nepal, Bangladesh, and Russia with the reported prevalence of 73%, 50%, and 36% respectively [26] . Similarly, UK has witnessed a rise in cases due to the delta variant during the first three . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint weeks of June 2021 with health authorities fearing a possible third wave. The first case of B.1.617.2 from Pakistan was a male from Islamabad whose sample was collected in May 2021 and sent to NIH for laboratory testing. We found a high number of delta variant cases (n=8) from Islamabad followed by Azad Kashmir (n=1) and Rawalpindi (n=1) in the current study highlighting the hotspot regions of the country. To restrict these VOCs from community transmission, efficient screening and strict quarantine protocols for international flights need to be implemented. Pakistan has confronted a four-month-long (March-June 2021) third wave of COVID-19 with the highest positivity rate reaching 11.6% in April and has declined since then to 2.35% as of 20 June 2021 [28] . Pakistan has recently eased the lockdown restrictions i.e., opening of restaurants, tourism sector, educational institutes, outdoor marriage ceremonies starting from May 24, 2021 [29] . However, the circulation of delta variant (the most transmissible form of the virus) in the indigenous population provides an early warning to national health authorities to take timely decisions. This also highlights the need for enhanced genomic surveillance to track the virus spread and devise suitable interventions. All sequences generated in this study have been submitted to GISAID under the following . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint We declare no competing interest. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint Table 1 : Details of the patients infected with VOCs along with the amino acid mutations found in the study enrolled samples. Lab is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint : Phylogenetic distribution of beta and delta lineages in 78 SARS-CoV-2 viral genomes from around the world, including current study sequences from Pakistan with reference to the Wuhan reference genome (EPI ISL 402125). The maximum likelihood phylogenetic tree was constructed using Nextstrain's Augur tree pipeline, and IQ-TREE was run with the default parameters. The time-resolved phylogenetic tree with selected metadata information was constructed using TreeTime and visualized in Auspice. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 15, 2021. ; https://doi.org/10.1101/2021.07.14.21259909 doi: medRxiv preprint COVID-19 evolution during the pandemic-Implications of new SARS-CoV-2 variants on disease control and public health policies Genetics and genomics of SARS-CoV-2: A review of the literature with the special focus on genetic diversity and SARS-CoV-2 genome detection WHO, Tracking SARS-CoV-2 variants Molecular dynamic simulation reveals E484K mutation enhances spike RBD-ACE2 affinity and the combination of E484K, K417N and N501Y mutations (501Y. V2 variant) induces conformational change greater than N501Y mutant alone, potentially resulting in an escape mutant The N501Y spike substitution enhances SARS-CoV-2 transmission Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy SARS-CoV-2 501Y. V2 escapes neutralization by South African COVID-19 donor plasma. Nature medicine Complete mapping of mutations to the SARS-CoV-2 spike receptorbinding domain that escape antibody recognition. Cell host & microbe Comprehensive mapping of mutations to the SARS-CoV-2 receptorbinding domain that affect recognition by polyclonal human serum antibodies Complete map of SARS-CoV-2 RBD mutations that escape the monoclonal antibody LY-CoV555 and its cocktail with LY-CoV016 SARS-CoV-2 RBD in vitro evolution follows contagious mutation spread, yet generates an able infection inhibitor Convergent evolution of SARS-CoV-2 spike mutations, L452R, E484Q and P681R, in the second wave of COVID-19 in Maharashtra nCoV-2019 sequencing protocol v2 (GunIt) V. 2. Protocols. io FastQC: a quality control tool for high throughput sequence data Fast and accurate long-read alignment with Burrows-Wheeler transform Rapid, sensitive, full-genome sequencing of severe acute respiratory syndrome coronavirus 2 Pangolin: lineage assignment in an emerging pandemic as an epidemiological tool Augur: a bioinformatics toolkit for phylogenetic analyses of human pathogens MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular biology and evolution TreeTime: Maximum-likelihood phylodynamic analysis Importation of SARS CoV 2 variant B. 1.1. 7 in Pakistan Proliferation of SARS-CoV-2 B. 1.1. 7 Variant in Pakistan-A Short Surveillance Account. Frontiers in Public Health Karthik Gangavarapu, Laura D. Hughes, and the Center for Viral Systems Biology. outbreak.info Effect of bamlanivimab as monotherapy or in combination with etesevimab on viral load in patients with mild to moderate COVID-19: a randomized clinical trial Ministry of Health Services & Regulation NCOC allows staggered reopening of educational institutes, outdoor dining from