key: cord-0784127-ge6rqr7b authors: Nahid, Abdullah Al; Ghosh, Ajit title: Investigating the possible origin and transmission routes of SARS-CoV-2 genomes and variants of concern in Bangladesh date: 2021-05-27 journal: bioRxiv DOI: 10.1101/2021.05.24.444482 sha: 5584a7b6c8e98f6f47ec904115f47b1682bac90b doc_id: 784127 cord_uid: ge6rqr7b

The COVID-19 pandemic caused by the SARS-CoV-2 virus and its variants has ravaged most countries around the world, including Bangladesh. We analyzed publicly available genomic data to understand the current COVID-19 outbreak scenario as well as the evolutionary origin and transmission routes of SARS-CoV-2 isolates in Bangladesh. The early isolates as well as the recent B.1.1.7 and B.1.351 variants have already spread across the major divisional cities of Bangladesh. Samples from male COVID-19 patients were sequenced more often than samples from female patients in almost every age group, which may reflect the trend in infection rate. Phylogenetic analysis indicated that 13 countries (Italy, India, United Kingdom, Saudi Arabia, United Arab Emirates, Germany, Australia, New Zealand, South Africa, Democratic Republic of the Congo, United States, Russia, and Denmark) could be the possible origins of the SARS-CoV-2 isolates introduced into Bangladesh through regional and intercontinental travel. The recent B.1.1.7 variant could have been imported from 7 countries (United Kingdom, India, Nigeria, Spain, Ireland, Australia, and Indonesia), while South Africa and the United States are the most likely sources of the B.1.351 variant in Bangladesh. Based on these findings, public health strategies could be designed and implemented to reduce the local transmission of the virus.

studies indicate the B.1.351 variant is associated with a higher viral load, with a potential advantage of enhanced transmissibility or immune escape 14, 18-20. This variant may also amplify the risk of infection in people who have already been immunized 21.
As COVID-19 continues to wreak havoc due to the rapid evolution of SARS-CoV-2, Bangladesh has begun a nationwide inoculation drive to restrain the virus and has administered more than 9.6 million doses of the Oxford-AstraZeneca vaccine to date (https://github.com/owid/covid-19data/blob/master/public/data/vaccinations/country_data/Bangladesh.csv). However, daily COVID-19 cases and deaths in the country have recently risen steeply 6, and this rise correlates with the increasing detection of the B.1.351 variant circulating in the country 22 (https://www.icddrb.org/news-and-events/news?id=874). To identify such emerging variants and monitor viral evolution at the genomic level, scientists from Bangladesh have sequenced over 1,500 SARS-CoV-2 genomes and deposited the sequences in the Global Initiative on Sharing All Influenza Data (GISAID) database 23. Analyses of these genomic data may aid in understanding the genetic and evolutionary features of the virus 24, as well as in evaluating the outcome of various disease control strategies in Bangladesh, ranging from quarantine measures to local and international travel restrictions, aimed at reducing the transmission rate of evolving lineages. An epidemiological analysis of genome samples from Bangladesh revealed regional sources of the isolates, including variants, rather than country-specific origin information.

A total of 97,040 unique SARS-CoV-2 genome sequences and metadata were downloaded from the GISAID database 23 and subsequently divided into 4 subsets (G1 to G4). Sample metadata of all subsets were checked for data availability, and any sample with an incomplete collection date was excluded. Prior to phylogenetic analysis, G2, G3 and G4 subset samples were quality filtered, and genome sequences with ambiguous characters were omitted.
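The two exclusion rules described above (incomplete collection dates, sequences with ambiguous characters) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, the helper names `has_complete_date`, `is_unambiguous`, and `filter_samples`, and the choice to treat any character other than A, C, G, or T as ambiguous are all assumptions made for illustration. The text applies the ambiguity filter only to the G2, G3 and G4 subsets; the sketch applies both checks to a single toy list for brevity.

```python
import re

# Assumption: a "complete" collection date is a full YYYY-MM-DD string.
COMPLETE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Assumption: any base other than A, C, G, T counts as ambiguous.
UNAMBIGUOUS_BASES = set("ACGT")


def has_complete_date(collection_date: str) -> bool:
    """Return True only for full YYYY-MM-DD dates ('2020-05' is rejected)."""
    return COMPLETE_DATE.fullmatch(collection_date.strip()) is not None


def is_unambiguous(sequence: str) -> bool:
    """Return True if the sequence is non-empty and contains only A, C, G, T."""
    return len(sequence) > 0 and set(sequence.upper()) <= UNAMBIGUOUS_BASES


def filter_samples(samples):
    """Keep samples (dicts with 'id', 'date', 'seq') that pass both checks."""
    return [
        s for s in samples
        if has_complete_date(s["date"]) and is_unambiguous(s["seq"])
    ]


# Toy records (hypothetical, not GISAID data).
samples = [
    {"id": "s1", "date": "2020-05-14", "seq": "ACGTACGT"},
    {"id": "s2", "date": "2020-05", "seq": "ACGTACGT"},     # incomplete date
    {"id": "s3", "date": "2020-06-01", "seq": "ACGTNNGT"},  # ambiguous N
]
kept_ids = [s["id"] for s in filter_samples(samples)]
print(kept_ids)
```

In practice the tolerance for ambiguous bases is a design choice; a stricter or looser threshold than the one assumed here would change how many genomes survive filtering.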
The metadata of the G1 subset samples were analyzed with Python version 3.8.5 27 to investigate the Nextstrain clade distribution among divisions, as well as the gender and age-group distribution in Bangladeshi samples. G1 samples with no division data were labeled NA (Not Available), and samples with missing gender or age-group data were discarded before plotting with the ggplot2 version 3.3.3 28 and tidyverse version 1.3.0 29 packages of the R programming language version 3.6.3 30. Quality-filtered sequences of the G2 subset were aligned to the reference SARS-CoV-2 sequence (NC_045512.2) 31 using MAFFT version 7.475 32. The aligned genomes were then subsampled. All 612 genomes from Bangladesh within the G2 subset were assigned as focal samples during the subsampling process. In addition, 10 samples per month were chosen from 48 other countries and territories available in the subset based on genetic similarity to the focal samples.

To better understand the current COVID-19 situation in Bangladesh, the pattern of daily reported COVID-19 cases was analyzed. The highest peak, of nearly 8,000 confirmed cases per day, was reported in early April 2021 (Fig. 1A). The metadata of 1,009 complete human-host SARS-CoV-2 genomes from Bangladesh (Dataset S1) accessible via GISAID 23 were examined. The overall Nextstrain clade distribution across all the major divisional cities of Bangladesh (Fig. 1B) VOCs are now dispersed in Bangladesh. We also examined the gender and age-group distribution within the available samples and found that samples from male patients were sequenced more frequently than samples from female patients in almost every age group, the 10-19 group being an exception (Fig. 1C and Supplementary Table S4).
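The gender and age-group tally described above can be sketched in a few lines of Python. The following is an illustrative sketch only, not the authors' script: the field names `sex` and `age`, the labels "Male" and "Female", and the helper names `age_group` and `tally_sex_by_age` are assumptions, and the 10-year bins are inferred from the "10-19" group mentioned in the text.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Map an age in years to a 10-year bin label such as '10-19'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def tally_sex_by_age(records):
    """Count (age_group, sex) pairs, discarding records with missing sex or age."""
    counts = Counter()
    for rec in records:
        sex, age = rec.get("sex"), rec.get("age")
        if sex not in ("Male", "Female") or age is None:
            continue  # missing gender or age data is dropped, as in the text
        counts[(age_group(age), sex)] += 1
    return counts


# Toy records (hypothetical, not the G1 metadata).
records = [
    {"sex": "Male", "age": 34},
    {"sex": "Male", "age": 37},
    {"sex": "Female", "age": 31},
    {"sex": "Female", "age": 15},
    {"sex": None, "age": 52},      # missing sex: discarded
    {"sex": "Male", "age": None},  # missing age: discarded
]
counts = tally_sex_by_age(records)
print(counts[("30-39", "Male")], counts[("30-39", "Female")])
```

A table of such counts per age group is the kind of input a grouped bar chart (as in Fig. 1C) would be drawn from; note that it describes which samples were sequenced, not infection rates directly.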
An independent phylogenetic analysis was performed to trace the origin of the initial SARS-CoV-2 isolates of Bangladesh, using a curated collection of 86,926 SARS-CoV-2 genome sequences covering all regions and most countries/territories where the COVID-19 infection rate was high during the early stages of the pandemic. The generated phylogenetic tree, containing a total of 1,966 subsampled genomes, showed that all 612 high-coverage genome sequences from Bangladesh are scattered across the tree (Fig. 2).

To identify the origin of the B.1.351 variant isolates in Bangladesh, we analyzed 3,196 samples from 31 countries and territories, the bulk of which were B.1.351 variant. The resulting phylogenetic tree (Fig. 4)

As the global COVID-19 pandemic progresses, new SARS-CoV-2 variants that are highly pathogenic and possibly more transmissible than pre-existing variants continue to emerge due to mutational changes in the viral genome, causing a second wave of COVID-19 in several regions and countries around the world 35, 36. From the confirmed COVID-19 case data, a severe second wave is detected in Bangladesh from late March to early April 2021 (Fig. 1A) as Chattogram, Mymensingh, and Khulna (Fig. 1B). A continuous male-female disparity in sample sequencing was observed in almost every age group (Fig. 1C), with more male patient samples being sequenced than female patient samples. This is most likely attributable to the fact that male COVID-19 patients have a far higher infection and mortality rate than females, as several studies have reported similar gender-based differentiation of COVID-19 severity in Fig. 2-4), a total of 17 unique countries from all regions of the world, except South America, have collectively introduced initial SARS-CoV-2 isolates as well as the two analyzed VOCs into Bangladesh (Fig. 5).
Of these countries, initial

This study illustrates the contribution of both regional and intercontinental travel to the spread of SARS-CoV-2 VOCs in Bangladesh. Continuous genomic surveillance is crucial as the COVID-19 pandemic unfolds, in order to closely monitor the ever-changing circulation pattern of SARS-CoV-2 variants in Bangladesh, since these findings have major ramifications for mitigation and vaccination strategies. Given how quickly new variants can spread across the world through air travel without being detected for a long time, these findings are critical for administrative authorities and public health officials in implementing effective interventions and guidelines, including but not limited to travel restrictions and mandatory quarantine or lockdown measures, to prevent further proliferation of these variants in Bangladesh. Immunization efforts must also be continued and enforced to keep infection rates down.

The authors declare that all the data will be available without any restrictions.
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach
Characterization of accessory genes in coronavirus genomes
An interactive web-based dashboard to track COVID-19 in real time
Impact of Lockdown Measures and Meteorological Parameters on the COVID-19 Incidence and Mortality Rate in Bangladesh
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Structural and Functional Analysis of the D614G SARS-CoV-2 Spike B.1.1.7
Escape of SARS-CoV-2 501Y.V2 from neutralization by convalescent plasma
SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
Estimates of severity and transmissibility of novel SARS-CoV-2 variant 501Y.V2 in South Africa
Sensitivity of infectious SARS-CoV-2 B.1.1.7 and B.1.351 variants to neutralizing antibodies
COVID-19 rise in Bangladesh correlates with increasing detection of B.1.351 variant
Global initiative on sharing all influenza data - from vision to reality
Genome-wide in silico identification and characterization of Simple Sequence Repeats in diverse completed SARS-CoV-2 genomes
In silico comparative genomics of SARS-CoV-2 to determine the source and diversity of the pathogen in Bangladesh
Nextstrain: real-time tracking of pathogen evolution
Python 3 Reference Manual
Elegant Graphics for Data Analysis
Welcome to the tidyverse
R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing)
A new coronavirus associated with human respiratory disease in China
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era
Maximum-likelihood phylodynamic analysis
Second wave COVID-19 pandemics in Europe: a temporal playbook
The first and second waves of the COVID-19 pandemic in Africa: a cross-sectional study
Considering how biological sex impacts immune responses and COVID-19 outcomes
Insights into the first wave of the COVID-19 pandemic in Bangladesh: Lessons learned from a high-risk country
Sex differences in immune responses that underlie COVID-19 disease outcomes
Genetic analysis of SARS-CoV-2 isolates collected from Bangladesh: Insights into the origin, mutational spectrum and possible pathomechanism

The authors acknowledge the logistic support and laboratory facilities of the Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh. The authors declare that there is no competing interest. There was no funding for this study.