key: cord-0741691-z14rf85c
authors: Kanteh, Abdoulie; Manneh, Jarra; Jabang, Sona; Kujabi, Mariama A.; Sanyang, Bakary; Oboh, Mary A.; Bojang, Abdoulie; Jallow, Haruna S.; Nwakanma, Davis; Secka, Ousman; Roca, Anna; Amambua-Ngwa, Alfred; Antonio, Martin; Baldeh, Ignatius; Forrest, Karen; Samateh, Ahmadou Lamin; D’Alessandro, Umberto; Sesay, Abdul Karim
title: Origin of imported SARS-CoV-2 strains in The Gambia identified from Whole Genome Sequences
date: 2020-04-30
journal: bioRxiv
DOI: 10.1101/2020.04.30.070771
sha: e578a75e4576c315db6c2f0914a95c60498c5a7b
doc_id: 741691
cord_uid: z14rf85c

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA virus with high human transmissibility. This study generated whole genome data to determine the origin and pattern of transmission of SARS-CoV-2 from the first six cases tested in The Gambia. Total RNA from SARS-CoV-2 was extracted from inactivated nasopharyngeal-oropharyngeal swabs of six cases and converted to cDNA following the ARTIC COVID-19 sequencing protocol. Libraries were constructed with the NEBNext Ultra II DNA library prep kit for Illumina and the Oxford Nanopore Ligation sequencing kit and sequenced on Illumina MiSeq and Nanopore GridION, respectively. Sequencing reads were mapped to the Wuhan reference genome and compared to eleven other SARS-CoV-2 strains of Asian, European and American origins. A phylogenetic tree was constructed with the consensus genomes for local and non-African strains. Three of the Gambian strains had a European origin (UK and Spain), and two strains were of Asian origin (Japan). In The Gambia, Nanopore and Illumina sequencers were successfully used to identify the sources of SARS-CoV-2 infection in COVID-19 cases.

The emergence and re-emergence of pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pose a grave threat to human health [1].
The disease caused by SARS-CoV-2, first detected in Wuhan, China, in December 2019, has become a global pandemic [2] and is placing an unprecedented burden on health care systems and economies globally [3] [4] [5] [6]. Worldwide, the number of cases has been increasing exponentially [6], especially in Europe and America, with significant but variable case-fatality rates between continents. By 28 April 2020, there were more than 3.1 million confirmed SARS-CoV-2 cases and more than 200,000 deaths [7]. Nevertheless, the number of confirmed SARS-CoV-2 cases in sub-Saharan Africa is currently relatively low, possibly because international air traffic is much lower than in other continents and the number of imported cases is therefore low [9]. By 28 April 2020, The Gambia, a tourism hotspot, had reported a total of ten SARS-CoV-2 cases, including one death. While the travel history of index cases may suggest the origin of infection, phylogenetic analysis of the strains isolated from these cases and their contacts can provide a precise link between local transmission and other global populations. The first SARS-CoV-2 case was reported to be an acquired zoonotic infection [10, 11], followed by efficient and rapid human-to-human transmission from Wuhan, China, to other Asian countries and then to other continents [12] [13] [14]. The single-stranded positive-sense RNA genome of SARS-CoV-2 is closely related to those of the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) [10] [11] [12] [13] [14] [15]. These pathogens pose a significant risk to global health and modern-day life, hence the need for effective strategies to detect the sources of infections, outbreaks and transmission patterns in different geographical settings.
Phylogenetic analyses of global SARS-CoV-2 sequences provide insight into the relatedness of strains from different areas and suggest the transmission of four superclades [16] that cluster geographically into viral isolates from Asia (China), the US (two superclades) and Europe. The objective of this analysis was to provide genome data on six cases of SARS-CoV-2 in The Gambia, determine the source of these strains, establish a baseline for subsequent local transmission, and contribute genomic diversity data towards local and global vaccine design. The Oxford Nanopore GridION and Illumina MiSeq platforms were used to sequence the viral genomes from four cases confirmed as SARS-CoV-2 positive, one inconclusive case and one negative case by rRT-PCR.

We also analysed the genomes of samples classified as indeterminate and negative by RT-PCR (COVID-19 detection assay), each from a different case. For WGS, four SARS-CoV-2 confirmed cases, one indeterminate case and one negative case were processed (Table 1). In one of the confirmed cases, different isolates from samples collected up to 10 days apart were sequenced. Of the 6 cases sequenced, 4 were male and 2 were female; there was one death, two recoveries and two active cases.

Total RNA was purified from eleven samples (see Table 1) using the QIAamp viral RNA mini kit (Qiagen, 52906) following viral inactivation at the MRCG at LSHTM containment level 3 facility. The purified RNA samples were quantified using the Qubit RNA reagent kit on a Qubit 3.0 fluorometer (Invitrogen); concentrations ranged from 3 to 7 ng/μl. RNA integrity (RINe) was checked on the Agilent Tapestation 4200 (Figure 1), yielding a RINe range of 2.1-5. Two of the samples (day 0 and day 4) from Case A were depleted of ribosomal RNA using the RiboMinus transcriptome isolation kit from ThermoFisher and purified using RNA purification beads from Beckman Coulter. The purified rRNA-depleted samples were converted to cDNA as per the NEBNext Ultra II RNA library prep kit for Illumina (NEB, E7770L).
Total RNA from the rest of the samples was converted to cDNA according to the ARTIC amplicon sequencing protocol for SARS-CoV-2 [17]. The ARTIC primer schemes for SARS-CoV-2 (Version 2) [17] were used for the multiplex PCR. Two primer pools at 10 μM, each containing 98 primers, were used for the PCR amplification. The samples were subjected to 35 cycles of PCR. The purified products were visualised and quantified.

The purified cDNA from the depletion and the PCR products from the ARTIC protocol were normalised to 100 ng in a final volume of 25 µl with EB buffer (10 mM Tris-HCl) for Illumina library preparation using the NEBNext Ultra II DNA library prep kit for Illumina (New England Biolabs, UK; E7645). Following 7 cycles of PCR enrichment, the libraries were purified, quantified using the high sensitivity dsDNA Qubit kit and sized using D1000 ScreenTape on the Agilent Tapestation 4200 (amplicon size range 519-572 bp). Each sample was normalised to 10 nM before pooling. The pool was run at a final concentration of 10 pM on an Illumina MiSeq instrument using the MiSeq V3 reagent kit. The pool was denatured with sodium hydroxide according to Illumina's recommendations and spiked with 5% PhiX (PhiX control v3, Illumina catalogue FC-110-3001) before loading (Fig. 1).

Summary of the library preparation steps for the Illumina and Oxford Nanopore Sequencing Technology platforms. Library preparation took ~8 hours for the Nanopore workflow and ~10 hours for the Illumina workflow.

Nanopore sequencing library preparation was performed according to the manufacturer's instructions for the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies). Briefly, the cDNA samples were amplified using the ARTIC protocol and purified with 1X AMPure XP beads. Individual samples were then subjected to end repair and adapter ligation following the SQK-LSK109 protocol.
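The Illumina normalisation step described above (each library brought to 10 nM before pooling) rests on two pieces of arithmetic: converting a Qubit mass concentration to molarity, and a C1V1 = C2V2 dilution. The Python sketch below illustrates that arithmetic only; it is not the authors' procedure. The function names, the example library (5 ng/µl, 550 bp) and the 20 µl final volume are hypothetical, and the 660 g/mol per base pair figure is the usual approximation for double-stranded DNA.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def dsdna_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def dilution(stock_conc, target_conc, final_volume_ul):
    """Return (stock_ul, diluent_ul) from C1*V1 = C2*V2.

    Both concentrations must be given in the same unit.
    """
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_conc * final_volume_ul / stock_conc
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical library: 5 ng/µl with a mean fragment size of 550 bp,
# diluted to the 10 nM stated in the text in an assumed 20 µl volume.
molarity_nM = dsdna_nM(5.0, 550)
stock_ul, buffer_ul = dilution(molarity_nM, 10.0, 20.0)
print(f"{molarity_nM:.2f} nM -> {stock_ul:.2f} µl library + {buffer_ul:.2f} µl buffer")
```

The same `dilution` helper covers the later step from the nanomolar pool to the 10 pM loading concentration, provided both values are first expressed in the same unit.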
For each library, 20 ng was loaded on the Oxford Nanopore GridION on an individual R9.4.1 flow cell, and sequencing data were monitored in real time using RAMPART (v1.1.0). Although a minimum read depth of 30X for the SARS-CoV-2 genome was targeted, more than 100X coverage was generated on both platforms.

FASTQ files were subjected to various quality control checks and analysed following standard analysis pipelines (SARS-CoV-2 novel Coronavirus bioinformatics protocol; SAMTOOLS). For Nanopore data, sequencing reads were quality checked using MinIONQC [18], and only reads with a minimum Q score of 7 were included in subsequent analysis. Quality-checked reads were run through the What's in My Pot (WIMP) pipeline on the Oxford Nanopore EPI2ME platform to verify the number of reads characterised as SARS-CoV-2. The SARS-CoV-2 novel Coronavirus bioinformatics protocol developed by Nick Prank (v140603) was used to generate a multiple alignment of all the samples together with reference genomes available from around the globe (downloaded from RefSeq). These strains were selected based on the patients' travel history and the major geographical spread of the pandemic. Finally, we constructed a maximum likelihood phylogenetic tree using the general time reversible (GTR) model with IQTREE (v1.3.11.1). The Interactive Tree of Life (ITOL) (v5) was used to visualise and annotate the phylogenetic tree.

Whole genome sequencing data were generated from six confirmed cases on both sequencing platforms; the additional time points from cases D and E were sequenced only on the Nanopore GridION (Table 2). Two samples from the first case were sequenced on both platforms following ribosomal depletion; the results generated (not included) showed depletion of human sequences, with the majority of the reads mapping to bacterial sequences and only 0.03% of the Illumina reads mapping to the SARS-CoV-2 reference strain. The rRT-PCR and sequencing data generated are summarized in Table 2.
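The read-level quality filter (minimum Q score of 7) and the depth targets (30X minimum, more than 100X achieved) mentioned in the methods can be illustrated with a short Python sketch. This is not the MinIONQC or ARTIC implementation. It assumes Phred+33 encoded FASTQ quality strings, averages quality over error probabilities rather than raw scores, and takes the reference length to be 29,903 bp (the Wuhan-Hu-1 genome); all function names are hypothetical.

```python
import math

PHRED_OFFSET = 33          # assumption: Phred+33 FASTQ encoding
MIN_MEAN_Q = 7.0           # read-level threshold stated in the text
GENOME_LENGTH_BP = 29_903  # assumption: length of the Wuhan-Hu-1 reference


def mean_q(quality_string):
    """Mean Phred score of a read, averaged over per-base error probabilities."""
    if not quality_string:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(ch) - PHRED_OFFSET) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(reads, min_q=MIN_MEAN_Q):
    """Keep (name, sequence, quality) tuples whose mean Q is at least min_q."""
    return [read for read in reads if mean_q(read[2]) >= min_q]


def mean_depth(reads, genome_length=GENOME_LENGTH_BP):
    """Rough mean depth: total sequenced bases divided by genome length."""
    return sum(len(seq) for _, seq, _ in reads) / genome_length


# Toy reads: 'I' encodes Q40 and '#' encodes Q2, so only r1 passes the filter.
reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "####")]
passed = filter_reads(reads)
print([name for name, _, _ in passed], mean_depth(passed))
```

Dividing total bases by genome length gives only an average; amplicon-based protocols typically produce uneven coverage, so per-position depth from the alignment (for example with SAMTOOLS, which the text cites) is the more informative check against the 30X target.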
Although WGS data are still limited in sub-Saharan Africa, this approach has proven to be a highly sensitive, specific and confirmatory tool for SARS-CoV-2 detection. Hence, the use of second and third generation sequencing technologies coupled with bioinformatics is imperative in providing data for monitoring transmission dynamics. With the two sequencing platforms, we were able to rapidly generate sequencing data, within 20 hours of sample reception on the Nanopore platform and within 3 days on the Illumina platform. While Illumina sequencing may be more accurate in determining within-sample diversity, Nanopore data can help with understanding the linkage between SNPs within individual virions. The flexibility of the Nanopore platform in the number of samples per run, together with real-time data generation at a reasonable cost, makes it the most suitable for outbreaks. Therefore, with our optimised and ready-to-go workflow, we are set to generate data for tracking SARS-CoV-2 in The Gambia and other African countries within 24 hours of sample reception. This would go a long way towards providing knowledge on the molecular epidemiology of this disease, revealing the true burden of the disease in this setting (as seen in the resolution of the indeterminate cases), providing information for Africa-specific vaccine development and informing policy makers' decisions on strategic control measures.

We have demonstrated that the Nanopore platform with the flexibility of high-end

A novel coronavirus from patients with pneumonia in China
WHO 2020 Director-General's opening remarks at the media briefing on
IHME COVID-19 health service utilization forecasting team. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator days and deaths by US state in the next 4 months
McKinsey and company; COVID-19: Implications for business
World finance (2020) A world of hurt: how pandemics such as COVID-19 affect the global economy
COVID-19 and Italy: What next?
Insight into 2019 novel coronavirus -an updated intrim review and lessons from SARS-CoV and MERS-CoV
Mingxuan
COVID-19 pandemic in West Africa. The Lancet Global Health
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Zoonotic origins of human coronaviruses
The Extent of Transmission of Novel Coronavirus in Wuhan, China, 2020
Monitoring Transmissibility and Mortality of COVID-19 in Europe
First known person-to-person transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the USA
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Comparative genomics suggests limited variability and similar evolutionary patterns between major clades of SARS-Cov-2
(2020) nCov-2019 sequencing protocol. ARTIC coronavirus method development community
MinIONQC: fast and simple quality control for MinION sequencing data. Bioinformatics
Andrew Rambaut (2020) nCoV-2019 novel coronavirus bioinformatics protocol
FastQC: a quality control tool for high throughput sequence data. Available
Why are RNA virus mutation rates so damn high?
Moderate mutation rate in the SARS coronavirus genome and its implications available at

We acknowledge the use of the CLIMB server for the cloud-based analysis, the field sample collection by the teams at the Ministry of Health, Epidemiology Department, Thushan de Silva for helpful discussion on the ARTIC protocol and sequencing, Covid-19

The authors declare that they have no competing interests.

The Genomic Core facility at MRCG at LSHTM is the only certified service provider for the ONT GridION platform in Africa. The details of the methods used in the paper are available as a supplementary document. The data from the genomes sequenced in The Gambia were submitted to, and are available on, the Nextstrain website for real-time tracking of the pathogen's evolution.