key: cord-298321-8871aifz authors: Laamarti, Meriem; Kartti, Souad; Alouane, Tarek; Laamarti, Rokia; Allam, Loubna; Ouadghiri, Mouna; Chemao-Elfihri, M.W.; Smyej, Imane; Rahoui, Jalila; Benrahma, Houda; Diawara, Idrissa; Essabbar, Abdelomunim; Boumajdi, Nasma; Bendani, Houda; Bouricha, El Mehdi; Aanniz, Tarik; Elattar, Jalil; Hafidi, Naima EL; Jaoudi, Rachid EL; Sbabou, Laila; Nejjari, Chakib; Amzazi, Saaid; Mentag, Rachid; Belyamani, Lahcen; Ibrahimi, Azeddine title: Genetic analysis of SARS-CoV-2 strains collected from North Africa: viral origins and mutational spectrum date: 2020-07-01 journal: bioRxiv DOI: 10.1101/2020.06.30.181123 sha: doc_id: 298321 cord_uid: 8871aifz

In Morocco, two waves of SARS-CoV-2 infections have been recorded. The first began on March 2, 2020, with infections mostly imported from Europe, and the second was dominated by local infections. At the time of writing, the genetic diversity of Moroccan isolates of SARS-CoV-2 has not yet been reported. The present study aimed first to analyze the genomic variation of twenty-eight Moroccan strains of SARS-CoV-2 isolated from March 3, 2020 to May 15, 2020, then to compare their distribution with twelve other viral genomes from North Africa, and finally to identify their possible sources. Our findings revealed 61 mutations in the Moroccan genomes of SARS-CoV-2 compared to the reference sequence Wuhan-Hu-1/2019, of which 23 (37.7%) were present in two or more genomes. Focusing on non-synonymous mutations, 29 (47.54%) were distributed in five genes (ORF1ab, spike, membrane, nucleocapsid and ORF3a) with variable frequencies. The non-structural protein coding regions nsp3-Multi domain and nsp12-RdRp of the ORF1ab gene harbored the most mutations, with six each.
The comparison of genetic variants across forty North African strains revealed that two non-synonymous mutations, D614G (in spike) and Q57H (in ORF3a), were common to four countries (Morocco, Tunisia, Algeria and Egypt), with a prevalence of 92.5% (n = 37) and 42.5% (n = 17), respectively, of the total genomes. Phylogenetic analysis showed that the Moroccan and Tunisian SARS-CoV-2 strains were closely related to those from different origins (Asia, Europe, North and South America) and were distributed across distinct subclades. This could indicate different sources of infection, with no specific strain dominating yet in these countries. These results have the potential to lead to new comprehensive investigations combining genomic data, epidemiological information and the clinical characteristics of patients with SARS-CoV-2.

The new coronavirus 2019, also known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (1), is the causative agent of COVID-19, a new type of pneumonia that caused an epidemic in Wuhan, China, in late December 2019 and then spread to 215 countries around the world. In February 2020, COVID-19 emerged in North African countries, notably in Egypt, Tunisia, Algeria and Morocco (2, 3). The first case was reported in Egypt on February 14, followed by Algeria on February 25, then Morocco and Tunisia on the same day, March 2, 2020 (2, 3). Due to the rapid transmission of the virus across the 5 continents and the large number of confirmed cases, the World Health Organization (WHO) declared COVID-19 a global pandemic on March 11, 2020 (4). As of June 26, 2020, 9,473,214 confirmed cases and 484,249 deaths (5.11%) had been reported worldwide (5). It should be noted that mortality from SARS-CoV-2 differs considerably according to geographic region. The USA has the largest number of confirmed cases (2,367,064) and deaths (121,645) (5).
Meanwhile, South America and Europe were also hit hard, with 1,188,631 and 620,794 confirmed cases in Brazil and Russia, respectively, while the African region had the lowest number of cases, with 258,752 (5).

SARS-CoV-2 is a single-stranded positive-sense RNA virus, coding for four structural proteins (spike (S), envelope (E), membrane (M) and nucleocapsid (N)), 16 non-structural proteins (nsp1 to nsp16) and several accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, and ORF8) (6, 7). Protein S is responsible for binding to membrane receptors in host cells (ACE2) via its receptor-binding domain (RBD) and is therefore considered the most important target for candidate vaccines (8, 9, 10). It is known that the mutation rate of RNA viruses contributes to viral adaptation, creating a balance between the integrity of genetic information and the variability of the genome, thus allowing viruses to escape host immunity and develop drug resistance (11, 12). Our recent study (13), based on the analysis of 30,983 genomes of SARS-CoV-2 variants belonging to 80 countries, revealed that 5.67% of all mutations had a frequency greater than 1% across the sequences analyzed, suggesting that this virus is not yet adapted to its host.

From the viral RNA extracted from six clinical samples, the cDNA was synthesized using reverse transcriptase with random hexamers, then amplified for genome enrichment using Q5 Hot Start High-Fidelity DNA Polymerase (NEB) with a set of primers targeting regions of the SARS-CoV-2 genome designed by the ARTIC network. A set of 40 SARS-CoV-2 genomes, 28 from Morocco (including six sequenced in the present study), 7 from Tunisia, 3 from Algeria and 2 from Egypt, were downloaded from the GISAID database (http://www.gisaid.org/) (14) (Table 1).
The reads generated by the Oxford Nanopore MinION for the six isolates were mapped to the reference genome Wuhan-Hu-1/2019 using BWA-MEM v0.7.17-r1188 (15) with default parameters, while the data downloaded from the GISAID database were mapped using Minimap2 v2.12-r847 (16). The BAM files were sorted using SAMtools (17) and were subsequently used to call the genetic variants in variant call format (VCF) with BCFtools (17). The final call set of the 40 genomes was annotated, and the impact of the variants was predicted, using SnpEff v4.3t (18). We performed multiple sequence alignment using MUSCLE v3.8 (19) for the 28 Moroccan strains together with 229 SARS-CoV-2 genomes circulating in different geographical areas of the world (Africa, Asia, Europe, North and South America and Oceania) (Table S2). Maximum-likelihood trees were inferred with IQ-TREE v1.5.5 under the GTR model (20). Generated trees were visualised using

In order to identify the genetic variants of the Moroccan SARS-CoV-2 genomes, 28 genomes were studied, including six sequenced in the present study and twenty-two others available in the GISAID database (Table 1). Between 94.9% and 99.93% of the reads produced for the six genomes were mapped to the reference sequence Wuhan-Hu-1/2019 (Table S1). In all Moroccan SARS-CoV-2 genomes, the analysis of genetic variants revealed 61 mutations compared to the reference sequence (Fig 1), including 29 non-syn- (Fig 2A). These mutations were distributed across seven genes (ORF1ab, S, E, M, N, ORF3a and ORF8) with variable frequencies. As regards non-synonymous mutations (Fig 2B)

It is interesting to note that among the 58 non-synonymous mutations, 13 (22.41%) were recurrent in two or more genomes (Fig 2A). The most frequent was the D614G mutation (in S protein), with a prevalence of 92.5% (n = 37) among the 40 genomes included in this study; the second was Q57H (in ORF3a), with a prevalence of 42.5% (n = 17).
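The prevalence figures quoted here follow from simple counts over the 40 genomes. As a minimal sketch for checking them (the helper name `prevalence` is hypothetical and is not part of the study's pipeline; the carrier counts are the ones reported in the text):

```python
def prevalence(carriers: int, total: int) -> float:
    """Percentage of genomes carrying a mutation, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * carriers / total, 2)


# 28 Morocco + 7 Tunisia + 3 Algeria + 2 Egypt, as stated in the text
TOTAL_GENOMES = 28 + 7 + 3 + 2

# Carrier counts as reported in the text (not recomputed from sequence data)
reported_carriers = {"D614G (S)": 37, "Q57H (ORF3a)": 17}

for mutation, n in reported_carriers.items():
    print(f"{mutation}: {prevalence(n, TOTAL_GENOMES)}% (n = {n})")
```

The same helper applies to the share of recurrent non-synonymous mutations, for example `prevalence(13, 58)` for the 13 recurrent mutations among 58.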
These two mutations were observed in all four North African countries (Fig 2B). However, the eleven other mutations varied between these four countries. For example, T265I (in nsp2) was found in 25% of the genomes, including those of Moroccan, Algerian and Tunisian origin. Likewise, the T5020I mutation (in nsp12-RdRp) was found with a prevalence of 17.5% in genomes from Morocco and Tunisia. In addition, the K2798R mutation (in nsp4-transmembrane domain-2) was present in 10% of the genomes from Tunisia and Egypt.

The emergence and monitoring of genetic variants play a major role in guiding the therapeutic approach for the development of candidate vaccines in order to limit this SARS-CoV-2 pandemic (21). To date, the genetic diversity of SARS-CoV-2 strains from North Africa is poorly documented. In this study, we performed a genetic analysis of forty SARS-CoV-2 genomes from North Africa, including twenty-eight from Morocco (6 newly sequenced), seven from Tunisia, three from Algeria and two from Egypt, to provide new information on genetic diversity and transmission of SARS-CoV-2. Genetic diversity could potentially increase the fitness of the viral population and make it difficult to fight or, conversely, make the virus weaker, which could be correlated with a loss of virulence and a decrease in the number of critical cases (22). Compared to the reference sequence Wuhan-Hu-1/2019, strains from North Africa harbored 4 to 15 genetic variants, of which 1 to 11 result in amino acid changes. These results are consistent with the mutation rate previously reported in SARS-CoV-2 from different geographic areas (13, 23, 24, 25). In Morocco, Tunisia, Algeria and Egypt, five non-synonymous mutations were common to at least two countries. Among them, D614G (in S protein) and Q57H (in ORF3a) were observed in strains from all four countries.
The D614G mutation is proximal to the S1 cleavage domain of the spike glycoprotein (26) and has been of great interest due to its predominance on the six continents (27, 28). Alouane et al. (13) showed that this mutation appeared for the first time on January 24, 2020 in the Asian region (China); a week later it was also observed in Europe (Germany). The Q57H mutation was recorded at the end of February in Africa (Senegal), Europe (France and Belgium) and North America (USA and Canada). Likewise, our previous study (13) showed that D614G had no impact on the two-dimensional or three-dimensional structure of the spike glycoprotein. Of the other three non-synonymous mutations that are variable between the strains from the four countries, the T265I mutation (in nsp2) was

The ORF1ab polyprotein is known to be cleaved into 16 non-structural proteins (nsp1-nsp16) (6). We observed two domains rich in non-synonymous mutations. The first is the nsp3-Multi domain, owing to its large size compared to other non-structural proteins; it has previously been described as playing a different role in SARS-CoV-2 infection (29). Likewise, nsp12-RdRp displays the same number of non-synonymous mutations, although it is smaller and is considered a key element of the replication/transcription mechanism (30).
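The country-level sharing pattern discussed above can be tabulated directly. The sketch below is illustrative only: the dictionary restates the five shared non-synonymous mutations and the countries named in the text, and the function names `parse_substitution` and `shared_by` are hypothetical helpers, not part of the study's analysis.

```python
import re

# Countries in which each shared non-synonymous mutation was reported (from the text)
MUTATION_COUNTRIES = {
    "D614G": {"Morocco", "Tunisia", "Algeria", "Egypt"},   # spike
    "Q57H": {"Morocco", "Tunisia", "Algeria", "Egypt"},    # ORF3a
    "T265I": {"Morocco", "Algeria", "Tunisia"},            # nsp2
    "T5020I": {"Morocco", "Tunisia"},                      # nsp12-RdRp
    "K2798R": {"Tunisia", "Egypt"},                        # nsp4
}


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'D614G' into (reference residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single amino-acid substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def shared_by(min_countries: int) -> list[str]:
    """Names of mutations reported in at least `min_countries` countries, sorted."""
    return sorted(
        name
        for name, countries in MUTATION_COUNTRIES.items()
        if len(countries) >= min_countries
    )


print(shared_by(4))                 # mutations seen in all four countries
print(len(shared_by(2)))            # mutations common to at least two countries
print(parse_substitution("D614G"))  # reference residue, position, new residue
```

Keeping the countries as sets makes the "at least two countries" criterion a simple length check, and the same structure extends to further mutations if the per-country calls are available.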
1. A Novel Coronavirus from Patients with Pneumonia in China
2. COVID-19: Are Africa's diagnostic challenges blunting response effectiveness
3. Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study
4. WHO declares COVID-19 a pandemic
5. Coronavirus disease 2019 (COVID-19) Situation report
6. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China
7. Properties of Coronavirus and SARS-CoV-2
8. Subunit vaccines against emerging pathogenic human coronaviruses
9. Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine
10. The SARS-CoV-2 vaccine pipeline: an overview
11. Viruses at the edge of adaptation
12. RNA virus mutations and fitness for survival. Annual review of microbiology
13. Genomic diversity and hotspot mutations in 30,983 SARS-CoV-2 genomes: moving toward a universal vaccine for the "confined virus"
14. GISAID: Global initiative on sharing all influenza data-from vision to reality
15. Fast and accurate short read alignment with Burrows-Wheeler transform
16. Minimap2: pairwise alignment for nucleotide sequences
17. The sequence alignment/map format and SAMtools
18. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118
19. MUSCLE: multiple sequence alignment with high accuracy and high throughput
20. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
21. A review of SARS-CoV-2 and the ongoing clinical trials
22. Cross-species virus transmission and the emergence of new epidemic diseases
23. Large scale genomic analysis of 3067 SARS-CoV-2 genomes reveals a clonal geodistribution and a rich genetic variations of hotspots mutations
24. SARS-CoV2 Envelope Protein: Non-Synonymous Mutations and Its Consequences
25. Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2/HCoV-19) using whole genomic data
26. Prediction of the effectiveness of COVID-19 vaccine candidates
27. Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality?
28. Bioinformatic prediction of potential T cell epitopes for SARS-Cov-2
29. COVID-2019: the role of the nsp2 and nsp3 in its pathogenesis
30. Emerging SARS-CoV-2 Mutation Hot Spots Include a Novel RNA-dependent-RNA Polymerase Variant

Pasteur Institute (Morocco) Illumina-NextSEQ500 Morocco/15N EPI_ISL_458150 2020-05-15 ANOUAL laboratory (Morocco) AppliedBiosystems PGM Tunisia/COV1339 Tunisia/COV1663 Tunisia/COV0425 Tunisia/COV1482 Pasteur Institute (Algeria) Illumina-NextSEQ500

We sincerely thank the authors and laboratories around the world who have sequenced and shared the full genome data for SARS-CoV-2 in the GISAID database. All data authors can be contacted directly via www.gisaid.org.

This work was carried out under National Funding from the Moroccan Ministry of Higher Education and Scientific Research (Covid-19 Program) to AI. This work was also supported by a grant to AI from Institute of Cancer Research and the PPR-1 program to AI.

The authors declare that they have no competing interests.