key: cord-0827585-wvb7vnx6
authors: Abou-Hamdan, Mhamad; Hamze, Kassem; Sater, Ali Abdel; Akl, Haidar; El-zein, Nabil; Dandache, Israa; Abdel-sater, Fadi
title: Variant analysis of the first Lebanese SARS-CoV-2 isolates
date: 2020-10-20
journal: Genomics
DOI: 10.1016/j.ygeno.2020.10.021
sha: 9977672d25ae3a0aeb274a5940550babffed41e5
doc_id: 827585
cord_uid: wvb7vnx6

Recently, the first genome sequences for 11 SARS-CoV-2 isolates from Lebanon became available. Here, we report the detection of variants within the genomes of these strains. Pairwise alignment using blastx was performed between these sequences and the UniProtKB data for SARS-CoV-2 to identify amino acid variations. Variant analysis was performed using multiple bioinformatics tools. We report 18 mutations that have not been reported before. Among those, a frameshift (8651A>) in NSP4, a stop codon (6887A>T) in NSP3 and two missense mutations in spike S2 were found. In addition, we found 28 variants in ORF1ab alone. A previously reported variant, 23403A>G, in the spike protein S2 was seen in most isolates. Two other known mutations, 25563G>T in ORF3a and 14408C>T in ORF1ab, were detected in 6 and 8 of the 11 isolates, respectively. Our results may help to prognose forthcoming infections in this region.

Keywords: Severe acute respiratory syndrome coronavirus 2, novel mutations, Lebanese isolates, missense.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a newly identified β-coronavirus whose outbreak was declared a pandemic by the World Health Organization (WHO) on March 11, 2020. SARS-CoV-2 is an enveloped, positive-sense RNA virus that was found in humans and other mammals [1, 2]. Phylogenetic analysis of 160 complete SARS-CoV-2 genomes isolated from different countries identified three central variants (A, B, and C) based on variability in their amino acid sequences [3]. Although such analysis helped in tracing routes of infection, a larger number of cases still needs to be studied and compared [3]. The genome of SARS-CoV-2 comprises 11 genes with 11 open reading frames (ORFs). ORF1ab, after proteolytic cleavage, yields 16 nonstructural proteins (NSPs); S, E, M and N encode the structural proteins, namely spike, envelope, membrane and nucleocapsid; and ORF3a, ORF6, ORF7, ORF8 and ORF10 encode accessory proteins [1, 2]. The latter protein group is known to have high sequence variability [3]. ORF1ab makes up two thirds of the entire genome and produces two polyproteins, pp1a and pp1ab, which are cleaved into 16 nonstructural proteins (NSP1 to NSP16) [1]. NSP3 is a multifunctional protein that acts as a viral protease and is able to suppress interferon responses [1]. NSP12 is the RNA-dependent RNA polymerase (RdRp) of SARS-CoV-2, the major enzyme in replication and transcription of the viral genome [4]. NSP14, also named ExoN, is reported to have exonuclease activity with a proofreading function; it has been reported that any variant in this protein will make the genome of the affected SARS-CoV-2 strain prone to high mutational changes [4]. Moreover, NSP7 and NSP8 were shown to associate with NSP12, forming a stable supercomplex [4]. Together, these proteins ensure the transcription fidelity of the virus.

ORF2 encodes the spike protein (S) [2]. This glycoprotein is divided into two subunits: a globular S1 subunit that contains the receptor-binding domain (RBD), and an S2 subunit that contains the domain involved in fusion [1]. It was demonstrated that the spike protein mediates the entry of SARS-CoV-2 into human cells through receptors on the host cell membrane [3]. Mutations affecting the S protein can therefore perturb virus entry, and this glycoprotein has been the target of neutralizing-antibody investigations. Although structural proteins, being the most exposed to the host immune response, have drawn most of the attention of therapeutic research, nonstructural components also play a major role in virus-host interactions [2]. Among this group, ORF3a encodes an ion channel protein related to the inflammasome, inducing the activation of caspase 1 and the maturation of IL-1β [2]. It was recently described that SARS-CoV-2 induces a cytokine storm and pyroptosis of host cells, which in many cases result in severe symptoms leading to death [5].

The 2020 pandemic caused by SARS-CoV-2 still lacks approved countermeasures to control the infection. Despite worldwide data-collection efforts to control the spread, scientists are still trying to understand the genome variability and evolution of this virus. The purpose of this study is to investigate the variants found in SARS-CoV-2 isolates from Lebanon. Our data may provide new evidence for understanding the etiology of the variation in symptom exacerbation among populations.

Suspected cases in Lebanon between February and March 2020 were collected by the national reference hospital. Diagnosis of SARS-CoV-2 followed the guidelines of the Lebanese Ministry of Health. For this study, we obtained 11 complete genomic sequences of Lebanese SARS-CoV-2 isolates from GISAID's EpiCoV™ Database (https://www.gisaid.org/). Among them, eight sequences were obtained from patients with a travel history to SARS-CoV-2-affected countries, namely Egypt (1 sample), Iran (3 samples), United Kingdom (2 samples), France (1 sample) and Italy (1 sample), and the remaining three sequences were from local residents with no travel history for at least the three months preceding their infection. Although some of the isolates had low-quality sequence reads, characterized by a stretch of a single consecutive nucleotide, variant detection was possible. The accession numbers of the isolates used in this study are EPI_ISL_454420 and EPI_ISL_450508 to EPI_ISL_4505017. We used the Wuhan-Hu-1 virus sequence as the genome reference (NCBI GenBank, NC_045512) and the following sequences as the SARS-CoV-2 protein reference (New portal

Mutations specific to the Lebanese SARS-CoV-2 isolates were identified using multiple analyses. First, we identified amino acid mutations in the Lebanese isolates by extracting pairwise alignments to each reference protein (downloaded from Covid-19 UniProt) using blastx. Then, we analyzed sequence variations using the Clustal Omega tool, which performs multiple sequence alignment between the genomes of the Lebanese isolates and the reference genome (NC_045512). Sequence variation analysis was also performed using the ViPR analysis tool and CoVsurver, enabled by GISAID. These data were checked and validated carefully against the aligned sequences. Finally, we investigated the existence and global frequency of each mutation in previously reported worldwide SARS-CoV-2 data, by using

A total of 67 variants were found, with 40 unique missense variants, as shown in Tables 1 and 2. Of the 40 missense variants, 28 were found in ORF1ab, which is the longest ORF, occupying two thirds of the entire genome. ORF1ab is cleaved into many nonstructural proteins (NSP1 to NSP16). Among these NSPs, NSP3 and NSP4 had the highest number of variants in the analyzed samples. The most common variants were 23403A>G in the spike protein S2 and 14408C>T in ORF1ab, both in eight samples, and 25563G>T in ORF3a in six samples (Table 1).

The patient travelling from Egypt died; the deterioration of his clinical status correlated with a past medical history and old age. However, the sequence of this strain is 100% identical to that of the strain isolated from an elderly local patient who was released (EPI_ISL_450511). Finally, one of the two patients travelling from Iran, isolated on the 27th of February, had a sequence matching the reference strain in the spike protein (no mutations identified) but carried 7 mutations in ORF1ab. Based on our results, we expect that these sporadically described mutations will be detected in new Lebanese SARS-CoV-2 isolates and will most likely spread after the gradual return to normal life.

Comparative analysis of SARS-CoV-2 genome sequences from isolates worldwide has revealed new variants that could be involved in the varied exacerbation of symptoms in patients. Here we describe 40 unique missense variants from 11 sequenced SARS-CoV-2 isolates, identified in individuals who were in Lebanon when they presented to the hospital with SARS-CoV-2 symptoms. Even though the number of sequenced isolates is very low and only 3 of the 11 were from local residents, it was important to study the incidence and frequency of such a large number of mutations rarely seen in other countries. We observed 18 novel mutations in only 11 sequences, not previously described in the literature or in any database. Of those 18 mutations, sixteen missense mutations were found at the following positions: 6198C>A, 6281A>G/T, 6285C>A, 6887A>T and 7766A>C in NSP3; 8897A>T in NSP4; 10595T>C in NSP5; 12297A>T in NSP8; 14369G>T and 14993C>T in NSP12 (RdRp); 16301G>T in NSP13; 18670G>T and 19499A>C in NSP14; 22093G>T and 22425C>T in S; and 26428A>T in the E gene. This was expected since, after the rapid spread of the virus across countries, many reports were published describing mutation hotspots and correlating the results with the variable clinical condition of COVID-19 patients among populations [4].

Most of our newly described mutations belong to nonstructural proteins (NSPs). Among these, we identified five distinct mutations in ORF1ab affecting the NSP3 protein (Table 1). This protein was previously described to suppress the host interferon response and to interact with other proteins playing a role in viral replication. In one study assessing mutations in nonstructural viral proteins, the authors considered that the NSP3 protein might lose its stability upon specific mutational changes [6]. Moreover, we identified two extremely important novel mutations, a stop codon and a frameshift mutation, affecting NSP3 and NSP4 at positions 6887A>T and 8651/fs, respectively. The presence of such deleterious mutations could be explained by their arising after the entry of the virus into the human body [7]. It is well known that RNA viruses mutate at a very high rate, with mutations possibly appearing in a patient every day, as was reported for HIV infection [4]. Viruses usually use such mechanisms to adapt and survive antiviral therapies, but sometimes fail to regulate them, producing deleterious genome modifications [7]. Nevertheless, such types of mutation, frameshift and stop codon, were previously reported in essential genes, and alternative protein expression strategies were proposed, such as ribosomal frameshifting and shunting [8, 9]. Subsequent laboratory work is needed to confirm the biological activity of the mutated strain. In addition, the analyzed sequence must be representative of the viral population infecting the patient, and not selective, in case of a double population with a nonmutated strain complementing the mutated one. A double viral population was documented in a recent study, in which the authors reported a 29-nucleotide deletion in the gene coding for accessory protein ORF 9 that additionally eliminated ORFs 10 and 11. They reported detecting the co-existence of a non-deleted genome and the deleted one in the same sample from the same patient [10].

In our results, we detected a frequent missense mutation in 6 of our 11 isolates (25563G>T), and another deletion mutation in the gene of the accessory protein ORF3. ORF3a induces apoptosis and inflammatory responses in infected cells [11]. The ORF3a protein contains TNF receptor-associated factor (TRAF), ion channel, and caveolin binding domains. The Q57H mutation is located near these three domains and may affect inflammasome activation [11].

Many previously characterized mutations were also observed in our work. Among these, we identified mutations in the gene coding for the structural spike protein [8, 12]. We found a major mutation, reported 76294 times in the GISAID database, at position 23403A>G (D614G). Korber et al. have shown that this variant is associated with greater infectivity, with clinical evidence that it is associated with higher viral loads but does not appear to cause a more serious form of disease [13]. We also found two less frequently reported mutations at positions 21724G>T and 22021G>T, and two additional mutations that are novel based on our analysis, at positions 22093G>T and 22425C>T [12]. It is well known that mutations in the S gene, which codes for one of the structural proteins, can affect the attachment and transmission of the virus [3]. Among structural proteins, we also describe two mutations affecting the N gene at two positions, 28881G>A and 28883G>C, in two isolates from patients coming from France and the UK.

Of high interest among our variants was the co-existence of 3 mutations (23403A>G, 14408C>T and 25563G>T) in 8 of the 11 sequences (with only two strains lacking the 25563G>T mutation). This combination of changes simultaneously affected nonstructural, structural, and accessory proteins. Such co-occurrence was statistically demonstrated and correlated with the presence of the RdRp mutation at position 14408C>T [4]. When present, this mutation, presumably affecting the proofreading activity of RdRp, promotes the onset of other changes [4]. At the clinical level, we noticed that two patients of the same age, one travelling from Egypt and the other a local resident, had identical changes harboring the above association of mutations but had different prognoses. The patient coming from

All authors disclose no conflict of interest and no external funding was used for this work.
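As an illustration of the genome organization and the position-based variant notation used in this report (for example 23403A>G), the following minimal Python sketch maps a nucleotide substitution to a gene and a codon number. It is not the authors' pipeline, which relied on blastx, Clustal Omega, ViPR and CoVsurver. The coordinate table, the frameshift position and the function names `parse_variant` and `locate` are assumptions made for illustration; the coordinates are taken to follow the NC_045512 annotation and should be checked against GenBank before any real use.

```python
import re

# ASSUMPTION: 1-based, inclusive CDS coordinates on the NC_045512 reference.
# Only genes discussed in the text are listed; verify against GenBank.
GENES = {
    "ORF1ab": (266, 21555),
    "S": (21563, 25384),
    "ORF3a": (25393, 26220),
    "E": (26245, 26472),
    "N": (28274, 29533),
}

# ASSUMPTION: position where ORF1ab translation resumes after the -1
# ribosomal frameshift (this nucleotide is read twice).
ORF1AB_SLIP = 13468

VARIANT_RE = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def parse_variant(text):
    """Split a substitution such as '23403A>G' into (position, ref, alt)."""
    match = VARIANT_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def locate(position):
    """Return (gene, codon_number) for a genome position, or None if the
    position is outside the genes listed in GENES."""
    for gene, (start, end) in GENES.items():
        if not start <= position <= end:
            continue
        if gene == "ORF1ab" and position >= ORF1AB_SLIP:
            # Codons fully translated before the slip site, then continue
            # counting in the shifted frame starting at the slip position.
            codons_before_slip = (ORF1AB_SLIP - start + 1) // 3
            return gene, codons_before_slip + (position - ORF1AB_SLIP) // 3 + 1
        return gene, (position - start) // 3 + 1
    return None


if __name__ == "__main__":
    for variant in ("23403A>G", "14408C>T", "25563G>T", "28881G>A"):
        position, ref, alt = parse_variant(variant)
        print(variant, ref, alt, locate(position))
```

Under these assumed coordinates, 23403A>G falls in codon 614 of S and 25563G>T in codon 57 of ORF3a, which is consistent with the D614G and Q57H changes named in the text. The sketch only locates a position; deciding whether a change is missense, stop or silent would additionally require the reference sequence and a codon table.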
Table 1. Variants detected in coding sequences of 11 Lebanese SARS-CoV-2 isolates.

Table 2. Variant frequencies of Lebanese SARS-CoV-2 isolates.

[1] Molecular Evolution of Human Coronavirus Genomes
[2] The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19
[3] Coronavirus Spike Protein and Tropism Changes
[4] Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
[5] SARS-CoV-2 infection and overactivation of Nlrp3 inflammasome as a trigger of cytokine "storm" and risk factor for damage of hematopoietic stem cells
[6] Evolutionary analysis of SARS-CoV-2: how mutation of Non-Structural Protein 6 (NSP6) could affect viral autophagy
[7] Extremely High Mutation Rate of HIV-1 In Vivo
[8] Variant analysis of COVID-19 genomes
[9] Programmed −1 ribosomal frameshifting in the SARS coronavirus
[10] The large 386-nt deletion in SARS-associated coronavirus: evidence for quasispecies?
[11] Molecular conservation and differential mutation on ORF3a gene in Indian SARS-CoV2 genomes
[12] Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome. Osong Public Health Res Perspect
[13] Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus