key: cord-0003165-1zjrunzc
authors: Faye, Martin; Faye, Oumar; Diagne, Moussa Moise; Fall, Gamou; Weidmann, Manfred; Sembene, Mbacke; Sall, Amadou Alpha; Faye, Ousmane
title: Full-Genome Characterization and Genetic Evolution of West African Isolates of Bagaza Virus
date: 2018-04-13
journal: Viruses
DOI: 10.3390/v10040193
sha: 2769723b9b497b45c8178f8f68dc48a5fed06db6
doc_id: 3165
cord_uid: 1zjrunzc

Bagaza virus is a mosquito-borne flavivirus, first isolated in 1966 in Central African Republic. It has currently been identified in mosquito pools collected in the field in West and Central Africa. Emergence in wild birds in Europe and serological evidence in encephalitis patients in India raise questions on its genetic evolution and the diversity of isolates circulating in Africa. To better understand genetic diversity and evolution of Bagaza virus, we describe the full-genome characterization of 11 West African isolates, sampled from 1988 to 2014. Parameters such as genetic distances, N-glycosylation patterns, recombination events, selective pressures, and its codon adaptation to human genes are assessed. Our study is noteworthy for the observation of N-glycosylation and recombination in Bagaza virus and provides insight into its Indian origin from the 13th century. Interestingly, evidence of Bagaza virus codon adaptation to human house-keeping genes is also observed to be higher than those of other flaviviruses well known in human infections. Genetic variations on genome of West African Bagaza virus could play an important role in generating diversity and may promote Bagaza virus adaptation to other vertebrates and become an important threat in human health.

Bagaza virus (BAGV) belongs to the Flaviridae family, Flavivirus genus and Ntaya serological group. BAGV is a mosquito-transmitted virus, which was first isolated in the Bagaza district of the Central African Republic (CAR), in 1966, from a pool of mixed-species female Culex spp. mosquitoes during entomological investigations [1] . As is characteristic of flaviviruses, BAGV possesses a linear single-stranded, positive-sense RNA genome [2] . The BAGV genome is 10,941 nucleotides in length, encoding a single polyprotein (3426 amino acids) from which 11 viral proteins are derived, and flanked by 5 and 3 untranslated regions (UTRs) of 94 and 556 nt, respectively [3] .

BAGV has been isolated repeatedly with a high titer from different species of mosquitoes in Central and West African countries [4] [5] [6] , and in India, where serological investigations suggested subclinical infections in humans [7, 8] . Despite this widespread circulation of BAGV, outbreaks involving humans or animals have not yet been reported from these countries. Subsequently, in September 2010, BAGV was associated with a high mortality among game birds (Partridges and pheasants) in southern Spain, the first detection of BAGV in Europe and the first isolation from a vertebrate host [9, 10] . However, it is not surprising that BAGV infects birds since it has been shown to be synonymous with Israel turkey 

All 11 virus strains analyzed in this study were derived from the WHO Collaborating Center (http://apps.who.int/whocc/Detail.aspx?cc_ref=SEN-5&cc_code=sen) for arboviruses and viral hemorrhagic viruses in Senegal at Institut Pasteur de Dakar (Dakar, Senegal) ( Table 2 ). Viral stocks were prepared by inoculating viral strains into Aedes albopictus (C6/36) cell line in Leibovitz 15 (L-15) growth medium Viruses 2018, 10, 193 3 of 26 (GibcoBRL, Grand Island, NY, USA) supplemented with 5% fetal bovine serum (FBS) (GibcoBRL, Grand Island, NY, USA), 10% Tryptose Phosphate and antibiotics (Sigma, Gmbh, Germany). BAGV infection was confirmed after 4 days of propagation by immunofluorescence assay (IFA) using specific hyper-immune mouse ascitic fluid, as previously described [15] . Cultures supernatants were collected for virus RNA isolation. Extraction of viral RNA from supernatants was performed with the QIAamp viral RNA mini kit (Qiagen, Heiden, Germany) according to manufacturer's instructions. Extracted RNA was frozen at −80 • C prior to downstream applications. 

Real-time RT-PCR (Reverse Transcriptase-Polymerase Chain Reaction) was performed using the Quantitect ® Probe RT-PCR Kit (Qiagen, Heiden, Germany) in a final volume of 25 µL following previously established protocols and primers [16] . Reverse-transcription was performed using the AMV kit (Promega, Madison, WI, USA) following manufacturer's instructions and cDNA were stored at −20 • C. The polymerase chain reaction with each primer set was carried out in a final volume of 50 µL using the GoTaq ® DNA polymerase kit (Promega, Madison, WI, USA) according to manufacturer's instructions. Briefly, 5 µL (around 10 µg) of cDNA was added to 45 µL of a RT-PCR mix containing 25 mM MgCl2, 10 mM of dNTP, 5X reaction buffer, 5 U Gotaq polymerase, 16.5 µL of nuclease-free water and 40 pmols of each primer (Sense and Antisense). PCR was carried using the following conditions: an initial incubation at 95 • C for 5 min, followed by 40 cycles of 95 • C for 1 min, 1 min at melting temperature of primers, and 72 • C according to the length of PCR product and 72 • C during 10 min. Subsequently, 5 µL of each PCR product was analyzed by gel electrophoresis on 1% agarose gels stained with ethidium bromide to check the size of amplified fragments by comparison to a DNA molecular weight marker (HyperLadder™ 1 kb, Bioline, Taunton, MA, USA). The DNA bands from the PCR amplification were purified (QIAquick Gel Extraction Kit, Qiagen, Heiden, Germany) and sequenced from both ends for each positive sample (Beckmann Coulter, High Wycombe, UK). Sequencing of the 5 and 3 termini of the viral genome was performed using a 5 RACE kit (Invitrogen, Carlsbad, CA, USA) and a 3 RACE kit (Roche, Basel, Switzerland) following the manufacturer's protocols. Additional sequences representing strains from Central African Republic, India, strain related to Spanish wild bird's outbreak in 2010, the ITV and the Ntaya virus were obtained from GenBank, with the following accession numbers, respectively: AY632545, EU684972, HQ644143, KC734549 and NC_018705.

Full-length genome sequences BAGV isolates were obtained by assembling overlapping nucleotide sequences using the Unipro UGENE software (http://ugene.net/download.html) [17] . Multiple alignments of full-genome sequences were carried out by using Muscle algorithm (http://www.drive5.com/muscle/) [18] within Unipro UGENE software. Based on these alignments, we investigated the genetic properties of these different isolates circulating in West Africa, such as genome length and location of main conserved amino acid motifs previously described in mosquito-borne flaviviruses (MBFVs) with sometimes mutations which include no physicochemical properties changes [3] . Comparatively, conservation of these motifs was also assessed in Culex flavivirus (CxFV) and Aedes flavivirus (AeFV) (insect-specific flaviviruses; (ISFs)) and in Modoc virus (ModV) and Rio Bravo virus (RBV) (Vertebrate-specific flaviviruses, also known as no known vector flaviviruses (NKVFs)). We also searched for evidence of informative amino acid sites among BAGV sequences using the DIVEIN web server (https://indra.mullins.microbiol.washington.edu/DIVEIN/) [19] . The genetic divergence between previously available BAGV complete genomes and new characterized sequences was also assessed at the nucleotide and protein levels.

Prediction of N-glycosylated sites on the genome of BAGV were performed by submitting complete polyproteins on online version of NetNGlyc 1.0 Server (http://www.cbs.dtu.dk/services/ NetNGlyc/). N-linked glycosylation is a post-translational event whereby carbohydrates are added to Asparagines, which occur in the consensus sequence Asn-Xaa-Ser/Thr, where Xaa is any amino acid except proline. "Potential" scores of predicted N-glycosylated sites across the protein chain from N-to C-terminal were illustrated using the default threshold of 0.5 and the "jury agreement" indicates how many of the nine networks support the prediction [20] .

The RNAz method [21] implemented in the Vienna RNA Websuite (http://rna.tbi.univie.ac.at/) [22] was used to detect thermodynamically stable and evolutionarily conserved structural RNA domains on complete non-coding regions of the 11 West African BAGV isolates characterized in this study and the isolates from Spain and CAR, because complete non-coding sequences are not currently available for the isolate from India. The RNAz method use an algorithm which testing a large set of well-known conserved structural RNA domains and reports a "RNA classification probability" or p-value as a measure of thermodynamic stability. Structural RNA domains with p > 0.5 are classified as stable [21] . Furthermore, the optimal secondary structures were predicted with a minimum free energy using the RNAalifold method [23] implemented also in the Vienna RNA Websuite that use a dynamic programming algorithm with RNA parameters as previously described [24] . Furthermore, previously described organization of conserved sequences (CS) [3] was analyzed on predicted secondary structures of the 3 UTR, considering possible repetitions of these CS. Thus, a conserved sequence was considered as imperfect when it presented three or more differences with corresponding consensus sequence previously described [3] , marked by a deletion, an insertion, or a substitution.

A Bayesian phylogenetic analysis for estimation of data quality and selection of the best-fit nucleotide substitution model were performed using Mega 6.0 (https://www.megasoftware.net/) with a discrete Gamma distribution (+G) with 5 rate categories. Thus, a total of 24 different nucleotide substitution models were tested and model with the lowest BIC score (Bayesian Information Criterion) was considered to describe the best substitution pattern. Further parameters as AICc value (Akaike Information Criterion, corrected) and Maximum Likelihood value (lnL) are also estimated [25] . A maximum likelihood tree was then constructed with complete polyprotein sequences from insect-specific flaviviruses, no known vector flaviviruses, tick-born flaviviruses, mosquito-borne flaviviruses, the 11 ORFs from new characterized West African BAGV isolates and BAGV sequences previously available from Spain (HQ644143-4, KR108244-6), India (EU684972) and CAR (AY632545). Tree was inferred using FastTree v2.1.7 (http://www.microbesonline.org/fasttree/) [26] , where nucleotide substitution was modeled using General Time-Reversible with a proportion of invariant sites (GTR+I). Nodes were labeled with local support values, which were computed with the Shimodaira-Hasegawa test (SH-like) for 5000 replications. Topology was visualized by FigTree v.1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).

To prevent potential biases during phylogenetic inference due to recombination, all polyprotein sequences were analyzed using seven methods (RDP, GENECONV, MaxChi, BootScan, Chimaera, SiScan, and 3Seq) implemented in the Recombination Detection Program (RDP4beta 4.8) to uncover evidence for recombination events [27] . The disentangle recombination signals option was "on" and the linear sequence setting was used. The remaining settings were kept at their default values. Only events with p-values < 1 × 10 −6 that were detected by four or more methods were considered to represent strong evidence for recombination using 100 permutations and the Bonferroni correction [28] implemented in the RDP4 program to prevent false positive results. A chi-square test was used to determine if the sequence identity between a recombinant isolate and a given parent was significantly different both inside and outside the recombinant region. In addition, a BootScan analysis including the recombinant and the parental strains determined above was also performed to confirm these putative recombination events. The occurrence of recombination in BAGV genomes was also investigated with a method called Genetic Algorithms for Recombination Detection (GARD) implemented in Datamonkey web server (http://datamonkey.org) [29] , that estimates breakpoints based on a genetic algorithm. The statistical significance of putative breakpoints was evaluated through Kishino-Hasegawa (HK) tests; breakpoints were considered significant if their p value was <0.05. Separate Neighbor-Joining (NJ) trees were constructed for identified putative recombinant region and non-recombinant alignment partitions dictated by the breakpoint locations. Phylogenetic trees were inferred using the percentage of 1000 bootstrap replications under the appropriate model of nucleotide substitution.

Recombination can mislead inference of positive natural selection if it is not properly accounted for. If recombination was identified, these potential recombinant sequences were excluded from further analyses to avoid inferential biases [30] . The non-synonymous/synonymous rate ratio (dN/dS) is a widely used method to detect positive selection. The statistical test dN/dS permitted to distinguish diversifying or positive selection (dN/dS > 1) from negative or purifying selection (dN/dS < 1). Positive selection is inferred when the rate of non-synonymous (dN) substitutions is higher than that of synonymous (dS) substitutions (dN > dS). Episodes of positive selection in each gene of BAGV were analyzed using methods of estimation among individual sites and internal sites on branches of the phylogenetic tree. For this, a total of 9 alignment partitions were performed corresponding to C, prM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5 proteins. As site model, we used the single-likelihood ancestor counting (SLAC) that estimated the difference between non-synonymous (dN) and synonymous (dS) rates per codon site at 0.1 significance level. The fast unconstrained Bayesian approximation (FUBAR) method which evaluated episodic positive selection at each site in the alignment at posterior probability ≥0.9 was also used [31] . The mixed effects model of evolution (MEME) was also conducted at a 0.1 significance level for estimation selective pressure changes among codon sites. Finally, branch-site random effects likelihood (Branch-site REL) analysis was used to evaluate evidence of diversifying selection on specific branches in the phylogenetic tree at a proportion of sites, considering p-values less than 0.05 as significant. All four methods were conducted with HyPhy package implemented in Datamonkey web server [29] . An episode of positive diversifying selection in concern of a region was considered if it was detected by at least two different methods.

The evolutionary analysis was performed using a strict clock GMRF Bayesian Skyride coalescent tree prior [32, 33] . The GTR substitution model was used with 4 gamma rate categories. The Bayesian Markov Chain Monte Carlo (MCMC) algorithms using BEAST v1.8.4 (http://beast.community/) [34] were employed to estimate the rate of BAGV evolution from first isolation to 2014. MCMC analyses were run for 100 million generations, sampling every 100 thousand to ensure convergence of estimates. Population size (ESS) above 200 was assessed using the analysis program Tracer v1.6 (http://beast.bio.ed.ac.uk/Tracer). The posterior distribution of trees obtained from the BEAST analysis was also used to obtain the Bayesian maximum clade credibility (MCC) tree for these sequences generated by TreeAnnotator v.2.3.2 (http://beast.community/treeannotator) (from 100 million) after removing 10% of the runs burn-in and visualized by FigTree v.1.4.2.

The Codon Adaptation Index (CAI) is a measure of the synonymous codon usage bias making comparisons of codon usage preferences in different organisms and assessing the adaptation of viral genes to their hosts [35, 36] . CAI was applied in many recent studies involving humans and RNA viruses [37] [38] [39] . To know if there is evidence of BAGV adaptation for codon usage in humans, the CAI was calculated for each isolate. To calculate normalized CAI, full-length polyprotein sequences of West African BAGV isolates and previously available BAGV sequences from Spain were compared to that of human using CAIcal v1.4 program (http://genomes.urv.es/CAIcal/) [40] . First, we obtained a "raw" CAI (rCAI) and then, the CAI was normalized by the "expected neutral CAI" (eCAI) value based on 1000 random viral sequences using similar length, codon composition, GC-content and human amino acid usage. Indeed, a table for human codon usage containing the entirety of human coding genes is publicly available [41] . Based on this table, we created a new table where only the 3804 identified human housekeeping genes were considered [42] . Normalized CAI threshold was obtained by calculating rCAI/eCAI values and a value above '1' is higher than neutral and considered as evidence of codon adaptation to the reference set of codon preferences [40] . CAI values obtained for BAGV were then compared to those of others MBFVs well known to infect humans such as Dengue virus (DENV), Usutu virus (USUV), WNV, Zika virus (ZIKV) and Yellow fever virus (YFV), NKV flaviviruses (ModV and RBV) and ISFs (CxFV and AeFV), using the non-parametric Wilcoxon test with R program. A p-value less than 0.05 was considered as significant. Sequences of tobacco mosaic virus (TMV) were compared to human codons and used as negative control to provide an example that results for codon adaptation to human house-keeping genes are robust and not false positives or anomalies. As there are no known cases of human infection, or evidence of human adaptation for TMV, we expected all sequences to have a lower CAI threshold than the calculated CAI.

In this study, a total of 11 full-genome sequences (10,954 bp) of West African BAGV isolates were obtained by sequencing overlapping PCR amplifications covering the genome and by using RACE (Rapid amplification of cDNA ends) techniques for the terminal ends and deposited in GenBank (www.ncbi.nlm.nih.gov/genbank/) (Accession numbers: MF380424-34) ( Table 1 ). Analysis of new characterized BAGV complete open reading frames (ORFs) was performed at nucleotide and amino acid levels including previously available sequences from CAR (isolate DakArB209_CAR_1966, accession No. AY632545) and Spain (isolate Spain_H_2010, accession No. HQ644143) into multiple sequence alignments. The polyprotein length of the newly sequenced West African BAGV isolates was determined with respect to gene sizes (Table 3) . Although the 5 UTR was similar in length, the 3 UTR of these West African isolates was either 10 nt or 137 nt longer than those of sequences from CAR and Spain, respectively. In the 5 UTR, positions 52, 55 and 93 had nucleotide changes that were distinguishable for West African isolates. Nucleotide changes A to C at position 52 and T to C at position 55 were seen in West African sequences, and a T to C change at position 93 was observed only in sequences of the isolates ARD54139_Dakar-Bango_SEN_1989 and ARA23139_Dezidougou_CI_1988. Interestingly, the 3 UTR can be divided into three sections; a proximal highly variable section constituted by the 139 first nucleotides following the stop codon, a second highly conservative section located between nucleotide positions 140 and 434 and a moderately variable distal region comprising the last 142 nucleotides. In this distal section, 3 UTR sequences of West African isolates presented insertions of 74 nt and 1 nt, compared to the isolates from CAR and Spanish (KR108244-6), respectively.

Pairwise genetic distances of coding sequences were evaluated at nucleotide and amino acid levels between isolates characterized in this study and in comparison with previously available BAGV sequences ( Figure 1 ). Nucleotide sequences of BAGV isolated from Senegal showed a mean distance of 1.9% ± 0.8% (0.3-3.4%). This lowest genetic distance was also apparent at amino acid level with a mean distance of 1.9% ± 0.6% (0.4-3.7%). 

Here, we described location of main conserved amino acid motifs on BAGV proteins using in silico analysis of complete genome sequences of the 11 West African BAGV isolates characterized in this study and sequences from India, CAR and Spain. Most of highly conserved amino acid motifs localized across E, NS1, NS3 and NS5 proteins of MVFs were identified in the BAGV genomes, sometimes with presence of conservative amino acid mutations (positions highlighted in Black) or non-conservative amino acid mutations (positions highlighted in red) ( Table 4) In protein NS1, all analyzed motifs were conserved, but nc T713P and T713A were observed in sequences of isolates ARD171102_Barkedji_SEN_2004 and EU684972_96363_India_1996, respectively. The BAGV isolate ARD54139_Dakar-Bango_SEN_1989 also contains nc P1127T. In NS3, the conserved motif identified at positions 1722-1728 contains nc A1723P and P1724L for BAGV isolates ARD137998_Diawara_SEN_2000 and ARD138018_Diawara_ SEN_2000, whereas nc L1722S In comparison to sequence of the isolate ARA23139_Dezidougou_CI_1988 from Côte d'Ivoire, Senegalese BAGV isolates showed a higher mean distance of 3.4% ± 0.5% (2.7-4.1%) at nucleotide level. However, this highest genetic distance was less apparent at amino acid level with a mean distance of 1.7% ± 0.7% (2.7-4.1%). Furthermore, mean distances of 6.7% (6.2-7.3%), 1.3% (0.2-2.7%), 5.8% (5.2-6.3%) were recorded at nucleotide level between Senegalese BAGV isolates and the isolate from CAR, Spain, and EU684972__96363_India_1996, respectively while respective mean distances were 1.5% (0.8-2.5%), 1.7% (1.0-2.8%), 2.7% (2.0-3.8%) at amino acid level. A differentiation coefficient value of 0.17 was also observed between these West African BAGV isolates and previously available sequences.

Here, we described location of main conserved amino acid motifs on BAGV proteins using in silico analysis of complete genome sequences of the 11 West African BAGV isolates characterized in this study and sequences from India, CAR and Spain. Most of highly conserved amino acid motifs localized across E, NS1, NS3 and NS5 proteins of MVFs were identified in the BAGV genomes, sometimes with presence of conservative amino acid mutations (positions highlighted in Black) or non-conservative amino acid mutations (positions highlighted in red) ( Table 4) In protein NS1, all analyzed motifs were conserved, but nc T713P and T713A were observed in sequences of isolates ARD171102_Barkedji_SEN_2004 and EU684972_96363_India_1996, respectively. The BAGV isolate ARD54139_Dakar-Bango_SEN_1989 also contains nc P1127T. In NS3, the conserved motif identified at positions 1722-1728 contains nc A1723P and P1724L for BAGV isolates ARD137998_ Diawara_SEN_2000 and ARD138018_Diawara_ SEN_2000, whereas nc L1722S is present in ARD138018_Diawara_SEN_2000, and ARD171075_Barkedji_SEN_2004. A non-conserved motif at positions 1759-1766 contained nc F1766L in all BAGV sequences analyzed and isolate ARD138018_Diawara_SEN_2000 contained additional nc T1765P and D1785Y. A non-conserved motif in NS5 at positions 2734-2741 contained nc T2738N in all BAGV sequences. BAGV isolate ARD54139_Dakar-Bango_SEN_1989 has two supplementary mutations S2734W and S2737P. In addition, these MBFVs amino acid motifs were also mostly conserved in BAGV, NKVFs and CxFV (ISFs) than in AeFV (ISFs). Non-conservative amino acid mutations on the BAGV polyprotein might be associated to phenotypic differences of BAGV isolates. In addition, the presence of phylogenetically informative sites was assessed on the DIVEIN web server. The identified site LAP is harbored by the conserved motif LAPTRVV previously identified in NS3 protein of flaviviruses [3] and presents nc mutations in the genome of three BAGV isolates. In addition, phylogenetically informative sites IEGA and GRIWNA identified in NS4B and NS5, showed combined variations in the genome of the CAR isolate (DKGQ and RTDMEC, respectively) and the Senegalese BAGV isolates ARD152146_Diawara_SEN_2001 (RRAA and GRIWNA, respectively) and ARD260266_Barkedji_SEN_2014 (RRSS and RTDMEC, respectively) ( Figure 2 ). Non-conservative amino acid mutations on the BAGV polyprotein might be associated to phenotypic differences of BAGV isolates. In addition, the presence of phylogenetically informative sites was assessed on the DIVEIN web server. The identified site LAP is harbored by the conserved motif LAPTRVV previously identified in NS3 protein of flaviviruses [3] and presents nc mutations in the genome of three BAGV isolates. In addition, phylogenetically informative sites IEGA and GRIWNA identified in NS4B and NS5, showed combined variations in the genome of the CAR isolate (DKGQ and RTDMEC, respectively) and the Senegalese BAGV isolates ARD152146_Diawara_SEN_2001

(RRAA and GRIWNA, respectively) and ARD260266_Barkedji_SEN_2014 (RRSS and RTDMEC, respectively) ( Figure 2 ). 

Prediction of N-glycosylation sites was performed using complete genome sequences of the 11 West African BAGV isolates characterized in this study and sequences from India, CAR and Spain on the DIVEIN web server. The "potential" score represents the averaged output of nine neural networks and the "jury agreement" indicates how many of the nine networks support the prediction. In total, eight N-glycosylated motifs were identified in the BAGV genome (potential > 0.5) including two highly probable sites (potential > 0.5 and jury agreement of 9/9). Despite high potential (0.7452) and jury agreement (9/9), the motif (Asn-X-Thr) NPTD identified at position 603 was not considered to be glycosylated because it contained a Proline known to preclude the N-glycosylation by rendering inaccessible the Asparagine in the majority of cases (Figure 3 ). This motif was in the domain III region 

Prediction of N-glycosylation sites was performed using complete genome sequences of the 11 West African BAGV isolates characterized in this study and sequences from India, CAR and Spain on the DIVEIN web server. The "potential" score represents the averaged output of nine neural networks and the "jury agreement" indicates how many of the nine networks support the prediction. In total, eight N-glycosylated motifs were identified in the BAGV genome (potential > 0.5) including two highly probable sites (potential > 0.5 and jury agreement of 9/9). Despite high potential (0.7452) and jury agreement (9/9), the motif (Asn-X-Thr) NPTD identified at position 603 was not considered to be glycosylated because it contained a Proline known to preclude the N-glycosylation by rendering inaccessible the Asparagine in the majority of cases (Figure 3 ). This motif was in the domain III region of the E protein of all BAGV isolates. However, a second (Asn-X-Ser) motif NFSL was highly predicted (score 0.6223 (9/9)) and suggested an N-linked glycosylation site at the residue Asn-2333 in the NS4B protein. Interestingly, we also found six others probable N-glycosylation at different positions on the BAGV polyprotein including one site (NYSI) harboring, the NYS motif at the 443th position (153th position of the E protein), previously described as a virulence factor for WNV and DENV. of the E protein of all BAGV isolates. However, a second (Asn-X-Ser) motif NFSL was highly predicted (score 0.6223 (9/9)) and suggested an N-linked glycosylation site at the residue Asn-2333 in the NS4B protein. Interestingly, we also found six others probable N-glycosylation at different positions on the BAGV polyprotein including one site (NYSI) harboring, the NYS motif at the 443th position (153th position of the E protein), previously described as a virulence factor for WNV and DENV. The "potential" score is the averaged output of nine neural networks and the "jury agreement" indicates how many of the nine networks support the prediction. The N-Glyc Result column shows one of the following outputs for predictions. Nglycosylated sites highly predicted by the nine networks (potential > 0.5 and jury agreement of 9/9) are highlighted in red and the site previously reported as virulence factor on E protein of flaviviruses is colored in blue.

Assessment of thermodynamically stable and evolutionarily conserved structural RNA domains was performed using complete non-coding sequences of the 11 West African BAGV isolates characterized in this study and the isolate from Spain. The RNAz method implemented in the Vienna Figure 3 . Prediction of N-glycosylation on Bagaza virus genome. Predictions were performed using the NetNGlyc 1.0 server. A position with a potential (green vertical lines) crossing the threshold (horizontal red line at 0.5) is predicted glycosylated. The "potential" score is the averaged output of nine neural networks and the "jury agreement" indicates how many of the nine networks support the prediction. The N-Glyc Result column shows one of the following outputs for predictions. N-glycosylated sites highly predicted by the nine networks (potential > 0.5 and jury agreement of 9/9) are highlighted in red and the site previously reported as virulence factor on E protein of flaviviruses is colored in blue.

Assessment of thermodynamically stable and evolutionarily conserved structural RNA domains was performed using complete non-coding sequences of the 11 West African BAGV isolates characterized in this study and the isolate from Spain. The RNAz method implemented in the Vienna RNA Websuite was used to identify conserved structural RNA domains in the UTRs of BAGV characterized by a p > 0.5. Using the RNAz method, highly conserved structural RNA domains was not identified in the 5 UTR of BAGV genome while a total of four highly conserved structural RNA domains were determined in the 3 region with respective classification probabilities of 0.671490, 0.994641, 0.976295 and 0.846482 ( Figure S1 ).

However, the RNAalifold method implemented in the Vienna RNA Websuite server predicted that, as in the genome of other members of the genus flavivirus, BAGV has a shorter 5 UTR (≈100 nt), consisting of a pair of conserved stem-loops (SL-A and SL-B) (Figure 4) . SL-A serves as promoter of viral polymerase activity followed by a shorter loop which contains a cyclisation sequence upstream of the 5 AUG (SL-B) . The secondary structure of BAGV's 3 UTR could be divided in three parts; a highly variable domain 1 following the stop codon and consisting in an AU-rich stem-loop (SL-I), a second domain 2 with highly conserved sequence and two stem-loops (SL-II and SL-III) and dumbbell structures (DB1 and DB2), and the moderately conserved distal domain 3 which contains the complementary cyclisation elements. In the intermediate domain, the SL-II presented a pseudoknot PK1 preceding a short conserved loop (RCS3). This structural motif was repeated in a stem-loop SL-III with PK2 and CS3. These stem-loops were followed by dumbbell structures DB1 and DB2 that presented conserved loop RCS2 connected with a pseudoknot PK3 and its repetition CS2, respectively [43] . Thus, organization of conserved sequences on consensus secondary structure of BAGV's 3 UTR was structured RCS3-CS3-RCS2-CS2-ImCS1. Indeed, CS1 was imperfect (ImCS1) only on sequences of West African BAGV isolates with a total of nine substitutions compared to the corresponding consensus sequence previously described [3] .

Viruses 2018, 10, x 12 of 26 RNA Websuite was used to identify conserved structural RNA domains in the UTRs of BAGV characterized by a p > 0.5. Using the RNAz method, highly conserved structural RNA domains was not identified in the 5′ UTR of BAGV genome while a total of four highly conserved structural RNA domains were determined in the 3′ region with respective classification probabilities of 0.671490, 0.994641, 0.976295 and 0.846482 ( Figure S1 ). However, the RNAalifold method implemented in the Vienna RNA Websuite server predicted that, as in the genome of other members of the genus flavivirus, BAGV has a shorter 5′ UTR (≈100 nt), consisting of a pair of conserved stem-loops (SL-A and SL-B) (Figure 4) . SL-A serves as promoter of viral polymerase activity followed by a shorter loop which contains a cyclisation sequence upstream of the 5′ AUG (SL-B) . The secondary structure of BAGV's 3′ UTR could be divided in three parts; a highly variable domain 1 following the stop codon and consisting in an AU-rich stem-loop (SL-I), a second domain 2 with highly conserved sequence and two stem-loops (SL-II and SL-III) and dumbbell structures (DB1 and DB2), and the moderately conserved distal domain 3 which contains the complementary cyclisation elements. In the intermediate domain, the SL-II presented a pseudoknot PK1 preceding a short conserved loop (RCS3). This structural motif was repeated in a stem-loop SL-III with PK2 and CS3. These stem-loops were followed by dumbbell structures DB1 and DB2 that presented conserved loop RCS2 connected with a pseudoknot PK3 and its repetition CS2, respectively [43] . Thus, organization of conserved sequences on consensus secondary structure of BAGV's 3′ UTR was structured RCS3-CS3-RCS2-CS2-ImCS1. Indeed, CS1 was imperfect (ImCS1) only on sequences of West African BAGV isolates with a total of nine substitutions compared to the corresponding consensus sequence previously described [3] . 

The Bayesian phylogenetic analysis for estimation of data quality and selection of the best-fit nucleotide substitution model were performed using Mega 6.0 with a discrete Gamma distribution (+G) with 5 rate categories. The General Time-Reversible with a discrete Gamma distribution and a proportion of invariant sites (GTR+I) was the best nucleotide substitution model for our sequences data presenting score values of 69 

The Bayesian phylogenetic analysis for estimation of data quality and selection of the best-fit nucleotide substitution model were performed using Mega 6.0 with a discrete Gamma distribution (+G) with 5 rate categories. The General Time-Reversible with a discrete Gamma distribution and a proportion of invariant sites (GTR+I) was the best nucleotide substitution model for our sequences data presenting score values of 69,924.128, 69,453.449 and −34,680.714 for BIC, AICc and lnL criteria, respectively. The maximum likelihood (ML) tree was inferred using FastTree v2.1.7 [26] on our total data set including the 11 complete polyprotein sequences of West African BAGV isolates circulating in Senegal and Côte d'Ivoire from 1988 to 2014, the 5 BAGV sequences from Spain, the BAGV sequences from India and CAR and complete polyproteins from different flaviviruses, with 10,281 bp alignment length ( Figure 5 ). A GTR+I model was used, as selected by Bayesian criteria. Nodes were labeled with local support values computed with 5000 bootstrap replications using the Shimodaira-Hasegawa (SH) test. The phylogeny of complete BAGV genome sequences presented evidence of a single BAGV phylogenetic group. Furthermore, we observed also that Israel meningo-encephalitis turkey virus (ITV) was closed to BAGV in genetic relatedness [11] .

Viruses 2018, 10, x 13 of 26 respectively. The maximum likelihood (ML) tree was inferred using FastTree v2.1.7 [26] on our total data set including the 11 complete polyprotein sequences of West African BAGV isolates circulating in Senegal and Côte d'Ivoire from 1988 to 2014, the 5 BAGV sequences from Spain, the BAGV sequences from India and CAR and complete polyproteins from different flaviviruses, with 10,281 bp alignment length ( Figure 5 ). A GTR+I model was used, as selected by Bayesian criteria. Nodes were labeled with local support values computed with 5000 bootstrap replications using the Shimodaira-Hasegawa (SH) test. The phylogeny of complete BAGV genome sequences presented evidence of a single BAGV phylogenetic group. Furthermore, we observed also that Israel meningoencephalitis turkey virus (ITV) was closed to BAGV in genetic relatedness [11] . 

Given the major implications of recombination events for evolution, pathogenicity, or diagnosis of non-segmented positive RNA viruses like flaviviruses [44] , it is clearly important to determine their occurrence in the BAGV genome. The RDP4beta 4.8 program used for assessment of recombination events on complete polyprotein sequences [27] revealed evidence of only one highly credible recombination event from the E protein to NS2B, with estimated breakpoints at positions 2202 and 4908 of BAGV genome. This recombination event involved the isolate ARD54139_Dakar-Bango_SEN_1989 originating from Saint-Louis, in the North of Senegal ( Figure 6 ). Considering the isolates ARD260266_ Barkedji_SEN_2014 and ARD171075_Barkedji_SEN_2004 as respective minor and major parents of the isolate ARD54139_Dakar-Bango_SEN_1989 (Similarity of 98.8% and 97%, respectively), this recombination event was found by RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiSscan and 3Seq methods and supported by significant p-values of 3.09 × 10 -16 , 9.23 × 10 −12 , 7.36 × 10 −13 , 8.45 × 10 −7 , 1.59 × 10 −7 , 3.60 × 10 −8 and 1.17 × 10 −12 , respectively. The BootScan and GARD analyzes identified one significant recombination breakpoint at nucleotide position 2201 corresponding to the E protein, supported by a LHS p-value of 0.024 and a RHS p-value of 0.001.

This breakpoint divides the BAGV genome into two regions: one that encodes the structural proteins and another that encodes the non-structural proteins. Phylogenetic trees were constructed using 1000 bootstrap replications and midpoint rooted for clarity only (Figure 7 ). This recombination event led to a mismatch between NJ phylogenetic trees constructed using comparison of nucleotides sequences of recombinant (positions 2202-4908) and non-recombinant genomic regions (positions 1-2201 and 4909-10,281). The tree is midpoint-rooted, nodes are labeled with local support values computed using the Shimodaira-Hasegawa (SH) test for 5000 bootstrap replications, species names are color-coded as follows: new characterized BAGV isolatesdark blue with dots; previous sequences of BAGV-dark blue; mosquito-borne flaviviruses (MBFVs)-green; dual-host affiliated ISFs (dISFs)-red; no Known Vector (NKV) flavivirusesyellow; tick-born flaviviruses (TBFVs)-light blue; classical ISFs (cISFs)-Orange.

Given the major implications of recombination events for evolution, pathogenicity, or diagnosis of non-segmented positive RNA viruses like flaviviruses [44] , it is clearly important to determine their occurrence in the BAGV genome. The RDP4beta 4.8 program used for assessment of recombination events on complete polyprotein sequences [27] revealed evidence of only one highly credible recombination event from the E protein to NS2B, with estimated breakpoints at positions 2202 and 4908 of BAGV genome. This recombination event involved the isolate ARD54139_Dakar-Bango_SEN_1989 originating from Saint-Louis, in the North of Senegal (Figure 6 ). Considering the isolates ARD260266_Barkedji_SEN_2014 and ARD171075_Barkedji_SEN_2004 as respective minor and major parents of the isolate ARD54139_Dakar-Bango_SEN_1989 (Similarity of 98.8% and 97%, respectively), this recombination event was found by RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiSscan and 3Seq methods and supported by significant p-values of 3.09 × 10 -16 , 9.23 × 10 −12 , 7.36 × 10 −13 , 8.45 × 10 −7 , 1.59 × 10 −7 , 3.60 × 10 −8 and 1.17 × 10 −12 , respectively. The BootScan and GARD analyzes identified one significant recombination breakpoint at nucleotide position 2201 corresponding to the E protein, supported by a LHS p-value of 0.024 and a RHS p-value of 0.001.

This breakpoint divides the BAGV genome into two regions: one that encodes the structural proteins and another that encodes the non-structural proteins. Phylogenetic trees were constructed using 1000 bootstrap replications and midpoint rooted for clarity only (Figure 7 ). This recombination event led to a mismatch between NJ phylogenetic trees constructed using comparison of nucleotides sequences of recombinant (positions 2202-4908) and non-recombinant genomic regions (positions 1-2201 and 4909-10,281). 

The structural and non-structural coding regions were analyzed separately for estimation of sites and branches under positive diversifying selection, applying four different methods to ensure consistency of these events along of BAGV sequences. Using this approach, we found several sites under strong negative selection and most of them were in the E, NS3 and NS5 proteins (Table 5) . However, the significant evidence (p < 0.1) of episodic positive selection was obtained for all the coding genes, except for the prM, NS2B and NS4A regions. All positively selected sites estimated by the FUBAR model (posterior probability ≥ 0.9) were also identified by the MEME method (p < 0.1). Thus, an important number of positively selected sites were detected; interestingly, the majority of such sites were in the E, NS1 and NS5 proteins. Branch-site analysis showed also a total of 11 branches evaluating under positive selection (p < 0.05) and the highest proportion was in the E and NS1 proteins. 

The structural and non-structural coding regions were analyzed separately for estimation of sites and branches under positive diversifying selection, applying four different methods to ensure consistency of these events along of BAGV sequences. Using this approach, we found several sites under strong negative selection and most of them were in the E, NS3 and NS5 proteins (Table 5) . However, the significant evidence (p < 0.1) of episodic positive selection was obtained for all the coding genes, except for the prM, NS2B and NS4A regions. All positively selected sites estimated by the FUBAR model (posterior probability ≥ 0.9) were also identified by the MEME method (p < 0.1). Thus, an important number of positively selected sites were detected; interestingly, the majority of such sites were in the E, NS1 and NS5 proteins. Branch-site analysis showed also a total of 11 branches evaluating under positive selection (p < 0.05) and the highest proportion was in the E and NS1 proteins. Pervasive diversifying selection at posterior probability ≥ 0.9 with FUBAR model; Episodic diversifying selection at 0.1 significance level with SLAC and MEME models; Episodic diversifying selection at p-value p ≤ 0.05 with Branch-sites REL model.

MCMC convergence was obtained for three independent runs with 100 million generations, which were sufficient to obtain a proper sample for the posterior at MCMC stationarity assessed by effective sample sizes (ESS) above 200 for each gene. Furthermore, the evolutionary rates (µ) and the highest posterior densities (HPD with 95% of confidence interval) were 1.226 × 10 

Evidence of BAGV adaptation to human house-keeping genes was analyzed by calculating CAI indices using complete coding polyprotein sequences of West African BAGV isolates and BAGV sequences available from Spain, in comparison to other MBFVs such as DENV, USUV, WNV, ZIKV and YFV, NKV flaviviruses (ModV and RBV) and ISFs (CxFV and AeFV). CAI values > 1 were obtained for polyprotein sequences of all BAGV isolates. Thus, there is evidence that BAGV could have adaptation to the human genes ( Figure 9 ). ModV(mean CAI: 1.072 and median CAI: 1.072), RBV (mean CAI: 1.059 and median CAI: 1.059) and YFV (mean CAI: 1.075 and median CAI: 1.072) showed the highest CAI values for human housekeeping genes and were significantly different to Spanish and West African BAGV isolates (Wilcoxon Rank Sum Test, p-values ranging from 0.0001 to 1.028 × 10 −7 ). Compared to those of Spanish isolates, sequences of West African BAGV isolates presented significantly higher CAI values (mean CAI: 1.044 and median CAI: 1.044, Wilcoxon Rank Sum Test, p-value < 0.002). In addition, they were also higher than DENV serotype 2 (DENV-2) (Wilcoxon Rank Sum Test, p-value < 1.4 × 10 −6 ). However, CAI values of West African BAGV isolates were lower than those of DENV-1 (Wilcoxon Rank Sum Test, W = 231, p-value = 2.869 × 10 −6 ) and comparable to CAI values given by DENV-3 and DENV-4 (Wilcoxon Rank Sum Test, W = 3258, p-value = 0.06477 and W = 824, p-value = 0.3463, respectively). Interestingly, CAI values of West African isolates were also significantly higher than those obtained for other MBFVs well known to infect human such as USUV (Wilcoxon Rank Sum Test, p-value < 6.796 × 10 −9 ), WNV (Wilcoxon Rank Sum Test, p-value < 2.718 × 10 −10 ), ZIKV (Wilcoxon Rank Sum Test, p-value < 1.67 × 10 −8 ) and ISFs (means CAI: 1.0015 and 1.0006 and median CAI: 1.0015 and 1.0006 for CxFV and AeFV, respectively) which showed low evidence 

Evidence of BAGV adaptation to human house-keeping genes was analyzed by calculating CAI indices using complete coding polyprotein sequences of West African BAGV isolates and BAGV sequences available from Spain, in comparison to other MBFVs such as DENV, USUV, WNV, ZIKV and YFV, NKV flaviviruses (ModV and RBV) and ISFs (CxFV and AeFV). CAI values > 1 were obtained for polyprotein sequences of all BAGV isolates. Thus, there is evidence that BAGV could have adaptation to the human genes ( Figure 9 ). ModV(mean CAI: 1.072 and median CAI: 1.072), RBV (mean CAI: 1.059 and median CAI: 1.059) and YFV (mean CAI: 1.075 and median CAI: 1.072) showed the highest CAI values for human housekeeping genes and were significantly different to Spanish and West African BAGV isolates (Wilcoxon Rank Sum Test, p-values ranging from 0.0001 to 1.028 × 10 −7 ). Compared to those of Spanish isolates, sequences of West African BAGV isolates presented significantly higher CAI values (mean CAI: 1.044 and median CAI: 1.044, Wilcoxon Rank Sum Test, p-value < 0.002). In addition, they were also higher than DENV serotype 2 (DENV-2) (Wilcoxon Rank Sum Test, p-value < 1.4 × 10 −6 ). However, CAI values of West African BAGV isolates were lower than those of DENV-1 (Wilcoxon Rank Sum Test, W = 231, p-value = 2.869 × 10 −6 ) and comparable to CAI values given by DENV-3 and DENV-4 (Wilcoxon Rank Sum Test, W = 3258, p-value = 0.06477 and W = 824, p-value = 0.3463, respectively). Interestingly, CAI values of West African isolates were also significantly higher than those obtained for other MBFVs well known to infect human such as USUV (Wilcoxon Rank Sum Test, p-value < 6.796 × 10 −9 ), WNV (Wilcoxon Rank Sum Test, p-value < 2.718 × 10 −10 ), ZIKV (Wilcoxon Rank Sum Test, p-value < 1.67 × 10 −8 ) and ISFs (means CAI: 1.0015 and 1.0006 and median CAI: 1.0015 and 1.0006 for CxFV and AeFV, respectively) which showed low evidence for codon adaptation towards human housekeeping genes (Wilcoxon Rank Sum Test, p-value < 2.328 × 10 −16 ). Although CAI results for ISFs were significantly lower to human housekeeping genes, we did not find any significant difference between CxFV and AeFV codon adaptation. Compared to codon usage of human genes, sequences of tobacco mosaic virus (TMV) showed mean and median CAI values of 0.9587 and 0.9592, respectively. 

With an increasing number of emergent and re-emergent pathogens involved in human encephalitis, it is important to try to better understand which viruses have a potential to emerge causing human infection in the future. Since its first isolation, BAGV was only detected in mosquito pools collected in the field during entomological investigations in West and Central Africa and in India [12] . However, in 2010, BAGV was identified as the cause of an encephalitis outbreak in wild birds circulating in Southern Spain [9] . In a possible host-switching event [45] , BAGV could acquire future adaptation to other vertebrates such as humans [46] . In this study, genetic properties of BAGV isolates circulating in West Africa, the evolutionary phylogeny of BAGV and evidence of BAGV adaptation to human house-keeping genes were evaluated in comparison with different flavivirus groups.

Genomes of 11 West African BAGV strains isolated from mosquito pools collected in the field from 1988 to 2014 showed similarities in terms of gene lengths when compared with polyprotein sequences of previously available isolates from CAR and Spain.

Low amino acid distances observed between West African isolates (<2%) in comparison with previously non-West African sequences (<3%) combined with the weak coefficient of differentiation (<0.2) revealed evidence of a low genetic diversity of BAGV sequences analyzed in this study as previously described [9] . In addition, the West African BAGV isolates were more closely related to the CAR isolate. Genome sequences originating from BAGV isolates from other geographic locations would be helpful to understand if this low diversity is secluded to West-Africa.

Although the 5′ UTR was conserved between isolates, the 3′ UTR of West African isolates varied in terms of length and structure. As in other mosquito-borne flavivirus genomes, BAGV genome harbored structural RNA domains both in 5′ and 3′ UTRs which play a major role in flaviviral replication and interactions with host proteins and regulate cellular response to infection [47, 48] . 

With an increasing number of emergent and re-emergent pathogens involved in human encephalitis, it is important to try to better understand which viruses have a potential to emerge causing human infection in the future. Since its first isolation, BAGV was only detected in mosquito pools collected in the field during entomological investigations in West and Central Africa and in India [12] . However, in 2010, BAGV was identified as the cause of an encephalitis outbreak in wild birds circulating in Southern Spain [9] . In a possible host-switching event [45] , BAGV could acquire future adaptation to other vertebrates such as humans [46] . In this study, genetic properties of BAGV isolates circulating in West Africa, the evolutionary phylogeny of BAGV and evidence of BAGV adaptation to human house-keeping genes were evaluated in comparison with different flavivirus groups.

Genomes of 11 West African BAGV strains isolated from mosquito pools collected in the field from 1988 to 2014 showed similarities in terms of gene lengths when compared with polyprotein sequences of previously available isolates from CAR and Spain.

Low amino acid distances observed between West African isolates (<2%) in comparison with previously non-West African sequences (<3%) combined with the weak coefficient of differentiation (<0.2) revealed evidence of a low genetic diversity of BAGV sequences analyzed in this study as previously described [9] . In addition, the West African BAGV isolates were more closely related to the CAR isolate. Genome sequences originating from BAGV isolates from other geographic locations would be helpful to understand if this low diversity is secluded to West-Africa.

Although the 5 UTR was conserved between isolates, the 3 UTR of West African isolates varied in terms of length and structure. As in other mosquito-borne flavivirus genomes, BAGV genome harbored structural RNA domains both in 5 and 3 UTRs which play a major role in flaviviral replication and interactions with host proteins and regulate cellular response to infection [47, 48] . However, differences in determination of structural RNA domains in 5 UTR between the RNAz and the RNAalifold methods used in this study could be attributed to variations in algorithm of analysis used by each method [21] [22] [23] [24] . The small subgenomic RNA (sfRNA) identified in the 3 UTR of BAGV is generated through incomplete degradation of the viral genome by cellular 5 -3 exonuclease XRN1 [49, 50] and plays an important role in viral pathogenicity [49] and modulation of host responses [51, 52] .

In addition, the stable 3' terminus region of the sfRNA following the dumbbell structures (DB1 and DB2) and complementary to the 5 terminus of the 5 UTR, was shown to be necessary in genomic RNA cyclisation for viral replication and translation [46] . The sfRNA can be in competition with the 3 UTR of genomic RNA in binding to proteins of viral replication complexes (RC) [53] and/or cellular machinery [54] . Thus, it slows down the replication or translation and assembly of particles [51] .

The 3 UTR region is important for translation and replication of the RNA genome through interactions with viral and host proteins, genome stabilization, and RNA packaging [55] . A better understanding of the potential impact of 3 UTR variations in replication of BAGV could be important in the study of mechanisms implicated in their pathogenicity [56, 57] .

Most motifs linked to virulence previously described in these proteins of other MBFVs were conserved among BAGV isolates. However, some non-conservative mutations were identified in E, NS1, NS3 and NS5. In general, non-conservative amino acid mutations (nc) are spontaneous, rare, and hazardous, and then represent the main causes of genetic diversity. Thus, non-conservative mutations observed on BAGV genome could modulate viral phenotypes of particular isolates in mechanisms such as virus cell entry, replication, production of viral particles, and assembly, and cause modifications in post-translational regulation as previously demonstrated for other flaviviruses such as DENV [58] [59] [60] . The E protein is involved in cell receptor recognition, attachment, cell fusion, tropism, and virulence [58] . NS1 is the most conserved non-structural protein of flaviviruses. Associated with the other non-structural proteins, the NS1 protein plays an important function in viral replication and assembly and viral escape to host innate immune response [61] . The NS3 protein is the main component of the replication machinery and ensures multiple functions in viral evasion to host antiviral response and in production and assembly of infectious viral particles [62] . The NS5 protein is the largest viral protein that serves as the RNA-dependent RNA polymerase (RdRp) and performs multiple functions essential for viral replication, including processing the viral polyprotein, replicating the viral RNA. Sharing these motifs of virulence mostly with MBFVs, NKVFs and CxFV than with AeFV showed that BAGV could be more closely related to MBFVs transmitted by Culex mosquitoes and could explain frequent BAGV isolations mainly from mosquitoes of Culex genus and its capacity to infect vertebrates such as wild birds [1, 4, 5, 9, 10] . In addition, The West African BAGV isolates characterized in our study were mainly isolated from Culex poicilipes and Culex neavei mosquitoes which have been reported as potential vectors for flaviviruses such as WNV [63] . Culex neavei was also found as a competent vector able to transmit flaviviruses such as USUV and WNV [64, 65] . Despite no available data on Culex poicilipes competency to transmit flaviviruses, these two mosquito species belonging to Culex genus could play an important role in natural transmission of BAGV to vertebrates such as wild birds since another member of the Culex genus, Culex tritaeniorhynchus, has been found competent to transmit BAGV to mice [8] .

The phylogenetically informative sites identified on the BAGV genome located mainly in NS3, NS4B and NS5 proteins, respectively, could have a considerable impact in viral fitness on host for corresponding West African isolates. In addition, the prediction of the N-glycosylated sites at different positions on BAGV genome such as Asn2333 in NS4B and the NYSI motif at 153th position of the E protein showed that post-translational modifications may influence acquisition or loss of capacity in mechanisms such as pathogenicity, evasion of innate immune pathways. Indeed, flaviviruses NS4B plays an important role in replication of viral RNA facilitating the formation of replication complexes and modulating host innate immune response such as interferons, microRNAs and RNA interference, formation of stress granules and the unfolded protein responses [66] [67] [68] . A previous study had shown that N-glycosylation of NS4B of DENV does not affect the protein stability but causes a considerable reduction in efficiency of viral production [69] . Presence of a glycosylation site and an informative site in the viral NS4B protein could influence the efficiency of viral replication and the outer shape of the virion. The presence of the N-linked glycosylation motif NYS had been previously reported at 67/153th and 154th on the E protein of DENV and WNV (lineage 1 strains and some neuroinvasive lineage 2 strains), respectively, involving in receptor binding, viral morphogenesis, viral infectivity, and tropism [70] [71] [72] [73] [74] . Since glycosylation is a means of evasion to immune recognition within the host by masking particular antigenic sites from recognition by neutralizing antibodies, it could increase the diversity of the glycosylation on viral proteins [75, 76] . Nevertheless, it could be important in future studies to determine whether the predicted glycosylation sites are really used (asparagine-linked) using specific enzymatic digestion by Endo H and peptide N-glycosidase (PNGase F) [77] . Our data suggest the ability of BAGV to develop phenotypically important variations and potentially adaptation to new vertebrate hosts such as humans. However, to understand better the impact of variation on these predicted N-glycosylation sites and the identified phylogenetically important variations would require in vitro studies with reverse genetically engineered infectious clones on mosquito or mammalian cell lines and in vivo experiments in mosquitoes or in animal models like mouse [78, 79] . However, antibodies against BAGV proteins or infectious clone are currently not available for BAGV.

The identification of natural recombination events between virus isolates is important for our understanding of virus evolution. In our study, we identified a recombination event in the E protein BAGV. Recombination was documented in other members of the mosquito-borne flavivirus group [80, 81] , but had not yet been demonstrated to occur in BAGV. Identification of recombination breakpoints and the graphical detection of conflicting phylogenetic signals gave confirmation of this recombination event in E protein of the Senegalese isolate ARD54139_Dakar-Bango_SEN_1989 as previously described for ZIKV [74, 82] . Nevertheless, the precise molecular mechanisms of the template switches are unknown. The E protein is highly important because it encodes the most important antigen with regards to virus biology and humoral immunity. Therefore, large-scale genetic changes in this region, as might be brought about by recombination, could have significant impact on virus phenotype [44] .

The estimation of the selection pressures acting on each protein of BAGV demonstrated episodes of strong negative selection in functionally important proteins. These results suggested frequent purging of deleterious polymorphisms in the BAGV genome that could be associated with accumulation of synonymous mutations during BAGV transmission [83] . However, location of more significant episodes of positive selection in the E, NS1 and NS5 proteins indicated that they could represent preferential selection targets during BAGV evolution [84] . Indeed, the E protein of flaviviruses plays a crucial role in early steps of host cell binding and viral entry and represents a main target for immune responses influencing antigenic response and positive selection on the E protein is a hallmark of the emergence of flaviviruses [85, 86] . Positive selection episodes have been also previously reported for the DENV-3 capsid, however, the impact needs to be further investigated [87] . Likewise, non-structural proteins could also be targets of positive selection.

The NS1 protein is essential for viral RNA replication, is involved in immune system evasion, and represents the major positive selection target during speciation of arthropod-born flaviviruses such as DENV and ZIKV [88] . NS2A and NS4B proteins have been shown to antagonize the interferon response during DENV infection [89] and changes in these regions would be evolutionary advantageous selecting for BAGV strains with strong innate immunity suppression mechanisms. Mutations in the NS4B protein were also seen to modulate several phenotype mechanisms of flaviviruses, such as pathogenesis [90] , viral adaptation [91] , replication [68] , neurovirulence [66] and host preferences [92] . Thus, presence of positively selected sites in NS4B of BAGV isolates could have major impact in its natural evolution.

NS3 and NS5 proteins are crucial for viral replication, since non-conservative changes in these regions could modify process of protease and ATPase/helicase functions of NS3 protein [93] and RNA polymerase activity of NS5 protein [94] . These several polymorphic amino acid coding sites in the BAGV genome suggest that these proteins may be experiencing relatively adaptive changes in the natural evolution and they should be prioritized in future experimental studies.

Despite the evidence of a single phylogenetic group for BAGV sequences analyzed in our study, the evolutionary rates are expected in accordance to proteins functions; the NS5 representing the polymerase and the most conserved protein [86] . The inferred Bayesian MCC trees indicated a single introduction of BAGV into Europe and Africa from India, contrary to other African flaviviruses as WNV [95] and ZIKV [74] , suggesting an Indian origin of BAGV. Estimated times from the MRCA suggested a distant origin of West African BAGV sequences analyzed in this study from the 15th century. Thus, further phylodynamic analyzes based on more complete sequences could be interesting for determining geographical pathways and potential evolution patterns in correlation with BAGV spread from India to African and European continents.

The Codon Adaptation Index (CAI) represents a reliable bio-informatics approach to measure the synonymous codon usage bias and to assess the adaptation of viral genes to their hosts [96] . Flaviviruses can infect and replicate in hosts of different phyla. Therefore, their versatility in gene expression and protein synthesis and changes in the viral RNA genome could affect the fitness of the virus in a specific host relating to dinucleotide frequencies, codon preferences, and codon pair biases [97] [98] [99] . Nevertheless, ecology, different virus-host relationships, biogeographical migrations of flavivirus species and genetic differences may explain observed differences in flaviviral codon usage preference to human housekeeping genes [98, 100, 101] . In particular, NKV flaviviruses were only isolated from vertebrates and are maintained in nature by horizontal transmission between vertebrate hosts [102, 103] . Although ISFs were thought to sustain their populations in their respective insect vectors in the absence of mammal reservoirs, so lower translational efficiency in vertebrates could be expected [97, 104] .

In addition, the highest CAIs of YFV and DENV could be related to their long histories of infection in humans [105] . Indeed, YFV and DENV are maintained in endemic and sylvatic cycles, which conducted to repeated epidemics for more than one hundred years. However, YFV showed generally a higher virulence in human infections, particularly when it is compared to DENV infections reported in Africa [106] . This could explain the higher CAI values of YFV towards codon usage of the human housekeeping genes.

With evidence of adaptation to human house-keeping genes, BAGV could be potential cause of infection in vertebrates, such as humans. Considering the highest CAI values of West African BAGV isolates when compared to isolates responsible of the Spanish wild bird's outbreak in 2010 [9] , BAGV adaptation to vertebrate species such as birds could have led to an extension of adaptation to other species as shown in a previous virus study [46] . Interestingly, West African BAGV isolates showed a higher evidence of codon adaptation than MBFVs well known to infect humans, such as WNV which is a major cause of human encephalitis in USA and responsible of recent outbreaks in Europe [107] and ZIKV associated with microcephaly in fetuses and newborns during the outbreak in Brazil in 2015 [91] . Thus, further comparison of codon adaptation indexes of other BAGV genomic regions, such as the 3 UTR, among isolates that differ in biological, ecological, and genetic characteristics could help to characterize the evolutionary adaptation of BAGV genomes to vertebrate hosts [46, 108] .

Nevertheless, to ensure the potential of BAGV to be involved in human encephalitis cases, it would be important to evaluate its pathogenicity on human induced pluripotent stem cell lines (iPSC) capable of differentiating into brain microvascular endothelial cells (BMECs) and constituting a robust model of the human blood-brain barrier [109] . Otherwise, the iPSC cells can also generate primitive neural stem cells (NSCs), which can differentiate into neurons, astrocytes, or oligodendrocytes [110] . These BAGV sequences data obtained in our study could be used not only in future viral studies, but also in development of reverse genetic reagents or reliable diagnostic tools for investigation of this virus in human populations.

The following are available online at http://www.mdpi.com/1999-4915/10/4/193/s1, Figure S1 : Sequences of conserved structural RNA domains identified on the 3 UTR of Bagaza genomes used in this study.

BAG) strain: Dak Ar B 209

Full-length sequencing and genomic characterization of Bagaza, Kedougou, and Zika viruses

Mosquito vectors of the 1998-1999 outbreak of Rift Valley fever and other arboviruses (Bagaza, Sanar, Wesselsbron and West Nile) in Mauritania and Senegal

Isolations of West Nile and Bagaza viruses from mosquitoes (Diptera: Culicidae) in central Senegal (Ferlo)

Arbovirus isolations from mosquitoes collected during 1988 in the Senegal River basin

Genetic characterization of Bagaza virus (BAGV) isolated in India and evidence of anti-BAGV antibodies in sera collected from encephalitis patients

Preliminary findings on Bagaza virus (Flavivirus: Flaviviridae) growth kinetics, transmission potential transovarial transmission in three species of mosquitoes

Bagaza virus in partridges and pheasants

Monitoring of the Bagaza virus epidemic in wild bird species in Spain

Bagaza virus and Israel turkey meningoencephalomyelitis virus are a single virus species

Bagaza virus inhibits Japanese encephalitis & West Nile virus replication in Culex tritaeniorhynchus & Cx. quinquefasciatus mosquitoes

Natural Bagaza virus infection in game birds in southern Spain

MEGA7: Molecular Evolutionary Genetics Analysis version 7.0

Continuous cell lines and immune ascitic fluid pools in arbovirus detection

Development of one-step quantitative reverse transcription PCR for the rapid detection of flaviviruses

Unipro UGENE: A unified bioinformatics toolkit

Multiple sequence alignment with high accuracy and high throughput

DIVEIN: A Web Server to Analyze Phylogenies, Sequence Divergence, Diversity, and Informative Sites

Prediction of N-glycosylation sites in human proteins

The RNAz web server: Prediction of thermodynamically stable and evolutionarily conserved RNA structures

The Vienna RNA Websuite

Improved consensus structure prediction for RNA alignments

Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure

Does the choice of nucleotide substitution models matter topologically? BMC Bioinform

FastTree 2: Approximately Maximum-Likelihood Trees for Large Alignments

RDP4: Detection and analysis of recombination patterns in virus genomes

Multiple significance tests: The Bonferroni method

Hypothesis testing using phylogenies

The effect of recombination on the accuracy of phylogeny estimation

FUBAR: A fast, unconstrained Bayesian approximation for inferring selection

Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty

Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics

Bayesian phylogenetics with BEAUti and the BEAST 1.7

The codon Adaptation Index-A measure of directional synonymous codon usage bias, and its potential applications

Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes

Spread of the pandemic Zika virus lineage is associated with NS1 codon usage adaptation in humans

Importance of codon usage for the temporal regulation of viral gene expression

Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution

A combined set of tools to assess codon usage adaptation

A novel server to estimate an expected value of Codon Adaptation Index (eCAI)

Human housekeeping genes, revisited

Berzal-Herranz, A. Functional information stored in the conserved structural RNA domains of the flavivirus genome

The extent of homologous recombination in members of the genus Flavivirus

Family level phylogenies reveal modes of macroevolution in RNA viruses

Clinical Sequencing Uncovers Origins and Evolution of Lassa Virus

Conserved RNA secondary structures in Flaviviridae genomes

Flaviviral RNAs: Weapons and targets in the war between virus and host

Identification and characterization of small sub-genomic RNAs in dengue 1-4 virus-infected cell cultures and tissues

A highly structured, nuclease-resistant, noncoding RNA produced by flaviviruses is required for pathogenicity

Noncoding subgenomic flavivirus RNA: Multiple functions in West Nile virus pathogenesis and modulation of host responses

Functional non-coding RNAs derived from the flavivirus 3 untranslated region

Japanese encephalitis virus non-coding RNA inhibits activation of interferon by blocking nuclear translocation of interferon regulatory factor 3

Poly (A)-binding protein binds to the non-polyadenylated 3 untranslated region of dengue virus and modulates translation efficiency

RNA Structure Duplications and Flavivirus Host Adaptation

West Nile virus encodes a microRNA-like small RNA in the 3 untranslated region which up-regulates GATA4 mRNA and facilitates virus replication in mosquito cells

Isolation and genome characterization of a novel duck Tembusu virus with a 74 nucleotide insertion in the 3 non-translated region

Post-translational regulation and modifications of flavivirus structural proteins

Structural and functional studies of the promoter element for Dengue virus RNA replication

A trade-off in replication in mosquito versus mammalian systems conferred by a point mutation in the NS4B protein of dengue virus type 4

Flavivirus NS1: A multifaceted enigmatic viral protein

A Proline-Rich N-Terminal Region of the Dengue Virus NS3 Is Crucial for Infectious Particle Production

Evaluation of the efficiency of bird-baited traps for sampling potential West Nile fever mosquito vectors (Diptera: Culicidae) in Senegal

Vector Competence of Culex neavei (Diptera: Culicidae) for Usutu Virus

Vector Competence of Culex neavei and Culex quinquefasciatus (Diptera: Culicidae) from Senegal for Lineages 1, 2, Koutango and a Putative New Lineage of West Nile virus

Flaviviral NS4b, chameleon and jack-in-the-box roles in viral replication and pathogenesis, and a molecular target for antiviral intervention

A Combined Genetic-Proteomic Approach Identifies Residues within Dengue Virus NS4B Critical for Interaction with NS3 and Viral Replication

Evidence for a genetic and physical interaction between nonstructural proteins NS1 and NS4B that modulates replication of West Nile virus

Mutation of Putative N-Glycosylation Sites on Dengue Virus NS4B Decreases RNA Replication

Biological and phylogenetic characteristics of West African lineages of West Nile virus

N-linked glycans on dengue viruses grown in mammalian and insect cells

Crucial role of the N-glycans on the viral E-envelope glycoprotein in DC-SIGN-mediated dengue virus infection

N-linked glycosylation of west nile virus envelope proteins influences particle assembly and infectivity

Molecular evolution of Zika virus during its emergence in the 20(th) century

The HIV glycan shield as a target for broadly neutralizing antibodies

Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy

Characterization of N-Glycan Structures on the Surface of Mature Dengue 2 Virus Derived from Insect Cells

Structure-based mutational analysis of several sites in the E protein: Implications for understanding the entry mechanism of Japanese encephalitis virus

In vitro and in vivo evaluation of mutations in the NS region of Lineage 2 West Nile virus associated with neuroinvasiveness in a mammalian model

Why do RNA viruses recombine?

Analysis of genotype diversity and evolution of Dengue virus serotype 2 using complete genomes

Effects of recombination on densovirus phylogeny

A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes

Recombination and positive selection identified in complete genome sequences of Japanese encephalitis virus

Antigenic structure of flavivirus proteins

Molecular evolution of dengue 2 virus in Puerto Rico: Positive selection in the viral envelope accompanies clade reintroduction

Jyh-Hsiung, H. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

Nonstructural Proteins Are Preferential Positive Selection Targets in Zika Virus and Related Flaviviruses

Inhibition of interferon signaling by dengue virus

Subcellular localization and membrane topology of the Dengue virus type 2 Non-structural protein 4B

Message in a bottle: Lessons learned from antagonism of STING signalling during RNA virus infection

Comparison of genotypes I and III in Japanese encephalitis virus reveals distinct differences in their genetic and host diversity

Crystal structure of the NS3 protease-helicase from dengue virus

Structural and functional analysis of methylation and 5 -RNA sequence requirements of short capped RNAs by the methyltransferase domain of dengue virus NS5

Phylogeny, selection pressure and evolutionary time-scale analysis

The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus. Elife

Does adaptation to vertebrate codon usage relate to flavivirus emergence potential?

Codon usage bias: Causative factors, quantification methods and genome wide patterns: With emphasis on insect genomes

Bicluster pattern of codon context usages between flavivirus and vector mosquito Aedes aegypti: Relevance to infection and transcriptional response of mosquito genes

Factors shaping the adaptive landscape for arborviruses: Implications for the emergence of disease

Arbovirus evolution in vivo is constrained by host alternation

Pathogenesis of Modoc virus (Flaviviridae; Flavivirus) in persistently infected hamsters

Latent Infection of Rio Bravo Virus in Salivary Glands of Bats

Insect-Specific Flaviviruses: A Systematic Review of Their Discovery, Host Range, Mode of Transmission, Superinfection Exclusion Potential and Genomic Organization

Yellow fever virus: Genetic and phenotypic diversity and implications for detection, prevention and therapy

Identification of host genes leading to West Nile virus encephalitis in mice brain using RNA-SEQ analysis

Zika virus: An update on epidemiology, pathology, molecular biology, and animal model

Genome-wide analysis reveals class and gene specific codon usage adaptation in avian paramyxoviruses 1

An Isogenic Blood-Brain Barrier Model Comprising Brain Endothelial Cells, Astrocytes and Neurons Derived from Human Induced Pluripotent Stem Cells

Efficient and rapid derivation of primitive neural stem cells and generation of brain subtype neurons from human pluripotent stem cells

This research received no specific grant from any funding agency in the public, commercial,