key: cord-0002822-b6l0qshz
authors: Forrester, Naomi L.; Wertheim, Joel O.; Dugan, Vivian G.; Auguste, Albert J.; Lin, David; Adams, A. Paige; Chen, Rubing; Gorchakov, Rodion; Leal, Grace; Estrada-Franco, Jose G.; Pandya, Jyotsna; Halpin, Rebecca A.; Hari, Kumar; Jain, Ravi; Stockwell, Timothy B.; Das, Suman R.; Wentworth, David E.; Smith, Martin D.; Kosakovsky Pond, Sergei L.; Weaver, Scott C.
title: Evolution and spread of Venezuelan equine encephalitis complex alphavirus in the Americas
date: 2017-08-03
journal: PLoS Negl Trop Dis
DOI: 10.1371/journal.pntd.0005693
sha: 6b3a2955735dd17ee9c0ae0f528a9bb52229973c
doc_id: 2822
cord_uid: b6l0qshz

Venezuelan equine encephalitis (VEE) complex alphaviruses are important re-emerging arboviruses that cause life-threatening disease in equids during epizootics as well as spillover human infections. We conducted a comprehensive analysis of VEE complex alphaviruses by sequencing the genomes of 94 strains and performing phylogenetic analyses of 130 isolates using complete open reading frames for the nonstructural and structural polyproteins. Our analyses confirmed purifying selection as a major mechanism influencing the evolution of these viruses as well as a confounding factor in molecular clock dating of ancestors. Times to most recent common ancestors (tMRCAs) could be robustly estimated only for the more recently diverged subtypes; the tMRCA of the ID/IAB/IC/II and IE clades of VEE virus (VEEV) were estimated at ca. 149–973 years ago. Evolution of the IE subtype has been characterized by a significant evolutionary shift from the rest of the VEEV complex, with an increase in structural protein substitutions that are unique to this group, possibly reflecting adaptation to its unique enzootic mosquito vector Culex (Melanoconion) taeniopus. Our inferred tree topologies suggest that VEEV is maintained primarily in situ, with only occasional spread to neighboring countries, probably reflecting the limited mobility of rodent hosts and mosquito vectors.

a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

The Venezuelan equine encephalitis (VEE) antigenic complex comprises a group of alphaviruses that share similar genetic characteristics and can be defined by broad cross-reactivity antigenically [1] which defines them as a group within the Alphaviridae. The VEE complex alphaviruses are classified into six subtypes, designated I to VI, and consist of 9 species [2] , of which subtype I contains the veterinary and medically important VEE virus (VEEV) ( Table 1) . Historically, the VEE complex subtypes I-VI were defined by serological analysis, and this nomenclature has persisted. However, the advent of sequencing resulted in several viruses now being reclassified as distinct species. For example, subtype IF is genetically distinct from the remainder of the subtype I viruses and is now classified as the species Mosso das Pedras virus. Most VEE complex viruses have enzootic cycles where they circulate between wild animals, generally rodents and mosquitoes, particularly Culex (Melanoconion) spp. mosquito vectors. With the exception of VEEV subtype II, VEE complex viruses are geographically distributed throughout Central and South America. VEEV subtype II is found only in Florida and is usually transmitted by Culex cedeci mosquitoes. The recent appearance for the first time of Culex (Melanoconion) species in southern Florida [3] underscores the continuing threat of emergence, possibly enhanced by climate change, which also increases the potential for other VEEV subtypes to spread northwards and establish enzootic transmission cycles. Although many VEE complex viruses have not been implicated in human disease, those that are associated with human disease (VEEV) can cause acute, often severe febrile illness that may progress to encephalitis, causing severe human morbidity and mortality [4] . Patients who survive encephalitis are often left with permanent neurologic sequelae, and the cost for treatment and long-term care related to a single case can be several million dollars [5] . In addition to VEEV (subtype I), which cases the majority of the encephalitis cases within the VEE subtype, subtype II Everglades virus (EVEV), which is found only in Florida, can cause neurologic disease in humans [6] and equids [7] . Subtype IIIA, Mucambo virus, also causes febrile disease in humans [8, 9] .

VEEV is associated with human disease, and is further subdivided into subtypes IAB, IC, ID, IE. Subtypes ID and IE comprise enzootic/endemic strains that circulate continuously in forests and swamps of northern South America, Central America and Mexico and cause a large burden of endemic disease from direct spillover [10] . The remaining VEEV subtypes, IAB and IC, comprise epizootic/epidemic strains that are associated with periodic equineamplified outbreaks that result in severe disease in equids and extensive spillover to humans [11] . These outbreaks can spread from South America as far north as the southern United States [12, 13] , resulting in up to hundreds-of-thousands of cases over a period of months to a few years. Prior to the 1980s, VEE epizootics involving high case-fatality rates were frequently recorded. Because horses have been an important component of the local agricultural economies within many Latin American regions, VEE has often had a sizeable economic impact as well as a direct effect on public and veterinary health [14] . Recent outbreaks during the 1990s in Venezuela, Colombia and Mexico have demonstrated the potential for VEEV to re-emerge periodically from enzootic progenitors [15] [16] [17] [18] . The emergence of VEEV into an epidemic/ epizootic form has been associated with specific mutations that arise in the VEEV envelope glycoprotein 2 (E2) gene of enzootic subtype ID or IE strains. These mutations result in the addition of positively charged amino acid changes on the surface of the virion spikes [19] that give rise to increased virulence and viremia in equids [20, 21] and sometimes enhanced infection of epidemic vector mosquitoes such as Aedes (Ochlerotatus) taeniorhynchus [22] . The higher viremia levels in equids can lead to infection of mosquitoes that are not normally involved in enzootic circulation [23] , which can then result in spillover infections of humans and other domestic animals. The phylogenetic characteristics of the VEE complex have been studied for several decades, recently focusing on structural protein gene sequences [24] [25] [26] [27] . These studies support the hypothesis that the IAB and IC subtypes arise from mutations in enzootic ID strains that result in the acquisition of epizootic/epidemic characteristics [19, 20, 28] . Although some studies have addressed the evolution and continued circulation of VEEV ID and IE strains, the use of only partial genomic sequences has limited their resolution and accuracy.

To date, insufficient complete genomic sequences have been available to permit a detailed, global analysis of all VEE complex species/strains and obtain high-resolution phylogenetic results. Our goal was to determine more accurately the temporal origin of the VEE complex and patterns of historic spread. By increasing the number of sequenced VEEV strains from the ID and IE subtypes, we sought a more robust analysis of the evolution of these subtypes. To this end, a set of 130 complete genome sequences was prepared, of which 94 were determined in this study, and a comprehensive phylogenetic study was performed to determine the origin and evolutionary patterns of these important viruses.

Viruses were obtained from the World Reference Center for Emerging Viruses and Arboviruses and other collections at the University of Texas Medical Branch, and included 94 isolates that had not been previously sequenced or had only partial genome sequences available. These strains were geographically and temporally distributed across North and Central America, and over the past 80 years. The metadata for the 94 virus strains sequenced in this study and their accession numbers are found in S1 Table. Virus propagation and cDNA generation Vero (African green monkey kidney) cells (CCL-81, ATCC) were grown in 150 cm 2 flasks tõ 80% confluency and inoculated with VEE complex virus strains. S1 Table shows the strains included in this study and their associated metadata. Infected cells were maintained at 37˚C until the development of cytopathic effects. Then, cell culture supernatants were clarified at 1,125 x g for 10 min and mixed with a 1/3 volume of 4X precipitation buffer (28% PEG 8000, 9.2% NaCl). After overnight incubation at 4˚C, virus was pelleted at 2,880 x g for 30 min at 4˚C and resuspended in 250 μl of TEN buffer (10 mM Tris-HCl, pH 7.5, 1 mM EDTA, 0.1 M NaCl), which was then added to 750 μl of TRIzol LS Reagent (Invitrogen, Grand Island, NY). RNA was extracted following the manufacturer's protocol, then resuspended in 50 μl of H 2 O and stored at -80˚C.

The SuperScript III First-Strand Synthesis System (Invitrogen, Grand Island, NY) was used to produce cDNA following the manufacturer's recommendations. Three 20 μl-reactions were performed, each with 6 μl of extracted RNA and 1 μl of one of the following cDNA primers, 50 ng/μl of random hexamers, 50 μM oligo (dT) 20 , and 10 μM of reverse primers designed to anneal approximately 500 nt downstream of the 5' end of the viral genome. Samples were treated with RNase H, and the resulting cDNA generated from random hexamers and oligo (dT) 20 primers were then mixed together. The cDNA samples were stored at -80˚C until further processing, and efficient reverse transcription was confirmed by PCR using 0.5 μl of each sample and primer pairs designed to anneal near the 5' and 3' portions of the genome.

Sequences were assembled using sequence-independent single primer amplification (SISPA) to barcode random primed cDNAs [29, 30] from individual cDNA samples. SISPA products were normalized and pooled into a single reaction that was purified using a PCR purification kit (Qiagen, Valencia, CA). Samples were subsequently gel purified to select for products ranging from 300-500bp in size for sequencing with the Illumina Genome Analyzer II or 500-800bp in size for Roche 454 Titanium (GS-FLX) sequencing [31], or were sequenced on the Illumina HiSeq using the following protocol; cDNA (0.05-1.7 μg) was fragmented by incubation at 94˚C for eight (8) minutes in 19.5 ul of fragmentation buffer (Illumina 15016648). Samples were tracked using the "index tags" incorporated into the adapters as defined by the manufacturer. Cluster formation of the library DNA templates was performed using the Tru-Seq PE Cluster Kit v3 (Illumina) and the Illumina cBot workstation using conditions recommended by the manufacturer. Paired end 50 base sequencing by synthesis was performed using TruSeq SBS kit v3 (Illumina) on an Illumina HiSeq 1500 using protocols defined by the manufacturer.

Next generation sequencing (NGS) reads from Roche 454 GS-FLX were sorted based on SISPA barcode matches, trimmed, and searched by TBLASTX against a custom reference nucleotide database of full-length VEE complex genomes downloaded from GenBank. Any chimeric VEEV sequences or non-VEEV sequences amplified during the random hexamerprimed amplification were removed. For each sample, the filtered GS-FLX reads were then de novo assembled using CLC Bio's clc_novo_assemble program. The consensus sequence of the de novo assembly was used to identify the best full-length VEEV genome downloaded from GenBank to use as a mapping reference sequence. Both GS-FLX and Illumina reads were then mapped to the selected reference VEEV genome using CLC Bio's clc_ref_assemble_long program. At loci where both GS-FLX and Illumina sequence data agreed on a variant compared to the reference sequence, the latter was updated to reflect the difference. A final mapping of all sequences to the updated reference sequences was then performed, resulting in the final assembled genome. Upon review of NGS assemblies, there were circumstances that required the use of RT-PCR followed by Sanger capillary sequencing to fill gaps in genomic regions with low coverage. These cases included finishing/closure tasks to increase the sequencing coverage of genome regions inadequately covered by NGS.

Sequences identified with an asterisk in S1 Table were part of a different sequencing project and were therefore analyzed using a different platform. These sequences were first subjected to a blast analysis to determine the number of viral reads and the percentage of contaminating reads from hosts, or other sources. Then the contamination reads from the Vero cells were filtered out using the African Green Monkey (Chlorcebus sabeus) genome as a template. The remaining Fastq files were additionally processed using trimmomatic to remove low quality sequence. Finally, assembly was performed using iMetAMOS [32] using the standard parameters.

Nucleotide sequences were manually aligned with similar GenBank sequences using MUSCLE implemented in SeaView [33] . Untranslated regions (UTRs) of alphavirus genomes show limited conservation and are often difficult to align reliably. Therefore, the sequences were edited and trimmed to remove untranslated regions, resulting in concatenated ORFs for 130 (126 VEEV and 4 EEEV) sequences. Sequences were then re-aligned as protein sequences before being reverse translated to nucleotides to maintain codon alignments. Within the VEEV genome, there are two regions that have proved historically difficult to align, the 3' end of the nsP3 and the 5' end of the capsid gene [34] . Thus these were removed from the alignment and the resulting alignment was used for subsequent analyses and this alignment is available upon request. Eastern equine encephalitis virus, the sister virus to VEEV in the alphavirus genus [35] , was used as an outgroup in all analyses except molecular dating and selection. Sequences were analyzed for percent identity at the nucleotide and amino acid level using BioEdit [36].

A maximum likelihood (ML) phylogeny was inferred using PAUP Ã 4.0 [37] and the General Time Reversible (GTR +Γ 4 +I) model, selected by Modeltest [38] . To assess the robustness of tree topologies, bootstrapping was performed using 1,000 replicate neighbor-joining trees. Bayesian phylogenetic inference was performed using the GTR +Γ 4 +I model in MrBayes v3.1 [39, 40] . Analyses were run for one million iterations until they reached convergence.

Substitution rates and times to most recent common ancestor (tMRCAs) were estimated using Bayesian evolutionary analysis by sampling trees (BEAST) v1.7.1 [41, 42] . BEAST employs a Bayesian Markov chain Monte Carlo (MCMC) approach to infer demographic histories, evolutionary rates and dates of divergence from serially (dated) sampled sequence data. Statistical uncertainty in the data is reflected in the 95% highest posterior density (HPD) values. Analyses were typically performed using the Bayesian Skyline Plot (BSP) model of population growth, which does not use a pre-specified demographic model [43] . We used this model because the VEE complex would not all show the same patterns of demographic change and this did not impose constraints on the different subtypes of VEEV. To estimate substitution rate variation among lineages we used the models implemented in the BEAST program [44] including the uncorrelated lognormal (UCLN) model for the VEEV IE strains and the uncorrelated exponential (UCEX) model for the VEEV ID/II/IAB/IC strains as these were determined to be the most accurate model using the stepping stone algorithm implemented in BEAST [45] . The MCMC chain was 100 million generations long, thinned to include every 5000 th generation in the final sample. The program Tracer version 1.5 (http://tree.bio.ed.ac.uk/software/tracer/) was used to confirm convergence and mixing. The software TreeAnnotator version 1.7.1 (http://beast.bio.ed.ac.uk/software/TreeAnnotator) was used to summarize the data output from BEAST. The maximum clade credibility (MCC) tree was estimated using mean node heights after discarding the initial 10% of generations as burn-in.

To investigate the nature of selective pressures acting on VEE complex viruses, we estimated the average number of nonsynonymous (d N ) and synonymous (d S ) nucleotide substitution per site (d N /d S ratio), using a counting method SLAC [46] , as well as the numbers and locations of sites experiencing episodic positive selection using MEME [47] .

In the presence of strong purifying selection, standard models of molecular evolution (e.g. GTR +Γ 4 +I) tend to underestimate branch lengths in RNA and DNA virus phylogenies [48] [49] [50] . Therefore we re-estimated branch lengths using a Branch-Site REL (BRSEL) [51] model that accounts for variation in selection pressure across the genome and across different branches in the phylogeny, using the HyPhy software package (50) . The original formation of the BSREL model allowed three ω (d N /d S ) rate classes on each branch, each representing a proportion of sites in the alignment. To prevent over-fitting, we implemented a step-up procedure via an adaptive BSREL model (aBSREL; [52] ), starting with a single d N /d S class on each branch and testing the fit of an additional d N /d S class using small-sample corrected Akaike information criteria (c-AIC).

All sequences were analyzed for recombination using the program RDP3 [53] . Recombination events were determined using RDP, GENECONV, MaxChi, Chimaera and 3Seq, and were only considered robust if they were identified using more than one method.

The unrestricted Bayesian (MrBayes) and maximum likelihood trees (PAUP Ã ) inferred the expected topology with Cabassou virus (CABV, subtype V), Rio Negro virus (RNV, subtype VI), and Mosso das Pedras virus (MDPV, subtype IF) falling basal to the remainder of the subtypes, as has been shown in previous analyses (Fig 1) [25, 34] . Unexpectedly, the MCC tree, inferred using BEAST (S1 Fig) did not resolve the placement of CABV, RNV, and MDPV, with this grouping showing low posterior support.

Mutations that were unique to each subtype or group of subtypes were identified, as well as those synapomorphies that defined lineages within subtypes; uninformative sites were not counted. For VEEV subtype IE, there were significantly more synapomorphic mutations in the structural protein genes than would be expected if mutations were randomly distributed across the genome (p<0.05, Fisher's exact test). For all other subtypes and groups of subtypes, the distribution of mutations was not significantly different from expected given the length of the ORF's. For the VEE complex subtypes III-VI, the number of mutations was identified on each branch (See S1 Fig). There was no correlation between branch length and the number of amino acid substitutions unique to each of these subtypes.

Correcting for the effect of purifying selection when estimating tMRCAs and evolutionary rates

The predominance of purifying selection in alphavirus evolution has been widely reported [54, 55] , and this type of selection has been postulated to bias tMRCA estimations for ancient divergence events in RNA viruses [48, 49] . To test our hypothesis that, like other ancient RNA viruses [48] [49] [50] , dating estimates may be biased in the VEE complex due to the presence of strong purifying selection, we employed a branch-site random effects likelihood (BSREL) approach to determine the extent to which branch lengths (and tMRCAs) were underestimated by standard models of nucleotide evolution. By moving up through the VEE complex phylogeny in an iterative process, we removed the most basal lineages and compared total tree length optimized under BSREL and GTR+Γ 4 , thereby identifying subtrees whose branch lengths did not expand under BSREL compared with a standard nucleotide model (GTR+Γ 4 ) (Figs 1 & 2) . We assert that it is these subtrees, which were not underestimated by GTR+Γ 4 , whose tMRCA can be reliably inferred by standard Bayesian molecular clock dating (e.g. BEAST) (Fig 2) .

As we could not generate a fully resolved MCC tree for the entire VEE complex, two trees were generated for subtypes IAB/IC/ID and for subtypes IE. Removing the other branch lengths removed some of the uncertainty regarding the substitution rates as well as the lack of samples for many of the other subtypes, which can skew the data. The subtype IAB/IC/ID VEEV strains fell into two main groups, a Panamanian lineage and a Venezuelan/Colombian lineage. In addition, we sequenced two additional strains MAC10 and MHC88 from Venezuela, which are outliers from the rest of the ID/IAB/IC strains (Fig 3) . These strains showed that VEEV has been circulating in its current form for around 253 years, although the date of this node could not be accurately determined as determined by the BSREL analysis. In fact only group K (Figs 1 and 3) , which fell within the Colombian/Venezuelan lineage could be reliably dated at 1934 (1904-1950) (without BSREL correction). The Venezuelan samples MAC10 and MHC88 were basal to the rest of the strains. Within the Venezuelan/Colombian and Panamanian lineages, the Peruvian strains were present in both lineages and, given the shape of the tree and the distribution of these viruses, it is most likely that they represent recent introductions. Although synapomorphic amino acids were identified for the major lineages of the ID subtype (see S2 Fig) and non-structural proteins for each lineage was not significantly different from expected based on a random distribution across the genome.

All of the subtype IAB strains occupied a single clade (see Fig 3) , with the Hoja Redonda (1942.HojaRedonda.Peru) and Piura (1950.Piura.Peru) strains forming the most basal branch. Previous analyses suggested that the IAB outbreaks after 1943 were caused by incompletelyinactivated vaccines derived from early IAB isolates [27] , consistent with our results. These vaccines were used in South and Central America until the early 1970s, when the TC-83 liveattenuated strain replaced them after its demonstrated efficacy during the 1971 Texas outbreak and in experimental studies [56] . The mechanism of the original IAB emergence is believed to be mutations in the E2 protein gene of enzootic ID strains [19] . Unfortunately, the ID progenitor strains for this group have has not been identified. For the subtype IE, a separate coalescent analysis was performed (Fig 4) . The BT2607 and MenaII isolates were basal in this tree as expected given previous analyses [16] . These viruses were isolated in the 1960s and no further strains are available from this lineage. The remaining subtype IE strains fell into two major groups with the tMRCA of 91 (68-124) years before the present. The Pacific Coast strains contained samples from the entire known geographic range of VEEV subtype IE, except Panama (Guatemala, Honduras, Nicaragua and Mexico) and the Gulf Coast strains included isolates from Mexico only, and a single strain from Belize. The most recent VEEV strains from Mexico were found in this second group. However, the paucity of recent isolates from Central America suggests that some of the temporal groupings in our trees may reflect sampling bias. The Gulf Coast IE strains diverged after the Pacific Coast strains 76 (95% HPD; 58-102) years prior to 2010, and the Pacific Coast strains diverged 85 (95% HPD; . However, only the clades designated R and T (Figs 1 & 4) could be reliably dated. Evolution of VEEV complex alphaviruses

There was evidence of VEEV recombination between nucleotides 4800-5830 observed using the program RDP. However, this region corresponds to the nsP3 gene, which is highly variable with frequent insertions and deletions, even within subtypes. Aligning the insertions and deletions is problematic, so this recombination result cannot be confirmed.

Using SLAC, we estimated a global d N /d S ratio of 0.057, suggesting that purifying selection is the predominant evolutionary force acting on the VEEV genome. A total of 2900 negatively selected sites were detected using the SLAC algorithm (at p<0.1). Nonetheless, we found evidence for episodic diversifying positive selection at 23 codons using MEME (at p<0.05; identified relative to their position within each protein relative to prototype VEEV strain 3880 in Table 2 ). Several positively selected codons were detected in important protein genes including the nsP4 RNA-dependent RNA polymerase, Capsid, E2 and E1 glycoproteins. However, these sites did not include substitutions previously demonstrated experimentally to mediate Evolution of VEEV complex alphaviruses adaptation for equine amplification or bridge vector infection involved in epizootic VEE emergence [21, 57] , underscoring the limitations of d N /d S -based selection analyses.

VEEV continues to be a major public health problem in Latin America and understanding the evolution and spread of this virus in the New World is integral to implementing surveillance strategies and prevention programs. Although the genetic mechanisms of emergence of epidemic strains is relatively well understood [19, 20, 28, 59] , the evolution of the group as a whole has not been studied in a comprehensive manner. Using Bayesian molecular dating techniques, we investigated the evolutionary dynamics of the VEE complex while taking into consideration the effect of purifying selection on the accuracy of dating ancient nodes within this group of viruses. The genomic sequencing of 94 VEEV isolates from all known endemic countries, the inclusion of most isolates from epidemic/epizootic events, and incorporation of the known temporal distribution of available isolates facilitated an extensive investigation into the origin and evolutionary history of this important group of pathogens.

We estimated dates of divergence for several subtypes in the VEE complex. However, the presence of purifying selection can bias such analyses, and our attempts to date the entire VEE complex were limited by this bias. Analysis of ID/IAB/IC strains revealed two main lineages. One circulates primarily in Panama, with a few Peruvian strains included. The origin of this Table 2 . Codon positions predicted to be under episodic positive selection, showing the amino acid changes and the proteins altered. Amino acid position is defined using the Genome sequence 3908 (Accession number U55350 in the NCBI database) and numbered against the non-structural and structural polyproteins.

Amino group was estimated at around 102 years ago, although confidence limits were broad. From the phylogeny we were able to infer that circulation of VEEV may have been present initially in Panama, with a subsequent introduction into Peru. The exact mechanism of introduction into Peru is not understood; the enzootic cycle of VEEV involves rodents and mosquitoes that have limited geographic range, suggesting other mechanisms of transport such as birds may be occasionally be involved in virus movement. The presence of Peruvian sequences in two distinct lineages suggests that there have been at least two independent introductions. Exclusion of VEEV subtypes III-VI and the lack of isolates from countries intermediate between Panama and Peru will therefore limit the possibility of fully resolving the ancestral dispersal of these viruses. As in previous analyses, our results confirm that the IAB strains group together with strain Hoja Redonda, isolated in Peru in 1942, which is basal in the group. Previous partial sequencing and analysis of these strains suggested that some of the IAB outbreaks were a result of incompletely inactivated vaccines [27] . We were able to corroborate this finding using full genome strains that showed tight genetic distance (96.08-99.23%), and no identified ID progenitors (i.e. strains that are genetically related but lack the defining E2 mutations that are associated with the IAB subtypes), although this could be due to inaccurate sampling. However, compared to the spacing of the strains comprising the IC subtype, which are not known to have been used for vaccine production and are interspersed throughout the phylogeny, all the IAB strains appear to have a single origin, and previous analyses have shown that these cluster with vaccine strains used around the time these viruses were circulating [27] .

The IC subtype, which has been characterized both phylogenetically and experimentally using reverse genetics, was determined to be a result of amino acid changes in the E2 protein of enzootic subtype ID strains [28, 59] . The IC strains are found within 2 distinct clades in the VEEV phylogeny, particularly in the Colombian/Venezuelan lineage. However, our data suggest that no IC epizootic/epidemic strains have arisen from the Panamanian lineage. It is possible that epistatic interactions limit the emergence of IC epidemic mutations from these Panamanian ID strains, as has been described for chikungunya virus emergence [60, 61] . Despite strong experimental evidence that substitutions in the E2 protein mediate adaptation for equine amplification or bridge vector infection involved in epizootic VEE emergence [21, 57] , our sequence analyses failed to detect positive selection on the corresponding codons ( Table 2 ). This finding underscores the limitations of current d N /d S methods for identifying unique adaptive substitutions occurring during virus evolution, as they typically require repeated substitution events at a given site to detect adaptive evolution.

The IE VEEV subtype is confined to Central America and Mexico; in fact there appears to be a demarcation between the ID subtype and the IE subtype at the Panamanian/Costa Rican border with the exception of the BT2607 and MenalI strains isolated in western Panama in 1961 and 1962, respectively (Fig 1) . More surveillance in Panama is required to determine if these VEEV IE subtypes are still circulating. The main VEEV IE lineage is subdivided into two clades: a widely dispersed group with strains from Guatemala, Honduras, Nicaragua and Mexico (Pacific Coast lineage) and a group with more limited dispersal circulating mainly in Mexico (Gulf Coast lineage). As is expected of a rodent-hosted arbovirus, there is very little spread among countries and the dispersal within the IE subtype appears to occur between neighboring countries, presumably reflecting the limited mobility of rodents and mosquitoes (Fig 4) . Given the limited number of countries represented in the IE phylogeny and the lack of recent sampling in most countries, it is hard to draw further conclusions about the historical spread of this subtype within Central America. However, recent detailed studies of VEEV circulation in Mexico demonstrate geographic stability of independently evolving lineages in that country [62] .

Previous phylogenetic analyses of the VEE complex viruses were performed using partial and complete structural protein gene sequences [25] . Using these expanded sequences we observed 63 synapomorphic mutations associated with the IE subtype (Group H). Of these, 39 were in the non-structural protein genes and 37 in the structural genes. This finding was significantly different from an expected, random distribution throughout the VEEV genome. It is possible that the preponderance of structural protein substitutions reflects adaptation to different mosquito vectors. Subtype ID strains are vectored by at least three mosquito species: Cx. (Melanoconion) adamesi, Cx. (Mel.) vomerifer and Cx. (Mel.) pedroi [63] , whereas the subtype IE strains appear to be more specialized and vectored nearly exclusively by Cx. (Mel.) taeniopus [64] . Experimental infections with subtypes IAB, IC and ID have demonstrated poor infectivity at the level of midgut infection, suggesting specific subtype IE adaptation to this vector [64, 65] . Although the true distributions of Cx. taeniopus and Cx. pedroi are not completely certain because they were not distinguished until 1980 [66] , recent collection records and revisions [63, [67] [68] [69] [70] [71] [72] [73] suggest that the former is restricted to Mexico, Central America and the Caribbean, while the latter occurs throughout northern South America and Central America, as well as in Mexico. Additionally, neither Cx. vomerifer or Cx. adamesi, both subtype ID vectors, are found north of Panama [74, 75] , which may restrict the distribution of the ID subtype. Regardless of the original mechanism of spatial compartmentalization, between these subtypes, the IE strains appear to have adapted specifically to Cx. taeniopus while the other subtypes remain poorly infectious for this species [76] .

In summary, our comprehensive analysis of all available full-length VEE complex genomes demonstrated that purifying selection as a confounding factor in coalescent analyses and only the more recently diverged subtypes could have their tMRCAs reliably estimated. Purifying selection is an inherent determinant of arbovirus evolution because of the alternating vectorhost transmission and could therefore introduce bias into coalescent analyses of similar viruses. By restricting this bias by analyzing only those clades for which the BSREL analysis indicated that standard nucleotide models would not introduce severe bias we were able to robustly estimate the tMRCA of several VEEV lineages. Evolution of the IE subtype appears to have been characterized by a significant evolutionary shift from the rest of the VEEV complex, with an increase in structural protein substitutions that may reflect adaptation to its mosquito vector. Additionally our inferred tree topologies, suggest that VEEV is maintained primarily in limited geographic foci with only occasional spread to neighboring countries, probably reflecting the limited mobility of rodent hosts and mosquito vectors. Additional strains of subtypes II-VI are needed to more completely characterize the evolution of the entire VEE complex of alphaviruses.

Supporting information S1 Table. List of virus strains used in the study and known metadata. 

Alphaviruses: Population genetics and determinants of emergence

Virus Taxonomy, Ninth Report of the International Committee on Taxonomy of Viruses

Culex (Melanoconion) panocossa from peninsular Florida, USA

Encephalitic alphaviruses

Eastern equine encephalitis virus-old enemy, new threat

Everglades virus infection in man

Venezuelan equine encephalomyelitis in Florida: endemic virus circulation in native rodent populations of Everglades hammocks

Endemic Venezuelan equine encephalitis in northern Peru. Emerging infectious diseases

Anchez Spindola I. Two Human Cases of Laboratory Infection with Mucambo Virus. The American journal of tropical medicine and hygiene

Endemic Venezuelan equine encephalitis in the Americas: hidden under the Dengue umbrella

Endemic Venezuelan equine encephalitis in the Americas: hidden under the dengue umbrella

History and geographic distribution of Venezuelan equine encephalitis

Epidemic Venezuelan equine encephalitis in North America in 1971: Vector studies

The Arboviruses: Epidemiology and Ecology. IV

Genetic and antigenic diversity among eastern equine encephalitis viruses from North, Central, and South America

Association of Venezuelan equine encephalitis virus subtype IE with two equine epizootics in Mexico

Epidemic Venezuelan equine encephalitis in La Guajira, Colombia

Re-emergence of epidemic Venezuelan equine encephalomyelitis in South America. VEE Study Group

Positively charged amino acid substitutions in the E2 envelope glycoprotein are associated with the emergence of Venezuelan equine encephalitis virus

Envelope glycoprotein mutations mediate equine amplificiation and virulence of epizootic Venezuelan equine encephalitis virus

Venezuelan encephalitis emergence mediated by a phylogenetically predicted viral mutation. Proceedings of the National Academy of Sciences of the United States of America

Vector infection determinants of Venezuelan equine encephalitis virus reside within the E2 envelope glycoprotein

Venezuelan equine encephalitis

Genetic characterization of Venezuelan equine encephalitis virus from Bolivia, Ecuador and Peru: identification of a new subtype ID lineage

Nucleotide sequences of the 26S mRNA's of the viruses defining the Venezuelan equine encephalitis antigenic complex

Venezuelan equine encephalitis in Panama: fatal endemic disease and genetic diversity of etiologic viral strains

Genetic evidence for the origins of Venezuelan equine encephalitis virus subtype IAB outbreaks

Venezuelan encephalitis emergence mediated by a phylogenetically predicted viral mutation

Viral genome sequencing by random priming methods

MRBAYES 3: Bayesian phylogenetic inference under mixed models

BEAST: Bayesian evolutionary analysis by sampling trees

Bayesian phylogenetics with BEAUti and the BEAST 1.7

Bayesian coalescent inference of past population dynamics from molecular sequences

Relaxed phylogenetics and dating with confidence

Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty

Datamonkey: rapid detection of selective pressure on individual sites of codon alignments

Detecting individual sites subject to episodic diversifying selection

A case for the ancient origin of coronaviruses

Purifying selection can obscure the ancient age of viral lineages

Evolutionary origins of human herpes simplex viruses 1 and 2

A random effects branch-site model for detecting episodic diversifying selection

Less is more: An adaptive branch-site random effects model for efficient detection of episodic diversifying selection

RDP3: a flexible and fast computer program for analyzing recombination

Alphaviruses: Population genetics and determinants of emergence

Transmission cycles, host range, evolution and emergence of arboviral disease

Experimental infections of horses with an attenuated Venezuelan equine encephalomyelitis vaccine (strain TC-83)

Venezuelan equine encephalitis emergence: Enhanced vector infection from a single amino acid substitution in the envelope glycoprotein

Venezuelan equine encephalitis virus int he guinea pig model: evidence for epizootic determinants outside the E2 envelope glycoprotein gene

Venezuelan equine encephalitis emergence: Enhanced vector infection from a single amino acid substitution in the envelope glycoprotein

Chikungunya virus emergence is constrained in Asia by lineage-specific adaptative landscapes

Sequential adaptive mutations enhance efficient vector switching by Chikungunya virus and its epidemic emergence

Venezuelan equine encephalitis virus activity in the Gulf Coast region of Mexico

Natural Enzootic vectors of Venezuelan equine encephalitis virus

Vector competence of Culex (Melanoconion) taeniopus for allopatric and epizootic Venezuelan equine encephalomyelitis viruses

Vector incompetancy: its implication in the disappearance of epizootic Venezuelan equine encephalomyelitis virus from Middle America

The identity of Culex (Melanoconion) taeniopus and related species with notes on synonymy and description of a new species (Diptera: Culicidae)

Spatial dispersion of adult mosquitoes (Diptera: Culicidae) in a sylvatic focus of Venezuelan equine encephalitis virus

Mosquito-host interactions during and after an outbreak of equine viral encephalitis in Eastern Panama

Annotated checklist of hte mosquito species encountered during arboviral studies in Iquitos, Peru (Diptera: Culicidae)

Revision of the Spissipes section of Culex (Melanoconion) (Diptera:Culicidae)

Vector competence of Mexican and Honduran Mosquitoes (Diptera:Culicidae) for enzootic (IE) and Epizootic (IC) strains of Venezuelan equine encephalomyelitis virus

Contrasting sylvatic foci of Venezuelan equine encephalitis virus in northern South America

Molecular phylogeny of the Vomerifer and Pedroi groups in the Spissipes section of the subgenus Culex (Melanoconion)

Melanoconion) adamesi, a new species from Panama. Mosquito Systematics

On the identity of Culex (Melanoconion) portesi senevet & abonnenc 1941. Proceedings of the

Vector competence of Culex (Melanoconion) taeniopus for allopatric and epizootic Venezuelan equine encephalomyelitis viruses. The American journal of tropical medicine and hygiene

The authors wish to thank Robert Tesh and Hilda Guzman of the WRCEVA for providing many of the viruses used in this study. We thank Nadia Fedorova for reassembly and validation of sequences.