key: cord-279794-hn5vmic0
authors: Guo, Jiahui; Fang, Liurong; Ye, Xu; Chen, Jiyao; Xu, Shangen; Zhu, Xinyu; Miao, Yimin; Wang, Dang; Xiao, Shaobo
title: Evolutionary and genotypic analyses of global porcine epidemic diarrhea virus strains
date: 2018-08-27
journal: Transbound Emerg Dis
DOI: 10.1111/tbed.12991
sha:
doc_id: 279794
cord_uid: hn5vmic0

Porcine epidemic diarrhea virus (PEDV), which re‐emerged in China in October 2010, has spread rapidly worldwide. Detailed analyses of the complete genomes of different PEDV strains are essential to understand the relationships among re‐emerging and historic strains worldwide. Here, we analysed the complete genomes of 409 strains from different countries, which were classified into five subgroup strains (i.e., GI‐a, GI‐b, GII‐a, GII‐b, and GII‐c). Phylogenetic study of different genes in the PEDV strains revealed that the newly discovered subgroup GII‐c exhibited inconsistent topologies between the spike gene and other genes. Furthermore, recombination analysis indicated that GII‐c viruses evolved from a recombinant virus that acquired the 5′ part of the spike gene from the GI‐a subgroup and the remaining genomic regions from the GII‐a subgroup. Molecular clock analysis showed that divergence of the GII‐c subgroup spike gene occurred in April 2010, suggesting that the subgroup originated from recombination events before the PEDV re‐emergence outbreaks. Interestingly, Ascaris suum, a large roundworm occurring in pigs, was found to be an unusual PEDV host, providing potential support for cross‐host transmission. This study has significant implications for understanding ongoing global PEDV outbreaks and will guide future efforts to develop effective preventative measures against PEDV.

Porcine epidemic diarrhoea (PED) is a devastating enteric disease in pigs that results in severe diarrhoea, vomiting, and dehydration, with very high mortality observed in suckling pigs (Pensaert & de Bouck, 1978).
Porcine epidemic diarrhea virus (PEDV), the causative agent of PED, belongs to the genus Alphacoronavirus within the family Coronaviridae and is an enveloped, single-stranded, positive-sense RNA virus (Woo et al., 2012). The PEDV genome consists of seven open reading frames (ORFs) organized in the order ORF1a, ORF1b, spike (S) glycoprotein gene, ORF3 hypothetical protein gene, envelope (E) gene, membrane (M) gene, and nucleocapsid (N) gene. Among the proteins encoded by the ORFs, the S glycoprotein is located on the envelope of the virus in the large surface projections of the virion and plays an important role in the attachment of viral particles to host cell receptors (Lee, Park, Kim, & Lee, 2010). Thus, the S gene is considered important for understanding the genetic relatedness and epidemiological status of PEDV field isolates, as well as for advancing vaccine development (Chen, Liu, Lang, et al., 2013). PED was first discovered in pig farms in Belgium and the United Kingdom in 1976 (Pensaert & de Bouck, 1978), with reports of its occurrence in China as early as the 1980s (Li et al., 2012). With the emergence of new PEDV strains, however, serious disease epidemics have been observed in China since October 2010 (Sun et al., 2012). Beyond China, the disease rapidly spread to more than 38 states in the USA following its first outbreak in May 2013, affecting more than 4,000 farms and causing the loss of more than 7 million piglets (Cima, 2014). Japan, Canada, Mexico, and Colombia have also experienced successive outbreaks, with considerable economic losses to the global pig industry (Lara-Romero et al., 2018; Ojkic et al., 2015; Takahashi, Okada, & Ohshima, 1983; Valko et al., 2017). Major global outbreaks since 2012 have renewed concerns about potential changes in the mode of PEDV transmission (Chen, Liu, Lang, et al., 2013; Li et al., 2012; Sun et al., 2012).
Although increasing evidence suggests that PEDV routinely undergoes significant changes, especially in spike proteins (Lara-Romero et al., 2018; Stott et al., 2017), the prevalence and evolution of PEDV strains are not well defined, and little is known about how PEDV subgroups circulate and how they might influence the evolution of PEDV. To better understand the molecular epidemiology and genetic diversity of PEDV field isolates, we investigated the genetic characterization, origin, and evolution of emergent PEDV strains worldwide, which will provide much needed information for the effective prevention and control of this disease. The complete PEDV genome was selected for genetic analysis. To clarify the evolution of PEDVs, we obtained four complete genome sequences of PEDV from our own lab (i.e., ZL29, AJ1102, HuB1-2017, and HuB7-2017) (Bi, Zeng, Xiao, Chen, & Fang, 2012). First, we performed multiple sequence alignment of the 409 complete PEDV genomes, as well as of the ORF1ab, S, and ORF3-E-M-N genes, and used the bat coronavirus BtCoV/512/2005 (GenBank accession no. DQ648858) sequence as an outgroup (Tang et al., 2006). A maximum-likelihood (ML) phylogenetic tree was constructed using IQ-TREE v.1.6.5 (Nguyen, Schmidt, von Haeseler, & Minh, 2015), with the best-fitting evolutionary model suggested by the program and 1,000 bootstrap replicates. The phylogenetic tree was rooted against the PEDV-related bat coronavirus, and the long branch was removed to give greater resolution of the viruses of interest. Nucleotide and deduced amino acid sequences were aligned using MAFFT v.7.402 (Katoh & Standley, 2013). The resulting tree was visualized using iTOL v.4 (Interactive Tree of Life, http://itol.embl.de/).
We preliminarily screened the PEDV sequence data set for recombination using RDP, GENECONV, Chimaera, MaxChi, and 3Seq, followed by secondary scanning with BootScan and SiScan, all in Recombination Detection Program v.4.95 (RDP v.4.95) (Martin, Murrell, Golden, Khoosal, & Muhire, 2015). Sequences with significant signals for recombination determined by more than two methods were analysed in greater detail. Nucleotide sequence similarity was assessed by SimPlot v.3.5.1 (Lole et al., 1999), with a sliding window size of 500 bp, step size of 100 nucleotides, and 1,000 bootstrap replicates, using gap-stripped alignments and the F84 (ML) distance model. All data were analysed using GraphPad Prism software (v.5.03, San Diego, CA, USA). All S protein sequences from the PEDV sample strains were analysed using the metadata-driven comparative analysis tool (meta-CATS) (Pickett et al., 2013), with a p-value threshold of 0.05 (this threshold was the maximum probability level for the likelihood that the position differed among groups simply by chance), to identify significantly different sites between the five subgroups.

FIGURE 1 Genotyping and origin of the 409 PEDV strains based on full-length genomic sequence analyses. (a) Phylogram was tested by 1,000 bootstrap replicates; branch lengths were measured by the number of substitutions per site (see scale bars). Names of strains, years, places of isolation, GenBank accession numbers, genogroups, and subgroups are shown. (b) Line chart shows the number of PEDV sequences obtained by gene subgroup and year of sampling. Yearly percentages of samples positive for PEDV are indicated by different coloured lines respectively. Data are indicated below sampling years. (c) Subgroup distribution of all available complete or partial PEDV genome sequences from countries reporting PEDV infections (n.a., sequence not available). In the bar charts, counts are shown by country or region. Data are indicated below bar charts.

To characterize the genetic diversity of PEDVs circulating globally, we constructed a phylogenetic tree using IQ-TREE based on the 409 complete PEDV genomes (see Materials and Methods 2.2). Consistent with our previous research (Wang, Fang, & Xiao, 2016a), the phylogenetic tree indicated that the complete PEDV genomes evolved into two separate genogroups, GI (classical) and GII (variant), as presented in Figure 1a. Furthermore, genogroup GI evolved into two subgroups (GI-a and GI-b) and genogroup GII evolved into three subgroups (GII-a, GII-b, and GII-c). The GI-a subgroup mainly included the earlier PEDV strains found in Europe and Belgium (virulent

We identified the geographical and temporal distributions of the PEDV strains to clarify the evolution of the virus. As shown in Figure 1b, only sporadic outbreaks of PEDV were reported before 2010, with the pathogens involved in these outbreaks found within the GI genogroup. The PEDVs were primarily located in the GI-a (virulent CV777) and GI-b (attenuated DR13) subgroups and were predominantly from Asia (Figure 1b). However, considerable PEDV outbreaks were reported in Asia and the United States after 2010, even for vaccinated piglets (Lin, Saif, Marthaler, & Wang, 2016). Based on our examination and assembly of public data, we found that these PEDV strains were primarily from the GII genogroup. Interestingly, GII-b subgroup strains were reported predominantly in 2011, whereas GII-a strains were reported more prevalently after 2011 and accounted for a larger proportion of strains. Moreover, reports of the newly discovered GII-c strains showed a significant increase after 2012 due to further sequencing from Europe.
The above results indicate that the epidemic strains from different periods were from the five different subgroups. Based on the geographical distribution of PEDVs (Figure 1c) , the GI genogroup (classical and cell culture-adapted vaccine strains) largely originated from the earlier PEDV-threatened areas, such as China, South Korea, and Europe. The different subgroups from the GII genogroup also showed characteristic geographical distribution. While most GII-a subgroup strains were from the Americas, a small number were from China and Japan or from sporadic outbreaks in a few other isolated areas. The GII-b subgroup strains were mostly endemic to Asia, especially China, South Korea, and Japan. The GII-c subgroup strains were primarily from Europe, with some from the USA and China. Our study showed that PEDV strains from different subgroups were prevalent within the same areas, implying that the coincident "hot spots" in PEDV-endemic areas (e.g., China and South Korea, Figure 1c critical for determining the sources of some PEDV variations. These "hot spot" areas have the potential to be important reservoirs for the genetic variation of PEDVs, resulting in recombination between different PEDV subgroups. We also examined the potential hosts of the 409 PEDV strains. Results showed an unusual PEDV strain (GenBank accession no. KX883635) hosted by Ascaris suum (Supporting information Table S1 ) (Shi et al., 2016) , a large roundworm in pigs, thus providing novel insight into the possible epidemiology of PEDV infection. Indeed, parasites have long been regarded as a harmful factor to the pig industry as sources for a variety of infectious agents (Jesudoss Chelladurai et al., 2017) . For example, Metastrongylus larvae are considered a reservoir for various porcine viral pathogens, primarily swine fever virus and swine flu virus (Sen, Kelley, Underdahl, & Young, 1961) . However, whether Ascaris suum plays a critical role as a PEDV reservoir requires further investigation. 
To further explore the evolution of PEDVs, we constructed three phylogenetic trees based on the ORF1ab, S, and ORF3-E-M-N gene sequences of the 409 PEDV strains. The ORF1ab and ORF3-E-M-N gene alignments confirmed that the GII-c subgroup was deeply nested within the GII-a subgroup (Figure 3a and b) . Strikingly, the phylogeny of the S gene suggested an entirely different evolutionary history for the GII-c subgroup compared to the other subgroups (Figure 3c ). All GII-c subgroup strains showed inconsistent topology in the S gene phylogenetic tree, differing from the ORF1ab gene phylogenetic tree. This inconsistent topology, in which outlier sequences were found between two well-defined subgroups in a phylogenetic tree, was attributed to (95%-99% genetic identity to the ORF1ab and ORF3-E-M-N genes). In contrast, the genetic identity between the S gene sequences of these viruses was only 85% and showed strong similarity with virulent CV777 (95% gene identity). In the S gene phylogeny, virulent DR13 did not cluster with KUPE21 but instead with virulent CV777, suggesting that the PEDV S gene was subjected to relatively frequent recombination, even between divergent subgroups (Figure 3c and Supporting information Figure S2a ). Moreover, the Italy/7239/2009 PEDV strain Intragenogroup recombination provides a mechanism for amalgamation among these distinct subgroups and increases the genetic repertoire of co-circulating PEDV strains. The S protein attaches to the cellular receptors of a host, resulting in virus entry by membrane fusion, and contains the domain that stimulates the production of neutralizing antibodies. Variations in the S protein are important for understanding the genetic relatedness of PEDV field strains (Chang et al., 2002; Lara-Romero et al., 2018; Wang et al., 2016b) . To determine the significant S protein sequences among the five different PEDV subgroups, meta-CATS analysis was performed for spike protein sequences of all 409 PEDV strains. 
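The per-position comparison can be illustrated with a minimal sketch. It assumes that each alignment column is tested with a chi-square test of independence between subgroup label and residue. This is a simplified stand-in for meta-CATS, not its actual implementation; the function name and the toy columns are hypothetical.

```python
from collections import Counter

def chi_square_column(residues, groups):
    """Chi-square statistic and degrees of freedom for one alignment column.

    residues: one amino acid per sequence at this position
    groups:   the subgroup label of each sequence, in the same order
    """
    if len(residues) != len(groups):
        raise ValueError("residues and groups must have the same length")
    n = len(residues)
    row_totals = Counter(groups)               # sequences per subgroup
    col_totals = Counter(residues)             # sequences per residue
    observed = Counter(zip(groups, residues))  # subgroup/residue counts

    stat = 0.0
    for group, row_n in row_totals.items():
        for residue, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (observed[(group, residue)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Toy columns (made-up data, not taken from the 409 genomes).
groups = ["GI-a"] * 4 + ["GII-a"] * 4
variable = list("IIIITTTT")   # residue fully split between the two subgroups
conserved = list("SSSSSSSS")  # identical residue in every sequence

print(chi_square_column(variable, groups))   # (8.0, 1)
print(chi_square_column(conserved, groups))  # (0.0, 0)
```

For the toy variable column, the statistic of 8.0 with one degree of freedom exceeds the 0.05 critical value of 3.841, so the position would be flagged as differing between the groups. If SciPy is available, an exact p-value can be obtained with `scipy.stats.chi2.sf(stat, dof)` for columns with at least one degree of freedom.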
We identified 257 amino acid positions with significant variation among the isolates from the five subgroups (Supporting information Table S2). As shown in Figure 4, the GII genogroup contained 11 distinct patterns of amino acid mutations (I116T, I356T, E365Q, T549S, G594S, N724S, A959V, S1044A, G1173D, S1232R, and R1298Q), distinguishing it from isolates in the GI genogroup. Although the GII-c subgroup S genes shared some amino acid substitutions with the GI-a and GII-a subgroups (the two parents of recombination), they also exhibited three unique patterns (L76, A/S92, and H/T113), which clearly distinguished these isolates from those in the other subgroups, suggesting that the GII-c subgroup S gene evolved gradually through antigenic drift.

It is well established that the S protein of coronaviruses induces high levels of neutralizing antibodies. Four neutralizing epitopes (amino acids 499-638, 748-755, 764-771, and 1,368-1,374) have also been identified in the PEDV S protein (Chang et al., 2002). To explore whether the GII genogroup acquired substitutions in neutralizing epitopes characteristic of the GI genogroup, we mapped the significantly different positions to the equivalent positions in representative sequences (Figure 4). Results identified seven substitutions in the neutralizing epitopes of the S protein in the GII genogroup (L521H, S523G, V527I, T549S, G594S, A605E, and L612F), which may explain why traditional inactivated vaccines and attenuated vaccines against the GI genogroup cannot effectively protect piglets threatened by a pandemic strain from the GII genogroup.

In summary, this study revealed the genetic diversity and evolutionary dynamics of PEDV strains. Our genetic analyses showed that the PEDV strains could be categorized into two groups, namely, GI (classical) and GII (variant). We also discovered a new subgroup (GII-c) with novel genetic, molecular, and phylogenetic characteristics.
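The epitope mapping described above can be checked with a short sketch. The epitope ranges and substitution codes are taken from the text; the helper names are hypothetical, and positions are assumed to follow the S protein numbering used in the text.

```python
import re

# Neutralizing epitope ranges (amino acid positions) quoted in the text.
EPITOPES = [(499, 638), (748, 755), (764, 771), (1368, 1374)]

def parse_substitution(code):
    """Split a code such as 'L521H' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a simple substitution: {code!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

def epitope_for(position):
    """Return the (start, end) epitope containing position, or None."""
    for start, end in EPITOPES:
        if start <= position <= end:
            return (start, end)
    return None

# The 11 substitutions listed in the text as distinguishing GII from GI.
gii_patterns = ["I116T", "I356T", "E365Q", "T549S", "G594S", "N724S",
                "A959V", "S1044A", "G1173D", "S1232R", "R1298Q"]

in_epitope = [code for code in gii_patterns
              if epitope_for(parse_substitution(code)[1]) is not None]
print(in_epitope)  # ['T549S', 'G594S']
```

Applied to the seven epitope substitutions reported in the text (L521H, S523G, V527I, T549S, G594S, A605E, and L612F), `epitope_for` places each of them in the 499-638 region.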
The GII-c subgroup evolved from a recombination event between the GI-a and GII-a subgroups, and we further found recombination in two relatively early strains: virulent DR13 and Italy/7239/2009. These recombination events occurred prior to the re-emergence of PEDV in 2010. Additionally, to explore the potential link between S protein amino acid sequence variations and recombination, we performed a series of comparative analyses of the PEDV S protein sequences. We found 10 positions that were localized in a well-known neutralizing epitope and revealed several unique amino acids that could easily distinguish the different subgroups. This study provides critical information to help trace the sources of PEDV variants and identify the evolutionary mechanisms involved. Furthermore, this research will hopefully facilitate the development of diagnostic kits, vaccines, and new therapeutic strategies, which are expected to turn the tide in the prevention of pandemic outbreaks of PEDV.

None.

http://orcid.org/0000-0003-3394-3702

Baele, G., Lemey, P., Rambaut, A., & Suchard, M. A. (2017). Adaptive MCMC in Bayesian phylogenetics: An application to analyzing partitioned data in BEAST. Bioinformatics, 33, 1798-1805. https://doi.org/10.1093/bioinformatics/btx088
Bi, J., Zeng, S., Xiao, S., Chen, H., & Fang, L. (2012
Identification of the epitope region capable of inducing neutralizing antibodies against the porcine epidemic diarrhea virus
Genetic variation of nucleocapsid genes of porcine epidemic diarrhea virus field strains in China
Detection and molecular diversity of spike gene of porcine epidemic diarrhea virus in China
Two novel porcine epidemic diarrhea virus (PEDV) recombinants from a natural recombinant and distinct subtypes of PEDV variants
PED virus reinfecting U.S. herds. Virus estimated to have killed 7 million-plus pigs
Molecular epidemiology of Ascaris infection among pigs in Iowa. The Journal of infectious diseases
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular biology and evolution
Molecular characterization of the spike gene of the porcine epidemic diarrhea virus in Mexico
Heterogeneity in spike protein genes of porcine epidemic diarrhea viruses isolated in Korea
New variants of porcine epidemic diarrhea virus, China
Genome sequencing and analysis of a novel recombinant porcine epidemic diarrhea virus strain from Henan, China
Complete genome sequence of a recombinant porcine epidemic diarrhea virus strain
Evolution, antigenicity and pathogenicity of global porcine epidemic diarrhea virus strains
Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination
RDP4: Detection and analysis of recombination patterns in virus genomes
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
The first case of porcine epidemic diarrhea in Canada
A new coronavirus-like particle associated with diarrhea in swine
Metadata-driven comparative analysis tool for sequences (meta-CATS): An automated process for identifying significant sequence variations that correlate with virus attributes
Transmission of swine influenza virus by lungworm migration
Redefining the invertebrate RNA virosphere
Evolutionary and epidemiological analyses based on spike genes of porcine epidemic diarrhea virus circulating in Thailand
Outbreak of porcine epidemic diarrhea in suckling piglets
An outbreak of swine diarrhea of a new-type associated with coronavirus-like particles in Japan. Nihon juigaku zasshi
Prevalence and genetic diversity of coronaviruses in bats from China
Porcine epidemic diarrhoea virus with a recombinant S gene detected in Hungary
Immunogenicity and antigenic relationships among spike proteins of porcine epidemic diarrhea virus subtypes G1 and G2
Porcine epidemic diarrhea in China
Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus