key: cord-0298049-yo1qvazf
authors: Zehr, Jordan D.; Kosakovsky Pond, Sergei L.; Martin, Darren P.; Ceres, Kristina; Whittaker, Gary R.; Goodman, Laura B.; Stanhope, Michael J.
title: Recent zoonotic spillover and tropism shift of a Canine Coronavirus is associated with relaxed selection and putative loss of function in NTD subdomain of spike protein
date: 2021-11-17
journal: bioRxiv
DOI: 10.1101/2021.11.15.468709
sha: 3add4d74c4aeb0cd83c87f4b60dfffbe1eacfd68
doc_id: 298049
cord_uid: yo1qvazf

A recent study reported the occurrence of Canine Coronavirus (CCoV) in nasopharyngeal swabs from a small number of patients hospitalized with pneumonia during 2017-18 in Sarawak, Malaysia. Because the genome sequence for one of these isolates is available, we conducted comparative evolutionary analyses of the spike gene of this strain (CCoV-HuPn-2018) with other available Alphacoronavirus 1 spike sequences. The most N-terminal subdomain (0-domain) of the CCoV-HuPn-2018 spike protein has sequence similarity to Transmissible Gastroenteritis Virus (TGEV) and CCoV2b strains, but not to other members of the type II Alphacoronaviruses (i.e., CCoV2a and Feline CoV2, FCoV2). This 0-domain in CCoV-HuPn-2018 shows evidence of relaxed selection pressure, an increased rate of molecular evolution, and a number of unique amino acid substitutions relative to CCoV2b and TGEV sequences. A region of the 0-domain determined to be key to sialic acid binding and pathogenesis in TGEV showed clear amino acid sequence differences in CCoV-HuPn-2018 relative to both CCoV2b (enteric) and TGEV (enteric and respiratory). The 0-domain of CCoV-HuPn-2018 also had several sites inferred to be under positive diversifying selection, including sites within the signal peptide.
Downstream of the 0-domain, FCoV2 shared sequence similarity with the CCoV2b and TGEV sequences, and analyses of this larger alignment identified positively selected sites in the putative Receptor Binding Domain (RBD) and Connector Domain (CD). Recombination analyses strongly implicated a particular FCoV2 strain in the recombinant history of CCoV-HuPn-2018, with molecular divergence times estimated at around 60 years ago. We hypothesize that CCoV-HuPn-2018 had an enteric origin but has lost that tropism because of mutations in the sialic acid binding region of the spike 0-domain. As selection pressure on this region was reduced, the virus evolved a respiratory tropism, analogous to other Alphacoronavirus 1 viruses, such as Porcine Respiratory Coronavirus (PRCV), that have lost this region entirely. We also suggest that signals of positive selection in the signal peptide, as well as other changes in the 0-domain of CCoV-HuPn-2018, could reflect adaptation to this new host, and that this could be due in part to the different spatial distribution of the N-linked glycan repertoire in this strain.

The ongoing coronavirus (CoV) disease 2019 is the third documented animal-to-human CoV spillover (SARS-CoV, SARS-CoV-2 and MERS-CoV) within the past two decades to have resulted in a major epidemic. Coronaviruses (CoVs) that infect mammals (with the exception of pigs) belong principally to two genetic and serologic groups: the Alphacoronavirus (α) and Betacoronavirus (β) genera. Alphacoronavirus 1 is a species within the Alphacoronavirus genus that comprises viruses infecting dogs, cats and pigs, and is further subdivided into types I and II based on serological parameters and genetic differences in the spike gene, although further genetic and biological differences are also apparent (see, e.g., CCoV1; Decaro et al. 2015). Vlasova et al.
(2021)

We collected complete spike gene sequences from Alphacoronavirus 1 type II CoVs available in GenBank (accession numbers appear in Supplementary

to either closely approximate site positions (e.g., for N-terminal subdomains) or to more precisely pinpoint site positions (e.g., RBD). Type I and type II Alphacoronavirus 1 viruses were not included together in sequence alignments used for selection analyses, because of the divergence between these two types.

We prepared two sets of alignments for comparative sequence analyses. The first of these (set I) included TGEV, CCoV-HuPn-2018, and the CCoV2b strains. This set of sequences was assembled for the analysis of the N-terminal subdomain, and we consider only results involving the first 288 aligned amino acid positions (up to and including position 266 of CCoV-HuPn-2018), referred to as the 0-domain in the FIPV structural paper (Yang et al. 2020) and here. The second set (set II) included all strains, and positions downstream of 289, which marks the beginning of the region where FCoV2 and the other sequences share a high degree of sequence similarity. In-frame nucleotide sequences were translated to amino acids, aligned with MAFFT (Katoh and Standley 2013), and then mapped back to the nucleotide sequences to produce a codon-aware alignment. The resulting alignments were largely gapless, with the exception of two short regions of indels in alignment set I (specifically, the 0-domain). We excluded these regions from positive selection analyses, since uncertain alignment is known to degrade method performance.

Both alignment sets were screened for recombination, with breakpoints identified using GARD (Kosakovsky Pond et al. 2006); set II was also evaluated with RDP5 (Martin et al. 2020) for an additional level of granularity in determining the polarity of sequence exchanges.
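The mapping of an aligned protein sequence back onto its nucleotide sequence can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `thread_codons` is hypothetical, and the sketch assumes the coding sequence is in frame, has had any terminal stop codon trimmed, and corresponds residue for residue to the ungapped protein.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an in-frame coding sequence onto its aligned protein.

    Each residue in the protein alignment consumes the next codon of the
    coding sequence; each protein gap becomes a three-column nucleotide gap.
    (Hypothetical helper, written for illustration only.)
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    remaining = iter(codons)
    columns = []
    for aa in aligned_protein:
        # Gaps are tripled so that codon boundaries stay aligned across rows.
        columns.append(gap * 3 if aa == gap else next(remaining))
    return "".join(columns)


if __name__ == "__main__":
    # Toy example: protein "MKV" aligned as "M-KV".
    print(thread_codons("M-KV", "ATGAAAGTT"))
```

Applying the same function to every row of a protein alignment yields a nucleotide alignment whose length is three times that of the protein alignment, which is the property codon-based selection tests rely on.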
Each of the resulting GARD fragments served as input to the selection analyses, together with its respective phylogeny, which was inferred using RAxML (Stamatakis 2014) under the GTR+Γ nucleotide substitution model.

We performed site-, branch-, and alignment-level selection tests based on estimation of the dN/dS (nonsynonymous/synonymous) ratio, as implemented in the HyPhy software package v.2.5.31 (Kosakovsky Pond et al. 2020). We used the MEME method (Murrell et al. 2012) to look for episodic diversifying selection pressure at individual sites across the entire tree (both sets I and II). We tested the CCoV-HuPn-2018 terminal branch for evidence of selection, both overall (at some subset of sites along this branch), using the aBSREL and BUSTED methods, and at individual sites, using the FEL (Kosakovsky Pond et al. 2005) and MEME (Murrell et al. 2012) methods. We modified the FEL and MEME tests to

Information Criterion (BIC). Each tree with the best-fitting substitution model was then used as input for root-to-tip regression analysis, where correlation coefficients were calculated using the heuristic residual mean squared function. If a strong temporal signal exists (a linear relationship between genetic distance and sampling time), the correlation coefficient will be positive. For GARD partitions with a correlation coefficient greater than 0.1, temporal signal was confirmed using TDR. The R package TipDatingBeast (Rieux et al. 2017) was used to generate ten random permutations of sample dates for each GARD alignment. BEAST2 (Bouckaert et al. 2014) was then used to estimate the evolutionary rate for both the alignments with the true sample dates and the alignments for each randomized replicate.
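The root-to-tip regression described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the study: the function name `root_to_tip_regression` is hypothetical, the tree is assumed to be already rooted (the search for the root that minimizes the residual mean squared error is omitted), and the dates and distances in the usage example are invented toy values.

```python
from math import sqrt


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, r): the slope approximates the substitution
    rate per unit time, and r is the Pearson correlation coefficient.
    (Hypothetical helper, written for illustration only.)
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Invented toy data: sampling years and root-to-tip distances.
    years = [1990.0, 2000.0, 2010.0, 2018.0]
    dists = [0.012, 0.019, 0.031, 0.038]
    slope, intercept, r = root_to_tip_regression(years, dists)
    # Screening threshold taken from the text: r greater than 0.1.
    print(f"rate={slope:.2e} r={r:.3f} proceed={r > 0.1}")
```

When the slope is positive, the date at which the fitted line crosses zero distance (`-intercept / slope`) gives a rough estimate of the age of the root, which is why a positive correlation is a precondition for the dating analyses that follow.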
If the mean clock rate estimate of the alignment with real sample dates fell outside the 95% highest posterior density (HPD) for the randomized date set, temporal signal was deemed sufficient for subsequent analyses.

(Yang et al. 2020). S1, S2, of spike are highlighted and the protein is further subdivided into functional subunits and subdomains. Blue dots represent sites under positive selection in CCoV-HuPn-2018 as identified by MEME and FEL in the single-branch tests; red dots represent sites that are unique in CCoV-HuPn-2018 but are not under positive selection; black "x"s indicate sites under positive selection in the MEME test of the complete alignment that had moderate EBF values for CCoV-HuPn-2018. Red text labels accompany each subdomain/functional unit and are based on the original FIPV spike structure (Yang et al. 2020): SP, signal peptide; 0 domain; A domain; B, includes the RBD (Receptor-Binding Domain); C; D; UH, upstream helix; S2', S2' cleavage (predicted furin site, using ProP 1.0 (Duckert et al. 2004)); FP, fusion peptide; HR1, heptad repeat region 1; CH, central helix; CD, connector domain; HR2, heptad repeat region 2; TM, transmembrane domain; CT, cytoplasmic tail. The dashed line between D and UH refers to a region of peptide with no sequence similarity between FIPV and CCoV-HuPn-2018; this region includes the S1/S2 furin cleavage site in FIPV, which is absent in CCoV-HuPn-2018. The vertical black lines represent the breakpoints of the GARD-identified non-recombinant fragments and are labeled numerically. The ⧪ symbol represents a 3' GARD fragment of alignment set I that was analyzed for positive selection; this GARD fragment was determined from an alignment of just CCoV2b and TGEV sequences (set I). The 5' end of GARD fragment 2 represents the onset of FCoV2 sequence similarity (set II).
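The date-randomization criterion stated in the Methods (the mean clock rate under the real sample dates must fall outside the 95% HPD obtained with randomized dates) can be sketched as follows. This is a minimal illustration, not the TipDatingBeast or BEAST2 implementation: the function names are hypothetical, the HPD is approximated as the shortest interval covering 95% of the posterior samples, and the check assumes that the criterion must hold for every randomized replicate, which the text does not state explicitly.

```python
from math import ceil


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the sampled values.

    (Hypothetical helper; a sample-based approximation of the HPD.)
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples supplied")
    k = max(1, ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def has_temporal_signal(real_mean_rate, randomized_rate_samples, mass=0.95):
    """True if the real mean rate lies outside the HPD of every replicate.

    `randomized_rate_samples` is a list with one list of posterior clock-rate
    samples per date-randomized replicate.
    """
    for samples in randomized_rate_samples:
        low, high = hpd_interval(samples, mass)
        if low <= real_mean_rate <= high:
            return False
    return True


if __name__ == "__main__":
    # Invented toy posteriors for two randomized replicates.
    replicates = [
        [1.0e-4 + 1.0e-6 * i for i in range(100)],
        [1.2e-4 + 1.0e-6 * i for i in range(100)],
    ]
    print(has_temporal_signal(5.0e-4, replicates))
    print(has_temporal_signal(1.5e-4, replicates))
```

In practice the rate samples would be read from the posterior logs of each run after discarding burn-in; that parsing step is left out here because the log layout depends on how the analyses were configured.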
when we refer to a sequence as a donor in the following descriptions, it applies to, and is limited by, the sequences in this alignment, and that other closely related genetic variants could be the precise donor. A total of 19 recombination events were well supported (Table S3) by a subset of three or more of the recombination detection methods implemented in RDP5, including BURT, RDP, MaxChi (Smith 1992) and GENECONV (Padidam et al. 1999). Of the 19 supported

(Table S4)

(Rasschaert et al. 1990). This 0-domain deletion and the resulting loss in sialic acid binding led PRCV to switch in tropism and pathogenicity, becoming predominantly respiratory tract-tropic (Krempl et al. 1997). TGEV, on the other hand, can infect both the respiratory and enteric

Yang et al. 2020; Wu et al. 2009) and the specifics of the interaction between these loops and APN, even between closely related viruses, can be very different.

that the signal peptide can influence the glycan profile and antigenicity of the HIV surface protein gp120 (Yolitz et al. 2018), prompting these authors to suggest that, although the signal peptide is not part of the mature protein, it is likely to be subject to immune pressure. Both the MEME and FEL methods detected positive selection at a codon in the signal peptide of

Tresnan et al. (1996), which demonstrate that feline APN can serve as a functional receptor of type II CCoV, TGEV and HCoV-229E, suggesting that cats may act as a mixing vessel for generating recombinant Alphacoronavirus 1 CoVs. The origins of WSU 79-1683 may also include two recombination events in the Orf1ab region with FCoV1 and CCoV as sequence donors (Herrewegh et al. 1998). These observations lead us to conclude that WSU 79-1683, or a close relative, has had a prominent role in the evolution of CCoV-HuPn-2018 and that these viruses have repeatedly coinfected hosts, resulting in recombinant progeny.
We propose that at some time in the history of CCoV-HuPn-2018, its spike protein 0-

SignalP 5.0 improves signal peptide predictions using deep neural networks
BEAST 2: a software platform for Bayesian evolutionary analysis
Evidence for an Ancestral Association of Human Coronavirus 229E with Bats
Recombinant canine coronaviruses related to transmissible gastroenteritis virus of Swine are circulating in dogs
Full-length genome analysis of canine coronavirus type I
Does pathogen plasticity facilitate host shifts?
Relaxed phylogenetics and dating with confidence
Bayesian coalescent inference of past population dynamics from molecular sequences
The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data
Prediction of proprotein convertase cleavage sites
Molecular Evolution of Human Coronavirus Genomes
Feline coronavirus type II strains 79-1683 and 79-1146 originate from a double recombination between feline coronavirus type I and canine coronavirus
Deletion of a 197-amino-acid region in the N-terminal domain of spike protein attenuates porcine epidemic diarrhea virus in piglets
ModelFinder: fast model selection for accurate phylogenetic estimates
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
Not so different after all: a comparison of methods for detecting amino acid sites under selection
HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies
GARD: a genetic algorithm for recombination detection
Point mutations in the S protein connect the sialic acid binding activity with the enteropathogenicity of transmissible gastroenteritis coronavirus
Molecular Evolutionary Genetics Analysis across Computing Platforms
Computing Bayes factors using thermodynamic integration
Structure, Function, and Evolution of Coronavirus Spike Proteins
Receptor usage and cell entry of porcine epidemic diarrhea coronavirus
RDP5: a computer program for analyzing recombination in, and removing signals of recombination from, nucleotide sequence datasets
Isolation of feline coronaviruses from two cats with diverse disease manifestations
Molecular diversity of coronavirus host cell entry receptors
IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era
Past and ongoing adaptation of human cytomegalovirus to its host
Gene-wide identification of episodic selection
Detecting individual sites subject to episodic diversifying selection
Cell culture isolation and sequence analyses of genetically diverse US porcine epidemic diarrhea virus strains including a novel strain with a large deletion in the spike gene
Cryptic genetic variation: evolution's hidden substrate
Possible emergence of new geminiviruses by frequent recombination
Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Porcine respiratory coronavirus differs from transmissible gastroenteritis virus by a few genomic deletions
Structural bases of coronavirus attachment to host aminopeptidase N and its inhibition by neutralizing antibodies
tipdatingbeast: an r package to assist the implementation of phylogenetic tip-dating tests using beast
Minimum Determinants of Transmissible Gastroenteritis Virus Enteric Tropism Are Located in the N-Terminus of Spike Protein
Structural basis for broad coronavirus neutralization
Transmissible gastroenteritis coronavirus, but not the related porcine respiratory coronavirus, has a sialic acid (N-glycolylneuraminic acid) binding activity
Prevalence of feline coronavirus in two cat populations in Malaysia
Analyzing the mosaic structure of genes
Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
The Role of Cell Surface Sialic Acids for SARS-CoV-2 Infection
Feline aminopeptidase N serves as a receptor for feline, canine, porcine, and human coronaviruses in serogroup I
Novel Canine Coronavirus Isolated from a Hospitalized Pneumonia Patient, East Malaysia
A conserved immunogenic and vulnerable site on the coronavirus spike protein delineated by cross-reactive monoclonal antibodies
AXL is a candidate receptor for SARS-CoV-2 that promotes infection of pulmonary and bronchial epithelial cells
RELAX: detecting relaxed selection in a phylogenetic framework
The S gene of canine coronavirus, strain UCD-1, is more closely related to the S gene of transmissible gastroenteritis virus than to that of feline infectious peritonitis virus
Improving Virus Taxonomy by Recontextualizing Sequence-Based Classification with Biologically Relevant Data: the Case of the Alphacoronavirus 1 Species. mSphere
Receptor-binding loops in alphacoronavirus adaptation and evolution
Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor
Cryo-EM analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans
Signal peptide of HIV envelope protein impacts glycosylation and antigenicity of gp120

This study received funding (FOA PAR-18-604) from the U.S. Food and Drug Administration's Veterinary Laboratory Investigation and Response Network (FDA Vet-LIRN) under grant 1U18FD006993-01, awarded to LBG and MJS. SLKP and JDZ were supported in part by grants R01 AI134384 (NIH/NIAID) and U01 GM110749 (NIH/NIGMS).

We gratefully acknowledge Jean Millet for helpful advice on Alphacoronavirus biology and spike structural domains.