key: cord-324054-d71rj29o authors: Zhang, Xuming; Kousoulas, Konstantin G.; Storz, Johannes title: The hemagglutinin/esterase gene of human coronavirus strain OC43: Phylogenetic relationships to bovine and murine coronaviruses and influenza C virus date: 1992-01-31 journal: Virology DOI: 10.1016/0042-6822(92)90089-8 sha: doc_id: 324054 cord_uid: d71rj29o

Abstract

The complete nucleotide sequences of the hemagglutinin/esterase (HE) genes of human coronavirus (HCV) strain OC43 and bovine respiratory coronavirus (BRCV) strain G95 were determined from single-stranded cDNA fragments generated by reverse transcription of virus-specific mRNAs and amplified by polymerase chain reaction. An open reading frame of 1272 nucleotides was identified as the putative HE gene by homology to the bovine coronavirus HE gene. This open reading frame encodes a protein of 424 amino acids with an estimated molecular weight of 47.7 kDa. Ten potential N-linked glycosylation sites were predicted in the HE protein of HCV-OC43 while nine of them were present in BRCV-G95. Fourteen cysteine residues were conserved in the HE proteins of both viruses. Two hydrophobic sequences at the N-terminus and the C-terminus may serve as signal peptide and transmembrane anchoring domain, respectively. The predicted HE protein of HCV-OC43 was 95% identical to the HEs of BRCV-G95 and other bovine coronaviruses, and 60% identical to the HEs of mouse hepatitis viruses. Phylogenetic analysis suggests that the HE genes of coronaviruses and influenza C virus have a common ancestral origin, and that bovine coronaviruses and HCV-OC43 are closely related.

Coronaviruses possess a single-stranded, nonsegmented RNA genome of positive polarity (1, 2) and are associated with a variety of diseases in man and animals (3-5). Coronaviruses are divided into two major antigenic groups.
The first group includes human coronavirus strain OC43 (HCV-OC43), bovine coronavirus (BCV), mouse hepatitis virus (MHV), and hemagglutinating encephalomyelitis virus of swine (HEV) (2, 5). HCV-OC43 causes respiratory infections in humans similar to those caused by influenza viruses (6). BCV causes enteritis in newborn calves and is also considered an etiological factor in respiratory diseases of calves (7, 8). MHV can infect different organs, causing enteric, respiratory, and neurological diseases (5, 9). A unique property of coronaviruses within this antigenic cluster is the presence of the hemagglutinin/esterase (HE) gene. The genome of MHV-A59 contains an open reading frame (ORF), which may code for an HE protein. However, the HE is not expressed in infected cells (10, 11, 23 binding) and acetylesterase (receptor-destroying) activities similar to the HE (or HEF) glycoprotein of influenza C virus (ICV) (12-20). It was shown that the HE glycoprotein of BCV strain Quebec induces neutralizing antibodies both in vitro and in vivo and thus is important in viral infectivity (21, 22). It is evidently not required for viral infectivity in MHV-A59 and MHV-JHM (11). The role of the HE gene and its protein in coronavirus evolution, replication, and pathogenesis remains unclear. The exact genomic organization of HCV-OC43 is not known. Antigenic and nucleic acid hybridization studies indicate that HCV-OC43 is closely related to BCV (23-25). By analogy to BCV, the order of the genes coding for the structural proteins is probably 5'-HE-S-M-N-3'. Recently, the N gene of HCV-OC43 was sequenced and found to be similar to the BCV N gene (97.5% amino acid sequence homology) (26). The origin and evolutionary relationships among the HE genes of hemagglutinating coronaviruses isolated from different species are poorly understood.
To elucidate the molecular evolution of the coronavirus HE genes, we sequenced the HE genes of HCV-OC43, a bovine respiratory coronavirus (BRCV), and a virulent and an avirulent BCV strain. We report here the complete nucleotide sequences of the HE genes of HCV-OC43 and BRCV-G95, and their phylogenetic relatedness to BCVs, MHV, and ICV.

HCV-OC43 was obtained from the American Type Culture Collection (ATCC, 759-VR) and propagated in human rectal tumor (HRT-18) cells as described previously (27). A bovine respiratory coronavirus strain, Giessen 89-4595 (G95), was kindly provided by Dr. W. Herbst, Institute of Hygiene and Infectious Diseases of Animals, Justus Liebig University Giessen, Germany. This virus was originally isolated from nasal swabs of a calf suffering from a respiratory disorder, and propagated in HRT-18 cells. Isolation and purification of viral RNA, cDNA synthesis, double-stranded (ds) cDNA amplification and single-stranded (ss) cDNA production by polymerase chain reaction (PCR), as well as DNA sequencing, were performed as described previously (28, 44). Primers were designed to generate cDNA fragments from virus-specific mRNAs by reverse transcription and PCR amplification, based on the high degree of genomic similarity between HCV-OC43 and BCVs (25, 26). These primers were previously used for amplification and sequencing of BCV S and HE genes (28, 44). PCR-generated cDNA fragments were directly sequenced in both directions.

Analysis of the sequences revealed a large ORF of 1272 nucleotides, identical in size to the HE genes of BCVs (29, 30, 44). This ORF terminated 14 nucleotides upstream from the S gene (Zhang et al., unpublished data), and encoded a protein of 424 amino acid residues with an estimated molecular weight of 47.7 kDa (Figs. 1 and 2).
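As an illustration of how a large ORF such as the one described above might be located in a cDNA sequence, the following is a minimal sketch, not the procedure used by the authors. The function name `longest_orf` and the toy sequence are hypothetical. The sketch scans only the forward strand and assumes the standard stop codons (TAA, TAG, TGA). It also assumes that the reported 1272 nucleotides do not include the stop codon, which is what makes 1272 / 3 = 424 codons agree with the reported 424 residues.

```python
from __future__ import annotations


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `start` is the 0-based index of the ATG; `end` is the index of the stop
    codon, so seq[start:end] holds the sense codons only (stop excluded).
    ORFs that run off the end of the sequence without a stop are ignored.
    Returns (0, 0) when no complete ORF is found.
    """
    stops = {"TAA", "TAG", "TGA"}
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
    return best


# Hypothetical toy sequence (not viral data): ATG + 4 codons + stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
start, end = longest_orf(toy)
print(start, end, (end - start) // 3)  # ORF coordinates and codon count

# Consistency check on the reported figures, under the stated assumption
# that the 1272 nt exclude the stop codon.
assert 1272 % 3 == 0 and 1272 // 3 == 424
```

For the toy sequence the ORF spans indices 2 to 17, that is, five sense codons.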
Two identical sequences (CTAAAC), similar to the consensus intergenic sequence upstream of the HCV-OC43 N gene (CTAAAT) (26) and identical to the consensus sequence upstream of BCV HE and S genes, were found 16 nt upstream of the predicted initiation codon (at nucleotides 16 to 18) for the HE protein and 8 nt downstream from the termination codon, respectively. Hydropathic analysis of the predicted amino acid sequence indicated that the putative HE protein possessed the characteristics of a membrane protein. Specifically, a hydrophobic stretch of 15 amino acids at the N-terminus may serve as a signal peptide with a cleavage site between amino acids 15 and 16 (30, 31, 44). Another hydrophobic amino acid sequence near the C-terminus (amino acids 389 to 414) may serve as the transmembrane domain anchoring the protein in the viral envelope. A hydrophilic sequence of 10 amino acids at the C-terminus may serve as an intravirion domain. Ten potential N-linked glycosylation sites were predicted in the HE protein of HCV-OC43 while nine of them were present in that of BRCV-G95. Two internal ORFs, extending from nt 107 to 517 and from nt 976 to 1228, were predicted within the large ORF. By analogy to the BCV HE gene, these results suggest that the predicted large ORF 2b represents mRNA 2-1 of HCV-OC43 and BRCV-G95, encoding the HE glycoprotein.

The predicted amino acid sequences of the HE genes from HCV-OC43 and BRCV-G95 (Fig. 1), BCV- (29), BCV-L9, BCV-LY138 (44), MHV-A59 (10), and MHV-JHM (11) were aligned using the programs of the University of Wisconsin Genetics Computer Group software package, version 6.1. The alignment revealed that the HE gene of HCV-OC43 was more closely related to BRCV-G95 and BCVs than to MHVs. Nucleotide and amino acid sequences among HCV-OC43, BRCV-G95, and BCVs were 95.8 to 96.3% and 94.1 to 94.8% identical, respectively, while the amino acid sequence identity between HCV-OC43 and MHVs was approximately 60%.
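Percent identity figures such as those above are computed from a pairwise alignment. The sketch below shows one common way to do this; it is not the GCG implementation, and the function name `percent_identity`, the gap character, and the toy peptide strings are all hypothetical. It assumes that columns containing a gap in either sequence are left out of the denominator. Other conventions exist (for example, dividing by the full alignment length) and give slightly different values.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped; the denominator is
    the number of remaining (ungapped) columns. This convention is an
    assumption of this sketch, not something stated in the source.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (illustrative only, not the published data).
seq_a = "MFLLPRFIL-VS"
seq_b = "MFLLPKFILAVS"
print(f"{percent_identity(seq_a, seq_b):.1f}% identical")
```

Because the choice of denominator changes the result, comparisons across papers are only meaningful when the same convention is used.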
Fourteen cysteine residues were strictly conserved in the HE proteins of HCV-OC43, BRCV-G95, BCVs, and MHVs. The MHV-JHM HE had 15 more amino acids and two more cysteine residues than those of HCV-OC43 and BRCV-G95. The alignment indicated that the eight coronavirus HE genes can be divided into two groups. The first group includes HCV-OC43, BRCV-G95, and all BCVs, and the second group includes MHV-JHM and MHV-A59.

To identify a possible evolutionary pathway for the HE gene of coronaviruses, we compared the coronavirus HE genes with the ICV HE gene. An alignment of the predicted amino acid sequences is shown in Fig. 2. In this alignment, the ICV HE1 subunit shows a sequence identity of approximately 28.2% with the HE protein of HCV-OC43, BRCV-G95, and BCVs, and 26.3% with the HE protein of MHVs. The alignment shows that several regions are completely identical. Most importantly, the putative acetylesterase active site (F-G-D-S) (at amino acids 72 to 75 in Fig. 2) is conserved in all HE proteins of human, bovine, and murine coronaviruses and ICV. Ten of the 14 cysteine residue positions of HCV-OC43 are conserved among all HE proteins compared. These data suggest that these proteins may be evolutionarily related to each other.

DNA sequences for each gene were optimally aligned based on the alignment of their respective amino acid sequences (Fig. 2). A maximum parsimony analysis was performed on the aligned DNA sequences to predict possible phylogenetic relationships among coronaviruses (detailed methodology for the phylogenetic, computer-assisted analysis is described in the legend of Fig. 3). Cladistic analysis of the DNA sequence data resulted in a single phylogenetic hypothesis (phylogram) with a total length of 1503 steps and a rescaled consistency index of 0.962. This analysis suggested that all coronaviruses were divided into two clades. One clade included HCV-OC43, BRCV-G95, and BCVs. The other clade consisted of MHV-JHM and MHV-A59.
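For readers unfamiliar with the statistic, the rescaled consistency index quoted above is conventionally defined as the product of the consistency index and the retention index. The source does not spell out the definition, so the form below is the standard one assumed here, and the symbols $M$, $S$, and $G$ are notation introduced for this note. $S$ is the observed tree length (1503 steps in this analysis), $M$ is the minimum number of steps the data could require on any tree, and $G$ is the maximum number of steps the data could require on any tree. The values of $M$ and $G$ are not reported in the text.

```latex
\[
\mathrm{CI} = \frac{M}{S}, \qquad
\mathrm{RI} = \frac{G - S}{G - M}, \qquad
\mathrm{RC} = \mathrm{CI} \times \mathrm{RI}
\]
```

Under this definition, a value close to 1, such as 0.962, indicates that the tree requires little homoplasy to explain the data.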
Neither coronaviral clade is derived from the other. Within the clades, all BCVs were closely related taxa to HCV-OC43, and MHV-JHM and MHV-A59 were sister taxa. The phylogram suggested a common ancestor of this antigenic group of coronaviruses. The highly cell-adapted strains BCV-L9, BCV-Quebec, and BCV-Mebus are closely related to the wild-type strains BCV-LY138 and BRCV-G95. We excluded these strains from the final phylogenetic analysis because their close relationships resulted in collapsed branches in the phylogenetic tree. We further attempted to analyze the relationship among the HE genes of selected coronaviruses and ICV based on these results, using limited DNA sequence information and assuming ICV as the outgroup. The highly variable regions (positions 19-185, 235-365, 451-1046, 1255-1295, 1351-1430, and 1459-1503) were excluded because they could not be aligned with confidence. We identified 125 phylogenetically informative sites (variable sites with at least two taxa potentially sharing a derived base) from 439 aligned base positions. The phylogram shows an almost identical topology for the coronaviral ingroup to that obtained by the previous analysis (Fig. 3).

[Figure residue: HE gene nucleotide sequence with deduced amino acid sequence; listing not recoverable. Legend fragments follow.]
The sequences for BCV-L9 and BCV-LY138 were obtained from recent work (44).
... (10), and BCV-LY138 (44) and a partial HE gene sequence of ICV (32) were used for this phylogram. The DNA sequences were aligned based on their deduced amino acid sequence alignment as shown in Fig. 2.

Since HCV-OC43 and ICV infect similar tissues in human subjects, the significant sequence homology between the HE genes of the two viruses suggests that coinfection of an ancestral coronavirus with ICV followed by recombination may have given rise to HCV-OC43. This was also proposed by Luytjes et al. (10). Phylogenetic analysis also suggests that the HE genes of coronaviruses and ICV may originate from a common ancestor. It is worth noting that the HE protein of ICV has receptor-binding, acetylesterase, and fusion activities, while that of coronaviruses has only the receptor-binding and acetylesterase activities. The fusion function of ICV is associated with the N-terminal hydrophobic region of the HE2 subunit of the HE protein (17-19, 32). A similar hydrophobic domain was not found in the coronavirus HE protein.

The high similarity between the HE proteins of HCV-OC43 and BCVs (95% identity on average) suggests that both viruses are very closely related. This hypothesis is also supported by the tree branch distance in the phylogenetic analysis shown in Fig. 3. Interestingly, the HE of HCV-OC43 is more closely related to those of BRCV-G95 and the wild-type, virulent strain BCV-LY138 than to that of the cell-culture-adapted, avirulent strain BCV-L9. The wild-type strain BCV-LY138 does not replicate in numerous bovine cells in vitro, but it grows readily in human cells (HRT-18) without requiring prior adaptation (27, 33, 34). Since these polarized human cells retain many features of primary epithelial cells, their infection by BCV suggests that BCV may also infect humans and may therefore be a zoonotic virus (23, 35). We previously reported a case of human diarrhea caused by BCV-LY138, in which the virus was identified from the infected patient (35). Recently, we found that BRCV-G95 exhibited almost identical cytopathology in vitro to the wild-type virulent strain BCV-LY138 (data not shown).
HCV-OC43, BCVs, and BRCV-G95 differed in only a few amino acids in the HE, and their putative acetylesterase active sites were conserved (see Fig. 2). O-Acetylneuraminic acid was shown to be the major receptor determinant for ICV (36-38). HCV-OC43, BCVs, and BRCV-G95 probably recognize this receptor on the surface of many different epithelial cells. They may be able to replicate in epithelial cells of both the respiratory and intestinal tracts, and to cross species barriers, causing disease in heterologous hosts. However, HCV-OC43 primarily causes respiratory diseases and BCVs cause enteritis. The ability of these viruses to replicate in different organs and to cause different clinical symptoms is probably due to multiple amino acid differences occurring within several viral proteins. The S protein of MHV was shown to be important in tissue tropism (39). Recently, it was reported that turkey enteric coronavirus is antigenically and genomically closely related to BCVs (40), and similar functions were found in the HE protein of HEV (20). Whether swine or turkeys may also serve as a reservoir (mixing vessel) for coronavirus recombination in nature, as was proposed for influenza A viruses (41), remains to be elucidated. It is worth noting that ICV was also isolated from pigs (42). It will be worthwhile to compare the HE genes among these coronaviruses. Comparison of the remaining genes with HCV-OC43 and BCVs will provide further insight into their evolution and host cell tropism.