key: cord-0763116-evutu0lx authors: Woo, P. C. Y.; Lau, S. K. P.; Huang, Y.; Tsoi, H.-W.; Chan, K.-H.; Yuen, K.-Y. title: Phylogenetic and recombination analysis of coronavirus HKU1, a novel coronavirus from patients with pneumonia date: 2005-06-28 journal: Arch Virol DOI: 10.1007/s00705-005-0573-2 sha: fa5ab7bb8d82ae7c970b9dc6a2f10eadf0873623 doc_id: 763116 cord_uid: evutu0lx
Phylogenetic trees constructed using predicted amino acid sequences of putative proteins of coronavirus HKU1 (CoV-HKU1) revealed that CoV-HKU1 formed a distinct branch among group 2 coronaviruses. Of the 14 trees from p65 to nsp10, nine showed that CoV-HKU1 clustered with murine hepatitis virus. From nsp11 onwards, the topologies of the trees changed dramatically: of the eight trees from nsp11 to N, seven showed the CoV-HKU1 branch as the first branch. The codon usage patterns of CoV-HKU1 differed significantly from those of other group 2 coronaviruses. Split decomposition analysis revealed that recombination events had occurred between CoV-HKU1 and other coronaviruses.
It has been estimated that coronaviruses [human coronaviruses 229E (HCoV-229E) and OC43 (HCoV-OC43)] cause about 5-30% of respiratory tract infections. In late 2002 and 2003, Severe Acute Respiratory Syndrome (SARS), caused by SARS coronavirus (SARS-CoV), resulted in more than 750 deaths [12, 15, 16, 17, 22-24]. In early 2004, a novel coronavirus associated with respiratory tract infections, human coronavirus NL63 (HCoV-NL63), was discovered [3, 20]. As a result of a unique mechanism of viral replication, coronaviruses have a high frequency of recombination [9, 10, 13, 14]. Coronaviruses were divided into three groups, with HCoV-229E and HCoV-NL63 being group 1 coronaviruses and HCoV-OC43 a group 2 coronavirus [11]. For SARS-CoV, it was initially proposed that the virus constituted a distinct group of coronaviruses [15, 17].
However, after more extensive phylogenetic analysis, it was discovered that SARS-CoV probably represents a distant relative of group 2 coronaviruses [2, 18]. Further in silico analysis also predicted that SARS-CoV could be a product of recombination between mammalian and avian coronaviruses [19]. Recently, we described the discovery of a novel coronavirus associated with pneumonia, coronavirus HKU1 (CoV-HKU1) [21]. Based on analysis of the putative chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (Pol), helicase, hemagglutinin-esterase (HE), spike (S), envelope (E), membrane (M) and nucleocapsid (N), CoV-HKU1 is a member of group 2 coronaviruses. However, the origin of CoV-HKU1 is still unknown. In this study, we performed a detailed phylogenetic analysis of CoV-HKU1, predicted possible recombination events and discuss the origin of CoV-HKU1.
The predicted amino acid (a.a.) sequences of p65, conserved portions of nsp1 [papain-like protease 1 (PL1pro), Appr-1-p processing enzyme family (A1pp), papain-like protease 2 (PL2pro), hydrophobic domain 1 (HD1), and hydrophobic domain 2 (HD2)], nsp2-7, nsp9-13, HE, S, E, M and N were extracted from the CoV-HKU1 genome sequence (GenBank accession no. AY597011) [21]. The corresponding a.a. sequences of murine hepatitis virus (MHV), HCoV-OC43, bovine coronavirus (BCoV), porcine hemagglutinating encephalomyelitis virus (PHEV), rat sialodacryoadenitis coronavirus (SDAV) and puffinosis virus (PV) were extracted from the complete genome sequences of MHV (GenBank accession no. AF201929), HCoV-OC43 (GenBank accession no. AY585229) and BCoV (GenBank accession no. NC_003045), and from sequences of PHEV, SDAV and PV available in GenBank. The a.a. sequence of HE of MHV was extracted from MHV strain JHM (GenBank accession no. BAA00661) because the HE gene in MHV (GenBank accession no. AF201929) stopped prematurely after the 97th a.a.
Phylogenetic trees were constructed using the neighbour-joining method with ClustalX 1.83. The corresponding a.a. sequences of HCoV-229E were used as outgroups, except for p65 and HE, which are not present in the genome of HCoV-229E. For p65 and HE, the corresponding a.a. sequences in SARS-CoV and influenza C virus, respectively, were used as outgroups. Phylogenetic trees were not constructed for p28 and the predicted hypothetical proteins of ORF4 and ORF8 in CoV-HKU1 because no a.a. sequences suitable as outgroups could be found.
The amino-terminal 800 a.a. residues of the S proteins of various group 1 coronaviruses [porcine transmissible gastroenteritis virus (TGEV), HCoV-NL63 and HCoV-229E], various group 2 coronaviruses (PHEV, SDAV, MHV, HCoV-OC43 and BCoV), infectious bronchitis virus (IBV) (a group 3 coronavirus), SARS-CoV and CoV-HKU1 were aligned using ClustalX 1.83. The presence and positions of conserved cysteine residues in the various peptides were compared.
Correspondence analysis was used to compare the variation in codon usage patterns of the different genes among group 2 coronaviruses in a multidimensional space [5]. All available sequences of ORF 1ab, HE, S, M and N of MHV, HCoV-OC43, BCoV, PHEV, SDAV, PV and SARS-CoV were downloaded from GenBank (Table 1). Codon usage in these sequences and the corresponding ones in CoV-HKU1 was analysed using CodonW (http://www.molbiol.ox.ac.uk/cu/), with each gene represented as a 59-dimensional vector, one dimension for each of the 59 sense codons that have a synonymous alternative. AUG, the only codon for methionine, UGG, the only codon for tryptophan, and the three stop codons were excluded. The ORF for E was excluded because the gene was too short.
To delineate the importance of recombination in the evolution of CoV-HKU1, split decomposition analysis was performed. Deduced a.a. sequences of group 1, 2 and 3 coronaviruses and SARS-CoV available in GenBank that were homologous to 3CLpro, Pol, helicase, HE, S, ORF4, E, M and N in CoV-HKU1 [21] were retrieved. Split decomposition analysis was performed with SplitsTree version 3.2 [7] using Hamming correction and is presented with the same edge length.
The genome organizations of CoV-HKU1 and other group 2 coronaviruses are shown in Fig. 1a. Phylogenetic trees were constructed using predicted a.a. sequences of putative proteins and polypeptides of CoV-HKU1 and other group 2 coronaviruses (Fig. 1b). The putative proteins and polypeptides included p65, conserved portions of nsp1 (PL1pro, A1pp, PL2pro, HD1 and HD2), nsp2-7, nsp9-13, HE, S, E, M and N. All trees revealed that CoV-HKU1 formed a distinct branch among group 2 coronaviruses. Interestingly, of the 14 trees from p65 to nsp10, nine (64%) (p65, HD1, HD2, nsp3, nsp4, nsp6, nsp7, nsp9 and nsp10) showed that CoV-HKU1 clustered with MHV (Fig. 1b). However, of the eight trees from nsp11 to N, seven (88%) showed the CoV-HKU1 branch as the first branch among group 2 coronaviruses (Fig. 1b).
Comparison of the cysteine residues in the N-terminal 800 a.a. residues of S in CoV-HKU1 with those in the different groups of coronaviruses revealed that almost all the conserved cysteine residues of group 2 coronaviruses were present in CoV-HKU1 (Fig. 2a), supporting the classification of CoV-HKU1 as a group 2 coronavirus.
The number of ORF 1ab, HE, S, M and N sequences of the group 2 coronaviruses used for correspondence analysis is shown in Table 1. For ORF 1ab, HE, S and M, the scores on axis 1 for group 2 coronaviruses other than CoV-HKU1 were clustered between −0.16 and 0.28, whereas those for CoV-HKU1 were clustered between −0.40 and −0.24 (Fig. 2b). For N, the scores on axis 1 for group 2 coronaviruses other than CoV-HKU1 were clustered between 0.48 and 0.57, whereas that for CoV-HKU1 was 0.11 (Fig. 2b).
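The 59-dimensional representation used for the correspondence analysis can be illustrated with a short sketch. The Python below is not CodonW; it only counts, for one coding sequence, the 59 codons that remain after AUG, UGG and the three stop codons are excluded, and normalises the counts to relative frequencies. The function name `codon_usage_vector`, the use of the DNA alphabet (T rather than U), the choice of relative frequencies as the vector entries and the toy sequence are all assumptions made for illustration.

```python
from itertools import product

# Codons excluded in the text: AUG (Met), UGG (Trp) and the three stop codons,
# written here in the DNA alphabet (T instead of U).
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

# The 64 possible triplets minus the 5 excluded ones leaves 59 codons.
CODONS = sorted("".join(p) for p in product("ACGT", repeat=3)
                if "".join(p) not in EXCLUDED)


def codon_usage_vector(cds):
    """Return the relative frequency of each of the 59 codons in a coding sequence.

    Excluded codons, triplets containing ambiguous bases and any trailing
    partial codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Toy in-frame sequence (hypothetical, not taken from any coronavirus genome).
vector = codon_usage_vector("ATGGCTGCTAAATGGGCCTAA")
```

In an analysis of this kind, one such vector per gene would be passed to a correspondence analysis, which is not shown here.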
The axis 1 scores indicate that the codon usage patterns of the genes in CoV-HKU1 differed significantly from those in other group 2 coronaviruses.
Split decomposition analysis revealed that recombination events had occurred between CoV-HKU1 and other group 2 coronaviruses in 3CLpro, Pol, helicase, HE, S, ORF4, E and M (Fig. 3). No evidence of recombination was found between the N of CoV-HKU1 and those of other group 2 coronaviruses.
CoV-HKU1 is a distinct member of group 2 coronaviruses. Both phylogenetic analysis of 22 protein coding regions (Fig. 1b) and analysis of the conserved cysteine residues in the amino-terminal region of the S proteins (Fig. 2a) confirmed that CoV-HKU1 is a group 2 coronavirus. Furthermore, phylogenetic analysis of the 22 protein coding regions revealed 10-54% a.a. differences between each protein coding region in CoV-HKU1 and the corresponding region in the most closely related sequence, indicating that CoV-HKU1 is distinct from the other group 2 coronaviruses. This was further supported by the results of correspondence analysis of codon usage (Fig. 2b).
Recombination events were common among CoV-HKU1 and other group 2 coronaviruses. Coronaviruses have a high frequency of homologous RNA recombination, which has been observed in both tissue culture [10, 14] and experimentally infected animals [8]. In split tree analysis, recombination events result in reticulations instead of simple branching structures. As shown in Fig. 3, recombination was particularly frequent in CoV-HKU1 and MHV compared with other group 2 coronaviruses such as BCoV and HCoV-OC43. The particularly high recombination frequency in MHV [1] is in line with evidence of extensive interstrain recombination, as shown by the high number of reticulations in various ORFs of the different MHV strains (Fig. 3).
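The a.a. differences quoted above and the Hamming distances supplied to split decomposition can both be understood as the proportion of aligned positions at which two sequences differ. A minimal sketch of that calculation follows, assuming the sequences are already aligned to equal length and that columns with a gap in either sequence are skipped. The gap rule, the function name `hamming_fraction` and the toy peptides are illustrative assumptions, not details taken from ClustalX or SplitsTree.

```python
def hamming_fraction(seq_a, seq_b, gap="-"):
    """Proportion of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


# Toy aligned peptides (hypothetical): 10 columns, 1 gap column, 2 mismatches.
fraction = hamming_fraction("MKCLV-TGSA", "MRCLVATGTA")
percent_difference = 100 * fraction
```

Computing this value for every pair of sequences gives a distance matrix of the kind that distance-based methods take as input.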
Complete genome sequencing of additional CoV-HKU1 strains and further split tree analysis would shed light on whether CoV-HKU1 behaves more like MHV or like BCoV and HCoV-OC43.
CoV-HKU1 may have originated from a major recombination event and numerous minor recombination events among group 2 coronaviruses. In feline coronavirus, the site of recombination has been pinpointed to a region of about 50 nucleotides in the M gene by multiple alignment [6]. As for recombination between different strains of MHV, in vitro studies have shown both variable sites and variable rates of recombination, with the S gene having a recombination frequency threefold that of the polymerase gene [4, 14]. In CoV-HKU1, nine of the 14 phylogenetic trees constructed using deduced a.a. sequences of p65 to nsp10 showed that CoV-HKU1 clustered with MHV (Fig. 1b). Interestingly, the topologies of the phylogenetic trees changed dramatically from nsp11 onwards. Of the eight trees from nsp11 to N, seven revealed the CoV-HKU1 branch as the first branch among the group 2 coronaviruses (Fig. 1b) (P < 0.01 by chi-square test). A logical explanation is that a major recombination event took place in the region between nsp10 and nsp11 when CoV-HKU1 first appeared. However, this recombination event was not evident in multiple alignment performed at the junction between nsp10 and nsp11 (data not shown). This is because, although CoV-HKU1 clusters more closely with MHV from p65 to nsp10, the difference between the phylogenetic distances from CoV-HKU1 to MHV and those from CoV-HKU1 to BCoV/HCoV-OC43 is not marked (Fig. 1b), in contrast to what was observed in feline coronavirus [6]. Furthermore, bootscanning analysis of the whole genome did not reveal any putative recombination breakpoint (data not shown). We speculate that this could be due to numerous minor recombination events between p65 and nsp10, such as between p65 and nsp1-PL1pro, between nsp1-PL2pro and nsp1-HD1, between nsp4 and nsp5, and between nsp5 and nsp6.
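The chi-square comparison of tree topologies can be reproduced approximately from the tree counts. Because the exact contingency table is not stated, the sketch below assumes a 2 x 2 table of trees in which the CoV-HKU1 branch is or is not the first branch (4 of 14 trees for p65 to nsp10, 7 of 8 trees for nsp11 to N) and a Pearson chi-square test without continuity correction. Both choices, and the function name `chi_square_2x2`, are assumptions.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 d.f.) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: p65 to nsp10 (14 trees), nsp11 to N (8 trees).
# Columns: CoV-HKU1 is the first branch, CoV-HKU1 is not the first branch.
chi2, p_value = chi_square_2x2([[4, 10], [7, 1]])
```

Under these assumptions the statistic is 99/14 (about 7.07) on one degree of freedom, which gives P below 0.01, consistent with the reported value.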
Such minor recombination events would explain why CoV-HKU1 clustered with MHV in only nine of the 14 phylogenetic trees constructed using deduced a.a. sequences from p65 to nsp10, while in four of the 14 trees the CoV-HKU1 branch was the first branch among the group 2 coronaviruses.
Establishing a genetic recombination map for murine coronavirus strain A59 complementation groups
A previously undescribed coronavirus associated with respiratory disease in humans
Evidence for variable rates of recombination in the MHV genome
Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type
Feline coronavirus type II strains 79-1683 and 79-1146 originate from a double recombination between feline coronavirus type I and canine coronavirus
SplitsTree: analyzing and visualizing evolutionary data
In vivo RNA-RNA recombination of coronavirus in mouse brain
Sequence evidence for RNA recombination in field isolates of avian coronavirus infectious bronchitis virus
Recombination between nonsegmented RNA genomes of murine coronaviruses
The molecular biology of coronaviruses
Detection of severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein in SARS patients by enzyme-linked immunosorbent assay
Evidence of genetic diversity generated by recombination among avian coronavirus IBV
High-frequency RNA recombination of murine coronaviruses
Coronavirus as a possible cause of severe acute respiratory syndrome
Characterization of a novel coronavirus associated with severe acute respiratory syndrome
Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
Mosaic evolution of the severe acute respiratory syndrome coronavirus
Identification of a new human coronavirus
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
Relative rates of non-pneumonic SARS coronavirus infection and SARS coronavirus pneumonia
Longitudinal profile of immunoglobulin G (IgG), IgM, and IgA antibodies against the severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein in patients with pneumonia due to the SARS coronavirus
Detection of specific antibodies to severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein for serodiagnosis of SARS coronavirus pneumonia
Author's address: Kwok-yung Yuen
This work is partly supported by the Research Grant Council Grant and Research Fund for the Control of Infectious Diseases of the Health, Welfare and Food Bureau of the Hong Kong SAR Government.