key: cord-267136-1abp6oom
authors: Lan, Yu-Ching; Liu, Hsin-Fu; Shih, Yi-Ping; Yang, Jyh-Yuan; Chen, Hour-Young; Chen, Yi-Ming Arthur
title: Phylogenetic analysis and sequence comparisons of structural and non-structural SARS coronavirus proteins in Taiwan
date: 2004-12-07
journal: Infect Genet Evol (Infection, Genetics and Evolution 5 (2005) 261-269; www.elsevier.com/locate/meegid)
DOI: 10.1016/j.meegid.2004.08.005
sha:
doc_id: 267136
cord_uid: 1abp6oom

Taiwan experienced a large number of severe acute respiratory syndrome (SARS) viral infections between March and July 2003; by September of that year, 346 SARS cases had been confirmed by RT-PCR or serological tests. To better understand evolutionary relationships among SARS coronaviruses (SCoVs) from different international regions, we performed phylogenetic comparisons of full-length genomic and protein sequences from 45 human SCoVs (including 12 from Taiwan) and two civet SCoVs. All the Taiwanese SARS-CoV strains associated with nosocomial infection formed a monophyletic clade within the late phase of the SARS epidemic. This Taiwanese clade could be further divided into two epidemic waves. Taiwanese SCoVs in the first wave clustered with three isolates from the Amoy Gardens housing complex in Hong Kong, suggesting a possible origin there. Of the 45 human SCoVs, one isolate from Guangdong province, China, exhibited an extra 29-nucleotide fragment between Orf 10 and Orf 11, similar to the civet SCoV genome. Nucleotide and protein sequence comparisons suggested that all SCoVs of the late epidemic came from human-to-human transmission, while certain SCoVs of the early epidemic might have originated in animals.

On 7 August 2003, the World Health Organization (WHO, 2003) reported that the 2003 SARS pandemic had spread to more than 30 countries, affecting 8422 people and killing 916. Earlier that year a novel coronavirus (SARS-CoV) was isolated from SARS patients (Ksiazek et al., 2003; Peiris et al., 2003b; Poutanen et al., 2003), and an animal inoculation experiment established a causal relationship between SARS-CoV infection and SARS. Zhong et al. (2003) identified the geographic origin of the epidemic as Guangdong province, China, and the originating month as November 2002. The first SARS case in Taiwan was diagnosed on 14 March 2003. It was traced to a trip by the index case to Guangdong in mid-February, when the SARS epidemic in that province reached its peak (CDC, 2003; ROC CDC, 2003). The index case transmitted the virus to his wife and son; the first SARS coronavirus in Taiwan, SCoV TW1, was isolated from the son (Hsueh et al., 2003). On 26 March a male resident of the Amoy Gardens housing complex in Hong Kong (hereafter referred to as Mr. X) flew to Taiwan. On 27 March he took a train from Taipei to Taichung City to visit his younger brother. That night he developed a high fever; most likely he had also read a local news report that same day of a major SARS outbreak in Amoy Gardens (Peiris et al., 2003a). He returned to Hong Kong on 28 March. After he was admitted to a hospital, Mr. X phoned to warn his younger brother, but it was too late. The younger brother, who developed symptoms on 31 March, became the first SARS-related fatality (TC1) in Taiwan. He was the second index case, and the ROC CDC isolated SCoV-TWC from him. The third index case was Ms. A, an adult woman who traveled on the same train as Mr. X on 27 March. Two days later she visited a hospital in Taipei complaining of general fatigue. In addition to that hospital, she visited two private clinics before being referred to Taipei Municipal Hoping Hospital on 9 April.
She spent less than 6 h in that hospital's emergency room, but she probably transmitted the virus to two people: an assistant nurse who escorted her to the X-ray room and a laundry worker who handled her isolation gown. These individuals transmitted the SARS virus to other medical personnel and patients, and the entire hospital was shut down for more than 2 months starting on 24 April. According to the ROC CDC, the Hoping Hospital nosocomial outbreak resulted in 66 probable and 22 suspected SARS cases (Wu et al., 2003). Even though the Taiwanese government imposed a quarantine on 28 April on all air travelers arriving from China, Hong Kong, Singapore, Macau, or Toronto, the virus still spread to different parts of the main island of Taiwan and to the adjacent Penghu Islands. By 1 September, 346 SARS cases in Taiwan had been confirmed by RT-PCR or serological tests (WHO, 2003).

The SCoV genome is approximately 29.7 kb long (Marra et al., 2003; Rota et al., 2003). The 5′ portion of the genome (21 kb, about two-thirds) encodes the replicase gene, which comprises two large open reading frames (Orfs), Orfs 1a and 1b. The remaining one-third of the genome contains Orfs for four structural proteins (spike [S], envelope [E], membrane [M], and nucleocapsid [N]) and nine putative non-structural proteins (Orfs 3, 4, 7, 8, 9, 10, 11, 13 and 14). Recently, Guan et al. (2003) isolated SCoV-like viruses from Himalayan palm civets and raccoon dogs in southern China. According to a comparative analysis of human and animal SCoV genomes, the three animal SCoVs (SZ1, SZ13 and SZ16) all retain a 29-nucleotide sequence between Orfs 10 and 11. In this study, we used phylogenetic analysis to investigate relationships among the 12 Taiwanese SARS-CoVs and between them and SCoVs from other countries. One specific goal was to determine whether the SARS-CoV isolate from Mr. X's younger brother (TWC) clustered with the isolate from Ms. A (TWC3), and whether either of those isolates clustered with isolates from other Amoy Gardens residents (Chim et al., 2003). We also compared the amino acid sequences of the S, E, M, and N structural proteins and three of the nine putative non-structural proteins (Orfs 3, 10, and 11) for 47 SARS-CoVs, including 12 Taiwanese strains.

Twelve Taiwanese SCoV strains were included in this study: TW1 (Hsueh et al., 2003), TWC, TWC2, TWC3, TWH, TC1, TC2, TC3, TWJ, TWK, TWS and TWY. TW1 was isolated from a patient whose father had spent time in Guangdong province in mid-February 2003. TWC was isolated from Taiwan's first SARS-related fatality. TWC2 and TWC3 were isolated from Taipei Municipal Hoping Hospital patients; TWC3 came from Ms. A, the third index case. An additional 33 full-length genomic sequences from human SCoV strains were selected from GenBank: nine from Beijing (BJ01, BJ02, BJ03, BJ04, PUMC01, PUMC02, PUMC03, Sino3-11 and Sino1-11), six from Hong Kong (CUHK-W1, CUHK-AG03, CUHK-AG02, CUHK-AG01, CUHK-Su10 and HKU-39849), five from Singapore (Sin2679, Sin2677, Sin2500, Sin2774 and Sin2748), two from Guangzhou (GD01 and GZ50), two from Frankfurt (Frankfurt1 and FRA), two from Milan (AS and HSR1), two from Guangdong province (ZMY1 and GD69), and one each from Wuhan (WHU), Zhejiang province (ZJ01), Moscow (SoD), Toronto (TOR2), and Hanoi (Urbani). A BLAST search was performed to locate SARS-CoV sequences in the GenBank database. A total of 47 full-length nucleotide sequences from SARS-CoV isolates (including two civet isolates) were aligned and edited using the BioEdit program (Hall, 1999). Phylogenetic analyses were conducted with the PHYLIP 3.6b (Felsenstein, 1989) and MEGA2 (Kumar et al., 2001) programs using the neighbor-joining (NJ) and Fitch and Wagner parsimony (Pars) methods. Evolutionary distances were estimated with the Kimura two-parameter model (Kimura, 1980).
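As a minimal sketch of the distance-based step named above, the Python below computes Kimura two-parameter (K2P) distances between aligned DNA sequences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) with P and Q the proportions of transition and transversion differences, and builds an unrooted neighbor-joining tree from a distance matrix. It is an illustrative re-implementation, not PHYLIP or MEGA2 code; the function names are hypothetical, skipping sites with gaps or ambiguous bases is an assumption, and the five-taxon matrix is a textbook toy example rather than SCoV data.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(s1, s2):
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / compared    # P: proportion of transition differences
    q = transversions / compared  # Q: proportion of transversion differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a dict keyed by frozenset({name1, name2}).

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: summed distance from node i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        pairs = [(a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:]]
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(pairs, key=lambda ab: (n - 2) * d[frozenset(ab)] - r[ab[0]] - r[ab[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Resolve the last three nodes as an unrooted trifurcation.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Toy five-taxon distance matrix (textbook example, not SCoV data).
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_DIST = {
    frozenset(pair): value
    for pair, value in {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }.items()
}

if __name__ == "__main__":
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition in 10 sites
    print(neighbor_joining(TOY_NAMES, TOY_DIST))
```

With this toy matrix the sketch joins a and b first, then c with that pair. In a real analysis the matrix would instead be filled by calling `k2p_distance` on every pair of aligned genomes, and bootstrap support would come from repeating the procedure on column-resampled alignments.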
NJ and Pars tree robustness was evaluated statistically by bootstrap analysis (100 samples). SCoV nucleotide sequence variation was analyzed with the SIMPLOT program (Johns Hopkins University, Baltimore, MD). The 20 SCoVs used for this analysis were Urbani, CUHK-W1, TOR2, HKU-39849, BJ01, BJ02, BJ03, BJ04, GD01, TW1, TWC, Sin2774, Sin2748, Sin2679, Sin2677, Sin2500, HSR1, CUHK-Su10, Frankfurt1, and GZ50. Two civet SCoVs (SZ3 and SZ16) were used as references for comparison. Sequence variation distance plots were generated with 1000 bp windows, 100 bp steps, and a Jukes-Cantor correction. Nucleotide sequences for the four structural genes, Orf 3, and Orf 10 were edited and translated into amino acid sequences using the BioEdit program prior to alignment for comparisons. The accession numbers for the SCoVs used in this study are Urbani:

To better understand evolutionary relationships between SCoVs isolated in Taiwan and those isolated in other parts of the world, we constructed phylogenetic trees with two different methods using full-length genomic sequences from 45 human (12 Taiwanese) and two civet SCoVs. Tree topologies were consistent between the NJ (Fig. 1a) and Pars (Fig. 1b) methods. Two human SCoV epidemic groups (early and late) were identified. The late epidemic SCoVs formed a well-supported monophyletic clade with bootstrap values of 98 and 88 for the NJ and Pars trees, respectively. The early epidemic sequences did not cluster into a monophyletic clade, even though they clearly differed from those of the late epidemic.

Fig. 1. Human and civet SCoV phylogenetic trees produced from full-length (29.7 kb) sequences, with branch bootstrap values from 100 replicates: (a) tree produced with the neighbor-joining (NJ) method, rooted with the SZ3 civet SCoV; (b) tree produced with the parsimony (Pars) method.

All early epidemic SCoVs had Chinese origins: Beijing (BJ01, BJ02, BJ03 and BJ04), Guangzhou (GD01 and GZ50), and Hong Kong (CUHK-W1).
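The distance plots described above can be sketched as a sliding-window computation: for each 1000 bp window advanced in 100 bp steps, take the proportion p of differing sites between each query genome and the reference, apply the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), and average over the queries. This is a hypothetical re-implementation for illustration, not SimPlot's own code; the function names, skipping gap and ambiguous sites, and the toy sequences are assumptions.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def window_distances(queries, reference, window=1000, step=100):
    """Mean JC distance of aligned query genomes to a reference, window by window.

    Returns (window_midpoint_offset, mean_distance) tuples. Sites where either
    sequence has a gap or an ambiguous base are skipped (an assumption).
    """
    points = []
    for start in range(0, len(reference) - window + 1, step):
        ref_win = reference[start:start + window]
        dists = []
        for query in queries:
            pairs = [(a, b) for a, b in zip(query[start:start + window], ref_win)
                     if a in "ACGT" and b in "ACGT"]
            if pairs:
                p = sum(a != b for a, b in pairs) / len(pairs)
                dists.append(jukes_cantor(p))
        if dists:
            points.append((start + window // 2, sum(dists) / len(dists)))
    return points


if __name__ == "__main__":
    # Toy data (not SCoV genomes): a 3000 bp reference and one query
    # carrying 100 substitutions at positions 1001-1100.
    reference = "A" * 3000
    query = "A" * 1000 + "C" * 100 + "A" * 1900
    for mid, dist in window_distances([query], reference):
        print(mid, round(dist, 4))
```

In the toy run, only the ten windows that overlap the substituted block report a nonzero distance, which is the kind of localized peak the plot is meant to reveal.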
All the Taiwanese SCoV sequences associated with nosocomial infection clustered into a monophyletic clade (bootstrap values 75 and 78 for the NJ and Pars trees, respectively) within the late epidemic, and this clade could be further divided into two epidemic waves. The second wave formed a monophyletic clade supported by bootstrap values of 74 and 75 for the NJ and Pars trees, respectively, whereas the first wave did not form a fully resolved cluster. TWC (from Mr. X's younger brother) did not cluster with the three isolates from Amoy Gardens (CUHK-AG03, CUHK-AG02, CUHK-AG01), but did cluster with an isolate (WHU) from Wuhan, China (bootstrap value 90 for the NJ tree) (Fig. 1a).

Pairwise comparisons were used to analyze nucleotide sequence variation within the full-length genomes of 20 human SCoVs (7 from the early epidemic and 13 from the late epidemic) (Fig. 2). Two civet SCoV sequences (SZ3 and SZ16) were used as references. The highest variation rate was in the 3′ one-third of the viral genome, especially near the junction between the replicase 1B and spike genes; Orf 3 also had a relatively high variation rate.

Amino acid sequences for the S, M, E, and N structural proteins of 45 human SCoVs were compared with those of the SZ16 civet SCoV (Fig. 3). The S protein was divided into S1 and S2 domains according to the molecular model proposed by Spiga et al. (2003). The S1 domain (N-terminal residues 17-680, responsible for receptor binding) had 18 (2.7%) amino acid differences and the S2 domain (residues 681-1255) had 11 (1.9%), for a total of 29 (2.3%) differences in the S proteins of 43 SCoVs. The S genes of WHU and ZMY1 contained several nucleotide insertions that interrupted the open reading frames. Compared with the civet SCoVs, the amino acid distances of the S proteins were 1.3% (16.4/1239) for early epidemic SCoVs and 1.4% (17.2/1239) for late epidemic SCoVs.
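The percentages in this paragraph, and the E, M, N and Orf 3 figures reported in the next one, are counts divided by aligned protein length. The sketch below, with hypothetical helper names and a toy alignment, shows how variable sites within a residue range and the mean number of differences from a reference could be computed, and re-derives the reported percentages from the counts and lengths given in the text (S1 = residues 17-680, S2 = residues 681-1255).

```python
def variable_positions(alignment, start, end):
    """1-based positions in [start, end] where the aligned sequences do not all agree."""
    return [pos for pos in range(start, end + 1)
            if len({seq[pos - 1] for seq in alignment}) > 1]


def mean_differences(group, reference):
    """Average number of residues at which each sequence in `group` differs from `reference`."""
    diffs = [sum(a != b for a, b in zip(seq, reference)) for seq in group]
    return sum(diffs) / len(diffs)


def percent(count, length):
    """Count as a percentage of alignment length, rounded to one decimal place."""
    return round(100 * count / length, 1)


S1_LEN = 680 - 17 + 1    # S1 domain, residues 17-680
S2_LEN = 1255 - 681 + 1  # S2 domain, residues 681-1255
S_LEN = S1_LEN + S2_LEN  # 1239, the denominator used for S-protein distances

# (count, length) pairs taken from the text.
REPORTED = {
    "S1": (18, S1_LEN), "S2": (11, S2_LEN), "S total": (29, S_LEN),
    "S early vs civet": (16.4, S_LEN), "S late vs civet": (17.2, S_LEN),
    "E": (4, 76), "M": (7, 221), "N": (4, 422), "Orf 3": (11, 274),
}

if __name__ == "__main__":
    for name, (count, length) in REPORTED.items():
        print(f"{name}: {count}/{length} = {percent(count, length)}%")
    # Toy alignment (hypothetical, not SCoV data): only position 2 varies.
    toy = ["MKV", "MRV", "MKV"]
    print(variable_positions(toy, 1, 3), mean_differences(toy[1:], toy[0]))
```

Each recomputed value rounds to the percentage stated in the text, which confirms that the S-protein figures use 1239 residues (664 + 575) as the denominator.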
Intra-group sequence variation was 0.3% for the early epidemic (n = 7) and 0.09% for the late epidemic (n = 38) (Table 1). The numbers of amino acid differences were 4 for the E protein (5.3%), 7 for M (3.2%), 4 for N (0.9%), and 11 for Orf 3 (4.0%) (Fig. 3). Amino acid distances among the 45 human SCoVs were 0.18% (0.13/76) for the E protein, 0.75% (1.67/221) for M, 0.04% (0.16/422) for N, and 1.53% (4.20/274) for Orf 3 (Table 1 and Fig. 4).

Among the 45 human SCoVs that we analyzed, one isolate (GD01) from Guangdong province, China, contained an extra 29-nucleotide fragment. Both WHU and TWC had dinucleotide deletions at the 30th and 31st nucleotides of Orf 10, resulting in a frameshift and premature stop of the putative protein (Fig. 5). In addition, we observed a 5-nucleotide deletion at the 32nd nucleotide of Orf 10 in Sin2748, which also resulted in a frameshift and premature translation stop.

Both the NJ and Pars trees separated the human SCoVs into two epidemics, even though the early epidemic SCoVs failed to form a well-supported monophyletic clade (Fig. 1a and b). The early epidemic sequences were more closely related to civet SCoVs than were the late epidemic sequences; all seven early epidemic SCoVs were from either Guangdong province or Beijing. Among all the analyzed human SCoVs, GD01 was the only one with the extra 29-nucleotide fragment that was also found in the civet SCoVs.

Fig. 2. Plot analyses comparing the distribution of diversity across genes of 20 human SCoVs. The average genetic distances of the 20 human SCoVs from the civet SCoV reference genomes (SZ3 and SZ16) are plotted over the entire SCoV genome. The X-axis is the nucleotide position in the SARS-CoV genome; the Y-axis is the rate of nucleotide differences between the 20 human SCoVs and the civet SCoVs. Sequence variation distance plots were generated with the SimPlot program using a 1000 bp window and 100 bp steps.
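To illustrate why a dinucleotide deletion truncates the putative Orf 10 protein, the sketch below translates a hypothetical 42-nucleotide toy ORF (not the real Orf 10 sequence) with the standard genetic code and then removes nucleotides 30 and 31. Deleting two bases shifts every downstream codon, and in this toy sequence the new frame reaches a TAA stop immediately. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt):
    """Translate from the first base, stopping at the first stop codon or the sequence end."""
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete(nt, start, length):
    """Remove `length` nucleotides beginning at 1-based position `start`."""
    return nt[:start - 1] + nt[start - 1 + length:]


def shifts_frame(length):
    """An in-ORF deletion shifts the reading frame unless its length is a multiple of 3."""
    return length % 3 != 0


# Hypothetical 42-nt toy ORF, NOT the real SCoV Orf 10 sequence.
toy_orf = "ATG" + "GCT" * 8 + "GCA" + "GCT" + "AAA" + "TGG" + "TAA"
mutant = delete(toy_orf, 30, 2)  # delete nucleotides 30 and 31

if __name__ == "__main__":
    print(translate(toy_orf))  # 13 residues: M, ten Ala, K, W
    print(translate(mutant))   # 10 residues: M, nine Ala (new frame hits TAA)
```

The same `delete` helper with `start=32, length=5` models a Sin2748-type deletion; because 5 is not a multiple of 3 it also shifts the frame, although where the first stop codon falls depends on the actual downstream sequence.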
Furthermore, the average intragroup amino acid distance for the S gene was higher in the early epidemic than in the late epidemic (Table 1). We also identified a signature amino acid pattern (residues 77 and 244; Fig. 6) shared by early epidemic isolates and civet SCoVs. This evidence suggests that late epidemic SCoVs were transmitted from human to human, while certain early epidemic SCoVs (e.g., GD01) might have been transmitted from animals to humans before spreading among various human populations.

Among the Taiwanese SCoVs, our phylogenetic analysis does not support the hypothesis of an epidemiological link between Mr. X and Ms. A (the third index case). According to our NJ tree, TWC (the isolate from Mr. X's younger brother) clustered with the WHU SCoV from Wuhan, China (bootstrap = 92), while TWC3 (Ms. A's isolate) clustered with CUHK-AG02 and CUHK-AG03, both of which originated in Hong Kong's Amoy Gardens housing complex. Sequence analysis showed that TWC and WHU had dinucleotide deletions in Orf 10, resulting in a shift in the open reading frame (Fig. 5). Therefore, even though Mr. X and Ms. A took the same train from Taipei to Taichung, the evidence indicates that Mr. X was not the source of Ms. A's infection; that source has yet to be identified.

As shown in the diversity plot, the S gene (particularly near the junction between the replicase 1B and S genes) and Orf 3 had more sequence variation than other genomic regions (Fig. 2). This prompted us to perform additional sequence comparisons of the S, E, M and N structural genes and Orfs 3 and 10. The S proteins of coronaviruses have been described as large type I membrane glycoproteins that are responsible for both binding to receptors on host cells and membrane fusion (Xiao et al., 2003).
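Two position-level analyses of the S-protein alignment are discussed in this section: the signature residues (77 and 244) shared by early epidemic isolates and civet SCoVs, and, in the receptor-binding-domain paragraph that follows, the argument that a conserved RBD longer than 30-50 residues must lie between two of the seven variable residues (311, 344, 360, 442, 479, 487 and 501) within residues 303-537. The sketch below uses hypothetical helper names and toy sequences for the signature check; the RBD call uses only the residue numbers given in the text.

```python
def signature_sites(group_a, group_b):
    """1-based alignment positions where every sequence in group_a carries one residue,
    every sequence in group_b carries another, and the two residues differ."""
    n_a = len(group_a)
    sites = []
    for pos, column in enumerate(zip(*group_a, *group_b), start=1):
        a_res, b_res = set(column[:n_a]), set(column[n_a:])
        if len(a_res) == 1 and len(b_res) == 1 and a_res != b_res:
            sites.append((pos, a_res.pop(), b_res.pop()))
    return sites


def longest_conserved_stretch(variable, start, end):
    """Longest run of residues in [start, end] containing no variable position.

    Returns (first, last, length), 1-based and inclusive.
    """
    inside = sorted(p for p in variable if start <= p <= end)
    bounds = [start - 1] + inside + [end + 1]
    lo, hi = max(zip(bounds, bounds[1:]), key=lambda pair: pair[1] - pair[0])
    return lo + 1, hi - 1, hi - lo - 1


# Hypothetical toy sequences: "early + civet" vs "late" groups (not real S-protein residues).
EARLY_AND_CIVET = ["MKLTS", "MKLTS"]
LATE = ["MRLTA", "MRLTA"]

# Variable RBD residues and RBD boundaries (303-537) as given in the text.
RBD_VARIABLE = [311, 344, 360, 442, 479, 487, 501]

if __name__ == "__main__":
    print(signature_sites(EARLY_AND_CIVET, LATE))            # [(2, 'K', 'R'), (5, 'S', 'A')]
    print(longest_conserved_stretch(RBD_VARIABLE, 303, 537))  # (361, 441, 81)
```

The longest variant-free run, residues 361-441 (81 residues), is bounded by variable residues 360 and 442; the next-longest runs (443-478 and 502-537, 36 residues each) fall short of 50 residues.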
The type I glycoproteins of coronaviruses, whose trimers resemble typical viral spikes, are incorporated into virions through noncovalent interactions with M proteins. Coronavirus S proteins contain two domains (or two subunits, depending on whether or not S is cleaved) (Spiga et al., 2003). The S1 domain contains virus-neutralizing epitopes and the receptor-binding domain (Leparc-Goffart et al., 1998; Sanchez et al., 1999). Xiao et al. (2003) recently localized the SCoV receptor-binding domain (RBD) to amino acid residues 303-537 of the S1 protein. As shown in Fig. 6, we observed seven amino acid differences in the RBD of the S protein, at residues 311, 344, 360, 442, 479, 487 and 501. If we assume that the RBD is (a) conserved among different SCoVs, including civet SCoVs (Bonavia et al., 2002), and (b) more than 30-50 amino acids in length (Lasky et al., 1987), then the RBD may map to amino acid residues 360-442, which bound the longest stretch (residues 361-441, 81 residues) free of the observed differences.

Amino acid comparisons of S proteins from 44 human and civet SCoVs. The Urbani SCoV was used as a reference; a period (.) indicates concurrence with the top reference sequence (Urbani) in the alignment.

References
Identification of a receptor-binding domain of the spike glycoprotein of human coronavirus HCoV-229E
Memoir of Severe Acute Respiratory Syndrome Control in Taiwan
Severe acute respiratory syndrome
Analysis of the whole-length sequences of ten strains of SARS coronavirus in Taiwan and its epidemiological implications
Genomic characterisation of the severe acute respiratory syndrome coronavirus of Amoy Gardens outbreak in Hong Kong
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
PHYLIP: Phylogeny Inference Package, Version 3.2
Aetiology: Koch's postulates fulfilled for SARS virus
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
Microbiologic characteristics, serologic responses, and clinical manifestations in severe acute respiratory syndrome
A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences
A novel coronavirus associated with severe acute respiratory syndrome
MEGA2: molecular evolutionary genetics analysis software
Delineation of region of the human immunodeficiency virus type 1 gp120 glycoprotein critical for interaction with the CD4 receptor
Targeted recombination within the spike gene of murine coronavirus mouse hepatitis virus-A59: Q159 is a determinant of hepatotropism
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study
Coronavirus as a possible cause of severe acute respiratory syndrome
Identification of severe acute respiratory syndrome in Canada
Targeted recombination demonstrates that the spike gene of transmissible gastroenteritis coronavirus is a determinant of its enteric tropism and virulence
Molecular modelling of S1 and S2 subunits of SARS coronavirus spike glycoprotein
Summary table of SARS cases by country
Epidemiological investigation of the SARS outbreak in the Taipei Municipal Hoping Hospital. Memoir of Severe Acute Respiratory Syndrome Control in Taiwan
The SARS-CoV S glycoprotein: expression and functional characterization
Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China

We thank Mr. Jon Lindemann for editing our manuscript. This work was supported in part by a grant from the National SARS Research Program of the Republic of China National Science Council (grant no. SVIR12).