key: cord-0733248-kstt07y1
authors: Snijder, Eric J.; Bredenbeek, Peter J.; Dobbe, Jessika C.; Thiel, Volker; Ziebuhr, John; Poon, Leo L.M.; Guan, Yi; Rozanov, Mikhail; Spaan, Willy J.M.; Gorbalenya, Alexander E.
title: Unique and Conserved Features of Genome and Proteome of SARS-coronavirus, an Early Split-off From the Coronavirus Group 2 Lineage
date: 2003-08-29
journal: Journal of Molecular Biology
DOI: 10.1016/s0022-2836(03)00865-9
sha: 31c0669af6f86903dc621787dbe547167c0980da
doc_id: 733248
cord_uid: kstt07y1

Abstract The genome organization and expression strategy of the newly identified severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted using recently published genome sequences. Fourteen putative open reading frames were identified, 12 of which were predicted to be expressed from a nested set of eight subgenomic mRNAs. The synthesis of these mRNAs in SARS-CoV-infected cells was confirmed experimentally. The 4382- and 7073 amino acid residue SARS-CoV replicase polyproteins are predicted to be cleaved into 16 subunits by two viral proteinases (bringing the total number of SARS-CoV proteins to 28). A phylogenetic analysis of the replicase gene, using a distantly related torovirus as an outgroup, demonstrated that, despite a number of unique features, SARS-CoV is most closely related to group 2 coronaviruses. Distant homologs of cellular RNA processing enzymes were identified in group 2 coronaviruses, with four of them being conserved in SARS-CoV. These newly recognized viral enzymes place the mechanism of coronavirus RNA synthesis in a completely new perspective. Furthermore, together with previously described viral enzymes, they will be important targets for the design of antiviral strategies aimed at controlling the further spread of SARS-CoV.

Severe acute respiratory syndrome (SARS) is a life-threatening form of atypical pneumonia that recently emerged in Guangdong Province, China. A previously unknown coronavirus was isolated from SARS patients 1-3 and is considered the cause of this emerging respiratory disease. In an extraordinary effort, the full-length genome sequence of the SARS-coronavirus (SARS-CoV) was elucidated within weeks after the identification of this novel pathogen and published by the Michael Smith Genome Sciences Center (Vancouver, Canada, 4 Entrez Genomes accession number NC_004718 (AY274119)), the Centers for Disease Control and Prevention (Atlanta, USA, 5 GenBank accession number AY278741), and others. The SARS-CoV genome is ~29.7 kb long and contains 14 open reading frames (ORFs) flanked by 5′- and 3′-untranslated regions of 265 and 342 nucleotides, respectively (Figure 1). Homologs of proteins conserved in all coronaviruses are encoded by the overlapping ORFs 1a and 1b, and by ORFs 2, 4, 5, 6 and 9a (Figure 1; Tables 1 and 2).

Figure 1. Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally numbered nsp1-nsp16 (see also Table 1). In the 3′-terminal part of the genomes, homologous structural protein genes are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV) and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the location of the body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al. 76). The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique domain; PLpro, papain-like cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain; ADRP, adenosine diphosphate-ribose 1″-phosphatase; ExoN, 3′-to-5′ exonuclease; CLpro, chymotrypsin-like proteinase; RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific endoribonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase; CPD, cyclic phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. 32 and Gorbalenya et al. 47

Coronaviruses 6, 7 are enveloped, positive-stranded RNA (+RNA) viruses, with a single-stranded genome of between 27 kb and 31.5 kb, the largest among known RNA viruses. The genomes of coronaviruses and related viruses in the order Nidovirales 8, 9 are polycistronic and are expressed through a sophisticated combination of poorly understood regulatory mechanisms. 6, 7 Coronavirus genome expression starts with the translation of two large replicase ORFs (1a and 1b; Figure 1), whose coding capacity is about twice that of the average complete +RNA virus genome. Via a −1 ribosomal frameshift, 10 the ORF1a polyprotein (pp1a; >4000 amino acid residues) can be extended with ORF1b-encoded sequences to yield a >7000 amino acid residue pp1ab polyprotein. Replicase polyprotein processing is carried out by two or three ORF1a-encoded viral proteinases. 11 The processing products are a group of largely uncharacterized (putative) replicative enzymes, including an RNA-dependent RNA polymerase, an RNA helicase that is fused to a complex N-terminal Zn-finger, and a Zn-ribbon-containing papain-like proteinase. 12-15 The replicase subunits are thought to assemble into a viral replication complex that is targeted to cytoplasmic membranes by various membrane-associated viral proteins. 16-18 In addition to genome replication, the coronavirus replicase complex mediates the synthesis of an extensive nested set of subgenomic (sg) mRNAs (transcription) to express all ORFs downstream of ORF1b, which encode a variety of structural and accessory proteins. 6-9 The number and composition of these 3′-proximal ORFs vary greatly among coronaviruses, but they always include genes for the structural proteins S, M, E and N, which drive cytoplasmic virus assembly. The mechanisms underlying the synthesis of genomic and subgenomic RNAs are poorly understood. To explain the composite structure of the sg mRNAs, which are both 5′- and 3′-coterminal with the viral genome, several models have been put forward, 6, 9 of which the one postulating the discontinuous synthesis of negative-stranded sg templates for sg mRNA synthesis 19 has received wide support recently.
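The −1 ribosomal frameshift mentioned above can be illustrated with a minimal sketch. It assumes the simplest textbook picture: the ribosome reads in the ORF1a frame up to the end of the slippery heptanucleotide (UUUAAAC, the sequence this article gives for SARS-CoV), then steps back one nucleotide so that the last nucleotide of the heptamer is read a second time. The function name and the toy RNA are hypothetical and serve only as illustration; the sketch ignores the downstream RNA structure that stimulates frameshifting and the fact that only a fraction of ribosomes shift.

```python
SLIPPERY = "UUUAAAC"  # slippery heptanucleotide quoted in the article


def codons_with_minus1_shift(rna: str, start: int = 0) -> list[str]:
    """Read codons from `start`; after the slippery site, step back one nucleotide.

    Only the first occurrence of the heptamer is considered (illustrative limit).
    """
    site = rna.find(SLIPPERY, start)
    # The shift is modelled only if the zero-frame codons are U|UUA|AAC,
    # i.e. the codon UUA begins one nucleotide into the heptamer.
    if site == -1 or (site + 1 - start) % 3 != 0:
        raise ValueError("no in-frame slippery site found")
    shift_point = site + len(SLIPPERY)  # index of the first nucleotide after AAC
    # Zero-frame codons up to and including AAC.
    codons = [rna[i:i + 3] for i in range(start, shift_point, 3)]
    # -1 frameshift: the final C of the heptamer is read again.
    resume = shift_point - 1
    codons += [rna[i:i + 3] for i in range(resume, len(rna) - 2, 3)]
    return codons


# Toy RNA (hypothetical, not a genome excerpt): AUG GCU UUA AAC GGG UUU
toy = "AUGGCUUUAAACGGGUUU"
print(codons_with_minus1_shift(toy))
```

In this toy example the codon following AAC (Asn) is CGG (Arg) rather than GGG, which is the kind of Asn-Arg junction the article predicts for pp1ab.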

On the basis of antigenic cross-reactivity, coronaviruses were originally classified into three groups (termed groups 1, 2, and 3). Subsequently, the phylogeny-based clustering of coronaviruses proved at first (almost) identical with that based on antigenic cross-reactivity. 6, 7 The same three clusters were evident upon analysis of the replicase region 20 -22 which does not contribute to virion antigenicity. This indicated that different regions of the coronavirus genome have indeed co-evolved and that intergroup recombination has not played a prominent role in coronavirus evolution. 23 However, the agreement between the two classifications is not perfect, as some coronaviruses are sufficiently different to not have antigenic cross-reactivity with the established groups, 24 but close enough to cluster with one of them (group 1) on the basis of sequence comparisons. 7 Consequently, these viruses were placed into (the expanded) group 1. Here, we refer to coronavirus groups as evolutionary clusters that unite viruses not necessarily having antigenic cross-reactivity.

Using the recently published SARS-CoV genome sequences, 4,5 we provide insight into the evolution, organization and expression of SARS-CoV. The SARS-CoV genome and proteome were compared with those of other coronaviruses, distantly related nidoviruses, and databases, and several of our predictions were verified experimentally.

SARS-CoV represents a lineage that has split off from the group 2 branch relatively late in coronavirus evolution

To optimize our understanding of the SARS-CoV genome, we sought to infer the phylogenetic position of the novel agent relative to known coronaviruses. Recent phylogenetic analyses of different SARS-CoV proteins using unrooted trees consistently showed that SARS-CoV does not segregate into any of the three currently established coronavirus groups. 4, 5 These results were interpreted as support for the classification of SARS-CoV as the prototype of a novel, fourth group of coronaviruses. 4, 5 However, in our opinion, the evidence leading to this conclusion was inconclusive and alternative interpretations, with SARS-CoV being an outlier in one of the established groups, remained possible. This uncertainty can be resolved only through the reconstruction of coronavirus evolution from its origin using a rooted phylogenetic tree, which is most reliable when an outgroup is included in the analysis. The closest known outgroup for coronaviruses are the toroviruses, which form a separate genus in the same virus family. 8, 25 The ORF1b part of the replicase and the two virion proteins S and M are homologous in coronaviruses and toroviruses. 26-28 Unfortunately, however, the level of conservation of the S and M protein genes is so low that we consider only the phylogenetic analysis of replicase ORF1b to be truly informative.

… 4) and the Centers for Disease Control and Prevention (Atlanta, USA; GenBank accession number AY278741 5) and an alignment of SARS-CoV with previously characterized coronavirus sequences as summarized in Refs. 11,18,32.

a For convenience, replicase cleavage products were provisionally numbered non-structural protein (nsp) 1-16 according to their position in the polyproteins.

b Amino acids of replicase proteins pp1a and pp1ab were numbered assuming that, as in other coronaviruses, a −1 ribosomal frameshift occurs; use of the slippery sequence UUUAAAC 10 is predicted to yield a peptide bond between Asn4378 and Arg4379 in pp1ab.

c Abbreviations: PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1″-phosphatase; TM, transmembrane domain; 3CLpro, 3C-like cysteine proteinase; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZD, putative Zinc-binding domain; HEL1, superfamily 1 helicase; NTD, nidovirus conserved domain; ExoN, 3′-to-5′ exonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase. Domains Ac, X, and Y are described in Refs 32 and 47.

d Indicated are the SARS-CoV proteinases predicted to be involved in cleavage of the N- and/or C-termini of the cleavage products; TI, translation initiation; TT, translation termination; RFS, ORF1a/ORF1b ribosomal frameshift.

e Compared to the corresponding cleavage product of BCoV (see Figure 1), nsp3 lacks PL1pro and contains a ~375 amino acid insertion between the X and PL2pro domains which is unique for SARS-CoV (see also Figure 1).

Consequently, to resolve the phylogenetic position of SARS-CoV, the equine torovirus (EToV 25) was included in our analysis, which was limited to replicase ORF1b, 26 the most conserved part of the genome. It should be noted, however, that the size of this genome segment (~5500 nucleotides) approximates the combined size of the genes encoding the four virion-associated proteins S, M, E, and N. A fully resolved tree was obtained, with all branches supported in more than 960 out of 1000 bootstrap trials (Figure 2). The topology of this tree suggests strongly that the SARS-CoV lineage was an early split-off from the group 2 branch, which occurred after the two bifurcations that gave rise to the three major coronavirus groups (Figure 2). Accordingly, in two regions of the replicase ORF1a polyprotein, nsp1 and one of the nsp3 domains, which differentiate the three coronavirus groups, SARS-CoV contains orthologs of domains that are unique for group 2 coronaviruses (see Figure S1 of the Supplementary Material). The published unrooted trees for the virion proteins and 3CLpro are also compatible with this phylogeny, 4,5 although formally we cannot exclude the occurrence of recombination with other coronaviruses in very limited regions. In this respect, we would like to stress that the differences in the composition and arrangement of ORFs in the 3′-proximal region of the genome (downstream of ORF1b; see Figure 1) between SARS-CoV and established group 2 coronaviruses do not contradict the above results. Group 1 coronaviruses also differ in this region through the presence of unique so-called "accessory non-structural protein genes". 6, 7 Some of these genes have been found to be dispensable for virus reproduction in tissue culture and/or animals. 6, 7, 29 The fact that, apparently, they can be acquired or lost easily in the course of evolution indicates that these genes cannot be considered reliable group markers.
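The bootstrap support quoted above rests on resampling alignment columns with replacement and re-estimating the tree for each pseudo-replicate. The following is a minimal sketch of the resampling step only, assuming a toy alignment stored as equal-length strings; the function names are hypothetical, and tree reconstruction itself (done with PAUP in this study) is not reproduced.

```python
import random


def bootstrap_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one pseudo-replicate: columns drawn with replacement, same width."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in picks) for row in alignment]


def support(times_recovered: int, trials: int) -> float:
    """Fraction of bootstrap trials in which a given bifurcation was recovered."""
    return times_recovered / trials


# Toy alignment (hypothetical sequences, three taxa, four columns).
toy_alignment = ["ACGU", "ACGA", "AGGU"]
replicate = bootstrap_columns(toy_alignment, random.Random(0))
print(replicate)
print(support(960, 1000))  # the lower bound on branch support quoted in the text
```

A fixed seed is used here only to make the sketch repeatable; each of the 1000 trials would use a fresh draw.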

In conclusion, SARS-CoV is distantly related to established group 2 coronaviruses, a relationship comparable to that observed in group 1 between porcine epidemic diarrhoea coronavirus (PEDV) and human coronavirus 229E (HCoV-229E) on the one hand, and transmissible gastroenteritis coronavirus (TGEV) and related viruses on the other hand (Figure 2). Accordingly, the lack of antigenic cross-reactivity observed between distant group-mates in group 1 24 may be observed between SARS-CoV and the established group 2 viruses. Thus, SARS-CoV may be the first identified representative of a larger cluster that could be called subgroup 2b, if the established group 2 coronaviruses would be referred to as subgroup 2a. The 2b cluster should include the immediate ancestor of SARS-CoV, which may circulate in the field. If close relatives of SARS-CoV were to be identified in animal hosts, the virus would represent the second example of a group 2 coronavirus that may have crossed the animal-human barrier. The first putative case is that of the bovine coronavirus (BCoV) and human coronavirus OC43 (HCoV-OC43), 30 two viruses that are so closely related at the genetic level 30,31 that they can be considered to be the same virus species.

Predictions are based on the SARS-CoV sequences published by Michael Smith Genome Sciences Centre (Vancouver, Canada; Entrez Genomes accession number NC_004718 (AY274119) 4) and the Centers for Disease Control and Prevention (Atlanta, USA; GenBank accession number AY278741 5).

a See also Figures 1 and 3.

b ORF3b (462 nucleotides) overlaps with the 3′ half of ORF3a, the RNA4 body TRS and the 5′ end of ORF4. It is the fifth largest reading frame downstream of ORF1b (after ORFs 2, 3a, 5 and 9a), making it a likely candidate to be expressed. Since its translation initiation codon is the 13th AUG codon in mRNA3, ORF3b expression should involve a mechanism like internal ribosomal entry (as previously suggested for some other coronavirus ORFs; Ref. 78) or the synthesis of an as yet undetected additional subgenomic mRNA.

c The translation termination codon of ORF7a and translation initiation codon of ORF7b overlap. The absence of any other upstream AUG codons (with the exception of that of ORF7a) and good context for translation initiation of the ORF7b AUG codon suggest that ORF7b may be expressed from subgenomic RNA7 by "leaky scanning" of ribosomes.

d The putative ORF8a start codon is in a good context for translation initiation and immediately follows the body TRS involved in mRNA8 transcription, making it likely that ORF8a is expressed from mRNA8. The mechanism used to express the larger downstream ORF8b is more puzzling, since its (putative) translation initiation codon appears to have a poor context for translation initiation and two additional AUG codons are present in the region between the putative start codons of ORFs 8a and 8b. Recently, some SARS-CoV isolates from human and civet cat origin (L.L.M.P. and Y.G., unpublished results) were reported to contain a 29-nucleotide insertion that results in the in-frame fusion of ORFs 8a and 8b. Consequently, ORF8b in the Frankfurt-1 and HKU-39849 isolates used in this study may be translationally silent.

e A functional "internal" open reading frame, overlapping with the N protein gene, has been described for other group 2 coronaviruses, e.g. BCoV; 77 ORF9b appears to occupy a corresponding position and may be expressed following "leaky scanning" by ribosomes.
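The table notes on ORF3b, ORF7b and ORF8b reason from the number of AUG triplets that a scanning ribosome meets before a putative start codon. A minimal sketch of that count is given below. It assumes that every AUG triplet from the 5′ end is counted regardless of reading frame, which is an interpretation of the notes rather than a stated method; the function names and the toy mRNA are hypothetical.

```python
def aug_positions(mrna: str) -> list[int]:
    """0-based positions of every AUG triplet, in any reading frame, 5' to 3'."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def aug_rank(mrna: str, start: int) -> int:
    """1-based rank of the AUG at position `start` among all AUGs in the mRNA."""
    return aug_positions(mrna).index(start) + 1


# Toy mRNA (hypothetical): three AUG triplets at positions 2, 7 and 12.
toy_mrna = "GGAUGCCAUGAAAUGG"
print(aug_positions(toy_mrna))
print(aug_rank(toy_mrna, 12))
```

A high rank, as reported for ORF3b, makes initiation by simple scanning less plausible, whereas a rank of two with a favourable context is compatible with leaky scanning.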

Two proteinases are predicted to cleave the SARS-CoV replicase polyproteins into 16 subunits, the largest of these having a unique domain organization

A detailed comparison of the SARS-CoV replicase with that of its closest known relatives in group 2, mouse hepatitis coronavirus (MHV) and BCoV (Figure 1), revealed a replicase proteolytic processing scheme and domain organization that, with some notable exceptions (see below), proved to be typical for group 2 viruses. 11,32 Using the conserved signatures of the cleavage sites recognized by coronavirus proteinases 11, 12, 33, 34 and their flanking sequences, we predict the generation of 16 replicase subunits through proteolysis mediated by 3CLpro (11 cleavages) and PL2pro (three cleavages) (Figure 1 and Table 1).

The most conspicuous differences between known group 2 coronaviruses and SARS-CoV were identified in nsp3, the largest replicase subunit that is encoded by ORF1a (Table 1). Unlike all other coronaviruses, SARS-CoV does not have an ortholog of papain-like proteinase 1 (PL1pro; see close-up in Figure 1), 13, 35 which was probably lost during evolution of this lineage. This observation implies that the three cleavages in the N-terminal half of pp1a must all be performed by the conserved PL2pro, 36,47 a downstream-located paralog of PL1pro. The ortholog of this proteinase appears to dominate over PL1pro in HCoV-229E, 32 and is the only active PLpro in avian infectious bronchitis coronavirus (IBV). 32,37 Immediately upstream of PL2pro, we identified a 375 amino acid residue "orphan domain" in SARS-CoV (called SUD for SARS-CoV unique domain; Figure 1), which is not present in other coronaviruses. The corresponding ORF1a region differs profoundly among group 1 coronaviruses. In one of these viruses (TGEV), and in the group 3 IBV, this region contains just a few amino acid residues, essentially fusing PL2pro to the upstream X domain. In contrast, HCoV-229E and PEDV share a conserved domain in this position. Interestingly, nsp3 also was the main site of replicase differences between BCoV variants isolated from respiratory and intestinal samples from an animal that had died during an outbreak of fatal shipping pneumonia. 20 Due to the plausible multifunctionality of nsp3, which may be involved in the control of subgenomic mRNA synthesis, 13, 38 the gross internal rearrangements and point mutations in this protein may have pleiotropic effect(s) on SARS-CoV properties, including its pathogenic potential.

… 65 and was adjusted manually. Two regions of poor conservation were removed from the alignment, which was converted subsequently into the nucleotide form. All columns containing gaps were removed. The resulting alignment contains the following SARS-CoV sequences fused: 13,623-13,859, 14,310-18,857 and 20,076-21,482. It included 5487 characters with 3207 of them being parsimony-informative. Using the PAUP program (version 4.0.0d55) and parsimony criterion, an exhaustive tree search of the 135,135 evaluated trees identified the best tree having a score of 10,927 and the second best tree having a score of 10,964; the worst tree had a score of 13,611. A total of 1000 bootstrap trials were conducted using the parsimony criterion and a branch-and-bound search to generate a bootstrap 50% majority-rule consensus tree. The frequency of occurrence of particular bifurcations in bootstraps is indicated at the nodes. Similar trees with similar high bootstrap support above 960 were obtained using the NJ method that was applied to distance matrices obtained for either nucleotide or amino acid alignments (not shown).
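The Figure 2 legend reports gap-column removal, a count of parsimony-informative characters, and an exhaustive search over 135,135 trees. The sketch below illustrates these three notions on a toy alignment; all function names and sequences are hypothetical. The number of taxa is not stated in the text as extracted here, so the sketch only notes that 135,135 equals 13!!, the number of unrooted bifurcating trees on nine labelled taxa under the standard (2n−5)!! formula, without asserting that this is how the figure was obtained.

```python
from collections import Counter


def unrooted_tree_count(n_taxa: int) -> int:
    """(2n-5)!!: number of unrooted bifurcating trees for n >= 3 labelled taxa."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def drop_gap_columns(alignment: list[str], gap: str = "-") -> list[str]:
    """Remove every column in which at least one sequence has a gap."""
    keep = [c for c in range(len(alignment[0]))
            if all(row[c] != gap for row in alignment)]
    return ["".join(row[c] for c in keep) for row in alignment]


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Toy alignment (hypothetical): four taxa, five columns, one column with gaps.
toy = ["AC-GU", "ACAGU", "GCAAU", "GC-AC"]
ungapped = drop_gap_columns(toy)
columns = ["".join(row[c] for row in ungapped) for c in range(len(ungapped[0]))]
print(ungapped)
print(sum(is_parsimony_informative(col) for col in columns))
print(unrooted_tree_count(9))
```

Exhaustive evaluation is feasible at this scale; the count grows so quickly with the number of taxa that heuristic or branch-and-bound searches become necessary soon after.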

SARS-CoV produces eight subgenomic mRNAs to express the ORFs located in the 3′-proximal part of the genome

In a striking parallel with the unique features of nsp3, the 3′-proximal part of the SARS-CoV genome contains five ORFs (6, 7a, 7b, 8a and 8b) that are not present in established group 2 coronaviruses and for which no obvious homologs could be identified upon sequence comparison. Furthermore, SARS-CoV lacks counterparts for two genes inserted between replicase ORF1b and the S gene in subgroup 2a viruses (see the close-up in Figure 1). 6, 7 All these ORFs (from 2 to 9b) are predicted to be expressed from sg mRNAs in SARS-CoV. In members of the genus Coronavirus and the related family Arteriviridae, all sg mRNAs are 3′-coterminal with the viral genome, and contain a common 5′ leader sequence that is identical with that of the genome. 6, 7, 9, 39 The fusion of the leader to the coding part (or "body") of each of the sg RNAs involves a discontinuous step in RNA synthesis, which is currently believed to occur during minus strand synthesis, thus producing composite subgenomic negative-stranded templates for sg mRNA synthesis (Figure 3(C)). 19, 39, 40 Leader-to-body joining is guided by a base-pairing interaction involving conserved transcription-regulating sequences (TRSs; also previously termed "intergenic sequences (IGSs)" in coronaviruses), which are found at the 3′ end of the genomic leader (leader TRS) and at the 5′ end of each of the sg RNA bodies (body TRSs), often located exactly between two genes, but sometimes located within the coding sequence of an upstream gene (Figures 1 and 3(A)).
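Because leader-to-body joining depends on a conserved TRS, candidate body TRSs can be located by scanning a genome for the conserved core. The following minimal sketch does this, assuming the hexanucleotide core 5′-ACGAAC-3′ that this article reports for SARS-CoV and treating the first hit as the leader TRS. The function name and the toy genome are hypothetical, and a real analysis would also weigh the flanking nucleotides and the position of each hit relative to the downstream ORF.

```python
TRS_CORE = "ACGAAC"  # conserved core reported in the article for SARS-CoV


def find_core_sites(genome: str, core: str = TRS_CORE) -> list[int]:
    """Return 0-based start positions of every occurrence of the TRS core."""
    sites = []
    i = genome.find(core)
    while i != -1:
        sites.append(i)
        i = genome.find(core, i + 1)  # step by one so overlapping hits are kept
    return sites


# Toy genome (hypothetical): one leader TRS followed by two body TRSs.
toy_genome = "GGACGAACUUUUACGAACAUGCCACGAACAUG"
sites = find_core_sites(toy_genome)
leader_trs, body_trss = sites[0], sites[1:]
print(leader_trs, body_trss)
# Under this simple model, each body TRS corresponds to one subgenomic mRNA.
print(len(body_trss))
```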

In the SARS-CoV genome we readily identified a potential leader TRS (5′-CUAAACGAACUUU-3′) that has a 6-11 nucleotides match with a number of sequences in the 3′ end of the genome, many of which are positioned immediately upstream of viral genes (Figure 3(A)). As recognized also by others, 4,5,34 the sequence 5′-ACGAAC-3′ is absolutely conserved and can be considered the core of the SARS-CoV TRS. Based on the SARS-CoV sequence with the largest 5′-terminal segment (accession number AY278741 5), the SARS-CoV leader sequence is (at least) 72 nucleotides long, similar to e.g. that of BCoV, with which it has a striking 20 out of 21 nucleotides match immediately upstream of the leader TRS (5′-GAUCUCUUGUAGAUCUGUUC-3′). On the basis of the location of putative body TRSs, the synthesis of nine mRNAs by SARS-CoV was expected: the genomic mRNA (RNA1) and eight subgenomic mRNAs with sizes of approximately 8.4, 4.6, 3.8, 3.5, 3.0, 2.6, 2.1 and 1.8 kb (including 5′ leader and 3′ poly(A)-tail). However, in the first published experimental analysis of the SARS-CoV-specific mRNAs generated in infected Vero cells, the synthesis of only five viral mRNAs could be confirmed. 5 To investigate SARS-CoV RNA synthesis in more detail, Vero cells were infected with SARS-CoV isolates Frankfurt-1 3 and HKU-39849, 1 and intracellular RNA was analyzed by hybridization with oligonucleotide probes complementary to a part of the 5′ leader sequence and a sequence just upstream of the 3′ poly(A) tail. The coronavirus IBV, 41 which also replicates in Vero cells, was used as control and size marker. As illustrated in Figure 3(B), the genomic RNA and all eight predicted subgenomic transcripts were detected with both SARS-CoV probes, confirming the fact that these RNAs contain both common 5′-terminal and common 3′-terminal sequences. Remarkably, a slight mobility shift was observed for RNAs 7 and larger of the Frankfurt-1 isolate. The subsequent sequence analysis of this virus revealed that this was due to a 45 nt in-frame deletion in ORF7b, 34 probably the first documented example of SARS-CoV genetic adaptation to cell culture conditions. The confirmation of leader-body fusion sites of the SARS-CoV subgenomic mRNAs will be published elsewhere. 34 Remarkably, up to four of the eight SARS-CoV subgenomic mRNAs (3, 7, 8, and 9) may be functionally bicistronic (Table 2), as observed occasionally for other coronavirus subgenomic mRNAs. 6, 7

… 19, 39 Whereas genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the 3′ end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus strand that would then serve as template for the transcription of subgenomic mRNAs.

The replicase of coronaviruses includes a variety of putative RNA-processing enzymes

The production of a complex and diverse set of RNA molecules by nidoviruses (including SARS-CoV) is linked to an unparalleled complexity of their giant replicase, which contains a variety of (putative) enzymatic functions and a number of completely uncharacterized domains (Figure 1). 18 We have initiated the characterization of coronavirus replicase by comparative genomics, 12 and have regularly updated this analysis through recent years. 18, 32 Our continuing analysis has now identified distant coronavirus homologs of no fewer than five cellular enzymes that are associated with RNA processing (Figure 4): poly(U)-specific endoribonuclease (XendoU 42), a 3′-to-5′ exonuclease (ExoN) that belongs to the DEDD superfamily, 43 S-adenosylmethionine-dependent ribose 2′-O-methyltransferase (2′-O-MT) of the RrmJ family, 44 adenosine diphosphate-ribose 1″-phosphatase (ADRP 45), and cyclic phosphodiesterase (CPD). 45, 46 In the SARS-CoV proteome, conserved domains presumably associated with these activities were mapped (from the N to C terminus) to the X domain 47 of nsp3 (ADRP), the N-terminal domain of nsp14 (ExoN), a "nidovirus-specific" replicase domain 26, 48 in the C-terminal part of nsp15 (XendoU), and nsp16 (2′-O-MT). The CPD-related domain is not conserved in SARS-CoV, but was identified in the product of ORF2 49 of established group 2 coronaviruses, and in the very C-terminal domain of the torovirus ORF1a polyprotein, 50 as well as in some double-stranded RNA rotaviruses.

The conservation in the ExoN, 2′-O-MT and CPD-related domains of nidoviruses includes the catalytic and other active-site residues identified in the prototype cellular enzymes. Although the active-site residues of the ADRP and XendoU families are yet to be characterized, the most conserved amino acids of these families are found in their putative nidovirus homologs. Some of the nidovirus domains may contain unique and conserved additional domains. For instance, we noted that the nidovirus ExoN homologs contain an additional conserved domain resembling a mononuclear Zn-finger (Figure 4(B)) between the universally conserved blocks I and II, which include the catalytic residues (two Asp and one Glu). 51 Another Zn-finger-like module has been inserted between blocks II and III in the ExoN homolog of roniviruses, a subset of nidoviruses (data not shown). Our combined observations indicate that the nidovirus homologs of these cellular RNA processing enzymes must be enzymatically active, although they may have evolved to act on specific (and unique) substrates or have additional unique components.

The newly predicted enzymes could be involved in the metabolism of virus and/or cellular RNAs. For instance, the 2′-O-MT activity could be used to produce the 5′-cap of viral mRNAs, as was demonstrated for a homologous flavivirus enzyme. 52 Based on a parallel with some cellular DNA-processing homologs, like exonuclease I 53 and the exonuclease domain of DNA polymerases, 54 it is tempting to speculate on a link between the ExoN activity and RNA proofreading, repair, and/or recombination. The first two activities are not known in RNA viruses, and recombination commonly proceeds through the copy-choice mechanism with RdRp switching templates to produce chimeric nascent chains. 55 However, due to the extreme sizes of their giant genomes, coronaviruses may differ from other RNA viruses and share an unprecedented similarity with DNA-based life-forms in the mechanisms of genome biosynthesis and maintenance. If confirmed, these unusual properties would explain the preliminary reports on the resistance of SARS-CoV to ribavirin, a drug that was shown to force other RNA viruses into "error catastrophe". 56 The experimental verification of these predictions will be an important step in increasing our understanding of the functional roles these putative enzymes play in the … Z. and A.E.G., unpublished results). This development indicates that, as before with other distant nidovirus homologs (e.g. the helicase), 15 the translation of bioinformatics predictions into a functional description is likely to be a laborious and time-consuming process, involving mainly the identification of virus-specific substrates and proper assay conditions.

In this respect, we have made an observation that both provides additional support for the provisional assignments made above and may help in the experimental verification of the predicted activities. When the five enzyme families listed above (Figure 4) were analyzed as a single dataset, it became apparent that representatives of these families cooperate in two nuclear intron RNA processing pathways. These pathways are functionally antagonistic: intron excision aimed at the synthesis of mature tRNA 58 and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA 59 (Figure 5(A)). In the first pathway, XendoU initiates a cascade of poorly characterized endo- and exonuclease reactions that may involve ExoN, a homolog of the yeast Rrp6p exosome component, 60 ultimately leading to the production of mature U16 and U86 snoRNAs. Subsequently, these snoRNAs may be utilized in diverse rRNA processing events involving nucleotide methylation by fibrillarin, a 2′-O-MT, 61 and assisted by helicase(s). 59 Strikingly, the homologs of three cellular enzymes from this pathway, encoded in the replicases of all nidoviruses except for arteriviruses, are genetically clustered in a single protein block (nsp14-nsp16) immediately downstream of the RNA-helicase (nsp13) (Figures 1 and 4). Because of the proximity of these four domains to each other, their expression must be tightly coordinated at the level of 3CLpro proteolysis and by the upstream ORF1a/ORF1b ribosomal frameshift signal.

In the other pathway, which involves tRNA-processing, the utilization of a 2′-phosphate group of a splicing intermediate involves the conversion of adenosine diphosphate ribose 1″-2″ cyclic phosphate (Appr>p) by CPD 62 into adenosine diphosphate ribose 1″-phosphate (Appr-1″-p), of which the phosphate group may be further processed by an ADRP. 45 Both these activities may drive the production of mature tRNA. Although the nidovirus homologs of CPD and ADRP remain to be characterized, they are not under the control of the ORF1a/ORF1b ribosomal frameshift signal (Figure 1) and may thus, unlike the ORF1b-encoded enzymes, be produced in larger quantities.

… SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron excision to produce mature tRNA 58 and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5). 59

The nidovirus homologs of the five RNA processing enzymes discussed above may interfere with these or similar cellular RNA processing pathways to reprogram the cell for the benefit of virus reproduction. It seems even more conceivable that they, alone or in concert with other enzymes like the RdRp or helicase, are involved directly in viral RNA synthesis, particularly in transcription, which, in an apparent parallel with snoRNA-driven processes, 59 is guided by conserved oligonucleotide base-pairing interactions (Figure 3(C)). The viral enzymes, like their cellular counterparts, might be part of separate pathways or, alternatively, cooperate in a single pathway in which the XendoU, ExoN and 2'-O-MT homologs provide RNA specificity, and the CPD and ADRP homologs modulate the pace through processing of compound(s) containing 2'-phosphate groups. In this respect, we note that both the XendoU/ExoN/2'-O-MT and CPD/ADRP cellular pathways start with an endoribonuclease-mediated cleavage to produce molecule(s) with 2'-3'-cyclic phosphate termini (Figure 5), indicating the structural basis for possible cooperation of the coronavirus homologs of these enzymes in a single pathway. The expected functional hierarchy of the five putative nidovirus enzymes (Figure 5(A)) is supported by their corresponding evolutionary conservation, with the XendoU homolog being absolutely conserved and the CPD homolog being least conserved among nidoviruses (Figure 5(B)).

The availability and comparative analysis of the SARS-CoV genome and proteome set the stage for the extensive biological characterization of this emerging pathogen and the development of anti-SARS-CoV strategies. Our conclusion that SARS-CoV is distantly related to group 2 coronaviruses (Figure 2) implies that viruses from this group, in particular the extensively studied mouse hepatitis virus and its derivatives lacking non-essential CPD-like and HE genes, may be the best available models for both in vitro and in vivo studies, especially where the synthesis of viral macromolecules and the structure and function of the replication complex are involved. A detailed comparative characterization of the BCoV/HCoV-OC43 pair may provide invaluable insights into the processes of adaptation of a non-human coronavirus to a human host, which should be highly relevant to understanding the emergence of SARS-CoV. The SARS-CoV genome (Figure 1) lacks genes that are common in group 2 viruses, like PL1pro and the CPD-like and HE genes, but encodes a number of unique protein sequences, underlining the capacity of coronaviruses for large-scale evolutionary change. The comparative studies presented here have tentatively identified both known and novel viral enzymes (Figures 1 and 5), most of which may be involved in RNA processing and have homologs of which the tertiary structure has been solved (Figure 1). Intriguing parallels have been drawn between these putative viral enzymes and characterized but distant cellular homologs. These parallels will guide the functional dissection of the replicases of SARS-CoV and related viruses and may put the mechanism of coronavirus RNA synthesis in a completely new perspective. The newly described putative enzymes of SARS-CoV double the list of potential targets for the design of antiviral strategies aimed at controlling this emerging virus infection. 33, 34

Materials and Methods

Vero cells were infected with SARS-CoV (Frankfurt 1 or HKU-39849) at an MOI of 0.01 or were mock infected. At the onset of cytopathogenic effect (approximately 40 hours post infection), intracellular RNA was isolated by cell lysis for ten minutes at room temperature with 5% (w/v) lithium dodecyl sulfate in LET buffer (10 mM Tris-HCl (pH 7.4), 100 mM LiCl, 1 mM EDTA), containing 20 mg/ml of proteinase K. After shearing of the cellular DNA using a syringe, lysates were incubated at 42 °C for 15 minutes, extracted with phenol (pH 4.0) and chloroform, and RNA was ethanol-precipitated. The RNAs were separated in denaturing 1% (w/v) agarose gels containing 2.2 M formaldehyde and Mops buffer (10 mM Mops (sodium salt) (pH 7), 5 mM sodium acetate, 1 mM EDTA). Dried gels were used for direct hybridization with 32P-labeled oligonucleotides SARSV001 (5'-CGAGGTTGGTTGGCTTTTCCTG-3') and SARSV002 (5'-CACATGGGGATAGCACTAC-3'), which are complementary to sequences in the SARS-CoV leader sequence and the genomic 3' end, respectively. After hybridization, gels were analyzed using a Personal FX Molecular Imager and Quantity One software (both from Bio-Rad).
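Because the two probes are described as complementary to viral sequences, the sequence each probe is expected to hybridize to can be obtained by taking its reverse complement. The following is a minimal sketch of that calculation, not part of the original protocol; the function name `reverse_complement` and the dictionary `PROBES` are illustrative, and the result is written in the DNA alphabet (in the viral RNA, T would read as U).

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Probe sequences as quoted in the text (5'->3').
PROBES = {
    "SARSV001": "CGAGGTTGGTTGGCTTTTCCTG",  # leader-specific probe
    "SARSV002": "CACATGGGGATAGCACTAC",     # probe for the genomic 3' end
}

# Expected target sequences (5'->3', DNA alphabet).
targets = {name: reverse_complement(seq) for name, seq in PROBES.items()}

for name, target in targets.items():
    print(name, "hybridizes to", target)
```

Applying the function twice returns the original probe, which is a quick sanity check on any hand-derived target sequence.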

Genpeptides, Conserved domain (CD) 63 and protein family (Pfam) 64 databases were used in this study. Amino acid sequence alignments were generated using ClustalX1.81 65 and Dialign2 66 programs assisted by Blosum position-specific matrices, 67 and were processed for presentation using GeneDoc. 68 Multiple sequence alignments were converted into hidden Markov model (HMM) profiles using HMMER2.01 software. 69 Sequence databases were searched in default mode, unless stated otherwise, using the HMMER2.01 package 64, 69 and a family of Blast programs. 70 Expectation values of similarity (E) of 0.05 or lower for Blast searches and 0.1 or lower for HMMER-mediated searches were considered to be statistically significant. 71 Database searches with nidovirus proteins (Tables 1 and 2) and their alignments were conducted in an iterative mode until no new homologs were identified. Also, sequences that were identified below the threshold during the last iteration were used to initiate reciprocal searches that might have resulted in new significant matches. This approach worked for all protein families described here, except for the identification of the relationship between the nidovirus ExoN family and the cellular DEDD superfamily, which is known to be extremely diverse. 43 In this latter case, using the MAST program, 72 we found a strong match (p = 3e-10) between the most conserved motif III of a DEDD protein and a conserved block of the ExoN family. This match facilitated the identification of the two other motifs in the nidovirus proteins, which have a non-typical intermotif spacing partially occupied by Zn-finger(s) (see the text and Figure 4). Furthermore, we observed an approximately 30-fold selective increase of the global similarity between the ExoN family and DEDD proteins after the coronavirus sequences were modified artificially by removing putative Zn-fingers that are not present in the DEDD proteins. In the HMMER-mediated searches of >10^6 sequences using this Zn-finger-deficient ExoN family as a query, numerous DEDD proteins were retrieved immediately after the nidovirus proteins, starting with E = 0.81. The relatively poor statistics of these hits were due to the failure by HMMER to align all three motifs.
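The iterative search strategy described above can be sketched as a simple loop that keeps adding significant hits to the query set until a round yields nothing new. This is only an illustration under stated assumptions: the `search` callable stands in for a real Blast or HMMER run, and `TOY_DB`, the sequence names and the E-values in it are invented for the demonstration. Only the two significance cutoffs come from the text.

```python
from typing import Callable, Dict, Set, Tuple

BLAST_E_CUTOFF = 0.05  # significance threshold quoted for Blast searches
HMMER_E_CUTOFF = 0.1   # significance threshold quoted for HMMER searches


def iterative_search(
    seed: Set[str],
    search: Callable[[Set[str]], Dict[str, float]],
    e_cutoff: float,
    max_rounds: int = 50,
) -> Tuple[Set[str], Dict[str, float]]:
    """Grow a protein family until a search round adds no new significant hit.

    Returns the final family and the sub-threshold hits of the last round,
    which could be used to seed reciprocal searches.
    """
    family = set(seed)
    subthreshold: Dict[str, float] = {}
    for _ in range(max_rounds):
        hits = search(family)
        new = {n for n, e in hits.items() if e <= e_cutoff and n not in family}
        subthreshold = {
            n: e for n, e in hits.items() if e > e_cutoff and n not in family
        }
        if not new:
            break  # converged: no new homologs identified
        family |= new
    return family, subthreshold


# Hypothetical toy "database": each member retrieves some hits with E-values.
TOY_DB = {
    "seq_A": {"seq_B": 1e-40},
    "seq_B": {"seq_C": 1e-12},
    "seq_C": {"seq_D": 0.5},
}


def toy_search(members: Set[str]) -> Dict[str, float]:
    """Stand-in for a profile search: best E-value per hit over all members."""
    hits: Dict[str, float] = {}
    for m in members:
        for name, e in TOY_DB.get(m, {}).items():
            hits[name] = min(e, hits.get(name, float("inf")))
    return hits


family, candidates = iterative_search({"seq_A"}, toy_search, HMMER_E_CUTOFF)
```

In this toy run, `candidates` holds the hit that stayed above the cutoff in the final round, which corresponds to the sequences the text describes as starting points for reciprocal searches.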

Cluster phylogenetic trees were reconstructed using the neighbour-joining algorithm described by Saitou & Nei 73 with the Kimura correction, 74 and were evaluated with 1000 bootstrap trials, as implemented in the ClustalX1.81 program. Parsimonious trees were generated using exhaustive search and evaluated with bootstrap branch-and-bound search using a UNIX version of the PAUP* 4.0.0d55 program that is included in the GCG-Wisconsin Package programs. The resulting trees were visualized using the TreeView program. 75
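Two ingredients of the distance-based analysis, the correction of observed distances for multiple substitutions and the bootstrap resampling of alignment columns, can be illustrated with a short sketch. It assumes the commonly used Kimura approximation for proteins, K = -ln(1 - D - 0.2 D^2), where D is the observed fraction of differing residues; it does not reproduce the exact ClustalX implementation, and the function names and the toy alignment are illustrative only. Tree building itself (neighbour-joining) is not shown.

```python
import math
import random
from typing import List


def p_distance(a: str, b: str) -> float:
    """Observed fraction of differing residues, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(d: float) -> float:
    """Kimura-corrected distance, K = -ln(1 - D - 0.2 * D**2)."""
    arg = 1.0 - d - 0.2 * d * d
    if arg <= 0.0:
        raise ValueError("observed distance too large for this correction")
    return -math.log(arg)


def bootstrap_columns(alignment: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Toy aligned sequences (invented for illustration).
toy_alignment = ["ACDEFGHIKL", "ACDEYGHIRL", "ACNEYGH-RL"]

rng = random.Random(0)
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(1000)]

observed = p_distance(toy_alignment[0], toy_alignment[1])
corrected = kimura_protein_distance(observed)
```

Each replicate would then be used to build a tree, and the support for a branch is the fraction of replicate trees in which that branch appears.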

Coronavirus as a possible cause of severe acute respiratory syndrome

A novel coronavirus associated with severe acute respiratory syndrome

Identification of a novel coronavirus in patients with severe acute respiratory syndrome

The Genome sequence of the SARS-associated coronavirus

Characterization of a novel coronavirus associated with severe acute respiratory syndrome

Coronaviruses. In Fields Virology

The Coronaviridae

Order Nidovirales. Virus Taxonomy (van Regenmortel

Arteriviruses

Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot

Virus-encoded proteinases and proteolytic processing in the Nidovirales

Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis

A human RNA viral cysteine proteinase that depends upon a unique Zn2+-binding finger connecting the two domains of a papain-like fold

The predicted metal-binding region of the arterivirus helicase protein is involved in subgenomic mRNA synthesis, genome replication, and virion biogenesis

The human coronavirus 229E superfamily 1 helicase has RNA and DNA duplex-unwinding activities with 5'-to-3' polarity

Localization of mouse hepatitis virus nonstructural proteins and RNA synthesis indicates a role for late endosomes in viral replication

RNA replication of Mouse Hepatitis Virus takes place at double-membrane vesicles

Big nidovirus genome. When count and order of domains matter

Coronaviruses use discontinuous extension for synthesis of subgenome-length negative strands

Comparison of genomic and predicted amino acid sequences of respiratory and enteric bovine coronaviruses isolated from the same animal with fatal shipping pneumonia

Mutational analysis of the active centre of coronavirus 3C-like proteases

Phylogenetic analysis of a highly conserved region of the polymerase gene from 11 coronaviruses and development of a consensus polymerase chain reaction assay

A comparative sequence analysis to revise the current taxonomy of the family Coronaviridae

Antigenic homology among coronaviruses related to transmissible gastroenteritis virus

Toroviruses: replication, evolution and comparison with other members of the coronavirus-like superfamily

The carboxyl-terminal part of the putative Berne virus polymerase is expressed by ribosomal frameshifting and contains sequence motifs which indicate that toro- and coronaviruses are evolutionarily related

Primary structure and post-translational processing of the Berne virus peplomer protein

Another triple-spanning envelope protein among intracellularly budding RNA viruses: the torovirus E protein

The group-specific murine coronavirus genes are not essential, but their deletion, by reverse genetics, is attenuating in the natural host

Biological and genetic characterization of a hemagglutinating coronavirus isolated from a diarrhoeic child

Sequence of the 3'-terminal end (8.1 kb) of the genome of porcine haemagglutinating encephalomyelitis virus: comparison with other haemagglutinating coronaviruses

The autocatalytic release of a putative RNA virus transcription factor from its polyprotein precursor involves two paralogous papain-like proteases that cleave the same peptide bond

Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs

Mechanisms and enzymes involved in SARS coronavirus genome expression

Identification of the catalytic sites of a papain-like cysteine proteinase of murine coronavirus

Identification of mouse hepatitis virus papain-like proteinase 2 activity

Identification of a novel cleavage activity of the first papain-like proteinase domain encoded by open reading frame 1a of the coronavirus Avian infectious bronchitis virus and characterization of the cleavage products

A zinc finger-containing papain-like protease couples subgenomic mRNA synthesis to genome translation in a positive-stranded RNA virus

Sequence requirements for RNA strand transfer during nidovirus discontinuous subgenomic RNA synthesis

Coronavirus subgenomic minus-strand RNAs and the potential for mRNA replicons

Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus

Purification, cloning, and characterization of XendoU, a novel endoribonuclease involved in processing of intron-encoded small nucleolar RNAs in Xenopus laevis

Exoribonuclease superfamilies: structural analysis and phylogenetic distribution

RNA methylation under heat shock control

A biochemical genomics approach for identifying genes by the activity of their products

Characterization of the Saccharomyces cerevisiae cyclic nucleotide phosphodiesterase involved in the metabolism of ADP-ribose 1'',2''-cyclic phosphate

Putative papain-related thiol proteases of positive-strand RNA viruses. Identification of rubi- and aphthovirus proteases and delineation of a novel conserved domain associated with proteases of rubi-, alpha- and coronaviruses

Equine arteritis virus is not a togavirus but belongs to the coronavirus-like superfamily

Identification and stability of a 30-kDa nonstructural protein encoded by mRNA 2 of mouse hepatitis virus in infected cells

Comparison of the genome organization of toro- and coronaviruses: evidence for two nonhomologous RNA recombination events during Berne virus evolution

A conserved 3'-5' exonuclease active site in prokaryotic and eukaryotic DNA polymerases

An RNA cap (nucleoside-2'-O-)-methyltransferase in the flavivirus RNA polymerase NS5: crystal structure and functional characterization

Structure of Escherichia coli exonuclease I suggests how processivity is achieved

Structures of normal single-stranded DNA and deoxyribo-3'-S-phosphorothiolates bound to the 3'-5' exonucleolytic active site of DNA polymerase I from Escherichia coli

New insights into the mechanisms of RNA recombination

The broad-spectrum antiviral ribonucleoside ribavirin is an RNA virus mutagen

Molecular phylogenetics of the RrmJ/fibrillarin superfamily of ribose 2'-O-methyltransferases

tRNA splicing

Biogenesis of small nucleolar ribonucleoproteins

Functions of the exosome in rRNA, snoRNA and snRNA synthesis

Crystal structure of a fibrillarin homologue from Methanococcus jannaschii, a hyperthermophile, at 1.6 Å resolution

tRNA splicing in yeast and wheat germ. A cyclic phosphodiesterase implicated in the metabolism of ADP-ribose 1'',2''-cyclic phosphate

CDD: a database of conserved domain alignments with links to domain three-dimensional structure

Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins

The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools

DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment

Position-based sequence weights

GeneDoc: analysis and visualization of genetic variation

Hidden Markov models

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Evolution of domain families

Combining evidence using p-values: application to sequence homology searches

The neighbor-joining method: a new method for reconstructing phylogenetic trees

TreeView: an application to display phylogenetic trees on personal computers

Leader-mRNA junction sequences are unique for each subgenomic mRNA species in the bovine coronavirus and remain so throughout persistent infection

Bovine coronavirus I protein synthesis follows ribosomal scanning on the bicistronic N mRNA

Internal entry of ribosomes on a tricistronic mRNA encoded by infectious bronchitis virus

We acknowledge the work of many colleagues in the nidovirus field whom we were unable to cite due to space limitations.