key: cord-0004945-cljnoklc authors: Britton, P.; Otin, C. Lopez; Alonso, J. M. Martin; Parra, F. title: Sequence of the coding regions from the 3.0 kb and 3.9 kb mRNA: Subgenomic species from a virulent isolate of transmissible gastroenteritis virus date: 1989 journal: Arch Virol DOI: 10.1007/bf01311354 sha: fde1a2a7b325186d33517b06a244c3213d86d072 doc_id: 4945 cord_uid: cljnoklc
Subgenomic mRNA from a virulent isolate of porcine transmissible gastroenteritis virus (TGEV) was used to produce cDNA clones covering the genome region from the 3′ end of the peplomer gene to the start of the integral membrane protein gene. The nucleotide sequence of this region was determined using clone pTG11 and a previously reported cDNA clone, pTG22. Three open reading frames (ORFs) were identified, encoding putative polypeptides of relative molecular mass (Mr) 6,600, 27,600 and 9,200. The sequence encoding the Mr 9,200 polypeptide was found on the "unique" 5′ region of the 3.0 kb mRNA species, whereas the other two ORFs mapped to the 3.9 kb mRNA species. Differences between the ORFs of this strain and those of a previously reported avirulent strain of TGEV were examined.
TGEV belongs to the family Coronaviridae, a large group of pleomorphic enveloped viruses with a positive-stranded RNA genome, and causes gastroenteritis in pigs, with high mortality in neonates. Coronavirus proteins are expressed from a 'nested' set of subgenomic mRNAs with common 3' termini but different 5' extensions; these extensions are translated to produce the viral proteins whose genes are absent from the smaller mRNA species. Mouse hepatitis virus (MHV) and infectious bronchitis virus (IBV) mRNA species contain short non-coding sequences at their 5' ends, which appear to be joined to the regions encoding the viral genes by discontinuous transcription.
A consensus sequence identified upstream of each gene/ORF may act as a binding site for the RNA polymerase-leader complex [11, 12, 25, 36, 40]. It has been postulated that a heptameric sequence, ACTAAAC [9, 10], or a hexameric sequence, CTAAAC [20, 33, 34], may be involved in binding of the TGEV RNA polymerase-leader complex. The TGEV virion contains three major structural polypeptides: a surface glycoprotein (the spike or peplomer protein) with a monomeric Mr of 200,000; a glycosylated integral membrane protein observed as a series of polypeptides of Mr 28,000-31,000; and a basic phosphorylated protein (the nucleoprotein) of Mr 47,000 associated with the viral genomic RNA [16]. In addition to the genomic RNA, TGEV-infected cells contain six species of subgenomic viral mRNA [7, 18]. Expression and sequencing studies have shown that the 1.7 kb mRNA species encodes the nucleoprotein gene [7-9, 20, 34] and the 2.6 kb mRNA species encodes the integral membrane protein gene [10, 21, 26, 34]. Sequencing studies have shown that the 8.4 kb mRNA species encodes the peplomer gene [32]. The largest RNA species, of about 25 kb, must encode the RNA-dependent RNA polymerase, as shown for IBV [5]. The 0.7 kb mRNA species [7, 9] contains a single ORF encoding a polypeptide (ORF-4) of Mr 9,000 [9, 20, 34]. Antisera raised against synthetic oligopeptides derived from ORF-4 reacted with a polypeptide of Mr 14,000 in TGEV-infected cells (unpublished result). No product has been assigned to the other mRNA species, of 3.0 kb and 3.9 kb [7, 18], from either infected cells or virions. A polypeptide of Mr 24,000 has been identified by in vitro translation of the 3.9 kb mRNA species (Purdue strain) in rabbit reticulocyte lysate [18]; it was not detected in TGEV-infected cells or in purified virions, and was not immunoprecipitated with anti-virion protein antibodies.
Sequencing studies [34] on the avirulent Purdue strain of TGEV (Purdue-115 [2]) have shown that the 3.9 kb mRNA species has two possible ORFs, X2a and X2b, encoding putative polypeptides of Mr 7,700 and 18,800, and that the 3.0 kb species has an ORF, X1, encoding a polypeptide of Mr 9,200. More than one ORF on the 5' 'unique' region of a particular mRNA has also been described in two other coronaviruses, MHV and IBV. Sequencing studies have shown that mRNA B of the Beaudette strain of IBV has an ORF with the translation potential for a polypeptide of Mr 7,500 [3], and that mRNA D has three ORFs encoding potential polypeptides of Mr 6,700, 7,400 and 12,400 [4]. Similarly, mRNA 4 of the JHM strain of MHV has an ORF encoding a potential polypeptide of Mr 15,200 [37], and mRNA 5 has two ORFs encoding potential polypeptides of Mr 12,400 and 10,200 [38]. It therefore appears that some coronavirus mRNA species may carry and express more than one gene product; however, detecting these products in vivo has proven difficult [3, 27, 38]. Previous sequencing studies of the genome region encoding the TGEV integral membrane protein and nucleoprotein genes have shown a very high degree of homology between the virulent FS772/70 strain [9, 10] and the avirulent Purdue strain [20, 21, 26, 34], with minor changes in their amino acid sequences, most of which are conservative substitutions. We therefore asked how different the genome regions potentially coding for non-structural gene products are in a virulent field isolate of TGEV compared with an avirulent laboratory strain. In this paper we report the cloning and sequencing of the genome region corresponding to the 5' coding regions of the 3.9 and 3.0 kb mRNA species from the FS772/70 strain of TGEV.
TGEV poly(A)-containing mRNA was prepared from LLC-PK1 cells infected with TGEV strain FS772/70 and purified from other RNA species on poly(U) Sepharose as described previously [8, 9]. Standard recombinant DNA methods were used [29], with enzymes purchased from New England Biolabs (CP Laboratories, Bishop's Stortford) unless otherwise stated in the text. DNA fragments were isolated from agarose gels using Geneclean™ (Stratech Scientific Ltd, London). Ligation reactions were carried out as described before [6]. E. coli strain DH1 was used for isolation of TGEV cDNA clones. E. coli transformants were selected on LB plates containing ampicillin (100 µg ml⁻¹). cDNA synthesis was carried out as described before [10], using a synthetic oligonucleotide to prime first-strand synthesis. Transformants containing TGEV cDNA were identified by colony hybridisation, as described before [9], using two [32P]-labelled TGEV cDNA fragments. Specific restriction fragments were subcloned into M13 mp18 and mp19 vectors and sequenced using as primers either the M13 universal primer or five synthetic oligonucleotides from within the TGEV cDNA sequences. The synthetic oligonucleotide sequences were 5'-TTCTAGCTTTGTACCGC-3' (H1A), 5'-GTCATCTATGACAGTCA-3' (H1B), 5'-TGAAAAAGTGCACATCC-3' (TG2B), 5'-TATAGCACTAACAACCTGAT-3' (Oligo 8) and 5'-CTAAGTAGGCGAATCTTAAA-3' (Oligo 15); their positions and directions are shown in Fig. 2. All oligonucleotides were synthesised by the phosphoramidite method on an Applied Biosystems model 381A synthesizer. DNA sequencing was carried out with [α-35S]-dATP by the dideoxy method [35], from single-stranded DNA templates or directly from plasmid DNA as described [30] but using the Sequenase™ protocol. Specific restriction fragments from TGEV cDNA clones were separated by agarose gel electrophoresis, purified from the gel using a Geneclean™ kit and labelled as described before [9].
TGEV subgenomic mRNA species were isolated from TGEV-infected LLC-PK1 cells 8 h post-infection, purified, denatured with 6 M glyoxal, electrophoresed in either 0.7% or 1% agarose gels, northern blotted onto Biodyne membranes and hybridised to the labelled cDNA fragments as described before [9]. The probes were hybridised in the presence of 50% formamide at 42 °C for 16 h. The membranes were washed three times in ×2 SSC containing 0.1% SDS at room temperature, twice in ×1 SSC containing 0.1% SDS at 68 °C and once in ×0.
Cloning from TGEV mRNA species
TGEV poly(A)-containing mRNA species were isolated from virus-infected LLC-PK1 cells and used for the synthesis of cDNA. Plasmid pTG22 was produced as described before [10], and plasmid pTG11 was produced in the same way. Plasmids pTG22 and pTG11 were identified by their ability to hybridise to two TGEV cDNA fragments: H11 (1.5 kbp), which originated from within the peplomer gene, and H26 (0.6 kbp), which originated from within the 5' coding region of the 3.9 kb mRNA species (unpublished result). pTG22 hybridised only with H26, whereas pTG11 hybridised with both H26 and H11, indicating that pTG11 extended further into the peplomer gene than pTG22. The cDNA in plasmid pTG11 is 2.98 kbp long; its restriction map showed that it overlaps pTG22 (Fig. 1) and extends 5.3 kb from the 3' end into the TGEV genome. From their sizes and positions along the TGEV genome, it was deduced that the cDNAs in plasmids pTG11 and pTG22 contained sequences from the 5' ends of the 3.0 kb and 3.9 kb mRNA species and part of the 3' end of the peplomer gene (Fig. 1). The strategy for sequencing the TGEV genome region from the 3' end of the peplomer gene to the 5' end of the integral membrane protein gene is summarised in Fig. 1, in which the arrows show the regions and direction of sequencing from the M13 clones. The relevant part of the cDNA was thus sequenced in both directions.
Several independent subclones from pTG11, and corresponding ones from pTG22, were sequenced, and no differences were found between their cDNA sequences.
[Fig. 2: Nucleotide sequence of the region from the 3' end of the peplomer gene to the start of the integral membrane protein gene, with the deduced amino acid sequences of ORF-1, ORF-2 and ORF-3 and the positions of Oligo 8 and Oligo 15]
The resulting nucleotide sequence was translated in all six reading frames; the virus-sense strand revealed three ORFs of 186 bp, 732 bp and 243 bp.
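The six-frame scan described above can be sketched as follows. This is an illustration, not the authors' software (which is not named for this step): it assumes the standard genetic code, defines an ORF as an ATG followed in frame by a stop codon (keeping only the first ATG after each stop), and uses a hypothetical `min_codons` length cut-off.

```python
# Standard genetic code built from the TCAG-ordered table; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=50):
    """Scan all six frames for ATG...stop ORFs.

    Returns (strand, frame, start, end, protein) tuples. Coordinates are
    1-based and inclusive of the stop codon; on the '-' strand they refer
    to positions in the reverse-complemented string. The protein excludes
    the stop codon. Unterminated reading frames are not reported.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    if len(protein) >= min_codons:
                        results.append((strand, frame, start + 1, i + 3, protein))
                    start = None
    return results


print(find_orfs("CCATGGCTGCTGCTTAAGG", min_codons=1))  # [('+', 2, 3, 17, 'MAAA')]
```

Keeping only the first ATG after each stop reports the longest candidate per stop codon, which is how ORF lengths are usually quoted; internal ATGs would need a separate pass.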
The corresponding DNA sequence, from 148 bp upstream of the first ORF to the start of the TGEV integral membrane protein gene, present in pTG22 [10] and pTG11, is shown in Fig. 2. The three ORFs are designated ORF-1, of 62 amino acids, between nucleotides 149 and 334; ORF-2, of 244 amino acids, between nucleotides 429 and 1160; and ORF-3, of 81 amino acids, between nucleotides 1150 and 1392. One ORF of 52 amino acids (Mr 5,802) was identified in the complementary strand, between nucleotides 1,125 and 1,280; it was not preceded by the potential RNA polymerase-leader complex binding site (ACTAAAC), though it does have the sequence CTAAAT 11 bases upstream of the ATG. ORF-1 (186 bp), initiating from the ATG at position 149, has the coding capacity for a putative polypeptide of Mr 6,670. ORF-2 (732 bp), initiating from the ATG at position 429, has the coding capacity for a putative polypeptide of Mr 27,624. ORF-3 (243 bp), initiating from the ATG at position 1,150, has the coding capacity for a putative polypeptide of Mr 9,211. ORF-3 overlaps the 3' end of ORF-2 by 12 nucleotides, representing 4 amino acids. From their lengths and their positions relative to the start of the poly(A) tail, ORF-1 and ORF-2 mapped within the 5' coding region of the 3.9 kb mRNA species, and the 5' end of ORF-3 mapped within the 5' coding region of the 3.0 kb mRNA species (Fig. 1). The nucleotide sequence (Fig. 2) revealed the heptameric sequence ACTAAAC 23 bp upstream of the ATG of ORF-1; this sequence is also found 5' of the ATGs of other TGEV genes (Table 1). Although no ACTAAAC sequence was found 5' of the ATG of ORF-2, the similar sequences AGTAAAC, ACAAAAC and CTAAAT are present 82 bp, 49 bp and 11 bp upstream of the ATG, respectively (Table 1). The sequence context about the ATG of ORF-1 is not favourable for initiation by eukaryotic ribosomes (consensus (CC)ACCATGG [23, 24]), owing to the thymidine residue at position -3, though the guanosine residue at +4 may improve its efficiency.
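The mapping argument above rests on the nested-set structure described in the introduction: every subgenomic mRNA runs to the common 3' end, so a sequence lying a given distance from the 3' end falls in the 'unique' 5' region of the smallest mRNA long enough to contain it. A minimal sketch, assuming the six subgenomic sizes quoted earlier (0.7, 1.7, 2.6, 3.0, 3.9 and 8.4 kb) and ignoring the leader and poly(A) lengths; the function names and the example distance are hypothetical.

```python
import bisect

# Subgenomic mRNA sizes (kb) quoted in the text; leader and poly(A) ignored.
TGEV_MRNA_SIZES_KB = (0.7, 1.7, 2.6, 3.0, 3.9, 8.4)


def assign_to_mrna(distance_kb, sizes=TGEV_MRNA_SIZES_KB):
    """Return the smallest mRNA size reaching `distance_kb` from the 3' end.

    In a 3'-coterminal nested set this is the species whose 5' 'unique'
    region contains the feature. Returns None beyond the largest species.
    """
    ordered = sorted(sizes)
    idx = bisect.bisect_left(ordered, distance_kb)  # first size >= distance
    return ordered[idx] if idx < len(ordered) else None


def unique_region(size_kb, sizes=TGEV_MRNA_SIZES_KB):
    """(start, end) in kb from the 3' end of the region present in the
    `size_kb` species but absent from every smaller species."""
    smaller = [s for s in sizes if s < size_kb]
    return (max(smaller) if smaller else 0.0, size_kb)


print(unique_region(3.9))   # (3.0, 3.9)
print(assign_to_mrna(3.5))  # 3.9
```

Because the leader adds a short sequence to every species, real boundaries are slightly fuzzy; a feature close to a boundary is better assigned by hybridisation, as in the northern blots that follow.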
The sequence context for ORF-2, (CG)AAAATGA (Table 1), is favourable for initiation. The sequence context for ORF-3, (TA)CCTATGA (Table 1), is also not very favourable, owing to the cytosine at position -3, though the efficiency may be improved by the adenosine at position +4. The sequence context of the nucleoprotein gene, (TC)TAAATGG (Table 1) [9], also appears not to be very favourable, yet this gene is expressed efficiently in virus-infected cells, as confirmed by expression studies in yeast [9] and vaccinia virus (unpubl. observation). There is very little difference in codon usage between ORFs 1-3 and the nucleoprotein, integral membrane protein and ORF-4 genes. The codons GCG (alanine), TCG (serine), CGA (arginine), CTG (leucine) and ATC (isoleucine) are very rarely used in any of these genes, suggesting that ORFs 1-3 encode genetic information and have not arisen by random incorporation of nucleotides.
TGEV mRNA species were northern blotted onto Biodyne membranes as described in Materials and methods, and strips of membrane containing the separated TGEV mRNA species were probed with various purified cDNA fragments. As Fig. 3 shows, the fragment from ORF-1 hybridised with the 3.9 kb mRNA species, indicating that ORF-1 originates within the 5' coding region of the 3.9 kb species. The fragment corresponding to ORF-2 also hybridised with the 3.9 kb mRNA, indicating that ORF-2 is likewise contained within the 5' coding region of the 3.9 kb mRNA species. No mRNA species of 3.7 kb, the theoretical size of an mRNA species for ORF-2, was detected in the infected cells under the conditions used. The fragment corresponding to ORF-3 hybridised with the 3.0 kb mRNA species, indicating that ORF-3 originates within the 5' coding region of this mRNA species. As Fig. 3 also shows, sequences from other TGEV genes are found within the 5' coding regions of their associated mRNA species.
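Codon-usage comparisons like the one above start from per-gene codon counts. A minimal sketch follows; the rare-codon set is the one named in the text, while pooling those codons into a single "rare fraction" per gene is an illustrative assumption rather than the authors' procedure. Input is assumed to be an in-frame coding sequence.

```python
from collections import Counter

# Codons the text reports as very rarely used in TGEV genes.
RARE_CODONS = {"GCG": "Ala", "TCG": "Ser", "CGA": "Arg", "CTG": "Leu", "ATC": "Ile"}


def codon_counts(orf):
    """Count codons in an in-frame coding sequence."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return Counter(orf[i:i + 3] for i in range(0, len(orf), 3))


def rare_codon_fraction(orf, rare=RARE_CODONS):
    """Fraction of all codons in `orf` that belong to the `rare` set."""
    counts = codon_counts(orf)
    total = sum(counts.values())
    return sum(counts[c] for c in rare) / total if total else 0.0


print(rare_codon_fraction("ATGGCGGCTTAA"))  # 0.25 (one GCG in four codons)
```

Computing the same fraction for ORFs 1-3 and for the established TGEV genes gives a simple numerical form of the comparison; a full analysis would compare the whole 64-codon distributions.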
None of the probes corresponding to the ORF-4 or nucleoprotein genes (Fig. 3, track 4) or the integral membrane protein gene (Fig. 3, track 5) detected an mRNA species of 3.7 kb that would correspond to a 'unique' mRNA species for ORF-2.
The mRNA from a virulent British field isolate of TGEV was used to produce cDNA clones representing the 5' coding regions of the 3.0 kb and 3.9 kb subgenomic mRNA species. Two independently isolated cDNA clones were sequenced, allowing the identification of three ORFs in the viral-sense strand: ORF-1 (186 bp), ORF-2 (732 bp) and ORF-3 (243 bp) (Figs. 1 and 2). The initiation codon of ORF-1 was preceded by the heptameric sequence ACTAAAC, previously described as preceding the TGEV nucleoprotein, integral membrane protein and ORF-4 genes. ORF-2 is not preceded by the ACTAAAC sequence, though similar sequences are present, and it does not appear to be expressed from a separate mRNA species in infected tissue culture cells. This may indicate that both ORFs are expressed from the same message, or that only one of them is translated; whether this occurs at the same or at different times in the virus life-cycle is not known. It cannot be ruled out that a new message is synthesised at a precise time in the virus life-cycle, or that such a message is produced only in infected animals and not in tissue culture cells. ORF-3 is preceded by the hexameric sequence CTAAAC, indicating that this hexameric sequence is all that is required for recognition by the TGEV RNA polymerase-leader complex. None of the three ORFs has a predicted N-terminal signal sequence by the weight-matrix method [41], indicating that the polypeptides, if synthesised, may reside in the cell cytoplasm rather than being associated with the viral envelope or infected-cell membranes. The lack of an N-terminal signal sequence does not rule out an internal signal sequence, as found in the integral membrane proteins of IBV and MHV.
The predicted ORF-1 product has an overall charge of -1 and contains about 46% hydrophobic residues. The N-terminal end of the polypeptide appears to be acidic, owing to four aspartic acid and one glutamic acid residues between amino acids 12 and 19. The carboxyl terminus is slightly basic, owing to one arginine and two lysine residues between amino acids 50 and 56. There is one potential N-linked glycosylation site, at residue 8 (Fig. 2), which could increase the molecular weight of the gene product to Mr 9,000, assuming that a single N-linked glycan adds about 2,000 to the molecular weight of a polypeptide [22]. The ORF-2 product has an overall charge of +4 and contains 46% hydrophobic residues. The N-terminus appears to be hydrophobic, whereas the carboxyl terminus appears to be hydrophilic and acidic, with four glutamic acid (three consecutive), three aspartic acid and two basic residues between amino acids 22%244. There are three potential N-linked glycosylation sites, at residues 17, 22 and 132 (Fig. 2), which could increase the molecular weight of the gene product to Mr 33,600. The ORF-3 product has an overall charge of +4 and contains 57% hydrophobic residues. The polypeptide appears to be very hydrophobic, with 14.8% of its residues being leucine and 16% isoleucine. This high proportion of leucine and isoleucine resembles ORF-4, in which 34.6% of the residues are leucine and 5.1% isoleucine [9]. The ORF-3 product does not contain any potential N-linked glycosylation sites (Fig. 2).
mRNAs with coding capacity for more than one ORF have been found in other coronaviruses. IBV mRNA D contains three potential ORFs: D1, encoding a polypeptide of Mr 6,700; D2, encoding a polypeptide of Mr 7,400; and D3, encoding a polypeptide of Mr 12,400. Chimaeric proteins consisting of the ORFs fused to the E. coli lacZ gene have been produced [39], and antisera raised against them were used to immunoprecipitate proteins from virus-infected cells. A polypeptide corresponding to the size of D3 was immunoprecipitated from IBV-infected cells with antisera to the D3 chimaera, and there was some evidence that the D2 product may also be synthesised. DNA containing either D3 or D2 with D3 was cloned into pSP64, and SP6 polymerase was used to generate RNA, which was then translated in vitro. The D2 and D3 products were produced from the DNA containing both genes in the wheat germ translation system, but expression of D2 in the rabbit reticulocyte lysate system was very poor. The DNA containing D3 alone was expressed in both systems. The D2 ORF, like ORF-1 of TGEV, has a pyrimidine at position -3 from the initiation codon, whereas D3 and TGEV ORF-2 have a purine at -3, making expression more favourable from the second ORF of TGEV and the third ORF of IBV. D1 also has a pyrimidine at position -3, and there was no evidence that D1 was produced in vivo. MHV mRNA 5 contains two potential ORFs coding for polypeptides of Mr 13,000 and 9,600, though the sizes of the products vary between the two strains of MHV that have been sequenced. The two ORFs from the A59 strain of MHV have been cloned into pGEM vectors and the resulting RNA translated in the wheat germ system [13]. Polypeptides of the correct size were synthesised, but no products were detected from in vitro translation of isolated viral poly(A)-containing mRNA. The second ORF, encoding the Mr 9,600 polypeptide, was fused to the 5' end of the lacZ gene and the resulting chimaeric protein used to raise antibodies [27], which immunoprecipitated a polypeptide of Mr 9,600 in MHV-infected cells. The first ORF on mRNA 5 (Mr 13,000 polypeptide) has a pyrimidine at position -3 from the initiation codon, whereas the second ORF (Mr 9,600 polypeptide) has a purine at -3.
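The residue-level features used above (net charge, potential N-linked glycosylation sites, residue percentages and the molecular-weight adjustment of about 2,000 per glycan [22]) can be computed along the following lines. This is a sketch under stated assumptions: net charge counts only Lys/Arg (+1) and Asp/Glu (-1), ignoring His and the termini, and potential sites are taken as the usual N-X-S/T motif with X not proline. The paper does not say which rules produced its own figures, so results need not match them exactly.

```python
import re


def net_charge(protein):
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (His and termini ignored)."""
    p = protein.upper()
    return (p.count("K") + p.count("R")) - (p.count("D") + p.count("E"))


def sequons(protein):
    """1-based positions of Asn in N-X-S/T motifs where X is not Pro.

    The lookahead lets overlapping motifs (e.g. 'NNST') all be reported.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]


def percent_residue(protein, residue):
    """Percentage of positions occupied by a single amino acid."""
    return 100.0 * protein.upper().count(residue) / len(protein)


def glycosylated_mr(mr, n_sites, per_glycan=2000):
    """Apparent Mr if every potential site carries one glycan of `per_glycan`."""
    return mr + n_sites * per_glycan


orf1_start = "MDIVKSINTSVDAVLDELDCAYFAVTLKVEFYV"
print(sequons(orf1_start))        # [8]
print(glycosylated_mr(27624, 3))  # 33624
```

Applied to the first 33 residues of ORF-1 that survive in the extracted text of Fig. 2, `sequons` returns position 8, matching the site described above, and `glycosylated_mr(27624, 3)` reproduces the Mr 33,600 estimate for ORF-2.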
Thus in IBV and MHV, where an mRNA species contains more than one ORF, expression appears to occur from the ORF with the more favourable initiation context. If TGEV is analogous, ORF-2 is the more likely to be expressed, though the possibility that the other ORFs are expressed in much lower amounts cannot be ruled out. The mRNA 4 species of MHV contains a single ORF, encoding a polypeptide of Mr 15,200, which has been fused to the lacZ gene and the resulting chimaeric protein used to raise antibodies [14, 15]. These antibodies immunoprecipitated a polypeptide of Mr 15,000 in MHV-infected cells and appeared to react with a protein in the cell cytoplasm by indirect immunofluorescence. The sequence context of this ORF is favourable, though the putative polymerase-leader complex binding site is 53 nucleotides upstream of the ATG, about six times the distance found for other MHV genes. The sequence context of TGEV ORF-3 is not favourable (Table 1), though the distance between the polymerase-leader complex binding site and the ATG is not unusually long, possibly indicating some control over expression.
Comparison of the sequences between the end of the peplomer gene and the start of the integral membrane protein gene of the FS772/70 strain of TGEV reported in this paper with those of the Purdue strain [34] shows several deletions and insertions within the cDNA sequences. Nucleotides 45-53 (Fig. 2) [...] (Fig. 2) from G in the Purdue strain to T in the FS772/70 strain leads to earlier termination of the ORF in the FS772/70 strain. An insertion at position 342 and the base substitution at nucleotide 335 in the Purdue strain increase the size of ORF-1 (X2a in the Purdue strain) by 9 amino acids (Fig. 2). The three-base insertion at position 1312 adds an isoleucine residue at amino acid position 55 in ORF-3 (X1 in the Purdue strain; Fig. 2).
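The rule invoked above (a purine at -3 and, secondarily, a G at +4, counting the A of the ATG as +1 [23, 24]) can be applied mechanically. A minimal sketch, assuming that -3 is weighted above +4 when ranking and that the first ATG in each string is the initiator; parenthesised bases follow the notation of Table 1 and are simply stripped. The context strings are those quoted in the text for ORF-2, ORF-3 and the nucleoprotein gene; the function names are hypothetical.

```python
PURINES = {"A", "G"}


def kozak_features(context):
    """Bases at -3 and +4 relative to the first ATG (A of ATG = +1)."""
    seq = context.upper().replace("(", "").replace(")", "").replace(" ", "")
    atg = seq.find("ATG")
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need 3 bases upstream and 1 base downstream of ATG")
    minus3, plus4 = seq[atg - 3], seq[atg + 3]
    return {"-3": minus3, "+4": plus4,
            "purine_at_-3": minus3 in PURINES, "G_at_+4": plus4 == "G"}


def rank_by_context(contexts):
    """Sort named contexts from most to least favourable (-3 outranks +4)."""
    def key(item):
        f = kozak_features(item[1])
        return (f["purine_at_-3"], f["G_at_+4"])
    return sorted(contexts.items(), key=key, reverse=True)


table1 = {"ORF-3": "(TA)CCTATGA", "ORF-2": "(CG)AAAATGA",
          "nucleoprotein": "(TC)TAAATGG"}
print([name for name, _ in rank_by_context(table1)])
# ['ORF-2', 'nucleoprotein', 'ORF-3']
```

The ranking puts ORF-2 first, in line with the argument above; as the nucleoprotein gene shows, a weak context does not by itself prevent efficient expression, so the ranking is only a heuristic.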
A change from T in the Purdue strain to G in FS772/70 at nucleotide 431 creates the ATG initiation codon of ORF-2. This results in a polypeptide of 244 amino acids for ORF-2, compared with X2b of the Purdue strain, which has 165 amino acids and initiates at nucleotide 666 (Fig. 2). Both independently isolated clones from the FS772/70 strain, pTG11 and pTG22, showed the same sequence, and neither had the 13-base deletion described in some of the Purdue cDNA clones [34], which led to truncation of the X2b product. The homology between the amino acid sequences of the remaining gene products of the two TGEV strains is very high. There are only 3, 4 and 5 amino acid substitutions in ORFs 1-3, respectively, similar to the results found for the nucleoprotein, integral membrane protein and ORF-4 gene products of these strains, indicating very little change between the two viral genomes. There is very little, if any, sequence homology between TGEV ORFs 1-3 and the ORFs from mRNAs B and D of IBV or mRNAs 4 and 5 of MHV, as assessed with the SEQHPE program from the Los Alamos package [19]. However, IBV D3, the ORF from MHV mRNA 4 and TGEV ORF-3 all have hydrophobic N-termini and hydrophilic C-termini and are of similar length, suggesting that they may have a similar, as yet unknown, function. ORF-B from MHV mRNA 5 and TGEV ORF-1 have very similar overall charges and hydrophobic N-termini, suggesting some similarity in function. Neither MHV nor IBV has a protein of similar size to TGEV ORF-2, indicating that this gene product may be unique to TGEV. It will be interesting to examine the sequences of other coronaviruses of the TGEV subgroup for the presence of ORF-2 when they become available.
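The ORF sizes quoted in this paper follow directly from the 1-based, inclusive coordinates of Fig. 2, and the ORF-2/X2b comparison is the same arithmetic. A sketch, assuming that X2b ends at the same stop codon as ORF-2 (the 165-codon result is consistent with this); codon counts are length divided by 3, which is how the amino-acid numbers in the text match the nucleotide lengths. Helper names are hypothetical.

```python
# 1-based, inclusive coordinates from Fig. 2 (FS772/70).
ORFS = {"ORF-1": (149, 334), "ORF-2": (429, 1160), "ORF-3": (1150, 1392)}


def orf_length(start, end):
    """Length in nucleotides of an inclusive [start, end] interval."""
    return end - start + 1


def codons(start, end):
    """Number of codons in an ORF; rejects partial codons."""
    n = orf_length(start, end)
    if n % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return n // 3


def codon_position(pos, orf_start):
    """Position (1, 2 or 3) of nucleotide `pos` within its codon."""
    return (pos - orf_start) % 3 + 1


print({name: codons(*span) for name, span in ORFS.items()})
# {'ORF-1': 62, 'ORF-2': 244, 'ORF-3': 81}
x2b = codons(666, 1160)       # Purdue X2b, assumed to share ORF-2's stop
print(x2b, 244 - x2b)         # 165 79
print(codon_position(431, 429))  # 3: the T->G change completes the ATG
```

The 79-codon difference is an exact in-frame N-terminal extension, since (666 - 429) is divisible by 3.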
Comparison of TGEV ORFs 1-3 with the PIR protein database, using the FASTA program [31], and with the Leeds University protein database, using the SWEEP program, showed no significant homology with any of the proteins in these databases. No homology was found by comparing the nucleotide sequences of ORFs 1-3 against the EMBL [17] or GenBank [1] nucleic acid databases using the FASTN program [28]. No homology was found by screening the amino acid sequences of ORFs 1-3 against the translated sequences of all nucleic acid entries in GenBank using the TFASTA program [31]. It is instructive to compare the sequences of avirulent and virulent viruses in order to identify regions that may be involved in pathogenicity and immunogenicity. The evidence presented in this paper indicates that there is very little homology between potential non-structural genes from different coronaviruses. There appears to be a significant difference between the sequence of the ORF-2 gene from the virulent strain described in this paper and that of the avirulent Purdue strain of TGEV published previously [34]. However, a polypeptide of Mr 24,000 has been detected by in vitro translation of the 3.4 kb TGEV mRNA species from the Purdue strain in the rabbit reticulocyte lysate system [18]. Since the nucleotide sequence of this region has not been published for the isolate of the Purdue strain used for in vitro translation, no conclusions can be reached about variation among isolates of the Purdue strain. Experiments are under way to identify the products of ORFs 1-3 in virus-infected cells.
[1] The GenBank genetic sequence database
[2] Antibody responses in serum, colostrum, and milk of swine after infection or vaccination with transmissible gastroenteritis virus
[3] Sequencing of coronavirus IBV genomic RNA: a 195-base open reading frame encoded by mRNA B
[4] Sequencing of coronavirus IBV genomic RNA: three open reading frames in the 5' 'unique' region of mRNA D
[5] Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
[6] Location and direction of transcription of the ptsH and ptsI genes on the Escherichia coli K12 genome
[7] Towards a genetically-engineered vaccine against porcine transmissible gastroenteritis virus
[8] Expression of porcine transmissible gastroenteritis virus genes in E. coli as β-galactosidase chimaeric proteins
[9] Sequence of the nucleoprotein from a virulent British field isolate of transmissible gastroenteritis virus and its expression in Saccharomyces cerevisiae
[10] The integral membrane protein from a virulent isolate of transmissible gastroenteritis virus: molecular characterization, sequence and expression in Escherichia coli
[11] A leader sequence is present on mRNA A of avian infectious bronchitis virus
[12] Three intergenic regions of coronavirus mouse hepatitis virus strain A59 genome RNA contain a common nucleotide sequence that is homologous to the 3' end of the viral mRNA leader sequence
[13] In vitro synthesis of two polypeptides from a nonstructural gene of coronavirus mouse hepatitis virus strain A59
[14] Identification of the coronavirus MHV-JHM mRNA 4 gene product using fusion protein antisera
[15] Identification of the coronavirus MHV-JHM mRNA 4 product
[16] The polypeptide structure of transmissible gastroenteritis virus
[17] The EMBL data library
[18] Characterization and translation of transmissible gastroenteritis virus mRNAs
[19] Los Alamos sequence analysis package for nucleic acids and proteins
[20] Sequence analysis of the porcine transmissible gastroenteritis coronavirus nucleocapsid protein gene
[21] Nucleotide sequence of the porcine transmissible gastroenteritis coronavirus matrix protein
[22] Cotranslational and posttranslational processing of viral glycoproteins
[23] Comparison of initiation of protein synthesis in prokaryotes, eukaryotes and organelles
[24] Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes
[25] Characterization of leader RNA sequences on the virion and mRNAs of mouse hepatitis virus, a cytoplasmic RNA virus
[26] Sequence and N-terminal processing of the transmembrane protein E1 of the coronavirus transmissible gastroenteritis virus
[27] Detection of a murine coronavirus nonstructural protein encoded in a downstream open reading frame
[28] Rapid and sensitive protein similarity searches
[29] Molecular cloning: a laboratory manual
[30] Speeding-up the sequencing of double-stranded DNA
[31] Improved tools for biological sequence comparison
[32] The predicted structure of the peplomer protein E2 of the porcine coronavirus transmissible gastroenteritis virus
[33] Surface glycoproteins of transmissible gastroenteritis virus: functions and gene sequence
[34] Enteric coronavirus TGEV: partial sequence of the genomic RNA, its organisation and expression
[35] DNA sequencing with chain-terminating inhibitors
[36] The 5'-end sequence of the murine coronavirus genome: implications for multiple fusion sites in leader-primed transcription
[37] Coding sequence of coronavirus MHV-JHM mRNA
[38] Coronavirus MHV-JHM mRNA 5 has a sequence arrangement which potentially allows translation of a second, downstream open reading frame
[39] Identification of a new gene product encoded by mRNA D of infectious bronchitis virus
[40] Coronavirus mRNA synthesis involves fusion of non-contiguous sequences
[41] A new method for predicting signal sequence cleavage sites
Authors' address: Dr. P. Britton, Division of Microbiology, A.F.R.C. Institute for Animal Health, Compton Laboratory, Compton, Newbury, Berks. RG16 0NN, U.K.
Received February 9, 1989