key: cord-259398-s8qsjkj2 authors: Chouljenko, Vladimir N.; Kousoulas, Konstantin G.; Lin, Xiaoqing; Storz, Johannes title: Nucleotide and Predicted Amino Acid Sequences of All Genes Encoded by the 3′ Genomic Portion (9.5 kb) of Respiratory Bovine Coronaviruses and Comparisons Among Respiratory and Enteric Coronaviruses date: 1998 journal: Virus Genes DOI: 10.1023/a:1008048916808 sha: doc_id: 259398 cord_uid: s8qsjkj2

The 3′-ends of the genomes (9538 bp) of two wild-type respiratory bovine coronavirus (RBCV) isolates, LSU and OK, were obtained by cDNA sequencing. In addition, the 3′-end of the genome (9545) of the wild-type enteric bovine coronavirus (EBCV) strain LY-138 was assembled from available sequences and by cDNA sequencing of unknown genomic regions. Comparative analyses of RBCV and EBCV nucleotide and deduced amino acid sequences revealed that RBCV-specific nucleotide and amino acid differences were disproportionally concentrated within the S gene and the genomic region between the S and E genes. Comparisons among virulent and avirulent BCV strains revealed that virulence-specific nucleotide and amino acid changes were located within the S and E genes, and the 32 kDa open reading frame.

Coronaviruses are important etiological agents of human and animal diseases including respiratory infection, gastroenteritis, hepatic and neurological disorders as well as immune-mediated disease such as feline infectious peritonitis, and other persistent infections (1, 2). Enteric bovine coronaviruses (EBCV) are generally associated with enteric disease of newborn calves and winter dysentery of adult cattle (2). Recently, numerous respiratory bovine coronaviruses (RBCV) were isolated in our laboratory from cattle arriving with fever and respiratory disease in feedlots or livestock shows of 8 different states in the USA. The cytopathogenic, cell fusion, and other phenotypic properties of these viruses were different from the known EBCV (3).
Coronaviruses contain a single-stranded, capped, and polyadenylated positive-sense (infectious) RNA molecule of approximately 30 kb length, which directs the synthesis of a nested set of subgenomic mRNAs (4, 5). The 3′-end of the genomic RNA consists of approximately 9.5 kb and contains the spike (S) glycoprotein, the hemagglutinin-esterase (HE) glycoprotein, the integral membrane (M) protein, the small membrane protein (E) and the phosphorylated nucleocapsid (N) protein and a number of ORFs potentially encoding non-structural proteins (Ns) (5). The 32 kDa non-structural protein is a phosphoprotein that accumulates in the cytoplasm of infected cells (6, 7). It is not known whether the 12.7 and 4.8 kDa ORFs are expressed in infected cells, while the 4.9 kDa putative protein, most likely, is not translated (8).

BCV uses N-acetyl-9-O-acetylneuraminic acid as receptor determinant to initiate infection (9). Although the HE glycoprotein also has an affinity for 9-O-acetylated sialic acid, the S glycoprotein was identified as the major sialic acid binding protein of BCV (10). The S glycoprotein facilitates viral attachment to susceptible cells, causes cell fusion after cell-surface expression (fusion from within), and induces viral infectivity neutralizing antibodies (1). Porcine transmissible gastroenteritis virus (TGEV) strains were isolated which exhibited respiratory tissue tropism. These viruses contained point mutations or deletions within the first 250 aa of TGEV S1 which were associated with reduced enteropathogenicity and loss of hemagglutinating activity (11–13).

To examine the genetic basis for the phenotypic differences between RBCV and EBCV, we cloned and sequenced the 3′-end of the viral genomes of two virus strains, RBCV-LSU-94LSS-051-2 (LSU) and RBCV-OK-0514-3 (OK), that originated from Louisiana and Oklahoma cattle, respectively.
We report here the nucleotide and predicted amino acid sequences of all genes encoded by the 3′ genomic portion (9.5 kb) of two wild-type RBCV strains and comparisons among respiratory and enteric coronaviruses.

Viruses and cell line. All RBCV and EBCV strains were propagated in the G clone of human rectal tumor cells (HRT-18G) developed recently through selection and medium modulation (3). Supernatant fluids from infected HRT-18G cells were collected and viruses were purified as described (14). RBCV OK and LSU virus stocks were tested at the third and fourth passages, respectively. EBCV LY-138 (LY) virus stocks were prepared at the second passage, while the EBCV-L9 virus strain, derived from the EBCV-Mebus strain, had been propagated 80 times in cell cultures.

Strategy for cDNA construction and assembly of the 9.5 kb cDNA sequence representing the 3′-end of different BCV strains. TRI Reagent from the Molecular Research Center, Inc. (Cincinnati, OH, USA) was used for total RNA extraction. Ready-To-Go You-Prime First-Strand Beads from Pharmacia Biotech Inc. (Uppsala, Sweden) were used for cDNA library construction. All amplifications were performed using the Gene-Amp PCR system 9600 (Perkin-Elmer, Norwalk, CT, USA) with PCR reagents and AmpliTaq from Perkin-Elmer. The TA cloning kit from Invitrogen Inc. (San Diego, CA) was used for cloning of RT-PCR products. Restriction enzymes were obtained from New England Biolabs (Beverly, MA, USA).

The 3′ genomic end of BCV Mebus strain consisting of 8695 nucleotides was assembled from available sequences deposited in GenBank. The accession numbers of the cDNA sequences used to assemble the EBCV Mebus genome were M31053 for S and HE genes, M31054 for the 4.9, 4.8, 12.7 and 9.5 kDa (E) ORFs, and M16620 for the M, N genes and I ORF. The assembled Mebus genomic sequence did not contain the 32 kDa ORF. An 850 nt sequence containing the 32 kDa ORF of BCV-Quebec (accession number X15445) was used for comparisons with other BCV.
The S and HE cDNA sequences specified by LY-138 were previously reported (15, 16). A series of overlapping cDNA clones representing the entire 3′-end of two RBCV isolates and unpublished sequences of LY-138 were constructed. Two cDNA libraries were produced: one was made using the BCV3′ primer representing the 3′ terminus of the genomic RNA, and a second was produced using an oligonucleotide (3B11) to prime cDNAs starting at nucleotide 6345 (counting from the 3′-end of the viral genome) (Fig. 1). The entire 9545 nt sequence representing the 3′-end of the BCV genome was divided into six overlapping cDNA regions. Each cDNA was amplified by PCR using specific primer pairs. Primer pair 5F6/BCV3′ amplified a cDNA fragment containing the M and N genes. Primer pair 5F5/3B3 amplified a cDNA fragment containing the 3′-end of S, 4.9 kDa, 4.8 kDa, 12.7 kDa, E, M and the 5′-end of N genes. Primer pair B5′/B3′ amplified the 3′-end of the spike gene, primer pair 5F24/A3′ amplified a cDNA that coded for the carboxy-terminal portion of the S1 subunit, primer pair A5′/3B11 amplified a cDNA fragment that coded for the amino-terminus of S, and primer pair 5F16/3B10 amplified the 32 kDa and HE genes. B3′ and A5′ primers were designed to contain extra BamHI and EcoRI sites for cloning purposes, while a BstXI site was naturally present in the 5F24 primer. The actual primer sequences are:

DNA sequencing and analyses. DNA sequencing was carried out with the modified dideoxynucleotide chain termination procedure (17)

Overall comparisons of genes and predicted proteins specified by RBCV (LSU, OK) and EBCV (LY, Mebus) strains.
To establish the close evolutionary relationship between LSU and OK strains and to ascertain RBCV-specific amino acid changes (conserved in LSU and OK but different in other strains), a pairwise comparison of nucleotide and amino acid differences among BCV strains was performed for all ORFs except those coding for the RNA-dependent RNA polymerase and the 32 kDa protein (Table 1). In general, the nucleotide and amino acid sequences of RBCV strains LSU and OK were more similar to each other than to those of EBCV strains LY-138 and Mebus, and they were more divergent from the Mebus strain than from LY-138. Specifically, the amino acid sequences of M specified by LSU and OK were identical, and they differed by one and two aa from those of Mebus and LY-138, respectively. The S glycoprotein specified by LSU differed by only 4 amino acids from that of OK, while S glycoproteins of LSU and OK differed by 22 and 33 amino acids in comparison with the LY and Mebus S sequences, respectively. Furthermore, LSU and OK sequences of the N and I ORF (located within N) were more conserved between each other than with any other strain compared. Most amino acid changes within HE, 4.9, 4.8, 12.7 kDa ORFs, E, N and the I ORF were strain-specific. HE and M contained one RBCV-specific aa change, and N and I ORF contained two RBCV-specific aa changes each.

RBCV-specific amino acid substitutions within S. The S1 subunit contained most of the RBCV-specific aa substitutions and included an amino acid change within the signal sequence as well as two clusters of amino acid substitutions within the amino-terminus and the hypervariable region (Fig. 2). The proteolytic cleavage site that separates S1 and S2 subunits was conserved among RBCV, LY-138 and Mebus strains.
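The pairwise comparison and the notion of an RBCV-specific change (conserved in LSU and OK but different in other strains) can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and reads "different in other strains" as "not shared with any other strain". The function names and the ten-residue toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def pairwise_difference_table(alignment):
    """Return {(strain_a, strain_b): number of differing positions}."""
    return {
        (a, b): count_differences(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


def group_specific_positions(alignment, group):
    """Return 1-based positions where every strain in `group` carries the
    same residue and no strain outside the group carries that residue."""
    others = [name for name in alignment if name not in group]
    length = len(next(iter(alignment.values())))
    positions = []
    for i in range(length):
        group_residues = {alignment[name][i] for name in group}
        other_residues = {alignment[name][i] for name in others}
        if len(group_residues) == 1 and not (group_residues & other_residues):
            positions.append(i + 1)
    return positions


# Hypothetical toy alignment, for illustration only (not real BCV sequence).
TOY_ALIGNMENT = {
    "LSU": "MFLILSPTAV",
    "OK": "MFLILSPTAI",
    "LY": "MFLILAPTAV",
    "Mebus": "MFLTLAPTAV",
}

if __name__ == "__main__":
    for pair, n in pairwise_difference_table(TOY_ALIGNMENT).items():
        print(pair, n)
    print(group_specific_positions(TOY_ALIGNMENT, {"LSU", "OK"}))
```

In this toy alignment only position 6 qualifies as group-specific. Position 10 differs between LSU and OK and would count as strain-specific, and position 4 is a Mebus-only change.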
In contrast to the high number of RBCV-specific substitutions within S1, S2 contained only two RBCV-specific amino acid changes, an Ala 769 to Ser change immediately adjacent to the proteolytic cleavage site and an Asp 1026 to Gly located within the heptad repeat sequence. The RBCV-G95 strain was isolated from a nasal sample of a calf that had diarrhea and signs of respiratory distress (18). The nucleotide and predicted primary structures of S and HE glycoproteins specified by RBCV-G95 were reported previously (19, 20). LSU and OK had ten unique aa substitutions within S in comparison to all other BCV strains, while G95, LSU, and OK shared only three aa substitutions at aa 100, aa 465, and aa 1026 (Fig. 2).

RBCV-specific nucleotide and amino acid substitutions within the 4.9, 4.8, and 12.7 kDa ORFs. The human coronavirus strain OC43 (HCV-OC43) lacks two ORFs which potentially encode two nonstructural proteins of 4.9 and 4.8 kDa (21). Furthermore, the same genomic areas are deleted in three Hemagglutinating Encephalomyelitis Virus (HEV) strains of swine (22). The fact that respiratory HCV-OC43 and EBCV strains show remarkable genomic and protein similarities as well as immunological cross-reactivities prompted us to compare the nucleotide sequences specified by the genomic region between the S and the 12.7 kDa ORF of EBCV, RBCV, HCV-OC43, and three porcine HEV strains (Fig. 3 (Fig. 4). Identical aa changes were also found in the 32 kDa protein of two more RBCV strains isolated from Texas and Arizona cattle (data not shown).

Genetic comparisons among different BCVs revealed substantial differences between RBCV and EBCV strains principally within the S gene and within ORFs located between the S and E genes. Furthermore, genetic differences between virulent and avirulent strains were identified within the S gene, the E gene and the 32 kDa ORF.
The salient features of genetic differences between RBCV and EBCV strains are discussed below:

RBCV-specific genetic alterations in the S gene. A pairwise alignment of TGEV and MHV S aa sequences revealed that the N-terminal portion of S1 which is deleted in the porcine respiratory coronaviruses (PRCV) and HCV-229E, in comparison with TGEV, is the region corresponding to the MHV receptor binding-site (aa 1–330) (23). The TGEV receptor-binding site is in a different location (aa 500–700) and aligns with the S polymorphic region of the MHV strain. Recently, it was shown that only two aa changes at the N-terminus of TGEV S resulted in the loss of enteric tropism (13). The S1 amino terminus specified by RBCV strains LSU and OK contained aa changes at aa 11, aa 115, aa 118, aa 173 and aa 179 which may affect S1-mediated receptor binding.

Hemagglutination of chicken red blood cells (RBC) was shown to be mediated by the S glycoprotein, because purified S of the EBCV Mebus strain agglutinated chicken RBC, while purified HE did not (10, 24). RBCV strains LSU and OK agglutinated mouse and rat, but not adult chicken RBC (3). Therefore, aa changes within S specified by RBCV may be responsible for the inability to hemagglutinate chicken RBC.

The S1A virus neutralizing (VN) immunoreactive epitope (aa 351–403) (25) was identical for all viral strains, except for a single aa change at aa 362 specified by the avirulent, cell culture-adapted strain EBCV-L9 (Fig. 2). Furthermore, the S1A epitopes of HCV-OC43 and the BCV Mebus strain were identical (21). Monoclonal antibodies (Mabs) against the EBCV Mebus cross-reacted with different animal and human coronaviruses (25). Therefore, it is likely that these antibodies react with the S1A epitope.

The hypervariable region of the S glycoprotein contains the S1B immunoreactive epitope which is the target for virus neutralizing Mabs (25).
Four RBCV-specific aa substitutions at aa 510, aa 531, aa 543 and aa 578 were located within or proximal to this epitope. Based on the observed aa changes, it can be predicted that Mabs specific for this region may be able to distinguish between respiratory, enteric and vaccine BCV strains.

The BCV S2 subunit of S induced cell fusion when it was expressed in insect cells, indicating that S2 contained membrane fusion domains (26). The hydrophobic and heptad repeat regions of S2 are believed to form the coiled-coil structure of the oligomeric S protein that has been associated with fusion activity. Specifically, three aa changes within a predicted heptad region of the MHV S2 subunit were shown to be responsible for pH-dependent cell fusion (27). RBCV strains LSU and OK are highly fusogenic in cell culture (data not shown). Additional experimentation is required to assess whether the aa change of Ala 769 to Ser immediately after the proteolytic cleavage site and the aa change of Asp 1026 to Gly within the heptad repeat are responsible for the extensive cell fusion induced by RBCV.

Fig. 2. Comparison of the predicted amino acid sequences of the BCV S glycoprotein specified by different strains. Amino acids that are different in at least one strain are shown, except aa 1, aa 768 and aa 1363 which are included as reference points. * indicates unique amino acid changes for each strain. Boxed amino acids are common among different strains. Light-gray boxes contain RBCV-specific, dark boxes contain virulent-specific, and clear boxes contain EBCV-specific aa changes. aa 1–17 is the putative signal peptide; aa 351–403 is the S1A immunoreactive domain; aa 517–621 is the S1B immunoreactive domain; aa 452–593 is the hypervariable region; aa 955–992 is the hydrophobic region; aa 993–1032 is the heptad repeat sequence; aa 1312–1325 is the carboxy-terminal anchor sequence.

RBCV-specific genetic alterations between the S and E genes. The RBCV genomic regions between the S and E genes contained many nucleotide substitutions, deletions and insertions. HCV-OC43 and three porcine HEV strains specified deletions within the 4.9 and 4.8 kDa ORFs, indicating that they are not essential for virus replication. Similarly, the high number of mutations within the RBCV 4.8 and 4.9 kDa ORFs suggests that these ORFs are not essential for virus replication in cell culture (Table 1). The 65 nt leader of a cloned BCV defective interfering (DI) RNA, when mapped by mutations, could be converted rapidly to the wild-type leader of a helper virus following DI RNA transfection into helper virus-infected cells (28). Nucleotide substitutions mapped the crossover region to a 24-nucleotide segment that starts from the last nt of the leader-mRNA junction sequence and extends further downstream. The RBCV isolates LSU, OK, as well as RBCV AZ-26649-2 (AZ) and TX-671-2 (TX) isolated from California and Texas cattle, respectively (data not shown), contained a four-nucleotide deletion located within this 24-nucleotide segment (Fig. 3). This deletion may alter the recombination frequency between the leader and the leader-mRNA junction sequence immediately upstream of the 12.7 kDa subgenomic mRNA, and cause either inhibition or enhancement of the putative 12.7 kDa transcription and subsequent protein expression.

Genetic differences between virulent and avirulent BCV strains. The S glycoprotein contained 7 aa substitutions which were common for all virulent strains (aa 33, 40, 248, 470, 965, 1241 and 1341). Three mutations within the S1 portion of S caused conservative aa changes, while one non-conservative aa change of His 470 to Asp was located within the S1 hypervariable region. All three mutations within S2 caused non-conservative amino acid changes. Amino acid changes within S1 and S2 may affect the structure and function of the S glycoprotein and alter the pathogenetic potential of these viruses.
The 32 kDa ORF specified by RBCV virulent strains LSU and OK as well as EBCV LY-138 contained two frame-shift mutations which resulted in a 9 aa segment near the carboxy-terminus which was different from the corresponding amino acid sequence specified by the avirulent EBCV-Quebec strain (Fig. 4). A similar double frame-shift mutation was found in the 32 kDa ORF specified by HCV-OC43 (29). These frame-shift mutations substantially increased the hydrophilicity of the carboxy-terminal portion of the 32 kDa protein in virulent versus avirulent strains (data not shown). Sequencing of the corresponding region of the 32 kDa protein of the avirulent EBCV L9 and Mebus strains as well as the 32 kDa of other virulent strains will substantiate these differences between virulent and avirulent strains.

The aa substitution of Gly 53 to Val in the E protein was conserved for all BCV virulent strains (F15, LY-138, OK, LSU), HCV-OC43 (30), and three different porcine HEV strains (22). This mutation may affect the ability of these viruses to invade different tissues, because E is part of the virion and is expressed at infected cell-surfaces (31, 32).

Curr Top Microb & Immunol 99, 165–200
Corona and related viruses: Functional domains in the spike protein of transmissible gastroenteritis virus
The Coronaviridae: The coronavirus surface glycoprotein
The Coronaviridae: The Coronavirus Non-Structural Proteins
Nucl Acids Res 16, 10881–10890

We acknowledge the technical assistance of Mamie Burrell with cell culture and virus propagation, and Galina Rybachuck with sequence analysis and design of figures. This work was supported in part by USDA Grant 94-3704-0926 of the National Research Initiative Program to J.S. and K.G.K., Louisiana Educational Support Fund (LEQSF) Grant XRF/1995-1998-RD-B-18 to J.S. and K.G.K., LEQSF grant RD-1993-1998-RD-B-04 to K.G.K., and a grant from Immtech Biologics, Inc., Bucyrus, KS. We are indebted for support by the LSU School of Veterinary Medicine.
This publication is identified as GeneLab publication #GL1201.