key: cord-302980-2jlz4c58
authors: Crucière, C.; Laporte, J.
title: Sequence and analysis of bovine enteritic coronavirus (F15) genome I.—Sequence of the gene coding for the nucleocapsid protein; analysis of the predicted protein
date: 1988-03-31
journal: Annales de l'Institut Pasteur / Virologie
DOI: 10.1016/s0769-2617(88)80012-1
doc_id: 302980
cord_uid: 2jlz4c58
Summary
Sequences encoding the N protein of the bovine enteritic coronavirus F15 strain (BECV-F15) have been cloned in pBR322 plasmid using cDNA produced by oligo-dT priming on purified viral genomic RNA. Some 265 insert-containing clones were studied. Hybridization of these inserts with poly(A)+ RNA extracted from infected cells showed that they were located at the 3′-end of the genome. After subcloning in M13 phage DNA, clones were sequenced by the Sanger technique. A 1,710-nucleotide sequence corresponding to the gene coding for the viral N protein was established. It contains 2 overlapping open reading frames (ORF). The 3′-non-coding end of the gene has an 8-nucleotide sequence in common with the homologous genome areas of MHV, TGE and IBV viruses; this sequence may represent the polymerase RNA-binding site. The sequence surrounding the first AUG of the smaller ORF is consistent with a potentially functional initiation codon. The primary translation product predicted for this ORF is a polypeptide of 207 amino acids (22.9 Kd) with a high leucine content (19.8%) and a hydrophobic N-terminal end. The larger ORF has a coding capacity of 448 amino acids (49.4 Kd), corresponding to the molecular weight of the N protein. The deduced protein possesses 43 serine residues (9.6% of the total amino acid content), which may be phosphorylated and involved in N-protein/RNA binding. The N protein also has 5 regions with a high content of basic amino acids. One of them is also serine-rich and shares a strongly homologous site with MHV, TGE and IBV viruses.
In the N-terminal region, a 12-amino-acid sequence (PRWYFYYLGTGP) is highly conserved among BECV-F15, JHM, TGE and IBV viruses. The N-protein sequences of the BCV Mebus strain and BECV-F15 differ only slightly.
Bovine enteritic coronavirus (BECV) belongs to the monogeneric Coronaviridae family, whose type species is avian infectious bronchitis virus. Coronaviruses are pleiomorphic, enveloped particles surrounded by a fringe of club-shaped spikes, which look like a corona in the electron microscope and give the family its name. The viral genome is a positive-sense, single-stranded RNA of approximately 18 to 20 kb with a polyadenylated 3'-end [19, 22]. It codes for the viral structural proteins, nucleocapsid (N), membrane (E1) and spike (E2), and for several non-structural proteins. These proteins are translated from a 3'-coterminal nested set of mRNAs, each of which also carries a common 5'-leader sequence [8]. Only the unique 5'-terminal sequence of each mRNA, absent from the next smaller RNA of the set, is translated. It was recently established that BECV in fact contains 4 main structural proteins: the nucleoprotein N (50 Kd), the transmembrane glycoprotein E1 (28 Kd) and 3 peplomer glycoproteins, E2, gp105 and gp95. The haemagglutinin protein E2 (125 Kd) is split by reducing agents into 2 subunits of 65 Kd each; the main neutralizing epitopes of the viral particle are located on gp105 (105 Kd) [9, 24, 6]; the structure of gp95 (95 Kd) is not clearly established. BECV causes severe, often fatal, diarrhoea in young calves. It was first described in the United States of America [13]; we isolated such a virus from the faeces of diarrhoeic calves in France and experimentally reproduced the disease [4]. These 2 strains of BECV can be distinguished with monoclonal antibodies [23].
BECV vaccines produced in cell culture from attenuated or inactivated virus are not fully protective, and they require large volumes of viral suspension because of the low infectious titres obtained in authorized cell lines. For these reasons, we started cloning and sequencing the French F15 strain of BECV, with the aim of producing cheaper and more efficient vaccines by genetic engineering or by oligopeptide synthesis.
HRT18 cells (a human rectal tumour cell line) were grown in RPMI-1640 medium containing 15% foetal calf serum (FCS) [10], except that tylosin (10 µg/ml) and lincomycin (200 µg/ml) were added to the medium instead of penicillin and streptomycin. The bovine enteritic coronavirus F15 strain (BECV-F15) was isolated from diarrhoeic calf faeces, directly adapted to HRT18 cells [10] and plaque-purified. It was grown as previously described [4]. Infectious titres reached 5 × 10^5 plaque-forming units (PFU)/ml. After freezing and thawing of infected cells together with the supernatant, followed by clarification, the virus was purified by 2 ultracentrifugation steps (velocity, then isopycnic) [9]. A 1-ml sample of purified virus suspension in distilled water was added to an equal volume of 2-fold concentrated TNE buffer (20 mM Tris-HCl pH 8, 200 mM NaCl, 2 mM EDTA) containing 400 µg of proteinase K. After incubation for 30 min at 37°C and then for 5 min at 50°C, an equal volume of the same buffer containing 2% SDS was added and incubation was continued for 30 min at 25°C. Genomic RNA was phenol/chloroform-extracted, then precipitated with 2.5 volumes of 0.25 M sodium acetate in ethanol. After one night at -20°C, the RNA suspension was centrifuged for 20 min at 10,000 g; the pellet was washed with 75% ethanol, dried and dissolved in a minimal volume of distilled water. One optical density (OD) unit at 260 nm corresponds to 40 µg/ml of single-stranded RNA [12].
cDNA cloning.
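The OD260 conversion stated above (1 OD unit = 40 µg/ml of single-stranded RNA) is what determines how much genomic RNA enters the cDNA reaction, which starts from 10 µg of RNA in 10 µl. A minimal Python sketch of that arithmetic follows; the absorbance reading and dilution factor in the usage lines are hypothetical values chosen for illustration, not measurements from this study.

```python
# Conversion factor given in the text [12]: 1 OD260 unit = 40 µg/ml ssRNA.
SSRNA_UG_PER_ML_PER_OD260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate single-stranded RNA concentration (µg/ml) from A260.

    a260 is the absorbance of the (possibly diluted) sample in a 1-cm cell;
    dilution_factor scales the reading back to the stock solution.
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * dilution_factor * SSRNA_UG_PER_ML_PER_OD260


def volume_for_mass_ul(target_ug: float, conc_ug_per_ml: float) -> float:
    """Volume (µl) of stock that contains target_ug of RNA."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be > 0")
    return target_ug * 1000.0 / conc_ug_per_ml


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.25 on a 1:100 dilution of the RNA stock.
    stock = rna_concentration_ug_per_ml(0.25, dilution_factor=100)
    print(f"stock: {stock:.0f} µg/ml")                         # stock: 1000 µg/ml
    print(f"10 µg in {volume_for_mass_ul(10, stock):.1f} µl")  # 10 µg in 10.0 µl
```

With a stock at about 1 mg/ml, the 10 µg used for cDNA synthesis would fit in the 10 µl volume given in the protocol.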
The synthesis of cDNA complementary to the 3'-end of the BECV-F15 genome was carried out in a volume of 52 µl: 10 µg of BECV RNA in 10 µl, denatured at 65°C for 5 min and quickly chilled in an ice bath, were added to 42 µl of 100 mM Tris-HCl pH 8.3 at 42°C containing 100 mM KCl, 100 mM MgCl2, 10 mM dithiothreitol, 4 µg actinomycin D, 500 µM each of the 4 dNTP, 75 units RNasin, 140 units reverse transcriptase (P.H. Stehelin) and, as primer, 10 µg oligo-dT. Incubation was performed for 2 h at 42°C and the reaction was stopped by adding 2 µl of 500 mM EDTA. Reaction products were extracted with phenol/chloroform and chloroform, then ethanol-precipitated. Free RNA strands not hybridized with cDNA were digested with endonuclease T2 [25]; these digests and free nucleotides were removed by gel filtration on a spun column of Sephadex G-50 medium (Pharmacia) [12]. The RNA-cDNA heteroduplexes were then poly-dC tailed: 2 pmoles of 3' ends were dissolved in 20 µl of 25 mM Tris-HCl buffer pH 7 containing 100 mM potassium cacodylate, 0.2 mM DTT, 1 mM CoCl2, 0.2 mM dCTP, 50 µg bovine serum albumin (B.R.L.), 13.5 units of terminal deoxynucleotidyl transferase (B.R.L.) and 30 µCi [α-32P]dCTP (3,000 Ci/mmole). The reaction was carried out at 37°C for 3 min and stopped by adding 2 µl of 500 mM EDTA [16]. The product was phenol/chloroform-extracted. An average of 20 dC residues per heteroduplex 3'-end was obtained. C-tailed heteroduplexes were annealed to dG-tailed, PstI-linearized pBR322 plasmid (1 mole for 2 moles) at 65°C for 10 min, in a volume giving a plasmid concentration of 5 ng/µl. Competent E. coli RR1 cells were transformed with this material [5]; the total DNA concentration was 0.25 µg/ml. E. coli cells were grown overnight in medium containing 12 µg/ml tetracycline, then treated by alkaline lysis [12]. Plasmid DNA was extracted by phenol/chloroform treatment and ethanol-precipitated.
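The yield of this transformation is reported in the results as 2 × 10^5 clones/µg of pBR322, i.e. as clones per microgram of vector DNA. The Python sketch below shows the conventional way such a figure is computed; the colony count, plated volume and DNA amount are hypothetical, and extrapolating from the plated fraction to the whole transformation mix is an assumption about typical practice, not a description of the authors' procedure.

```python
def clones_per_ug(colonies: int, plated_ul: float, total_ul: float, vector_ng: float) -> float:
    """Transformation efficiency in clones per µg of vector DNA.

    colonies  : colonies counted on the plate(s)
    plated_ul : volume of the transformation mix that was plated
    total_ul  : total volume of the transformation mix
    vector_ng : ng of vector DNA in the whole transformation mix
    """
    if min(plated_ul, total_ul, vector_ng) <= 0 or plated_ul > total_ul:
        raise ValueError("volumes and DNA amount must be positive, plated <= total")
    total_colonies = colonies * total_ul / plated_ul  # extrapolate to the whole mix
    return total_colonies * 1000.0 / vector_ng        # per ng -> per µg


if __name__ == "__main__":
    # Hypothetical: 200 colonies from 100 µl of a 1,000-µl mix holding 10 ng vector.
    eff = clones_per_ug(colonies=200, plated_ul=100, total_ul=1000, vector_ng=10)
    print(f"{eff:.1e} clones/µg")  # 2.0e+05 clones/µg
```

These made-up inputs land on the same order of magnitude as the reported yield; the point is only the unit handling (ng to µg, plated fraction to total).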
DNA inserts were excised with the PstI restriction enzyme: 1.2 µl of 10-fold concentrated buffer (100 mM Tris-HCl pH 7.5, 1 M NaCl, 100 mM MgCl2, 1 mg/ml BSA) and 2 units of PstI (B.R.L.) were added to 10 µl of plasmid DNA solution. Insert size was determined by electrophoresis in 1% agarose gels in TBE buffer (89 mM Tris, 89 mM boric acid, 2 mM EDTA). Probes were prepared by nick-translation in a 20-µl volume containing 0.5 µg DNA, 2 µl of 10-fold concentrated buffer (500 mM Tris-HCl pH 7.2, 100 mM MgSO4, 1 mM DTT, 500 µg/ml BSA), 20 µM each of the 4 dNTP, 2.5 ng pancreatic DNase I (Boehringer), 40 µCi [α-32P]dCTP (800 Ci/mmole) and 0.8 unit DNA polymerase I. The mixture was incubated for 2 h at 16°C and the reaction was stopped by adding 3 µl of 500 mM EDTA pH 8. Free nucleotides were removed by filtration through a spun column. Northern and Southern blots were performed as described by Maniatis [12]. Probes were hybridized overnight at 42°C (Southern) or 55°C (Northern); blots were then washed in low-salt solutions: three times for 15 min in 0.1% SDS, 2× SSC, and twice for 15 min in 0.1% SDS, 0.1× SSC at 52°C.
M13 dideoxy sequencing was carried out according to the Sanger technique [17], using [α-35S]dATP (New England Nuclear). Briefly, the main steps were as follows. Replicative-form DNA of M13 phage mp18 or mp19 was prepared [3]; these vectors possess polylinkers with single cleavage sites for the EcoRI, SacI, KpnI, SmaI, BamHI, SalI, PstI, SphI and HindIII restriction enzymes. Viral cDNA inserts were excised from the pBR322 plasmid and digested with restriction enzymes having sites in the M13 polylinker. DNA fragments of 300 to 500 bases were purified by electrophoresis in low-melting-point agarose (Gibco-BRL) gel. M13 phage DNA was cleaved with the same enzymes and its 5'-end phosphates were removed by alkaline phosphatase (Boehringer) treatment [12]. The DNA was then phenol/chloroform-extracted and ethanol-precipitated.
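The ligation that follows uses 50 ng of insert at a 3/1 molar ratio. Turning a molar ratio into masses requires the length of each DNA. The Python sketch below reads the ratio as insert:vector and assumes an average of 650 g/mol per base pair of double-stranded DNA, a 400-bp insert and a 7,249-bp M13mp18/mp19 replicative form; all of these are assumptions for illustration rather than values stated by the authors.

```python
DA_PER_BP = 650.0  # assumed average molar mass of one dsDNA base pair (g/mol)


def fmol_from_ng(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to femtomoles."""
    return ng * 1e6 / (length_bp * DA_PER_BP)


def ng_from_fmol(fmol: float, length_bp: int) -> float:
    """Convert femtomoles of dsDNA back to a mass (ng)."""
    return fmol * length_bp * DA_PER_BP / 1e6


def vector_ng_for_ratio(insert_ng: float, insert_bp: int, vector_bp: int,
                        insert_to_vector: float = 3.0) -> float:
    """Mass of vector (ng) giving the requested insert:vector molar ratio."""
    vector_fmol = fmol_from_ng(insert_ng, insert_bp) / insert_to_vector
    return ng_from_fmol(vector_fmol, vector_bp)


if __name__ == "__main__":
    insert_fmol = fmol_from_ng(50, 400)          # 50 ng of an assumed 400-bp fragment
    vec_ng = vector_ng_for_ratio(50, 400, 7249)  # assumed 7,249-bp M13 RF vector
    print(f"insert: {insert_fmol:.0f} fmol, vector: {vec_ng:.0f} ng")  # insert: 192 fmol, vector: 302 ng
```

Because mass scales with length, a short insert outnumbers a long vector at equal mass. Under these assumptions, a 3:1 excess of 50 ng of a 400-bp fragment corresponds to roughly 300 ng of a 7.2-kb vector. The same helpers apply to the "1 mole for 2 moles" annealing step.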
After ligation of the insert into the vector, performed with 50 ng of insert at a 3:1 molar ratio, competent E. coli TG1 cells were transfected [5]. Recombinant TG1 clones were selected on medium containing IPTG and X-gal. White plaques were then checked by hybridization with the radioactive insert probe. Sequencing was performed using primers complementary to the 3'-end of the DNA strand to be copied; these primers were synthesized in an automated DNA synthesizer (Biosearch 8600). Sequence data were analysed and assembled with the Queen and Korn program [14] of the Beckman Microgenie package (March 1985 version, Beckman Instruments, Inc.) adapted to the [...] microcomputer.
Starting material for cDNA synthesis was 10 µg of purified, heat-denatured viral RNA. When analysed by electrophoresis in alkaline agarose gels, the cDNA obtained with the oligo-dT primer ranged in size between 1.3 and 6.0 Kb. After annealing of the heteroduplexes to pBR322, this construct was used to transform competent E. coli cells, yielding 2 × 10^5 clones/µg of pBR322. Some 265 colonies containing 0.3- to 2.0-Kb inserts were studied. Inserts larger than 0.5 Kb very often showed an internal PstI site (results not shown). Their viral specificity was checked, after 32P labelling by nick-translation, by hybridization with purified genomic viral RNA or cellular RNA (fig. 1). Virus-specific inserts were then used to characterize other inserts. Insert orientation was established by hybridization with inserts having no PstI site and by restriction endonuclease mapping with enzymes having no or only one cleavage site in the pBR322 plasmid. The location of the inserts along the viral genome was determined by Northern blot analysis: full-length inserts or purified products of insert restriction cleavage were hybridized with poly(A)+ RNA extracted from infected or non-infected cells.
Before hybridization, these RNAs were electrophoresed in methylmercury hydroxide-containing agarose gels. Under these experimental conditions, 8 virus-specific poly(A)+ messenger RNA bands were resolved (J. Laporte and C. Crucière, to be published). They form a nested set of RNAs, as established for other coronaviruses. All the inserts we obtained hybridized with the 8 viral RNA bands (results not shown), showing that they were complementary to the 3'-end of the viral genome. Figure 2 shows the schematic location of the inserts studied. Insert 1.6 is 2,000 nucleotides long, and the 5'-end of insert 2.56 is presumably 2,400 nucleotides from the 3'-end of the viral genome. Judging from the sizes of the N and E1 viral proteins, these inserts should cover the whole length of the N gene (1,700 nucleotides) and the beginning of the 5'-adjacent E1 gene (320 nucleotides).
Radioactive probes were prepared from insert-containing pBR322 plasmid. These probes were hybridized on nitrocellulose sheets with dots of RNA extracted from non-infected (C) or BECV-F15-infected (V) HRT18 cells, and hybridization was detected by autoradiography. In the experiment shown, inserts 1.6, 1.22 and 2.56 were clearly virus-specific.
cDNA sequencing. As mentioned above, 400-bp fragments of the cDNA clones were subcloned in M13 mp18 or mp19 phage DNA. Their nucleotide sequences were determined by sequencing both strands of the M13 DNA or by multiple sequencing of one strand. We established a 1,710-nucleotide sequence from the 3'-end of the genome (fig. 3) [...] the smaller one from nucleotide 135 to nucleotide 755 (fig. 3). The first has a coding capacity for a 448-amino-acid protein, the second for a 207-amino-acid protein (fig. 4).
Using cDNA cloning of BECV-F15 genomic RNA with an oligo-dT primer, we determined a 1,710-nucleotide sequence, which we assume comprises the nucleocapsid protein gene.
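Identifying the two overlapping ORF amounts to scanning each forward reading frame for an AUG followed by an in-frame stop codon. The Python sketch below is a minimal illustration under simplifying assumptions: forward strand only, the first AUG of each ORF, and ORFs without a stop codon ignored. It is not the Microgenie routine used by the authors. The same length arithmetic applies to the text: 1,344 coding nucleotides give 1,344 / 3 = 448 codons, the size stated for the larger ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int, int]]:
    """Return forward-strand ORFs as (frame, start, end, n_aa).

    start and end are 1-based inclusive positions running from the A of the
    initiating ATG to the last base of the stop codon; n_aa excludes the stop.
    RNA input (U) is accepted and treated as DNA (T).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop codon: open-ended, not reported
            n_aa = (j - i) // 3
            if n_aa >= min_codons:
                orfs.append((frame, i + 1, j + 3, n_aa))
            i = j + 3  # resume scanning after the stop codon
    return orfs


if __name__ == "__main__":
    # Toy sequence with two ORFs overlapping in frames 0 and 1.
    print(find_orfs("AUGAAUGCCUAAGUAG"))  # [(0, 1, 12, 3), (1, 5, 16, 3)]
```

On a real 1.7-kb sequence, a minimum-length filter (`min_codons`) would be needed to suppress the many short ORFs that occur by chance.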
In every coronavirus studied so far, the gene coding for the N protein is located at the 3'-end of the viral genome. Our studies on BECV-F15 poly(A)+ RNA (to be published) lead to the same conclusion. The larger ORF is 1,344 nucleotides long and encodes a 448-amino-acid protein with a molecular weight of 49.4 Kd, consistent with the 50-Kd N protein observed in our previous work [4]. Recently [11], the N protein gene of the US Mebus strain of the related bovine coronavirus (BCV) was also shown to lie at the 3'-end of the viral genome.
Main ORF. The first AUG following the initiation codon lies 693 nucleotides downstream of it. When we compared the sequence around the initiation codon with the homologous sequences of different MHV strains, we found the same CTAAAC sequence upstream of the initiating AUG.
Secondary ORF. The sequence GUAAUGGC surrounding its initiation codon provides an optimal context for the initiation of mRNA translation [7]. Bunyaviruses and adenoviruses express 2 different proteins from a single gene by means of 2 overlapping ORF [7], so we cannot exclude the translation of a protein from the secondary ORF. Its predicted molecular weight is 22.9 Kd for 207 amino acids. This protein has a rather high leucine content: 19.8%, compared with 5% for the N protein. Furthermore, its N-terminal end is hydrophobic and is a potential membrane anchor region. Genes with 2 different ORF have also been described for other coronaviruses: mRNA 5 of JHM virus [20], mRNA D of IBV [2] and the N protein mRNA of the Mebus BCV strain [11].
This part of the genome may play an important role during transcription of the genomic RNA into the complementary minus strand. Sequence homology between BECV-F15 and MHV over the last 100 nucleotides of the coding part is only 59%, but it increases to 75% for the 3'-noncoding end.
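The claim that GUAAUGGC is an optimal initiation context can be checked against the two positions usually singled out in Kozak's rules: a purine (A or G) at position -3 and a G at position +4, counting the A of AUG as +1. The Python sketch below encodes only those two positions. The "strong/adequate/weak" labels follow a commonly used convention and are an assumption, not a classification made in this paper.

```python
def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the context of the AUG starting at 0-based atg_index.

    Checks only position -3 (purine) and +4 (G), with the A of AUG as +1.
    Returns "strong" (both), "adequate" (one) or "weak" (neither).
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG/AUG at atg_index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the codon")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    return {2: "strong", 1: "adequate", 0: "weak"}[purine_at_minus3 + g_at_plus4]


if __name__ == "__main__":
    ctx = "GUAAUGGC"  # context of the secondary-ORF initiation codon, from the text
    print(kozak_context(ctx, ctx.find("AUG")))  # strong
```

Here G occupies -3 and G occupies +4, so both positions match, which agrees with the statement in the text.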
A 10-nucleotide sequence (GGGAAGAGCT) was found at the same position in this gene area of MHV and IBV viruses [2] (fig. 5). We find an identical sequence (except for the last T) in BECV-F15 between nucleotides 1,631 and 1,640. When looking at the TGEV genome se[...] (fig. 6) and only 25.2% and 24.1%, respectively, with the TGE and IBV virus N proteins.
Coronavirus N proteins are phosphorylated on their serine residues [18]. Our results show 43 serine residues in the BECV-F15 nucleocapsid protein (9.6% of the total amino acids). For this virus and for JHM, TGE and IBV viruses, we find 2 main areas where serine residues are clustered. For BECV-F15 and JHM viruses, these lie in homologous areas (nucleotides 9 to 19 and nucleotides 191 to 220) of low overall homology (58% and 53%). One serine cluster is common to the 4 viruses, which is striking given the low sequence homology between these viruses. It was previously established [21, 1] that the genomic RNA-binding sites of the N protein are located in its basic portions. Over the complete sequence, basic residues outnumber acidic residues by 19. There are 5 basic-rich regions, found in homologous areas of MHV, TGE and IBV viruses. Between BECV-F15 and MHV, 4 of these areas show 90% homology. The fifth shows only 60% homology but is also serine-rich and shares a sequence with TGE and IBV viruses (amino acids 193 to 222); it may have a more specific function in protein/RNA recognition. We also observed a strong sequence homology, not previously described, in the first part of the N-terminal end of the N proteins of BECV-F15, BCV, MHV, TGEV and IBV viruses. This sequence has no particular properties: it contains 9 hydrophilic and 8 hydrophobic residues. The biological significance of these findings is not known. In conclusion, we have found only minor differences between the N proteins of BECV-F15 and the BCV Mebus strain.
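Homology values such as 59%, 75% or 90% are percent identities over an alignment, but the text does not say how gap positions were counted. The Python sketch below assumes two sequences already aligned to equal length, with "-" marking gaps, and computes identity only over columns where neither sequence has a gap. Dividing by the full alignment length is another common convention and gives lower values when gaps are present.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the count.
    Works for nucleotide or amino acid sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 3 identities over 4 ungapped columns.
    print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Producing the alignment itself (choosing where gaps go) is a separate step that this sketch takes as given.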
Work is in progress to sequence the other viral genes and to find out how similar these last two viruses really are. Given the antigenic differences revealed by monoclonal antibody screening, the specific differences should be found in the gene coding for the spike protein gp105.
SEQUENCE AND ANALYSIS OF THE GENOME OF BOVINE ENTERITIC CORONAVIRUS (F15). I. Sequence of the gene coding for the nucleocapsid protein; analysis of the deduced protein
We cloned the genomic RNA of bovine enteritic coronavirus F15 (BECV-F15) into the pBR322 plasmid after preparing the corresponding cDNA with an oligo-dT primer; 265 clones were studied. Their hybridization with poly(A)+ RNA extracted from infected cells allowed us to locate them at the 3'-terminal end of the genome. These clones were sequenced by the Sanger technique after subcloning into M13 phage DNA. We determined a 1,710-nucleotide sequence corresponding to the gene coding for the viral N protein. It contains two overlapping open reading frames (ORF). At the 3'-terminal non-coding end of the genome, there is an 8-nucleotide sequence also found in the homologous region of MHV, TGE and IBV viruses. This sequence could be the binding site of the RNA polymerase. The first AUG of the smaller ORF is preceded by a nucleotide sequence that makes it a potentially functional initiation site. The primary translation product deduced from it is a polypeptide of 207 amino acids (22.9 Kd) with a high leucine content (19.8%) and a hydrophobic N-terminal end. The larger ORF has a coding capacity of 448 amino acids (49.4 Kd), corresponding to the molecular mass of the N protein. The deduced protein contains 43 serine residues (9.6% of the amino acids), which may be phosphorylated and involved in binding between the N protein and the genomic RNA.
This protein also has 5 strongly basic regions, one of which is also serine-rich and shows strong sequence homology with the homologous region of the N proteins of MHV, TGE and IBV viruses. In addition, the first part of the N-terminal end shows a 12-amino-acid stretch (PRWYFYYLGTGP) that is highly conserved among these same four viruses. The N-protein sequences of the BCV Mebus strain and BECV-F15 show only minor differences.
Key-words: Coronavirus, Protein, Nucleocapsid, Genome; BECV-F15 strain, N-protein sequence.
References
[1] Sequence of the nucleocapsid gene from murine coronavirus MHV-A59.
[2] Sequences of the nucleocapsid genes from two strains of avian infectious bronchitis virus.
[3] An integrated and simplified approach to cloning into plasmids and single-stranded phages.
[4] The experimental production of diarrhoea in axenic and gnotobiotic calves with enteropathogenic E. coli, rotavirus, coronavirus and in combinated infections of
[5] Studies on transformation of Escherichia coli with plasmids.
[6] Bovine coronavirus structural proteins.
[7] Comparison of initiation of protein synthesis in procaryotes, eucaryotes and organelles.
[8] Characterization of leader RNA sequences on the virion and mRNAs of mouse hepatitis virus, a cytoplasmic RNA virus.
[9] Polypeptide structure of bovine enteritic coronavirus: comparison between a wild strain purified from feces and a HRT18 cell adapted strain.
[10] Une lignée cellulaire particulièrement sensible à la réplication du coronavirus entéritique bovin: les cellules HRT18.
[11] Sequence analysis of the bovine coronavirus nucleocapsid and matrix protein genes.
[12] In "Molecular cloning: a laboratory manual". Cold Spring Harbor Laboratory.
[13] Neonatal calf diarrhea: propagation, attenuation and characteristics of a corona-like agent.
[14] A comprehensive sequence analysis program for the IBM personal computer.
[15] Enteric coronavirus TGEV: partial sequence of the genomic RNA, its organization and expression.
[16] Terminal labeling and addition of homopolymer tracts to duplex DNA fragments by terminal deoxynucleotidyl transferase.
[17] DNA sequencing with chain terminating inhibitors.
[18] Coronavirus JHM: a virion-associated protein kinase.
[19] The structure and replication of coronaviruses.
[20] Coronavirus MHV-JHM mRNA 5 has a sequence arrangement which potentially allows translation of a second, downstream open reading frame.
[21] Coding sequence of coronavirus MHV-JHM mRNA 4.
[22] The molecular biology of coronaviruses.
[23] Utilization of monoclonal antibodies for antigenic characterization of coronaviruses.
[24] Antigenic and polypeptide structure of bovine enteritic coronavirus as defined by monoclonal antibodies. In "Molecular Biology and Pathogenesis of Coronaviruses".
[25] Molecular cloning of the genome of poliovirus type 1.
[Figure: pairwise nucleotide sequence alignment.]