key: cord-298406-7wfdwou8
authors: Sun, Haifang; Luo, Haibin; Yu, Changying; Sun, Tao; Chen, Jing; Peng, Shuying; Qin, Jun; Shen, Jianhua; Yang, Yiming; Xie, Youhua; Chen, Kaixian; Wang, Yuan; Shen, Xu; Jiang, Hualiang
title: Molecular cloning, expression, purification, and mass spectrometric characterization of 3C-like protease of SARS coronavirus
date: 2003-12-31
journal: Protein Expression and Purification
DOI: 10.1016/j.pep.2003.08.016
sha:
doc_id: 298406
cord_uid: 7wfdwou8

Abstract

Severe acute respiratory syndrome (SARS) is an acute respiratory illness that broke out in China. SARS coronavirus (SARS_CoV), a novel human coronavirus, is responsible for SARS infection. One of the major proteins associated with SARS_CoV, the SARS 3C-like protease (SARS_3CLpro), is a cysteine protease that cleaves the viral precursor polyprotein into a series of functional proteins required for coronavirus replication, and it is considered an appealing target for designing anti-SARS agents. To facilitate studies of the function and structure of SARS_3CLpro, in this report the synthetic genes encoding 3CLpro of SARS_CoV were assembled, and the plasmid was constructed using pQE30 as vector and expressed in Escherichia coli M15 cells. The expressed protease, obtained in high yield (∼15 mg/L), was purified by NTA-Ni2+ affinity chromatography and an FPLC system, and its sequence was determined by LC/MS with a residue coverage of 46.4%.

From the end of 2002 to June 2003, a severe epidemic of severe acute respiratory syndrome (SARS) broke out in China, and SARS infection also spread to more than 30 countries.
Using biophysical and biochemical techniques such as electron microscopy, virus-discovery microarrays containing conserved nucleotide sequences characteristic of many virus families, randomly primed RT-PCR, and serological tests, SARS coronavirus (SARS_CoV) was determined to be responsible for SARS infection [1-3]. Coronaviruses are positive-stranded RNA viruses that show a halo or corona appearance when viewed under a microscope and have the largest viral RNA genomes known to date. Studies have suggested that SARS_CoV is a previously unknown coronavirus, neither a mutant of any known coronavirus nor a recombinant of known coronaviruses; it is believed to be a novel human coronavirus that possibly originated from a nonhuman host [4, 5].

Proteolytic processing of viral polyproteins is a key step in the replication cycle of many positive-strand RNA viruses, and such processing is performed by the virus-encoded proteases [6, 7]. The replicase gene, which encodes the proteins required for coronavirus replication and transcription, encompasses more than 20,000 nucleotides [8, 9] and encodes two overlapping polyproteins, pp1a (replicase 1a, around 450 kDa) and pp1ab (replicase 1ab, around 750 kDa), which contain the sequence motifs of both a papain-like cysteine protease and the 3C-like protease (3CLpro) [10, 11]. Recently, the genome sequences deposited in GenBank (http://www.ncbi.nlm.nih.gov/) for SARS_CoV from different SARS patients have laid a solid foundation for research on SARS pathogenesis and anti-SARS drug design [12-14]. The important proteins associated with SARS_CoV infection include the RNA polymerase, the spike (S) glycoprotein, the envelope (E) protein, the membrane (M) protein, the nucleocapsid (N) protein, and the main protease, the 3C-like protease (3CLpro) [15, 16].
As the viral main protease, 3CLpro controls the activities of the coronavirus replication complex [17, 18]. Previous research indicates that the 3CLpro-mediated processing pathways are conserved in coronaviruses. Coronavirus main proteases employ conserved cysteine and histidine residues in the catalytic site and lack an acidic active-site residue [6, 19-21]. Their substrate specificities are also well defined: the known cleavage sites have bulky hydrophobic residues (mainly leucine) at the P2 position, glutamine at the P1 position, and small aliphatic residues at the P1′ position [17, 18]. In addition, the recently determined crystal structures of human coronavirus (strain 229E) 3CLpro and of an inhibitor complex of porcine coronavirus (transmissible gastroenteritis virus, TGEV) 3CLpro confirm a remarkable degree of conservation of the substrate-binding sites among coronavirus 3CLpro enzymes [16]. Studies have already shown that 3CLpro is a useful target for screening antiviral agents [18, 22, 23]. Like other 3CLpro enzymes, SARS_3CLpro is expected to be an appealing target for discovering new agents for the treatment of SARS [16]. It is therefore important to express and purify large amounts of SARS_3CLpro for structural and functional studies.

In our previous work [24, 25], we reported a 3D model of SARS_3CLpro and inhibitor design by virtual screening, as well as the cloning, expression, and purification of the E protein of SARS_CoV. In this article, we present the molecular cloning, expression, and purification of 3CLpro of SARS_CoV, together with a preliminary mass spectrometric characterization.
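The substrate preference described above (leucine at P2, glutamine at P1, a small residue at P1′) can be written as a simple sequence motif. The following is a minimal sketch only: the P1′ residue set (S, A, G), the function name, and the toy sequence are illustrative assumptions and are not taken from this work.

```python
import re
from typing import List

# Simplified consensus from the text: Leu at P2, Gln at P1, small residue at P1'.
# The P1' set "SAG" is an assumption made for illustration.
# A lookahead is used so that overlapping hits are not skipped.
CLEAVAGE_PATTERN = re.compile(r"(?=LQ[SAG])")

def candidate_cleavage_sites(sequence: str) -> List[int]:
    """Return the 1-based position of the P1 glutamine for every motif hit."""
    # m.start() is the 0-based index of the P2 leucine, so P1 is at start + 2 (1-based).
    return [m.start() + 2 for m in CLEAVAGE_PATTERN.finditer(sequence.upper())]

if __name__ == "__main__":
    toy_sequence = "MKTAVLQSGFRKLQAEE"  # hypothetical sequence, not a real polyprotein
    print(candidate_cleavage_sites(toy_sequence))
```

A scan of this kind only flags candidates; real cleavage also depends on residues outside P2 to P1′ and on the accessibility of the site.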
The restriction and modifying enzymes used in this work were purchased from TaKaRa, and the vector pQE30 and the bacterial strains M15 and DH5α were from Qiagen. Trizol and Superscript II reverse transcriptase were purchased from Gibco. Trypsin (sequencing grade) was purchased from Sigma. The chelating affinity column and low molecular weight (LMW) marker were purchased from Amersham-Pharmacia Biotech. All other chemicals were of analytical grade and from Sigma.

Bacterial strains and culture media

Escherichia coli DH5α was used for propagation of plasmids. DH5α was maintained on LB agar plates and grown at 37°C, while M15 was cultured on LB agar plates containing kanamycin (25 mg/L). For agar plates, Bacto agar was added to the media to a final concentration of 1.5% (w/v). Ampicillin was added to the media at a final concentration of 100 mg/L for the selection of transformants. E. coli M15 was chosen as the host for gene expression. The strains were stored in LB medium containing 15% glycerol at −80°C. Ampicillin and kanamycin were added to the media at final concentrations of 100 and 25 mg/L, respectively.

All cloning techniques, including PCR, restriction digestion, ligation, E. coli transformation, and plasmid DNA preparation, followed the literature method [26]. SARS_CoV (isolate BJ01) RNA was extracted with Trizol reagent according to the manufacturer's instructions (www.genehub.net/trizol.htm). Reverse transcription was performed by the random priming method with Superscript II reverse transcriptase. The SARS_3CLpro cDNA was subsequently amplified by PCR using the following primers: 3CLf (5′-GGGGGATCCACCATGAGTGGTTTTAGGAAAATGGCA-3′) and 3CLr (5′-GGGAAGCTTTTGGAAGGTAACACCAGAGCA-3′). After digestion with BamHI and HindIII, the PCR product was inserted into the BamHI and HindIII sites of the vector pQE30 (Qiagen). The residues in the expression tag are "MRGSHHHHHHGSTM". The SARS_3CLpro insert was verified by sequencing.
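The primer design can be checked in a few lines. The sketch below locates the BamHI (GGATCC) and HindIII (AAGCTT) recognition sites in the two primers given above and translates the coding portion of each using the standard genetic code. The helper names are illustrative, and the primer strings are the sequences from the text with the extraction spaces removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

PRIMER_F = "GGGGGATCCACCATGAGTGGTTTTAGGAAAATGGCA"  # 3CLf, 5'->3'
PRIMER_R = "GGGAAGCTTTTGGAAGGTAACACCAGAGCA"        # 3CLr, 5'->3'
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons in frame 0; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def site_positions(primer):
    """0-based index of each recognition site in the primer, or -1 if absent."""
    return {name: primer.find(site) for name, site in SITES.items()}

if __name__ == "__main__":
    print(site_positions(PRIMER_F), site_positions(PRIMER_R))
    # Residues encoded by the forward primer, starting at its ATG
    print(translate(PRIMER_F[PRIMER_F.find("ATG"):]))
    # Residues encoded by the sense strand of the reverse primer
    print(translate(reverse_complement(PRIMER_R)))
```

Translating from the ATG of 3CLf gives the first residues fused to the tag, and the reverse complement of 3CLr gives the last residues of the insert followed by the HindIII site.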
Expression and purification of SARS_3CLpro

Escherichia coli M15 cells transformed with the plasmid pQE30-SARS_3CLpro were grown in 100 ml LB medium containing ampicillin (100 mg/L) and kanamycin (25 mg/L) at 37°C overnight and then inoculated into 1 L LB supplemented with both antibiotics. The expression of SARS_3CLpro was induced by the addition of 0.5 mM isopropyl β-D-thiogalactoside (IPTG). After induction for 5 h at 18°C, the cells were harvested by centrifugation at 4000g and 4°C for 30 min. The pellet was washed, frozen, and then disrupted by sonication in Buffer A (20 mM Tris-HCl, 0.5 M NaCl, and 5 mM imidazole, pH 8.0). The lysed cells were centrifuged at 14,000g at 4°C for 1 h. The supernatant was retained and the pellet was discarded. A 1-ml HiTrap Ni2+ chelating column was equilibrated with 10 ml of sterile deionized water, 50 mM NiSO4, and finally 10 ml Buffer A. The supernatant was passed over the column at a flow rate of 5 ml/min, and the column was then washed with 20 ml Buffer A followed by 20 ml Buffer B (80 mM imidazole in Buffer A). The protease of interest was eluted with 10 ml Buffer C (20 mM Tris-HCl, 0.5 M NaCl, and 0.5 M imidazole, pH 8.0) and then purified further by gel filtration on a HiTrap 16/60 Sephacryl S100 column pre-equilibrated with Buffer D (5 mM dithiothreitol, 150 mM NaCl, and 10 mM Tris-HCl, pH 7.5) using an FPLC system (Pharmacia). Highly purified His-tagged SARS_3CLpro was obtained with a yield of 15 mg/L.

The protocol used for the in-gel digest in this study was modified from the method described by Yu et al. [27]. The gel band of interest (SARS_3CLpro) was excised from the Coomassie-stained SDS-PAGE gel with a steel scalpel and destained in an Eppendorf tube by washing sequentially with 100 µl of 30% CH3CN/100 mM ammonium bicarbonate. The washing step was repeated until the gel bands were clear.
The gel band was then completely dried in a Speed-Vac vacuum centrifuge (Savant, Holbrook, NY) and cut into small pieces. The dried pieces were reswollen by adding about 30 µl of 50 mM ammonium bicarbonate (pH 8.3). The volume added was the minimum necessary to completely cover the gel pieces, and trypsin was then added at an enzyme-to-sample ratio of 1:20 (w/w). The gel pieces were incubated at 37°C for 12-16 h. The tryptic peptides were extracted by adding 30 µl of a solution containing 60% CH3CN/0.1% TFA and vortexing for 4 min before removing the solution. This extraction step was performed three times with the same solution. The extracts were pooled in a 0.5 ml Eppendorf tube and evaporated to 10-20 µl in the Speed-Vac vacuum centrifuge.

The LC/MS system used for analyzing the tryptic peptides combined an HP1100 LC system (Agilent, Cheshire, UK) with an LCQ-DECA mass spectrometer (ThermoFinnigan, San Jose, CA). A microbore reverse-phase column (C8, 50 × 1.0 mm ID, 7 µm, ABI RP300) was used for LC separation. Solvent A was 0.1% FA in 100% (v/v) water and solvent B was 0.1% FA in 100% (v/v) CH3CN. The gradient started at 5% B, was held for 2 min, and then increased linearly to 80% B in 50 min. The peptide mixture was injected onto the column by an autosampler and separated at a flow rate of 200 µl/min. The eluate was monitored by a PDA detector (TSP UV6000) and introduced directly on-line into the ESI source. The operating conditions were optimized with the standard solution provided by the manufacturer, and the working parameters of the ion source were as follows: capillary temperature, 200°C; spray voltage, 5 kV; capillary voltage, 15 V; and sheath gas flow rate, 20 arb. To acquire more mass spectra within an LC peak, two scan modes, full scan and data-dependent MS/MS, were used. The scan mass range was from m/z 400 to m/z 2000 and the collision energy was set at 38%.
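The scan range given above (m/z 400 to 2000) limits which charge states of an intact protein can be seen in positive-mode ESI. The sketch below assumes the usual relation m/z = (M + z·1.00728)/z for an ion carrying z protons and uses the theoretical mass of 35832 Da quoted for the His-tagged protease. The function names are illustrative, and the two-peak estimate is a textbook calculation, not the deconvolution routine used in this study.

```python
import math

PROTON = 1.00728  # proton mass in Da (charge carrier in positive-mode ESI)

def mz(mass, z):
    """m/z of an ion formed from a neutral mass (Da) by adding z protons."""
    return (mass + z * PROTON) / z

def charge_window(mass, mz_min=400.0, mz_max=2000.0):
    """Lowest and highest charge state whose m/z falls inside the scan range."""
    z_low = math.ceil(mass / (mz_max - PROTON))
    z_high = math.floor(mass / (mz_min - PROTON))
    return z_low, z_high

def neutral_mass_from_adjacent(mz_low, mz_high):
    """Neutral mass from two adjacent peaks; mz_low carries one more charge."""
    z = round((mz_low - PROTON) / (mz_high - mz_low))  # charge of the mz_high peak
    return z * (mz_high - PROTON)

if __name__ == "__main__":
    theoretical = 35832.0
    print(charge_window(theoretical))
    # Round trip: simulate the 21+ and 20+ peaks, then recover the mass
    print(neutral_mass_from_adjacent(mz(theoretical, 21), mz(theoretical, 20)))
    # Neutral mass implied by a doubly charged peptide ion at m/z 566.2
    print(2 * (566.2 - PROTON))
```

The same relation applies to peptides: a doubly charged precursor corresponds to a neutral mass of twice its m/z minus two proton masses.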
The SARS_3CLpro PCR product, verified by sequencing, was digested with BamHI and HindIII and then inserted into the BamHI and HindIII sites of the vector pQE30. E. coli M15 cells transformed with the plasmid pQE30-SARS_3CLpro were used for the expression of His-tagged SARS_3CLpro. After optimization of the expression and purification method, the homogeneous protein was successfully isolated from E. coli by two chromatographic steps. The SDS-PAGE analysis of the purification of SARS_3CLpro is shown in Fig. 1, and the purification scheme for SARS_3CLpro expressed in E. coli transformed with pQE30-SARS_3CLpro in 1 L culture is listed in Table 1. These results show that expression from the pQE30-SARS_3CLpro plasmid in E. coli M15 cells produces a large amount of soluble SARS_3CLpro. The purification procedure is also easy to carry out.

Table 1. Purification scheme of SARS_3CLpro expressed in E. coli transformed with pQE30-SARS_3CLpro (1 L culture)

Step                    Total protein (mg)   SARS_3CLpro (mg)   Purification factor   3CLpro yield (%)
Extraction              500                  50                 1                     100
Ni2+-affinity column    50                   40                 8                     80
Gel filtration          15                   15                 10

A database search with the MS/MS raw data of the tryptic peptides from the gel band returned 3CLpro as the first candidate, with a summary score of 456.5, much higher than that of the second candidate (score 46.5), and matched eleven peptides (T1-T11) to tryptic peptides of 3CLpro (Fig. 2). The MS/MS spectrum of the doubly charged precursor ion of the T3 peptide at m/z 566.2 is shown as an example in Fig. 3. Most of the b and y ions were detected (>80%). The eleven tryptic peptides contain a total of 145 amino acids (Scheme 1), corresponding to a protein coverage of 46.4%.

The molecular weight of 3CLpro was also determined by the LC-MS system. The LC conditions were based on those described above for peptide separation.
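The purification factors and yields in Table 1 follow from the protein amounts alone. The sketch below assumes the common definitions, which the text does not spell out: purification factor is the purity at a step (target protein divided by total protein) relative to the purity of the crude extract, and yield is the target protein at a step relative to the crude extract. The gel filtration yield it prints is computed from the table values and is not a figure stated in the table as extracted here.

```python
# (step name, total protein in mg, SARS_3CLpro in mg), taken from Table 1
STEPS = [
    ("Extraction", 500.0, 50.0),
    ("Ni2+-affinity column", 50.0, 40.0),
    ("Gel filtration", 15.0, 15.0),
]

def purification_summary(steps):
    """Return (step, purification factor, yield %) relative to the first step."""
    _, base_total, base_target = steps[0]
    base_purity = base_target / base_total
    rows = []
    for name, total, target in steps:
        factor = (target / total) / base_purity   # fold enrichment over crude extract
        yield_pct = 100.0 * target / base_target  # recovery of the target protein
        rows.append((name, round(factor, 1), round(yield_pct, 1)))
    return rows

if __name__ == "__main__":
    for row in purification_summary(STEPS):
        print(row)
```

Under these definitions the first two rows reproduce the tabulated factors (1 and 8) and yields (100% and 80%), and the last row reproduces the tabulated factor of 10.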
An LC peak at a retention time of 18 min was observed (data not shown). The mass spectrum corresponding to this LC peak showed a series of multiply charged ions (Fig. 4). The molecular weight was obtained with the deconvolution algorithm in the "sequest" program. The measured mass of the 3CLpro protease is 35831 (Fig. 5), and the difference between the measured and theoretical mass (MW 35832) was only 1 Da. These results confirm the identity of the SARS_3CLpro expressed in this work.

Fig. 5. Molecular weight of SARS_3CLpro protease.

In conclusion, we have succeeded in the molecular cloning of pQE30-SARS_3CLpro, and with this plasmid and E. coli as the expression system a large amount of purified His-tagged SARS_3CLpro protease has been obtained by NTA-Ni2+ affinity chromatography. The purified protease can be used to screen crystallization conditions for X-ray crystallographic analysis.

References

SARS coronavirus: a new challenge for prevention and therapy
Coronavirus as a possible cause of severe acute respiratory syndrome
Aetiology: Koch's postulates fulfilled for SARS virus
SARS-associated coronavirus
Biosynthesis, purification, and characterization of the human coronavirus 229E 3C-like proteinase
Expression of virus-encoded proteinases: functional and structural similarities with cellular enzymes
Complete sequence (20 kilobases) of the polyprotein-encoding gene 1 of transmissible gastroenteritis virus
Viral replicase gene products suffice for coronavirus discontinuous transcription
Nucleotide sequence of the human coronavirus 229E RNA polymerase locus
The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase
Characterization of a novel coronavirus associated with severe acute respiratory syndrome
The genome sequence of the SARS-associated coronavirus
Perspective: The SARS coronavirus: A postgenomic era
Mass spectrometric characterization of proteins from the SARS virus: a preliminary report
Coronavirus main protease (3CLpro) structure: basis for design of anti-SARS drugs
Characterization of a human coronavirus (strain 229E) 3C-like proteinase activity
Structure of coronavirus main protease reveals combination of a chymotrypsin fold with an extra alpha-helical domain
Characterization and mutational analysis of an ORF 1a-encoding proteinase domain responsible for proteolytic processing of the infectious bronchitis virus 1a/1b polyprotein
Virus-encoded proteinases and proteolytic processing in the Nidovirales
Conservation of substrate specificities among coronavirus main proteases
Coronavirus protein processing and RNA synthesis is inhibited by the cysteine protease inhibitor E64d
Identification of active-site amino acid residues in the Chiba virus 3C-like protease
A 3D model of SARS_CoV 3CL protease and its inhibitors design by virtual screening
Small envelope protein E of SARS: cloning, expression, purification, CD determination, and bioinformatics analysis
Molecular Cloning: a Laboratory Manual
Identification of differentially expressed proteins between human hepatoma and normal liver cell lines by two-dimensional electrophoresis and liquid chromatography-ion trap mass spectrometry