key: cord-0715578-yg41s3by authors: Bresson, Stefan; Robertson, Nic; Sani, Emanuela; Turowski, Tomasz W; Shchepachev, Vadim; Kompauerova, Michaela; Spanos, Christos; Helwak, Aleksandra; Tollervey, David title: Integrative Vectors for Regulated Expression of SARS-CoV-2 Proteins Implicated in RNA Metabolism date: 2020-07-20 journal: bioRxiv DOI: 10.1101/2020.07.20.211623 sha: 4e425dea8f2e0c346da2a4fe30fc5c2664d25e7d doc_id: 715578 cord_uid: yg41s3by Infection with SARS-CoV-2 is expected to result in substantial reorganization of host cell RNA metabolism. We identified 14 proteins that were predicted to interact with host RNAs or RNA binding proteins, based on published data for SARS-CoV and SARS-CoV-2. Here, we describe a series of affinity-tagged and codon-optimized expression constructs for each of these 14 proteins. Each viral gene was separately tagged at the N-terminus with Flag-His8, the C-terminus with His8-Flag, or left untagged. The resulting constructs were stably integrated into the HEK293 Flp-In TREx genome. Each viral gene was expressed under the control of an inducible Tet-On promoter, allowing expression levels to be tuned to match physiological conditions during infection. Expression time courses were successfully generated for most of the fusion proteins and quantified by western blot. A few fusion proteins were poorly expressed, whereas others, including Nsp1, Nsp12, and N protein, were toxic unless care was taken to minimize background expression. All plasmids can be obtained from Addgene and cell lines are available. We anticipate that availability of these resources will facilitate a more detailed understanding of coronavirus molecular biology. SARS-CoV-2 is a large positive-sense, single-stranded RNA virus that encodes four structural proteins, several accessory proteins, and sixteen nonstructural proteins (nsp1-16) (Figure 1 ). The latter are mainly engaged in enzymatic activities important for translation and replication of the RNA genome. 
In addition, nonstructural proteins target host RNA metabolism in order to manipulate cellular gene expression or facilitate immune evasion (Gordon et al, 2020; Thoms et al, 2020; Yuen et al, 2020). Understanding these pathways in greater detail will be an important step in the development of antiviral therapies. Our lab has previously developed techniques to map the RNA-bound proteome, including crosslinking and analysis of cDNA (CRAC), to identify binding sites for proteins on RNA, and total RNA-associated proteome purification (TRAPP), to identify and quantify the RNA-bound proteome (Granneman et al, 2009; Shchepachev et al, 2019). Among other things, TRAPP has given insights into the mechanism of stress-induced translation shutdown in yeast (Bresson et al, 2020). In the context of SARS-CoV-2, CRAC is expected to identify host RNAs that are targeted by viral factors, while TRAPP will reveal how host cell RNA-protein interactions are globally remodeled in response to specific viral proteins.

To facilitate application of these techniques to SARS-CoV-2, we generated a series of synthetic, codon-optimized constructs for 14 different viral proteins that are expected to interact with RNA or RNA binding proteins. To remove the need for error-prone PCR steps, we devised a cloning scheme in which a single synthetic construct could be used to generate untagged, N-terminally tagged, or C-terminally tagged versions of the protein. Using these vectors, we generated and tested a series of human cell lines with the viral ORFs stably integrated, under the control of an inducible Tet-On promoter. We expect that broad availability of this collection will enable a deeper understanding of the coronavirus life cycle and its impact on host RNA biology.

We selected 14 proteins for initial analysis, based on putative roles in viral or host RNA metabolism (Figure 1 and Table 1).
Two of the selected proteins, Nsp7 and Nsp8, reportedly form a stable heterodimer in vivo (Gao et al, 2020; Hillen et al, 2020; te Velthuis et al, 2012), so we also designed constructs in which Nsp7 and Nsp8 were expressed as a fusion protein, connected by a short, unstructured linker.

For cloning, we selected pcDNA5-FRT/TO (Thermo Fisher) as the backbone vector (Figure 2A). This vector can be used for transient transfection or for flippase (Flp) recombinase-mediated integration into the genome of cells with a pre-inserted Flp Recombination Target (FRT). It also carries a hygromycin resistance gene to allow selection for integration (Figure 2). As host cells, we used HEK293 Flp-In T-REx cells (293FiTR), but other cell lines that carry an FRT site could also be used. The expression levels of viral proteins vary substantially during infection, so we used constructs that, in addition to stably integrating, were expressed under the control of a tetracycline-regulated human cytomegalovirus (CMV)/TetO2 promoter, which is induced by the addition of doxycycline to the medium (Figure 2). Viral protein expression can then be titrated by varying either doxycycline concentration or induction time.

Using pcDNA5-FRT/TO as a starting point, we generated two additional parental vectors with pre-inserted tandem-affinity purification tags, either an N-terminal FH-tag (FLAG-Ala4-His8) or a C-terminal HF-tag (His8-Ala4-FLAG) (Figure 2A). We have recently shown that these tags work well for tandem affinity purification, including in the denaturing conditions used for CRAC (Bresson et al, 2020). Each synthetic construct was codon-optimized and included a consensus Kozak sequence upstream of the start codon (Figure 2B). The sequences initially used for all ORFs were generated by the algorithms used by Integrated DNA Technologies (IDT; Coralville, Iowa).
The prevalence of G-C base pairs, particularly in the third position of codons (GC3), is strongly correlated with increased protein accumulation (Kudla et al, 2006; Mordstein et al, 2020). To potentially enhance protein synthesis, we ordered alternative ORFs for Nsp8, Nsp13, and N, using the algorithms from GeneArt (Thermo Fisher Scientific), which yield sequences with a higher G-C content, particularly in third codon positions.

The insert sequences were designed such that a single synthetic construct could be used to generate an untagged protein or an N- or C-terminal fusion protein (Figure 2B). BamHI and EcoRV restriction sites were placed on either side of the open reading frame, together with an AvrII site overlapping the stop codon (important for C-terminal cloning, as discussed below).

Generating the untagged and N-terminally tagged constructs was straightforward. The synthetic constructs were cut with BamHI and EcoRV and ligated into pre-cut vector, either pcDNA5-FRT/TO (generating an untagged construct) or pcDNA5-FRT/TO-N-Flag-His8 (generating an N-terminally tagged construct) (Figure 2C). The resulting plasmids were verified by colony PCR and Sanger sequencing.

Generating the C-terminally tagged construct required additional steps to remove the in-frame stop codon upstream of the EcoRV restriction site. Cleavage of the AvrII restriction site, which overlaps the stop codon in the synthetic constructs (Figure 2B), left a 5′ overhang containing the stop codon. The overhang (and thus the stop codon) was then removed by treatment with Mung Bean Nuclease, a single-strand-specific endonuclease (Figure 2C). The resulting fragment possessed a blunt 3′ end, compatible with the EcoRV restriction site in the target vector. Finally, the 5′ end of the insert was prepared by digestion with BamHI, and the resulting fragment was ligated into pcDNA5-FRT/TO-C-His8-Flag pre-cut with BamHI and EcoRV.
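The two cloning routes described above can be illustrated with a minimal string-level sketch. It models only the top strand and uses the standard cut positions of BamHI (G^GATCC), EcoRV (GAT^ATC), and AvrII (C^CTAGG). The example construct, the function names, and the short toy ORF are hypothetical and are not taken from the actual synthetic constructs. Whether the reading frame continues correctly into the His8-Flag tag depends on the vector sequence, which is not modeled here.

```python
BAMHI = "GGATCC"  # cuts G^GATCC, leaving a 5' GATC overhang
ECORV = "GATATC"  # cuts GAT^ATC, leaving blunt ends
AVRII = "CCTAGG"  # cuts C^CTAGG, leaving a 5' CTAG overhang

# Hypothetical toy construct (NOT a real sequence from this work):
# flank + BamHI + Kozak + ORF whose TAG stop overlaps AvrII + spacer + EcoRV + flank
EXAMPLE = "TT" + BAMHI + "GCCACC" + "ATGGAAGTGG" + AVRII + "AA" + ECORV + "TT"


def _sites(construct: str, downstream_site: str):
    """Return start indices of the BamHI site and the chosen downstream site."""
    bam = construct.find(BAMHI)
    down = construct.rfind(downstream_site)
    if bam == -1 or down == -1 or bam >= down:
        raise ValueError("expected BamHI upstream of " + downstream_site)
    return bam, down


def bamhi_ecorv_insert(construct: str) -> str:
    """Top strand of the BamHI/EcoRV fragment (untagged and N-terminal cloning)."""
    bam, eco = _sites(construct, ECORV)
    # BamHI cuts after the first G; EcoRV cuts after GAT. The stop codon is retained.
    return construct[bam + 1 : eco + 3]


def c_terminal_insert(construct: str) -> str:
    """Top strand after AvrII cleavage, overhang removal, and BamHI digestion."""
    bam, avr = _sites(construct, AVRII)
    # AvrII cuts the top strand after the first C of the site. The 5' CTAG
    # overhang left on the bottom strand is single-stranded, so nuclease
    # treatment trims the duplex back to that C and the stop codon is lost.
    return construct[bam + 1 : avr + 1]


if __name__ == "__main__":
    print(bamhi_ecorv_insert(EXAMPLE))  # stop codon (TAG) still present
    print(c_terminal_insert(EXAMPLE))   # blunt end, stop codon removed
```

In this toy example the C-terminal fragment ends at the first C of the AvrII site, so the TAG stop codon no longer separates the ORF from a downstream tag.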
In total, we generated 54 viral protein expression vectors, and 3 additional GFP expression vectors as controls. These constructs, together with the parental tagging vectors, are listed in Table 2. Each construct was introduced into Flp-In T-REx cells (Thermo Fisher) by transfection, followed by hygromycin treatment for 10-16 days to select for chromosomal integration (Figure 2E). In the initial experiments, all constructs except the Nsp1 series and the GC3-optimized (high-expression) versions of untagged N and FH-N protein yielded stable hygromycin-resistant cells (Table 2, Figure S1). Quantitation is shown in Figure S2, and associated raw data are presented in Table S3. Clear protein expression was observed for 29 of the 34 tagged constructs (Table 2, column 5). In general, C-terminally tagged proteins were more highly expressed than their N-terminally tagged counterparts.

As described above, we were initially unable to generate stable cell lines for any of the Nsp1 constructs. To confirm that Nsp1 could be expressed in cells, we transiently transfected each construct into 293FiTR cells and assessed protein expression by western blot. Only the C-terminally tagged Nsp1 showed robust expression (Figure S3).

To allow absolute quantification of SARS-CoV-2 protein expression, we used the N-terminally tagged N protein (integrated from plasmid 22; moderate expression) as a reference standard. Cells containing integrated FH-N were treated with doxycycline for 0, 6, and 18 hours. Protein was extracted, separated by SDS-PAGE, and analyzed using mass spectrometry with label-free quantification. To compare proteins, their abundance was expressed as a percentage of the total proteome. This value was calculated using the relative, intensity-based absolute quantification (riBAQ) score for each protein, which is the iBAQ score for a given protein divided by the sum of the iBAQ scores for all proteins.
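The riBAQ calculation described above amounts to normalizing each iBAQ score by the sum over all quantified proteins. The following minimal sketch shows that arithmetic. The function name and the toy iBAQ values are hypothetical and are not data from this work.

```python
from typing import Dict


def ribaq_percent(ibaq: Dict[str, float]) -> Dict[str, float]:
    """Express each protein's iBAQ score as a percentage of the summed iBAQ."""
    total = sum(ibaq.values())
    if total <= 0:
        raise ValueError("summed iBAQ must be positive")
    return {protein: 100.0 * score / total for protein, score in ibaq.items()}


if __name__ == "__main__":
    # Hypothetical iBAQ scores for a three-protein toy proteome
    toy_ibaq = {"protein_A": 2.0, "protein_B": 6.0, "protein_C": 12.0}
    for protein, pct in ribaq_percent(toy_ibaq).items():
        print(f"{protein}: {pct:.1f}% of total")
```

Because the percentages are relative to everything quantified in the same sample, they always sum to 100 and are only comparable between samples processed in the same way.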
After induction for six hours, N protein comprised 0.022% of the cellular proteome (Figure 3A). Because all of the tagged proteins possessed an identical FLAG epitope, aliquots of this 6 h N protein sample could then be used as a reference standard on all subsequent western blots (e.g. lane 1 in Figure 3B), to allow similar abundance estimates for the other viral proteins (Figure S2 and Table S3). This approach allows viral protein levels to be carefully titrated, so that protein induction approximately matches physiological conditions. This is important because different viral proteins show extreme differences in expression during infection. The nucleocapsid (N) protein can represent 2% of total protein, whereas the non-structural proteins may be 2 to 3 orders of magnitude less abundant (Finkel et al, 2020; Grenga et al, 2020). Moreover, all proteins will vary in their abundance throughout the course of infection.

In summary, we report the construction of 57 expression vectors and cell lines. These plasmids can be obtained from Addgene and cell lines are available. We expect that these collections will be a valuable resource for future research into the mechanisms by which coronavirus exploits the genetic machinery of its host to facilitate its own replication.

All viral genes were cloned into each of three parental vectors: pcDNA5-FRT-TO (to generate an untagged version of the protein), pcDNA5-FRT-TO-N-Flag-His8 (N-terminal tag), and pcDNA5-FRT-TO-C-His8-Flag (C-terminal tag). The N-terminal tag consisted of a single Flag motif, a four-alanine spacer, eight consecutive histidine residues, and a short unstructured linker (DYKDDDDKAAAAHHHHHHHHGSG). The C-terminal tag contained essentially the same elements in reverse order (SGGHHHHHHHHAAAADYKDDDDK).
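The modular layout of the two tags can be made explicit with a short sketch that assembles them from their parts and reproduces the sequences given above. The constant and function names are hypothetical, and the handling of the initiator methionine and of any residues contributed by the cloning junctions is not modeled.

```python
FLAG = "DYKDDDDK"      # single Flag motif
SPACER = "A" * 4       # four-alanine spacer
HIS8 = "H" * 8         # eight consecutive histidines
N_LINKER = "GSG"       # linker between the N-terminal tag and the protein
C_LINKER = "SGG"       # linker between the protein and the C-terminal tag

FH_TAG = FLAG + SPACER + HIS8 + N_LINKER   # N-terminal Flag-His8 tag
HF_TAG = C_LINKER + HIS8 + SPACER + FLAG   # C-terminal His8-Flag tag


def tag_protein(protein: str, position: str) -> str:
    """Return the protein sequence fused to the FH (N) or HF (C) tag."""
    if position == "N":
        return FH_TAG + protein
    if position == "C":
        return protein + HF_TAG
    raise ValueError("position must be 'N' or 'C'")


if __name__ == "__main__":
    print(FH_TAG)
    print(HF_TAG)
    print(tag_protein("MSDNGPQ", "C"))  # hypothetical short peptide for illustration
```

Note that the C-terminal tag reverses the order of the modules rather than the residue order within each module, so the Flag epitope reads DYKDDDDK in both tags.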
To generate the Flag-His8 tag, we first designed partially complementary DNA oligonucleotides

The pcDNA5-FRT-TO-C-His8-Flag sequence was generated as described above for the N-terminal tagging vector, with the following changes: 1) the oligos used for hybridization were oSB709 and oSB710, and 2) the insert and pcDNA5-FRT-TO were digested with 40 U of EcoRV-HF (NEB) and 40 U of XhoI (NEB).

pcDNA5-FRT-TO and pcDNA5-FRT-TO-N-Flag-His8 were each digested in a 50 µL reaction consisting of 2 µg DNA, 1X CutSmart buffer, 20 U of BamHI-HF, and 20 U of EcoRV at 37 °C for two hours. In parallel, 2 µg of plasmid (KanR) containing the desired viral gene was digested under identical conditions. All three reactions were purified using a PCR cleanup kit. Subsequently, the acceptor plasmids were phosphatase treated as described above and again purified using a PCR cleanup kit. Digested vector and insert were ligated together in a 10 µL reaction consisting of 40 ng vector, 120 ng insert, 50 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 1 mM ATP, 10 mM DTT, and 400 U of T4 DNA ligase (NEB). The ligation mix was transformed into homemade DH5α E. coli and plated overnight on LB-Amp. Colony PCR was performed using oAH195-196 to verify the presence of the insert, and positive clones were confirmed by Sanger sequencing.

To prepare the backbone, 2 µg of pcDNA5-FRT-TO-C-His8-Flag was digested with BamHI-HF and EcoRV-HF, as above, followed by DNA purification. To prepare viral gene inserts, 1 µg of pUC containing the relevant gene was initially digested with AvrII under the same reaction conditions, followed by purification. The 5′ overhang, encoding the stop codon, was then removed by digestion with Mung Bean Nuclease (New England Biolabs): 1 U of Mung Bean Nuclease was added to 1 µg of AvrII-digested plasmid in 30 µL of 1X CutSmart buffer, and incubated for 30 minutes at 30 °C. After purification, the final insert was produced by digestion with BamHI-HF.
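Ligation inputs are given above as masses (40 ng vector, 120 ng insert), whereas ligation stoichiometry is usually reasoned about as a molar ratio. The conversion depends only on the relative lengths of the two fragments, as in the minimal sketch below. The fragment lengths used in the example are hypothetical placeholders, not the actual sizes of the vectors or inserts in this work.

```python
def molar_ratio(insert_ng: float, insert_bp: int,
                vector_ng: float, vector_bp: int) -> float:
    """Insert:vector molar ratio for double-stranded fragments of given mass and length."""
    # For dsDNA, moles are proportional to mass / length, so the constant
    # (average mass per base pair) cancels in the ratio.
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def insert_mass_for_ratio(ratio: float, insert_bp: int,
                          vector_ng: float, vector_bp: int) -> float:
    """Insert mass (ng) needed to reach a target insert:vector molar ratio."""
    return vector_ng * ratio * insert_bp / vector_bp


if __name__ == "__main__":
    # Hypothetical lengths: 5000 bp backbone, 1000 bp or 1250 bp insert
    print(molar_ratio(insert_ng=120, insert_bp=1000, vector_ng=40, vector_bp=5000))
    print(insert_mass_for_ratio(ratio=4, insert_bp=1250, vector_ng=40, vector_bp=5000))
```

With a fixed mass recipe, shorter inserts end up at a higher molar excess over the vector than longer ones, which is one reason to compute the ratio per construct.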
The insert was then ligated into the backbone after purification, following the procedure described above, with a 4:1 molar ratio of insert to backbone.

The eGFP insert was amplified using pEGFP-N2 (Clontech) as a template and the DNA oligonucleotides oAH211 and oAH212 (untagged and N-terminal cloning) or oAH211 and oAH213 (C-terminal cloning). Subsequently, the PCR-generated inserts were cloned into pcDNA5-FRT-TO, pcDNA5-FRT-TO-N-Flag-His8, and pcDNA5-FRT-TO-C-His8-Flag as described above for the viral constructs.

Generation of stable cell lines

HEK293 Flp-In T-REx cells (Thermo Fisher) were cultured at 37 °C with 5% CO2 in DMEM (Thermo Fisher) supplemented with 10% tetracycline-tested FBS (Sigma), 100 µg/mL Zeocin (Thermo Fisher), and 15 µg/mL Blasticidin S (Sigma). Approximately 1 × 10⁶ cells were seeded without antibiotics on six-well plates 24 hours prior to transfection. The following day, the viral expression constructs and pOG44 (encoding the Flp recombinase) were co-transfected in a 1:9 ratio (1 µg total) using Lipofectamine 2000 (Thermo Fisher) according to the manufacturer's protocol. The medium was replaced approximately five hours later to remove the transfection reagents. The next day, the cells were split to a 10 cm plate, and after an additional 24 hours, hygromycin B (150 µg/mL) and blasticidin S (15 µg/mL) were added to the medium. Stable integrants were selected over the course of 10-16 days, with medium replacement at regular intervals. Thereafter, stable cell lines were maintained in hygromycin B and blasticidin S. For the FH-N constructs that initially yielded no colonies, transfection was repeated with the addition of pcDNA6/TR (Invitrogen), encoding the tetracycline repressor protein: total transfected DNA was kept at 1 µg, with pcDNA5/FRT-TO, pcDNA6/TR, and pOG44 used at a ratio of 1:4.5:4.5.

Approximately 2 × 10⁵ cells were seeded without antibiotics on 24-well plates.
The following day, 0.2 µg of FH-Nsp1 or Nsp1-HF was transfected into cells using Lipofectamine 2000 (Thermo Fisher) according to the manufacturer's protocol.

For induction tests, 1-2 × 10⁵ cells were plated into 6 wells each of a 24-well plate.

in Tris-Glycine running buffer at 100 V. Subsequently, the gel was rinsed with water, stained for 1 hour with Imperial Protein Stain (Thermo Scientific), rinsed several times, and destained in water for three hours. Each lane was cut into four fractions and processed using in-gel digestion and the STAGE tip method, as previously described (Bresson et al, 2020; Rappsilber et al, 2007).

The TT, and AH conceived the project. SB, NR, AH, ES, TT, and DT analyzed data and wrote the manuscript. All authors edited and reviewed the manuscript.

The authors declare no competing interests.

Table S1. Oligonucleotides used in this work.
Table S2. Sequences of fusion protein ORFs.
Table S3. Raw data for Figure S2.
Table S4. Mass-spectrometry data for N expression.

Western analyses of fusion protein expression.
The severe acute respiratory syndrome coronavirus Nsp15 protein is an endoribonuclease that prefers manganese as a cofactor
Stress-induced translation inhibition through rapid displacement of scanning initiation factors
A proposed role for the SARS-CoV-2 nucleocapsid protein in the formation and regulation of biomolecular condensates
Biochemical and structural insights into the mechanisms of SARS coronavirus RNA ribose 2'-O-methylation by nsp16/nsp10 protein complex
Infidelity of SARS-CoV Nsp14-exonuclease mutant virus replication is revealed by complete genome sequencing
The severe acute respiratory syndrome-coronavirus replicative protein nsp9 is a single-stranded RNA-binding subunit unique in the RNA virus world
The coding capacity of SARS-CoV-2
Severe acute respiratory syndrome coronavirus ORF6 antagonizes STAT1 function by sequestering nuclear import factors on the rough endoplasmic reticulum/Golgi membrane
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high throughput analysis of cDNAs
Shotgun proteomics analysis of SARS-CoV-2-infected cells and how it can optimize whole viral particle antigen production for vaccines
Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses
SARS coronavirus nsp1 protein induces template-dependent endonucleolytic cleavage of mRNAs: viral mRNAs are resistant to nsp1-induced RNA cleavage
A high ATP concentration enhances the cooperative translocation of the SARS coronavirus helicase nsP13 in the unwinding of duplex
Structural plasticity of SARS-CoV-2 3CL M(pro) active site cavity revealed by room temperature X-ray crystallography
Structural analysis of the SARS-CoV-2 methyltransferase complex involved in coronaviral RNA cap creation
High guanine and cytosine content increases mRNA levels in mammalian cells
The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway
Crystal Structure of the SARS-CoV-2 Non-structural Protein 9
Severe acute respiratory syndrome coronavirus protein nsp1 is a novel eukaryotic translation inhibitor that represses multiple steps of translation initiation
Structural basis and functional analysis of the SARS coronavirus nsp14-nsp10 complex
Codon Usage and Splicing Jointly Influence mRNA Localization
SARS-CoV 3CL protease cleaves its C-terminal autoprocessing site by novel subsite cooperativity
The enzymatic activity of the nsp14 exoribonuclease is critical for replication of Middle East respiratory syndrome-coronavirus
The PRIDE database and related tools and resources in 2019: improving support for quantification data
Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips
Defining the RNA Interactome by Total RNA-Associated Protein Purification
Release of severe acute respiratory syndrome coronavirus nuclear import block enhances host transcription in human lung cells
The nsp9 Replicase Protein of SARS-Coronavirus
Severe acute respiratory syndrome coronavirus nsp1 facilitates efficient propagation in cells through a specific translational shutoff of host mRNA
The severe acute respiratory syndrome (SARS) coronavirus NTPase/helicase belongs to a distinct class of 5' to 3' viral helicases
The SARS-coronavirus nsp7+nsp8 complex is a unique multimeric RNA polymerase capable of both de novo initiation and primer extension
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2
Identification and Characterization of a Human Coronavirus 229E Nonstructural Protein 8-Associated RNA 3'-Terminal Adenylyltransferase Activity
Structural Basis of RNA Cap Modification by SARS-CoV-2 Coronavirus
SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists

Quantitation of western data from Figure S1. Panel label: loading control.
Figure S3. Transient transfection of FH-Nsp1 (derived from plasmid #12) and Nsp1-HF (derived from plasmid #39). Protein expression was assessed following 3 to 24 h induction. Lanes: FH-Nsp1 and Nsp1-HF, each after 3, 6, 12, and 24 h of Nsp1 induction.