key: cord-0000339-sbnnh2mm authors: Pellet, J.; Tafforeau, L.; Lucas-Hourani, M.; Navratil, V.; Meyniel, L.; Achaz, G.; Guironnet-Paquet, A.; Aublin-Gex, A.; Caignard, G.; Cassonnet, P.; Chaboud, A.; Chantier, T.; Deloire, A.; Demeret, C.; Le Breton, M.; Neveu, G.; Jacotot, L.; Vaglio, P.; Delmotte, S.; Gautier, C.; Combet, C.; Deleage, G.; Favre, M.; Tangy, F.; Jacob, Y.; Andre, P.; Lotteau, V.; Rabourdin-Combe, C.; Vidalain, P. O. title: ViralORFeome: an integrated database to generate a versatile collection of viral ORFs date: 2009-12-08 journal: Nucleic Acids Res DOI: 10.1093/nar/gkp1000 sha: f87d2665175ce7e1c6ffd14e0da7bfc21b784e2e doc_id: 339 cord_uid: sbnnh2mm Large collections of protein-encoding open reading frames (ORFs) established in a versatile recombination-based cloning system have been instrumental to study protein functions in high-throughput assays. Such ‘ORFeome’ resources have been developed for several organisms but in virology, plasmid collections covering a significant fraction of the virosphere are still needed. In this perspective, we present ViralORFeome 1.0 (http://www.viralorfeome.com), an open-access database and management system that provides an integrated set of bioinformatic tools to clone viral ORFs in the Gateway® system. ViralORFeome provides a convenient interface to navigate through virus genome sequences, to design ORF-specific cloning primers, to validate the sequence of generated constructs and to browse established collections of virus ORFs. Most importantly, ViralORFeome has been designed to manage all possible variants or mutants of a given ORF so that the cloning procedure can be applied to any emerging virus strain. A subset of plasmid constructs generated with ViralORFeome platform has been tested with success for heterologous protein expression in different expression systems at proteome scale. 
ViralORFeome should provide our community with a framework to establish a large collection of virus ORF clones, an instrumental resource to determine functions, activities and binding partners of viral proteins.

The number of viral genomic sequences available in public databases has increased exponentially, opening new perspectives for understanding the genetic basis and functional mechanisms that underlie virus replication, pathogenesis and evolution. In particular, this has made it possible to establish, for each virus, a list of potential regulatory and expressed sequences, a framework often referred to as the 'parts list' of biological systems (1). Current investigations aim at understanding how these viral components act upon each other and interact with host macromolecules to carry out viral replication and spreading. To reach such a system-level view of virus cycles, more functional analyses of viral components are necessary, especially in the field of virus-host molecular interactions (2, 3). To address this need, a large collection of viral open reading frames (ORFs) established in a recombination-based cloning system that allows their mass transfer into various functional assays would be extremely helpful. Such ORF collections, often referred to as 'ORFeomes', have been developed for human and a few other organisms and represent great resources to explore protein functions in a large-scale setting (4-9). Recombination-based cloning technologies like the Gateway® system enable the mass cloning of polymerase chain reaction (PCR)-amplified ORFs into a 'donor' vector to create 'entry' clones. Once entry clones have been established, ORFs can be easily recombined into different 'destination' vectors that allow protein expression. So far, only a few viral ORFeomes have been built and each is dedicated to a single virus, indicating that much more needs to be achieved (10-13). Because of their small genomes, most viruses exhibit a limited number of ORFs.
Therefore, building a viral ORFeome collection covering a significant number of viral pathogens can appear to be a manageable project. However, it becomes a daunting task when considering the virus strains and polymorphisms that affect viral proteins (Table 1). Nonetheless, these variations cannot be neglected since they often determine virulence and/or host adaptation. For example, a single amino acid mutation in the polymerase of poliovirus can alter its processivity, turning a highly pathogenic virus into an attenuated strain that can eventually be used as a vaccine (14). Similarly, a Semliki Forest virus strain encoding an nsP2 protein with a single amino acid mutation in its nuclear localization signal is impaired for the control of the type I interferon response and strongly attenuated in vivo (15). Thus, specific care must be taken when cloning a virus ORFeome, since an accurate identification of the strain that is used as a template is absolutely critical, both in terms of genotype and phenotype. For this reason, wild-type virus strains corresponding to primary isolates will generally be preferred to culture-adapted laboratory strains, which often exhibit an altered phenotype in vivo. Such biological samples are often difficult to obtain, since obtaining them requires access to patients, medical-care facilities and laboratories of an appropriate biosafety level. With such constraints on obtaining suitable viral RNA or DNA templates, a collaborative effort between virology laboratories is needed to build a comprehensive viral ORFeome resource. This motivated the development of ViralORFeome 1.0, an open-access database and management system designed to assist academic laboratories in the development of a viral ORFeome collection using a recombination-based cloning technology.

ViralORFeome is a database for creating and managing viral ORF clones. It includes a web interface to search and display viral sequences and annotations downloaded from GenBank.
When users have identified the viral ORF sequences that they want to clone, clicking on target sequences automatically creates virtual clones in the ViralORFeome database together with suitable cloning primers. These primers are designed for PCR amplification and cloning of viral ORFs using the Gateway® system. To determine whether 'entry' clones produced at the bench exhibit the expected sequences, sequence traces can be uploaded to ViralORFeome, automatically aligned, and compared to the corresponding virtual clones. Finally, the ViralORFeome database allows users to manage their collection of viral ORF clones, keep track of expression plasmids derived from 'entry' clones and share plasmids with other laboratories.

System architecture and data flow are depicted in Figure 1. The World Wide Web server for ViralORFeome is Apache (http://www.apache.org/) with Hypertext Preprocessor (PHP, http://www.php.net) and Asynchronous JavaScript and XML (AJAX) technologies. The relational database management system for ViralORFeome is PostgreSQL (http://www.postgresql.org/). The ViralORFeome Entity-Relationship (ER) model is divided into five interconnected parts: the 'Taxonomy', 'Genome', 'ORFeome', 'Interactome' and 'Users' schemes (Supplementary Figure S1).

ViralORFeome 1.0 is based on conventional sequence and virus databases. Virus and host organism classification was retrieved from the taxonomy database at the National Center for Biotechnology Information [NCBI; ftp://ftp.ncbi.nih.gov/pub/taxonomy/; (16)]. Complete viral CDS and associated annotations were downloaded from the CoreNucleotide division of GenBank [http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide; (17)] using the Entrez Programming Utilities (eUtils) at NCBI (see Supplementary Data). ICTV names were obtained from the most recent publication of the International Committee on Taxonomy of Viruses using Perl scripts (18). Finally, host gene and protein sequence annotations were extracted from the Ensembl database (19).
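The GenBank retrieval step described above can be illustrated with a short sketch. It only builds a request URL for the EFetch utility of the NCBI E-utilities, assuming the current public endpoint and the `nucleotide` database; the function name `build_efetch_url` and the accession used in the usage note are illustrative placeholders, and the scripts actually used for ViralORFeome are described in its Supplementary Data, not here.

```python
from urllib.parse import urlencode

# Current public EFetch endpoint (an assumption of this sketch; the original
# download scripts may have used a different URL or utility).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, rettype="gb", retmode="text"):
    """Return an EFetch URL requesting GenBank flat files for the accessions."""
    if not accessions:
        raise ValueError("at least one accession is required")
    params = {
        "db": "nucleotide",           # GenBank nucleotide records
        "id": ",".join(accessions),   # comma-separated list of identifiers
        "rettype": rettype,           # "gb" requests the GenBank flat-file format
        "retmode": retmode,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"
```

Calling `build_efetch_url(["NC_001547"])` (a placeholder accession) returns a URL that can be passed to any HTTP client; the returned flat file carries the CDS features and annotations that a database such as ViralORFeome would then parse and store.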
The ViralORFeome database can be accessed at http://www.viralorfeome.com. When the 'Virus' option is selected in the menu bar, the ViralORFeome web interface allows users to navigate through viral sequences retrieved from GenBank by entering criteria such as virus species, taxon IDs or accession numbers (Supplementary Figure S2). ViralORFeome can also execute a BLASTn or BLASTp search among available viral sequences (20). Virus genomes and coding sequences of interest are visualized with an integrated genome browser adapted from GBrowse that provides a synthetic view of genomic sequence features (21). When users have identified a viral ORF sequence that they want to clone, clicking on this sequence automatically generates cloning primers and virtual clones in the ViralORFeome database (Figure 2). ORF-specific Gateway® primers are designed using the Oligonucleotide Selection Program (OSP) software adapted for recombinational cloning (22). By default, primers are designed to clone full-length ORFs from the ATG to the STOP codon according to GenBank annotations. However, users can specify 5′- and 3′-coordinates to clone ORF fragments corresponding to specific domains, or even upload manually designed primers. Once primers have been validated, virtual clones are generated and used as entry points in ViralORFeome to store and access all information relative to the viral ORF constructs (e.g. primers, sequencing traces, potential mutations, and comments). Importantly, when users want to clone a variant of an ORF that is not defined in GenBank, and is therefore not available in ViralORFeome, an interface allows the creation of a 'variant' clone that is anchored to the most similar annotated sequence found by BLAST in the database (Supplementary Figure S3). This ensures maximum flexibility when viral sequences are known and characterized but not readily available in GenBank, a frequent situation when working with primary isolates.
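The default primer design described above (full-length ORF, or a fragment defined by 5′- and 3′-coordinates) can be illustrated with a minimal sketch. It is not the OSP-based procedure used by ViralORFeome: it assumes the commonly used attB1 and attB2 tails of the Gateway BP reaction and a fixed annealing length instead of a melting-temperature criterion, and it ignores reading-frame spacers and stop-codon handling. The names `design_gateway_primers` and `anneal_len` are hypothetical.

```python
# Commonly used attB1/attB2 tails for Gateway BP cloning (assumed here; the
# tails actually used by ViralORFeome are given in its Supplementary Data).
ATTB1_TAIL = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2_TAIL = "GGGGACCACTTTGTACAAGAAAGCTGGGT"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_gateway_primers(orf, start=1, end=None, anneal_len=20):
    """Return (forward, reverse) primers for orf[start..end], 1-based inclusive.

    With the default coordinates the full-length ORF is targeted; other
    coordinates select a fragment such as a single domain.
    """
    end = len(orf) if end is None else end
    if not 1 <= start <= end <= len(orf):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(orf)")
    target = orf[start - 1:end].upper()
    if len(target) < anneal_len:
        raise ValueError("target is shorter than the annealing length")
    # Forward primer: attB1 tail followed by the first bases of the target.
    forward = ATTB1_TAIL + target[:anneal_len]
    # Reverse primer: attB2 tail followed by the reverse complement of the
    # last bases of the target.
    reverse = ATTB2_TAIL + reverse_complement(target[-anneal_len:])
    return forward, reverse
```

A fixed annealing length keeps the sketch short; a real design tool such as OSP additionally screens candidate primers for melting temperature, self-complementarity and primer-dimer formation.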
The 'variant' clone interface also facilitates the management of mutants generated on purpose to study the role of specific residues in a protein's function.

In this recombination-based cloning pipeline, viral ORFs amplified by PCR or RT-PCR are cloned by in vitro recombination into donor vectors such as pDONR207 or pDONR223 (see Supplementary Data for detailed protocols). After transformation, a construct can be purified either from a mini-pool of bacterial colonies, to keep the sequence diversity of the original template (e.g. when working with viral quasi-species), or from a single isolated colony. These two cloning strategies, referred to as 1.0 and 2.0, respectively (5), can both be managed using the ViralORFeome interface (Figure 2). To validate 'entry' constructs by sequencing, ViralORFeome automatically aligns uploaded sequence traces onto virtual clones using either an extended Libalign Perl tool or the Phred and T-Coffee programs, which first build a contig of sequence traces before the alignment is performed (23, 24). Validated 'entry' constructs can subsequently be used to recombine viral ORFs into various Gateway®-compatible destination vectors to achieve protein expression. ViralORFeome allows users to keep track of the different expression constructs that have been established for each viral ORF.

Users can browse the set of viral ORF clones already established in ViralORFeome by selecting the 'clones' option in the menu bar. Queries can be performed by entering criteria such as virus species, GenBank protein accession numbers or ViralORFeome clone IDs. Viral ORF clones can also be searched by nucleotide or protein BLAST. To date, three laboratories involved in the Infection-MAPping project (I-MAP) have used the ViralORFeome interface and generated a collection of 528 viral ORFs cloned into pDONR vectors.
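The sequence validation step, in which sequenced 'entry' constructs are compared to their virtual clones, can be illustrated with a simplified sketch. It does not reproduce the Libalign or Phred/T-Coffee pipeline: it assumes that a consensus read has already been aligned to the virtual clone without gaps, so both strings have equal length, and that the clone starts in frame. The names `Substitution` and `find_substitutions` are hypothetical.

```python
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int   # 1-based nucleotide position in the virtual clone
    codon: int      # 1-based codon index, assuming the clone starts in frame
    expected: str   # base in the virtual clone
    observed: str   # base called in the sequenced construct


def find_substitutions(virtual_clone, consensus_read):
    """List the substitutions between a virtual clone and a pre-aligned read."""
    ref, obs = virtual_clone.upper(), consensus_read.upper()
    if len(ref) != len(obs):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        Substitution(i, (i - 1) // 3 + 1, r, o)
        for i, (r, o) in enumerate(zip(ref, obs), start=1)
        if r != o and o != "N"  # 'N' marks an uncalled base, not a mutation
    ]
```

An empty list indicates that the construct matches the virtual clone at every called base; a non-empty list points to positions that a curator would inspect in the traces before accepting the clone or recording it as a variant.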
Among these 528 clones, 145 have already been stored in a dedicated repository for the viral ORFeome resource (ORFeotheque; Hospices Civils de Lyon), and 134 are available upon request under a material transfer agreement via ViralORFeome. This set of viral ORF clones can be searched by selecting the 'ORFeotheque' option in the menu bar.

To validate the viral ORF clone collection built using the ViralORFeome pipeline, we tested a subset of 66 ORFs (Figure 3a). ORFs were expressed in fusion downstream of the red fluorescent protein Cherry and the constructs were transfected into human cells to determine their subcellular localization (Figure 3b). We observed a specific localization pattern distinct from that of Cherry alone for most constructs, demonstrating that viral ORFs are properly expressed in this system. In addition, even though the N-terminal tag was expected to alter the localization of proteins like viral envelopes, the observed localization patterns were generally consistent with the literature. For example, expression of nsP1 not only from Semliki Forest virus and Sindbis virus (25) but also from Chikungunya virus induced characteristic filopodia-like extensions in transfected cells. The capsid of yellow fever virus (C) was found in the nucleus, where it accumulated in a dot-like pattern reminiscent of nucleoli, as reported for C of dengue and Japanese encephalitis viruses (26, 27). As numerous viral proteins are known to inhibit the host immune response, the 66 ORFs were also tested for their ability to block signaling downstream of two key antiviral cytokines: interferon-β (IFN-β) and tumor necrosis factor-α (TNF-α). Each ORF was expressed in fusion downstream of the 3xFLAG tag and, using luciferase reporter constructs, tested for its ability to inhibit IFN-β or TNF-α signaling (Figure 3c and d). Both pathways were inhibited by nsP2 from Chikungunya and Semliki Forest viruses.
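As a rough illustration of how such a reporter readout is typically computed, the Figure 3 legend states that a co-transfected CMV-Renilla plasmid serves as an internal control and that results are expressed as a percentage of the positive control. The sketch below assumes a firefly-type reporter signal divided by the Renilla signal in each well, then scaled to the stimulated control well; whether background from unstimulated cells was subtracted is not stated in the text, so no subtraction is applied. The function name and argument names are hypothetical.

```python
def relative_activity(reporter, renilla, pos_reporter, pos_renilla):
    """Return reporter activity as a percentage of the stimulated control.

    reporter, renilla         -- raw signals from the well expressing a viral ORF
    pos_reporter, pos_renilla -- raw signals from the stimulated control well
    """
    if renilla <= 0 or pos_renilla <= 0 or pos_reporter <= 0:
        raise ValueError("control signals must be positive")
    # Normalize each well to its internal Renilla control.
    sample_ratio = reporter / renilla
    control_ratio = pos_reporter / pos_renilla
    # Express the sample relative to the stimulated control (100%).
    return 100.0 * sample_ratio / control_ratio
```

With these assumptions, a well giving 500 reporter units and 100 Renilla units, compared with a control giving 2000 and 100, yields 25%, which would indicate strong inhibition of the pathway by the expressed ORF.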
The inhibition of both pathways by nsP2 is consistent with previous reports that used mutant viruses or replicons to demonstrate that nsP2 from Old World Alphaviruses induces a transcriptional shutoff and controls the host antiviral response (15, 28). In contrast, nsP2 derived from a Sindbis virus infectious cDNA clone (29) did not localize in the nucleus and failed to block signaling. This supports previous reports showing that nsP2 from Alphaviruses must be nuclear to control the antiviral response (15). We also confirmed that V proteins of measles and mumps viruses and V and W proteins of Nipah virus block IFN-β signaling (30). Interestingly, the V protein of Tioman virus was unable to do so, suggesting that this virus infecting flying foxes (Pteropus genus) is not adapted to human cells. Altogether, the observed localization patterns and functional assays validate the clone collection that was generated with the ViralORFeome platform. Furthermore, these results illustrate how large collections of viral ORF clones established in a versatile cloning system provide access to reverse proteomic platforms and large-scale functional assays.

Figure 2. Building a viral ORF collection using the ViralORFeome interface. Viral sequences and annotations from GenBank are visualized with a genome browser that provides a synthetic view of sequence features (1). CDS are shown in blue and proteins in green. Users can design a new clone by clicking on a viral protein of interest (2). By default, ViralORFeome will anchor cloning primers at the extremities of selected ORFs (Method 1), but users can specify 5′- and 3′-coordinates and clone ORF fragments corresponding to specific domains. Users can also upload manually designed primers (Method 2). ViralORFeome will automatically design Gateway® cloning primers (3) and, after validation (4), a virtual clone is created in the database (5). Users need to select between two cloning strategies, 1.0 ('in pool') or 2.0 ('individual clone'), before they can access a webpage where all information relative to the construct is stored (6). This includes clone coordinates, primers, sequence and comments (upper panel), sequencing traces and alignments (middle panel), and available entry and destination vectors to achieve viral ORF expression (lower panel). Back in the genome browser (1), viral ORF clones are displayed in red (1.0 constructs) or purple (2.0 constructs).

Figure 3 (legend, continued). A CMV-Renilla plasmid was also co-transfected and used as an internal control for transfection efficiency and cell viability. After transfection, cells were incubated for 24 h in the presence of IFN-β (c) or TNF-α (d) to activate ISRE or NF-κB response elements, respectively. Cells in positions A1 and B1 correspond to negative and positive controls that were, respectively, left untreated or stimulated with IFN-β or TNF-α. Relative luciferase activity was determined using a chemiluminescent substrate, and results are expressed as a percentage of the positive control. Data show one representative experiment out of two.

In conclusion, ViralORFeome is the first open-access database that provides an integrated set of bioinformatic tools to build a collection of viral ORF clones in a versatile system suitable for reverse proteomic experiments. In this perspective, ViralORFeome was especially designed to handle the diversity of virus strains, variants and species. As shown here, our cloning pipeline has been validated using functional assays and a collection of 528 viral ORFs has been generated in the Gateway® system. As collaborative efforts between virology laboratories are required to establish an ORFeome collection covering most viral genera and species, we believe ViralORFeome will provide the community with a framework to sustain this global effort. In the near future, our objective will be to motivate more laboratories to join this program.
In addition, we would like to implement new modules allowing users to store and visualize all kinds of functional data generated with viral ORFs from the collection. This virus clone collection was primarily developed to map virus-host protein-protein interactions using the yeast two-hybrid (Y2H) system as a collaborative effort between co-authors of this manuscript. The current storage and display of virus-host interaction data obtained by Y2H screening of HCV proteins constitute a first attempt to reach this goal. The 'Interaction' option of the menu bar allows users to select HCV ORFs and access all interacting cellular cDNA clones identified by Y2H. Whereas filtered data sets have already been published (12), the ViralORFeome interface allows users to access Y2H raw data and to filter these results according to their own quality criteria. Other functional data, such as subcellular localization or interference with immune response pathways, should be implemented.

Public access to ViralORFeome is available at http://www.viralorfeome.com. Registration is not necessary, and use of the database is free. Users who want to generate viral ORF clones that are tagged with their institutional name can create an account to register.

ORFeome cloning and systems biology: standardized mass production of the parts from the parts-list
From ORFeomes to protein interaction maps in viruses
VirHostNet: a knowledge base for the management and the analysis of proteome-wide virus-host interaction networks
C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression
Human ORFeome version 1.1: a platform for reverse proteomics
Generation of the Brucella melitensis ORFeome version 1.1
Development of a functional genomics platform for Sinorhizobium meliloti: construction of an ORFeome
ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe
The ORFeome of Staphylococcus aureus v 1.1
Epstein-Barr virus and virus human protein interaction maps
Analysis of intraviral protein-protein interactions of the SARS coronavirus ORFeome
Hepatitis C virus infection protein network
Analysis of vaccinia virus-host protein-protein interactions: validations of yeast two-hybrid screenings
Engineering attenuated virus vaccines by controlling replication fidelity
Semliki Forest virus nonstructural protein 2 is involved in suppression of the type I interferon response
The NCBI. Publicly available tools and resources on the Web
Database resources of the National Center for Biotechnology Information
Further progress in ICTVdB, a universal virus database
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
The generic genome browser: a building block for a model organism system database
OSP: a computer program for choosing PCR and DNA sequencing primers
Base-calling of automated sequencer traces using phred. I. Accuracy assessment
T-Coffee: a novel method for fast and accurate multiple sequence alignment
Alphavirus replicase protein NSP1 induces filopodia and rearrangement of actin filaments
Intracellular localization and determination of a nuclear localization signal of the core protein of dengue virus
Nuclear localization of Japanese encephalitis virus core protein enhances viral replication
The Old World and New World alphaviruses use different virus-specific proteins for induction of transcriptional shutoff
Production of infectious RNA transcripts from Sindbis virus cDNA clones: mapping of lethal mutations, rescue of a temperature-sensitive marker, and in vitro mutagenesis to generate defined mutants
Inhibition of interferon induction and signaling by paramyxoviruses
pISTil: a pipeline for yeast two-hybrid Interaction Sequence Tags identification and analysis

We thank all members of the PF1-Pasteur Genopole sequencing core facility, in particular C. Bouchier and C. Gouyette. We thank Drs Marie-Louise Michel, Charles M. Rice, Kaoru Takeuchi, T. Fabian Wild and Ali Amara for providing RNA or DNA templates used to build viral ORF clones. We also thank Eric Coissac for providing SQL codes.

The Institut National de la Santé et de la Recherche Médicale; the Institut National de la Recherche Agronomique; the Association de la Recherche contre le Cancer (3731XA0531F and 4867); the Ligue Nationale Contre le Cancer (RS07/75-75); the Agence Nationale de la Recherche (EPI-HPV-3D); the French Ministry of Industry; the Institut Pasteur; the Centre National de la Recherche Scientifique (Maladies Infectieuses Emergentes to P.O.V., G.C. and F.T.). Funding for open access charge: Institut Pasteur.

Supplementary Data are available at NAR Online.

Conflict of interest statement. None declared.