key: cord-0000338-8y8ho2x5
authors: Bekaert, Michaël; Firth, Andrew E.; Zhang, Yan; Gladyshev, Vadim N.; Atkins, John F.; Baranov, Pavel V.
title: Recode-2: new design, new search tools, and many more genes
date: 2009-09-25
journal: Nucleic Acids Res
DOI: 10.1093/nar/gkp788
sha: fef7a8f796c4068d359d7b65f03ddba1841d1738
doc_id: 338
cord_uid: 8y8ho2x5

'Recoding' is a term used to describe non-standard read-out of the genetic code, and encompasses such phenomena as programmed ribosomal frameshifting, stop codon readthrough, selenocysteine insertion and translational bypassing. Although only a small proportion of genes utilize recoding in protein synthesis, accurate annotation of 'recoded' genes lags far behind annotation of 'standard' genes. In order to address this issue, provide a service to researchers in the field, and offer training data for developers of gene-annotation software, we have gathered together known cases of recoding within the Recode database. Recode-2 is an improved and updated version of the database. It provides access to detailed information on genes known to utilize translational recoding and allows complex search queries, browsing of recoding data and enhanced visualization of annotated sequence elements. At present, the Recode-2 database stores information on approximately 1500 genes that are known to utilize recoding in their expression, approximately three times as many as in the previous version of the database. Recode-2 is available at http://recode.ucc.ie.

The term 'translational recoding' describes the utilization of non-standard decoding during protein synthesis and encompasses such processes as ribosomal frameshifting, codon redefinition, translational bypassing and StopGo (1-7). What is often considered a decoding error (e.g. a frameshifting error or mistranslation of a particular codon) may occasionally benefit the organism by increasing its fitness and survival.
In such instances the propensity for the decoding 'error' may be selected for during evolution, leading to the formation of a particular sequence context that elevates the frequency of the 'error'. To discriminate such cases of programmed decoding 'misbehaviour' from promiscuous translational errors or translational noise, the term recoding is used. The position within an mRNA where a recoding event takes place is termed the 'recoding site'. Sequence elements responsible for increasing the efficiency of recoding events are termed 'recoding stimulatory signals', and a minimal sequence fragment that allows recoding to take place at the natural efficiency (i.e. relative to the level of standard decoding at the recoding site) is termed a 'recoding cassette'.

Recoding can benefit gene expression in a number of ways. It can regulate gene expression by being part of a sensor for particular cellular conditions. Prominent examples include ribosomal frameshifting in bacterial release factor 2 (RF2) and eukaryotic antizyme mRNAs. In both instances, ribosomal frameshifting is required for the production of the corresponding active full-length protein products. In the RF2 mRNA, the efficiency of frameshifting is negatively regulated by the cellular concentration of its product, RF2, providing an autoregulatory circuit for its biosynthesis (8-10). In the antizyme mRNA, the efficiency of frameshifting is modulated by cellular levels of polyamines, whose concentration in turn is controlled by antizyme (11, 12). Thus, this mechanism ensures the maintenance of antizyme production at the levels required to support physiologically appropriate concentrations of polyamines.

Recoding can also be used for the diversification of protein products encoded by a single gene.
An illustrative example is bacterial dnaX mRNA, where frameshifting allows synthesis of two different protein subunits, sharing the same N-terminal part, from a single open reading frame (ORF) in its mRNA (13-15). A presumed constant ratio of frameshifting in dnaX ensures a fixed stoichiometric balance between these two subunits (16). This balance, then, is independent of the absolute levels of dnaX transcription and translational initiation on its mRNA. Similarly, in many viruses recoding is responsible for setting a ratio between protein products (such as those encoded by gag-pro-pol genes in retroviruses) produced from a single mRNA (17). Recoding also provides RNA viruses with a mechanism for the translation of downstream ORFs on polycistronic RNAs [other mechanisms include leaky scanning, shunting, reinitiation, IRESs and the production of subgenomic RNAs (18)] and may also be involved in global regulation mechanisms, such as mediating the switch between translation and replication on the same genomic RNA (19). Finally, recoding provides a way to incorporate non-standard amino acids, e.g. amino acids that share their codons with termination signals (the most prominent example of which is selenocysteine, encoded by UGA) (20-22). For further information on the diverse variety of recoding functions, see recent reviews (1, 3, 7, 23, 24).

Recoding cassettes may be composed of a variety of diverse sequence elements. For example, primary nucleotide sequences may promote re-arrangements of tRNA molecules relative to their codons in mRNA inside the ribosome or affect recognition of tRNAs or release factors in the ribosomal A-site. On the other hand, many recoding signals act in the form of RNA secondary structures, such as simple stem-loops, or more complex pseudoknots, kissing stem-loops and other structures that involve interactions between considerably distant RNA regions (19, 25-28).
Trans-acting signals that affect ribosomal decoding, either RNA signals acting through complementary interactions with ribosomal RNA (29-32) or the nascent peptide acting within the ribosome exit tunnel (6, 33, 34), are also known. Some recoding events, such as selenocysteine insertion, require the presence of additional specialized machinery such as selenocysteine tRNAs, selenocysteine-specific translation factors and several other components of the selenocysteine biosynthesis and insertion pathway (20, 35-37). Recent reviews on stimulatory signals involved in the modulation of recoding events and molecular mechanisms of recoding provide further details (7, 25, 27, 38, 39).

Despite considerable progress in the development of computational tools for the prediction of protein coding genes in sequenced genomes, the identification and annotation of recoded genes lag far behind. The hurdle lies not so much in the fact that recoded genes do not obey standard rules of genetic readout but, rather, in the considerable diversity of recoded genes and sequence elements responsible for recoding. Even among evolutionarily related genes, all utilizing recoding, the diversity of recoding signals can be considerable. An extreme case arises when orthologous genes utilize recoding at different stages of gene expression to achieve the same goal. For instance, in dnaX, ribosomal frameshifting is employed by enterobacteria, but transcriptional slippage is used in Thermus thermophilus (40). A similar situation occurs in bacterial insertion sequence (IS) elements, where a certain group of IS elements utilizes transcriptional slippage to produce ORFA-ORFB fusions, while many other IS elements utilize ribosomal frameshifting for the same purpose (41). The diversity of recoding functions, combined with the wide spectrum of unrelated sequence elements involved in recoding, makes the design of a uniform model of recoding intractable.
Nonetheless, in recent years, we have witnessed the development of specialized models and computational tools for the identification of particular subsets of recoding cassettes, or tools that are specific to recoding events in particular groups of homologous genes (42-45). These developments were facilitated, at least in part, by the availability of a compiled dataset of known recoded genes in the Recode database (http://recode.genetics.utah.edu), which was initially launched 9 years ago (46, 47).

To facilitate further development of computational tools for the prediction of recoded genes in the ever faster growing body of sequence data, as well as to provide bench researchers with up-to-date information on recoding, efficient means of populating and annotating the Recode database are now required. In this article, we describe the new incarnation of the database, Recode-2. The major advances of Recode-2 (hosted in a new location, http://recode.ucc.ie) over previous versions include a new web design allowing enhanced visualization of stimulatory signals, a uniform RecodeML format for the annotation of recoded genes, and a significantly larger number of entries, including many recently identified cases, that altogether have more than doubled the size of the database since its last published update.

The data are stored in a local PostgreSQL database that is queried by PHP scripts embedded in the web interface. The schema of the PostgreSQL database is shown in Figure 1. The database stores information on individual genes that utilize recoding, the mechanisms and stimulatory signals involved, and references to the original literature sources that describe the recoding events. In order to facilitate the uniform annotation of recoding events, we have designed an XML-based format for the annotation of recoded genes, RecodeML.
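As a rough picture of the kind of record described above (a gene, its organism, the recoding mechanism, stimulatory signals and literature references), the following minimal Python sketch defines a record type and a keyword filter. It is neither the actual PostgreSQL schema of Figure 1 nor the RecodeML DTD: the class name `RecodingEntry`, its field names and the `search` helper are hypothetical, and the example values are drawn only from genes mentioned in this article.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecodingEntry:
    """Hypothetical record; field names are illustrative, not the real schema."""
    gene: str
    organism: str
    kingdom: str
    recoding_type: str
    signals: List[str] = field(default_factory=list)
    references: List[int] = field(default_factory=list)  # citation numbers used in this article


def search(entries, pattern):
    """Return the entries in which any text field matches the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        e for e in entries
        if any(rx.search(text)
               for text in (e.gene, e.organism, e.kingdom, e.recoding_type, *e.signals))
    ]


# Illustrative entries assembled from examples discussed in the text.
ENTRIES = [
    RecodingEntry("RF2", "Escherichia coli", "bacteria",
                  "ribosomal frameshifting", references=[8, 9, 10]),
    RecodingEntry("oaz1", "Homo sapiens", "eukaryotes",
                  "ribosomal frameshifting", references=[11, 12]),
    RecodingEntry("dnaX", "Escherichia coli", "bacteria",
                  "ribosomal frameshifting", signals=["stem-loop"],
                  references=[13, 14, 15, 16]),
    RecodingEntry("dnaX", "Thermus thermophilus", "bacteria",
                  "transcriptional slippage", references=[40]),
]
```

With these toy entries, `search(ENTRIES, r"slippage")` returns only the Thermus thermophilus dnaX record, while `search(ENTRIES, r"^dnax$")` returns both dnaX records.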
The document type definition for RecodeML is available at the Recode-2 web site at http://recode.ucc.ie/dtd. The extensibility of the RecodeML format will allow new annotation to be incorporated, if required, for newly discovered types of recoding and their associated features. The database handles batch importation of properly designed RecodeML entries into the PostgreSQL database, thus facilitating rapid population of the database with new data.

The data in the database may be explored in two ways. They may be browsed by one of three categories: kingdom (archaea, bacteria, eukaryotes and viruses), organism and type of recoding. The data may also be searched directly by keywords entered into the search field. Searches that use regular expressions are allowed. The output of a database search is a list of Recode-2 entries in a short format that includes organism name, kingdom, genus, type of recoding event, status of

Figure 2 shows an example of sequence annotation for the human oaz1 gene, alongside a diagram of a stimulatory RNA secondary structure, and the Recode-1 logo.

Unlike Recode-1, where all data on recoding events were entered manually, Recode-2 also utilizes automated identification of recoding events by the recently developed computer programs ARFA (43) and OAF (44), which identify and annotate +1 frameshifting events in mRNAs of bacterial RF2s and eukaryotic antizymes (OAZs), respectively. However, serendipitous discoveries in experimental studies, sometimes complemented by more systematic studies of large groups of similar genes (51, 52), remain a significant source of recoding events. Therefore, a large proportion of new data are still entered manually or semi-manually. To ease manual entry of recoding events, a special form has been designed that is available in the database upon user registration. User registration needs to be approved by one of the database contributors.
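To make the notion of a +1 frameshifting event concrete, here is a minimal Python sketch that translates a toy mRNA with and without a programmed shift. It is an illustration only, not the algorithm used by ARFA or OAF: the function `translate`, its parameters `shift_at` and `shift`, and the sequence `TOY_MRNA` are assumptions made for this example. The toy sequence is modelled on the CUU UGA C shift site characteristic of bacterial RF2 mRNAs, and the shift is simplified to skipping one nucleotide after the codon at the recoding site.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, shift_at=None, shift=0):
    """Translate from index 0 until a stop codon or the end of the sequence.

    If shift_at is given, the reading frame moves by `shift` nucleotides
    immediately after the codon starting at index shift_at has been decoded
    (a simplified model of a recoding site).
    """
    peptide = []
    pos = 0
    while pos + 3 <= len(mrna):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
        if pos == shift_at:
            pos += shift  # programmed change of reading frame
        pos += 3
    return "".join(peptide)


# Hypothetical toy mRNA: AUG GGG UAU CUU UGA CUA CGA CUA A
TOY_MRNA = "AUGGGGUAUCUUUGACUACGACUAA"

standard = translate(TOY_MRNA)                       # stops at the in-frame UGA
shifted = translate(TOY_MRNA, shift_at=9, shift=1)   # +1 shift after the CUU codon
```

In this toy case standard decoding terminates at the in-frame UGA and yields the short product `MGYL`, whereas the +1 shift after CUU bypasses the stop and yields the longer product `MGYLDYD`, mirroring the requirement of frameshifting for full-length protein synthesis described for RF2.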
The novel data in the database include 249 RF2 mRNAs identified by ARFA, 152 events identified by OAF, 200 new selenoprotein genes (53-56) and 200 new viral annotations (57), including the newly discovered frameshift cassettes in potyviruses (58), alphaviruses (59) and the Japanese encephalitis group of flaviviruses (60). The database will expand in accordance with the growth of available sequence information that will be scanned by one of the existing programs for recode annotation. We also plan to continue developing tools for the automatic identification of recoding events from nucleotide sequences. As the field grows and the number of recoded genes progressively increases, it becomes harder to extract data from the relevant literature and a number of novel recoded genes may escape the database. Therefore, we encourage users and researchers in the field to submit their data directly to the Recode-2 database. We are also willing to provide help with the analysis of potential new recoding events.

Reprogrammed genetic decoding in cellular gene expression
Programmed translational frameshifting
Recoding: translational bifurcations in gene expression
Programmed ribosomal frameshifting goes beyond viruses: organisms from all three kingdoms use frameshifting to regulate gene expression, perhaps signaling a paradigm shift
A case for 'StopGo': reprogramming translation to augment codon meaning of GGN by promoting unconventional termination (Stop) after addition of glycine and then allowing continued translation (Go)
Coupling of open reading frames by translational bypassing
Recoding: expansion of decoding rules enriches gene expression
Expression of peptide chain release factor 2 requires high-efficiency frameshift
The function, structure and regulation of E. coli peptide chain release factors
Release factor 2 frameshifting sites in different bacteria
Ribosomal frameshifting in decoding antizyme mRNAs from yeast and protists to humans: close to 300 cases reveal remarkable diversity despite underlying conservation
Autoregulatory frameshifting in decoding mammalian ornithine decarboxylase antizyme
The gamma subunit of DNA polymerase III holoenzyme of Escherichia coli is produced by ribosomal frameshifting
Translational frameshifting generates the gamma subunit of DNA polymerase III holoenzyme
Programmed ribosomal frameshifting generates the Escherichia coli DNA polymerase III gamma subunit from within the tau subunit reading frame
Structural probing and mutagenic analysis of the stem-loop required for Escherichia coli dnaX ribosomal frameshifting: programmed efficiency of 50%
Programmed ribosomal frameshifting in HIV-1 and the SARS-CoV
Alternative translation strategies in plant viruses
Long-distance RNA-RNA interactions in plant virus gene expression and replication
Eukaryotic selenoprotein synthesis: mechanistic insight incorporating new factors and new functions for old factors
Selenoprotein synthesis: UGA does not end the story
Selenium: Its Molecular Biology and Role in Human Health
Recoding in bacteriophages and bacterial IS elements
The role of programmed -1 ribosomal frameshifting in coronavirus propagation
Frameshifting RNA pseudoknots: structure and mechanism
Structure, stability and function of RNA pseudoknots involved in stimulating ribosomal frameshifting
RNA pseudoknots and the regulation of protein synthesis
A -1 ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral RNA
Slippery runs, shifty stops, backward steps, and forward hops: -2, -1, +1, +2, +5, and +6 ribosomal frameshifting
Upstream stimulators for recoding
Overriding standard decoding: implications of recoding for ribosome function and enrichment of gene expression
Use of tRNA suppressors to probe regulation of Escherichia coli release factor 2
Translational bypassing without peptidyl-tRNA anticodon scanning of coding gap mRNA
A nascent peptide is required for ribosomal bypass of the coding gap in bacteriophage T4 gene 60
Protein factors mediating selenoprotein synthesis
Solution structure of SECIS, the mRNA element required for eukaryotic selenocysteine insertion-interaction studies with the SECIS-binding protein SBP
Selenocysteine inserting tRNAs: an overview
P-site tRNA is a crucial initiator of ribosomal frameshifting
A new kinetic model reveals the synergistic effect of E-, P- and A-sites on +1 ribosomal frameshifting
Nonlinearity in genetic decoding: homologous DNA replicase genes use alternatives of transcriptional slippage or translational frameshifting
Transcriptional slippage in bacteria: distribution in sequenced genomes and utilization in IS element gene expression
KnotInFrame: prediction of -1 ribosomal frameshift events
ARFA: a program for annotating bacterial release factor genes, including prediction of programmed ribosomal frameshifting
Ornithine decarboxylase antizyme finder (OAF): fast and reliable detection of antizymes with frameshifts in mRNAs
Predicting genes expressed via -1 and +1 frameshifts
RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression
Database resources of the National Center for Biotechnology Information
PseudoViewer3: generating planar drawings of large-scale RNA structures with pseudoknots
Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli
Conserved translational frameshift in dsDNA bacteriophage tail assembly genes
Comparative genomics of trace elements: emerging dynamic view of trace element utilization and function
Dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine residues
Trends in selenium utilization in marine microbial world revealed through the analysis of the global ocean sampling (GOS) project
The selenoproteome of Clostridium sp. OhILAs: characterization of anaerobic bacterial selenoprotein methionine sulfoxide reductase A
An extended signal involved in eukaryotic -1 frameshifting operates through modification of the E site tRNA
An overlapping essential gene in the Potyviridae
Discovery of frameshifting in Alphavirus 6K resolves a 20-year enigma
A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting

We would like to express our appreciation to the colleagues who have contributed data for the previous versions of the database.

Conflict of interest statement. None declared.