key: cord-102968-mhawyect authors: Desirò, Daniel; Hölzer, Martin; Ibrahim, Bashar; Marz, Manja title: SilentMutations (SIM): a tool for analyzing long-range RNA-RNA interactions in viral genomes and structural RNAs date: 2018-09-23 journal: bioRxiv DOI: 10.1101/424002 sha: doc_id: 102968 cord_uid: mhawyect Background A single nucleotide change or mutation in the coding region can alter the amino acid sequence of a protein. In consequence, natural or artificial sequence changes in viral RNAs may have various effects not only on protein stability, function and structure but also on viral replication. In the last decade, several tools have been developed to predict the effect of mutations in structural RNA genomes. Some tools employ multiple point mutations and are also taking coding regions into account. However, none of these tools was designed to specifically simulate the effect of mutations on viral long-range interactions. Results Here, we developed SilentMutations (SIM), an easy-to-use tool to analyze the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. The tool can simulate destructive and compensatory mutants of two interacting single-stranded RNAs. This will facilitate a fast and accurate assessment of key regions, possibly involved in functional long-range RNA-RNA interactions and finally help virologists to design appropriate experiments. SIM only needs two interacting single-stranded RNA regions as input. The output is a plain text file containing the most promising mutants and a graphical representation of all interactions. Conclusion We applied our tool on two experimentally validated influenza A virus and hepatitis C virus interactions and we were able to predict potential double mutants for in vitro validation experiments. 
Availability The source code and documentation of SIM are freely available at github.com/desiro/SilentMutations
In the last decades, several computational tools for the analysis of RNA secondary structures have been developed. However, tools that specifically address the needs of virus research are still rare and under-researched [1] [2] [3] . For example, long-range RNA-RNA interactions (LRIs) play an important role in the life cycle of RNA viruses. Many LRIs are already known to function directly as activators or inhibitors of viral replication and translation 4 . For instance, several interactions were recently identified as possibly new LRIs in the hepatitis C virus (HCV) genome using LRIscan 5 . Some of these computationally identified interactions have already been verified experimentally [6] [7] [8] [9] [10] [11] [12] [13] . Apart from long-range interactions forming within a single sequence, LRI-like structures can also occur between two separate sequences. Although the general function of such interactions is still under investigation, they can be seen as viral LRIs and are presumably responsible for the correct packaging of all segments into the viral capsid of segmented viruses such as the influenza A virus (IAV) [14] [15] [16] [17] . Due to the rapidly growing number of potentially functional LRIs identified, an effective verification method is essential. Technically, such interactions can be destroyed by mutation or removal of the interacting parts in in vitro experiments 8, 18 . Such sequence changes can alter the secondary structure and manifest in different viral titers. However, altering the sequence can cause unwanted effects that do not only affect the LRI. To cope with this issue, we can alter both interacting RNA parts simultaneously. Ultimately, the combination of both mutated parts should result in an interaction strength similar to that of the interaction between the wild-type (WT) sequences.
A single mutated sequence part combined with the opposing WT sequence should destroy the interaction. Such a technique was recently used to verify a possible LRI between two IAV segments 15 . Several computational tools are available that can alter a sequence and report alternative secondary structures with one or even multiple mutations [19] [20] [21] [22] [23] [24] [25] [26] . However, these tools are designed for certain applications and do not meet the requirements of the previously proposed experimental technique: the combinatorial in vitro analysis of RNA-RNA interactions. In this study, we present a tool called SilentMutations (SIM) that effectively simulates synonymous (silent) compensatory mutations in two single-stranded viral RNAs and is therefore appropriate for the in vitro assessment of predicted LRIs.
SIM is a command-line tool that can simulate synonymous structure-destroying and structure-preserving mutation pairs within coding regions for long-range RNA-RNA interaction experiments. The tool has been written in Python (v3.6.5) and relies heavily on the RNAcofold Python site-package of the ViennaRNA Package (v2.4) 27 . As the ViennaRNA Package is available for Linux, Windows, and MacOS, SIM runs on all three platforms. The various parameters of SIM are fully adjustable and several filters (Fig. 1) allow a fast and accurate prediction; therefore, the tool can be run on a standard notebook. A simple run takes about 20 seconds when using the standard parameters, two sequences of length 21 as input, and a single core. Generally, the runtime depends strongly on the parameter setup and the length of the input sequences. The main computational steps of SIM are (1) preprocessing, (2) permutation, (3) attenuation, (4) recovery, and (5) sampling.
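To make the notion of a synonymous (silent) mutation concrete, the following minimal Python sketch translates an RNA snippet with the standard genetic code and checks whether a mutant leaves the amino acid sequence unchanged. It is an illustration only and is not taken from the SIM source code; the names `translate` and `is_silent` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an in-frame RNA sequence into one-letter amino acid codes."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def is_silent(wild_type, mutant):
    """True if the mutant differs from the wild type but encodes the same protein."""
    return (
        len(wild_type) == len(mutant)
        and wild_type != mutant
        and translate(wild_type) == translate(mutant)
    )


# Illustrative snippet (hypothetical, not from a viral genome): Met-Phe-Gly.
wild_type = "AUGUUUGGA"
print(is_silent(wild_type, "AUGUUCGGG"))  # both changes keep Phe and Gly
print(is_silent(wild_type, "AUGUUAGGA"))  # UUU -> UUA changes Phe to Leu
```

A mutant that passes this check changes the RNA, and thereby possibly the RNA-RNA interaction, without altering the encoded protein.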
A special challenge with respect to negative-sense single-stranded RNA viruses (such as Ebola and Marburg viruses) is that most databases only contain the positive strand of the viral genomes. While the protein-coding sequence is encoded on the positive strand, the folding of these viruses happens on the negative strand. Depending on the user input, SIM can automatically create the reverse and complement strand of negative-sense single-stranded RNA viruses for folding while maintaining codon integrity on the positive strand. This is achieved with the --virusclass=ssRNA-, --reverse, and --complement options. The user can directly provide the positive-strand sequence from the database in a FASTA file, together with the reading frame and regions on the positive strand.
The default output of SilentMutations prints all relevant information of the folding in a plain text file and additionally creates a VARNA 28 command for each folding for visualization. Additionally, if the tool can find installations of VARNA and Inkscape 29 , it will directly generate high-quality vector graphics of the various foldings. A default binary of VARNA (v3.93) is provided with the tool's source code. SilentMutations is primarily designed for two sequence snippets (hereafter called snips) in coding regions but can also be used to simulate interactions between coding and non-coding regions as well as between two non-coding regions. This is controlled with the --noncoding1 and --noncoding2 parameters, which mark the first or second sequence as non-coding, respectively.
We write mfe(x, y) for the minimum free energy (mfe) obtained by folding sequence x with sequence y via RNAcofold. The mfe is represented as a negative value in kcal/mol; a lower mfe therefore corresponds to a more stable structure.
The first step of SIM, preprocessing, extracts sequence snips from a longer sequence template while keeping codons intact (Fig. 1) .
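Keeping codons intact amounts to widening the requested range to the nearest codon boundaries. The sketch below shows one way to do this and to derive the folding strand for a negative-sense genome. It assumes 0-based, half-open coordinates and a reading frame given as the offset (0, 1, or 2) of the first codon; the coordinate convention actually used by SIM may differ, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def codon_aligned_range(start, end, frame):
    """Widen the half-open range [start, end) so that it covers whole codons.

    Assumption: codons begin at positions frame, frame + 3, frame + 6, ...
    """
    aligned_start = start - (start - frame) % 3
    aligned_end = end + (frame - end) % 3
    return aligned_start, aligned_end


def extract_pre_snip(sequence, start, end, frame, negative_sense=False):
    """Cut a codon-preserving snippet from the positive strand.

    Returns the coding (positive-strand) snippet and the strand used for
    folding, which is the reverse complement for negative-sense genomes.
    """
    start, end = codon_aligned_range(start, end, frame)
    if start < frame or end > len(sequence):
        raise ValueError("range cannot be extended to whole codons")
    coding = sequence[start:end]
    folding = reverse_complement(coding) if negative_sense else coding
    return coding, folding


# Illustrative sequence (hypothetical): codons AUG UUU CCC GGG start at offset 1.
coding, folding = extract_pre_snip("GAUGUUUCCCGGGA", 3, 8, frame=1,
                                   negative_sense=True)
print(coding)   # requested range 3..8 widened to three whole codons
print(folding)  # strand that would be passed to the folding routine
```

Mutations would then be chosen on the coding snippet, where codons are defined, and mapped to the folding strand by the same reverse-complement operation.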
In particular, the difficult extraction of snips in the exact reading frame from negative-sense single-stranded RNAs is handled by the built-in functionality of the tool. SIM thus directly accepts the whole sequence as input, together with a start and end position for the snips and the reading frame. The tool first extracts the requested interacting parts from each sequence and automatically extends them in both directions if the provided range would otherwise split a codon based on the predefined reading frame (Fig. 1) . This ensures that all codons are preserved. Both sequences, pre-snip1 (ps_1) and pre-snip2 (ps_2), are then folded together via RNAcofold to identify the codons involved in the folding. Unpaired codons at the termini are removed to obtain the final snip1 (s_1) and snip2 (s_2), from which SilentMutations calculates the two mutant sequences mut1 (m_1) and mut2 (m_2).
Figure 1 : Overall workflow of the SilentMutations tool, shown by way of example for two sequences from a negative-sense single-stranded RNA virus genome (ssRNA-). (a) The first step extracts the defined range in each sequence and, if necessary, extends the range to preserve codons in the given reading frame. Both so-called pre-snips are folded with RNAcofold to remove unpaired codons at the ends. We refer to an extracted single-stranded RNA snippet as a snip. (b) All possible codon permutations (here, reverse complements due to ssRNA-) are generated for both snips (called perms) to conserve the amino acid sequence. Permutations with more mutations than allowed by the mutations (-mut) parameter are discarded. (c) To further decrease the total number of combinations, SIM folds each snip with all permutations of the other snip and keeps permutations with an mfe higher than or equal to mfe(snip1,snip2) times the filter percentage (-prc) parameter, denoted as upper limit l.
(d) All remaining snip1 permutations are then folded against all snip2 permutations to find snip1 and snip2 mutations with a similar fold mfe. A double-mutant mfe(mut1,mut2) fold is considered similar if its mfe is lower than the wild-type mfe(snip1,snip2) times the lower deviation (-ldv) parameter and higher than mfe(snip1,snip2) times the upper deviation (-udv) parameter. (e) The last step minimizes a combined single-mutant mfe(snip1,mut2) and mfe(mut1,snip2) by keeping both mutations in a similar range depending on the mutation range (-mrg) parameter. The range for each mutation combination is defined by (mfe(snip1, mut2) + mfe(mut1, snip2)) · 0.5 · (1 + mrg) for the lower threshold (l-mrg) and (mfe(snip1, mut2) + mfe(mut1, snip2)) · 0.5 · (1 − mrg) for the upper threshold (u-mrg), where mrg is the value of the -mrg parameter. Different sets and their abbreviations are given in brackets. Details can be found in the Methods.
In the permutation step, all possible permutations pre-perms1 (P*) and pre-perms2 (Q*) of synonymous codons are created from each sequence in order to find the most suitable mutations, that is, those that maximize the distance between mfe(s_1,s_2) and mfe(s_1,m_2) as well as the distance between mfe(s_1,s_2) and mfe(m_1,s_2), while keeping the wild-type mfe(s_1,s_2) and the double-mutant mfe(m_1,m_2) similar. The number of these permutations can be vast, depending on the number of synonymous codons and the length of the sequence. To reduce the computational complexity, this step also removes every permutation with more mutations than allowed by the --mutations parameter, resulting in the sets perms1 (P) and perms2 (Q), see Fig. 1 . Achieving a strong attenuation effect on the viral titer with fewer alterations in a sequence additionally improves the authenticity of the mutational experiment.
In the attenuation step, the computational complexity is reduced further while already creating valid sequences with reduced mfe folding scores.
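The permutation step described above can be sketched in a few lines of Python: every codon is replaced by each of its synonymous codons, and candidates exceeding the allowed number of point mutations are dropped. This is a brute-force illustration under the standard genetic code, not the implementation used in SIM; `synonymous_permutations` and `max_mutations` are hypothetical names, and excluding the unmutated sequence from the result is an assumption.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def hamming(a, b):
    """Number of positions at which two equally long sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def synonymous_permutations(snip, max_mutations):
    """Yield all synonymous variants of snip with 1..max_mutations changes.

    The full Cartesian product is enumerated before filtering, so the cost
    grows exponentially with the number of codons.
    """
    if len(snip) % 3 != 0:
        raise ValueError("snip length must be a multiple of three")
    codons = [snip[i:i + 3] for i in range(0, len(snip), 3)]
    options = [SYNONYMS[CODON_TABLE[codon]] for codon in codons]
    for combination in product(*options):
        candidate = "".join(combination)
        if 0 < hamming(snip, candidate) <= max_mutations:
            yield candidate


# Leucine has six codons; only three are one point mutation away from CUU.
print(sorted(synonymous_permutations("CUU", max_mutations=1)))
```

For a negative-sense genome, the variants would be generated on the positive strand as shown and then reverse-complemented before folding, in line with Fig. 1 (b).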
A small difference between the wild-type mfe(s_1,s_2) and the single-mutant mfe(s_1,m_2) or mfe(m_1,s_2) would diminish the mutational effect on the interaction. It is therefore vital to increase the distance between these mfe scores. The user can directly specify the required distance with the filter percentage (--filterperc) parameter. The defined value is multiplied by mfe(s_1,s_2) to obtain an upper limit (denoted l) for mfe(s_1,m_2) and mfe(m_1,s_2). Then, the tool creates all foldings between s_1 and Q as well as between P and s_2. Each folding mfe(s_1, q_j ∈ Q) and mfe(p_i ∈ P, s_2) (with i and j indexing the i-th and j-th element of set P and Q, respectively) is only considered for the subsequent recovery step if its mfe is higher than l. The resulting sets mut1 (U) and mut2 (V) are drastically smaller than P and Q, which greatly reduces the computational complexity of finding the two mutant sequences m_1 and m_2 for which mfe(m_1,m_2) is similar to the wild-type mfe(s_1,s_2). While it may seem convenient to set a very low --filterperc parameter, this can result in an empty permutation set. On the other hand, setting a high --filterperc can result in longer running times. It is therefore recommended to start with the default of 0.5 and adjust this parameter later depending on the performance.
In the recovery step, double-mutants (mut) with a folding mfe similar to the wild-type (WT) folding mfe are found by folding each sequence u_i ∈ U against each sequence v_j ∈ V with RNAcofold. The runtime of this folding process depends strongly on the previous attenuation step. With the unprocessed P* and Q* sets from the example in Fig. 1 , around 24 million folding operations would be required, compared to only 37,000 for the filtered sets U and V. Some LRIs function as inhibitors of viral replication 4 , and a stronger interaction would decrease functional viral reproduction.
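The attenuation filter and the subsequent all-against-all folding can be sketched as follows. The sketch takes the energy function as an argument; `toy_mfe` is a hypothetical stand-in that merely counts complementary positions and is not RNAcofold. The similarity window follows the wording of the Fig. 1 caption, and the values chosen for `ldv` and `udv` are illustrative only, not SIM defaults.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def toy_mfe(x, y):
    """Hypothetical stand-in for RNAcofold (NOT a real energy model).

    Scores -1 per complementary position in an ungapped antiparallel alignment.
    """
    return -float(sum(a + b in PAIRS for a, b in zip(x, reversed(y))))


def attenuate(s1, s2, perms1, perms2, mfe, filterperc=0.5):
    """Keep permutations whose single-mutant fold is weak enough.

    The limit l is mfe(s1, s2) times filterperc; a permutation is kept if
    its fold with the opposing wild type has an mfe of at least l.
    """
    limit = mfe(s1, s2) * filterperc
    muts1 = [p for p in perms1 if mfe(p, s2) >= limit]
    muts2 = [q for q in perms2 if mfe(s1, q) >= limit]
    return muts1, muts2


def recover(s1, s2, muts1, muts2, mfe, ldv, udv):
    """Return (u, v, mfe) tuples whose double-mutant fold resembles the wild type."""
    wild_type = mfe(s1, s2)
    hits = []
    for u in muts1:
        for v in muts2:
            energy = mfe(u, v)
            # Lower than wild type * ldv and higher than wild type * udv.
            if wild_type * udv < energy < wild_type * ldv:
                hits.append((u, v, energy))
    return hits


# Toy sequences chosen to exercise the filters; they are not synonymous
# codon permutations of each other.
s1, s2 = "GGGAAA", "UUUCCC"
U, V = attenuate(s1, s2, ["GGGCCC", "GGGAAG"],
                 ["UUUCCA", "GGGCCC", "AAACCA"], toy_mfe)
print(U, V)
print(recover(s1, s2, U, V, toy_mfe, ldv=0.8, udv=1.2))
```

Because the recovery loop runs over every pair of remaining candidates, its cost is the product of the two set sizes, which is why shrinking P and Q to U and V pays off so strongly.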
It is therefore necessary for the simulation of the double-mutant interaction to set not only a lower but also an upper similarity threshold. Each of the calculated mfe(u_i ∈ U, v_j ∈ V) values is therefore compared with the wild-type mfe(s_1,s_2) using the lower deviation (-ldv) and upper deviation (-udv) parameters (Fig. 1 d). In the sampling step, the mutation range thresholds (Fig. 1 e) are determined for each mutant tuple. The most suitable mutant pair, holding all predefined requirements, is finally selected among these tuples.
SilentMutations is primarily designed as a supporting tool in the assembly step of an in vitro experiment, such as previously performed by Gavazzi et al. in influenza A viruses 15 . With the help of SIM, experimentalists can simulate for the first time specific RNA-RNA interactions of viral coding and non-coding sequence snippets that potentially form stable long-range interactions. Using SIM, the search space of possible double-mutant sequences that form a stable secondary structure with an mfe comparable to the wild-type structure while simultaneously disrupting the single-mutant structures can be drastically reduced. Therefore, SIM can be used prior to time- and cost-consuming wet-lab experiments to simulate promising double-mutant sequences that preserve the wild-type LRI but are not functional as single-mutants. To validate SIM and to show that the tool is able to predict biologically relevant mutants, we used two validated interaction examples, one in the influenza A virus H5N2 A/finch/England/2051/1991 strain and another in the hepatitis C virus type 1b strain (accession: AJ238799.1).
The influenza A virus (IAV) genome consists of 8 viral ribonucleoproteins (vRNPs) and each vRNP segment includes one of 8 different negative-sense single-stranded viral RNAs. It is hypothesized that these segments are packaged selectively through RNA-RNA interactions between the 8 segments 16, [30] [31] [32] . To this end, Gavazzi et al. 15 performed an in vitro mutation experiment in IAV that is well suited to validate our tool.
They took a previously unconfirmed interaction between the PB1 and the NS segment from one of their earlier experiments 14 and introduced four trans-complementary point substitutions by hand. Combining the PB1 mutant with the NS mutant resulted in a viral titer similar to that of the PB1 WT and NS WT segments. Introducing only the PB1 mutant or only the NS mutant resulted in an attenuation of viral replication in both cases. To show that we are able to obtain the experimental results of Gavazzi et al. 15 computationally, we adjusted some key parameters of SIM. Importantly, we limited the number of possible mutation pairs to four and calculated the mfe between the WT and mutant (mut) combinations in mere seconds. The SIM results of the secondary structure predictions and mfe values for the two IAV segment combinations are shown in Fig. 2 . By only allowing a maximum of four mutation pairs, we were able to calculate the same structures and mutations as previously proposed by Gavazzi et al. 15 . Moreover, our results show an mfe difference of 1.80 kcal/mol between wild-type and double-mutant (Fig. 2 a and b) and an mfe difference of 2.20 kcal/mol between the single-mutants (Fig. 2 c and d) . Again, we want to point out that for the in silico experiment it is preferable to have a similar mfe between the two wild-type IAV segments mfe(NS_WT, PB1_WT) and the double-mutant mfe(NS_mut, PB1_mut) as well as between the single-mutants mfe(NS_WT, PB1_mut) and mfe(NS_mut, PB1_WT). In a next step, using SIM with default parameters, we were able to calculate a double-mutant that not only reflects the results of Gavazzi et al. 15 , but also shows slightly lower mfe differences between the wild-type and the double-mutant (1.20 kcal/mol) as well as between the two single-mutants (1.20 kcal/mol), see Fig. 3 .
Therefore, the interaction strengths of mfe(NS_WT, PB1_WT) and mfe(NS_mut, PB1_mut), as well as of mfe(NS_WT, PB1_mut) and mfe(NS_mut, PB1_WT), are closer to each other than in the results of Gavazzi et al. 15 . As a possible drawback, our simulation needs to introduce one more point mutation in each single-stranded RNA snip (Fig. 3) .
The hepatitis C virus (HCV) genome consists of a positive-sense single-stranded RNA of about 10 kb length. This RNA is translated into a single polyprotein that is later cleaved into four structural (C, E1, E2, p7) and six nonstructural (NS2, NS3, NS4A, NS4B, NS5A, RdRp) proteins by viral and host proteases 33, 34 . Both UTRs of the viral genome are highly structured and have been extensively studied in the past 33, [35] [36] [37] [38] . For our validation of SIM, we have chosen a well-studied interaction 8 between the 3' UTR and the end region of the ORF encoding the HCV polyprotein. Several studies 9, 33, [39] [40] [41] [42] have already shown that RNA replication depends strongly on the conserved structures of the X-tail 33, 43 sequence contained in the 3' UTR. This sequence is presumed to contain three experimentally verified stem-loops (SLI, SLII, SLIII) 44 which may interact with other parts of the viral genome through long-range RNA-RNA interactions to regulate replication 40 . The LRI between the free nucleotides of the hairpin from the 3' SLII structure and the free nucleotides of the hairpin from the 5BSL3.2 structure has been the subject of many HCV studies. The interaction was first verified by Friebe et al. 8 and was also found computationally with LRIscan by Fricke et al. 5 . Another reason for selecting this interaction as an example was the duality of having one interacting part in a coding and one in a non-coding region. Applying SIM to this interaction resulted in two compensatory point mutations in each snip (Fig. 4) .
As the 3' SLII region is non-coding, only the point mutations in the snip from the 5BSL3.2 structure are silent. Our results show that the wild-type structure mfe(5BSL3.2_WT, SLII_WT) should have a similar strength compared to the calculated double-mutant structure mfe(5BSL3.2_mut, SLII_mut), see Fig. 4 . Additionally, both mfe(5BSL3.2_WT, SLII_mut) and mfe(5BSL3.2_mut, SLII_WT) weaken the interaction significantly. We propose that our simulated mutations in HCV may be used to verify the given long-range interaction. Taking all the new long-range interactions found by Fricke et al. 5 into account, we propose that our tool can be used to create mutation experiments for every predicted LRI to provide evidence for a biological function of these interactions. Such experiments would be especially interesting for IAV, where the exact packaging process of the vRNP segments is not yet fully understood.
By presenting SIM, we provide an easy and fast way to analyze possible interactions between vRNPs. The tool can be used to heavily reduce the search space of possible synonymous mutation interactions between two RNAs. Another difficulty when creating silent IAV mutations lies in preserving the codons on the positive strand while mutating the negative strand; this is also handled by SIM. Furthermore, our tool provides a significant speedup, not only in the verification of interactions in these two viruses, but also for many other virus families. Our simulations will help to gain a deeper understanding of the translation and replication processes in viruses and of how long-range interactions regulate them. A promising future approach would be the combined application of LRIscan and SilentMutations to detect currently unknown LRIs and to provide, in the same step, possible mutational verification experiments for each LRI.
Virologists-heroes need weapons
A new era of virus bioinformatics
Software Dedicated to Virus Sequence Analysis
Functional long-range RNA-RNA interactions in positive-strand RNA viruses
Conserved RNA secondary structures and long-range interactions in hepatitis C viruses
The functional RNA domain 5BSL3.2 within the NS5B coding sequence influences hepatitis C virus IRES-mediated translation. Cellular and molecular life sciences : CMLS
A twist in the tail: SHAPE mapping of long-range interactions and structural rearrangements of RNA elements involved in HCV replication
Kissing-loop interaction in the 3' end of the hepatitis C virus genome essential for RNA replication
Cis-acting RNA elements in human and animal plus-strand RNA viruses
Natural variation in translational activities of the 5' nontranslated RNAs of hepatitis C virus genotypes 1a and 1b: evidence for a long-range RNA-RNA interaction outside of the internal ribosomal entry site
Core protein-coding sequence, but not core protein, modulates the efficiency of cap-independent translation directed by the internal ribosome entry site of hepatitis C virus
Long-range RNA-RNA interaction between the 5' nontranslated region and the core-coding sequences of hepatitis C virus modulates the IRES-dependent translation
RNase III cleavage demonstrates a long range RNA: RNA duplex element flanking the hepatitis C virus internal ribosome entry site
An in vitro network of intermolecular interactions between viral RNA segments of an avian H5N2 influenza A virus: comparison with a human H3N2 virus
A functional sequence-specific interaction between influenza A virus genomic RNA segments
A supramolecular assembly formed by influenza A virus genomic RNA segments
Interaction network linking the human H3N2 influenza A virus genomic RNA segments. Vaccine
A long-range RNA-RNA interaction between the 5' and 3' ends of the HCV genome
corRna: a web server for predicting multiple-point deleterious mutations in structural RNAs
The RNAsnp web server: predicting SNP effects on local RNA secondary structure
The RNAmute web server for the mutational analysis of RNA secondary structures
RDMAS: a web server for RNA deleterious mutation analysis
RNAmutants: a web server to explore the mutational landscape of RNA secondary structures
Mutational analysis in RNAs: comparing programs for RNA deleterious mutation prediction
Rtools: a web server for various secondary structural analyses on single RNA sequences
Efficient algorithms for probing the RNA mutation landscape
ViennaRNA package 2.0
VARNA: Interactive drawing and editing of the RNA secondary structure
Genome packaging in influenza A virus
Architecture of ribonucleoprotein complexes in influenza A virus particles
Selective incorporation of influenza virus RNA segments into virions
Hepatitis C virus RNA replication
Hepatitis C virus proteins: from structure to function
Secondary structure of the 5' nontranslated regions of hepatitis C virus and pestivirus genomic RNAs
A phylogenetically conserved stem-loop structure at the 5' border of the internal ribosome entry site of hepatitis C virus is required for cap-independent viral translation
Role of RNA structures in genome terminal sequences of the hepatitis C virus for replication and assembly
Hepatitis C virus RNA translation
In vivo analysis of the 3' untranslated region of the hepatitis C virus after in vitro mutagenesis of an infectious cDNA clone
Genetic analysis of sequences in the 3' nontranslated region of hepatitis C virus that are important for RNA replication
3' nontranslated RNA signals required for replication of hepatitis C virus RNA
3' RNA elements in hepatitis C virus replication: kissing partners and long poly(U)
A novel sequence found at the 3' terminus of hepatitis C virus genome
Secondary structure determination of the conserved 98-base sequence at the 3' terminus of hepatitis C virus genome RNA
Author Contributions DD performed the design, development and programming of the tool and wrote the main draft of the paper. MH, BI, and MM contributed in writing, discussions and proofreading of the final manuscript. All authors read and approved the final manuscript.