key: cord-318478-fn0gcxbb authors: Ziv, Omer; Price, Jonathan; Shalamova, Lyudmila; Kamenova, Tsveta; Goodfellow, Ian; Weber, Friedemann; Miska, Eric A. title: The short- and long-range RNA-RNA Interactome of SARS-CoV-2 date: 2020-10-07 journal: bioRxiv DOI: 10.1101/2020.07.19.211110 sha: doc_id: 318478 cord_uid: fn0gcxbb The Coronaviridae is a family of positive-strand RNA viruses that includes SARS-CoV-2, the etiologic agent of the COVID-19 pandemic. Bearing the largest single-stranded RNA genomes in nature, coronaviruses are critically dependent on long-distance RNA-RNA interactions to regulate the viral transcription and replication pathways. Here we experimentally mapped the in vivo RNA-RNA interactome of the full-length SARS-CoV-2 genome and subgenomic mRNAs. We uncovered a network of RNA-RNA interactions spanning tens of thousands of nucleotides. These interactions reveal that the viral genome and subgenomes adopt alternative topologies inside cells, and engage in different interactions with host RNAs. Notably, we discovered a long-range RNA-RNA interaction - the FSE-arch - that encircles the programmed ribosomal frameshifting element. The FSE-arch is conserved in the related MERS-CoV and is under purifying selection. Our findings illuminate RNA structure based mechanisms governing replication, discontinuous transcription, and translation of coronaviruses, and will aid future efforts to develop antiviral strategies. RNA viruses comprise the dominant component of the eukaryotic virome (Dolja and Koonin, 2018) . Their error-prone genome replication mode allows them to rapidly evolve new variants and to jump from animals to humans (Woolhouse and Gaunt, 2007) , thus presenting a high epidemic and pandemic threat. 
Several members of the betacoronavirus genus (family Coronaviridae), namely the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV), the Middle East Respiratory Syndrome coronavirus (MERS-CoV), as well as the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) are of special concern. SARS-CoV-2, the causative agent of Coronavirus Disease 2019 (COVID-19), has spread to date to nearly every country in the world, resulting in millions of infections, over a million deaths, and a massive global economic impact (McKibbin and Fernando, 2020). Even though worldwide efforts and resources are redirected to overcome the COVID-19 pandemic, at present, there are no approved vaccines or antiviral medicines. This illustrates the urgent need for deciphering in depth the molecular biology of coronaviruses, especially SARS-CoV-2. Coronaviruses have evolved the largest known single-stranded RNA genome in nature. Regulation of their mRNA transcription and translation is facilitated by cis-acting structures that interact with each other, with viral proteins, and with host machineries (Madhugiri et al., 2016). mRNA transcription in coronaviruses involves a process whereby so-called subgenomic mRNAs (sgmRNAs) are produced through discontinuous genomic RNA (gRNA) template utilization, which is in contrast to replication of the full-length genome (Sawicki et al., 2007). This discontinuous transcription is mediated by the Transcription Regulating Sequence-leader (TRS-L) at the 5′ end of the genome, and the Transcription Regulating Sequence-body (TRS-B) at the 5′ ends of each ORF. Template switching between these RNA sequence elements results in a set of 5′ and 3′ co-terminal, "nested" sgmRNAs of different sizes on which the 5′ proximal ORFs are translated into nonstructural or structural viral proteins (Moreno et al., 2008; Sola et al., 2015).
The mechanisms underlying discontinuous transcription and genome replication have not been fully worked out, however long-distance RNA-RNA interactions along the viral genome have been proposed as key regulators (Mateos-Gómez et al., 2011; Mateos-Gomez et al., 2013; Moreno et al., 2008; Sola et al., 2015) . On the full-length gRNA itself, two partially overlapping open reading frames (ORF1a and ORF1b) are translated from the same start codon at the 5′ end, resulting in the polyproteins pp1a and pp1ab. Translation of the longer product pp1ab is made possible by a hairpin-type pseudoknot RNA structure known as the frameshifting element (FSE) which regulates a programmed -1 ribosomal frameshifting that overrides with about 50% efficiency the stop codon of ORF1a (Kelly et al., 2020; Namy et al., 2006) . Previous studies applied RNA structure probing techniques using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) and DMS, as well as nuclear magnetic resonance (NMR) to effectively identify conserved cis-acting RNA structures regulating the life cycle of coronaviruses. However, when it comes to identifying long-distance base-pairing between distal nucleotides, these methods fall short. Therefore, the long-range RNA-RNA interactome of coronaviruses has never been mapped in full. Deciphering how the various structural elements along the coronavirus gRNA and sgmRNA are folded and brought together in time and space is vital for understanding, dissecting and manipulating viral replication, discontinuous transcription, and translation regulation. We recently developed Crosslinking Of Matched RNAs And Deep Sequencing (COMRADES) for in-depth RNA conformation capture in living cells (Ziv et al., 2018) . COMRADES is derived from a class of methods that combine psoralen crosslinking of base paired RNA and deep sequencing (Aw et al., 2016; Lu et al., 2016; Sharma et al., 2016) . 
COMRADES utilizes a clickable psoralen derivative to specifically crosslink paired nucleotides, and high throughput sequencing to retrieve their positions ( Figure 1 ). Following in vivo crosslinking, the viral RNA is selectively captured, fragmented and subjected to a click-chemistry reaction to add a biotin tag to crosslinked fragments. Crosslinked RNA duplexes are then selectively captured using streptavidin affinity purification. Half of the resulting RNA is proximity ligated, following reversal of the crosslink to create chimeric RNA templates for high throughput sequencing. The other half is used as a control, in which reversal of the crosslink precedes the proximity ligation, and accurately represents the background level of non-specific ligation. The coupling of two biotin-streptavidin mediated enrichment steps, first of viral RNA, and second of crosslinked RNA duplexes provides high structural depth for identification of both long-and short-lived conformations. COMRADES can therefore measure (i) the structural diversity of alternative RNA conformations that co-exist inside cells; (ii) short-distance, as well as long-distance (over tens of thousands of nucleotides) base-pairing within the same RNA molecule; and (iii) base-pairing between different RNA molecules, such as those of host and viral origin (Kudla et al., 2020; Ziv et al., 2018) . Here we apply COMRADES to study the structural diversity of the SARS-CoV-2 gRNA and sgmRNA inside cells. We discover networks of short-and long-range RNA-RNA interactions spanning the entirety of SARS-CoV-2 gRNA and sgmRNA. We reveal site-specific interactions with the host transcriptome. Finally, we uncover a conserved long-range structure encompassing the programmed ribosomal frameshifting element. The SARS-CoV-2 genome and sgmRNA adopt alternative co-existing topologies that involve long-distance base-pairing Inside the host, the gRNA of SARS-CoV-2 is transcribed into sgmRNA ( Figure 2A ). 
To compare the structure of both types of RNA, we applied the COMRADES method and set up a dual enrichment strategy to analyse the positive sense gRNA and positive sense sgmRNA separately (Figure 2B). Briefly, we selectively pulled down the full-length positive sense SARS-CoV-2 genome from in vivo crosslinked, SARS-CoV-2 inoculated Vero E6/TMPRSS2 cells (Matsuyama et al., 2020), using a tiling array of antisense probes for ORF1a/b, which resulted in a highly enriched gRNA fraction (Figure 2C). The full-length positive sense sgmRNA was subsequently enriched from the gRNA-depleted supernatant of the first pulldown, using a second tiling array of antisense probes to the region downstream of ORF1a/b (Figure 2B). This dual enrichment strategy resulted in a high degree of separation between the gRNA and the sgmRNA (Figure 2C). COMRADES provided >6 million nonredundant chimeric reads, which was sufficient to generate high-resolution maps for both the gRNA and sgmRNA with a high signal to noise ratio (Figure S1), and high reproducibility between independent biological replicates (r = 0.92, p value <2.2e-16, Figure 2D,E). Our structural data covered >99.99% of the coronavirus gRNA and the sgmRNA (Figure 2C), and represent the base-pairing nature of SARS-CoV-2 gRNA and sgmRNA inside cells. Available models for the RNA structure of SARS-CoV-2 and related viruses are largely confined to short-distance base-pairing, which results in local folding of important cis-acting elements (Andrews et al., 2020; Huston et al., 2020; Kelly et al., 2020; Lan et al., 2020; Manfredonia et al., 2020; Ryder, 2020; Sanders et al., 2020; Sun et al., 2020). However, long-distance base-pairing between distal RNA elements is equally essential for many RNA viruses (Huber et al., 2019), including coronaviruses (Mateos-Gómez et al., 2011; Mateos-Gomez et al., 2013; Moreno et al., 2008; Sola et al., 2015).
The ability of COMRADES to capture RNA base-pairing regardless of the distance between the interacting bases enabled us to confirm in vivo the structure of nearly all previously characterised cis-acting elements (with one exception, discussed below) and to discover long-distance RNA-RNA interactions as they occur inside cells. Indeed, we observed a high prevalence of long-range RNA basepairing along the SARS-CoV-2 genome, with ORF1a demonstrating more long-range connectivity than any other ORF ( Figure 2F ). Most of the base-pairing is confined to a single ORF, however, some interactions cross ORF boundaries. For example, ORF1a base-pairs with ORF1b, as well as with the 5′ and 3′ untranslated regions (UTRs) ( Figure 2F ). We additionally discovered long-distance interactions unique to the sgmRNA ( Figure 2G ). Previous models of the SARS-CoV-2 and related viruses mainly analysed structural population averages, i.e. assuming that all copies of the genome and sgmRNA have a single static conformation. Yet, the complex life cycle of viral RNA genomes, i.e. their engagement with multiple cellular and viral machineries such as the ones for replication, transcription, and translation, suggests a dynamic RNA structure, as we and others have reported for Zika virus (Huber et al., 2019; Li et al., 2018; Ziv et al., 2018) and for HIV-1 (Tomezsko et al., 2020) . Our structural analysis of SARS-CoV-2 reveals a high level of structural dynamics whereby alternative high-order conformations, some of which involve long-distance basepairing, co-exist in vivo (Figures 3 and S2A , Table S1 ). For example, nucleotides 5, 680 in ORF1a interact with three alternative distal regions: 3.6 kb upstream, 3.4 kb downstream, and 2 kb upstream ( Figure 3 , arches 4, 5 and 8 respectively), and the 5′ UTR interacts with ORF1a as well as with the 3′ UTR ( Figure 3 , arches 2 and 3 respectively). 
In contrast, we find that ORF N sgmRNA is held in a single dominant conformation where the leader sequence interacts exclusively with a region 0.8 kb downstream ( Figure S2B ). In summary, we discover the co-existence of alternative SARS-CoV-2 gRNA and sgmRNA topologies, held by long-range base-pairing between regions tens of thousands of nucleotides apart. Each topology brings in physical proximity previously characterised and new elements involved in viral replication and discontinuous transcription, therefore offering a model for facilitating distinct patterns of template switching to produce the complete SARS-CoV-2 transcriptome. The infectious life cycle of coronaviruses takes place mainly in the host cell's cytoplasm, where many cellular RNAs reside (Sola et al., 2015) . Host-virus RNA-RNA interactions regulate the replication of some RNA viruses, e.g. the interaction between hepatitis C virus and human microRNA miR-122 (Jopling et al., 2005) , the interaction between Zika virus and human miR-21 (Ziv et al., 2018) , and the priming of HIV-1 replication by human tRNAs (Mak and Kleiman, 1997) . However, to the best of our knowledge, whether the SARS-CoV-2 gRNA or sgmRNA interact with cellular RNA is unknown. Our COMRADES method provides an opportunity to undertake an unbiased analysis of the host-virus RNA-RNA interactome (Ziv et al., 2018) . We discovered site-specific interactions between the SARS-CoV-2 RNA and various cellular RNAs, especially small nuclear RNAs (snRNAs) (Figures 4A, B and S3A, B) . Apart from their canonical role in splicing, snRNAs mature in the cytoplasm and may have additional biological roles (Matera et al., 2014) . Along the viral gRNA, cellular snRNA interactions are mostly confined to ORF1a and ORF1b, and include site specific binding of U1, U2 and U4 snRNAs. The gRNA coding region for the sgmRNA ORFs and the UTRs are largely devoid of snRNA binding. 
In contrast, along the viral sgmRNA, both the N ORF and the 3′ UTR show high occupancy of U1 and U2 snRNA binding. In order to explore the conservation of these snRNA interactions in a related coronavirus, we performed COMRADES on MERS-CoV-inoculated Huh-7 cells. Similarly to SARS-CoV-2, we identified a site-specific interaction of U2 snRNA within the MERS-CoV ORF1a (Figures 4C and S3C), illustrating the evolutionary conservation of the U2 snRNA base-pairing with ORF1a of betacoronaviruses. In addition to cellular small RNAs we also detected long cellular RNAs interacting with SARS-CoV-2 RNA, although to a lesser extent. Of specific interest, the RNase MRP RNA was found to base-pair with an extended 3′ region of the sgmRNA, but not the gRNA, of SARS-CoV-2 (Figure S3D). The RNase MRP RNA has a conserved secondary structure similar to that of the RNA component of the bacterial RNase P ribonucleoprotein (RNP) complex (Dávila López et al., 2009; Welting et al., 2006). The RNase MRP RNA has a role in human pre-ribosomal RNA processing (Goldfarb and Cech, 2017), leads to a spectrum of human disease when mutated (Ridanpää et al., 2001), and has been implicated in viral RNA degradation (Jaag et al., 2011). Targeting host-virus RNA-RNA interactions provides an attractive platform for developing new antiviral therapies, as resistance would require the virus to acquire considerable mutational changes to become independent of the host RNA. However, whereas multiple tools and efforts are dedicated to identifying host-virus protein-protein interactions, the crosstalk between host and virus RNA remains largely unexplored. Coupled with the recent advancement in techniques to target RNA in vivo, the capacity of COMRADES to map the host-virus RNA-RNA interactome opens up new opportunities to control emerging RNA viruses. The data we present here could be valuable for the development of new targets for antiviral drugs.
The 5′ UTR of coronaviruses contains five evolutionarily conserved stem-loop structures (denoted SL1-SL5) that are essential for genome replication and discontinuous transcription (Madhugiri et al., 2016). The 3′ UTR contains three structural elements important for replication: an evolutionarily conserved bulged stem-loop (BSL) (Hsue and Masters, 1997), a partially overlapping hairpin-type pseudoknot (Goebel et al., 2004; Williams et al., 1999), and a 3′ terminal multiple stem-loop structure containing a hyper-variable region (HVR), which folds back to create a triple helix junction (Liu et al., 2013). Our analysis identified seven of these eight cis-acting elements within the UTRs (Figures 5A and S4A). However, our data did not support the folding of the stem-loop pseudoknot at the 3′ UTR. Of note, two recent studies using SHAPE methods to map the structure of SARS-CoV-2 inside cells similarly failed to identify this pseudoknot (Huston et al., 2020; Sun et al., 2020), and a previous study demonstrated the instability of this pseudoknot in the related mouse hepatitis virus (MHV) (Stammler et al., 2011). In addition to the canonical UTR structures, we provide here direct in vivo evidence for genome cyclization in SARS-CoV-2, mediated by long-range base-pairing between the 5′ and 3′ UTRs (Figures 5B and S4B). This base-pairing spans a distance of 29.7 kilobases and is among the longest-distance RNA-RNA interactions ever reported. Genome cyclization was previously hypothesised from mutational analyses of murine coronavirus (MHV) and was suggested to facilitate discontinuous transcription (Li et al., 2008). However, while MHV genome cyclization involves the 5′ SL1 structure, we find that in SARS-CoV-2, this process is mediated by the 5′ SL3 instead, and results in complete opening of SL3 and disruption of the triple helix junction in the 3′ UTR (Figure 5B).
In agreement with this observation, SL3 of related betacoronaviruses was suggested to be weakly folded or unfolded (Chen and Olsthoorn, 2010; Li et al., 2008). Genome cyclization plays an essential role in the replication of a number of RNA viruses, including flaviviruses (Hahn et al., 1987; Ziv et al., 2018). The evolutionary selection of such a mechanism might stem from in-cell competition between intact and defective viral genomes, as it ensures that only genomes bearing two intact UTRs engage with the replication machinery. The SARS-CoV-2 genome cyclization we report here results in a complete opening of the 5′ SL3 where the Transcription Regulating Sequence-Leader (TRS-L) resides, raising the possibility that genome cyclization regulates SARS-CoV-2 discontinuous transcription, as was previously suggested for MHV (Li et al., 2008). It remains to be seen whether this base-pairing can be targeted to inhibit viral replication in vivo. In addition to genome cyclization, we identified two alternative conformations involving long-distance RNA-RNA interactions between each UTR and ORF1a. These long-distance conformations result in unfolding of SL2 and SL3 in the 5′ UTR (Figures 5C and S4C), and unfolding of the terminal stem-loop in the 3′ UTR (Figures 5D and S4D). Of note, unlike the gRNA, the leader sequence within the 5′ UTR of ORF N sgmRNA is held in a single long-range conformation through base-pairing with a region 0.8 kb downstream (Figures 5E, S2B and S4E). All of the long-range interactions described above are strongly supported through chimeric reads (Figures 5F and S2A). Overall, our data demonstrate the existence of alternative, mutually exclusive UTR conformations inside cells, involving interactions between functional UTR elements and distal regions within the ORFs. We further show that the N ORF sgmRNA folds differently than the viral genome.
The long-distance RNA structure map for SARS-CoV-2 provides a practical starting point to dissect the regulation of discontinuous transcription, as it identifies cis-acting elements that interact with each other to create genome topologies that favour the synthesis of the ensemble of sgmRNAs. RNA viruses evolve sophisticated mechanisms to enhance the functional capacity of their size-restricted genomes and to regulate the expression levels of their replicase components. In coronaviruses, one such mechanism is programmed -1 ribosomal frameshifting to facilitate translation of ORF1b which contains the viral RdRp activity, and to set a defined ratio of ORF1a and ORF1b products (Plant et al., 2010) . This is mediated by a ~120 nucleotide long cis-acting frameshifting element (FSE) composed of a stem-loop attenuator, and a slippery sequence followed by a single-stranded spacer and an RNA pseudoknot (Kelly et al., 2020) . It has been suggested that pausing the progression of the ribosome upstream of the pseudoknot facilitates a tandem-slippage of the peptidyl-tRNA and aminoacyl-tRNA to the −1 reading frame, thus allowing continuous translation through the stop codon at the end of ORF1a (Brierley et al., 1989) . Altering the frameshifting mechanism had a deleterious effect on SARS-CoV replication (Plant et al., 2013) , making the FSE an attractive target for antiviral therapy. Understanding the surrounding RNA structure and function is therefore of great importance as it might aid the design of drugs targeting the FSE. Unexpectedly, we find that the FSE of SARS-CoV-2 is embedded within a much larger, ~1.5 kb long higher-order structure that bridges the 3′ end of ORF1a with the 5′ region of ORF1b, which we termed the FSE-arch ( Figure 6A ,B). 
To the best of our knowledge, this is the first time such a long-range structural bridge has been reported for any coronavirus, and importantly this structure is supported by the largest number of chimeric reads in our data (tens of thousands of non-redundant chimeric reads) (Figures S6A,B), reflecting its high folding stability in vivo. The FSE-arch results in a stem-loop structure encompassing 1,475 nucleotides, and bearing the FSE within it (Figures 6B and S5C). We hypothesized that if an RNA-RNA interaction is functionally important, there should be purifying selection and hence a reduced nucleotide evolution rate in this region. Therefore we used a recent dataset (Firth, 2020) to explore the nucleotide conservation of the FSE-arch (Figure 6C). Strikingly, the FSE-arch is under strong purifying selection and is among the most conserved regions within the SARS-CoV-2 genome. Consistent with this, analysing the phylogeny of the SARS-related coronavirus subgenus (taxid: 2509511) revealed two positions of covariance that support the conservation of the FSE-arch (Figure 6B). To further explore this structure experimentally, we analysed its existence in MERS-CoV. MERS-CoV shares only ~50% sequence identity with SARS-CoV-2 (Lu et al., 2020), yet even so, performing COMRADES on MERS-CoV-inoculated Huh-7 cells revealed strong evidence for a homologous FSE-arch surrounding the MERS-CoV FSE, bridging ORF1a with ORF1b (Figure 6D,E). While the mechanism governing the FSE-arch formation will require further investigation, similar long-distance interactions around the frameshifting elements of several plant RNA viruses were previously demonstrated to regulate frameshifting, possibly by assisting in back-stepping of ribosomes at the slippery sequence, and by stabilising the FSE, allowing it to refold after the passage of each ribosome (Barry and Miller, 2002; Cimino et al., 2011; Gao and Simon, 2016; Tajima et al., 2011).
In addition to their coding capacity, nucleic acids have evolved structural capabilities to sense metabolites (Mandal and Breaker, 2004), catalyse reactions (Pyle, 1993), and interact with other cellular components. When brought in physical proximity, different combinations of cis-acting sequences can lead to new biological activities. For example, interactions between promoters and enhancers dictate the rate of transcription along the eukaryotic genome (Rowley and Corces, 2018). Great effort is being made to reveal the structural landscapes of the SARS-CoV-2 genome (Andrews et al., 2020; Huston et al., 2020; Kelly et al., 2020; Lan et al., 2020; Manfredonia et al., 2020; Ryder, 2020; Sanders et al., 2020; Sun et al., 2020). However, without deciphering the long-range connectivity, our understanding is far from being complete. Here we reveal how cis-acting elements along the coronavirus genome are
The authors declare no competing interests. E.A.M. is a founder and director of STORM Therapeutics. STORM Therapeutics had no role in the design, performance, analysis, interpretation, and writing of the study. O.Z. is a consultant for Evotec Int. Evotec Int. had no role in the design, performance, analysis, interpretation, and writing of the study.
Interactions that span at least 500 nt are shown. Colours as in (F). See Figure S2 and Table S1 for numbers of chimeric reads and significance of the arches.
Numbers within loops in (B-E) represent the loop sizes. Grey arches adjacent to nucleotide sequences in (B-E) mark unpaired bases. Full sequences are available in Figure S4.
See Figure S5 for statistical significance and the full sequence of the FSE-arch.
irradiating the RNA on ice with 2.5 kJ/m2 254 nm UVC using a CL-1000 crosslinker (UVP).
Sequencing library preparation. Library preparation was done as described in (Ziv et al., 2018), using 6N unique molecular identifiers to eliminate PCR biases.
Pre-adenylated adapters were used and all ligation reactions were carried out without ATP to reduce ligation artefacts. All libraries and controls went through 13 PCR cycles using KAPA HiFi HotStart Ready Mix (KAPA Biosystems). PCR products were size-selected on a 1.8% agarose gel before loading on a NovaSeq (Illumina) for paired-end 150 bp runs. A total of ~1.6 billion sequences were obtained for this study.
Data Pre-processing. Data pre-processing was performed according to (Ziv et al., 2018). In brief, raw paired-end reads were trimmed for adaptors and checked for quality using cutadapt (Martin, 2011). Trimmed paired-end reads were assembled into single reads using the program pear (https://cme.h-its.org/exelixis/web/software/pear/doc.html). PCR duplicates were removed using unique molecular identifiers via collapse.py (https://gitlab.com/tdido/tstk). Chimeric reads were identified and annotated to the respective genome using hyb (Travis et al., 2014). SARS-CoV-2 samples were processed using the Chlorocebus sabaeus reference genome (ChlSab1.1) with the addition of the SARS-CoV-2 sequence (NC_045512.2). MERS-CoV samples were processed using the Homo sapiens reference genome (GRCh38) with the addition of MERS-CoV (NC_019843.3).
Clustering of chimeras into chimeric groups. Due to crosslinking and fragmentation, the COMRADES data can provide redundant structural information whereby the same in vivo structure produces sequencing reads differing by a few nucleotides. Folding each chimeric read separately would therefore increase the computational load. To overcome this issue, and to gain better structure predictions, the reads were clustered into chimeric groups. Each chimeric read is composed of a left side (L) and a right side (R), each originating from a different position along the gRNA or sgmRNA.
Each chimeric read can therefore be described by its gap g, the genomic distance between L and R; chimeric reads that originated from the same structure will have a similar g and can be clustered based on their g values. Clustering of chimeric reads that originated from the same structure was performed using a network-based approach whereby an adjacency matrix is created for all chimeric reads based on the nucleotide difference between their g values (Deltagap, written $\Delta_{gap}$ below). This results in $\Delta_{gap} = 0$ for identically overlapping gaps, and increasing $\Delta_{gap}$ values for chimeric reads that share fewer overlapping sites. The clustering was performed twice per sample, once for the chimeric reads that represent short stem structures ($g \le 10$ nt) and once for chimeric reads that represent long-distance interactions ($g > 10$ nt). Short-range interaction weights were calculated as: $E(g_i, g_j) = 10 - \Delta_{gap}(g_i, g_j)$. This allows exactly overlapping gaps to have the highest weight, and gaps with no overlaps to have a weight of 0. For long-range interactions, weights were calculated as: $E(g_i, g_j) = 15 - \Delta_{gap}(g_i, g_j)$. Long-range interactions with weights lower than 0 were set to 0, meaning that gaps that differ by more than 15 nucleotides could not be considered as part of the same chimeric group. The weighted graphs created for long- and short-range interactions consisted of g as vertices and weights as edges: $G = (V, E)$, using iGraph (http://www.interjournal.org/manuscript_abstract.php?361100992). To identify densely connected subgraphs (communities) with chimeric groups containing chimeric reads that originated from the same structure, we clustered the graph using random walks with the cluster_walktrap function (steps = 2) from the iGraph package. Chimeric groups containing fewer than 10 chimeric reads were discarded. Chimeric groups often contained a small number of longer L or R sequences due to the random fragmentation in the COMRADES protocol.
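The edge-weight scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the text does not define Deltagap precisely, so here each gap is represented as a hypothetical (end of L, start of R) coordinate pair and Deltagap is assumed to be the summed offset of the two boundaries. Flooring at 0 is stated in the text for long-range weights and is assumed here for short-range weights as well. The function names are hypothetical, and the walktrap community detection step is not reproduced.

```python
from itertools import combinations

def delta_gap(g_i, g_j):
    """Assumed Deltagap: summed offset between the boundaries of two gaps.

    Each gap is a hypothetical (end_of_L, start_of_R) coordinate pair.
    """
    return abs(g_i[0] - g_j[0]) + abs(g_i[1] - g_j[1])

def edge_weight(g_i, g_j, long_range):
    """Weight = 10 - Deltagap (short range) or 15 - Deltagap (long range).

    Negative weights are set to 0 (assumed for the short-range case too).
    """
    base = 15 if long_range else 10
    return max(0, base - delta_gap(g_i, g_j))

def weighted_edges(gaps, long_range):
    """Return {(i, j): weight} for every pair of reads with a positive weight."""
    edges = {}
    for i, j in combinations(range(len(gaps)), 2):
        w = edge_weight(gaps[i], gaps[j], long_range)
        if w > 0:
            edges[(i, j)] = w
    return edges

# Toy long-range gaps: three reads from one structure, one unrelated read.
gaps = [(100, 5000), (102, 5003), (100, 5000), (900, 7000)]
edges = weighted_edges(gaps, long_range=True)
```

In this toy input the first three reads end up connected to each other while the fourth has no edges, so a community detection step run on the resulting graph would be expected to group the first three reads together.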
To avoid introducing biases in the folding results, clusters were trimmed to the region from L to R for which evidence in the cluster is higher than the mean evidence minus 2 standard deviations.
Folding. Folding of the chimeric groups was performed using the Vienna package (Lorenz et al., 2011). For short-range chimeric groups RNAfold was used (with default parameters) and for long-range chimeric groups RNAduplex was used (with default parameters).
(III) MSA-3-SARSrel-137seq: pairwise distances between the sequences in (I) were calculated using the mbed function in the kmer package (Blackshields et al., 2010). The "seeds" attribute was used to extract the sequence indices of all seed sequences. These seed sequences were included in this multiple sequence alignment. This resulted in a smaller sequence set but with a representative variation as (I). MUSCLE (default parameters).
(IV) MSA-4-SARSrel-559seq: The unaligned sequences used to generate (II) were divided into seven smaller sequence sets (six 500-sequence sets and one 515-sequence set). The seed sequences ("seeds" attribute of the mbed function in the kmer package (https://cran.r-project.org/package=kmer)) for all seven sets were combined in a new sequence set to be aligned to make up multiple sequence alignment (IV). This resulted in a sequence set with fewer sequences than (I), but more than (III), and representative variation as (I). MUSCLE (default parameters).
In all cases the NCBI reference genome for SARS-CoV-2 (NC_045512.2) was used as reference. For each of the four sequence alignments, the following steps were taken: (I) SARS-CoV-2 COMRADES cluster coordinates. For long-range clusters (defined above) the segments defined by the coordinates of the left and right side of the respective cluster were extracted from the MSA, and fused together to form a smaller MSA containing only the aligned left and right side sequences.
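The long-range extraction just described (cut the left and right segments out of the alignment and fuse them) might look like the following minimal Python sketch. It assumes the alignment is held as a dictionary of equal-length aligned strings and that the cluster coordinates have already been converted to 1-based, inclusive alignment columns; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
def extract_long_range(msa, left, right):
    """Fuse the aligned left and right segments of every sequence.

    msa   : dict mapping sequence name -> aligned sequence (equal lengths)
    left  : (start, end) alignment columns of the left side, 1-based inclusive
    right : (start, end) alignment columns of the right side, 1-based inclusive
    """
    l_start, l_end = left
    r_start, r_end = right
    return {
        name: seq[l_start - 1:l_end] + seq[r_start - 1:r_end]
        for name, seq in msa.items()
    }

# Toy alignment of two 18-column sequences (hypothetical data).
msa = {
    "ref":  "AUGGCUAAACCCGGGUUU",
    "seq2": "AUGGCU-AACCCGGGUUA",
}
cluster_alignment = extract_long_range(msa, left=(1, 6), right=(13, 18))
```

The fused alignment keeps only the columns covered by the two sides of the cluster, which is the form a covariation tool would then be given for that cluster.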
For short-range clusters (defined as above) the whole region defined by the start position of the left side and the end position of the right side was extracted. Those segments of the full MSA will be referred to as "cluster alignments". In both cases, any sequence starting with more than 10 empty positions was removed from that cluster alignment. (II) The cluster alignments were analyzed with R-scape (Rivas et al., 2017) using default parameters. (III) The R-scape output for each candidate co-varying pair includes an E-value statistical score (probability of a false positive result for the respective position pair in the cluster alignment). The default significance level of 0.05 was kept, so only position pairs with E-values smaller than 0.05 were considered in the subsequent analysis.
Covariation analysis of sgmRNA chimera clusters. The alignments described above were shortened to include the leader sequence fused to the full length of mRNA-S, and were subsequently used here. A modified version of the code used for full genome chimera clusters was used.
Sequence conservation analysis of the extended FSE structure. Genome conservation data analyzed with synplot2 (Firth, 2014) was taken from (Firth, 2020). These data were aligned with our structural data and displayed in Figure 6C.
Ridanpää, M., van Eenennaam, H., Pelin, K., Chadwick, R., Johnson, C., Yuan, B., vanVenrooij, W., Pruijn, G., Salmela, R., Rockas, S., et al.
(2001)
An in silico map of the SARS-CoV-2 RNA Structurome
In Vivo Mapping of Eukaryotic RNA Interactomes Reveals Principles of Higher-Order Organization and Regulation
A -1 ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral RNA
Sequence embedding for fast construction of guide trees for multiple sequence alignment
Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot
Group-specific structural features of the 5'-proximal sequences of coronavirus genomic RNAs
Emerging coronaviruses: Genome structure, replication, and pathogenesis
Multifaceted regulation of translational readthrough by RNA replication elements in a tombusvirus
Conserved and variable domains of RNase MRP RNA
Metagenomics reshapes the concepts of RNA virus evolution by revealing extensive horizontal virus transfer
MUSCLE: a multiple sequence alignment method with reduced time and space complexity
Mapping overlapping functional elements embedded within the protein-coding regions of RNA viruses
A putative new SARS-CoV protein, 3c, encoded in an ORF overlapping ORF3a
Multiple Cis-acting elements modulate programmed -1 ribosomal frameshifting in Pea enation mosaic virus
Characterization of the RNA Components of a Putative Molecular Switch in the 3′ Untranslated Region of the Murine Coronavirus Genome
Targeted CRISPR disruption reveals a role for RNase MRP RNA in human preribosomal RNA processing
Conserved elements in the 3' untranslated region of flavivirus RNAs and potential cyclization sequences
A bulged stem-loop structure in the 3' untranslated region of the genome of the coronavirus mouse hepatitis virus is essential for replication
Structure mapping of dengue and Zika viruses reveals functional long-range interactions
Comprehensive in-vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms
Role of RNase MRP in viral RNA degradation and RNA recombination
Modulation of hepatitis C virus RNA abundance by a liver-specific MicroRNA
Structural and functional conservation of the programmed −1 ribosomal frameshift signal of SARS coronavirus 2 (SARS-CoV-2)
RNA Conformation Capture by Proximity Ligation
Structure of the full SARS-CoV-2 RNA genome in infected cells
Structural Lability in Stem-Loop 1 Drives a 5′ UTR-3′ UTR Interaction in Coronavirus Replication
Integrative Analysis of Zika Virus Genome RNA Structure Reveals Critical Determinants of Viral Infectivity
Functional analysis of the stem loop S3 and S4 structures in the coronavirus 3′UTR
ViennaRNA Package 2.0
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Coronavirus cis-Acting RNA Elements
Primer tRNAs for reverse transcription
Gene regulation by riboswitches
Genome-wide mapping of therapeutically-relevant SARS-CoV-2 RNA structures
Cutadapt removes adapter sequences from high-throughput sequencing reads
Gene N proximal and distal RNA motifs regulate coronavirus nucleocapsid mRNA transcription
Long-distance RNA-RNA interactions in the coronavirus genome form high-order structures promoting discontinuous RNA synthesis during transcription
A day in the life of the spliceosome
Enhanced isolation of SARS-CoV-2 by TMPRSS2-expressing cells
The Global Macroeconomic Impacts of COVID-19: Seven Scenarios.
Centre for Applied Macroeconomic Analysis
shows lack of evidence for structure in lncRNAs
Organizational principles of 3D genome architecture
Analysis of Rapidly Emerging Variants in Structured Regions of the SARS-CoV-2 Genome
Comparative analysis of coronavirus genomic RNA structure reveals conservation in SARS-like coronaviruses
A contemporary view of coronavirus transcription
Global Mapping of Human RNA-RNA Interactions
Continuous and Discontinuous RNA Synthesis in Coronaviruses
A conserved RNA pseudoknot in a putative molecular switch domain of the 3'-untranslated region of coronaviruses is only marginally stable
In vivo structural characterization of the whole SARS-CoV-2 RNA genome identifies host cell target proteins vulnerable to re-purposed drugs
A long-distance RNA-RNA interaction plays an important role in programmed -1 ribosomal frameshifting in the translation of p88 replicase protein of Red clover necrotic mosaic virus
Determination of RNA structural diversity and its role in HIV-1 RNA splicing
Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data
Differential association of protein subunits with the human RNase MRP and RNase P complexes
A Phylogenetically Conserved Hairpin-Type 3′ Untranslated Region Pseudoknot Functions in Coronavirus RNA Replication
Ecological origins of novel human pathogens
COMRADES determines in vivo RNA structures and interactions

The authors thank Christian Drosten and John Ziebuhr for providing the SARS-CoV-2 and MERS-CoV strains used in this study. We thank B. Luisi, G. Evan, and the Department of