key: cord-0980211-zr7f975u authors: Zhu, Haibo; Fu, Hao; Cui, Tianyu; Ning, Lin; Shao, Huaguo; Guo, Yehan; Ke, Yanting; Zheng, Jiayi; Lin, Hongyan; Wu, Xin; Liu, Guanghao; He, Jun; Han, Xin; Li, Wenlin; Zhao, Xiaoyang; Lu, Huasong; Wang, Dong; Hu, Kongfa; Shen, Xiaopei title: RNAPhaSep: a resource of RNAs undergoing phase separation date: 2021-10-30 journal: Nucleic Acids Res DOI: 10.1093/nar/gkab985 sha: d7d07656e64e1417ca70e1b45d5fe8011071f1cb doc_id: 980211 cord_uid: zr7f975u Liquid-liquid phase separation (LLPS) partitions cellular contents, underlies the formation of membraneless organelles and plays essential biological roles. To date, most of the research on LLPS has focused on proteins, especially RNA-binding proteins. However, accumulating evidence has demonstrated that RNAs can also function as ‘scaffolds’ and play essential roles in seeding or nucleating the formation of granules. To better utilize the knowledge dispersed in published literature, we here introduce RNAPhaSep (http://www.rnaphasep.cn), a manually curated database of RNAs undergoing LLPS. It contains 1113 entries with experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events. RNAPhaSep contains various types of information, including RNA information, protein information, phase separation experiment information and integrated annotation from multiple databases. RNAPhaSep provides a valuable resource for exploring the relationship between RNA properties and phase behaviour, and may further enhance our comprehensive understanding of LLPS in cellular functions and human diseases. In addition to canonical membrane-bound organelles, eukaryotic cells contain numerous membraneless organelles (MLOs) that concentrate specific collections of proteins and nucleic acids (1, 2) . 
Liquid-liquid phase separation (LLPS), a phenomenon that describes the formation of two immiscible fluids from a single homogeneous mixture, has emerged as a general mechanism to explain how cells can spatiotemporally create MLOs (3-5). To date, a large number of MLOs have been discovered, including but not limited to stress granules, P bodies and even the nucleolus (6, 7). These MLOs have been implicated in a wide range of cellular functions, organizing molecules that act in processes ranging from RNA metabolism to signalling to gene regulation (8-10). Moreover, aberrant MLO behaviours have been linked with multiple human diseases, such as neurodegeneration and cancer (11-13). Identifying the molecules driving or undergoing LLPS is the foundation of understanding the mechanisms of MLOs. Many, but not all, phase-separated biological condensates arise from proteins and RNAs (14). Many nuclear and cytoplasmic condensates are rich in RNAs and RNA-binding proteins (RBPs), which play roles in LLPS (15, 16). The roles of proteins in condensation have been well studied (17). However, less attention has been paid to the contribution of RNA to LLPS (18). As an anionic polymer, RNA is an excellent platform for achieving multivalency and accommodating RBPs (19, 20). Peptides in condensates usually contain low-complexity domains (LCDs) or intrinsically disordered regions (IDRs) that enable weak, multivalent interactions and thereby promote liquid-like properties, and unstructured sequences in RNAs may play a similar role in RNA-driven phase separation (21, 22). Recent findings have confirmed that RNAs can function as 'scaffolds' and play essential roles in seeding or nucleating the formation of MLOs (23, 24). Experimental evidence has demonstrated that RNA self-assembly contributes to stress granule formation and defines the stress granule transcriptome (25).
Furthermore, apart from driving LLPS, diverse RNA properties, such as composition, length, structure, modification and expression level, can modulate the biophysical features of native condensates, including their size, shape, viscosity, liquidity and surface tension (26-28). The molecular mechanisms underlying these associations remain unknown (14). Over the last five years, a large number of publications have reported cases of RNAs being involved in LLPS, and the number is still increasing rapidly. Five databases focused on proteins undergoing LLPS have been created: LLPSDB, PhaSePro, PhaSepDB, DrLLPS and RNAGranuleDB (29-33). However, centralized resources on newly reported phase separation-related RNAs are still lacking. In particular, RNA self-assembly LLPS events have not been recorded in any published phase separation database. To fill this gap, we here introduce RNAPhaSep, a database that uses RNA as its core and focuses on various properties of RNAs and their roles in the phase separation process. After careful manual collation, a total of 1113 entries of experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events were included in RNAPhaSep. RNAPhaSep provides a convenient interface to help users browse, search and download RNA-related phase separation entries.

RNAPhaSep was constructed from curated information derived from published literature and the RNALocate database (34). Literature mining was performed by searching PubMed with the following keywords: ((((phase transition) OR (phase separation) OR (membraneless organelles) OR (biomolecular condensates)) AND RNA) AND cell). A total of 4,804 publications dated before 30 June 2021 were retrieved (Figure 1A). For review articles, we read through each manuscript, extracted sentences describing the RNA-related phase separation events and downloaded the corresponding research articles for curation.
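The keyword search above could be reproduced programmatically. The text does not say how PubMed was queried, so the following is only a minimal sketch that assumes the NCBI E-utilities ESearch endpoint; the function name `build_esearch_url`, the lower date bound and the `retmax` value are illustrative assumptions, not part of the original workflow. The sketch only builds the request URL and performs no network access.

```python
from urllib.parse import urlencode

# Keyword query quoted from the text.
QUERY = (
    "((((phase transition) OR (phase separation) OR (membraneless organelles) "
    "OR (biomolecular condensates)) AND RNA) AND cell)"
)

# NCBI E-utilities ESearch endpoint (assumed here; the paper does not name a tool).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, max_date="2021/06/30", min_date="1900/01/01", retmax=10000):
    """Build an ESearch URL restricted to a publication-date window.

    min_date and retmax are illustrative defaults; ESearch expects
    mindate and maxdate to be supplied together.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": min_date,
        "maxdate": max_date,
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_esearch_url(QUERY))
```

The number of hits returned today would not be expected to match the 4,804 publications reported, since PubMed indexing changes over time.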
As some of the known MLOs, such as P-bodies, had been extensively studied before the emergence of the LLPS concept, which underlies MLO formation, the RNALocate database (34), which contains the subcellular localization of RNAs in MLOs, was also used (Figure 1A). For each RNA record related to MLOs in RNALocate (34), the original research article was retrieved by PMID. To obtain relevant publications that describe MLOs or phase separation and related RNAs, we manually checked the abstracts or full texts of all these articles. During the curation process, we sought to collect phase separation-associated RNAs and as much helpful information as possible, such as original supporting sentences, RNA sequence, mutation, phase separation experimental conditions, phase diagrams, compositions and corresponding cell lines or tissues used for experimentation (Figure 1B). RNAPhaSep integrates two types of RNA IDs: NCBI Gene IDs (35) and RNAcentral identifiers (36). For several rRNAs and snoRNAs that could not be found in NCBI, an RNAcentral identifier was supplied. The graph of each RNA's structure was generated using the RNAfold server on ViennaRNA web services (37). The LCD and IDR information for each protein was collected from MobiDB (38) or PONDR (39). The molecular properties of RNAs are essential for understanding their potential phase behaviour. For each natural RNA, the description in NCBI (35), molecular function in Gene Ontology (GO) (40), subcellular localization in RNALocate (34), interaction neighbours in RNAInter (41) and associated diseases in DisGeNET (42) and OMIM (43) were all integrated as RNA annotation information (Figure 1B). Designed RNAs were divided into four subclasses based on sequence characteristics.
In order of preference, these were: 'poly RNA' if the sequence was just a repetition of a single nucleotide; 'repeat RNA' if the sequence was the repetition of a fragment with at least two different nucleotides; 'nucleotide rich RNA' if the sequence was enriched with two specific types of nucleotides; and 'irregular RNA' for the rest. It is important to emphasise that RNAPhaSep is concerned with cases where the RNA itself or together with other components (DNA or protein) was experimentally validated in vitro or in vivo for LLPS. Thus, systems with only proteins or DNAs as the main components were excluded. Moreover, systems with mixtures of RNA, such as total mRNA, were included, as RNA was the only component driving LLPS in these systems. After sorting the records, we found no reports of DNA and RNA co-involved LLPS, which may be due to the limited research in this area. We may include this type of LLPS event in an updated version of RNAPhaSep. The state of phase separation can vary dynamically over a wide range, from liquid to solid. Four states, 'solute', 'liquid', 'gel' and 'solid', were used to define the morphological characteristics of phase separation (1). When changes in experimental conditions led to a phase transition, such as from liquid to gel, 'liquid, gel' was recorded as the morphology of that phase separation event. RNA-related phase separation events curated in RNAPhaSep were verified by experiments, including reconstituting LLPS condensation in vitro and examining droplet formation by immunofluorescence in vivo. RNAs detected by high-throughput methods were excluded.

As of August 2021, RNAPhaSep included 1113 curated entries about RNA self-assembly or RNA and protein co-involved phase separation events, involving 325 non-redundant RNAs from 22 organisms (Figure 2A). We consolidated the entries by RNA or RNA plus protein names to reduce data redundancy and assigned all entries to 628 unique RNAPSIDs.
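The four designed-RNA subclasses defined above ('poly RNA', 'repeat RNA', 'nucleotide rich RNA', 'irregular RNA') are assigned in order of preference, which can be sketched as a chain of checks. This is an illustrative sketch rather than the curators' procedure: the text gives no numeric cutoff for "enriched", so the `rich_threshold` of 0.8 is an assumption, as are the function names, and only exact whole-unit repeats are detected.

```python
from collections import Counter


def smallest_repeat_unit(seq):
    """Return the shortest fragment whose exact repetition rebuilds seq, else None."""
    n = len(seq)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size]
    return None


def classify_designed_rna(seq, rich_threshold=0.8):
    """Assign one of the four subclasses, testing them in order of preference.

    rich_threshold is an assumed cutoff: the two most frequent nucleotides
    must make up at least this fraction for 'nucleotide rich RNA'.
    """
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    # 1. 'poly RNA': a single nucleotide repeated.
    if len(counts) == 1:
        return "poly RNA"

    # 2. 'repeat RNA': exact repetition of a fragment with >= 2 different nucleotides.
    unit = smallest_repeat_unit(seq)
    if unit is not None and len(set(unit)) >= 2:
        return "repeat RNA"

    # 3. 'nucleotide rich RNA': dominated by two nucleotide types (assumed cutoff).
    top_two = sum(count for _, count in counts.most_common(2))
    if top_two / len(seq) >= rich_threshold:
        return "nucleotide rich RNA"

    # 4. Everything else.
    return "irregular RNA"
```

Repeats with a trailing partial unit would fall through to the later categories in this sketch; a curated classification would likely treat them more leniently.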
RNA properties such as composition, species, classification, sequence, length, structure, subcellular localization, RNA interaction neighbours, related molecular functions and diseases were collected and organized for each RNA. By RNA type, we classified RNAs into natural and designed RNA (Figure 2B and C). Although natural RNAs often have diverse annotation information, which can comprehensively describe their molecular functions in the cell, the impact of RNA sequence on phase separation events is more clearly demonstrated by designed RNAs owing to their designability and low complexity. In vitro experiments are very important, as their simplified processes help researchers clearly identify the conditions involved in phase separation. Researchers can simulate intracellular phase separation events by controlling various experimental details, such as salt concentration, buffer, temperature and pH. Most importantly, the components involved in these experiments are known, so for in vitro experiments we classified entries as 'RNA(s)', 'RNA + protein(s)' and 'RNA(s) + protein(s)' (Figure 2D). The morphological distribution of phase-separated records is shown in Figure 2E. The RNA sequences of in vivo records, mostly taken from annotations in NCBI (35) or RNAcentral (36), were not validated in the in vivo phase separation experiments. Thus, 184 RNAs from in vitro records were used for sequence and length analysis. The sequence analysis showed that the sequences of LLPS-related RNAs were enriched in adenine and uracil, which together accounted for 58.2%, 58% and 67.5% in all RNAs, natural RNAs and designed RNAs, respectively (Figure 2F).
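The adenine plus uracil share reported above is a simple composition statistic. The sketch below shows one way to compute it; it assumes nucleotides are pooled across all sequences before taking fractions, which the text does not specify (a per-RNA average is an equally plausible reading), and the function names are hypothetical.

```python
from collections import Counter


def pooled_composition(sequences):
    """Return the fraction of A, C, G and U over all sequences pooled together.

    Pooling is an assumption; characters other than A/C/G/U are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper().replace("T", "U"))
    total = sum(counts[base] for base in "ACGU")
    if total == 0:
        raise ValueError("no A/C/G/U nucleotides found")
    return {base: counts[base] / total for base in "ACGU"}


def au_fraction(sequences):
    """Combined adenine + uracil fraction of the pooled sequences."""
    composition = pooled_composition(sequences)
    return composition["A"] + composition["U"]
```

Applied separately to the natural and designed subsets of the in vitro RNAs, a function like `au_fraction` would give the kind of per-group percentages quoted in the text.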
Motif analysis of the RNA sequences by STREME (44) was performed to discover common sequence elements. However, we did not find any significant motifs, which may be due to the relatively small number of RNAs. Designed RNAs tend to be shorter than natural RNAs in the sequence length distribution (Figure 2G).

[Figure 2. Labels recovered from the figure; panel assignment is partly lost. RNA type: mRNA (123), tRNA (5), total RNA (13), virus RNA (43), snRNA (2), snoRNA (6), undefined (278). Component type: RNA (30), RNA+protein (549), RNA+proteins (59), RNAs (14), RNAs+protein (158), RNAs+proteins (25). Designed RNA: repeat RNA (28), poly RNA (41), irregular RNA (25), nucleotide rich RNA (11). Species: Caenorhabditis elegans (18), Drosophila melanogaster (6), Homo sapiens (192), Photinus pyralis (4), Mus musculus (22), Virus (43), Other Species (17). Morphology (liquid, gel, solid) is shown separately for in vivo and in vitro records.]

A novel coronavirus, SARS-CoV-2, has caused the ongoing worldwide COVID-19 pandemic. Scientists have found that viral genomic RNAs can form phase-separated droplets with nucleocapsid proteins and that these droplets become solid-like structures as the RNA length increases (45, 46). Thus, this LLPS morphology is highly dependent on the length and concentration of the given RNAs. To date, 76 phase-separated entries of SARS-CoV-2 are present in the RNAPhaSep database.

A user-friendly and fully functional website has been developed for searching, browsing and downloading RNA-related phase separation data. The database includes eight modules: Home, Search/Blast, Browse, Submit, Download, Statistics, Help and Contact. For convenient searching, RNAPhaSep provides two search modes on the Search/Blast page: 'By options' (search by a combination of keywords, component type, species and RNA type) and 'By RNA sequence' (search by inputting an RNA sequence) (Figure 3A). For 'Search by options', we have provided three typical examples.
By clicking on an example button, the options are filled in automatically, and then by clicking on the 'search' button, the 'Search Result' is presented in a table format. The 'Search by RNA sequence' module enables users to assess the sequence similarity between their target RNA and the LLPS-related RNAs stored in the database. The 'Search/Blast' module can help users quickly check how their RNA of interest contributes to LLPS under the available conditions, with or without partners. The data sources are briefly described on the 'Browse' page, which has a table containing all entries that can be divided into subsets by in vivo/in vitro and RNA type (Figure 3B). Users can click on a unique RNAPSID to navigate to the 'Phase Separation Details Page', which includes various descriptions of the involved RNA, protein and experimental condition details (Figure 3C). For RNA, the phase separation-related RNA sequence and structure are shown on the page. The corresponding UniProt ID, IDR, LCD, mutation, modification and sequence information are listed for each involved protein, if available. To allow users to easily obtain a wide range of information on protein-related phase separation, we provided, for each involved protein, direct links to the protein detail pages of known protein-related LLPS databases, including LLPSDB, PhaSePro and DrLLPS (29, 30, 32). As PhaSepDB did not provide a unique link for each protein, a link to its 'Browse Page' was supplied (31). The same protein involved in different LLPS experiments may have different sequences or structures, such as wild-type, mutant or post-translationally modified forms. These differences potentially affect the LLPS conditions. Thus, the 'State' of the protein is used to discriminate different protein sequences or structures and the corresponding experimental conditions (Figure 3C).
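The 'By options' search described earlier combines a keyword with component type, species and RNA type. The logic of such a combined filter can be sketched as follows. This is not the site's implementation: the `Entry` fields, the matching rules (case-insensitive, keyword matched against the RNA name only) and the demo records are all hypothetical and do not correspond to real RNAPhaSep entries.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """One phase separation record; all field names are hypothetical."""
    rnapsid: str
    rna_name: str
    rna_type: str        # e.g. "mRNA", "poly RNA"
    species: str
    component_type: str  # e.g. "RNA", "RNA+protein"


def _matches(value: str, wanted: Optional[str]) -> bool:
    """An unset option matches everything; otherwise compare case-insensitively."""
    return wanted is None or value.lower() == wanted.lower()


def search_by_options(
    entries: Iterable[Entry],
    keyword: Optional[str] = None,
    component_type: Optional[str] = None,
    species: Optional[str] = None,
    rna_type: Optional[str] = None,
) -> List[Entry]:
    """Return entries that satisfy every option that was supplied."""
    hits = []
    for entry in entries:
        if keyword is not None and keyword.lower() not in entry.rna_name.lower():
            continue
        if (
            _matches(entry.component_type, component_type)
            and _matches(entry.species, species)
            and _matches(entry.rna_type, rna_type)
        ):
            hits.append(entry)
    return hits


# Made-up records for illustration only.
DEMO = [
    Entry("demo-1", "example mRNA", "mRNA", "Homo sapiens", "RNA+protein"),
    Entry("demo-2", "polyU", "poly RNA", "synthetic", "RNA"),
    Entry("demo-3", "example mRNA 2", "mRNA", "Mus musculus", "RNA+protein"),
]
```

For example, `search_by_options(DEMO, rna_type="mRNA")` keeps the two demo mRNA records, while leaving every option unset returns all records.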
For each LLPS experiment, a phase diagram is presented in graphical form, if available. For natural RNAs, which have extensive information, clicking on the RNA symbol takes the user to the RNA annotation page, which integrates the RNA description (35), molecular function (40), subcellular location (34), RNA interaction network (top 100 extracted from RNAInter) (41) and associated diseases (42, 43) (Figure 3D). A detailed tutorial on the usage of the database can be found on the 'Help' page. On the 'Submission' page, we supplied three choices for different data sources: published, preprint and unpublished data. Submissions of published or preprint data will be routinely reviewed manually and formatted into database form with a PMID or tagged as 'preprint'. For unpublished data, authors may not want to share the original experimental evidence, so we will tag the data as 'unpublished'. We added a reminder on the phase separation page that 'Entries with 'unpublished' tag have relatively lower reliability'.

The database is stored in the MySQL v5.7 (https://www.mysql.com) database engine. The web framework was built on Django v3.2 (https://www.djangoproject.com) and runs on a CentOS Linux server. We have tested it on Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. The RNAPhaSep database is freely available to the research community online at http://www.rnaphasep.cn.

Here, we present a novel resource on RNA-related phase separation, RNAPhaSep, generated from information obtained from the literature and the RNALocate database. It contains 1113 experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events, helping and guiding researchers to perform further studies related to LLPS. RNAPhaSep was designed explicitly for RNA-related phase separation, and we believe it will be a handy tool for researchers in this field.
From recently published perspectives and review articles, we noticed that the role of RNA in driving phase separation has attracted increasing attention and interest from researchers. The establishment of RNAPhaSep is only the first step, and we will continue to expand and improve it to satisfy more requirements in this field. We are now collecting a reliable LLPS corpus for developing a text mining system that can automatically extract LLPS information from the biomedical literature in PubMed. To ensure that all data from the literature have consistently high reliability, these automatically extracted records will still need to be manually curated. After applying this system, we will be able to update RNAPhaSep more frequently.

1. Protein phase separation: a new phase in cell biology
2. Physical chemistry of cellular liquid-phase separation
3. Liquid-liquid phase separation in biology
4. Spatial patterning of P granules by RNA-induced phase separation of the intrinsically-disordered protein MEG-3. Elife, 5
5. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates
6. The molecular language of membraneless organelles
7. Biomolecular condensates: organizers of cellular biochemistry
8. Dynamic transcriptomic m(6)A decoration: writers, erasers, readers and functions in RNA metabolism
9. RNA sequence context effects measured in vitro predict in vivo protein binding and regulation
10. RNA polymerase II clustering through carboxy-terminal domain phase separation
11. TIA1 mutations in amyotrophic lateral sclerosis and frontotemporal dementia promote phase separation and alter stress granule dynamics
12. Biomolecular condensates and cancer
13. Ubiquilin 2 modulates ALS/FTD-linked FUS-RNA complex dynamics and stress granule formation
14. RNA contributions to the form and function of biomolecular condensates
15. Phase separation of FUS is suppressed by its nuclear import receptor and arginine methylation
16. ALS/FTD mutation-induced phase transition of FUS liquid droplets and reversible hydrogels into irreversible hydrogels impairs RNP granule function
17. Formation and maturation of phase-separated liquid droplets by RNA-binding proteins
18. Promiscuous interactions and protein disaggregases determine the material state of stress-inducible RNP granules
19. Nuclear-import receptors reverse aberrant phase transitions of RNA-binding proteins with prion-like domains
20. Nuclear import receptor inhibits phase separation of FUS through binding to multiple sites
21. mRNA structure determines specificity of a polyQ-driven phase separation
22. m(6)A enhances the phase separation potential of mRNA
23. RNA phase transitions in repeat expansion disorders
24. Free mRNA in excess upon polysome dissociation is a scaffold for protein multimerization to form stress granules
25. RNA self-assembly contributes to stress granule formation and defining the stress granule transcriptome
26. The disordered P granule protein LAF-1 drives phase separation into droplets with tunable viscosity and dynamics
27. RNA controls PolyQ protein phase transitions
28. Interfacial tension of reactive, liquid interfaces and its consequences
29. LLPSDB: a database of proteins undergoing liquid-liquid phase separation in vitro
30. PhaSePro: the database of proteins driving liquid-liquid phase separation
31. PhaSepDB: a database of liquid-liquid phase separation related proteins
32. DrLLPS: a data resource of liquid-liquid phase separation in eukaryotes
33. Properties of stress granule and P-body proteomes
34. RNALocate v2.0: an updated resource for RNA subcellular localization with increased coverage and annotation
35. Database resources of the National Center for Biotechnology Information
36. RNAcentral: a hub of information for non-coding RNA sequences
37. The ViennaRNA web services
38. MobiDB: intrinsically disordered proteins in 2021
39. PONDR-FIT: a meta-predictor of intrinsically disordered amino acids
40. The Gene Ontology resource: enriching a GOld mine
41. RNAInter v4.0: RNA interactome repository with redefined confidence scoring system and improved accessibility
42. The DisGeNET knowledge platform for disease genomics: 2019 update
43. OMIM.org: leveraging knowledge across phenotype-gene relationships
44. STREME: accurate and versatile sequence motif discovery
45. Genomic RNA elements drive phase separation of the SARS-CoV-2 nucleocapsid
46. Phosphoregulation of phase separation by the SARS-CoV-2 N protein suggests a biophysical basis for its dual functions