key: cord-0778093-nfzym5oj
authors: da Rosa, Rafael Lopes; Yang, Tung Sheng; Tureta, Emanuela Fernanda; de Oliveira, Laura Rascovetzki Saciloto; Moraes, Amanda Naiara Silva; Tatara, Juliana Miranda; Costa, Renata Pereira; Borges, Júlia Spier; Alves, Camila Innocente; Berger, Markus; Guimarães, Jorge Almeida; Santi, Lucélia; Beys-da-Silva, Walter Orlando
title: SARSCOVIDB—A New Platform for the Analysis of the Molecular Impact of SARS-CoV-2 Viral Infection
date: 2021-01-21
journal: ACS Omega
DOI: 10.1021/acsomega.0c05701
sha: 03abe171160fd9e8b6876a737819b711bb558c4e
doc_id: 778093
cord_uid: nfzym5oj

The COVID-19 pandemic caused by the new coronavirus (SARS-CoV-2) has become a global public health emergency. This threat has accelerated related research and, consequently, produced an unprecedented volume of clinical and experimental data, including data on changes in gene expression resulting from infection. The SARS-CoV-2 infection database (SARSCOVIDB: https://sarscovidb.org/) was created to mitigate the difficulties of handling this volume of data. The SARSCOVIDB is an online platform that aims to integrate all differential gene expression data, at messenger RNA and protein levels, helping to speed up analysis and research on the molecular impact of COVID-19. The database can be searched from different experimental perspectives and presents all related information from published data, such as viral strains, hosts, methodological approaches (proteomics or transcriptomics), genes/proteins, and samples (clinical or experimental). All information was taken from 24 articles reporting analyses of differential gene expression, selected from the 5,554 COVID-19/SARS-CoV-2-related articles published so far. The database features 12,535 entries for genes whose expression has been identified as altered due to SARS-CoV-2 infection.
Thus, the SARSCOVIDB is a new resource to support health workers and the scientific community in understanding the pathogenesis and molecular impact of SARS-CoV-2.

In December 2019, a new strain of coronavirus associated with severe acute respiratory syndrome was identified as SARS-CoV-2. 1 This is an enveloped, single-stranded RNA virus from the Coronaviridae family, presenting high virulence and generating a significant global impact. 2 The COVID-19 pandemic caused by SARS-CoV-2 had its epicenter in the city of Wuhan, China, and in a short time became a serious public health problem worldwide. 3 As of the end of October 2020, it is present in more than 200 countries, accounting for 45 million cases. 4 In the USA, the country with the highest number of deaths, more than 8 million cases and 220,000 deaths have been recorded. 4 The same dramatic outcome can be seen in Brazil, the country with the second highest number of COVID-19 deaths, with more than 5 million cases and 150,000 deaths. 5 The growing number of SARS-CoV-2 infections and the lack of specific therapeutics and vaccines have caused concern among public authorities and international agencies, mobilizing the scientific community to understand the disease and its clinical outcomes, with the aim of improving treatments and finding ways to prevent cases. This scenario has brought significant changes to the field, such as a rapid and unprecedented increase in the number of scientific articles being published. Expedited submission processes and preprint publication without peer review have made it possible to disseminate articles with highly speculative results, such as computational predictions. 6 Research into the molecular aspects of COVID-19 and the complex response of the host after viral infection has been gaining ground.
7−9 Approaches that evaluate viral infections from the perspective of the molecular impact on the host contribute to the understanding of the disease, which is important for outlining potential antiviral strategies. 10 Analysis of differentially expressed genes (DEGs), which allows in silico characterization of the molecular response and the impact of infection, can be applied for this purpose. Some databases contain a collection of expression data, but their contents are mostly obtained by text mining with automatic and semiautomatic approaches, which may lead to inaccurate data being deposited. 11 Furthermore, the results may be redundant and ambiguous, so a post-analysis is needed to verify the data. 12 Likewise, many of the available databases require a bioinformatician to extract the raw data, which is another important limitation on their use; it can be especially limiting for medical workers without bioinformatics skills who are facing the impact of COVID-19 in real time and need information promptly. This scenario hampers access by the medical community and by other nonspecialized scientists who may need this information in their research. Recently, our group developed the ZIKAVID, a database that gathers all gene expression data published to date on Zika virus (ZIKV) infection, covering different experimental approaches, hosts, and strains, along with other related information. 12 In this work, we present the SARS-CoV-2 virus infection database (SARSCOVIDB: https://sarscovidb.org/), a public database containing all DEGs identified in SARS-CoV-2 infection and COVID-19 samples, manually developed and checked, with a friendly interface that is easy to navigate. This database will help researchers worldwide, and general users, to speed up research on and understanding of the molecular impact of COVID-19 and its possible clinical outcomes.
The worldwide outbreak of COVID-19, combined with the lack of efficient treatments and approved vaccines, triggered a great effort by the scientific community and governments toward research involving SARS-CoV-2 and the potential associated comorbidities in humans. 13 For this reason, the SARSCOVIDB was created, comprising all data on DEGs identified after SARS-CoV-2 infection to date. The database was initially built by searching the specific terms "COVID" and "SARS-CoV-2", with a manual double-check for differential expression of genes or proteins after SARS-CoV-2 infection, regardless of the host. To improve searching and the user interface, all data were categorized according to experimental approach, as described above, including the reference article. The SARSCOVIDB is a database that exclusively gathers DEGs identified after SARS-CoV-2 infection and is thus an important resource for the scientific community and COVID-19 medical workers on this topic of urgent need. So far, the SARSCOVIDB contains 12,535 DEG entries and 9,283 unique genes. These data come from different experimental approaches with distinct objectives, the majority being obtained from clinical samples (Figure 1A). Thus, users can easily consult the most frequently reported host models and compare specific gene sets and other information from all published studies reporting differential expression after SARS-CoV-2 infection. This can facilitate the planning of new experiments and accelerate understanding of both the available data and the molecular impact of COVID-19. These substantial numbers, reached in a time span of only a few months, also highlight the great efforts made by the scientific community to study SARS-CoV-2 infection and COVID-19.
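
The difference between the number of DEG entries and the number of unique genes reported above arises because the same gene can be reported by more than one study or experimental approach. The following is a minimal sketch of that bookkeeping, not the SARSCOVIDB implementation: the rows, gene names, field order, and function names are all hypothetical and were invented for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical entries: (gene, approach, sample_type, expression_status).
# These rows are invented placeholders, not data from the SARSCOVIDB.
ENTRIES = [
    ("GENE_A", "transcriptomics", "clinical", "up"),
    ("GENE_A", "proteomics", "experimental", "up"),
    ("GENE_B", "transcriptomics", "clinical", "down"),
    ("GENE_C", "proteomics", "clinical", "up"),
    ("GENE_B", "transcriptomics", "experimental", "down"),
]


def summarize(entries):
    """Return (total entries, unique genes, entries per approach)."""
    unique_genes = {gene for gene, _, _, _ in entries}
    per_approach = Counter(approach for _, approach, _, _ in entries)
    return len(entries), len(unique_genes), dict(per_approach)


def genes_with_multiple_approaches(entries):
    """List genes reported by more than one methodological approach."""
    approaches_by_gene = defaultdict(set)
    for gene, approach, _, _ in entries:
        approaches_by_gene[gene].add(approach)
    return sorted(g for g, seen in approaches_by_gene.items() if len(seen) > 1)


total, unique, per_approach = summarize(ENTRIES)
print(f"{total} entries, {unique} unique genes, by approach: {per_approach}")
print("Reported by several approaches:", genes_with_multiple_approaches(ENTRIES))
```

In this toy input, five entries collapse to three unique genes, mirroring on a small scale how entry counts can exceed unique gene counts.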
The ZIKAVID covered the same kind of data for ZIKV; however, it included fewer clinical samples from ZIKV-infected patients, 12 even though that epidemic also reached global threat status 14 and caused great concern worldwide in 2015−2017. Regarding the clinical samples used, most are peripheral blood samples (five serum, four blood, and two plasma samples), plus one lung sample (Figure 1B). The experimental samples, on the other hand, came from different cell lines, most of them human (Figure 1C). As observed, much scientific data has been generated using cell lines and other models, but there are still no robust animal models that faithfully replicate the pathogenesis of SARS-CoV-2. 15 Thus, given the lack of comprehensive studies comparing different hosts and their responses to infection, surveying the data gathered in the SARSCOVIDB is an important aid to planning future experiments and choosing a more meaningful experimental model. The SARSCOVIDB comprises data from SARS-CoV-2 isolates from different geographic regions (Figure 1D). However, studies have so far given little attention to the impact that mutations acquired by strains in different regions may have on disease dynamics and virulence, as occurred with ZIKV, for instance. 16 Interestingly, almost 50% of the SARS-CoV-2 isolates or clinical samples studied are from China, followed by America (around 16%) and Europe (around 16%). It is important to highlight that, depending on their origin, viral isolates of SARS-CoV-2 can lead to different pathological impacts and mortality rates, as previously suggested. 17 The most recent example of a virus with a differential clinical impact depending on origin was ZIKV, for which Brazilian isolates were strongly associated with severe neurological outcomes.
18 Thus, the SARSCOVIDB contributes to the study of how pathology depends on the origin of the virus, since it is possible to promptly cross-reference expression data obtained in similar models using viruses from different sources. Currently, there are various databases gathering genomic, transcriptomic, and proteomic data on viruses and their impact on the host. 19, 20 Several databases were developed to provide host−pathogen interaction data (gene expression and protein interaction), such as VirHostNet 2.0, 21 Virhostome, 22 the HIV-1 human interaction database, 23 the Gene Expression Omnibus, 24 and the Virus Pathogen Resource (ViPR). 19 For SARS-CoV-2, there are already two databases: Coronascape, 25 available in the Metascape database, and The COVID-19 Drug and Gene Set Library, 26 which contains a collection of drug and gene sets related to COVID-19. Although not the first SARS-CoV-2 database, the SARSCOVIDB is certainly the first devoted exclusively to differential expression after SARS-CoV-2 infection. Most of the data in other databases are gathered automatically and concentrate generic information, which can result in inaccuracies. 27 The SARSCOVIDB proposes to fill this gap in a simple, organized, and objective way by curating the data manually, making them reliable. Furthermore, the user, who does not need a bioinformatics background or skills, can query data from different experimental perspectives, such as the methodology used, host, virus strain, and so forth.

The SARSCOVIDB is the first database to gather all data to date from differential expression analyses after SARS-CoV-2 infection, with manual checking of the data to give more accurate and faster analysis. Users do not need a background in bioinformatics because the simple, user-friendly interface enables all search possibilities to be explored.
This allows the user to cross-reference entries for better understanding and analysis of the deposited data. Finally, the SARSCOVIDB is a promising tool for supporting scientists and medical professionals carrying out research and analysis on the molecular mechanisms of SARS-CoV-2 infection.

Original Article Selection. The SARSCOVIDB (available at: https://sarscovidb.org/) comprises differential gene expression measurements, at mRNA and protein levels, and was built through four main steps (Figure 2). The first step was a manual literature search to find all articles available in the PubMed, Web of Science, Google Scholar, and ScienceDirect databases containing the terms "SARS-CoV-2" and "COVID-19". Only accepted/published manuscripts were used as the source of DEGs in SARS-CoV-2 infection. This search retrieved 5,554 articles, from December 2019 to date. The second step comprised a manual check of abstracts to select only articles containing differential gene expression measurements after SARS-CoV-2 infection. All references were double-checked by two independent individuals, resulting in the selection of 24 articles.

Data Collection and Related Information. The data collected from the selected papers were checked and organized into a list of DEGs, at mRNA and/or protein level, identified after SARS-CoV-2 infection. Other information was also collected, such as the type of study (in vivo, in vitro, and clinical), methodological approach (transcriptomic, proteomic, qRT-PCR, and immunoblotting), viral strain, hosts (clinical or animal model and cell culture), and expression status (see Table 1). The SARSCOVIDB also contains information from data deposited in a repository or database. The database will be updated at least monthly as new data from published articles become available.

Webpage Construction and User Interface. The last step in the SARSCOVIDB was webpage development (Figure 1).
MySQL v5.0, PHP v.5.2.99, and HTML10 were used to build the database and the graphical user interface. The data were stored and maintained by the relational database management system (RDBMS) server, with the SQL language used for data management. The search can be customized easily by the user, combining specific proteins with hosts or selecting a specific viral strain or geographic origin. The database also provides step-by-step links to instruct the user how to search. A list containing all genes deposited in the SARSCOVIDB is available through direct download (https://sarscovidb.org/download/).

References
Available in: https://www.who.int/news-room/detail/30-01-2020-statement-on-the-second-meeting-of-the-international-health-regulations
World Health Organization
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Statement regarding cluster of pneumonia cases in Wuhan, China
COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE)
Painel COVID-19
Pandemic Publishing: Medical journals drastically speed up their publication process for Covid-19
Proteomics of SARS-CoV-2-infected host cells reveals therapy targets
Downregulated gene expression spectrum and immune responses changed during the disease progression in patients with COVID-19
Proteomic and metabolomic characterization of COVID-19 patient sera
Understanding human-virus protein-protein interactions using a human protein complex-based analysis framework
ZIKAVID - Zika virus infection database: a new platform to analyze the molecular impact of Zika virus infection
Unprecedented surge in publications related to COVID-19 in the first three months of pandemic: A bibliometric analytic report
WHO announces a Public Health Emergency of International Concern; Pan American Health Organization
A Review on SARS-CoV-2 virology, pathophysiology, animal models, and anti-viral interventions
Comparative analysis of African and Asian lineage-derived Zika virus strains reveals differences in activation of and sensitivity to antiviral innate immunity
Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity
Zika virus infection of human mesenchymal stem cells promotes differential expression of proteins linked to several neurological diseases
ViPR: an open bioinformatics database and analysis resource for virology research
ZIKV-CDB: a collaborative database to guide research linking sncRNAs and Zika virus disease symptoms
VirHostNet 2.0: surfing on the web of virus/host molecular interactions data
Interpreting cancer genomes using systematic host perturbations by tumour virus proteins
HIV-1, human interaction database: current status and new features
The gene expression omnibus database
Metascape provides a biologist-oriented resource for the analysis of systems-level datasets
Iliopoulos, I. Protein−protein interaction predictions using text mining methods

The authors declare no competing financial interest.

The authors would like to thank the Brazilian agency Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support.