key: cord-0799595-mqq6pgua authors: Bugnon, L. A.; Raad, J.; Merino, G. A.; Yones, C.; Ariel, F.; Milone, D. H.; Stegmayer, G. title: Deep Learning for the discovery of new pre-miRNAs: Helping the fight against COVID-19 date: 2021-09-09 journal: Machine Learning with Applications DOI: 10.1016/j.mlwa.2021.100150 sha: 7dab2aab79c144ab51ed7616d6511e4485105d9c doc_id: 799595 cord_uid: mqq6pgua

The Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) has recently been found responsible for the pandemic outbreak of a novel coronavirus disease (COVID-19). In this work, a novel approach based on deep learning is proposed for identifying precursors of small active RNA molecules named microRNA (miRNA) in the genome of the novel coronavirus. Viral miRNA-like molecules have been shown to modulate the host transcriptome as the infection progresses, so their identification is crucial for the diagnosis and medical treatment of the disease. The existence of the mature miRNAs derived from computationally predicted miRNA precursors (pre-miRNAs) in the novel coronavirus was validated with small RNA-seq data from SARS-CoV-2-infected human cells. The results demonstrate that computational models can provide accurate and useful predictions of pre-miRNAs in the SARS-CoV-2 genome, underscoring the relevance of machine learning in the response to a global health emergency. Moreover, the interpretability of our model sheds light on the molecular mechanisms underlying the viral infection, thus contributing to the fight against the COVID-19 pandemic and the fast development of new treatments. Our study shows how recent advances in machine learning can be used effectively in response to public health emergencies. The approach developed in this work could be of great help in similar future emergencies, accelerating both the understanding of the singularities of any viral agent and the development of novel therapies.
Data and source code available at: https://sourceforge.net/projects/sourcesinc/files/aicovid/.

albeit including yet unknown hidden pre-miRNAs. For example, the Anopheles gambiae genome has only 66 well-known pre-miRNAs but more than 4 million hairpin-like sequences, thus giving an imbalance of 1:60,000 (Bugnon et al.,

We present here our proposal in detail, together with the results obtained. Remarkably, some candidate pre-miRNAs that were computationally predicted by our proposal have very recently been identified experimentally with small RNA-seq data from SARS-CoV-2-infected human cells.

This paper is organized as follows. Section II explains in detail the ML-based pipeline designed for finding novel pre-miRNAs in the SARS-CoV-2 genome and the ML models used in this work. In Section III the data sets used in this study

length of the cutting window has to be configured to define the maximum size that the stem-loops found will have (shorter stems can then also be identified). A stem-loop is a sequence whose predicted secondary structure fulfils certain conditions, such as a minimum energy released when folding, unpaired nucleotides in the middle (the loop), and a minimum length of the remaining paired nucleotides (the stem). The window must be long enough to correctly include a complete hairpin,

The identification of potential pre-miRNAs encoded in the SARS-CoV-2 genome was performed with mirDNN, because it performed best in the comparison of the previous subsection. For each candidate sequence, the method gives a score indicating whether it is a good miRNA precursor candidate (score close to 1) or not (score close to 0). In the case of mirDNN, the activation level of the pre-miRNA output neuron is used as the score.
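The window cutting, the stem-loop conditions, and the score-based selection described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the default thresholds (`min_pairs`, `min_loop`, `max_energy`, `threshold`) and the input format are assumptions made only for illustration. The secondary structure is assumed to be supplied in dot-bracket notation, together with its folding energy, by an external folding tool, and the scores are assumed to come from an already trained classifier such as mirDNN.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(genome: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for fixed-length windows cut along the genome.

    If the genome is shorter than the window, one shorter window is yielded.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, genome[start:start + window]


def is_stem_loop(structure: str, energy: float, min_pairs: int = 16,
                 min_loop: int = 3, max_energy: float = -15.0) -> bool:
    """Check the stem-loop conditions on a dot-bracket structure.

    All default thresholds are hypothetical placeholders, not values
    taken from the paper.
    """
    n_open = structure.count("(")
    if n_open == 0 or n_open != structure.count(")"):
        return False
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    # A single hairpin has every opening bracket before every closing one.
    if last_open > first_close:
        return False
    loop_len = first_close - last_open - 1  # unpaired nucleotides in the loop
    return n_open >= min_pairs and loop_len >= min_loop and energy <= max_energy


def rank_candidates(scores: Dict[str, float],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep candidates scoring at least `threshold`, best first.

    The threshold of 0.5 is an assumption; the text only states that
    scores close to 1 indicate good pre-miRNA candidates.
    """
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For instance, `list(sliding_windows("ACGUACGUAC", 4, 3))` cuts three overlapping windows starting at positions 0, 3 and 6. In a real run, each window would be folded, filtered with a check like `is_stem_loop`, and the surviving hairpins passed to the classifier before ranking.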
The experimental validation of these computational predictions involved ex-

The second example is shown in Figure 4 for the sars cov2 101-701 stem-

[Figure 4: plot labelled "mirDNN Importance", with nucleotide labels aaacguucggaugcucgaacugcaccucauggucauguuaugguugagcugguagcagaacucgaaggcauucaguacggucguagu]

secondary structure corresponding to this pre-miRNA, with its mature miRNA marked in red, which was determined, as in the other case, according to the importance given by mirDNN and confirmed with the experimental reads.

Once the most likely mature miRNA derived from the pre-miRNA has been determined, it is possible to predict the corresponding target genes for each of the newly

Deep residual learning for image recognition
Vienna RNA secondary structure server
Genome-wide identification of microRNA expression quantitative trait loci
AI techniques for COVID-19
Deep learning approaches for COVID-19 detection based on chest X-ray images
How miRNAs can protect humans from coronaviruses COVID-19 Research Square COVID-19 Preprints
Artificial intelligence and COVID-19: Deep learning approaches for diagnosis and treatment
miRNA repertoire and host immune factor regulation upon avian coronavirus infection in eggs
miRBase: from microRNA sequences to function
Computational approaches for microRNA studies: a review
An overview of RNA virus-encoded microRNAs
Focal loss for dense object detection
Identifying miRNAs, targets and functions
Novel SARS-CoV-2 encoded small RNAs in the passage to humans
Artificial intelligence (AI) and big data for coronavirus (COVID-19) pandemic: A survey on the state-of-the-arts
Grad-CAM: Visual explanations from deep networks via gradient-based 2017 IEEE International Conference on Computer Vision (ICCV)
A single-cell RNA expression map of human coronavirus entry factors
The role of the precursor structure in the biogenesis of mi
Predicting novel microRNA: a comprehensive comparison of machine learning approaches
Fast and accurate microRNA search using CNN
Improved and promising identification of human microRNAs by incorporating a high-quality negative set
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine
HextractoR: an R package for automatic extraction of hairpins from genome-wide data. bioRxiv
High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks. bioRxiv
NAfe: A comprehensive tool for feature extraction in microRNA prediction
A comparison study between one-class and two-class machine learning for microRNA target detection
Visual interpretability for deep learning: a survey

Authors would like to thank Emanuel Wyler and Prof. M. Landthaler for