key: cord-268795-tjmx6msm authors: Sardar, Rahila; Satish, Deepshikha; Birla, Shweta; Gupta, Dinesh title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date: 2020-03-21 journal: bioRxiv DOI: 10.1101/2020.03.21.001586 sha: doc_id: 268795 cord_uid: tjmx6msm

The ongoing coronavirus disease 2019 (COVID-19) pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV2). We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify unique features that are absent in SARS-CoV and other related coronavirus family genomes and that may confer unique infection, transmission, virulence and immunogenic properties on the virus. The phylogeny of the genomes yields some interesting results. Systematic gene-level mutational analysis of the genomes has enabled us to identify several unique features of the SARS-CoV2 genome, including a unique mutation in the spike surface glycoprotein (A930V (24351C>T)) in the Indian SARS-CoV2, absent in the other strains studied here. We have also predicted the impact of the mutations on spike glycoprotein function and stability, using a computational approach. To gain further insight into host responses to viral infection, we predicted antiviral host miRNAs that may control viral pathogenesis. Our analysis reveals nine host miRNAs which can potentially target SARS-CoV2 genes. Interestingly, the nine miRNAs do not have targets in SARS and MERS genomes. Also, hsa-miR-27b is the only miRNA whose predicted target is specific to the Indian SARS-CoV2 genome.
We also predicted immune epitopes in the genomes.

The first COVID-19 case was reported in December 2019 in Wuhan (China), and the disease has since spread worldwide to become a pandemic, with the highest number of deaths in Italy, although the highest mortality was initially reported from China (1). According to a WHO report, as of 18 March 2020 there were 209,839 confirmed COVID-19 cases and 8,778 deaths, including cases that were locally transmitted or imported (2). Published reports suggest that SARS-CoV2 shares the highest similarity with bat SARS-CoV. Scientists across the globe are trying to elucidate the genome characteristics using phylogenetic, structural and mutational analyses. A recent paper identified specific mutations in the receptor binding domain (RBD) of the spike protein, which is the most variable part of the coronavirus genome (3). More than 400 assembled SARS-CoV2 genomes are available in the NCBI database. Sequence analysis of these genomes can yield a wealth of information that can be of use for drug and vaccine development research. In the current work, we collected SARS-CoV2 genomes from different geographical origins, mainly India, Italy, USA, Nepal and Wuhan, to identify notable genomic features of SARS-CoV2 by integrated analysis. These analyses include identification of notable mutational signatures, host antiviral miRNA identification and epitope prediction. As a host defense mechanism, a repertoire of host miRNAs also targets invading viruses. We followed the parameters used in various antiviral miRNA databases to predict host antiviral miRNAs against SARS-CoV2. Our analysis shows unique host miRNAs targeting SARS-CoV2 genes.

respectively, were retrieved from the NCBI genome database. SARS-CoV2 genomes from India, Italy, USA and Nepal, along with SARS-CoV and MERS, were used as query genomes for comparison with the Wuhan SARS-CoV2 genome.
Genes and protein sequences of SARS-CoV2 were retrieved from the ViPR database (4). All assembled query genomes in FASTA format were analyzed using the Genome Detective Coronavirus Typing Tool (5).

To understand the variation in the genomes from the various geographical areas used in the study, we performed a phylogenetic analysis. The neighbor-joining method with 1000 bootstrap replicates was used to construct a consensus tree using MEGA software (6) (version 10.1.7).

The CELLO2GO server (7) was used to infer the biological function of each protein of the SARS-CoV2 genome, together with its predicted localization.

The mutations reported in the literature (3) were catalogued and evaluated for pathogenicity. We used the MutPred server (8) to distinguish disease-associated amino acid substitutions from neutral substitutions, with a p-value of >=0.05. To assess the impact of SNPs on protein stability, we used two machine-learning-based prediction methods. The first, the I-MUTANT server (9), was used to predict the stability of the protein sequences at pH 7.0 and a temperature of 25 °C. The second was the MuPro server (10); combining its predictions with those of the first method helps in reaching a consensus prediction.

To predict host miRNAs targeting the virus, we collected a list of experimentally verified antiviral miRNAs with their targets from the VIRmiRNA database (11). Only these host miRNAs were processed for downstream analysis (Figure 1). To identify potential host microRNA target sites in the virus genome sequences, we used miRanda (version 3.3a) (12, 13) with an energy threshold of -20 kcal/mol. We also used the psRNATarget server (14) to compare the targets predicted by the two methods.

All the genes and protein sequences for SARS-CoV2 were retrieved from the ViPR database. To identify CTL and B-cell epitopes, we used the CTLpred (15) and ABCpred (16) servers with default parameters. The CHEMOpred (17) and VaxiJen (18) servers were used to predict chemokines and probable protective antigens, respectively (Figure 1(c)).
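The miRNA target screening described in the methods (an energy threshold of -20 kcal/mol, followed by a comparison of the targets predicted by two tools) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function names and the example values are hypothetical, the predictions are assumed to be already parsed into simple records, and treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One predicted miRNA-target interaction (hypothetical record layout)."""
    mirna: str     # host miRNA identifier
    target: str    # viral gene name
    energy: float  # predicted duplex free energy in kcal/mol


def filter_by_energy(hits, threshold=-20.0):
    """Keep hits at or below the threshold (more negative means more stable).

    Assumption: the threshold is inclusive.
    """
    return [h for h in hits if h.energy <= threshold]


def consensus_pairs(primary_hits, secondary_pairs):
    """Return the sorted (miRNA, target) pairs reported by both methods."""
    primary = {(h.mirna, h.target) for h in primary_hits}
    return sorted(primary & set(secondary_pairs))


# Hypothetical example data, not results from the study.
example_hits = [
    Hit("miR-A", "S", -24.1),
    Hit("miR-B", "ORF1ab", -18.7),
    Hit("miR-C", "N", -20.0),
]
second_tool_pairs = {("miR-A", "S"), ("miR-B", "ORF1ab")}

kept = filter_by_energy(example_hits)
agreed = consensus_pairs(kept, second_tool_pairs)
```

Filtering before intersecting means a pair counts as agreed only if it passes the energy cut-off in the first tool and is also reported by the second; the reverse order would give the same set here but would make the cut-off easier to overlook.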
Assembled SARS-CoV2 genome sequences in FASTA format from India, USA, China, Italy and Nepal were used for the coronavirus typing tool analysis. Using the tool, we were able to locate the query SARS-CoV genomes with known SARS-CoV2 to obtain a cladogram for evolutionary analysis, as shown in Figure

Several mutations are revealed when the SARS-CoV2 and SARS-CoV spike glycoproteins are compared. Six frameshift mutations and one insertion in the genome, corresponding to S13_Q14insSDLD (21601_21602insAGTGACCTTGAC) (Table S1), were also revealed. We also observed several mutations located in regions associated with high immune response (Table S2).

From the SNP analysis, we observed that all the mutations might decrease stability without changing residue properties, i.e. from hydrophobic to hydrophilic or vice versa. The L455Y mutation is predicted to alter the ordered interface, disordered interface, stability and transmembrane protein properties, and to cause gain of GPI-anchor amidation at position N450 (Table 2). It is known, and also confirmed by Gene Ontology analysis, that the protein is involved in pathogenesis, membrane organization, reproduction, symbiosis (encompassing mutualism through parasitism), and locomotion.

psRNATarget analysis, which is based on complementary matching between the sRNA sequence and the target mRNA sequence with a predefined scoring schema, identified 6 of the 9 identified miRNAs as targeting SARS-CoV2 genes. The 6 miRNAs are predicted to act on the viral genomes by cleaving their target sites (Table 4). Intriguingly, our analysis (S. Figure 2) revealed that there is only a single host miRNA

We have used bioinformatics tools to investigate SARS-CoV2 sequences from different geographical locations.
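The paired nucleotide-level and protein-level notation used for the spike variants reported here, A930V (24351C>T) and S13_Q14insSDLD (21601_21602insAGTGACCTTGAC), can be cross-checked with a short Python sketch. This is an illustration and not part of the authors' workflow. It assumes that the genome coordinates refer to the Wuhan reference genome and that the spike coding sequence starts at position 21563 (as annotated in NCBI entry NC_045512.2); the function names are hypothetical, and the codon table is deliberately limited to the three codons needed.

```python
SPIKE_CDS_START = 21563  # assumption: first base of the S gene in NC_045512.2

# Minimal codon table covering only the codons in the reported insertion.
CODONS = {"AGT": "S", "GAC": "D", "CTT": "L"}


def locate_in_cds(genome_pos, cds_start):
    """Map a 1-based genome coordinate to (codon number, position in codon).

    Both returned values are 1-based.
    """
    if genome_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    offset = genome_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


def translate(nucleotides):
    """Translate an in-frame sequence using the minimal codon table above."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODONS[nucleotides[i:i + 3]] for i in range(0, len(nucleotides), 3))


substitution = locate_in_cds(24351, SPIKE_CDS_START)          # 24351C>T
insertion_left_flank = locate_in_cds(21601, SPIKE_CDS_START)  # base before the insertion
inserted_residues = translate("AGTGACCTTGAC")
```

Under these assumptions, position 24351 is the second base of codon 930, and a C>T change at the second position of an alanine codon (GCN) gives a valine codon (GTN), in agreement with A930V. Position 21601 is the third base of codon 13, so the 12-nucleotide insertion falls between codons 13 and 14 and translates in frame to SDLD.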
The phylogenetic analysis of the genomes, the nucleotide sequence diversity analysis, the predicted antiviral host miRNAs specific to the genomes, and the prediction of immune-active sequences in the genomes have yielded some interesting findings, including unique features.

For the phylogenetic analysis, we compared the sequences of 6 SARS-CoV2 isolates from different locations, namely Wuhan, India, Italy, USA and Nepal, along with other coronavirus species (Figure 1). As reported earlier (19, 20), the virus from Wuhan showed higher similarity with SARS-CoV. There was no phylogenetic segregation of the genomes based on geographic origin, whether from the same continent or a neighboring country (Figure 1); instead, the clustering varied, with Italy and Nepal clustering together, followed by India and USA. This reiterates findings indicating massive exchange and importation of carriers between the epicenter, Wuhan, and these countries. However, a detailed analysis, complemented with more sequences and patient metadata, will give further evolutionary insights into the fast-spreading pandemic.

The phylogenetic heterogeneity between different strains is explored by genome variation profiling to find alterations in genetic information during the course of evolution, the outbreak, and the clinical spectrum caused by the different strains. In the case of SARS-CoV2 and SARS-CoV too, a few clinical characteristics differentiate them from each other and from seasonal influenza infections, as reported recently (21). Interestingly, in the present analysis, in comparison to SARS-CoV, we observed at least one type of variation (indels, deletions, misalignments or frameshifts) in all the SARS-CoV2 proteins except ORF6, ORF10 and ORF14 (Table S1). The (22).
As expected for a rapidly transmitting pandemic virus, we observed various mutations located in regions associated with immune response (Table S2). These mutations may have a significant impact on the antigenic and immunogenic changes responsible for differences in the severity of the outbreak in different geographical regions. To gain further insights, we compared the genetic mutation spectrum identified in the four countries, namely USA, Italy, India and Nepal. Surprisingly, the mutation spectra differed among these countries ( (2), combined with other factors, a speculation which may be verified with more evidence. From this analysis, we also speculate that the presence of a country-specific mutation spectrum may help explain the current scenario in these countries, such as the severity of illness, the containment of the outbreak, and the extent and timing of exposures to a symptomatic carrier.

Non-structural proteins have specific roles in replication and transcription (23). Previous studies on SARS-CoV revealed Nsp15 as a potential therapeutic target (24). It is noteworthy that, in the present study, various mutations were identified in all the non-structural proteins, suggesting that they are potentially important therapeutic targets that should be explored experimentally.

Many studies have reported that miRNAs act not only as signatures of tissue expression and function but also as potential biomarkers that play an important role in regulating disease pathophysiology (25). In viral infections, host antiviral miRNAs play a crucial role in regulating the immune response, depending on the viral agent. Many known human miRNAs appear to be able to target viral genes and their functions, for example by interfering with replication, translation and expression. In the present study, we tried to predict the antiviral host miRNAs specific for (26).
There are also studies describing the regulatory role of the miRNA hsa-mir-27b-3p in ACE2 signaling (27). The results of the present study suggest a strong correlation between hsa-mir-27b-3p and ACE2, which needs to be confirmed experimentally in SARS-CoV2 cases. Further, we compared the miRNAs across the genomes and observed some striking findings. We observed that out of all the miRNAs, hsa-miR-27b is the only unique

Based on our analysis, we speculate an important regulatory role of miR-27b in SARS-CoV2 infection. The contradictory treatment outcomes may be due to the presence of the miR-27b target specifically in the Indian genome. This probably indicates that the specific genetic and miRNA spectrum should be considered as the basis of treatment management.

The findings of the study have revealed unique features of the SARS-CoV2 genomes, which may be explored further. For example, the link between disease severity and each of the variants may be analysed; expression of the predicted host antiviral miRNAs can be checked in patients; the predicted epitopes may be explored for their immunogenicity; differences in treatment outcomes may be correlated with genome variations; and lastly, the potential of the unique segments of the virus proteins and the unique host miRNAs may be explored in the development of novel antiviral therapies.

Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak
Coronavirus disease 2019 (COVID-19). Situation Report - 58
The proximal origin of SARS-CoV-2
ViPR: an open bioinformatics database and analysis resource for virology research
Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes
Molecular Evolutionary Genetics Analysis across Computing Platforms.
Molecular biology and evolution
CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation
Automated inference of molecular mechanisms of disease from amino acid substitutions
I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure
Prediction of protein stability changes for single-site mutations using support vector machines
VIRmiRNA: a comprehensive resource for experimentally validated viral miRNAs and their targets. Database: the journal of biological databases and curation, 2014
The microRNA.org resource: targets and expression. Nucleic acids research
PAmiRDB: A web resource for plant miRNAs targeting viruses. Scientific reports
psRNATarget: a plant small RNA target analysis server
Prediction of CTL epitopes using QM, SVM and ANN techniques
Prediction of continuous B-cell epitopes in an antigen using recurrent neural network
Prediction and classification of chemokines and their receptors
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
Systematic Comparison of Two Animal-to-Human Transmitted Human Coronaviruses: SARS-CoV-2 and SARS-CoV
A Novel Coronavirus from Patients with Pneumonia in China
Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2
Recent progress in the discovery of inhibitors targeting coronavirus proteases
Coronavirus nonstructural protein 15 mediates evasion of dsRNA sensors and limits apoptosis in macrophages
Extracellular miRNAs: the mystery of their origin and function. Trends in biochemical sciences
Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection? The Lancet Respiratory Medicine
The ACE2/Apelin Signaling, MicroRNAs, and Hypertension. International journal of hypertension
Regulation of cyclin T1 and HIV-1 Replication by microRNAs in resting CD4+ T lymphocytes
Interferon-beta and interferon-gamma synergistically inhibit the replication of severe acute respiratory syndrome-associated coronavirus (SARS-CoV). Virology
Diagnosis and treatment of 2019 novel coronavirus infection in children: a pressing issue