key: cord-0029728-o5sw0v0t
authors: Girdhar, Neha; Kumari, Nilima; Krishnamachari, A.
title: Computational characterization and analysis of molecular sequence data of Elizabethkingia meningoseptica
date: 2022-04-09
journal: BMC Res Notes
DOI: 10.1186/s13104-022-06011-5
sha: 068ea8df2ed7f784b0d1918f6205c50b119f9d05
doc_id: 29728
cord_uid: o5sw0v0t
*Correspondence: chari@jnu.ac.in; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India

OBJECTIVE: Elizabethkingia meningoseptica is a multidrug-resistant bacterium that primarily causes meningitis in neonates and immunocompromised patients. Although it is a nosocomial pathogen, little information is available in the literature about its genomic makeup and associated features. We study these features with bioinformatics tools, covering composition, embedded periodicities, open reading frames, origin of replication, phylogeny, orthologous gene clusters and pathways.

RESULTS: Complete DNA and protein sequences of E. meningoseptica were analyzed as part of the study. The E. meningoseptica G4076 genome showed 7593 ORFs and a mean GC content of 36.5%. Fourier-based analysis showed the presence of the typical three-base periodicity at the genome level. A putative origin of replication was identified. Phylogenetically, E. meningoseptica is relatively closer to E. anophelis than to other Elizabethkingia species. A total of 2606 COGs were shared by all five Elizabethkingia species analyzed. Out of 3391 annotated proteins, we identified 18 unique ones involved in metabolic pathways of E. meningoseptica, which can be a starting point for drug design and development. The novelty of the study lies in characterizing and analyzing the whole-genome data of E. meningoseptica.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-022-06011-5.

In 1959, Elizabeth O. King discovered the organism now known as Elizabethkingia (renamed in 2005), earlier classified as Chryseobacterium [1].
It is a non-glucose-fermenting, non-motile, catalase- and oxidase-positive, Gram-negative bacterium belonging to the family Flavobacteriaceae, and is ubiquitous in soil and in fresh and salt water [2]. The genus comprises six species [3]: E. meningoseptica, associated with meningitis and sepsis in premature neonates [4, 5]; E. anophelis, isolated from the midgut of Anopheles gambiae mosquitoes, which causes respiratory tract illness in humans [6]; E. miricola, isolated from condensation water collected in 1997 on the Russian space station Mir [7]; and E. bruuniana, E. ursingii and E. occulta (three CDC genomospecies) [8].

Elizabethkingia meningoseptica is the causative agent of meningitis in neonates and sepsis in immunocompromised patients [9]. The occurrence of nosocomial infection has risen, mainly in patients with prolonged hospitalization, invasive procedures, prior use of broad-spectrum antimicrobials, or concomitant infections [10]. The mortality rate in patients infected with E. meningoseptica is high, owing to its unusual resistance pattern and mechanisms [11]. Further studies are needed to establish the most effective therapeutic approach. One can follow the time-consuming and labor-intensive experimental approach, but advances in bioinformatics have provided numerous software tools that are used to analyze and extract information from molecular sequence, structure, expression and pathway data [12, 13]. The current study focused first on analyzing the whole-genome data of Elizabethkingia to unravel embedded features not reported hitherto, and second on exploring possible leads towards novel therapeutic candidates.
Accordingly, we studied genomic features, origin of replication sites, phylogenetic relationships and comparative genomics among Elizabethkingia species, and further explored a subtractive genomics approach together with pathway analysis.

The whole genome (Accession Number NZ_CP016376) and protein sequences of Elizabethkingia meningoseptica G4076 were downloaded from NCBI (www.ncbi.nlm.nih.gov). The nucleotide composition of the genome was obtained using ORIS software [14]. To find all open reading frames in the genome, ORFfinder, a graphical tool, was used (https://www.ncbi.nlm.nih.gov/orffinder/) [15]. The CGView Server was used to draw the circular plot of the genome [16]. A Discrete Fourier Transform based computational approach, implemented in customized Python code, was used to look for the typical three-base periodicity embedded in the E. meningoseptica genomic sequence [17]. The Rapid Annotation using Subsystem Technology (RAST) server was used for genome annotation [18, 19]. Ori-Finder [20] and ORIS v1.0 [14] were used to identify putative origin of replication (oriC) sites in the genome.

MEGA X software was used to carry out phylogenetic analysis of species within the same genus, namely E. miricola, E. meningoseptica, E. anophelis, E. bruuniana, E. ursingii and E. occulta, as well as Flavobacterium columnare ATCC49512 and Riemerella anatipestifer ATCC11845 (other genera in the same family) [21]. Orthologous genes among Elizabethkingia species were identified using OrthoVenn2 with default parameters [22, 23].

All protein sequences of Elizabethkingia meningoseptica G4076 and Homo sapiens (host) were downloaded from the NCBI database [24, 25]. Out of the total 3406 proteins in E. meningoseptica, hypothetical proteins and proteins shorter than 100 amino acids were discarded. The remaining 2503 proteins were subjected to BLASTP against the proteome of Homo sapiens [26].
Based on previous studies, an expectation value (e-value) cut-off of 10⁻⁴ and a minimum bit score of 100 were used as thresholds to shortlist non-homologous proteins [27]. These non-homologous proteins were then queried against the Database of Essential Genes (DEG) server to obtain a list of essential genes for E. meningoseptica, using an e-value cut-off of 10⁻¹⁰ and a bit score of 100 as thresholds [28]. The shortlisted genes, non-homologous to the host and essential for the bacterium, were studied further with respect to metabolic pathways.

Essential non-homologous proteins of E. meningoseptica were analyzed using KAAS (KEGG Automated Annotation Server) to study metabolic pathways [29]. KAAS performs BLAST comparisons against the available KEGG gene database and provides metabolic pathway maps, including the KO and EC numbers for a particular gene. To determine the location of proteins in the cell, the PSORTb version 3.0 server was used [30]. The essential proteins were subjected to BLASTP analysis against FDA-approved drug targets from DrugBank to search for novel drug targets. Targets with an identity of 80% or more are considered druggable, while those showing a considerably lower degree of matching with already approved drug targets can be used as novel targets for new drug identification [31].

The whole genome of E. meningoseptica G4076 has a length of 3,873,125 bp and a mean GC content of 36.5%; the number of genes as per annotation is 3477. The percentage base composition, calculated using ORIS software [14], was %A ≈ %T = 31.76 and %G ≈ %C = 18.23 (Additional file 1: Figure S1), which is in agreement with Chargaff's parity rule [32].

Open reading frame analysis is effective in identifying genes that encode proteins. A total of 7593 ORFs were found in the whole genome. The products are of varying length, and the number of ORFs found exceeds the annotated number of proteins (Additional file 1: Figure S2).
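The composition and ORF figures above can be reproduced in outline with a few lines of Python. The sketch below is illustrative only: it is not the ORIS or ORFfinder implementation, the function names (`base_composition`, `gc_content`, `find_orfs`) are hypothetical, and the ATG-only start codon and the `min_len=75` default are assumptions, since the ORFfinder settings are not stated in the text. Under Chargaff's parity rule %A is close to %T and %G is close to %C, so with the reported values the GC content is roughly 2 × 18.23 ≈ 36.5%.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Return %G + %C."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=75):
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive on the scanned strand.
    Assumptions: ATG is the only start codon, and the first ATG after
    the previous stop defines the ORF start.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "ATGAAATAG"  # toy sequence, not genome data
    print(base_composition(toy))
    print(find_orfs(toy, min_len=9))
```

Applied to the NZ_CP016376 sequence, `gc_content` would be expected to land near the reported 36.5%, whereas the ORF count depends strongly on the minimum length and start codons chosen, so it need not match 7593.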
To visualize sequence conservation, a circular genome plot was created using the CGView Server (Additional file 1: Figure S3).

Gene-coding segments of the E. meningoseptica genome show the typical three-base periodicity, which indicates the underlying codon structure and enables prediction of probable genes in the majority of bacterial genomes with very high accuracy [17]. Additional file 1: Figure S4A shows the Fourier spectrum with all bases considered and indicates the presence of the three-base periodic signal seen in most bacterial genomes. The signal is prominent for the purine-pyrimidine representation (Additional file 1: Figure S4B), whereas for individual bases it is considerably lower (Additional file 1: Figure S4C-F).

RAST annotation indicated 3477 putative genes; 61 RNAs, comprising four copies each of the 5S, 16S and 23S ribosomal RNAs and 49 tRNAs; and 335 subsystems (sets of functional roles) under 27 categories [18]. Sixty-two coding sequences were related to resistance to antibiotics and toxic compounds, which suggests that E. meningoseptica might be multidrug resistant (Additional file 1: Figure S5).

Ori-Finder (a web-based software tool for finding oriCs) predicted an oriC region of 649 bp, from 740,720 bp to 741,368 bp, containing three DnaA box sequence motifs (TTATCCACA) with no more than one mismatch. The replication-related gene dnaA is located from 2,613,273 to 2,614,727 bp and is followed by the dnaN gene (Fig. 1A) [20]. A cluster of three DnaA boxes and two AT-rich DNA unwinding elements (DUE) indicates a functional chromosomal origin (Fig. 1F). A similar result was obtained with the ORIS v1.0 software tool. DNA asymmetry, the distribution of DnaA boxes and the location of the dnaA gene help in predicting oriC regions [33-36]. Both graphs enable us to pinpoint the ORI/TER site.
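The two signals mentioned above, DNA asymmetry and DnaA boxes, can be illustrated with a short Python sketch. It uses cumulative GC skew, a common asymmetry measure in which the minimum is usually taken as the putative origin and the maximum as the putative terminus; this choice is an assumption made for illustration and is not necessarily what Ori-Finder or ORIS compute internally. The DnaA box motif TTATCCACA and the one-mismatch tolerance come from the text; the function names and the 1000 bp window are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows.

    Returns (positions, values), where each position is the end
    coordinate of the window that produced the cumulative value.
    """
    seq = seq.upper()
    positions, values = [], []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        positions.append(min(start + window, len(seq)))
        values.append(total)
    return positions, values


def predict_ori_ter(seq, window=1000):
    """Return (ori, ter) as the positions of the skew minimum and maximum."""
    positions, values = cumulative_gc_skew(seq, window)
    ori_index = min(range(len(values)), key=values.__getitem__)
    ter_index = max(range(len(values)), key=values.__getitem__)
    return positions[ori_index], positions[ter_index]


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_dnaa_boxes(seq, motif="TTATCCACA", max_mismatches=1):
    """Return (start, strand) for motif matches on either strand."""
    seq = seq.upper()
    rc_motif = motif.translate(COMPLEMENT)[::-1]
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if mismatches(word, motif) <= max_mismatches:
            hits.append((i, "+"))
        elif mismatches(word, rc_motif) <= max_mismatches:
            hits.append((i, "-"))
    return hits
```

In practice a skew-based estimate is only a coarse guide; a window near the skew minimum that also contains a cluster of DnaA boxes is a stronger candidate, which is the logic the prediction tools combine.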
The oriC positions (genome coordinates) predicted by Ori-Finder and ORIS differ by well under 1 kb and are therefore in close agreement.

Genomic comparison among Elizabethkingia species and related taxa [... Riemerella anatipestifer ATCC11845 (WP_004918717.1)] was carried out using MEGA X software. It depicts phylogenetic relatedness by comparing the homology of a protein sequence, specifically the 16S rRNA processing protein RimM (ribosomal maturation factor RimM) (Additional file 1: Figure S6) [37]. E. meningoseptica was found to be at a relatively large phylogenetic distance from the other species of Elizabethkingia.

Clusters of orthologous genes of E. meningoseptica G4076 were compared with those of four other species of Elizabethkingia to provide insights into biological processes, molecular functions and cellular components [22, 23]. Among 3970 clusters, 1401 were orthologous clusters containing at least two species and 2569 were singletons. The number of orthologous genes shared by the five Elizabethkingia genomes was 2606, whereas 17 COGs were present only in the Elizabethkingia meningoseptica G4076 genome; these are involved only in metallopeptidase activity (Additional file 1: Figure S7). In pairwise comparisons, the number of shared COGs ranged from 3396 to 3409 (Additional file 1: Figure S7C).

Subtractive genomic analysis is a fast and efficient method for identifying essential genes in pathogenic species that are non-homologous to the human host. These non-homologous essential genes can be used as putative drug targets against pathogens [38]. The genome of E. meningoseptica G4076 has 3391 annotated proteins. After exclusion of hypothetical proteins and proteins shorter than 100 amino acids, the remaining 2503 were subjected to BLASTP against the proteins of Homo sapiens (host). Using an e-value cut-off of 10⁻⁴ and a bit score > 100, a total of 2052 proteins were found to be non-homologous to host proteins.
Thereafter, these proteins were subjected to BLAST analysis on the DEG server; using an e-value cut-off of 10⁻¹⁰ and a bit score > 100, 692 proteins were shortlisted that are essential for E. meningoseptica G4076 but absent in the host (Additional file 1: Table S1). DEG contains genes that play an important role in cell survival and can be novel targets for antibacterial drugs (Fig. 2).

The shortlisted non-homologous essential genes were analyzed using the KEGG database for metabolic pathway annotation. Only 41 of the 692 were found to be present in pathways unique to the pathogen (Table 1). The majority of them were DNA-binding response regulators or ribosomal proteins, or were involved in replication and repair, glycan biosynthesis, protein folding and sorting, two-component systems, biotin metabolism and ATP transporters.

For drug design it is very important to determine whether a target protein resides on the cell surface or in the cytoplasm, since protein localization plays an important role in drug binding and action. Subcellular localization analysis revealed that, out of the 41 target proteins, 80% are cytoplasmic and the rest are located in the periplasm or cytoplasmic membrane; no extracellular proteins were obtained (Additional file 1: Figure S8). Extracellularly secreted proteins may be better suited for vaccine development. Here, the majority of proteins reside in the cytoplasm and cytoplasmic membrane and can be considered further as potential therapeutic targets.

The unique E. meningoseptica essential proteins non-homologous to the host were further subjected to BLASTP against FDA-approved drug targets from DrugBank, which shortlisted 18 target proteins. Of these, two are penicillin-binding proteins and two are ABC transporter ATP-binding proteins, which are targets of broad-spectrum antibiotics. The rest include ribosomal proteins (rpsB, rpsl, rpsG, rpsJ, rpsE, ...) ... lend support for choosing the specific drug target [39]. In that regard, computational analysis may include homology modelling and docking of selected candidates.
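The subtractive workflow described in this section reduces to set operations on BLASTP results. The sketch below assumes BLAST tabular output (`-outfmt 6`, in which columns 11 and 12 hold the e-value and bit score) and treats a protein as homologous to the host, or as essential, when at least one hit passes both thresholds. That reading of the cut-offs, the function names and the toy identifiers are assumptions made for illustration, not the authors' scripts.

```python
import csv


def significant_queries(blast_lines, max_evalue, min_bitscore):
    """Return query IDs with at least one hit passing both thresholds.

    blast_lines: iterable of tab-separated lines in BLAST outfmt 6.
    """
    passed = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query_id = row[0]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue <= max_evalue and bitscore >= min_bitscore:
            passed.add(query_id)
    return passed


def subtractive_filter(proteins, human_hits, deg_hits, min_length=100):
    """Return sorted IDs of essential proteins non-homologous to the host.

    proteins: dict mapping protein ID -> (description, sequence).
    human_hits: IDs with a significant hit in the host proteome.
    deg_hits: IDs with a significant hit in the essential-gene database.
    """
    candidates = {
        pid
        for pid, (description, sequence) in proteins.items()
        if len(sequence) >= min_length
        and "hypothetical" not in description.lower()
    }
    non_homologous = candidates - human_hits
    return sorted(non_homologous & deg_hits)


if __name__ == "__main__":
    # Toy records for illustration only; not data from the study.
    proteins = {
        "p1": ("penicillin-binding protein", "M" * 150),
        "p2": ("hypothetical protein", "M" * 200),
    }
    human = ["p2\th1\t60.0\t200\t0\t0\t1\t200\t1\t200\t1e-80\t250"]
    deg = ["p1\td1\t45.0\t150\t0\t0\t1\t150\t1\t150\t1e-40\t180"]
    host_like = significant_queries(human, 1e-4, 100)
    essential = significant_queries(deg, 1e-10, 100)
    print(subtractive_filter(proteins, host_like, essential))
```

The thresholds mirror those stated in the text (10⁻⁴ and 100 against the host, 10⁻¹⁰ and 100 against DEG); whether the bit-score comparison should be strict or inclusive is not specified, and the sketch uses an inclusive comparison.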
Meningitis and sepsis caused by Elizabethkingia meningoseptica are major illnesses in newborns and immunocompromised patients. Although typical clinical diagnostics are used to identify the illness, a greater understanding of molecular diagnosis is desirable and remains a long-term goal. The increasing number of cases in intensive care units (ICUs) makes the infection a big challenge for clinicians to manage. In this context, comprehensive analysis of whole-genome data and pathway analysis were explored, as little computational work has been reported. Accordingly, a bioinformatics approach was undertaken to characterize molecular sequence data of Elizabethkingia. Our study identified 41 unique proteins in Elizabethkingia with respect to the host using subtractive genomics, which were further narrowed down to 18 therapeutic target proteins using in silico comparative genomics. The shortlisted ribosomal proteins, which are linked to translation, may be useful for future treatment and management of the infection. We analyzed sequence data of E. meningoseptica together with pathway analysis in an integrated fashion. Our study is a small step in the direction of rapid diagnosis and possible drug development.

The current investigation is limited to in silico study only.

The online version contains supplementary material available at https://doi.org/10.1186/s13104-022-06011-5.
Additional file 1:
Figure S1. Percentile distribution of DNA base composition in E. meningoseptica G4076 genome.
Figure S2. Open reading frame viewer: a window showing ORFs on the interval from 1 to 50,000 nucleotides.
Figure S3. Circular genomic plot of E. meningoseptica.
Figure S4. Fourier Transform Spectrum.
Figure S5. Annotation of Elizabethkingia meningoseptica G4076 genome using RAST server.
Figure S6. Phylogeny tree of Elizabethkingia species.
Figure S7. Cluster of genes, Venn diagram and pairwise heat map among Elizabethkingia species.
Figure S8. Pie-chart showing subcellular localization of proteins.
Table S1. List showing subtractive genomic and metabolic pathway analysis result of E. meningoseptica.

Studies on a group of previously unclassified bacteria associated with meningitis in infants
Elizabethkingia meningosepticum (Chryseobacterium meningosepticum) infections in children
Elizabethkingia infections in humans: from genomics to clinics
Two outbreaks of Flavobacterium meningosepticum type E in neonatal intensive care unit
Neonatal meningitis caused by Elizabethkingia meningoseptica in Saudi Arabia
Elizabethkingia anophelis sp. nov., isolated from the midgut of the mosquito Anopheles gambiae
Transfer of Chryseobacterium meningosepticum and Chryseobacterium miricola to Elizabethkingia gen. nov. as Elizabethkingia meningoseptica comb. nov. and Elizabethkingia miricola comb. nov.
Revisiting the taxonomy of the genus Elizabethkingia using whole-genome sequencing, optical mapping and MALDI-TOF, along with proposal of three novel Elizabethkingia species: Elizabethkingia bruuniana sp. nov., Elizabethkingia ursingii sp. nov., and Elizabethkingia occulta sp. nov.
Chryseobacterium meningosepticum: an emerging pathogen among immunocompromised adults
Nosocomial infections caused by Elizabethkingia meningoseptica: an emergent pathogen
Elizabethkingia meningoseptica endogenous endophthalmitis-a case report
The roots of bioinformatics in theoretical biology
Recent trends in computational biomedical research. Life (Basel)
ORIS: an interactive software tool for prediction of replication origin in prokaryotic genomes
The CGView Server: a comparative genomics tool for circular genomes
Prediction of probable genes by Fourier analysis of genomic sequences
The RAST server: rapid annotations using subsystems technology
The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST)
Ori-Finder: a web-based system for finding oriCs in unannotated bacterial genomes
MEGA X: molecular evolutionary genetics analysis across computing platforms
OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species
Draft genome sequences of strains representing each of the Elizabethkingia genomospecies previously determined by DNA-DNA hybridization
Initial sequencing and analysis of the human genome
Basic local alignment search tool
Drug target identification and prioritization for treatment of Ovine foot rot: an in-silico approach
DEG: a database of essential genes
KEGG: integrating viruses and cellular organisms
PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes
Application of subtractive genomics and molecular docking analysis for the identification of novel putative drug targets against Salmonella enterica subsp. enterica serovar Poona
Base composition. Encycl Genet
Genomic epidemiology and global diversity of the emerging bacterial pathogen Elizabethkingia anophelis
Where does bacterial replication start? Rules for predicting the oriC region
Recent advances in the identification of replication origins based on the Z-curve method
The Z curve database: a graphic representation of genome sequences
Molecular markers in phylogenetic studies-a review
Identification and characterization of potential drug targets by subtractive genome analyses of methicillin resistant Staphylococcus aureus
The mechanisms of action of ribosome-targeting peptide antibiotics

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Not applicable.

Authors' contributions: AK: conceptualization, methodology, formal analysis, writing-review and editing, visualization, supervision. NK: conceptualization, writing-review and editing, supervision, project administration. NG: formal analysis, investigation, data curation, writing-original draft. All authors read and approved the final manuscript.

The financial assistance provided to Neha Girdhar under the Women Scientist Scheme-A (WOS-A), vide Reference No. SR/WOS-A/LS-222/2016, by the Department of Science and Technology, Government of India, is gratefully acknowledged.

The whole genome sequence of Elizabethkingia meningoseptica G4076, Accession Number NZ_CP016376, was downloaded from the NCBI site https://www.ncbi.nlm.nih.gov/genome/14625?genome_assembly_id=309079. All the protein sequences (numbering 3406) available in FASTA format were used for BLASTP analysis against the human dataset option. Selected protein sequences (described in the materials and methods section) were further used as input for subtractive genomic analysis.

The authors declare that no ethical approval is required for the current study.

Not applicable.

The authors declare that they have no competing interests.