key: cord-0030923-smixjjjd authors: Khouja, Hamed Ishaq; Ashankyty, Ibraheem Mohammed; Bajrai, Leena Hussein; Kumar, P. K. Praveen; Kamal, Mohammad Amjad; Firoz, Ahmad; Mobashir, Mohammad title: Multi-staged gene expression profiling reveals potential genes and the critical pathways in kidney cancer date: 2022-05-04 journal: Sci Rep DOI: 10.1038/s41598-022-11143-6 sha: ec60904c982c3a655f089fadbd9676a125a3bba4 doc_id: 30923 cord_uid: smixjjjd

Cancer is a highly complex disease, and renal cell carcinoma (RCC) is the sixth-leading cause of cancer death. To understand complex diseases such as cancer, diabetes, and kidney disease, high-throughput data are generated at large scale, and these data have advanced both research and diagnostics. However, extracting meaningful information from such large datasets to gain a comprehensive and detailed understanding of cell phenotypes and disease pathophysiology remains a non-trivial challenge, and the molecular events leading to disease onset and progression are not well understood. With this goal, we collected publicly available gene expression datasets covering two stages (I and II) of renal cell carcinoma, and we used the TCGA and cBioPortal databases to assess clinical relevance. We applied a computational approach to identify the differentially expressed genes (DEGs) and their networks for the enriched pathways. Based on our results, we conclude that the most dominantly altered pathways in renal cell carcinoma include the PI3K-Akt, Foxo, endocytosis, MAPK, tight junction, and cytokine-cytokine receptor interaction pathways, and that the major sources of alteration for these pathways are MAP3K13, CHAF1A, FDX1, ARHGAP26, ITGBL1, C10orf118, MTO1, LAMP2, STAMBP, DLC1, NSMAF, YY1, TPGS2, SCARB2, PRSS23, SYNJ1, CNPPD1, and PPP2R5E.
In terms of clinical significance, a large number of differentially expressed genes appear to play critical roles in survival.

Publicly available databases house various types of datasets. TCGA, Oncomine, Nephroseq, and GEO (Gene Expression Omnibus) are the most widely used databases in the biological sciences 11. These databases, mainly GEO, store vast numbers of datasets related to cancer, diabetes, and other biological problems 8, 12-16. The identification of pathogenetically distinct tumor types poses a significant challenge in the treatment of complex diseases, especially cancer 17-19. Improved tumor classification helps improve therapeutic approaches 20, 21: in targeted therapy, better classification can maximize effectiveness while reducing toxicity. A variety of tools and approaches have previously been used to access biological datasets from these databases. For the molecular classification of cancer, Golub et al. 22 divided the problem into two challenges: class discovery and class prediction. A number of oncogenes and tumor suppressor genes that are altered in RCC, resulting in pathway dysregulation, still need to be identified and investigated further 23-25. Copy number, gene sequencing, expression patterns, and methylation in primary RCC are all possible avenues for achieving this goal. With continued breakthroughs in omics technology, the use of molecular markers for early diagnosis and prognosis deserves further attention 1, 2, 26-30. We selected an RCC dataset with samples from two stages (I and II) to understand how gene expression patterns vary, and how altered expression patterns change the inferred functions, as the tumor moves from stage I to stage II on two Affymetrix platforms (U133A and U133B).
Cancer stages describe where a cancer is located, how far it has spread, and whether it is affecting other parts of the body 31-33. Healthy tissue usually contains many different types of cells grouped together. If the cancer looks similar to healthy tissue and contains different cell groupings, it is called a well-differentiated or low-grade tumor; when the cancerous tissue looks very different from healthy tissue, it is called a poorly differentiated or high-grade tumor. The grade may help the clinician predict how quickly the cancer will spread; in general, the lower the tumor's grade, the better the prognosis. Different types of cancer use different methods to assign a grade 7, 34-37. Because most cancers are very hard to detect at an early stage, our main focus was on exploring alterations in gene expression patterns and their functional consequences; to avoid bias, we also incorporated the TCGA dataset, which contains samples from all grades. We selected a dataset from GEO containing human samples from two tumor stages (I and II). We organized the samples as stage I normal versus tumor and stage II normal versus tumor for the Affymetrix platforms U133A and U133B, and analyzed the tumor samples against their respective controls (normal samples of the same stage) for gene expression alterations and the functions that change with increasing tumor percentage. Based on our work, we conclude that, irrespective of tumor stage, the PI3K-Akt, Foxo, endocytosis, MAPK, tight junction, and cytokine-cytokine receptor interaction pathways are among the most altered, and that the major sources of alteration for these pathways are MAP3K13, CHAF1A, FDX1, ARHGAP26, ITGBL1, C10orf118, MTO1, LAMP2, STAMBP, DLC1, NSMAF, YY1, TPGS2, SCARB2, PRSS23, SYNJ1, CNPPD1, and PPP2R5E.
In addition, we studied clinical significance and observed that a large number of differentially expressed genes appear to play critical roles in survival, such as ARHGAP6, TGM4, CD248, SLC13A3, EPO, PARD6A, CLCA2, UBE2S, ERAL1, FGFR1, MRVI1, DYNC1I2, and CDCA7.

In the first step, we selected the raw expression dataset of interest, GSE6344 30, 38, organized the samples as stage I normal versus tumor and stage II normal versus tumor for the Affymetrix platforms U133A and U133B, and processed it through normalization to log2 values for all mapped genes, as shown in the workflow (Fig. 1a). This dataset contains 40 samples: 5 normal and 5 tumor samples for each of stages I and II on each of the U133A and U133B platforms. For differential gene expression analysis, we compared the tumor samples with the normal samples of the same stage and platform, which gives four DEG lists.

Gene expression profiling and the associated functions for varying tumor percentages.

Our initial goal was to understand the gene expression pattern between the different stages for normal versus tumor samples. For this purpose, we calculated the total number of DEGs and of up- and down-regulated genes (Fig. 1b); down-regulated genes outnumber up-regulated genes in all four DEG lists (Fig. 1c). The U133A platform yields a very high number of DEGs for each stage, with 1147 genes shared between stages I and II, compared with 606 shared genes for U133B; the numbers of stage I- and stage II-specific genes are also high on both platforms. The enriched pathways follow a trend similar to the DEG distribution, both at p < 0.05 (Fig. 1d) and after applying a stricter cut-off of p < 0.001 (Fig. 1e).
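The text does not state which statistic the in-house KEGG code uses to call a pathway enriched. A one-sided hypergeometric (Fisher-type) over-representation test is a common choice, so the minimal sketch below assumes it: for each pathway it asks how likely it is to see at least k DEGs among the pathway's K genes when n DEGs are drawn from a background of N genes. The function names, the background definition, and the toy gene and pathway names are hypothetical; only the two cut-offs (0.05 and 0.001) come from the text.

```python
from math import comb


def hypergeom_upper_p(k: int, n_deg: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in pathway, n_deg drawn)."""
    upper = min(K, n_deg)
    hits = sum(comb(K, i) * comb(N - K, n_deg - i) for i in range(k, upper + 1))
    return hits / comb(N, n_deg)


def enriched_pathways(degs, pathways, background, cutoff=0.05):
    """Return {pathway: p-value} for pathways over-represented among the DEGs.

    degs: iterable of DEG identifiers
    pathways: {pathway name: iterable of member genes}, e.g. parsed from KEGG (assumed)
    background: all genes that could have been called DEGs (assumed definition)
    """
    bg = set(background)
    deg_set = set(degs) & bg
    results = {}
    for name, members in pathways.items():
        members = set(members) & bg
        k = len(deg_set & members)
        if k == 0:
            continue  # no DEG falls in this pathway, nothing to test
        p = hypergeom_upper_p(k, len(deg_set), len(members), len(bg))
        if p < cutoff:
            results[name] = p
    return results


# Toy example with hypothetical gene and pathway names.
background = [f"g{i}" for i in range(1, 11)]
pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4"],
    "pathway_B": ["g5", "g6", "g7", "g8", "g9", "g10"],
}
degs = ["g1", "g2", "g3"]
print(enriched_pathways(degs, pathways, background, cutoff=0.05))
print(enriched_pathways(degs, pathways, background, cutoff=0.001))
```

Because the p-values do not depend on the cut-off, tightening it from 0.05 to 0.001 (as in Fig. 1d versus Fig. 1e) can only shrink the list of enriched pathways.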
Most of the genes shared between stages and platforms are shown with their fold changes in Fig. 1f; these genes are known to be associated with critical pathways that are important in multiple types of cancer. We also mapped the known associations among all these genes (from Fig. 1f): the DEG lists were combined into a single list, and the genes were then mapped with the network database and displayed as a network in Fig. 1g. The network in Fig. 1g consists of four smaller clusters connected by a core cluster of SYNJ1, MAPT, YY1, NSMAF, and FNBP4, and the most highly connected genes are SYNJ1, LAMP2, SCARB2, FDX1, HDLBP, CHAF1A, MAPT, and FNBP4. We also show the connectivity of the genes for those networks in which the connections are not clearly visible (Fig. 3). The lists of genes and pathways used for the network-level analysis are given in Supplementary Table S1. … 5, 36, 39. We observe that most of the top-ranked genes (from the selected 30 DEGs), mainly the up-regulated ones, have a highly significant effect on patient survival (Fig. 4). This figure also shows the mutations in these top-ranked DEGs for clear cell renal cell carcinoma in the TCGA database. A few genes (ERBB4, SLC13A3, TGM4, and FGFR1) are mutated at a very high rate (Fig. 4a,b). We also selected a different dataset (GSE68417 40), which contains adjacent normal, low-grade, and high-grade samples, and compared the differentially expressed genes and the enriched pathways among them (Fig. 5a).
This shows that the adjacent normal versus low-grade DEG list shares the majority of its genes with the adjacent normal versus high-grade list, and that both lists share only a few DEGs with the low-grade versus high-grade list. As expected, there were no shared enriched pathways at all, because only a few genes in the low-grade versus high-grade comparison have a fold change ≥ +1.5 (up-regulated) or ≤ −1.5 (down-regulated). Kaplan-Meier plots show that a large number of differentially expressed genes appear to be potentially significant for survival; selected examples are ARHGAP6, TGM4, CD248, SLC13A3, EPO, PARD6A, CLCA2, UBE2S, ERAL1, FGFR1, MRVI1, DYNC1I2, and CDCA7 (additional data in Supplementary Figs. S1-S6). Figure 5b lists the clinically significant genes with their p-values from survival analysis, along with the overall pathways associated with these genes; specific associations are shown in Fig. 5c. RNA and protein expression are shown in Supplementary Data S7. We checked the expression of these clinically relevant genes in the Human Protein Atlas: most of them are expressed in RCC and act as biomarkers, and only TGM4 and GGN were not expressed.

Renal cell carcinoma is one of the most common cancers and one of the leading causes of cancer death 14, 15, 41. Therapeutic and clinical outcomes differ between individuals even when their clinical and pathological characteristics (tumor type, grade, and stage) are closely similar. Despite tremendous efforts to identify prognostic and predictive molecular biomarkers with better precision than clinical and pathological predictors, only a few molecular tests have been introduced into oncological practice 29.
It is therefore important to understand the different levels of diversity in cancer, such as gene expression patterns, epigenetics, and protein expression 42, 43. For this purpose, we gathered a previously published dataset and conducted a detailed study ranging from gene expression profiling to functional changes, including networks mapped from the human protein network database. Our work leads to the conclusion that, irrespective of tumor stage, the PI3K-Akt, Foxo, endocytosis, MAPK, tight junction, and cytokine-cytokine receptor interaction pathways are among the most altered, and that the major sources of alteration for these pathways are MAP3K13, CHAF1A, FDX1, ARHGAP26, ITGBL1, C10orf118, MTO1, LAMP2, STAMBP, DLC1, NSMAF, YY1, TPGS2, SCARB2, PRSS23, SYNJ1, CNPPD1, and PPP2R5E.

Table 1. Enriched pathways grouped as either common or specific to the conditions. These pathways were generated after plotting the Venn diagram.

www.nature.com/scientificreports/

Networks of DEGs for the enriched pathways show that a large number of genes from a few specific pathways are altered, such as the Ras signaling pathway (Fig. 2c,h,m), immune system, Wnt, and Hippo pathways (Fig. 2d,i,n), and Akt pathways (Fig. 2a,f,k). We observe that the critical pathways altered in RCC are the Wnt, Hippo, regulation of actin cytoskeleton, ECM, infection and inflammation, metabolic, and other cancer-related pathways. From the mapped network, we observe that the highly connected genes point to potential pathways; in other words, the genes ranked highest by connectivity belong to pathways that are directly or indirectly associated with RCC or other types of cancer. For clinical significance, we examined the mutation rates of the top-ranked genes (based on fold change) and patient survival with respect to changes in gene expression, using Kaplan-Meier plots.
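The survival analysis above is reported through Kaplan-Meier plots, but the exact procedure is not given. A common approach is to split patients into high- and low-expression groups for one gene and estimate each group's survival curve with the Kaplan-Meier product-limit estimator, S(t) = product over event times t_i ≤ t of (1 − d_i / n_i), where d_i deaths occur among the n_i patients still at risk at t_i. The sketch below assumes a median split; `split_by_median` and the toy cohort are hypothetical, and a log-rank test (not shown) would normally supply the p-value comparing the two curves.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(time, survival)] at every time point with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient (death or censoring) that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # censored patients leave the risk set after time t
    return curve


def split_by_median(expression, times, events):
    """Hypothetical helper: split patients at the median expression of one gene."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return high, low


# Toy cohort with made-up numbers: follow-up (months), event flag, expression level.
times = [1, 2, 2, 3, 4, 6, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0, 0]
expr = [9.1, 8.7, 8.9, 7.9, 5.2, 6.0, 4.8, 5.5]
high, low = split_by_median(expr, times, events)
for label, group in (("high", high), ("low", low)):
    print(label, kaplan_meier(*zip(*group)))
```

The median split is only one option; other studies choose the cut-off that best separates the groups, which changes the curves and the resulting p-value.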
We conclude that a large number of differentially expressed genes are potentially important for survival, among them ARHGAP6, TGM4, CD248, SLC13A3, EPO, PARD6A, CLCA2, UBE2S, ERAL1, FGFR1, MRVI1, DYNC1I2, and CDCA7. Using publicly available datasets, we investigated gene expression profiles in renal cell carcinoma. Previous work has focused on selected genes and pathways; here, we investigated the critical pathways and the genes that appear to be clinically highly significant in renal cell carcinoma. These clinically significant genes lead to potential alterations in the PI3K-Akt, Foxo, endocytosis, MAPK, tight junction, and cytokine-cytokine receptor interaction pathways. Our work will help in diagnosing renal cell carcinoma patients, because we present the differentially expressed genes, their inferred pathways, and the clinical impact of selected genes; since our findings take an overall perspective that includes clinical relevance, this study will also help future diagnostics. This work also differs from previous studies in that we explored stages I and II of RCC and further examined their clinical relevance. Healthy tissue usually contains many different types of cells grouped together: a cancer that looks similar to healthy tissue and contains different cell groupings is called a well-differentiated or low-grade tumor, whereas cancerous tissue that looks very different from healthy tissue is called a poorly differentiated or high-grade tumor. The grade may help the clinician predict how quickly the cancer will spread; in general, the lower the tumor's grade, the better the prognosis.
Different types of cancer use different methods to assign a grade 7, 34-37, and tumor stages help describe severity, the speed of tumor propagation, and the impact on other organs 31-33. Because most cancers are very hard to detect at an early stage, our main focus was on exploring alterations in gene expression patterns and their functional consequences; to avoid bias, we also incorporated the TCGA dataset, which contains samples from all grades. We further investigated the expression of these clinically relevant genes using the Human Protein Atlas (https://www.proteinatlas.org/) 44-48 and observed that most of them are expressed in RCC and act as biomarkers; only TGM4 and GGN were not expressed. This study is an important step toward understanding early-stage tumor propagation and will also be helpful from a clinical perspective. Based on our findings, we conclude that the PI3K-Akt, Foxo, endocytosis, MAPK, tight junction, and cytokine-cytokine receptor interaction pathways are among the most commonly altered pathways in renal cell carcinoma, and that …

Here, we used the GSE6344 dataset, which contains gene expression data from stage I and stage II kidney tumors 30, 38. In the first step, we selected the raw expression dataset GSE6344 and processed it through normalization to log2 values for all mapped genes, as shown in the workflow (Fig. 1a). The 40 samples in this dataset comprise 5 normal and 5 tumor samples for each of stages I and II on each of the U133A and U133B platforms. For differential gene expression analysis, we compared the tumor samples with the normal samples of the respective stages and platforms, yielding four DEG lists. In short, the basic processing steps are raw file processing, intensity calculation, and normalization.
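The sample design described above (two platforms, two stages, and 5 normal plus 5 tumor arrays in each combination, 40 arrays in all, compared only within the same stage and platform) can be expressed as a simple grouping step. The sketch is illustrative only: the sample IDs are placeholders rather than GSE6344 accession numbers, and `organize_comparisons` is a hypothetical helper, not part of the authors' MATLAB code.

```python
from collections import defaultdict
from itertools import product


def organize_comparisons(samples):
    """Group sample IDs into one normal-vs-tumor comparison per (platform, stage)."""
    groups = defaultdict(lambda: {"normal": [], "tumor": []})
    for sample_id, platform, stage, tissue in samples:
        groups[(platform, stage)][tissue].append(sample_id)
    return dict(groups)


# Hypothetical sample sheet following the design in the text:
# 2 platforms x 2 stages x (5 normal + 5 tumor) = 40 arrays. IDs are placeholders.
samples = [
    (f"{platform}_S{stage}_{tissue}_{i}", platform, stage, tissue)
    for platform, stage, tissue in product(("U133A", "U133B"), ("I", "II"), ("normal", "tumor"))
    for i in range(1, 6)
]

comparisons = organize_comparisons(samples)
for key, grp in sorted(comparisons.items()):
    print(key, len(grp["normal"]), "normal vs", len(grp["tumor"]), "tumor")
```

Each of the four groups feeds one normal-versus-tumor differential expression calculation, which is why the analysis yields exactly four DEG lists.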
For normalization 49-51, GCRMA 52-56, RMA, and EB are the most commonly used approaches; here, we used EB for raw intensity normalization. After normalization, we proceeded to our goal of understanding gene expression patterns 14, 57 and their inferred functions 57, 58. We used our own in-house code to prepare and analyze the DEG lists. The samples were placed into two groups, normal and tumor. Genes were selected as differentially expressed based on fold change and p-value, with thresholds of ±2 and 0.05, respectively, and the KEGG database 59-61 was then used for pathway analysis with our own code 62. In summary, MATLAB 2017 functions (e.g., mattest) were used for differential gene expression and statistical analysis, and the KEGG 61 database was used for pathway analysis 62-65. FunCoup 2.0 66 was used to generate all DEG networks throughout the work, and Cytoscape 67 and its applications 68 were used for network visualization, to understand the network and the connectivity of the genes within the network of DEGs 69 … or associations such as protein complexes, protein-protein physical interactions, and metabolic and signaling pathways 66. Most of our coding and calculations were done in MATLAB 62-65; MATLAB 2017b code and command-line code were used for figure plotting and during analysis.
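The selection rule above (fold change of at least ±2 and p < 0.05, with p-values from a two-sample t-test such as MATLAB's mattest) can be sketched in plain Python. This is a minimal re-implementation under stated assumptions, not the authors' in-house code: it uses a pooled-variance Student t-test (which we assume corresponds to mattest's default equal-variance setting), computes the fold change from the difference of mean log2 values, reports it as a signed fold change (negative for down-regulation), and obtains the t-distribution p-value from the regularized incomplete beta function via the standard continued-fraction algorithm. All function names and the toy genes are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for a Student t statistic with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def pooled_t(x, y):
    """Student two-sample t statistic (pooled variance) for y versus x, with its df."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    t = (mean(y) - mean(x)) / se if se > 0 else 0.0  # zero variance: treat as no evidence
    return t, df


def signed_fold_change(log2_ratio):
    """Log2 ratio -> signed fold change, e.g. 1.0 -> 2.0 and -1.0 -> -2.0."""
    return 2.0 ** log2_ratio if log2_ratio >= 0 else -(2.0 ** -log2_ratio)


def select_degs(normal, tumor, fc_cut=2.0, p_cut=0.05):
    """Return {gene: (signed fold change, p)} for genes passing both thresholds.

    normal, tumor: {gene: [log2 expression per sample]} for one stage/platform.
    """
    degs = {}
    for gene, x in normal.items():
        y = tumor[gene]
        t, df = pooled_t(x, y)
        p = t_two_sided_p(t, df)
        fc = signed_fold_change(mean(y) - mean(x))
        if abs(fc) >= fc_cut and p < p_cut:
            degs[gene] = (fc, p)
    return degs


# Toy log2 data for four hypothetical genes (5 normal vs 5 tumor arrays).
base = [5.0, 5.1, 4.9, 5.0, 5.05]
normal = {g: base for g in ("up", "down", "small", "flat")}
tumor = {
    "up": [v + 2.0 for v in base],     # about 4-fold up
    "down": [v - 1.5 for v in base],   # about 2.8-fold down
    "small": [v + 0.5 for v in base],  # significant but only about 1.4-fold
    "flat": list(base),                # unchanged
}
print(select_degs(normal, tumor))
```

The "small" gene shows why both thresholds matter: its t-test p-value is tiny, but its fold change stays below 2, so it is not called a DEG.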
For the network-level analysis, such as the number of connections per gene and the genes belonging to different numbers of pathways, the code was written in MATLAB, and the results were also plotted with MATLAB code 64, 65. Venn diagrams were plotted with a freely available web server (http://bioinformatics.psb.ugent.be/webtools/Venn/) 72-74. We used publicly available datasets as the main data source; they are freely available and are cited with proper references in the Methods section. The analyzed details are supported by the supplementary data.

Renal cell carcinoma
Renal cell carcinoma
Cancer evolution: The final frontier of precision medicine?
Deciphering intratumor heterogeneity and temporal acquisition of driver events to refine precision medicine
Cancer systems biology: a peek into the future of patient care?
Understanding genomic alterations in cancer genomes using an integrative network approach
Hallmarks of cancer: The next generation
Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer
Genetic fine mapping and genomic annotation
Cancer: A systems biology disease
A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data
Prostate cancer stem cells: deciphering the origins and pathways involved in prostate tumorigenesis and aggression
Gene expression profiling identifies clinically relevant subtypes of prostate cancer
Gene expression analyses reveal molecular relationships among 20 regions of the human CNS
Expression profiling of ion channel genes predicts clinical outcome in breast cancer
Tumor heterogeneity: next-generation sequencing enhances the view from the pathologist's microscope
Tumor evolution in response to chemotherapy: Phenotype versus genotype
Tumor-host cell interactions in ovarian cancer: Pathways to therapy failure
Personalization of prostate cancer prevention and therapy: Are clinically qualified biomarkers in the horizon?
Diagnosis of prostate cancer using differentially expressed genes in stroma
Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring
Cancer genes and the pathways they control
Renal cell carcinoma: Etiology, incidence and epidemiology
Renal cell carcinoma: Links and risks. IJNRD 45
Renal cell cancer: Clinicopathological profile and survival outcomes
Environmental and modifiable risk factors in renal cell carcinoma
Expression profiling of metastatic renal cell carcinoma using gene set enrichment analysis
Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing
pathway signature and cellular differentiation in clear cell renal cell carcinoma
A collective route to metastasis: Seeding by tumor cell clusters
A continuous model of angiogenesis: Initiation, extension, and maturation of new blood vessels modulated by vascular endothelial growth factor, angiopoietins, platelet-derived growth factor-B, and pericytes
A gene signature in histologically normal surgical margins is predictive of oral carcinoma recurrence
Genomic hallmarks of localized, non-indolent prostate cancer
Biomarkers of residual disease after neoadjuvant therapy for breast cancer
An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics
Mutational landscape and clonal architecture in grade II and III gliomas
Secreted frizzled-related protein 1 loss contributes to tumor phenotype of clear cell renal cell carcinoma
Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines
Characterization of clear cell renal cell carcinoma by gene expression pro
Systematic variation in gene expression patterns in human cancer cell lines
Intratumor heterogeneity: Evolution through space and time
Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing
An atlas of the protein-coding genes in the human, pig, and mouse brain
A human protein atlas for normal and cancer tissues based on antibody proteomics
A genome-wide transcriptomic analysis of protein-coding genes in human blood cells
A pathology atlas of the human cancer transcriptome
Proteomics. Tissue-based map of the human proteome
Microarray data normalization and transformation
Microarray-based expression profiling and informatics
Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data
Making informed choices about microarray data analysis
Gene selection for cancer identification: A decision tree model empowered by particle swarm optimization algorithm
An integration of complementary strategies for gene-expression analysis to reveal novel therapeutic opportunities for breast cancer
GenMAPP 2: New features and resources for pathway analysis
Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
PANTHER version 10: Expanded protein families and functions, and analysis tools
KEGG for linking genomes to life and the environment
KEGG for representation and analysis of molecular networks involving diseases and drugs
KEGG for integration and interpretation of large-scale molecular data sets
In-Silico Study of immune system associated genes in case of type-2 diabetes with insulin action and resistance, and/or obesity
Comparative study of gene expression profiling unravels functions associated with pathogenesis of dengue infection
Gene expression profiling and clinical relevance unravel the role hypoxia and immune signaling genes and pathways in breast cancer: Role of hypoxia and immune signaling genes in breast cancer. Informatics in Medicine Unlocked
Global networks of functional coupling in eukaryotes from comprehensive data integration
A differential network analysis approach for lineage specifier prediction in stem cell subpopulations
Cytoscape: A software environment for integrated models of biomolecular interaction networks
Simulated evolution of signal transduction networks
Negative interactions and feedback regulations are required for transient cellular response
The KEGG resource for deciphering the genome
Role of potential COVID-19 immune system associated genes and the potential pathways linkage with type-2 diabetes
Understanding the role of potential pathways and its components including hypoxia and immune system in case of oral cancer
Gene Expression Profiling of Early Acute Febrile Stage of Dengue Infection and Its Comparative Analysis With Streptococcus pneumoniae Infection

The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, funded this project under grant no. (422-800). The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

The authors declare no competing interests.

The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-11143-6.

Correspondence and requests for materials should be addressed to H.I.K. or M.M.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.