key: cord-0252546-h3xpamwt authors: Arjmand, Babak; Khodadoost, Mahmood; Jahani Sherafat, Somayeh; Rezaei Tavirani, Mostafa; Ahmadi, Nayebali; Hamzeloo Moghadam, Maryam; Rezaei Tavirani, Sina; Khanabadi, Binazir; Iranshahi, Majid title: Assessment of colon cancer molecular mechanism: a system biology approach date: 2021 journal: Gastroenterol Hepatol Bed Bench sha: 759c28ea3f312f3da47dd0f7eb5e9a2c1819118e doc_id: 252546 cord_uid: h3xpamwt

AIM: The current study aimed to assess and compare dysregulated genes in colon cancer obtained from the GEO and STRING databases.
BACKGROUND: Colorectal cancer is the third most common cancer and the second leading cause of cancer-related death worldwide. There have been many studies on the molecular mechanism of colon cancer.
METHODS: One hundred differentially expressed proteins related to colon cancer were retrieved from the STRING database and subjected to network analysis. The central nodes of the network were assessed by gene ontology analysis, and the findings were compared with a GEO series (GSE).
RESULTS: Based on data from the STRING database, TP53, EGFR, HRAS, MYC, AKT1, GAPDH, KRAS, ERBB2, PTEN, and VEGFA were identified as central genes. The central nodes were not among the significant differentially expressed genes (DEGs) of the analyzed GSE.
CONCLUSION: Combining different database sources in systems biology investigations provides useful information about the studied diseases.

Colorectal cancer is the third most common cancer and the second leading cause of cancer-related death worldwide (1). It is a lethal cancer that poses challenges for both diagnosis and therapy (2). Many [...] Bioinformatics is a critical field that derives new concepts from the results of genomic and proteomic analyses (5-7). Dysregulated metabolites, genes, and proteins in colon cancer patients have been studied using bioinformatics.
In such studies, data are collected from databanks or published articles and analyzed with bioinformatic tools (8-10). Two notable features of these studies are the diversity of data sources and the multiplicity of analysis methods, and results can differ depending on the selected source and method of investigation. A clear description of the investigation protocol is therefore required to identify the most accurate findings (8, 11). The Gene Expression Omnibus (GEO) is a useful source of data, including the gene expression profiles of the assessed samples, and many researchers select it to analyze differentially expressed genes (DEGs) under a defined condition. GEO is not only a suitable data source but is also equipped with useful tools such as GEO2R, which supports the primary analysis of the data. Fold change and its statistical validation are two important outputs of this analysis, and the direction of regulation of the studied DEGs, i.e., up- or downregulation, is available from GEO2R (12, 13). STRING is another useful data source that provides dysregulated proteins related to the studied condition. Many published articles are based on the "disease query" of STRING, and the combination of STRING and Cytoscape software is a powerful tool for bioinformatic data analysis (14, 15). In the present investigation, dysregulated genes in human colon cancer were assessed using STRING and one experiment recorded in GEO, and the findings from the two sources were compared.

In this study, 100 proteins associated with colon cancer were extracted from the STRING database using the "disease query" option. The proteins were connected by undirected edges in Cytoscape software v3.7.2 (16), yielding a network of 100 nodes and 2811 edges. This network consisted of a main connected component of 95 nodes plus 5 isolated proteins; the main connected component was analyzed with the NetworkAnalyzer plug-in of Cytoscape.
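The network analysis in this section separates the main connected component from isolated proteins and then ranks nodes by degree to pick the 10 hub nodes. The following is a minimal standard-library sketch of that degree-ranking logic, not the code of Cytoscape's NetworkAnalyzer. The function names `build_adjacency`, `connected_components`, and `hub_nodes` are hypothetical, degree is assumed to be the number of distinct neighbours, ties are broken alphabetically (an arbitrary choice), and the toy network is invented for illustration rather than taken from STRING.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}.
    Duplicate edges and self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(nodes, adj):
    """Return the connected components (sets of nodes) via iterative depth-first search."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def hub_nodes(nodes, edges, k=10):
    """Return (main component, isolated nodes, top-k nodes of the main component by degree)."""
    adj = build_adjacency(edges)
    main = max(connected_components(nodes, adj), key=len)
    isolated = sorted(n for n in nodes if not adj.get(n))
    # Assumed degree definition: number of distinct neighbours; ties broken alphabetically.
    ranked = sorted(main, key=lambda n: (-len(adj.get(n, ())), n))
    return main, isolated, ranked[:k]


# Hypothetical toy network (not STRING data); ("B", "A") duplicates ("A", "B").
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "A"), ("B", "A")]
main, isolated, hubs = hub_nodes(toy_nodes, toy_edges, k=3)
print(len(main), isolated, hubs)  # 5 ['F'] ['A', 'B', 'C']
```

Applied to the study's network, the same logic would be called with the 100 STRING proteins, their 2811 edges, and `k=10`; Cytoscape may count degree differently if duplicate edges are present.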
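The GEO2R step mentioned in this section yields, for each gene, a log fold change and a statistically adjusted p-value, from which the direction of regulation follows. The sketch below is a minimal illustration, not GEO2R's implementation: it splits genes into up- and downregulated DEGs and then checks which hub genes also appear among the significant DEGs, which is the kind of comparison the study reports. The `GeneStat` class, the `split_degs` and `hubs_among_degs` functions, the toy values, and both thresholds are assumptions: a fold change of at least 1.5 in either direction (the value 1.5 appears in the truncated last methods sentence, and reading it as an |FC| cutoff is an assumption) and an adjusted p-value of at most 0.05.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneStat:
    symbol: str
    log2_fc: float  # log2 fold change, tumour vs adjacent tissue
    adj_p: float    # multiple-testing-adjusted p-value


def split_degs(stats, fc_cutoff, p_cutoff):
    """Return (up, down) sets of genes with |FC| >= fc_cutoff and adj_p <= p_cutoff."""
    lfc_cutoff = math.log2(fc_cutoff)  # e.g. FC 1.5 -> log2 FC of about 0.585
    up, down = set(), set()
    for s in stats:
        if s.adj_p > p_cutoff:
            continue  # not statistically significant
        if s.log2_fc >= lfc_cutoff:
            up.add(s.symbol)
        elif s.log2_fc <= -lfc_cutoff:
            down.add(s.symbol)
    return up, down


def hubs_among_degs(hubs, up, down):
    """Return the hub genes that are also significant DEGs, in input order."""
    degs = up | down
    return [h for h in hubs if h in degs]


# Hypothetical toy values; fc_cutoff=1.5 and p_cutoff=0.05 are assumed thresholds.
toy_stats = [
    GeneStat("G1", 1.2, 0.001),  # FC about 2.3, significant -> up
    GeneStat("G2", -0.9, 0.01),  # FC about 0.54, significant -> down
    GeneStat("G3", 0.3, 0.001),  # significant but below the fold-change cutoff
    GeneStat("G4", 2.0, 0.2),    # large change but not significant
]
up, down = split_degs(toy_stats, fc_cutoff=1.5, p_cutoff=0.05)
print(sorted(up), sorted(down), hubs_among_degs(["G1", "G3", "G5"], up, down))
# ['G1'] ['G2'] ['G1']
```

In real use, the hub list would be the ten central genes from the STRING network and `toy_stats` would be replaced by the per-gene statistics exported from the GEO2R analysis of the selected GSE.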
The network was visualized with node color and size mapped to degree. The 10 nodes of the main connected component with the highest degree were selected as the hub nodes of the network. Gene ontology analysis of the hubs was performed with the ClueGO v2.5.7 (17) application of Cytoscape, and the related pathways were extracted from KEGG (08.05.2020 version). A p-value ≤ 0.01 and "medium" network specificity were applied to determine the pathways. GEO series GSE127069 (18), entitled "RNA sequencing for cancer tissues and adjacent tissues of third-stage rectal cancer patients with and without blood vascular thrombus" and comprising samples from 6 patients, was selected for analysis. A volcano plot of the gene expression profiles of colon cancer tissue versus adjacent tissue was generated to evaluate the data statistically. The top genes based on fold change (1.5