key: cord-0032756-oc8sdaef
authors: Mosharaf, Md. Parvez; Reza, Md. Selim; Gov, Esra; Mahumud, Rashidul Alam; Mollah, Md. Nurul Haque
title: Disclosing Potential Key Genes, Therapeutic Targets and Agents for Non-Small Cell Lung Cancer: Evidence from Integrative Bioinformatics Analysis
date: 2022-05-12
journal: Vaccines (Basel)
DOI: 10.3390/vaccines10050771
sha: 12fb9df94ce831f696bf007bd55403deb7b83e40
doc_id: 32756
cord_uid: oc8sdaef

Non-small-cell lung cancer (NSCLC) is considered as one of the malignant cancers that causes premature death. The present study aimed to identify a few potential novel genes highlighting their functions, pathways, and regulators for diagnosis, prognosis, and therapies of NSCLC by using the integrated bioinformatics approaches. At first, we picked out 1943 DEGs between NSCLC and control samples by using the statistical LIMMA approach. Then we selected 11 DEGs (CDK1, EGFR, FYN, UBC, MYC, CCNB1, FOS, RHOB, CDC6, CDC20, and CHEK1) as the hub-DEGs (potential key genes) by the protein–protein interaction network analysis of DEGs. The DEGs and hub-DEGs regulatory network analysis commonly revealed four transcription factors (FOXC1, GATA2, YY1, and NFIC) and five miRNAs (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5p, and miR-16-5p) as the key transcriptional and post-transcriptional regulators of DEGs as well as hub-DEGs. We also disclosed the pathogenetic processes of NSCLC by investigating the biological processes, molecular function, cellular components, and KEGG pathways of DEGs. The multivariate survival probability curves based on the expression of hub-DEGs in the SurvExpress web-tool and database showed the significant differences between the low- and high-risk groups, which indicates strong prognostic power of hub-DEGs. Then, we explored top-ranked 5-hub-DEGs-guided repurposable drugs based on the Connectivity Map (CMap) database. Out of the selected drugs, we validated six FDA-approved launched drugs (Dinaciclib, Afatinib, Icotinib, Bosutinib, Dasatinib, and TWS-119) by molecular docking interaction analysis with the respective target proteins for the treatment against NSCLC. The detected therapeutic targets and repurposable drugs require further attention by experimental studies to establish them as potential biomarkers for precision medicine in NSCLC treatment.

Lung cancer is treated as the leading cause of cancer-related death worldwide among human cancer, which causes the dynamic degradation of the lung [1] . The most common type of bronchial tumor is non-small-cell lung cancer (NSCLC), which accounts for approximately 75% of all lung cancers [2] . The NSCLC is more deadly than the small-cell lung cancer (SCLC), though it grows and spreads slowly compared with the SCLC since it progresses to the advanced stage with few or without any symptoms. Although the targeted therapy has achieved substantial development, the increasing mortality rate associated with lung cancer lays emphasis on both prevention and early detection of lung cancer. Traditional cancer diagnosis methods including histopathology and cytopathologyare practiced in the case of adenocarcinoma, squamous cell carcinoma, and large-cell carcinoma of NSCLC [3] [4] [5] . The morphological judgment for the tumors has some limitations, including the lack of significant morphological features, which leads to the identification ambiguities [6] [7] [8] [9] [10] . Several non-causal risk factors (e.g., smoking, alcohol consumption, and high air pollution) of lung cancer have been detected by several independent studies [11] [12] [13] [14] [15] . However, so far, there are no in-depth studies that explore the causal risk factors of NSCLC highlighting their pathogenetic processes and associated candidate drugs for the treatment against NSCLC. The causal risk factors are known as the mutated genes that drive the cancer progression. Usually, non-causal risk factors are assumed to be responsible for genetic mutation and some of them stimulate cancer progression. Cancer-causing mutated genes are utilized for diagnosis, prognosis, and therapies of cancer [16, 17] . Moreover, the DNA vaccine is part of a new era of modern therapeutics where the gene-based prophylactic vaccines are being developed [18] [19] [20] . The plasmid DNA vaccines and viral-vectored vaccines are two types of gene-based vaccines on which many animal trials are being practiced all over the world [21, 22] . Therefore, the cancer-causing genes also might be a great therapeutics target for the gene-based DNA vaccine development.

Gene expression profile analysis is now considered as one of the most promising approaches for exploring cancer-causing mutated genes, which yields relevant information for diagnosis, prognosis, and therapies of cancers. [23] [24] [25] [26] . Computationally, mutated genes (potential key genes) are predicted by the analysis of differential gene expression patterns [16, 17, [23] [24] [25] [26] [27] [28] [29] . Therefore, in this study, an attempt was made to explore NSCLCcausing key genes from the publicly available gene expression profiles, highlighting their functions, pathways, and regulators, which yield relevant information for diagnosis, prognosis, and therapies of NSCLC, by using the integrated bioinformatics approaches.

To reach the goal of this study, we analyzed a publicly available gene expression dataset by using integrated bioinformatics approaches [16, 28, 29] . The global working flowchart of this study is displayed in Figure 1 . 

To explore NSCLC-causing key genes, the Affymetrix Human Genome U133 plus 2.0 microarray gene expression dataset was retrieved from the NCBI Gene Expression Omnibus (GEO) database [30] with accession number GSE19804, which contained 60 tumor samples and 60 control samples on 54,675 genes. The dataset was generated by a previous study [31] . The sample unit was aged from 37 to 80 years, with nine different tumor stages 

To explore NSCLC-causing key genes, the Affymetrix Human Genome U133 plus 2.0 microarray gene expression dataset was retrieved from the NCBI Gene Expression Omnibus (GEO) database [30] with accession number GSE19804, which contained 60 tumor samples and 60 control samples on 54,675 genes. The dataset was generated by a previous study [31] . The sample unit was aged from 37 to 80 years, with nine different tumor stages (i.e., 1, 1A, 1B, 2, 2A, 2B, 3A, 3B, 4).

At first, the gene expression dataset was normalized for identifying DEGs through the Robust Multi-Array Average (RMA) expression measure and it was implemented by the NCBI-GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/, accessed on 5 October 2021) web-tool. Then, the LIMMA [32] statistical test was utilized to identify the DEGs between NSCLC and control samples. To control the false discovery rate in multiple-testing, the p-values were adjusted by Benjamini Hochberg's [33] method. Both the adjusted p-value and log 2 FC values were considered for identifying the upregulated and downregulated DEGs as follows:

The bioinformatics resources, Database for Annotation, Visualization and Integrated Discovery (DAVID) (version v6.8) [34, 35] was utilized to discern molecular function, biological process, and molecular pathway annotations related to the identified DEGs. Besides, the KEGG pathways identification was conducted through the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database [36] [37] [38] . For statistical significance, the adjusted p-value < 0.05 was considered, determined from Fisher Exact test and Benjamini-Hochberg's correction was used for the multiple testing correction techniques.

The STRING database [39] was used to construct the protein-protein interaction (PPI) network of the proteins encoded by DEGs. The STRING database uses a score combiner depending on the product of probabilities [40] . To visualize and perform topological analyses of the PPI network, the NetworkAnalyst [41] was utilized. The topological analysis was applied to determine hub-DEGs/proteins through the CytoHubba plugin [42] in Cytoscape 3.8.2 using degree (connectivity) and betweenness metrics simultaneously [43] . The minimum degree of 10 was considered as the cut off criterion in CytoHubba. Furthermore, the Molecular Complex Detection (MCODE), a novel clustering algorithm [44] along with the CytoHubba was used to identify the sub-modules from the PPI network. The top-scored modules are presented in this analysis.

To investigate the genomic alterations/mutations of the hub-genes, the online cBioPortal (https://www.cbioportal.org, accessed on 28 March 2022) was used over the NSCLC datasets of the server [45, 46] . The OncoPrint output was used to represent the most important alteration frequency of genes.

The physicochemical properties of the detected hub proteins were reported from the online tool ProtParam (https://web.expasy.org/protparam/, accessed on 10 November 2021), which allows the computation of various physical and chemical parameters for a given protein. The physiochemical properties of molecular weight, theoretical pI, extinction Vaccines 2022, 10, 771 4 of 20 coefficient, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) were checked for the reported hub protein in this study.

To explore transcriptional and post-transcriptional regulators of DEGs, we performed TFs-DEGs and miRNA-DEGs interaction network analysis, respectively. The TarBase and miRTarBase [47, 48] databases were used to identify the significant miRNAs. The JASPAR database [49] retrieved the key regulatory transcription factors (TFs). The entire analysis was conducted through the NetworkAnalyst [41] .

At first, patients were divided into a low-risk group (control group) and high-risk group (SCLC group) in the SurvExpress online server [50] . Then, the differences between the risk groups from the expression levels of hub-DEGs were investigated by using box plots and survival probability curves.The statistical significance of the differences in the box plots were evaluated through the t-test. Survival signatures of the reported biomolecules were evaluated through Kaplan-Meier plots, and a log-rank p-value < 0.01 for the statistical significance in all survival analyses.

The hub-DEGs-guided probable drugs or drug candidate molecules were retrieved through the online drug-repositioning tool and database Connectivity Map (CMap) [51] . This is an integrative platform that accumulates the information of the drug or drug candidate molecules from published data sources in clinical experimental stages, investigational stages, and approved for treatment stages. Furthermore, the molecular docking simulation study [52] was conducted for the target biomolecules with the repositioned drug to identify the best-fitted position with binding affinity. The highest docking score with the best-fit pose was considered for the drug-protein interaction affinity. An important type of molecular docking is protein-ligand docking because of its therapeutic applications in modern structure-based drug design [52] . Here, have performed some vital protein ligand docking and studied the interacting amino acids of the same complex. The 3D structure of the target proteins was obtained from Protein Data bank (PDB). The chemical structure of drugs was retrieved from PubChem database (https://pubchem.ncbi.nlm.nih.gov/, accessed on 5 December 2021). All generated chemical compound structures were energy minimized by the MMFF94 force field [53] . For the binding sites, predictions of target proteins were analyzed through 3DLigandSite-Ligand-binding site prediction Server [54] . Docking analysis was carried out using Autodock 4.2 [55] and AutoDock Vina [56] . The interactions like Hydrogen Bonding and other non-bonded terms between all drug and target proteins were carried out using the Accelrys Discovery Studio Visualizer software [57] .

At first, we normalized the genes expression profiles by using RMA. Then, we analyzed the normalized dataset by the statistical LIMMA approach and isolated 1943 DEGs between NSCLC and control samples with the cutoff at adjusted p-value < 0.001 and |log 2 FC| < 1 (Figure 2A ). Among those, 1367 DEGs were upregulated, and the remaining 576 DEGs were downregulated ( Figure 2B ). Further analysis was conducted based on these DEGs. 

At first, we normalized the genes expression profiles by using RMA. Then, we analyzed the normalized dataset by the statistical LIMMA approach and isolated 1943 DEGs between NSCLC and control samples with the cutoff at adjusted p-value < 0.001 and |log2FC| < 1 (Figure 2A ). Among those, 1367 DEGs were upregulated, and the remaining 576 DEGs were downregulated ( Figure 2B ). Further analysis was conducted based on these DEGs. 

The PPI network analysis was conducted to reveal the central highly connected proteins which are called hub-DEGs, or proteins or key genes/proteins based on the degree measures ( Figure 3 ) through Cytoscope 3.7.2 with CytoHubba. The degree was considered as ≥10 along with the other default parameters. The proposed top hub proteins are CDK1, EGFR, FYN, UBC, MYC, CCNB1, FOS, RHOB, CDC6, CDC20, and CHEK1, which could be the main proteins in the NSCLC pathogenesis mechanism. By using the MCODE algorithm, 19 sub-network modules were selected considering the default parameters such as node score cutoff of 0.2, K-Core value of 2, and maximum depth from the seed node of 100 along with the other default parameters. Based on the score, the top four modules are represented in Figure 4 and details of analysis results are provided in Supplementary File S1. The sub-modules were checked and the presence of the proposed hub proteins was found. The presence of the hub proteins indicates that these are more reliable to treat as potential therapeutic targets. 

The PPI network analysis was conducted to reveal the central highly connected proteins which are called hub-DEGs, or proteins or key genes/proteins based on the degree measures ( Figure 3 ) through Cytoscope 3.7.2 with CytoHubba. The degree was considered as ≥10 along with the other default parameters. The proposed top hub proteins are CDK1, EGFR, FYN, UBC, MYC, CCNB1, FOS, RHOB, CDC6, CDC20, and CHEK1, which could be the main proteins in the NSCLC pathogenesis mechanism. By using the MCODE algorithm, 19 sub-network modules were selected considering the default parameters such as node score cutoff of 0.2, K-Core value of 2, and maximum depth from the seed node of 100 along with the other default parameters. Based on the score, the top four modules are represented in Figure 4 and details of analysis results are provided in Supplementary Figure S1 . The sub-modules were checked and the presence of the proposed hub proteins was found. The presence of the hub proteins indicates that these are more reliable to treat as potential therapeutic targets.

The genomic alteration/mutation analysis of 11 hub-DEGs revealed that the EGFR, MYC, and CHEK1 genes had 12%, 8%, and 1.3% genomic alteration/mutation over the four lung cancer studies. Other genes were consistent among the studies. For details of the genomic alteration/mutation summary, see Supplementary Figure S1 .

The physicochemical properties of the identified hub proteins are reported in this study. These properties are essential for deeper investigation of the significant biomolecules. The EGFR protein had the highest molecular weight (MW) of 134,277.4 kda, where the UBC reflected the lowest 18,006.82 kda MW. The isoelectric point ranged from 4.77 (FOS) to 9.64 (CDC6) among the reported hub proteins. The detailed information is summarized in Table 1 . 

The genomic alteration/mutation analysis of 11 hub-DEGs revealed that the EGFR, MYC, and CHEK1 genes had 12%, 8%, and 1.3% genomic alteration/mutation over the four lung cancer studies. Other genes were consistent among the studies. For details of the genomic alteration/mutation summary, see Supplementary Figure S1 .

The physicochemical properties of the identified hub proteins are reported in this study. These properties are essential for deeper investigation of the significant biomolecules. The EGFR protein had the highest molecular weight (MW) of 134,277.4 kda, where the UBC reflected the lowest 18,006.82 kda MW. The isoelectric point ranged from 4.77 (FOS) to 9.64 (CDC6) among the reported hub proteins. The detailed information is summarized in Table 1 . 

DAVID (version v6.8) revealed the molecular function, biological process, and molecular pathway annotations of the identified DEGs through the gene over-representation analysis. The significant GO terms were retrieved, which included the biological processes, molecular function, and cellular components ( Table 2 ). The significant GO terms are summarized and presented in Table 2 for upregulated and downregulated genes separately. The significant functional pathways obtained from the KEGG Pathway analysis are also shown in Figure 5 for the hub-DEGs. The pathways in cancer, cytokine-cytokine receptor interaction, chemokine signaling pathway, cell-adhesion molecules (CAMs), cAMP signaling pathway, MAPK signaling pathway, and TNF signaling pathway are the significant pathways shared by the upregulated genes ( Figure 5A ). The metabolic pathways, cell cycle, PI3K-Akt signaling pathway, focal adhesion, and ECM-receptor interaction key pathways are exhibited by the downregulated genes ( Figure 5B ). 

The TFs-DEGs interaction network and the miRNA-DEGs interaction network revealed the substantial TFs and the miRNAs ( Figure 6 ) that may significantly regulate the DEGs. The transcription factors (FOXC1, GATA2, YY1, E2F1, FOXL1, NFIC, NFKB1, PPARG, TFAP2A, USF2) and miRNA (miR-335-5p, miR-26b-5p, miR-16-5p, miR-124-3p, miR-92a-3p, miR-7b-5p, miR-93-5p, miR-17-5p, miR-155-5p) were selected as the key transcriptional and post-transcriptional regulatory biomolecules of DEGs. Furthermore, the interaction network of hub proteins with TFs and miRNA were constructed (Figure 7) . The hub-proteins versus TFs interaction network reflected four TFs (FOXC1, GATA2, YY1, and NFIC) as the key regulatory TFs of the drug target hub-DEGs/proteins ( Figure 7A ). On the other hand, five miRNAs (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5p, and miR-16-5p) were found as the key regulatory miRNAs of hub-DEGs/proteins ( Figure 7B ). These regulatory biomolecules were also found from the interaction network analysis of DEGs-TF and all DEGs-miRNA, respectively ( Figure 6 ). 

The TFs-DEGs interaction network and the miRNA-DEGs interaction network revealed the substantial TFs and the miRNAs ( Figure 6 ) that may significantly regulate the DEGs. The transcription factors (FOXC1, GATA2, YY1, E2F1, FOXL1, NFIC, NFKB1, PPARG, TFAP2A, USF2) and miRNA (miR-335-5p, miR-26b-5p, miR-16-5p, miR-124-3p, miR-92a-3p, miR-7b-5p, miR-93-5p, miR-17-5p, miR-155-5p) were selected as the key transcriptional and post-transcriptional regulatory biomolecules of DEGs. Furthermore, the interaction network of hub proteins with TFs and miRNA were constructed (Figure 7) . The hub-proteins versus TFs interaction network reflected four TFs (FOXC1, GATA2, YY1, and NFIC) as the key regulatory TFs of the drug target hub-DEGs/proteins ( Figure 7A ). On the other hand, five miRNAs (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5p, and miR-16-5p) were found as the key regulatory miRNAs of hub-DEGs/proteins ( Figure 7B ). These regulatory biomolecules were also found from the interaction network analysis of DEGs-TF and all DEGs-miRNA, respectively ( Figure 6 ). 

The risk discrimination performance and the differential expression pattern were observed by the online gene validation website SurvExpress. The analysis was conducted through the TCGA Lung squamous cell carcinoma survival information for the hub genes and the key transcription factors. The survival curve for the high-and low-risk group and the box plot of their gene expressions are shown in (Figure 8 ). For both analyses, the 

The risk discrimination performance and the differential expression pattern were observed by the online gene validation website SurvExpress. The analysis was conducted through the TCGA Lung squamous cell carcinoma survival information for the hub genes and the key transcription factors. The survival curve for the high-and low-risk group and the box plot of their gene expressions are shown in (Figure 8 ). For both analyses, the 

The risk discrimination performance and the differential expression pattern were observed by the online gene validation website SurvExpress. The analysis was conducted through the TCGA Lung squamous cell carcinoma survival information for the hub genes and the key transcription factors. The survival curve for the high-and low-risk group and the box plot of their gene expressions are shown in (Figure 8 ). For both analyses, the prognostic index, log-rank test, and hazard ratio are shown (Figure 8 ). All hub proteins and reported TFs showed statistically significant performances in terms of survival probabilities in all datasets, in both the high-and low-risk groups.

Vaccines 2022, 10, x 12 of 21 prognostic index, log-rank test, and hazard ratio are shown ( Figure 8 ). All hub proteins and reported TFs showed statistically significant performances in terms of survival probabilities in all datasets, in both the high-and low-risk groups. 

The identification of the drug candidate molecules through CMap database revealed the repurposed drugs for the top hub drug-target proteins. The CMap database reflected the drug candidate molecules for the submitted hub proteins. Among the top hub proteins, for the CDK1, EGFR, FYN, and MYC, we found repurposable drugs in pre-clinical trials, FDA-approved drugs, and those in other experimental stages (Table 3) . 

The identification of the drug candidate molecules through CMap database revealed the repurposed drugs for the top hub drug-target proteins. The CMap database reflected the drug candidate molecules for the submitted hub proteins. Among the top hub proteins, for the CDK1, EGFR, FYN, and MYC, we found repurposable drugs in pre-clinical trials, FDA-approved drugs, and those in other experimental stages (Table 3) . The molecular docking analysis for the FDA-approved, launched drugs with the hub proteins was conducted. The best pose with the highest docking score was considered to select the drug-protein interaction. The potential repositioned drug candidates need deeper attention for further experimental validation, which leads to the development of more efficient therapy for NSCLC treatment. The molecular docking analysis results are summarized in Figure 9 , where (i) indicates the protein-drug complex and (ii) indicates the 2D diagram with interacting amino acid. For the Dinaciclib-CDK1 complex, interaction in the substrate-binding site (SBS-1) of CDK1 generated a binding-free energy of −9.3 Kcal/mol. Residues such as THR14, TYR15, VAL18, LYS33, GLN132, ASN133, ALA145, ASP146, and VAL165 surround the amino acid and THR14, GLN132, GLN132, ASN133, and ASP146 are involved in the hydrogen-bond interaction while the other surrounding amino acid residues are involved in hydrophobic interactions ( Figure 9A ). The docking simulation of EGFR inhibitor was performed with three compounds, including Afatinib, Erlotinib, and Gefitinib ( Figure 9B-D) . The highest affinity for substrate binding sites (SBS-2), with a binding free energy of −9.0 Kcal/mol, was found for Afatinib in the EGFR open conformation model, and binding-free energies of −8.5 Kcal/mol and −8.2 Kcal/mol were found for for Erlotinib and Gefitinib compounds in EGFR conformations respectively. Therefore, the chemical compound of Afatinib was strongly bound with EGFR conformation. LEU718, LYS745, MET793, CYS797, ARG841, ASN842, ASP855, and LEU858 are the surrounding residues for the Afatinib-EGFR complex. MET793, ASN842, ASP855, and LEU718 are involved in the hydrogen-bond interaction, while the other surrounding amino acids such as LYS745, LEU718, LEU858, and ARG841, CYS797, and ARG841 are involved in Pi-Cation, Alkyl, and Pi-Alkyl interactions respectively. The docking simulation of FYN inhibitor was performed with two compounds including Bosutinib and Dasatinib ( Figure 9E ,F). The highest affinity for SBS-3, with a binding-free energy of −7.1 Kcal/mol, was found for Bosutinib in FYN conformation, and a binding-free energy of −6.9 Kcal/mol was found for the Dasatinib-EGFR complex. Therefore, the compound of Bosutinib was strongly bound with the FYN conformation. Trp149, Tyr150, Arg176, Leu224, and Gln225 are surrounding residues for the Bosutinib-FYN complex. TRP149, GLN225, and ARG176 are involved in the hydrogen-bond interaction, while the other surrounding amino acids such as TRP149 and TRP149, and TRP149, TYR150, and LEU224 are involved in C-H and Pi-Orbital's interactions respectively.

For the TWS-119-MYC complex, the interaction in SBS-4 of MYC generated a bindingfree energy of −7.9 Kcal/mol. Residues such as Ser952, Val953, Glu956, Arg254, His258, and Gln261 are surrounding amino acids and GLN261, GLU956, and HIS258 are involved in the hydrogen-bond interaction while the other surrounding amino acid residues are involved in hydrophobic interactions ( Figure 9G ). The ultimate potential of the drugs with the molecular signatures of the NSCLC demanded close attention for experimental validation for developing effective and safe medications. 

Identification of disease-causing crucial biomarkers may shed light on a deeper understanding of the molecular mechanism of disease [58] [59] [60] [61] [62] [63] . The present study was conducted to analyze the NSCLC gene expression data to determine the DEGs, extensive molecular pathways, significant hub proteins, and associated regulatory biomolecules in order to pick up the potential therapeutic targets for NSCLC through a multi-omics data integration framework. Through the gene expression patterns analysis, we identified 1943 DEGs, including 1367 upregulated and 576 downregulated genes. The functional enrichment analysis revealed that the proposed upregulated DEGs are significantly involved with some cancer-causing molecular functions and pathways, including cytokine-cytokine receptor interaction, chemokine signaling pathway, cell-adhesion molecules (CAMs), cAMP signaling pathway, MAPK signaling pathway, TNF signaling pathway, cGMP-PKG signaling pathway, Proteoglycans in cancer, and Rap1 signaling pathway ( Figure 5 ). The downregulated genes are shared metabolic pathways, cell cycle, PI3K-Akt signaling pathway, focal adhesion, ECM-receptor interaction, p53 signaling pathway, and protein digestion and absorption pathways. All of these functions and pathways are significantly related to cancer development and play crucial roles in the NSCLC microenvironment. Recent studies indicated the importance of the tumor microenvironment as a decisive factor in tumorigenesis in various cancers [64] [65] [66] [67] [68] . Therefore, the physicochemical properties will be helpful to explore the further analysis of the reported proteins as a therapeutic target for NSCLC.

To detect the basic mechanism of disease, the protein-protein interaction network analysis is becoming a promising approach [69] . The PPI network analysis in this study revealed the hub-DEGs' encoded hub-proteins. The CDK1 is related to the cell cycle activities. Up-regulation of CDK1 genes may be indicative of poor survival rates and a higher risk for cancer recurrence. The CDK1 gene is also related to several other cancer diseases [70, 71] . The EGFR gene is associated with cell growth and had a contribution in lung cancer studied before [72, 73] . The study revealed that the growth is suppressed and the radiosensitivity is amplified by the activities of ubiquitin C (UBC) in NSCLC cells [74] [75] [76] [77] . The CDC6, CDC20, and CHEK1 genes are closely related to the occurrence and development of small-cell lung cancer, and CHEK1 is treated as a therapeutic target for lung cancer [78] . Eight hub genes (CDK1, EGFR, UBC, MYC, CCNB1, RHOB, CDC6, and CDC20) have tumor suppressor functions, while five hub genes (CDK1, EGFR, FYN, UBC, and CCNB1) are protein kinases as well. The MCODE cluster analysis clearly showed that the hub genes were distributed among the distinguished sub network (Figure 4 ) modules, which provided the strong evidence about the proposed signature biomolecules that these are reliable as therapeutic targets. Thus, the predicted hub-DEGs and relevant information might be useful in early detection of NSCLC. On the other hand, the genomic alteration/mutation analysis of the hub-DEGs reflected that most mutation for the EGFR occurred across the four lung cancer studies and was followed by the MYC and CHEK1 genes, since EGFR is a highly mutant/altered gene for lung cancer and NSCLC as well [79] . The alteration/mutation frequency revealed that the EGFR showed the highest alteration frequency relative to others, including the mutation, where CHEK1 represented mutation and deletion across the studies (Supplementary Figure S1) , which may be a concern of investigation in future research.

The DEGs and hub-DEGs regulatory network analysis commonly revealed four transcription factors (FOXC1, GATA2, YY1, and NFIC) and five miRNAs (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5p, and miR-16-5p) as the key transcriptional and posttranscriptional regulators of DEGs as well as hub-DEGs. A study reported that various tumor-associated genes are regulated by FOXC1 and maintain several cancer-related pathways [80] . The GATA2 is treated as a therapeutic target in NSCLC treatment development and it also related to breast and kidney cancer [81, 82] . The higher expression pattern of YY1 transcription factor triggered the patients having larger tumor size, differentiation, higher TNM stage, and lymph node metastasis [83] . The reported TFs are also involved in other cancer diseases [58] [59] [60] [61] [62] [63] . In various types of cancer tissues, the miR-26b-5p acts as a tumor suppressor [84] . Currently, as one of the diagnostic tools for lung cancer identification, the miR-92a-3p expression measurement is being used [85, 86] . The miR-155-5p is significantly associated with a higher risk for progression in adenocarcinoma patients [87, 88] and miR-16-5p showed higher expression pattern in NSCLC cells [86] .

The prognostic power of the reported biomolecules in discriminating the high-and lowrisk conditions were exhibited by using the multivariate survival probability curves and box plots (Figure 8) . The survival curves clearly demonstrated that the reported biomolecules played a significant role in patient survival. The box plot of the gene expression data of the molecular candidate also showed clear differences between the high-and low-risk groups ( Figure 8B ).

Finally, we selected the top-ranked five hub-DEGs-guided candidate drugs from the Connectivity Map (CMap) database ( Table 3 ). Out of the selected drugs, we validated FDA approved six launched drugs (Dinaciclib, Afatinib, Icotinib, Bosutinib, Dasatinib, and TWS-119) by molecular docking simulation with the top-ranked five hub-DEGs-mediated target proteins for the treatment against NSCLC. The drug-target binding affinity scores (less than −7.0 Kcal/mol) suggested that the aforementioned six FDA approved launched drugs might be effective for the treatment against NSCLC. Thus, the findings of this study might be useful resources in prevention and early detection of NSCLC.

The current study focused on identifying the significant biomolecules along with their molecular mechanisms through integrative bioinformatics analysis. Among 1943 DEGs, 11 DEGs (CDK1, EGFR, FYN, UBC, MYC, CCNB1, FOS, RHOB, CDC6, CDC20, and CHEK1) were reported as the hub-DEGs/proteins that may play the key roles in NSCLC progression. The DEGs set enrichment analysis with the gene ontology (GO) database showed that DEGs are significantly involved with the cell adhesion, cell division, inflammatory response, signal transduction, protein binding, and plasma membrane extracellular region. The enrichment analysis with the KEGG pathway database showed that DEGs are significantly associated with the metabolic pathways, cell cycle, ECM-receptor interaction, and pathways in cancer. The inevitable regulatory TFs (FOXC1, GATA2, YY1, and FOXL1) and miRNA (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5pm and miR-16-5p) were identified as potential regulatory biomarkers for both DEGs and hub-DEGs. The strong prognostic performance of the reported biomolecules was observed between the high-and low-risk groups through the survival curves and box plots. The top-ranked hub-DEG-guided repurposable drug analysis revealed that the Dinaciclib, Afatinib, Icotinib, Bosutinib, Dasatiniband, and TWS-119 might be suggested as novel putative drugs for NSCLC treatment. The molecular docking analysis between the drug-target hub proteins and the repurposed drugs were conducted to investigate their molecular interaction mechanism. Thus, the findings of this study might be useful resources for NSCLC diagnosis, prognosis, and therapies, including gene-based DNA-vaccine development.

Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/vaccines10050771/s1, Figure S1 : The genomic alteration of the proposed 11 hub-DEGs among the NSCLC. 

Seven autophagy-related lncRNAs are associated with the tumor immune microenvironment in predicting survival risk of nonsmall cell lung cancer

Cancer statistics, 2020

Pathologic Assessment after Neoadjuvant Chemotherapy for NSCLC: Importance and Implications of Distinguishing Adenocarcinoma From Squamous Cell Carcinoma

Distribution patterns of the metastases of the lung carcinoma in relation to histological type of the primary tumor: An autopsy study

Molecular Biologic Staging of Lung Cancer

In vitro and in vivo evaluation of therapy targeting epithelial-cell adhesion-molecule aptamers for non-small cell lung cancer

Patient-derived xenograft models of non-small cell lung cancer and their potential utility in personalized medicine

Temporal and spatial heterogeneity of host response to SARS-CoV-2 pulmonary infection

Role of Synaptophysin, Chromogranin and CD56 in adenocarcinoma and squamous cell carcinoma of the lung lacking morphological features of neuroendocrine differentiation: A retrospective large-scale study on 1170 tissue samples

college of chest physicians evidence-based clinical practice guidelines

Characteristics, incidence, and risk factors of immune checkpoint inhibitor-related pneumonitis in patients with non-small cell lung cancer

Risk factors of Lung Cancer in nonsmoker

Risk factors for lung cancer worldwide

Non-small cell lung cancer: Current treatment and future advances

Bioinformatics Screening of Potential Biomarkers from mRNA Expression Profiles to Discover Drug Targets and Agents for Cervical Cancer

Statistics and network-based approaches to identify molecular mechanisms that drive the progression of breast cancer

Vaccines for the 21st century

Recent developments in preclinical DNA vaccination

Dna vaccines: Recent developments and the future. Vaccines

Comparison of Current Regulatory Status for Gene-Based Vaccines in the U.S.; Europe and Japan

Gene-based vaccines: Recent technical and clinical advances

Interactome analysis of gene expression profiles identifies CDC6 as a potential therapeutic target modified by miR-215-5p in hepatocellular carcinoma

Molecular and genetic characterization of MHC deficiency identifies ezh2 as therapeutic target for enhancing immune recognition

Systematic cancer-testis gene expression analysis identified CDCA5 as a potential therapeutic target in esophageal squamous cell carcinoma

Vaccines 2022

Identification of the potential therapeutic target gene ube2c in human hepatocellular carcinoma: An investigation based on geo and tcga databases

Drug Targeting and Biomarkers in Head and Neck Cancers: Insights from Systems Biology Analyses

Computational identification of host genomic biomarkers highlighting their functions, pathways and regulators that influence SARS-CoV-2 infections and drug repurposing

Identification of host transcriptome-guided repurposable drugs for SARS-CoV-1 infections and their validation with SARS-CoV-2 infections by using the integrated bioinformatics approaches

Archive for functional genomics data sets-Update

Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women

Open software development for computational biology and bioinformatics

Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists

Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources

New perspectives on genomes, pathways, diseases and drugs

KEGG as a reference resource for gene and protein annotation

Kyoto Encyclopedia of Genes and Genomes

The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible

Network-based methods for predicting essential genes or proteins: A survey

NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data

Identifying hub objects and sub-networks from complex interactome

Tissue-Specific Molecular Biomarker Signatures of Type 2 Diabetes: An Integrative Analysis of Transcriptomics and Protein-Protein Interaction Data

MultiContrast Delayed Enhancement (MCODE) improves detection of subendocardial myocardial infarction by late gadolinium enhancement cardiovascular magnetic resonance: A clinical validation study

Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal

The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov

DIANA-TarBase v8: A decade-long collection of experimentally supported miRNA-gene interactions

Updates to the experimentally validated miRNA-target interactions database

Update of the open-access database of transcription factor binding profiles and its web framework

SurvExpress: An Online Biomarker Validation Tool and Database for Cancer Gene Expression Data Using Survival Analysis

The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease

Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery

Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94

3DLigandSite: Predicting ligand-binding sites using similar structures

AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility

AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading

Accelrys Software Inc. Visualizer DS

Identification of prognostic biomarker signatures and candidate drugs in colorectal cancer: Insights from systems biology analysis

Integrative transcriptomics analysis of lung epithelial cells and identification of repurposable drug candidates for COVID-19

Network-based approach to identify molecular signatures and therapeutic agents in Alzheimer's disease

Network-Based Computational Approach to Identify Delineating Common Cell Pathways Influencing Type 2 Diabetes and Diseases of Bone and Joints

Diseasome and comorbidities complexities of SARS-CoV-2 infection with common malignant diseases

Nurul Haque Mollah, M. A robust approach for identification of cancer biomarkers and candidate drugs

Cancer Stem Cell Hypothesis for Therapeutic Innovation in Clinical Oncology? Taking the Root Out, Not Chopping the Leaf

Tissue Proteome Analysis of Different Grades of Human Gliomas Provides Major Cues for Glioma Pathogenesis

Multiomics Analysis of Tumor Microenvironment Reveals Gata2 and miRNA-124-3p as Potential Novel Biomarkers in Ovarian Cancer

Biomarkers in tumor microenvironment? Upregulation of fibroblast activation protein-α correlates with gastric cancer progression and poor prognosis

Collagen abundance controls melanoma phenotypes through lineage-specific microenvironment sensing

The role of protein interaction networks in systems biomedicine

Comprehensive analysis of differential expression profiles reveals potential biomarkers associated with the cell cycle and regulated by p53 in human small cell lung cancer

Prognostic and predictive values of CDK1 and MAD2L1 in lung adenocarcinoma

Epidermal growth factor receptor (EGFR) in lung cancer: An overview and update

Vaccines 2022

Detection of rare and novel EGFR mutations in NSCLC patients: Implications for treatment-decision

Downregulation of ubiquitin inhibits the proliferation and radioresistance of non-small cell lung cancer cells in vitro and in vivo

Identification of key modules and prognostic markers in adrenocortical carcinoma by weighted gene co-expression network analysis

Evaluation of the Concentration of Ubiquitin C Protein (UBC) in Patients of Lung Cancer and Comparing with Healthy Subjects. Eng. Technol

Transcriptome analysis of phycocyanin-mediated inhibitory functions on non-small cell lung cancer A549 cell growth

Identification of candidate biomarkers and pathways associated with SCLC by bioinformatics analysis

Integrative Bioinformatics approaches to therapeutic gene target selection in various cancers for Nitroglycerin

FOXC1 in cancer development and therapy: Deciphering its emerging and divergent roles

GATA2 is epigenetically repressed in human and mouse lung tumors and is not requisite for survival of KRAS mutant lung cancer

Comprehensive analysis of the GATA transcription factor gene family in breast carcinoma using gene microarrays, online databases and integrated bioinformatics

Transcription Factor YY1 Modulates Lung Cancer Progression by Activating lncRNA-PVT1

Tumour-suppressive miRNA-26a-5p and miR-26b-5p inhibit cell aggressiveness by regulating PLOD2 in bladder cancer

The roles of microRNA in lung cancer

Evaluation of Serum Paired MicroRNA Ratios for Differential Diagnosis of Non-Small Cell Lung Cancer and Benign Pulmonary Diseases

Two Panels of Plasma MicroRNAs as Non-Invasive Biomarkers for Prediction of Recurrence in Resectable NSCLC

Hsa-miR-155-5p Up-Regulation in Breast Cancer and Its Relevance for Treatment with Poly

Vaccines 2022, 10, 771