key: cord-0016481-gf6wijx5 authors: Patni, Khyati; Agarwal, Preeti; Kumar, Ajit; Meena, Laxman S. title: Computational evaluation of anticipated PE_PGRS39 protein involvement in host–pathogen interplay and its integration into vaccine development date: 2021-04-01 journal: 3 Biotech DOI: 10.1007/s13205-021-02746-3 sha: 78f8b6a9aaa059f96faf61e14a0e907c69eb07c1 doc_id: 16481 cord_uid: gf6wijx5
Preeti Agarwal and Ajit Kumar contributed equally to this work.
Mycobacterium tuberculosis causes more than 1 million deaths every year, more than any other bacterial pathogen. Its success depends on its interaction with the host and its ability to regulate the host immune system for its own survival. The proteome of Mycobacterium tuberculosis H37Rv (Mtb) contains the unique PE_PGRS family proteins, which over the past years have been shown to play a significant role in bacterial pathogenesis. Earlier evidence suggests that some PE_PGRS proteins display fibronectin-binding activity. In this manuscript, computational characterization of the PE_PGRS39 protein revealed something peculiar about this protein. Our analysis showed that PE_PGRS39 is an extracellular protein that, instead of acting as a fibronectin-binding protein, might mimic fibronectin, which binds to alpha-5 beta-1 (α5β1) integrin. In addition to the RGD integrin-binding motif, PE_PGRS39 was found to contain the DXXG, GGXGXD, and PXXP motifs, which bind guanosine triphosphate (GTP), calcium, and host Src homology 3 (SH3) domains, respectively. These interactions point to a direct role of PE_PGRS39 in bacterial pathogenesis via cell adhesion and signaling. Additionally, the analysis showed that PE_PGRS39 is an antigenic protein, and epitope prediction identified functional regions of the protein that may trigger an immune response mediated by T or B cells. Further experimental analysis could also open up new avenues for developing novel drugs that target the signaling motifs, or novel vaccines based on functional epitopes that could evoke an immune response in the host.
Tuberculosis (TB) is a highly transmissible disease caused by Mycobacterium tuberculosis, one of the most pathogenic bacteria known. Its strain H37Rv is the most studied strain in research laboratories. TB spreads from one sick person to another through infectious droplets containing Mycobacterium tuberculosis H37Rv (Mtb), and it remains a significant global health challenge, prompting comprehensive studies in infectious disease (Fauci 2001). According to the World Health Organization (WHO) 2020 report, an estimated 10 million people worldwide fell ill with TB in 2019: 5.6 million men, 3.2 million women, and 1.2 million children (WHO Report 2020). Antibiotic-resistant forms, including totally drug-resistant TB (TDR-TB), rifampicin-resistant TB (RR-TB), multidrug-resistant TB (MDR-TB), and extensively drug-resistant TB (XDR-TB), are among the most important factors driving TB worldwide. In 2019, 206,030 people worldwide were detected with MDR/RR-TB, a 10% rise from 186,883 in 2018 (WHO Report 2020). These data show that TB rates are rising alarmingly, yet Bacille Calmette-Guérin (BCG) is still the only licensed vaccine for preventing TB (Montagnani et al. 2014). Thus, there is a great need to develop potential therapeutics for TB. Despite extensive studies on Mtb, many PE/PPE family proteins have not yet been studied (Via et al. 2015). In this manuscript, we worked on a subgroup of the PE/PPE protein family. The proline-glutamic acid (PE) and proline-proline-glutamic acid (PPE) gene families account for approximately 10% of the coding potential of the Mtb genome (Singh et al. 2016; Meena 2019).
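The burden figures quoted from the WHO Report 2020 can be checked with a few lines of arithmetic. The sketch below only re-derives the "10% rise" and the 10 million total from the numbers given in the text; the variable and function names are illustrative.

```python
# Figures quoted from the WHO Report 2020, as cited in the text above.
mdr_rr_2018 = 186_883
mdr_rr_2019 = 206_030


def percent_increase(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100


rise = percent_increase(mdr_rr_2018, mdr_rr_2019)
print(f"MDR/RR-TB detections rose by {rise:.1f}%")

# 2019 incident TB cases by group, in millions.
cases_millions = {"men": 5.6, "women": 3.2, "children": 1.2}
total = sum(cases_millions.values())
print(f"Total incident cases: {total:.1f} million")
```

The computed rise is about 10.2%, consistent with the rounded 10% reported above, and the three groups sum to the estimated 10 million cases.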
The conserved N-terminal domains of PE and PPE proteins are about 110 aa and 180 aa long, respectively, and the proteins are divided into subclasses based on their variable C-terminal domains (Sampson 2011; Meena 2015). The PE_PGRS proteins, one of the PE/PPE family subgroups, have retained the N-terminal PE domain, which shows similarity to the PE family. Almost all PE_PGRS proteins are known to have the consensus sequence GGXGG and have been demonstrated to contain GTP- and calcium-binding motifs (Bachhawat and Singh 2007). Some PE_PGRS proteins are well known for their contribution to the replication and survival of Mtb within macrophages and have been proposed to serve as a source of antigenic variability that helps evade host immune responses (Mahadevan 1998). Other observations indicate that PE_PGRS proteins may be cell-surface components that could facilitate the binding of cell adhesion molecules, affect bacterial cell structure, or interfere with host immune responses by inhibiting antigen processing (Bachhawat and Singh 2007). Evidence shows that some PE_PGRS proteins possess fibronectin-binding activity, which may play a part in cell adherence and immune invasion. Beyond mycobacteria, it was shown in Staphylococcus aureus (S. aureus) that fibronectin binding significantly increases the phagocytic activity of macrophages on binding with the alpha-5 and beta-1 integrin chains, which are associated with the cytoskeleton (Kevlani and Meena 2017). Fibronectin consists of several domains, classified as FnI, FnII, and FnIII repeats, that show binding specificities for particular host membrane receptors such as integrins. Evidence has shown that the two major fibronectin domains that interact with beta-1 integrins are FnIII-9 and FnIII-10 (Speziale et al. 2019), but the most critical element for binding energy is the aspartate residue of the RGD motif within FnIII-10 (Aota et al. 1994).
Integrins form a broad class of cell adhesion receptors with 24 αβ heterodimeric members, and one-third of integrins have been shown to bind the RGD sequence (Fenn et al. 2020). These integrins not only mediate the attachment of cells to the extracellular matrix and cell-cell interactions (Zimmermann et al. 2010) but are also responsible for migration and signal transduction (Kevlani and Meena 2017). Understanding the interaction between host and pathogen is necessary to eliminate any infectious disease. Many pathogens mimic human sequences to interact with host components and exploit the host machinery for their survival. Thus, in this manuscript, we have analyzed how the previously uncharacterized PE_PGRS39 protein might contribute to Mtb pathogenesis, through mimicry of host proteins or by other means. Using bioinformatics tools, we have structurally and functionally characterized PE_PGRS39. Additionally, we provide insight into the use of this protein in vaccine development. To retrieve the protein sequence of the Mtb gene Rv2340c (PE_PGRS39), we used Mycobrowser (Beg et al. 2018). The Mycobrowser database provides genomic and proteomic information on mycobacteria, including protein sequence, function, drug interactions, and orthologous relations (Lew et al. 2011). Genes and their protein sequences can also be retrieved from the NCBI, UniProt, and KEGG databases. Physicochemical properties of PE_PGRS39, including theoretical pI, molecular weight, amino acid composition, atomic composition, aliphatic index, extinction coefficient, instability index, and grand average of hydropathicity (GRAVY), were computed with the ProtParam tool, which takes a protein sequence as input (Godfrey et al. 2005). Further, we identified functional motifs computationally with Motif Scan and manually by searching the NCBI (National Center for Biotechnology Information) database. Motif Scan searches the PROSITE, Pfam, and HAMAP profiles.
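Several of the ProtParam quantities listed above are simple functions of the sequence. As a minimal pure-Python sketch (assuming standard one-letter residue codes; this is not ProtParam's code), GRAVY is the mean Kyte-Doolittle hydropathy per residue, the aliphatic index is computed from the mole percentages of Ala, Val, Ile, and Leu, and the charged-residue counts are Asp + Glu (negative) and Arg + Lys (positive). The instability index and pI need larger lookup tables and are not reproduced; the toy peptide below is hypothetical and is not PE_PGRS39.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    # A KeyError here means the sequence contains a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: A + 2.9*V + 3.9*(I + L), using mole percentages."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residues(seq: str) -> dict:
    """Charged-residue counts: Asp + Glu negative, Arg + Lys positive."""
    seq = seq.upper()
    return {
        "negative": seq.count("D") + seq.count("E"),
        "positive": seq.count("R") + seq.count("K"),
    }


if __name__ == "__main__":
    toy = "GGAGGVDRGKLE"  # hypothetical toy peptide, not PE_PGRS39
    print(f"GRAVY: {gravy(toy):.3f}")
    print(f"Aliphatic index: {aliphatic_index(toy):.1f}")
    print(charged_residues(toy))
```

A positive GRAVY, such as the 0.086 reported for PE_PGRS39, indicates a slightly hydrophobic average; values for a real sequence should be checked against ProtParam itself.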
Subcellular localization was predicted using CELLO2GO, which assigns one of four subcellular localizations to archaeal and gram-positive bacterial proteins using multiple integrated machine-learning classifiers. For gram-positive bacteria, these localizations are the cytoplasm, the extracellular space, the cytoplasmic membrane, and the cell wall (Yu et al. 2014). To confirm the CELLO2GO results, the LocTree3 server was used (Goldberg et al. 2012). (a) Functional category prediction: We used the VICMpred server to predict the functional category of PE_PGRS39, providing the sequence in FASTA format and selecting the standard pattern-based prediction approach. VICMpred uses a support vector machine (SVM)-based method to assign proteins to four functional groups: metabolism molecules, virulence factors, cellular process, and information molecules (Sharma et al. 2019). The precision of VICMpred is 70.75% (Saha and Raghava 2006). (b) Molecular activity prediction: The molecular activity of PE_PGRS39 was predicted using the COFACTOR online server (Agarwal et al. 2020). COFACTOR takes either the sequence or the 3D structure of a model and threads the query through the BioLiP protein function database, searching for global and local structural matches to identify molecular activity based on functional sites and homologies. It annotates the biological function of a given protein in terms of gene ontology (GO) terms, which can also be inferred from protein-protein interactions in STRING or from sequence and sequence-profile alignments against UniProt-GOA. The server uses the best available homologous templates to derive GO terms (Roy et al. 2012). We used the disorder-enhanced phosphorylation prediction (DEPP) server to predict phosphorylation sites in the PE_PGRS39 protein. This server predicts serine, threonine, and tyrosine phosphorylation sites in proteins.
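Composition-based classifiers such as VICMpred represent each protein as a fixed-length feature vector before the SVM is applied. A minimal sketch of the simplest such encoding, the 20-dimensional amino acid composition, is shown below; it assumes standard one-letter codes and does not reproduce VICMpred's pattern features or its trained SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Percentage of each of the 20 standard amino acids, in a fixed order."""
    seq = seq.upper()
    n = len(seq)
    return [100.0 * seq.count(aa) / n for aa in AMINO_ACIDS]


if __name__ == "__main__":
    toy = "GGAGGVDRGKLE"  # hypothetical toy peptide, not PE_PGRS39
    features = aa_composition(toy)
    glycine_percent = features[AMINO_ACIDS.index("G")]
    print(f"{len(features)} features; glycine = {glycine_percent:.1f}%")
```

The glycine entry of this vector corresponds to the kind of value ProtParam reports as glycine content (22.0% for PE_PGRS39); fixing the residue order keeps the vector comparable across sequences of different lengths.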
Disordered protein regions are known to frequently contain short linear peptide motifs, such as SH3-domain binding sites and targeting signals essential for protein function (Linding et al. 2003). Thus, to identify disordered regions, we used PONDR® (Predictor of Natural Disordered Regions). PONDR® takes a protein sequence as input and scans it with windows of 9 to 21 amino acids; its predictors are neural networks based on attributes such as hydropathy, specific amino acid composition, and sequence complexity (Delamain et al. 2019). There are eight versions of PONDR; we used PONDR VL-XT because it has been reported to be the most reliable (Agarwal et al. 2020). (a) Secondary structure prediction: The user-friendly PSIPRED server was used to predict the secondary structure of PE_PGRS39. PSIPRED is a PSI-BLAST-based method that successfully predicts beta-sheets, alpha-helices, and coils from the primary sequence (Monu 2016). (b) Tertiary structure prediction: To obtain the tertiary structure of PE_PGRS39, RaptorX, I-TASSER, and Phyre2 were used, and the best model predicted by I-TASSER was chosen for further analysis. I-TASSER is an integrated framework for template-based prediction of protein structure and function. Starting from the amino acid sequence of the target protein, I-TASSER identifies structural templates by multiple threading alignments, generates full-length atomic models by iterative structural assembly simulations, and finally performs atomic-level structure refinement (Zhang 2008). For the multiple threading alignments, I-TASSER uses LOMETS (Roy et al. 2010). (c) Tertiary structure refinement: Protein structure prediction tools rely on the similarity between the target and the template structures available in different databases to generate structure models.
Therefore, improving template-based models beyond the accuracy of their templates has been a primary concern in the bioinformatics community. The GalaxyRefine web server was used to improve both global and local structural quality. This tool has shown optimal results when the structure was predicted by servers such as I-TASSER and ROSETTA (Kathwate 2020; Heo et al. 2013). (d) Structure validation: The best refined structure of PE_PGRS39 was validated with SAVES v5.0 (Structure Analysis and Verification Server, version 5) and the ProSA server. SAVES v5.0 is a package of five programs that test the overall consistency of a protein structure. Of these five programs, we used VERIFY-3D to test the 3D-1D sequence profile of the protein model and PROCHECK to validate the structure through the Ramachandran plot. PROCHECK checks the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry (Beg et al. 2018). (a) Protein-protein docking: To model protein-protein interactions, we used the pyDock suite of docking programs (Cheng et al. 2007). pyDock is an FFT-based protein-protein docking method that scores rigid-body docking poses using electrostatics, desolvation energy, and a small van der Waals contribution (pyDockSER). Docking solutions from rigid-body docking are scored with the pyDockSER module according to the equation BE = EE + DE + W · VE, where BE is the total binding energy; EE is the electrostatic binding energy (truncated to values between −1.0 and 1.0 kcal/mol); DE is the desolvation energy upon binding; VE is the van der Waals binding energy (truncated at a maximum of 1.0 kcal/mol to avoid excessive noise from rigid-body docking of surfaces); and W is the weight of the van der Waals term, set by default to 0.1 (Degryse et al. 2008). We used EASE-MM (evolutionary, amino acid, and structural encodings with multiple models) to predict the effect of mutations in potential binding sequences.
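The pyDockSER equation BE = EE + DE + W · VE can be illustrated with a small scoring function. This is a sketch of how the three terms combine, not pyDock's implementation: it assumes that per-atom-pair electrostatic and van der Waals energies and the total desolvation energy have already been computed by some hypothetical upstream step, and it applies the truncations described above to each pair before summing.

```python
def clip(value: float, low: float, high: float) -> float:
    """Bound value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pydock_like_score(elec_pairs, desolvation, vdw_pairs, w=0.1):
    """Combine energy terms as BE = EE + DE + W * VE (kcal/mol).

    elec_pairs  -- per-atom-pair electrostatic energies (assumed precomputed)
    desolvation -- total desolvation energy DE (assumed precomputed)
    vdw_pairs   -- per-atom-pair van der Waals energies (assumed precomputed)
    w           -- weight on the van der Waals term, 0.1 by default
    """
    # Electrostatics: each pair contribution is bounded to [-1.0, 1.0].
    ee = sum(clip(e, -1.0, 1.0) for e in elec_pairs)
    # van der Waals: each pair contribution is capped at 1.0 to damp clashes.
    ve = sum(min(v, 1.0) for v in vdw_pairs)
    return ee + desolvation + w * ve


if __name__ == "__main__":
    print(pydock_like_score([-3.0, 0.5], -10.0, [5.0, -0.2]))
```

With the default W = 0.1, a rigid-body steric clash (large positive van der Waals energy) can raise the total score by at most 0.1 kcal/mol per atom pair, so electrostatics and desolvation dominate the ranking of poses.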
EASE-MM supports the prediction of missense mutations specified as amino acid substitutions. It takes a protein sequence in a specified format and combines multiple specialized machine learning models to predict ΔΔG (Folkman et al. 2016). (a) Antigenicity: We predicted the antigenic nature of the protein using the VaxiJen server with a threshold of 0.4. This server is based on the physicochemical properties of the protein (Kardani et al. 2020). (b) Prediction of B-cell epitopes: To predict B-cell binding regions within the PE_PGRS39 protein sequence, we used ABCpred with a threshold of 0.5 and the default window of 16 amino acids (Meena and Meena 2016). We used Mycobrowser to obtain the protein sequence of the Rv2340c gene in FASTA format. The retrieved protein belongs to the PE_PGRS family and is called PE_PGRS39, a glycine-rich protein containing 22.0% glycine. This protein is 413 amino acids long, with an isoelectric point of 6.5888 and a molecular mass of 38722.3 Da. With the ProtParam tool, we found that the grand average of hydropathicity (GRAVY) is 0.086. The protein sequence contains 28 negatively charged residues and 23 positively charged residues (Table 1). ProtParam also computed an instability index of 22.72; because this is below 40, PE_PGRS39 is classified as a stable protein. Using Motif Scan, we found that PE_PGRS39 contains an RGD cell attachment motif, consisting of the amino acids Arg-Gly-Asp (Speziale et al. 2019). PE_PGRS39 also contains conserved GTP-binding and calcium-binding motifs, represented by GXXXXGK/DXXG/NKXD and GGXGXD/NXUX, respectively, where X represents any amino acid and U is a large non-polar (hydrophobic) residue; the calcium-binding motifs bind Ca2+. Along with the above motifs, PE_PGRS39 has a PXXP motif, which binds SH3-domain-containing proteins (Fig. 1). In particular, some PXXP motifs are unique to Mtb (Chandra et al. 2004).
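The motif notation above (X for any residue, U for a large hydrophobic residue, slashes separating alternative motifs) maps directly onto regular expressions. The sketch below is an illustration, not Motif Scan's method. It assumes that U stands for I, L, M, V, F, or W (an assumption, since the text only says "large non-polar"), and it uses a zero-width lookahead so that overlapping matches are all reported with 1-based positions.

```python
import re

# Assumption: residues counted as "large non-polar" for the U position.
LARGE_HYDROPHOBIC = "ILMVFW"


def motif_to_regex(motif: str) -> str:
    """Translate X (any residue) and U (large hydrophobic) into a regex."""
    parts = []
    for symbol in motif:
        if symbol == "X":
            parts.append(".")
        elif symbol == "U":
            parts.append(f"[{LARGE_HYDROPHOBIC}]")
        else:
            parts.append(symbol)  # literal one-letter residue code
    return "".join(parts)


def find_motif(seq: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match."""
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


MOTIFS = {
    "RGD": "integrin binding (cell attachment)",
    "PXXP": "SH3-domain binding",
    "GXXXXGK": "GTP binding",
    "DXXG": "GTP binding",
    "NKXD": "GTP binding",
    "GGXGXD": "calcium binding",
    "NXUX": "calcium binding",
}

if __name__ == "__main__":
    epitope = "GRDIVGSVRGDGGVGM"  # 16-mer from Table 5, starting at residue 221
    for motif, role in MOTIFS.items():
        hits = find_motif(epitope, motif)
        if hits:
            # Convert positions within the peptide to positions in the protein.
            print(motif, role, [221 + h - 1 for h in hits])
```

Applied to the reported 16-mer epitope, the scan places the RGD motif at residues 229 to 231 of the protein. Running it on the full Mycobrowser sequence would list every candidate site, which then still needs structural or experimental support.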
PE_PGRS39 was predicted to be an extracellular protein with a cumulative score of 3.448, higher than the scores for the other predicted localizations (cell wall, plasma membrane, and cytoplasm) (Table 2). The localization probability chart is shown in Fig. 2. (a) Functional category prediction: The VICMpred server was used to predict which functional group PE_PGRS39 belongs to. VICMpred predicted that PE_PGRS39 is involved in the cellular process, with a score of −0.952733511 (Table 3). (b) Molecular function prediction: We used COFACTOR to predict the probable molecular activity of PE_PGRS39. The top predicted GO term was structural molecule activity (CscoreGO = 0.31), that is, contributing to the structural integrity of a complex or its assembly within or outside a cell. Other predicted functions included receptor binding, macromolecular complex binding, and heterocyclic compound binding (Table 4). Here, CscoreGO is the confidence score for GO terms on the COFACTOR server; it ranges from 0 to 1, and a higher value indicates greater confidence in the predicted function. Although a CscoreGO of 0.31 indicates low confidence, we still considered it to gain a rough idea of the protein's possible function. (c) Phosphorylation site and functional disorder region prediction: The PONDR server identified active disordered regions within the PE_PGRS39 protein sequence. PE_PGRS39 contains 10 disordered regions (DRs), the longest of which spans 109 residues. In total, 291 residues were predicted to be disordered, giving an overall disorder of 70.46% (Fig. 3). DEPP predicted that 22 of 33 (73.3333%) serines and 6 of 21 (28.5714%) threonines were phosphorylated, whereas none of the 2 tyrosines (0.000%) were phosphorylated in our protein (Fig. 4).
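PONDR summarizes its per-residue predictions as the number of disordered regions, the longest region, the total number of disordered residues, and the overall percentage. A minimal sketch of how such summaries follow from per-residue scores is given below. It assumes a disorder threshold of 0.5 and counts every contiguous run as a region regardless of length; both are assumptions, and PONDR's own counting rules may differ.

```python
def disordered_regions(scores, threshold=0.5):
    """1-based (start, end) runs of residues whose score is >= threshold."""
    regions, start = [], None
    for i, score in enumerate(scores, start=1):
        if score >= threshold and start is None:
            start = i  # a disordered run begins
        elif score < threshold and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the C-terminus
    return regions


def disorder_summary(scores, threshold=0.5):
    """Region count, longest region, disordered residues, and percent disorder."""
    regions = disordered_regions(scores, threshold)
    lengths = [end - start + 1 for start, end in regions]
    n_disordered = sum(lengths)
    return {
        "regions": len(regions),
        "longest": max(lengths, default=0),
        "disordered_residues": n_disordered,
        "percent_disordered": 100.0 * n_disordered / len(scores),
    }


if __name__ == "__main__":
    toy_scores = [0.2, 0.6, 0.7, 0.1, 0.9]  # hypothetical per-residue scores
    print(disorder_summary(toy_scores))
    # Overall disorder reported for PE_PGRS39: 291 of 413 residues.
    print(f"{100 * 291 / 413:.2f}%")
```

The last line reproduces the reported 70.46% from 291 disordered residues out of 413.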
(a) Secondary structure prediction: PSIPRED predicted that the secondary structure of PE_PGRS39 contains 67.12% coil, 32.4% alpha-helix, and 0.48% beta-strand. The predicted secondary structure is represented pictorially in Fig. 5. (b) Tertiary structure modeling: The tertiary structure of the protein was predicted with I-TASSER. Initially, ten threading templates were obtained from the PDB structural database, with Z-scores ranging from 1.18 to 4.22; from these, the five best models were selected, with C-scores ranging from −3.15 to −0.68, where a higher C-score means greater confidence. The model with a C-score of −0.68, an estimated TM-score of 0.63 ± 0.14, and an estimated RMSD of 8.4 ± 4.5 Å was chosen for further analysis (Fig. 6a). A TM-score greater than 0.5 indicates a model of correct topology, whereas a TM-score below 0.17 implies random similarity. (c) Tertiary structure refinement: We used the GalaxyRefine server to refine the I-TASSER model with the highest C-score. The refinement changed the initial model in terms of GDT-HA, RMSD, and MolProbity score. Based on the quality scores of all refined models, one model was selected as the final model, with MolProbity 2.298, GDT-HA 0.9546, RMSD 0.414, clash score 15.8, poor rotamers score 0.4, and a Ramachandran favored score of 88.1 (Fig. 6b). (d) Final refined structure validation: The refined I-TASSER structure from GalaxyRefine was verified using the SAVES v5.0 server, a package of five programs. The VERIFY-3D program showed that 80.63% of the residues of the PE_PGRS39 model had an averaged 3D-1D score of at least 0.2, so the model passed the VERIFY-3D criterion (at least 80% of residues). An ERRAT overall quality factor of 84.02 further indicated good overall structure quality. The Ramachandran plot was evaluated with PROCHECK.
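The TM-score thresholds quoted above (greater than 0.5 for correct topology, below 0.17 for random similarity) refer to the standard TM-score, TM = (1/L) Σ 1 / (1 + (d_i / d0)^2), where d_i is the distance between the i-th pair of aligned residues, L is the length of the reference protein, and d0 = 1.24 (L − 15)^(1/3) − 1.8 Å. The sketch below evaluates this for one fixed superposition. It assumes the distances are already computed after superposition, whereas the real TM-score program searches for the superposition that maximizes the score; the 0.5 Å floor on d0 for short proteins is also an assumption of this sketch.

```python
def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (Angstrom), floored at 0.5."""
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances     -- Angstrom distances between aligned residue pairs after
                     superposition (assumed precomputed)
    target_length -- length L of the reference protein used for normalization
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    L = 413  # length of PE_PGRS39
    print(f"d0 for L = {L}: {tm_d0(L):.2f} A")
    # Hypothetical case: every residue deviates by exactly d0.
    print(f"TM-score: {tm_score([tm_d0(L)] * L, L):.2f}")
```

Because each residue contributes at most 1/L and contributions decay smoothly with distance, the TM-score is less sensitive to a few badly placed regions than RMSD, which is why a model can have a TM-score of 0.63 despite an estimated RMSD of 8.4 Å.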
The Ramachandran plot showed that 82.7% of residues were in the most favored regions (Fig. 6c), 13.0% in additionally allowed regions, and 1.7% in generously allowed regions, while only 2.7% of the residues were in disallowed regions. (In the cartoon representation, helices, strands, and coils are shown in pink, yellow, and grey, respectively.) These data confirmed that the predicted model could be used for further analysis. Additionally, analysis of the model with the ProSA server gave a negative Z-score of −3.09, which indicates good overall model quality. (a) Protein-protein docking: To confirm the above results and to determine the binding affinity of each interacting host protein, molecular docking was performed, and the interaction of PE_PGRS39 with integrins and the SH3 domain was modeled with pyDock. We performed docking with three integrins: 4m76 (β2 integrin; score −129.39), 4o02 (β3 integrin; score −122.25), and 3vi3 (α5β1 integrin). Of these, α5β1 showed the highest affinity, with a total binding energy of −138.678. The α5β1 integrin acts as a trigger for the recruitment of various protein assemblies and strengthens cell adhesion (Speziale et al. 2019). Further, we used PDB entry 1W1F, which represents a human SH3 domain, and found that this SH3 domain binds specifically to the PXXP motif of PE_PGRS39, with a total binding energy of −91.782. SH3-domain-containing proteins are involved in cytoplasmic signaling and perform various essential functions. Molecular visualization was carried out with PyMOL in cartoon and surface representations (Fig. 7). (b) Protein-ligand docking: Docking simulations were performed with purine nucleotides. Among them, GTP (Fig. 8a) showed the highest affinity for PE_PGRS39 in AutoDock Vina docking, with a ΔG score of −7.8 kcal/mol (Fig. 8b).
This result indicates that GTP has a high binding affinity for PE_PGRS39. For further analysis, we used Discovery Studio to visualize the interactions of GTP with amino acid residues. GTP formed four carbon-hydrogen bonds with the protein, involving residues HIS152, GLY188, GLY141, and GLY129, and two conventional hydrogen bonds with GLY141 and GLU150. GTP also showed electrostatic interactions with ASP139 and ARG122 (Fig. 8c). Additionally, the interaction of the protein with a calcium ligand was confirmed by COACH-D (Fig. 8d). Missense mutations can substantially alter protein binding; therefore, to predict the effect of mutations at potential binding sites, the EASE-MM web server was used. We predict that mutations at the RGD, PXXP, calcium-binding, and GTP-binding sites can affect binding in one way or another. For the RGD motif, the most extreme ΔΔG value, −1.6171 kcal/mol, was predicted when glycine at position 299 was replaced with tryptophan, suggesting a substantial decrease in protein stability. For the SH3-domain binding site, mutation at position P336 showed the highest destabilization (ΔΔG = −1.4727 kcal/mol). Mutations to tryptophan at G114 in the GTP-binding site and at G135 in the calcium-binding site showed destabilization of −1.5209 and −1.7457 kcal/mol, respectively. Antigenicity and epitope prediction: (a) Protein antigenicity prediction: The immunogenicity of PE_PGRS39 was analyzed with the alignment-independent server VaxiJen v2.0. The evaluation threshold was kept at 0.4 to predict whether PE_PGRS39 is antigenic or non-antigenic. The protein was predicted to be antigenic, with an overall score of 1.0072 (probable ANTIGEN). To benchmark this result, we also analyzed an experimentally proven immunogenic Mtb protein (Schierloh et al. 2014), aceE, which contributes to acetyl-CoA production.
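The substitutions above use the usual wild-type/position/mutant notation (for example, the glycine-to-tryptophan substitution at position 299 can be written G299W). The helpers below are hypothetical utilities for building such substitution lists and mutated sequences, for instance a full scan of all 19 alternatives at one motif position; they do not reproduce EASE-MM's input format, and the ΔΔG values themselves come from EASE-MM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split wild-type/position/mutant notation such as 'G299W'."""
    return code[0], int(code[1:-1]), code[-1]


def apply_mutation(seq: str, code: str) -> str:
    """Return seq with the substitution applied (positions are 1-based)."""
    wild_type, position, mutant = parse_mutation(code)
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"{code}: residue {position} is {seq[position - 1]}, not {wild_type}"
        )
    return seq[:position - 1] + mutant + seq[position:]


def saturation_scan(seq: str, position: int) -> list[str]:
    """All 19 single substitutions at one position, e.g. within a binding motif."""
    wild_type = seq[position - 1]
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


if __name__ == "__main__":
    toy = "ARGDA"  # hypothetical peptide with an RGD motif at positions 2 to 4
    print(apply_mutation(toy, "G3W"))
    print(saturation_scan(toy, 3))
```

Checking the wild-type residue before substituting catches off-by-one or numbering mismatches, which are easy to introduce when positions are transcribed between tools.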
Using the same VaxiJen v2.0 server at the 0.4 threshold, aceE was predicted to be a probable antigen with a score of 0.4770. Because the score of PE_PGRS39 is higher than that of this proven immunogenic protein, PE_PGRS39 is likely to be highly antigenic. (b) B-cell epitope prediction: We used ABCpred to identify B-cell epitopes in PE_PGRS39, with the default window of 16 amino acids. The highest score, 0.91, was obtained for the region "GRDIVGSVRGDGGVGM" starting at residue 221 (Table 5). (c) T-cell epitope prediction: We predicted T-cell epitopes using HLAPred. Among the MHC class II binders, we found that the epitope "IVGSVRGDG", which contains the RGD sequence and starts at residue 224, binds to MHC class II. A sequence search against the human genome showed no identity with human sequences, indicating that this predicted binder could be a vaccine candidate. TB is a dangerous bacterial disease that causes numerous deaths every year. In 2019, an estimated 1.4 million people died from TB, including 208,000 people with HIV (WHO Report 2020). Despite vigorous research on Mtb, several of its proteins have not been studied, and apart from BCG there is still no approach to prevent and eliminate TB. Studies show that BCG can protect against TB, but it does not address drug-resistant TB. Thus, there is a strong need to understand how different Mtb proteins interact with the host in order to improve therapeutics. Several bacteria have evolved to interact with host cell components through short linear protein motifs that mimic host short linear motifs. This interface similarity allows pathogens to cleverly alter host signaling, thus facilitating their internalization and pathogenesis (Guven-Maiorov et al. 2017). In this manuscript, we have analyzed the uncharacterized PE_PGRS39 protein functionally and structurally to clarify its role in pathogenesis.
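ABCpred scores fixed-length peptides taken from the sequence, here 16 residues per window, and reports each hit by its 1-based start position. The sketch below shows only the windowing and position bookkeeping (ABCpred's neural-network scoring is not reproduced) and uses the epitopes reported above to check that the MHC class II binder lies inside the top B-cell epitope.

```python
def sliding_windows(seq: str, size: int = 16) -> list[tuple[int, str]]:
    """(1-based start, peptide) for every window of `size` residues."""
    return [(i + 1, seq[i:i + size]) for i in range(len(seq) - size + 1)]


# Values reported in the text (Table 5 and the HLAPred result).
B_CELL, B_START = "GRDIVGSVRGDGGVGM", 221
T_CELL, T_START = "IVGSVRGDG", 224

if __name__ == "__main__":
    offset = B_CELL.find(T_CELL)  # 0-based offset of the 9-mer inside the 16-mer
    print("T-cell epitope lies within B-cell epitope:", B_START + offset == T_START)
    print("RGD starts at residue", B_START + B_CELL.find("RGD"))
    # A protein of length n yields n - 16 + 1 candidate windows.
    print(len(sliding_windows("ACDEFGHIKLMNPQRSTVWY")))
```

A 413-residue protein such as PE_PGRS39 therefore yields 398 overlapping 16-mer candidates, of which only those scoring above the 0.5 threshold are reported.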
Further, considering the importance of PE_PGRS39 in host-pathogen interaction and molecular mimicry, we looked for clues for vaccine development (Fig. 9). PE_PGRS39 is a glycine-rich protein of 413 amino acids with a molecular mass of 38722.3 Da (Table 1). Using the subcellular localization tool CELLO2GO, we found that PE_PGRS39 is an extracellular protein, which implies that it might play a direct role in host-pathogen interaction (Fig. 2, Table 2). Further functional analysis with VICMpred and COFACTOR indicated that PE_PGRS39 is involved in the cellular process (Table 3) and has structural molecule activity as its molecular function (Table 4). The DEPP server showed that 73.3333% of serines, 28.5714% of threonines, and no tyrosines were phosphorylated in PE_PGRS39 (Fig. 4). The PONDR server predicted functional disordered regions (Fig. 3). It is well established that disordered protein regions are functionally active and frequently contain short linear peptide motifs, such as SH3-domain binding sites and targeting signals essential for protein function (Linding et al. 2003). Further supporting the functional analysis, several motifs, namely RGD (cell-attachment motif), PXXP, DXXG, and GGXGXD, were identified in PE_PGRS39 (Fig. 1); these bind the integrin receptor, host SH3-domain-containing proteins, GTP, and calcium, respectively. To confirm the interaction between PE_PGRS39 and the host, structural and interaction analyses were also performed. For this, we generated the secondary (Fig. 5) and tertiary structures of PE_PGRS39, and the refinement and validation values showed that the predicted protein model is of good quality (Fig. 6).
This good-quality model was further used in interaction analyses to validate the binding of PE_PGRS39 to host proteins (integrin and SH3 domain) and ligands (calcium and GTP), which showed high affinity for α5β1 integrin, GTP, and calcium (Fig. 8). The SH3 domain-PXXP docking, however, showed low affinity but high specificity (Fig. 7), which is well suited to transient protein signaling in the cellular context (Chandra et al. 2004). Previous studies have also shown that several bacteria use the PXXP motif to bind the host SH3 domain and manipulate host cell signaling pathways. In this regard, the cell wall-associated/secretory antigen Rv1917c, characterized by PXXP-SH3 binding, can be taken as a positive control: it has been shown to induce selective maturation of human dendritic cells by regulating PI3K-MAPK-NF-κB signaling and to shift the subsequent immune response toward the Th2 phenotype, which helps mycobacterial immune evasion (Bansal et al. 2010). Evidence shows that calcium (Ca2+) is a universal signaling ion, and GTP-binding proteins (G-proteins) have also proved to be highly conserved signaling molecules (Meena et al. 2008). Regarding calcium-binding proteins, studies have shown that various bacterial proteins, such as SdrC and SdrD in S. aureus, BapA in Paracoccus denitrificans, and ClfA and ClfB in S. aureus and Vibrio cholerae, bind and respond to host Ca2+ through their surface motifs, triggering the molecular mechanisms of adhesion, biofilm formation, and processes that enable the development of persistent infections (Elíes et al. 2020).
Fig. 8 Binding conformation of GTP in the protein binding site: (a) representation of the GTP ligand used for docking; (b) PyMOL representation of GTP-protein binding, with hydrogen bonds to amino acid residues; (c) different types of interaction between the GTP ligand and the PE_PGRS39 receptor; (d) calcium docking with COACH-D.
In this in silico experiment, after docking with various β1, β2, and β3 integrins, α5β1 (also known as the fibronectin receptor) showed the highest affinity. This result indicates that PE_PGRS39 might resemble host fibronectin, which contains an integrin-binding RGD motif on its extracellular surface. A similar phenomenon has been experimentally verified for the CagL protein of Helicobacter pylori, which binds α5β1 integrin and mimics fibronectin in triggering cell spreading, focal adhesion formation, and activation of several tyrosine kinases in an RGD-dependent manner; this lends experimental support to our prediction. That study showed that CagL activated tyrosine kinases such as focal adhesion kinase and Src; interestingly, host fibronectin also activates a similar range of tyrosine kinases (Tegtmeyer et al. 2010). Next, we asked whether these motifs are essential for PE_PGRS39-induced signaling. EASE-MM predicted that mutating the PXXP, calcium-binding, and GTP-binding motifs markedly decreases protein stability. A point mutation of glycine to tryptophan in the RGD motif also decreased stability, with a predicted ΔΔG (change in Gibbs free energy) of −1.6171 kcal/mol, suggesting that the RGD motif of PE_PGRS39 could be vital for provoking the host cell responses that are also stimulated by human fibronectin. MHC class II molecules have been shown to flexibly present peptides derived from extracellular antigens (Villadangos 2001). Thus, considering the importance of PE_PGRS39, we extended our analysis toward vaccine development. Using B-cell and MHC-II epitope prediction tools, we found that PE_PGRS39 is highly antigenic and that both the B-cell and T-cell epitopes contain the RGD sequence (Table 5).
Also, comparing the 16-mer epitope sequence with the human host genome showed no match, which means that these epitopes are unique to mycobacteria and could be used in vaccine development to evoke an immune response against the pathogen. Studies have shown that calcium-dependent signaling mechanisms are potential targets for inhibiting macrophage activation by Mtb (Watanabe et al. 1996). The design of peptidomimetics can be initiated by screening for distinctive PXXP-containing motifs in the Mtb proteome (Chandra et al. 2004). Therefore, targeting PE_PGRS39 to disrupt its functional features could also prove a successful step toward creating an effective tuberculosis drug. Further experimental analysis of this protein will open a door for host-pathogen interaction studies. In this study, various computational analyses, including functional, structural, interaction, and mutational analyses, were performed to highlight the role of the unexplored PE_PGRS39 protein. The investigation showed that PE_PGRS39 is a functionally important protein that can interact with host cell components through its short linear motifs. PE_PGRS39 contains the RGD (cell-attachment), PXXP, DXXG, and GGXGXD motifs, which bind the alpha-5 beta-1 (α5β1) integrin receptor, host SH3-domain-containing proteins, GTP, and calcium, respectively, with good affinity. Previous studies in other pathogens showed that such motifs mimic host motifs. Their interaction with the host could alter cellular processes via adhesion and cell signaling, giving them a potential direct role in pathogenesis. Further, PE_PGRS39 has the potential to evoke an immune response, which provides insight for vaccine development.
Comprehensive analysis of GTP cyclohydrolase I activity in Mycobacterium tuberculosis H37Rv via in silico studies
The short amino acid sequence Pro-His-Ser-Arg-Asn in human fibronectin enhances cell-adhesive function
Mycobacterial PE_PGRS proteins contain calcium-binding motifs with parallel β-roll folds
Src homology 3-interacting domain of Rv1917c of Mycobacterium tuberculosis induces selective maturation of human dendritic cells by regulating PI3K-MAPK-NF-κB signaling and drives Th2 immune responses
Structural prediction and mutational analysis of Rv3906c gene of Mycobacterium tuberculosis H37Rv to determine its essentiality in survival
Distribution of proline-rich (PxxP) motifs in distinct proteomes: functional and therapeutic implications for malaria and tuberculosis
pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking
In silico identification of common epitopes from pathogenic mycobacteria
In silico docking of urokinase plasminogen activator and integrins
Reconciling the spectrum of Sagittarius A* with a two-temperature plasma model
An update to calcium binding proteins
Infectious diseases: considerations for the 21st century
Mycobacterium tuberculosis uses Mce proteins to interfere with host cell signaling
EASE-MM: sequence-based prediction of mutation-induced stability changes with feature-based multiple models
Purification of cellular and organelle populations by fluorescence-activated cell sorting for proteome analysis
LocTree2 predicts localization for all domains of life
Prediction of host-pathogen interactions for Helicobacter pylori by interface mimicry and implications to gastric cancer
GalaxyRefine: protein structure refinement driven by side-chain repacking
An overview of in silico vaccine design against different pathogens and cancer
In silico design and characterization of multiepitopes vaccine for SARS-CoV2 from its Spike proteins
Prominent role of FnBPs of Mycobacterium tuberculosis in cell adhesion, immune invasion and pathogenesis
TubercuList - 10 years after
Protein disorder prediction: implications for structural proteomics
Survival mechanisms of pathogenic Mycobacterium tuberculosis H37Rv
An overview to understand the role of PE_PGRS family proteins in Mycobacterium tuberculosis H37Rv and their potential as new drug targets
Interrelation of Ca2+ and PE_PGRS proteins during Mycobacterium tuberculosis pathogenesis
Cloning and characterization of a novel PE_PGRS60 protein (Rv3652) of Mycobacterium tuberculosis H37Rv exhibit fibronectin-binding property
To elucidate the association of Rv0526 gene with the pathogenic potential of Mycobacterium tuberculosis H37Rv
Cloning and characterization of GTP-binding proteins of Mycobacterium tuberculosis H37Rv
Vaccine against tuberculosis: what's new?
Biochemical characterization of PE_PGRS61 family protein of Mycobacterium tuberculosis H37Rv reveals the binding ability to fibronectin
Guanosine triphosphatases as novel therapeutic targets in tuberculosis
I-TASSER: a unified platform for automated protein structure and function prediction
COFACTOR: an accurate comparative algorithm for structure-based protein function annotation
VICMpred: an SVM-based method for the prediction of functional proteins of gram-negative bacteria using amino acid patterns and composition
Mycobacterial PE/PPE proteins at the host-pathogen interface
Differential expression of immunogenic proteins on virulent Mycobacterium tuberculosis clinical isolates
In silico screening of protein Rv3228 to have a vision towards survival and pathogenesis of Mycobacterium tuberculosis H37Rv
In silico analysis of protein
Fibronectin and its role in human infective diseases
A small fibronectin-mimicking protein from bacteria induces cell spreading and focal adhesion formation
How pathogens use linear motifs to perturb host cell networks
Presentation of antigens by MHC class II molecules: getting the most out of them
Role of calcium in tumor necrosis factor-α production by activated macrophages
COACH-D: improved protein-ligand binding sites prediction with refined ligand-binding poses through molecular docking
CELLO2GO: a web server for protein subCELlular lOcalization prediction with functional gene ontology annotation
I-TASSER server for protein 3D structure prediction
RGD motif of lipoprotein T, involved in adhesion of Mycoplasma conjunctivae to lamb synovial tissue cells

The authors acknowledge financial support from