key: cord-0738346-9lpxjpeg authors: Tsai, Cheng-Yu; Chiou, Shean-Jaw; Ko, Huey-Jiun; Cheng, Yu-Fan; Lin, Sin-Yi; Lai, Yun-Ling; Lin, Chen-Yen; Wang, Chihuei; Cheng, Jiin-Tsuey; Liu, Hsin-Fu; Kwan, Aij-Li; Loh, Joon-Khim; Hong, Yi-Ren title: Deciphering the evolution of composite-type GSKIP in mitochondria and Wnt signaling pathways date: 2022-01-20 journal: PLoS One DOI: 10.1371/journal.pone.0262138 sha: 28b4223827724d868cb7e8c055faa1293ca62cc0 doc_id: 738346 cord_uid: 9lpxjpeg We previously revealed the origin of mammalian simple-type glycogen synthase kinase interaction protein (GSKIP), which served as a scavenger and a competitor in the Wnt signaling pathway during evolution. In this study, we investigated the conserved and nonconserved regions of the composite-type GSKIP by utilizing bioinformatics tools, site-directed mutagenesis, and yeast two-hybrid methods. The regions were denoted as the pre-GSK3β binding site, which is located at the front of GSK3β-binding sites. Our data demonstrated that clustered mitochondria protein 1 (CLU1), a type of composite-type GSKIP that exists in the mitochondria of all eukaryotic organisms, possesses the protein known as domain of unknown function 727 (DUF727), with a pre-GSK3β-binding site and a mutant GSK3β-binding flanking region. Another type of composite-type GSKIP, armadillo repeat containing 4 (ARMC4), which is known for cilium movement in vertebrates, contains an unintegrated DUF727 flanking region with a pre-GSK3β-binding site (115SPxF118) only. In addition, the sequence of the GSK3β-binding site in CLU1 revealed that Q126L and V130L were not conserved, differing from the ideal GSK3β-binding sequence of simple-type GSKIP. We further illustrated two exceptions, namely 70 kilodalton heat shock proteins (Hsp70/DnaK) and Mitofilin in nematodes, that presented an unexpected ideal GSK3β-binding region with a pre-GSK3β sequence; this composite-type GSKIP could only occur in vertebrate species. 
Furthermore, we revealed the importance of the pre-GSK3β-binding site (118F or 118Y) and various mutant GSK3β-binding sites of composite-type GSKIP. Collectively, our data suggest that the new composite-type GSKIP starts with a DUF727 domain followed by a pre-GSK3β-binding site, with the subsequent addition of the GSK3β-binding site, which plays vital roles for CLU1, Mitofilin, and ARMC4 in mitochondria and Wnt signaling pathways during evolution.
UniProt was used in combination with InterPro [21] [22] [23] [24] [25], and the results were compared with those available at the National Center for Biotechnology Information (NCBI) website for domain enrichment, using the E-value as the parameter. Gene-based tests were performed using the following keywords as indicators: GSKIP, DUF727, GSK3β, CLU1, and ARMC4, separately and in combination with GSKIP-related data (S1 Fig). We used FlyBase version FB2014_04 to identify conserved regions in InterPro. PANTHER is a protein classification system based on evolutionary relationships. It is a large database of gene/protein families and their functionally related subfamilies and can be used to classify and identify the function of gene products [26, 27]. We used ClustalW [28] for multiple sequence alignment of GSKIP with respect to DUF727 (amino acids 50-100), the pre-GSK3β site (amino acids 115-118), and the GSK3β-binding domain (amino acids 122-130). The maximum likelihood method was used to generate an unrooted phylogenetic tree of CLU1 from 20 species; a phylogenetic tree of 18 species was also constructed for one bacterium and ARMC4 orthologs [29] [30] [31]. ClustalW was used to confirm the tree structure [28]. The accession numbers of SSF103107, PF05303, IPR007967, and IPR023231 of all species in the UniProt search (S1 Table) and the sequence alignment and phylogenetic analysis of CLU1 and ARMC4 (S1 Fig) were used as parameters.
The T-Coffee method was used to align full-length CLU1 and ARMC4 proteins and their fragmentary proteins spanning the SSF103107 unintegrated superfamily [32] [33] [34] . The MEGA-X program was used to produce the best-fitting amino acid substitution model [29] [30] [31] . We used the maximum likelihood method from MEGA-X to reconstruct phylogenetic trees with the LG substitution model [35, 36] . Boot strapping was used to evaluate the robustness of the phylogenetic trees. The initial three-dimensional (3D) NMR structural model of GSKIP (PDB ID: 1SGO) was obtained as previously described [20] . The NorthEast Structural Genomics consortium was used to obtain the human NMR structure of GSKIP (C14orf129, HSPC210; PDB ID: 1SGO). Next, the 3D structure of the ACT domain, which folds with a ferredoxin-like βαββαβ topology [37] , was determined. The structure was minimized for 100,000 conjugate gradient steps and then subjected to 100-s isothermal, constant-volume MD simulation. The final structure was used in domain comparisons (GSKIP, ACT domain, and CLUH-KIAA0664-SSF103107) [1-4, 6, 7, 37] . To illustrate the consensus sequence logos of GSKIP, we presented typical simple-type logos and several composite types. The rules were adapted from those used by the sequence comparison website PROSITE (http://prosite.expasy.org/sequence_logo.html) [38] , and standard oneletter codes were used for amino acids. The helices found in peptides and proteins are commonly modeled in two dimensions [39] . They can offer a view of the central axis in a protein. Wheel and net projections have been used to represent the two dimensions of 3D helical structures, and they enable the observation of the helical structural properties, especially in terms of residue polarity and intramolecular bonding. 
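As a minimal sketch of the helical wheel projection described above, the following Python snippet places each residue of a helical segment at 100° increments around the helix axis (360° divided by 3.6 residues per turn for an ideal α-helix). The function name `helical_wheel` and the nine-residue test segment are hypothetical; the segment only mimics the F-x-x-x-L-x-x-R-L spacing of positions 122-130 and is not a real GSKIP sequence.

```python
import math

# Ideal alpha-helix: 3.6 residues per turn, i.e. 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0


def helical_wheel(segment, start_position=1):
    """Return (position, residue, angle_deg, x, y) for each residue in a helical segment."""
    points = []
    for offset, residue in enumerate(segment):
        angle = (offset * DEGREES_PER_RESIDUE) % 360.0
        rad = math.radians(angle)
        # x, y are coordinates on a unit circle viewed down the helix axis.
        points.append((start_position + offset, residue, angle, math.cos(rad), math.sin(rad)))
    return points


if __name__ == "__main__":
    # Hypothetical segment; numbering starts at 122 to mirror the motif positions in the text.
    for pos, res, angle, x, y in helical_wheel("FAAALAARL", start_position=122):
        print(f"{res}{pos}: {angle:5.1f} deg  ({x:+.2f}, {y:+.2f})")
```

In this projection, residues spaced four apart (here positions 122, 126, and 130) fall within 80° of one another, which is why such residues tend to line up on the same face of a helix.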
We used the helical wheel diagram to determine the distribution of amino acid residues in a helical segment within the sequence of simple-type GSKIP and various composite-type variants and thereby distinguish the two types. The pACT2-GSKIP and pAS2-1-GSK3β plasmids were constructed for the yeast two-hybrid assay as described previously [40] [41] [42]. Briefly, GSK3β was cloned in-frame with the Gal4 DNA-binding domain in the pAS2-1 vector (MATCHMAKER Two-Hybrid System 2, Clontech) to yield the pAS2-1-GSK3β bait plasmid. In addition, DNA fragments encoding GSKIP were amplified through PCR using Taq polymerase (TaKaRa). The PCR fragments were then inserted into the BamHI and XhoI sites of the pACT2 (Clontech) vector to construct the pACT2-GSKIP plasmid. The GSKIP Y118P, Y118A, Y118F, F122P, F122A, F122Y, L126P, L126A, L126V, L126Q, L130P, L130A, L130I, and L130V mutants were created through site-directed mutagenesis using the QuikChange Lightning kit (GE Healthcare, Sunnyvale, CA, USA). Mutated nucleotides were verified through DNA sequencing on an ABI PRISM 3730 Genetic Analyzer (Perkin-Elmer). All experimental procedures were performed in accordance with the manufacturers' protocols. Yeast two-hybrid screening was performed using the MATCHMAKER Two-Hybrid System 2 (Clontech) [40] [41] [42]. YRG-2 yeast host cells were purchased from Stratagene. The pAS2-1 and pACT2 plasmids were cotransfected, and transformants were selected on G2 plates deficient in tryptophan and leucine and on G3 plates deficient in histidine. The genotype of the yeast host cells was MATa ura3-52 his3-200 ade2-101 lys2-801 trp1-901 leu2-3 112 gal4-542 gal80-538 LYS2::UASGAL1-TATA GAL1-HIS3 URA3::UASGAL4 17mers(x3)-TATACYC1-lacZ. A visible blue color in the colony filter lift assay on the G3 plates represented a positive interaction [41].
YRG-2 yeast cells were cotransfected with pACT2-GSKIP and an empty pAS2-1 vector and spread on G2 and G3 agar plates to determine the growth-inhibiting effect of GSKIP in yeast.
UniProt was used in combination with InterPro for domain enrichment analyses. The results were compared with those in the NCBI database through stringent and extended gene-based tests for ranking the genes (Fig 1A, 1B and 1C). Our survey revealed that several domains were enriched for protein entries. The evolution of simple-type GSKIP and the GSK3β- and PKA RII-binding domains of 52 species has already been clearly demonstrated in previous reports [9, 20]. In this study, we extended the findings to four composite-type GSKIPs, namely CLU1 and ARMC4 (through the gene fusion mechanism presented in Fig 1A, 1B and 1C and in the next subsection) and two proteins with sporadic occurrence in nematodes (Mitofilin and Hsp70/DnaK; see the succeeding section on the hijack mechanism). CLU1 and ARMC4 both contain DUF727; however, CLU1 also contains CLU-N, CLU, CLU central, winged helix-like DNA-binding, and tetratricopeptide-like helical domains, whereas ARMC4 contains only 13X Armadillo repeats [43, 44]. DUF refers to a "domain of unknown function" [19] [20] [21] [22] [23] [45]. A short domain in the clustered mitochondria protein is involved in its mitochondrial cytoplasmic distribution [46, 47]. Moreover, ARMC4 in vertebrates has been identified as a multiprotein complex responsible for cilia movement; it is necessary for targeting and anchoring outer dynein arms [43, 44]. In the present study, data mining was performed to search the available protein entries, with the results yielding 210 of 936 (22%) with respect to CLU1, 198 of 936 (21%) with respect to CLU1/DUF727, and 210 of 936 (22%) with respect to CLU1/GSKIP. Additionally, the hidden (cryptic) superfamily of DUF727 (SSF103107) was indicated to be an unintegrated family. The hypothetical protein c14orf129 hspc210 superfamily entry could enrich entries for the GSKIP (SSF103107) domain to 1869/1872 (99.8%; Fig 1A, 4 vs. 1). When ARMC4 was retrieved (all entries must be searched to find the hidden code SSF103107), only the hidden (cryptic) superfamily of DUF727 (identified as the unintegrated family SSF103107) with 38/449 entries (8%) was shown to contain a pre-GSK3-binding site (115SPxF118) containing F118 instead of Y118 as compared with the normal GSK3-binding site (more details on the comparisons are provided in subsequent discussion).
We occasionally found composite-type GSKIP in two bacterial species, one of which was the gram-positive Desulfuribacillus alkaliarsenatis (A0A1E5G502_9BACL). The acetoin-utilization protein AcuB contained the CBS and ACT domains (CBS, 1-138 aa; ACT, 139-212 aa; GSKIP; DUF727, 154-204 aa), and DUF727 was located within the ACT domain as composite GSKIP. The ACT domain was identified in a PSI-BLAST search. Escherichia coli 3PGDH was the first protein with an ACT domain discovered to fold into a ferredoxin-like βαββαβ topology [37] (Fig 1D). The ACT domain is found in a variety of contexts and may be a conserved regulatory ligand-binding fold. However, DUF727 has an antiparallel ββββ topology (PDB ID: 1SGO, NMR structure from the Northeast Structural Genomics consortium; Fig 1E), with a β-sheet homology (ferredoxin-like βαββαβ topology; Fig 1D right panel compared with Fig 1E) similar to that of the human CLUH-KIAA0664-SSF103107 superfamily domain (Fig 1E).
We propose that during evolution, GSKIP may have originated from DUF727, as found in bacteria, and may have acquired the pre-GSK3β-binding motif and GSK3β-binding domain to become the ancestor of simple-type GSKIP. Through evolution, simple-type GSKIP later acquired the PKA RII-binding domain as GSKIP/AKAP in vertebrates, whereas simple-type GSKIP has been retained in invertebrates. For composite-type GSKIP, one bacterium was found to contain DUF727 with the ACT domain, whereas in all eukaryotes, DUF727 was incorrectly recognized as GSKIP. Although DUF727 lacks the ideal GSK3β-binding domain, it was still counted among the CLU/TIF31 and CLUH/KIAA0664 proteins. In some vertebrates, the GSKIP domain is also found in an ARMC4-containing protein with a pre-GSK3β-binding site only. Two sporadic proteins, Mitofilin and Hsp70/DnaK, were found in invertebrates containing perfect pre-GSK3β-binding and GSK3β-binding sites (Fig 1F). We summarize four groups of composite-type GSKIP in Fig 1G.
Fig 1. A, Evolved simple-type GSKIP from DUF727 to pre-GSK3β and then evolution to the GSK3β-binding site. The results of UniProtKB data retrieval using keywords are provided. When using GSKIP as a keyword, a total of 2234 protein entries were found, but the records were reduced when other terms such as DUF727, GSK3β, and PKA were added. B, When using CLU1 as a keyword, 936 protein entries were found, and 210 protein entries were convergent with the keywords "DUF727" or "GSKIP." C, Using ARMC4 as a keyword resulted in 449 protein entries, but only 38 records remained as the SSF103107 hidden code (marked as a cryptic code; see the text for details) when DUF727 or GSKIP was combined with ARMC4 for data mining. D, E. coli 3PGDH with a pair of ACT domains formed an eight-stranded antiparallel sheet whose 3D structure was determined (left) to fold into a ferredoxin-like βαββαβ topology (red box, right panel). E, NMR analysis of DUF727 (PDB ID: 1SGO) with the central part of antiparallel ββββ topology (red box, right panel) based on the Northeast Structural Genomics consortium (left). F, All queries of previous simple-type GSKIP were summarized using UniProt and Panther data mining (see the text for details). G, Four groups of composite-type GSKIP: AcuB, CLU1, ARMC4, and sporadic (Hsp70/DnaK and Mitofilin). https://doi.org/10.1371/journal.pone.0262138.g001
To determine the conserved sequences of GSK3β-binding regions and DUF727 in CLU1 across species, we first selected one bacterial prototype and 20 GSKIP orthologs and used ClustalW and MEGA-X to construct a phylogenetic tree (Figs 2 and S1A). These 20 species [...]. Additionally, ClustalW was combined with the T-Coffee web-based program and MEGA-X to ensure a tree structure (Fig 3). Both a GSK3β-binding motif (122Fxxx126LxxR/K/QL130) and a pre-GSK3-binding motif (115SPxF118 rather than 115SPxY118 in simple-type GSKIP) were found in most of the GSKIP orthologs (Fig 3). We found that two distinct bacteria contained composite-type GSKIP based on the observation of the aligned sequences without the GSK3β-binding site, which revealed that they are primitive [5, 6, 20]. Why and how composite-type GSKIP evolved a pre-GSK3β-binding site prior to the GSK3β functional domain should be investigated, and such an investigation could provide some indication of the role of the GSK3β-binding site in composite-type GSKIP evolution. When DUF727 is regarded as an initial domain, the CLU1 mitochondria family [48, 49] and the ARMC4 family of proteins that emerge [43, 44] are composite-type GSKIPs. By contrast, in simple-type GSKIP, the GSK3β-binding region is conserved with some residues, indicating that the 122Fxxx126L/QxxR/KV/G/A130 region is essential. The Leu130 residue has been characterized as being essential for GSK3β binding in humans [1, 6, 20], indicating that GSK3β binding is inactivated when this site is modified. In the present study, this modification always occurred in CLU1 (Table 1), indicating that CLU1 is still in the process of evolution.
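The motif definitions above lend themselves to a simple pattern search. The sketch below, written under stated assumptions, scans a protein sequence for the pre-GSK3β-binding site (S-P-x-F/Y) and for the full arrangement in which that site is followed, three residues later, by an F-x-x-x-[L/Q/V]-x-x-[R/K/Q]-[L/V/G/A] stretch. The residue sets are simplified from the consensus strings quoted in the text, the labels and the function name `classify_motif` are hypothetical, and the test sequences are invented toy strings rather than real orthologs.

```python
import re

# Assumption: residue sets are simplified from the consensus strings in the text
# (115SPxF/Y118 and 122Fxxx126L/Q/VxxR/K/QL/V/G/A130); they are illustrative only.
FULL_MOTIF = re.compile(r"SP.[FY].{3}F.{3}([LQV]).{2}[RKQ]([LVGA])")
PRE_SITE = re.compile(r"SP.[FY]")


def classify_motif(sequence):
    """Return (label, start_index) describing which motif arrangement is present."""
    full = FULL_MOTIF.search(sequence)
    if full:
        # "Ideal" here means Leu at both the 126- and 130-equivalent positions.
        ideal = full.group(1) == "L" and full.group(2) == "L"
        return ("pre+ideal" if ideal else "pre+variant", full.start())
    pre = PRE_SITE.search(sequence)
    if pre:
        return ("pre-only", pre.start())
    return ("none", -1)


if __name__ == "__main__":
    # Hypothetical toy sequences, one per arrangement discussed in the text.
    examples = {
        "ideal-like": "MKSPAYAAAFAAALAARLGG",
        "CLU1-like": "SPEFAAAFAAAQAARV",
        "ARMC4-like": "GGSPTFGGGG",
        "unrelated": "MAAAGG",
    }
    for name, seq in examples.items():
        print(name, classify_motif(seq))
```

A regular expression only checks residue spacing and identity; it says nothing about whether the region is helical or binds GSK3β, so hits of this kind would still need the alignment and yeast two-hybrid evidence described in the text.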
Of note, the modification of the ARMC4-pre-GSK3 site 118F to 118Y abolished its interaction with GSK3 (Table 1) , indicating the same evolutionary process of the ARMC4-pre-GSK3β site as that of the CLU1 family. When using the helical wheel diagram, the comparisons revealed various V126L residues, but no 130L residue, in CLU1 (Fig 4, GSKIP: 122-139 aa with its various mutants) and Logo 3.6.0 (Fig 5A-5D ). These data enabled the evaluation of whether the consensus sequence is conserved in pre-GSK3-(115SPxF/Y118; Fig 5B and 5C compared with Fig 5A) and GSK3β-binding sites (122Lxxx126LxxK/RL130, Fig 5D compared with Fig 5A) . Crucially, three composite GSKIP variants (126Q, 126V, and 130V) from different species were found to bind GSK3 by Y2H (Table 1 ). In addition, the DUF727 domain seemed to gradually become a pre-GSK3β-binding site (115SPxF118) as a flanking region with the consensus sequence site 115SPxF118xxx122Fxxx126LxxR/KL130, whereas in vertebrates, ARMC4 contained a hidden code SSF103107 (unintegrated DUF727 superfamily, Fig 5C) flanking region with pre-GSK3β-binding sites at 115SPxF118 only (see the next subsection below). In our previous study, we used the keywords DUF727 (PF05303) and GSKIP (IPR007967) to search for ARMC4. Subsequently, DUF727 and GSKIP were eliminated as keywords from the amino acid sequences. The final dataset included a total of 161 positions. Phylogenetic tree generated using MEGA-X [31] . The bootstrap values represented as the likelihood function for each species as indicated. https://doi.org/10.1371/journal.pone.0262138.g002 Deciphering the evolution of composite-type GSKIP in mitochondria and Wnt signaling pathways website. Only the hidden code SSF103107 was used as a metaphor for DUF727 and GSKIP for searching for ARMC4. Although the hidden code has been classified into the DUF727 superfamily, the hidden code SSF103107 domain shares homology with DUF727 (PF05303) and GSKIP (IPR007967). 
Therefore, we checked all ARMC4 family queries, and 38 of 449 species (8%) were found to contain the hidden code SSF103107, with protein evolution through gene (domain) fusion. Because only the pre-GSK3β-binding site (115SPxF118) was retained in the vertebrate ARMC4 family, the GSK3β-binding site is not yet present at this stage. The pre-GSK3β-binding site (115SPxF118) may exist as a primitive type in vertebrates. We selected 18 SSF103107 orthologs from vertebrates to determine their alignment and used T-Coffee to construct a phylogenetic tree of the ARMC4 proteins that evolved through gene fusion in 8% of vertebrate species (S1B Fig). We selected SSF103107 orthologs, and ClustalW and MEGA-X were used for phylogenetic tree construction (S1C Fig).
The protein encoded by ARMC4 contains 10 armadillo repeat motifs (ARMs) and one HEAT repeat and is thought to be involved in ciliary and flagellar movement [43, 44]. The term armadillo is derived from the historical name of the β-catenin gene in the fruit fly Drosophila, where the armadillo repeat was first discovered. Although β-catenin was previously thought to be a protein involved in linking cadherin cell adhesion proteins to the cytoskeleton, a recent study indicated that β-catenin regulates the homodimerization of alpha-catenin, which in turn controls actin branching and bundling [50]. However, the armadillo repeat has been found in a wide range of proteins with other functions. This protein domain plays a vital role in transducing Wnt signals during embryonic development [51].
As described in the earlier text, we found that a substantial number of DUF727 domain insertions into CLU1 (in mitochondria) together with ARMC4 (in the Wnt pathway) may constitute the gene fusion recombination mechanism. We also detected the pre-GSK3β-binding site in ARMC4 (115SPxF118, Fig 5C), whereas the CLU1 family gradually extended from the pre-GSK3β-binding site to GSK3β-binding sites (115SPxF118xxxF122xxxQ/V126xxRV130, Fig 5B). However, no PKA-binding sites were found in composite-type GSKIP compared with simple-type GSKIP. We discovered evidence that the contribution of domain fusion to the evolution of multidomain proteins ranges from a lower bound of 63% in invertebrates to an upper bound of 94% in vertebrates in the CLU1 family (Fig 3). By contrast, in the vertebrate ARMC4 family, a cryptic (unintegrated) superfamily DUF727 (hidden code SSF103107) was also found to bind to the DUF727 domain in 8% of species with a pre-GSK3β-binding site (115SPxF118) (20 species compared with one bacterium). The association of DUF727 with the pre-GSK3β-binding site in vertebrate ARMC4 suggests that DUF727 (SSF103107) originates in a primitive stage of evolution. We suggest that the gene fusion mechanism is a major contributor to the evolution of the CLU1 (in mitochondria) and ARMC4 (in the Wnt pathway) families in composite-type GSKIP.
The β-galactosidase filter assay was also used for semiquantitative analysis.
Members of the Hsp70 family are ubiquitously expressed and highly conserved; for example, the major form of Hsp70 from E. coli, termed DnaK, is approximately 50% identical to human Hsp70s. Hsp70 chaperone-assisted folding involves repeated cycles of substrate binding and release. Hsp70 activity is ATP-dependent. Hsp70 proteins comprise two regions: the amino terminus, which is the ATPase domain, and the carboxyl terminus, which is the substrate-binding region [52, 53]. Unexpectedly, in this study, we demonstrated the noncanonical order of the sporadic Hsp70/DnaK (A0A261CFR1_9PELO, C. latens; A0A1I7XTM9_HETBA, Heterorhabditis bacteriophora; A0A368GFB0_ANCCA, Ancylostoma caninum; and A0A0D6M4D9_9BILA, A.
ceylanicum) and Mitofilin (H3EKR7_PRIPA, Pristionchus pacificus) proteins found in the composite-type GSKIP of invertebrate nematodes; they exist in the mature forms 115SPxY118xxxFxxxLxxRL130 and 115SPxY118xxxFxxxV126xxKL130, respectively (Fig 5D). The pre-GSK3-binding site prefers 118Y instead of 118F in invertebrate nematodes compared with the CLU1 (all eukaryotes) and ARMC4 (vertebrate) families. These pre-GSK3β-binding sites, which use 118Y instead of 118F to exhibit their GSK3β-binding activity, were completely conserved in Mitofilin and Hsp70/DnaK in invertebrate nematodes but not in vertebrates. The noncanonical order of these two composite-type GSKIPs in the evolutionary tree could be explained by a hijack (recombination) mechanism, as further evidenced by omega and T-Coffee comparisons. Apparently, the DUF727 domain gradually evolved as a pre-GSK3β-binding site (115SPxF118) flanking region with the consensus sequence site 115SPxF118xxx122FxxxLxxR/KL130 (Fig 5A), whereas in vertebrates, ARMC4 only contained the cryptic SSF103107 (unintegrated DUF727 superfamily, Fig 5C) flanking region with the pre-GSK3β-binding site 115SPxF118 and lacked the GSK3β-binding region (122FxxxLxxR/KL130). Altogether, these findings suggest that (1) at the pre-GSK3β-binding site, 118F is present prior to 118Y during evolution; (2) ARMC4 in higher organisms evolved through the GSK3-binding site of Hsp70/DnaK and Mitofilin; and (3) composite-type GSKIP evolved more slowly than simple-type GSKIP. We further performed site-directed mutagenesis and yeast two-hybrid assays to compare and reveal the importance of this pre-GSK3β-binding site with its flanking GSK3β-binding conserved sites in invertebrates and vertebrates (Table 1). We used budding yeast (S.
cerevisiae) as an ideal model organism for studying domains with pre-GSK3β-and GSK3β-binding regions to determine how a new composite-type GSKIP was conserved in multicellular organisms through natural selection and it did not interfere with endogenous simple-type GSKIPs. As described in the earlier text, the pre-GSK3β-binding site (118F or 118Y) and various mutant GSK3β-binding sites (from CLU1) were compared through site-directed mutagenesis, as in a previous study, using yeast two-hybrid methods. Our data revealed 118Y, 122F, 126F, and 130L to be essential for binding during evolution, which is consistent with our previous study results [20] . Taken together, these data imply that the pre-GSK3β-binding site with 118Y plays a crucial role for GSK3β-binding sites as well as 122F, 126L, and 130L (Table 1) [1, 3, 20] . Our study revealed that the bat CLUH protein of composite-type GSKIP possesses a dominant consensus sequence (115SPxF118xxxF122xxxQ126xxRV130) quite similar to all the CLUH species described in the earlier text (S2 Table) . Overexpression of ATG2B and GSKIP increased progenitor sensitivity to thrombopoietin, enhancing megakaryocyte progenitor differentiation [11, 13] . The presence of the AK7 gene results in a predisposition to ciliary dyskinesia [43, 44, 49] . The protein encoded by ARMC4 containing 10 ARMs has also been reported to be involved in ciliary movement [43, 44] . The molecular mechanism for the coevolution of these neighboring genes with both simple-type (GSKIP-AK7) and composite-type GSKIP (ARMC4) in vertebrates remains unclear. A study suggested that pangolins are natural hosts of beta-coronaviruses, and comprehensive surveillance of coronaviruses in pangolins could improve our understanding of the spectrum of COVID-19 (the pandemic arising from a type of coronavirus called Sars-CoV-2; see [54] and S2 Table) . Urgent investigation of the bat genome is required and may reveal the orchestration of specific elements in the COVID-19 crisis. 
ARMC4 may function as a competitor to APC and β-catenin. In addition, the protein TCF4 in pangolins, an intermediate host species of severe acute respiratory syndrome, indicates the involvement of the Wnt signaling pathway. This may be a coincidence but also raises the interesting question of what the root cause of COVID-19 infection is. Previously, we proposed that the ARMC4 family (as a composite-type GSKIP with a physiological role) together with simple-type GSKIP could function as competitors and are involved in the Wnt signaling pathway [20]. Both simple-type GSKIP and composite-type GSKIP (ARMC4) have been implicated in the Wnt signaling pathway (Fig 6) [50, 51, 55, 56]. According to UniProt data, we found the domain/fusion mechanism of the CLU and ARMC4 families and the hijack (recombination) mechanism for two sporadic genes of Hsp70/DnaK [52, 53] and Mitofilin [46, 47] that were hijacked by the DUF727 domain. The evolution of these composite-type GSKIPs was also revealed by the gene location in the corresponding organelles.
Fig 6. GSKIP may play different roles in the Wnt pathway in different species. For C. elegans and Drosophila, due to a lack of the Axin GSK3β-binding motif on either the APR1-PRY1 or APC-Axin complex, GSKIP orthologs (GSKIP-SGG and GSKIP-ZW3) may serve only as scavengers that prevent complex formation. For vertebrates such as Xenopus, D. rerio, and mammals, GSK3β is one of the proteins involved in the formation of the destruction complex with β-catenin. Thus, GSKIP as a negative regulator may play a dual role as a scavenger to prevent GSK3β binding to the destruction complex or as a competitor for the Axin GSK3β-binding site with GBP or FRAT involved in the canonical Wnt pathway. ARMC4 may function as a competitor to APC and β-catenin. In addition, the pangolin protein (TCF/LEF) is involved in the Wnt signaling pathway in pangolins, an intermediate host species for SARS (modified from [19, 51]).
In this study, we detected gene fusion events by using UniProt, T-Coffee, and ClustalW. A total of 1748 simple-type GSKIP proteins as components of gene fusion in composite-type GSKIP were also detected, many of which were from the CLUH and ARMC4 families. The predicted functional associations were with mitochondria and the Wnt signaling pathway. We demonstrated for the first time that gene fusion is a complex evolutionary process for CLUH and ARMC4, including phylogenetic distance. Gene fusion plays a key role in the evolution of gene architecture. We can observe its effect if gene fusion occurs in coding sequences [57] . When gene fusion occurs in the assembly of a new gene, new functions emerge through the addition of peptide modules to the multidomain protein. The detection methods for gene fusion events on a large biological scale can provide insights into the multimodular architecture of proteins [26, 58, 59] . In bacteria, a novel ligand-binding domain (ATP-binding motif), named the ACT domain, was identified through a PSI-BLAST search. E. coli 3PGDH was the first protein with an ACT domain that was found to fold into a ferredoxin-like (potential ATP in photosynthetic bacteria) βαββαβ topology. The ACT domain is found in a variety of contexts and is proposed to be a conserved regulatory ligand-binding fold. However, DUF727 has an antiparallel ββββ topology (NMR PDF file), sharing a similar sheet homology (Fig 1D and [2] ) as the human CLUH--KIAA0664-SSF103107 superfamily domain ( Fig 1E) . Thus, Hsp70/Dnak, which requires ATP, chaperones, mitochondria, and Mitofilin, requires an ATP source for binding to CLU1. In particular, this standalone ACT domain protein might form complexes upon binding to other proteins, such as kinases, which interact with and regulate ARMC4, β-catenin, and APC in the Wnt pathway. We also observed that DUF727 fused to CLU1 and ARMC4. 
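To make the simple-type versus composite-type distinction concrete, here is a minimal sketch that labels a protein by whether DUF727 occurs alone or alongside other annotated domains. The function name `classify_architecture` and the architecture lists are hypothetical simplifications of the domains named in the text; they are not InterPro query results, domain order is not modeled, and the check is not the detection procedure used in this study.

```python
def classify_architecture(domains, core="DUF727"):
    """Label a protein by whether the core domain occurs alone or fused with others."""
    present = set(domains)
    if core not in present:
        return "no core domain"
    return "simple-type" if present == {core} else "composite-type"


# Illustrative, simplified architectures (assumption: domain order is not meaningful here).
ARCHITECTURES = {
    "GSKIP": ["DUF727"],
    "CLU1": ["CLU-N", "CLU", "DUF727", "CLU central", "TPR-like"],
    "ARMC4": ["DUF727", "Armadillo repeat"],
    "AcuB": ["CBS", "ACT", "DUF727"],
}

if __name__ == "__main__":
    for name, domains in ARCHITECTURES.items():
        print(f"{name}: {classify_architecture(domains)}")
```

A set-based check like this only records co-occurrence of domains in one protein; inferring an actual fusion event would additionally require the phylogenetic comparisons described in the text.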
Moreover, domain accretion through a gene fusion mechanism may be a major contributor to gene evolution [60] . Three GSKIP-containing isoforms in humans are located in different chromosomes: GSKIP in chromosome 14, CLUH/KIAA0664 in chromosome 17, and ARMC4 in chromosome 10, and GSKIPs, CLUH, and ARMC4 still retain a DUF727. This may elucidate certain events in the evolutionary process. In particular, the unique order of Hsp70/Dnak and Mitofilin evolution may be evidence of another mechanism (hijacking). Our queries resulted in several new findings. First, we found a highly primitive form of GSKIP in lower organisms, such as gram-positive bacteria, and neither a PKA-binding domain nor a GSK3β-binding site was found when tracing GSKIP homologs in vertebrates and invertebrates [27, 37] . Second, the DUF727 domain in CLU1 and ARMC4, regarded as the central fragment of GSKIP, can exist alone or with other domains as part of multidomain structures. Third, DUF727 first extended to a pre-GSK3β site at 115SPXF118 in vertebrate ARMC4 during evolution. Fourth, two sporadic composite-type GSKIPs (Mitofilin and Hsp70/Dnak) resulting from a hijacking mechanism were found in invertebrate nematodes. As shown in Fig 1F, at an evolutionarily later time, simple-type GSKIP acquired the PKA RII-binding domain as GSKIP/AKAP in vertebrates. Simple-type GSKIP is still retained in invertebrates. For composite-type GSKIP, one bacterium containing DUF727 with the ACT domain was found, whereas in all eukaryotes, DUF727 was improperly recognized as GSKIP. Although DUF727 lacks the ideal GSK3β-binding domain, it is still counted in CLU/TIF31 and CLUH/KIAA0664 sequences. In some vertebrates, the GSKIP domain was found in an ARMC4-containing protein with a pre-GSK3βbinding site only. The two sporadic proteins of Mitofilin and Hsp/DnaK were found in invertebrates containing perfect pre-GSK3β-binding and GSK3β-binding sites. 
In sum, this study generated evidence demonstrating that the composite-type GSKIP of CLU1 and Mitofilin in mitochondria and the Hsp70 chaperone and ARMC4 have functions in Wnt signaling and simple-type GSKIP-linked proteins have roles in the origin of GSK3β-binding sites during DUF727/GSKIP evolution. Both simple-type and composite-type GSKIP (ARMC4) attract the most attention due to their degree of involvement in the Wnt signaling pathway (Fig 1F and Fig 6). Moreover, the entire genome of Rhinolophus ferrumequinum (the greater horseshoe bat) should be sequenced because doing so can provide insights into the importance of the bat genome. Such insights might contribute to efforts to end the current COVID-19 outbreak insofar as they may elucidate how GSKIPs function in bat mitochondria and Wnt signaling pathways; immunology studies in particular may also help clarify how the bat genome, which naturally harbors coronaviruses, could be used to combat emerging variants of this pandemic virus. Two questions remain unanswered: first, under what conditions did this recombination occur to form composite-type GSKIP (to mitochondria, chaperone proteins, or armadillo repeats)? Second, in which organism did this recombination occur? For evolutionary biologists, these composite-GSKIP proteins can reveal the key steps in the evolution of GSKIP, particularly for composite-type GSKIP. Composite-type GSKIPs demonstrated the coevolution of pre-GSK3β- and GSK3β-binding sites that extended to DUF727 in the CLU1 and ARMC4 families, and the study findings may provide insights into the importance of both simple-type and composite-type GSKIP for the mitochondrial and Wnt signaling pathways.
The bootstrap test measures the internal consistency of the data; values above 0.5 (50%) indicate that more than half of the bootstrap replicates are consistent. C, Multiple sequence alignment with respect to the DUF727 region of GSKIP orthologs was conducted using ClustalW.
D, T-Coffee estimates of alignment accuracy improved phylogenetic tree reconstruction. The conserved residues are indicated with asterisks, and residues with high similarity among the orthologs are marked with dots at the bottom. A separate label marks the possible region of DUF727 in ARMC4. (TIF)
GSKIP Is Homologous to the Axin GSK3β Interaction Domain and Functions as a Negative Regulator of GSK3β
Prediction of the binding mode between GSK3β and a peptide derived from GSKIP using molecular dynamics simulation
Involvement of the residues of GSKIP, AxinGID, and FRATtide in their binding with GSK3β to unravel a novel C-terminal scaffold-binding region
Interactome mapping of the phosphatidylinositol 3-kinase-mammalian target of rapamycin pathway identifies deformed epidermal autoregulatory factor-1 as a new glycogen synthase kinase-3 interactor
Mechanisms of protein kinase A anchoring
Glycogen synthase kinase 3beta interaction protein functions as an A-kinase anchoring protein
GSKIP, an inhibitor of GSK3beta, mediates the N-cadherin/beta-catenin pool in the differentiation of SH-SY5Y cells
GSKIP- and GSK3-mediated anchoring strengthens cAMP/PKA/Drp1 axis signaling in the regulation of mitochondrial elongation
The A-Kinase Anchoring Protein (AKAP) Glycogen Synthase Kinase 3β Interaction Protein (GSKIP) Regulates β-Catenin through Its Interactions with Both Protein Kinase A (PKA) and GSK3β
The A-kinase Anchoring Protein GSKIP Regulates GSK3β Activity and Controls Palatal Shelf Fusion in Mice
Germline duplication of ATG2B and GSKIP predisposes to familial myeloid malignancies
Myelodysplastic syndromes and acute leukemia with genetic predispositions: a new challenge for hematologists
ATG2B and GSKIP: 2 new genes predisposing to myeloid malignancies
Loss of PPARγ in endothelial cells leads to impaired angiogenesis
miR-150-5p Inhibits Non-Small-Cell Lung Cancer Metastasis and Recurrence by Targeting HMGA2 and β-Catenin Signaling
Disease-related cellular protein networks differentially affected under different EGFR mutations in lung adenocarcinoma
Linc00173 promotes chemoresistance and progression of small cell lung cancer by sponging miR-218 to regulate Etk expression
MiR-181c-5p Mitigates Tumorigenesis in Cervical Squamous Cell Carcinoma via Targeting Glycogen Synthase Kinase 3β Interaction Protein (GSKIP)
GSKIP-Mediated Anchoring Increases Phosphorylation of Tau by PKA but Not by GSK3beta via cAMP/PKA/GSKIP/GSK3/Tau Axis Signaling in Cerebrospinal Fluid and iPS Cells in Alzheimer Disease
The origin of GSKIP, a multifaceted regulatory factor in the mammalian Wnt pathway
2017-beyond protein family and domain annotations
UniProt: a worldwide hub of protein knowledge
UniProt: the universal protein knowledgebase in 2021
Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions
The structure of a domain common to archaebacteria and the homocystinuria disease protein
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets
Molecular Evolutionary Genetics Analysis across Computing Platforms
A novel method for fast and accurate multiple sequence alignment
T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension
Bayesian phylogenetics with BEAUti and the BEAST 1.7
The ACT domain family
NetWheels: Peptides Helical Wheel and Net projections maker
A novel genetic system to detect protein-protein interactions
The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest
Yeast GAL4 two-hybrid system. A genetic system to identify proteins that interact with a target protein
ARMC4 mutations cause primary ciliary dyskinesia with randomization of left/right body asymmetry
Role of adenylate kinase type 7 expression on cilia motility: possible link in primary ciliary dyskinesia
UniProt: a hub for protein information
The mitochondrial inner membrane protein Mitofilin controls cristae morphology
Caenorhabditis elegans Mitofilin homologs control the morphology of mitochondrial cristae and influence reproduction and physiology
The S. cerevisiae CLU1 and D. discoideum cluA genes are functional homologues that influence mitochondrial morphology and distribution
Clueless, a conserved Drosophila gene required for mitochondrial subcellular localization, interacts genetically with parkin
Wnt/β-Catenin Signaling, Disease, and Emerging Therapeutic Modalities
Knockout mouse models to study Wnt signal transduction
GroES/GroEL and DnaK/DnaJ Have Distinct Roles in Stress Responses and during Cell Cycle Progression in Caulobacter crescentus
Glycogen synthase kinase 3beta negatively regulates both DNA-binding and transcriptional activities of heat shock factor 1
Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)?
Protein Structure and Sequence Reanalysis of 2019-nCoV Genome Refutes Snakes as Its Intermediate Host and the Unique Similarity between Its Spike Protein Insertions and HIV-1
The 2019-new coronavirus epidemic: Evidence for virus evolution
Fusion and Fission of Genes Define a Metric between Fungal Genomes
Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes
Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins
Recent duplication, domain accretion and the dynamic mutation of the human genome
We thank Gary Mawyer and Wallace Academic Editing for English editing of the manuscript.