key: cord-0256783-8oql0era authors: Lowe, Nora E.; Chang, Roger L. title: Making sense of disorder: Investigating intrinsically disordered proteins in the tardigrade proteome via a computational approach date: 2022-01-31 journal: bioRxiv DOI: 10.1101/2022.01.29.478329 sha: 1ebd6fc0b39c7575f265286eb14661db3e236029 doc_id: 256783 cord_uid: 8oql0era Tardigrades, also known as water bears, are a phylum of microscopic metazoans with the extraordinary ability to endure environmental extremes. When threatened by suboptimal habitat conditions, these creatures enter a suspended animation-like state called cryptobiosis, in which metabolism is diminished, similar to hibernation. In this state, tardigrades benefit from enhanced extremotolerance, withstanding dehydration efficiently for years at a time in a type of cryptobiosis called anhydrobiosis. Recent studies have demonstrated that the tardigrade proteome is at the heart of cryptobiosis. Principally, intrinsically disordered proteins (IDPs) and tardigrade-specific intrinsically disordered proteins (TDPs) are known to help protect cell function in the absence of water. Importantly, TDPs have been successfully expressed in cells of other species experimentally, even protecting human tissue against stress in vitro. However, previous work has failed to address how to strategically identify TDPs in the tardigrade proteome holistically. The overarching purpose of this current study, consequently, was to generate a list of IDPs/TDPs associated with tardigrade cryptobiosis that are high-priority for further investigation. Firstly, a novel database containing 44,836 tardigrade proteins from 338 different species was constructed to consolidate and standardize publicly available data. Secondly, a support vector machine (SVM) was created to sort the newly constructed database entries on the binary basis of disorder (i.e., IDP versus non-IDP). 
The model's features draw from disorder metrics and literature curation, and the model correctly classified 160 of the 171 training set proteins (~93.6%). Of the 5,415 putative IDPs/TDPs our SVM identified, we present 82 (30 having confident subclass prediction and 52 having experimental detection in previous studies). Subsequently, the role each protein might play in tardigrade resilience is discussed. By and large, this supervised machine learning classifier represents a promising new approach for identifying IDPs/TDPs, opening doors to harness the tardigrade’s remarkable faculties for biomaterial preservation, genetic engineering, astrobiological research, and ultimately, the benefit of humankind.

Subsects of life that can withstand environmental extremes have been the subject of longstanding scientific fascination. Understanding the proteomics of these organisms may aid us in harnessing their unique extremotolerance to develop biotechnologies conferring this resilience to other creatures. One key extremotolerant organism is the tardigrade, commonly known as the water bear (Figure 1). With its ability to survive environmental stresses sufficient to kill many other animals, the tardigrade has piqued the interest of researchers for 248 years (Bonnet & Goeze, 1773). These creatures are ubiquitous. There are over 1,300 species, all of which are meiofauna, minute animals inhabiting watery films and gaps between grains of sediment (Degma, Bertolani, & Guidetti, 2021). Tellingly, tardigrades have survived the five major mass extinctions, and Sloan, Alves Batista, and Loeb (2017) even suggest it would require boiling Earth's oceans to eliminate tardigrades definitively. Understanding tardigrade extremotolerance is the central focus of this research.
Currently, the literature includes robust documentation of the tardigrade's capabilities in surviving environmental stresses, such as extreme temperatures (Doyère, 1842; Rahm, 1921), radiation (Beltrán-Pardo, Jönsson, Harms-Ringdahl, Haghdoost, & Wojcik, 2015; Hashimoto & Kunieda, 2017; Horikawa et al., 2006; Horikawa et al., 2013; Jönsson, Harms-Ringdahl, & Torudd, 2005), and vacuums/intense pressures (Jönsson, Rabbow, Schill, Harms-Ringdahl, & Rettberg, 2008; Seki & Toyoshima, 1998). In particular, tardigrades can withstand desiccation/dehydration adeptly. In 1948, zoologist Tina Franceschi described witnessing tardigrades from a 120-year-old dried moss sample being revived (Franceschi, 1948). Though this claim is heavily disputed, there is considerable contemporary evidence that tardigrades can indeed survive in a dehydrated state, also called anhydrobiosis, for nine to thirty years (Tsujimoto, Imura, & Kanda, 2016). This rare ability to cope with desiccation drives this computational investigation of the tardigrade proteome. Here, by pinpointing proteins involved in tardigrade desiccation tolerance, we provide insight for developing tardigrade protein-based technologies that could abate deleterious cell processes. This work could open doors for a range of applications, such as engineering drought- and radiation-resistant plants, extending viability time for transfusion of blood products or transplant of organs, and establishing stable vaccine stockpiles. Hence, in this study, we created and deployed a support vector machine (SVM) to generate a concise list of proteins of interest in tardigrade extremotolerance. Annotations of tardigrade-specific intrinsically disordered proteins (TDPs), a protein family recently implicated in tardigrade resilience (Hesgrove & Boothby, 2020; Yamaguchi et al., 2012), served as positive training data.
We created and report here a novel, comprehensive, nonredundant pan-proteome database (PPD), from which we derived our training and testing sets. This database is composed of the phylum's publicly available protein sequences, with flagged proteins involved in anhydrobiosis and other forms of extremotolerance. The feature set selected includes DISOPRED3 disorder metrics (Jones & Cozzetto, 2015) , such as fractions of disordered residues and concentrations of certain amino acids, as well as phenotypic properties derived from literature curation. In short, this study consisted of creating an all-inclusive database and narrowing it down to locate proteins of interest. Altogether, understanding TDPs could help elucidate how to repurpose the tardigrade's aptitude for survival for human benefit. This dataset, to the best of our knowledge, is the most comprehensive Tardigrada PPD that has been deduplicated to eliminate redundancy. An efficient regular expression-based algorithm permitted the assignment of a unifying identifier to most PPD proteins. Whereas previous proteomic studies have struggled with inconsistent nomenclature obfuscating protein identity, our database overcomes this obstacle. Overall, this study represents the first attempt of its kind to strategically mine proteomes for disordered proteins involved specifically in cryptobiosis. Ultimately, an enhanced understanding of such proteins could enable humans to imitate tardigrade resilience through development of novel TDP technology with wide-ranging applications in translational medicine and genetic engineering. The tardigrade's unique ability to undergo anhydrobiosis has been long documented but only recently understood. Anhydrobiosis is a type of cryptobiosis that can occur in any developmental stage (Schill & Fritz, 2008) whereby the tardigrade retracts its limbs and curls into a spherical formation called a tun (Figure 2, overleaf; Baumann, 1922) . 
In a 1997 landmark study, Ricci and Pagani proposed a "Sleeping Beauty" hypothesis of aging, postulating that the tardigrade's biological clock pauses during the tun state. Hengherr, Brümmer, and Schill (2008a) corroborated this idea by showing that lifespans of periodically dried Milnesium tardigradum were similar to those of their control counterparts, excluding time spent in the tun state. In this state, a type of biostasis, tardigrades rely on a host of molecular mechanisms to survive prolonged periods of desiccation, one of which involves the disaccharide trehalose. Trehalose is a sugar reported to be involved in anhydrobiosis in some tardigrade species (Crowe, 2002; Kinchin, 2008; Webb, 1964). However, a preponderance of inconsistencies, both between and within individual studies, reveals there is considerable inter-/intraspecies variation in concentrations of this substance. To illustrate, Jönsson and Persson (2010) reported relatively high trehalose levels in Macrobiotus islandicus (accounting for 2.9% of anhydrobiote dry weight) compared to lower amounts in other species such as M. tardigradum (0.077% of dry weight). By contrast, in a different study, the latter species was previously shown to lack trehalose altogether (Hengherr, Heyer, Köhler, & Schill, 2008b). In tandem, these conflicting findings indicate trehalose may not be solely responsible for the phenomenon of anhydrobiosis in tardigrades. While historically the field has spent a great deal of time focused on trehalose, scientists have recently redirected their attention to the tardigrade proteome.
Promising findings have implicated proteins such as late embryogenesis abundant proteins (LEAs; Förster et al., 2009; Förster et al., 2012; Schokraie et al., 2010; Tanaka et al., 2015), heat shock proteins (Hsps; Alterio, Guidetti, Boschini, & Rebecchi, 2012; Förster et al., 2009; Förster et al., 2012; Jönsson & Schill, 2007; Reuner et al., 2010; Rizzo et al., 2010; Schill, Steinbrück, & Köhler, 2004; Schokraie et al., 2010; Schokraie et al., 2011; Wang, Grohme, Mali, Schill, & Frohme, 2014; Yoshida et al., 2017), and DNA damage suppressor proteins (Dsup; Hashimoto et al., 2016; Hashimoto & Kunieda, 2017; Yoshida et al., 2017) in augmenting tardigrade extremotolerance. Consequently, the tardigrade proteome has taken center stage as a cache for previously unexplored factors with ties to desiccation tolerance. In particular, proteins lacking order, meaning they do not have fixed tertiary structures, are of heightened interest. A family of proteins called tardigrade-specific intrinsically disordered proteins (TDPs) is heavily involved in tardigrade cryptobiosis. Some TDPs are heat soluble, including cytosolic abundant heat soluble (CAHS), secretory abundant heat soluble (SAHS), and mitochondrial abundant heat soluble (MAHS; Yamaguchi et al., 2012) proteins. In addition, Dsup is a type of nucleosome-binding and DNA-protecting protein (Chavez, Cruz-Becerra, Fei, Kassavetis, & Kadonaga, 2019) that is also considered a TDP. The term "tardigrade-specific" indicates that these proteins, to date, have not been identified outside of the phylum, and "intrinsically disordered" signifies proteins (IDPs) or protein regions (IDRs) lacking a consistent tertiary structure under certain cellular conditions (Jirgensons, 1996). In essence, unique properties of the TDP family can be attributed to this disorder, in that proteins within this classification adopt a material state upon desiccation, forming non-crystalline, amorphous solids (Boothby et al., 2017).
Considered a type of biological glass, these solids have been shown to protect cells during dehydration in a multitude of ways (Crowe, Carpenter, & Crowe, 1998; Sun & Leopold, 1997) by coming together to form specific intra- and extracellular structures supporting cell structure when water is scarce (Richaud et al., 2020). Further, radical ions and reactive oxygen species can be sequestered by some IDPs, thereby mitigating oxidative stress. TDPs were initially characterized by their location of expression in the cell. Yamaguchi et al. (2012) used green fluorescent protein analysis to confirm the cytosolic abundant heat soluble (CAHS) TDP tended to gravitate toward the matrix of the cytoplasm. On the other hand, the secretory abundant heat soluble (SAHS) TDP was detected in the culture medium, indicating the protein had crossed the plasma membrane. As implied by the practice of naming TDPs simply for where they localize (Yamaguchi et al., 2012), there is, as yet, a paucity of research exploring the intricate mechanisms by which these proteins operate. This study, therefore, focuses on intrinsically disordered proteins, with an emphasis on tardigrade-specific cases. This study uses machine learning to uncover previously unidentified TDPs (as well as IDPs not exclusive to the phylum) to generate a novel list of high-priority proteins in need of further investigation. This research takes into consideration disorder metrics and phenotypic properties derived from literature curation and is novel in (1) comprehensively mining tardigrade proteomes for IDPs involved in cryptobiosis, (2) reporting, for the first time, a deduplicated and highly organized tardigrade pan-proteome database, and (3) crafting a concise list of proteins of interest in the anhydrobiotic phenotype.
We anticipate that this nonredundant and centralized database, coupled with our novel machine learning pipeline, will serve as an asset for tardigrade researchers, and that the targeted list of proteins generated will grant future studies more direction. The overarching goal of this study was to strategically mine the tardigrade proteome, by way of machine learning, for intrinsically disordered proteins involved in the cryptobiotic phenotype.

Objectives
1) To construct and deduplicate a comprehensive tardigrade pan-proteome database (PPD)
2) To use an intrinsic disorder prediction server (i.e., DISOPRED3) for analyzing location and intensity of disorder in proteins from the PPD
3) To employ a combination of machine learning and literature curation to filter disorder prediction results in order to yield a list of proteins in need of further exploration as prospective cryptobiosis-associated proteins
4) To utilize InterPro, a bioinformatic tool for functional analysis, to annotate the protein list generated

In order to maximize the likelihood of detecting TDPs through disorder prediction and machine learning, tardigrade proteomic data were consolidated. First and foremost, a Because these datasets were so vast (44,836 sequences), Excel was used to remove redundant sequence entries. To avoid introducing redundancy, to map between identical sequences cataloged under different names, and to maximize data inclusivity, literature-derived sequences were compared to sequences from UniProt and NCBI. A regular expression-based algorithm was devised in Perl: a brute force strategy that enumerates all possible edits up to a given edit distance (ED) between the two sequences being aligned. For small ED, this algorithm runs faster than global sequence alignment and is therefore well-suited for mapping between nearly identical sequences in large datasets. The edit distance dictated how sequences were dealt with (Figure 3).
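The mapping step can be illustrated with a short Python sketch. It is not the Perl script described above: instead of enumerating regular expressions, it swaps in a standard dynamic-programming (Levenshtein) edit distance that stops early once the distance must exceed a cutoff. The function names, the toy identifiers `P1` and `P2`, and the default cutoff of 3 (taken from the ED range of 1-3 accepted in this study) are assumptions made for illustration.

```python
def bounded_edit_distance(a, b, max_ed=3):
    """Return the Levenshtein distance between a and b if it is <= max_ed, else None."""
    if abs(len(a) - len(b)) > max_ed:
        return None  # the length difference alone already exceeds the cutoff
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        if min(curr) > max_ed:
            return None  # row minima never decrease, so the cutoff cannot be met
        prev = curr
    return prev[-1] if prev[-1] <= max_ed else None


def map_to_database(query, database, max_ed=3):
    """Return (identifier, ED) for the closest database sequence within max_ed, or None."""
    best = None
    for identifier, sequence in database.items():
        ed = bounded_edit_distance(query, sequence, max_ed)
        if ed is not None and (best is None or ed < best[1]):
            best = (identifier, ed)
    return best


# Hypothetical toy database; real PPD entries are full-length proteins.
toy_ppd = {"P1": "MKTAYIAK", "P2": "MSSSSSSS"}
hit = map_to_database("MKTGYIAK", toy_ppd)
```

A query that returns `None` has no near-identical counterpart in the database and would be a candidate for appending as a unique entry, subject to the length and initiator methionine checks shown in Figure 3.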
This decision tree illustrates how sequences were dealt with depending on the output of the regular expression script. Sequences were either mapped to the PPD, appended to the PPD as a unique addition, or discarded due to failure to meet minimum length requirements. Sequences with ED of 1-3 were included in the PPD because these small variations likely account for substitution mutations or minor mass spectrometry errors. Peptides < 100 AAs were discarded because they were not long enough for disorder prediction nor comparable to the training set. The initiator methionine presence requirement ensured proteins added to the PPD were not fragments. DISOPRED3 is an intrinsic disorder prediction server that makes a binary call between ordered and disordered residues. DISOPRED2, its predecessor, was trained on sequences associated with missing residues in X-ray crystallography structures, a telltale sign of an intrinsically disordered region (IDR; Jones & Cozzetto, 2015) . DISOPRED3 was selected because it was ranked highly in the 2014 Critical Assessments of Techniques for Protein Structure Prediction evaluation (Monastyrskyy, Kryshtafovych, Moult, Tramontano, & Fidelis, 2014) . All PPD proteins were subjected to DISOPRED3 assessment. Currently in the field of tardigrade research, there is no preemptive full-scale TDP identification system in place. Given the expensive and time-consuming nature of wet lab procedures for measuring protein expression and characterizing protein function, there is a chance TDPs that could be used to transfer tardigrade resilience to other species have yet to be identified, which compelled us to create the machine learning pipeline presented here as a means of automating the TDP identification process. There is no single accepted set of guidelines for distinguishing IDPs from ordered proteins, ordered regions, or IDRs, so an additional literature search was conducted to locate common IDP characteristics. 
From this search, certain guiding principles were gleaned, namely that IDPs tend to (1) contain fewer hydrophobic amino acids, which impedes hydrophobic collapse; (2) contain higher concentrations of disorder-promoting amino acids such as proline and serine (Atkins, Boateng, Sorensen, & McGuffin, 2015); and (3) contain higher concentrations of aromatic residues such as phenylalanine, tryptophan, and tyrosine; and that (4) IDRs of more than 30 residues are considered long (Mohan, Uversky, & Radivojac, 2009). Based upon this literature search, the following dimensions were established for the disorder classifier: protein length; number of hydrophilic/hydrophobic residues; frequency of proline, glutamic acid, serine, glutamine, lysine, phenylalanine, tryptophan, and tyrosine; and continuously disordered region length. Hydropathy determination was in accordance with the Kyte-Doolittle (Kyte & Doolittle, 1982) and Hopp-Woods (Hopp & Woods, 1981) scales. Python was used to parse the long DISOPRED3 file and produce a summary output file recording the above-mentioned disorder statistics. Negatives (ordered proteins) were flagged in the PPD based on whether they had (1) a crystal structure in the Protein Data Bank (PDB; though this only accounted for three proteins, as accessed in October of 2020) or (2) enzyme commission (EC) numbers, as most enzymes have stable tertiary structures. Others were manually annotated, relying on confirmation that there were ordered PDB homologs. Positives (disordered proteins) were flagged in the PPD by methodically searching the accompanying NCBI or UniProt annotation for TDP keywords (i.e., CAHS, SAHS, MAHS, LEA, Dsup, and associated abbreviation expansions). Once training sets were established, it was necessary to select a machine learning model.
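As a rough illustration of the disorder statistics described above, the following Python sketch computes per-protein features from a sequence and a per-residue disorder mask. Several details are assumptions rather than the study's actual script: the mask is taken as an already-parsed list of booleans (the DISOPRED3 file format is not handled here), a residue is counted as hydrophobic when its Kyte-Doolittle value is positive, and the feature names are hypothetical.

```python
from itertools import groupby

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residues whose frequencies are tracked: P, E, S, Q, K, F, W, Y.
TRACKED_RESIDUES = "PESQKFWY"


def longest_disordered_run(mask):
    """Length of the longest stretch of consecutive disordered residues."""
    runs = (sum(1 for _ in group) for flag, group in groupby(mask) if flag)
    return max(runs, default=0)


def disorder_features(sequence, mask):
    """Summarize one protein; mask[i] is True when residue i is called disordered."""
    if not sequence or len(sequence) != len(mask):
        raise ValueError("sequence and mask must be non-empty and equally long")
    n = len(sequence)
    # Assumption: positive Kyte-Doolittle value means hydrophobic.
    hydrophobic = sum(1 for aa in sequence if KYTE_DOOLITTLE.get(aa, 0.0) > 0)
    features = {
        "length": n,
        "hydrophobic_fraction": hydrophobic / n,
        "fraction_disordered": sum(mask) / n,
        "longest_disordered_run": longest_disordered_run(mask),
    }
    for aa in TRACKED_RESIDUES:
        features["freq_" + aa] = sequence.count(aa) / n
    return features


# Toy input, not a real tardigrade protein.
toy_mask = [False, True, True, True, False, False, True, True, False, False]
toy_features = disorder_features("MPESQKFWYA", toy_mask)
```

Under criterion (4) above, a `longest_disordered_run` above 30 residues would correspond to a long IDR.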
When graphing a decision boundary, as the number of features, or dimensions, describing each data point increases (here, there were 11), the difficulty of separating data (44,836 points in the PPD) cleanly into distinct classes also rises. This "Curse of Dimensionality" (Bellman, 1966) is why we turn to artificial intelligence. This study utilized supervised machine learning in the form of a support vector machine (SVM). SVMs are highly effective classification tools with a computational edge over alternative separation techniques (Cristianini & Shawe-Taylor, 2000). The present SVM was created by modifying preexisting code from Chang et al. (2020). One challenge encountered while training the model was a severe class imbalance (55 entries in the positive training set versus 2,500+ entries in the negative training set). To account for this issue, 116 ordered examples were randomly selected to soften the imbalance while continuing to reflect size differences between the two sets. The balance issue was also dealt with using weighted scoring in the code itself. The scalar magnitude of the hyperplane equation's coefficients for the resulting classifier (Table 1, overleaf) served as a measure of relative influence for each SVM feature. A linear kernel was utilized so coefficient values would be more readily interpretable. Also, feature values ranged drastically in magnitude. Therefore, standardization was performed through Z-scoring, mean centering the value distribution at zero and causing the standard deviation of the transformed distribution to equal one. This aided hyperplane parameterization by preventing bias toward a certain scale. Gradient descent was employed to optimize hyperplane parameterization, running for between 100 and 1,000 iterations with the squared hinge loss function. This methodology combatted overfitting, given the traditional bias-variance tradeoff that plagues machine learning (Luxburg & Schölkopf, 2008).
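The training procedure described above (Z-scoring, a linear decision function, class weighting, and gradient descent on the squared hinge loss) can be sketched in plain Python as follows. This is a minimal illustration, not the code adapted from Chang et al. (2020): the inverse-frequency class weights, the L2 regularization term, the learning rate, and the toy data are all assumptions.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    scaled = [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]
    return scaled, mus, sds


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (an assumed weighting scheme)."""
    n = len(labels)
    return {c: n / (2 * labels.count(c)) for c in set(labels)}


def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, class_weight, lr=0.1, reg=0.01, n_iter=500):
    """Gradient descent on reg/2*|w|^2 + mean_i c_i*max(0, 1 - y_i*f(x_i))^2, y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * decision_value(w, b, xi)
            if margin < 1:  # only margin violators contribute to the loss
                coef = -2.0 * class_weight[yi] * (1 - margin) * yi / n
                for j in range(d):
                    grad_w[j] += coef * xi[j]
                grad_b += coef
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data with two features on very different scales (hypothetical values).
X_raw = [[1.0, 200.0], [2.0, 220.0], [8.0, 900.0], [9.0, 950.0]]
y = [-1, -1, 1, 1]
X, _, _ = zscore_columns(X_raw)
w, b = train_linear_svm(X, y, balanced_class_weights(y))
influence = [abs(wj) for wj in w]  # coefficient magnitude as relative feature influence
```

Because the features are standardized before training, the entries of `influence` are comparable across features, which is the reason a linear kernel keeps the coefficients interpretable.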
Because the positive training set was derived from evolutionarily distinct groups (CAHS, SAHS, MAHS, LEA, and Dsup), its members were not sufficiently homologous to each other to warrant using alignment scores as SVM features (Hesgrove & Boothby, 2020). Instead, positive predictions were subclassified afterward by performing ~300,000 local sequence alignments between positive predictions and positive training set proteins to classify predictions into their respective TDP subgroups. Bit scores for alignments between predicted positives and known positives were averaged with respect to each TDP subgroup, with the predicted subgroup being declared according to the highest average. Specifically, the BLOSUM62 standard substitution scoring matrix (Henikoff & Henikoff, 1992) was used, but with a more severe gap penalty, tailoring the code to locate loose evolutionary relationships (the penalty for a gap of any length was -5, and each gap residue beyond the first incurred an added penalty of -1). To assess how well the model separated the training set and generalized to unseen data, leave-one-out cross-validation was implemented. Afterward, to ensure cross-validation results were not produced by chance, additional leave-one-out cross-validation rounds were conducted (cross-validation confirmation), one with a shuffled feature space and one with shuffled training set class labels (a randomized control). Subclass prediction confidence was also evaluated. The 55 known TDPs from the positive training set were aligned against each other. Then, bit scores from within each class (e.g., CAHS versus CAHS) and across classes (e.g., CAHS versus SAHS) were recorded in Excel to locate a class-specific threshold separating bimodal distributions (Table 2, overleaf). To be deemed a confident prediction, the predicted positive bit score had to meet the threshold for its respective predicted class.
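The subclass assignment rule can be sketched in a few lines of Python. The sketch assumes the local alignments have already been run and takes their bit scores as input; the function name, the example scores, and the threshold values are hypothetical and are not taken from Table 2.

```python
from collections import defaultdict
from statistics import mean


def predict_subclass(bit_scores, thresholds):
    """Assign a TDP subclass from alignments against known positives.

    bit_scores: (subclass, bit score) pairs, one per alignment between a single
                predicted positive and a known TDP of that subclass.
    thresholds: class-specific confidence thresholds keyed by subclass.
    Returns (predicted subclass, its average bit score, confident flag).
    """
    by_class = defaultdict(list)
    for subclass, score in bit_scores:
        by_class[subclass].append(score)
    averages = {subclass: mean(scores) for subclass, scores in by_class.items()}
    best = max(averages, key=averages.get)  # subclass with the highest average
    confident = averages[best] > thresholds[best]
    return best, averages[best], confident


# Hypothetical scores and thresholds for illustration only.
example_scores = [("CAHS", 60.0), ("CAHS", 40.0), ("SAHS", 20.0), ("LEA", 25.0)]
example_thresholds = {"CAHS": 45.0, "SAHS": 45.0, "LEA": 45.0}
prediction = predict_subclass(example_scores, example_thresholds)
```

In this toy case the CAHS average (50.0) is both the highest average and above its threshold, so the prediction is flagged as confident.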
Though this procedure does not produce an e-value or p-value, it does indicate the confidence of each predicted positive, erring on the more conservative side, as it corresponds to a false positive rate of 0% with respect to TDP subtypes.

Table 2. The minimum local alignment bit score required for a protein to be deemed confidently a member of its predicted subclass. The bit score average with respect to each subclass for each disordered protein prediction was compared to the threshold for its associated subclass. If the bit score average was greater than the threshold, it was deemed a confident subclass prediction.

The classifier designed in this study was proficient at separating known TDPs from known non-TDPs in cross-validated training set performance, correctly classifying 160/171 training set proteins, yielding raw accuracy of ~93.6% (Figure 4). This points to the suitability of disorder and sequence features selected for SVM training. For training set performance, the area under the curve (AUC) score for the receiver operating characteristic (ROC) curve was ~0.98. To confirm the hyperplane was not overfitted to training set data, leave-one-out cross-validation was performed. The leave-one-out cross-validated AUC of the ROC was ~0.95. To ensure the promising cross-validation results were not generated by chance, secondary cross-validation (confirmation) was conducted, which involved rerunning cross-validation, but with shuffled training data. For the first round of cross-validation confirmation, with training on shuffled labels, the AUC score was ~0.56. For the second round, with training on shuffled feature space, the AUC score was ~0.52 (Figure 5). Shuffled training data diminished the predictive capacity of the SVM, as expected. Qualitatively speaking, the closeness of model performance during training and leave-one-out cross-validation suggests negligible overfitting and strong, generalizable predictive power of the overall pipeline.
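The evaluation logic (leave-one-out cross-validation, ROC AUC, and a shuffled-label control) can be sketched in Python as below. To keep the example self-contained, a nearest-centroid scorer stands in for the SVM; that stand-in, the function names, the toy data, and the random seed are all assumptions and do not reproduce the study's pipeline or its reported AUC values.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label != 1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_scores(X, y, fit, decision):
    """Score each example with a model trained on all of the other examples."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(decision(model, X[i]))
    return scores


def fit_centroids(X, y):
    """Stand-in classifier (not the SVM): store the mean vector of each class."""
    def centroid(label):
        rows = [x for x, yi in zip(X, y) if yi == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    return centroid(1), centroid(-1)


def centroid_score(model, x):
    """Higher when x lies closer to the positive centroid than to the negative one."""
    pos, neg = model
    d_pos = sum((a - c) ** 2 for a, c in zip(x, pos))
    d_neg = sum((a - c) ** 2 for a, c in zip(x, neg))
    return d_neg - d_pos


# Toy, cleanly separable data (hypothetical).
X = [[0.0], [0.2], [0.1], [1.0], [0.9], [1.2]]
y = [-1, -1, -1, 1, 1, 1]
loo_auc = roc_auc(y, leave_one_out_scores(X, y, fit_centroids, centroid_score))

# Randomized control: repeat the same procedure after shuffling the class labels.
shuffled = y[:]
random.Random(0).shuffle(shuffled)
control_auc = roc_auc(shuffled, leave_one_out_scores(X, shuffled, fit_centroids, centroid_score))

raw_accuracy = 160 / 171  # about 0.936, the raw accuracy reported above
```

A control AUC near 0.5 indicates chance-level ranking, which is the expected outcome once the link between features and labels has been destroyed.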
At the core of this study was the creation of a concise list of TDPs intended to guide researchers in the budding field of tardigrade-based technology development. The SVM predicted 5,415 previously unknown IDPs/TDPs out of 44,836 total PPD proteins. Of these 5,415 proteins, 52 were also detected experimentally in previous studies, meaning they map to sequences from the six publications listed under "Data Sourcing." Also of these 5,415 proteins, 30 passed the bit score threshold for homology to TDP subclasses. To gain insight into how these proteins might play a role in cryptobiosis, InterPro, a protein annotation tool (Apweiler et al., 2001; Blum et al., 2021), was used to analyze the 82 proteins (Appendix). Many entries were "hypothetical proteins," which we were able to characterize with InterPro, shedding light on their function. This narrowed-down list will empower researchers to sidestep tedious experimental sifting for disordered proteins. Proteins in the finalized list were manually annotated with InterPro to provide a foothold for scientists exploring TDPs, as well as to expedite the genesis of new TDP technology. InterPro annotations showed that the 82 proteins of interest are involved in diverse biological processes, including motor activity, ATP binding, DNA binding, and ion sequestration. Curiously, 15/82 (~18.3%) were annotated as heat shock proteins (Hsp20, Hsp40, Hsp70, and Hsp90 were all detected). In this group, the prevalence of heat shock proteins, a type of molecular chaperone that assists in biomolecule assembly/disassembly, reinforces their role in the tardigrade cryptobiotic molecular landscape (Schokraie et al., 2010; Schokraie et al., 2011). Characterizing these putative TDPs is a pivotal step forward in this field. For TDP subclassification, 297,825 local sequence alignments were performed. Graphs A-E show bit score distributions analyzed to determine class-specific confidence thresholds.
After predicted and known positives were subjected to local sequence alignments, the resultant bit scores were averaged with respect to each subclass, and the subclass associated with the highest average for each protein was deemed its predicted subclass. The confidence threshold represents the highest bit score that can be achieved from an alignment between two sequences from unrelated subclasses. As demonstrated by how the majority of bit scores lie below the threshold, this confidence metric is quite rigorous. One immediate application of this classifier is compensating for both the deficit in and drawbacks of tardigrade experiments. The handful of existing publications we pulled data from, though carefully selected, presumably reflect a fraction of tardigrade protein expression under various stress conditions. Moreover, IDPs are difficult to measure and identify because many can form higher-order structures that can make them inaccessible to standard platforms such as nuclear magnetic resonance (Radivojac et al., 2004) or X-ray crystallography. Now that we have narrowed down proteins of interest, these can be targeted for specific experimental measurements, such as mass spectrometry or genetic experiments. Since it would be a costly and lengthy process to investigate all PPD proteins experimentally, identifying proteins of interest beforehand ensures resources are spent prudently. While predictive pipelines are typically utilized to weed out IDPs and focus on ordered proteins (Atkins et al., 2015), the same principle can be readily reversed, as was done here, for the purpose of studying IDPs.
IDPs are vital to extremotolerance and cell biology in general (Dunker, Bondos, Huang, & Oldfield, 2015; Wright & Dyson, 2015) , playing crucial roles in protein-protein interactions due to their flexibility, but they are severely understudied (Necci, enlarge the register of known tardigrade-specific IDPs (TDPs), which is a notable step forward for creating TDP technologies with impacts in fields ranging from astrobiology to genetic engineering. An area for additional investigation would be constructing a separate SVM classifier for each TDP subclass. Future studies could also increase the number of shuffled cross-validation confirmation trials and develop a system for strategically, instead of randomly, selecting the negative training set. To fully reap the benefits of TDP-based technology, researchers need to assess proteins we identified in vitro, such as through mass spectrometry and western blotting (as in Schokraie et al., 2010) , as determining which of these potential IDPs are upregulated during desiccation, and to what extent, will be critical for identifying with confidence which are ideal for incorporation into biotechnology. Other avenues of investigation would include designing genetic knockout/knockdown experiments in tardigrades for targeting proteins identified here, as well as testing for transfer of stress tolerance to heterologous systems and performing stress tolerance assays. Recent studies have observed how TDPs can be utilized as a means for transferring oxidative stress tolerance (Chavez et al., 2019) , radiation tolerance (Hashimoto et al., 2016; Kirke, Jin, & Zhang, 2020; Westover et al., 2020) , and osmotic stress tolerance (Tanaka et al., 2015) between organisms. Notably, in 2017, transfer of CAHS proteins to yeast increased their desiccation tolerance approximately 100-fold (Boothby et al., 2017) . 
The same study found Escherichia coli exhibited a similar response, pointing to the promise of using TDPs to confer tardigrade extremotolerance to other life forms. Therefore, while this study fortifies knowledge of basic tardigrade biology and stress response, it has ramifications for higher life forms as well. Serendipitously, certain proteins pinpointed were likened to those found in humans. Common examples from the Appendix include fatty acid-binding proteins (Smathers & Petersen, 2011) and DnaJ/DnaK proteins. Table 3 presents select proteins from the Appendix and their human homologs, logical candidates for technological applications. Insofar as TDPs take on a material state upon desiccation (Boothby et al., 2017), form supportive cell structures (Richaud et al., 2020), and sequester ions (Bray, 1993), they hold great promise for many advantageous applications. To clarify, TDPs could be used to design more effective, affordable biomaterial preservation methods. The concept of mimicking anhydrobiotic creatures to preserve organs, tissues, and blood has been pondered for half a century (Keilin, 1959). Today, cytosolic abundant heat-soluble TDPs (CAHS) are already being evaluated as pharmaceutical excipients for stabilizing enzymes such as lactate dehydrogenase and lipoprotein lipase, and the results are encouraging (Piszkiewicz et al., 2019). Such a preservative could also aid storage of medical supplies containing biological components, such as vaccines. To that end, because CAHS proteins protect enzymes from desiccation and lyophilization, they have been tested as a vaccine stabilizer. When a vaccine with this preservative was tested in mice, the stabilizer was deemed non-toxic, and treatment elicited antibody production as intended (Esterly et al., 2020).
Cold storage is sometimes a limiting factor when it comes to distributing vaccines (Kaufmann, Miller, & Cheyne, 2011; Zaffran et al., 2013), such as with some COVID-19, chicken pox, and Ebola vaccines, but with TDP stabilizers, certain vaccine stockpiles could be dispersed and stored safely and affordably, circumventing the cumbersome cold chain and even ultra-cold chain vaccine preservation methods, thus increasing global medical equity. Outside of the realm of biopreservation, TDPs could be a valuable addition to therapeutics. For instance, they could be incorporated into treatments for delaying deleterious cell processes when an injury occurs in an isolated location. Delaying necrosis and apoptosis would be especially germane to improving patient health outcomes when treatment is time-sensitive. This advancement could be crucial for the military in decreasing battlefield mortality. TDPs could also have applications in medical conditions involving oxidative stress. Because oxidative stress can contribute to cancer (Liguori et al., 2018), antioxidant properties of these proteins (Hashimoto et al., 2016; Rizzo et al., 2010) could prove useful as part of new treatments. Specifically, certain tardigrade proteins form structures that can capture reactive oxygen species and cut short their damaging ripple effect within the cell. Aside from medical implications, mimicking tardigrade resilience has broader technological applications, namely in coping with Earth's changing climate. For instance, a damage suppressor protein (Dsup) gene was expressed in plants and decreased radiation-induced DNA damage in Kirke et al. (2020). Theoretically, extremotolerant plants could be engineered to maintain a stable food supply despite escalating environmental challenges. In the far future, perhaps such recombinant plants could play a role in sustaining life on other planets.
On the note of space travel, tardigrades have survived the brutal conditions of space on multiple missions since the late 2000s (Jönsson et al., 2008; Persson et al., 2011; Rebecchi et al., 2009, 2010, 2011; Vukich et al., 2012) and are considered model organisms for space research (Guidetti, Rizzo, Altiero, & Rebecchi, 2012; Jönsson, 2007; May, Maria, & Guimard, 1964). Because certain TDPs are involved in radiotolerance (Hashimoto et al., 2016), they could be used to engineer radiation-resistant crops or even to protect astronauts against cosmic radiation, helping them explore new frontiers.
This study charts a course for more directed research into tardigrade cryptobiosis, potentially permitting humans to one day harness and apply the tardigrade's natural strengths for the benefit of such fields as medicine, biotechnology, and even space travel. Further, considering how water shortages have wreaked havoc on crops and livestock in the wake of climate change, the prospect of engineering TDP-expressing crops that can withstand drought is of special import. The tardigrade's perplexing proteins could be key to conferring the same desiccation tolerance that has been observed in tardigrades for hundreds of years. Ironically, unraveling the underpinnings of protein disorder could be paramount for creating a more ordered world.
Greater distance to hyperplane indicates more confident classification. The Predicted Type column is based on bit score averages from local alignments between predicted and known positives. Amino acid is abbreviated as AA throughout. 52 red entries: experimentally detected in previous studies, with predicted types not satisfying the bit score significance threshold. 30 blue entries: confident TDP type predictions without experimental detection.
Distance to Hyperplane Length (AA #) InterPro Analysis
1 OQV23481.1 1.78E+00 871 LEA Contains myosin tail domains.
Nine sections involved with coiling (e.g., AAs 287-607). LEA proteins are known to transition from randomly coiled chains to semi-folded proteins during hydration level changes (Li & He, 2009). Tardigrades rely on myosin-actin interactions for leg locomotion (Gross & Mayer, 2019). Ion concentrations, like calcium, are in flux during desiccation, so tardigrade actin-myosin interaction has been studied (Prasath, Greven, & D'Haese, 2012)
(Greven, 2007). Some marine species have flagella (Bai et al., 2020; Rebecchi, 1997)
(Cooper, 2000). IDPs/IDRs allow proteins to bind to multiple partners and bolster binding efficiency because conformational plasticity eases orientation limitations (Dyson & Wright, 2002; Hsu et al., 2012; Liu et al., 2006; Oldfield et al., 2008; Uversky & Dunker, 2010). NTPase family includes ATPases and GTPases that hydrolyze ATP and GTP (Vaishnavi & Lorsch, 2013)
Myosin helps with dimerization, and some contain heptad repeats (Krendel & Mooseker, 2005). IDPs generally contain sequential repeats (Jorda, Xue, Uversky, & Kajava, 2010; Simon & Hancock, 2009)
AAs 1-410 annotated as calreticulin family. Calreticulin is a chaperone in the endoplasmic reticulum that binds Ca2+ (Michalak, Mariani, & Opas, 1998). IDPs have been shown to sequester ions during desiccation (Bray, 1993)
Difficulty obtaining crystals for X-ray crystallography impedes determination of alpha-crystallin's structure (Augusteyn, 2004), which is typical of IDPs (Radivojac et al., 2004). In Reuner et al. (2010), heat shock yielded upregulation of alpha-crystallin in M. tardigradum, suggesting it is involved in tardigrade resilience.
Annotated as fructose-bisphosphate aldolase, class-I, involved in glycolytic/catalytic activity. Perhaps when aerobic respiration is reduced during tardigrade anhydrobiosis (Clegg, 1973; Pigoń & Weglarska, 1955), glycolysis compensates.
(Yin, Schnoor, & Jun, 2020). EF-hands bind calcium, reminiscent of TDP ion sequestration. EF-hand proteins have been studied in Hypsibius klebelsbergi (Prasath et al., 2012)
AAs 1,379-1,635 annotated as zona pellucida (ZP) domain, which serves structural roles and aids signal transduction
Dsup Non-cytoplasmic domain at AAs 25-1,054. Molecular function: chitin binding in extracellular region. Chitin domain at AAs 76-135. Tardigrade cuticle/feeding apparatus are chitinous (Guidetti
GAU90042.1 5.13E-01 837 Dsup Multiple sections annotated as disordered
Dsup AAs 23-672 marked as non-cytoplasmic domain. Snake toxin-like at AAs 21-103. Ophanin, a snake venom protein/neurotoxin, was detected in tardigrades in Schokraie et al. (2012) and hypothesized to be used by carnivorous tardigrade species while hunting
GAU95291.1 1.26E-01 972 Dsup AAs 87-183 annotated as peptidase domain. Predicted biological functions include proteolysis, metallopeptidase activity, and metalloendopeptidase activity
20E-01 557 Dsup AAs 140-308 annotated as RNA recognition motif (RRM) domain. Predicted molecular function: nucleic acid binding
AAs 53-140 homologous to winged helix (WH) DNA-binding domain
GAV05281.1 1.65E+00 359 Dsup AAs 150-354 highlighted as AT-hook DNA binding domain
33E+00 474 Dsup Assigned to linker histone H1/H5 protein family. Predicted biological processes: nucleosome assembly and DNA binding
14E-01 724 Dsup AAs 23-724 marked as non-cytoplasmic domain
69E-01 1,119 Dsup AAs 31-1,119 marked as non-cytoplasmic domain
Dsup RRM domain at AAs 344-605. Two RNA recognition motifs bookend the protein at AAs 10-839 and
Dsup Non-cytoplasmic domain at AAs 21-1,616. Signal peptide N-region at AA 1, indicating a secretory protein
Dsup AAs 21-1,615 marked as non-cytoplasmic domain. First amino acid annotated as signal peptide N-region
Signal peptide N-region at AAs 1-5. Transmembrane helix domain at AAs 5-27. Miyazawa et al. (2021) found the disordered N-terminus of a SAHS protein can form an amphiphilic alpha helix
OQV23218.1 1.68E+00 237 CAHS AAs 118-145 and 173-193 annotated as coiled
Dsup AAs 40-101 flagged as chitin binding domain. AAs 20-745 marked as non-cytoplasmic domain
90E-01 816 Dsup AAs 1-3 annotated as signal peptide N-region. AAs 58-684 related to S-phase kinase activator. AAs 16-816 noted as non-cytoplasmic domain
AAs 36-167 are in fact highlighted as FABPs
156 SAHS Similar to #26. AAs 22-153 homologous to calycin. AAs 21-153 highlighted as FABP. AAs 19-156 marked as non-cytoplasmic domain
156 SAHS Similar to #27. AAs 18-155 homologous to calycin. AAs 21-153 highlighted as FABP. AAs 19-156 marked as non-cytoplasmic domain
33E-01 165 SAHS Assigned to intracellular lipid binding protein family. AAs 29-165 homologous to calycin. AAs 31-162 noted as FABP. AAs 22-165 marked as non-cytoplasmic domain
Heat shock proteins in encysted and anhydrobiotic eutardigrades
The InterPro database, an integrated documentation resource for protein families, domains and functional sites
Disorder prediction methods, their applicability to different protein targets and their usefulness for guiding experimental studies
Alpha-crystallin: A review of its structure and function
Moebjergarctus clarionclippertonensis, a new abyssal tardigrade (Arthrotardigrada, Halechiniscidae, Euclavarctinae) from the Clarion-Clipperton Fracture Zone
Die anabiose der tardigraden
Dynamic programming
Tolerance to gamma radiation in the tardigrade Hypsibius dujardini from embryo to adult correlate inversely with cellular proliferation
Tardigrades as a model for experiences of animal survival in space
The InterPro protein families and domains database: 20 years on
Herrn Karl Bonnets Abhandlungen aus der Insektologie
Tardigrades use intrinsically disordered proteins to survive desiccation
Molecular responses to water deficit
14-3-3 proteins: A number of functions for a numbered protein
Translationally controlled tumor
protein is a conserved mitotic growth integrator in animals and plants
Zinc-finger proteins in health and disease
Protein structure, amino acid composition and sequence determine proteome vulnerability to oxidation-induced damage
The tardigrade damage suppressor protein binds to nucleosomes and protects DNA from hydroxyl radicals. eLife, 8, e47682
Structure, function and evolution of DnaJ: Conservation and adaptation of chaperone function
Do dried cryptobiotes have a metabolism?
The cell: A molecular approach
Support vector machines
The role of vitrification in anhydrobiosis
Lessons from nature: The role of sugars in anhydrobiosis
Actual checklist of Tardigrada species
The spectrin repeat: A structural platform for cytoskeletal protein assemblies
Sur le facilité possedent les tardigrades, les rotifers, les anguillules de toit et quelques autre animalcules, de revenir à la vie après été completement déssechées
Intrinsically disordered proteins and multicellular organisms
Coupling of folding and binding for unstructured proteins
Toxicity and immunogenicity of a tardigrade cytosolic abundant heat soluble protein in mice
The lipocalin protein family: Structural and sequence overview
Transcriptome analysis in tardigrade species reveals specific molecular pathways for stress adaptations
Tardigrade workbench: Comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades
Anabiosi nei tardigradi
Comments on the eyes of tardigrades
Cellular morphology of leg musculature in the water bear Hypsibius exemplaris (Tardigrada) unravels serial homologies
Distribution of calcium and chitin in the tardigrade feeding apparatus in relation to its function and morphology
What can we learn from the toughest animals of the Earth? Water bears (tardigrades) as multicellular model organisms in order to perform scientific preparations for lunar exploration
Extremotolerant tardigrade genome and improved radiotolerance of human cultured cells by tardigrade-unique protein
DNA protection protein, a novel mechanism of radiation tolerance: Lessons from tardigrades
Anhydrobiosis in tardigrades and its effects on longevity traits
Trehalose and anhydrobiosis in tardigrades-evidence for divergence in responses to dehydration
Amino acid substitution matrices from protein blocks
The biology of tardigrade disordered proteins in extreme stress tolerance
Prediction of protein antigenic determinants from amino acid sequences
Analysis of DNA repair and protection in the tardigrade Ramazzottius varieornatus and Hypsibius dujardini after exposure to UVC radiation
Radiation tolerance in the tardigrade Milnesium tardigradum
Intrinsic protein disorder and protein-protein interactions
Classification of proteins according to conformation
DISOPRED3: Precise disordered region predictions with annotated protein-binding activity
Facts and fiction about long-term survival in tardigrades
Radiation tolerance in the eutardigrade Richtersius coronifer
Trehalose in three species of desiccation tolerant tardigrades
Tardigrades survive exposure to space in low Earth orbit
Induction of Hsp70 by desiccation, ionizing radiation and heat-shock in the eutardigrade Richtersius coronifer
Tardigrades as a potential model organism in space research
Protein tandem repeats-the more perfect, the less structured
Zona pellucida domain proteins
Comparative transcriptomics suggest unique molecular adaptations within tardigrade lineages
Protein Transport into the Endoplasmic Reticulum
Vaccine supply chains need to be better funded and strengthened, or lives will be at risk. Health Aff (Millwood)
The Leeuwenhoek lecture-the problem of anabiosis or latent life: History and current concept
Tardigrades and anhydrobiosis water bears and water loss
Expression of a tardigrade Dsup gene enhances genome protection in plants
Pre-treatment with D942, a furancarboxylic acid derivative, increases desiccation tolerance in an anhydrobiotic tardigrade Hypsibius exemplaris
Myosins: Tails (and heads) of functional diversity
A simple method for displaying the hydropathic character of a protein
Desiccation induced structural alterations in a 66-amino acid fragment of an anhydrobiotic nematode late embryogenesis abundant (LEA) protein
Oxidative stress, aging, and diseases
Intrinsic disorder in transcription factors
Statistical learning theory: Models, concepts, and results. Handbook of the History of Logic
Actions différentielles des rayons x et ultraviolets sur le tardigrade Macrobiotus areolatus, a l'état et desséché
Calreticulin, a multifunctional Ca2+ binding chaperone of the endoplasmic reticulum
A computational structural study on the DNA-protecting role of the tardigrade-unique Dsup protein
Tardigrade secretory-abundant heat-soluble protein has a flexible β-barrel structure in solution and keeps this structure in dehydration
Influence of sequence changes and environment on intrinsically disordered proteins
Assessment of protein disorder region predictions in CASP10
Database resources of the National Center for Biotechnology Information
Critical assessment of protein intrinsic disorder prediction
Detection of a troponin I-like protein in non-striated muscle of the tardigrades (water bears)
Intrinsically disordered proteins and intrinsically disordered protein regions
Flexible nets: Disorder and induced fit in the associations of p53 and 14-3-3 with their partners
GTPases - molecular switches of cellular signaling pathways
More than just tails: Intrinsic disorder in histone proteins
Extreme stress tolerance in tardigrades: Surviving space conditions in low Earth orbit
Rate of metabolism in tardigrades during active life and anabiosis
Protecting activity of desiccated enzymes
EF-hand proteins and the regulation of actin-myosin interaction in the eutardigrade Hypsibius klebelsbergi (Tardigrada)
Protein flexibility and intrinsic disorder
Biologische und physiologische Beiträge zur Kenntnis der Moosfauna
Resistance of the anhydrobiotic eutardigrade Paramacrobiotus richtersi to space flight (LIFE-TARSE mission on FOTON-M3)
Tardigrade resistance to space effects: First results of experiments on the LIFE-TARSE mission on FOTON-M3
Resistance to extreme stresses in the Tardigrada: Experiments on Earth and in space and astrobiological perspectives
Ultrastructural study of spermiogenesis and the testicular and spermathecal spermatozoon of the gonochoristic tardigrade Xerobiotus pseudohufelandi (Eutardigrada, Macrobiotidae)
Stress response in tardigrades: Differential gene expression of molecular chaperones
Desiccation of Panagrolaimus rigidus (Nematoda): Survival, reproduction and the influence on the internal clock
Ultrastructural analysis of the dehydrated tardigrade Hypsibius exemplaris unveils an anhydrobiotic-specific architecture
Antioxidant defenses in hydrated and desiccated states of the tardigrade Paramacrobiotus richtersi
Desiccation tolerance in embryonic stages of the tardigrade Milnesium tardigradum
Stress gene (Hsp70) sequences and quantitative expression in Milnesium tardigradum (Tardigrada) during active and cryptobiotic stages
Investigating heat shock proteins of tardigrades in active versus anhydrobiotic state using shotgun proteomics
Proteomic analysis of tardigrades: Towards a better understanding of molecular mechanisms by anhydrobiotic organisms
Comparative proteome analysis of Milnesium tardigradum in early embryonic state versus adults in active and anhydrobiotic state
Preserving tardigrades under pressure
Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins
The resilience of life to astrophysical events
The human fatty acid-binding protein family: Evolutionary divergences and functions
Cytoplasmic vitrification and survival of anhydrobiotic organisms
Novel mitochondria-targeted heat-soluble proteins identified in the anhydrobiotic tardigrade improve osmotic tolerance of human cells
Recovery and reproduction of an Antarctic tardigrade retrieved from a moss sample frozen for over 30 years
The UniProt Consortium, UniProt: The universal protein knowledgebase in 2021
Understanding protein non-folding
Chapter twenty-five-ATP and GTP hydrolysis assays (TLC)
Classification of intrinsically disordered regions and proteins
BIOKIS: A model payload for multidisciplinary experiments in microgravity
Towards decrypting cryptobiosis-analyzing anhydrobiosis in the tardigrade Milnesium tardigradum using transcriptome sequencing
Bound water, metabolites and genetic continuity
Engineering radioprotective human cells using the tardigrade damage suppressor protein
Intrinsically disordered proteins in cellular signaling and regulation
Two novel heat-soluble protein families abundantly expressed in an anhydrobiotic tardigrade
Structural characteristics, binding partners and related diseases of the calponin homology (CH) domain
Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus
The imperative for stronger vaccine supply and logistics systems
High-throughput computation was enabled by the Harvard Medical School O2 high performance compute cluster.
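The Appendix caption states that a greater distance to the hyperplane indicates a more confident classification, and that the Predicted Type column rests on averaged bit scores checked against a significance threshold. The following is a minimal sketch of both ideas, not the study's actual pipeline. It assumes a linear decision boundary, and the weights, bias, feature values, alignment scores, and threshold are all hypothetical placeholders chosen only for illustration.

```python
import math


def distance_to_hyperplane(weights, bias, features):
    """Signed distance of a feature vector from the hyperplane w.x + b = 0.

    Assumes a linear boundary; a larger absolute value means the point
    lies farther from the boundary, i.e. a more confident classification.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score / norm


def mean_bit_scores(alignments):
    """Average bit score per family from (family, bit_score) pairs."""
    totals = {}
    for family, bit_score in alignments:
        total, count = totals.get(family, (0.0, 0))
        totals[family] = (total + bit_score, count + 1)
    return {family: total / count for family, (total, count) in totals.items()}


def predict_type(alignments, threshold):
    """Return the family with the highest mean bit score, or None if that
    mean falls below the (hypothetical) significance threshold."""
    means = mean_bit_scores(alignments)
    if not means:
        return None
    best = max(means, key=means.get)
    return best if means[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical linear model and feature vector.
    print(distance_to_hyperplane([3.0, 4.0], -5.0, [3.0, 4.0]))  # 4.0

    # Hypothetical local-alignment hits against known positives.
    hits = [("CAHS", 60.0), ("CAHS", 80.0), ("SAHS", 30.0)]
    print(predict_type(hits, threshold=50.0))   # CAHS
    print(predict_type(hits, threshold=100.0))  # None
```

Sorting candidates by the absolute value of this distance would reproduce the kind of confidence ranking the caption describes, while entries returning `None` correspond to predicted types that do not satisfy the bit score threshold.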