key: cord-0772224-y8bda8wb
authors: Mesdaghi, Shahram; Murphy, David L.; Sánchez Rodríguez, Filomeno; Burgos-Mármol, J. Javier; Rigden, Daniel J.
title: In silico prediction of structure and function for a large family of transmembrane proteins that includes human Tmem41b
date: 2020-12-03
journal: F1000Res
DOI: 10.12688/f1000research.27676.1
sha: b092be39c881197e6657628b547a671fb1f47aa9
doc_id: 772224
cord_uid: y8bda8wb

Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'). One prominent member, Tmem41b, has been shown to be involved in early stages of autophagosome formation, is vital in mouse embryonic development and has been identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl-/H+ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate, possibly using H+ antiporter activity as their mechanism of transport.

A protein's structural information is crucial to understanding its function and evolution. Currently, experimental structural data exist for only a tiny fraction of proteins (Khafizov et al., 2014). For instance, membrane proteins are encoded by 30% of the protein-coding genes of the human genome (Almén et al., 2009), but they account for only 2% of the entries in the Protein Data Bank (PDB) (Koehler Leman et al., 2015). Membrane protein families are particularly poorly understood due to experimental difficulties, such as over-expression, which can result in toxicity to host cells (Grisshammer & Tateu, 1995), as well as difficulty in finding a suitable membrane mimetic to reconstitute the protein. Additionally, membrane proteins are much less conserved across species than water-soluble proteins (Sojo et al., 2016), making sequence-based homologue identification a challenge and, in turn, rendering homology modelling of these proteins more difficult. Membrane proteins can be grouped according to their interaction with various cell membranes: integral membrane proteins (IMPs) are permanently anchored, whereas peripheral membrane proteins transiently adhere to cell membranes. IMPs that span the membrane are known as transmembrane proteins (TMEMs), as opposed to IMPs that adhere to one side of the membrane (Fowler & Coveney, 2006).

One IMP family is Tmem41, which has two human representatives, namely Tmem41a and Tmem41b; both share the PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38') Pfam (El-Gebali et al., 2019) domain. The profile of Tmem41b has recently risen due to experimental evidence pointing to its involvement in macroautophagy regulation (making it a possible Atg protein, i.e. an autophagy-related protein) and lipid mobilisation (Moretti et al., 2018). Other studies identify Tmem41b as being involved in motor circuit function, with TMEM41B-knockout Drosophila showing neuromuscular junction defects and aberrant motor neuron development in knockout zebrafish (Lotti et al., 2012). It has also been reported that Zika virus replication is inhibited in TMEM41B-knockout HeLa cells (Scaturro et al., 2018). Tmem41b has also been identified as a host cell factor for SARS-CoV-2 (Schneider et al., 2020). Tmem41b is the only common host cell factor identified for flaviviruses and coronaviruses and is the only autophagy-related protein identified as a viral host factor (Hoffmann et al., 2020).

Additionally, Tmem41b has been shown to be essential for mouse embryonic development: homozygous knockout mouse embryos suffer early termination of their development after

Until the structure of poorly characterised protein families such as Pfam family PF09335 can be elucidated experimentally, ab initio protein modelling can be used to predict a fold, allowing for structure-based function inferences (Rigden et al., 2017). Such methods have made significant strides recently due to the availability of contact predictions (Kinch et al., 2016). Prediction of residue-residue contacts relies on the tendency of contacting residue pairs to covary during evolution. Covariation occurs because the properties of the two residues complement each other in order to maintain the structural integrity of that local region and, consequently, its original functionality. Therefore, if one residue of the pair is replaced, the other must also change to compensate for the physicochemical variation and hence preserve the original structure (Lapedes et al., 1999). The link between two residues can then be reliably detected in multiple sequence alignments by using direct coupling analysis

In the current study, we utilised state of the art methods to make structural predictions for two prominent members of the Pfam family PF09335 (Tmem41b and Yqja) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling. We are able to predict that PF09335 homologues (VTT proteins) contain re-entrant loops (stretches of protein that enter the bilayer but exit on the same side of the membrane) as well as a pseudo-inverted repeat topology. The predicted presence of both of these structural features strongly suggests that VTT proteins are secondary active transporters for an uncharacterised substrate.
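
The covariation signal described above can be illustrated with a toy calculation. The sketch below uses mutual information between alignment columns, which is a simpler statistic than direct coupling analysis and, unlike it, does not separate direct from indirect correlations; it is not the method used in this study. The function name and the four-sequence alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 swap charge together (K/E <-> E/K),
# whereas column 1 is invariant and therefore carries no covariation signal.
toy_msa = ["KAEL", "KAEL", "EAKL", "EAKL"]

covarying = column_mutual_information(toy_msa, 0, 2)
invariant = column_mutual_information(toy_msa, 0, 1)
print(covarying, invariant)
```

In this toy case the compensating pair scores higher than the pair involving the invariant column, which is the qualitative behaviour that contact predictors exploit on real, much deeper alignments.
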
Dataset for custom re-entrant database

A library of re-entrant loop sequences, together with the putative re-entrant loop sequences from the query proteins, was clustered to establish any visible relationships among the sequences.

Table 1). The Mt2055 sequence originates from the unpublished draft genome of the archaebacterium Methanolobus tindarius DSM 2278. For many of the subsequent analyses, the shorter archaeal sequence was used initially but the clear homology among this set of proteins means that inferences can be drawn across the group.

There are no known experimental protein structures representing PF09335 or PF06695, but both Gremlin and DMPfold have constructed ab initio models for these Pfam domains and contact density results therefore pointed to a specific domain structure being present.

Sequence & contact prediction map analysis indicate that PF06695 is made up of a tandem repeat

When the Mt2055 sequence was split between residues 60 and 61, the resulting N-terminal region of 60 residues and the C-terminal section of 79 residues could be aligned using HHalign (Soding, 2005) with a 78% probability and an E-value of 1.9E-3. Examination of the map of predicted contacts for Mt2055 reveals features that are present in both the N- and C-terminal halves of the protein (Figure 1c). Taken together, these data strongly support the existence of a tandem repeat within the Mt2055 protein and hence across the PF06695 and PF09335 protein families.
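
One simple way to quantify the observation that contact-map features recur in both halves of a protein is to ask how many N-terminal contacts reappear when shifted by a fixed number of residues. The sketch below is illustrative only and is not the procedure used in the paper: the function names, the rigid-offset assumption (no insertions or deletions between repeat units) and the toy contact list are all assumptions.

```python
def repeat_overlap(contacts, offset, split):
    """Fraction of N-terminal contacts that reappear shifted by `offset`.

    `contacts` is an iterable of (i, j) residue-index pairs; a contact counts
    as N-terminal when both residues have an index below `split`.
    """
    contact_set = {tuple(sorted(pair)) for pair in contacts}
    n_half = [(i, j) for (i, j) in contact_set if j < split]
    if not n_half:
        return 0.0
    hits = sum((i + offset, j + offset) in contact_set for i, j in n_half)
    return hits / len(n_half)


def best_offset(contacts, split, offsets):
    """Offset (from `offsets`) that maximises the repeat overlap."""
    return max(offsets, key=lambda d: repeat_overlap(contacts, d, split))


# Hypothetical predicted contacts: two N-terminal contacts recur 60 residues
# downstream, one does not, and (3, 100) links the two halves.
toy_contacts = [(5, 20), (8, 17), (10, 30), (65, 80), (68, 77), (3, 100)]

print(repeat_overlap(toy_contacts, offset=60, split=60))
print(best_offset(toy_contacts, split=60, offsets=range(40, 80)))
```

A real analysis would need to tolerate length differences between the repeat units (here 60 and 79 residues), for example by first aligning the two halves and mapping contacts through that alignment.
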

Interestingly, however, an equivalent sequence analysis with HHpred of other PF09335 homologues, including Tmem41b itself, does not reveal a repeat. However, inspection of their corresponding predicted contact maps does reveal features that are repeated when the N- and C-halves of the protein are compared (Figure 2). Apparently, evolutionary divergence has removed all trace of the repeat sequence signal in bacterial and eukaryotic proteins, although the feature remains visible by evolutionary covariance analysis. Pfam domain boundaries for PF09335/PF06695, which define the limits of these previous modelling exercises, do not reflect the conserved structural domain that we predict. Given that the available ab initio models were inconsistent with the transmembrane helix, secondary structure and contact predictions (data not shown), we constructed our own models of Mt2055 as well as Tmem41b and Yqja with trRosetta.

The Mt2055, Tmem41b and Yqja models had estimated TM scores from the trRosetta server of 0.633, 0.624 and 0.635, respectively, suggesting that they were likely to have captured the native fold of the family. All-against-all pairwise structural superposition of the models with DALI gave a mean Z-score of 11.9, confirming their strong similarity. We also used satisfaction of predicted contacts to validate the models (

The presence of a re-entrant loop packed against each TM helix can also be seen on predicted contact maps for these proteins (Figure 3b). Interestingly, each of the re-entrant helices is predicted as a single transmembrane region in the TopCons predictions (see the diagonal of Figure 3b) with a two-residue region of coil in the centre. Such a prediction would more obviously be treated as indicative of some kind of kink in the helix (Law et al., 2016), but the explanation here is that these regions form re-entrant helices. Similar contact map features, indicative of re-entrant loops packing against TM helices, can be seen clearly on the contact maps of other VTT proteins (data not shown).
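
The idea of validating a model by satisfaction of predicted contacts can be sketched as the fraction of predicted residue pairs that lie within a distance cutoff in the model. This is a minimal illustration rather than the authors' pipeline: the 8 Å cutoff, the use of a single representative atom per residue, the function name and the toy coordinates are all assumptions.

```python
import math


def contact_satisfaction(predicted_contacts, coords, cutoff=8.0):
    """Fraction of predicted contacts satisfied by a model.

    `predicted_contacts` is a list of (i, j) residue numbers and `coords`
    maps each residue number to an (x, y, z) tuple in angstroms, for example
    the C-beta position (an assumption of this sketch).
    """
    if not predicted_contacts:
        return 0.0
    satisfied = sum(
        1
        for i, j in predicted_contacts
        if math.dist(coords[i], coords[j]) <= cutoff
    )
    return satisfied / len(predicted_contacts)


# Hypothetical three-residue "model": only residues 1 and 2 are within 8 A.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
toy_predictions = [(1, 2), (1, 3), (2, 3)]

print(contact_satisfaction(toy_predictions, toy_coords))
```

A model that satisfies a high proportion of the top-ranked predicted contacts is more likely to have the correct fold, although the threshold for "high" depends on the depth of the alignment and the predictor used.
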

The analysis was performed by HELIQUEST (Gautier et al., 2008) which constructed helical wheel diagrams and provided a quantitative measure of the hydrophobic moment for the region being analysed (Figure 4 ).
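
For readers unfamiliar with the hydrophobic moment, the quantity can be sketched as the length of the vector sum of per-residue hydrophobicities placed around a helical wheel, divided by the number of residues. The code below is an illustration under stated assumptions and does not reproduce HELIQUEST: it uses the Kyte-Doolittle hydropathy scale (HELIQUEST uses a different scale), an ideal alpha-helical turn of 100 degrees per residue, and hypothetical test peptides.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment of a peptide modelled as an ideal helix.

    Each residue k contributes a vector of length scale[residue] pointing at
    angle k * angle_deg around the helix axis.
    """
    angle = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(k * angle) for k, aa in enumerate(sequence))
    cos_sum = sum(scale[aa] * math.cos(k * angle) for k, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum) / len(sequence)


# Hypothetical peptides: a uniform poly-leucine helix has no preferred face,
# whereas an alternating Leu/Lys pattern segregates polar and apolar residues.
uniform = hydrophobic_moment("L" * 18)
amphipathic = hydrophobic_moment("LKKLLKKLLKKLLKKLLK")
print(uniform, amphipathic)
```

A large moment indicates that hydrophobic residues cluster on one face of the helix, which is the signature of an amphipathic helix; a value near zero indicates no such segregation.
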

Mapping conservation onto the models using the ConSurf server indicates that the re-entrant loops are highly conserved and therefore likely to be functionally and/or structurally important (Figure 5).

The presence of re-entrant loops and the high density of conserved residues within them led us to examine experimentally characterised re-entrant loops in the PDBTM database. A total of 56 non-redundant re-entrant helices were identified (see Methods). All 56 were clustered with the putative re-entrant loops from Mt2055 and four PF09335 homologues (Tmem41b, Tvp38, Ydjx and Ydjz) using relative E-values derived from an all-against-all BLAST run in CLANS

Analysis of the Cl-/H+ antiporter structures shows that they contain an inverted repeat similar to the one we infer for the VTT homologues, resulting in a pseudo-two-fold axis of symmetry running along the membrane (Duran & Meiler, 2013). Similarly, the Cl-/H+ antiporter 3orgA also contains the amphipathic helices on the N-terminal side of the re-entrant loops. The fact that the amphipathic helices are found only in 3orgA and not in all homologues suggests that these features are not essential for function (Figure 5).

The presence of re-entrant loops in a transmembrane protein strongly indicates a transporter or pore functionality, since this structural feature has, hitherto, only been found in proteins of this kind (Yan & Luo, 2010). The structural similarities between the VTT proteins and the Cl-/H+ antiporters raise the possibility that the families studied here are, in fact, unsuspected distant homologues having this putative pore feature in common. In that regard it is relevant to recall a hypothesis that DedA proteins are H+ antiporters resulting from SDM experiments (Justice et al., 2016).

A recent study has identified key residues (Figure 6) (Figure 7). Re-entrant loops are known to form pores and here we have two proton-titratable residues (E39, D51) in close proximity to essential basic residues (R130 and R136) within a putative pore. This three-dimensional arrangement of key residues could serve a role in coupling the protonation status with the binding of a yet-to-be-characterised substrate, as is postulated for the multi-drug H+ antiporter MdfA (Heng et al., 2015), where these same residues are located inside a central cavity. Additionally, the fact that VTT proteins show structural similarities with H+ antiporters indicates that these proteins may also couple substrate transport with an opposing H+ current. Indeed, the Yqja homologue also contains strategically placed residues known to be involved in H+ antiporter activity.

The ab initio models show that the essential residues come together in the region that would be buried in the membrane potentially forming a substrate chamber consistent with the transport of a specific substrate. Further research needs to be carried out to determine what this substrate is and confirm the mechanism of transport.

All data underlying the results are available as part of the article and no additional source data are required.

The sentence: "The analysis was performed by HELIQUEST (Gautier et al., 2008) which constructed helical wheel diagrams and provided a quantitative measure of the hydrophobic moment for the region being analysed (Figure 4 )." is out of context. That paragraph describes the re-entrant helices; shouldn't the sentence (and the figure) be in the preceding paragraph, where the amphipathic helices are mentioned? The discussion of the figure in the text should be extended.

transmembrane proteins but interpretation of results of such analysis should be done carefully. While the generated models in the manuscript are interesting and might be fully or partly true, the sequence analysis and interpretation of the results are problematic.

Major:

- The authors should be more specific about the exact boundaries of Pfam domains in different proteins as well as the sequence relations of proteins presented in Table 1. Please provide a multiple sequence alignment for these proteins indicating the localization of the two Pfam domains and the proposed re-entrant loops/transmembrane regions in the sequences.

- The authors propose that Mt2055 contains a tandem repeat and suggest that the duplication is present in the Tmem41b and Yqja structures as well, even though it is undetectable from sequence analysis. The proposed domain boundary in Figure 1a and the arguments for tandem duplication do not seem convincing. The E-value of 1.9E-3 is quite large for the alignment. The authors should rule out that the results in their paper may occur purely by chance. Please test the statistical significance of this value by generating pairwise alignments of transmembrane regions of unrelated transmembrane proteins of similar length. Moreover, contact maps for Mt2055 and Tmem41b were generated from the same multiple alignment, and therefore they must be identical/similar. Thus the similarities do not prove that the tandem duplication occurred in Tmem41b too.

- Structure modeling of membrane proteins is somewhat different from that of globular ones for several reasons. It is highly recommended to use specific software for this task or to explain why a nonspecific one was used. On one hand, in general, topology prediction is more accurate than structure modeling and should be used as an input to aid the modeling. The reviewer is not sure the result of a standard ab initio structure modeling program is sufficient to question topology prediction results. On the other hand, topology prediction results are different for Tmem41b (6 TM helices) and Mt2055 (4 TM helices). Notably, another consensus topology method (CCTOP) gives a similar result for Mt2055 (4 helices), but a different one for Tmem41b (6 helices). Using a third method (Octopus), a re-entrant loop is predicted. The authors should elaborate on such results instead of picking one method and running it on only one of the sequences.

- Authors state: "For many of the subsequent analyses, the shorter archaeal sequence was used initially but the clear homology among this set of proteins means that inferences can be drawn across the group." Please provide the multiple sequence alignment used, with pairwise similarities, to support this statement.

- It is not clear how helical wheels and hydrophobic moments support the manuscript; please provide a better description or omit these results.

-Problems/Validation of re-entrant loops:

The authors selected 56 sequence regions from the PDBTM database, ran an all-against-all BLAST search and created clusters based on the search results. Since the sequence complexity of membrane regions is lower than that of regions of globular proteins, the analysis should be repeated on randomly selected transmembrane segments. Please provide the list of the selected 56 re-entrant loops together with the results of the repeated analysis.

Authors state: "The presence of a re-entrant loop packed against each TM helix can also be seen on predicted contact maps for these proteins (Figure 3b) ." Re-entrant loops cannot be seen on a contact map, only parallel and anti-parallel structures. A similar contact map can easily be generated from 3 transmembrane helices (1 parallel pair and two anti-parallel ones).

The authors filtered out any sequences shorter than 10 or longer than 20 residues. Although the exact sequence localisation and length of the predicted re-entrant loops are not provided, the regions indicated as the "sign" of re-entrant loops on Figure 3b are larger than 20 residues, and on the structures the orange regions contain 7 turns, thus their sequence length should be more than 20 residues (7*3.5=24.5).

Minor:

- Abstract/Results: "The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis." As I showed above, this statement might not be true. Moreover, evolutionary covariance data are the result of sequence analysis, so this sentence is void of sense. Please rephrase.

- Introduction: "there are eight E. coli representatives of the DedA family (YqjA, YghB, YabI, Yoh, DedA, YdjX, YdjZ, and YqaA)" The character D is missing in Yoh.

- Introduction: "In the current study, we utilised state of the art methods to make structural predictions for two prominent members of the Pfam family PF09335 (Tmem41b and Yqja) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling." Most of the manuscript deals with the sequence analysis of Mt2055; please rephrase this sentence to reflect this fact.

- "Interestingly, each of the re-entrant helices is predicted as a single transmembrane region in the TopCons predictions (see the diagonal of Figure 3b ) with a two-residue region of coil in the centre." TOPCONS does not predict coils and such details cannot be seen on the figure; please clarify this sentence.

Is the study design appropriate and is the work technically sound? Partly

Are all the source data underlying the results available to ensure full reproducibility? No

Are the conclusions drawn adequately supported by the results? Partly

function as a proton dependent antiporter. While I am not fluent in the computational methods used, their predictions do align with published experimental work. The manuscript is well written and informative, but contains a number of factual errors. I also suggest additional citations.

I would like to begin with nomenclature. I received an email from Dr. Noboru Mizushima several months ago. He has published work on the TMEM41B protein. Also included on the email were Lucy Forrest, Dirk Schneider, and Rebecca Keller. It was Dr. Mizushima's suggestion to name this protein family the "DedA superfamily", which includes both prokaryotic and eukaryotic proteins (DedA, VMP, and TMEM41 families). Accordingly, the shared domain will be called the "DedA domain" and "VTT domain" would no longer be used. All recipients of this email agreed to use this nomenclature moving forward. Therefore, to avoid confusion, I would like the authors to adopt this nomenclature. I can forward the email upon request.

Since the manuscript contains no line numbers, I will list the suggested corrections by paragraph: Introduction: Paragraph 1: Formally, "membrane proteins" also include various lipid-modified proteins of both prokaryotes and eukaryotes in addition to integral and peripheral membrane proteins.

Paragraph 4: "DedA" does not stand for "death effector domain". It was named in a 1987 paper [1]. See page 12213 of that article. I would like to see this article cited as well for historical purposes.

The sentence that begins with "Phenotypically, DedA knockout E. coli…" should instead read "Phenotypically, E. coli lacking both yqjA and yghB (encoding proteins with 60% amino acid identity and partially overlapping functions)…." This paragraph should also cite [2].

The sentence that reads "As E. coli expresses multiple DedA homologues, the redundancy protects the cells from the phenotypical effects of single or multiple knock-outs as long as at least one DedA is expressed" should read "As E. coli expresses multiple DedA homologues, lethal effects are not observed as long as at least one DedA is expressed". Cite the following article [3].

You may also point out that the sole DedA family gene in Borrelia burgdorferi is indeed essential [4].

YqjA is misspelled "YdjA".

The sentence "Attempts to rescue…." should be removed, as it does not make sense.

The final sentence about Pseudomonas cites a non-peer-reviewed proceedings abstract. I would like all citations to "Justice et al. 2016" removed from this article. This sentence can be replaced with the equally effective "The functions of DedA have also been studied in the pathogen Burkholderia thailandensis where one family member was found to be required for resistance to polymyxin" [5].

Paragraph 6: "YqjA" is spelled "Yqja" here and throughout the manuscript and should be corrected. This includes in Table 1 .

Paragraph 1: Please spell "Ydjx" and other bacterial proteins as "YdjX" with the final letter capitalized.

Paragraph 5: first sentence, remove "however".

Paragraph 14: "A possible role for VTT proteins", final sentence: remove "Justice et al." and instead cite [6, 7]. Also, in this sentence, define "SDM" as "site directed mutagenesis".

Paragraph 15, first sentence. This statement is incorrect. Mutation of D51, E39, R130 or R136 in YqjA resulted in properly folded (membrane localized) but nonfunctional proteins unable to complement the alkaline pH sensitivity of the E. coli YqjA mutant and the antibiotic sensitivity of the YqjA/YghB double mutant.

Finally, another interesting example of a membrane protein antiporter with re-entrant helices is the undecaprenyl pyrophosphate phosphatase UppP. It is up to the authors if they would like to cite these articles [8, 9].

1. The hisT-purF region of the Escherichia coli K-12 chromosome. Identification of additional genes of the hisT and purF operons

2. Temperature sensitivity and cell division defects in an Escherichia coli strain with mutations in yghB and yqjA, encoding related and conserved inner membrane proteins

3. Multiple deletions reveal the essentiality of the DedA membrane protein family in Escherichia coli. Microbiology (Reading)

4. BB0250 of Borrelia burgdorferi is a conserved and essential inner membrane protein required for cell division

5. A DedA Family Membrane Protein Is Required for Burkholderia thailandensis Colistin Resistance

6. Members of the conserved DedA family are likely membrane transporters and are required for drug resistance in Escherichia coli

7. Identification of essential arginine residues of Escherichia coli DedA/Tvp38 family membrane proteins YqjA and YghB

8. Crystal structure of an intramembranal phosphatase central to bacterial cell-wall peptidoglycan biosynthesis and lipid recycling

9. Crystal structure of undecaprenylpyrophosphate phosphatase and its role in peptidoglycan biosynthesis

Reviewer Expertise: Topology and structure prediction of transmembrane proteins.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Reviewer Report 17 December 2020 https://doi.org/10.5256/f1000research.30592.r75807

Partly

Are all the source data underlying the results available to ensure full reproducibility? Yes

Competing Interests: No competing interests were disclosed.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
