key: cord-0925066-z1pw9zb3 authors: Muley, Vijaykumar Yogesh; Singh, Amit; Gruber, Karl; Varela-Echavarría, Alfredo title: SARS-CoV-2 Entry Protein TMPRSS2 and Its Homologue, TMPRSS4 Adopts Structural Fold Similar to Blood Coagulation and Complement Pathway Related Proteins date: 2021-04-26 journal: bioRxiv DOI: 10.1101/2021.04.26.441280 sha: 23d459442de51850fa97d8a48da7802e05aac6f4 doc_id: 925066 cord_uid: z1pw9zb3

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) uses the TMPRSS2 receptor to enter target human cells and subsequently causes coronavirus disease 2019 (COVID-19). TMPRSS2 belongs to the type II serine proteases of the TMPRSS subfamily, which is characterized by the presence of a serine-protease domain. TMPRSS4 is another TMPRSS member, with a domain architecture similar to that of TMPRSS2. Both TMPRSS2 and TMPRSS4 have been shown to be involved in SARS-CoV-2 infection. However, their normal physiological roles have not been explored in detail. In this study, we analyzed the amino acid sequences and predicted 3D structures of TMPRSS2 and TMPRSS4 to understand their functional aspects at the protein domain level. Our results suggest that these proteins are likely to have common functions based on their conserved domain organization. Furthermore, we show that the predicted 3D structure of their serine protease domain has significant similarity to that of plasminogen, which dissolves blood clots, and to those of other blood coagulation related proteins. Additionally, molecular docking analyses show that inhibitors of four blood coagulation and anticoagulation factors bind the TMPRSS2 and TMPRSS4 3D structures with the same high specificity. Hence, our observations are consistent with the blood coagulopathy observed in COVID-19 patients, and the functions predicted from the sequence and structural analyses offer avenues to better understand this disease and to explore therapeutic approaches for it.

ILY/SCOP database accession: 81995) (Gough et al., 2001).
This domain is likely to adopt the Human T-cell Leukemia Virus Type II Matrix Protein structural fold according to CATH database annotation (Dawson et al., 2017). However, we could not detect homologous sequences for this region in viral-genome-restricted searches using PSI-BLAST at NCBI, nor did an HHPred search predict similar structures in the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). Therefore, experimental analyses are required to address the functional aspects of both predicted domains.

[...] Figure 1B). The LDLRA domain is located right next to the membrane helix in both proteins, which is consistent with previous studies (Aberasturi and Calvo, 2015; Szabo and Bugge, 2008). This domain, which binds lipoproteins such as LDLs, contains six disulfide-bonded cysteines and a highly conserved cluster of negatively charged amino acids (Bieri et al., 1995; Yamamoto et al., 1984). All six cysteines are conserved in TMPRSS2 and its mouse ortholog, and four are also conserved in [...] with certainty but they are likely to mediate protein-protein interactions and ligand binding (Hohenester et al., 1999; Resnick et al., 1994). This domain is found in diverse secreted and membrane-bound proteins, including regulators of the complement cascades involved in the immune response (Freeman et al., 1990). The catalytic triad of Ser-His-Asp residues responsible for its proteolytic activity is conserved in human and mouse (Figure 1B).

Overall, these results reveal that the extracellular region of TMPRSS2 and TMPRSS4 and its domain organization are highly conserved, suggesting that the two proteins have related functions.

The structures 2XRC, 1Z8G, and 2OQ5 were the most closely related to the extracellular region of both proteins (Figure 1C). TMPRSS2 showed the best match with the 2XRC structure of human complement factor I encoded by [...] (Table 1).
It is noteworthy that not a single structural match was obtained for the cytoplasmic region of either protein, even when the amino acid sequences of this region were queried alone.

The high sequence similarity of TMPRSS2 and TMPRSS4 with known structures in the PDB database allowed their 3D structures to be modelled using the Phyre2 web server (Kelley et al., 2015). In both sequences, about 86% of the residues were modelled at more than 90% confidence. The 67 and 61 residues at the N-terminal ends of TMPRSS2 and TMPRSS4, respectively, were modelled ab initio due to the lack of homology with known structures. Therefore, we removed the coordinates of the first 125 and 61 amino acids from the predicted structures of TMPRSS2 and TMPRSS4 due to their low confidence prediction.

[...] (Table 2). Aprotinin also inhibits plasma kallikrein and thrombin, which are involved in blood coagulation and showed a reasonable structural match with the TMPRSS2 and TMPRSS4 serine protease domains. Therefore, we assumed that their selective ligands can also inhibit the activity of TMPRSS2 and TMPRSS4. Structures [...]

However, we did not find homologous sequences for this region in viral-genome-restricted searches using PSI-BLAST at NCBI, nor in HHPred searches of the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). This is likely because structures are more conserved than sequences, so the same fold is often adopted by proteins whose amino acid sequences differ substantially. Therefore, experimental analyses are required to address the functional aspects of both predicted domains, considering their possible roles in viral assembly and trafficking (Christensen et al., 1996; Hughes and Stephens, 2008).

The TMPRSS2 and TMPRSS4 3D structures are not available.
Hence, we first identified their structural homologs using HHPred, and also modelled 3D structures using Phyre2. We identified 29 structural homologs from 10 species with both approaches. These 3D structures were mapped to the SRCR and serine-protease domains in both proteins, but no match was found for the LDLRA domain.

[...] triad were almost identical. It has also been shown that plasminogen shares 95% identity with TMPRSS2 within the S1-S1' subsites, which are used for cleavage of the SARS-CoV-2 spike protein, and 64.71% within the S4-S4' subsites; these values were the highest among 14 serine proteases, including TMPRSS15, which was selected for homology modelling of the TMPRSS2 structure (Huggins, 2020). This suggests that the serine-protease domains of TMPRSS2 and TMPRSS4 are likely to have a protease activity similar to that of plasminogen, which dissolves the fibrin of blood clots (Storti and Szwast, 1982). Hence, these [...]

Financial support to VYM and AV-E was provided by IA203920 and IN229620 [...]

Note: TMPRSS2 and TMPRSS4 docking scores are based on predicted structures. 4BXW, 5UGG, and 2ANY are PDB structure identifiers whose protease domains were used for docking. All these structures are part of the blood coagulation factors (annotated in brackets). a Peptide-like selective thrombin inhibitor; b Plasminogen inhibitor; c Plasma kallikrein inhibitor.

Figure 1: Sequence comparison of TMPRSS2 and TMPRSS4 with their mouse orthologs Tmprss2 and Tmprss4. Panel (A) shows the domain architectures of the TMPRSS2 and TMPRSS4 proteins. The following domains with known activity are shown with their amino acid positions: Low-density lipoprotein receptor class A (LDLRA_2), Scavenger receptor cysteine rich (SRCR_2), and Serine proteases, trypsin family (Trypsin). The multiple sequence alignment of human TMPRSS2 and TMPRSS4 with their mouse orthologs is shown in panel (B).
The approximate locations of the domains shown in (A) are indicated by bars on top of the multiple alignment in (B), using the same color code as in (A). The Ser-His-Asp triad responsible for the proteolytic activity of the Trypsin domain is conserved in all four sequences, and its location is indicated by asterisks in (B). The serine protease domain and its active sites are conserved in all proteins. The LDLRA_2 domain appears to be functional in TMPRSS2 but it is truncated in TMPRSS4, which may have the cholesterol transport activity. Panel (C) shows the structural homologs of TMPRSS2 and TMPRSS4 in the PDB database.

[...] The docking score of OGJ was -4.851 kcal/mol with TMPRSS2 and -8.980 kcal/mol with TMPRSS4, whereas the scores of the inhibitor 89M were -4.332 kcal/mol and -8.564 kcal/mol, respectively. Hydrogen bonds are indicated by pink arrows and cation-pi interactions by red arrows.

Superimposition of the protease domain of predicted 3D structures of TMPRSS2 and TMPRSS4 with the scavenger receptor cysteine-rich (SRCR) domain of 1Z8G PDB structure. TMPRSS2 (A) and TMPRSS4 (B) SRCR domains are shown [...]

Note: TMPRSS2 and TMPRSS4 protein sequences were searched against the entire PDB database using the HHPred web server. The top 20 hits for each protein are reported in the table with their PDB and chain identifiers. a Probability of the target being a true positive; b The number of hits one can expect by chance with a score better than the one for the target when scanning the database; c Raw sequence similarity score; d Secondary structure similarity score between query and target; e Range of aligned match states from the query HMM; f Range of aligned match states from the target HMM.

Supplementary figure 1: The position of the membrane helix in TMPRSS2 and TMPRSS4. The combined analysis report obtained from the TOPCONS web server is shown, in which lower ΔG (free energy) values represent amino acids that are likely to be part of the transmembrane helix.
The thick red and blue lines represent the inside and outside topology of the protein, respectively. A transmembrane helix is predicted at the N-termini in TMPRSS2 (A) and TMPRSS4 (B) protein sequences.
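
As an illustration of the coordinate trimming described for the Phyre2 models (removal of the first 125 and 61 residues from the TMPRSS2 and TMPRSS4 models, respectively), the following minimal Python sketch filters coordinate records by residue number. It is not the authors' procedure, which the text does not specify. The function name `trim_n_terminus`, the helper `_atom_line`, and the toy records are hypothetical, and the sketch assumes a standard fixed-column PDB file in which the residue number occupies columns 23 to 26.

```python
def trim_n_terminus(pdb_lines, first_kept_residue):
    """Return pdb_lines without ATOM/HETATM records whose residue number
    is lower than first_kept_residue; all other records are kept as-is."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Assumption: fixed-column PDB format, residue number in
            # columns 23-26 (zero-based slice 22:26).
            res_seq = int(line[22:26])
            if res_seq < first_kept_residue:
                continue
        kept.append(line)
    return kept


def _atom_line(serial, res_seq):
    """Hypothetical helper: build a truncated ATOM record for a toy example
    (only the columns up to the residue number are filled in)."""
    return f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}"


# Toy input: four residues (60-63) followed by an END record.
demo = [_atom_line(i + 1, r) for i, r in enumerate([60, 61, 62, 63])] + ["END"]

# Dropping the first 61 residues means residue 62 is the first one kept.
trimmed = trim_n_terminus(demo, first_kept_residue=62)
print(len(trimmed))
```

Under the cut-offs stated in the text, the same call would use `first_kept_residue=126` for the TMPRSS2 model and `first_kept_residue=62` for the TMPRSS4 model, provided the model files number residues from 1 as in the full-length sequences.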