key: cord-0853611-csgmzycm authors: Mamun, Abdulla Al; Akter, Farjana; Khan, Maksud; Ahmed, Sayeda Samina; Uddin, Md. Giash; Tasfia, Nabila Tabassum; Efaz, Faiyaz Md.; Ali, Md Ackas; Sultana, Mossammad Umme Chand; Halim, Mohammad A. title: Identification of potent inhibitors against transmembrane serine protease 2 for developing therapeutics against SARS-CoV-2 date: 2021-09-30 journal: Journal of biomolecular structure & dynamics DOI: 10.1080/07391102.2021.1980109 sha: 33eeeb1656f499e7a34e14cfdeb81479286f0155 doc_id: 853611 cord_uid: csgmzycm In viral binding and entry, the Spike(S) protein of SARS-CoV-2 uses transmembrane serine protease 2 (TMPRSS2) for priming to cleavage themselves. In this study, we have screened ‘drug-like’ 7476 ligands and found that over thirty ligands can effectively inhibit the TMPRSS-2 better than the control ligand. Finally, the three best drug agents L1, L2, and L6 were selected according to their average binding affinities and fitting score. These ligands interact with Asp435, Cys437, Ser436, Trp461, and Cys465 amino acid residues. The three best candidates and a reported drug Nafamostat mesylate (NAM) were selected to run 250 ns molecular dynamics (MD) simulations. Various properties of ligand-protein interactions obtained from MD simulation such as bonds, angle, dihedral, planarity, coulomb, and van der Waals (VdW) were used for principal component analysis (PCA) calculation. PCA discloses the evidence of the structural similarities to the corresponding complexes of L1, L2, and L6 with the complex of TMPRSS2(TM) and Nafamostat mesylate (TM-NAM). Moreover, Quantitative structure-activity relationship (QSAR) pattern recognition was generated using PCA for the investigation of structural similarities among the selected ligands. Multiple Linear Regression (MLR) model was built to predict the binding energy compared to the binding energy obtained from molecular docking. The MLR regression model reveals an accuracy of 80% for the prediction of the binding energy of ligands. ADMET analysis demonstrates that these drug agents are appeared to be safer inhibitors. These three ligands can be used as potential inhibitors against the TMPRSS2. Communicated by Ramaswamy H. Sarma A major viral respiratory disease appeared in Wuhan, Hubei Province, China, in December 2019 (Jaimes et al., 2020; Wu et al., 2020; Yuan et al., 2020) . The initial cluster of infections known as Coronavirus infected not only animals but also humans. Later, it occurred transmission at human-human and spread quickly from China to over the world Gorbalenya et al., 2020; . This new corona type virus is known as Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) since eighty-two percent of similar RNA genome analogous found to the SARS coronavirus (SARS-CoV), and both viruses belong to the Betacoronavirus b gene clade (L. Zhang et al., 2020) . The human host cell transmembrane protease serine 2 (TMPRSS2, Uniprot-O153934) which belongs to the transmembrane serine protease (II) family is found in epithelial cells of the human respiratory and gastrointestinal tracts (Bertram et al., 2012; Shrimp et al., 2020) . It plays an important role in viral entry and spread in the host (Iwata-Yoshikawa et al., 2019) . The predominant coronavirus antigen is a surface spike glycoprotein (S) which facilitates viral entry by activating the host receptor and negotiating virus-host membrane fusion . A two steps mechanism is followed here to activate SARS CoV-2. In the first step, facilitating host-cell entry, viral entry hemagglutinin protein is connected with angiotensin-converting enzyme 2 (ACE2), encoded by the ACE2 gene, that is expressed on respiratory epithelial cells. In the second step, cleavage of hemagglutinin is occurred to activate the internalization of the virus. This second step is proceeded through proteases particularly TMPRSS2, on the host cell (Walls et al., 2020) . It is also assumed that the S protein of SARS CoV-2 is activated by TMPRSS2 and furin. SARS-CoV-2 S protein is cleaved by furin at the S1/S2 site. The S2 0 site of S protein is cleaved by TMPRSS2 (Bestle et al., 2020) . Here, spike protein interacts with ACE2 as the passage receptor and utilizes the cell serine protease TMPRSS2 for S protein preparation Kumar et al., 2020) . Thus, TMPRSS2 is also responsible for SARS-CoV-2 replication. It has already been shown that TMPRSS2 and HAT split and activate 229E-for cathepsin L-host cell entry. TMPRSS2 triggers diverse human respiratory viruses, therefore, it may stimulate viral propagation in humans (S. Bertram et al., 2013) . Nafamostat mesylate, a serine protease inhibitor, inhibits TMPRSS2-dependent host cell entry of MERS-CoV (Yamamoto et al., 2016) . Recently, SARS-CoV-2 infection of human lung cells is highly blocked by nafamostat mesylate than camostat mesylate while these compounds had inactive against vesicular stomatitis virus infection . In light of the proven safety of nafamostat mesylate, we consider this as a standard drug for COVID-19 treatment. This study is conducted to screen a large chemical database with the final goal of identifying the best ligands for design an effective drug for COVID-19. We explore potent ligand candidates, which are found to be more effective than nafamostat mesylate to inhibit the activation of TMPRSS2, and this is primarily responsible for the entrance of SARS CoV-2 ( Figure 1 ). The 3D structure of the TMPRSS2 protein was unavailable in the PDB database (http://www.rcsb.org/) (Berman et al., 2002) . Therefore, the amino acid sequences of the protein were collected from the UniProt database (UniProtKB ID O15393) (Apweiler, 2009) , and the 3D structure of TMPRSS2 was generated by using the SWISS-MODEL server (Waterhouse et al., 2018) . Further, the structure validation of the predicted model was investigated by the Phi/Psi Ramachandran plot using PROCHECK Figure 3 (R. A. Laskowski et al., 1993) . The quality of the protein model was calculated by employing the Protein Structure Analysis (ProSA) Figure S1 (Wiederstein & Sippl, 2007) . Then, the generated structure of targeted protein TMPRSS2 was modelled using the template of Serine protease hepsin (PDB ID:5CE1) (Meng et al., 2020) as shown in Figure S2 . Moreover, Secondary structure analysis carried out by discovery studio and PDBsum shown in Figure 2A and 2B (Laskowski et al., 2005) . Finally, TM-align algorithm was also employed and the best model was selected based on TM-score (Y. Zhang & Skolnick, 2005 ). The predicted model structure does not provide information on the location of hydrogens, commonly referred to as the protonation state. The accurate prediction of the correct protonation state, especially within the binding interface, is crucial to accurately predict the correct binding mode and, binding affinity (Onufriev & Alexov, 2013; Petukh et al., 2013) . The protonation state of ionizable residues (Asp, Glu, Arg, Lys, Tyr, His, Cys) was set under pH 6.5 based on the pKa values calculated by the Hþþ server (Anandakrishnan et al., 2012; Gordon et al., 2005) . Total 'drug-like' 7476 ligands were predicted against the TMPRSS2 gene (UniProtKB ID O15393) in the ZINC15 database (http://www.zinc15.docking.org/) (Apweiler, 2009; Glowacka et al., 2011; Sterling & Irwin, 2015) . Zinc15 performed the prediction with a combined method of Similarity Ensemble Approach (SEA) and maximum Tanimoto similarity. It also uses compound annotations from ChEMBL Version 21 (Irwin et al., 2018) . In addition, these ligands have the formula mass of less than 700 g/mol, calculated partition co-efficient (clogP) ranges from À1 to À5, and hydrogen bond donors and acceptors are less than 5 and 10, respectively. To optimize the ligands, UFF (universal force field) (Artemova et al., 2016; Zorn et al., 2008) was used with the steepest descent optimization algorithm along with the 2000 number of minimization steps. Using MGL software package in AutoDock Tool (ADT), the protein and ligands PDB files are converted into pdbqt formats. After preparing of protein and ligands, the grid box of AutoDock Vina (using PyRx-0.8 version) was created counting the active binding sites of the targeted TMPRSS2 Model (TM) in which centers were 15.98 Å, -7.27 Å, 24.45 Å, and the dimensions were 18.33 Å, 19.42 Å, 17. 66 Å for X, Y, Z directions respectively. The exhaustiveness number is 8, and nine binding modes for every tested ligand. The ligand's binding affinities were measured in kcal/mol as a negative score unit in which the higher negative score was equal to the more binding affinity. The preliminary virtual screening of 7476 ligands was conducted by AutoDock Vina (Trott & Olson, 2010) docking protocol. Molecular docking simulation of every drug candidate was performed against the targeted active binding sites (HIS296, ASP435, SER436, CYS437, GLN438, GLY439, TRP461, GLY462, GLY464, and CYS465) of TM protein, and the binding sites were collected from various literatures (Hempel et al., 2021; Hussain et al., 2020; Kumar et al., 2020; Singh et al., 2020) . According to the measured binding affinity, the top thirty-one potential drug candidates were selected for further study. To get more insight about the docking analysis of top thirty-one ligands obtained AutoDock Vina, GOLD 5.7 (Genetic Optimization for Ligand Docking) docking (Jones et al., 1997) was also performed to support the AutoDock Vina results. The Hermes visualizer in the GOLD suite was applied to prepare the ligand and protein. For further analysis of docked protein-ligand complexes, Discovery Studio Software (version 19.1.0.18287) was used. AutoDock Vina result showed that L1 (ZINC000116505796, (R)-N-(4-carbamoylbenzyl)-1-(1-hydroxy-2naphthoyl) pyrrolidine-2-carboxamide), L2 (ZINC000436573789, (R)-N-(3-carbamoylbenzyl)-1-(1-hydroxy-2-naphthoyl)pyrrolidine-2carboxamide) and L6 (ZINC000026291155, N-(4,6-diphenylpyrimidin-2-yl)cyclohexanecarboxamide obtained more binding affinity as well as non-covenant interaction compared to NAM (Nafamostat). Moreover, the pKa values of the potent three ligands were calculated by the chemaxon server (https://www. chemaxon.com) shown in Table 1 . Therefore, these three and NAM were selected for molecular dynamics simulation. Molecular dynamics simulation was performed using the YASARA Dynamics software (Krieger et al., 2004) for Apo form (without ligand) and holo forms (ligand-protein) of TMPRSS2 model to understand their conformational dynamics. The AMBER14 force field was used for all calculations (Dickson et al., 2014 ). The total system had 52, 583 atoms where water molecules were included, which was neutralized by adding NaCl salt at 0.9% concentration (Krieger et al., 2006) at 298 K temperature. At physiological conditions, the Berendsen thermostat was used to control the simulation temperature and it was carried to the short-range Van der Walls and Coulomb interactions, and long-range electrostatic particle-mesh Ewald (PME) method (Darden et al., 1999) was utilized at a cut-off radius of 8 Å. For the simulation, the periodic boundary condition was included, where the box size was (88.42Åx76.53Åx76.89 Å). Using the time stage 1.25 fs (Krieger & Vriend, 2015) , simulation snapshots were saved at every 100 ps. The primary energy minimization process of each simulation system was continued to carry out using the simulated annealing method, the steepest gradient approach was applied in 5000 cycles. Molecular dynamics simulations were performed over 250 ns at constant pressure and Berendsen thermostat. Most favored regions are represented by colored red and additional allowed, generously allowed, and disallowed regions are designated as yellow, light yellow, and white fields, respectively. Energy, bond distance, bond angle, dihedral angle, hydrogen bond, solvent accessible surface area (SASA), radius of gyration (Rg), root mean square fluctuation (RMSF), and root mean square deviation (RMSD) values for backbone and heavy atoms were collected throughout the analysis for further calculations. PCA is one of the most recognizable and best-known techniques of multivariate analysis. In our study, the main objective of applying this method is to express the correlation among the variables in the collected energy profile of MD trajectory data (R. Islam et al., 2020) . PCA represents the multivariate response that is arranged in an X matrix into a product of two new matrices as indicated in the following equation: Here, T k is the matrix of scores which represents how samples relate to each other, P k is the matrix of loadings which contains information about how variables relate to each other, k is the number of factors included in the model and E is the matrix of residuals, which contains the information not retained by the model. Different energy profiles may be observed between complexes of four ligands with protein and main protein, i.e. apo-protein during MD simulation. Through the PCA algorithm, the difference in energy profile can be identified (J. Islam et al., 2019) . R platform was used to perform all calculations using in-house developed codes (Team, 2014) . The ggplot2 package (Wickham, 2008) was used to generate the plot. Before applying the PCA algorithm (Martens & Nï, 1992) , autoscale function was used to preprocess data. The PCA was analyzed by taking the final 150 to 250 ns of MD trajectory data. Then binding free energy was calculated using PRODIGY-LIG webserver (Vangone et al., 2019) . In this server, they predicted the binding free energy based on the number of atomic contacts (ACs) within the distance threshold of 10.5Å and classified the ACs according to the atom involved in the interaction (C ¼ Carbon, O ¼ Oxygen, N ¼ Nitrogen, X ¼ all other atoms). The calculation was conducted for the last 100 ns from MD simulation results of selected three and NAM complexes. The free energy calculation results were obtained using this equation 1: where, AC NN , AC CN, and AC XX, are the ACs between Nitrogen-Nitrogen, Carbon-Nitrogen, and between all other atoms and polar hydrogens, respectively. DGnoelec is the electrostatic energy calculated through the HADDOCK (van Zundert & Bonvin, 2014) refinement protocol. Out of 7476 ligands, the top potential ligands were selected based on the binding energy for the QSAR study. Initially, 21 ligands were arbitrarily selected as the 'training set' and 10 ligands were utilized as the 'test set' or 'validation samples'. Multiple linear regression (MLR) analysis was applied to find the fundamental correlation between the structure-activity relationships with the calculated binding energy values obtained from molecular docking (Fakayode et al., 2009; 2014; Liu et al., 2017) . In addition to binding energy, some other important variables like TPSA (topological polar surface area, Å 2 ), molecular weight, XLogPs-AA, ROTB count, number of H, C, O, Cl, N, and F atoms, single bonds (SB) count, double bonds (DB) count, and the number of benzene ring were taken into consideration. MLR data processing was carried out using the XLSTAT software package (Adinsoft, 2010) . The best possible pharmacokinetics and lethality profile alongside viability are the significant determinants owing to effective medication improvement (Dirar et al., 2016) . ADMET (Absorption, Distribution, Metabolism, Excretion & Toxicity) study validates the pharmacokinetics parameters of a drug molecule (Nisha et al., 2016) . ADMET structure-activity relationship database, known as AdmetSAR is an online service that gives the data in regards to the lethality, cancer-causing nature of the medications and discovers whether the potential drug candidates follow the Lipinski Rule. In our study, admetSAR 2.0 (http://lmmd.ecust.edu.cn/admetsar2/) was used as a platform to predict ADMET properties. The server was being used by uploading structure data file (sdf) and simplified molecular-input line-entry system (SMILES) strings for the modified drug. Here, the simplified molecular-input line-entry system (SMILES) of the settled ligands was submitted in the admetSAR program to check for lethality. ADMETrelated properties were calculated by counting the molecular formula (MF), the blood-brain barrier (BBB), Human either-ago-go inhibition (hEGI), Human intestinal absorption (HIA), Pglycoprotein inhibitor (PGI), Alogp, Caco-2, Carcinogenicity (binary), mol MW. Due to having significant impacts on the performance and pharmacological activity of the drug of these properties, they were considered in our study. Model validation is an essential part of homology modeling. The secondary structure analysis shows that the similarity between TMPRSS2 and hepsin is 33.82% with a Q mean value of -1.43. The structural analysis shows that there are 5 sheets, 10 helices, 6 disulphides, and 21 strands are present in the model structure of TMPRSS2 as shown in Figure 2A and 2B. The stereochemical qualities of the predicted model of TMPRSS2 protein obtained from the SWISS-MODEL server are validated by generating a Ramachandran plot using the PROCHECK (R. A. Laskowski et al., 1993; Waterhouse et al., 2018) . The shading on the Ramachandran plot showing in Figure 3 for TMPRSS2 protein is used to indicate different regions such as the red areas reveals the most favorable region for phi-psi combination. In case of better stereochemical quality of any model, maximum residues of protein (about 90%) in the core regions should be found. The plot statistics for TMPRSS2 protein showed 253 residues (84.6%) in the most favored regions, 44 residues (14.7%) are in the additional allowed regions, and also 1 residue (0.3%) in generously allowed regions whereas only 1 residue (0.3%) is in disallowed regions showing in Figure 3 and Table S1 . Table S2 represents the main and side chain parameters which also indicates the accuracy of the structure prediction. Other main chain parameters including omega angle standard deviation, bad contacts/100 residues, zeta angle standard deviation, hydrogen bond energy standard deviation, and overall G factor are determined by PROCHECK. These results also reveal that the SWISS-MODEL generated structure has better structural qualities (Table S2) . Side-chain parameters are summarized as chi1-chi2 side-chain torsion angle combinations for all residues in Table S3 indicate SWISS-MODEL generated structure exhibited a higher score in terms of overall structural accuracy. A better quality structure typically contains a much lower residue fraction in disallowed regions (Gunasekaran et al., 1996) . Therefore, the stereochemical quality of the predicted TMPRSS2 protein model is considered appropriate. The ProSA server is used to recognize the potential errors in the predicted model. Z-score obtained from this server represents the quality of the whole model and the total energy deviation of the structure can be measured from an energy distribution that is derived from random conformations. The Z-score obtained from the server is plotted compared to experimentally found Z-scores of all similar size proteins through NMR and X- ray (Wiederstein & Sippl, 2007) . The z-score obtained for TMPRSS2 protein is -8.67 denoting no significant deviation from the scores calculated for the proteins of closure structure of PDB database showing in Figure S1 . Finally, the model structure from SWISS-MODEL is also aligned with the PDB ID:5CE1 by using TM-align server, where amino acid residues are aligned relatively better with SWISS-MODEL (TM-score: 0.96; RMSD 0.91 Å) model. In summary, the results from PROCHECK, ProSA, and TM-align analysis clearly reported that the modeled structure of TMPRSS2 generated by using SWISS-MODEL program was superior in quality. Therefore, further study of the modeled structure of TMPRSS2 is carried out with SWISS-MODEL generated structure. In the modeled TMPRSS2 structure, Asp, Glu, Arg, Lys, Tyr, His, and Cys residues were protonated under pH 6.5 based on the pKa values. Further, during the protonation state, His296 residue switches to the Hip296 after protonation of the TMPRSS2 model which has a pKa value of 8.91. Histidine (His) residue exhibits three different conformations which can be protonated. The imidazole ring of the His side-chain can be protonated in a neutral confirmation at the e-nitrogen (HIE) or the d-nitrogen(HID) or in a charged (þ1) conformation (HIP) where both the eand d-nitrogens are protonated (Kim et al., 2013) . Structure-based virtual screening is performed to screen about 7476 ligands. A cut-up value -8.7 (the binding affinity of NAM, control ligand) is applied. Thirty-one ligands are shown better binding affinity than the control ligand and they are selected for further analyses. Again, selected 31 compounds were screened three times by AutoDock vina and reported their average binding affinity. At that time, the GOLD suit was used to support the Audock Vina docking results. Higher fitness scores in GOLD docking, as well as higher binding affinities in AutoDock, refer to the better docking interaction between ligand and protein. Three ligands L1, L2, L6, and NAM are docked with protonated protein TM to compare the binding affinity ( Table 2 ). The binding affinity and fitness score of the top thirty-one ligands are shown in and Table S4 . Besides, a standard TMPRSS2 inhibitor such as NAM is subjected to dock to compare the findings with the zinc database ligands. In this study, the binding affinity of 31 ligands is compared to NAM. Docking results show that three ligands have more binding affinities than NAM. The highest binding affinities are observed from the corresponding complexes of L1, L2, and L6 and while NAM shows the binding affinity of -8.5 kcal/mol. The results of the selected ligands are expected as the possible inhibitor of TMPRSS2, clearly representing the selected ligands will display stronger inhibitory effects than standard drugs. A significant number of non-covalent interactions such as hydrogen bond, hydrophobic and electrostatic interactions are detected in three ligands with TMPRSS2 as shown in Figure 4 and Table S5 . Among the selected three ligands, the L1 shows binding affinity -9.5 kcal/mol. In the TM-L1 complex, five hydrogen bonds, eight hydrophobic interactions stabilize the ligand. Three hydrogen bonds and nine hydrophobic interactions stabilize the TM-L2 complex where the TM-L6 complex shows four hydrogen bonds, eight hydrophobic and one electrostatic interaction. Two hydrophobic interactions and one electrostatic interaction are observed for the TM-NAM complex and it shows the lowest binding affinity -8.5 kcal/mol. In the TM-NAM complex, three hydrogen bonds (C-O … H-N) are detected with various amino acids including ASP435 (2.114), TRP461 (2.777), and GLY464 (2.314). Interestingly, it is observed that the ligands only show hydrogen bonds with the binding site residue ASP435, and GLY464 ( Figure 5A ). All ligands show important hydrophobic interactions with the active residues. More specifically, the best 3 ligands and NAM form important non-covalent bonds with common residues (TRP461 and GLY462 through pi-Alkyl interaction), and while LYS342, LEU 419 shows Pi-amide interaction ( Figure 5B ). 250 ns molecular dynamics simulation was performed for three protonated TM-ligand and TM-NAM complexes. Nafamostat complex (TM-NAM) is used as a control. To recognize the stability of the TM-ligand complexes, the RMSDs of the Ca atoms for the protein were measured and plotted. The protein's activity during the simulation time is depicted in Figure 6 and Table 3 . In Figure 6A , RMSDs of apo-form are reported and found lowest of any other complexes (average RMSD 2.79 Å). Lower RMSDs show greater stability. A higher fluctuation of RMSD value is observed in the TM-L6 complex in comparison to other complexes. The average RMSD value of TM-L6 is 8.26 Å. The average RMSD of TM-L2 is 3.81 Å, and little fluctuation is noticed in RMSD value, however, the ligand significantly changes its position from the initial binding pose over the simulation period ( Figure S4A ). It is also showed that the average RMSD value of TM-L1 is 3.70 Å, and it fluctuates very little throughout the MD simulation period. The average RMSD value of TM-NAM is detected at 6.49 Å. Figure 6B shows the RMSDs of the ligand. The average RMSD of the NAM ligand (average RMSD 3.16 Å) shows the lowest value compared to the average RMSD of other ligands including L1 (9.09 Å), L2 (7.86 Å), and L6 (7.74 Å). The RMSF of TM, TM-L1, TM-L2 and TM-L6 shows little fluctuation throughout the simulation period whereas TM- NAM shows the higher fluctuation ( Figure 6C ). RMSF helps to assess the dynamic activity of amino acid residues in the protein. The flexibility of amino acid residues is measured to provide a deeper insight into the degree to which protein flexibility is influenced by ligand binding. The RMSF of residues (146-255) fluctuated substantially during the interaction between protein and ligands. The average RMSF value of apo-protein, TM-L1, TM-L2, TM-L6, and TM-NAM are 1.82 Å, 1.78 Å, 1.84 Å, 2.89 Å, 3.74 Å respectively. For the majority of residues, a small RMSF (within 1.84 Å) in the TM-L2 complex is found, a high fluctuation of 2.74 Å is identified in the loop region adjacent to the residue 150, and a maximum fluctuation of 5.11 Å is noticed in the helix region of residues 220-230. Overall, RMSF of the residues, HIS296, ASP435, CYS437, GLN438, GLY439, TRP461, GLY462, GLY464, and CYS465 are lower for all ligand-bound complexes and apoprotein. It can also be said that the RMSF value of TM-L2 is lower than TM-NAM. The radius of gyration (R g ) is used to measure protein structural compactness. The lower degree of fluctuation in the simulation with its consistency implies the increased compactness and rigidity of a system. In Figure 6D ; the Rg of apo-protein (average Rg $22.06) is the most stable concerning the consistency of fluctuations in the simulation. The average Rg value of TM-L1, TM-L2, TM-L6, and TM-NAM is 22.29, 22.11, 23.72, 23 .09, respectively. Often the binding of a small molecule could have a significant impact on the structure of the protein and could alter SASA. For all complexes and apo-protein almost equal value of SASA is reported ( Figure 6E ). The average SASA value of apo-protein, TM-L1, TM-L2, TM-L6, and TM-NAM are 17414 Å 2 , 16757 Å 2 , 16890 Å 2 , 18169 Å 2 , 16853 Å 2 , respectively. Therefore, it can be proposed that the binding of TM-L2 may decrease the expansion of protein. Intermolecular hydrogen bonds play a role in conformational stability. For all complexes, the numbers of hydrogen bonds are determined. In Figure 6F , the total number of hydrogen bonds is calculated for all complexes. The average total hydrogen bond numbers for apoprotein, TM-L1, TM-L2, TM-L6, and TM-NAM complexes are 680, 637, 650, 687, and 631 respectively. For apo-protein, the highest number of hydrogen bonds is reported. The total hydrogen bond numbers found for TM-L2 are nearly analogous to TM-NAM. Over the MD simulation period, the hydrogen bond (HB) and hydrophobicity of the corresponding active residues are demonstrated in Figure S3 reveals that during simulation time ASP435 and GLY464 residues always interact with the ligand through HB. On the other side, SER436 and GLY462 amino acids also show the same interaction in MD simulation but exceptions are noticed at 20 ns for SER436 and 10 ns and 70 ns for GLY462 ( Figure S4 ). The TRP461, CYS437, and GLN438 residues show hydrophobic behavior almost continuously during MD simulation time. The rest of the residues are also shown hydrophobicity but comparatively less and follow the discontinuity. To evaluate the structural quality of protein-ligand complexes, principal component analysis (PCA) is utilized to explore the energy and structural information of four protein-ligand complexes along with apo obtained from MD simulation. The energy and structural information vary with some parameters including bond energies, bond angle energies, dihedral angle energies, planarity energies, Van der Waals energies, and Columbic interaction energies are here considered as a variable. Different cluster formation on the PCA plot as shown in Figure 7A reveals the similarity and dissimilarity between apo and protein-ligand complexes. The PC1 and PC2 score plots are considered in this study. From Figure 7A , it has been observed that the first principal component analysis (PC1) describes 45.53% of the variability and the second principal component (PC2) covers 23.74% of the variability. The scores plot shows that clusters of apo, L2 ligands are overlapped to each other to a great extent. In addition, by maintaining a small difference from apo and L2 clusters, clusters of L1 and NAM are also overlapped. Interestingly, it is shown from the loading plot ( Figure 7B ) that bond energies, bond angle, VdW are found in the same distinct area where the clusters are formed on the scores plot ( Figure 7B ). This reveals that these variables may have influenced the formation of different clusters in the scores plot. However, the cluster of L6 shows a noticeable deviation from apo's cluster. During MD simulation both TM-L6 complex exhibit the fluctuating nature that could be responsible for wider distribution in the score plot. Both the TM-L1 and TM-L2 complexes are in the lowest deviation compared to other complexes, especially the TM-L2 complex approximately overlaps with the cluster of apoprotein. Hence, it can be considered as a potential candidate. Figure 7C illustrates the binding free energy of all the ligands which are obtained from the prodigy-ligand web server. Here, the TM-L6 complex shows greater stability (around À8.9 kcal/mol) than any other complex. Generally, Quantitative structure-activity relationships studies have been broadly used popular theoretical method for modelling and pattern recognition analysis in drug design for the pharmaceutical industry, toxicology prediction of drug, clinical research, bioinformatics, petrochemical, food, and agrochemicals industry (Alam & Khan, 2017; Berhanu et al., 2012; Luco & Ferretti, 1997; Misra et al., 2017) . To get more insight into the correlation of molecular structure with biological and pharmaceutical activities, multiple linear regression (MLR) has been employed for further analysis. For developing multiple linear regression (MLR), all QSARs of ligands are significant. QSAR study on thirty-one ligands shows that molecular weight, TPSA (Å 2 ), XLogP3, H-bond donor count, and H-bond are vital QSAR contributors to the MLR listed in Table S6 and these may play an important role in binding affinity and non-covalent interaction against the TM. Principal component analysis (PCA) is used for revealing the pattern recognition of potential ligands Figure 8 . The first principal component (F1) describes 57.32% of the variability in ligand QSAR. The second principal component (F2) reveals 15.97% in QSAR variability of ligands. By close observation of the score plot, an interesting grouping is noticed. The ligands which have identical functional groups and rings are assembled on the score plot. Ligands L1, L2 and L6 show more binding affinity compared to NAM from molecular docking results. Thus, three ligands can be used to design a potential drug to target TM. The ligands L1, L2, L4, L5, L6, L10, L13, L16, L21, and L26 comprise some important groups and rings listed as amide and formamidine groups as well naphthalene, biphenyl, and pyrrolidine rings. These groups and rings are themselves bonded together through a single bond and connected to the benzene ring, hence the ligands are positioned together on the first and second quadrants of the QSAR pattern. It can be concluded that their position in the same quadrants is due to having structural similarities. Therefore, the binding energies are also observed in the range of -8.8 and -9.8 kcal/mol. It is explored that L1, L2, and L6 give the best performance against the targeted protein that supports the generated QSAR pattern. On the other side, L12, L14, L17, L24, L27, L30, and L31 are containing formamidine, diphenylmethyl and propylguanidine groups along with naphthalene, biphenyl, pyrrolidine, and benzothiazole rings. These chemical fragments are attached to the secondary amide group in the corresponding ligand structures and the ligands are clustered together on the third and fourth quadrants for their structural similarities. QSAR pattern reveals that L17 is quite different from the others located on the first quadrant which is compatible with the lower binding affinity obtained from molecular docking. The reason can be due to the presence of more branching in the ligand structure resulting in a larger size, therefore, the interaction between ligand and protein becomes less as ligand size does not fit the binding pocket of the targeted protein. In contrast, L14, L27, and L24 give comparatively more binding energy and lie in the same quadrant of the QSAR pattern. For instance, the presence of dioxosulfide and sulfur groups might enhance the possibility of more interactions. However, L6 is analyzed as the best candidate from molecular dynamics. The reason might be having of formamidine group which contains two nitrogen atoms and one double bond; hence, the density of electrons can be higher and can influence the physical and chemical properties. The main target of any MLR model is to predict the binding energy of future drug candidates. The training set is used to build an MLR model and therefore it is used to predict the binding energy of the test set for validation of drug candidates. The validation result shows similar binding energy obtained from molecular docking. BindingAffinity ¼ À9:35574 þ 0:04011 Â n N À 0:24726 Â n double bond À 0:15161 Â n single bond The binding energy obtained from molecular docking and the predicted binding energy by MLR regression for each ligand of validation ligands are exhibited in Table 4 . The capability of MLR regression to predict accurately the binding energy of the validated ligands is estimated by root-meansquare-relative percent errors (RMS%RE). The MLR regression revealed a high accuracy of 80% for the prediction of the binding energy of ligands compared with the binding energy obtained from molecular docking. Pharmacokinetic properties (absorption, distribution, metabolism, excretion, and toxicity) of the top-ranked three ligands are listed in Table 5 and the rest are in Table S7 by using the admetSAR server (http://lmmd.ecust.edu.cn/admetsar2/). Recently, drug discovery emphasizes pharmacokinetically good drugs. Structural optimization is an essential computational feature in the way to design a new drug. It is very important to optimize the pharmacokinetic parameters to find out a promising drug agent that can pass standard clinical trial criteria. Following the calculation of admetSAR, the selected three ligands are found to be non-carcinogenic as well as show good human intestinal absorption (HIA) which is a major step for transporting drugs to their targets. Again, the pharmacokinetics and pharmacodynamics profile of a drug is greatly affected by a drug transporter called Phosphorylated glycoprotein (P-gp) and also the absorption and bioavailability of a drug are boosted by the P-gp inhibitors. The ligands might have the potential to interact with p-glycoprotein which is characterized as ABC transporter. On the other hand, blood-brain barrier (BBB) is a serious obstacle to drug distribution to the central nervous system (CNS). Approximately the square route of molecular weight is inversely related to the BBB. The drugs having higher molecular weight cross the BBB sufficiently to affect the central nervous system (CNS) (Banks, 2009) . From the calculation, it has been found that the selected three ligands L1, L2, and L6 have relatively good BBB penetration probability and thus may not affect CNS function. From the observed data, it can be predicted that the selected ligands may be safe for humans. In this study, molecular docking, molecular dynamics, QSAR analysis, PCA, and ADMET tools are employed to detect potent ligands for TMPRSS2, analyzing 7476 ligands from the zinc database. A strong binding affinity is obtained for three ligands (L1, L2 and L6). Significant number of non-covalent interactions such as hydrogen bonding, hydrophobic and electrostatic interactions are detected. MD simulation is conducted to verify the stability of ligands, and found to be more stable, making more hydrogen bond interactions with TMPRSS-2, than the control NAM ligand. Here, L1, L2, L4, L5, L6, L10, L13, L16, L21, and L26 ligands contain amide as well as formamidine groups, naphthalene, biphenyl, and pyrrolidine rings. These groups and rings are themselves bonded together through a single bond and connected to the benzene ring, hence the ligands are positioned together on the first and second quadrants. On the other side, L12, L14, L17, L24, L27, L30, and L31 ligands have formamidine, diphenylmethyl and propylguanidine groups along with naphthalene, biphenyl, pyrrolidine, and benzothiazole rings. These chemical fragments are attached to the secondary amide group in the corresponding ligand structures and therefore, the ligands are clustered together on the third and fourth quadrants. The capability of MLR regression to predict accurately the binding energy of the validated ligands is measured by root-mean-square-relative percent errors (RMS%RE). The MLR regression revealed a high accuracy of 80% for the prediction of the binding energy of ligands compared with the binding energy obtained from molecular docking. It can be concluded that most of the selected ligands have shown promise and can be used to design effective antiviral drugs against the SARS-CoV-2. XLSTAT-software, version 10 3D-QSAR studies on Maslinic acid analogs for anticancer activity against breast cancer cell line MCF-7. Scientific Reports Hþþ 3.0: Automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations The universal protein resource (UniProt) in 2010 Automatic molecular structure perception for the universal force field Characteristics of compounds that cross the bloodbrain barrier Quantitative structure-activity/property relationships: The ubiquitous links between cause and effect The protein data bank TMPRSS2 activates the human coronavirus 229E for cathepsin-independent host cell entry and is expressed in viral target cells in the respiratory epithelium Influenza and SARS-coronavirus activating proteases TMPRSS2 and HAT are expressed at multiple sites in human respiratory and gastrointestinal tracts TMPRSS2 and furin are both essential for proteolytic activation of SARS-CoV-2 in human airway cells New tricks for modelers from the crystallography toolkit: The particle mesh Ewald algorithm and its use in nucleic acid simulations Lipid14: The amber lipid force field In silico pharmacokinetics and molecular docking of three leads isolated from Tarconanthus camphoratus L Chemometric approach to chiral recognition using molecular spectroscopy Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship Evidence that TMPRSS2 activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response The species severe acute respiratory syndrome-related coronavirus: Classifying 2019-nCoV and naming it SARS-CoV-2 Hþþ: A server for estimating pKas and adding missing hydrogens to macromolecules Disallowed Ramachandran conformations of amino acid residues in protein structures Molecular mechanism of inhibiting the SARS-CoV-2 cell entry facilitator TMPRSS2 with camostat and nafamostat SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor Nafamostat mesylate blocks activation of SARS-CoV-2: New treatment option for COVID-19 Molecular docking between human TMPRSS2 and SARS-CoV-2 spike protein: Conformation and intermolecular interactions Predicted biological activity of purchasable chemical space Prediction of deleterious non-synonymous SNPs of human STK11 gene by combining algorithms, molecular docking, and molecular dynamics simulation A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2 TMPRSS2 contributes to virus spread and immunopathology in the airways of murine models after coronavirus infection Phylogenetic analysis and structural modeling of SARS-CoV-2 spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop Development and validation of a genetic algorithm for flexible docking Effects of histidine protonation and rotameric states on virtual screening of M. tuberculosis RmlC Making optimal use of empirical energy functions: force-field parameterization in crystal space New ways to boost molecular dynamics simulations Fast empirical pKa prediction by Ewald summation Withanone and Withaferin-A are predicted to interact with transmembrane protease serine 2 (TMPRSS2) and block entry of SARS-CoV-2 into cells PDBsum more: New summaries and analyses of the known 3D structures of proteins and nucleic acids PROCHECK: A program to check the stereochemical quality of protein structures Synthesis, fungicidal activity, and structure activity relationship of b-acylaminocycloalkylsulfonamides against botrytis cinerea QSAR based on multiple linear regression and PLS methods for the anti-HIV activity of a large group of HEPT derivatives Multivariate calibration The transmembrane serine protease inhibitors are potential antiviral drugs for 2019-nCoV targeting the insertion sequence-induced viral infectivity enhancement Pharmacophore modelling, atom-based 3D-QSAR generation and virtual screening of molecules projected for mPGES-1 inhibitory activity Docking and ADMET prediction of few GSK-3 inhibitors divulges 6-bromoindirubin-3-oxime as a potential inhibitor Protonation and pK changes in protein-ligand binding The role of protonation states in ligand-receptor recognition and binding An enzymatic TMPRSS2 assay for assessment of clinical candidates and discovery of inhibitors as potential treatment of COVID-19 Structurebased drug repositioning over the human TMPRSS2 protease domain: Search for chemical probes able to repress SARS-CoV-2 Spike protein cleavages ZINC 15-ligand discovery for everyone R: A language and environment for statistical computing. R Foundation for Statistical Computing AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein SWISS-MODEL: Homology modelling of protein structures and complexes Applied spatial data analysis with r. In Applied Spatial Data Analysis with R ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins A new coronavirus associated with human respiratory disease in China Identification of nafamostat as a potent inhibitor of Middle East respiratory syndrome coronavirus S protein-mediated membrane fusion using the split-protein-based cell-cell fusion assay A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved a-ketoamide inhibitors TM-align: A protein structure alignment algorithm based on the TM-score An interface between the universal force field and the effective fragment potential method We are grateful to our donors who supported to build a computational platform in Bangladesh http://grc-bd.org/donate/. No potential conflict of interest was reported by the authors. The authors also like to acknowledge The World Academy of Science (TWAS) for providing funding (17-479 RG/CHE/) for this project.