key: cord-310201-70fj4fhr authors: Wei, D.-Q.; Zhang, R.; Du, Q.-S.; Gao, W.-N.; Li, Y.; Gao, H.; Wang, S.-Q.; Zhang, X.; Li, A.-X.; Sirois, S.; Chou, K.-C. title: Anti-SARS drug screening by molecular docking date: 2006-05-22 journal: Amino Acids DOI: 10.1007/s00726-006-0361-7 sha: doc_id: 310201 cord_uid: 70fj4fhr Starting from a collection of 1386 druggable compounds obtained from the 3D pharmacophore search, we performed a similarity search to narrow down the scope of docking studies. The template molecule is KZ7088 (Chou et al., 2003, Biochem Biophys Res Commun 308: 148–151). The MDL MACCS keys were used to fingerprint the molecules. The Tanimoto coefficient is taken as the metric to compare fingerprints. If the similarity threshold was 0.8, a set of 50 unique hits and 103 conformers were retrieved as a result of similarity search. The AutoDock 3.011 was used to carry out molecular docking of 50 ligands to their macromolecular protein receptors. Three compounds, i.e., C(28)H(34)O(4)N(7)Cl, C(21)H(36)O(5)N(6), and C(21)H(36)O(5)N(6), were found that may be promising candidates for further investigation. The main feature shared by these three potential inhibitors as well as the information of the involved side chains of SARS Cov Mpro may provide useful insights for the development of potent inhibitors against SARS enzyme. The outbreak of severe acute respiratory syndrome (SARS) creates need of anti-SARS drug (Drosten et al., 2003; Ksiazek et al., 2003; Shortridge, 2003; Miura et al., 2004) . A new coronavirus (SARS CoV), a positive-sense, single-stranded RNA virus Gao et al., 2003; Du et al., 2005a, b) , has been found as the etiological agent of the disease SARS. SARS CoV main proteinase (Mpro) cleaves the polyprotein at no less than 11 conserved sites, a process initiated by the enzyme's own autolytic cleavage from pp1a and pp1ab Yang et al., 2003) . The released polypeptides mediate all of the functions required for SARS viral replication and transcription. The functional importance of the Mpro in the viral life cycle makes it an attractive target for the drugs design against SARS Yang et al., 2003; Chou et al., 2003 Chou et al., , 2004 . Anand et al. (2003) proposed a homology model of the Mpro based on the experimental Mpro structures of human coronavirus (HCoV) and porcine transmissible gastroenteritis virus (TGEV) complex. The experimental structure of SARS CoV Mpro was first determined by Yang et al. (2003) . They provided several crystal structures of SARS CoV Mpro at different pH conditions and the complex structure of SARS CoV Mpro with ligand hexapeptidyl CMK Yang et al., 2003) . The experimental structure and the theoretical homology structure of SARS CoV Mpro share many structural similarities, such as the catalytic active region and the Cys-His cleavage dyad Yang et al., 2003) . However, there are several differences between the theoretical and the experimental structures. For examples, the experimental structure of SARS CoV Mpro is a dimer, the protomer A has catalytic activity, and the protomer B has no activity. However, the N finger of protomer B, which plugs in the active cleft of protomer A, plays an important role in both dimerization and maintenance of the active form of the enzyme (Yang et al., 2003) . These new findings from the experimental structure of SARS CoV Mpro provide a reliable basis for inhibitor design. Meanwhile, it was suggested that suggested that the rhinovirus 3Cpro inhibitor AG7088 could well serve as a starting point for an anti-SARS drug based on the theoretical homology model of SARS CoV Mpro. Chou et al. (2003) and Chou (2004) found the fitting problem of AG7088 to the binding pocket of SARS CoV Mpro, and they suggested its derivative KZ7088 as a better starting point. By conducting the virtual screening for SARS-CoV protease based on KZ7088 pharmacophore points, Sirois et al. (2004 Sirois et al. ( , 2005 found 7083 unique hits without volume constraints and 2589 unique hits with volume constraints. Virtual screening has the advantages of being less expansive and easier to perform than the real experiments in searching for lead compounds; i.e., calculations could be performed on compounds that are not yet purchased or synthesized (Rayer, 2002) . Therefore, it was widely used to find initial lead structures from a large collection of compounds. Besides the studies of Siroirs et al. (2004) , other groups also reported work of screening the inhibitor of SARS (Xiong, 2003) . In the present study, by means of systematic search and molecular docking, we filtered the 2,589 candidates retrieved by Sirois et al. (2004) , and found some interesting results as reported below. From the 2589 unique hits found by Sirois et al. (2004) , we obtained a collection of 1386 druggable compounds that both match the 3D KZ7088 pharmacophore points and satisfy the druggable rules with a scoring value !0.8. To further narrow down the investigation scope for molecular docking, a similarity search of the 1,386 compounds were performed by using KZ7088 (Chou et al., 2003; Chou, 2004) as the template molecule. The compounds thus obtained were placed in a database. The fingerprints (Sheridan et al., 1996; Brown and Martin, 1996) of the temperate molecule and each of these compounds in the database were calculated. The MDL MACCS keys were used to fingerprint the molecules. The MACCS fingerprint consists of a set of indicators showing whether each of the 166 MACCS keys was found to be present in a given molecule. The fingerprint is stored internally as a vector of indices, where the presence of an index in the vector indicates the presence of the corresponding substructure in the molecule. Once a fingerprint is derived from a chemical structure, a metric is needed to compare the fingerprints. A common metric is the Tanimoto coefficient (Morris et al., 1988) , which is a number between 0 and 1, where 0 means ''maximally dissimilar'' and 1 means ''maximally similar''. If the similarity threshold was assigned as 0.8, a set of 50 unique hits and 103 conformers were retrieved as an outcome of the similarity search. AutoDock 3.011 is a suitable software for performing automated docking of ligands to their macromolecular protein receptors. The individual components of the program include AutoTors, AutoGrid, and AutoDock. AutoTors defines which bonds in the ligand are rotatable, affecting the degrees of freedom (DOF) of the ligand and hence the complexity of computations. AutoGrid pre-calculates a three-dimensional grid of interaction energies based on the macromolecular target using the AMBER force field. AutoDock performs the task of the docking. First, the ligand moves randomly in any one of six degrees of freedom, namely 3 translation degrees and 3 rotation degrees , and the energy of the new ligand ''state'' is calculated. If the energy of the new state is lower than that of the old state, the new one is automatically accepted as the next step in docking as the procedure used in (Chou and Carlacci, 1991; Chou, 1992) . Since ligands are not peptides, Gasteiger charge was assigned and then non-polar hydrogens were merged. The rigid roots of each ligands were defined automatically instead of picking manually. The amide bonds were made non-rotatable. The experimental structure of SARS CoV Mpro was extracted from the RCSB Protein Data Bank whose PDB code is 1UJ1. The Mpro is a dimer where the protomer A has the catalytic activity while the protomer B has not. Accordingly, protomer B was removed. The Kollman charges were added to each atoms of the remained protomer A (Yang et al., 2003) . The GA-LS search algorithm (Larmarckian genetic algorithm) was chosen to search for the best conformers. During the docking process, a maximum of 50 conformers was considered for each compound (the default is 10 conformers). The initial position of each ligand was on the center of the box and oriented randomly. The parameters were set using the software ADT (Autodock Tool Kit) on PC which is associated with Autodock 3.0.3. The calculations of autogrid and autodock were performed on Sun Enterprise 10,000 Server with 16 CPUs and 64GB memory. There are two kinds of free energies output by Autodock. One is the predicted binding free energy that includes the intermolecular energy and torsional free energy, and the other the docking energy that includes the intermolecular and intramolecular energies. The former is only reported at the end of a docking operation while the latter is used for selecting better individuals of population during a docking run (Morris et al., 1988) . We used both energies, respectively, as the criterion for ranking. Moreover, there are some other factors that need to be examined for checking the screening results, such as the ligand's location, hydrophobic effects, steric complementarity, and size of the ligand. Therefore, after ranking the 20-top conformers according to their low energies, the ligands were further carefully checked according to the aforementioned factors as well. Furthermore, the intermolecular hydrogen bonds and electrostatic interaction, whose effects have already been counted in the binding energy, were also investigated in order to find useful information for drug design. As shown in Rank 1 of Table 1 , the docking results are ranked according to the ascent of the docking energies of the 50 conformers for each of the ligands investigated. It was found by closely viewing the top-20 compounds docked to Mpro that some conformers are outside the active region concerned, and some even far away from the pocket. When checking some other low docking energy conformers of the same ligands, such as ligand 30, 36, 62, 29, 33, 38, we found that many of them are not buried in the protein but exposed outside. These molecules are large in size and cannot stay comfortably in the incommodious pocket. In contrast, if ranking the docking results according to the binding free energy which includes the torsional term as shown in Rank 2 of Table 1 , it was found that most of the top-20 ligands interacted quite well with the receptor in the pocket. These top 20 ligands in Rank 2 are smaller in size than the top 20 ligands in Rank 1. An example is given in Fig. 1 to show the complex of docking ligand 7 to SARS Mpro. A comparison of the results between Ranks 1 and 2 suggests that the binding free energy is more reliable as a criterion for the virtual screening via molecular docking. It seems that the torsional term, i.e. the unfavourable entropy of ligand binding, plays an important role in ligandreceptor interaction. Especially for some large-sized li-gands which suffer from more loss of torsional freedom upon binding. If the volume of the ligand does not allow it get into the pocket, the corresponding interaction energy would be meaningless. The recurrence of identical conformations of one ligand means that it fits very well to the pocket and is likely to be a good inhibitor candidate. In the output dlg file of a docking run, clusters of conformations are sorted out according to their RMDS values. We found a cluster Table 1 . The marks in this plot are the same with the ones in Fig. 2 except that ligand 12 is rendered in purple and ligand 4 in blue. Their differences increase at the right ends of the molecules, which might be caused by the fact interactions are less specific of two conformations in ligand 7 with the cluster's RMDS tolerance set to 1.0 and some other conformations identical to that two, as shown in Fig. 2 . The first-ranked Fig. 4 . Hydrogen bonds between ligand 4 and the involved residues of SARS CoV Mpro. The purple dots denote the hydrogen bonds and LIG4 denotes ligand4 and VAL_186 and MET_165 stand for the involved residues respectively. All the atoms are shown in stick drawing and colored by atom type in which hydrogen is colored white, carbon gray, oxygen red, nitrogen blue, chlorine green and sulfur yellow Mpro which shows steric hindrances between them. Connolly surface is colored blue and the ligand is colored by atom type. As shown, some atoms of ligand have penetrated the molecular surface of Mpro causing strong steric hindrance. Note that this surface is atom scalar surface different from solvent excluded surface ligand 7 according to the order of Rank 2 is a probably a potent inhibitor. Also, a comparison for the binding structures among the best conformations of ligands 7, 12 and 4 (top three from ligand 7, top two from ligand 12, and top one from ligand 4) indicates that they are quite similar to each other, especially in the five-atom ring linked to an alkyl as shown in Fig. 3 . These findings may also be useful for structure-based drug design. The hydrogen bonds make important contributions to the interactions between ligand and receptor. They are observed in several of the conformations of the ligands whose complexes with Mpro were visually examined. In best four conformations of complexes formed by ligand 4 and Mpro, there are four, three, four, two hydrogen bonds formed, respectively. Those four hydrogen bonds involve Met-165, two involve Thr-175, two involve Val-186 and one involves His-164, His-172, Tyr-182. An example of hydrogen bonding between ligand 4 and the participating residues of Mpro is shown in Fig. 4 . And there are also some hydrogen bonds formed between ligands 7, 12 and Mpro respectively in which Gly-174 occurs for five times, His-164, Arg-40 and Tyr-182 twice and Thr-175, His-172 and Leu-50 once. From the frequency of the residue's occurrence in the formation of hydrogen bonding, His-164 plays an important role, which may be due to the polar hydrogens of its imidazole ring being a perfect hydrogen bond donor. And Gly-174, Met-165, and Thr-175 also take part in the hydrogen bonding with relatively high frequency. This information could be helpful for our drug design in which the potential drug should interact well with these residues, especially with His-164. His-164 is of great concern in anti-SARS drug design which is believed to be one of the catalytic dyad Du et al., 2005a, b) . Therefore it is important to have this imidazole ring locked such that its catalytic function is depressed. It was found there were 6 hydrogen bonds between KZ7088 and nearby residues of the receptor (Chou et al., 2003) . Among all the ligands screened we have not found anyone any formed more than 4 hydrogen bonds. In comparison with the aforementioned active sites, some residues involved in the hydrogen bonds such as Gly-174, Thr-175 and Arg-40 are out of this set. Nonetheless they are quite close to the residues of the active site. Besides hydrogen bonds, there are some other contributions to the total Gibbs free energy induced by the bonding, such as desolvation, hydrophobic interaction, and electrostatic interaction. The hydrogen bond may act as an ''anchor'', determining the 3D position of the ligand in the binding pocket, thus facilitating the hydrophobic interactions and electrostatic interactions In the output of Autodock's calculations, the potential energy of van der Waals interaction and electrostatic interaction is reported separately. So we selected a few groups of molecules with the same number of atoms but with obvious difference in energy whose van der Waals potentials and electrostatic potentials were added up. In the same group, the electrostatic interaction energy varies from one to another but it is too small in comparing with the total free energy. As we have calculated, the van der Waals energy plays a main part of the total energy. There are also some other factor which should be responsible for the remaining difference of energy, varies from 1 to 3 kal=mol. Then the desolvation term emerges as a possibility which is included in autodock's scoring function. As for the torsional free energy, it depends on the number of torsions of each molecule and how the molecule is aligned in the pocket. The solvent-excluded surface of the ligand-receptor complex is examined and shown in Fig. 5 . The hydrophobic areas of the ligand and Mpro are complement with respect to each other as we can see in the Fig. 5 . Thus the entropy of desolvation upon binding increases through the depart of water molecules from the molecular surface. The binding does not take place in vacuum but in solvation, so water is deeply involved and played an important role. The study of hydrophobic interaction is relevant to the protein-folding problem (Anfinsen, 1973) and hydrophobicity (Tanford, 1962) . The model of a folded protein with the hydrophobic amino acid side chains in the interior forming an oil drop, implying that as the protein folds the hydrophobic side chains were preferentially buried away from the external solvent. In our study, the process of the ligand-receptor surface buried in each other is just like the case of protein folding. So in further rational drug design it is critic to identify the hydrophobic groups of the ligand and receptor and ensure that they are facing each other upon binding. What we have discussed above is thermodynamics of ligand-receptor binding. Moreover, the other side of this binding -kinetics should also be taken into consideration. The kinetics of binding process is mainly determined by steric complementarity (Vaidyanathan et al., 2001; Chou and Zhou, 1982; Chou, 1975 Chou, , 1976 Chou and Jiang, 1974; Zhou and Zhong, 1982) . It is the steric complementarity that makes the cavity site acces-sible for the ligand. The Connolly surface of ligand-SARS Mpro is examined and shown in Fig. 6 , where ligand 4 forms four hydrogen bonds with SARS Mpro, and fully occupies the cavities and lies comfortably in the receptor's pocket. To give evidence for the key role of steric complementarity, a poor steric complementarity or steric hindrance is shown in Fig. 7 . Although there are three hydrogen bonds in the complex, its Gibbs free energy is still 10 kal=mol higher than that of ligand 4. That means steric complementarity is supposed to be responsible for the micro-mechanical interlock at molecular level which is suggested by Vaidyanathan et al. (2001) . The primary driving force for mechanical interlocking is the steric complementarity between the ligand and the collagen receptor site. Once mechanical interlocking is facilitated by steric complementarity, other binding interactions may also occur. Therefore our further drug design or modification of the lead compounds must first take the steric configuration into account. Three compounds, C 28 H 34 O 4 N 7 Cl, C 21 H 36 O 5 N 6, and C 21 H 36 O 5 N 6 , i.e., ligands 4, 7 and 12 of Table 1 , may be promising candidates for further modification and structure-based drug design. Further studies are currently under way in our laboratories. Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs Principles that govern folding of protein chains Use of structure-activity data to compare structure-based clustering. Methods and descriptors for use in compound selection ZCURVE_CoV: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing SARS-CoV genomes Studies on the enzyme kinetics of the cavity-active site The kinetics of the combination reaction between enzyme and substrate Energy-optimized structure of antifreeze protein and its binding mechanism Review: structural bioinformatics and its impact to biomedical science Simulated annealing approach to the study of protein structures Studies on the rate of diffusion-controlled reactions of enzymes Structure of beta-sheets: origin of the right-handed twist and of the increased stability of antiparallel over parallel sheets Origin of the right-handed twist of betasheets of poly-L-valine chains Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS Role of the protein outside active site on the diffusion-controlled reaction of enzyme Identification of a novel coronavirus in patients with severe acute respiratory syndrome Molecular modelling and chemical modification for finding peptide inhibitor against SARS CoV Mpro Application of bioinformatics in search for cleavable peptides of SARS-CoV Mpro and chemical modification of octapeptides Prediction for proteinase cleavage sitess in polyproteins of coronaviruses and its applications in analyzing SARS-CoV genomes A novel coronavirus associated with severe acute respiratory syndrome N-terminal domain of the murine coronavirus receptor CEACAM1 is responsible for fusogenic activation and conformational changes of the spike protein Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function Geometric problems and algorithms in computer-aided molecular design Chemical similarity using geometric atom pair descriptors Severe acute respiratory syndrome and influenza: virus incursions from southern China Virtual screening for SARS-CoV protease based on KZ7088 pharmacophore points Assessment of chemical libraries for their druggability Contribution of hydrophobic interactions to the stability of the globular conformation of proteins Collagen-ligand interaction in dentinal adhesion: computer visualization and analysis The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor Diffusion-controlled reactions of enzymes. A comparison between Chou's model and Alberty-Hammes-Eigen's model The flexibility during the juxtaposition of reacting groups and the upper limits of enzyme reactions We thank Professor Arthur J. Olson for his kindness in offering us the AutoDock 3.0.3 program. This work was supported by the Tianjin Commission of Science and Technology under the contract No. 033801911 and the Special Fund for Intensive Computation from the Tianjin Commission of Education. Authors' address: Dong-Qing Wei, College of Life Science and Technology, Shanghai Jiaotong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China, Fax: þ86-21-3420-4717, E-mail: dqwei@sjtu.edu.cn