key: cord-0776107-oqxh7egj
authors: Eswar, N.; Sali, A.
title: Comparative Modeling of Drug Target Proteins
date: 2007-04-11
journal: Comprehensive Medicinal Chemistry II
DOI: 10.1016/b0-08-045044-x/00251-0
sha: bcddc07d63fd5b8cdd0be602f29e23ab48a45e15
doc_id: 776107
cord_uid: oqxh7egj

In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.

Structure-Based Drug Discovery

Over the past few years, structure-based or rational drug discovery has resulted in a number of drugs on the market and many more in the development pipeline. [1] [2] [3] [4] Structure-based methods are now routinely used in almost all stages of drug development, from target identification to lead optimization. [5] [6] [7] [8] Central to all structure-based discovery approaches is the knowledge of the three-dimensional (3D) structure of the target protein or complex because the structure and dynamics of the target determine which ligands it binds. The 3D structures of the target proteins are best determined by experimental methods that yield solutions at atomic resolution, such as x-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. 9 Recent developments in the techniques of experimental structure determination have enhanced the applicability, accuracy, and speed of structural studies. 10, 11 Despite these advances, however, structural characterization of sequences remains an expensive and time-consuming task.

The publicly available Protein Data Bank (PDB) 12 currently contains ~33 000 structures and grows at a rate of approximately 40% every 2 years. On the other hand, the various genome-sequencing projects have resulted in ~2.1 million sequences, including the complete genetic blueprints of humans and hundreds of other organisms. 13, 14 This achievement has resulted in a vast collection of sequence information about possible target proteins with little or no structural information. Current statistics show that the structures available in the PDB account for only ~1.5% of the sequences in the UniProt database. 13 Moreover, the rate of growth of the sequence information is more than twice that of the structures. Due to this wide sequence-structure gap, reliance on experimentally determined structures limits the number of proteins that can be targeted by structure-based drug discovery.
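To make the size of the gap concrete, the approximate counts quoted above can be turned into two quick numbers: a crude ratio of structures to sequences, and the compound annual growth rate implied by 40% growth every 2 years (assuming steady compound growth). This is illustrative arithmetic on the figures in the text, not updated statistics.

```python
# Back-of-the-envelope arithmetic for the sequence-structure gap, using the
# approximate figures quoted in the text (illustrative, not current statistics).
pdb_structures = 33_000      # ~33 000 PDB entries
sequences = 2_100_000        # ~2.1 million sequences

# Crude ratio of the two counts; the text quotes ~1.5% coverage of UniProt.
coverage = pdb_structures / sequences

# 40% growth every 2 years, converted to an equivalent compound annual rate
# (assumes the growth is steady and compounding).
growth_per_two_years = 1.40
annual_growth = growth_per_two_years ** 0.5 - 1

print(f"structures per sequence: {coverage:.2%}")
print(f"equivalent annual PDB growth: {annual_growth:.1%}")
```

The ratio comes out close to the ~1.5% quoted, and 40% per 2 years corresponds to roughly 18% per year.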

Fortunately, domains in protein sequences are gradually evolving entities that can be clustered into a relatively small number of families with similar sequences and structures. 15, 16 For instance, 75-80% of the sequences in the UniProt database have been grouped into fewer than 15 000 domain families. 17, 18 Similarly, all the structures in the PDB have been classified into about 1000 distinct folds. 19, 20 Computational protein structure prediction methods, such as threading 21 and comparative protein structure modeling, 22, 23 strive to bridge the sequence-structure gap by utilizing these evolutionary relationships. The speed, low cost, and relative accuracy of these computational methods have led to the use of predicted 3D structures in the drug discovery process. 24, 25 The other class of prediction methods, de novo or ab initio methods, attempts to predict the structure from sequence alone, without reliance on evolutionary relationships. However, despite recent progress in these methods, 26 especially for small proteins with fewer than 100 amino acid residues, comparative modeling remains the most reliable method of predicting the 3D structure of a protein, with an accuracy that can be comparable to a low-resolution, experimentally determined structure. 9

The primary requirement for reliable comparative modeling is a detectable similarity between the sequence of interest (target sequence) and a known structure (template). As early as 1986, Chothia and Lesk 27 showed that there is a strong correlation between sequence and structural similarities. This correlation provides the basis of comparative modeling, allows a coarse assessment of model errors, and also highlights one of its major challenges: modeling the structural differences between the template and target structures 28 (Figure 1 ).

Comparative modeling stands to benefit greatly from the structural genomics initiative. 29 Structural genomics aims to achieve significant structural coverage of the sequence space with an efficient combination of experimental and prediction methods. 30 This goal is pursued by careful selection of target proteins for structure determination by x-ray crystallography and NMR spectroscopy, such that most other sequences are within 'modeling distance' (e.g., >30% sequence identity) of a known structure. 15, 16, 29, 31 The expectation is that the determination of these structures combined with comparative modeling will yield useful structural information for the largest possible fraction of sequences in the shortest possible timeframe. The impact of structural genomics is illustrated by comparative modeling based on the structures determined by the New York Structural Genomics Research Consortium. For each new structure, on average, 100 protein sequences without any prior structural characterization could be modeled at least at the level of the fold. 32 Thus, the structures of most proteins will eventually be predicted by computation, not determined by experiment.

In this review, we begin by describing the various steps involved in comparative modeling. Next, we emphasize two aspects of model refinement, loop modeling and side-chain modeling, due to their relevance in ligand docking and rational drug discovery. We then discuss the errors in comparative models. Finally, we describe the role of comparative modeling in drug discovery, focusing on ligand docking against comparative models. We compare successes of docking against models and x-ray structures, and illustrate computational docking against models with a number of examples. We conclude with a summary of topics that will affect the future utility of comparative modeling in drug discovery, including the automation and integration of resources required for comparative modeling and ligand docking.

Comparative modeling consists of four main steps 23 (Figure 2a): (1) fold assignment that identifies similarity between the target sequence of interest and at least one known protein structure (the template); (2) alignment of the target sequence with the template(s); (3) building a model based on the alignment with the chosen template(s); and (4) predicting model errors.

Figure 1 Average model accuracy as a function of sequence identity. 28 As the sequence identity between the target sequence and the template structure decreases, the average structural similarity between the template and the target also decreases (dashed line, triangles). 27 Structural overlap is defined as the fraction of equivalent Cα atoms. For the comparison of the model with the actual structure (filled circles), two Cα atoms were considered equivalent if they belonged to the same residue and were within 3.5 Å of each other after least-squares superposition. For comparisons between the template structure and the actual target structure (triangles), two Cα atoms were considered equivalent if they were within 3.5 Å of each other after alignment and rigid-body superposition. The difference between the model and the actual target structure is a combination of the target-template differences (green area) and the alignment errors (red area). The figure was constructed by calculating 3993 comparative models based on a single template of varying similarity to the targets. All targets had known (experimentally determined) structures. 28
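The structural-overlap measure used in Figure 1 can be sketched with NumPy. This minimal illustration assumes a one-to-one residue correspondence between the two Cα coordinate sets and a single least-squares (Kabsch) superposition; the function names are hypothetical, and this is not the exact procedure used to construct the figure.

```python
import numpy as np

def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Least-squares superposition of mobile onto reference; both arrays are (N, 3)."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = mobile - mob_center
    q = reference - ref_center
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_center

def structural_overlap(model_ca: np.ndarray, actual_ca: np.ndarray, cutoff: float = 3.5) -> float:
    """Fraction of residue-matched C-alpha pairs within `cutoff` angstroms after superposition."""
    superposed = kabsch_superpose(model_ca, actual_ca)
    distances = np.linalg.norm(superposed - actual_ca, axis=1)
    return float(np.mean(distances <= cutoff))
```

A rigidly rotated and translated copy of a structure should give an overlap of 1.0, whereas a model with displaced segments scores lower.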

Although fold assignment and sequence-structure alignment are logically two distinct steps in the process of comparative modeling, in practice almost all fold assignment methods also provide sequence-structure alignments. In the past, fold assignment methods were optimized for better sensitivity in detecting remotely related homologs, often at the cost of alignment accuracy. However, recent methods simultaneously optimize both the sensitivity and alignment accuracy. Therefore, in the following discussion, we will treat fold assignment and sequence-structure alignment as a single protocol, explaining the differences as needed.

As mentioned earlier, the primary requirement for comparative modeling is the identification of one or more known template structures with detectable similarity to the target sequence. The identification of suitable templates is achieved by scanning structure databases, such as PDB, 12 SCOP, 19 DALI, 33 and CATH, 20 with the target sequence as the query. The detected similarity is usually quantified in terms of sequence identity or statistical measures, such as E-value or z-score, depending on the method used.
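Sequence identity is computed from an alignment, and several conventions exist for the denominator. The sketch below assumes one of them (identical residues divided by the number of aligned columns in which neither sequence has a gap); the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    The denominator convention is an assumption; other definitions divide by the
    length of the shorter sequence or by the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

print(percent_identity("AC-DE", "ACGDF"))  # 3 identical of 4 gap-free columns
```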

Sequence-structure relationships are coarsely classified into three different regimes in the sequence similarity spectrum: (1) the easily detected relationships characterized by >30% sequence identity; (2) the 'twilight zone,' 34 corresponding to relationships with statistically significant sequence similarity in the 10-30% range; and (3) the 'midnight zone,' 34 corresponding to statistically insignificant sequence similarity.
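The three regimes can be expressed as a simple decision rule. In this sketch, statistical significance is approximated by an E-value cutoff; the cutoff value and the function name are assumptions for illustration, and real boundaries between the regimes are fuzzy.

```python
def similarity_regime(identity_percent: float, e_value: float, e_cutoff: float = 1e-3) -> str:
    """Coarse regime for a target-template pair; e_cutoff is an assumed significance threshold."""
    if identity_percent > 30.0:
        return "easily detected (>30% identity)"
    if e_value < e_cutoff:
        return "twilight zone (significant similarity, ~10-30% identity)"
    return "midnight zone (statistically insignificant similarity)"

for identity, e_value in [(45.0, 1e-30), (22.0, 1e-5), (15.0, 5.0)]:
    print(identity, e_value, similarity_regime(identity, e_value))
```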

For closely related protein sequences with identities higher than 30-40%, the alignments produced by all methods are almost always largely correct. The quickest way to search for suitable templates in this regime is to use simple pairwise sequence alignment methods such as SSEARCH, 35 BLAST, 36 and FASTA. 35 Brenner et al. showed that these methods detect only ~18% of the homologous pairs at less than 40% sequence identity, while they identify more than 90% of the relationships when sequence identity is between 30% and 40%. 37 Another benchmark, based on 200 reference structural alignments with 0-40% sequence identity, indicated that BLAST is able to correctly align only 26% of the residue positions. 46

Figure 2 Comparative protein structure modeling. (a) A flowchart illustrating the steps in the construction of a comparative model. 23 (b) Description of comparative modeling by extraction of spatial restraints as implemented in MODELLER. 96 By default, spatial restraints in MODELLER involve: (1) homology-derived restraints from the aligned template structures; (2) statistical restraints derived from all known protein structures; and (3) stereochemical restraints from the CHARMM-22 molecular mechanics force field. These restraints are combined into an objective function that is then optimized to calculate the final 3D structure of the target sequence.
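Pairwise methods such as SSEARCH are based on Smith-Waterman local alignment. The sketch below computes only the best local alignment score, with a toy match/mismatch scheme and a linear gap penalty; these scores are assumptions chosen for readability, whereas production tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            # Local alignment: scores are floored at zero so an alignment can restart.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("XXACDEYY", "ZACDEZ"))  # local match of "ACDE"
```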

Searches become less sensitive and alignments less accurate as the relationships move into the twilight zone. 34, 38 A significant improvement in this area was the introduction of profile methods by Gribskov and co-workers. 39 The profile of a sequence is derived from a multiple sequence alignment and specifies residue-type occurrences for each alignment position. The information in a multiple sequence alignment is most often encoded as either a position-specific scoring matrix (PSSM) 36, 40, 41 or as a hidden Markov model (HMM). 42, 43 In order to identify suitable templates for comparative modeling, the profile of the target sequence is used to search against a database of template sequences. The profile-sequence methods are more sensitive in detecting related structures in the twilight zone than the pairwise sequence-based methods; they detect approximately twice the number of homologs under 40% sequence identity. [44] [45] [46] The resulting profile-sequence alignments correctly align approximately 43-48% of residues in the 0-40% sequence identity range 46, 47 ; this number is almost twice as large as that of the pairwise sequence methods. Frequently used programs for profile-sequence alignment are PSI-BLAST, 36 SAM, 48 HMMER, 42 and BUILD PROFILE. 49
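A PSSM of the kind described above can be built from residue counts in each alignment column. This minimal sketch assumes uniform background frequencies, a pseudocount of 1, no sequence weighting, and ungapped scoring; programs such as PSI-BLAST use considerably more elaborate estimates, so the names and parameters here are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(msa, pseudocount=1.0):
    """Log-odds PSSM (bits) from equal-length aligned sequences; gaps ('-') are skipped."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for column in range(len(msa[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for sequence in msa:
            residue = sequence[column]
            if residue in counts:
                counts[residue] += 1
        total = sum(counts.values())
        # Observed (pseudocount-smoothed) frequency relative to background, in bits.
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm

def score_ungapped(pssm, sequence):
    """Sum of position-specific scores for a sequence placed along the profile without gaps."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))

profile = build_pssm(["ACDE", "ACDE", "ACDF", "SCDE"])
print(score_ungapped(profile, "ACDE"), score_ungapped(profile, "WWWW"))
```

Conserved columns give large positive scores to the observed residue and negative scores to residues never seen there, which is what makes profiles more sensitive than a single query sequence.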

As a natural extension, the profile-sequence alignment methods have led to profile-profile alignment methods that search for suitable template structures by scanning the profile of the target sequence against a database of template profiles, as opposed to a database of template sequences. These methods have proven to be among the most sensitive and accurate fold assignment and alignment protocols to date. 47, [50] [51] [52] Profile-profile methods detect ~28% more relationships at the superfamily level and improve the alignment accuracy by 15-20% compared to profile-sequence methods. 47, 53 There are a number of variants of profile-profile alignment methods that differ in the scoring functions they use. 47, 50, [53] [54] [55] [56] [57] [58] [59] However, several analyses have shown that the overall performances of these methods are comparable. 47, [50] [51] [52] Some of the programs that can be used to detect suitable templates are FFAS, 60 SP3, 53 SALIGN, 47 and PPSCAN. 49

4.10.2.1.6 Sequence-structure threading methods

As the sequence identity drops below the threshold of the twilight zone, there is usually insufficient signal in the sequences or their profiles for the sequence-based methods discussed above to detect true relationships. 44 Sequence-structure threading methods are most useful in this regime as they can sometimes recognize common folds, even in the absence of any statistically significant sequence similarity. 21 These methods achieve higher sensitivity by using structural information derived from the templates. The accuracy of a sequence-structure match is assessed by the score of a corresponding coarse model and not by sequence similarity, as in sequence comparison methods. 21 The scoring scheme used to evaluate the accuracy is based either on residue substitution tables dependent on structural features such as solvent exposure, secondary structure type, and hydrogen bonding properties, 53,61-63 or on statistical potentials for residue interactions implied by the alignment. [64] [65] [66] [67] [68] The use of structural data does not have to be restricted to the structure side of the aligned sequence-structure pair. For example, SAM-T02 makes use of the predicted local structure for the target sequence to enhance homolog detection and alignment accuracy. 69 Commonly used threading programs include GenTHREADER 61,70 and 3D-PSSM. 71

Yet another strategy is to optimize the alignment by iterating over the process of calculating alignments, building models, and evaluating models. Such a protocol can sample alignments that are not statistically significant and identify the alignment that yields the best model. Although this procedure can be time-consuming, it can significantly improve the accuracy of the resulting comparative models in difficult cases. 72 … alignment errors. Improving the performance and accuracy of methods in this regime remains one of the main tasks of comparative modeling today. 73 It is imperative to calculate an accurate alignment between the target-template pair, as comparative modeling can almost never recover from an alignment error. 74
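The idea of scoring a sequence against structural environments rather than against another sequence can be illustrated with a toy threading score. Here each template position is labeled buried ("B") or exposed ("E"), and a hypothetical two-state table rewards hydrophobic residues in buried positions; the table values, labels, and function names are assumptions, not the scoring schemes of GenTHREADER or 3D-PSSM. Only ungapped placements are considered.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residue set for this toy table

def env_score(residue: str, environment: str) -> float:
    """Hypothetical environment-dependent score: 'B' = buried, 'E' = exposed."""
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1.0 if hydrophobic else -1.0
    return -0.5 if hydrophobic else 0.5

def best_ungapped_threading(sequence: str, environments: str):
    """Slide the sequence along the template environment string; return (offset, score)."""
    best_offset, best_score = None, float("-inf")
    for offset in range(len(environments) - len(sequence) + 1):
        score = sum(env_score(r, e) for r, e in zip(sequence, environments[offset:]))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

print(best_ungapped_threading("KVLIVK", "EEBBBBEE"))
```

The hydrophobic stretch of the example sequence is placed over the buried core of the template, even though no sequence similarity is involved.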

After a list of all related protein structures and their alignments with the target sequence has been obtained, template structures are prioritized depending on the purpose of the comparative model. Template structures may be chosen purely on the basis of the target-template sequence identity or on a combination of several other criteria, such as the experimental accuracy of the structures (resolution of x-ray structures, number of restraints per residue for NMR structures), conservation of active-site residues, holo structures that have bound ligands of interest, and prior biological information that pertains to the solvent, pH, and quaternary contacts. It is not necessary to select only one template. In fact, the use of several templates approximately equidistant from the target sequence generally increases the model accuracy. 75

Once an initial target-template alignment is built, a variety of methods can be used to construct a 3D model for the target protein. 23, 74, [77] [78] [79] [80] The original and still widely used method is modeling by rigid-body assembly. 78, 79, 81 This method constructs the model from a few core regions, and from loops and side chains that are obtained by dissecting related structures. Commonly used programs that implement this method are COMPOSER, 82-85 3D-JIGSAW, 86 and SWISS-MODEL. 87 Another family of methods, modeling by segment matching, relies on the approximate positions of conserved atoms from the templates to calculate the coordinates of other atoms. [88] [89] [90] [91] [92] An instance of this approach is implemented in SegMod. 91 The third group of methods, modeling by satisfaction of spatial restraints, uses either distance geometry or optimization techniques to satisfy spatial restraints obtained from the alignment of the target sequence with the template structures. [93] [94] [95] [96] [97] MODELLER, 96,98,99 our own program for comparative modeling, belongs to this group of methods.
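The template prioritization described above can be sketched as a sort over a few criteria. This illustration assumes one particular ordering (optionally holo structures first, then higher sequence identity, then better x-ray resolution); the `Template` fields, template labels, and ordering are hypothetical, and in practice the weighting depends on the purpose of the model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Template:
    name: str                    # hypothetical template label
    identity: float              # target-template sequence identity (%)
    resolution: Optional[float]  # x-ray resolution in angstroms; None if not applicable
    has_ligand: bool             # holo structure with a bound ligand of interest

def rank_templates(templates: List[Template], prefer_holo: bool = True) -> List[Template]:
    """Order templates: optionally holo first, then higher identity, then better resolution."""
    def sort_key(t: Template):
        holo_rank = 0 if (prefer_holo and t.has_ligand) else 1
        resolution = t.resolution if t.resolution is not None else float("inf")
        return (holo_rank, -t.identity, resolution)
    return sorted(templates, key=sort_key)

candidates = [
    Template("T1", identity=40.0, resolution=2.0, has_ligand=False),
    Template("T2", identity=35.0, resolution=1.8, has_ligand=True),
    Template("T3", identity=40.0, resolution=1.5, has_ligand=False),
]
print([t.name for t in rank_templates(candidates)])
print([t.name for t in rank_templates(candidates, prefer_holo=False)])
```

For a model intended for ligand docking, preferring a holo template can outweigh a few points of sequence identity; for other purposes, identity and resolution may dominate.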

MODELLER implements comparative protein structure modeling by the satisfaction of spatial restraints that include:

(1) homology-derived restraints on the distances and dihedral angles in the target sequence, extracted from its alignment with the template structures 96 ; (2) stereochemical restraints such as bond length and bond angle preferences, obtained from the CHARMM-22 molecular mechanics force field 100 ; (3) statistical preferences for dihedral angles and nonbonded interatomic distances, obtained from a representative set of known protein structures 101 ; and (4) optional manually curated restraints, such as those from NMR spectroscopy, rules of secondary structure packing, cross-linking experiments, fluorescence spectroscopy, image reconstruction from electron microscopy, site-directed mutagenesis, and intuition ( Figure 2b) . The spatial restraints, expressed as probability density functions, are combined into an objective function that is optimized by a combination of conjugate gradients and molecular dynamics with simulated annealing. This model-building procedure is similar to structure determination by NMR spectroscopy.
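The core idea of restraint satisfaction (turn restraints expressed as probability densities into an objective function, then optimize the coordinates) can be shown on a toy problem. This sketch uses only Gaussian distance restraints, whose negative log-density is a harmonic term, and plain gradient descent in place of MODELLER's conjugate gradients and molecular dynamics with simulated annealing; the restraint set, step size, and function names are illustrative assumptions.

```python
import numpy as np

def restraint_energy_and_grad(coords, restraints):
    """Objective F = sum of -ln(Gaussian pdf) over distance restraints (constants dropped).

    Each restraint is (i, j, mean, sd): the distance between atoms i and j
    should be close to `mean`, with standard deviation `sd`.
    """
    energy = 0.0
    grad = np.zeros_like(coords)
    for i, j, mean, sd in restraints:
        diff = coords[i] - coords[j]
        dist = np.linalg.norm(diff)
        deviation = dist - mean
        energy += deviation ** 2 / (2.0 * sd ** 2)
        # dF/dx_i = (d - mean) / sd^2 * (x_i - x_j) / d; x_j gets the opposite sign.
        g = (deviation / sd ** 2) * diff / dist
        grad[i] += g
        grad[j] -= g
    return energy, grad

def satisfy_restraints(start, restraints, step=0.05, n_steps=3000):
    """Plain gradient descent on the restraint objective (a stand-in for a real optimizer)."""
    coords = np.array(start, dtype=float)
    for _ in range(n_steps):
        _, grad = restraint_energy_and_grad(coords, restraints)
        coords -= step * grad
    return coords

# Hypothetical "template-derived" restraints: the six edge lengths of a tetrahedron.
reference = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]])
restraints = [(i, j, float(np.linalg.norm(reference[i] - reference[j])), 1.0)
              for i in range(4) for j in range(i + 1, 4)]
rng = np.random.default_rng(1)
model = satisfy_restraints(reference + rng.normal(scale=0.2, size=reference.shape), restraints)
energy, _ = restraint_energy_and_grad(model, restraints)
print(f"final objective: {energy:.2e}")
```

Starting from perturbed coordinates, the optimizer recovers a configuration that satisfies all distance restraints, which mirrors in miniature how a model is calculated from template-derived restraints.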

Accuracies of the various model-building methods are relatively similar when used optimally. 102, 103 Other factors, such as template selection and alignment accuracy, usually have a larger impact on the model accuracy, especially for models based on less than 30% sequence identity to the templates. However, it is important that a modeling method allows a degree of flexibility and automation to obtain better models more easily and rapidly. For example, a method should allow for an easy recalculation of a model when a change is made in the alignment; it should be straightforward to calculate models based on several templates; and the method should provide tools for incorporation of prior knowledge about the target (e.g., cross-linking restraints and predicted secondary structure).

Protein sequences evolve through a series of amino acid residue substitutions, insertions, and deletions. While substitutions can occur throughout the length of the sequence, insertions and deletions mostly occur on the surface of proteins in segments that connect regular secondary structure segments (i.e., loops). While the template structures are helpful in the modeling of the aligned target backbone segments, they are generally less valuable for the modeling of side chains and irrelevant for the modeling of insertions such as loops. The loops and side chains of comparative models are especially important for ligand docking; thus, we discuss them in the following two sections.

Loop modeling is an especially important aspect of comparative modeling in the range from 30% to 50% sequence identity. In this range of overall similarity, loops among the homologs vary while the core regions are still relatively conserved and aligned accurately. Loops often play an important role in defining the functional specificity of a given protein, forming the active and binding sites. Loop modeling can be seen as a mini protein folding problem because the correct conformation of a given segment of a polypeptide chain has to be calculated mainly from the sequence of the segment itself. However, loops are generally too short to provide sufficient information about their local fold. Even identical decapeptides in different proteins do not always have the same conformation. 104, 105 Some additional restraints are provided by the core anchor regions that span the loop and by the structure of the rest of the protein that cradles the loop. Although many loop-modeling methods have been described, it is still challenging to correctly and confidently model loops longer than approximately 8-10 residues. 98,106

There are two main classes of loop-modeling methods: (1) database search approaches that scan a database of all known protein structures to find segments fitting the anchor core regions 90, 107 ; and (2) conformational search approaches that rely on optimizing a scoring function. [108] [109] [110] There are also methods that combine these two approaches. 111 The database search approach to loop modeling is accurate and efficient when a database of specific loops is created to address the modeling of the same class of loops, such as β-hairpins, 113 or loops on a specific fold, such as the hypervariable regions in the immunoglobulin fold. 107, 114 There are attempts to classify loop conformations into more general categories, thus extending the applicability of the database search approach. [115] [116] [117] However, the database methods are limited because the number of possible conformations increases exponentially with the length of a loop. As a result, only loops up to 4-7 residues long have most of their conceivable conformations present in the database of known protein structures. 118, 119 This limitation is made even worse by the requirement for an overlap of at least one residue between the database fragment and the anchor core regions, which means that modeling a 5-residue insertion requires at least a 7-residue fragment from the database. 89 Despite the rapid growth of the database of known structures, it does not seem possible to cover most of the conformations of a 9-residue segment in the foreseeable future. On the other hand, most of the insertions in a family of homologous proteins are shorter than 10-12 residues. 98
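The two quantitative points in this paragraph, the fragment length implied by anchor overlaps and the exponential growth in the number of loop conformations, can be made explicit. The count of three backbone states per residue is a hypothetical simplification used only to show the exponential trend; it is not a published estimate.

```python
def required_fragment_length(insertion_length: int, anchor_overlap: int = 1) -> int:
    """Database fragment length needed to span an insertion plus overlaps with both anchors."""
    return insertion_length + 2 * anchor_overlap

def conformation_count(loop_length: int, states_per_residue: int = 3) -> int:
    """Toy count of loop conformations if each residue independently adopts a few states.

    states_per_residue = 3 is a hypothetical simplification.
    """
    return states_per_residue ** loop_length

for n in (4, 7, 9, 12):
    print(n, required_fragment_length(n), conformation_count(n))
```

With a one-residue overlap at each anchor, a 5-residue insertion needs a 7-residue fragment, as stated above, and each additional loop residue multiplies the number of conformations the database would have to contain.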

To overcome the limitations of the database search methods, conformational search methods were developed. 108, 109 There are many such methods, exploiting different protein representations, objective functions, and optimization or enumeration algorithms. The search algorithms include the minimum perturbation method, 120 molecular dynamics simulations, 111,121 genetic algorithms, 122 Monte Carlo and simulated annealing, 123-125 multiple-copy simultaneous search, 126 self-consistent field optimization, 127 and enumeration based on graph theory. 128 The accuracy of loop predictions can be further improved by clustering the sampled loop conformations and partially accounting for the entropic contribution to the free energy. 129 Another way of improving the accuracy of loop predictions is to consider the solvent effects. Improvements in implicit solvation models, such as the Generalized Born solvation model, motivated their use in loop modeling. The solvent contribution to the free energy can be added to the scoring function for optimization, or it can be used to rank the sampled loop conformations after they are generated with a scoring function that does not include the solvent terms. 98

Two simplifications are frequently applied in the modeling of side-chain conformations. 133 First, amino acid residue replacements often leave the backbone structure almost unchanged, 26 allowing us to fix the backbone during the search for the best side-chain conformations. Second, most side chains in high-resolution crystallographic structures can be represented by a limited number of conformers that comply with stereochemical and energetic constraints. 134

Rotamers on a fixed backbone are often used when all the side chains need to be modeled on a given backbone. This approach reduces the combinatorial explosion associated with a full conformational search of all the side chains, and is applied by some comparative modeling 78 and protein design approaches. 143 However, ~15% of the side chains cannot be represented well by these libraries. 144 In addition, it has been shown that the accuracy of side-chain modeling on a fixed backbone decreases rapidly when the backbone errors are larger than 0.5 Å. 145

Earlier methods for side-chain modeling often put less emphasis on the energy or scoring function. The function was usually greatly simplified, and consisted of the empirical rotamer preferences and simple repulsion terms for nonbonded contacts. 138 Nevertheless, these approaches have been justified by their performance. For example, a method based on a rotamer library compared favorably with that based on a molecular mechanics force field, 146 and new methods continue to be based on the rotamer library approach. 147, 148 The various optimization approaches include a Monte Carlo simulation, 149 simulated annealing, 150 a combination of Monte Carlo and simulated annealing, 151 the dead-end elimination theorem, 152,153 genetic algorithms, 142 neural network with simulated annealing, 154 mean field optimization, 155 and combinatorial searches. 138, 156, 157 Several recent papers focused on the testing of more sophisticated potential functions for conformational search 157, 158 and development of new scoring functions for side-chain modeling, 159 reporting higher accuracy than earlier studies.
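Rotamer-based packing on a fixed backbone, with pruning by the dead-end elimination theorem, can be sketched on a tiny problem. The self and pairwise energies below are made-up numbers for three hypothetical residues; the pruning uses the original (Desmet) criterion, which removes rotamer r at a residue if another rotamer t at that residue is lower in energy even when r is in its best possible context and t in its worst. An exhaustive combinatorial search over the surviving rotamers then gives the same minimum as a search over all of them.

```python
import itertools

def pair_energy(pair_e, i, r, j, s):
    """Interaction energy between rotamer r at residue i and rotamer s at residue j."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0  # residues assumed not to interact

def total_energy(assignment, self_e, pair_e):
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i, j in itertools.combinations(range(len(assignment)), 2):
        energy += pair_energy(pair_e, i, assignment[i], j, assignment[j])
    return energy

def dead_end_elimination(self_e, pair_e):
    """Iterative Desmet DEE: drop r if some rotamer t is better even in r's best case."""
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def best_assignment(self_e, pair_e, allowed=None):
    """Exhaustive search over the (possibly pruned) rotamer combinations."""
    if allowed is None:
        allowed = [range(len(rotamers)) for rotamers in self_e]
    return min(itertools.product(*allowed), key=lambda a: total_energy(a, self_e, pair_e))

# Hypothetical energies: residue 0 has 3 rotamers, residue 1 has 2, residue 2 has 3.
self_e = [[0.0, 5.0, 1.0], [0.5, 0.0], [2.0, 0.0, 0.3]]
pair_e = {
    (0, 1): [[0.0, 1.0], [0.2, 0.3], [0.4, -0.5]],
    (1, 2): [[0.1, 0.0, 0.6], [1.5, 0.2, -0.1]],
    (0, 2): [[0.0, 0.3, 0.0], [0.1, 0.1, 0.1], [0.2, 0.0, -0.2]],
}
pruned = dead_end_elimination(self_e, pair_e)
print(pruned, best_assignment(self_e, pair_e, [sorted(s) for s in pruned]))
```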

The major sources of error in comparative modeling are discussed in the relevant sections above. The following is a summary of these errors, dividing them into five categories ( Figure 3 ).

Errors due to an incorrect template are a potential problem when distantly related proteins are used as templates (i.e., less than 30% sequence identity). Distinguishing between a model based on an incorrect template and a model based on an incorrect alignment with a correct template is difficult. In both cases, the evaluation methods (below) will predict an unreliable model. The conservation of the key functional or structural residues in the target sequence increases the confidence in a given fold assignment.

The single source of error with the largest impact on comparative modeling is misalignment, especially when the target-template sequence identity decreases below 30%. Alignment errors can be minimized in two ways. First, the profile-based methods discussed above usually produce more accurate alignments than pairwise sequence alignment methods. Second, the alignment can be improved by iteratively modifying those regions that correspond to predicted errors in the model. 75

Segments of the target sequence that have no equivalent region in the template structure (i.e., insertions or loops) are among the most difficult regions to model. Again, when the target and template are distantly related, errors in the alignment can lead to incorrect positions of the insertions. Using alignment methods that incorporate structural information can often correct such errors. Once a reliable alignment is obtained, various modeling protocols can predict the loop conformation for insertions of fewer than 8-10 residues. 98,106,111,169

As a consequence of sequence divergence, the main-chain conformation changes, even if the overall fold remains the same. Therefore, in some correctly aligned segments of a model, the template may be locally different (<3 Å) from the target, resulting in errors in that region. The structural differences are sometimes not due to differences in sequence, but are a consequence of artifacts in structure determination or structure determination in different environments (e.g., packing of subunits in a crystal). The simultaneous use of several templates can minimize this kind of error. 75,76

As the sequences diverge, the packing of the atoms in the protein core changes. Sometimes even the conformation of identical side chains is not conserved, a pitfall for many comparative modeling methods. Side-chain errors are critical if they occur in regions that are involved in protein function, such as active sites and ligand-binding sites.

Figure 3 Typical errors in comparative modeling. 23 Shown are the typical sources of errors encountered in comparative models. Two of the major sources of errors in comparative modeling are due to incorrect templates or incorrect alignments with the correct templates. The modeling procedure can rarely recover from such errors. The next significant source of errors arises from regions in the target with no corresponding region in the template, i.e., insertions or loops. Other sources of errors, which occur even with an accurate alignment, are due to rigid-body shifts, distortions in the backbone, and errors in the packing of side chains.

The accuracy of the predicted model determines the information that can be extracted from it. Thus, estimating the accuracy of a model in the absence of the known structure is essential for interpreting it.

As discussed earlier, a model calculated using a template structure that shares more than 30% sequence identity is indicative of an overall accurate structure. However, when the sequence identity is lower, the first aspect of model evaluation is to confirm whether or not a correct template was used for modeling. It is often the case, when operating in this regime, that the fold assignment step produces only false positives. A further complication is that at such low similarities the alignment generally contains many errors, making it difficult to distinguish between an incorrect template on one hand and an incorrect alignment with a correct template on the other hand. There are several methods that use 3D profiles and statistical potentials, 65, 160, 161 which assess the compatibility between the sequence and modeled structure by evaluating the environment of each residue in a model with respect to the expected environment, as found in native high-resolution experimental structures. These methods can be used to assess whether or not the correct template was used for the modeling. They include VERIFY3D, 160 PROSAII, 162 HARMONY, 163 ANOLEA, 164 and DFIRE. 165 Even when the model is based on alignments that have >30% sequence identity, other factors, including the environment, can strongly influence the accuracy of a model. For instance, some calcium-binding proteins undergo large conformational changes when bound to calcium. If a calcium-free template is used to model the calcium-bound state of the target, it is likely that the model will be incorrect, irrespective of the target-template similarity or accuracy of the template structure. 166
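Per-residue compatibility scores of the kind produced by 3D-profile methods are usually inspected after smoothing along the sequence, so that isolated noisy residues do not dominate. The sketch below averages hypothetical per-residue scores over a sliding window and flags positions whose average falls below a threshold; the window size, threshold, and scores are assumptions and are not the published parameters of VERIFY3D or any other program.

```python
def window_average(scores, window=21):
    """Average each score with its neighbors in a centered window (truncated at the ends)."""
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged

def flag_low_regions(scores, window=21, threshold=0.0):
    """Indices whose window-averaged score falls below the (assumed) threshold."""
    return [i for i, s in enumerate(window_average(scores, window)) if s < threshold]

# Hypothetical per-residue scores: a poorly fitting segment at residues 10-14.
scores = [0.5] * 10 + [-1.0] * 5 + [0.5] * 10
print(flag_low_regions(scores, window=5, threshold=0.0))
```

Smoothing slightly widens the flagged region beyond the poorly scoring residues themselves, which is usually acceptable because such regions are then re-examined in the alignment.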

Self-Consistency

The model should also be subjected to evaluations of self-consistency to ensure that it satisfies the restraints used to calculate it. Additionally, the stereochemistry of the model (e.g., bond lengths, bond angles, backbone torsion angles, and nonbonded contacts) may be evaluated using programs such as PROCHECK 167 and WHATCHECK. 168 Although errors in stereochemistry are rare and less informative than errors detected by statistical potentials, a cluster of stereochemical errors may indicate that there are larger errors (e.g., alignment errors) in that region.
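A very simple stereochemical self-consistency check is the distance between consecutive Cα atoms, which is about 3.8 Å for trans peptide bonds. The sketch below flags consecutive pairs outside an assumed tolerance; note that cis peptides (about 2.9 Å) and chain breaks would also be flagged, and full checks such as PROCHECK examine many more features.

```python
import math

def flag_ca_distances(ca_coords, ideal=3.8, tolerance=0.3):
    """Return (k, k+1, distance) for consecutive C-alpha pairs outside ideal +/- tolerance (angstroms).

    The tolerance of 0.3 angstroms is an assumption for illustration.
    """
    flagged = []
    for k in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[k], ca_coords[k + 1])
        if abs(distance - ideal) > tolerance:
            flagged.append((k, k + 1, round(distance, 2)))
    return flagged

# Hypothetical straight C-alpha trace with one stretched virtual bond between residues 2 and 3.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0), (16.4, 0.0, 0.0)]
print(flag_ca_distances(trace))
```

Several flags within a short segment would be the kind of cluster that, as noted above, can point to a larger underlying problem such as an alignment error.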

It is crucial for method developers and users alike to assess the accuracy of their methods. An attempt to address this problem has been made by the Critical Assessment of Techniques for Protein Structure Prediction (CASP) 170 and the Critical Assessment of Fully Automated Structure Prediction (CAFASP) experiments. 171 However, both CASP and CAFASP assess methods only over a limited number of target protein sequences. 102, 172 To overcome this limitation, two additional evaluation experiments have been described, LiveBench 172 and EVA. 173, 174 EVA is a large-scale and continuously running web server that automatically assesses protein structure prediction servers in the categories of secondary structure prediction, residue-residue contact prediction, fold assignment, and comparative modeling. The aims of EVA are: (1) to evaluate blind predictions by prediction servers continuously and automatically, based on identical and sufficiently large data sets; (2) to provide weekly updates of the method assessments on the web; and (3) to enable developers, nonexpert users, and reviewers to determine the performance of the tested prediction servers.

There is a wide range of applications of protein structure models (Figure 4). 1, [175] [176] [177] [178] [179] [180] For example, high- and medium-accuracy comparative models are frequently helpful in refining functional predictions that have been based on a sequence match alone, because ligand binding is more directly determined by the structure of the binding site than by its sequence. It is often possible to correctly predict features of the target protein that do not occur in the template structure. 181, 182 For example, the size of a ligand may be predicted from the volume of the binding site cleft, and the location of a binding site for a charged ligand can be predicted from a cluster of charged residues on the protein.
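The idea of locating a binding site for a charged ligand from a cluster of charged residues can be sketched as single-linkage clustering of side-chain charge centres. This is a minimal illustration under stated assumptions: the input format, the 8 Å linkage cutoff, the minimum cluster size, and the name `charged_clusters` are hypothetical choices, and His is ignored because its charge state is ambiguous. A real analysis would read coordinates from the model file and would also consider solvent exposure.

```python
import math

# Assumed formal charges of side-chain charge centres (His deliberately omitted).
CHARGE = {"LYS": 1, "ARG": 1, "ASP": -1, "GLU": -1}


def charged_clusters(residues, cutoff=8.0, min_size=3):
    """Single-linkage clustering of charged residues by distance.

    residues: list of (label, resname, (x, y, z)) tuples, one per charge centre.
    Returns a list of (labels, net_charge) for clusters with >= min_size members.
    """
    charged = [r for r in residues if r[1] in CHARGE]
    parent = list(range(len(charged)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of charge centres closer than the cutoff.
    for i in range(len(charged)):
        for j in range(i + 1, len(charged)):
            if math.dist(charged[i][2], charged[j][2]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(charged)):
        groups.setdefault(find(i), []).append(i)

    result = []
    for members in groups.values():
        if len(members) >= min_size:
            labels = [charged[i][0] for i in members]
            net = sum(CHARGE[charged[i][1]] for i in members)
            result.append((labels, net))
    return result


# Toy model: three acidic residues close together, one distant Lys, one Gly (ignored).
toy = [
    ("D10", "ASP", (0.0, 0.0, 0.0)),
    ("D12", "ASP", (4.0, 0.0, 0.0)),
    ("E15", "GLU", (0.0, 5.0, 0.0)),
    ("K50", "LYS", (30.0, 0.0, 0.0)),
    ("G20", "GLY", (1.0, 1.0, 1.0)),
]
print(charged_clusters(toy))
```

A cluster with a net negative charge, as in the toy case, would suggest a candidate site for a positively charged ligand.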

Fortunately, errors in the functionally important regions of comparative models are often relatively low, because the functional regions, such as active sites, tend to be more conserved in evolution than the rest of the fold. Even low-accuracy comparative models may be useful, for example, for assigning the fold of a protein. Fold assignment can be very helpful in drug discovery, because it can shortcut the search for leads by pointing to compounds that have been previously developed for other members of the same family. 183,184

Figure 4 Accuracy and applications of protein structure models. 9 Shown are the different ranges of applicability of comparative protein structure modeling, threading, and de novo structure prediction, their corresponding accuracies, and their sample applications.

The remainder of this review focuses on the use of comparative models for ligand docking (see also Chapter 4.19.2.5). [185] [186] [187] It is widely accepted that docking to comparative models is more challenging and less successful than docking to crystallographic structures. However, surprisingly little work seems to have been done to obtain quantitative information about the accuracy of docking to comparative models, to determine in detail why the results are inferior to those obtained with crystal structures, and to improve methods for docking to comparative models. We begin our discussion with a study by McGovern and Shoichet 188 that compared the success of docking against three different conformations of 10 enzymes: holo (ligand-bound), apo, and homology modeled. All 10 enzymes had known structures in both the holo and apo forms. Comparative models for each of these enzymes were taken from MODBASE, a database of comparative models for all protein sequences that are detectably related to at least one known structure. The models were based on single template structures with sequence identities in the range of 28-87%. Each enzyme had multiple known inhibitors in the MDL Drug Data Report (MDDR) database, a library of drug-like molecules in which each molecule is annotated with the receptor to which it binds. The success of docking, carried out with the Shoichet group's version of DOCK, 189, 190 was assessed by enrichment: the ability to distinguish known inhibitors from a large set of ~100 000 'decoys' relative to random selection. As might be expected, the holo structures were the best at selecting the known ligands from among the MDDR decoys based on the docking score. Unexpectedly, the comparative models often ranked known ligands among the top-scoring database molecules; for four targets, the enrichment was 20 times higher than expected by chance. 188 In one case, purine nucleoside phosphorylase, the modeled structure actually performed better than the holo structure. For the comparative model, 25% of the known ligands were found in the top 1.2% of the ranked database, whereas for the holo conformation, 2.8% of the ranked list had to be searched before 25% of the ligands were found. In another example, the holo structure of thymidylate synthase correctly recognized ligands similar in size to the ligand captured in the x-ray structure, but not ligands that were markedly different from it. In contrast, the binding sites in the modeled conformations were more spacious and could correctly detect and accommodate larger ligands than the holo receptor (Figure 5). Thus, while x-ray crystallographic structures remain the first choice in docking, many comparative models appear sufficiently accurate to rank known ligands highly from among a very large list of possible alternatives.
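The enrichment measures described above can be made concrete with a short calculation. The sketch below uses one common definition (hit rate in the top of the ranked list divided by the hit rate expected at random); it is an illustration, not necessarily the exact metric computed by McGovern and Shoichet. It assumes DOCK-style scores where lower means a better predicted fit, and the function names and toy data are hypothetical. Applied to the purine nucleoside phosphorylase figures quoted in the text, recovering 25% of the ligands in the top 1.2% of the list implies roughly 21-fold enrichment, versus roughly 9-fold when 2.8% must be searched.

```python
import math


def ranked_ids(scores):
    """Compound ids sorted best-first; lower (more negative) scores are assumed better."""
    return sorted(scores, key=scores.get)


def fraction_screened_to_recover(scores, actives, recovered=0.25):
    """Fraction of the ranked database examined before `recovered` of the actives are found."""
    ranked = ranked_ids(scores)
    needed = math.ceil(recovered * len(actives))
    found = 0
    for position, cid in enumerate(ranked, start=1):
        if cid in actives:
            found += 1
            if found >= needed:
                return position / len(ranked)
    return None  # not enough actives present in the ranked database


def enrichment_factor(scores, actives, top_fraction):
    """Hit rate among the top `top_fraction` of the ranking divided by the random hit rate."""
    ranked = ranked_ids(scores)
    n_top = max(1, round(top_fraction * len(ranked)))
    hits = sum(1 for cid in ranked[:n_top] if cid in actives)
    return (hits / n_top) / (len(actives) / len(ranked))


def enrichment_from_recovery(recovered, fraction_screened):
    """Enrichment implied by finding `recovered` of the actives in the top `fraction_screened`."""
    return recovered / fraction_screened


# Numbers quoted for purine nucleoside phosphorylase.
print(round(enrichment_from_recovery(0.25, 0.012), 1))  # comparative model: 20.8
print(round(enrichment_from_recovery(0.25, 0.028), 1))  # holo structure: 8.9

# Toy database of 10 compounds, 4 of them known actives.
toy_scores = {f"c{i}": float(i) for i in range(10)}
toy_actives = {"c0", "c3", "c8", "c9"}
print(fraction_screened_to_recover(toy_scores, toy_actives, recovered=0.5))
print(enrichment_factor(toy_scores, toy_actives, top_fraction=0.2))
```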

Figure 5 Docking predictions for thymidylate synthase. Shown are the x-ray structure of the holo receptor in gray, the modeled receptor in blue, the docked conformation of the ligand in the holo structure in green, and the docked conformation of the ligand in the modeled structure in yellow. A second holo complex, not used for docking but bound to a larger ligand, is also shown with protein atoms in white and ligand atoms in purple. The ligand in the holo receptor was smaller in size than many of the known ligands in the database. Consequently, while the holo structure yielded better enrichment of ligands that were similar to the native ligand, it was unable to dock larger ligands correctly. The modeled receptors, in contrast, with their more spacious binding sites, showed better competence in such cases. (Courtesy of Brian Shoichet.)

Despite problems with comparative modeling and ligand docking, comparative models have been successfully used in practice in conjunction with virtual screening to identify novel inhibitors. We briefly review a few of these success stories to highlight the potential of the combined comparative modeling and ligand-docking approach to drug discovery (see 4.19 Virtual Screening).

Comparative models have been employed to aid rational drug design against parasites for more than a decade. 122, [191] [192] [193] As early as 1993, Ring et al. 122 used comparative models for computational docking studies that identified low micromolar nonpeptidic inhibitors of proteases in malarial and schistosome parasite lifecycles. Li et al. 191 subsequently used similar methods to develop nanomolar inhibitors of falcipain that are active against chloroquine-resistant strains of malaria. In a study by Selzer et al., 193 comparative models were used to predict new nonpeptide inhibitors of cathepsin L-like cysteine proteases in Leishmania major. Sixty-nine compounds were selected by DOCK 3.5 as strong binders to a comparative model of protein cpB, and of these, 21 had experimental IC50 values below 100 μmol L−1. Finally, in a recent study by Que et al., 192 comparative models were used to rationalize ligand-binding affinities of cysteine proteases in Entamoeba histolytica. Specifically, this work provided an explanation for why proteases ACP1 and ACP2 had substrate specificity similar to that of cathepsin B, although their overall structure is more similar to that of cathepsin D.

Enyedy et al. 194 discovered 15 new inhibitors of matriptase by docking against its comparative model. The comparative model employed thrombin as the template, which shares only 34% sequence identity with the target sequence. Moreover, some residues in the binding site are significantly different; a trio of charged Asp residues in matriptase corresponds to one Tyr and two Trp residues in thrombin. Thrombin was chosen as the template in part because, like matriptase, it prefers substrates with positively charged residues at the P1 position. The comparative model was constructed using MODELLER and refined with MD simulations in CHARMM. Virtual screening of the National Cancer Institute database was performed with the DOCK program, targeting the S1 site. The 2000 best-scoring compounds were manually inspected to identify positively charged ligands (the S1 site is negatively charged), and 69 compounds were experimentally screened for inhibition, identifying the 15 inhibitors. One of them, hexamidine, was used as a lead to identify additional compounds selective for matriptase relative to thrombin. The Wang group has also used similar methods to discover seven new, low-micromolar inhibitors of Bcl-2, using a comparative model based on the NMR solution structure of Bcl-XL. 195 Schapira et al. 196 discovered a novel inhibitor of a retinoic acid receptor by virtual screening using a comparative model. In this case, the target (RARα) and template (RARγ) are very closely related; only three residues in the binding site are not conserved. The ICM program was used for virtual screening of ligands from the Available Chemicals Directory (ACD). The 5364 high-scoring compounds identified in the first round were subsequently docked into a full-atom representation of the receptor with flexible side chains to obtain a final set of 300 good-scoring hits. These compounds were then manually inspected to choose the final 30 for testing.
Two novel agonists were identified, with 50 nM activity. Zuccotto et al. 197 identified novel inhibitors of dihydrofolate reductase (DHFR) in Trypanosoma cruzi (the parasite that causes Chagas disease) by docking into a comparative model based on ~50% sequence identity to DHFR in L. major, a related parasite. The virtual screening procedure used DOCK for rigid docking of over 50 000 selected compounds from the Cambridge Structural Database (CSD). Visual inspection of the top 100 hits was used to select 36 compounds for experimental testing. This work identified several novel scaffolds with micromolar IC50 values. The authors report attempting to use the virtual screening results to identify compounds with greater affinity for T. cruzi DHFR than for human DHFR, but it is not clear how successful they were.
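The studies above share a common funnel: dock a large library, keep the best-scoring compounds, filter them (usually by visual inspection), and assay a small set. The sketch below illustrates that funnel for a matriptase-like case under explicit assumptions. The `Hit` record and its precomputed net charge are hypothetical, the automated positive-charge filter is only a crude stand-in for the manual inspection the authors actually performed, and the defaults (2000 kept, 69 assayed) merely echo the numbers reported by Enyedy et al. for orientation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    score: float     # docking score; lower is assumed to be better
    net_charge: int  # formal charge at assay pH, assumed precomputed


def screening_funnel(hits, keep_top=2000, require_positive=True, n_assay=69):
    """Rank by docking score, keep the best `keep_top`, optionally keep only
    positively charged compounds (negatively charged S1 pocket assumed),
    and return at most `n_assay` compounds for experimental testing."""
    ranked = sorted(hits, key=lambda h: h.score)[:keep_top]
    if require_positive:
        ranked = [h for h in ranked if h.net_charge > 0]
    return ranked[:n_assay]


# Toy library with made-up scores and charges.
library = [
    Hit("a", -50.0, 1),
    Hit("b", -45.0, 0),
    Hit("c", -40.0, 2),
    Hit("d", -38.0, -1),
    Hit("e", -35.0, 1),
    Hit("f", -60.0, 0),
]
selected = screening_funnel(library, keep_top=4, n_assay=2)
print([h.name for h in selected])
```

Compound "e" is positively charged but falls outside the score cutoff, which shows how the order of the filters shapes the final list; in practice, the reliance on human judgment at the filtering step is what this sketch cannot capture.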

Following the 2003 outbreak of severe acute respiratory syndrome (SARS), Anand et al. 198 used the experimentally determined structures of the main protease (Mpro) from human coronavirus and of an inhibitor complex of porcine coronavirus (transmissible gastroenteritis virus, TGEV) Mpro to calculate a comparative model of the SARS coronavirus Mpro. This model then provided a basis for the design of anti-SARS drugs. In particular, a comparison of the active site residues in these and other related structures suggested that AG7088, an inhibitor of the human rhinovirus type 2 3C protease, is a good starting point for the design of anticoronaviral drugs. 199 Comparative models of protein kinases combined with virtual screening have also been used intensively for drug discovery. [200] [201] [202] [203] [204] The >500 kinases in the human genome, the relatively small number of available experimental structures, and the high level of conservation around the important adenosine triphosphate-binding site make comparative modeling an attractive approach to structure-based drug discovery.

G protein-coupled receptors are another interesting class of proteins that in principle allow drug discovery through comparative modeling. [205] [206] [207] [208] [209] Approximately 40% of current drug targets belong to this class of proteins. However, these proteins have been extremely difficult to crystallize, and most comparative modeling has been based on the atomic resolution structure of bovine rhodopsin. 210

Future Directions

Although reports of successful virtual screening against comparative models are encouraging, such efforts are not yet a routine part of rational drug design. Even the successful efforts appear to rely strongly on visual inspection of the docking results. Much work remains to be done to improve the accuracy, efficiency, and robustness of docking against comparative models. Despite assessments of relative successes of docking against comparative models and native x-ray structures, 188, 202 surprisingly little has been done to compare the accuracy achievable by different approaches to comparative modeling and to identify the specific structural reasons why comparative models generally produce less accurate virtual screening results than the holo structures. Among the many issues that deserve consideration are the following:

* The inclusion of cofactors and bound water molecules in protein receptors is often critical for the success of virtual screening; however, cofactors are not routinely included in comparative models.
* Most docking programs currently keep the protein receptor in a completely rigid conformation. While this approach is appropriate for 'lock-and-key' binding modes, it does not work when the ligand induces conformational changes in the receptor upon binding. A flexible receptor approach is necessary to address such induced-fit cases. 211,212
* The accuracy of comparative models is frequently judged by the Cα root-mean-square error or other similar measures of backbone accuracy. For virtual screening, however, the precise positioning of side chains in the binding site is likely to be critical; measures of accuracy for binding sites are needed to help evaluate the suitability of comparative modeling algorithms for constructing models for docking.
* Knowledge of known inhibitors, either for the target protein or the template, should help to evaluate and improve virtual screening against comparative models. For example, comparative models constructed from 'holo' template structures implicitly preserve some information about the ligand-bound receptor conformation.
* Improvement in the accuracy of models produced by comparative modeling will require methods that finely sample protein conformational space using a free energy or scoring function that is accurate enough to distinguish the native structure from nonnative conformations. Despite many years of development of molecular simulation methods, attempts to refine models that are already relatively close to the native structure have met with relatively little success. This failure is likely to be due in part to inaccuracies in the scoring functions used in the simulations, particularly in the treatment of electrostatics and solvation effects. A combination of a physics-based energy function with statistical information extracted from known protein structures may provide a route to the development of improved scoring functions.
* Improvements in sampling strategies are also likely to be necessary, for both comparative modeling and flexible docking.
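The point about binding-site accuracy can be illustrated with a toy calculation: a model can have a small global Cα error while its binding-site side chains are misplaced. The sketch below assumes the model and reference are already superposed (no fitting is performed), and the atom keys and coordinates are invented for illustration; it simply compares an RMSD over all Cα atoms with an RMSD restricted to binding-site atoms.

```python
import math


def rmsd(model, reference, keys):
    """RMSD (in Å) over the atoms named in `keys`; coordinates assumed pre-superposed."""
    total = 0.0
    for key in keys:
        total += sum((a - b) ** 2 for a, b in zip(model[key], reference[key]))
    return math.sqrt(total / len(keys))


# Toy reference: 10 Cα atoms on a line, plus side-chain atoms of binding-site residues 5 and 6.
reference = {("CA", i): (float(i), 0.0, 0.0) for i in range(1, 11)}
reference[("SC", 5)] = (5.0, 3.0, 0.0)
reference[("SC", 6)] = (6.0, 3.0, 0.0)

# Toy model: backbone almost perfect, binding-site side chains each off by 2 Å.
model = dict(reference)
model[("CA", 5)] = (5.0, 0.5, 0.0)
model[("SC", 5)] = (5.0, 1.0, 0.0)
model[("SC", 6)] = (6.0, 5.0, 0.0)

ca_keys = [("CA", i) for i in range(1, 11)]
site_keys = [("CA", 5), ("CA", 6), ("SC", 5), ("SC", 6)]
print(round(rmsd(model, reference, ca_keys), 2))    # global Cα RMSD: 0.16
print(round(rmsd(model, reference, site_keys), 2))  # binding-site RMSD: 1.44
```

A binding-site measure of this kind is one possible answer to the need stated in the list; which atoms define the site, and how the structures are superposed, are choices this sketch leaves open.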

Given the increasing number of target sequences for which no experimentally determined structures are available, drug discovery stands to gain immensely from comparative modeling and other in silico methods. Despite unsolved problems in virtually every step of comparative modeling and ligand docking, it is highly desirable to automate the whole process, starting with the target sequence and ending with a ranked list of its putative ligands. Automation encourages development of better methods, improves their testing, allows application on a large scale, and makes the technology more accessible to experts and nonspecialists alike. Through large-scale application, new questions, such as those about ligand-binding specificity, can in principle be addressed. Enabling a wider community to use the methods provides useful feedback and resources toward the development of the next generation of methods.
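The automated process described above, from target sequence to a ranked list of putative ligands, can be sketched as a chain of stages that share one job record and can halt early. This is a hypothetical skeleton, not MODPIPE, MODWEB, or any existing server: every stage is a stub, the template identities and docking scores are supplied as toy inputs, and the 30% identity threshold from the earlier discussion is used only to log a warning that the template choice needs checking.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    sequence: str
    template: Optional[str] = None
    model: Optional[str] = None
    ranked_ligands: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


Stage = Callable[[Job], bool]  # a stage returns False to stop the pipeline


def make_fold_assignment(identities: Dict[str, float], min_identity: float = 30.0) -> Stage:
    """Stub: pick the template with the highest precomputed sequence identity (%)."""
    def stage(job: Job) -> bool:
        if not identities:
            job.log.append("no template detected")
            return False
        best = max(identities, key=identities.get)
        job.template = best
        if identities[best] < min_identity:
            job.log.append(f"low identity to {best}: verify template choice")
        return True
    return stage


def build_model(job: Job) -> bool:
    """Stub for alignment and model building (a real pipeline would call a modeling program)."""
    job.model = f"model_of_{job.template}"
    job.log.append("model built")
    return True


def make_docking(scores: Dict[str, float]) -> Stage:
    """Stub docking stage: `scores` maps ligand -> precomputed score (lower is better)."""
    def stage(job: Job) -> bool:
        job.ranked_ligands = sorted(scores, key=scores.get)
        return True
    return stage


def run(job: Job, stages: List[Stage]) -> Job:
    for stage in stages:
        if not stage(job):
            break
    return job


# Toy run with hypothetical template identifiers and ligand scores.
result = run(
    Job("MKTAYIAK"),
    [make_fold_assignment({"tmpl_A": 42.0, "tmpl_B": 28.0}),
     build_model,
     make_docking({"L1": -30.0, "L2": -45.0})],
)
print(result.template, result.ranked_ligands, result.log)
```

Keeping each step behind the same small interface is what would let better methods for any one step (fold assignment, modeling, or docking) be swapped in without changing the rest of the chain.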

There are a number of servers for automated comparative modeling (Table 1). However, in spite of automation, the process of calculating a model for a given sequence, refining its structure, as well as visualizing and analyzing its family members in the sequence and structure spaces can involve the use of scripts, local programs, and servers scattered across the internet and not necessarily interconnected. In addition, manual intervention is generally still needed to maximize the accuracy of the models in the difficult cases. The two main repositories for precomputed comparative models, SWISS-MODEL 87 and MODBASE, 31 begin to address these deficiencies. They provide access to web-based comparative modeling tools, cross-links to other sequence and structure databases, and annotations of sequences and their models.

Table 1 Programs and web servers useful in comparative protein structure modeling

Figure 6 An integrated set of resources for comparative modeling. 32 Various databases and programs required for comparative modeling and docking are usually scattered over the internet, and require manual intervention or a good deal of expertise to be useful. Automation and integration of these resources are efficient ways to put these resources in the hands of experts and nonspecialists alike. We have outlined a comprehensive interconnected set of resources for comparative modeling and hope to integrate it with a similar effort in the area of ligand docking made by the Shoichet group. 220

A schematic of our own attempt at integrating several useful tools for comparative modeling is shown in Figure 6. 32,213 MODBASE is a comprehensive database that contains predicted models for domains in approximately one-half of all ~2.1 million known protein sequences. The models were calculated using MODPIPE 28, 213 and MODELLER. 96 The web interface to the database allows flexible querying for fold assignments, sequence-structure alignments, models, and model assessments. An integrated sequence-structure viewer, Chimera, 214 allows inspection and analysis of the query results. Models can also be calculated using MODWEB, 213,250 a web interface to MODPIPE, and stored in MODBASE to facilitate sharing, presentation, distribution, and annotation. For example, MODBASE contains binding site predictions for small ligands and a set of predicted interactions between pairs of modeled sequences from the same genome. Other resources associated with MODBASE include a comprehensive database of multiple protein structure alignments (DBALI), 215 a server for modeling of loops in protein structures (MODLOOP), 216, 251 structurally defined ligand-binding sites, 217 structurally defined binary domain interfaces (PIBASE), 218, 252 predictions of ligand-binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP). 175, 219, 253 Compared to protein structure prediction, the attempts at automation and integration of resources in the field of docking for virtual screening are still in their nascent stages. One of the recent successful efforts in this direction is ZINC, 220 a publicly available database of commercially available druglike compounds. ZINC contains more than 3.3 million 'ready-to-dock' compounds organized in several subsets and allows the user to query the compounds by molecular properties and constitution.
In the future, ZINC will be coupled to DOCKBLASTER, which will enable end users to dock the compounds against their own target structures using DOCK. 189, 190 More generally, we will no doubt see efforts to improve the accuracy of comparative modeling and ligand docking. But perhaps more importantly, the two techniques will be integrated into a single protocol for more accurate and automated docking of ligands against sequences without known structures. As a result, the number and variety of applications of both comparative modeling and ligand docking will continue to increase.


After an MSc degree in Physics from the University of Hyderabad, India, he was awarded a Research Fellowship by the Indian Institute of Science, Bangalore, India, for a PhD under the supervision of Prof C Ramakrishnan at the Molecular Biophysics Unit, where he focused on the conformational analysis of protein structures. He then joined the laboratory of Prof Andrej Sali at the Rockefeller University

He was awarded the Research Council of Slovenia Scholarship, the Overseas Research Students Award, and the Merck Sharp and Dohme Academic Scholarship at Birkbeck College, University of London, where he received his PhD in biophysics in 1991, under the supervision of Prof Tom L Blundell. He focused on development of methods for comparative modeling of protein three-dimensional structure and their implementation in the program MODELLER. He then went to the Department of Chemistry at Harvard University as a Jane Coffin Childs Memorial Fund postdoctoral fellow with Prof Martin Karplus

He was a Sinsheimer Scholar, an Alfred P Sloan Research Fellow, and an Irma T Hirschl Trust Career Scientist. Dr Sali is an Editor of Structure and a Founder of Prospect Genomix, now Structural Genomix. He is interested in using computation grounded in the laws of physics and the theory of evolution to study the structure and function of proteins. He is aiming to improve and apply methods for (i) predicting the structures of proteins

All Rights Reserved Comprehensive Medicinal Chemistry II No part of this publication may be reproduced, stored in any retrieval system or transmitted in any form by any means electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise

This article is partially based on papers by Jacobson and Sali, 177 Fiser and Sali, 22 and Madhusudhan et al. 221 We also acknowledge the funds from Sandler Family Supporting Foundation, NIH R01 GM54762, P01 GM71790, P01 AI35707, and U54 GM62529, as well as Sun, IBM, and Intel for hardware gifts.