key: cord-290802-761wqgbe
authors: Zhao, Zheng; Bourne, Philip E.
title: Structural Insights into the Binding Modes of Viral RNA-Dependent RNA Polymerases Using a Function-Site Interaction Fingerprint Method for RNA Virus Drug Discovery
date: 2020-09-18
journal: J Proteome Res
DOI: 10.1021/acs.jproteome.0c00623
sha:
doc_id: 290802
cord_uid: 761wqgbe

The coronavirus disease of 2019 (COVID-19) pandemic speaks to the need for drugs that not only are effective but also remain effective given the mutation rate of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To this end, we describe structural binding-site insights for facilitating COVID-19 drug design when targeting RNA-dependent RNA polymerase (RDRP), a common conserved component of RNA viruses. We combined an RDRP structure data set, including 384 RDRP PDB structures and all corresponding RDRP–ligand interaction fingerprints, thereby revealing the structural characteristics of the active sites for application to RDRP-targeted drug discovery. Specifically, we revealed the intrinsic ligand-binding modes and associated RDRP structural characteristics. Four types of binding modes with corresponding binding pockets were determined, suggesting two major subpockets available for drug discovery. We screened a drug data set of 7894 compounds against these binding pockets and presented the top-10 small molecules as a starting point in further exploring the prevention of virus replication. In summary, the binding characteristics determined here help rationalize RDRP-targeted drug discovery and provide insights into the specific binding mechanisms important for containing the SARS-CoV-2 virus.

The coronavirus disease of 2019 (COVID-19) pandemic is a severe threat to global public health, infecting over 15 million people, according to the World Health Organization (WHO) situation report.
1 Consequently, researchers have focused on developing convenient testing techniques and vaccines, and on drug design and repurposing, to mitigate the causative coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2−4 However, to date, no effective COVID-19-specific therapeutic agents are being prescribed. Laboratory testing techniques have made breakthroughs: the Food and Drug Administration has recently granted an emergency use authorization for the Sofia 2 SARS Antigen FIA COVID-19 test, 5 which can detect the virus within minutes. Progress in detection is important, but it does not amount to a treatment; hence, the research community is developing specialized SARS-CoV-2 vaccines and drugs to mitigate and treat the pandemic. 6 Drug discovery is thwarted by the multiple mutations found in the SARS-CoV-2 family. 7 Three distinct "variants" from the SARS-CoV-2 genomes sampled 7 between December 24, 2019, and March 4, 2020, have been reported. Thus, it is challenging to design novel COVID-19 medications that not only are effective but also remain so given the mutation rate.

Scientists have established SARS-CoV-2 as an RNA virus containing a single-stranded positive-sense RNA genome. 2 RNA viruses have been the main cause of epidemics over the last two decades: SARS 8 in 2003, MERS 9 in 2012, Ebola 10 in 2014, Zika 11, 12 in 2015, and now COVID-19. RNA viruses are divided into four classes: 3 single positive-strand RNA ((+)-ssRNA) such as SARS, MERS, and SARS-CoV-2; single negative-strand RNA ((−)-ssRNA) such as Ebola; double-strand RNA (dsRNA); and retroviruses such as HIV. 13, 14 These viruses replicate their genetic material within host cells; 15 hence, one way to limit infection is to inhibit virus replication. Apart from retroviruses, the other classes all contain a common RNA-dependent RNA polymerase (RDRP), 16 which catalyzes the replication of viral RNA and hence is a prime drug target. Multiple high-resolution RDRP 3D structures have been solved, each with a similar core architecture, 17 namely a "cupped right hand" with 7 motifs (A−G) comprising "palm", "fingers", and "thumb" (Figure 1 and Table S1). 18, 19 Here, we propose a drug discovery scheme targeting the conserved RDRP. On April 29, 2020, the National Institutes of Health (NIH) indicated that the repurposed drug remdesivir, targeting RDRP, shortened patients' time to recovery by 4 days, or 31%. Remdesivir did not show significant efficacy in reducing mortality, 20 but it is a start in the quest for RDRP-targeted drugs.

Here, we advance anti-COVID-19 drug research and development by revealing new features of drug binding to RDRP using a computational pharmacology approach. We collected 384 PDB structures of RDRP catalytic domains and their complexes (available as of July 1, 2020) from 47 RNA viruses, including coronavirus, as our RDRP data set (see Methods). Then, using computational pharmacology methods, notably a function-site interaction fingerprint method, we characterized the RDRP−ligand interactions to provide new insights into antiviral drug design and discovery. Finally, combining the new structure-based insights and a virtual docking process with an antiviral compound library from DrugBank (www.drugbank.ca), we determined specific potential inhibitors as a proof of concept for drug-repurposing opportunities as well as for gaining new insights into possible modes of inhibition.

This article is made available via the ACS COVID-19 subset for unrestricted RESEARCH re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

We first counted the PDB IDs of all RDRP structures by accessing the ProRule accession number of RDRP (PRU00539) using PROSITE 21 (Table S2).
We then filtered out all apo-RDRP structures, as well as complexes containing only invalid small molecules such as buffers, organic cofactors, and solvent molecules (for example, dimethyl sulfoxide (DMSO), flavin adenine dinucleotide (FAD), and glycerol), by checking the "SITE_DESCRIPTION" keywords in each PDB structure. Furthermore, we removed complexes where the ligands bind the surface, or other parts of the RDRP catalytic domain, rather than the core architecture. Our final list contained 141 PDB structures of ligand-bound complexes, which were used to encode the function-site interaction fingerprints in this paper.

The pairwise similarity of all ligands was calculated using screenmd from ChemAxon. 23 Each ligand is described using the molecular descriptor ECFP with a fixed length of 120 bits, and the pairwise similarity is then calculated as a Tanimoto coefficient. 24−26 In the paper, the chemical structures are drawn using Marvinjs from ChemAxon. 27

The Fs-IFP represents the characteristics of protein−ligand interactions at functional sites and does so on a proteome-wide scale, as described previously. 28−31 Briefly, the Fs-IFP method combines a sequence-order-independent structural binding-site alignment method 32−36 with the protein−ligand interaction fingerprint strategy 37−39 to obtain comparable binding features. Application of the Fs-IFP method involves three steps. The first step is to align all of the binding sites. The secondary structures of all RDRP catalytic domains were aligned against the SARS-CoV-2 RDRP structure template using the sequence-order-independent structural alignment program TM-align with the default scoring function. TM-align reports a score between 0 and 1: a value >0.3 implies a similar fold, and a value >0.5 implies the same fold. 34 The alignment of the binding sites was performed using SMAP with default parameters.
32, 33 The SARS-CoV-2 RDRP−remdesivir complex (PDB 7BV2) was used as the template, and residues within 15 Å of the ligand defined the binding site. 40 The second step is to determine the Fs-IFP of every complex. Here, the interaction fingerprints are encoded using a previously described interaction fingerprint method (IChem). 41 In the third step, the comparable interaction fingerprints of each complex are clustered using the k-means method in R. 42

For screening, 7894 annotated drug molecules were downloaded from DrugBank to form our compound library. 43 These drugs were docked to the RDRP catalytic domain using the docking software Surflex 44 v4.103. Surflex uses a pseudomolecule (also called an idealized active ligand or protomol) as a target to generate the putative poses of ligands in the protein binding site. 45 The putative poses are scored using a Hammerhead scoring function. 46 We used a residue-based method to generate the protomol, i.e., we specified the residues that border the active site. The distilled binding characteristics (see Results) show that subpocket 1 is located within motifs A−D and F−G. From these motifs, on the basis of the binding characteristics, we chose residues N497, K551, R553, D623, S682, D760, and F793 as bordering the active site and used them to generate the protomol of subpocket 1. Similarly, subpocket 2 involves motif E, helix1, and the thumb lobe (see Results). Hence, we chose residues F594, S814, and P830 as bordering the active site and used them to generate the protomol of subpocket 2. 44 These two protomols, which define different subpockets, were used to screen for potential compounds with default parameters and the scoring function. All docked small molecules with different binding conformations were sorted by binding affinity score. The top-five highest scoring molecules from each subpocket were further analyzed.

(Table S2).
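The post-docking selection described in the Methods (sort the docked conformations by score, then keep the top five per subpocket) can be illustrated with a minimal sketch. It is not the authors' pipeline and does not read Surflex output: the `(subpocket, compound_id, score)` tuple layout, the function name `top_compounds`, the placeholder compound names, and all scores are hypothetical. It also assumes that only the best-scoring conformation of each compound is kept and that a higher score is better.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (subpocket, compound_id, docking_score).
Pose = Tuple[str, str, float]


def top_compounds(poses: Iterable[Pose], n: int = 5) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the best-scoring pose per compound, then rank compounds per subpocket."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for subpocket, compound, score in poses:
        previous = best[subpocket].get(compound)
        # Assumption: a higher docking score means a better predicted binder.
        if previous is None or score > previous:
            best[subpocket][compound] = score
    return {
        # Sort by descending score; ties are broken by compound name.
        subpocket: sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for subpocket, scores in best.items()
    }


# Invented placeholder poses, for illustration only.
poses = [
    ("subpocket1", "cmpd_A", 7.2),
    ("subpocket1", "cmpd_A", 8.1),  # second conformation of the same compound
    ("subpocket1", "cmpd_B", 6.0),
    ("subpocket1", "cmpd_C", 9.3),
    ("subpocket2", "cmpd_B", 7.9),
    ("subpocket2", "cmpd_D", 8.4),
]
ranked = top_compounds(poses, n=2)
for subpocket, hits in ranked.items():
    print(subpocket, hits)
```

Collapsing conformations to one score per compound before ranking prevents a single molecule with many good poses from filling the whole top-five list.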
All RDRP catalytic domains show high similarity (Figure 2b) based on secondary structure alignment (see Methods). Specifically, the "finger" and "thumb" regions have the same folding patterns (helix and sheet) across all viruses. Likewise, the conserved core architectures of all RDRPs, such as motif C, are highly similar and overlap (Figure 2b). The global structural similarity between all RDRP catalytic domains and the SARS-CoV-2 RDRP was calculated (see Methods). The lowest similarity, 0.33, is from bacteria, Escherichia coli (UniProt P0A6P1). The top-three viruses, with an RDRP similarity above 0.65, are poliovirus type 1 (UniProt P03300), hepatitis C virus (HCV) genotype 2a (UniProt Q99IB8), and hepatitis C virus genotype 1b (UniProt P26663, Figure S1). In sum, structurally, the SARS-CoV-2 RDRP has high global/core structural similarity to the RDRP catalytic domains of all other RNA viruses, which provides an opportunity for structure-based COVID-19 drug design and repurposing, noting that key differences lie in the subtle details.

(Table S3, columns 1−3). The 141 ligands come from 105 different compounds, of which 16 occur more than once (Figure 3a). The top-four recurring compounds are GTP (15), UTP (5), CH1 (5), and ATP (5), which all have the same triphosphate fragment. However, the pairwise similarities of 85% of the compounds are below 0.25 (Figure 3b), which indicates that the ligand set covers diverse chemotypes.

Using the SARS-CoV-2 RDRP binding site as a template, the 141 binding sites were aligned using a sequence-order-independent pocket alignment method, resulting in a comparison matrix of interacting amino acids (see Methods). Within the matrix, columns of amino acids without any encoded interaction information were removed. Thus, each aligned binding site consists of 123 columns of amino acids, i.e., the interaction fingerprint (Table S3).
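To make the two operations concrete (dropping columns with no encoded interaction, then grouping the fingerprints with k-means, as named in the Methods), here is a minimal sketch on toy data. It makes several assumptions that are not from the paper: each fingerprint is reduced to one 0/1 value per aligned position, the function names are hypothetical, the clustering is plain Lloyd iteration seeded with the first k distinct rows (the paper used k-means in R, whose algorithm variant and initialization may differ), and the toy matrix uses k = 2 rather than the four classes found for the real data.

```python
from typing import List, Sequence, Tuple


def drop_empty_columns(matrix: Sequence[Sequence[int]]) -> Tuple[List[List[int]], List[int]]:
    """Remove columns that are 0 in every row; return filtered rows and kept column indices."""
    if not matrix:
        return [], []
    kept = [j for j in range(len(matrix[0])) if any(row[j] for row in matrix)]
    return [[row[j] for j in kept] for row in matrix], kept


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic seeding (first k distinct rows)."""
    centroids: List[List[float]] = []
    for row in rows:
        if list(row) not in centroids:
            centroids.append([float(v) for v in row])
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct rows than clusters")
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 0/1 interaction matrix: 4 complexes x 6 aligned positions (column 2 is empty).
toy = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
filtered, kept = drop_empty_columns(toy)
labels = kmeans(filtered, k=2)
print(kept, labels)
```

Removing all-zero columns first does not change the Euclidean distances between fingerprints; it only shortens them to the positions that carry interaction information, which is how the 123-column fingerprint arises in the text.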
According to the similarity of function-site interaction fingerprints over all complexes, it was possible to divide the binding modes into four classes, where each class contains multiple PDB structures from different kinds of viruses (Table 1). Each class possesses distinct binding characteristics (Figure 4).

3.2.1. Class I.
There are 11 aligned PDB structures in this class, all belonging to the same Dengue virus (UniProt entry: Q6YMS4) (Table 1). The aligned binding sites have almost the same binding patterns (Figure 4). Within motifs A−D and motifs F and G there are no interaction fingerprints; however, conserved interaction fingerprints exist in motif E, helix1, and the thumb domain, implying the ligand-binding site is located at the palm region and between motif E and the thumb domain (Figure 5a). Residues L511, H512, and L514 of helix1 and C709 and S710 of motif E provide the conserved interactions (Figure 5a). In the thumb domain, the interaction fingerprints of all complexes in the class are similar, especially in the columns marked with the dashed rectangle (Figure 4). The role of this class I binding pocket has been discussed by other groups previously. 47 Noble et al. inhibited enzyme activity through fragment screening 47 that identified this binding pocket. As part of their study, changing a phenyl to a thiophene yielded a higher binding affinity, highlighting the role of this pocket in subsequent drug design.

3.2.2. Class II.
There are 50 PDB structures from 15 viruses (Table 1) in this class (Figure 4). Class II interaction fingerprints exist mainly in the region of motifs A−D and motifs F and G, implying the ligand is located in the regions of the "palm" and "fingers" (Figure 5b). Remdesivir is reported to bind in this subpocket, 19 where K551 and R553 within motif F, D623 within motif A, S682 within motif B, and D760 within motif C are the major contributors to ligand binding (PDB 7BV2).
While these amino acids are conserved, remdesivir provides only moderate improvement in the recovery time of patients with severe symptoms of COVID-19. 4 Further exploration of this binding site with compounds of a higher binding affinity would seem warranted.

3.2.3. Class III.
There are 17 PDB structures belonging to three RNA viruses (Table 1) in this class. The interaction fingerprints are distributed in the regions helix1, motif C, motif E, and the thumb (Figure 4), which form a binding pocket to accommodate the ligand (Figure 5c). Specifically, in helix1, the three residues P197, R200, and L204 provide the primary interactions with the ligand and are conserved in the class (Figure 4). Motif C folds as a beta-hairpin (Figure 1), and on each strand, there are 3 conserved amino acids (residues 314− (Figures 4 and 5c). Within motif E, L360, I363, S365, C366, and S368 provide the main binding interactions (Figure 5c). Two conserved residues (L360 and I363) define a unique fingerprint for the class. Like the class I binding pocket, the class III pocket is composed of helix1, motif E, and the thumb domain. The difference is that in class III, motif C is involved as well; thus, the two binding pockets partially overlap. In a previous report, Mayland et al. discovered an inhibitor, GSK-5852, which specifically targets the class III pocket in HCV RDRP, to treat HCV infection. 48

3.2.4. Class IV.
This is the largest class, with 63 PDB structures belonging to 4 viruses (Table 1), and has interaction fingerprints most similar to class I and class III (Figure 4). Specifically, using an HCV complex (PDB ID 3cwj) as the representative (Figure 5d), in the region of helix1, F193, P197, and R200 interact with the ligand. Residues D318 and D319 from motif C and residue C366 from motif E are also conserved, as was found in class III. Distinct from classes I and III, residues from motif B participate in the binding interactions, notably N291.
Another difference occurs in motif E: only residue C366 from the hairpin loop interacts with the ligand, in contrast to class III, which involves additional residues. Interestingly, within motif E, C366 is highly conserved (Figure 4). In the thumb domain, there are interactions not found in the other classes. Thus, in class IV, the pocket is composed of the thumb domain and motifs B, C, and E (Figure 5d).

In summary, according to our clustering analysis, there are four distinct binding modes in the conserved core architecture of RDRP, each with different subpockets to accommodate diverse inhibitors. In classes I, III, and IV, helix1 and motif E always participate in ligand-binding interactions; hence, their binding pockets have a common overlap. Class II has a different subpocket, which has been exploited as a primary target for studying the drug remdesivir to fight COVID-19. 11

With the above-mentioned binding classes in mind, we screened 7894 FDA-approved small-molecule drugs targeting the RDRP catalytic domain. In so doing, we recognize the limitations of such in silico findings; they are nothing more than suggestions requiring experimental validation. Two different subpockets were chosen as the binding pockets. Subpocket 1 is located within class II, and subpocket 2 is located in the area centered on the common region of classes I, III, and IV (see the two subpockets highlighted with spheres 44 in Figure S3). Through virtual screening (see Methods), the top-five highest scoring compounds against each subpocket are listed in Table 2. For subpocket 1, there is an inhibitor of factor Xa (Darexaban), 49 which prevents venous thromboembolism by acting as an anticoagulant and antithrombotic after surgery; two inhibitors of histone deacetylase (4SC-202 and CUDC-907); 50, 51 an inhibitor of dipeptidyl peptidase 4 (DB07779); 22 and an inhibitor of EGFR (Osimertinib). 52 These subpocket 1 inhibitors interact with motifs A−D and F and G (Figure 6a).
For comparison, remdesivir (accession number DB14761) is included in our compound library, and its docking score is 6.0 (Figure 5b), considerably lower than those of our top-scoring inhibitors. Screening of subpocket 2 revealed five inhibitors with a binding affinity of >7.9 (Table 2, Figure 6b). It is noteworthy that two of the inhibitors (LY-517717 and DB07074) 53 target coagulation factors X and XI, respectively. It is reported that COVID-19 induces blood clotting in the lungs and elsewhere. 54, 55 As blood thinners, these drugs might have the added value of reducing blood clotting, 56 and indeed, there are multiple clinical trials using anticoagulants. 57 Pentamidine is an agent used to treat pneumocystis pneumonia in HIV-infected patients. 58 Nafamostat is a short-acting anticoagulant that acts as a serine protease inhibitor, is reported to have antiviral properties, 59 and is undergoing a clinical trial in Japan.

To summarize, we characterized RDRP binding pockets, suggesting four classes of binding modes (classes I−IV). In silico screening against two completely different pockets (subpockets 1 and 2) provided a series of putative inhibitors with a high binding affinity. Again, we emphasize that experimental validation is necessary to draw any meaning from this putative outcome given, among other possible computational inaccuracies, the unreliability of such binding affinities.

In this paper, we explored the structural characteristics of the RDRP catalytic domain using a computational pharmacology method. More specifically, we focused on the ligand-binding characteristics of the RDRP binding site using a receptor−ligand function-site interaction fingerprint strategy. We collected all available RDRP structures and analyzed the conserved core structure. Across the entire data set, a "cupped right hand" folding pattern and 7 conserved motifs characterize a highly similar RDRP architecture.
Analysis of these protein−ligand complexes, which share an overall architecture, revealed four different classes of binding modes. Class II is based on the pocket consisting of motifs A−D and F and G, whereas classes I, III, and IV have distinct yet somewhat overlapping characteristics; for example, helix1 and motif E always participate in ligand binding. In terms of distinct characteristics, class I has a unique binding mode in the thumb domain, class III in motif E, and class IV in motifs B and E. On the basis of these RDRP−ligand-binding features, multiple FDA-approved drugs were screened to determine possible repurposing opportunities. The top-10 speculative inhibitors against the two most distinct subpockets are discussed. One is already part of a clinical trial as a potential COVID-19 drug, and three anticoagulants are also included. In sum, these results provide structural insights into targeting the RDRP catalytic domain and provide potential repurposing opportunities that need experimental verification.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00623.
(Table S1), similarity of catalytic domains between all RDRPs and SARS-CoV-2 RDRP (Figure S1), RDRP structure data set (Figure S2), and two subpockets (gray and lime) used to screen the compound library (Figure S3) (PDF)
The complete RDRP data set (Table S2) (XLSX)
The aligned ligand-binding sites (Table S3) (XLSX)

The WHO situation report
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Research and Development on Therapeutic Agents and Vaccines for COVID-19 and Related Human Coronavirus Diseases
Vaccines: Status Report
Phylogenetic network analysis of SARS-CoV-2 genomes
Understanding the Latest Human Coronavirus Threat. Viruses
Fall, I. S. Ebola Virus Transmission Caused by Persistently Infected Survivors of the 2014−2016 Outbreak in West Africa
An update on Zika virus infection
Drug repurposing to target Ebola virus replication and virulence using structural systems pharmacology
Inhibition of Influenza A Virus Replication by Compounds Interfering with the Fusogenic Function of the Viral Hemagglutinin
The uncoupling of catalysis and translocation in the viral RNA-dependent RNA polymerase
A Structural Overview of RNA-Dependent RNA Polymerases from the Flaviviridae Family
Insights from Structure, Function and Evolution. Viruses
Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
NIH clinical trial shows Remdesivir accelerates recovery from advanced COVID-19
Xenarios, I. New and continuing developments at PROSITE
JChem 6.2.0; ChemAxon
Extended-connectivity fingerprints
Improving the search performance of extended connectivity fingerprints through activity-oriented feature filtering and application of a bit-density-dependent similarity function
Molecular similarity in medicinal chemistry
Insights into the binding mode of MEK type-III inhibitors. A step towards discovering and designing allosteric kinase inhibitors across the human kinome
Structural Insights into Characterizing Binding Sites in Epidermal Growth Factor Receptor Kinase Mutants
Delineation of Polypharmacology across the Human Structural Kinome Using a Functional Site Interaction Fingerprint Approach
Revealing Acquired Resistance Mechanisms of Kinase-Targeted Drugs Using an on-the-Fly, Function-Site Interaction Fingerprint Approach
A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites
Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments
TM-align: a protein structure alignment algorithm based on the TM-score
Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity
Calculating and scoring high quality multiple flexible protein structure alignments
Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions
Python-based Protein-Ligand Interaction Fingerprinting
Optimizing fragment and scaffold docking by use of molecular interaction fingerprints
sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank
IChem: A Versatile Toolkit for Detecting, Comparing, and Predicting Protein-Ligand Interactions
R: A language and environment for statistical computing. R Foundation for Statistical Computing
DrugBank: a knowledgebase for drugs, drug actions and drug targets
Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility, and knowledge-based search
Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine
Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities
A Conserved Pocket in the Dengue Virus Polymerase Identified through Fragment-based Screening
Discovery of a potent boronic acid derived inhibitor of the HCV RNA-dependent RNA polymerase
dose escalation study of YM150, an oral direct factor Xa inhibitor, in the prevention of venous thromboembolism in elective primary hip replacement surgery
Elucidating the mechanism of action of domatinostat (4SC-202) in cutaneous T cell lymphoma cells
CUDC-907 in relapsed/refractory diffuse large B-cell lymphoma, including patients with MYC-alterations: results from an expanded phase I trial
Treatment approaches for EGFR-inhibitor-resistant patients with non-small-cell lung cancer
A phase II study of the oral factor Xa inhibitor LY517717 for the prevention of venous thromboembolism after hip or knee replacement
COVID-19 and its implications for thrombosis and anticoagulation
Thromboembolism and anticoagulant therapy during the COVID-19 pandemic: interim clinical guidance from the anticoagulation forum
Association of Treatment Dose Anticoagulation With In-Hospital Survival Among Hospitalized Patients With COVID-19
Trial Evaluating Efficacy and Safety of Anticoagulation in Patients With COVID-19 Infection
Repurposing of Drugs Is a Viable Approach to Develop Therapeutic Strategies against Central Nervous System Related Pathogenic Amoebae
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro