key: cord-311762-f6muhf3d authors: Chen, Yu Wai; Yiu, Chin-Pang Bennu; Wong, Kwok-Yin title: Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL (pro)) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates date: 2020-02-21 journal: F1000Res DOI: 10.12688/f1000research.22457.1 sha: doc_id: 311762 cord_uid: f6muhf3d
We prepared the three-dimensional model of the SARS-CoV-2 (aka 2019-nCoV) 3C-like protease (3CLpro) using the crystal structure of the highly similar (96% identity) ortholog from the SARS-CoV. All residues involved in catalysis, substrate binding and dimerisation are 100% conserved. Comparison of the polyprotein PP1AB sequences showed 86% identity. The 3C-like cleavage sites on the coronaviral polyproteins are highly conserved. Based on the near-identical substrate specificities and high sequence identities, we are of the opinion that some of the previous progress in developing specific inhibitors for the SARS-CoV enzyme can be transferred to its SARS-CoV-2 counterpart. With the 3CLpro molecular model, we performed virtual screening for purchasable drugs and proposed 16 candidates for consideration. Among these, the antivirals ledipasvir and velpatasvir are particularly attractive as therapeutics to combat the new coronavirus with minimal side effects, commonly fatigue and headache. The drugs Epclusa (velpatasvir/sofosbuvir) and Harvoni (ledipasvir/sofosbuvir) could be very effective owing to their dual inhibitory actions on two viral enzymes.
On 7 January 2020, a new coronavirus, 2019-nCoV (now officially named SARS-CoV-2), was implicated in an alarming outbreak of a pneumonia-like illness, COVID-19, originating from Wuhan City, Hubei, China. Human-to-human transmission was first confirmed in Guangdong, China [1].
The World Health Organisation has declared this a global public health emergency: as of 15 February 2020, more than 65,000 confirmed cases had been reported and the death toll was over 1500. At the height of the crisis, this virus is spreading at a rate and scale far worse than previous coronaviral epidemics. It was immediately evident from its genome that the coronavirus is evolutionarily related (80% identity) to the beta-coronavirus implicated in the severe acute respiratory syndrome (SARS), which originated in bats and caused a global outbreak in 2003. The momentum of research on developing antiviral agents against the SARS-CoV carried on after the epidemic subsided. Despite this, no SARS treatment has yet come to fruition; however, knowledge acquired from the extensive research and development efforts may be of use to inform the current therapeutic options.
The viral genome encodes more than 20 proteins, among which are two proteases (PLpro and 3CLpro) that are vital to virus replication; they cleave the two translated polyproteins (PP1A and PP1AB) into individual functional components. The 3-chymotrypsin-like protease (3CLpro, aka main protease, Mpro) is considered to be a promising drug target. Tremendous effort has been spent on studying this protein in order to identify therapeutics against the SARS-CoV in particular and other pathogenic coronaviruses (e.g. MERS-CoV, the Middle East respiratory syndrome coronavirus) in general, because they share similar active sites and enzymatic mechanisms.
The purpose of this study is to build a molecular model of the 3CLpro of the SARS-CoV-2 and to carry out virtual screening to identify readily usable therapeutics. It was not our intention, however, to comment on other structure-based drug design research, as such work will not be timely for the current epidemic.
The translated polyprotein (PP1AB) sequence was obtained from the annotation of the GenBank entry of the SARS-CoV-2 genome (accession number MN908947). By comparing this sequence with the SARS-CoV PP1AB sequence (accession number ABI96956), the protease cleavage sites and all mature protein sequences were obtained. Sequence comparison and alignment were performed with BLASTp. The high-resolution apo-enzyme structure of SARS-CoV 3CLpro (PDB ID: 2DUC) [2] was employed as the template. The variant residues were "mutated" in silico by SCWRL4 [3], followed by manual adjustment to ensure that the best side-chain rotamer was employed (Table 2). The rebuilt model was subjected to steepest descent energy minimisation by Gromacs 2018.4 using the Gromos 54A7 forcefield, with a restraint force constant of 1000 kJ mol⁻¹ nm⁻² applied on all backbone atoms and all atoms of the vital residues (Table 1). Accessible surface areas of residues were calculated with areaimol of the CCP4 suite v7.0.
The MTiOpenScreen web service [4] was used for screening against its library of 7173 purchasable drugs (Drugs-lib), with the binding site grid specified by the active-site residues. The active sites on chain A and chain B were screened independently with AutoDock Vina [5]. When the crystal structure was released, it was stripped of its inhibitor and subjected to the same screening. A list of 4,500 target:ligand docking combinations ranked by binding energies was produced for each screen. The top 10 or 11 hits (ranked using a binding energy cut-off) for chains A and B were examined visually in PyMOL (version 1.7.X) [6].
An earlier version of this article can be found on ChemRxiv (DOI: 10.26434/chemrxiv.11831103.v2).
The first available genome was GenBank MN908947, now NCBI Reference Sequence NC_045512. From it, the PP1AB sequence of SARS-CoV-2 was extracted and aligned with that of SARS-CoV. The overall amino-acid sequence identity is very high (86%).
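As a rough illustration of how an identity figure such as the 86% above is derived, the following Python sketch counts identical columns in a pairwise alignment and divides by the alignment length, which is the ratio BLASTp reports as "Identities". The function name and the toy sequences are hypothetical placeholders and are not taken from the study, whose values were obtained with BLASTp itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical, non-gap columns are counted as matches; the denominator is
    the full alignment length (gap columns included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy example (hypothetical sequences): 7 of 8 columns are identical.
toy_identity = percent_identity("MKTAYIAK", "MKTGYIAK")
print(f"{toy_identity:.1f}%")  # prints "87.5%"
```

Other tools may use a different denominator (for example, the length of the shorter sequence), so identity values are only comparable when the same convention is used.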
The conservation is noticeable at the polyprotein cleavage sites. All 11 3CLpro sites [2] are highly conserved or identical (Extended data [7], Table S1), implying that their respective proteases have very similar specificities. The 3CLpro sequence of SARS-CoV-2 differs from that of SARS-CoV at only 12 of 306 residues (identity = 96%). We compared the polyprotein PP1AB and the 3CLpro sequences among all 11 SARS-CoV-2 genomes (GenBank MN908947,
3D model of the SARS-CoV-2 3CLpro
The amino acids that are known to be important for the enzyme's functions are listed in Table 1. Not unexpectedly, none of the 12 variant positions is involved in a major role. Therefore, we were confident in preparing a structural model of the SARS-CoV-2 3CLpro by molecular modelling (Extended data [7], Figure S1), which will be immediately useful for in silico development of targeted treatment. After we submitted the first draft of this study, the crystal structure of SARS-CoV-2 3CLpro was solved and released (PDB ID 6LU7), which confirms that the predicted model is good within experimental errors (Extended data [7], Figure S2).
When examined in molecular graphics [6], all solutions were found to fit into their respective active sites convincingly. The binding energies of chain A complexes were generally higher than those of chain B by approximately 1.4 kcal mol⁻¹ (Table 3). This presumably reflects the intrinsic conformational variability between the A- and B-chain active sites in the crystal structure (the average root-mean-square deviation (rmsd) in Cα atomic positions of active-site residues is 0.83 Å). In each screen, the differences in binding energies are small, suggesting that the ranking is not strongly discriminatory and that all top scorers should be examined. We combined the two screens and found 16 candidates which give promising binding models (etoposide and its phosphate counted as one) (Table 3). We checked the actions, targets and side effects of the 16 candidates.
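The step of combining the chain A and chain B screens can be sketched in Python as follows. This is only an illustration under stated assumptions: the compound names, energies and cut-off values are invented placeholders rather than results of the study, the use of a separate cut-off for each chain is an assumption made here (motivated by the roughly 1.4 kcal mol⁻¹ offset between the chains), and multiple library entries for the same compound are collapsed to the best-scoring one.

```python
from typing import Dict, List, Optional, Tuple

# (compound name, predicted binding energy in kcal/mol; more negative = better)
Hit = Tuple[str, float]


def top_hits(hits: List[Hit], cutoff: float) -> Dict[str, float]:
    """Keep each compound's best energy, provided it is at or below the cut-off."""
    best: Dict[str, float] = {}
    for name, energy in hits:
        if energy <= cutoff and energy < best.get(name, float("inf")):
            best[name] = energy
    return best


def combine_screens(
    chain_a: List[Hit], chain_b: List[Hit], cutoff_a: float, cutoff_b: float
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Union of the top hits of both screens, with the best energy per chain."""
    a = top_hits(chain_a, cutoff_a)
    b = top_hits(chain_b, cutoff_b)
    return {name: (a.get(name), b.get(name)) for name in sorted(set(a) | set(b))}


# Hypothetical input: names and energies are placeholders, not data from the paper.
chain_a_hits = [
    ("compound_1", -9.1),
    ("compound_2", -8.7),
    ("compound_1", -8.9),  # second library entry for the same compound
    ("compound_3", -7.0),
]
chain_b_hits = [("compound_2", -10.2), ("compound_4", -10.0), ("compound_3", -8.5)]

combined = combine_screens(chain_a_hits, chain_b_hits, cutoff_a=-8.5, cutoff_b=-9.9)
for name, (energy_a, energy_b) in combined.items():
    print(name, energy_a, energy_b)
```

A compound that passes the cut-off in only one screen is still retained, with `None` recorded for the other chain, so that every top scorer can be inspected visually.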
Among the 16 candidates, we first noticed velpatasvir (Figure 1A, D) and ledipasvir, which are inhibitors of the NS5A protein of the hepatitis C virus (HCV). Both are marketed as approved drugs in combination with sofosbuvir, which is a prodrug nucleotide analogue inhibitor of RNA-dependent RNA polymerase (RdRp, or NS5B). Interestingly, sofosbuvir has recently been proposed as an antiviral for the SARS-CoV-2 based on the similarity between the replication mechanisms of the HCV and the coronaviruses [14]. Our results further support the view that these dual-component HCV drugs, Epclusa (velpatasvir/sofosbuvir) and Harvoni (ledipasvir/sofosbuvir), may be attractive candidates to repurpose because they may inhibit two coronaviral enzymes. A drug that can target two viral proteins substantially reduces the ability of the virus to develop resistance. These direct-acting antiviral drugs are also associated with minimal side effects and are conveniently administered orally (Table 4).
The flavonoid glycosides diosmin (Figure 1B) and hesperidin (Figure 1E), obtained from citrus fruits, fit very well into and block the substrate binding site. Yet, these compounds [...] (Table 4). Hesperidin hits showed up multiple times, suggesting it has many modes of binding (Figure 1A).
Table 2. In silico mutagenesis of the SARS-CoV-2 3CLpro. The 12 variant residues with reference to the SARS-CoV enzyme are shown with the respective treatment of rotamer. "A" and "B" refer to the individual chains of the dimeric model. Both chains are in the crystal asymmetric unit and are not identical. The rotamer symbol (bracketed) is defined according to the conventions of Richardson [15], followed by its respective rank of popularity. ASA: accessible surface area (average of A and B chains) of the residue in the SARS-CoV 3CLpro structure, in Å² and in % relative to the ASA of a residue X in the Gly-X-Gly conformation. Columns: Residue; Rotamer; ASA, Å² (%); Remarks on replacement.
Teniposide and etoposide (and its phosphate) are chemically related and turned up in multiple hits with good binding models (Figure 1F). However, these chemotherapy drugs have many strong side effects and need intravenous administration (Table 4). The approved drug venetoclax (Figure 1C) and the investigational drugs MK-3207 and R428 scored well in both screens. Venetoclax is another chemotherapy drug that is burdened by side effects, including upper respiratory tract infection (Table 4). Not much has been disclosed about MK-3207 and R428.
We subjected the crystal structure to the same virtual screening procedures. A very similar list of candidates showed up consistently (Extended data [7], Table S2) with high scores, although ledipasvir was not found.
We noticed that most of the compounds on the list have molecular weights (MW) over 500, except lumacaftor (MW=452). The largest one is ledipasvir (MW=889). This is because the size of the peptide substrate and the deeply buried protease active site demand a large molecule that has many rotatable bonds to fit into it.
We identified five trials on ClinicalTrials.gov involving antiviral and immunomodulatory drug treatments for SARS. One record that has received much attention amid the current outbreak is the lopinavir/ritonavir combination [18]. They are protease inhibitors originally developed against HIV. During the 2003 SARS outbreak, despite lacking a clinical trial, they were tried as an emergency measure and found to improve clinical outcome [18]. However, some scientists did express scepticism [19]. By analogy, these compounds were speculated to act on SARS-CoV 3CLpro specifically, but there is as yet no crystal structure to support that, although docking studies were carried out to propose various binding modes [20-23]. The IC50 value of lopinavir is 50 μM (Ki = 14 μM) and that for ritonavir cannot be established [24].
Although this is far from a cure, based on our results that the two CoV 3CLpro enzymes are nearly identical as far as protein sequences and substrate specificities are concerned, we are of the opinion that this is still one of the recommended routes for immediate treatment at the time of writing (early February 2020).
If we look beyond the 3CLpro, an earlier screen produced 27 candidates that could be repurposed against both SARS-CoV and MERS-CoV [25]. In addition, the other coronaviral proteins could be targeted for screening. Treatment of COVID-19 with remdesivir (a repurposed drug in development targeting the RdRp) has just been reported to improve clinical outcome, and a clinical trial is now underway [26].
We consider this work part of the global efforts responding in a timely fashion to fight this deadly communicable disease. We are aware that similar modelling, screening and repurposing exercises targeting 3CLpro have been reported or announced [20,27-33]. Our methods did not overlap, and we share no common results with these studies.
The "Extended Results" folder contains the following extended data:
• Tab S1.docx (Sequence homology of the 3CLpro cleavage junctions of PP1AB between SARS-CoV-2 and SARS-CoV).
• Tab S2.docx (The results of virtual screening of drugs on the active site of SARS-CoV-2 3CLpro crystal structure).
• Fig S1.pptx (The structural model of the SARS-CoV-2 3CLpro protease).
• Compare Crystal.docx (A comparison, with Figure S2, of the active sites of model chains A, B and the crystal structure).
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
If applicable, is the statistical analysis and its interpretation appropriate? Yes
More details of the docking should be provided. What's the binding energy cutoff used? How are the hits (reported in Table 3) used? 3CLpro is catalytically active as a dimer. How is this considered in the virtual screening?
What does the "(B Top scorers)" mean? In the extended data of virtual screening, one compound could have multiple entries with different ZINC numbers. For example, hesperidin corresponds to at least 20 different compounds. What are the differences? And how are different results assembled?
Table 1 is not clear. Please do a column-by-column comparison between different sites of SARS-CoV and SARS-CoV-2. Also please add one-letter amino acid codes for the residues.
The constructed protein structure is very similar to the recently solved crystal structure (6LU7), as "... confirms that the predicted model is good within experimental errors", but the docking results seem to differ significantly. Could the authors explain?
Are all the source data underlying the results available to ensure full reproducibility? Yes
Competing Interests: No competing interests were disclosed.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Chen et al. reported the computational modelling and virtual screening results of the 3C-like protease (3CLpro) of SARS-CoV-2. This study is timely in view of the recent outbreak of COVID-19. The rationale of repurposing existing drugs to tackle the global viral outbreak is sound. The manuscript is also well-written and structured.
It should be noted that: The authors compared their model with the recently published crystal structure of 3CLpro and found a high similarity between the two structures. They also obtained a similar list of top-ranked drug candidates when the crystal structure was subjected to the same screening protocol. Several studies using similar modeling and virtual screening approaches have also been published recently.
Some suggestions for improving the manuscript: The authors proposed that the HCV drugs velpatasvir and ledipasvir, and thus Epclusa and Harvoni, could be attractive drug candidates for treating SARS-CoV-2 infection. However, there is no direct evidence to support this claim. To support this claim, the authors should connect the computational results with experimental data. To test their hypothesis, the authors should at least prove (or disprove) that the two HCV drugs could inhibit the biochemical activity of 3CLpro of SARS-CoV-2. To further test the hypothesis, the two NS5A inhibitors should be tested using in vitro assays such as viral RNA PCR assay. If there are no such experimental data to support the claim, the authors may consider revising their conclusion to "the computational results provide a rationale for further experimental validation of treating SARS-CoV-2 with velpatasvir and ledipasvir".
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Medicinal Chemistry, Drug Discovery, Chemical Biology
Yu Wai Chen and co-workers presented a molecular modeling and docking study of the 3CL protease in the SARS-CoV-2 virus. The manuscript started with comparing polyprotein PP1AB sequences of SARS-CoV-2 and SARS-CoV, based on which the 3D structure of SARS-CoV-2 3CLpro protein was constructed. The authors then performed virtual screening against SARS-CoV-2 3CLpro using a library of 7173 purchasable drugs. Considering both binding affinities and known side effects, the authors recommend velpatasvir and ledipasvir, and further suggest combining them with another HCV RdRp inhibitor sofosbuvir, aka repurposing the Epclusa and Harvoni for treating the coronavirus. This is a concise and timely report, and has proposed new therapeutic possibilities for the SARS-CoV-2 virus. The manuscript could be further improved by addressing the following comments. More details of the docking should be provided. What's the binding energy cutoff used?