key: cord-1022763-mblsxztv authors: Rossetti, Giacomo G.; Ossorio, Marianna; Barriot, Samia; Tropia, Laurence; Dionellis, Vasilis S.; Gorgulla, Christoph; Arthanari, Haribabu; Mohr, Peter; Gamboni, Remo; Halazonetis, Thanos D. title: Identification of low micromolar SARS-CoV-2 Mpro inhibitors from hits identified by in silico screens date: 2020-12-03 journal: bioRxiv DOI: 10.1101/2020.12.03.409441 sha: 9ac9b0e5cebdfb064ee3378ef55b46ebdf463a06 doc_id: 1022763 cord_uid: mblsxztv
Mpro, also known as 3CLpro, is the main protease of the SARS-CoV-2 coronavirus and, as such, is essential for the viral life cycle. Two studies have each screened and ranked in silico more than one billion chemical compounds in an effort to identify putative inhibitors of Mpro. More than five hundred of the seven thousand top-ranking hits were synthesized by an external supplier and examined with respect to their activity in two biochemical assays: a protease activity assay and a thermal shift assay. Two clusters of chemical compounds with Mpro inhibitory activity were identified. An additional five hundred molecules, analogues of the compounds in the two clusters described above, were also synthesized and characterized in vitro. The study of the analogues revealed that the compounds of the first cluster acted by denaturing Mpro and might denature other proteins as well. In contrast, the compounds of the second cluster targeted Mpro with much greater specificity and enhanced its melting temperature, consistent with the formation of stable Mpro-inhibitor complexes. The most active compounds of the second cluster exhibited IC50 values between 4 and 7 μM and their chemical structure suggests that they could serve as leads for the development of potent Mpro inhibitors.
At the end of December 2019, an outbreak of pneumonia of initially unknown etiology was reported by the Chinese health authorities.
A novel coronavirus was isolated from human airway epithelial cells and identified as the cause of a disease, now referred to as COVID-19, that produces a wide range of symptoms, the most common being fever, cough, shortness of breath and loss of smell and taste [1]. On 11 March 2020, the World Health Organization declared the outbreak a pandemic. As of December 1, 2020, the coronavirus pandemic has already infected more than 64 million people and caused more than 1.5 million deaths (www.worldometers.info/coronavirus/).
ORF1b give rise to polyproteins, pp1a and pp1ab, respectively. A main protease (Mpro, also called 3C-like protease) and a papain-like protease (PLpro) process the polyproteins pp1a and pp1ab into 16 nonstructural proteins, which are important for synthesis of viral RNA and structural proteins (envelope, membrane, spike and nucleocapsid proteins). The three-dimensional structure of SARS-CoV-2 Mpro has been described by several groups [4-9], revealing that the N-terminus of Mpro contains two domains (domain I, residues 10-99; and domain II, residues 100-184) that adopt a chymotrypsin-like fold. The active site is located at the cleft formed between these domains and contains a Cys145-His41 catalytic dyad. A total of five protein segments, comprising residues 25-27 and 44-50 from domain I and residues 140-143, 165-168 and 188-190 from domain II, form the walls of the active site. The Mpro protease is essential for the viral life cycle and is also among the most highly conserved proteins in the coronavirus family. For example, the Mpro proteases of SARS-CoV-1 and SARS-CoV-2 are 96% identical at the amino acid sequence level and their active sites are 100% identical [4]. For the above reasons, Mpro is considered to be a good target for the development of novel treatments for COVID-19.
Effective inhibitors of SARS-CoV-2 Mpro could impact the course of COVID-19, but, more importantly, they might prevent possible future pandemics caused by other Betacoronaviruses. The conservation of Mpro extends beyond the coronavirus family. For example, GC376, one of the most potent inhibitors of SARS-CoV-2 Mpro, was originally developed as an inhibitor of the 3CLpro of Norwalk virus, a member of the calicivirus family [10]. GC376 and the related compounds GC373 and GC375 are dipeptidyl compounds with different warheads [10-12]. The electrophilic warheads allow these inhibitors to form covalent bonds with the thiol group of the active site cysteine. GC376 inhibits the main proteases of picornaviruses, caliciviruses and coronaviruses [10] and is being developed for the treatment of cats infected with feline coronavirus [13]. GC376 also inhibits the Mpro of SARS-CoV-2 with an IC50 of 30 ± 8 nM [14, 15].
Another potent inhibitor of SARS-CoV-2 Mpro, PF-00835231, was developed originally as an inhibitor of the Mpro of SARS-CoV-1 [16]. PF-00835231 is structurally related to GC376, but features a different N-capping group and uses a hydroxy-methyl-ketone as warhead to form a covalent bond with the active site cysteine; it has an impressive in vitro IC50 of 0.27 ± 0.1 nM [16]. However, efforts to develop this compound for treatment of COVID-19 in humans suggest that a continuous intravenous infusion of a prodrug would be needed to achieve effective doses in the plasma of human patients [17].
Additional efforts to develop SARS-CoV-2 Mpro inhibitors include screening by X-ray crystallography of a library of very small chemical compounds (referred to as fragments) to identify compounds that bind Mpro [18, 19]. These hits can be used as starting points to develop Mpro inhibitors.
Finally, many groups have used the reported crystal structure of Mpro to identify inhibitors by in silico screening.
Two of these studies screened more than one billion compounds [20, 21]. As our group participated in one of these endeavors, we followed up on this work by examining whether some of the top hits could actually inhibit the protease activity of SARS-CoV-2 Mpro in vitro. We describe here our experience and the identification of three structurally related compounds that inhibit SARS-CoV-2 Mpro in vitro with IC50 values between 4.2 and 7.4 μM.
Protein expression and purification of SARS-CoV-2 Mpro
The Mpro construct [4] provided by Rolf Hilgenfeld was transformed into E. coli strain BL21-Gold (DE3) (Agilent). Transformed clones were picked to prepare pre-starter cultures in 2 mL YT medium with ampicillin (100 μg/mL), grown at 37°C for 8 h. The pre-starter culture was then inoculated into fresh 120 mL YT medium with ampicillin (100 μg/mL) and incubated at 37°C overnight. The next day, the starter culture was inoculated into 1,600 mL YT medium with ampicillin (100 μg/mL) and incubated at 37°C until OD600 reached a value between 0.6 and 0.8. Isopropyl-D-thiogalactoside (IPTG; 1 mM) was then added to induce the overexpression of Mpro at 30°C for 5 h. The bacteria were harvested by centrifugation at 8,260 x g, 4°C for 15 minutes, resuspended in Binding Buffer (25 mM BTP pH 6.8; 300 mM NaCl; 2 mM DTT; 1 mM EDTA; 3% DMSO) and then lysed using an Emulsiflex-C3 homogenizer (Avestin). The lysate was clarified by ultracentrifugation at 137,088 x g, 4°C for 1 h and loaded onto a HisTrap FF column (Cytiva) using an Äkta protein purification system (Cytiva). When all the supernatant containing Mpro had passed through the column, the column was washed with 80 mL Binding Buffer to remove non-specifically bound proteins and then Mpro was eluted using an imidazole gradient (0-500 mM) in Binding Buffer.
The Mpro fractions were concentrated using 3 kDa Amicon Ultra Centrifugal Filters (Merck Millipore) and the Mpro protein was further purified by size exclusion chromatography using a HiLoad Superdex 200 column (Cytiva) attached to a SMART protein purification system (Pharmacia).
Compounds
A total of 482 compounds, referred to as parent compounds, were selected from lists of high-ranking putative Mpro inhibitors identified by in silico screens (Supplementary Tables 1-4). After evaluating the activity of these compounds in vitro, we selected an additional 578 compounds that were analogues of a few active parent compounds (Supplementary Tables 5-7). All the above compounds were purchased from Enamine, their purity was ≥
Development of a FRET-based assay for SARS-CoV-2 Mpro activity
The plasmid encoding the SARS-CoV-2 Mpro protease with a C-terminal His-tag was kindly provided by Rolf Hilgenfeld [4]. Mpro was expressed in E. coli strain BL21-Gold (DE3) and the recombinant protein was purified in two steps, by affinity and size exclusion chromatography (Fig. 1a).
The protease activity of purified Mpro was studied using a FRET-based assay suitable for high-throughput analysis. The FRET substrate peptide contains at its N-terminus a dye (HiLyte-Fluor488), whose fluorescence is quenched by a C-terminal quencher (QXL520). Cleavage of the peptide by Mpro leads to an increase in fluorescence intensity. The assay was carried out in 384-well plates in a reaction volume of 10 μL with the protease and fluorogenic substrate at final optimized concentrations of 100 nM and 500 nM, respectively. The fluorescence intensity was measured kinetically every 10 min in a microtiter plate-reading fluorimeter in the presence or absence of the Mpro inhibitor GC376 [10-12].
In the absence of the inhibitor, the fluorescence intensity increased linearly during the first 60 min of the reaction, whereas in the presence of 40 μM GC376 no increase in fluorescence intensity was observed (Fig. 1b). The substrate and inhibitor were dispensed using an acoustic liquid dispenser, which allowed compounds to be dispensed directly from the stock solution plates.
Validation of putative SARS-CoV-2 Mpro inhibitors identified by in silico screens
We recently described two in silico screens of one billion compounds using Mpro as a target [20]. The first screen, hereafter referred to as screen 1A, used the three-dimensional coordinates of Mpro described by Jin et al. [5] (pdb id: 6lu7). The second screen, hereafter referred to as screen 1B, used the coordinates of Mpro described by Dai et al. [3] (pdb id: 6m0k), except that different rotamers were used for residues S46, M49 and C145, in the hope of expanding the active site and capturing a larger repertoire of chemical compounds [20]. Comparison of the top 1,000 hits identified by screens 1A and 1B revealed an overlap of only 12 compounds.
From screen 1A, 3,808 top-ranking hits were evaluated and 195 of these compounds were chosen to be synthesized (Table S1). The main selection criteria included drug-likeness and chemical diversity. From screen 1B, 3,851 top hits were evaluated and 226 of these compounds were selected for chemical synthesis (Table S2). In addition, guided by the results of a crystallographic fragment screen [19] that showed a fragment containing a nitrile group deep in the active site of Mpro (pdb id: 5r82), we identified all the nitrile-containing compounds among the top 20,000 hits of screens 1A and 1B. This list included 253 compounds, 45 of which were synthesized (Table S3).
Finally, we also ordered synthesis of 12 of the top 15 hits from screen 1A and of 8 of the top 15 hits from an in silico screen of 1.3 billion compounds conducted by Ton et al. [21], hereafter referred to as screen 2 (Table S4).
All the above compounds, 486 in total, were assayed at a final compound concentration of 40 μM for their ability to inhibit the protease activity of Mpro. Among the 207 compounds selected from the hits of screen 1A (Table S1 and first 12 compounds of Table S4), only weak inhibitors of Mpro were identified and none of them were pursued further. Among the 226 compounds selected from the hits of screen 1B (Table S2), one compound, Z1037455358, was particularly active (Fig. 2a,c), whereas among the 45 nitrile-containing compounds selected from the hits of screens 1A and 1B (Table S3), the structurally related compounds Z637352244 and Z637352642 (both from screen 1A) appeared promising (Fig. 2a,c). Finally, among the 8 compounds selected from the hits of screen 2 (bottom 8 compounds of Table S4), two structurally related compounds, ZINC000636416501 and ZINC000373659060, were weakly active (Fig. 2b,c). We decided to investigate these compounds further by obtaining analogues, identified through similarity and substructure searches, that were either purchased when commercially available or custom-synthesized.
Characterization of a cluster of nitriles and a diamino-quinazoline singleton
We first focused our effort on the two structurally related compounds Z637352244 and Z637352642 (Fig. 2c). These two compounds were identified by screen 1A and have in common a nitrile, a functional group which is also present in a fragment that was found to bind Mpro by crystallographic screening [18]. To identify more potent compounds, we ordered 301 analogues, most of which were chemically synthesized for this project (Table S5). As expected, several of the analogues inhibited Mpro when tested at a final concentration of 40 μM (Fig. 3a).
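Single-concentration screening of this kind reduces to comparing reaction rates with and without compound. The sketch below is one plausible way to do that, not the authors' documented analysis: it fits a straight line to the fluorescence readings taken every 10 min over the linear first 60 min and reports percent inhibition relative to a vehicle (DMSO) control. The function names and all fluorescence values are hypothetical illustrations.

```python
def slope(times_min, signal):
    """Least-squares slope of fluorescence versus time (units per minute)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_s = sum(signal) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_min, signal))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


def percent_inhibition(times_min, compound_signal, vehicle_signal):
    """Percent loss of initial rate relative to the vehicle control."""
    rate_compound = slope(times_min, compound_signal)
    rate_vehicle = slope(times_min, vehicle_signal)
    return 100.0 * (1.0 - rate_compound / rate_vehicle)


# Readings every 10 min over the linear 60 min window (hypothetical values).
times = [0, 10, 20, 30, 40, 50, 60]
vehicle = [1000 + 50 * t for t in times]      # uninhibited protease
test_compound = [1000 + 10 * t for t in times]  # partially inhibited
fully_blocked = [1000 for _ in times]          # no increase in fluorescence

print(round(percent_inhibition(times, test_compound, vehicle), 1))
print(round(percent_inhibition(times, fully_blocked, vehicle), 1))
```

A flat trace, such as the one described for 40 μM GC376, corresponds to a slope of zero and therefore to complete inhibition under this definition.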
We determined the IC50 values of the original compounds and of the five most promising analogues. The parent compounds Z637352244 and Z637352642 exhibited IC50 values of 16 and 96 μM, respectively (Fig. 3b,c). Three analogues, Z56785964, Z637450230 and Z56786187, showed IC50 values between 13 and 17 μM, whereas two analogues, Z2239054061 and Z637352638, had IC50 values of 6.7 and 7.5 μM, respectively (Fig. 3b,d). We will refer to this family of compounds as the cluster of nitriles.
We next focused our efforts on compound Z1037455358, which was identified by screen 1B and which contained a diamino-quinazoline core. The IC50 of this compound was 19 μM (Fig. 3e). We obtained 108 analogues of this compound (Table S6), but none of them were more potent than the parent compound and, therefore, none of them were further pursued. However, we retained the parent compound for further analysis and we will refer to it as the diamino-quinazoline singleton.
To continue validation of the nitrile cluster and the diamino-quinazoline singleton, we examined their effect in a thermal shift assay (TSA). Briefly, Mpro protease, at a final concentration of 1 μM, was incubated for 20 min with the inhibitors at a final concentration of 20 μM; the melting temperature of Mpro was then determined. Compounds that bind to Mpro should enhance its melting temperature [22]. Indeed, GC376 increased the melting temperature of Mpro by 19°C (Fig. 4a). Surprisingly, the parent nitrile-containing compounds Z637352244 and Z637352642 decreased the melting temperature of Mpro by 2-3°C (Fig. 4a). Three of their analogues, Z637352638, Z2239054061 and Z56785964, decreased the melting temperature of Mpro even more, by 8-11°C. Of the remaining two analogues, Z56786187 decreased the melting temperature of Mpro by 4°C, whereas Z637450230 had no effect on the melting temperature (Fig. 4a).
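The thermal shift readout is simply the difference between the melting temperature measured with compound and that of the DMSO control. As a minimal worked example, the sketch below computes this shift for the compounds whose single melting temperatures are listed with the figure data at the end of this document (DMSO 56.36°C, Z637450230 56.55°C, Z56786187 52°C, Z1037455358 55.9°C); entries reported there as ranges are left out. The variable and function names are illustrative assumptions.

```python
# Melting temperatures (°C) taken from the figure data listed in this document.
TM_DMSO = 56.36
tm_with_compound = {
    "Z637450230": 56.55,
    "Z56786187": 52.0,
    "Z1037455358": 55.9,
}


def delta_tm(tm_compound, tm_control=TM_DMSO):
    """Shift in melting temperature relative to the DMSO control, in °C.

    Positive values suggest stabilization of the protein, negative
    values suggest destabilization.
    """
    return round(tm_compound - tm_control, 2)


shifts = {name: delta_tm(tm) for name, tm in tm_with_compound.items()}

# List compounds from most destabilizing to most stabilizing.
for name, shift in sorted(shifts.items(), key=lambda item: item[1]):
    print(f"{name}: {shift:+.2f} °C")
```

The shift of about -4.4°C for Z56786187 and the near-zero shift for Z637450230 match the values quoted in the text.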
Compound Z1037455358, the diamino-quinazoline singleton, also led to a modest decrease in the melting temperature of Mpro (Fig. 4b).
The observed decrease in the melting temperature of Mpro by the above compounds was of concern, as it could indicate that these compounds were inhibiting the protease activity of Mpro by acting as non-specific denaturing agents. We reasoned that, if this was the case, then the compounds would lose inhibitory activity when the protease assay was performed in the presence of non-specific proteins that could serve as a sink for compound sequestration. Indeed, the parent nitrile-containing compound Z637352244, its analogues Z56786187 and Z637450230, and the Z1037455358 singleton all lost activity when the protease assay was performed in the presence of 1 μg of cell lysate (Fig. 5). In contrast, the previously described inhibitor GC376 maintained activity in the presence of the lysate and, interestingly, so did compounds ZINC000373659060 and ZINC000636416501 (Fig. 5), which were identified by the second in silico screen (screen 2) [21].
Compounds ZINC000373659060 and ZINC000636416501 are structurally related to each other and contain a dihydro-quinolinone core (Fig. 2b). Encouraged by the fact that the activity of these two compounds was not affected by the presence of cell lysate, we obtained 157 analogues (Table S7) and examined their ability to inhibit the protease activity of Mpro. Three analogues were found to be significantly more potent than the parent compounds (Fig. 6a). Specifically, compounds Z228770960, Z393665558 and Z225602086 had IC50 values of 4.2, 5.9 and 7.4 μM, respectively (Fig. 6b). Importantly, all three analogues retained their inhibitory activity against Mpro in the presence of a cell lysate (Fig. 6c).
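To illustrate what an IC50 value means in practice, the sketch below models residual protease activity with a simple Hill equation and recovers the IC50 from a dilution series by log-linear interpolation around the 50% point. The text does not state how the dose-response curves were fitted, so the model, the assumed Hill slope of 1, the two-fold dilution series and the function names are all assumptions made for illustration; only the IC50 of 4.2 μM for Z228770960 comes from the text.

```python
import math


def residual_activity(conc_um, ic50_um, hill=1.0):
    """Fraction of protease activity remaining at a given inhibitor
    concentration, under a Hill model (Hill slope of 1 is an assumption)."""
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)


def estimate_ic50(concs_um, activities):
    """Estimate the IC50 by interpolating log10(concentration) between the
    two neighbouring points that bracket 50% residual activity.
    Returns None if the series never crosses 50%."""
    pairs = sorted(zip(concs_um, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(pairs, pairs[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical two-fold dilution series starting at 40 μM.
concs = [40 / 2 ** i for i in range(7)]
# Synthetic, noise-free activities for a compound with an IC50 of 4.2 μM.
acts = [residual_activity(c, 4.2) for c in concs]

print(round(estimate_ic50(concs, acts), 2))
print(round(1.0 - residual_activity(40.0, 4.2), 2))  # inhibition at 40 μM
```

Interpolation is used here only to keep the example dependency-free; a nonlinear least-squares fit of the full curve would normally be preferred for real data.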
The parent ZINC000373659060 and ZINC000636416501 compounds and their three active analogues were then examined for their ability to modulate the melting temperature of Mpro in the thermal shift assay. The parent compounds did not affect the melting temperature of Mpro (Fig. 7a). However, the analogues increased the melting temperature of Mpro (Fig. 7b); the most active analogue, Z228770960, induced the greatest increase in melting temperature (1.2°C; Fig. 7b). These results are consistent with the hypothesis that the analogues inhibit Mpro by binding to its active site and stabilizing the protein, as predicted by the docking software that led to the identification of the parent compounds ZINC000373659060 and ZINC000636416501 (Fig. 8).
Prospects
COVID-19 has had a significant impact on our society and this is likely to continue for several additional months. The development of vaccines against SARS-CoV-2 has proceeded with great speed compared with that of earlier vaccines. Yet, despite their rapid development, the vaccines will become available only about one year after it became clear that COVID-19 was spreading as a pandemic and it will take several months to vaccinate a significant fraction of the population.
It is interesting to note that the active site of the Mpro protease of SARS-CoV-2 is identical to that of SARS-CoV-1 and very similar to those of the proteases of many other members of the coronavirus family. Indeed, the best SARS-CoV-2 Mpro inhibitors are actually compounds that were originally identified as inhibitors of SARS-CoV-1 Mpro [16, 17, 23]. However, these compounds were not developed to the point that they could be used to treat COVID-19 patients, because the SARS-CoV-1 epidemic was short-lived and the interest in developing agents that could treat SARS waned. In retrospect, it appears that this decision was a mistake.
The analogues that we have identified here are not potent enough to treat COVID-19 patients and their toxicity and pharmacokinetic profiles are unknown. However, they have drug-like features and, therefore, the potential to be developed into drugs. In fact, these compounds might be the first class of compounds developed de novo against SARS-CoV-2 Mpro [23]. In addition, whereas the majority of Mpro inhibitors form covalent complexes with the catalytic cysteine, the compounds that we have identified are non-covalent inhibitors, lacking a highly reactive electrophile, which might make it easier to develop them for oral administration, a prerequisite for responding to a pandemic. For these reasons, we intend to proceed with the development of these compounds, not to impact the evolution of the current pandemic, but to better prepare ourselves for the next one.
[Figure data, thermal shift assay melting temperatures of Mpro. Nitrile cluster: DMSO 56.36°C; GC376 66.63-75°C; Z56785964 48.63-55.98°C; Z637450230 56.55°C; Z56786187 52°C. Diamino-quinazoline singleton Z1037455358: DMSO 56.36°C; GC376 66.63-75°C; Z1037455358 55.9°C.]
The authors thank Anh-Tien Ton and Artem Cherkasov for providing the pdb coordinates of compounds ZINC000373659060 and ZINC000636416501 docked to Mpro [21].