key: cord-190207-en96o8zo authors: Jim'enez-Avalos, Gabriel M.; Vargas-Ruiz, Ana P.; Delgado-Pease, Nicol'as E.; Olivos-Ramirez, Gustavo E.; Sheen-Cortavarr'ia, Patricia; Fernandez, Manolo; Zimic-Peralta, Mirko J. title: High-Throughput Virtual Screening of 4487 flavonoids: New insights on the structural inhibition of SARS-CoV-2 Main Protease date: 2020-08-30 journal: nan DOI: nan sha: doc_id: 190207 cord_uid: en96o8zo COVID-19 presents a great threat to public health worldwide and the infectious agent SARS-CoV-2 is currently the target of much research aiming at inhibition. The virus' main protease is a dimeric enzyme that has only recently begun to be thoroughly described, opening the door for virtual screening more broadly. Here, a PAIN-filtered flavonoid database was screened against four sites of the protease: a free (normal) conformation of the Substrate Binding Site (NSBS), an induced-fit state of the SBS (ISBS), a Dimerization Site (DS) and a Cryptic Site (CS). The mean binding energies of the top five ligands from each site were -9.52, -11.512, -7.042 and -10.348 kcal/mol for the NSBS, the ISBS, the DS and the CS, respectively. For the DS and CS, these top five compounds were selected as candidates to bind their respective site. In the case of SBS, the top 30 ligands with the lowest binding energies from NSBS and ISBS were contrasted and the ones present in both lists were selected as the final candidates. The final list was: Dorsilurin E (FL3FQUNP0001), Euchrenone a11 (FL2FALNP0014), Kurziflavolactone C (FL2FA9NC0016), Licorice glycoside E (FL2F1AGSN001) and Taxifolin 3'- (6"-phenyl- acetylglucoside) (FL4DACGS0020) for the SBS; Sanggenol O (FL2FALNP0020), CHEMBL2171573, Kanzonol E (FL3F1ANP0001), CHEMBL2171584 and Abyssynoflavanone VI (FL2FACNP0014) for DS and CHEMBL2171598, CHEMBL2171577, Denticulaflavanol (FL5FAANR0001), Kurzichalcolactone (FL1CA9NC0001) and CHEMBL2171578 for CS. Virtual screening integrated several confirmation methods, including cross-docking assays and positive and negative controls. All 15 compounds are currently subjected to molecular dynamics so as to theoretically validate their binding to the protease. The Coronavirus Disease-2019 (or COVID- 19) was first reported in late December 2019 in Wuhan, PRC (1). First identified as a kind of pneumonia, common symptoms are fever, cough and myalgia or fatigue, with less common symptoms ranging from sputum production, headache, haemoptysis and diarrhoea (2) . Later developments included lymphopenia and dyspnoea. These findings point at a clinical description akin to that of severe acute respiratory syndrome (SARS) coronavirus (2, 3) . Nonetheless, finding a treatment for viral diseases seldom is a serendipitous event. It requires systematic narrowing of candidates, usually by virtual screening (VS) methods. VS has been able to provide ample and robust support for drug discovery, design and evaluation efforts (4, 5) with a variety of specialized options (6) as the in silico counterpart for high throughput screening. These are often coupled in sequential or parallel arrays: hierarchical VS (HLVS) or parallel VS (PVS), respectively (5) . This development was inevitable given both the relentless growth in library sizes (7, 8) and the equally vertiginous development in the virtual screening par excellence, molecular docking (9) (10) (11) . HLVS has been invaluable in sieving through large databases of small organic molecules to find drugs for viral diseases. This is invaluable for the development of new contenders as the current pharmacological options are rendered obsolete by high viral genome variability (12) . Such developments of new drugs against virus through VS include influenza (13) , ebola (14) and dengue (15) . Trying SARS-CoV drugs against SARS-CoV-2 has been proposed (16) given the close genomic proximity between them (17) . Unfortunately, those drugs have not been universally approved for treatment and usually remain stagnant on clinical or pre-clinical trials (18) . Moreover, even small differences in amino acid sequence can affect drug efficaciousness (19) . Unsurprisingly, the appeal of VS approaches against SARS-CoV-2 has not eluded scientists during this pandemic. Most have tried to evaluate the repurposing of existing drugs (16, (20) (21) (22) . Nevertheless, these research lines run the risk of facing adverse effects later on for employing drugs designed for other sites (23) . Other papers have focused on the active site of SARS-CoV-2 main protease (M PRO ) (16, 24) , but this emphasis could be too restrictive since dimerization inhibition is a plausible alternative (25, 26) . Furthermore, the occurrence of cryptic sites is a possibility worth exploring. Loosely put, a binding site can be described as cryptic if identifiable in the ligandbound while not in the unbound structure of a given protein (27) . These locations pose a promising way to increase the druggability of a target protein (28, 29) . Therefore, an exhaustive VS of readily available, pharmacologically safe, new small molecules targeted at more than one site in the M PRO is the aim of this paper. Flavonoids are one such group of small molecules produced as secondary metabolites in fungi and plants. They present a common diphenyl structure coupled with a heterocyclic ring and with various hydroxyl phenolic groups with great chelation capacity for iron and other transition metals. Especially relevant to this research is the antiviral activity of flavonoids. The flavonoid apigenin combined with acyclovir has a significant effect on types 1 and 2 herpes virus simplex known since the early 1990s (30) . Remarkably, 64 flavonoids were previously screened against the SARS-CoV-1 3CL (chymotrypsin-like) protease (3CLpro) with promising results (31) . Furthermore, the 3CLpro from SARS-CoV-1 differs from the SARS-CoV-2 M PRO in but 12 amino acids, which justifies a further attempt for flavonoid inhibition against this second protein (32) . In that vein, a handful of flavonoids have been recently tried in silico against SARS-CoV-2 spike protein (33) , which leaves open the questions as to what other flavonoids could prove useful. All this considered, the panoply of flavonoids available in many online databases presents itself as an ideal opportunity to fuel a new quest for SARS-CoV-2 inhibition. An exhaustive search was conducted for compounds reported as flavonoids in the Metabolomics (34) database with an inhouse R script for web scraping using the rvest package (35) . Similarly, the online databases DrugBank (36) , CHEMBL (37) and PubChem (38) were subject to an automated search. For these last websites, searches were performed manually with the key words flavonoid, flavonoids and flavo, except for DrugBank, where the whole base was downloaded. Then, all four bases were standardized to show a unique ID, SMILE and online source for each compound. All four databases from DrugBank, PubChem, CHEMBL and Metabolomics were merged through an R script, additionally parsing the complete database through the same means in order to remove compounds with molar weight under 180 kDa. Compounds with identical SMILES were removed for each database. The database was then uploaded to FAFDrugs4 server (39) for further processing. Two SMILES, even though literally distinct, may be synonyms for the same compound. FAFDrugs4 reads each SMILE and interprets the code into structure in order to exclude synonymic SMILEs. Next, all screening tools to identify and purge pan-assay interference compounds (PAINS) offered by the website were employed. A final SMILE database was downloaded and matched for their unique IDs and online sources. SMILEs were read and employed to generate .smi files for each compound. Afterwards, these files worked as blueprints for the generation of .sdf files containing the bidimensional structure of each compound through the package OpenBabel v.3.1.1 (40) . These in turn were the starting point for the three-dimensional structures, each of which was subject to a 25000-step minimization through the steepest-descent method (41) . Finally, the resulting .sdf files of the minimized compounds were converted to .pdb format for their docking preparation. All these steps were performed automatically through an inhouse Python script. Three sites of interest were identified for docking targeting: the substrate-binding site (SBS), the dimerization site (DS) and the cryptic site (CS). The first two were chosen after bibliographic consideration (32, 42) while the cryptic site was originally predicted using CryptoSite (28) . Collectively, they are referred to as "putative binding sites". pH is not homogeneously distributed along the protein surface, given that pH in each region is determined by the microenvironment fostered by the residues specific to that region. In order to analyse these microenvironments, server PDB2PQR (43) was employed. Said program assigns the appropriate protonation states to an input protein's residues, presently the apo-structure of M PRO excised from PDB ID: 6LU7. This software takes into account the surrounding residues as well as the macro-system pH, which in this case was set to 7.4 (physiological pH). Thus, the protonation states of key histidines were employed to infer the microenvironment pHs for the three putative binding sites and where they stood in comparison to the general pH of the system. Each putative binding site had its respective solvent-accessible surface area (SASA) found in PyMol v.2.4 (44) , using as input the hydrogen-curated structure of M PRO obtained before. All the regions were scoured for residues capable of establishing strong interactions, aromatic rings and other relevant moieties. The exact residues which conform the S1' and S1-S5 subsites of the SBS region (42, 45) were determined by analysis of M PRO -N3 complex (PDB: 6LU7). 20 random compounds were selected in order to conclude the structure was correctly represented. Once this was settled, the files were converted from .pdb to .pdbqt format with the AutodocksTool v.4.2 package (46) . All torsions were assigned and Gasteiger charges were added for all atoms in the compound structures. M PRO structure was procured in two distinct conformations: a free conformation (PDB: 6YB7) and another conformation induced by an inhibiting ligand in the active site (PDB: 6LU7). These structures were independently curated on PDBfixer (47) with the intent of repairing lost atoms and non-standard residues, as well as for deleting heteroatoms. Using this repaired chains, both receptors were constructed as dimers from the available protomeric states in their respective .pdb files (biological assembly) using PyMol software.Then, all receptor structures were converted to .pdbqt format using the graphic interface found in the AutodockTools v.4.2 package, adding polar hydrogens and Gasteiger charges as well. For the following procedure, a semiflexible VS (with rigid receptor and flexible ligand) was conducted so as to assess the possibility of compound interaction with each putative binding site. For this purpose, Autodock Vina v.1.1.2 (48) software was employed with 24 of exhaustiveness (49) through a Python script which automated the task. In order to reach robust conclusions for SBS-binding compounds, a cross-docking methodology was followed. To this end, the receptors were free M PRO (PDB: 6YB7) and ligand-induced conformation M PRO (PDB: 6LU7). All residues conforming the SBS were included within the volume of the search box. Box coordinates for normal and induced-fit docking were 12 For each docking assay, the best pose was selected based on its 2Dscore (51) . Briefly, each pose is assigned a score based on the standard normalization (Zscore) of two variables: binding energy (ΔG) and cluster population (Pop). Said score is calculated as follows: The selected solutions were then ranked according to their binding energies. For the DS and CS VDs, the top five ligands with the lowest binding energies were selected as hits. In the case of SBS, the top 30 ligands with the lowest binding energies from the VDs against NSBS and ISBS were contrasted and the ones present in both lists were selected as hits. Then, their most favourable pose was extracted using PyMol, converted to .mol2 format and supplemented with the missing hydrogen atoms using OpenBabel at pH 7.4. The protein-ligand complex was reconstructed by merging the ligand file with its corresponding hydrogen-curated receptor (see Section 1.3 above). It is key to confirm that the docking workflow can replicate the binding event of a known experimental inhibitor and provide reference values (positive control). To this end, the potent broad-spectrum non-covalent SARS-CoV-2 inhibitor X77 was chosen as comparative standard. The ligand (X77) was excised from a X77-M PRO co-crystallized complex (PDB: 6W63) using PyMol, while the same was done for the receptor (free M PRO ) from a N3-M PRO co-crystallized complex (PDB: 6LU7). Then, the ligand .pdb file was converted into .sdf using OpenBabel v.3.1.1. This file was curated manually using PyMol again so as to correct bond order and add missing hydrogen atoms. X77 was then docked against ISBS in order to produce a cross-docking while maintaining the last VD's parameters. The resulting complex was contrasted with the initial structure (PDB: 6W63) by their RMSD using PyMol (52) . Equally important is the capacity of sieving protocols for rejecting inert molecules. These compounds could exhibit false positives in biochemical assays due to colloidal aggregation (53, 54) . This is the case of TDZD-8, which only inhibits M PRO through aggregation (45) . Thus, this molecule is taken as an appropriate negative control to test the efficacy of the present protocol at excluding unspecific, promiscuous molecules. Again, the docking of this compound follows the VD protocol explained before. Protein-Ligand Interaction Profiler (PLIP) was used with default parameters to further explore protein-ligand interactions arising from docking (55) . Coupled with previous electrochemicalstructural revision, PLIP shed light on the chemical environment surrounding each flavonoid. Regarding the SBS, it is worth mentioning that only the interactions provided by the VD against ISBS were identified and analysed. In addition to these steps, the protein-ligand complex .pdb files for each hit (see Section 2.3.1 above) was uploaded to the online service Dr. SASA (56) in order to obtain surface data. The complete database (before FAFDrug4 filters) included 6696 compounds. The final database comprised 4887 compounds. SMILE information for every compound sufficed to generate satisfactory minimized 3-D structures in .pdbqt format. The approximate mean torsion quantity was 9.73. Exhaustiveness was adjusted to 24 for Vina VS and the number of evaluations was set to 25 million for the VD as a higher number of torsions requires deeper search standards. Cryptic site prediction was done through CryptoSite, which produces a .pol.pred file containing prediction scores for every residue. The residue is presumed to have a high probability of constituting a cryptic site whenever the score is above 10. Out of these, several were also present in the same region as the SBS or DS boxes or relatively close to their residues. These were left out for the sake of dynamic independence. The residues comprising CS are Lys5, Met6, Pro108, Gly109, Arg131, Trp218, Phe219, Tyr239, Glu240, Leu271, Leu272, Leu287, Glu288, Asp289, Glu290, Arg298, Gln299 and Val303. M PRO is a homodimer with a heart-shaped structure (Figure 1 ). The SBS region of M PRO can be further subdivided into S1' and S1-S5 subsites. Respectively, these sites interact with P1' and P1-P5 segments of a generic substrate peptide (42) . S1' is defined to be the catalytic region. Other segments of the peptide are numbered consecutively, starting from P1'. According to this definition, it is possible to map S1' and S1-S5 subsites by analysing the M PRO -N3 interaction described elsewhere (45) . Shortly, the S1' subsite comprises His41, Gly143, Ser144 and Cys145; S1 includes Ser1 (from the other protomer), Phe140, Leu141, Asn142 and His163; S2 consists in Met49, Tyr54, His164, Asp187 and Arg188; S3 consists in Met165 and Glu166; S4 is constituted by Leu167, Gln189, Thr190 and Gln192; and S5 encompasses Pro168 and Ala191 (Figure 1a ). All of these residues are part of, but not limited to the ones reported in the literature (32) . Regarding dimensions, SBS has a SASA of 975.013 Å 2 . Two histidine tautomers may exist under physiological conditions: HSD (protonated in the delta position) with a pKa of 6.7, and HSE (protonated in the epsilon position) with a pKa of 7.3 (57) . In the SBS, three monoprotonated histidines were found: His41 and His163 (HSD) and His164 (HSE). Apart from histidines, three amino acids with charged side chains at physiological pH were observed within the site (Table 1) . Lastly, Cys145 forms a catalytic dyad with His41. The following residues constitute the DS with a joint SASA of 736.627 Å 2 : Arg4, Met6, Ser10, Gly11, Glu14, Asn28, Ser139, Phe140, Ser147, Glu166, Glu290 and Arg298 (Figure 1b ) (32) . Nine out of 12 residues are polar, which raises the chances of forming hydrogen bonds with a ligand. There are no histidines among the constituting residues, yet His172 and His163 (both as HSD) are found near Ser139, Phe140 and Glu166. It is worth noting that the source material that predicted SARS-CoV-2 DS inferred the location of the site based on SARS-CoV-1 DS (32), therefore their site description is perfectible by more direct studies. Similar to the SBS, five amino acids with charged side chains at physiological pH were observed (Table 1) . Finally, the CS residues ( Figure 1c ) jointly make up a SASA of 665.350 Å2. Unlike DS, only half of the residues were polar. No histidines were observed inside nor in the near vicinity. Nevertheless, this site is the most likely to host saline bonds given the nine residues with ionizable sidechains found within (Table 1) . The best 100 compounds for each putative binding site had mean binding energies of -8.89, -10.06, -8.08 and -9.21 kcal/mol for the NSBS, the ISBS, the DS and the CS, respectively. Each of these ligands' sets were selected for a VD against their respective putative binding site. Mean binding energies for each site's top fivr were -9.52, -11.512, -7.042 and -10.348 kcal/mol for the NSBS, the ISBS, the DS and the CS, respectively. After cross-referencing the NSBS and ISBS top 30, a total of fifteen compounds were retrieved ( Table 2 ). The final VD results presented the ligands Dorsilurin E (FL3FQUNP0001), Licorice glycoside E (FL2F1AGSN001) and Taxifolin 3'-(6"-phenylacetylglucoside) (FL4DACGS0020) binding only (Figure 2 ). Of these compounds, seven flavonoids (Licorice glycoside E, Taxifolin 3'-(6"-phenylacetylglucoside), CHEMBL2171573, CHEMBL2171584, CHEMBL2171598, CHEMBL2171577 and CHEMBL2171578) had more than 12 torsions, while the rest had less than eight torsions. Remarkably, all SBS-binding ligands interacted with one catalytic residue (His41) mentioned in previous literature (Table 3) . Among the most energy-favoured hits for SBS, Dorsilurin E presented the best binding energy to ISBS (-11.31 kcal/mol (Table 3) . Surprisingly, all DS hits inserted at least one of their aromatic rings into a novel pocket comprised by DS and CS residues Met6, Glu290, Arg298; by residues exclusive to CS: Lys5 and Val303; and other residues not included in any site: Ala7, Phe8, Ser113, Gln127, Phe291, Asp295 ( Figure 3 ). The cavity is also present in other crystal structures of the free MPRO (PDB IDs: 6XB1, 6XHU, 6Y2E, 6M2Q, 6Y84 and 7BRO). Interestingly, the mentioned pocket disappears in 6LU7 (MPRO complexed with inhibitor N3). However, in other MPROs complexed with different inhibitors, the novel pocket is present fully or to some degree (PDB IDs: 6M2N, 6WTT, 6XMK, 6M0K, 6W63, 6XBG). The buried rings of CHEMBL2171584 and CHEMBL2171573 sustain a cation-π interaction with Arg298 ( Figure 2 , Table 3 ). Abyssinoflavanone VI shows a π-π stacking with Phe8. The inserted rings of Abyssinoflavanone VI, Sanggenol O and Kanzonol E are substituted with hydroxyl groups. In both Abyssinoflavanone VI and Sanggenol O, the hydroxyl group in the position 5 of the backbone ring shows hydrogen bonds with Arg298. Likewise, in both flavonoids the hydroxyl group in the position 7 presents three hydrogen bonds with Met6, Ser113 and Gln127 ( Figure 2 ). Furthermore, the hydroxyl group in the position 7 of the Kanzonol E ring establishes hydrogen bonds with Met6 and Gln127. As for the CS, CHEMBL2171598 was the best inhibitor with a binding energy of -10.59 kcal/mol and a Kd of 16.19 nM. This compound interacted through hydrogen bonds with residues Lys5, Ala7 and Val125 from 2.01, 2.97 and 2.11 Å, respectively (Table 3) . Likewise, it held hydrophobic interactions with residues Lys5, Met6, Pro9, Tyr126, Asp295, Gln299 and Phe305. Interestingly, Kurzichalcolactone and CHEMBL2171598 were found to bind within the novel pocket in the same fashion as the DS hits. The information regarded surface data for each hit is included in The inhibitory ligand X77 successfully docked against the ISBS of the free M PRO (-10.77 kcal/mol). A structural alignment between the ligands on the docked structure and the original 6W63 crystal revealed a relatively low deviation from the original (RMSD: 1.352 and 1.693 Å, with and without hydrogens, respectively), considering that the docked ligand was not the original inhibitor (N3) from the employed induced conformation ( Figure 4) . As for the negative control, the best pose according to the 2Dscore had an energy of -5.46 kcal/mol. The M PRO binding sites analysis have provided new insights regarding the chemical microenvironments of the structure. Based on the aforementioned pKa's of HSD and HSE, the pH range needed for the coexistence of these two protonation states is 6.7-7.3. Hence, SBS microenvironment's pH is acidic with respect to the pH 7.4 of the general system. Additionally, His172 is proximate to His163, the latter belonging to SBS. His172 is also protonated in the HSD position. Given the proximity of these two 6.5 pKa residues (7.6 Å between alpha carbons), there is more reason to believe in the relative acidity of the SBS. Moreover, the presence of two adjacent histidines with divergent tautomeric states (His163 in HSD and His164 in HSE) suggests said microenvironment can be further divided, being the region around His163 (S1) more acidic than the region around His164 (S2). Said difference is reasonable as His163 is closer to the catalytic subsite S1'. This acidic microenvironment could ease proteolysis as it has been reported that the optimum pH range for catalytic sites in other cysteine-proteases is 5-6 (58-60). Sizeable changes in the microenvironment pH could thus lead to partial or total loss of catalytic function. Therefore, a ligand designed to not just bind the SBS but to change local pH is a hypothetical, but realistic goal for drug designers. Similar conditions are found in a DS subregion comprising Ser139, Phe140 and Glu166. His163 and His172, close to the aforementioned DS subregion, are in HSD tautomeric state, suggesting an acidic microenvironment for this DS subsection. In contrast, no histidines were observed near or within the CS. Then, Table 1 shows CS has the highest number of charged residues and thus stands a better chance of establishing saline bonds with an appropriate ligand. The absence of this type of interaction among the other compounds is explained by the electrochemically neutral nature of the hits. Future attempts at drug design might benefit from avoiding electrochemical neutrality for a greater chance of saline bond formation in this region. As has already been mentioned, Table 3 contains binding energies which suggest likelihood of effective binding. To take these numbers at face value could be misleading, however. Besides binding energy, binding location and interaction density should also be accounted for. In particular, for the SBS docking, the more subsites a ligand interacts with in a wider area, the higher the chance of proper inhibition, as the cavities needed for the proper accommodation of the natural substrate of M PRO would be sterically obstructed. For instance, besides a handsome binding energy, Dorsilurin E forms one hydrogen bond and seven hydrophobic interactions with five different SBS subsites (S1', S2, S3, S4 and S5) ( Figure 3 ). By contrast, Euchrenone a11 forms two hydrogen bonds but with the same residue, although hydrophobic interactions are found in fewer subsites (S1', S1, S2 and S3). Similarly, Kurziflavolactone C has five hydrogen bonds, seven hydrophobic interactions (two with the same residue) and a cation-π interaction, but only maintains contact with S1', S1 and S3 besides two residues outside SBS proper. Although they present the least binding energy, Licorice glycoside E and Taxifolin 3'-(6"-phenylacetylglucoside) presented the most hydrogen bonds in the SBS (nine and seven, respectively). Licorice glycoside E interacts thus with S1', S1, S3 and S4 and via hydrophobic interactions with S3, S4 and S5. Besides, it holds a hydrogen bond and a hydrophobic interaction with two separate residues outside SBS proper. Taxifolin 3'-(6"phenylacetylglucoside) holds hydrogen bonds with S2, S3 and S4; hydrophobic interactions with S1, S3 and S4; and a π-stacking with S1'. Because a wider array of subsites could enhance steric competitiveness, Licorice glycoside E stands a substantial chance at inhibition despite not showing the greatest binding energy. It is worth noting all the SBS hits show hydrophobic interactions with Glu166, a residue which interacts with the N-terminus domain of the second protomer (56) . This intercatenary interaction allows for the stabilization of SBS conformation (in the S1 subsite), which is necessary for M PRO proteolytic activity (61, 62) . Based on this, it is highly likely that ligands interacting with Glu166 may cause structural modifications on the catalytic site and hinder activity. In addition, π-π stacking interactions were observed between two compounds, Kurziflavolactone C and Taxifolin 3'-(6"-phenylacetylglucoside), and His41, the latter being one of the catalytic residues of SBS. His41 is protonated as HSD. Said tautomer prevails in this region's acidic pH. A slight variation in this microenvironment could change the protonation states and modify the aforementioned interaction. For example, a change in His41 from HSD (single protonation) to HIS+ (double protonation) could intensify bond interaction (63) . However, π-π stackings are generally weak. Further insights are needed to assess the real contribution to ligand binding by these interactions. A last note-worthy point on SBS is that two hits involve glycosylated flavonoids (Licorice glycoside E and Taxifolin 3'-(6"-phenylacetylglucoside)). A previous study on SARS-CoV-1 3CL PRO reported two glycosylated flavonoids as inhibitors, whose sugar moieties interacted with the active sites's S1 and S2 subsites through hydrogen bonds (31) . Nonetheless, only Licorice glycoside E's sugar moieties exhibited any interactions whatsoever: hydrogen bonds with His41 (S1'), Gly143 (S1'), Ser144 (S1'), His 163 (S1) and Gly166 (S3). While the visualization tools suggested a non-interactional occupation of S3 by Taxifolin 3'-(6"-phenylacetylglucoside), there is stark contrast with the results reported by the aforementioned study. The differences in sugarreceptiveness between analogous subsites could be illuminating for future studies aiming at deeper comparative analyses between both proteases. Further along the line, the M PRO DS is presented as a remarkably druggable site for possible inhibition, since a close correlation between the dimerization process and the proteolytic activity has been indicated (64) . Remarkably, all the DS hits interacted with highly conserved residues among coronaviruses, namely Arg4, Met6, Gly11, Gly14, Ser10, and Arg298 (32) . Under this view, hydrogen bonds might interfere with the dimerization and might be key for future coronavirus inhibitors development ( Figure 2 , Table 3 ). Remarkably, all the DS hits interacted at least one of the DS residues which are highly conserved among coronaviruses, namely Arg4, Met6, Gly11, Gly14, Ser10, and Arg298 (32) (Table 3) . Additionally, the binding modes observed in those compounds revealed a cavity partially spanning both DS and CS showing high affinity for substituted and unsubstituted aromatic rings. According to the ring's substitutions, the pocket would allow for π-π and cation-π interactions as well as hydrogen bonds. Both cation-π interactions and hydrogen bonds involve at least one polar group whereas π-π stackings could be present between exclusively nonpolar groups. As the novel pocket has the possibility to form any of these interactions, it must have affinity for both nonpolar or polar moieties. Therefore, the novel pocket has an amphipathic nature. This is coherent with the other results, as the introduced rings can be either hydrophobic (phenyl rings) or hydrophilic (phenyl rings substituted with hydroxyl groups). The CS may be visually divided in two regions: the first comprised by residues shared with the novel pocket (Lys5, Met6, Gln290, Arg298 and Val303), and the second conformed by the remaining residues (Pro108, Gly109, Arg131, Trp218, Phe219, Tyr239, Glu240, Leu271, Leu272, Leu287, Glu288, Asp289, Glu290 and Gln299). Five of the 11 novel pocket residues are shared with the CS, and thus have higher CryptoSite scores than the recommended threshold (28) . Although this suggests a cryptic nature for this new cavity, further analysis to confirm the reality of the pocket is wanting. By definition, a cryptic site is one that forms in holo structure, but not in the apo structure (28) . As the novel cavity was also present in the apo structure of some crystal structures of the M PRO (PDB IDs: 6XB1, 6XHU, 6Y2E, 6M2Q, 6Y84 and 7BRO it would be necessary to see a dramatic modification of this region upon ligand binding over time. Cavity expansion and ligand engulfing upon binding would count as sufficient evidence to formally propose the cavity as a novel cryptic site. Thus, it would be reasonable to submit the systems of the presented best hits to molecular dynamics simulations. Such a method would allow us to evaluate the systems' mechanics and free energy profiles. In contrast to apo-M PRO , where a free energy local minimum is expected at the novel pocket region, it is presumed that the hits that interacted with said region are able to induce a higher energy conformation and stabilize the pocket (65) . Unlike the novel pocket, the second CS region forms no observable cavity in the apo-M PRO . If a cavity is then formed upon ligand binding, this region would fit the definition of cryptic site better than the novel pocket. However, as it is the case with the novel cavity, to see this formation and settle the cryptic nature of this second region, it would be necessary to submit the systems to molecular dynamics. In other considerations, binding energy is not the only factor to take into account, as interface area between ligand and protein in complex is important as well. In short, more interface area involves less SASA, as the ligand buries more protein area. A greater interface area generally enjoys three advantages. First, the ability to sterically compete with the native substrate for the SBS is enhanced as less SASA is present for the natural substrate after the binding event. This fact is closely related to the previously mentioned occupation of the SBS subsites. Second, as the ligand occupies more protein surface, more interfacial water is dispersed and "reinserted" into the solvent, thus partly offsetting the entropic diminishment from ligand binding and preserving the second law of thermodynamics (66) . Another contribution to this compensation is an increase in the number of conformational degrees of freedom of the protein itself (67, 68) . This is the third advantage: a wider contact surface area for the ligand might make it easier to alter the native enzyme conformation for the M PRO and achieve more degrees of freedom. Said correlation has been observed both in protein-protein complexes (69) and individual subunits (70) as a relation between interface surface and sizable conformational change (71) . It is suggestive to imagine a similar relationship with regard to protein-ligand complexes. Therefore, while there is not much difference among the interface areas projected by SBS and CS ligands, DS ligands show greater standard deviation.If interface area is therefore taken as an important variable for inhibition, Licorice glycoside E and CHEMBL2171598 could be most promising for SBS and CS, respectively. Flavonoids have already been suggested as potential inhibitors of the SARS-CoV M PRO , showing a high affinity for their hydrophobic aromatic rings and hydrophilic hydroxyl groups (31) . In the present research, a group of 15 flavonoids has been identified as possible SARS-CoV-2 M PRO inhibitors. Said group shares some structural similarities with those proposed as SARS-CoV-1 inhibitors. Besides, these compounds have been evaluated against different binding sites in M PRO (SBS, DBS and CBS), and the resulting binding energies were favourable for flavonoid-protein complexes. Nonetheless, experimental validations are still required to vouch for the effectiveness of the presented compounds as true SARS-CoV-2 inhibitors. Not only PAINS, but colloidal aggregates could induce false positives (53,54), calling for the application of surface tension reduction agents such as Triton Novel Coronavirus(2019-nCoV) Situation Clinical features of patients infected with 2019 novel coronavirus in Wuhan Description and clinical treatment of an early outbreak of severe acute respiratory syndrome (SARS) in Guangzhou, PR China Virtual screening -An overview Hierarchical virtual screening approaches in small molecule drug discovery The holistic integration of virtual screening in drug discovery Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17 Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds Past, Present, and Future of Molecular Docking Molecular Docking in Modern Drug Discovery: Principles and Recent Applications Progress in molecular docking In silico virtual screening approaches for anti-viral drug discovery Recent Advances in Computer-Aided Drug Design as Applied to Anti-Influenza Drug Discovery Virtual screening of the inhibitors targeting at the viral protein 40 of Ebola virus Computational screening of medicinal plant phytochemicals to discover potent pan-serotype inhibitors against dengue virus. Sci Rep Deep learning based drug screening for novel coronavirus 2019-nCov. Prepr Therapeutic Drugs Targeting 2019-nCoV Main Protease by High-Throughput Screening An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy Four Amino Acid Changes in HIV-2 Protease Confer Class-Wide Sensitivity to Protease Inhibitors Nelfinavir was predicted to be a potential inhibitor of 2019-nCov main protease by an integrative approach combining homology modelling, molecular docking and binding free energy calculation Potential inhibitors for 2019-nCoV coronavirus M protease from clinically approved medicines. bioRxiv Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CLpro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates OpenVirtualToxLab-A platform for generating and exchanging in silico toxicity data Inhibitors for Novel Coronavirus Protease Identified by Virtual Screening of 687 Million Compounds. ChemRxiv Prepr Analysis and characterization of dimerization inhibition of a multi-drug-resistant Human Immunodeficiency Virus Type 1 protease using a novel size-exclusion chromatographic approach Inhibition of a viral enzyme by a small-molecule dimer disruptor Cryptic binding sites on proteins: definition, detection, and druggability Expanding the Druggable Proteome by Characterization and Prediction of Cryptic Binding Sites Exploring the structural origins of cryptic sites on proteins Combined effects of flavonoids and acyclovir against herpesviruses in cell cultures Inhibition of SARS-CoV 3CL protease by flavonoids Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CLpro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates Targeting SARS-CoV-2 Spike Protein of COVID-19 with Naturally Occurring Phytochemicals: An in Silco Study for Drug Development Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database Easily Harvest (Scrape) Web Pages DrugBank 5.0: a major update to the DrugBank database ChEMBL web services: streamlining access to drug discovery data and utilities PubChem 2019 update: improved access to chemical data Free ADME-tox filtering computations for chemical biology and early stages drug discovery Open Babel: An Open chemical toolbox Näherungsformeln für die Zylinderfunktionen für große Werte des Arguments und unbeschränkt veränderliche Werte des Index Prediction of Novel Inhibitors of the Main Protease (M-pro) of SARS-CoV-2 through Consensus Docking and Drug Reposition PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations The PyMOL Molecular Graphics System Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors Software news and updates AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. Gentleman R, editor AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading Computational protein-ligand docking and virtual drug screening with the AutoDock suite Accelerating AUTODOCK 4 with GPUs and Gradient-Based Local Search An efficient use of Xray information, homology modeling, molecular dynamics and knowledge-based docking techniques to predict protein-monosaccharide complexes DockRMSD: an open-source tool for atom mapping and RMSD calculation of symmetric molecules through graph isomorphism Promiscuous aggregate-based inhibitors promote enzyme unfolding Colloidal aggregation: From screening nuisance to formulation nuance PLIP: fully automated proteinligand interaction profiler Calculation of accurate interatomic contact surface areas for the quantitative analysis of non-bonded molecular interactions Structure in Protein Chemistry UC San Diego Previously Published Works Purification and characterization of cysteine protease from miswak Salvadora persica Purification and characterization of cysteine protease from germinating cotyledons of horse gram Purification and properties of cysteine protease from rhizomes of Curcuma longa (Linn.) Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved a-ketoamide inhibitors Targeting the Dimerization of the Main Protease of Coronaviruses: A Potential Broad-Spectrum Therapeutic Strategy The multiple roles of histidine in protein interactions Understanding Cryptic Pocket Formation in Protein Targets by Enhanced Sampling Simulations Isothermal titration calorimetry and differential scanning calorimetry as complementary tools to investigate the energetics of biomolecular recognition Global Changes in Local Protein Dynamics Reduce the Entropic Cost of Carbohydrate Binding in the Arabinose-binding Protein Residual Ligand Entropy in the Binding of p-Substituted Benzenesulfonamide Ligands to Bovine Carbonic Anhydrase II Surface, subunit interfaces and interior of oligomeric proteins The atomic structure of protein-protein recognition sites Relative solvent accessible surface area predicts protein conformational changes upon binding Docking Screens for Novel Ligands Conferring New Biology Computational advances in combating colloidal aggregation in drug discovery We thank Dr. Miguel Quiliano for sharing his pearls of wisdom with us during the course of this research. We would also like to show our gratitude to the Universidad Privada San Juan Bautista and Mr. Guido Choque for their logistical support.