key: cord-0800674-fpaofb0j
authors: Jawarkar, R. D.; Bakal, R. L.; Zaki, Magdi E.A.; Al-Hussain, Sami; Ghosh, Arabinda; Gandhi, Ajaykumar; Mukerjee, Nobendu; Samad, Abdul; Masand, V. H.; Lewaa, Israa
title: QSAR Based Virtual screening derived Identification of a Novel Hit as a SARS CoV-229E 3CLpro Inhibitor: GA-MLR QSAR modeling supported by Molecular Docking, Molecular Dynamics Simulation and MMGBSA calculation Approaches
date: 2021-10-19
journal: Arabian journal of chemistry
DOI: 10.1016/j.arabjc.2021.103499
sha: 8a4b7b155cb1206782ba640250cd9b5c47a27157
doc_id: 800674
cord_uid: fpaofb0j

Suitable coronavirus drug targets and corresponding lead molecules must be identified as quickly as possible to produce antiviral therapeutics against human coronavirus (HCoV SARS 3CLpro) infections. In the present communication, we have identified a hit candidate for HCoV SARS 3CLpro inhibition. A four-parameter GA-MLR based QSAR model (R²: 0.84, R²adj: 0.82, Q²loo: 0.78) was developed using a dataset of 37 structurally diverse molecules, along with QSAR-based virtual screening (QSAR-VS), molecular docking (MD), and molecular dynamics simulation (MDS) analysis. QSAR-based virtual screening was used to find novel lead molecules from an in-house database of 100 molecules. The QSAR-VS successfully offered a hit molecule with an improved pEC50 value, from 5.88 to 6.08. The benzene ring, phenyl ring, amide oxygen and nitrogen, and other important pharmacophoric sites were revealed by the MD and MDS studies. Ile164, Pro188, Leu190, Thr25, His41, Asn46, Thr47, Ser49, Asn189, Gln191, and Asn141 are among the key amino acid residues in the S1 and S2 pockets. MDS showed that the lead molecule forms a stable complex with HCoV SARS 3CLpro. MM-GBSA calculations based on the MD simulation results agreed well with the binding energies calculated from the docking results. The results of this study can be exploited to develop a novel antiviral agent, such as an HCoV SARS 3CLpro inhibitor.

Coronaviruses are classified as RNA viruses. To date, seven human coronaviruses (HCoVs) have been discovered, viz. SARS-CoV, Middle East Respiratory Syndrome (MERS)-CoV, SARS-CoV-2, HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1. The first three, SARS-CoV, MERS-CoV, and SARS-CoV-2, are pathogenic species, whereas the last four, namely 229E, OC43, NL63, and HKU1, cause mild disease. Coronaviruses belong to the order Nidovirales, family Coronaviridae, and subfamily Orthocoronavirinae. Among the four coronavirus genera (Alphacoronavirus, Betacoronavirus, Gammacoronavirus, Deltacoronavirus), HCoVs are classified under Alphacoronavirus (HCoV-229E and HCoV-NL63) and Betacoronavirus (MERS-CoV, SARS-CoV, HCoV-OC43, and HCoV-HKU1). SARS-CoV-2, from the Betacoronavirus genus, is fairly closely related to two bat-derived SARS-like coronaviruses, viz. bat-SL-CoVZC45 and bat-SL-CoVZXC21 (3). Coronaviruses are spherical, with a diameter of about 125 nm, and carry club-shaped projections on the surface that resemble a solar corona. Coronaviruses have the largest genomes among all positive-strand RNA viruses (1). A highly transmissible coronavirus that causes lethal respiratory damage was first identified in China. The severity of the symptoms is characterized by increased nasal mucosal plasma exudation and interferon γ (IFNγ) levels in nasal lavage specimens (4, 5). The peak of respiratory tract viral load appears within the first three days of infection and drops off dramatically within a week, correlating with the development and subsequent stabilization of signs and symptoms (6) (7) (8) (9) (10).

The number of coronavirus cases has risen dramatically. To date, the infection has reached more than 29 million people worldwide, with a mortality rate as high as 3.15% (according to the World Health Organization's (WHO's) report, September 2020). Despite this, a potent hit against SARS-CoV-2 remains elusive (2).

A 3C-like protease (3CLpro) is also found in CoV-229E, as it is in SARS-CoV, the causative agent of severe acute respiratory syndrome (SARS) in humans. Several crystal structures of 3Cpro from CVB3 and of 3CLpro from CoV-229E and SARS-CoV in complex with inhibitors have been investigated (11). In this regard, numerous investigators have used MD, MDS, and quantitative structure-activity relationship (QSAR) studies for virtual screening to identify a new hit for HCoV SARS 3CLpro inhibition.

QSAR techniques have been effectively implemented not only in the development of reliable statistics-based mathematical correlations between the physicochemical properties of chemical substances and their desired biological activities but also to forecast the biological activity of de novo molecules (12). In the last couple of decades, together with advances in the computational field, wet-lab chemical experimentation has been substituted by molecular modeling and virtual experimentation that deploy the fundamentals of basic sciences such as mathematics, chemistry, physics, and algorithms (13) (14).

Enriching the utility of QSAR methodologies in drug discovery and development, especially in the development of potentially potent new chemical entities and hits/leads with diverse bioactivity, is captivating the scientific research community (15). With advances in the computational sciences, QSAR technologies are evolving rapidly and gaining potential uses in regulatory science. The Food and Drug Administration (FDA) has invested considerable effort to facilitate the development of reliable QSAR models by setting up chemical databases using high-quality and reliable experimental data, followed by the development of computational algorithms (16).

The successful application of high-throughput screening (HTS) to libraries of molecules to find new leads for a particular biological property is one of the core trends in drug discovery. To establish the correlation between the activity of a molecule and its molecular descriptors, QSAR analysis is frequently used, which includes virtual molecular filtering and screening based on a mathematical model. This strategy reduces the cost of drug candidate failure in advanced (clinical) stages by filtering combinatorial libraries and rejecting molecules with a predicted toxic effect or poor pharmacokinetic profile, thereby decreasing the number of experiments (17). Molecular docking (MD) is one of the most widely used, well-established in silico structure-based drug discovery methods. Docking describes and/or predicts ligand-target interactions at the molecular level, establishes structure-activity relationships (SAR), and enables the identification of new lead candidates of therapeutic interest without a priori information on the chemical structure of other target modulators (18). MD techniques are largely used to discover the conformation adopted by ligands inside the binding pocket(s) of macromolecular targets. MD also estimates the ligand-receptor binding free energy by assessing the critical phenomena involved in the intermolecular recognition process (19).

Hit identification and lead optimization are closely intertwined with computational modeling. In drug discovery, structure-based virtual screening (VS) has been indispensable for more than a decade, together with its extensively studied underlying computational technique, docking. The parameters for VS may vary with the objective, but the usual protocol is straightforward. In VS, a library of small molecules is docked into the binding pocket of a macromolecule (target receptor, protein, etc.). The procedure returns several solutions per molecule, ranked in order of preference for further screening and identification of the best possible hit(s) (20). VS saves time, cost, resources, and labor, which has made it one of the most effective computational techniques for screening libraries of small molecules for new hits to be tested experimentally for a desired property or activity. Among the VS approaches, QSAR analysis is the most powerful method owing to its high and fast throughput and good hit rate. A QSAR model, once developed and fully validated for robustness and predictivity, can be used for reliable prediction of the biological properties of novel compounds. Although experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desirable and should be carried out as the ultimate validation of the developed models (21). In the present work, a QSAR-based virtual screening strategy is intended for the rapid and less expensive development of medicines to treat SARS-CoV-2. This strategy is based on discovering the anti-HCoV SARS 3CLpro potential of molecules previously reported to have potent inhibitory activity against the same target. A critical evaluation of existing information on HCoV SARS 3CLpro inhibitors, using QSAR-based VS supported and enriched by MD and MDS approaches, was carried out to identify novel HCoV-229E inhibitors with desired properties.

To begin, we classified the complete ChEMBL dataset into two classes, with assays of both the wild-type and mutant forms of the target. As documented in ChEMBL, no in vitro assay had been developed to evaluate Human Coronavirus 229E inhibitory activity against mutant targets.

After removing structural duplicates, we used the median EC50 value to create the QSAR models. The log-transformed EC50 values were used for the QSAR models (22). All the compounds tested in vitro against Human Coronavirus 229E were used in this collection of inhibitor structures from the ChEMBL database. In the end, 39 compounds representing 37 unique compounds were identified as having been tested against Human Coronavirus 229E (see Table 1) (23).
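The duplicate handling and log transformation described above can be sketched as follows. This is a minimal illustration, assuming the usual convention pEC50 = −log10(EC50 in mol/L); the function names and the example records are hypothetical and are not taken from the study's data.

```python
import math
from statistics import median

def pec50_from_nm(ec50_nm):
    """Convert an EC50 given in nM to pEC50 = -log10(EC50 in mol/L)."""
    return -math.log10(ec50_nm * 1e-9)

def collapse_duplicates(records):
    """Map each structure key to the median EC50 (nM) of its measurements."""
    grouped = {}
    for key, ec50_nm in records:
        grouped.setdefault(key, []).append(ec50_nm)
    return {key: median(values) for key, values in grouped.items()}

# Hypothetical records: (structure key, EC50 in nM); "cmpd_A" was measured twice.
records = [("cmpd_A", 150.0), ("cmpd_A", 250.0), ("cmpd_B", 60000.0)]
unique = collapse_duplicates(records)
pec50 = {key: round(pec50_from_nm(value), 2) for key, value in unique.items()}
print(pec50)
```

Under this convention, an EC50 of 200 nM corresponds to a pEC50 of about 6.70 and an EC50 of 60,000 nM to about 4.22.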

ChEMBL's reliability criteria were used to prefilter the compounds and data in the database: (1) the confidence score (a quantitative indicator of data quality in ChEMBL) is greater than 8; (2) the data are expert-curated; (3) the data source (PubMed) is indicated; (4) EC50 is the activity measure; (5) EC50 is precisely defined (there is no ">" or "*" sign before the EC50 value); (6) the structure is not a multi-component complex or salt. Therefore, only the compounds tested in the Human Coronavirus 229E inhibition assay were extracted from ChEMBL.

Depiction of 37 dataset molecules used in QSAR study.

The structures were drawn with ACD Labs' chemical sketch program (www.acdlabs.com). They were converted into 3D structures using Open-Babel 2.4 and then optimized with the MMFF94 force field. The 3D structures were optimized using TINKER default settings and then aligned using Open3DAlign.

A sound way to avoid data leakage is to split the dataset into training, prediction, and external/test sets of appropriate composition and proportions prior to exhaustive subjective feature selection (28). For bias-free analysis, the dataset was randomly split into training (80% = 30 molecules) and prediction (20% = 7 molecules) sets. The training set alone was used to select the set of molecular descriptors, and the prediction/external set was used solely for external validation of the model (Predictive QSAR).
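A random 80/20 split of this kind can be sketched in a few lines. This is only an illustration of the idea; the seed, the function name, and the use of Python's `random` module are assumptions and do not reproduce the splitting routine of QSARINS.

```python
import random

def split_dataset(ids, train_fraction=0.8, seed=42):
    """Shuffle molecule IDs reproducibly and split them into training and prediction sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # the seed is an arbitrary, hypothetical choice
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

molecule_ids = list(range(1, 38))  # 37 dataset molecules, numbered 1..37
train_ids, prediction_ids = split_dataset(molecule_ids)
print(len(train_ids), len(prediction_ids))
```

Descriptor selection would then see only `train_ids`, and `prediction_ids` would be held back for external validation.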

We employed the GA-MLR method in QSARINS-2.2.4 to select relevant descriptors (subjective feature selection), using Q²LOO as the fitness parameter. The number of molecular descriptors in the model is an essential factor in developing a successful QSAR model without excessive over-fitting. To find the breaking point, R²tr and Q²LOO values were plotted against the number of molecular descriptors involved in the QSAR model (see Figure 2). The breaking point was then taken to be the optimal number of molecular descriptors. According to Figure 2, the breaking point lies at four variables. As a result, we excluded QSAR models with more than 4 descriptors. The dataset was split randomly in QSARINS into a training set and a prediction set (80% training and 20% prediction, respectively). After creating the model, the prediction set was used for external validation, that is, to assess the model's ability to predict new chemical entities (29, 30, 31, 32). QSARINS was used with default settings to create the GA-MLR based QSAR models. In the GA, the fitness function to be maximized was Q², which also covered the double cross-validation. During model development, it was found that the value of Q² increased up to 4 variables, but then dropped significantly. To avoid overfitting and to build simple and informative QSAR models, the number of molecular descriptors was limited to 4 (33, 34, 35). The values of the molecular descriptors used in the QSAR models can be found in the supplementary information for every molecule. Because the OECD guidelines advise methodical validation of a QSAR model, all of the models were subjected to internal and external validation, Y-scrambling, and applicability domain (AD) analysis in QSARINS.
The statistical quality and strength of a GA-MLR based QSAR model were assessed using the following criteria: (a) internal validation based on the leave-one-out (LOO) and leave-many-out (LMO) procedures (i.e., cross-validation (CV)); (b) external validation; (c) Y-randomization (or Y-scrambling); and (d) fulfillment of the threshold values for the statistical parameters (36, 37, 38, 39, 40): R²tr ≥ 0.6, Q²LOO ≥ 0.5, Q²LMO ≥ 0.6, R² > Q², R²ex ≥ 0.6, RMSEtr < RMSEcv, ΔK ≥ 0.05, CCC ≥ 0.80, Q²-Fn ≥ 0.60, r²m ≥ 0.6, (1 − r²/r₀²) < 0.1 with 0.9 ≤ k ≤ 1.1, or (1 − r²/r'₀²) < 0.1 with 0.9 ≤ k' ≤ 1.1, |r₀² − r'₀²| < 0.3, with RMSE and MAE close to zero. Any QSAR model that did not meet the above criteria was eliminated. The formulae for calculating these statistical parameters are available in the supplementary material.
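To make the first two thresholds concrete, the sketch below fits a multilinear regression and computes R² and the leave-one-out Q² in plain Python. It is a minimal illustration, not the QSARINS implementation: Q²LOO is taken as 1 − PRESS/SS_tot with SS_tot computed around the training-set mean (an assumed convention), and the descriptor values and activities are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, x):
    """Evaluate the fitted model for one descriptor vector."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))

def r2_and_q2_loo(X, y):
    """Return (R2 of the full fit, leave-one-out cross-validated Q2)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    coef = fit_mlr(X, y)
    ss_res = sum((yi - predict(coef, x)) ** 2 for x, yi in zip(X, y))
    press = 0.0
    for i in range(len(y)):
        # Refit without molecule i, then predict molecule i
        X_loo, y_loo = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        press += (y[i] - predict(fit_mlr(X_loo, y_loo), X[i])) ** 2
    return 1 - ss_res / ss_tot, 1 - press / ss_tot

# Hypothetical training data: two descriptor values per molecule and pEC50
X = [[0, 14], [1, 13], [2, 13], [3, 12], [3, 11], [1, 14], [2, 12], [0, 13]]
y = [4.2, 5.0, 5.6, 6.5, 6.7, 4.8, 5.9, 4.5]
r2, q2 = r2_and_q2_loo(X, y)
print(r2 >= 0.6, q2 >= 0.5, r2 > q2)
```

Because each left-out molecule is predicted by a model that never saw it, PRESS cannot be smaller than the residual sum of squares of the full fit, which is why R² > Q² is expected for a well-behaved model.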

The PDB file for SARS-CoV 229e 3CLpro (PDB ID: 2ZU2) was obtained from the Protein Data Bank. The structure 2ZU2 was chosen for its X-ray resolution and sequence completeness. The optimized protein is suitable for docking analysis (see Figure 7). The native ligand (zinc-coordinating and peptidomimetic compounds) was removed before the docking study. The binding site of the native ligand was taken as the active site in the present work. Consequently, all the compounds were docked into the active site, where the native ligand was bound to SARS-CoV 229e 3CLpro; the docking pose of the most active molecule is presented here as a representative, for convenience.

The molecular docking study was performed with the NRGSuite software (41, 42). This free and open-source software can be used as a PyMOL plugin. With the help of FlexAID, it can detect surface cavities in a protein and use them as target binding sites for docking simulations (43). It uses a genetic algorithm for the conformational search, models ligand and side-chain flexibility, and allows covalent docking simulations. To obtain the best performance from NRGSuite, the flexible-rigid docking method was used with the following default settings: input method for binding sites: spherical shape (diameter: 18 Å); spacing of the three-dimensional grid: 0.375 Å; side-chain flexibility: no; ligand flexibility: yes; ligand pose as reference: no; constraints: no; hetero groups: included water molecules; van der Waals permeability: 0.1; solvent types: no type; number of chromosomes: 1000; number of generations: 1000; fitness model: share; reproduction model: population boom; and number of top complexes: 5. To validate the docking protocol, TG-0204998, a known peptidomimetic inhibitor of SARS-CoV 229e 3CLpro, was used.

The virtual screening results are used to analyze the Hit Molecule 97 with a docking score of ... Long-range electrostatic interactions were calculated using the particle mesh Ewald technique (50) with a radius of 9 Å for Coulomb interactions. The non-bonded forces were calculated using the RESPA integrator. To examine the stability of the complex in the MD simulations, the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), and protein-ligand interactions were measured.
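Two of the stability measures named above have simple closed forms, sketched here for a single frame. This is a minimal illustration that assumes the coordinate sets are already superimposed and uses equal atomic masses unless masses are supplied; it is not the routine of the simulation package, and the function names are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * atom[k] for m, atom in zip(masses, coords)) / total
              for k in range(3)]
    squared = sum(m * sum((atom[k] - center[k]) ** 2 for k in range(3))
                  for m, atom in zip(masses, coords))
    return math.sqrt(squared / total)

# Hypothetical three-atom fragment (coordinates in Å) and a shifted copy
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
print(rmsd(reference, frame), radius_of_gyration(frame))
```

RMSF follows the same pattern, with the deviation of each atom from its own time-averaged position averaged over frames instead of over atoms.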

During MD simulations of 2zu2 complexed with dataset compound 4, the most active hit molecule 97, and the least active hit molecule 70, the binding free energy (∆Gbind) of the docked complexes was calculated using the Prime molecular mechanics generalized Born surface area (MM-GBSA) module (Schrodinger suite, LLC, New York, NY, 2017-4). The binding free energy was calculated using the OPLS 2005 force field, the VSGB solvent model, and rotamer search methods [1]. After the MD run, frames were selected from the MD trajectory at 10 ns intervals. The total binding free energy was calculated using equation 1:

∆Gbind = Gcomplex − (Gprotein + Gligand)    (1)

where ∆Gbind = binding free energy, Gcomplex = free energy of the complex, Gprotein = free energy of the target protein, and Gligand = free energy of the ligand.
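With the terms defined above, the binding free energy is the free energy of the complex minus the sum of the free energies of the separate protein and ligand. The sketch below applies this to each selected frame and averages the result; the per-frame energies are hypothetical placeholders, and averaging over frames is an assumption about how the per-frame values are combined.

```python
from statistics import mean

def delta_g_bind(g_complex, g_protein, g_ligand):
    """Binding free energy for one frame: G_complex - (G_protein + G_ligand)."""
    return g_complex - (g_protein + g_ligand)

def mean_delta_g_bind(frames):
    """Average the per-frame binding free energy over the selected frames."""
    return mean(delta_g_bind(*frame) for frame in frames)

# Hypothetical (G_complex, G_protein, G_ligand) values in kcal/mol for two frames
frames = [(-12050.0, -11950.0, -48.0), (-12061.0, -11958.0, -49.0)]
print(mean_delta_g_bind(frames))
```

A more negative value indicates more favorable binding.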

In this paper, QSAR and molecular docking studies were employed to uncover hidden structural information responsible for SARS-CoV 229e 3CLpro inhibition. The QSAR model was created using PyDescriptor descriptors, which are simple to understand and to relate to biological activity. With easily accessible chemical descriptors that can be interpreted in terms of structural properties, the four-parameter GA-MLR model shows strong external predictive ability. Although the current analysis used a direct comparison of the EC50 values of the molecules in the dataset to describe the influence of a particular descriptor, it is important to note that the combined or opposing effects of unknown factors or other molecular descriptors could have a significant impact on a molecule's EC50 value (see Table 1). The present QSAR analysis was performed using a dataset of 37 structurally diverse compounds with experimentally determined EC50 values ranging from 200 to 60,000 nM. It therefore covers an acceptable and comprehensive chemical space and data range. The dataset was used to develop a properly validated genetic algorithm-multilinear regression (GA-MLR) model that gathers extensive evidence about the pharmacophoric features governing the desired bioactivity (Descriptive QSAR) and has adequate external predictive capability (Predictive QSAR).

To gain a better understanding of the structural features that determine SARS-CoV 229e 3CLpro inhibitory activity, we used interpretable molecular descriptors (as structural features) for model development. The four-parameter GA-MLR QSAR model, with its selected internal and external validation parameters (see supplementary material for additional parameters), is as follows: Validation parameters for the QSAR model: In the present QSAR modeling work, various statistical validation parameters were used to assess internal and external robustness; they have their usual meanings (see supplementary material for detailed descriptions and formulae). The high values of statistical parameters such as R²tr (coefficient of determination), R²adj (adjusted coefficient of determination), R²cv (Q²loo) (cross-validated coefficient of determination for leave-one-out), R²ex (external coefficient of determination), Q²-Fn, and CCCex (concordance correlation coefficient), and the low values of LOF (lack of fit), RMSEtr (root mean square error), MAEtr (mean absolute error), R²Yscr (R² for Y-scrambling), etc., together with the various graphs obtained for the developed QSAR model, demonstrate its statistical robustness as well as its excellent internal and external predictive ability, with no chance correlation. Furthermore, the Williams plot indicates that the model is statistically satisfactory (see Figure 3). Thus, the developed QSAR model satisfies all the guidelines suggested by the Organisation for Economic Co-operation and Development (OECD). (See supplementary material section 1.3.1 for an explanation and the calculation method of the various statistical parameters.)
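A Williams plot places each molecule by its leverage and standardized residual, and molecules outside the cutoffs are treated as falling outside the applicability domain. The sketch below uses the commonly quoted warning leverage h* = 3(p + 1)/n and a residual limit of ±3 standard deviations; both cutoffs are assumptions for illustration, since the values used in this study are not stated here, and the function names are hypothetical.

```python
def warning_leverage(n_training, n_descriptors):
    """Commonly used cutoff h* = 3(p + 1)/n for a model with p descriptors."""
    return 3 * (n_descriptors + 1) / n_training

def in_applicability_domain(leverage, std_residual, h_star, residual_limit=3.0):
    """True if a molecule lies inside both Williams-plot cutoffs (assumed limits)."""
    return leverage <= h_star and abs(std_residual) <= residual_limit

# Training set of 30 molecules and a four-descriptor model, as described in the text
h_star = warning_leverage(30, 4)
print(h_star, in_applicability_domain(0.2, -1.1, h_star))
```

With 30 training molecules and four descriptors, this convention gives a warning leverage of 0.5.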

A properly developed and validated QSAR model successfully establishes a correlation between the salient pharmacophoric traits represented by molecular descriptors and biological activity, which reveals hidden information about the mechanistic features of a molecule, the specificity of particular substituents, and even the presence or absence of pharmacophoric features critical for SARS-CoV 229e 3CLpro inhibition. Although we have compared the EC50 values of diverse dataset molecules in relation to a particular molecular descriptor, a similar or opposite effect of other molecular descriptors or unknown features with a dominant influence on the overall EC50 value of a molecule cannot be ruled out. In other words, a single molecular descriptor is not capable of fully explaining the experimental EC50 values for such a diverse set of molecules. That is, the successful application of the established QSAR model depends on the simultaneous use of its constituent molecular descriptors.

The descriptor fnotringNsp3C3B denotes the frequency of occurrence of sp3-hybridized carbon atoms exactly three bonds from a non-ring nitrogen atom. The descriptor has a positive correlation with pEC50; therefore, increasing the number of such combinations in a molecule may further enhance SARS-CoV 229e 3CLpro inhibition. An sp3-hybridized carbon atom was excluded from the calculation of fnotringNsp3C3B if the same atom was simultaneously present one or two bonds from any other non-ring nitrogen atom. This observation is supported by a comparison of the structures of molecule 1 (pEC50 = 6.69, fnotringNsp3C3B = 3) and molecule 37 (pEC50 = 4.22, fnotringNsp3C3B = 0). Raising the value of fnotringNsp3C3B from that of molecule 37 to 3, as in molecule 1, is accompanied by an increase in pEC50 of about 2.47 units (about a 300-fold increase in SARS-CoV 229e 3CLpro inhibition). Furthermore, the presence of an sp3-hybridized carbon atom three bonds from a non-ring nitrogen atom plays an important role in SARS-CoV inhibition, since it increases hydrophobicity and contributes an electrostatic feature to molecule 1. Molecule 37, on the other hand, lacks this feature, which could be the reason for the difference in pEC50 between these molecules (see Fig. 3). A similar observation is made when molecule 2 (pEC50 = 6.30, fnotringNsp3C3B = 3) is compared with molecule 34 (pEC50 = 4.43, fnotringNsp3C3B = 0).
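Because pEC50 is a base-10 logarithmic scale, a difference in pEC50 translates into a potency ratio of 10 raised to that difference. The short calculation below applies this to the pEC50 values quoted for molecules 1 and 37; the function name is hypothetical.

```python
def fold_change(pec50_high, pec50_low):
    """Potency ratio implied by two pEC50 values: 10 ** (difference in pEC50)."""
    return 10 ** (pec50_high - pec50_low)

# pEC50 values quoted in the text for molecule 1 and molecule 37
ratio = fold_change(6.69, 4.22)
print(round(ratio))
```

A gap of 2.47 log units therefore corresponds to a potency ratio of roughly 295, in line with the reported EC50 range of 200 to 60,000 nM (a ratio of 300).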

The descriptor faccH4B represents the frequency of hydrogen atoms exactly four bonds from acceptor atoms. Because the descriptor has a negative correlation with pEC50, adding more hydrogen atoms four bonds from an acceptor atom may decrease the pEC50 value of these molecules. A hydrogen atom was excluded from the calculation of faccH4B if the same atom was simultaneously present two or three bonds from any acceptor atom (see Figure 4). Reducing the value of faccH4B from 14, as in molecule 37, to 12 is accompanied by an increase in pEC50 of about 2.47 units (about a 300-fold increase in SARS-CoV 229e 3CLpro inhibition). Because this descriptor has a negative coefficient in the generated models, the number of hydrogen atoms four bonds from an acceptor atom is a useful feature to exploit in SARS-CoV 229e 3CLpro lead/drug optimization. Because hydrogen is the smallest element, this suggests that the bulk in the vicinity of ring nitrogen atoms should be kept to a minimum. To enhance SARS-CoV 229e 3CLpro inhibition, steric bulk within four bonds of an acceptor atom should be reduced or avoided in future modifications.

The descriptor ringC_sp3N_2B represents the presence of sp3-hybridized nitrogen atoms within two bonds of ring carbon atoms. In the developed QSAR model, this descriptor has a negative coefficient; thus, an increase in the number of such sp3-hybridized nitrogen atoms should lower the pEC50 value of a molecule against SARS-CoV 229e 3CLpro. The low pEC50 values of molecules 34 (pEC50 = 4.40, ringC_sp3N_2B = 3), 35 (pEC50 = 4.35, ringC_sp3N_2B = 4), 23 (pEC50 = 5.05, ringC_sp3N_2B = 3), 25 (pEC50 = 5.05, ringC_sp3N_2B = 3), and 26 (pEC50 = 5.051, ringC_sp3N_2B = 1) may also be attributed to the high frequency of occurrence of such sp3-hybridized nitrogen atoms. In the present dataset, around 14 molecules have 1 to 3 such sp3-hybridized nitrogen atoms within 2 bonds of ring carbon atoms. Based on this analysis, it is reasonable to say that a close combination of such nitrogen atoms and ring carbon atoms should be avoided in future to obtain a higher pEC50 for SARS-CoV 229e 3CLpro inhibition. In contrast, molecules 1 (pEC50 = 6.69, ringC_sp3N_2B = 0), 2 (pEC50 = 6.3, ringC_sp3N_2B=), 3 (pEC50 = 6.2, ringC_sp3N_2B = 0), 7 (pEC50 = 5.74, ringC_sp3N_2B = 0), 8 (pEC50 = 5.74, ringC_sp3N_2B = 0), and 9 (pEC50 = 5.60, ringC_sp3N_2B = 0) lack such sp3-hybridized nitrogen atoms, which could be the reason for the higher activity of these molecules. The constituent molecular descriptors of the GA-MLR QSAR model have revealed both visible and hidden information about the structural features of the diverse set of molecules investigated for SARS-CoV 229e 3CLpro inhibition in the current QSAR study. It is essential to recognize that no single molecular descriptor can fully explain the observed EC50 distribution for such a diverse set of molecules. That is, the performance of the developed QSAR model depends on the simultaneous use of its constituent molecular descriptors.

The Supplementary Materials include SMILES notations, calculated molecular descriptor values, pEC50, and EC50 for the 100-compound in-house library used for virtual screening. For convenience, we have included the five most active and five least active hit molecules from the in-house library, as predicted by the developed QSAR model (see Figure 2).

Docking Analysis

SARS-CoV 3CLpro is a dimeric protein with three domains in each subunit. According to structure-based sequence alignment, 3CLpro has a large loop between β-strands C1 and D1. The C1-D1 loop of SARS-CoV 3CLpro holds the P2 side chain in the S2 hydrophobic pocket. 3CLpro enzymes share a similar substrate specificity, recognizing Gln as the P1 residue, a hydrophobic residue at the P2 position, and a small amino acid residue at the P1' position (see Figure 6). To reveal the binding mode and interactions, dataset molecule 4 and a known inhibitor, TG-0204998, were docked into the active binding pocket of SARS-CoV 229e 3CLpro in this study.

The substrate binding subsites are designated S1', S1, S2, S3, and S4, with conserved water molecules. The catalytic His-Cys dyad is located in the active site, in the cleft between domains I and II, whereas domain III participates in protease dimerization. The unsaturated ethyl ester of TG-0204998 occupies the S1 site, which is in close proximity to the catalytic center. Therefore, we selected the native binding site of the known inhibitor TG-0204998 as the active site in the docking protocol.

TG-0204998 is a peptidomimetic inhibitor of SARS-CoV 229e 3CLpro whose X-ray crystal structure was used to validate the docking protocol. The alignment of SARS-CoV 229e 3CLpro with TG-0204998 and molecule 4 is depicted in Figure 7, which indicates that the docking protocol is acceptable. Table 2 lists the docking scores for the 5 most active and 5 least active dataset molecules.

In this paper, we have identified a novel class of SARS-CoV 229e 3CLpro inhibitors by following a computer-aided drug design protocol. Our study involved the selection of a dataset of 37 structurally diverse compounds whose activity was modeled using QSAR. The developed QSAR model was then used to predict the biological activity of an in-house library of 100 diverse compounds. Subsequently, we docked all hundred compounds into SARS-CoV 229e 3CLpro. On the basis of the docking simulation results (docking score), we chose the 5 most active and 5 least active hit compounds, followed by molecular dynamics simulation and binding free energy calculations. 3CLpro. This implies that the docking results are in good agreement with the QSAR analysis.

Finally, the descriptor ringC_sp3N_2B gives an idea of the occurrence of sp3-hybridized nitrogen atoms within 2 bonds of ring carbon atoms. Molecule 4 does not have such a combination, which may also influence its binding affinity for SARS-CoV 229e 3CLpro. This shows that the QSAR analysis correctly identified hidden structural features decisive for SARS-CoV 229e 3CLpro inhibition. Table 3 shows the molecular docking scores for the six most active and six least active hit molecules. With a docking score of -8.043 kcal/mol and an RMSD of 1.53257, hit molecule 97 emerged as the most active of the 100 hit molecules. Its predicted pEC50 is 6.089. Although various hit molecules, such as 19, 6, 39, 91, and 38, showed strong predicted activity, they did not have good binding characteristics; consequently, hit molecule 97 was chosen as the most promising hit for the analysis. In molecule 97, the 2-oxo-ethylthio oxygen forms a hydrogen bond with the key amino acid residue His B:41, which is part of the catalytic dyad of SARS-CoV 3CLpro and an important residue in the S1 pocket (see Fig. 10). Further, the dimethoxy-substituted phenyl ring of the terminal side chain forms a pi-cation contact with the hydrophobic residue Ala: 1 of the S2 hydrophobic pocket. Next, the 4-oxoquinazoline oxygen atom binds to residue Thr B:7 through a water-mediated hydrogen bond. Moreover, the propanamide oxygen of the amide linkage joining the quinazoline and dimethoxyphenyl rings forms a hydrogen bond with the negatively charged residue Glu B:165, while the amide nitrogen forms a hydrogen bond with residue Asn B:141. The binding of hit molecule 97 in the S1 and S2 binding pockets of SARS-CoV 3CLpro explains its binding specificity.
Our docking results provide a structural basis for the optimization of hit molecule 97 and the development of potential candidates for antiviral therapies. On the other hand, the root mean square fluctuations of the respective amino acids of the C-α backbone of 2 displayed the least fluctuations, signifying a stable protein structure (Figure 13).

Root mean square fluctuation of C-α backbone of 229e (red) & Hcov_229e (green) at its respective amino acid residues for 100 ns simulation exhibiting a stable configuration.

Protein-ligand interactions can also be monitored throughout the simulation. These interactions are classified into four types: hydrogen bonds, hydrophobic, ionic, and water bridges. The interactions of Hit Molecule 97 with the binding-site residues of 229e, and of molecule 4 with the binding-site residues of Hcov_229e, showed the formation of non-bonded interactions such as hydrophobic interactions, as shown in Figure 14.

The radius of gyration (Rg) indicates the size and compactness of the protein in the ligand-bound state (Figure 12). The Rg plots of the Cα backbone of the 229e-Hit Molecule 97 (red) and Hcov_229e-complex molecule 4 (black) protein complexes in Figure 12 show significant compactness during the last 40 ns, with an average of 25.8 Å, indicating significant convergence. A lower Rg of 25.6 Å was observed for the complex molecule 4-bound 229e complex throughout the simulation (Figure S1, black; see Supplementary material). By contrast, a pronounced lowering of Rg was observed for the Hcov_229e-Hit Molecule 97 complex (Figure S1, red; see Supplementary material), which signifies comparatively less compactness and lower stability.
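For readers unfamiliar with the measure, the radius of gyration of a single frame is the mass-weighted root mean square distance of the atoms from their centre of mass. The sketch below is illustrative only and is not the routine used to produce the reported plots; `radius_of_gyration`, the input layout, and the toy values are assumptions made for the example.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples; masses: list of atomic masses.
    Both inputs are hypothetical stand-ins for Cα atoms of one frame.
    """
    total_mass = sum(masses)
    # Centre of mass of the selected atoms
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the centre of mass
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)

# Toy data (not from the study): two equal masses placed 2 units apart.
toy_coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_masses = [12.0, 12.0]
rg = radius_of_gyration(toy_coords, toy_masses)
print(rg)
```

Evaluating this function frame by frame over a trajectory gives an Rg time series of the kind shown in the Rg plots; a smaller value corresponds to a more compact arrangement of the selected atoms.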

We also recorded the average number of hydrogen bonds formed in the complexes Hcov_229e-Hit Molecule 97 (red) and 229e-complex molecule 4 (black) during the 100 ns simulation (Figure S2, see Supplementary material). The average number of hydrogen bonds was 1 for the Hcov_229e-Hit Molecule 97 complex (red) and 2 for the 229e-complex molecule 4 complex (black).

Figure S3 (see Supplementary material) presents a timeline of the interactions and contacts described above, showing the total number of specific contacts the protein makes with the ligand over the course of the trajectory. The residues that interact with the ligand are shown in the bottom panel of Figure S4 (Supplementary material). Some residues make more than one specific contact with the ligand, which is indicated by a darker shade of orange in the plot. Several ligand properties were also monitored. The ligand RMSD is measured with respect to the reference conformation. The radius of gyration measures the "extendedness" of the ligand and is equivalent to its principal moment of inertia. The intramolecular hydrogen bond count (intramolecular HB) is the number of internal hydrogen bonds within the ligand molecule.

The molecular surface area is calculated with a probe radius of 1.4 Å and is equivalent to the van der Waals surface area. The polar surface area (PSA) is the surface area contributed only by the oxygen and nitrogen atoms. A schematic diagram shows the detailed interactions of the ligand atoms with specific amino acid residues of the protein. Interactions that occur for 12.0% or more of the simulation time (0.00 to 100.00 ns) are reported. Some residues can form multiple interactions with the same ligand atom. From the 

MM-GBSA is a popular method for calculating the binding energy of a ligand to a protein. The binding free energy of each protein-ligand complex was estimated, together with the contributions of the individual non-bonded interaction energies. The average binding energies of dataset compound 4 (229e-complex4), the most active hit molecule 97 (229e-hit6), and the least active hit molecule 70 with SARS CoV-229E 3CLpro were found to be -32.2 ± 7.6, -53.81 ± 6.7, and -7.2 ± 3.4, respectively (Table 4). ΔGbind is influenced by various types of non-bonded interactions, including the ΔGbindCoulomb, ΔGbindCovalent, ΔGbindHbond, ΔGbindLipo, ΔGbindSolvGB, and ΔGbindvdW terms. Among these, the ΔGbindvdW, ΔGbindLipo, and ΔGbindCoulomb energies contributed most to the average binding energy. In contrast, the ΔGbindSolvGB and ΔGbindCovalent energies contributed least to the final average binding energies.
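For reference, the decomposition described above can be written compactly. The first relation is the standard MM-GBSA definition of the binding free energy. The second is a simplification that assumes the six components named in the text account for the whole of ΔGbind; the software may report further terms that are not mentioned here.

```latex
\Delta G_{\text{bind}} = G_{\text{complex}} - \left( G_{\text{protein}} + G_{\text{ligand}} \right)

\Delta G_{\text{bind}} \approx \Delta G_{\text{Coulomb}} + \Delta G_{\text{Covalent}} + \Delta G_{\text{Hbond}}
                        + \Delta G_{\text{Lipo}} + \Delta G_{\text{SolvGB}} + \Delta G_{\text{vdW}}
```

Under this sign convention a negative term favours binding and a positive term opposes it, so a more negative ΔGbind corresponds to stronger predicted binding.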

In addition, the ΔGbindHbond values of the hit molecule 97, dataset compound 4, and inactive molecule 70 protein complexes indicated stable hydrogen bonds with the amino acid residues. In all the complexes, ΔGbindSolvGB and ΔGbindCovalent made unfavorable energy contributions and thus opposed binding. Figure 18 shows that, between the pre-simulation (0 ns) and post-simulation (100 ns) states, dataset compound 4, the most active hit molecule 97, and the least active hit molecule 70 underwent a substantial angular movement of the pose (curved to straight) in the binding pocket of SARS CoV-229E 3CLpro. These conformational changes result in better accommodation in the binding pocket and better interaction with the residues, giving higher stability and a better binding energy.

Thus, the MM-GBSA calculations based on the MD simulation trajectories corroborated well with the binding energies calculated from the docking results. It can therefore be suggested that dataset compound 4 and the most active hit molecule 97 have good affinity for the major target SARS CoV-229E 3CLpro, whereas the least active hit molecule 70 displayed the weakest binding energy with SARS CoV-229E 3CLpro. The MM-GBSA trajectories showed the conformational changes that dataset compound 4, the most active hit molecule 97, and the least active hit molecule 70 undergo to achieve the best fit in the binding cavity of the protein.

In this paper, QSAR modelling, QSAR-based virtual screening, molecular docking, and MD simulation were used to identify a new molecule as a SARS-CoV 229e 3CLpro inhibitor. A GA-MLR based QSAR model with four descriptors was developed to understand the essential pharmacophoric features responsible for SARS-CoV 3CLpro inhibition. Following OECD guidelines, the QSAR model was assessed with both internal and external validation measures, and the validation parameters of the derived model are high. According to the present investigation, the pharmacophoric features fnotringNsp3C3B, faccH4B, com_lipohyd_3A, and ringC_sp3N_2B emerge as prominent determinants of SARS-CoV 3CLpro inhibition. In addition, QSAR-based virtual screening yielded a hit molecule with an improved pEC50 value, from 5.88 to 6.08. Furthermore, molecular docking of molecule 4 into SARS-CoV 3CLpro revealed the key pharmacophoric moieties involved in the binding interactions that are responsible for the inhibitory potential. The MD simulation and molecular docking analyses reveal that important pharmacophoric centers such as the benzene ring, phenyl ring, and amide oxygen and nitrogen play a vital role in forming hydrogen bonds and hydrophobic interactions with the key amino acid residues Ile164, Pro188, Leu190, Thr25, His41, Asn46, Thr47, Ser49, Asn189, Gln191, and Asn141. QSAR and molecular docking yielded consensus as well as complementary pharmacophoric features, which should be retained in future efforts to produce effective and selective SARS-CoV 3CLpro inhibitors. Finally, the high docking score of hit molecule 97 with SARS-CoV 3CLpro explains its higher affinity and opens a new avenue for a novel SARS-CoV 3CLpro inhibitor drug.


Conflict of Interest: The authors declare no conflict of interest.

The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad bin Saud Islamic University for funding this work through Research Group no-21-09-77. Authors Rahul D. Jawarkar and Magdi E.A. Zaki are thankful to Dr. Paola Gramatica for providing a free copy of QSARINS-2.2.4.