key: cord-0776926-3vf939lg authors: Khan, Anika Tajrian; Chowdhury, Golam Mahmud; Hafsah, Juwairiyah; Maruf, Md; Raihan, Md Riyad Hossen; Chowdhury, Md Talha; Nawal, Nafisa; Tasnim, Nishat; Saha, Pranto; Roy, Prottoy; Tabassum, Raya; Rodrigues, Souvick Patrick; Hasan, Walid; Samanta, Zarin Tasnim; Kamal, Suprio; Nazir, Md Shahoriar; Ali, Md Ackas; Halim, Mohammad A. title: A student led computational screening of peptide inhibitors against main protease of SARS‐CoV‐2 date: 2021-10-09 journal: Biochem Mol Biol Educ DOI: 10.1002/bmb.21580 sha: d8e21cfc76ae9a677ced70c28973d607001b91a4 doc_id: 776926 cord_uid: 3vf939lg The main protease of SARS‐CoV‐2 is a promising drug target due to its functional role as a catalytic dyad in mediating proteolysis during the viral life cycle. In this study, experimentally proven 14 HIV protease peptides were screened against the main protease of SARS‐CoV‐2. Fourteen middle and high school “student researchers” were trained on relevant computational tools, provided with necessary biological and chemical background and scientific article writing. They performed the primary screening via molecular docking and the best performing complexes were subjected to molecular dynamics simulations. Molecular docking revealed that HIP82 and HIP1079 can bind with the catalytic residues, however after molecular dynamics simulation only HIP1079 retained its interaction with the catalytic sites. The student researchers were also trained to write scientific article and were involved with drafting of the manuscript. This project provided the student researchers an insight into multi‐disciplinary research in biology and chemistry, inspired them about practical approaches of computational chemistry in solving a real‐world problem like a global pandemic. This project also serves as an example to introduce scientific inquiry, research methodology, critical thinking, scientific writing, and communication for high school students. researchers were also trained to write scientific article and were involved with drafting of the manuscript. This project provided the student researchers an insight into multi-disciplinary research in biology and chemistry, inspired them about practical approaches of computational chemistry in solving a realworld problem like a global pandemic. This project also serves as an example to introduce scientific inquiry, research methodology, critical thinking, scientific writing, and communication for high school students. Main protease, molecular dynamics, SARS-CoV-2, student research Emergence of COVID-19 occurred at the end of 2019 1 with a novel coronavirus strain, SARS-CoV-2, which WHO recognized as global pandemic shortly. To date, SARS-CoV-2 has affected 213 countries and territories around the globe and caused deaths more than 3 million of people. 2 SARS-CoV-2 is a single stranded RNA virus. A comprehensive genome sequencing and phylogenetic analysis of SARS-CoV and SARS-CoV-2 revealed that both species are member of betacoronavirus genus and share 82% of genomic identity. 3 The SARS-CoV-2 genome contains about 30,000 nucleotides. 1 Among them overlapping replicase polyprotein duo (pp1a [replicase 1a, $450 kD] 4 and pp1ab [replicase 1ab, $790 kD] 3 ) consist more than two-third of the viral genome, which mediate all of the functions required for viral replication and transcription. 5 The polyproteins get converted into functional proteins by proteolytic enzymes. One of the proteolytic enzymes of SARS-CoV-2 is SARS-CoV-2 main protease (SARS-CoV-2-M pro or M pro ) which is responsible for most of the proteolytic activities. It excises the polyprotein at not less than 11 cleavage sites. 5 SARS-CoV-2-M pro structurally functions as a homodimer and consists of three domains (I, II, III). 3 The substrate-binding site for this protease enzyme is located in a cleft between domain I and domain II 3 where CYS145 and HIS41 form a catalytic dyad which actively participates in proteolysis and has a high potential for inhibition with other compounds, as no third catalytic residue present. 6 SARS-CoV-2-M pro is highly similar to its SARS-CoV counterpart and shares regularity in the active site and functional mechanisms with related pathogenic beta coronaviruses i.e., MERS-CoV and SARS-CoV. 7 SARS-CoV-M pro plays critical role in the viral life cycle which makes this protease an attractive target for the drug developers. 8, 9 The chances of accidentally targeting host proteins are negligible since SARS-CoV-2 M pro does not have a human homolog. 3, 10 During this outbreak, traditional drug discovery is not a feasible option because of its lengthy procedure. There are other alternatives like drug repurposing, vaccination and immunotherapy to accelerate the discovery. Knowledge from the vaccination and immunotherapy might be helpful in designing peptides and peptidomimetic therapeutics. Peptides are preferable for their specificity and affinity toward certain targets and minimal toxicity because of their less augmentation tendency in the body. Over the last 5 years (2015-2019), 208 new drugs got an approval from the U.S. Food and Drug Administration among them 15 molecules were peptides or peptide-containing entity. 11 In recent times, computational drug discovery programs have achieved popularity and success due to its ability to predict potent therapeutic agents. To prevent the COVID-19 pandemic, so far, 18 vaccines are approved by at least one country and 36 vaccines are under phase 3 clinical trial. 12 Among the approved vaccines Sputnik-V is the first vaccine registered in Russia which critics claimed a premature vaccine because of its insufficient data of clinical trials, 13 but the concern was ruled out later by Ian Jones & Poly Roy declaring Sputnik-V as safe and effective vaccine. 14 Among the approved vaccines eight of them are inactivated vaccines, five are non-replicating viral vector vaccines, three are RNA vaccines, and two are proteinsubunit vaccines. In our study, we focus on targeting SARS-CoV-2-M pro with some putative peptide inhibitors. Some of the computational studies has already shown that peptide could be potentially utilized as SARS-Cov-2 inhibitors targeting spike protein. [15] [16] [17] In response to find an effective therapeutic, herein, we have targeted SARS-CoV-2-M pro with 14 peptides which were experimentally proved to be effective against HIV virus. To evaluate the performance of the peptides, we performed their molecular docking against SARS-CoV-M pro and molecular dynamics (MD) simulation of top two candidates. Computational chemistry and its applications are introduced in undergraduate curriculums and the details of it are mostly studied in graduate levels of disciplines in chemistry. However, its application spans the entire spectrum of research in biology, chemistry and engineering. This leaves out a significant portion of students from other fields, unaware of the methods and tools to investigate a phenomenon on a molecular level. Availability and affordability of powerful computers and related computational tools has made in silico chemical investigations practicable. Our aim was to introduce this field into high schoolers in hopes of elucidating the interdisciplinary nature of research in academia through an in silicoinvestigation on screening peptide-based inhibitors of the SARS-CoV-2-M pro . The students were registered online for a "Science Research Internship for Middle and High Schoolers Program 2020" project hosted by The Red-Green Research Centre (http://grc-bd.org/). The online coordinated project had 4 weeks of lectures and hands on training on every Friday for 3 h. The major training module contents (students assignments related to literature search, reading text, analyzing data, making presentation, and drafting manuscript over the weekdays are not included in the content) can be found in Table 1 . The trainers were available online via Skype for further consultation as required by individual students. Students were also engaged in virtual discussion with their peers and trainers via Skype. After the training period, students were assigned specific peptides for docking studies with M pro (Table S1 ). Molecular dynamics on the docked complexes were performed by the trainers. After the docking and MD studies, the students were assigned with writing the manuscript. They wrote the first draft of introduction, methods, results and discussion of docking studies. The trainers reviewed the student's drafts, wrote the results and discussions of MD simulations and concluded the manuscript. This project gave middle and high schoolers a firsthand experience on the application of computational tools in chemistry for screening drug molecules for combating SARS-CoV-2 virus, a global pandemic yet to be contained. They also got acquainted with writing a manuscript and submission requirements for a peer-reviewed scientific journal. Peptide is a compound containing two or more amino acids in which the carboxyl group of one acid is linked to the amino group of the other. In this research, we have worked with 14 peptides as ligands. The peptides were obtained from the HIV inhibitory peptides database (HIPdb) and they are known to inhibit HIV protease in vitro 18, 19 (Table S2) . We used the PEP-FOLD 3.5 20 server for peptide designing ( Figure 1 ). It is a de-novo approach aimed at predicting peptide structures from amino acid sequences. The number of simulations was set to 200 for better results and the models were sorted by sOPEP. It provided us with five models after running. We considered the "Model-1" as the best model and used it as ligand. Molecular docking is the structure-based virtual screening (VS) technique that is used for drug designing. Programs based on different algorithms use docking approaches that model the interaction between ligand molecule and protein and characterize the behavior of the ligand in the binding site of the protein targeted and forms a stable complex of potential efficacy and more specificity. The processes of docking involve two basic Practice on manuscript writing steps: ligand conformation and assessment of the binding affinity. There are different sampling methods and scoring schemes that are utilized to rank these conformations. Various sampling algorithms like matching algorithms, MCSS, LUDI, base their sampling on the molecular shape, or fragment attachment and also focus on chemical properties like hydrogen bonds and hydrophobic contacts for the enrichment of active compounds. The scoring functions involve estimating the binding affinity, adopting various assumptions and simplifications, and can be divided into various classes like fieldbased, empirical and knowledge-based functions. 21 For this research, a total of 14 peptides and the crystal structure of SARS-CoV-2-M pro (PDB id: 6Y2G) were obtained from HIPdb and RCSB Protein Data Bank, respectively. The peptides were docked to SARS-CoV-M pro using PatchDock 21 and further refinement was done through FireDock 22,23 ( Figure 2 ). The binding affinities of F I G U R E 1 Peptide modeling through PEP-FOLD 3.5 server. (a) Layout of PEP-FOLD server, input of amino acid sequence of peptide, and running the job (Step 1), (b) a running job (Step 2) and (c) job completion and downloading the peptide models (Step 3) the peptides were measured in kcal/mol units and sorted according to the higher negative values, which imply the best binding affinities. Based on calculated binding affinities, the four top-ranked potential peptides were chosen for further analysis. For visualizing and detecting the non-bond interactions in the docked ligand-protein complex, the educational version of PyMol, 24 and BIOVIA Discovery Studio software package 25 were utilized. F I G U R E 2 Peptide docking on protein and refining via PatchDock and FireDock server, respectively. Layout of PatchDock server, input of peptide and protein pdb files, and running the job (Step 1), a completed PatchDock job and proceeding to refining via FireDock (Step 2) and FireDock job completion and downloading the refined docked complexes (Step 3) In this study, the MD simulations were performed for docked anti-HIV peptide HIP1079 and HIP82 with the main protease of SARS-CoV-2, obtained from the highest binding affinity results. The YASARA dynamics suit 26 was used to evaluate conformational change and protein-peptide interaction (Figure 3a,b) . For each system, the total simulation time for each case was 25 ns MD simulations with 100 ps time step. The cuboid cell box of and periodic boundary surroundings were applied with long-range interactions behind the cut-off radius 8.0 Å regard for using the particle-mesh Ewald (PME) method. 27 AMBER 14 28 was considered for all MD simulations at constant F I G U R E 3 (a) Molecular dynamics in YASARA. Loading desired protein-peptide complex (PDB format) for MD simulation (step 1). Defining simulation cell resembling human body condition (Step 2). Creating YASARA scene file to prepare loaded file for MD run (Step 3). Initializing MD simulation with "md_run.mcr" macro file (Step 4). (b) Molecular dynamics simulation in YASARA (continued Step 5). Setting up macro files that analyzes and extracts results from the simulation job using md_analyze.mcr, md_analyzered.mcr and using md_convert.mcr to export the structures at different timepoints of the simulation as .pdb files. The results include data of root-mean-squaredeviations, solvent accessible surface area, radius of gyration, root-mean-square-fluctuations and other geometric parameters temperature (310 K) and pH (7.4) using the NVT ensemble. The 0.9% concentration of NaCl salt was used to neutralize the system where water molecules (density 0.997 g/ml) were considered. To analysis, time, root mean square deviation (RMSD values for backbone, alpha carbon, and all heavy atoms, solvent-accessible surface area (SASA), molecular surface area (MolSA), root mean square fluctuations (RMSF) were recorded. The BIOVIA Discovery Studio software package was used to analysis nonbonding interactions from the MD simulation results. Among the 14 selected peptide-M pro complexes obtained from FireDock, we chose four complexes according to their FireDock global energy score (Table S3) . HIP82-M pro complex showed the highest score, and the ligand rested at the desired pocket ( Figure 4) . Close view analysis revealed that HIP82 binds with one of the F I G U R E 3 (Continued) catalytic residues (His41; Figure 5 ). Second highest binding affinity holder was HIP1033-M pro complex. In HIP1033-Mpro complex, the ligand harbored away from the binding location (Figure 4b ). In between HIP1034-M pro and HIP1079-M pro , HIP1034 had slightly higher affinity toward protein entity but it binds to only one catalytic residue (Cys145) ( Figure 5 ). On the contrary, HIP1079 binds to both of the catalytic residues HIS41, CYS145 ( Figure 5 ). Based on the high scores and interaction with catalytic sites in docking studies, HIP82 and HIP1079 were then subjected to MD. The peptides HIP1079 and HIP82 were subjected to 25 ns MD. Molecular dynamics is a powerful tool to investigate dynamic structural information of macromolecules in atomic details and in context of a biological system. It gives us geometrical properties of the molecules under investigation, such as RMSD, SASA, radius of gyration (Rg), RMSF, and secondary structure elements (SSE). The chemical nature of ligand-macromolecule interactions gives us insights on residues that play key roles in desired binding. The RMSD value compares the atomic co-ordinate between a simulation step and the initial structure. A macromolecule in a biological system has a dynamic behavior. Upon binding to its substrate, initial structure may alter due to structural changes in the protein imposed by the substrate interactions and nature of solvent. These changes reflect on the protein stability, providing the basis for further analysis. In our study, HIP1079-M pro showed stable RMSD throughout the 25 ns MD with a mean of 1.494 ± 0.173 Å (Figure 6a ). HIP82-M pro complex showed a gradual increase in RMSD until 13 ns to 1.648 Å, followed by a jump above 2 Å and then a gradual decrease to 1.858 Å until 25 ns (Figure 6a ). It showed a mean 1.634 ± 0.34 Å in its (Figure 7) . However, the alpha helical structure is lost by HIP1079 before 5 ns while HIP82 retains it (Figure 7) . HIP1079-M pro complex retained a similar but slightly higher RMSD profile compared to the M pro . HIP82-M pro complex deviated from the M pro F I G U R E 8 Profiles of root-mean-square fluctuation (RMSF) of (a) peptide-M pro complexes and M pro and (b) peptides over 25 ns molecular dynamic (MD) simulation F I G U R E 9 M pro residues that show significant number of interactions with (a) HIP82 peptide; (b) HIP1079 peptide; M pro residues that interact with a peptide residue from (c) HIP82 peptide; (d) HIP1079 peptide during at least 90% of their 25 ns simulation period, i.e. shows temporal presence at least 90% of the simulation period. Interactions were observed at every 1 ns of their simulation period to observe number of interactions and presence in interaction, i.e. data sampling were done at 1 ns simulation intervals significantly after 14 ns and remained higher in its RMSD profile. The SASA denotes the MolSA of the complex that is accessible to solvent molecules. The HIP1079-M pro showed gradual increase in SASA throughout the simulation period with a mean 14,043 ± 207 Å 2 . Although an increasing trend is observed in HIP82-M pro complex until 12 ns where it reaches 14,361 Å 2 , it eventually decreases to a value of 13,771 Å 2 ; having a profile mean of 13,914 ± 177 Å 2 (Figure 6b) . Both complexes show a higher SASA compared to M pro , although HIP82-M pro complex was closer with the M pro SASA after 14 ns. The Rg is defined as the mass-weight root mean square distance of collection of atoms from the center of mass of the complex. It denotes the overall spherical dimension of the T A B L E 2 Details of non-bond interacting M pro residues with a minimum 90% temporal presence over the 25 ns molecular dynamics (MD) simulation (Figure 6c ). Rg of HIP82-M pro complex was lower than M pro throughout the simulation while the opposite was observed for HIP1079-M pro . The Rg also agrees with the SASA profile, indicating that HIP82 binding induced compactness of the complex and HIP1079 had opposite effects. The RMSF gives an idea on the extent of fluctuation of atomic co-ordinates throughout the whole simulation period-as opposed to RMSD, which only compares the deviation at a particular timestep with the initial atomic co-ordinates. The RMSF of protein residues in both complexes remained nearly identical (Figure 8a ) except for M pro residues 50-55, 137-142, 154, 165-171, which experienced higher RMSF in HIP1079-M pro compared to HIP82-M pro . HIP1079-M pro had similar protein RMSF profiles with M pro while HIP82-M pro differed only in the aforementioned residues with lower RMSF (Figure 6a ). The RMSF of the peptides reveal HIP1079 to be more stable than HIP82, having residues 4-6 under 1 Å= and 1-3 under 1.5 Å (Figure 8b ). The binding chemistry analysis reveals THR25, MET49, ASN142 and HIS41, CYS44, CYS145, HIS163, HIS164, MET165 and GLU166 to be residues having more than 25 interactions throughout the 25 ns MD in HIP82-M pro and HIP1079-M pro , respectively (Figure 9a, b) . However, in HIP82-M pro complex, only THR25 and MET49 had more than 90% temporal presence in interacting with a peptide residue (Figure 9c ). In HIP1079-M pro complex, HIS41, CYS44, CYS145, GLY146, HIS163, MET165 and GLU166 had more than 90% temporal presence in interactions with a peptide residue (Figure 9d ). Temporal persistence in bonding leads to greater stability. The peptide residues that participate in interaction with temporally persistent protein residues are listed in Table 2 . It is seen that during MD simulation, only HIP1079 peptide interacted with the catalytic residues, HIS41 and CYS145, throughout the simulation period. The dynamic profile of type of bonding between the peptide and M pro reveals that HIP1079 maintains a persistent number of H bonds and electrostatic interactions with varying hydrophobic interactions ( Figure 10 ). However, HIP82 does not have a stable H bond profile, particularly in the 10-15 ns time steps. This could be a reason why a jump in RMSD is observed in this timeframe due to lack of stabilizing H-bonds. Students joined in this program demonstrate four main learning outcomes: Attending lecture, reading assigned texts, articles, manual and participating group discussions with trainers and peers, students demonstrate the knowledge and understanding of the basic concepts of computational chemistry related to drug design, molecular docking, and MD. In addition, students learn basic biochemistry concepts relevant to amino acids, peptides, and protein structure and properties, and drug-protein binding and interaction. Searching scientific literatures, implementing appropriate research methods, complex data analysis, and interpretation to address the therapeutic questions for Covid-19 drug design, students reveal scientific reasoning and problem-solving skills. Managing research data and collaborating with team members, students reveal project-management skills, teamwork skills, and future science career preparation. Our study suggests that HIP82 and HIP1079 anti-viral HIV peptides are potential candidates for COVID-19 treatment. Molecular docking shows that HIP82 and HIP1079 perfectly fit in the binding pocket of SARS-CoV-2-M pro wherein they both bind to the catalytic residues. Molecular dynamic simulations, however, reveal that only in HIP1079-M pro complex, the peptide is bound to the catalytic sites HIS41 and CYS145. Since M pro is one of the conserved proteinases among the corona viruses, proposed candidates may effectively work against other corona viruses as well. Experimental study can confirm in-silico results, however, the logistics and costs for preparing multiple peptide molecules are significantly reduced through this screening study. Besides hands-on experiences in computer aided drug design, learning basic biochemistry concept, scientific inquiry, criticalthinking, problem-solving skills, teamwork and communication skills obtained from this research internship are very essential components to motivate high school students pursuing future science careers. A novel coronavirus from patients with pneumonia in China COVID-19) Dashboard j WHO Coronavirus (COVID-19) Dashboard With Vaccination Data Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved a-ketoamide inhibitors The spike protein of severe acute respiratory syndrome (SARS) is cleaved in virus infected Vero-E6 cells Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor Impact of early pandemic stage mutations on molecular dynamics of SARS-CoV-2 MPro SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design An overview of severe acute respiratory syndromecoronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy Hilgenfeld coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs Peptide therapeutics 2.0. Molecules Safety and immunogenicity of an rAd26 and rAd5 vector-based heterologous prime-boost COVID-19 vaccine in two formulations: two open, non-randomised phase 1/2 studies from Russia V COVID-19 vaccine candidate appears safe and effective Antiviral peptides as promising therapeutics against SARS-CoV-2 Identification of a potential peptide inhibitor of SARS-CoV-2 targeting its entry into the host cells Design of ACE2-based peptide inhibitors of SARS-CoV-2 HIPdb: a database of experimentally validated HIV inhibiting peptides The inhibition of human immunodeficiency virus proteases by 'interface peptides PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex Molecular docking: a powerful approach for structure-based drug discovery FireDock: a web server for fast interaction refinement in molecular docking Fast interaction refinement in molecular docking PyMod 2.0: improvements in protein sequence-structure analysis and homology modeling within PyMOL Discovery studio modeling environment New ways to boost molecular dynamics simulations Particle mesh Ewald: an NÁlog (N) method for Ewald sums in large systems The Amber biomolecular simulation programs A student led computational screening of peptide inhibitors against main protease of SARS-CoV-2 The authors are grateful to our donors who supported to build a computational platform (http://grc-bd.org/ donate/). The authors thank Ashutosh Nath, Atkiya Anjum Nishat, Fahmida Akhter, Fuad Taufiqul Hakim, Md Shahadat Hossain, Sadia Afrose Esha, Sumit Kumar Baral and Zannatul Ferdous Sonia for guiding and training students at the "Science Research Internship for Middle and High Schoolers Program 2020" event. The authors declare no conflict of interest.