key: cord-0902100-ff9n96yg authors: Etemadi, Ali; Moradi, Hamid Reza; Mohammadian, Farideh; Karimi-Jafari, Mohammad Hossein; Negahdari, Babak; Asgari, Yazdan; Mazloomi, Mohammadali title: Binder design for targeting SARS-CoV-2 spike protein: An in silico perspective date: 2021-11-26 journal: Gene Rep DOI: 10.1016/j.genrep.2021.101452 sha: 44b9d9319b5e6a3c7c24cfcd2b909f745c4972af doc_id: 902100 cord_uid: ff9n96yg INTRODUCTION: The COVID-19 pandemic is now affecting all people around the world and getting worse. New antiviral medications are desperately needed other than the few approved medications that have shown no promising efficacy so far. METHODS: Here we report three blocking binders for targeting SARS-CoV-2 spike protein to block the interaction between the spike protein on the SARS-CoV-2 and the angiotensin-converting enzyme 2 (ACE2) receptors, responsible for viral homing into the alveolar epithelium type II cells (AECII). RESULTS: The design process is based on the collected natural scaffolds and using Rosetta interface for designing the binders. CONCLUSION: Based on the structural analysis, three binders were selected, and the results showed that they might be promising as new therapeutic targets for blocking COVID-19. the only homing receptor recognized for the virus till date . 83% of the alveolar epithelium type II cells (AECII) express ACE2, and making the respiratory system the main site of injury. The virus not only uses ACE2 as the receptor but also diminishes its protective role against lung damage (5) . Also, the presence of AEC2 in other organs results in multi-organ dysfunction (6) . Reports indicate that 76.5% of the amino acid sequence of the S protein is the same between SARS-CoV-1 and SARS-CoV-2, besides their matching 3D structure of RBD in the spike protein. SARS-CoV-2 shows a higher binding affinity than SARS-CoV-1 which can be an explanatory factor for the fast spread of the virus around the world (7) . Owing to the designability and selectivity, protein therapeutics show great promise for the future. As intermediate, small proteins fill the gap between small molecules and antibodies due to their lower molecular weight, these small proteins are highly soluble and stable compared to antibodies (140KDa). However, they have some pitfalls such as lower serum half-life. Considering the advancement in drug delivery methods, it is possible to overcome these problems in the nearest future (8) . With the advent of high-throughput techniques such as in-silico design and experimental evaluation, it is possible to rapidly produce de novo binder proteins in a massively parallel way. This previously was thought o to be impossible (9) . Some inventions are being done on the binding of macrocyclic peptides for inhibiting the entry of the virus (10). On the other end of models, etc. Therefore, E.coli can be a superior host for the production of antivirals against SARS-CoV2 (14) . In this study, we aimed at designing some novel binders to block the spike glycoprotein (S protein) of the SARS-CoV-2. The design process (Figure 1 ) was based on collecting a bunch of scaffolds which are natural proteins for redesigning thereby making them modified binders for targeting the S protein. 2 Methods: In the first step, a list of natural proteins (Supplementary T1) was collected using the advanced search in RCSB (15) based on the following criteria: 1) resolution below 2 amgstroms, 2) X-ray diffraction method, 3) must be expressed in E.Coli, 4) sequence length below 100 residues, 5) monomeric proteins, and 6) no DNAs, RNAs, ligands, and mutations in proteins. These scaffolds are natural proteins and were used as initial proteins for making binders using docking, redesign, and other steps. To remove all clashes in scaffolds and make them more favourable for Rosetta (16) , all selected scaffolds were relaxed using the following script (/path/to/rosetta/main/source/bin/relax.hdf5.linuxgccrelease -database /path/to/rosetta/main/database -l PDBLIST -ignore_unrecognized_res -relax:constrain_relax_to_start_coords relax:coord_constrain_sidechains -relax:ramp_constraints false -ex1 -ex2 -use_input_sc -no_his_his_pairE -no_optH false -flip_HNQ). We used SARS-CoV-2 spike protein (PDB number 6M0J) (17) as a target for docking. Patchdock Molecular dynamics (MD) simulation was conducted using the GROMACS package (19) with an AMBER99SB-ILDN force field and TIP3P water model. Gromacs was used to analyze the overall RMSD and RMSF after 20ns MD simulation, and to analyze the protein conformational changes and stability under physiological conditions. 3 trajectories were analyzed for 50ns, 2fs per step. Also, MD simulations were done on the original target (6M0J) and promising scaffolds. A simulation box with at least 1nm box boundary to protein distance was used for all proteins. Periodic boundary conditions (PBC) and particle mesh Ewald (PME) methods were used for simulations to consider the long-range electrostatic interactions (20) . Energy minimization (using the steepest descent method) was done after adding ions (Cl− or Na+) to neutralize the total charge, followed by equilibration for 200ps under the NPT (constant number of particles, pressure, and temperature) ensemble. Position restraints and the temperature was coupled to 310 K using the velocity-rescaling thermostat. For all the designed binders and also the original scaffold and the target, root-mean-square deviation (RMSD), residue contact maps, the radius of gyration, and minimum distance were analyzed using gmx rms, gmx distance, gmx mdmat, gmx gyrate, and gmx mindist respectively. Based on these criteria the final top 10 designs were selected for 50ns MD simulation. Using patchdock, a type of blind docking was done based on 177 designed binders against the target. For this reason, binding sites were not provided and for each binder, 2000 outputs were checked. The top 10 docked complexes were selected and visually checked to determine which binders were optimal in finding the binding sites of interest . Simulteaneously, , ClusPro was used to check the binders docking against the target (21) . Weighted Scores in center representation were used to select the top 10 docked models in ClusPro. Also, the top 10 docked models were selected based on the Patchdock geometric shape complementarity score. To compare the binding scores of designed binders and SARS coronavirus spike receptorbinding domain against its receptor ACE2, ClusPro was used for docking of the ACE2 and Spike protein and was compared with the binding scores for all binders. All designed binders were examined for their characteristics. The MolProbity, VADAR 1.8 (22) , and 2Struc (23) tools were used for the structural evaluation and DSSP. ExPASy-ProtParam (24) was employed for the calculation of stability and the grand average of hydropathy (GRAVY) index. The aggregation potential of the proteins was assessed by Aggrescan (25) . To forecast antigenic epitopes of the proteins, different algorithms were used from the IEDB web server. Analysis on antigenicity and surface accessibility were performed using Emini methods (27) by Kolaskar and Tongaonkar (26) . Enfuvirtide (T20), an approved antiviral drug, was used for positive control. Evaluation of structural parameters with MolProbity was performed without any Hydrogen addition and for each PDB file, MolProbity scores were recorded. In VADAR 1.8, the Shrake method (28) was utilized for the values of Van der Waals radii and the definition of polar/nonpolar accessible surface area (ASA) and charged ASA. This method did not consider a uniform radius (1.8 Å) for all side-chain atoms and any hetero-atom attached to carbon was considered polar. Also, the Voronoi procedure was chosen for volume calculation (29) . The protein structure coordination J o u r n a l P r e -p r o o f files were examined in the main chain, stereo/packing quality, and 3D profile quality indexes (30) . Default thresholds were considered for every IEDB measuring tool (31) . Fragment quality application was used in robbeta to pick the best 3-and 9-mer fragments scores and used to predict 3D protein structures based on comparative modeling in Robetta (32) . PyMol was used for structure visualization(33). 3 Result: Patchdock and Cluspro were applied for global or blind docking without providing any binding sites. 167 binders from the outputs of RosettaDesign were selected and docked against the target. Figure 2 shows the results for all docked designs against the target using ClusPro and Patchdock. From the top 10 binders selected from both Patchdock and Cluspro, two (3HGL_1_78_2_1_78 or BIN78 and 3HGL_1_45_1_1_32 or BIN32) were found to be the same in both Patchdock and Cluspro outputs and were highly promising. These binders were bound to the same binding site ofinterest. Also, the binding sites were the same as the site targeted by ClusPro and PatchDock models were 0.263 and 0.393, respectively ( Figure 3A ). Both ClusPro and Patchdock predicted that the BIN78 bound to the desired pocket on the target with minimum RMSD. A closer look at BIN78(figure 3B), revealed that two main helices interacted with the main groove on the target. There were two π-π interactions between the BIN78 and the target: F28 with F202/236 and also H7 with Y251. Ddg scores for BIN32 and BIN78 were -43.998 and -41.691. Table 1 shows some predicted scores for all selected binders. To check whether our designed binders had enough binding scores, we compared the weighted scores in ClusPro for designed binders BIN78, BIN32, BIN91, and SARS coronavirus spike receptor-binding domain complexed with its receptor (PDB number 6M0J). The weighted J o u r n a l P r e -p r o o f scores for BIN78, BIN32, BIN91 were -747.3, -814, and -771.8, respectively. However, this score for the complex of S protein and ACE2 was -719.6. The bite signals or highly frequent residues in all sequences were 30 residues (out of 78 residues) Figure 4 shows the Weblogo profile of all selected binders. Rosetta was used for the comparative modeling of all selected binders. The results of the modeling showed that BIN91 was highly promising in predicting the 3-dimensional structure. The RMSD of BIN32, BIN91, and BIN78 with native scaffolds (3HGL) were 2.81, 1.97, and 2.88 A, respectively ( Figure 5 ). To investigate the conformational behavior of the binder's models and native scaffold, MD simulation was used during the 50ns simulation ( figure 6 ). The minimum and maximum RMSD score for native scaffolds and the binders were close and the traces for the binders were followed by the traces for native scaffolds, although the RMSD for BIN91 was a little different from the others. The radius of gyration (Rg) was analyzed to check protein structure compactness and stability of the binders, and also the target and native scaffolds. The Rg values for the binders in the binding form with the target were similar and larger than the target and the native scaffold alone. The contact maps for all three binders also showed the similarity between the binders. Likewise, minimum distances between the atoms in all binders and the native scaffold were analyzed and showed that the binders and native scaffolds were close to each other. The results for distances between the specified residues (77-195, 19 Structural analysis was done on all 177 designed proteins. Since there was no perfection, each protein had some pros and cons. Therefore, by comparing their characteristics, with each other and with the main 3HGL protein, the 10 top proteins with enhanced or comparable parameters were selected ( Inferentially, this implied there was no problem in the local environment, packing, and hydrophobic energy of the structures. Among the three final candidates, BIN78 had a better stereo/packing quality index and overall better performance in this section. Enfuvirtide PDB structure as positive control showed a low MolProbity score and many problems were detected with its 3D profile quality index. Ramachandran plots produced by VADAR in Figure 8 indicates that, for BIN32 and BIN91, 92% of φ and ψ angles are in the core region, and 7% are in the allowed region. Additionally, 93% of BIN78 φ and ψ angles are in the core region, and 5% are located in the allowed region. For Enfuvirtide, 90% of residue angles are in the core region, and only 9% belong to the allowed region. No φ and ψ angle outliers were detected in any structure. It can be inferred from their data that there is a high probability for proteins with a negative value of GRAVY score to be soluble, and proteins with a positive value to be membranous (36) . As shown in Figure 7C , the GRAVY index value of a large number of proteins is at the boundary and above 0. The GRAVY score of top binders and controls are represented in Table 2 shown that there are some dipeptides (the smallest unit of order) in the protein's sequence which defines the stability (38) . The instability index of all designed proteins is plotted in Figure 7D . Although it has been stated that proteins with an instability index below 40 are considered to be stable, the main 3HGL protein had an instability index of 59.03. Another important aspect of choosing these candidates is the aggregation potential of the molecules. For this purpose, the normalized parameters such as global aggregation propensity had -16 Na4vSS, which was one of the lowest among the proteins indicating its low aggregating potential ( Figure 7B ). Considering the data evaluated from the analysis of datasets by the original article, proteins with less than -5.18 for Na4vSS, 0.9 THSAr, and 3.35 NnHS parameters remained soluble when they are over-expressed in E.coli. Hence, the designed protein pool was filtered by these criteria (26) . The top ten selected protein designs all had a value below -10 for Na4vSS, except BIN32 and BIN78 for which a different selection procedure was performed ( Table 2 ). BIN97 and BIN78 had one, while BIN32 had two additional number of hotspots in the same range of sequence compared to the main 3HGL protein (Figure 8 ). Compared to Enfuvirtide, all three top designs had an advantage in THSAr and NnHS, but when Na4vSS was considered, Enfuvirtide had a low value (-20), indicating a low aggregation propensity of this small protein. The final major selection parameter was the antigenicity of the proteins. The average, minimum, and maximum scores based on Kolaskar and Tongaonkar and Emini methods are presented in Table 3 . Kolaskar and Tongaonkar with 75% accuracy, is a semi-empirical method for locating the antigenic determinants in the sequence. It uses the physicochemical properties of residues and their frequencies in experimentally determined epitopes (27) . As Figure 8 shows starts from around the 7th to 15th amino acid position. They also had another position adjacent to the first one, which starts from around the 21st to the 30th residue. All three top binders also had a third spot which was above the threshold, but it was not predicted to be antigenic in BIN78 by the webservers' algorithm. The calculation was based on the surface accessibility scale on a product instead of addition within the window. The Emini method used the formula Sn = (n+4+i) (0.37) -6 which gave the probability for a hexapeptide sequence to be on the surface. Sn greater than 1.0 shows a higher probability for a sequence to reside on the surface of the protein (28) . Results from Emini demonstrated that most parts of the antigenic sequences were not exposed at the surface. BIN91 had the least average (1.028) score between the three top binders, while the other two both had an average score of 1.046. The two ends of the Enfuvirtide sequence were the most antigenic parts but the Emini method did not consider them as exposed to the surface ( Figure 8 ). To date, no drugs have been approved for the treatment of the CoVID-19. Remdesivir is the sole antiviral drug that is recommended for patients with mild disease. However, it is on limited supply (37) , and the attainment of herd immunity by vaccination may take at least a year while the infection rate of SARS-CoV-2 keeps increasing rapidly. Despite this, the effective range of the vaccines remains inconclusive based on some reports (38) . AvrPtoB a pseudomonas syringae pv tomato multi-domain effector protein, which plays a role in interrupting the immune responses of tomato plants by inhibiting the programmed cell death and promotes disease (39) . To elucidate the crystal structure, Dong expressed a portion of this protein (AvrPtoB121-205 Or 3HGL) that was stable and sufficient for the interaction with its target in E.coli (40) . In this study, 3HGL was used as a scaffold with the interface of its surfaces redesigned using RossetaDesign (41) . This method was applied by different groups for making diagnostic and therapeutic proteins (42) . Targeting the S protein with inhibitors is one of the prominent ways of developing therapeutics (43) . Not only do they inhibit the attachment of RBD to ACE2 and halts the entry of the virus, but they also preserve the ACE2. Consequently, they can J o u r n a l P r e -p r o o f function as regulators of the renin-angiotensin-aldosterone system, and protect the lung from injury by their protease function against angiotensin (Ang) 1. By attaching to the products of the ACE2, Ang 1-7, and Mas pathway they exert their anti-inflammatory, and anti-proliferative effects. Furthermore, they increase oxygenation and reduce the hypersensitivity of the airway and downregulate the apoptosis of alveolar epithelial cells. However, ACE2 is not the only enzyme that participates in the production of angiotensin. A week later, the level of this peptide gets back to normal, but in the short run, it could be quite damaging (44) . The results of ClusPro docking scored for selected binders and also spike protein against the receptor ACE2, showed that our binders are promising and might be useful for targeting the virus as it bound to the target stronger than the spike protein. As can be seen from Figure 5B , the substitution of caline (residue 76 in 3HGL) with aspartate (in BIN91) or threonine (in BIN32) increased the stability of the protein which may partly be due to the arginine in the 77 th position of the sequence. In BIN78 substitution of valine with leucine (a methyl added as the bridge in side chain) did not change the instability index (58.28) considerably. indicating the limited half-life of this binder. Guruprasad's method, however, does not consider the higher-order factors that may affect the results (45) . Buried charges may impact the folding of proteins and their stability because it is not so energetically favorable to bury a charge inside the protein structure. There are some regions of the binders that are considered antigenic based on their sequence. Apart from the structural standpoint, other factors may contribute to the immunogenicity of the proteins like administration dose, excipient formulation, host proteins, patients' immune/genetic background, and the aggregation of the proteins (46). The Aggrescan webserver predicted that BIN91 had the least aggregating propensity, but these predictions were based on the sequence and the folded protein may take a structure that increases or decreases the aggregation potential. Hiding some of the regions during the folding may vary the characteristics of the protein. That notwithstanding, it is possible to address such challenges with drug delivery methods (47). Step J o u r n a l P r e -p r o o f Covid-19 testing strategy of India-Current status and the way forward Strong Social Distancing Measures In The United States Reduced The COVID-19 Growth Rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed COVID-19 cases across the United States Natural History of COVID-19 and Current Knowledge on Treatment Therapeutic Options Photobiomodulation and antiviral photodynamic therapy as a possible novel approach in COVID-19 management Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus Role and mechanism of angiotensin-converting enzyme 2 in acute lung injury in coronavirus disease 2019. Chronic Diseases and Translational Medicine Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission Challenges and opportunities for non-antibody scaffold drugs. Drug discovery today Interaction of certain monoterpenoid hydrocarbons with the receptor binding domain of 2019 novel coronavirus (2019-nCoV), transmembrane serine protease 2 (TMPRSS2), cathepsin B, and cathepsin L (CatB/L) and their pharmacokinetic properties Learning from the past: possible urgent prevention and treatment options for severe acute respiratory infections caused by 2019-nCoV A Genetically Encoded, Phage-Displayed Cyclic-Peptide Library Recombinant protein expression in Escherichia coli: advances and challenges Metabolic engineering of Escherichia coli for natural product biosynthesis The protein data bank A Pareto-optimal refinement method for protein design scaffolds Structural basis of receptor recognition by SARS-CoV-2 PatchDock and SymmDock: servers for rigid and symmetric docking GROMACS: a message-passing parallel molecular dynamics implementation Particle mesh Ewald: An N⋅ log (N) method for Ewald sums in large systems The ClusPro web server for protein-protein docking VADAR: a web server for quantitative evaluation of protein structure quality 2Struc: the secondary structure server Protein identification and analysis tools on the ExPASy server. The proteomics protocols handbook AGGRESCAN: a server for the prediction and evaluation of" hot spots" of aggregation in polypeptides A semi-empirical method for prediction of antigenic determinants on protein antigens Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide Environment and exposure to solvent of protein atoms. Lysozyme and insulin Areas, volumes, packing, and protein structure Assessment of protein models with three-dimensional profiles Stereochemical quality of protein structure coordinates Protein structure prediction and analysis using the Robetta server The PyMOL Molecular Graphics System, Version 1.8. Schrödinger LLC Approved antiviral drugs over the past 50 years The interpretation of protein structures: total volume, group volume distributions and packing density A simple method for displaying the hydropathic character of a protein Hydroxychloroquine and Remdesivir in COVID-19: A critical analysis of recent events Some Recovered Coronavirus Patients In Wuhan Are Testing Positive Again. NPR Goats and Soda Mystery in Wuhan: recovered coronavirus patients test negative then positive. National Public Radio: News and Analysis Effector-triggered immunity mediated by the Pto kinase Crystal structure of the complex between Pseudomonas effector AvrPtoB and the tomato Pto kinase reveals both a shared and a unique interface compared with AvrPto-Pto Computational protein design for given backbone: recent progresses in general method-related aspects. Current opinion in structural biology Redesigned HIV antibodies exhibit enhanced neutralizing potency and breadth Much More Than Just a Receptor for SARS-COV-2 Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence Supplementary T1: A list of selected scaffolds collected from RCSB for binder design Declared none. Dr. Ali Etemadi, Dr.Farideh Mohammadian, and Dr. Babak Negahdari: conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript.Hamid Reza Moradi and Dr.Mohammad Hossein Karimi-Jafari:: Designed the data collection instruments, collected data, carried out the initial analyses, and reviewed and revised the manuscript.Dr. Yazdan Asgari and Dr.Mohammadali Mazloomi: Coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content. None.