key: cord-193133-puqcbf8t authors: Piplani, Sakshi; Singh, Puneet Kumar; Winkler, David A.; Petrovsky, Nikolai title: In silico comparison of spike protein-ACE2 binding affinities across species; significance for the possible origin of the SARS-CoV-2 virus date: 2020-05-13 journal: nan DOI: nan sha: doc_id: 193133 cord_uid: puqcbf8t The devastating impact of the COVID19 pandemic caused by SARS coronavirus 2 (SARSCoV2) has raised important questions on the origins of this virus, the mechanisms of any zoonotic transfer from exotic animals to humans, whether companion animals or those used for commercial purposes can act as reservoirs for infection, and the reasons for the large variations in susceptibilities across animal species. Traditional lab-based methods will ultimately answer many of these questions but take considerable time. In silico modeling methods provide the opportunity to rapidly generate information on newly emerged pathogens to aid countermeasure development and also to predict potential future behaviors. We used a structural homology modeling approach to characterize the SARSCoV2 spike protein and predict its binding strength to the human ACE2 receptor. We then explored the possible transmission path by which SARSCoV2 might have crossed to humans by constructing models of ACE2 receptors of relevant species, and calculating the binding energy of SARSCoV2 spike protein to each. Notably, SARSCoV2 spike protein had the highest overall binding energy for human ACE2, greater than all the other tested species including bat, the postulated source of the virus. This indicates that SARSCoV2 is a highly adapted human pathogen. Of the species studied, the next highest binding affinity after human was pangolin, which is most likely explained by a process of convergent evolution. Binding of SARSCoV2 for dog and cat ACE2 was similar to affinity for bat ACE2, all being lower than for human ACE2, and is consistent with only occasional observations of infections of these domestic animals. Overall, the data indicates that SARSCoV2 is uniquely adapted to infect humans, raising questions as to whether it arose in nature by a rare chance event or whether its origins lie elsewhere. The devastating impact of COVID-19 infections caused by SARS-coronavirus 2 (SARS-CoV-2) has stimulated unprecedented international activity to discover effective vaccines and drugs for this and other pathogenic coronaviruses. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] It has also raised important questions on the mechanisms of zoonotic transfer of viruses from animals to humans, questions as to whether companion animals or those used for commercial purposes can act as reservoirs for infection, and the reasons for the large variations in SARS-CoV-2 susceptibility across animal species. [17] [18] [19] Understanding how viruses move between species may help us prevent or minimize these pathways in the future. Elucidating the molecular basis for the different susceptibilities of species may also shed light on the differences in susceptibilities in different sub-groups of humans. Very recently, Shi et al. published the results of experiments to determine the susceptibility to SARS-CoV-2 of ferrets, cats, dogs, and other domesticated animals. 20 They showed that SARS-CoV-2 virus replicates poorly in dogs, pigs, chickens, and ducks, but ferrets and cats are permissive to infection. Other studies have reported the susceptibility of other animal species to SARS-CoV-2. 17, 20, 21 Susceptible species such as macaques, hamsters and ferrets are used as animal models of SARS-CoV-2 infection. [22] [23] [24] In the absence of purified, isolated ACE2 from all the relevant animal species that could be used to measure the molecular affinities to spike protein experimentally, computational methods offer considerable promise for determining the rank order of affinities across species, as a method to impute which species may be permissive to SARS-CoV-2. Here we show how computational chemistry methods from structure-based drug design can be used to determine the relative binding affinities of the SARS-CoV-2 spike protein for its receptor, angiotensin converting enzyme (ACE)-2, a critical initiating event for SARS-CoV-2 infection, across multiple common and exotic animal species. [25] [26] [27] The aim of these studies was to better understand the species-specific nature of this interaction and see if this could help elucidate the origin of SARS-CoV-2 and the mechanisms for its zoonotic transmission. To construct the three-dimensional structure of the SARS-CoV-2 spike protein, the sequence was retrieved from NCBI Genbank Database (accession number YP_009724390.1). A PSI-BLAST search against the PDB Database for template selection was performed and the X-ray structure of SARS coronavirus spike template (refcode 6ACC) was selected with 76.4% sequence similarity to SARS-CoV-2 spike protein. The protein sequences of the ACE2 proteins for different species is summarized in Table 11 and full sequence alignment in Supplementary Figure 1 . The phylogenetic tree for ACE2 proteins from selected animal species is illustrated in Supplementary Figure 2 . The 3D-structures were built using Modeller 9.21 (https://salilab.org/modeller/). 28 The quality of the generated models was evaluated using the GA341 score and DOPE scores, and the models assessed using SWISS-MODEL structure assessment server (https://swissmodel.expasy.org/assess). 29 The X-ray crystal structures of human ACE2 (recode 3SCI) and human SARS spike protein (refcode 5XLR) were retrieved from Protein Data Bank. Protein preparation and removal of nonessential and non-bridging water molecules for docking studies were performed using the UCSF Chimera package (https://www.cgl.ucsf.edu/chimera/). 30 These modelled structures were docked against SARS-CoV-2 spike protein structure using the HDOCK server (http://hdock.phys.hust.edu.cn/). 31, 32 Molecular docking was performed on the homology modelled SARS-CoV-2 spike protein with human and animal ACE2 proteins. The SARS-CoV spike protein was also docked with human ACE2 protein to obtain the docking pose for binding energy calculations. The docking poses were ranked using an energy-based scoring function. The docked structures were analyzed using UCSF Chimera software. The final models were optimized using the AMBER99SB-ILDN force field in Gromacs2020 (http://www.gromacs.org/). 33 Docked complexes (SARS-CoV-2 spike with human ACE2, human SARS-CoV spike with human ACE2, SARS-CoV-2 spike with bat ACE2 etc) were used as starting geometries for MD simulations. Simulations were carried out using the GPU accelerated version of the program with the AMBER99SB-ILDN force field I periodic boundary conditions on an Oracle Cloud Server. Docked complexes were immersed in a truncated octahedron box of TIP3P water molecules. The solvated box was further neutralized with Na+ or Cl− counter ions using the tleap program. Particle Mesh Ewald (PME) was employed to calculate the long-range electrostatic interactions. The cutoff distance for the long-range van der Waals (VDW) energy term was 12.0 Å. The whole system was minimized without any restraint. The above steps applied 2500 cycles of steepest descent minimization followed by 5000 cycles of conjugate gradient minimization. After system optimization, the MD simulations was initiated by gradually heating each system in the NVT ensemble from 0 to 300 K for 50 ps using a Langevin thermostat with a coupling coefficient of 1.0/ps and with a force constant of 2.0 kcal/mol·Å2 on the complex. Finally, a production run of 100 ns of MD simulation was performed under a constant temperature of 300 K in the NPT ensemble with periodic boundary conditions for each system. During the MD procedure, the SHAKE algorithm was applied for the constraint of all covalent bonds involving hydrogen atoms. The time step was set to 2 fs. The structural stability of the complex was monitored by the RMSD and RMSF values of the backbone atoms of the entire protein. Finally, the free energies of binding were calculated for all the simulated docked structures. Calculations were also performed for up to 500 ns on human ACE2 to ensure that 100ns is sufficiently long for convergence. Duplicate production runs starting with different random seeds were also run to allow estimates of binding energy uncertainties to be determined for the strongest binding ACE2 structures. The binding free energies of the protein-protein complexes were evaluated in two ways. The traditional method is to calculate the energies of solvated SARS-CoV-2 spike and ACE2 proteins and that of the bound complex proteins and derive the binding energy by subtraction. ΔG (binding, aq) = ΔG (complex, aq) -(ΔG (spike, aq) + ΔG (ACE2, aq) We also calculated binding energies using the molecular mechanics Poisson Boltzmann surface area (MM-PBSA) tool in GROMACS that is derived from the nonbonded interaction energies of the complex. 34, 35 The method is also widely used method for binding free energy calculations. The binding free energies of the protein complexes were analyzed during equilibrium phase from the output files of 100 ns MD simulations. The g_mmpbsa tool in GROMACS was used after molecular dynamics simulations, the output files obtained were used to post-process binding free Free energy decomposition analyses were also performed by MM-PBSA decomposition to get a detailed insight into the interactions between the ligand and each residue in the binding site. The binding interaction of each ligand-residue pair includes three terms: the van der Waals contribution, the electrostatic contribution, and the solvation contribution. Another estimate of the strength of the interaction between protein-protein complex can be obtained from the non-bonded interaction energy between the complex. GROMACS has the ability to decompose the short-range nonbonded energies between any number of defined groups. To compute the interaction energies as a part of our analysis, we reran the trajectory files obtained during simulation to recompute energies using -rerun command. The interaction energy is the combination of short range Coulombic interaction energy (Coul-SR:Protein-Protein) and the short-range Lennard-Jones energy (LJ-SR:Protein-Protein (see Table 3 ). While this paper was being prepared, a paper by Guterres and Im described a substantial improvement in protein-ligand docking results using high-throughput MD simulations. 36 They employed docking using AutoDock Vina, followed by MD simulation using CHARMM. The parameters they advocated were very similar to those used in our study. Proteins were solvated in a box of TIP3P water molecules extending 10 Å beyond the proteins and the particle-mesh Ewald method was used for electrostatic interactions. Nonbonded interactions over 10 and 12 Å were truncated. Their systems were minimized for 5000 steps using the steepest descent method followed by 1 ns equilibration with an NVT setting. For each protein-ligand complex, they ran 3 × 100 ns production runs from the same initial structure using different initial velocity random seeds and an integration step size of 2 fs. The ancestry of SARS-CoV-2 traces back to the human, civet and bat SARS-CoV strains, which all use the same ACE2 proteins for cellular entry. [37] [38] [39] The similarities and variations in sequences for both the SARS-CoV-2 spike protein and the SARS spike protein were determined from sequences retrieved from NCBI GenBank Databank and aligned using CLUSTALW. The spike protein receptor binding domain (RBD) region showed a 72% identity between the two viruses ( Figure 1 ). conserved regions in black and non-conserved residues in red and blue. The three-dimensional structure of SARS-CoV-2 Spike protein ( Figure The Ramachandran scores of the modeled ACE2 structures for selected species are summarized in Table 1 with the actual predicted ACE2 structures and Ramachandran plots for each selected species shown in Supplementary Figure 3 . Figure 3) . The modelled structures were further assessed for quality control using Ramachandran Plot and molprobity scores in SWISSModel structure assessment. The Ramachandran Plot checks the stereochemical quality of a protein by analyzing residue-by-residue geometry and overall structure geometry, is also a way to visualize energetically allowed regions for backbone dihedral angles ψ against φ of amino acid residues in protein structure. The Ramachandran score of SARS-CoV-2 spike protein was 90% in the binding region and molprobity scores that provide an evaluation of model quality at both the global and local level was 3.17. Further, the ACE2 modelled structures from selected species were also assessed using Swiss model structure assessment. Ramachandran score or the percentage of amino acid residues falling into the energetically favored region for all species ranged from 96-99% (Table 1) . These Ramachandran and Molprobity scores show that all the built structures were of good quality and were suitable for use in further studies. The Ramachandran graphs are presented in Supplementary Figure 3 . The Receptor Binding Domain (RBD) of SARS-CoV-2 spike protein was docked against ACE2 receptor of various species using HDOCK server. The interacting residues of ACE2 and SARS-CoV-2 spike protein are depicted in Table 3 . We found certain key amino acids in the receptor binding motif (RBM) that were in accordance with previous studies 40 . Certain amino acids including PHE28, ASN330, ASP355 and ARG357 were conserved in ACE2 of most of the selected species and were observed to take part in the interaction with spike protein. TYR41, LYS353, ALA386 and ARG393 also interacted with spike protein residues and were highly conserved across all species except bat, mouse, ferret and pangolin, respectively. Spike protein interacting residues with Ophiophagus hannah (King cobra) were least common amongst all the ACE2 species included in the study, consistent with its low sequence similarity to human ACE2. Homo sapiens (Human) The molecular dynamics simulation of complexes of SARS-CoV-2 spike protein and ACE2 receptors of various species were performed for 100ns. All complexes became stable during simulation with RMSD fluctuations converging to a range of 0.5 to 0.8 nm from the original position. The calculated binding energies for the interactions of SARS-CoV-2 with ACE2 from the species studied are presented in Table 4 . The Table 4 also includes observational in vivo data on SARS-CoV-2 infectivity and disease symptoms in the species where this has been reported. Although bats carry many coronaviruses including SARS-CoV, a relative of SARS-CoV-2, direct evidence for existence of SARS-CoV-2 in bats has not been found. As highlighted by our data, the binding strength of SARS-CoV-2 for bat ACE2 is considerably lower than for human ACE2, suggesting that even if SARS-CoV-2 did originally arise from a bat precursor it must later have adapted its spike protein to optimise its binding to human ACE2. There is no current explanation for how, when or where this might have happened. Instances of direct human infection by coronaviruses or other bat viruses is rare with transmission typically involving an intermediate host. For example, lyssaviruses such as Hendra are periodically transmitted from bats to horses and then to humans who contact the infected horse. Similarly, SARS-CoV was shown to be transmitted from bats to civet cats and from them to humans. To date, a virus identical to SARS-CoV-2 has not been identified in bats or any other non-human species, making its origins unclear. To date, the most closely related coronavirus to SARS-CoV-2, is the bat coronavirus, BatCoV RaTG1, which has 96% whole-genome identity to SARS-CoV-2. 50 Hence other genetic factors could underlie the apparent lack of susceptibility of dogs to COVID-19 clinical infection. It is known that gain of function (GOF) mutations occur in viruses that can lead to pandemics. GOF means viruses gain a new property e.g. in influenza virus GOF has been associated with the acquisition of a new function, such as mammalian transmissibility, increased virulence for humans, or evasion of existing host immunity. 49 The conditioning of viruses to humans as pandemics progress is well recognized. However, the SARS-CoV-2 structures and sequences that we employed were from viruses collected very early in the pandemic. It is therefore not clear how SARS-CoV-2 could have developed such a high affinity for human ACE2, notably higher than for those of putative zoonotic sources for SARS-CoV-2, unless it has been previously selected on human ACE2 or an ACE2 of another species bearing a closely homologous spike protein binding domain. Interestingly, pangolin ACE2 bears some similarities in its SBD to human ACE2. This marries with the fact that Pangolin-CoV shares a highly similar RBD to SARS-CoV-2, although their remaining sequence has only 90% similarity. This could be consistent with a process of convergent evolution whereby human and pangolin coronaviruses infecting via ACE2, have come to the same solution in respect of evolving an optimal spike RBD for binding of either human or pangolin ACE2, respectively. Our data does indicate that humans might be permissive to pangolin CoVs that use ACE2 for cell entry, a fact that needs to be borne in mind in respect of future potential coronavirus pandemic sources. However, this does not mean that pangolin ACE2 was the receptor on which the SARS-CoV-2 spike protein RBD was initially selected, with the strength of binding to pangolin ACE2 lower than binding to human ACE2. This makes it unlikely that pangolins are the missing intermediate host. If SARS-CoV-2 spike was selected on pangolin ACE2, then given the higher affinity of SARS-CoV-2 for human ACE2 than for bat ACE2, SARS-CoV-2 would have to have circulated in pangolins for a long period of time for this evolution and selection to occur and to date there is no evidence of a SARS-CoV-2 like virus circulating in pangolins. Another possibility would be a short term evolutionary step where a pangolin was recently coinfected with a bat ancestor to SARS-CoV-2 at the same time as it was infected by a pangolin CoV allowing a recombination event to occur whereby the spike RBD of the pangolin virus was inserted into the bat CoV, thereby conferring the bat CoV with high binding for both pangolin and human ACE2. Such recombination events are known to occur with other RNA viruses and can explain creation of some pandemic influenza strains 49 . Nevertheless, such events are by necessity rare as they require coinfection of the one host at exactly the same time. Most importantly, if such a recombination event had occurred in pangolins it might have been expected to have similarly triggered an epidemic spread of the new highly permissive SARS-CoV-2 like virus among pangolin populations, such as we now see occurring across the human population. Currently there is no evidence of such a pangolin SARS-CoV-2 like outbreak, making this whole scenario less likely. Indeed, pangolins might be protected from SARS-CoV-2 infection due to the existence of crossprotective spike RBD neutralising antibodies induced by exposure to pangolin CoV, given the RBD similarity of these two viruses. Another possibility which still cannot be excluded is that SARS-CoV-2 was created by a recombination event that occurred inadvertently or consciously in a laboratory handling coronaviruses, with the new virus then accidentally released into the local human population. Given the seriousness of the ongoing SARS-CoV-2 pandemic, it is imperative that all efforts be made to identify the original source of the SARS-CoV-2 virus. In particular, it will be important to establish whether COVID-19 is due to a completely natural chance occurrence where a presumed bat virus was transmitted to humans via an intermediate animal host or whether COVID-19 has alternative origins. This information will be of paramount importance to help prevent any similar human coronavirus outbreak in the future. Clinical trials for the treatment of Coronavirus disease 2019 (COVID-19): A rapid response to urgent need Progress and Prospects on Vaccine Development against SARS-CoV-2 Antiviral Treatment of Covid-19 COVID-19: a fast evolving pandemic The COVID-19 vaccine development landscape World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19) Repurposing antimalarials and other drugs for COVID-19 Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19): A Review What Does Plant-Based Vaccine Technology Offer to the Fight against COVID-19? Vaccines (Basel) Clinical trials on drug repositioning for COVID-19 treatment Perspectives: Potential Therapeutic Options for SARS-CoV-2 Patients Based on Feline Infectious Peritonitis Strategies: Central Nervous System Invasion and Drug Coverage Research towards treating COVID-19 Timely development of vaccines against SARS-CoV-2 Don't rush to deploy COVID-19 vaccines and drugs without sufficient safety guarantees COVID-19 needs a big science approach Can companion animals become infected with COVID-19? Can companion animals become infected with COVID-19? COVID-19: Zoonotic aspects Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARScoronavirus 2 Absence of SARS-CoV-2 infection in cats and dogs in close contact with a cluster of COVID-19 patients in a veterinary campus Infection and Rapid Transmission of SARS-CoV-2 in Ferrets Age-related rhesus macaque models of COVID-19 Simulation of the clinical and pathological manifestations of Coronavirus Disease 2019 (COVID-19) in golden Syrian hamster model: implications for disease pathogenesis and transmissibility Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 ACE2 the Janus-faced protein -from cardiovascular protection to severe acute respiratory syndrome-coronavirus and COVID-19 Structural basis of receptor recognition by SARS-CoV-2 Comparative protein modelling by satisfaction of spatial restraints Toward the estimation of the absolute quality of individual protein structure models UCSF Chimera--a visualization system for exploratory research and analysis HDOCK: a web server for proteinprotein and protein-DNA/RNA docking based on a hybrid strategy The HDOCK server for integrated protein-protein docking High performance molecular simulations through multilevel parallelism from laptops to supercomputers Electrostatics of nanosystems: application to microtubules and the ribosome Open Source Drug Discovery, C. & Lynn, A. g_mmpbsa--a GROMACS tool for high-throughput MM-PBSA calculations Improving Protein-Ligand Docking Results with High-Throughput Molecular Dynamics Simulations Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV Predicting the angiotensin converting enzyme 2 (ACE2) utilizing capability as the receptor of SARS-CoV-2 Silico Analysis of Intermediate Hosts and Susceptible Animals of SARS-CoV-2. ChemRxiv Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins An Overview of SARS-CoV-2 and Animal Infection SARS-CoV-2 spike protein favors ACE2 from Bovidae and Cricetidae The pathogenicity of 2019 novel coronavirus in hACE2 transgenic mice Comparative pathogenesis of COVID-19, MERS And SARS in a nonhuman primate model Risks and benefits of gain-of-function experiments with pathogens of pandemic potential, such as influenza virus: a call for a science-based discussion A pneumonia outbreak associated with a new coronavirus of probable bat origin alignment of ACE2 amino acid sequence from selected species. The SAR-Cov-2 spike protein binding region is highlighted in red We would like to thank Harinda Rajapaksha for assistance to optimise Gromacs for this project. We would like to thank Oracle Corporation for providing their Cloud computing resources through the Oracle for Research program for the modelling studies described herein. In particular, we wish to Supplementary Figure 4 . Predicted and confirmed utilization of ACE2 receptors by the SARS-Cov-2 spike protein based on sequence homology from Qui et al. 41