key: cord-0332764-i8nx1ya7 authors: Piplani, Sakshi; Singh, Puneet Kumar; Winkler, David A.; Petrovsky, Nikolai title: In silico comparison of spike protein-ACE2 binding affinities across species; significance for the possible origin of the SARS-CoV-2 virus date: 2020-05-13 journal: nan DOI: 10.1038/s41598-021-92388-5. sha: 453ef8987a3e8640d7ac4a9ac949e775a0e5c81f doc_id: 332764 cord_uid: i8nx1ya7 The devastating impact of the COVID-19 pandemic caused by SARS coronavirus 2 (SARS-CoV-2) has raised important questions about viral origin, mechanisms of zoonotic transfer to humans, whether companion or commercial animals can act as reservoirs for infection, and why there are large variations in SARS-CoV-2 susceptibilities across animal species. Powerful in silico modelling methods can rapidly generate information on newly emerged pathogens to aid countermeasure development and predict future behaviours. Here we report an in silico structural homology modelling, protein-protein docking, and molecular dynamics simulation study of the key infection-initiating interaction between the spike protein of SARS-CoV-2 and its target, angiotensin converting enzyme 2 (ACE2), from multiple species. Human ACE2 has the strongest binding interaction, significantly greater than for any species proposed as a source of the virus. Binding to pangolin ACE2 was the second strongest, possibly due to the SARS-CoV-2 spike receptor binding domain (RBD) being identical to the pangolin CoV spike RBD. Except for snake, pangolin and bat, for which permissiveness has not been tested, all species in the upper half of the affinity range (human, monkey, hamster, dog, ferret) have been shown to be at least moderately permissive to SARS-CoV-2 infection, supporting a correlation between binding affinity and permissiveness. Our data indicate that the earliest isolates of SARS-CoV-2 were surprisingly well adapted to human ACE2, potentially explaining its rapid transmission.
The devastating impact of COVID-19 infections caused by SARS-coronavirus 2 (SARS-CoV-2) has stimulated unprecedented international activity to discover effective vaccines and drugs for this and other pathogenic coronaviruses. [1] [2] [3] [4] It has also raised important questions about the mechanisms of zoonotic transfer of viruses from animals to humans, whether companion animals or those used for commercial purposes can act as reservoirs for infection, and the reasons for the large variations in SARS-CoV-2 susceptibility across animal species. [5] [6] [7] Understanding how viruses move between species may help us prevent or minimize similar events in the future. Methods that elucidate the molecular basis for differences in species susceptibilities may also shed light on why different human sub-groups exhibit differences in susceptibility. [8] The most important features of SARS-CoV-2 are its spike protein (S protein) and a functional polybasic cleavage site at the S1-S2 boundary. [9] The SARS-CoV-2 spike monomer consists of a fusion peptide, two heptad repeats, an intracellular domain, an N-terminal domain, two subdomains and a transmembrane region. [10] Angiotensin converting enzyme 2 (ACE2) was identified as the main receptor for the SARS-CoV-2 S protein, as it is for SARS-CoV; binding to ACE2 is a critical initiating event for infection. ACE2 is relatively ubiquitous in humans, existing in cell membranes in the lungs, arteries, heart, kidney, and intestines. It consists of an N-terminal peptidase M2 domain and a C-terminal collectrin renal amino acid transporter domain. Non-human species vary markedly in their susceptibility to SARS-CoV-2 [7, 11, 12] and their ACE2 receptor binding domains also differ substantially.
The phylogenetic tree showing relatedness of ACE2 proteins across selected animal species is illustrated in the Supplementary material. We and others [15, 16] postulate that variation in ACE2 structures between species will determine the binding strength of SARS-CoV-2 S protein and suggest which species are permissive to SARS-CoV-2 infection. For example, the low binding affinity of SARS S protein for mouse ACE2 has been postulated as the reason mice are largely non-permissive to SARS infection. Direct measurement of the binding affinity of SARS-CoV-2 S protein to ACE2 (e.g. using cell lines transfected with ACE2 proteins from different species) would be very useful but is time consuming, and purified or recombinant ACE2 proteins from all relevant animal species are not yet available. Here we show how fast, efficient in silico structural modelling and docking algorithms from structure-based drug design can be used to determine the relative binding affinities of the SARS-CoV-2 S protein for ACE2 across multiple common and exotic animal species. [17] [18] [19] We provide novel insights into the species-specific nature of this interaction and impute which species might be permissive for SARS-CoV-2, which may help elucidate the origin of SARS-CoV-2 and the mechanisms for its zoonotic transmission.

The results of docking the receptor binding domains (RBDs) of SARS-CoV-2 S and ACE2 receptors of various species using the HDOCK server, refined by MD simulations, are summarized in Tables 1 and 2. The calculated binding energies for the interactions of SARS-CoV-2 with ACE2 from the species studied are summarized in Table 2. This suggests that more sophisticated computational methods are required to obtain binding affinities more consistent with observation, as in the work presented here and by others, e.g. [22] While this paper was being prepared, Rodrigues et al.
published a similar study that used the docking method HADDOCK to estimate the relative strength of binding affinities of SARS-CoV-2 spike protein for ACE2 proteins of 30 species. [22] The docking experiments were followed by short MD simulations. This study found that 18 species had higher affinity than human ACE2, including dog, pangolin, ferret, Siberian tiger, horseshoe bat, civet, hamster and guinea pig, but also goldfish, sheep, cat, horse and rabbit. Goldfish (no known permissiveness to infection with SARS-CoV-2) had the second highest affinity after dog. The authors stated that, although their method discriminated to some extent between species that are susceptible and those that are not, their predictions were not entirely correct. For example, they ranked guinea pig ACE2 (SARS-CoV-2 negative) as a better receptor for SARS-CoV-2 RBD than human, cat, horse, or rabbit ACE2 (all SARS-CoV-2 positive species), despite experiments showing that there is negligible binding between the two proteins. [23] One explanation may be that their MD simulations were short (simulation time not listed) and that longer simulations like those used in this study are needed to obtain a more accurate ranking of the binding affinities. Other relevant papers were also published while this paper was in preparation and review. Recently, Damas et al. published an analysis of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the probability that ACE2 could be used as a receptor by SARS-CoV-2. [15] They classed the species into five risk groups. Humans, apes, and monkeys were in the very high-risk group, while Chinese hamsters, whales and porpoises were prominent examples of species in the high-risk group.

Note added in proof: While this paper was undergoing review, a highly relevant paper was published in Scientific Reports (5 Oct) by Lam et al. [24] on the species specificity of spike-ACE2 interaction.
This provides validation of our computational approach and of the results we obtained on the relative binding affinities of non-human species. As in our study, they used MODELLER to generate ACE2 structures for different species and selected the best refined model using DOPE scores. They calculated free energy differences, ΔΔG, between human ACE2 binding and that of non-human species ACE2 to SARS-CoV-2 spike using free energy perturbation methods. Nevertheless, the energy differences they calculated are similar to those we calculated. Unfortunately, Lam et al. did not report the ΔΔG for the most important species, pangolin, making our study significant in reporting this binding energy.

Table 1. ACE2 RBD residues interacting with the S protein RBD from MD simulations of complexes. Residues interacting with the same S residue in different species ACE2, that differ from those in human ACE2, are labelled green (conservative replacements) or red (nonconservative replacements).

Golden hamsters, cattle and cats were members of the medium risk group, and dogs, horses and bats were in the low risk group. Pangolins, ferrets, mice, and minks were assigned to the very low risk group in their analyses. The susceptibility predictions from these studies also do not correlate well with observations, consistent with the low correlation between binding energies and sequence noted above. Two additional recent studies are relevant to our study. Spinello

The key interacting residues of ACE2 receptor and SARS-CoV-2 S protein (Table 1) were least conserved, consistent with its low sequence similarity to human ACE2. Identifying species permissive to SARS-CoV-2 is very important for identifying intermediate hosts. Our structure-based approach revealed some surprising results, contrasting with those from sequence-based analyses.
Conspicuously, the predicted binding between SARS-CoV-2 S protein and ACE2 was stronger for humans than for any other species studied. This very high affinity for human ACE2 was confirmed very recently (July 10) in a preprint by Alexander et al. [41], who also studied the RBD of spike-ACE2 for several species and reported that the SARS-CoV-2 RBD sequence is optimal for binding to human ACE2 compared to other species. They also described this as a remarkable finding that underlies the high transmissibility of the SARS-CoV-2 virus amongst humans. These results are also consistent with a recent report comparing SARS-CoV and SARS-CoV-2, which found a number of differences in the SARS-CoV-2 RBD that made it a much more potent binder to human ACE2 through the introduction of numerous hydrogen bonding and hydrophobic networks. [35] This helps explain the efficient and rapid transmission of SARS-CoV-2 through the human population once a presumed cross-over event occurred in or around November 2019. Interestingly, as shown in Table 1, pangolin and human ACE2 are closest in binding energy despite being structurally different and sharing only 10 of 16 interacting residues at the SARS-CoV-2 RBD. This similarity of binding energy is interesting as pangolins have previously been imputed as a potential intermediate host to explain the spill-over of a putative bat coronavirus to humans. However, most of the pangolin ACE2 RBD differences are conservative replacements of residues in the human ACE2 RBD, viz. Q24E, D30E, D38E and L79I, which are likely to make similar contributions to the binding interaction with SARS-CoV-2 spike. The main differences are the lack of the S19 interaction and the replacement of H34 by S34 in pangolin ACE2. However, the MD structures show that the OH moiety in the sidechain of S34 lies in the same region, and can make similar interactions, as the NH moiety in the imidazole sidechain of H34.
SARS-CoV-2 also recognizes ACE2 from a variety of animal species, including palm civet,

Ferrets are also permissive to SARS-CoV infection, and our modelling data indicated that SARS-CoV-2 has a similar binding energy to ferret and hamster ACE2 (

chickens, and ducks, but that ferrets and cats were permissive to infection. [31]

Bats have been suggested as the original host species of SARS-CoV-2 infections in humans, with pangolins acting as an intermediate animal vector. Bat CoV RaTG13 has the highest sequence similarity to SARS-CoV-2, with 96% whole-genome identity [50], but possesses neither the furin cleavage site nor the pangolin RBD seen in SARS-CoV-2. Could SARS-CoV-2 be an as-yet unidentified bat virus? Although bats carry many coronaviruses, no evidence of a direct relative of SARS-CoV-2 in bat populations has so far been found. As highlighted by our data, the binding affinity of SARS-CoV-2 for bat ACE2 is considerably lower than for human ACE2, and human ACE2 has been shown not to bind the RaTG13 spike. [53] This suggests that even if SARS-CoV-2 did originally arise from a bat precursor, it must have spent considerable time in another host wherein it adapted its S protein to bind better to the host ACE2. This also resulted in it acquiring higher affinity for human ACE2 while lowering its affinity for the original bat ACE2.

Although it has the same spike RBD as SARS-CoV-2, pangolin-CoV is not closely related to SARS-CoV-2 overall, with <90% sequence similarity across its whole genome. [25] It is noteworthy that the common spike RBD in both pangolin CoV and SARS-CoV-2 is able to bind strongly to both pangolin and human ACE2, despite significant differences in the sequences of the ACE2 RBD (only 63% of residues are common (Table 1)). Xiao reported a pangolin-CoV that has 100%, 98.6%, 97.8% and 90.7% amino acid identity with SARS-CoV-2 in the E, M, N and S proteins.
[26] They surprisingly reported the RBD of the pangolin S protein to be almost identical to

genomes estimated the date for the most recent common ancestor from the start to middle of December, consistent with the earliest reported date, 1st December 2019, for the initial cluster of pneumonia cases. [56] This study concluded, based on available genome sequence data, that the current epidemic has been driven entirely by human-to-human transmission at least since December. As the SARS-CoV-2 structure that we employed was obtained from viruses collected early in the outbreak, it is not clear how these early strains of SARS-CoV-2 developed such a high affinity for human ACE2. This suggests that the SARS-CoV-2 spike RBD previously evolved by selection on a human-like ACE2. Notably, pangolin ACE2 bears major differences in its RBD to human ACE2. It is therefore surprising that pangolin-CoV has a similar RBD to SARS-CoV-2, a similarity that forms the basis for implicating pangolins directly or indirectly in the origins of SARS-CoV-2. The fact that pangolin CoVs can use human ACE2 for cell entry suggests that pangolin CoVs could represent a source of future human coronavirus pandemics if they were to gain the SARS-CoV-2 furin cleavage site. Given the seriousness of the ongoing SARS-CoV-2 pandemic, it is imperative that all efforts be made to identify the original source of the virus. One question to be addressed is whether the virus is completely natural and was transmitted to humans by an intermediate animal vector, or whether it came from a recombination event that occurred inadvertently or intentionally in a laboratory handling coronaviruses, with the new virus being inadvertently released into the local human population. This is of key importance, as such information could be used to help prevent any similar outbreak in the future.
The one positive of our observation is that SARS-CoV-2 already has optimised high binding to human ACE2, providing little selective pressure for mutations in the spike RBD to further increase binding affinity. Thus, a vaccine that induces neutralising antibody to the spike RBD may remain effective long-term, and not need to be modified regularly like influenza vaccines to keep up with RBD mutations.

Homology modelling of S protein and ACE2 from multiple species. As no three-dimensional structure of the SARS-CoV-2 S protein was available at the commencement of the project, we generated a homology structure from the sequence retrieved from the NCBI GenBank database (accession number YP_009724390.1) in January 2020. A PSI-BLAST search against the PDB database was performed for template selection, and the x-ray structure of the SARS coronavirus S protein (PDB ID 5XLR) was selected as the template, with 76.4% sequence similarity to SARS-CoV-2 S protein. The sequence alignment and sequences of related bat and pangolin coronaviruses are shown in Figure 1. The 3D structures of the RBD of SARS-CoV-2 S and non-human ACE2 proteins were built using Modeller. The quality of the generated models was evaluated using the GA341 score [59] and DOPE (Discrete Optimized Protein Energy) scores [60], and the models were assessed using the SWISS-MODEL structure assessment server (https://swissmodel.expasy.org/assess). [61] Structures with the lowest DOPE score were refined by MD simulations (vide infra) and used for further analysis. The modelled structures were also assessed for quality control using the Ramachandran plot and MolProbity scores in SWISS-MODEL. The Ramachandran plot checks the stereochemical quality of a protein by analysing residue-by-residue geometry and overall structure geometry, and by visualizing the energetically allowed regions for the backbone dihedral angles ψ against φ of the amino acid residues in the protein structure.
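The backbone dihedral angles plotted in a Ramachandran diagram can be computed directly from atomic coordinates. The following is a minimal sketch, assuming coordinates are supplied as 3-element sequences in any consistent length unit; the function names `dihedral_deg` and `phi_psi` are illustrative and are not taken from Modeller, SWISS-MODEL or MolProbity, which perform this analysis internally.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    return dihedral_deg(c_prev, n, ca, c), dihedral_deg(n, ca, c, n_next)
```

Collecting the (φ, ψ) pair of every residue and counting how many fall inside the favoured regions gives the kind of percentage score quoted for the models; the region boundaries themselves come from the validation software and are not reproduced here.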
The Ramachandran score of the SARS-CoV-2 spike protein was 90% in the binding region and the MolProbity score was 3.17. The Ramachandran score, i.e. the percentage of amino acid residues in the various species ACE2 that fall into the energetically favoured region, ranged from 96-99% (Supplementary Table 2).

(http://www.gromacs.org/). [66] Simulations were carried out using the GPU-accelerated version of the program with the AMBER99SB-ILDN force field, implementing periodic boundary conditions, on an Oracle server. Docked complexes were immersed in a truncated octahedron box of TIP3P water molecules. The solvated box was further neutralized with Na+ or Cl− counter ions using the tleap program. Particle Mesh Ewald (PME) was employed to calculate the long-range electrostatic interactions. The cut-off distance for the long-range van der Waals (VDW) energy term was 12.0 Å. The system was minimized without restraints. We applied 2500 cycles of steepest descent minimization followed by 5000 cycles of conjugate gradient minimization. After system optimization, the MD simulations were initiated by gradually heating each system in the NVT ensemble from 0 to 300 K for 50 ps. Calculations were also performed for up to 500 ns on human ACE2 to ensure that 100 ns is sufficiently long for convergence. Multiple production runs with different starting random seeds were used to estimate binding energy uncertainties for the strongest binding ACE2 structures: human, bat, and pangolin. All complexes stabilized during simulation, with RMSD fluctuations converging to a range of 0.5 to 0.8 nm. The RMSD values for superimposition of the Cα backbones of each ACE2 structure before and after 100 ns simulation were 1.2±0.1 Å, showing significant movement away from the initial HDOCK structures. We found that the complex had stabilised after 50 ns, so we considered 100 ns to be an adequate simulation time.
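The RMSD after superimposition used above to judge convergence can be illustrated with the Kabsch algorithm. This is a minimal NumPy sketch, assuming two equally sized arrays of matched Cα coordinates; `kabsch_rmsd` is a hypothetical helper name, and the study itself would have relied on the analysis tools shipped with the simulation package rather than this code.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal superposition.

    P and Q must list the same atoms in the same order; the result is in
    the same length unit as the input coordinates.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove the translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

When comparing values, note that the text reports some RMSDs in nm and others in Å (1 nm = 10 Å), so coordinates should be converted to a common unit before calling such a function.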
We also compared the 100 ns simulated structures of the ACE2 proteins from all species against those generated by homology modelling (most x-ray structures were not available) and found RMSD values for Cα alignments of between 0.5 and 0.8 Å. This suggests that any memory of the human template has been removed or minimized. We also compared the structures generated independently by homology modelling (Modeller) and HDOCK (Supplementary Table 3), and they agreed very well (RMSD < 1 Å).

Calculation of binding free energies of complexes. The binding free energies of the protein-protein complexes were evaluated in two ways. The traditional method is to calculate the energies of the solvated SARS-CoV-2 S and ACE2 proteins and that of the bound complex, and derive the binding energy by subtraction:

ΔG(binding, aq) = G(complex, aq) - G(spike, aq) - G(ACE2, aq)

We also calculated binding energies using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) tool in GROMACS, which derives them from the nonbonded interaction energies of the complex. [67, 68] This method is also widely used for binding free energy calculations. The binding free energies of the protein complexes were analysed during the equilibrium phase from the output files of the 100 ns MD simulations. Free energy decomposition analyses were also performed by MM-PBSA decomposition to obtain detailed insight into the interactions between the ligand and each residue in the binding site. The binding interaction of each ligand-residue pair includes three terms: the van der Waals contribution, the electrostatic contribution, and the solvation contribution. As the simulations are very lengthy, we ran multiple simulations only for the species proposed as intermediate hosts and potential sources of the original virus (human, pangolin, bat and snake) to estimate the uncertainty in the binding energies.
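The subtraction formula and the replicate-based uncertainty estimate described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the standard error of the mean is one plausible way to summarise replicate runs (the text does not state which statistic was used), and the numbers in the usage comment are invented placeholders, not results from this study.

```python
import math
from statistics import mean, stdev

def binding_free_energy(g_complex, g_spike, g_ace2):
    """Binding free energy by subtraction: G(complex) - G(spike) - G(ACE2).

    All three inputs must be in the same energy unit; a more negative
    result corresponds to stronger predicted binding.
    """
    return g_complex - g_spike - g_ace2

def mean_and_sem(values):
    """Mean and standard error of the mean over replicate simulations."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are required")
    return mean(values), stdev(values) / math.sqrt(len(values))

# Usage with made-up values (arbitrary energy units, illustration only):
# replicates = [binding_free_energy(-150.0, -60.0, -40.0),
#               binding_free_energy(-152.0, -60.0, -40.0),
#               binding_free_energy(-148.0, -60.0, -40.0)]
# avg, sem = mean_and_sem(replicates)
```

Each replicate would correspond to one production run started from a different random seed, so the spread across replicates reflects sampling variability rather than systematic force-field error.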
As the ACE2 proteins for all species were extremely similar, we expected that these simulation error estimates would be of the same order for all other species. The predicted binding energies for these other species were significantly lower than those of human ACE2. We also used a statistical test to calculate the probability of the pangolin and human ACE2 affinities being different.

Data Availability. The coordinates of the S protein-ACE2 complexes will be deposited in data repositories at La Trobe University and Flinders University.

COVID-19 needs a big science approach Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19): A Review World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19) The COVID-19 vaccine development landscape Can companion animals become infected with COVID-19? Can companion animals become infected with COVID-19? COVID-19: Zoonotic aspects SARS-CoV-2 RBD mutations, ACE2 genetic polymorphism, and stability of the virus-receptor complex: The COVID-19 host-pathogen nexus. bioRxiv From Animal to Human: Interspecies Analysis Provides a Novel Way of Ascertaining and Fighting COVID-19. The Innovation 1 Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Evidence for SARS-CoV-2 Infection of Animal Hosts How do viruses leap from animals to people and spark pandemics?
Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak SARS-CoV-2 spike protein favors ACE2 from Bovidae and Cricetidae Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates Spike protein recognition of mammalian ACE2 predicts the host range and an optimized ACE2 for SARS-CoV-2 infection Prediction of Novel Inhibitors of the Main Protease (M-pro) of SARS-CoV-2 through Consensus Docking and Drug Reposition ACE2 the Janus-faced protein -from cardiovascular protection to severe acute respiratory syndrome-coronavirus and COVID-19 Structural basis of receptor recognition by SARS-CoV-2 Predicting the angiotensin converting enzyme 2 (ACE2) utilizing capability as the receptor of SARS-CoV-2. Microbes Infect Silico Analysis of Intermediate Hosts and Susceptible Animals of SARS-CoV-2. ChemRxiv chemrxiv.12057996.v1 Insights on cross-species transmission of SARS-CoV-2 from structural modeling. bioRxiv SARS-CoV-2 and three related coronaviruses utilize multiple ACE2 orthologs and are potently blocked by an improved ACE2-Ig SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins Infection of dogs with SARS-CoV-2 Update on possible animal sources for COVID-19 in humans Simulation of the clinical and pathological manifestations of Coronavirus Disease 2019 (COVID-19) in golden Syrian hamster model: implications for disease pathogenesis and transmissibility Animal and translational models of SARS-CoV-2 infection and COVID-19 Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2 Infection and Rapid Transmission of SARS-CoV-2 in Ferrets Detection of SARS-CoV-2 in a cat owned by a COVID-19−affected patient in Spain Is the Rigidity of SARS-CoV-2 Spike Receptor-Binding Motif the 
Hallmark for Its Enhanced Infectivity? Insights from All-Atom Simulations Enhanced receptor binding of SARS-CoV-2 through networks of hydrogenbonding and hydrophobic interactions A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2 Which animals are at risk? Predicting species susceptibility to An Overview of SARS-CoV-2 and Animal Infection. Preprints Absence of SARS-CoV-2 infection in cats and dogs in close contact with a cluster of COVID-19 patients in a veterinary campus Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2 Infection and Rapid Transmission of SARS-CoV-2 in Ferrets Age-related rhesus macaque models of COVID-19 Comparative pathogenesis of COVID-19, MERS And SARS in a non-human primate model The pathogenicity of 2019 novel coronavirus in hACE2 transgenic mice Human angiotensin-converting enzyme 2 transgenic mice infected with SARS-CoV-2 develop severe and fatal respiratory disease Pharmacological therapeutics targeting RNA-dependent RNA polymerase, proteinase and spike protein: from mechanistic studies to clinical trials for COVID-19 Absence of SARS-CoV-2 infection in cats and dogs in close contact with a cluster of COVID-19 patients in a veterinary campus Possibility of transmission through dogs being a contributing factor to the extreme Covid19 outbreak in North Italy SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects Risks and benefits of gain-of-function experiments with pathogens of pandemic potential, such as influenza 
virus: a call for a science-based discussion On the evolutionary epidemiology of SARS-CoV-2 UCSF Chimera--a visualization system for exploratory research and analysis Comparative protein modelling by satisfaction of spatial restraints Comparative protein structure modeling by iterative alignment, model building and model assessment Statistical potential for assessment and prediction of protein structures Toward the estimation of the absolute quality of individual protein structure models MolProbity: all-atom structure validation for macromolecular crystallography Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy The HDOCK server for integrated protein-protein docking High performance molecular simulations through multi-level parallelism from laptops to supercomputers Electrostatics of nanosystems: application to microtubules and the ribosome Open Source Drug Discovery, C. & Lynn, A. g_mmpbsa--a GROMACS tool for high-throughput MM-PBSA calculations End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design Predicting the angiotensin converting enzyme 2 (ACE2) utilizing capability as the receptor of SARS-CoV-2. Microbes Infect

We would like to thank Harinda Rajapaksha for assistance in optimising GROMACS for this project. We would also like to thank Oracle for providing their Cloud computing resources for the modelling studies described herein. In particular, we wish to thank Peter Winn, Dennis Ward, and Alison Derbenwick Miller from Oracle for facilitating these studies. The opinions expressed herein are solely those of the individual authors and should not be inferred to reflect the views of their affiliated institutions, funding bodies or Oracle Corporation.
Petrovsky - conceived project, analysed data, contributed to manuscript; Piplani and Singh - performed the computations, analysed data, contributed to the manuscript; Winkler - analysed data and contributed to manuscript.