key: cord-0725503-khl418zw authors: Othman, Houcemeddine; Bouslama, Zied; Brandenburg, Jean-Tristan; da Rocha, Jorge; Hamdi, Yosr; Ghedira, Kais; Srairi-Abid, Najet; Hazelhurst, Scott title: Interaction of the spike protein RBD from SARS-CoV-2 with ACE2: Similarity with SARS-CoV, hot-spot analysis and effect of the receptor polymorphism date: 2020-05-14 journal: Biochem Biophys Res Commun DOI: 10.1016/j.bbrc.2020.05.028 sha: 4fd12ab2487048d959e07e92cc34fcc21eacde0f doc_id: 725503 cord_uid: khl418zw The spread of COVID-19 caused by the SARS-CoV-2 outbreak has been growing since its first identification in December 2019. The publishing of the first SARS-CoV-2 genome made a valuable source of data to study the details about its phylogeny, evolution, and interaction with the host. Protein-protein binding assays have confirmed that Angiotensin-converting enzyme 2 (ACE2) is more likely to be the cell receptor through which the virus invades the host cell. In the present work, we provide an insight into the interaction of the viral spike Receptor Binding Domain (RBD) from different coronavirus isolates with host ACE2 protein. By calculating the binding energy score between RBD and ACE2, we highlighted the putative jump in the affinity from a progenitor form of SARS-CoV-2 to the current virus responsible for COVID-19 outbreak. Our result was consistent with previously reported phylogenetic analysis and corroborates the opinion that the interface segment of the spike protein RBD might be acquired by SARS-CoV-2 via a complex evolutionary process rather than a progressive accumulation of mutations. We also highlighted the relevance of Q493 and P499 amino acid residues of SARS-CoV-2 RBD for binding to human ACE2 and maintaining the stability of the interface. Moreover, we show from the structural analysis that it is unlikely for the interface residues to be the result of genetic engineering. 
Finally, we studied the impact of eight different variants located at the interaction surface of ACE2 on complex formation with the SARS-CoV-2 RBD. We found that none of them is likely to disrupt the interaction with the viral RBD of SARS-CoV-2.

Keywords: COVID-19, ACE2, viral spike Receptor Binding Domain, homology-based protein-protein docking, variants.

The coronavirus SARS-CoV-2 (previously known as nCoV-19) has been associated with the recent epidemic of acute respiratory distress syndrome [2]. Recent studies have suggested that the virus binds to the ACE2 receptor on the surface of the host cell using spike proteins, and explored the binary interaction of these two partners [8, 23].
In this work, we focused our analysis on the interface residues to gain insight into four main subjects: (1) the architecture of the spike protein interface and whether its evolution in many isolates supports an increase in affinity toward the ACE2 receptor; (2) how the affinity of the SARS-CoV-2 RBD and the SARS-CoV RBD toward ACE2 homologous proteins from different species is dictated by divergent interface sequences; (3) a comparison of the interaction hotspots between SARS-CoV and SARS-CoV-2; and finally, (4) whether any of the studied ACE2 variants may show a different binding property compared to the reference allele. To tackle these questions we used multi-scale modelling approaches in combination with sequence and phylogenetic analysis.

with discrete Gamma distribution (+G) with 5 rate categories. For the RBD sequences, the best substitution model for maximum likelihood (ML) calculation was selected using the model selection tool implemented in the MEGA 6 software, based on the lowest BIC score. Accordingly, the WAG model [20] with a discrete Gamma distribution (+G) with 5 rate categories was selected.

Phylogenetic trees were generated using the ML method in MEGA 6. The consistency of the topology, for the RBD sequences, was assessed using a bootstrap method with 1000 replicates. The resulting phylogenetic tree was edited with iTOL [9].

The co-crystal structure of the spike protein of SARS-CoV complexed with the human-civet chimeric ACE2 receptor was solved at 3 Å resolution (PDB code 3SCL). We used this structure as a template to build the complex of the spike protein from different virus isolates with the human ACE2 protein (Uniprot sequence Q9BYF1). The template sequences of the ligand (spike protein) and the receptor (ACE2) were aligned locally with the target sequences using the program Water from the EMBOSS package [12].
Modeller version 9.22 [14] was then used to predict the complex model of each spike protein with ACE2 using a slow refinement protocol. For each model, we generated ten conformers, from which we selected the one with the best DOPE score [15].

To calculate the binding energy scores we used the PRODIGY server [22], the MM-GBSA method implemented in the HawkDock server [19], and FoldX5 [3]. The contribution of each amino acid in the protein partners was calculated using the HawkDock server. Different 3D structures of human ACE2 (hACE2), each comprising one of the identified variants, were modeled using the BuildModel module of FoldX5. Because it is better suited to predicting the effect of point variations of amino acids, we used DynaMut at this stage of the analysis [13].

2.4 Flexibility analysis

We ran a protocol to simulate the fluctuation of the spike RBD of SARS-CoV-2 and SARS-CoV using the standalone program CABS-flex (version 0.9.14) [7]. Three replicates of the simulation with different seeds were conducted using a temperature value of 1.4 (a dimensionless value related to the physical temperature). The protein backbone was kept fully flexible and the number of Monte Carlo cycles was set to 100.

3 Results

Sequence and phylogenetic analysis

Phylogenetic analysis of the different RBD sequences revealed two well-supported clades. Clade 1 includes the Rm1 isolate, Bat-SL-CoVZC45 and Bat-SL-CoVZXC21. These three isolates are closely related to SARS-CoV-2, as revealed by the phylogenetic tree constructed from the entire genome (Figure 1A). Clade 2 includes the SARS-CoV-2, RaTG13, SZ16, ZS-C, WIV16, MA15, and SARS-CoV-Sino1-11 isolates (Figure 1A). The SARS-CoV-2 and RaTG13 sequences are the closest to the common ancestor of this clade. The exact tree topology is reproduced when we use only the RBD segment corresponding to the interface residues with hACE2.
This is a linear sequence spanning from residue N481 to N501 in SARS-CoV-2.

To investigate whether the interface of the spike protein evolves by increasing the affinity toward the ACE2 receptor in the final host, we predicted the interaction models of the envelope-anchored spike protein (SP) from several clinically relevant coronavirus isolates with the hACE2 receptor (PDB files for the complexes are listed in Supplementary Materials 1). The construction of the complex applies a comparative approach that uses a template structure in which both partners (ligand and receptor) are closely related to their counterparts in the target system. In our study, we only modeled the interaction of the RBD, which was shown to be implicated in the physical interaction with ACE2 (Figure 2A). The sequence identity of the modeled spike proteins, as well as that of any of the orthologous ACE2 sequences (human, civet, bat, pig, rat, chicken and snake), does not fall below 63% with respect to their templates. At such values of sequence identity it is expected that the template and the target complexes share the same binding mode [6].

We calculated the binding energy scores of the RBD from different virus isolates interacting with hACE2 (Figure 2B). All three methods used for the calculation agree that the RBDs from bat-SL-CoVZC45, bat-SL-CoVZXC21 and Rm1 show the worst energy scores. While the difference in binding energy score falls at the boundary of the uncertainty margin for the PRODIGY calculation (section 2, Supplementary Materials 2), the differences in the scores calculated by FoldX and MM-GBSA do not. Therefore we consider that such differences in energies compared to SARS-CoV-2 are consistent between the three methods. Except for FoldX, the affinity is predicted to be more favorable for the RBD from SARS-CoV-2 compared to SARS-CoV.
However, MM-GBSA only marginally discriminates between the two values.

MM-GBSA allowed us to assign the contribution of each amino acid at the interface with hACE2 to the binding energy score. We conducted this analysis using both the SARS-CoV-2 Wuhan-Hu-1 (Figure 3A) and the SARS-CoV Sino1-11 (Figure 3B) isolates. Residues F486, Y489, Q493, G496, T500 and N501 of the SARS-CoV-2 RBD, which form the hotspots of the interface with the hACE2 protein, were investigated (we only consider values > 1 or < -1 kcal/mol, to ignore effects due to thermal fluctuation). These amino acids form three patches of interaction spread along the linear interface segment (Figure 3C): two at the N and C termini and one central. T500 establishes two hydrogen bonds, using its side and main chains, with Y41 and N330 of hACE2. N501 forms another hydrogen bond with ACE2 residue K353, buried within the interface. On the other hand, the SARS-CoV RBD interface contains five hotspot residues (Figure 3D), L473, Y476, Y485, T487 and T488, corresponding to the equivalent hotspot residues F486, Y489, G496, T500 and N501 of the SARS-CoV-2 RBD. Therefore, Q493 as a hotspot amino acid is specific to the SARS-CoV-2 interface. The equivalent residue N480 in SARS-CoV shows only a non-significant contribution of 0.18 kcal/mol.

The similarity matrix analysis was conducted to assess the divergence of the interaction interface of the RBDs both qualitatively, i.e. the specific set of residues implicated in the interaction with ACE2, and quantitatively, i.e. the contribution of each residue to the binding energy score. The similarity matrix was calculated from the free energy decomposition of interface residues of the RBDs from SARS-CoV-2 and SARS-CoV in complex with ACE2 orthologues and reported as a network representation (Figure 3E and Figures 1 and 2 in Supplementary Materials 2).
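As an illustration of how a similarity matrix can be derived from per-residue energy decompositions and turned into a network, the following Python sketch may help. It is not the procedure used in this study: the text does not state which similarity measure was applied, so cosine similarity is an assumption here, as are the function names, the edge cut-off and the placeholder profiles (invented numbers, not results).

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two per-residue energy profiles (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero profile carries no information
    return dot / (norm_u * norm_v)

def similarity_matrix(profiles):
    """Return sorted complex names and the pairwise similarity matrix."""
    names = sorted(profiles)
    matrix = [[cosine_similarity(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix

def network_edges(names, matrix, cutoff):
    """Keep one edge per pair of complexes whose similarity reaches the cut-off."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] >= cutoff:
                edges.append((names[i], names[j], matrix[i][j]))
    return edges

# Placeholder profiles (kcal/mol per aligned interface position); hypothetical values.
profiles = {
    "complex_A": [-2.0, -1.0, 0.5],
    "complex_B": [-4.0, -2.0, 1.0],
    "complex_C": [0.5, -1.0, -2.0],
}
names, matrix = similarity_matrix(profiles)
edges = network_edges(names, matrix, cutoff=0.9)
```

With these placeholder profiles, only complex_A and complex_B are connected, because their profiles differ by a scale factor only. Cosine similarity ignores overall magnitude, so a measure sensitive to absolute energies would be needed if that distinction matters.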
We noticed the existence of densely interconnected edges involving all the protein-protein complexes for SARS-CoV-2 and SARS-CoV except those involving ACE2 from Sus scrofa and Rattus norvegicus. Complexes involving the RBD of SARS-CoV-2 show less intrinsic similarity compared to those of the SARS-CoV RBD. However, similarity scores tend to be uniform in the group involving ACE2 from human, civet, dog, bat, snake, and chicken. Unlike the complexes with ACE2 from Sus scrofa and Rattus norvegicus, the complex including hACE2 does not seem to diverge from the rest of the members of the SARS-CoV-2 group.

Sequence analysis and visual inspection of the RBD/hACE2 complex suggest that the presence of P499 in the SARS-CoV-2 RBD may reflect a form of adaptation toward a better affinity with the receptor. To further investigate its role, we performed a flexibility analysis using a reference structure (SARS-CoV-2 RBD containing P499) and an in silico mutated form, P499T, threonine being the residue found in SARS-CoV and most of clade 2. Our results show that the mutation caused a significant decrease in stability for nine residues of the interface corresponding to segment 482-491 (Figure 3F). Indeed, the per-residue RMSF for this segment increases compared to the reference structure.

A total of eight variants of hACE2 that map to the interaction surface are described in the gnomAD database (Figure 4A). All these variants are rare (Table 1) and mostly found in European non-Finnish and African populations. Considering both the enthalpy (ddG) and the vibrational entropy (ddS) in our calculation, we found no significant changes (> 1 or < -1 kcal/mol) in either the folding energy of the complex (Figure 4B) or the interaction energy of the protein-protein partners (Figure 4C).
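The ±1 kcal/mol cut-off applied above, both to hotspot contributions and to variant effects, amounts to a simple filter. The Python sketch below shows it under the assumption that the energies are already available as a mapping from label to value in kcal/mol. The function name `significant_changes` is ours, and the example values are the per-residue contributions quoted in this paper (Q493, its in silico Q493N mutant, E484, and N480 of SARS-CoV).

```python
def significant_changes(energies, cutoff=1.0):
    """Keep only entries whose absolute value exceeds the cut-off (kcal/mol).

    Values within [-cutoff, +cutoff] are treated as indistinguishable
    from thermal fluctuation and are dropped.
    """
    return {label: value for label, value in energies.items()
            if abs(value) > cutoff}

# Per-residue contributions to the binding energy score quoted in the text.
contributions = {
    "Q493": -2.55,          # SARS-CoV-2 RBD hotspot
    "Q493N": -0.01,         # in silico mutant of Q493
    "E484": 2.24,           # unfavorable contribution
    "N480_SARS-CoV": 0.18,  # residue equivalent to Q493 in SARS-CoV
}

hotspots = significant_changes(contributions)
```

Only Q493 and E484 pass the filter. The sign is left untouched so that favorable (negative) and unfavorable (positive) contributions remain distinguishable.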
Since the COVID-19 outbreak, several milestone papers have been published examining the particularities of the SARS-CoV-2 spike protein and its putative interaction with ACE2 as a receptor [21]. In the current study, we focused our analysis on the interface segments of the SARS-CoV-2 spike RBD interacting with ACE2 from different species by estimating interaction energy profiles. We studied the effect of eight variants of ACE2 in order to detect polymorphisms that may increase or decrease virulence in the host. Our results showed that if ACE2 is the only route for infection in humans, variants interacting physically with the RBD are not likely to disrupt the formation of the complex and would have a marginal effect on the affinity. Therefore, it is unlikely that any form of resistance to the virus related to the ACE2 gene exists. However, this analysis merits in-depth investigation in different ethnic groups for a better assessment of the contribution of genetic variability to the host-pathogen interaction.

The similar values of binding energy scores with different ACE2 orthologues suggest that the ability to bind different ACE2 orthologues is preserved in many species for both SARS-CoV-2 and SARS-CoV. Therefore, the transition to the zoonotic form would be trivial if it depended only on ACE2 as the primary route to infection in both the intermediate and the final host. However, we know that such a process is very complex, since it requires many protein-protein interactions to acquire the specific capacity of infecting and replicating in the host cells [18]. Consequently, it makes sense to assume that many other types of receptors or co-receptors may be critical in determining the capacity to cross the species barrier. This has already been suggested for SARS-CoV [1] and, similarly, SARS-CoV-2 may show the same feature.
Moreover, our results show that the significant overlap of glycosylation sites with the protein-protein interface implies a likely interaction of SARS-CoV-2 progenitors with receptors other than ACE2. Finally, recent transcriptomic profiling has suggested the possibility of multiple infection routes via the interaction of SARS-CoV-2 with many human receptors [11].

Whole-genome phylogenetic analysis of the different isolates included in this study is consistent with previous works that place the Wuhan-Hu-1 isolate close to the Bat-SL-CoVZC45 and Bat-SL-CoVZXC21 isolates [10, 17] within the Betacoronavirus genus. The use of RBD sequences, however, places the virus in a clade that comprises SARS-CoV-related homologs, including isolates from bat and civet. The clade swapping seen in Figure 1A also seems to occur for RaTG13 and Rm1, isolated from bat. This is expected, as the use of different phylogenetic markers may considerably affect the topology of the tree. However, the significant divergence in the interface segments, as a key molecular element contributing to the determination of the tree topology, has driven our work toward studying their impact on the interaction with hACE2. The binding of the spike glycoprotein to the ACE2 receptor requires a certain level of affinity. In the case where the RBD evolves from an ancestral form closer to that of Bat-SL-CoVZC45 and Bat-SL-CoVZXC21, we expected a decrease of the binding energy scores through the evolution process following incremental changes in the RBD. In such a scenario, we presume that there are other intermediary forms of coronavirus that describe such variation of the binding energy score to reach a level where the pathogen can infect humans with high affinity toward hACE2.
On the other hand, our results show that the binding energy score and the interface sequence of the SARS-CoV-2 RBD are closer to those of SARS-CoV-related isolates (either from human or other species). Therefore, a recombination event involving the spike protein, occurring between SARS-CoV and an ancestral form of the current SARS-CoV-2 virus, might also be possible. This would allow the virus to acquire a minimum set of residues for the interaction with hACE2. Recombination in the spike protein gene has been previously suggested by Wei et al. in their phylogenetic analysis [4]. Thereafter, incremental changes in the binding interface segment would occur in order to reach a better affinity toward the receptor. One of these changes may involve residue P499, whose substitution to threonine seems to drastically destabilize the interface segment and has a distant effect. Moreover, the decomposition of the interaction energy showed that 5 out of 6 hotspot amino acids in SARS-CoV-2 have their equivalent in SARS-CoV, including N501. Contrary to what Wan et al. [17] have stated, the single mutation N501T does not seem to enhance the affinity. Rather, the residue Q493 might be responsible for such higher affinity due to a better satisfaction of the van der Waals interactions by the longer polar side chain of glutamine. Indeed, when we repeated the same analysis while mutating Q493 to N493, the favorable contribution decreased from -2.55 kcal/mol to a non-significant value of -0.01 kcal/mol, thus supporting our claim.

No major divergence of the interaction interface of the SARS-CoV-2 RBD with hACE2 was noticed from the similarity matrix analysis. This suggests that the molecular elements required for binding to the receptor might also be involved in the interaction with other orthologous forms of ACE2 and that these elements are not optimized specifically for the human form.
Therefore, it is unlikely that the interface of the RBD from SARS-CoV-2 is a result of human intervention via genetic engineering aiming to increase the affinity toward ACE2. For example, residue E484 contributes unfavorably to the binding energy, with 2.24 kcal/mol, due to an electrostatic repulsion with E75 from hACE2. This residue would be an obvious choice for engineering a protein-protein complex with high affinity by substituting E484 with a polar residue. It is, however, noteworthy that the lesser homogeneity of the nodes of the SARS-CoV-2 group, in comparison to SARS-CoV, may suggest a higher tolerance of the new virus to mutation, which would allow it to cross the species barrier more easily and to efficiently optimize the interaction in the host.

being understanding about his absence and for not being able to bring him the marshmallow candy because of the COVID-19 outbreak.

Highlights for the paper "Interaction of the spike protein RBD from SARS-CoV-2 with ACE2: similarity with SARS-CoV, hot-spot analysis and effect of the receptor polymorphism"
• We noticed a large difference in the binding energy of the RBD from the spike protein of SARS-CoV-2 with ACE2, compared to the closest isolates to the proposed progenitor forms.
• Currently available sequence data do not explain the transition from low-affinity RBD forms to high-affinity RBD forms that are capable of infecting humans.
• We suggest that the N493Q substitution, and not N501T as was reported previously, is responsible for a higher affinity toward ACE2 when we compare SARS-CoV-2 to
• The mutation T499P is responsible for stabilizing the interface of the RBD interacting with ACE2 when we compare SARS-CoV-2 to SARS-CoV.
• It is unlikely that the interface of the RBD from SARS-CoV-2 is a result of human intervention aiming to increase the affinity toward ACE2.
• The role of receptors other than ACE2 is shown, and these may have a more critical function in crossing the species barrier.
• Variants corresponding to residues on the ACE2 interface are unlikely to be associated with resistance or sensitivity forms compared to the reference allele.

[1] SARS-CoV and emergent coronaviruses: viral determinants of interspecies transmission
[2] A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
[3] FoldX 5.0: working with RNA, small molecules and a new graphical interface
[4] Cross-species transmission of the newly identified coronavirus 2019-nCoV
[5] MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
[6] Templates are available to model nearly all complexes of structurally characterized proteins
[7] CABS-flex standalone: a simulation environment for fast modeling of protein flexibility
[8] Crystal structure of the 2019-nCoV spike receptor-binding domain bound with the ACE2 receptor. bioRxiv
[9] Interactive Tree Of Life (iTOL) v4: recent updates and new developments
[10] Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
[11] Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses
[12] EMBOSS: the European Molecular Biology Open Software Suite
[13] DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability
[14] Comparative protein modelling by satisfaction of spatial restraints
[15] Statistical potential for assessment and prediction of protein structures
[16] MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol
[17] Receptor recognition by novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS
[18] How host genetics dictates successful viral zoonosis
[19] HawkDock: a web server to predict and analyze the protein-protein complex based on computational docking and MM/GBSA. Nucleic Acids Res
[20] A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum
[21] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
[22] PRODIGY: a web server for predicting the binding affinity of protein-protein complexes
[23] Structural basis for the recognition of the 2019-nCoV by human ACE2. bioRxiv