key: cord-0749872-sxyeimtv authors: Calcagnile, Matteo; Forgez, Patricia; Iannelli, Antonio; Bucci, Cecilia; Alifano, Marco; Alifano, Pietro title: Molecular docking simulation reveals ACE2 polymorphisms that may increase the affinity of ACE2 with the SARS-CoV-2 Spike protein date: 2020-11-09 journal: Biochimie DOI: 10.1016/j.biochi.2020.11.004 sha: 0ea339548480ba625dd8ef356cd384c6fd82f9d5 doc_id: 749872 cord_uid: sxyeimtv

There is increasing evidence that ACE2 gene polymorphism can modulate the interaction between ACE2 and the SARS-CoV-2 Spike protein, affecting viral entry into the host cell, and/or contribute to lung and systemic damage in COVID-19. Here we used in silico molecular docking to predict the effects of ACE2 missense variants on the interaction with the Spike protein of SARS-CoV-2. HDOCK and FireDock simulations identified 6 ACE2 missense variants (I21T, A25T, K26R, E37K, T55A, E75G) with higher affinity for the SARS-CoV-2 Spike protein receptor binding domain (RBD) than wild type ACE2, and 11 variants (I21V, E23K, K26E, T27A, E35K, S43R, Y50F, N51D, N58H, K68E, M82I) with lower affinity. This result supports the hypothesis that the ACE2 genetic background may represent the first “genetic gateway” during disease progression.

Different phases can be distinguished during the progression of COVID-19 [1, 2]. During the first phase, after an incubation period lasting 6 days on average, the onset of disease may be characterized by mild to moderate influenza-like symptoms. The second phase, which is known as the pulmonary phase and involves ~30% of all SARS-CoV-2 infected subjects, is characterized by progressive respiratory involvement with onset of pneumonia-like symptoms.
The third phase, which develops in ~15% of all patients, is known as the pro-inflammatory phase, and is characterized by severe interstitial pneumonia with focal and systemic hyperinflammation, which may lead to acute respiratory distress syndrome and systemic inflammatory response syndrome. The fourth phase of COVID-19, which is known as the pro-thrombotic phase, develops in ~5% of patients, and is characterized by the onset of microvascular and macrovascular thrombosis, possibly promoted by strong focal and/or systemic inflammation. During this phase patients require medical treatment in intensive care units, and most of them do not survive.

SARS-CoV-2 infection susceptibility and severity seem to be influenced by environmental factors (climate, pollution, cultural, social and economic inequalities, health care system organization), co-morbidities (high blood pressure, cardiovascular disease, other heart and lung conditions, diabetes, cancer, or a compromised immune system), and inter-individual genetic differences [3-6]. Inter-individual genetic differences may affect the spatial transmission dynamics of COVID-19, the susceptibility to and severity of disease, and the inflammatory and immune response, and three "genetic gateways" have been proposed to account for disease progression [7]. Specifically, there is evidence that angiotensin-converting enzyme 2 (ACE2) is the human cell receptor of SARS-CoV-2 [8-10], and it has been speculated [5,7,11-17] that ACE2 gene polymorphism may modulate the interaction between ACE2 and the Spike protein of SARS-CoV-2 during virus entry into the host cell. In particular, differential affinity of a number of ACE2 missense variants for the Spike protein has been predicted using different computational approaches [12,18-21].
Moreover, since ACE2 regulates the renin-angiotensin-aldosterone system [22], ACE2 missense variants or expression quantitative trait loci (eQTL) variants may contribute to pulmonary and systemic injury by fostering vasoconstriction, inflammation, oxidation and fibrosis, thereby affecting the clinical outcome [4,11,15,23-25]. The possible association between specific ACE2 gene variants and COVID-19 susceptibility, severity, and clinical outcomes is supported by massive genomic data from the general population [26], while large-scale genome-wide association studies are urgently needed to firmly establish the causal link [27].

In this study we used in silico molecular docking to analyze the possible effects of ACE2 single nucleotide polymorphisms (SNPs) leading to missense variants on the interaction between ACE2 and the SARS-CoV-2 Spike protein. Molecular docking was performed with HDOCK, a pipeline for integrated protein-protein docking that is based on a hybrid algorithm combining template-based modeling and ab initio free docking to optimize the placement of the ligand [28-31]. The HDOCK pipeline differs from other molecular docking platforms in its ability to accept amino acid sequences as inputs, and in its hybrid docking strategy, in which experimental information on the protein-protein binding site and small-angle X-ray scattering data can be incorporated during the docking and post-docking processes [28]. Compared with the other pipelines that were previously used to model the interaction between the SARS-CoV-2 Spike protein and ACE2 missense variants [5,7,11,13-17], HDOCK has the advantage of integrating the two approaches in the same software; it is also simple to use and completely automated, which makes the results highly reproducible.

3D structures of proteins were downloaded from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (http://www.rcsb.org/) [32].
We focused our analysis on the structures of the SARS-CoV-2 Spike Receptor Binding Domain (RBD)/ACE2 complexes 6M17 (10.2210/pdb6M17/pdb) [33], 6LZG (10.2210/pdb6LZG/pdb) [34], and 6M0J (10.2210/pdb6M0J/pdb) [35]. The Single Nucleotide Polymorphism Database (dbSNP) [36, 37] was used to identify the ACE2 SNPs leading to missense variants. Functional information on ACE2 was obtained from the UniProt database (Q9BYF1, ACE2_HUMAN) [38]. ACE2 SNP frequencies were obtained from the GnomAD-Exomes database (https://gnomad.broadinstitute.org/). ACE2 bat sequences were downloaded from the NCBI database. Multiple alignments of human and bat sequences were carried out with Clustal Omega [39].

The HDOCK server (http://hdock.phys.hust.edu.cn/) was used to carry out molecular docking between the receptor binding domain (RBD) of the SARS-CoV-2 Spike protein and ACE2 wild type or missense variants from the dbSNP. We focused our analysis on the two ACE2 N-terminal alpha helices that form the major binding interface with the SARS-CoV-2 Spike protein RBD based on X-ray crystallography [33-35]. In our analysis we used, as a receptor, the amino acid sequence of ACE2 wild type or missense variants, and, as a ligand, the SARS-CoV-2 Spike protein RBD models (6LZG chain B, 6M0J chain E or 6M17 chain E) downloaded from the RCSB PDB database. 6M17 was the most complete structure because it contains the ACE2 collectrin-like domain [33]. Although this domain is far from the binding interface, it could still affect the geometry of the protein. Since HDOCK only provides a score for ab initio free docking, we used FireDock (http://bioinfo3d.cs.tau.ac.il/FireDock/) [40, 41] to compare the complex scores obtained by ab initio free docking and template-based modeling.

Table 1. ACE2 SNPs analyzed in this study, their frequencies, and Global Energy Score (GES, kcal/mol) of the interaction between wild type ACE2 or ACE2 missense variants and SARS-CoV-2 Spike protein.
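As an illustration of how the variant receptor sequences used as docking inputs could be prepared, the following minimal Python sketch applies a missense substitution written in the usual notation (for example K26R) to an amino acid sequence. The function name `apply_missense` and the toy peptide are hypothetical and are not part of the HDOCK pipeline; in practice the wild type sequence would be the UniProt entry Q9BYF1, and the check on the reference residue guards against off-by-one numbering errors.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def apply_missense(sequence: str, variant: str) -> str:
    """Return `sequence` with the missense `variant` (e.g. "K26R") applied.

    Positions are 1-based, as in dbSNP/UniProt variant notation.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"Cannot parse variant: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        raise ValueError(f"Unknown residue code in {variant!r}")
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"Position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"Expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    # Replace exactly one residue; the sequence length is unchanged.
    return sequence[: pos - 1] + alt + sequence[pos:]


# Toy peptide (NOT the ACE2 sequence): lysine at position 26, alanine elsewhere.
toy_wild_type = "A" * 25 + "K" + "A" * 10
toy_k26r = apply_missense(toy_wild_type, "K26R")
```

Each variant sequence produced this way would then be submitted as the receptor, with the RBD chain from one of the PDB models as the ligand.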
Results of HDOCK/FireDock simulations were confirmed by submitting HDOCK-generated ACE2/Spike protein RBD complexes to PRODIGY [42]. Furthermore, wild type ACE2 and K26R ACE2 models were also built using MODELLER 9.25 [43] via Chimera [44], and these models were used as receptors in SwarmDock simulations [45]. The HDOCK/FireDock pipeline was also used to evaluate the impact of SARS-CoV-2 Spike protein RBD variants on binding to wild type or K26R ACE2. QMEANDisCo (SwissDock) [46] and MolProbity [47] were used to assess model quality, including bad bond and bad angle metrics. A detailed protocol of the computational workflow is depicted in Supplementary Fig. S1.

Two ACE2 N-terminal alpha helices form the major binding interface with the SARS-CoV-2 Spike protein RBD based on X-ray crystallography [33-35]. In this region, 25 SNPs leading to ACE2 missense variants are listed in the dbSNP. HDOCK and FireDock pipelines were used for molecular docking. For each ACE2 missense variant, three docking simulations were carried out, each with a different PDB model (6M17, 6LZG, 6M0J), and the results obtained with the two methods (template-based modeling and ab initio free docking) were analyzed separately. Before proceeding with the simulations, the quality of the models generated by HDOCK was analyzed and compared with the quality of the corresponding models generated by MODELLER. The analysis with QMEANDisCo demonstrated good quality of all models, with global scores similar to those of the control PDB models (6LZG, 6M0J and 6M17). Moreover, the percentage of bad angles according to MolProbity was slightly lower with HDOCK (wild type ACE2 = 0.8%; K26R ACE2 = 0.77%) than with MODELLER (1.24% for both wild type and K26R ACE2) (Supplementary Fig. S2). Overall, HDOCK/FireDock results with the different PDB models and methods were concordant in 92% of cases (Table 1; Supplementary Table S1; Fig. S3).
For only two polymorphisms (N58K and M62V), the template-based method produced results that differed from those produced by ab initio docking. We performed 156 docking simulations (i.e., 26 ACE2 SNPs, S19AQP ACE2, wild type ACE2, all multiplied by three ligands and two methods). The average global energy score (GES) over all simulations was -47.20 kcal/mol (Fig. 1A, gray line), the total standard deviation was 6.39 kcal/mol and the confidence interval was ±1.0035 kcal/mol (Fig. 1A, dotted line). The highest GES was -37.94 kcal/mol (M82I), while the lowest was -56.24 kcal/mol (T55A). We used the total GES average, the GES value with wild type ACE2 (Fig. 1A, red line), and the confidence interval as a threshold to screen the SNPs, considering as relevant only the SNPs that significantly affected binding to the SARS-CoV-2 Spike protein RBD. Using this approach, we found 6 of the 25 ACE2 missense variants (24%) (I21T, A25T, K26R, E37K, T55A, E75G) that showed higher affinity for the SARS-CoV-2 Spike protein RBD than wild type ACE2, and 11 variants (44%) (I21V, E23K, K26E, T27A, E35K, S43R, Y50F, N51D, N58H, K68E, M82I) that exhibited lower affinity in silico (Fig. 1A).

The GnomAD-Exomes database was used to obtain information about the worldwide frequencies of the examined ACE2 SNPs (Table 1; Supplementary Table S2). K26R is the most widespread, with a global frequency of 0.3971%. The wide distribution of this SNP is also confirmed by other databases: 0.4579% in TOPMED; 0.595% in 4ALFA Project; 0.368% in ExAC; 0.315% in GnomAD; 0.511% in GO-ESP; 0.21% in 1000G; 0.62% in TWINSUK; 0.93% in ALSPAC. In particular, K26R occurs with the highest frequency in European (0.503%) and American (0.329%) populations, with a maximum value in Ashkenazi Jewish (1.2%), while it is less common in both African (0.099%) and Asian (0.079%) populations.
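The screening threshold reported earlier in this section can be reproduced approximately with a short calculation. The sketch below assumes that the confidence interval is a 95% normal-approximation interval for the mean, z × SD / √n, which gives a value close to the reported ±1.0035 kcal/mol for SD = 6.39 and n = 156. The classification rule (a variant is flagged when its GES differs from a reference GES by more than the half-width) and the use of the overall mean as the reference are assumptions made for illustration, since the exact rule and the wild type GES are given only in Fig. 1A and Table 1; the function names are hypothetical.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z * sd / math.sqrt(n)


def classify(ges_variant: float, ges_reference: float, half_width: float) -> str:
    """Flag a variant by comparing its GES with a reference GES.

    A more negative GES is read as higher predicted affinity (assumed rule).
    """
    if ges_variant < ges_reference - half_width:
        return "higher affinity"
    if ges_variant > ges_reference + half_width:
        return "lower affinity"
    return "not significant"


# Values reported in the text (kcal/mol).
mean_ges = -47.20
half_width = ci_half_width(sd=6.39, n=156)

# Overall mean used as a stand-in reference; the wild type GES is in Table 1.
t55a_call = classify(-56.24, mean_ges, half_width)
m82i_call = classify(-37.94, mean_ges, half_width)
```

Under these assumptions T55A, which has the lowest GES, falls well below the lower bound, and M82I, which has the highest GES, falls well above the upper bound.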
Cumulative frequency analysis of ACE2 SNPs demonstrated that ACE2 missense variants exhibiting increased affinity for the SARS-CoV-2 Spike protein were more common in European and American populations (Fig. 1B), while those exhibiting reduced affinity were more common in African and Asian populations (Fig. 1C). The frequencies of the individual ACE2 missense variants are plotted in Supplementary Fig. S4.

A number of missense variants affecting the SARS-CoV-2 Spike protein have recently been identified worldwide and listed in a comprehensive database [48]. In particular, some of these variants, including N439K, L455F, F456L, A475V, Q493R, Q493L and N501Y, fall within the binding interface of the RBD. The impact of these RBD variants on binding to wild type ACE2 or K26R ACE2 was then evaluated. We focused on the K26R missense variant because of its high frequency in the general population. As a preliminary step, PRODIGY was used to confirm the effect of the ACE2 K26R missense variant on wild type Spike protein RBD binding as predicted by FireDock with HDOCK complexes. PRODIGY calculated dissociation constants (Kd) of 8.8E-10 for K26R ACE2 and 4.610E-9 for wild type ACE2. The effect of the K26R missense variant was further confirmed using models that were generated by MODELLER and then submitted to SwarmDock, obtaining an energy of -39.88 kcal/mol for wild type ACE2 and of -46.13 kcal/mol for K26R ACE2. HDOCK/FireDock analysis was performed using the wild type or K26R ACE2 receptor and the missense variants of the Spike protein RBD listed above. Results demonstrated that 5 of the 7 RBD mutations increased binding affinity for wild type ACE2 (Fig. 1D), while 5 of the 7 RBD mutations decreased the binding affinity for K26R ACE2 compared to wild type RBD (Fig. 1E).
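To put the two PRODIGY dissociation constants on an energy scale, they can be converted to binding free energies with the standard relation ΔG = RT ln(Kd). The sketch below assumes that the Kd values are expressed in mol/L and that the temperature is 25 °C; neither is stated in the text, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def kd_to_delta_g(kd_molar: float, temp_celsius: float = 25.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


# Dissociation constants reported in the text.
dg_k26r = kd_to_delta_g(8.8e-10)
dg_wild_type = kd_to_delta_g(4.610e-9)

# Ratio of the two constants: how many times tighter K26R is predicted to bind.
fold_change = 4.610e-9 / 8.8e-10
```

Under these assumptions the two constants correspond to roughly -12.4 kcal/mol for K26R ACE2 and -11.4 kcal/mol for wild type ACE2, a difference of about 1 kcal/mol, or an approximately five-fold tighter predicted binding for K26R.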
The present study supports the hypothesis that ACE2 gene polymorphism may contribute to the genetic susceptibility to COVID-19 by affecting SARS-CoV-2 entry into host cells, thus representing the first "genetic gateway" during disease progression [7]. Our results broaden the list of ACE2 missense variants that can affect the interaction with the SARS-CoV-2 Spike protein [7, 18, 19, 21]. Specifically, we focused our attention on ACE2 SNPs affecting the two N-terminal alpha helices that form the major binding interface with the SARS-CoV-2 Spike protein [33-35]. We did not include the S19P variant in our analysis because it falls into the cleavage site of the ACE2 precursor, and it may affect the N-terminal sequence of the mature protein. Besides, there is evidence that S19P may reduce the affinity for the SARS-CoV-2 Spike protein [19]. The result for K26R, which is predicted to increase the affinity for the Spike protein, is noteworthy because this variant is relatively frequent in Europeans, with a frequency of about 0.5%, which would correspond to a potential target population of 2,230,000 people at the European Union level [12]. In this study we confirmed the K26R result using HDOCK, which also allowed us to identify additional missense variants (I21T, A25T, E37K, T55A, E75G) with higher affinity for the SARS-CoV-2 Spike protein, and 11 variants (I21V, E23K, K26E, T27A, E35K, S43R, Y50F, N51D, N58H, K68E, M82I) with lower affinity. It is worth noting that the K26R variant of ACE2 was identified in a COVID-19 patient but not in control subjects in Italy, in a recent genome-wide association study enrolling a cohort of 131 patients and 258 controls [27], further reinforcing the hypothesis that this missense variant may be associated with clinical susceptibility to disease. It may also be relevant to note that this substitution (arginine at position 26) is rather common in different families of bats, including Vespertilionidae and Phyllostomidae (Supplementary Fig. S5).
Specifically, Phyllostomidae are widespread in South America (Desmodus rotundus XP_024425698.1, Phyllostomus discolor XP_028378317.1), while Vespertilionidae are very common in China (Pipistrellus abramus ACT66266.1) and Indochina (Kerivoula pellucida QJF77795.1), and the presence of P. abramus was confirmed in the Wuhan area [49].

It is conceivable that the polymorphisms responsible for a higher affinity may be responsible for a greater severity of the disease in humans, especially when very high affinity receptors are overexpressed due to environmental and pharmacological factors. Of course, underlying diseases would contribute to an even more severe course of the disease, with an intense viral replication capable of infecting in turn a large number of persons, including some individuals with similar ACE2 polymorphisms, and so on. Another aspect to consider is the co-evolution of the Spike protein. Indeed, missense mutations in the Spike RBD may have conflicting effects on binding affinity for wild type and K26R ACE2 (Fig. 1D and E). Polymorphisms in genes coding for proteases from the respiratory tract belonging to the transmembrane protease/serine subfamily (TMPRSS) may also contribute to inter-individual differences in susceptibility to and severity of disease [26, 50]. Indeed, there is evidence that TMPRSS proteolytic activity induces SARS-CoV Spike protein fusogenic activity, and, notably, SARS-CoV-2 cell entry is dependent on TMPRSS2 and blocked by protease inhibitors [51]. Obviously, the impact of these polymorphisms on severity of outcome should be weighted by appropriate demographic and clinical factors. If this difference were confirmed, it would pave the way for the identification, on a population scale, of healthy individuals whose molecular phenotypes would predispose them to more serious disease. Apart from the usual social distancing measures, targeted drug prevention strategies could be evaluated.
It could be logical to assess pharmacological prophylactic interventions, as proposed for categories of healthy people at particular risk of exposure, such as caregivers. The serine protease inhibitor camostat mesylate, approved in Japan to treat unrelated diseases, has been shown to block TMPRSS2 activity [52, 53], and is thus an interesting candidate. Conversely, the identification of broader categories of people with a lower risk of developing severe disease could allow a safer exit from the lock-down phases, while facilitating the faster establishment of herd immunity, pending reliable serological tests and an effective vaccine.

Authors' contribution
M.C. contributed to experimental set-up, pipeline development, and in silico analysis; P.F., A.I., and C.B. contributed to study design and data provision; M.A. and P.A. contributed to coordination, conception, design and writing. M.A. and P.A. contributed equally to the work. All authors critically revised draft versions of the manuscript and approved the final version.

The authors declare no conflict of interest.
[1] Natural history of COVID-19 and current knowledge on treatment therapeutic options
[2] COVID-19: unravelling the clinical progression of nature's virtually perfect biological weapon
[3] Assessing Risk Factors for Severe COVID-19 Illness
[4] Renin-angiotensin system at the heart of COVID-19 pandemic
[5] COVID-19 pandemic: a European perspective on health economic policies
[6] Obesity and COVID-19: ACE 2, the missing tile
[7] Genetic gateways to COVID-19 infection: implications for risk, severity, and outcomes
[8] Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
[9] Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
[10] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
[11] The two faces of ACE2: the role of ACE2 receptor and its polymorphisms in hypertension and COVID-19
[12] ACE2 Polymorphisms and Individual Susceptibility to SARS-CoV-2 Infection: Insights from an in Silico Study, bioRxiv
[13] Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations
[14] The expression and polymorphism of entry machinery for COVID-19 in human: juxtaposing population groups, gender, and different tissues
[15] ACE2 receptor polymorphism: susceptibility to SARS-CoV-2, hypertension, multi-organ failure, and COVID-19 disease outcome
[16] COVID-19 vulnerability: the potential impact of genetic susceptibility and airborne transmission
[17] Do genetic polymorphisms in angiotensin converting enzyme 2 (ACE2) gene play a role in coronavirus disease 2019 (COVID-19)?
[18] Studying the effects of ACE2 mutations on the stability, dynamics, and dissociation process of SARS-CoV-2 S1/hACE2 complexes
[19] Structural variations in human ACE2 may influence its binding with SARS-CoV-2 spike protein
[20] Human ACE2 Receptor Polymorphisms Predict SARS-CoV
[21] Interaction of the spike protein RBD from SARS-CoV-2 with ACE2: similarity with SARS-CoV, hot-spot analysis and effect of the receptor polymorphism
[22] COVID-19 and the cardiovascular system
[23] COVID-19 and individual genetic susceptibility/receptivity: role of ACE1/ACE2 genes, immunity, inflammation and coagulation. Might the double X-chromosome in females be protective against SARS-CoV-2 compared to the single X-chromosome in males?
[24] Genetic associations with plasma angiotensin converting enzyme 2 concentration: potential relevance to COVID-19 risk
[25] Analysis of ACE2 genetic variability among populations highlights a possible link with COVID-19-related neurological complications
[26] New insights into genetic susceptibility of COVID-19: an ACE2 and TMPRSS2 polymorphism analysis
[27] ACE2 gene variants may underlie interindividual variability and susceptibility to COVID-19 in the Italian population
[28] The HDOCK server for integrated protein-protein docking
[29] Addressing recent docking challenges: a hybrid strategy to integrate template-based and free protein-protein docking
[30] HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy
[31] An iterative knowledge-based scoring function for protein-protein recognition
[32] The Protein Data Bank
[33] Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
[34] Structural and functional basis of SARS-CoV-2 entry by using human ACE2
[35] Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
[36] Sirotkin, dbSNP: the NCBI database of genetic variation
[37] Searching NCBI's dbSNP database
[38] UniProt: the universal protein knowledgebase
[39] Clustal Omega, accurate alignment of very large numbers of sequences
[40] FireDock: fast interaction refinement in molecular docking
[41] FireDock: a web server for fast interaction refinement in molecular docking
[42] Prodigy: A web server for predicting the binding affinity of protein-protein complexes
[43] Comparative protein structure modeling using MODELLER
[44] UCSF Chimera - a visualization system for exploratory research and analysis
[45] A server for flexible protein-protein docking
[46] QMEANDisCo - distance constraints applied on model quality estimation
[47] MolProbity: all-atom contacts and structure validation for proteins and nucleic acids
[48] Comprehensive annotations of the mutational spectra of SARS-CoV-2 spike protein: a fast and accurate pipeline
[49] Distribution and preference of landscape features and foraging sites of insectivorous bats in Hong Kong urban parks
[50] Variability in genes related to SARS-CoV-2 entry into host cells (ACE2, TMPRSS2, TMPRSS11A, ELANE, and CTSL) and its potential use in association studies
[51] SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
[52] Simultaneous treatment of human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute respiratory syndrome coronavirus entry
[53] Protease inhibitors targeting coronavirus and filovirus entry

We wish to thank Prof. Diane Damotte (University of Paris) for advice and critical reading of the manuscript.

Supplementary data related to this article can be found at https://doi.org/10.1016/j.biochi.2020.11.004.