key: cord-0829144-0h1c0wkh authors: Salpini, Romina; Alkhatib, Mohammad; Costa, Giosuè; Piermatteo, Lorenzo; Ambrosio, Francesca Alessandra; Di Maio, Velia Chiara; Scutari, Rossana; Duca, Leonardo; Berno, Giulia; Fabeni, Lavinia; Alcaro, Stefano; Ceccherini-Silberstein, Francesca; Artese, Anna; Svicher, Valentina title: Key genetic elements, single and in clusters, underlying geographically dependent SARS-CoV-2 genetic adaptation and their impact on binding affinity for drugs and immune control date: 2020-11-30 journal: J Antimicrob Chemother DOI: 10.1093/jac/dkaa444 sha: 31b855db932845e2bcb7d1cdbd6a7d501d6569b4 doc_id: 829144 cord_uid: 0h1c0wkh OBJECTIVES: To define key genetic elements, single or in clusters, underlying SARS-CoV-2 (severe acute respiratory syndrome coronavirus-2) evolutionary diversification across continents, and their impact on drug-binding affinity and viral antigenicity. METHODS: A total of 12 150 SARS-CoV-2 sequences (publicly available) from 69 countries were analysed. Mutational clusters were assessed by hierarchical clustering. Structure-based virtual screening (SBVS) was used to select the best inhibitors of 3-chymotrypsin-like protease (3CL-Pr) and RNA-dependent RNA polymerase (RdRp) among the FDA-approved drugs and to evaluate the impact of mutations on binding affinity of these drugs. The impact of mutations on epitope recognition was predicted following Grifoni et al. (Cell Host Microbe 2020; 27: 671–80.) RESULTS: Thirty-five key mutations were identified (prevalence: ≥0.5%), residing in different viral proteins. Sixteen out of 35 formed tight clusters involving multiple SARS-CoV-2 proteins, highlighting intergenic co-evolution. Some clusters (including D614G(Spike) + P323L(RdRp) + R203K(N) + G204R(N)) occurred in all continents, while others showed a geographically restricted circulation (T1198K(PL-Pr) + P13L(N) + A97V(RdRp) in Asia, L84S(ORF-8) + S197L(N) in Europe, Y541C(Hel) + H504C(Hel) + L84S(ORF-8) in America and Oceania). 
SBVS identified 20 best RdRp inhibitors and 21 best 3CL-Pr inhibitors belonging to different drug classes. Notably, mutations in RdRp or 3CL-Pr modulate, positively or negatively, the binding affinity of these drugs. Among them, P323L(RdRp) (prevalence: 61.9%) reduced the binding affinity of specific compounds including remdesivir while it increased the binding affinity of the purine analogues penciclovir and tenofovir, suggesting potential hypersusceptibility. Finally, specific mutations (including Y541C(Hel) + H504C(Hel)) strongly hampered recognition of Class I/II epitopes, while D614G(Spike) profoundly altered the structural stability of a recently identified B cell epitope target of neutralizing antibodies (amino acids 592–620). CONCLUSIONS: Key genetic elements reflect geographically dependent SARS-CoV-2 genetic adaptation, and may play a potential role in modulating drug susceptibility and hampering viral antigenicity. Thus, a close monitoring of SARS-CoV-2 mutational patterns is crucial to ensure the effectiveness of treatments and vaccines worldwide. The new coronavirus, termed SARS-CoV-2 (severe acute respiratory syndrome coronavirus-2), emerged in China at the end of 2019. 1, 2 Afterwards, SARS-CoV-2 was declared a pandemic and has been responsible for over 16 million cases with >650 000 deaths (https://www.gisaid.org/, updated 29 July 2020), causing a global health emergency of inconceivable magnitude. 2, 3 SARS-CoV-2 is an enveloped positive-sense RNA virus characterized by a genome encoding four structural proteins, 16 nonstructural proteins (NSPs) and other regulatory proteins. The four structural proteins are: the envelope (E), spike (S), membrane (M) and nucleocapsid (N) protein. 
The 16 NSPs include the 3-chymotrypsin-like protease (3CL-Pr), the papain-like protease (PL-Pr), the replication complex comprising the RNA-dependent RNA polymerase (RdRp), the helicase (Hel), the 3′,5′-exonuclease and other NSPs involved in the different steps of viral replication. 4 So far, 3CL-Pr and RdRp have been explored as the main drug targets for therapeutic approaches against SARS-CoV-2 infection. 5 Preliminary studies suggest that SARS-CoV-2 is evolving during its spread worldwide and that its genome is accumulating new variations with respect to the SARS-CoV-2 strains that originated in China. 6, 7 Nevertheless, the mutational profiles underlying SARS-CoV-2 genetic diversification across geographical areas, and their functional characterization, have not been extensively addressed. Furthermore, given the urgency of the SARS-CoV-2 outbreak, there has been considerable interest in repurposing existing drugs approved for treating other infections or for other medical indications. 8 Nevertheless, no information is available on the role of SARS-CoV-2 mutations in affecting, positively or negatively, the binding affinity of these drug candidates. Understanding this issue can provide important information for the development of effective antiviral agents and universal vaccines, as well as for the design of accurate diagnostic assays, thus representing a crucial aspect to consider in ongoing public health measures to contain infection worldwide. In this light, by analysing one of the largest sets of SARS-CoV-2 sequences, this study aimed to define key genetic elements, single or in clusters, underlying the evolutionary diversification of SARS-CoV-2 across continents, and their impact on protein structural stability by molecular dynamics simulations, on binding affinity of drug candidates by docking analysis and on epitope recognition by in silico prediction models.
A total of 12 150 high-quality and nearly complete SARS-CoV-2 genomic sequences were retrieved from https://www.gisaid.org/ (see Supplementary Information available as Supplementary data at JAC Online). Sequences were obtained from samples collected between 24 December 2019 and 20 April 2020, and cover 69 countries with the following geographic distribution: Europe (N = 6680), America (N = 3274), Oceania (N = 1321), Asia (N = 777) and Africa (N = 98). The quality filters for sequence inclusion are reported in the Supplementary Material. Sequences were aligned in BioEdit using the NC_045512.2 SARS-CoV-2-Wuhan-Hu-1 isolate as the reference sequence. The amino acid variability in viral proteins (S, M, N, E, 3CL-Pr, PL-Pr, RdRp, Hel, NSP-14, NSP-7, NSP-8 and ORF-8) was evaluated by estimating the mean evolutionary divergence (ED) compared with the reference NC_045512.2 using the Poisson correction included in the MEGA X software. The Shannon entropy was calculated to measure the extent of amino acid variability at each position of SARS-CoV-2 protein sequences using the formula $S_n = -\sum_i (p_i \ln p_i)/\ln N$, where $p_i$ was the frequency of each amino acid and $N$ was the total number of sequences analysed. For each protein, we assessed the number of amino acid positions with a variability ≥0.5%. SARS-CoV-2 mutations were defined according to the sequence of each specific protein using the NC_045512.2 SARS-CoV-2-Wuhan-Hu-1 isolate as the reference sequence. The prevalence of mutations was calculated in the overall population and according to the continent of sequence isolation. Statistically significant differences in the prevalence of mutations between Asia (continent of origin for the epidemic) and the other continents were assessed by Fisher's exact test and corrected for multiple testing by the Benjamini-Hochberg method (false discovery rate = 0.05).
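As a minimal sketch of the per-position variability measure defined above, the following Python computes the normalised Shannon entropy of one alignment column and lists the positions at which non-reference residues reach a given frequency threshold. The function names and the toy sequences are hypothetical and are not taken from the study; following the text, the entropy is normalised by the logarithm of the number of sequences.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Normalised Shannon entropy of one alignment column.

    Each term (c/n) * ln(n/c) equals -p * ln(p) with p = c/n; the sum is
    divided by ln(N), with N the number of sequences, as in the text.
    """
    n = len(column)
    if n <= 1:
        return 0.0
    counts = Counter(column)
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    return h / math.log(n)


def variable_positions(sequences, reference, threshold=0.005):
    """1-based positions where non-reference residues reach the threshold."""
    n = len(sequences)
    hits = []
    for i, ref_residue in enumerate(reference):
        mutated = sum(1 for seq in sequences if seq[i] != ref_residue)
        if mutated / n >= threshold:
            hits.append(i + 1)
    return hits


# Hypothetical toy alignment (equal-length, already aligned sequences).
reference = "MKV"
alignment = ["MKV", "MKV", "MRV", "MKV"]
for pos in range(len(reference)):
    column = [seq[pos] for seq in alignment]
    print(pos + 1, round(shannon_entropy(column), 3))
print(variable_positions(alignment, reference, threshold=0.25))
```

With real data the default threshold of 0.005 corresponds to the 0.5% cut-off used in the text; the toy example raises it only because it contains four sequences.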
Statistically significant pairs of mutations were investigated by calculating the binomial correlation coefficient (phi) for the simultaneous presence of mutations at two positions in the same isolate, while clusters of mutations were identified by average linkage hierarchical agglomerative clustering, as described elsewhere 9 and in the Supplementary Material. Phylogenetic trees were inferred with MEGA X 10 by maximum likelihood under the Jukes-Cantor model. 11 Mutations were localized in SARS-CoV-2 major histocompatibility complex (MHC) Class I/II T cell epitopes and B cell epitopes defined by Grifoni et al. (2020). 12 The impact of each mutation in altering the binding affinity between the epitopes and the MHC molecules was estimated in silico through the Immune Epitope Database and Analysis Resource (IEDB), by following the approach recently used in Grifoni et al. (2020) 12 and described in the Supplementary Material. Molecular dynamics simulations (described in the Supplementary Material) were performed to assess the impact of mutations on the stability of RdRp, Hel, 3CL-Pr and S proteins, using the available crystallographic models. The molecular recognition studies of 3CL-Pr and RdRp were carried out by structure-based virtual screening techniques using the DrugBank database as library (details in the Supplementary Material). The amino acid ED from the SARS-CoV-2 reference sequence (NC_045512.2) varied according to the proteins analysed and never exceeded a mean value of 0.0028 amino acid substitutions per site, indicating limited genetic divergence. The highest ED was observed in the N protein and in the ORF-8-encoded regulatory protein (mean ED ± SD: 0.0021 ± 0.0011 and 0.0028 ± 0.0020 amino acid substitutions per site, respectively) (Table 1). Notably, in the N protein, the highest ED occurred in the Ser/Arg-rich motif, crucial to mediate the interaction of the nucleocapsid with viral and cellular proteins.
13-15 Despite limited ED, Shannon entropy analysis revealed the presence of key amino acid mutations (with a frequency ≥0.5%) at 35 hot-spot positions (Table 1 and Figure 1). Mutations detected with the highest frequency were D614G Spike (61.2%) and P323L RdRp (61.8%), followed by R203K N and G204R N (17.2% and 17.2%, respectively), L84S ORF-8 (13%), and P504L Hel and Y541C Hel (8.1% and 8.2%, respectively) (Figure 1). The remaining mutations occurred with a prevalence <5% (Figure 1). Among them, S193I N, S194L N, S197L N and S202N N (along with R203K N and G204R N) resided in the Ser/Arg-rich motif, corroborating the selective pressure acting on this domain (Figure 1). Furthermore, all the mutations observed in 3CL-Pr and PL-Pr showed a prevalence <5% (Figure 1). A dramatic increase (up to 60%) in the prevalence of D614G Spike and P323L RdRp was observed in all continents compared with Asia (adjusted P ≤ 0.01 for all comparisons), indicating that these mutations are emerging as the major circulating viral variants despite a limited initial circulation in Asia (Figure 2). Furthermore, specific mutations circulate with higher prevalence in Europe than in other geographical regions. Among them, T175M M, S193I N and K90R 3CL-Pr showed a prevalence of 4.8%, 3.5% and 1.8%, respectively, in Europe while being absent or nearly absent in other continents (prevalence <0.9% for T175M M, <0.1% for S193I N and 0.3% for K90R 3CL-Pr, adjusted P ≤ 0.05 for all comparisons). A similar scenario was observed for R203K N and G204R N, showing a remarkable increase in their circulation in Europe and Oceania (adjusted P < 0.05) compared with Asia and America (Figure 2). Similarly, G15S 3CL-Pr showed a higher prevalence in Europe and Africa compared with the other geographic areas (adjusted P < 0.05) (Figure 2). Interestingly, the genetic diversification in America mainly involved proteins acting as cofactors of viral polymerase.
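The adjusted P values quoted in these comparisons come from Fisher's exact test followed by Benjamini-Hochberg correction, as stated in the Methods. The sketch below illustrates that two-step procedure in plain Python. It is not the authors' code: the function names are hypothetical, and the counts are invented solely to exercise the functions and do not reproduce any figure from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (same margins)
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: (mutated, wild type) in Asia versus another continent.
comparisons = {
    "mutation_1": ((5, 95), (60, 40)),
    "mutation_2": ((10, 90), (12, 88)),
    "mutation_3": ((2, 98), (15, 85)),
}
raw = [fisher_exact_two_sided(*asia, *other) for asia, other in comparisons.values()]
for name, p_adj in zip(comparisons, benjamini_hochberg(raw)):
    print(name, "significant" if p_adj < 0.05 else "not significant")
```

The exact-enumeration approach is written for clarity; for tables with thousands of sequences per continent a statistics library would be the more practical choice.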
In particular, P504L Hel and Y541C Hel circulate predominantly in America (prevalence: 26.5% and 27%, respectively) while being rarely detected in Asia and Europe (prevalence <0.3%) and never in Africa (adjusted P < 0.05 for each comparison) (Figure 2). Similarly, S25L NSP-7 and A320V NSP-14 circulate in America with a prevalence of 5.1% and 4.3%, while their prevalence never exceeded 1.1% and 0.3%, respectively, in other geographic areas. A peculiar situation was observed in Oceania, characterized by an increased circulation of a large variety of specific mutations throughout the different viral proteins, rarely found in Asia. Among them, G282V PL-Pr was the only mutation detected exclusively in Oceania (prevalence: 5.8%) (Figure 2). Covariation analysis identified several statistically significant pairs involving mutations localized in different viral proteins, highlighting a process of intergenic co-evolution. In particular, among the above-mentioned 35 SARS-CoV-2 mutations, 16 were tightly associated with each other. Specific pairs of mutations were detected in all continents, although with a diverse frequency of circulation (Table 2). This is the case of D614G Spike + P323L RdRp (phi from 0.7 to 0.9 in the different continents), predominantly circulating in Europe (71.9%) and Africa (83.3%), followed by America (54.6%) and Oceania (46.6%).
Table footnotes: A position was defined as variable if amino acid substitutions were detected with a frequency ≥0.5%. Shannon entropy was used to measure the amino acid variability at each position. The minimum and maximum entropy values are reported.
Figure legend (fragment): [...] and in 3CL-protease, PL-protease and ORF-8-encoded protein (c) across continents. Statistically significant differences were calculated by χ² test comparing each continent with Asia. The Benjamini-Hochberg method was used for correction of multiple comparisons. *P < 0.05, **P < 0.01. G282V PL-Pr, although with an overall prevalence of <1%, was reported due to its peculiar geographic distribution.
Other specific pairs of mutations showed a preferential circulation in specific geographic areas. In particular, P504L Hel + Y541C Hel circulate in America and Oceania (phi = 0.9 for both) with a prevalence of 26.7% and 8.6%, respectively (Table 2). Similarly, P13L N + A97V RdRp and P13L N + T1198K PL-Pr were detected in Asia and Oceania with a prevalence ranging from 3.2% to 3.9% and from 3.2% to 7.3%, respectively (phi = 0.9 and 0.6 for P13L N + A97V RdRp and phi = 0.8 and 0.5 for P13L N + T1198K PL-Pr) (Table 2). Again, a peculiar scenario was observed in Oceania, characterized by the circulation of multiple pairs of mutations occurring solely in this continent: L84S ORF-8 + P13L N (phi = 0.4), P153L PL-Pr + F233L NSP-14 (phi = 1.0) and P13L N + S197L N (phi = 0.8). The 3CL-Pr was the only viral protein whose mutations (G15S, K90R and A266V) were not involved in statistically significant pairs. By hierarchical clustering analysis, the pair D614G Spike + P323L RdRp was linked to R203K N + G204R N, forming a tight cluster that was detected in all continents, with the highest prevalence in Oceania (13.2%), followed by Europe (6.1%), Asia (5.7%) and America (3.9%) (bootstrap = 1.0 for all continents) (Figure 3). In Europe, this cluster D614G Spike + P323L RdRp + R203K N + G204R N was accompanied by the continent-specific mutation T175M M (bootstrap = 1.0) (Figure 3). Furthermore, the cluster made up of P504L Hel + Y541C Hel + L84S ORF-8 was detected in America and in Oceania (bootstrap = 1.0 and 0.97, respectively) with a prevalence of 26.7% and 8.2%, respectively. Finally, two continent-specific clusters were identified: T1198K PL-Pr + P13L N + A97V RdRp in Asia (bootstrap = 1.0) and P153L 3CL-Pr + V62L ORF-8 + F233L NSP-14 in Oceania (bootstrap = 1.0) (Figure 3).
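To make the covariation step more concrete, the sketch below computes the phi coefficient for two presence/absence profiles and then groups mutations by average linkage agglomerative clustering on the distance 1 - phi. It is an illustration under stated assumptions rather than the pipeline used in the study: the function names, the distance definition, the cut-off and the toy profiles are all hypothetical, and the bootstrap support reported in the text is not reproduced.

```python
import math
from itertools import combinations


def phi(x, y):
    """Phi coefficient between two binary (0/1) profiles of equal length."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = len(x) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def average_linkage(profiles, cutoff):
    """Agglomerative clustering; stops when the closest pair exceeds cutoff.

    profiles maps a mutation name to its 0/1 profile across isolates.
    The distance between two mutations is assumed to be 1 - phi.
    """
    names = list(profiles)
    dist = {frozenset(pair): 1.0 - phi(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def avg(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        if avg(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical presence/absence of four mutations in eight isolates.
toy = {
    "mut_A": [1, 1, 1, 1, 0, 0, 0, 0],
    "mut_B": [1, 1, 1, 1, 0, 0, 0, 0],
    "mut_C": [0, 0, 0, 0, 1, 1, 0, 0],
    "mut_D": [0, 0, 0, 0, 1, 1, 0, 0],
}
print(average_linkage(toy, cutoff=0.5))
```

In this toy case the two perfectly co-occurring pairs end up in separate clusters because their profiles are negatively correlated with each other.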
No continent-specific clusters of mutations were found in Africa, presumably due to the limited number of sequences available. Phylogenetic analysis confirmed the geographically dependent clustering of mutations (Figure S1).
Predicted impact of the mutations on SARS-CoV-2 recognition by T cell- and B cell-mediated immune response
Among the 35 above-mentioned mutations, 16 resided in Class I/II-restricted T cell epitopes (Table S1). Remarkably, by in silico prediction, 12/16 mutations reduced the binding affinity for specific human leukocyte antigens (HLAs) compared with the WT epitope (Table 3). Importantly, a drastic drop in the binding affinity was observed for P1263L Spike (score for HLA-B*07:02 of the WT versus the mutated epitope: 0.649 versus 0.001), P504L Hel (score for HLA-B*07:02 of the WT versus the mutated epitope: 0.725 versus 0.001) and Y541C Hel (score for HLA-A*01:01 of the WT versus the mutated epitope: 0.976 versus 0.008) (Table 3). This suggests a process of antigenic drift favouring SARS-CoV-2 escape from T cell-mediated immune responses. Furthermore, nine mutations were localized in B cell epitopes: two in the S protein, including D614G Spike, and seven in the N protein (Table S1). Notably, six out of seven N mutations were mapped within the same putative B cell epitope spanning positions 177-215.
Structural analysis was focused on mutations with a prevalence >1% localized in 3CL-Pr, RdRp, Hel and S (the only SARS-CoV-2 proteins whose crystallographic models are available). Most of the analysed mutations determined minimal changes in the stability of the entire proteins compared with the WT (Figure S2a-d). The only exception was K90R 3CL-Pr, which was associated with an increased stability of the entire 3CL-Pr compared with the WT (RMSD WT = 2.07 Å, RMSD K90R = 1.57 Å) (Figure S2d). Focusing in more detail on the structural localization of mutations, P323L RdRp is located close to the interface with NSP-8.
Compared with the WT model, P323L RdRp increased the number of total and average hydrogen bonds (HBs) between RdRp and NSP-8, supporting a more stable interaction between these two proteins (WT tot = 241 075 versus P323L tot = 247 839; WT ave = 241 versus P323L ave = 248) (Figure S3a). Furthermore, as mentioned above, D614G Spike is located in a B cell epitope encompassing residues 592-620. By the root mean square fluctuation analysis, this mutation profoundly decreased the stability of this B cell epitope compared with the WT, suggesting an altered epitope conformation (Figure S3b).
Firstly, we applied an in silico drug-repurposing approach to select the best inhibitors of 3CL-Pr and RdRp (the main anticoronavirus pharmacological targets) among the DrugBank compounds. Then, we evaluated if the mutations P323L RdRp, K90R 3CL-Pr and G15S 3CL-Pr can modulate, positively or negatively, the binding affinity between the enzyme and the inhibitor. Based on our structure-based virtual screening (SBVS), 20 potential inhibitors of RdRp were identified: five purine analogues (including remdesivir), four cephalosporins, two acetamide derivatives, two flavone compounds, two peptide derivatives, two triazoles, an oxoazepanyl compound, a polyphenol derivative and a pyrimidine analogue (Figure 4). The majority of the identified compounds establish several HBs with specific RdRp residues (including K545, S549, K551, R553, D623 and D761) and with some RNA nucleobases, such as the uracil at position 18, adenine at position 19 and uracil at position 20 (Figure S4 to Figure S23). As shown in Figure 5, the best ranked compound cefoperazone in the WT complex was involved in four coordination bonds with Mg2+ cations, two π-cation interactions and two HBs, while in the presence of P323L RdRp the oxoazepanyl derivative RU82209 was well stabilized into the enzyme-binding pocket by means of six coordination bonds between its phosphate groups and the Mg2+ cations.
Notably, several candidates were better recognized in the P323L complex, as in the case of penciclovir, tenofovir, PF-00610355, zanamivir, diosmin, isavuconazole, resveratrol and, above all, RU82209, associated with the absolute best G-score value (Figure 4). Conversely, in the presence of P323L RdRp, all cephalosporins and both peptide derivatives showed a decreased binding affinity compared with the WT, as well as rutin and PF-03715455. Interestingly, P323L RdRp decreased the binding affinity of remdesivir compared with the WT (Figure 4). Regarding 3CL-Pr, SBVS allowed selection of 21 promising inhibitors: four cephalosporins, four peptide derivatives, four purine analogues (including remdesivir), three flavone compounds, three pyrimidine analogues, two triazoles and a benzeneacetamide (Figure 6). The majority of the identified compounds establish several HBs with crucial residues of 3CL-Pr (including N142, G143, C145, E166 and T190), as well as Van der Waals contacts and a pivotal π-π stacking interaction with H41. Both C145 and H41 are involved in the 3CL-Pr catalytic dyad (Figure S24 to Figure S44). Figure 7 reports the best ranked compounds screened against the WT and mutated models. Notably, the NAD 3-pentanone adduct, PF-00610355, PF-03715455, reproterol and rutin were found to be better recognized in both K90R 3CL-Pr and G15S 3CL-Pr compared with the WT, with the NAD 3-pentanone adduct being associated with the absolute best G-score in the presence of K90R 3CL-Pr (Figures 6 and 7). This increased theoretical binding affinity could be justified by an additional salt bridge between the ligand pyridine charged nitrogen and 3CL-Pr glutamate at position 166, missing in the WT model. After reproterol recognition, in both mutated complexes, we observed a pivotal π-π stacking interaction with H41 (Figure S35), while in the best poses of both investigational PF compounds, an increased number of HBs was found (Figures S43 and S44). Furthermore, in K90R 3CL-Pr, the NAD 3-pentanone adduct engaged seven HBs, a salt bridge contact and a π-π stacking interaction with H41 (Figure 7). Notably, among the protease inhibitors already used for other viral infections, G15S 3CL-Pr determined an increased binding affinity of indinavir (Figure 6). Conversely, both K90R 3CL-Pr and G15S 3CL-Pr strongly reduced the theoretical binding affinity of isavuconazole and sofosbuvir (Figure 6). Furthermore, K90R 3CL-Pr determined a remarkable decrease in the binding affinity of adafosbuvir, cefpiramide and ceftibuten, while G15S 3CL-Pr affected that of hesperidin, lisinopril and presatovir (Figure 6).
Table 3. Impact of SARS-CoV-2 mutations on binding affinity between epitopes and Class I/II-MHCs
a Epitopes are based on T cell epitope prediction by Grifoni et al. (2020). 12 Letters in bold in the epitopes indicate the mutated amino acid.
b The score and the rank were assessed by the Immune Epitope DataBase. A reduced score and an increased rank indicate a decreased binding affinity between the epitope and the related MHC Class I/II allele.
Based on one of the largest publicly available datasets of SARS-CoV-2 sequences so far analysed (>12 000), this study identified key mutations, single or in pairs or clusters, underlying geographically dependent viral evolutionary adaptation to human hosts. 16, 17 Some of these mutations can hamper Class I/II epitope recognition or can profoundly alter the conformation of specific B cell epitopes, suggesting their capability to alter viral antigenicity. Furthermore, to our knowledge, this is the first study highlighting the capability of mutations in RdRp and 3CL-Pr to modulate, either positively or negatively, the binding affinity of specific compounds, suggesting their potential involvement in mechanisms underlying SARS-CoV-2 hypersusceptibility or resistance to drugs. Among the identified pairs of mutations, D614G Spike +
P323L RdRp occurred in all continents, although with a frequency ranging from 13.6% in Asia to 71.9% in Europe and 83.3% in Africa. In particular, in Europe, D614G Spike was first noted in a viral strain isolated from Germany. 18 Notably, D614G Spike can introduce a novel protease cleavage site capable of enhancing the fusion between the viral envelope and cell membrane, thus increasing viral infectivity and, in turn, interhuman transmission potential. [19] [20] [21] Furthermore, D614G Spike lies in a recently identified B cell epitope encompassing amino acids 592-620. Molecular dynamics simulations show that D614G Spike profoundly alters the stability of this epitope, supporting an altered conformation and consequently an impaired recognition by humoral responses. This is in line with a recent finding showing the association of D614G Spike with a reduced antigenicity compared with the SARS-CoV-2 strains isolated in the early epidemic phase. 21 On this basis, D614G Spike could also pose concerns for the full effectiveness of vaccine strategies under development and thus deserves further investigation in immunological studies. P323L RdRp resides in the interface domain (amino acids 250-365) known to connect N-and C-terminal RdRp domains. 22 By molecular dynamics simulations, P323L RdRp increased the number of HBs between RdRp and NSP-8, a cofactor which is part of the replication complex along with RdRp, suggesting a stabilized interaction between these two proteins. Furthermore, by in silico prediction, this mutation can reduce the binding affinity for specific HLAs, suggesting that P323L RdRp along with D614G Spike can favour viral evasion from immune responses. Notably, by SBVS on FDA-approved drugs, P323L RdRp determined a reduced binding affinity for specific compounds including remdesivir. This suggests that P323L RdRp could act as a 'natural' drug resistance mutation, hampering the full effectiveness of specific antiviral treatments. 
This point deserves investigation in ongoing clinical trials. 22 At the same time, P323L RdRp determined an increased binding affinity for specific purine analogues, including penciclovir and tenofovir, suggesting a potential viral hypersusceptibility. Regarding 3CL-Pr, SBVS identified four cephalosporins among the best 3CL-Pr inhibitors, in line with a previous study suggesting the incorporation of a lactam ring in the lead optimization process of SARS-CoV 3CL-Pr inhibitors. 23 While K90R 3CL-Pr determined a reduced binding affinity for all of them, G15S 3CL-Pr was associated with an increased binding affinity for a specific cephalosporin, suggesting differential viral susceptibility to this drug class according to the observed mutational profile. Among the Pr inhibitors already approved for other viral infections, lopinavir was not included in the list of the best 3CL-Pr inhibitors, due to its low binding affinity towards both WT and mutated 3CL-Pr. This is in line with recent clinical data showing no significant benefits in overall mortality and reduction of SARS-CoV-2 load. 24 The overall findings highlight the role of viral genetic variability in modulating, either positively or negatively, drug-binding affinity, a concept that should be taken into account in the current drug-repurposing approach. The above-mentioned pair D614G Spike + P323L RdRp formed a tight cluster with R203K N + G204R N. This mutational pair resides in the Ser/Arg-rich motif of the N protein, known to be the target of phosphorylation by cellular kinases, to regulate the equilibrium between viral genome replication and morphogenesis 25 and to be involved in several intracellular signalling pathways. 14, 22 Furthermore, R203K N + G204R N along with S193I N, S194L N, S197L N and S202N N reside in a region spanning amino acids 177-215 identified as a B cell epitope.
12 Thus, these mutations could alter the antigenicity of the nucleocapsid, potentially affecting its capability to elicit the production of antibodies in infected subjects. 26 Similarly, the peculiar enrichment of mutations in the N protein could have important implications in the serological SARS-CoV-2 diagnosis, based on the detection of antibodies against the nucleocapsid, and poses some concerns on the use of this region as a target of molecular diagnostic assays. Indeed, these mutations could contribute to the variations in the sensitivity observed among the assays used for SARS-CoV-2 diagnosis. 27 Further comparative studies on the performance of molecular and serological assays for SARS-CoV-2 in the presence of these mutations will be useful to clarify this issue. Our analysis also showed that the cluster D614G Spike + P323L RdRp + R203K N + G204R N can be linked to continent-specific mutations. Indeed, in Europe, this cluster was accompanied by T175M M. During viral morphogenesis, the M protein interacts with the N and the S protein, acting as a central organizer of viral assembly. 25, 28 Notably, both R203K and G204R reside in a stretch of amino acids (168-208) that has been proposed to be involved in the interaction with the M protein in SARS-CoV-1. 25 Thus, the overall findings suggest that the association of mutations in the above-mentioned proteins can promote the packaging of the encapsidated genome, thus enhancing viral morphogenesis and explaining their emergence in the viral population. Another widely circulating cluster, characterizing 26.7% and 8.2% of American and Australian viral strains, was made up of P504L Hel + Y541C Hel + L84S ORF-8. Notably, P504L Hel and Y541C Hel are the only mutations capable of abrogating the binding affinity of CD8+ T cell epitopes for some specific HLAs, supporting that these mutations can act as immune escape mutations, favouring viral evasion from CD8+ T cell responses.
In conclusion, the identification of key geographically dependent genetic elements, single or in pairs or clusters, reflects a geographically dependent viral evolutionary adaptation to human hosts. These mutations may play a potential role in modulating viral susceptibility to drugs either positively or negatively and in favouring vaccine and diagnostic escape events. For this reason, in vitro studies are necessary to further confirm such in silico-based results. In this light, a close and continuous monitoring of SARS-CoV-2 mutational patterns across different geographical areas is crucial to ensure the effectiveness of antiviral treatments and vaccines as well as the accuracy of diagnostic assays worldwide.
1. A novel coronavirus from patients with pneumonia in China
2. The proximal origin of SARS-CoV
3. Genomic characterization of a novel SARS-CoV-2
4. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
5. Pharmacological therapeutics targeting RNA-dependent RNA polymerase, proteinase and spike protein: from mechanistic studies to clinical trials for COVID-19
6. Three adjacent nucleotide changes spanning two residues in SARS-CoV-2 nucleoprotein: possible homologous recombination from the transcription-regulating sequence
7. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
8. Rapid repurposing of drugs for COVID-19
9. The profile of mutational clusters associated with lamivudine resistance can be constrained by HBV genotypes
10. MEGA X: molecular evolutionary genetics analysis across computing platforms
11. Evolution of protein molecules
12. A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV
13. Post-translational modifications of coronavirus proteins: roles and function
14. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
15. Structural proteins in severe acute respiratory syndrome coronavirus-2
16. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2
17. Large scale genomic analysis of 3067 SARS-CoV-2 genomes reveals a clonal geo-distribution and a rich genetic variations of hotspots mutations
18. Genetic diversity and evolution of SARS-CoV-2
19. Global spread of SARS-CoV-2 subtype with spike protein mutation D614G is shaped by human genomic variations that regulate expression of TMPRSS2 and MX1 genes
20. Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality?
21. Emergence of drift variants that may affect COVID-19 vaccine development and antibody treatment
22. Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure
23. Potential broad spectrum inhibitors of the Coronavirus 3CLpro: virtual screening and structure-based drug design study
24. A trial of lopinavir-ritonavir in adults hospitalized with severe COVID-19
25. Characterization of protein-protein interactions between the nucleocapsid protein and membrane protein of the SARS coronavirus
26. Comparative computational analysis of SARS-CoV-2 nucleocapsid protein epitopes in taxonomically related coronaviruses
27. Implication of SARS-CoV-2 evolution in the sensitivity of RT-qPCR diagnostic assays
28. Isolation of coronavirus envelope glycoproteins and interaction with the viral nucleocapsid
We are grateful to the Johns Hopkins University for collection of SARS-CoV-2 sequences in the GISAID portal and all the laboratories originating and submitting data (Supplementary Material and Supplementary Information). We thank the Aviralia Foundation and the Vironet C Foundation for supporting this study, and Dr Alba Grifoni for her contribution in epitope prediction analysis. This study was conducted as part of our routine research work. None to declare.