key: cord-0792900-8nvzeew3 authors: Donzelli, Sara; Spinella, Francesca; di Domenico, Enea Gino; Pontone, Martina; Cavallo, Ilaria; Orlandi, Giulia; Iannazzo, Stefania; Ricciuto, Giulio Maria; Team, ISG Virology Covid; Pellini, Raul; Muti, Paola; Strano, Sabrina; Ciliberto, Gennaro; Ensoli, Fabrizio; Zapperi, Stefano; La Porta, Caterina A.M.; Blandino, Giovanni; Morrone, Aldo; Pimpinelli, Fulvia title: Evidence of a SARS-CoV-2 double Spike mutation D614G/S939F potentially affecting immune response of infected subjects date: 2022-01-21 journal: Comput Struct Biotechnol J DOI: 10.1016/j.csbj.2022.01.021 sha: d53477a175b9700b7bdc04bbbfe2a04178e251b9 doc_id: 792900 cord_uid: 8nvzeew3 OBJECTIVES: Despite extensive efforts to monitor the diffusion of COVID-19, the actual wave of infection is worldwide characterized by the presence of emerging SARS-CoV-2 variants. The present study aims to describe the presence of yet undiscovered SARS-CoV-2 variants in Italy. METHODS: Next Generation Sequencing was performed on 16 respiratory samples from occasionally employed within the Bangladeshi community present in Ostia and Fiumicino towns. Computational strategy was used to identify all potential epitopes for reference and mutated Spike proteins. A simulation of proteasome activity and the identification of possible cleavage sites along the protein guided to a combined score involving binding affinity, peptide stability and T-cell propensity. RESULTS: Retrospective sequencing analysis revealed a double Spike D614G/S939F mutation in COVID-19 positive subjects present in Ostia while D614G mutation was evidenced in those based in Fiumicino. Unlike D614G, S939F mutation affects immune response by the slight but significant modulation of T-cell propensity and the selective enrichment of potential binding epitopes for some HLA alleles. CONCLUSION: Collectively, our findings mirror further the importance of deep sequencing of SARS-CoV-2 genome as a unique approach to monitor the appearance of specific mutations as for those herein reported for Spike protein. This might have implications on both the type of immune response triggered by the viral infection and the severity of the related illness. Objectives: Despite extensive efforts to monitor the diffusion of COVID-19, the actual wave of infection is worldwide characterized by the presence of emerging SARS-CoV-2 variants. The present study aims to describe the presence of yet undiscovered SARS-CoV-2 variants in Italy. Methods: Next Generation Sequencing was performed on 16 respiratory samples from occasionally employed within the Bangladeshi community present in Ostia and Fiumicino towns. Computational strategy was used to identify all potential epitopes for reference and mutated Spike proteins. A simulation of proteasome activity and the identification of possible cleavage sites along the protein guided to a combined score involving binding affinity, peptide stability and T-cell propensity. Keywords: SARS-CoV-2, Spike mutations, D614G, S939F, immune response. Betacoronaviruses are responsible of the last three major pathogenic zoonotic diseases occurred in the past two decades (1) . Indeed, severe acute respiratory syndrome (SARS-CoV) emerged in 2002 and exhibited 10% mortality of infected people (2) . Middle East respiratory syndrome coronavirus (MERS-CoV) appeared in 2012 with 35% mortality (3) . To date, SARS-CoV-2 records worldwide over than 80 millions of infected people with around 2 millions deaths. Since SARS-CoV-2 pandemic is still active, the related numbers grow daily. Coronavirus entry into host cells is pivotal for viral infectivity and pathogenesis. It also represents of major determinant for immune surveillance and a key target for therapeutic intervention. SARS-CoV-2 enters host cells of high and low respiratory tracts binding ACE2, a cell surface receptor for viral attachment (4). Subsequently TPMRSS2 internalization protease primes S protein (4). SARS-CoV S1 contains a Receptor-binding domain (RBD) that specifically binds to hACE2 receptor. RBD status constantly oscillates between standing-up conformation for receptor binding to lying-down position for immune evasion (5) . The crystal structure of the complex between SARS-CoV-2 RDB and h-ACE2 receptor has been recently solved (6) . It revealed subtle differences between previously identified SARS-Co-V RBD and SARS-CoV-2 RBD to recognize hACE2. This leads to the increased binding affinity of SARS-CoV-2 to the receptor and determines its severe effects of the infected cell types. Moreover, compared with other SARS-related coronaviruses (SARSr-CoVs), SARS-CoV-2 possesses a unique furin cleavage site (FCS) in its spike protein that is highly functional and that increases the efficiency of virus infection into cells (7) . Herein we identified by retrospective next-generation sequencing analysis a SARS-CoV-2 double Spike mutation D614G/S939F in 16 respiratory samples from occasionally employers within the Bangladeshi community present in Ostia. SARS-CoV-2 Spike D614G mutation was evidenced in the members of the Bangladeshi community living in the close town Fiumicino who were frequently in contact with those members found positive for Spike D614G/S939F double mutation in Ostia. We also found that unlike D614G, S939F mutation affects immune response through the slight but significant modulation of T-cell propensity and the selective enrichment of potential binding epitopes for some HLA alleles. . We evaluated whether any of the analyzed employees was part of an epidemiologically linked cluster based on illness onset date, positive test status, and work location. We found a correlation between geographic location and mutation set. All employees in the same clusters also had identical or nearly identical consensus genomes, which reflects the low genetic diversity of SARS-CoV-2 at this stage of the pandemic. It is highly unlikely that there are direct transmission pairs in our dataset, but we cannot conclusively rule out coincident transmission linkage. However, the high similarity between one case belong to the group of Bangladeshi (bang 2B) to the group of Fiumicino, suggests that 2B acted as a bridge between the two clusters. All consensus sequences have been submitted to GISAID and GeneBank. The Spike D614G/S939F double mutation is poorly studied and consequently its impact on host infection and patient clinical implications are scarcely known. Here we aim to estimate the effects of the D614G/S939F mutations on the immune response. To this end we adapted to the present experimental aim a computational strategy previously introduced to study the SARS-CoV-2 virus and other similar coronaviruses (9, 10) . In particular, we considered all the potential epitopes associated with the reference and mutated SARS-CoV-2 spike protein. As reported in La Porta & Zapperi 2020 and La Porta & Zapperi 2021, the first step of the process involved a simulation of proteasome activity and the identification of possible cleavage sites along the protein (9, 11, 12) . This resulted in a set of 1549 peptides of length 8-11 for the reference spike protein and 1541 total peptides for both mutations in the protein. We then analyzed the peptides searching for likely epitopes using NetTepi which produced a combined score involving binding affinity, peptide stability and T-cell propensity for 13 supported HLA call I (11) . These three measurements all contribute to the potential that a peptide is a T-cell epitope: Binding affinity measures the likelihood that a peptide binds with an HLA, peptide stability measures the ability for the HLA to retain the peptide and T-cell propensity measures whether a peptide is likely to be recognized by a T-cell (11) . The combined score is calculated as a weighted sum of binding affinity, stability and T-cell propensity prediction scores (11) . A high score indicates that the peptide is likely to become a T-cell epitope. From the ranked list of potential epitopes, we selected and counted the highly ranking peptides associated to each HLA allele, as described in the method section. Figure 5 reports that mutations change the number of potential epitopes for some HLA alleles. In particular, the number of highly ranked peptides is increased by the mutations for HLA-A03:01, HLA-A11:01 and HLA-A26:01, it is decreased for HLA-A02:01, HLA-B39:01 and HLA-B40:01, and it remains unchanged for the other HLA alleles. The two point mutations D614G and S939F only affected a limited number of peptides, and due to their distance along the sequence no peptide can have more than one mutation. We thus consider the effect of each mutation separately. As shown in figure 6A , we can identify a small number of peptides that are either present exclusively in the reference protein (16 for D614G and 20 for S939F) or in the mutated protein (16 for D614G and 12 for S939F) (Table S2) . We therefore studied the relevance of these peptides for the immune response. Figure 6B shows that the T-cell propensity did not change significantly for peptides under the D614G mutation, while the S939F displays a small but significant effect. In particular, the higher T-cell propensity indicates that the mutated spike is more easily recognized by T-cells. In figure 6C , we show the combined scores of reference and mutated peptides for the different HLA alleles with some differences observed in an allele dependent manner. In this figure, a decrease in combined score means that the peptide is less likely to be a T-cell epitope. Notice that the number of alleles available for NetTepi is rather limited. We report in Table S3 the distribution of HLA-A and HLA-B alleles found in a Bangladeshi population extracted from the allelefrequencies.net website. We can see that NetTepi HLA-A alleles represent 62% of the population and HLA-B only 50%. To obtain a larger coverage of the alleles present in the population we expanded the analysis by considering MHCflurry 2.0 that is able to predict the binding affinity of arbitrary peptides to any HLA molecule using an artificial neural network (13) . We used this tool to compare the binding affinity of the small group of reference and mutated peptides discussed above for 26 HLA class I alleles providing a broad coverage of the human population (see Fig. S1 for a Venn diagram reporting the alleles considered and for a comparison between the predictions of NetTepi and MHCFlurry). In particular, these 26 alleles represent 93% of the Bangladeshi population for HLA-A alleles and 72% for HLA-B alleles. The results reported in figure 7 show that mutations change the binding landscape only in some cases. For example in the case of the S939F, we could identify some alleles where some new strongly binding peptides emerged in the mutated protein (e.g. HLA-A26:01 or HLA-A32:01), while for the D614G mutation the presence of isolated strongly binding peptides was not affected by the mutation (see HLA-A02:01, HLA-A02:03 and HLA-A02:06). In aggregate, our findings indicate that Spike mutations may potentially alter CD8 T cell immune response to SARS-CoV-2 thereby affecting the rate of infection and clinical impact. The widespread diffusion of SARS-CoV-2 depends, at least in part, from its high rate of genome mutation that leads to the appearance of viral variants with different rate of infection and severity of the Covid 19 disease. As for cancer, whose deep deciphering of DNA mutational landscape has been pivotal for the identification of specific driver mutations and for the design of precision medicine therapeutic approaches, the sequencing of the viral genome by using NGS technologies is of pivotal importance. In the present manuscript, retrospective NGS of SARS-CoV-2 viral genomes revealed a double Spike mutation D614G/S939F in the members of Bangladeshi community located in Ostia as occasional employees. This is the first evidence of the presence of this SARS-CoV-2 double Spike mutation in Italy. Its presence in Europe was previously found in Denmark, Sweden and Croatia. These findings further emphasize the critical need, which is still unmet, to perform massive next generation sequencing of SARS-CoV-2 viral genome to monitor the appearance of novel viral variants and to predict their rate of infection and severity of the related illness in the infected people. Pre-clinical evidence showed that D614G/S939F double Spike mutation was among those mutations that exhibited higher rate of infection (14) . Multiple studies suggest that T cells are important in the immune response against SARS-CoV-2, and may mediate long-term protection against the virus (13) (14) (15) (16) (17) . Interestingly, we provide novel evidence that the described double Spike mutation affects immune response. To this end, we use computational methods based on artificial neural networks such as NetTepi (11) and MHCFlurry (13) . Indeed, a combined score involving binding affinity, peptide stability and T-cell propensity for 13 supported HLA class I alleles was generated (1) . This led to the evaluation of T cell propensity that resulted slight but significantly modulated upon S939F mutation compared to D614G. Furthermore, the binding landscape affinity of predicted peptides to any HLA molecules was affected by S939F mutation when compared to D614G mutation, that, on the contrary, had no impact on the affinity of strongly binding peptides. One of the most debated issues within the SARS-CoV-2 community is the efficacy of the currently used vaccines against specific viral variants. The generation of tools, as those applied for the identification of a combined score have potential utility as they might also predict viral immune escape of specific SARS-CoV-2 variants. We should also notice that HLA-binding algorithms are widely used in the RNAs extraction from nasopharyngeal and oropharyngeal swab was performed in two ways. First (to perform routinely Real -Time PCR) by using Bosphore EX-Tract Dry Swab RNA Solution For the detection of SARS-CoV-2 in RNAs extracted from nasopharyngeal and oropharyngeal swab we used Bosphore Novel Coronavirus (2019-nCoV) Detection Kit v2 (AnatoliaGeneWork). This kit is a Real-Time PCR-based in vitro diagnostic medical device that allows to detect two regions of the virus in two separate reactions:E gene is used for screening purpose, where 2019-nCoV and also the closelyrelated coronaviruses are detected, and the orf1ab target region is used to discriminate 2019-nCoV specifically. This kit includes also an internal control in order to check RNA extraction, PCR inhibition and application errors. For the detection of presence/absence of COVID-19, 10 ul of RNA was tested using Allplex™ 2019-nCoV Assay (Seegene) according to manufacturer's instructions. After the first step of PCR amplification library preparation has been conducted following the Ion Torrent™ Ion AmpliSeq™ Library Kit Plus protocol (Thermo Fisher Scientific). SARS-CoV-2 Ampliseq libraries have been sequenced by using the Ion Chef™ and the Ion Genestudio™ S5 Plus Software (Thermo Fisher Scientific) have been used for bioinformatic analysis to provide data on coverage sequencing (Table S1) , variant calling and annotation, and genome assembly: CoverageAnalysis; VariantCaller, Covid19AnnotateSNPeff, IRMA and AssemblerTrinity (21) (22) (23) (24) . We calculated the mean depth of coverage from the 13 BAM files, at single nucleotide resolution All consensus sequences have been submitted to GISAID with the following accession IDs: EPI_ISL_1181628, EPI_ISL_1257897, EPI_ISL_1224910, EPI_ISL_1257867, EPI_ISL_1257868, EPI_ISL_1257869, EPI_ISL_1257870, EPI_ISL_1257871, EPI_ISL_1257872, EPI_ISL_1257873, EPI_ISL_1257894, EPI_ISL_1257895, EPI_ISL_1257896 . Information about S939F and D614G variants counts and frequency were obtained by 2019nCoVR browser (https://bigd.big.ac.cn/ncov) (25) (26) (27) . Distribution of S939F and D614G variants in the world over time was provided by COVID-19 CoV Genetics browser (https://covidcg.org) (28) . Distribution of S939F variant in Europe at the indicated time points was verified by interrogating GISAID database (https://www.gisaid.org) (29) . In the following analysis, we only consider peptides that are likely to be produced by proteasome degradation using NetChop 3.1 (12) a neural network based algorithm that scans proteins for probable cleavage sites of the human proteasome. We perform the scan for the spike protein of the reference virus SARS-CoV-2 and of the mutated virus which includes the two mutations D614G and S939F. Potential T cell epitopes are identified using NetTepi 1.0 through the server (https://services.healthtech.dtu.dk/service.php?NetTepi-1.0). The method combines estimates for peptide-HLA binding affnity, peptide-HLA stability and T cell propensity (11) . Peptides are then ranked against a set of 200000 natural peptides to obtain a global rank score. Here we scan all the peptides selected by proteasome simulation with lengths 8-11 from the spike protein of the reference virus SARS-CoV-2 and of the mutated virus, including the two mutations D614G and S939F. We select highly ranked peptides as those with rank score lower than 2% which are considered "strong binders" (<0.5%) and "weak binders" (<2%). We perform the calculations for all the class I MHC alleles supported by NetTepi, using the default values for the relative weight on stability prediction and the relative weight on T cell propensity prediction. The prevalence of HLA alleles in the Bangladeshi population has been identified using Allele frequency net database (AFND) (30) . Binding affinities are estimated using MHC flurry 2.0 (13) . We only estimate the binding affinities for peptides selected by proteasome using NetChop 3.1 and that differ between the reference and the mutated (D614G and S939F) SARS-CoV-2 virus. The figure shows the number of highly ranked peptides from the reference and the mutated (D614G and S939F) SARS-CoV-2 spike protein for a set of HLA alleles, estimated with NetTepi as discussed in the Methods section. For some HLA alleles, the number of highly ranked peptides, the potential T-cell epitopes, differs for the reference and the mutated virus. Zoonotic origins of human coronaviruses Severe acute respiratory syndrome Infection with Middle East respiratory syndrome coronavirus SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV Elucidating Interactions Between SARS-CoV-2 Trimeric Spike Protein and ACE2 Using Homology Modeling and Molecular Dynamics Simulations The emergence of the spike furin cleavage site in SARS-CoV-2 Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus Estimating the Binding of Sars-CoV-2 Peptides to HLA Class I in Human Subpopulations Using Artificial Neural Networks SARS-CoV-2 variants-Immune profile of SARS-CoV-2 variants of concern Frontiers in Digital Health NetTepi: an integrated method for the prediction of T cell epitopes The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage MHCflurry 2.0: Improved Pan-Allele Prediction of MHC Class I-Presented Peptides by Incorporating Antigen Processing The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity SARS-CoV-2-reactive T cells in healthy donors and patients with COVID-19 Targets of T cell responses to SARS-CoV-2 coronavirus in humans with COVID-19 disease and unexposed individuals SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls Broad and strong memory CD4+ and CD8+ T cells induced by SARS-CoV-2 in UK convalescent individuals following COVID-19 Robust T cell immunity in convalescent individuals with asymptomatic or mild COVID-19 In silico T cell epitope identification for SARS-CoV-2: Progress and perspectives. Advanced drug delivery reviews Viral deep sequencing needs an adaptive approach: IRMA, the iterative refine-ment meta-assembler De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analy-sis Full-length transcriptome assembly from RNA-Seq data without a reference genome A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 Social organization The 2019 novel coronavirus resource Global initiative on sharing all influenza data -from vision to reality disease and diplomacy: GISAID's innovative contribution to global health Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools Clustal Omega for making accurate alignments of many protein sequences Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega Molecular Evolutionary Genetics Analysis across Computing Platforms Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies We gratefully acknowledge all the Authors from the Originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based.