key: cord-0859361-opgj8dwe
authors: Harbison, Aoife M.; Fogarty, Carl A.; Phung, Toan K.; Satheesan, Akash; Schulz, Benjamin L.; Fadda, Elisa
title: Fine-tuning the spike: role of the nature and topology of the glycan shield in the structure and dynamics of the SARS-CoV-2 S
date: 2021-11-25
journal: Chemical science
DOI: 10.1039/d1sc04832e
sha: 6e5a730b2f5d2821c1ca023f6f9eeb7b9b40be26
doc_id: 859361
cord_uid: opgj8dwe

The dense glycan shield is an essential feature of the SARS-CoV-2 spike (S) architecture, key to immune evasion and to the activation of the prefusion conformation. Recent studies indicate that the occupancy and structures of the SARS-CoV-2 S glycans depend not only on the nature of the host cell, but also on the structural stability of the trimer; a point that raises important questions about the relative competence of different glycoforms. Moreover, the functional role of the glycan shield in the SARS-CoV-2 pathogenesis suggests that the evolution of the sites of glycosylation is potentially intertwined with the evolution of the protein sequence to affect optimal activity. Our results from multi-microsecond molecular dynamics simulations indicate that the type of glycosylation at N234, N165 and N343 greatly affects the stability of the receptor binding domain (RBD) open conformation, and thus its exposure and accessibility. Furthermore, our results suggest that the loss of glycosylation at N370, a newly acquired modification in the SARS-CoV-2 S glycan shield's topology, may have contributed to increase the SARS-CoV-2 infectivity as we find that N-glycosylation at N370 stabilizes the closed RBD conformation by binding a specific cleft on the RBD surface. We discuss how the absence of the N370 glycan in the SARS-CoV-2 S frees the RBD glycan binding cleft, which becomes available to bind cell-surface glycans, and potentially increases host cell surface localization.

reported for the SARS-CoV-1 S 8.
Summary of the specific glycosylation at each site is shown in

In all MD simulations the protein and counterions (200 mM) were represented by the AMBER ff14SB 9 parameter set, whereas the glycans were represented by the GLYCAM06j-1 version of the GLYCAM06 force field 10. Water molecules were represented by the TIP3P model. All simulations were run with v18 of the AMBER software package 11.

The following running protocol was used for all MD simulations. The energy of the S ectodomain models was minimized in two steps of 50,000 cycles of the steepest descent algorithm each. During the first minimization all the heavy atoms were kept harmonically restrained using a potential weight of 5 kcal mol⁻¹ Å⁻², while the solvent, counterions and hydrogen atoms were left unrestrained. The minimization step was then repeated with only the protein heavy atoms kept restrained, while the glycans, solvent, counterions and hydrogens were left unrestrained. After energy minimization the system was equilibrated in the NVT ensemble with the same restraint scheme, where heating was performed in two stages over a total time of 1 ns, from 0 to 100 K (stage 1) and then from 100 to 300 K (stage 2). During equilibration the SHAKE algorithm was used to constrain all bonds to hydrogen atoms. The van der Waals interactions were truncated at 11 Å and Particle Mesh Ewald (PME) was used to treat long-range electrostatics with B-spline interpolation of order 4. Langevin dynamics with a collision frequency of 1.0 ps⁻¹ was used to control the temperature, with a pseudo-random variable seed to ensure there are no synchronization artefacts. Once the system was brought to 300 K, an equilibration phase of 1 ns in the NPT ensemble was used to set the pressure to 1 atm. The pressure was held constant with isotropic pressure scaling and a pressure relaxation time of 2.0 ps.
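The heating and restraint steps of this protocol can be illustrated numerically. The following is a minimal sketch, not the AMBER input used in this work. It assumes that each heating stage is a linear ramp and that the 1 ns of heating is split equally between the two stages (the text only gives the total time), and it assumes a restraint energy of the form E = k Σ|r − r₀|² with no factor of 1/2. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def heating_target_temperature(t_ns: float, total_ns: float = 1.0,
                               stage1_fraction: float = 0.5) -> float:
    """Target temperature (K) at time t_ns of the two-stage heating.

    Stage 1 ramps 0 -> 100 K, stage 2 ramps 100 -> 300 K.
    ASSUMPTION: linear ramps and an equal time split between stages;
    the protocol text only states the total heating time (1 ns).
    """
    if not 0.0 <= t_ns <= total_ns:
        raise ValueError("t_ns must lie within the heating window")
    t_stage1 = total_ns * stage1_fraction
    if t_ns <= t_stage1:
        return 100.0 * t_ns / t_stage1
    return 100.0 + 200.0 * (t_ns - t_stage1) / (total_ns - t_stage1)


def harmonic_restraint_energy(coords: Sequence[Vec3],
                              reference: Sequence[Vec3],
                              weight: float = 5.0) -> float:
    """Positional restraint energy in kcal/mol for the restrained atoms.

    weight is the restraint weight in kcal mol^-1 A^-2 (5 in the protocol).
    ASSUMPTION: E = weight * sum(|r - r0|^2), without a 1/2 prefactor.
    """
    if len(coords) != len(reference):
        raise ValueError("coords and reference must have the same length")
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        energy += weight * ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
    return energy
```

Under these assumptions, a single restrained heavy atom displaced by 1 Å from its reference position contributes 5 kcal/mol, and the target temperature halfway through stage 2 is 200 K.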
At this point all restraints on the protein heavy atoms were removed, allowing the system to evolve for 15 ns of conformational equilibration before production. At this stage different replicas for each model were generated starting from different velocities. The conformational equilibration phase for each replica was further extended to include the first 300 ns of production to allow the glycan shield to adapt to the protein architecture and vice versa. In the analysis this 300 ns initial phase was discarded, see Table S .2 footnotes for nomenclature) and N370 (R2), which in these cases is a further indication of the glycosylation supporting a "wide open" RBD corresponding to the cryo-EM structure PDB 6VYB used as a starting structure. Nevertheless, the same threshold was used for all simulations for consistency.

We obtained a list of SARS-CoV-2 S protein homologs from UniProt 11 using a blastp search of the SARS-CoV-2 S protein sequence (UniProt Accession P0DTC2) against "Virus" proteins with an E-value threshold of 0.01, auto-selection of the matrix, allowing gaps and with a maximum of 1000 hits. We further filtered this list of proteins to remove duplicates and retain only those with "spike" in the name. We aligned these proteins with Clustal Omega v1.2.2 12, 13. A phylogenetic tree was created from the multiple sequence alignment using FastTree v2.1.10 without SSE3, with 1000 as the bootstrap parameter 14.
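The filtering of the blastp hits described above can be sketched as follows. This is an illustration under stated assumptions, not the script used in this work. Duplicates are assumed to mean identical sequences, and the match on "spike" is assumed to be case-insensitive; the text specifies neither. The `Hit` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one blastp hit."""
    accession: str
    name: str
    sequence: str
    evalue: float


def filter_spike_hits(hits: Iterable[Hit], max_evalue: float = 0.01,
                      max_hits: int = 1000) -> List[Hit]:
    """Apply the search limits, then drop duplicates and non-spike proteins."""
    # Search limits from the text: E-value threshold 0.01, at most 1000 hits.
    passing = sorted((h for h in hits if h.evalue <= max_evalue),
                     key=lambda h: h.evalue)[:max_hits]
    kept: List[Hit] = []
    seen_sequences = set()
    for hit in passing:
        # ASSUMPTION: a duplicate is a hit with an identical sequence.
        if hit.sequence in seen_sequences:
            continue
        seen_sequences.add(hit.sequence)
        # ASSUMPTION: case-insensitive match on the protein name.
        if "spike" in hit.name.lower():
            kept.append(hit)
    return kept
```

The surviving records would then be written to a FASTA file and passed to the alignment and tree-building steps.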
Ancestral sequence reconstruction was performed based on this multiple sequence alignment and the associated phylogenetic tree, using CodeML from PAML v4.9e with the WAG amino acid substitution matrix and the molecular clock turned on 15.

SWISS-MODEL: homology modelling of protein structures and complexes
How and why plants and human N-glycans are different: Insight from molecular dynamics into the "glycoblocks" architecture of complex carbohydrates
Sequence-to-structure dependence of isolated IgG Fc complex biantennary N-glycans: a molecular dynamics study
Site-specific glycan analysis of the SARS-CoV-2 spike
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
Vulnerabilities in coronavirus glycan shields despite extensive glycosylation
Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB
GLYCAM06: a generalizable biomolecular force field. Carbohydrates
UniProt: the universal protein knowledgebase in 2021
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology
FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments
PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution
VMD: visual molecular dynamics
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega