key: cord-0300288-02sayfl5
authors: Berndsen, Zachary T.; Chakraborty, Srirupa; Wang, Xiaoning; Cotrell, Christopher A.; Torres, Jonathan L.; Diedrich, Jolene K.; López, Cesar A.; Yates, John R.; van-Gils, Marit J.; Paulson, James C.; Gnanakaran, S; Ward, Andrew B.
title: Visualization of the HIV-1 Env Glycan Shield Across Scales
date: 2019-11-12
journal: bioRxiv
DOI: 10.1101/839217
sha: af3c9d9e0783fdd1230d6a4bf3dd4f87145763f2
doc_id: 300288
cord_uid: 02sayfl5

The dense array of N-linked glycans on the HIV-1 Envelope Glycoprotein (Env), known as the “glycan shield”, is a key determinant of immunogenicity, yet intrinsic heterogeneity confounds typical structure-function analysis. Here we present an integrated approach of single-particle electron cryomicroscopy (cryo-EM) and computational modeling to probe glycan shield structure and behavior at multiple levels. We found that dynamics lead to an extensive network of inter-glycan interactions and drive higher-order structuring within the glycan shield. This structure defines diffuse boundaries between buried and exposed protein surface and provides a mapping of potentially immunogenic sites on Env. Analysis of the same Env across a range of glycosylation states revealed that subtle changes in glycan occupancy, composition, and dynamics can impact glycan shield structure and epitope accessibility. We also performed site-specific mass-spectrometry analysis on the same samples and show how cryo-EM can complement such studies. Finally, we found that highly connected glycan sub-domains are resistant to enzymatic digestion and help stabilize the pre-fusion trimer state, suggesting functionality beyond immune evasion.

The Human Immunodeficiency Virus Type 1 (HIV-1) Envelope Glycoprotein (Env) is the sole antigen on the surface of the virion and has thus evolved several tactics for evading the adaptive immune system, chief among which is extensive surface glycosylation [1-3]. Env has one of the highest densities of N-linked glycosylation sites known, with ~1/21 extracellular residues being glycosylated, accounting for ~1/2 the mass of the molecule [4-8]. This sugar coat, referred to as the "glycan shield," is common among viral fusion proteins and is believed to be a primary hurdle in the

the data is thresholded, or contoured, at some intensity level so that no voxel of lesser value is displayed. The surface constructed from this thresholded map is called an isosurface. This way of viewing and interpreting cryo-EM reconstructions is well suited for the typical goals of structural biology, but when the reconstruction contains variations in resolution and intensity, as is often the case for flexible and dynamic molecules and molecular complexes, a single map processed in this manner is of limited value.

The limitations of this single-map approach are well illustrated in the context of fully glycosylated Env, as shown in Figure 1. The global resolution of this C3-symmetric reconstruction determined by Fourier shell correlation (FSC) was ~3.1 Å (Supplemental Figure 1). The bulk of the Env protein structure is at or below the global resolution (Figure 1C) and ~80% of the Env amino acid (aa) residues could be confidently built into this map, with missing residues occurring around the flexible variable loops 1, 2, 4, and 5, the HR1 helix, and the N- and C-termini. Most of the N-linked glycans (colored green), on the other hand, are ill-defined beyond the core N-acetylglucosamine (NAG), and we can only account for ~15% of the total glycan mass in this sharpened map (Figure 1D). Glycans located on the poorly resolved flexible loops (N185e, N185h, N339, N398, N406, N411, N462) could not be identified at all; however, MS analysis confirms that all the potential N-linked glycosylation sites (PNGS) predicted from the sequence are indeed glycosylated [35] (Supplemental Figure 12A). This indicates that the glycan shield is highly dynamic with respect to the protein core but exhibits some relative differences in ordering between glycans. In a later section, we show how to quantify these differences more precisely. At a low threshold (Figure 1C), noise appears surrounding Env's well-resolved protein core where the missing glycan mass should be; however, it is not apparent whether there exists any meaningful structure within this noise.

Scale-space and 3-D variance analysis reveal interconnectivity and higher-order structure within the glycan shield

To look for potential structure within the glycan shield we first examined its properties across resolution scales by progressively smoothing the map with a Gaussian kernel of increasing standard deviation (SD) and visualizing the filtered maps at their noise threshold (Figure 2A). We use a simple topological metric for determining the noise threshold based on connected components. A connected component is any connected and isolated group of 'on' voxels in a binary volume or isosurface. In the language of topology, the number of connected components defines the 0th Betti number of the map at that threshold [36]. When the log of this number is plotted as a function of intensity for the series of filtered maps shown in Figure 2A we see the same general trend (Figure 2B and Supplemental Figure 7C). The initial rise-and-fall corresponds to high-intensity features emerging, growing, and merging, while the second sharp peak corresponds to low-intensity noise. We defined the noise threshold as the lowest intensity point in the local minimum between the signal and noise peaks (red circles, Figure 2B).
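The connected-component metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sparse dictionary representation of the map, the 6-connectivity rule, the function names `count_components` and `noise_threshold`, and the assumption that the noise peak is the global maximum of the component count are all hypothetical choices made for illustration.

```python
from collections import deque

# Face-sharing (6-connected) neighbors; the connectivity rule is an assumption.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def count_components(voxels, threshold):
    """Count connected groups of 'on' voxels (0th Betti number) at a threshold.

    voxels: dict mapping (x, y, z) grid coordinates to intensity (hypothetical format).
    A voxel is 'on' when its intensity is not less than the threshold.
    """
    on = {p for p, value in voxels.items() if value >= threshold}
    seen = set()
    components = 0
    for start in on:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first flood fill of one component
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in on and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return components


def noise_threshold(thresholds, counts):
    """Pick the lowest-intensity point of the valley above the noise peak.

    thresholds must be sorted in ascending order; counts[i] is the number of
    components at thresholds[i]. Assumes the noise peak is the global maximum.
    """
    peak = max(range(len(counts)), key=counts.__getitem__)
    i = peak
    # Walk toward higher intensity while the count does not increase.
    while i + 1 < len(counts) and counts[i + 1] <= counts[i]:
        i += 1
    # If the valley floor is flat, back up to its lowest-intensity point.
    while i > peak and counts[i - 1] == counts[i]:
        i -= 1
    return thresholds[i]
```

In practice one would evaluate `count_components` over a fine grid of thresholds for each Gaussian-filtered map and pass the resulting curve to `noise_threshold`; taking the log of the counts, as in the text, does not move the location of the minimum.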

Examination of this series of filtered maps reveals progressively more structural features in the glycan shield. At higher SD, there is a dramatic expansion of non-protein signal. In Figure 2C we plot the noise threshold as a function of SD as well as the total volume at the noise threshold. Both curves have a sigmoidal shape, with a rapid reduction in the noise threshold and concurrent increase in volume, followed by a plateau around ~1.5-2 SD. We interpret this plateau as indicating there is negligible gain in signal with further smoothing. Although volume dilation with decreasing threshold is normal for cryo-EM maps, the amount of dilation surrounding regions known to be glycosylated is greater than around regions that are not. This is well illustrated by plotting cross-sections through the six maps (Figure 2D).

Having identified the proper resolution scale for analyzing the glycan shield, we examined how its features change with intensity. The series of thresholded maps shown in Figure 2E illustrate how the topology of the glycan shield changes dramatically with intensity, becoming increasingly interconnected. Signal from individual glycans begins as isolated components then merges with neighboring glycans to form higher-order, multi-glycan structures. While the threshold needed to best capture the structure and connectivity of the glycan shield is not well defined, the higher-order structure formed by neighboring glycans is clearly seen as the threshold is decreased.

As a complementary approach for visualization and characterization of the glycan shield, we performed 3-D variance analysis on the same dataset using the function sx3dvariability from the SPARX software package [37,38] (Supplemental Figure 2). We see high variability around the constant domains of the Fabs (clipped from view) as well as loops and the exterior of the protein surface at the sites of N-linked glycans. In particular, variability in the glycans is highest at the distal ends of the glycan stalks and expands outward as the threshold is reduced. Top and side views of the SPARX 3-D variability map are shown at three intensity thresholds and it exhibits features similar to the Gaussian filtered maps, in particular non-uniform intensity across the surface (between glycans) and interconnectivity at low threshold. It is clear from the 3-D variability map how the higher-order structure within the glycan shield acts to occlude nearby non-glycosylated protein surfaces (red arrow, Supplemental Figure 2B).

(Figure 3). Ten such models are shown in Figure 3C, one from each protein scaffold, while the full set of models at a single PNGS is shown in Figure 3D. We also repeated the simulation with uniform mannose-5 (Man5) glycosylation for comparison.

To determine if our simulations converged, we first calculated the root mean-squared fluctuation (RMSF) for each glycan (see Methods) across all 1000 models after aligning the protein scaffold (Figure 3E), then compared it to the average of randomly selected equally sized subsets (Supplemental Figure 3D). We see that the mean RMSF values between the subsets are nearly identical and the standard deviations from the means are very small, indicating convergence. A similar trend can be seen for the per-glycan sampled volumes (Supplemental Figure 3E).
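The convergence check described above can be sketched as follows. This is only an illustration under stated assumptions: RMSF is taken in its standard form (root of the mean squared deviation from the mean position) for a single representative atom, the models are assumed to be pre-aligned on the protein scaffold, and the function names and subset count are hypothetical rather than taken from the Methods.

```python
import math
import random
import statistics


def rmsf(positions):
    """Root mean-squared fluctuation of one atom over an ensemble.

    positions: list of (x, y, z) coordinates, one per model, already aligned.
    """
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions) / n
    return math.sqrt(msd)


def subset_rmsf(positions, n_subsets, seed=0):
    """RMSF computed on randomly selected, equally sized, disjoint subsets."""
    rng = random.Random(seed)
    shuffled = list(positions)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [rmsf(shuffled[i * size:(i + 1) * size]) for i in range(n_subsets)]


def convergence_summary(positions, n_subsets=10, seed=0):
    """Mean and standard deviation of the subset RMSF values.

    A small standard deviation relative to the mean, with the mean close to
    the full-ensemble RMSF, is read here as a sign of convergence.
    """
    values = subset_rmsf(positions, n_subsets, seed)
    return statistics.mean(values), statistics.stdev(values)
```

For a per-glycan value one would average `rmsf` over the atoms of the glycan; that averaging step is omitted to keep the sketch short.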

N301, N332, N448, and N611 sites for example, have lower RMSF. This leads to a large difference in sampled volumes between the most and least dynamic glycans (Supplemental Figure 5A). We also see an increase in average RMSF within a single glycan as a function of glycan residue number starting from the first NAG (Supplemental Figure 3C), which is in line with the cryo-EM results showing reduced resolution beyond this residue (Figure 1D).

We see a similar trend in the average RMSF values from the Man5 ensemble (Supplemental Figure 4A

However, crowding from neighboring glycans is not the only factor that can influence glycan flexibility; it can also be influenced by the local protein structure. In our modeling pipeline, the protein backbone was kept harmonically restrained close to the template structure to allow for extensive sampling of glycan conformations using simulated annealing, without leading to unfolding of the underlying protein. Thus, we see that the Asn sidechains of residues 88, 160, 197, 234 and 262 all have very low RMSF (Supplemental Figure 3F), possibly stemming to some extent from limited torsional space available during modeling. The glycosylated Asn residues in gp41 have relatively low RMSF as well (N611, N618, N625, and N637), being situated on stable helical bundles (Supplemental Figure 5B). This ultimately results in a relative reduction of the glycan dynamics at some of these sites (Figure 3C). Correcting for the contribution to fluctuations coming from the underlying protein, we observed that the RMSF values of the different glycans are comparable, ranging from 3 Å to 5 Å, with a similar scale of SD (Figure 3F).

The simulated cryo-EM map reproduced some of the defining features of the experimental data. As in the experimental map, refinement was dominated by the stable protein core and only the first few sugar residues at each site are defined at the global FSC resolution and high intensity thresholds (Figure 4B). We replicated the scale-space analysis from Figure 2C on the simulated maps and observed a similar trend, with the plateau again appearing around 1.5-2 SD (Supplemental Figure 7A-B). In fact, the volume of the 1.5 SD map contoured at its noise threshold closely approximates the total sampled volume measured directly from the atomic models (dashed line). When comparing BG505_Man9 to the curves produced from the BG505_Man5 and BG505_PO reconstructions, the additional glycan volume recovered by the filtering and thresholding process is apparent, indicating cryo-EM can detect global changes in glycosylation between reconstructions. After filtering, we observed a similar threshold-dependent evolution towards a more connected topology (Figure 4C), where at the lowest thresholds the majority of the protein surface is occluded, and the glycan shield is completely interconnected. We even replicated the SPARX 3-D variability analysis on the simulated Man9 dataset and observed very similar results (Supplemental Figure 8A). By comparing the SPARX method to the true per-voxel 3-D variance calculated without projection and refinement, we confirmed there is negligible difference between the two (Supplemental Figure 8B). These results suggest the HT-AM pipeline can capture globally similar features to the experimental data; however, we would like to assess the accuracy of the models at a more local level.

Measuring individual glycan dynamics from synthetic cryo-EM maps

Cryo-EM maps represent the average structure of all the particles that went into their construction after alignment to a common reference, and therefore if each molecule is chemically identical, the intensity at a particular voxel will

The HT-AM pipeline reproduces physiologically relevant trends in glycan dynamics measured by cryo-EM

With a method in place for measuring glycan flexibility from cryo-EM maps, we could then make direct comparisons to the experimental data. First, we built and relaxed glycan stalks into the 1.5 SD Gaussian filtered BG505_293F map as described above using the refined model as a scaffold. We could identify clear signal at 21/28 PNGS, two fewer than from the simulated map, suggesting the glycans at N398 and N462 are more dynamic than captured by the simulation. However, the fact that the other V2 and V4 loop glycans could not be identified in either map means the simulated dynamics agree with our experimental data at least up to the detection limit of the method. Overall, we found the HT-AM pipeline captures a similar trend in ordering with a correlation coefficient between the two of ~0.46 (p = 0.03) (Figure 5D).

We observe deviations from the experimental results around a few glycans. In addition to the V4 and V5 loop glycans at N398 and N462, we also see a large deviation at N137 on the V1 loop. The V1 and V5 loops are both dynamic in the experimental data (as determined by reduced resolution) but were not modeled in the simulations; therefore, differences are to be expected at these sites. Outside of V loops, two major deviations occur at the N262 and N301 glycans. In the BG505_293F map, the glycan at N262 is the most ordered due to its stabilizing contacts with the gp120 core, and these interactions may not be accurately captured by the simulation given the restricted protein dynamics. The glycan at N301, on the other hand, is dynamic in the BG505_293F map but showed both low Asn RMSF and low glycan RMSF during the simulation (Supplemental Figure 3F and Figure 3E). In gp41, a large deviation also occurred at the N611 glycan, which is dynamic in the BG505_293F map. We attribute these differences to restrictive sampling at the protein backbone level as previously discussed (Supplemental Figure 5). To complicate the comparison, the N618 and N625 sites are significantly under-occupied as revealed by MS (Supplemental Figure 10). As we show in the next section, sub-occupancy will cause a reduction in local signal intensity due to averaging, which will corrupt measurements of glycan dynamics and even affect the dynamics and processing of neighboring glycans.

Overall, the positive correlation between the experimental data and theoretical predictions shows that the 

Using the methodology presented above we should be able to detect site-specific changes in dynamics and occupancy between differentially glycosylated Env. To test this, we performed a comparative analysis between the BG505_Man9 and BG505_Man5 synthetic reconstructions. Given that Man9 and Man5 glycans are identical up to the 5th mannose residue, any changes in intensity around the BMA residue should arise from differences in dynamics alone. On average, we see a ~17% reduction in intensity indicative of increased dynamics, which is in line with the RMSF data (Figure 6A). We also accurately detect the largest increase and decrease in dynamics at the N262 and N234 sites respectively.

To verify we could detect changes in occupancy, we removed the glycan at the N625 site from half of the models and re-refined the data (referred to as BG505_Man9HO for "half occupancy"). Not surprisingly, we see a near 50% reduction in mean intensity from the fully occupied reconstruction around this site (Figure 6A). The glycan at N625 is one of the most dynamic, so the relative intensities do not change much; however, this shows how sub-occupancy can affect measurements of flexibility, and thus should be taken into account when making comparisons to theoretical estimates.

Another technique that should be sensitive to subtle changes between similar cryo-EM maps is difference mapping, which involves subtracting one cryo-EM map from another. Indeed, the change in occupancy at the N625 site is apparent in the BG505_Man9 - BG505_Man9HO difference map (Figure 6B). At high intensity, the signal is localized around the glycan stalk and extends to the protein surface. Even at low threshold there is still no other detectable difference between the two independent reconstructions. In the BG505_Man9 - BG505_Man5 difference map however (Figure 6C), the difference signal is strongest where the distal tips of the Man9 glycans would be, but expands to include the entire additional sampled volume at low threshold. Also shown for comparison is the BG505_Man9 - BG505_PO difference map, which isolates the full contribution of the glycan shield to the cryo-EM reconstruction (Figure 6D). These results establish cryo-EM as a tool for measuring glycan dynamics as well as changes in chemical composition and occupancy.
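As a minimal illustration of difference mapping, the sketch below subtracts two maps voxel by voxel and lists the voxels that remain above a chosen threshold. It assumes the two maps are already aligned, sampled on the same grid, and on a comparable intensity scale; the dictionary representation and the function names are hypothetical and the toy intensities are not taken from the data.

```python
def difference_map(map_a, map_b):
    """Voxel-wise difference map_a - map_b for two maps on the same grid.

    Each map is a dict from (x, y, z) grid coordinates to intensity.
    """
    if set(map_a) != set(map_b):
        raise ValueError("maps must be sampled on the same grid")
    return {p: map_a[p] - map_b[p] for p in map_a}


def voxels_above(diff, threshold):
    """Sorted coordinates whose difference signal is at or above the threshold."""
    return sorted(p for p, value in diff.items() if value >= threshold)


# Toy example: one voxel loses half of its intensity, as would be expected
# on average at a site that is occupied in only half of the particles.
full = {(0, 0, 0): 1.0, (1, 0, 0): 1.0}
half = {(0, 0, 0): 1.0, (1, 0, 0): 0.5}
diff = difference_map(full, half)
```

Lowering the threshold passed to `voxels_above` mimics contouring the difference map at progressively lower intensity.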

Insights gained from analysis of synthetic cryo-EM data allow improved characterization of cell-type specific differences in glycan shield structure, dynamics, and chemical composition

We illustrated using simulated data that cryo-EM is capable of capturing subtle changes in glycan structure, dynamics, and chemical composition, so next we sought to test this experimentally. To do so, we collected cryo-EM data on

Both the BG505_CHO and BG505_293S datasets refined to ~3 Å resolution (Supplemental Figure 1) and we observe nearly identical Cα positions between the three structures (Supplemental Figure 1). In addition, we performed the same scale-space analysis shown in Figure

The MS data also show reduced occupancy of the CHO sample compared to 293F at multiple sites (Supplemental Figure 12). Indeed, upon closer examination we see that the difference signal around the gp41 glycans at N611, N618, and N625 in the BG505_293F - BG505_CHO difference map extends all the way to the protein surface (Figure 7H), indicative of changes in occupancy. In this difference map as well, there is clear signal around the N137 site (confirmed by MS) and to a lesser extent at the tip of the N332 glycan stalk (Figure 7I). Given the proximity of N137 and N332, it is plausible that sub-occupancy at one is driving changes in dynamics and/or glycan distribution at the other. Additionally, as the CHO sample came from the pre-trial GMP test-run, these observations have clear clinical

With a network in place we can analyze the relative influence of each glycan on the whole system and examine its long-range structure. To do this we calculated the relative eigenvector centrality of the nodes, which for a given node is given by the sum of the centrality values of the nodes connected to it. Effectively, the importance of each node is determined by the total importance of all its direct neighbors. The normalized eigencentrality of the glycans is projected on the network as a colormap in Figure 8C. We see that the high mannose patch glycans at N332, N339,

Incorporated intrinsically into the network is a set of stable sub-graphs that represent highly connected glycan clusters, as is evident in the adjacency matrix (Figure 8A). To illustrate the hierarchy of these clusters we progressively stripped the network using tighter overlap cutoffs (Supplemental Figure 14). As the network is degraded, we first see the formation of two large sub-graphs; one composed of the V1/V2 apex and the gp120 outer

Experimentally validating the proposed interaction networks and sub-domain structure is not as straightforward as validating individual glycan dynamics. However, we hypothesized that highly connected glycans would be protected from enzymatic digestion and conversely that sparsely connected glycans would be more susceptible. If confirmed, it could provide indirect validation of our network models. To test this, we exposed the BG505_293S sample (already complexed with RM20A3) to digestion by Endoglycosidase H (Endo H) and performed cryo-EM on samples after 2 hrs and 16 hrs of digestion. Endo H cleaves only high-mannose type glycans between the first and second residues, leaving the core NAG attached (Figure 9A). The datasets, referred to as BG505_EndoH2 and BG505_EndoH16, reconstructed to ~3.2 Å and 3.5 Å respectively, with similar overall quality, resulting in highly similar atomic models (Supplemental Figure 1).

Using the methods presented above, we characterized the glycosylation state at each PNGS from the two digestion intermediates. Indeed, we found that digestion occurs non-uniformly between glycans (Figure 9B). This was confirmed by the difference maps (Figure 9C

If we assume a linear relationship between intensity and occupancy and use the MS data to determine the initial occupancy at each site, we can calculate the percent occupancy after digestion (Figure 9B). After 2 hours we see complete digestion of the gp41 glycans (N611-637) while some glycans, particularly those at the high-mannose patch (N197, N295, N332, N363, N386, N392, and N448), remain mostly intact. Apparent in the 0-2hr difference map only is signal around the V2, V4, and V5 loop glycans (Figure 9C). This indicates the dynamic V-loop glycans are highly susceptible to digestion. We also found partial occupancy at a few sites, suggesting non-uniform digestion between the particles. For example, the apex glycans at N156 and N160 as well as the glycans at N133, N197, and N234 all showed partial signal reduction. After 16 hours we saw almost complete digestion of the glycan shield, with only reduced occupancy detected around the previously discussed cluster composed of the N363, N386, N137, and N197 glycans, as well as a cluster composed of the N295, N332, and N448 glycans. In addition, the highly protected glycan at N262 remained completely intact after 16 hrs.

By quantifying the degree of protection from Endo H (see Methods) and comparing it to the predicted eigencentralities from the proposed network model, we obtain a correlation coefficient of ~0.8 (p = 1.14e-05), suggesting highly connected glycans are resistant to enzymatic digestion (Figure 9E). Also evident is the similarity between the persistent glycan clusters and the sub-graphs presented in Supplemental Figures 14 and 15. The stripping protocol used to define the sub-graphs can be seen as mimicking the gradual digestion by Endo H, and the stable subnetworks that persist closely match the glycan clusters remaining after digestion (Figure 9D). This confirms our initial hypothesis and provides indirect experimental validation of our proposed network models.
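The comparison between network eigencentrality and Endo H protection can be sketched as below. This is an illustrative reconstruction, not the authors' code: eigenvector centrality is computed by power iteration on a (possibly weighted) adjacency matrix, with an identity shift added so that the iteration also converges on bipartite graphs, and the correlation is assumed to be a Pearson coefficient since the text does not name the statistic. Function names and the toy graph are hypothetical.

```python
import math


def eigenvector_centrality(adjacency, max_iterations=1000, tolerance=1e-10):
    """Eigenvector centrality by power iteration, normalized so the maximum is 1.

    adjacency: symmetric square matrix (list of lists) of non-negative weights.
    Each update sets a node's score to its own score plus the weighted sum of
    its neighbors' scores; the added self term (A + I) leaves the leading
    eigenvector unchanged but avoids oscillation.
    """
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(max_iterations):
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        top = max(updated)
        updated = [value / top for value in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tolerance:
            return updated
        scores = updated
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy star graph: node 0 is connected to nodes 1 and 2.
star = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
centrality = eigenvector_centrality(star)
```

Given per-site protection scores in the same order as the network nodes, `pearson(centrality, protection)` would give the kind of coefficient quoted in the text; a p-value would require an additional significance test that is not shown here.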

revealed an increasing degree of protein unfolding and subunit dissociation that appeared to initiate from the V1-3 loops in the trimer apex. We identified 4 distinct classes: a stable trimer, unfolding V1-3 loops, dissociated gp120, and a monomer/dimer class (Figure 10A). A "junk" class was also detected, which could be more highly degraded sample or misclassified contaminants left over from picking. As the reaction progressed, the percentages of the unfolded trimer classes increased, while the percentage of stably folded trimers decreased (Figure 10B). Although it cannot be easily confirmed, the unfolded trimers within these datasets are likely to be more completely de-glycosylated than the particles that make up the stable trimeric classes. Thus, it would appear that the highly connected glycan sub-

Env, structural and physical models of the glycan shield are incomplete. Towards that goal, we presented an integrated experimental and theoretical approach aimed at illuminating glycan shield structure and behavior at multiple levels.

Prior to this study, there were conflicting reports of glycan shield structure from cryo-EM and X-ray 

It is known that immune responses to Env preferentially target glycan-depleted surface area [9,10], and the results presented here provide the first experimentally determined mapping of this surface. As the maps are contoured from high to low intensity the glycan volume expands and the topology becomes more connected, while the accessible surface area shrinks. Because intensity scales with the probability of a glycan occupying that region of space, the "strength" of the shielding effect will too, and thus the boundaries delineating shielded from exposed

, and here we showed that cryo-EM is highly sensitive to occupancy. Our results suggest there may be more substantial differences between the CHO and 293F samples than observed by MS, and given the known impact of changes in glycosylation, in particular occupancy, on the immunogenicity of Env, this has clear implications for the currently on-going human clinical trials that are based on the CHO purified Env trimer. In addition, we found that reduced occupancy at the N137 site opens up a large glycan hole at V1 and appears to impact the composition and/or dynamics at the neighboring N332 site in V3 (Figure 7I). Similar behavior was previously observed via MS by comparing glycoform distributions before and after knocking out a particular glycan [46-48]. Such higher-order effects on glycan processing and dynamics can be easily explained by our structural and computational findings. Beyond the detection of changes in glycosylation, we found that glycan dynamics measured by local intensity in the cryo-EM maps are strongly correlated with the extent of glycan processing measured by the percentage of high-mannose type glycans at each site (Supplemental Figure 12B-C), indicating that the extent of processing at each site can be predicted from cryo-EM maps alone. Processing was also correlated with susceptibility to Endo H digestion (correlation coefficient = 0.6167, p = 0.0029). The relationship between these variables reflects their mutual dependence upon local glycan density and protein structure, both of which can reduce dynamics and restrict access of Endo H and glycan processing enzymes. Indeed, we also see a strong correlation between processing and the measure of glycan crowding introduced earlier (Supplemental Figure 6D).

We showed the ALLOSMOD-based HT-AM pipeline is capable of reproducing key features of the experimental data and obtaining physiologically relevant sampling at most glycan sites. Our work represents the first case of using cryo-EM to validate atomistic models of glycoprotein ensembles, and in turn, their use in guiding cryo-EM analysis. We found that glycan flexibility deviated from the experimental observations when the dynamics of the underlying protein are greater than can be captured by the current pipeline. Thus, exploring solutions for enhanced sampling of the protein backbone represents a clear route for improving accuracy of the pipeline. Another logical next step is to use the experimental cryo-EM maps to steer the glycan modeling process, which has been successful at the level of protein ensembles [21,49,50]. On the experimental side, our ability to accurately capture differences in glycan dynamics from cryo-EM maps is limited by additional uncertainties that we did not quantify in this paper and could therefore be contributing to the observed deviations. Improving upon the cryo-EM methodology will be equally as

Figure 6E). Transient deflections of neighboring glycans away from a site can temporarily expose it to digestion, but the probability of this occurring will be influenced by the local density and the dynamics of the surrounding glycans. Digestion of the peripheral glycans revealed a core set of highly connected glycan sub-domains which showed close resemblance to the sub-graphs generated by defining a more stringent overlap cutoff within the network. Although a direct comparison between our probabilistic network model and the networks constructed from the MD simulations [31,32] would not be entirely accurate, we found they show a similar overall structure. Taken together, these results lend strong support to the general accuracy of our network model, which is an important validation considering the HT-AM pipeline does not robustly sample the protein backbones, nor does it capture temporal dynamics.

The observation that enzymatic de-glycosylation leads to progressive destabilization of the Env trimer was somewhat surprising. The most recent investigations into the effect of glycan knockouts and de-glycosylation on Env stability and viral infectivity suggest the glycan shield has little to no effect on either [51,52], however other studies

and not fully understood. One potential mechanism is stabilizing interactions between the core NAG and neighboring side chains, which are observed throughout Env and are common in other glycoproteins. However, Endo H leaves the core NAG attached, so the stability must arise by another mechanism. We hypothesize that the dense packing of glycans serves to dampen the underlying protein dynamics. In line with this, we observed an increase in glycan RMSF when using Man5 in place of the larger Man9, which we interpret as more crowding in the glycan canopy (above the stalks) leading to reduced dynamics closer to the protein surface. Furthermore, the correlation we observed between Endo H protection and network eigencentrality suggests glycan-glycan interactions might also play a role. Even though a single interaction may be weak, the combined effect of many such interactions in a densely glycosylated region

The samples were analyzed on a Q Exactive HF-X mass spectrometer (Thermo). Samples were injected directly onto a 25 cm, 100 μm ID column packed with BEH 1.7 μm C18 resin (Waters). Samples were separated at a flow rate of 300 nL/min on a nLC 1200 (Thermo). Solutions A and B were 0.1% formic acid in 5% and 80% acetonitrile, respectively. A

The MS data were processed as previously described [35]. The data were searched against the proteome database and quantified using peak area in Integrated Proteomics Pipeline-IP2. For samples produced in HEK293F and CHO cells, glycosites (N-X-T/S) with N + 203 were identified as sites with high mannose glycans removed by the initial Endo H treatment (high mannose), the glycosites with N + 3 were identified as sites whose glycans were complex type glycans removed by PNGase F, and glycosites with N + 0 were identified as sites that had no glycan prior to endoglycosidase treatments. Since samples produced in HEK293S are only high mannose, and were treated only with PNGase F, sites with N + 3 were identified as sites with high mannose.
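The mass-shift assignment rules above can be written down as a small lookup, shown here as a hedged sketch rather than the IP2 workflow itself. The function names, the cell-line strings, and the treatment of N + 0 in HEK293S samples as unoccupied are assumptions made for illustration; the per-site fractions are simply peak areas normalized to their sum.

```python
def classify_glycosite(mass_shift, cell_line):
    """Map the mass shift on a glycosite Asn to a glycan category.

    mass_shift: integer shift in Da (203, 3, or 0) as described in the text.
    cell_line: "HEK293F", "CHO", or "HEK293S" (hypothetical labels).
    """
    if mass_shift == 0:
        return "unoccupied"  # for HEK293S this reading is an assumption
    if cell_line == "HEK293S":
        # Only PNGase F was used, and all glycans are high mannose.
        if mass_shift == 3:
            return "high mannose"
    else:
        if mass_shift == 203:
            return "high mannose"  # core residue left behind by Endo H
        if mass_shift == 3:
            return "complex"  # removed by PNGase F
    raise ValueError(f"unexpected mass shift {mass_shift} for {cell_line}")


def site_composition(peak_areas, cell_line):
    """Fraction of each glycan category at one site from summed peak areas.

    peak_areas: dict mapping mass shift to quantified peak area.
    """
    totals = {}
    for mass_shift, area in peak_areas.items():
        label = classify_glycosite(mass_shift, cell_line)
        totals[label] = totals.get(label, 0.0) + area
    total_area = sum(totals.values())
    return {label: area / total_area for label, area in totals.items()}
```

For example, `site_composition({203: 60.0, 3: 30.0, 0: 10.0}, "HEK293F")` splits a site into high mannose, complex, and unoccupied fractions; the peak areas in this call are invented for the example.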

Cryo-EM data processing

All non-custom cryo-EM data processing, which includes particle picking, 2-D and 3-D classification, refinement, per-

processing. The initial processing usually involved 2 rounds of 2-D classification and subset selection followed by 1 round of ab-initio classification into 4 classes. Clean classes were then pooled and refined using the class average as a template. All refinements were run with C3 symmetry unless stated otherwise. At this stage, refinement meta-data was downloaded from CryoSparc and re-formatted for processing in RELION. Following refinement in RELION, per-particle CTF and beam-tilt refinement were performed, followed by another round of refinement, and 1 or more

The total volume at the noise threshold was calculated as the sum of the volumes of all the connected components.
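As a rough illustration of this calculation, the following Python sketch labels connected components in a thresholded map and sums their volumes. It is not the code used for the analysis. The function name, the nested-list representation of the map, the 6-connectivity (face-sharing voxels) rule, and the `voxel_volume` parameter are all assumptions made for the example.

```python
from collections import deque

# Face-sharing neighbours (6-connectivity); an assumption, since the
# connectivity rule is not specified in the text.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def component_volumes(density, threshold, voxel_volume=1.0):
    """Return the volume of each connected component at or above threshold.

    density is a 3-D nested list indexed as density[z][y][x].
    """
    nz, ny, nx = len(density), len(density[0]), len(density[0][0])
    seen = set()
    volumes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or density[z][y][x] < threshold:
                    continue
                # Breadth-first flood fill of one component.
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                count = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    count += 1
                    for dz, dy, dx in NEIGHBORS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and (pz, py, px) not in seen
                                and density[pz][py][px] >= threshold):
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                volumes.append(count * voxel_volume)
    return volumes

def total_volume(density, threshold, voxel_volume=1.0):
    """Sum of the volumes of all connected components."""
    return sum(component_volumes(density, threshold, voxel_volume))
```

In practice a library routine for connected-component labeling on arrays would be preferable for full-size maps; the pure-Python version is used here only to keep the sketch self-contained.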

Noise threshold and total volume were then calculated as a function of Gaussian filter SD to create the plot shown in Figure 2B and Supplemental Figure 7. The process was performed before and after resampling and intensity equalization, also with the Fabs masked out to better capture the relevant signal (Supplemental Figure 7). The mask used to delete the Fab here and for difference mapping has been deposited to the EMDB with the BG505_293F map.

Figure 9B). The cumulative absolute deviation from the mean across all glycans was calculated for each individual glycan residue, which suggested the BMA residue closely approximates the full-glycan RMSF (Supplemental Figure 9A). The correlation analysis was repeated using each of the 11 individual glycan residues in place of the average and we saw negligible differences (Supplemental Figure 9C). This same model was used to analyze the BG505_CHO data, while minor adjustments to the N611 glycan were made for the BG505_293S map. This same procedure was used to analyze all the simulated cryo-EM maps. To quantify the percent occupancy at each site after Endo H digestion we assumed a linear relationship between signal intensity and occupancy, with any intensity <= 0 being considered fully digested. Initial occupancies were determined by MS. To calculate the Endo H protection score we added the percent occupancy at each site from the EndoH2 and EndoH16 maps and normalized the results between 0 and 1. This can be seen as the integral of occupancy with respect to reaction time. All local intensity analysis was repeated before and after intensity equalization in RELION for comparison (Supplemental Figure 10).
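The occupancy and protection-score calculation can be sketched in Python as follows. This is one possible reading of the description, not the code used for the analysis. The function names are hypothetical, and two steps are assumptions: occupancy is taken to scale with the ratio of digested to undigested intensity, and the final normalization is taken to be min-max scaling across sites.

```python
def occupancy_after_digestion(intensity, reference_intensity, initial_occupancy):
    """Percent occupancy at one site after Endo H digestion.

    Assumes occupancy is proportional to intensity, anchored so that the
    undigested (reference) intensity corresponds to the MS-derived initial
    occupancy. Intensities <= 0 are treated as fully digested.
    """
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return initial_occupancy * max(intensity, 0.0) / reference_intensity

def protection_scores(occ_2h, occ_16h):
    """Sum the EndoH2 and EndoH16 occupancies per site, then scale to [0, 1].

    Min-max scaling is an assumption; the text only states that results
    were normalized between 0 and 1.
    """
    summed = {site: occ_2h[site] + occ_16h[site] for site in occ_2h}
    lo, hi = min(summed.values()), max(summed.values())
    if hi == lo:
        # All sites identical; no spread to normalize.
        return {site: 0.0 for site in summed}
    return {site: (value - lo) / (hi - lo) for site, value in summed.items()}
```

With made-up occupancies for three hypothetical sites, `protection_scores({"site_A": 80, "site_B": 20, "site_C": 50}, {"site_A": 60, "site_B": 0, "site_C": 20})` assigns 1.0 to site_A, 0.0 to site_B, and an intermediate score to site_C.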

The absolute intensity scales of the simulated maps are different from the experimental maps; however, we observed minimal change with intensity equalization. SPARX 3-D variability maps were calculated in the same manner as for the experimental data, but without including CTF information. Because SPARX re-extracts particles, the center of 

All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.

Additional data related to this paper may be requested from the authors. The electron potential maps have been 

Functions of HIV envelope glycans

Factors limiting the immunogenicity of HIV-1 gp120 envelope glycoproteins

Evolutionary dynamics of the glycan shield of the human immunodeficiency virus envelope during natural infection and implications for exposure of the 2G12 epitope

Neutralization of the AIDS retrovirus by antibodies to a recombinant envelope glycoprotein. Science. American Association for the Advancement of Science

Envelope glycans of immunodeficiency virions are almost entirely oligomannose antigens

Trimeric HIV-1-Env Structures Define

Evolutionary and immunological implications of contemporary HIV-1 variation

Assignment of intrachain disulfide bonds and characterization of potential glycosylation sites of the type 1 recombinant human immunodeficiency virus envelope glycoprotein (gp120) expressed in Chinese hamster ovary cells

Antibody neutralization and escape by HIV-1

Ward AB, Wilson IA. The HIV-1 envelope glycoprotein structure: nailing down a moving target

Cryo-EM structure of a fully glycosylated soluble cleaved HIV-1 envelope trimer

Cryo-EM structure of a native, fully glycosylated, cleaved HIV-1 envelope trimer

Cryo-electron Microscopy Analysis of Structurally Heterogeneous Macromolecular Complexes

Cryo-electron microscopy for structural analysis of dynamic biological macromolecules

How cryo-EM is revolutionizing structural biology

Quantifying the local resolution of cryo-EM density maps

Resolution and Probabilistic Models of Components

CryoEM Maps of Mature P22 Bacteriophage

A Multi-model Approach to Assessing Local and Global Cryo-EM Map Quality

Model-based local density sharpening of cryo-EM maps. Elife. eLife Sciences Publications Limited

All-atom molecular dynamics of the HBV capsid reveals insights into biological function and cryo-EM resolution limits

Automatic local resolution-based sharpening of cryo-EM maps

Thresholding of cryo-EM density maps by false discovery rate control

Biophysical Characterization of a Nanodisc with and without BAX: An Integrative Study Using Molecular Dynamics Simulations and Cryo-EM

Role of N-glycosylation in activation of proMMP-9. A molecular dynamics simulations study

Effects of N-glycosylation on protein conformation and dynamics: Protein Data Bank analysis and molecular dynamics simulation study. Sci Rep

Long-ranged Protein-glycan Interactions Stabilize von Willebrand Factor A2 Domain from Mechanical Unfolding. Sci Rep

Analysis of site-specific N-glycan remodeling in the endoplasmic reticulum and the Golgi

Glycan Shield of an HIV-1 Envelope Trimer After the Loss of a Glycan. Sci Rep

Conformational Heterogeneity of the HIV Envelope Glycan

Microsecond Dynamics and Network Analysis of the HIV-1 SOSIP Env Trimer Reveal Collective Behavior and Conserved Microdomains of the Glycan Shield

A next-generation cleaved

664 gp140, expresses multiple epitopes for broadly neutralizing but not non-neutralizing antibodies

Native-like Env trimers as a platform for HIV-1 vaccine design

Differential processing of HIV envelope glycans on the virus and soluble recombinant trimer

SPARX, a new environment for Cryo-EM image processing

A primer to single-particle cryo-electron microscopy

All-atom ensemble modeling to analyze small-angle x-ray scattering of glycosylated proteins

Env Trimers Resulting from Removal of a Conserved CD4 Binding Site-Proximal Glycan

Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2. Elife. eLife Sciences Publications Limited

664, an extensively glycosylated, trimeric HIV-1 envelope glycoprotein vaccine candidate

Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody

Model Building and Refinement of a Natively Glycosylated HIV-1 Env Protein by High-Resolution Cryoelectron Microscopy

Integrity of Glycosylation Processing of a Glycan-Depleted Trimeric HIV-1 Immunogen Targeting Key B-Cell Lineages

Composition and Antigenic Effects of Individual Glycan Sites of a Trimeric HIV-1 Envelope Glycoprotein

Glycan clustering stabilizes the mannose patch of HIV-1 and preserves vulnerability to broadly neutralizing antibodies

Bayesian Weighing of Electron Cryo-Microscopy Data for Integrative Structural Modeling

Determination of protein structural ensembles using cryo-electron microscopy

HIV-1 envelope subunit protein gp120 is not required for native trimer formation or viral infectivity

Partial enzymatic deglycosylation preserves the structure of cleaved recombinant HIV-1 envelope glycoprotein trimers

A systematic study of the N-glycosylation sites of HIV-1 envelope protein on infectivity and antibody-mediated neutralization

Nonrandom distribution of gp120 N-linked glycosylation sites important for infectivity of human immunodeficiency virus type 1

Location-specific, unequal contribution of the N glycans in simian immunodeficiency virus gp120 to viral infectivity and removal of multiple glycans without disturbing infectivity

Intracellular functions of N-linked glycans

Modulation of protein biophysical properties by chemical glycosylation: biochemical insights and biomedical implications

Effect of glycosylation on protein folding: a close look at thermodynamic stabilization

Effect of N-linked glycosylation on glycopeptide and glycoprotein structure

Region in the V1V2 Variable Domain of the HIV-1 Envelope gp120 Protein. Rein A, editor

The influence of N-linked glycans on the molecular dynamics of the HIV-1 gp120 V3 loop

Effects of glycosylation on the conformation and dynamics of O-linked glycoproteins: carbon-13 NMR studies of ovine submaxillary mucin

Stabilization of proteins by glycosylation examined by NMR analysis of a fucosylated proteinase inhibitor

Effects of glycosylation on protein structure and dynamics in ribonuclease B and some of its individual glycoforms

How hydrophobicity and the glycosylation site of glycans affect protein folding and stability: a molecular dynamics simulation

Folding of glycoproteins: toward understanding the biophysics of the glycosylation code

The core trisaccharide of an N-linked glycoprotein intrinsically accelerates folding and enhances stability

Open and closed structures reveal allostery and pliability in the HIV-1 envelope spike. Nature

HIV-1 VACCINES. HIV-1 neutralizing antibodies induced by native-like envelope trimers

Recombinant HIV envelope trimer selects for quaternary-dependent antibodies targeting the trimer apex

A generalized HIV vaccine design strategy for priming of broadly neutralizing antibody responses. Science. American Association for the Advancement of Science

MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy

Real-time CTF determination and correction

New tools for automated high-resolution cryo-EM structure determination in RELION-3

cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination

EMHP: An accurate automated hole masking algorithm for single-particle cryo-EM image processing

Quantitative analysis of cryo-EM density map segmentation by watershed and scale-space filtering, and fitting of structures by alignment to regions

UCSF Chimera--a visualization system for exploratory research and analysis

SWISS-MODEL: homology modelling of protein structures and complexes

Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta. Elife. eLife Sciences Publications Limited

MolProbity: all-atom structure validation for macromolecular crystallography

EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy

Structural analysis of glycoproteins: building N-linked glycans with Coot

Automatically Fixing Errors in

LLC. The PyMOL Molecular Graphics System

Structure-based model of allostery predicts coupling between distant sites

Comparative protein structure modeling using Modeller

Sali A. Comparative protein modeling by satisfaction of spatial restraints

PROCHECK: a program to check the stereochemical quality of protein structures

Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles

CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data

VMD: visual molecular dynamics

EMAN2: an extensible image processing suite for electron microscopy