Title: Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
Authors: Shajahan, Asif; Supekar, Nitin T; Gleinich, Anne S; Azadi, Parastoo
Journal: Glycobiology
Date: 2020-05-04
DOI: 10.1093/glycob/cwaa042

The current emergence of the novel coronavirus pandemic caused by SARS-CoV-2 demands the development of new therapeutic strategies to curb the rapid rise in mortality. The coronavirus spike (S) protein, which facilitates viral attachment, entry and membrane fusion, is heavily glycosylated and plays a critical role in eliciting the host immune response. The spike protein comprises two subunits (S1 and S2), which together possess 22 potential N-glycosylation sites. Herein, we report the glycosylation mapping of spike protein subunits S1 and S2 expressed in human cells, obtained by high-resolution mass spectrometry. We characterized the quantitative N-glycosylation profile of the spike protein and, interestingly, observed unexpected O-glycosylation on the receptor binding domain (RBD) of subunit S1. Although O-glycosylation has been predicted on the spike protein of SARS-CoV-2, this is the first report of experimental data identifying both the site of O-glycosylation and the identity of the O-glycans attached to subunit S1. Our N- and O-glycosylation data are strengthened by extensive manual interpretation of each glycopeptide spectrum, in addition to the use of bioinformatics tools, to confirm the complexity of glycosylation in the spike protein. Elucidating the glycan repertoire of the spike protein provides insight for viral binding studies and, more importantly, propels research toward the development of a suitable vaccine candidate.
UNCORRECTED MANUSCRIPT

The current major health crisis is caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which spread rapidly across the globe within weeks in early 2020. This highly transmissible virus causes a respiratory illness named COVID-19 (Huang, C., Wang, Y., et al. 2020, Wu, F., Zhao, S., et al. 2020).
As of 31 March 2020, 750,890 cases of COVID-19 and 36,405 COVID-19-related deaths had been confirmed globally by the World Health Organization (WHO) (World Health Organization 2020b). To date, no specific medical treatments or vaccines for COVID-19 have been approved (Li, G.D. and De Clercq, E. 2020, World Health Organization 2020a). The scientific community is therefore expending great effort in compiling data on the virus, and on the respiratory illness it causes, to find effective ways of dealing with this health crisis. The pathogenic SARS-CoV-2 enters human target cells via its viral transmembrane spike (S) glycoprotein. The spike protein is a trimeric class I fusion protein consisting of two subunits, S1 and S2. The S1 subunit mediates attachment of the virus, and the S2 subunit subsequently mediates fusion of the viral and host cell membranes (Hoffmann, M., Kleine-Weber, H., et al. 2020, Walls, A.C., Park, Y.J., et al. 2020, Zhou, P., Yang, X.L., et al. 2020). The entry receptor for SARS-CoV-2 has been identified as human angiotensin-converting enzyme 2 (hACE2), and recent studies have reported that the S protein binds hACE2 with high affinity (Hoffmann, M., Kleine-Weber, H., et al. 2020, Shang, J., Ye, G., et al. 2020, Walls, A.C., Park, Y.J., et al. 2020). Given its key role in viral entry, the S protein is one of the major targets for the development of specific medical treatments or vaccines: neutralizing antibodies targeting the SARS-CoV-2 spike protein could prevent the virus from binding to the hACE2 entry receptor and therefore from entering the host cell (Shang, J., Ye, G., et al. 2020). Each monomer of the S protein is highly glycosylated, with 22 predicted N-linked glycosylation sites. Furthermore, three O-glycosylation sites have also been predicted (Andersen, K.G., Rambaut, A., et al. 2020).
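Predicted N-linked sites such as these are conventionally located by scanning the sequence for the N-X-S/T sequon, where X is any residue except proline. The sketch below illustrates that rule only; it is not the prediction tool used in the cited work. The helper name `find_sequons` is hypothetical, and the `start` offsets in the checks are inferred from the site assignments this paper gives for its two representative N-glycopeptides (N122 and N801), not taken from a full-length sequence file.

```python
import re

# N-X-S/T sequon, where X is any residue except proline. The lookahead makes
# the match zero-width, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str, start: int = 1) -> list[int]:
    """Return residue numbers of Asn residues that sit in an N-X-S/T sequon.

    `start` is the residue number of the first character of `sequence`, so a
    peptide can be scanned while keeping full-length protein numbering.
    """
    seq = sequence.upper()
    return [start + m.start() for m in SEQUON.finditer(seq)]


# Representative glycopeptides named in this paper's figure legends; the start
# offsets are chosen so the sequon Asn maps to the reported site numbers.
print(find_sequons("TQSLLIVNNATNVVIK", start=114))
print(find_sequons("TPPIKDFGGFNFSQILPDPSKPSKR", start=791))
```

A sequon is necessary but not sufficient for glycosylation, which is consistent with the partial and absent occupancy reported at several of the 22 sites.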
Cryo-electron microscopy (cryo-EM) provides evidence for the existence of 14-16 N-glycans on the 22 potential sites in the SARS-CoV-2 S protein (Walls, A.C., Park, Y.J., et al. 2020). The glycosylation pattern of the spike protein is a crucial characteristic to consider with respect to steric hindrance and chemical properties, and even as a potential target of future mutation. The N-glycans on the S protein play important roles in proper protein folding and in priming by host proteases (Walls, A.C., Park, Y.J., et al. 2020). Because glycans can shield amino acid residues and other epitopes from recognition by immune cells and antibodies, glycosylation can enable the coronavirus to evade both the innate and adaptive immune responses (Walls, A.C., Park, Y.J., et al. 2020, Walls, A.C., Xiong, X., et al. 2019). Elucidating the glycosylation of the viral S protein can aid in understanding viral binding with receptors, fusion, entry and replication, and also in designing suitable antigens for vaccine development. Here, we report site-specific, quantitative N-linked and O-linked glycan profiling of SARS-CoV-2 spike subunits S1 and S2 by glycoproteomics using high-resolution LC-MS/MS. We used recombinant SARS-CoV-2 subunits S1 and S2 expressed in human HEK 293 cells and observed partial N-glycan occupancy on 17 of the 22 N-glycosylation sites; the remaining five N-glycosylation sites were unoccupied. Remarkably, we unambiguously identified two unexpected O-glycosylation sites in the receptor binding domain (RBD) of subunit S1. O-glycosylation of the SARS-CoV-2 spike protein has been predicted in some recent reports, and most of these predictions are for sites near the furin cleavage site (S1/S2), because similar sites are O-glycosylated in SARS-CoV-1 (Andersen, K.G., Rambaut, A., et al. 2020, Uslupehlivan, M. and Sener, E. 2020).
However, we observed O-glycosylation at two sites on the RBD of spike protein subunit S1; this is the first report of evidence for such a glycan modification at a crucial binding location of the spike protein. Site-specific analysis of the N- and O-glycosylation of the SARS-CoV-2 spike protein provides a basic understanding of the viral structure that is crucial for identifying immunogens for vaccine design. This in turn could lead to future therapeutic intervention or prevention of COVID-19. Studies over the past two decades have shown that glycosylation of protein antigens can play crucial roles in the adaptive immune response. Glycosylation of a protein antigen is therefore relevant to vaccine development, and it is widely accepted that a lack of information about glycosylation sites hampers the design of such vaccines (Wolfert, M.A. and Boons, G.J. 2013). We procured culture supernatants of HEK 293 cells expressing SARS-CoV-2 subunit S1 and subunit S2 separately. The proteins were expressed with a His tag and spanned Val16 to Gln690 (subunit S1) and Met697 to Pro1213 (subunit S2). According to the manufacturer, SDS-PAGE of the proteins showed molecular weights higher than the predicted 75 and 60 kDa, respectively, owing to glycosylation. Since the proteins were unpurified, we fractionated them by SDS-PAGE on separate lanes and cut out the bands corresponding to subunit S1 and subunit S2. The gels were stained with Coomassie dye, and the gel bands were cut into small pieces, de-stained, reduced, alkylated and subjected to in-gel protease digestion. We employed trypsin, chymotrypsin, and a combination of trypsin and chymotrypsin to generate glycopeptides that each contain a single N-linked glycosylation site. Purified subunit S2 was also digested with the trypsin-chymotrypsin combination in solution.
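The reason for combining proteases can be illustrated with an in-silico digest: a peptide that carries two sequons after one enzyme can be split into single-sequon peptides by adding a second enzyme. The sketch below is a simplification under stated assumptions. It uses textbook cleavage rules (trypsin after K/R, high-specificity chymotrypsin after F/Y/W, neither before P), fully specific cleavage with no missed cleavages (the database search in this study enabled semi-specific cleavage), and a hypothetical toy sequence rather than the spike sequence. The function names are illustrative only.

```python
import re

# N-X-S/T sequon (X != P); zero-width lookahead so overlapping sequons count.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

# Simplified cleavage rules (assumption): cut C-terminal to these residues,
# except when the next residue is proline.
PROTEASE_SITES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FYW"),
}


def digest(sequence: str, proteases: list[str]) -> list[str]:
    """Fully specific in-silico digest with no missed cleavages."""
    sites = set().union(*(PROTEASE_SITES[p] for p in proteases))
    peptides, current = [], ""
    for i, residue in enumerate(sequence):
        current += residue
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in sites and next_residue != "P":
            peptides.append(current)
            current = ""
    if current:  # C-terminal peptide that does not end in a cleavage residue
        peptides.append(current)
    return peptides


def single_sequon_peptides(sequence: str, proteases: list[str]) -> list[str]:
    """Peptides containing exactly one complete N-X-S/T sequon."""
    return [p for p in digest(sequence, proteases)
            if len(SEQUON.findall(p)) == 1]


toy = "GANATLFQNVSKR"  # hypothetical sequence with two sequons
for enzymes in (["trypsin"], ["chymotrypsin"], ["trypsin", "chymotrypsin"]):
    print(enzymes, single_sequon_peptides(toy, enzymes))
```

With trypsin alone, both sequons stay on one peptide, so no single-site peptide is produced; adding chymotrypsin separates them. A sequon split across a cleavage site is not counted by this sketch.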
The glycopeptides were then analyzed by high-resolution LC-MS/MS using a glycan oxonium ion product-dependent, HCD-triggered CID method. The LC-MS/MS data were analyzed using Byonic software; each detected spectrum was manually validated, and false detections were eliminated. We identified the glycan compositions at 17 of the 22 predicted N-glycosylation sites of the SARS-CoV-2 S1 and S2 proteins and found the remaining five sites unoccupied (Figures 2-4, S1-S24). We observed high-mannose, hybrid and complex-type glycans across the N-glycosylation sites. We quantified the relative intensities of glycans at each site by comparing the area under the curve of each glycopeptide peak in the LC-MS chromatogram. A recent preprint investigated the N-glycosylation of the SARS-CoV-2 spike protein expressed in FreeStyle 293F human cells and reported a prevalence of hybrid-type glycans (Watanabe, Y., Allen, J.D., et al. 2020). In contrast, we observed a combination of high-mannose and complex-type glycans, but fewer hybrid-type glycans, at most sites. We found predominantly highly processed, sialylated complex-type glycans at sites N165, N282, N801, N1074 and N1098 (Figures 4, 5). The highly sialylated glycans at N234 and N282, adjacent to the RBD, can act as determinants of viral binding to hACE2 receptors (Hoffmann, M., Kleine-Weber, H., et al. 2020, Tortorici, M.A., Walls, A.C., et al. 2019, Walls, A.C., Park, Y.J., et al. 2020). Similar to one recent report, we observed Man5GlcNAc2 as a predominant structure across all S1 sites (Watanabe, Y., Allen, J.D., et al. 2020). However, we observed substantial proportions of unoccupied peptide at seven N-glycosylation sites (Figure 2). Sites N17, N603, N1134, N1158 and N1173 were completely unoccupied, although further studies using higher protein concentrations, alternative protease digestion strategies and purer protein are required to validate this finding (Figures 1, 2).
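The area-under-the-curve quantification described above can be sketched as follows: integrate each glycopeptide's extracted-ion chromatogram peak, sum areas per site and glycan, and normalize. The per-site occupancy estimate (glycosylated area divided by total area, including the unoccupied peptide) is an added illustration, not a calculation stated in this paper, and it assumes glycosylated and unglycosylated forms ionize equally well, which generally does not hold, so such values are only approximate. All function names, the site label `site_X` and the peak values are hypothetical.

```python
from collections import defaultdict


def trapezoid_area(times: list[float], intensities: list[float]) -> float:
    """Area under an extracted-ion chromatogram peak (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def site_profile(peaks):
    """Summarize glycopeptide peaks per site.

    peaks: iterable of (site, glycan, times, intensities), where glycan is a
    composition string, or None for the unoccupied (non-glycosylated) peptide.
    Returns {site: (occupancy, {glycan: relative_abundance})}.
    """
    areas = defaultdict(lambda: defaultdict(float))
    for site, glycan, times, intensities in peaks:
        areas[site][glycan] += trapezoid_area(times, intensities)

    profile = {}
    for site, by_glycan in areas.items():
        total = sum(by_glycan.values())
        glycoforms = {g: a for g, a in by_glycan.items() if g is not None}
        occupied = sum(glycoforms.values())
        # Assumes equal ionization efficiency of all forms (approximation).
        occupancy = occupied / total if total else 0.0
        relative = {g: a / occupied for g, a in glycoforms.items()} if occupied else {}
        profile[site] = (occupancy, relative)
    return profile


# Hypothetical peaks for a hypothetical site (not data from this study).
peaks = [
    ("site_X", "HexNAc2Hex5", [10.0, 10.5, 11.0], [0.0, 600.0, 0.0]),
    ("site_X", "HexNAc4Hex5NeuAc2", [10.0, 10.5, 11.0], [0.0, 200.0, 0.0]),
    ("site_X", None, [10.0, 10.5, 11.0], [0.0, 200.0, 0.0]),
]
occupancy, relative = site_profile(peaks)["site_X"]
print(occupancy, relative)
```

Relative abundances are normalized over glycoforms only, so they sum to 1 at each site regardless of how much unoccupied peptide is present.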
On subunit S2, the assignments at sites N709 and N1134 were ambiguous because the quality of the MS/MS spectra was not satisfactory, and we are currently evaluating the possibility of other post-translational modifications adjacent to these sites. Although we detected the unglycosylated peptide for site N1194 with a high-quality MS/MS spectrum, glycosylation at this site is ambiguous. We have evaluated O-glycosylation on the S1 and S2 subunits of SARS-CoV [...] The RBD located in the S1 subunit of SARS-CoV-2 undergoes a hinge-like dynamic movement that enhances capture of the hACE2 receptor, and it displays 10-20-fold higher affinity for human ACE2 than SARS-CoV-1, which partially explains the higher transmissibility of the new virus (Wrapp, D., Wang, N., et al. 2020, Yan, R., Zhang, Y., et al. 2020). Residues Thr323 and Ser325 are located in the RBD of the S1 subunit of SARS-CoV-2, so O-glycosylation at this location could play a critical role in viral binding to hACE2 receptors (Figure 3) (Andersen, K.G., Rambaut, A., et al. 2020, Hoffmann, M., Kleine-Weber, H., et al. 2020). Our observation will pave the way for future studies of the implications of O-glycosylation at the RBD of the S1 protein for viral attachment to hACE2 receptors.

Dithiothreitol (DTT) and iodoacetamide (IAA) were purchased from Sigma Aldrich (St. Louis, MO). Sequencing-grade modified trypsin and chymotrypsin were purchased from Promega (Madison, WI). All other reagents were purchased from Sigma Aldrich unless indicated otherwise. Data analysis was performed with Byonic 3.5 software and manually with Xcalibur 4.2. The SARS-CoV-2 spike protein culture supernatants of subunit S1 (Cat. No. 230-20407) and subunit S2 (Cat. No. 230-20408), and purified subunit S2, were purchased from RayBiotech (Atlanta, GA). The S1 and S2 HEK 293 culture supernatants were fractionated on separate lanes by SDS-PAGE.
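The O-glycopeptide assignments at Thr323 involve a peptide that the figure legends identify as VQPTESIVR (residues 320-328), carrying core 1 or core 2 sialylated O-glycans. Because the database search treated O-glycan masses as variable modifications, a worked precursor-mass calculation helps show what is being matched. The sketch below assumes standard monoisotopic residue masses and an unmodified peptide; function and constant names are hypothetical, and it is not Byonic's scoring.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
AA_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
# Monoisotopic residue masses (Da) of common monosaccharides.
SUGAR_RESIDUE = {
    "Hex": 162.05282, "HexNAc": 203.07937,
    "Fuc": 146.05791, "NeuAc": 291.09542,
}
WATER = 18.01056
PROTON = 1.007276


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(AA_RESIDUE[aa] for aa in sequence) + WATER


def glycan_mass(composition: dict[str, int]) -> float:
    """Mass added to the peptide by a glycan of the given composition."""
    return sum(SUGAR_RESIDUE[name] * count for name, count in composition.items())


def precursor_mz(sequence: str, composition: dict[str, int], charge: int) -> float:
    """m/z of the [M + zH]z+ ion of a glycopeptide."""
    neutral = peptide_mass(sequence) + glycan_mass(composition)
    return (neutral + charge * PROTON) / charge


# GalNAcGalNeuAc2 (core 1) and GalNAcGalNeuAc(GlcNAcGalNeuAc) (core 2).
CORE1_DISIALYL = {"HexNAc": 1, "Hex": 1, "NeuAc": 2}
CORE2_DISIALYL = {"HexNAc": 2, "Hex": 2, "NeuAc": 2}

for name, comp in [("core 1", CORE1_DISIALYL), ("core 2", CORE2_DISIALYL)]:
    print(name, round(precursor_mz("VQPTESIVR", comp, 2), 4))
```

The precursor mass is identical whether the glycan sits on Thr323 or Ser325, so site localization has to come from the fragment ions in the HCD/CID spectra rather than from this calculation.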
The gel was stained with Coomassie dye, and the bands corresponding to subunit S1 (200 to 100 kDa) and subunit S2 (150 to 80 kDa) were cut into smaller pieces (approximately 1 mm squares) and transferred to clean tubes. The gel pieces were de-stained, reduced, alkylated and subjected to in-gel protease digestion.

The glycoprotein digests were analyzed on an Orbitrap Fusion Tribrid mass spectrometer equipped with a nanospray ion source and connected to a Dionex binary solvent system.

The LC-MS/MS spectra of tryptic, chymotryptic and combined tryptic/chymotryptic digests of the glycoproteins were searched against the FASTA sequences of spike protein subunits S1 and S2 using Byonic software, with the appropriate peptide cleavage sites selected (semi-specific cleavage option enabled). Oxidation of methionine, deamidation of asparagine and glutamine, and common human N-glycan and O-glycan masses were used as variable modifications. The LC-MS/MS spectra were also analyzed manually for glycopeptides with the support of the Xcalibur software. The HCDpdCID MS2 spectra of glycopeptides were evaluated for the glycan neutral loss pattern, oxonium ions and glycopeptide fragmentation to assign the peptide sequence and the presence of glycans in the glycopeptides.

Financial support from the US National Institutes of Health (S10OD018530) is gratefully acknowledged. This work was also supported in part by the U.S. Department of Energy, Office [...]

A.S. and P.A. conceived the paper; A.S., N.S. and A.G. contributed equally and performed experiments; all authors contributed to writing the paper; P.A. monitored the project. The authors certify that they have no competing interests.

Legends to figures

Figure 1: The SARS-CoV-2 spike proteins recombinantly expressed in HEK293 cells (culture supernatant) were fractionated by SDS-PAGE, subsequently digested with proteases and analyzed by nLC-NSI-MS/MS. SARS-CoV-2 spike protein subunits S1 and S2 expressed in HEK 293 culture supernatant showed higher molecular weights than expected upon SDS-PAGE because of glycosylation.
Thus, the gel bands corresponding to molecular weights of 200 to 100 kDa for S1 and 150 to 80 kDa for S2 were cut, and the proteins were digested after reduction and alkylation and analyzed by LC-MS/MS (created with biorender.com). Purified S2 was processed by in-solution protease digestion.

Figure 2: Glycosylation profile of the SARS-CoV-2 spike protein characterized by high-resolution LC-MS/MS. Seventeen of the 22 potential N-glycosylation sites were found occupied, along with two O-glycosylation sites bearing core-1 type O-glycans. Some N-glycosylation sites were partially glycosylated. Monosaccharide symbols follow the SNFG (Symbol Nomenclature for Glycans) system (Varki, A., Cummings, R.D., et al. 2015).

[...] a. 13 sites on subunit S1; b. 9 sites on subunit S2. RA, relative abundance. Monosaccharide symbols follow the SNFG system (Varki, A., Cummings, R.D., et al. 2015).

[...] a. Representative N-glycopeptide TQSLLIVNNATNVVIK (site N122) of spike protein subunit S1; b. representative N-glycopeptide TPPIKDFGGFNFSQILPDPSKPSKR (site N801) of spike protein subunit S2. Monosaccharide symbols follow the SNFG system (Varki, A., Cummings, R.D., et al. 2015).

[...] a. Representative O-glycopeptide VQPTESIVR (residues 320-328) with a core 1 type GalNAcGalNeuAc2 glycan detected at Thr323 of spike protein subunit S1; b. representative O-glycopeptide VQPTESIVR (residues 320-328) with a core 2 type GalNAcGalNeuAc(GlcNAcGalNeuAc) glycan detected at Thr323 of spike protein subunit S1. Monosaccharide symbols follow the SNFG system (Varki, A., Cummings, R.D., et al. 2015).
References

Emerging viruses and current strategies for vaccine intervention
The proximal origin of SARS-CoV-2
Global aspects of viral glycosylation
The SARS coronavirus S glycoprotein receptor binding domain: fine mapping and functional characterization
Why Glycosylation Matters in Building a Better Flu Vaccine
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Therapeutic options for the 2019 novel coronavirus (2019-nCoV)
Structural Constraints Determine the Glycosylation of HIV-1
Glycomic and glycoproteomic analysis of glycoproteins: a tutorial
Structural basis of receptor recognition by SARS-CoV-2
Extreme C-terminal sites are posttranslocationally glycosylated by the STT3B isoform of the OST
Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology
Database analysis of O-glycosylation sites in proteins
Symbol Nomenclature for Graphical Representations of Glycans
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion
Site-specific analysis of the SARS-CoV-2 glycan shield
Exploitation of glycosylation in enveloped virus pathobiology
Adaptive immune activation: glycosylation does matter
World Health Organization. 2020a. Coronavirus. Licence: CC BY-NC-SA 3.0 IGO
World Health Organization. 2020b. WHO COVID-19 Situation Report 71. Licence: CC BY-NC
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
A new coronavirus associated with human respiratory disease in China
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Site-specific N-glycosylation Characterization of Recombinant SARS-CoV-2 Spike Proteins using High-Resolution Mass Spectrometry
Identification of N-linked glycosylation sites in the spike protein and their functional impact on the replication and infectivity of coronavirus infectious bronchitis virus in cell culture
A pneumonia outbreak associated with a new coronavirus of probable bat origin