key: cord-0757087-x8jgcqts authors: Uslupehlivan, Muhammet; Şener, Ecem title: Computational analysis of SARS-CoV-2 S1 protein O-glycosylation and phosphorylation modifications and identifying potential target positions against CD209L-mannose interaction to inhibit initial binding of the virus date: 2020-03-25 journal: bioRxiv DOI: 10.1101/2020.03.25.007898 sha: a5319eb45848af0ff724120423ecceaab3e88284 doc_id: 757087 cord_uid: x8jgcqts Covid-19 outbreak is still threatening the public health. Therefore, in the middle of the pandemic, all kind of knowledge on SARS-CoV-2 may help us to find the solution. Determining the 3D structures of the proteins involved in host-pathogen interactions are of great importance in the fight against infection. Besides, post-translational modifications of the protein on 3D structure should be revealed in order to understand the protein function since these modifications are responsible for the host-pathogen interaction. Based on these, we predicted O-glycosylation and phosphorylation positions using full amino acid sequence of S1 protein. Candidate positions were further analyzed with enzyme binding activity, solvent accessibility, surface area parameters and the positions determined with high accuracy rate were used to design full 3D glycoprotein structure of the S1 protein using carbohydrate force field. In addition, the interaction between the C-type lectin and α-mannose residues was examined and carbohydrate recognition positions were predicted. We suggest these positions as a potential target for the inhibition of the initial binding of SARS-CoV-2 S1 protein to the host cell. The coronavirus spike protein (S) plays a key role in the early steps of viral infection. S protein comprises of two sub-units S1 and S2 which are responsible for the binding of the host cell receptor and fusion of the cellular membrane, respectively [1, 2, 3] . S protein also contains furin cleavage site at the boundary between the S1-S2 sub-units which mediates the membrane fusion and virus infectivity [1, 2] . The S1 protein mediates an initial high affinity interaction with a cellular receptor. It has been suggested that different domains within a single S protein could bind multiple alternative receptors. Although ACE2 (angiotensin-converting enzyme 2) is known to be the SARS-CoV receptor [2, 4] , CD209L, a C-type lectin that binds to high-mannose glycans on glycoproteins, has also been found to be as an alternative receptor for SARS-CoV [5] . However, recent studies have mainly focused on the S protein-ACE2 interaction, there is a paucity of information on the SARS-CoV-2 S1 protein-CD209L lectin interaction. It has been found that CD209L lectin binds to SARS-CoV, human coronavirus 229E, Ebola, Hepatitis C, HIV, Influenza virus glycoproteins and may mediate the endocytosis of pathogens [6] [7] [8] [9] [10] [11] . Thus, identification of the carbohydrate-lectin interaction sites on the 3D structure for understanding of the initial binding of virus to a host cell is crucial. Also, as post-translational modifications regulate the host-pathogen interaction [12] , identifying SARS-CoV-2 S1 protein modifications may help us to inhibit initial binding of the virus. 3D crystal structure of SARS-CoV-2 S protein with N-glycosylation positions was identified but, sites such as furin cleavage and Oglycosylation are missing. In addition, the detailed structure of N-glycan units was not included in this model [13] . As glycan structures typically exist in solution or on proteins, it is a big challenge to characterize the 3D structure of glycoproteins experimentally. However, computational structural biology allows us to generate 3D protein and glycoprotein modelling with high accuracy rate using all amino acid sequence [14] [15] [16] . Herein, we predicted Oglycosylation and phosphorylation positions using full amino acid sequence of S1 protein. Candidate positions were further analyzed with enzyme binding activity, solvent accessibility, surface area parameters and the positions determined with high accuracy rate were used to design full 3D glycoprotein structure of the S1 protein using carbohydrate force field. In addition, the interaction between the C-type lectin and α-mannose residues was examined and carbohydrate recognition positions were predicted. We suggest these positions as a potential target for the inhibition of the initial binding of SARS-CoV-2 to the host cell. Determining the 3D structures of the proteins involved in host-pathogen interactions are of great importance in the fight against infection. Besides, post-translational modifications of the protein on 3D structure should be revealed in order to understand the protein function since these modifications are responsible for the host-pathogen interaction. Therefore, researchers primarily have focused on the S protein structure of SARS-CoV-2 virus and revealed the 3D structure with N-glycosylation positions. But, the detailed structure of the N-glycan units in these sites and the O-glycosylation positions are not included in the current model [13] . Also, it is known that the initial binding of other coronaviruses to host cell occurs via the alternative receptor C-type lectin CD209L besides ACE2 protein [5] . Taken together, herein, we first demonstrated SARS-CoV-2 S1 protein high mannose N-glycan structure and its interaction with CD209L. When 3D glycoprotein structure of S1 protein analysed, N-glycosylation positions were found to be located mainly on the N-Terminal Domain (NTD) (Figure 1 ). Since the Receptor Binding Domain (RBD) is known to be involved in ACE2 binding, we suggest that N-glycan positions on NTD may associated with the binding of C-type lectin. C-type lectins interact with α-linked mannose residues on high mannose N-glycan structures [17] . There has been a structural study on C-type lectin-glycan interaction but terminal sugar is not mannose here [18] . Thus, in this study, the interaction between the C-type lectin and α-D-mannose was analysed and interface residues were identified. ZDOCK docking and PDBePISA interface interaction analysis showed in two models with high accuracy rate that mannose sugar interacts with C-type lectin at Met282, Lys307, and Ser345 positions with hydrogen bonding ( Fig. 2 and Table 1 ). We suggest these positions as the α-D-mannose recognizing sites having function on the lectin-sugar binding. Secondly, we identified O-GalNAc and O-GlcNAc modifications of SARS-CoV-2 S1 protein. O-glycans are known to be involved in protein stability and function [19] . The presence of Oglycans in some viral proteins has been demonstrated and suggested that glycans may play a role on biological activity of viral proteins. In the comparative study on human SARS-CoV-2 and other coronaviruses S proteins have showed that Ser673, Thr678, and Ser686 are conserved O-glycosylated positions and have suggested that SARS-CoV-2 S1 protein may show Oglycosylation at these positions [20] . In this study, S1 protein was found to be O-GalNAcylated at Thr632, Thr678 and O-β-GlcNAcylated at Thr323, Thr638, Ser686 positions (Table 2) . When compared our results with the previous study, SARS-CoV-2 S1 protein found not to be O-glycosylated at Thr673, but O-GalNAcylated at Thr678 and O-β-GlcNAcylated at Ser686. Besides, we found additional O-glycosylation positions at Thr323, Thr632, and Thr638 on the S1 protein. On the 3D glycoprotein structure, Thr323 was found to be located at RBD and Thr678 and Ser686 located near FRS ( Fig. 1 and 3 ). Since O-glycans are involved in proteinprotein interactions [19] , we suggest that O-glycans at Thr323 may play a role on binding to ACE2 and O-glycans at Thr678, Thr686 may responsible for furin protease enzyme binding. It has been known that O-glycans are responsible for the protein stability and creating mucin like domain as glycan shields involved in immunoevasion [21] . Based on this, we suggest that the O-glycosylation positions found may be involved in the stability of the S1 protein and immunoevasion of virus to the host cell. Phosphorylation of viral proteins are catalyzed by host cell enzymes like glycosylation modifications [12, 22] . In this study, phosphorylation modification of SARS-CoV-2 S1 protein was also examined and 36 positions of Ser/Thr residues were found to be phosphorylated on almost all domains of the protein (Table 3) . Davidson et al. have shown that S1 protein is phosphorylated at 7 positions [23] . Amongst them, Ser459, Ser637, Ser640 positions were found to be correlated with our results. Phosphorylation modification regulates the protein activity and function in cooperation with glycosylation [24, 25] . When we examined the phosphorylation and glycosylation locations of S1 protein on the 3D structure; the phosphorylation positions at Ser161 and Ser162 were found to be located near glycosylation position Asn165. Likewise, Thr109 with Asn234; Ser680 with Thr678 and FRS; Thr604 with Asn603; Thr630 with Thr632; Thr637 and Thr640 with Thr638; Ser659 with Asn657 were found to be located close on the 3D structure (Fig. 3) . So, we suggest these sites as a critical sites for S1 protein activity. Covid-19 outbreak is still threatening the public health. At this point, inhibiting the initial binding of virus to the host cell is crucial in order to find treatment. Since post-translational modifications regulate the host-pathogen interaction, identifying SARS-CoV-2 S1 protein modifications may help us to inhibit initial binding of the virus. Therefore, we focused on the CD209L lectin-α-D-mannose interaction of S1 protein and suggested Met282, Lys307, and Ser345 positions as targets for the initial binding. Also, we found that phosphorylation and glycosylation positions where located at close sites may be critical for S1 protein activity. Therefore, we suggest that, these positions can be targeted for drug development against Covid- The potential O-glycosylation (O-GalNAc) and O-β-GlcNAc sites of S1 protein were analysed via NetOGlyc 4.0 [26] and YinOYang 1.2 Server [27] , respectively. The threshold was chosen as 0.5 for both to predict high potential sites. The potential glycosylation positions which are found on NetOGlyc and YinOYang were analysed with the ISOGlyP server for the enzyme binding activity [28] . ISOGlyP was used to calculate all potential positions with ppGalNAc transferase isoforms which showed high enzyme binding activity and calculates the enhancement value product (EVP) values as an indication of glycosylation rates [29] . The potential phosphorylation positions of S1 protein was analysed with NetPhos 3.1 Server [30] . NetSurfP v1.1 was used to assess the surface and solvent accessibility of predicted Ser and Thr positions with glycosylation and phosphorylation positions [31] . The relative solvent accessibility of potential glycosylation positions was analysed using Poly-View 2D-SABLE protein structure prediction server [32] . 3D structure models involving full amino acid sequence of S1 protein was taken with QHD4341 ID from I-TASSER (Iterative Threading ASSEmbly Refinement) [33] . The structure model has been generated by the C-I-TASSER pipeline, which utilizes deep convolutional neural-network based contact-map predictions to guide the I-TASSER fragment assembly simulations. GLYCAM-Web Server (AMBER carbohydrate force field) was used to screen highly reliable glycosylation sites on the protein 3D structure. GLYCAM-Web is dedicated to simplifying the prediction of three-dimensional structures of carbohydrates and macromolecular structures involving carbohydrates. Glycoprotein Builder runs SASA (Solvent Accessible Surface Area) prediction and finds the most appropriate glycosylation sites for adding glycan units on the 3D structure using molecular dynamics simulation [34, 35] . Glycan units skeleton were chosen as Core 1 type for O-glycan units and the common shape of high mannose type for N-glycan units. Core-1 type is the most common O-glycan motif on proteins. The common shape of high mannose type is a conserved motif that plays an important role on the N-glycan formation [17] . Additionally, all potential phosphorylation sites were analysed for SASA parameters on the 3D protein structure using GLYCAM and the positions with high accuracy rates were chosen as phosphorylation positions. In order to find out the CD209L lectinmannose interaction, ZDOCK docking server was used [36] . CD209L lectin 3D structure was taken from PDB (ID: 1XPH) [37] and α-D-mannose structure was taken from Glyco3D database [38] . PDBePISA server was used for the exploration of macromolecular interfaces between the receptor-ligand [39] . All structures were visualized with PyMOL. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors have no conflict of interest to declare. T1 T2 T3 T4 T5 T10 T11 T12 T13 T14 T16 T/S ratio Table 3 . The predicted phosphorylation positions of SARS-CoV-2 S1 protein. All positions were analysed with the kinase binding, surface, and solvent accessibility parameters. The kinase specificity and protein localizations were also given in the Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2 CD209L (L-SIGN) is a receptor for severe acute respiratory syndrome coronavirus C-type lectins DC-SIGN and L-SIGN mediate cellular entry by Ebola virus in cis and in trans Differential N-linked glycosylation of human immunodeficiency virus and Ebola virus envelope glycoproteins modulates interactions with DC-SIGN and DC-SIGNR L-SIGN (CD209L) and DC-SIGN (CD209) mediate transinfection of liver cells by hepatitis C virus Human coronavirus 229E can use CD209L (L-SIGN) to enter cells N-linked glycosylation facilitates sialic acid-independent attachment and entry of influenza A viruses into cells expressing DC-SIGN or L-SIGN DC-SIGN and DC-SIGNR interact with the glycoprotein of Marburg virus and the S protein of severe acute respiratory syndrome coronavirus Glycovirology protocols Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Regulation of the metastatic cell phenotype by sialylated glycans Substitution of the N-glycan function in glycosyltransferases by specific amino acids: ST3Gal-V as a model enzyme Introduction to glycobiology Structural basis for selective recognition of oligosaccharides by DC-SIGN and DC-SIGNR Pathways and functions of mammalian protein glycosylation The proximal origin of SARS-CoV-2 Global aspects of viral glycosylation Identifying protein phosphorylation sites with kinase substrate specificity on human viruses Characterisation of the transcriptome and proteome of SARS-CoV-2 using direct RNA sequencing and tandem mass spectrometry reveals evidence for a cell passage induced in-frame deletion in the spike glycoprotein that removes the furin-like cleavage site Prediction of posttranslational glycosylation and phosphorylation of proteins from the amino acid sequence O-GlcNAcylation of key nuclear and cytoskeletal proteins: reciprocity with O-phosphorylation and putative roles in protein multimerization Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology Prediction of glycosylation across the human proteome and the correlation to protein function Identification of common and unique peptide substrate preferences for the UDP-GalNAc: polypeptide α-N-acetylgalactosaminyltransferases T1 and T2 derived from oriented random peptide substrates Emerging paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family of glycosyltransferases Sequence and structure-based prediction of eukaryotic protein phosphorylation sites A generic method for assignment of reliability scores applied to solvent accessibility predictions Accurate prediction of solvent accessibility using neural networks-based regression I-TASSER server for protein 3D structure prediction GLYCAM06: a generalizable biomolecular force field. Carbohydrates Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat from the chaff ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers The structure of DC-SIGNR with a portion of its repeat domain lends insights to modeling of the receptor tetramer Glyco3D: a portal for structural glycosciences Inference of macromolecular assemblies from crystalline state