key: cord-0322487-ery85j24
authors: Crouch, Lucy I.; Urbanowicz, Paulina A.; Baslé, Arnaud; Cai, Zhi-Peng; Liu, Li; Voglmeir, Josef; Diaz, Javier M. Melo; Benedict, Samuel T.; Spencer, Daniel I.R.; Bolam, David N.
title: Plant N-glycan breakdown by human gut Bacteroides
date: 2022-04-07
journal: bioRxiv
DOI: 10.1101/2022.04.07.487459
sha: 9cc213a8f6cb2e53e60d97835622a62884ccf411
doc_id: 322487
cord_uid: ery85j24

The major nutrients available to the human colonic microbiota are complex glycans derived from the diet. To degrade this highly variable mix of sugar structures, gut microbes have acquired a huge array of different carbohydrate-active enzymes (CAZymes), predominantly glycoside hydrolases, many of which have specificities that can be exploited for a range of different applications. Plant N-glycans are prevalent on proteins produced by plants and are thus components of the diet, but the breakdown of these complex molecules by the gut microbiota has not been explored. Plant N-glycans are also well-characterised allergens in pollen and some plant-based foods, and when plants are used in heterologous protein production for medical applications, the N-glycans present can pose a risk to therapeutic function and stability. Here we use a novel genome association approach for enzyme discovery to identify a breakdown pathway for plant complex N-glycans encoded by a gut Bacteroides species and biochemically characterise five CAZymes involved, including structures of the PNGase and GH92 α-mannosidase. These enzymes provide a toolbox for the modification of plant N-glycans for a range of potential applications. Furthermore, the keystone PNGase also has activity against insect-type N-glycans, which we discuss from the perspective of insects as a nutrient source.

Complex carbohydrates from a wide range of sources are the major nutrients available to the colonic microbiota.
Degradation of these complex macromolecules by the microbiota is achieved through the massive expansion in genes encoding carbohydrate-active enzymes (CAZymes), with some species of gut microbe encoding >300 CAZyme genes from different families. Bacteroides species are particularly adept at glycan breakdown and typically organise the genes encoding the apparatus required for the breakdown of a particular glycan into discrete co-regulated loci (polysaccharide utilisation loci; PULs). A typical Bacteroides PUL comprises genes encoding CAZymes, the outer membrane glycan import system (SusC/D homologues), surface glycan binding proteins (SGBPs), and sensor-regulators (1). Discrete CAZyme gene clusters can also exist without the Sus or other PUL components, co-regulated with the core PUL apparatus located elsewhere on the genome. As a general rule, the more complex the substrate, the higher the number of CAZymes, CAZy families, and loci involved. For example, in Bacteroides thetaiotaomicron, the chemically simple substrate starch requires only one PUL, whereas breakdown of highly variable O-glycans induces the upregulation of 15 PULs (2).

N-glycans are common decorations of secreted proteins from almost all types of organisms and play important roles in protein stability and function (3). Although the core structures of plant and mammalian N-glycans are conserved, key differences exist in the types of sugar decorations and linkages. As a broad classification, mammalian complex N-glycans commonly have an α-1,6-fucose linked to the base GlcNAc, whereas plants frequently have an α-1,3-fucose on this sugar and a β-1,2-xylose linked to the first mannose (Fig. 1). Insect N-glycans commonly have both α-1,3- and α-1,6-fucose linked to the core N-glycan.
Plant complex N-glycans also differ from mammalian complex N-glycans in their antennae structures, which have β1,3-galactose and α1,4-fucose decorating the GlcNAcs (Fig. 1) (4), although there is significant variation in plant N-glycan structures depending on the species (5). Plant N-glycans are also of interest as they can be highly antigenic and induce allergic responses in mammals, causing both hayfever and food allergies (6-8). Furthermore, when plants are used as hosts for heterologous protein production for medical applications, the N-glycans present on these therapeutic proteins can pose a risk to function and stability. Thus there is a need to be able to both characterise and modify plant N-glycans from different sources for a range of applications. Plant N-glycan specific CAZymes would be useful tools for this job, but currently there is a paucity of data describing enzymes that act specifically on plant N-glycan structures.

Previous studies investigating carbohydrate-degradation systems in gut bacteria have typically used transcriptomics during growth on a specific glycan to identify the PUL or PULs involved in its breakdown (9, 10). These experiments require relatively large amounts of substrate for the bacteria to grow on, but for some glycans it may not be possible to isolate enough material, which means that discovery of enzymes that act on these substrates is currently limited. Plant N-glycans are a good example: although these molecules are common components of plant material and therefore widely consumed in the human diet, they are not easily available in the amounts required for transcriptomic studies (9). Here we describe a genome mining approach, "PULomics", to look for the enzyme apparatus in Bacteroides species that degrade complex plant N-glycans.
We relied on information about activity and specificity for particular enzyme families, assessed them for activity, and also extended this analysis to neighbouring putative CAZyme genes. Here we describe the biochemical characterisation of five CAZymes and two crystal structures to provide a degradation pathway for plant complex N-glycans encoded by the human gut microbiome. The study provides a toolbox of activities that will allow modification of plant and insect N-glycan structures for biotechnological and medical applications.

Bioinformatics analysis shows two types of PNGase are present in Bacteroides species. There are currently two main classes of enzyme that remove N-glycans from glycoproteins: glycoside hydrolases (from either GH18 or GH85 families) and Peptide-N4-(N-acetyl-β-glucosaminyl)asparagine amidases (PNGases). The GH18 and GH85 enzymes hydrolyse the β1,4 glycosidic bond between the two core GlcNAc sugars, whereas PNGase family members cleave the linkage between the first GlcNAc and Asn of the protein/peptide. PNGaseF (or EMTypeI) from Elizabethkingia meningoseptica is widely used to remove mammalian-type N-glycans, which commonly have an α1,6-fucose attached to the first core GlcNAc (known as 'Type I' activity) (11). However, this enzyme is not able to accommodate N-glycan structures with α1,3-fucose attached to the core GlcNAc, which is a common decoration in plant and insect N-glycans (11). More recently, a 'Type II' PNGase, also from E. meningoseptica, was characterised that displayed additional activity towards N-glycans with a core α1,3-fucose (EMTypeII) (11). Structures of both PNGase enzymes exist and have provided structural insight into how these different α-fucose decorations are accommodated or blocked (11).
In PNGaseF/EMTypeI, the active site residue Glu118 blocks where an α1,3-fucose might otherwise have been accommodated, whereas in the EMTypeII structure, the equivalent residue is Gly350, thus creating a pocket for this sugar (Fig. 1B). Glu418 in EMTypeII likely carries out the equivalent coordinating role to Glu118 in EMTypeI (11).

Recent work in Bacteroides thetaiotaomicron has described the degradation of mammalian-type biantennary complex and high-mannose N-glycans (9, 12). Both of these systems use a GH18 family member to remove the N-glycan from glycoproteins. B. fragilis has also been shown to degrade mammalian complex N-glycan structures (13) and a number of other Bacteroides species have the capacity to grow on glycoproteins with complex N-glycans (9). We wanted to further investigate the N-glycan degradation capacity of prominent gut Bacteroides species by exploring the prevalence and function of putative PNGases encoded by these prominent symbionts.

Analysis of The Integrated Microbial Genomes and Microbiomes system (IMG) (14) revealed the presence of putative PNGases in 13 species of Bacteroides, with 7 of these species having two genes each. These PNGase sequences clustered broadly into two groups according to sequence identity (Fig. 1C; Table 1). Sequence alignment included comparison to those from E. meningoseptica and revealed an interesting trend in terms of the possible substrate preferences of the two groups (Fig. 1D). Group 1 sequences have aspartate residues in the equivalent position to Glu118 in EMTypeI, whereas Group 2 sequences all have glycines at this position (Fig. 1D). This indicated that Group [...] vulgatus are also available and confirm a similar positioning of the key catalytic residues relative to PNGaseF/EMTypeI (Fig. S1).

Another striking difference between the two groups is the presence of an N-terminal domain in most of the Group 2 protein sequences (all except B.
coprophilus) and not in the Group 1 sequences (Fig. S2). This N-terminal domain is present in EMTypeII, but not PNGaseF/EMTypeI, and has previously been dubbed the N-terminal bowl-like domain (NBL) (11). It has a unique structure and unknown function, but the expression of the catalytic domain alone without the NBL did not affect activity or specificity of EMTypeII (11).

A final noticeable difference between the two groups is a small insert between the two eight-stranded antiparallel β sheets in the sequences in Group 1 (Fig. S2) [...] cases these enzymes crystallised as dimers in the asymmetric unit, with the twist being the major interaction, and the active site was not blocked by this interaction (Fig. S1). This could be an indication of dimerization in vivo or an artefact of crystallography.

The activity of Group [...] 2). Commercially available PNGaseF/EMTypeI was also assayed for comparison. Released N-glycans were subsequently labelled with procainamide and analysed by liquid chromatography-fluorescence detection-electrospray-mass spectrometry (LC-FLD-ESI-MS). The B. fragilis PNGase from Group 1 (BF0811 PNGase) displayed very similar activity to PNGaseF/EMTypeI, removing complex and high-mannose N-glycans, but not plant N-glycans (Fig. 2). In contrast, the B. massiliensis PNGase from Group 2 (B035DRAFT_03341 PNGase) showed good activity towards plant-type N-glycans and limited activity towards the other substrates.

To further explore the substrate preference of B035DRAFT_03341 PNGase, we used soya and papaya protein extracts as substrates (Fig. 2 [...]

We also tested the activity of B035DRAFT_03341 PNGase against bee venom glycoprotein phospholipase A2, as this type of activity has previously been observed for EMTypeII (11). B035DRAFT_03341 PNGase was able to remove N-glycans from this insect glycoprotein.
A decrease in molecular weight of phospholipase A2 was observed with the addition of B035DRAFT_03341 PNGase and there was also a decrease in intensity when staining for glycoproteins. This indicates the removal of N-glycan from this substrate (Fig. S5).

Notably, the Bacteroides PNGases are predicted to have type I signal sequences, indicative of localisation to the periplasm (Table 2), which would suggest that deglycosylation occurs after the substrate has been imported across the outer membrane. This would be in contrast to GH18-directed cleavage of N-glycans, which occurs at the cell surface in B. thetaiotaomicron (9, 12). One potential reason for this may be that the preferred PNGase substrates are glycopeptides that are products of proteolytic digestion of glycoproteins, whereas GH18 enzymes deal with native glycoprotein. However, it is also possible that the signal sequence prediction is incorrect and the PNGases are localised to the outside, as has been seen previously (9).

The structure of the B. massiliensis PNGase. To investigate the structural basis for specificity in the Bacteroides PNGase enzymes from Group 2, we solved the structure for B035DRAFT_03341 PNGase to 1.95 Å (Table S3). The structure consists of two domains: the catalytic domain and an NBL domain, which are linked through a flexible α-helical linker (Fig. 3A). The catalytic domain consists of two eight-stranded anti-parallel β-sheets, which is a consistent structural feature of PNGase enzymes described so far. The active site residues that are key for activity in EMTypeII are conserved in B035DRAFT_03341 PNGase, with Gly388 occupying a critical position to allow the accommodation of α1,3-fucose (Fig. 3B). The available space for this fucose is particularly apparent in comparison to PNGaseF/EMTypeI (Fig. 3C).
The α1,6-fucose points away from the active site so is not an issue in terms of blocking activity. The NBL domain has a very similar structure to the one in EMTypeII and is unique to these two proteins. They consist of 11 β-sheets connected by short α-helical regions and disordered loops (Fig. 3D).

Plant N-glycan specific α-1,3-mannosidase. The presence of type II PNGase enzymes in prominent members of the human gut microbiota suggests that these microbes use plant N-glycans as a nutrient source. To identify the other enzymes required to fully degrade these glycans, we examined the putative CAZyme genes adjacent to the PNGase genes in a number of different species of gut-derived Bacteroides. The B035DRAFT_03341 PNGase gene is next to a putative GH92 (B035DRAFT_03340 GH92) in B. massiliensis, the characterised members of which are all α-mannosidases (Fig. 4A). Notably, B035DRAFT_03340 GH92 has a type II signal sequence suggesting it is membrane-associated, although whether the enzyme is localised to the cell surface or faces the periplasm is not known.

Incubation of this recombinant enzyme with α-mannobiose of varying linkages showed activity against the α-1,3 substrate only (Fig. S6). B035DRAFT_03340 GH92 was then assayed against HRP, which has simple plant-type N-glycan decorations (no antenna decorations; Fig. S6). B035DRAFT_03340 GH92 was able to remove the α-1,3-mannose from plant N-glycan heptasaccharide once it was removed from the protein and also whilst the glycan was still attached to the protein (Fig. 4B & Fig. S6).

To further study the specificity of B035DRAFT_03340 GH92, it was compared to a GH92 that has specificity towards the α-1,3-mannose linkages in high-mannose N-glycans (BT3991 GH92 from B.
thetaiotaomicron) (12). BT3991 GH92 was unable to remove mannose from HRP either with or without the removal of the N-glycan from the protein, whereas B035DRAFT_03340 GH92 could cleave the α-1,3-mannose from both free HRP glycan and N-glycan still attached to protein (Fig. S4). The lack of activity seen for BT3991 GH92 is likely due to the plant-specific β-1,2-xylose causing steric hindrance within the active site, whereas B035DRAFT_03340 GH92 is able to accommodate this decoration.

Close homologues of B035DRAFT_03340 GH92 are predicted to be in all the Bacteroides species that have a Group 2 PNGase encoded in the genome. These homologues had identity between 77 and 96 % (Table S4) [...] An exception is B. neonati, which also has a homologue of B035DRAFT_03340 GH92 with 73 % identity, but no putative Group 2 PNGase gene and, unexpectedly, this GH92 gene is adjacent to the putative Group 1 PNGase in B. neonati (Fig. S7).

The structure of the GH92 α-1,3-mannosidase able to target plant N-glycans. To investigate the structural basis for the unusual specificity displayed by B035DRAFT_03340 GH92, we determined the crystal structure of the enzyme to 1.43 Å (Table S3). The enzyme consists of two domains: an N-terminal β-sandwich domain composed of 16 antiparallel β-strands and an (α/α)6-barrel catalytic domain. These two domains are pinned together by two α-helices, previously dubbed Helix 1 and 2 (Fig. 5A) (15). The secondary structures of the seven GH92 enzymes with known structures also have these three features: the N-terminal domain, the catalytic domain, and the α-helical linker. Density for several metal ions was observed at the protein surface; these were modelled as Na from the crystallisation conditions, except for a Ca near the active site, as this metal has been shown to be key for GH92 activity.
The active site of B035DRAFT_03340 GH92 is comprised of residues originating from both the N- and C-terminal domains, which is a common feature of the other GH92 structures (Fig. 5B). The residues that are conserved within the catalytic site throughout all GH92 enzymes with structures derive from the C-terminal domain. Those residues that vary come from both domains and are the drivers of specificity by interacting with the +1 subsite sugar and beyond. To explore the active site of B035DRAFT_03340 GH92, we overlaid the nonhydrolysable substrate mimic thiomannobioside from the structure of BT3990 (PDB 2WW1) (Fig. 5D). From this we could speculate about four subsites, including the -1 α-1,3-mannose, +1 core mannose, +1' xylose and the +1'' α-1,6-mannose. The environment of the -1 mannose is identical to that described for other GH92 enzyme structures (15-17). The +1 mannose is likely coordinated by at least three residues, H530, S533, and E532, which come up from underneath the mannose. Overhanging this subsite (and possibly the +2 GlcNAc position also) are three hydrophobic residues, Y72, W178, and W209, which is suggestive of π-stacking of the sugars. However, these aromatics are more distant than equivalent residues seen in other GH92 structures (15). The +1 subsite interactions allow space either side of the +1 mannose for the +1'' mannose and +1' xylose. There are clear pockets for these sugars in the B035DRAFT_03340 GH92 structure (Fig. 5E).

The eight GH92 structures already available include five α-1,2-mannosidases, an α-1,3-mannosidase, an α-1,4-mannosidase, and a mannose-α-1,4-PO4-mannose mannosidase (15-19). Previous comparison of the α-1,2-mannosidase structures revealed three residues coordinating the mannose at the +1 subsite that drive specificity for α-1,2-linkages.
These are a Trp from the N-terminal domain and a Glu and His from the C-terminal domain, and these are also predicted through sequence alignments to be present in other GH92 α-1,2-mannosidases. SP2145 from Streptococcus pneumoniae (PDB 5SW1) was crystallised with a mannose in the +1 subsite and demonstrates these interactions (Fig. S8). In an attempt to highlight if there were any similar conserved motifs present for GH92 α-1,3-mannosidases, we compared the structures of B035DRAFT_03340 GH92 with BT3130 (PDB 6F8Z; Fig. S8). This comparison saw no conservation in the active site residues associated with the +1 subsite. However, the residues contributed from the N-terminal domain were tryptophans, like those seen in the α-1,2-mannosidases, but the location and orientation differed (Fig. S8). Notably, in B035DRAFT_03340 GH92 this tryptophan +1 subsite "lid" is much further away from where the glycan would sit than in other GH92 structures. This lid would possibly reach down further if substrate was present.

We carried out phylogenetic analysis of the GH92 enzymes that had been characterised to see if they would cluster according to their activities (Fig. S9). This was successful in that α-1,2-mannosidases and α-1,3-mannosidases clustered together. The sequences were predominantly derived from B. thetaiotaomicron, so this may not be a completely reliable method of predicting specificities.

Gene association analysis to identify additional plant N-glycan degrading enzymes. The characterisation of PNGase homologues from gut Bacteroides species revealed there are likely two different PNGase-like activities encoded by these microbes: Group 1 targeting mammalian N-glycans and Group 2 targeting plant N-glycan structures.
We were also able to identify an α-1,3-mannosidase with specificity towards plant-type N-glycans by characterising the product of a GH92 gene associated with the Group 2 PNGase gene in B. massiliensis. Genes in the same locus likely have functional associations and this is common in carbohydrate degradative systems in Bacteroidetes. We therefore expanded this concept to identify other putative plant N-glycan targeting CAZymes from Bacteroides species.

The Group 2 PNGases from B. dorei, B. barnesiae and B. coprophilus are all orphan genes (i.e. no obvious adjacent genes) and the PNGase from B. sartorii only neighbours a susC/D pair; however, the Group 2 PNGase genes from B. helcogenes and B. vulgatus all look to be a part of more extensive loci (Fig. S7). The neighbouring ORFs included putative SusC/D pairs, GH29, GH3, GH130, and additional GH92 enzymes (Fig. S7). Using this initial survey, a network of possible functionally related ORFs was built for all the Bacteroides species with putative Group 2 PNGase enzymes (Fig. S7). Using this approach, we were able to highlight CAZymes with potential specificity towards plant N-glycans. For B. massiliensis, these CAZymes were located in two further putative loci (Fig. 4A). One locus consists of a susC/D pair and a putative GH29 and the second has a GH3, a GH2, a sulfatase, and an AraC-type regulator. The activities of these B. massiliensis CAZymes were then explored.

B035DRAFT_00995 GH3 is a β-xylosidase acting on plant N-glycans. B035DRAFT_00995 GH3 was screened against a variety of pNP substrates and found to be active against pNP-β-xylose. Using this information, we then incubated this enzyme with the plant-type heptasaccharide released by B035DRAFT_03341 PNGase (Fig. 4B). The enzyme alone was unable to remove the bisecting β-1,2-xylose.
However, when the reaction was also carried out in the presence of B035DRAFT_03340 GH92 to remove the α-1,3-mannose, B035DRAFT_00995 GH3 was able to remove this xylose (Fig. 4B & Fig. S10). B035DRAFT_00995 GH3 has a Type I signal peptide, so is likely localised to the periplasm (Table 2). B035DRAFT_00995 GH3 activity was also assessed against β-1,4-xylobiose (Fig. S10) [...]

B035DRAFT_02132 GH29 is an α-1,3-fucosidase specific to the core decorations of plant N-glycans. The fucose decorating the core GlcNAc of a plant N-glycan is attached through an α-1,3-linkage, in contrast to the α-1,6-linkage of mammalian-derived N-glycans. Another enzyme identified through the functional association analysis was B035DRAFT_02132 GH29, which is predicted to be localised to the periplasm (Table S2). GH29 family members typically have exo α-1,3/4-fucosidase activities, so B035DRAFT_02132 GH29 was screened against a variety of fucose-containing glycans (Fig. S11). B035DRAFT_02132 GH29 was found to only hydrolyse the α-1,3-fucose from Lewis X trisaccharide to completion overnight, which is the glycan most similar to the core of a plant N-glycan out of the defined oligosaccharides that were tested. As a comparison, it was only partially active against the α-1,3-fucose from 3-fucosyllactose, which confirms a specificity for GlcNAc over Glc in the +1 subsite. Furthermore, B035DRAFT_02132 GH29 was not able to remove the α-1,4-fucose from Lewis A, which indicates that it does not target the antennary structures of plant N-glycans (Fig. S11).

When B035DRAFT_02132 GH29 was tested against plant N-glycan heptasaccharide released by B035DRAFT_03341 PNGase, no core fucose removal was observed (Fig. 4B).
However, partial removal of the core α-1,3-fucose was seen once the α-1,3-mannose had been removed by B035DRAFT_03340 GH92, and full removal of the α-1,3-fucose by the GH29 was made possible after removal of the β-1,2-xylose by B035DRAFT_00995 GH3. These observations provide insights into the likely plant N-glycan degradation pathway in B. massiliensis (Fig. 6). Homologues of these enzymes were present in five out of the seven Bacteroides species with TypeII PNGase enzymes (Table S6). There was no obvious homologue in B. vulgatus and B. dorei and there were also no obvious homologues in species without TypeII PNGases.

Fucosidases from different sources have previously been shown to act on the core α-1,3-linkage. Most notably, a GH29 from E. meningoseptica, cFase I, can act on the core α-1,3-fucose even when the plant N-glycan has antennary decoration (20). Another GH29 from Arabidopsis thaliana, AtFUC1, was also able to act on the α-1,3-linkage, but only when the glycan was reduced down to an α-1,3-fucose linked to chitobiose. A knockout of the AtFUC1 gene led to an accumulation of this trisaccharide in the plant, confirming the enzyme's specificity towards the core fucose of plant N-glycans (21).

Comparison between plant N-glycan degradation pathways in B. massiliensis and a bacterial phytopathogen. Xanthomonas campestris pv. campestris causes black rot disease in Brassica plant species and in a previous study a set of genes upregulated in the presence of GlcNAc was explored in terms of plant N-glycan degradation (22). These genes were predicted to be putative CAZymes from a range of families and their subsequent characterisation revealed some comparable observations to the work described here.
Firstly, a GH92 (NixK) was able to remove the α-1,3-mannose from a plant N-glycan heptasaccharide, akin to what was observed here for B035DRAFT_03340 GH92 (42 % identity between these two enzymes). Furthermore, without the removal of this mannose, the activity of other enzymes was blocked, as observed in the B. massiliensis system.

Plant N-glycan β-1,2-xylosidase activity was also observed with a GH3 family member (NixI) from X. campestris. NixI has a low identity to B035DRAFT_00995 GH3 of 33 %, but the specificity of acting after the removal of the α-1,3-mannose is the same. In terms of core α-1,3-fucosidase activity, a GH29 (NixE) could remove this sugar, but only when all mannose sugars had been removed, which is not the case for B035DRAFT_02132 GH29. It is worth noting that the substrate used in this study was a glycopeptide produced from trypsin degradation of avidin produced in corn and not a free N-glycan, which may influence the activities observed.

There was no endo-acting enzyme activity characterised for the X. campestris system, although a GH18 was present in the GlcNAc-activated locus. The GH18 is a likely candidate for removal of the N-glycan in X. campestris, unlike B. massiliensis, which employs a PNGase.

[...] analysis was a GH2, B035DRAFT_00996 GH2, which was screened against a variety of pNP substrates and found to be active against pNP-β-galactose. This enzyme was initially screened against defined oligosaccharides to determine its specificity (Fig. S12). B035DRAFT_00996 GH2 only had activity towards β1,3-linked galactose when GlcNAc was in the +1 position (Lacto-N-biose). It could also act on Lacto-N-tetraose, which has the same linkage and +1 sugar. It was unable to hydrolyse LacNAc or Galβ1,3Glc, which demonstrates the specificity towards the β1,3-linkage and a requirement for the N-acetyl group of the +1 GlcNAc, respectively.
Furthermore, partial activity was observed towards Galβ1,3GalNAcβ1,3Galβ1,4Glc, which emphasises the importance of the N-acetyl group in the +1 sugar, with some influence also coming from the C4 hydroxyl of the +1 sugar either being axial or equatorial (Gal or Glc, respectively). These results show that B035DRAFT_00996 GH2 has specificity towards the linkage and +1 GlcNAc sugar found on the antenna of complex plant N-glycans.

Activity was also tested against Lewis A trisaccharide, which is the epitope of the antenna structure present on plant N-glycans (Fig. 1A & Fig. S12). B035DRAFT_00996 GH2 was unable to remove the galactose in this case, which suggests that a fucosidase must act before this galactosidase in the breakdown of the full plant N-glycan substrate. Analysis of the galactosidase activity against a soya-derived complex plant N-glycan structure confirmed that the Gal decorations could only be removed after the antennary fucoses have first been cleaved (Fig. 4C). This enzyme is also predicted to be periplasmic (Table 2).

Notably, this galactosidase did not have close homologues in other Bacteroides species and was highlighted in B. massiliensis by its association with B035DRAFT_00995 GH3 xylosidase. Genes encoding putative β-1,3-galactosidases in other species with Group 2 PNGase enzymes were not obvious from the functional association analysis, suggesting the terminal galactose structures are likely targeted by a CAZyme unrelated to this GH2.

Antennary fucose removal from plant N-glycans. Complex plant N-glycans are also often decorated with antennary α-1,4-fucose (Fig. 1A). The GH families that act to remove fucose in an exo-fashion include GH29, GH95, and GH151. GH29 enzymes typically act on α-1,3/4-linkages, but there are some examples of α-1,2-specific enzymes. Only a single GH29 was identified in B.
massiliensis using functional association analysis (B035DRAFT_02132 GH29) and this was shown to be specific for the core α-1,3 fucose. Therefore, to test the possibility of this species being able to degrade the antennary fucose structures, we screened the activity of three further GH29 enzymes from B. massiliensis. All three displayed relatively broad activity against Lewis and fucosyllactose glycans (Fig. S11). In particular, all three were able to hydrolyse the α-1,4-fucose from Lewis A trisaccharide, which is the epitope found in plant complex N-glycans, whereas B035DRAFT_02132 GH29 was unable to do this.

The GH29 fucosidases were then assessed against soya bean derived N-glycans (Fig. 4C) [...]

Assessing the activity of B035DRAFT_00997 sulfatase. Sulfated N-glycans have been observed in a wide variety of organisms ranging from animals to viruses (Fig. S13) (23-26). These decorations can take the form of GalNAc-6S, GalNAc-4S, Gal-3S, Gal-6S, and Man-6S (36). To our knowledge, sulfation of plant N-glycans has not yet been observed; however, with N-glycan sulfation being so widespread throughout other organisms, it would be surprising if it was not also present in some plants.

A putative sulfatase gene adjacent to the genes for the xylosidase and galactosidase (B035DRAFT_00997) was assessed for activity against a variety of sulfated monosaccharides and oligosaccharides (Fig. S13). No activity could be observed against the tested substrates. It is possible that this enzyme has specificity for a substrate not tested here, but we were not able to test the full spectrum of possibilities.

Degradation of high-mannose N-glycan structures. The degradation of complex N-glycans in B. thetaiotaomicron has previously been described (12).
This work showed that three GH92 enzymes, BT3990, BT3991, and BT3994, hydrolyse the terminal α1,2-, α1,3-, and the first α1,6-mannose from high-mannose N-glycans, respectively, to leave a Manα-1,6Manβ-1,4GlcNAc trisaccharide. Homologues of these three enzymes were adjacent to the plant N-glycan degrading genes in B. helcogenes; therefore it appears that this species has the genes required to degrade high-mannose and plant complex N-glycans in the same place in the genome. Homologues of these enzymes were also traced throughout the other species assessed in this study and found to be well-preserved throughout. Phylogenetic analysis of all the GH92 enzymes from the functional analysis was carried out and these clustered into five groups (Fig. S14). Three of these are likely the GH92 enzymes acting on high-mannose N-glycans, one group are likely all α1,3-mannosidases that can accommodate β1,2-xylose (homologues of B035DRAFT_03340 GH92) and one remains uncharacterised.

This study characterises the pathway for the degradation of plant N-glycans by a prominent member of the gut microbiota. This set of enzymes was identified through functional association using putative PNGase enzymes as a starting point. This work demonstrates that it is possible in some cases to find enzymes with particular activities without using gene upregulation methods and only using what is already known about CAZyme families. This is a useful demonstration because for many substrates, such as the plant complex N-glycans described here, it is not possible to perform gene upregulation studies to identify the link between a substrate and set of genes.

Here we presented the characterisation of five enzymes against plant complex N-glycans, two of which include crystal structures. The specificity of these enzymes also indicated the order in which they act in vivo (Fig. 6).
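The blocking relationships reported in the Results can be written down as a small dependency table and checked programmatically. The Python sketch below is illustrative only and is not part of the authors' analysis: the step labels are shorthand invented here, and the table encodes just the dependencies stated in the text (galactose removal requires prior antennary fucose removal; xylose removal requires prior α-1,3-mannose removal; complete core fucose removal requires prior xylose removal). The standard-library `graphlib` sorter is used to derive one admissible order; it does not model enzyme kinetics or localisation.

```python
from graphlib import TopologicalSorter

# Hypothetical shorthand labels for the steps. Each step maps to the set of
# steps that the text reports must happen before it can go to completion.
PREREQUISITES = {
    "antennary_fucose_removal (GH29)": set(),
    "galactose_removal (GH2)": {"antennary_fucose_removal (GH29)"},
    "a1,3_mannose_removal (GH92)": set(),
    "xylose_removal (GH3)": {"a1,3_mannose_removal (GH92)"},
    "core_fucose_removal (GH29)": {"xylose_removal (GH3)"},
}


def admissible_order(prerequisites):
    """Return one ordering of the steps that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


def respects_prerequisites(order, prerequisites):
    """Check that each step appears after all of its prerequisites."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[before] < position[step]
        for step, befores in prerequisites.items()
        for before in befores
    )


order = admissible_order(PREREQUISITES)
print(order)
```

Several orders satisfy these constraints, so the printed list is only one possibility; `respects_prerequisites` can be used to check any proposed sequence against the table.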
B035DRAFT_03341 PNGase removes the plant N-glycan from the protein and then monosaccharides are removed sequentially from the non-reducing ends. Firstly, an α1,4-fucosidase removes the fucose to allow B035DRAFT_00996 GH2 to remove the terminal galactose. A number of different GH20 enzymes that can remove the GlcNAc at this stage have been identified previously, and homologues of these enzymes are present in many Bacteroides species (9). B035DRAFT_03340 GH92 is then able to remove the α1,3-mannose, followed by B035DRAFT_00995 GH3 removing the β1,2-xylose, and B035DRAFT_02132 GH29 removing the core α1,3-fucose. This leaves a Manα-1,6Manβ-1,4GlcNAcβ-1,4GlcNAc tetrasaccharide.

In addition to providing understanding of how plant N-glycans are hydrolysed in the human gut, the experiments using insect glycoproteins indicate that this type of N-glycan may also be used as a nutrient source by the human gut microbiota. Insects have been a human food source for centuries for some populations and are common in other primate diets, and there is increasing interest in them in Western culture, largely for environmental and sustainability reasons (27).

This report provides new methods to analyse plant complex N-glycans. It also provides more options for modifying proteins decorated with plant and insect N-glycans, such as biopharmaceuticals. One of the biggest potential uses of the CAZymes identified in this study would be in the production of pharmaceutical proteins in different plant species. Successful examples of this type of production include antibodies ("plantibodies"), collagen, vaccines, and enzymes, which can be produced in maize, rice, tobacco, flax, or strawberry (28). Monoclonal antibodies are potent treatments for a number of human diseases, including cancer and COVID-19 (29).
Variation in the composition of the N-glycans decorating the antibodies has been seen to affect the function of these biological therapeutics (30). Therefore, increasing the options for modifying N-glycans post-production will increase the success rate of different candidates produced in plants and insects. It will also provide opportunities to reduce the allergenicity of plant-produced proteins.

purified from cell-free extracts using immobilised metal affinity chromatography (IMAC using Talon resin; Clontech) as described previously (34). The purity and size of the proteins were checked using SDS-PAGE and their concentrations determined using absorbance at 280 nm (NanoDrop 2000c; Thermo Scientific) and their molar extinction coefficients (35).

Thin-layer chromatography. For defined oligosaccharides, 3 μl of an assay containing 1 mM substrate was spotted on to silica plates. For assays against mucin, this was increased to 9 μl. The plates were resolved in running buffer containing butanol/acetic acid/water (2:1:1) and stained using a diphenylamine-aniline-phosphoric acid stain (36).

Procainamide labelling. Procainamide labelling was performed by reductive amination using a procainamide labelling kit containing sodium cyanoborohydride as a reductant (Ludger). Excess reagents were removed with S cartridges (Ludger). Cartridges were conditioned successively with 1 mL of DI water, 5 mL of 30 % acetic acid (v/v), and 1 mL of acetonitrile. Procainamide-labelled samples were then spotted on the cartridge and allowed to adsorb for 15 min. The excess dye was washed off with acetonitrile. Labelled N-glycans were eluted with 1 mL of DI water.

Liquid chromatography-fluorescence detection-electrospray-mass spectrometry analysis of procainamide-labelled glycans. Procainamide-labelled glycans were analysed by LC-FLR-ESI-MS. Here

Analysis of mass spectrometry data.
Mass spectrometry data for procainamide-labelled glycans were analysed using Bruker Compass Data Analysis Software and GlycoWorkbench (37). Glycan compositions were elucidated on the basis of MS2 fragmentation and previously published data.

sulfoxide/acetic acid (7:3 v/v)) was added, and the mixture was incubated at 65 °C for 4 h.

Papaya N-glycan separation. Chromatographic separation of oligosaccharides was carried out using a Nexera UPLC system (Shimadzu Corporation, Kyoto, Japan), consisting of a DGU-20A5R degasser unit, an LC-30AD pump, a SIL-30AC autosampler, and an RF-20Axs fluorescence detector (set at 330 nm excitation and 420 nm emission), adapted from Guo et al. (39). Briefly, the analyses were performed using an Acquity BEH Glycan column (Waters 1.

proteins was conducted as follows. 1 g of soy protein isolate (purchased from a local supermarket) was washed with deionized (DI) water (3 × 10 mL), with centrifugation (10 min, 2500 × g) between each wash. The resulting pellet was homogenized with 20 mL of DI water to form a slurry. 100 μl of the slurry was dried down by vacuum centrifugation, resuspended in 25 μl of 50 mM NaH2PO4-Na2HPO4 buffer pH 7.5, and boiled for 5 min. Control samples were digested with PNGase F (1 μl, 5 mU, QA-Bio). For Pngase-003341 the final enzyme concentration was 1 μM. Samples were incubated for 12 h at 37 °C. 100 μl of DI water was added to dilute the sample before MALDI-MS analysis using a Bruker Auto-flex Speed (Bruker Daltonics, Bremen, Germany). The spectrometer was operated in positive ion mode. Spectra were acquired in the mass range 900-3500 m/z at a laser intensity of 50 %. The mass spectrometry (MS) data were further processed using flexAnalysis 3.5. Sample preparation was as follows: 0.5 μl of Super-DHB matrix (50 mg/mL in 50:50 [v/v] H2O:acetonitrile) was spotted on a ground steel target, and 0.5 μl of the sample was added on top and allowed to dry.
Aliquots (20 μl) of the diluted samples were dried down by vacuum centrifugation and labelled with procainamide prior to UHPLC-MS analysis.

Isolation of plant complex N-glycans from soya proteins. A total of 20 g of soya protein isolate was processed as follows: 2.5 g of soya protein isolate was placed in each of eight 50 mL centrifuge tubes. To each tube, 25 mL of 50 mM NaH2PO4-Na2HPO4 pH 6.0, 0.05 % NaN3 buffer was added (1:10 solid-liquid ratio), and the samples were denatured by boiling for 5 min at 100 °C. The tubes were allowed to cool down, and 60 µL of PNGaseL (2 mg/mL) was added and allowed to incubate for 2 d at 37 °C. After release, samples were centrifuged (30 min, 2500 × g) and the pellet was washed three times with DI water (20 mL); supernatants were combined and concentrated by rotary evaporation to 20 mL. Acetone was added to a concentration of 50 % (v/v) and the mixture was cooled at -20 °C for 1 h to precipitate proteins; the supernatant was separated by centrifugation (30 min, 2500 × g). The pellet was washed twice with 10 mL of 50 % acetone, and the washings were combined with the supernatant and concentrated to dryness using a rotary evaporator. The dry residue was resuspended in 5 mL of H2O

High-mannose N-glycans (HMNGs) have mannose sugars decorating both arms, usually giving a total of between 5 and 9 mannose sugars (dubbed Man5 and Man9, respectively, for example). HMNGs do not vary between different organisms, whereas complex N-glycans differ according to the source. In mammals, complex N-glycans have LacNAc disaccharides (Galβ1,4GlcNAc) attached to the mannose arms through a β1,2-linkage. The galactose sugars are typically decorated with sialic acids, but these can also decorate the antenna GlcNAc.
Complex N-glycans can have additional antennae through a β1,4-linkage on the α1,3-mannose arm and a β1,6-linkage on the α1,6-mannose arm, to produce tri- and tetra-antennary structures, respectively. α1,6-fucose is a common decoration on the first core GlcNAc in mammals, but α1,3/4-linked fucose is also found to decorate the antenna GlcNAc. In contrast, plant N-glycans typically have Lewis A epitopes as their antennae, a core α1,3-fucose, and a bisecting β-1,2-xylose. Insect N-glycan structures typically have both α1,3- and α1,6-fucose decorating the core GlcNAc. (B) The active sites of the two PNGases from Elizabethkingia meningoseptica. The key active site residues are shown as sticks and chitobiose is present in the PNGaseF/EMTypeI structure. (C) Phylogenetic tree of the PNGase enzymes from Bacteroides species, which broadly split into two groups. The members of Group 1 have quite variable identity between them, as low as 52 % in one instance, but generally between 67-99 %. The members of Group 2 have 75-97 % identity between them. (D) A sequence alignment showing residues that are key to the specificity of accommodating the α1,3-fucose typical of plant N-glycans. The residue blocking the α1,3-fucose in the Group 1 PNGases is highlighted in blue (E118 in PNGaseF/EMTypeI), the glycine replacing this residue in the Group 2 PNGases is highlighted in pink (G380 in EMTypeII), and the glutamic acid replacing the function of E118 is highlighted in green (E419 in EMTypeII).
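The core decorations and the PNGase group distinction summarised in this legend can be written down as a small lookup, shown here as a minimal Python sketch. All names (`CoreDecorations`, `TYPICAL_CORE`, `pngase_group_accommodates`) are hypothetical and introduced only for illustration. The rule encoded is a deliberate simplification of the legend: it assumes that a Group 1 (PNGase F-type) enzyme is blocked whenever a core α1,3-fucose is present and that a Group 2 enzyme is not. It does not predict measured activity on any real substrate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreDecorations:
    """Typical decorations of the N-glycan core, as summarised in the legend."""
    core_a13_fucose: bool  # α1,3-fucose on the core GlcNAc
    core_a16_fucose: bool  # α1,6-fucose on the core GlcNAc
    b12_xylose: bool       # β1,2-xylose on the first mannose


# Typical (not universal) core decorations for each source of complex N-glycan.
TYPICAL_CORE = {
    "mammal": CoreDecorations(core_a13_fucose=False, core_a16_fucose=True, b12_xylose=False),
    "plant": CoreDecorations(core_a13_fucose=True, core_a16_fucose=False, b12_xylose=True),
    "insect": CoreDecorations(core_a13_fucose=True, core_a16_fucose=True, b12_xylose=False),
}


def pngase_group_accommodates(group: int, glycan: CoreDecorations) -> bool:
    """Simplified rule: only the core α1,3-fucose is considered.

    Group 1 enzymes carry a residue that blocks the α1,3-fucose (E118 in
    PNGaseF/EMTypeI); Group 2 enzymes have a glycine at that position
    (G380 in EMTypeII), so the fucose is assumed to be accommodated.
    """
    if group == 1:
        return not glycan.core_a13_fucose
    if group == 2:
        return True
    raise ValueError("group must be 1 or 2")


if __name__ == "__main__":
    for source, glycan in TYPICAL_CORE.items():
        for group in (1, 2):
            print(source, group, pngase_group_accommodates(group, glycan))
```

Under these assumptions, only the typical mammalian core is compatible with a Group 1 enzyme, whereas the typical plant and insect cores require a Group 2 enzyme.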
Starch catabolism by a prominent human gut symbiont is directed by the recognition of amylose helices
How glycan metabolism shapes the human gut microbiota
Biological roles of glycans
N-glycoprotein biosynthesis in plants: recent developments and future trends
Analysis of Asn-linked glycans from vegetable foodstuffs: widespread occurrence of Lewis a, core alpha1,3-linked fucose and xylose substitutions
Role of complex asparagine-linked glycans in the allergenicity of plant glycoproteins
Rapid isolation, characterization, and glycan analysis of Cup a 1, the major allergen of Arizona cypress (Cupressus arizonica) pollen
Immunoreactivity in mammals of two typical plant glyco-epitopes, core alpha(1,3)-fucose and core xylose
Complex N-glycan breakdown by gut Bacteroides involves an extensive enzymatic apparatus encoded by multiple co-regulated genetic loci
Complex pectin metabolism by gut bacteria reveals novel catalytic functions
Identification and characterization of a novel prokaryotic peptide:N-glycosidase from Elizabethkingia meningoseptica
Human gut Bacteroidetes can utilize yeast mannan through a selfish mechanism
Efficient utilization of complex N-linked glycans is a selective advantage for Bacteroides fragilis in extraintestinal infections
The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities
Bacteroides thetaiotaomicron generates diverse α-mannosidase activities through subtle evolution of a distal substrate-binding motif
Mechanistic insights into a Ca2+-dependent family of alpha-mannosidases in a human gut symbiont
Molecular Characterization of N-glycan Degradation and Transport in Streptococcus pneumoniae and Its Contribution to Virulence
A bacterial glycosidase enables mannose-6-phosphate modification and improved cellular uptake of yeast-produced recombinant human lysosomal enzymes
Enterococcus faecalis alpha 1-2-mannosidase (EfMan-I): an efficient catalyst for glycoprotein N-glycan modification
Identification and characterization of a core fucosidase from the bacterium Elizabethkingia meningoseptica
Degradation pathway of plant complex-type N-glycans: identification and characterization of a key α1,3-fucosidase from glycoside hydrolase family 29
The N-Glycan cluster from Xanthomonas campestris pv. campestris: a toolbox for sequential plant N-glycan processing
N-glycans of bovine submaxillary mucin contain core-fucosylated and sulfated glycans but not sialylated glycans
Hemocytes and plasma of the eastern oyster (Crassostrea virginica) display a diverse repertoire of sulfated and blood group A-modified N-glycans
Structural analysis of N-glycans in chicken trachea and lung reveals potential receptors of chicken influenza viruses
Isomeric Separation and Recognition of Anionic and Zwitterionic N-glycans from Royal Jelly Glycoproteins
Insects as human food; from farm to fork
Molecular farming - The slope of enlightenment
Neutralizing Antibody Therapeutics for COVID-19
The "less-is-more" in therapeutic antibodies: Afucosylated anti-cancer antibodies with enhanced antibody-dependent cellular cytotoxicity
A discrete genetic locus confers xyloglucan metabolism in select human gut Bacteroidetes
A Dietary Fiber-Deprived Gut Microbiota Degrades the Colonic Mucus Barrier and Enhances Pathogen Susceptibility
Bacteria of the human gut microbiome catabolize red seaweed glycans with carbohydrate-active enzyme updates from extrinsic microbes
Key residues in subsite F play a critical role in the activity of Pseudomonas fluorescens subspecies cellulosa xylanase A against xylooligosaccharides but not against highly polymeric substrates such as xylan
Protein Identification and Analysis Tools on the ExPASy Server.
The Proteomics Protocols Handbook
Thin-layer chromatography for the analysis of glycosaminoglycan oligosaccharides
GlycoWorkbench: a tool for the computer-assisted annotation of mass spectra of glycans
Analysis of N-glycans from Raphanus sativus Cultivars Using PNGase H
Discovery of Highly Active Recombinant PNGase H(+) Variants Through the Rational Exploration of Unstudied Acidobacterial Genomes
DIALS: implementation and evaluation of a new integration package
Decision making in xia2
Scaling and assessment of data quality
Anonymous (1994) The CCP4 suite: programs for protein crystallography
REFMAC5 for the refinement of macromolecular crystal structures
Features and development of Coot
MolProbity: all-atom structure validation for macromolecular crystallography
Anonymous (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.)
SignalP 5.0 improves signal peptide predictions using deep neural networks
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
IMG: the Integrated Microbial Genomes database and comparative analysis system
The carbohydrate-active enzymes database (CAZy) in 2013
SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building
The Pfam protein families database in 2019
20 years of the SMART protein domain annotation resource
SMART: recent updates, new developments and status in 2015

were collected at the synchrotron beamlines I03 and I04 of Diamond Light Source (Didcot, UK) at a temperature of 100 K. The data set was integrated with DIALS (40) or XDS (41) via xia2 (42) and scaled with Aimless (43). The space group was confirmed with Pointless. The phase problem was solved by molecular replacement with Phaser (44) using PDB entries 2WVX and 4R4Z as search models for B035DRAFT_03340 GH92 and B035DRAFT_03341 PNGase, respectively.
While the initial solution R-factors were very poor (over 50 %) for B035DRAFT_03340 GH92, the electron density map was interpretable. The automated model-building task CCP4build on CCP4 Cloud (45) delivered a model with R-factors below 20 %. The models were refined with REFMAC5 (46) and manual model building with Coot (47