key: cord-0004586-ob5igsv5
authors: Kizziah, James L.; Manning, Keith A.; Dearborn, Altaira D.; Dokland, Terje
title: Structure of the host cell recognition and penetration machinery of a Staphylococcus aureus bacteriophage
date: 2020-02-18
journal: PLoS Pathog
DOI: 10.1371/journal.ppat.1008314
sha: 3e7042ac51bcdbd64db05d998292e8ed45ebfc07
doc_id: 4586
cord_uid: ob5igsv5

Staphylococcus aureus is a common cause of infections in humans. The emergence of virulent, antibiotic-resistant strains of S. aureus is a significant public health concern. Most virulence and resistance factors in S. aureus are encoded by mobile genetic elements, and transduction by bacteriophages represents the main mechanism for horizontal gene transfer. The baseplate is a specialized structure at the tip of bacteriophage tails that plays key roles in host recognition, cell wall penetration, and DNA ejection. We have used high-resolution cryo-electron microscopy to determine the structure of the S. aureus bacteriophage 80α baseplate at 3.75 Å resolution, allowing atomic models to be built for most of the major tail and baseplate proteins, including two tail fibers, the receptor binding protein, and part of the tape measure protein. Our structure provides a structural basis for understanding host recognition, cell wall penetration and DNA ejection in viruses infecting Gram-positive bacteria. Comparison to other phages demonstrates the modular design of baseplate proteins, and the adaptations to the host that take place during the evolution of staphylococci and other pathogens.

Introduction

Staphylococcus aureus is a Gram-positive bacterium and opportunistic pathogen that colonizes the nasal cavities of ~30% of the population, increasing the risk of pathogenic infection, especially in the clinical setting [1]. Almost 120,000 cases of S. aureus blood stream infections occurred in the United States in 2017, with 20,000 associated deaths [2]. S.
aureus resistant to methicillin (MRSA) and other antibiotics have become a major public health concern. Most resistance and virulence factors in S. aureus are encoded on mobile genetic elements (MGEs), including bacteriophages (phages) and genomic islands [3, 4] . Given the rarity of genetic conjugation loci and restriction against transformation, transduction via bacteriophages is considered the main mode of horizontal gene transfer of MGEs in S. aureus [5] . Some staphylococcal bacteriophages carry virulence factor genes in their genomes. Many phages are also capable of transferring unrelated genetic material through generalized transduction. Furthermore, certain "helper" phages are involved in highly specific, high frequency mobilization of S. aureus pathogenicity islands (SaPIs) and similar genetic elements [6, 7] . The underlying mechanisms of horizontal gene transfer and the ability of a particular phage to transfer genetic material to a new host are therefore central to genetic diversification and the emergence of novel pathogenic strains in S. aureus. Bacteriophages of many Firmicutes, including S. aureus, interact with wall teichoic acid (WTA), a variable carbohydrate polymer present on the surface of most Gram-positive cells [8] . Most strains of S. aureus have a unique type of WTA that is distinct from that of other (coagulase-negative) staphylococci (CoNS) and generally blocks infection by CoNS-specific phages [9] . A key determinant of host specificity is provided by the phage tail tip complex (TTC) or more elaborate baseplate, specialized structures at the tips of phage tails that are usually the first point of contact between the phage and its host [10, 11] . Enzymatic activities associated with TTCs and baseplates degrade the cell wall, ultimately leading to penetration of the plasma membrane and ejection of the encapsidated DNA into the cell. 
Tailed, double-stranded DNA phages (order Caudovirales) are divided into three families based on the structures of their tails: long and flexuous (Siphoviridae); long and contractile (Myoviridae); or short (Podoviridae) [12] . TTCs and baseplates from these diverse groups vary greatly in complexity, but tend to share a common set of structural modules, combined with a variable set of host interaction proteins, including tail fibers and receptor binding proteins. The best-described bacteriophage baseplate structures are those of the lactococcal siphoviruses TP901-1, Tuc2009, and p2, and the Escherichia coli myovirus T4, for which various structures have been produced [11, 13] . In contrast, little structural information is available for baseplates from staphylococcal bacteriophages. Phage 80α is a typical staphylococcal siphovirus with a 43.8 kbp genome and a 63 nm icosahedral capsid, attached to a 190 nm long flexuous tail that is capped by an ornate baseplate [14] [15] [16] . 80α is one of the best-described S. aureus phages, and is closely related to ϕETA2 and other phages involved in specifying bacterial pathogenicity [14] . 80α serves as the mobilizing phage for several SaPIs [14] . The tail and baseplate gene cluster in 80α includes open reading frames (ORFs) 53-68, which are part of the large late operon (ORFs 40-73) that encodes all structural proteins, the terminase proteins involved in DNA packaging, and the lysis proteins ( Fig 1A and 1B) . Here, we have determined the cryo-EM structure of the 80α baseplate at 3.75 Å resolution, allowing most of the baseplate proteins and part of the tail to be modeled in atomic detail. This is the first high-resolution structure of a baseplate from a staphylococcal siphovirus and of a siphovirus tail. 
The 80α baseplate structure has several unique features, including the presence of three tail fiber/receptor binding proteins, but also displays striking similarities to other phage baseplates, indicating a mix-and-match strategy for baseplate evolution across the phages of the Firmicutes. Our structure contributes to a broader understanding of the diversity of the cell recognition and penetration machinery of siphoviruses infecting Gram-positive bacteria and how these structures access the cell interior and trigger DNA ejection during infection.

80α tails were produced using an 80α lysogen (strain ST247) with a deletion of the first 13 amino acids of the scaffolding protein (gp46), which results in failure to form capsids and accumulation of tails in the lysate (Fig 1C) [16]. Initial cryo-EM datasets of up to ~7,000 baseplate particle images were collected with a Philips CM20 microscope on SO-163 film or using an FEI Titan Krios microscope equipped with a Direct Electron DE-20 detector, and reconstructed with sixfold (C6) symmetry to 9.3 Å resolution (FSC = 0.5) using EMAN and EMAN2 [17]. A larger dataset was subsequently collected with the Titan Krios and DE-20 detector, and processed with MotionCor2, Gctf and RELION-2 [18], yielding a final C6 reconstruction at 3.75 Å resolution (FSC = 0.143) from 74,294 particle images (Fig 1D-1F, S1 and S2 Figs). A reconstruction of the same data with application of only threefold (C3) symmetry was generated with RELION-3 [19] and reached 4.07 Å resolution (Fig 1G, S1 and S2 Figs). The baseplate is 298 Å wide and 271 Å tall and consists of six peripheral structures surrounding a core that represents a continuation of the tail (Fig 2A and 2B). The tail itself is 86 Å wide and consists of ~39 hexameric rings with an inter-ring spacing of 41 Å, enclosing a 42 Å wide lumen (Fig 2B). Three of the tail rings are embedded within the baseplate itself.
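The resolution values quoted above are read off Fourier shell correlation (FSC) curves at fixed thresholds: 0.5 for the initial map and 0.143 for the final maps. As a minimal sketch of that read-out, the Python snippet below finds the first threshold crossing by linear interpolation between neighboring shells. The function name `resolution_at_threshold` and the FSC values are made up for illustration; they are not data from this study, and reconstruction packages perform this step internally with their own conventions.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (in Å) at which the FSC curve first drops below
    `threshold`, or None if it never does.

    freqs: spatial frequencies in 1/Å, in increasing order
    fsc:   FSC value for each frequency shell
    """
    for i in range(1, len(freqs)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve (illustrative values only, not from the paper).
FREQS = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
FSC = [1.00, 0.98, 0.90, 0.60, 0.20, 0.05]

for cutoff in (0.5, 0.143):
    res = resolution_at_threshold(FREQS, FSC, cutoff)
    print(f"FSC = {cutoff}: {res:.2f} Å")
```

The same curve gives a numerically finer resolution at the 0.143 cutoff than at 0.5, which is why the threshold is always stated alongside the resolution.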
The peripheral structures consist of two large rings surrounding the embedded part of the tail, and six trefoiled, club-shaped features, about 85 Å wide and 124 Å long, connected to the baseplate core via six kinked stems. At lower cutoff level, six fibers can be seen to extend from the lower ring and wrap around the outside of the baseplate, terminating in globular densities near the baseplate tip (Fig 2C) . At least seven tail rings can be discerned, and a rod-like protrusion extends from the tip of the baseplate. Additional, noisy densities, disconnected from the rest of the baseplate, appear below this protruding rod. 2D class averages that represent end-on views of the baseplate (Fig 1E) suggested the presence of six thin fibers spread out radially in the plane of the ice, but these features were not observed in the 3D reconstruction. ORF 53, encoding gene product (gp) 53, was previously identified as the major tail protein (MTP) of 80α. Likewise, gp57 could be identified as the tape measure protein (TMP), typically Table 1 ) [14, 20] . Other ORFs could be assigned by comparison with the closely related phage ϕ11 [21] , for which HHpred analysis [22] had identified homologs in the PDB structure database. These ORFs include the distal tail protein (Dit, gp58), which comprises the baseplate "hub", the tail-associated lysin (Tal, gp59), comprising the tip of the tail, and the receptor binding protein (RBP, gp61) (Fig 1A and 1B ; Table 1 ). The N-terminal part of ϕ11 gp54, corresponding to 80α gp62, was shown to have some similarity with upper baseplate protein (BppU) of lactococcal phage TP901-1 [21, 23] . The crystal structure of ϕ11 RBP (gp45), which is 97% identical to the 80α RBP, was recently determined [24] . Crystal structures have also been determined for Bacillus phage SPP1 Dit [25] and the baseplates of lactococcal phages p2 and TP901-1, which include Dit, Tal, BppU, and receptor binding proteins that are not similar to the ϕ11 RBP [23, 26] . 
We carried out HHpred analysis on 80α proteins gp53, gp57, gp58, gp59, gp61, gp62, gp67 and gp68, which identified additional homologies compared to ϕ11, due to server improvements [22] and the presence of more structures in the database (Table 2). Most importantly, gp62 was identified as a tail fiber (FibL) with coiled-coil structure, while gp68 was predicted as a collagen-like fiber protein (FibU). gp67 is a predicted cell wall hydrolase (Hyd) that is 97% identical to ϕ11 gp49, which was previously shown to exhibit muralytic activity [27]. Gp67 was the only putative baseplate protein that could not be identified in the baseplate reconstruction, presumably because it is disordered and/or present at low occupancy, consistent with previous MS analysis [28]. Atomic models for MTP (gp53), Dit (gp58), RBP (gp61), and parts of Tal (gp59), FibL (gp62), FibU (gp68) and TMP (gp57) were built into either the C6 or C3 baseplate reconstructions, using I-TASSER models as starting points [29], complemented with real-space refinement in PHENIX [30]. The RBP, FibL and FibU models were further improved by refinement into focused reconstructions (see Methods). In total, 72 polypeptides comprising 7 different proteins (Table 1) were included in the final model (Fig 2D-2F, S3 Fig). The FSC curves between the final models and the corresponding reconstructions are shown in S1 Fig.

The major tail proteins of siphoviruses, tail tube proteins of myoviruses, and the Hcp1 tube-forming proteins of type 6 secretion systems (T6SS) share a common fold, known as the tail tube fold [31]. In addition, the tail tube fold has been identified in various baseplate proteins, tail terminator proteins, and the T4 capsid assembly protease, reflecting their evolutionary relationship [32].
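The tally of 72 polypeptides from 7 proteins can be cross-checked against the copy numbers given for each protein elsewhere in the text: six MTP subunits in each of two modeled tail rings, a Dit hexamer, a Tal trimer, three TMP C-terminal helices, six RBP trimers, six FibL trimers and twelve FibU copies. The short Python sketch below does that arithmetic. The per-protein breakdown is an inference from those statements, not a reproduction of Table 1, and the dictionary name `MODEL_COPIES` is purely illustrative.

```python
# Copies of each protein in the final model, inferred from the text
# (assumption: each modeled chain counts as one polypeptide).
MODEL_COPIES = {
    "MTP (gp53)": 2 * 6,   # two modeled tail rings, six subunits each
    "TMP (gp57)": 3,       # three C-terminal helices bound to Tal
    "Dit (gp58)": 6,       # one hexameric ring
    "Tal (gp59)": 3,       # one trimer
    "RBP (gp61)": 6 * 3,   # six trimers
    "FibL (gp62)": 6 * 3,  # six trimers
    "FibU (gp68)": 12,     # twelve copies in the upper ring
}

total_chains = sum(MODEL_COPIES.values())
print(f"{len(MODEL_COPIES)} proteins, {total_chains} polypeptides")
```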
HHpred analysis of 80α gp53 yielded >98% probability matches to major tail proteins of several siphoviruses, including SPP1 gp17.1 and λ gpV [33, 34], and lower probability matches to two T6SS tube-forming proteins (Table 2). Six copies of MTP were built into each of the two most well defined tail rings closest to the baseplate. Of the 193 residues of MTP, residues 4-169 were modeled in the lower ring (denoted MTP L) and 4-150 into the upper ring (MTP U) (Figs 2E and 3A). Each MTP subunit includes two four-stranded β-sheets folded into a β-sandwich and flanked by an α-helix (Fig 3A). In a complete tail ring, one of these β-sheets forms a 24-stranded, highly negatively charged β-barrel, 42 Å in diameter, lining the tail lumen, presumably providing a slippery tube for the DNA during infection (Figs 2B and 3B). This luminal β-sheet contains an extended insertion loop between β2 and β3 (the "stacking loop") that reaches across the ~3 Å space that separates tail rings and inserts into a pocket between two subunits in the adjacent ring (Fig 3A). The equivalent loop in the major tail proteins of phages λ, T5 and SPP1 was shown to play an essential role in tail polymerization [33-35]. An additional inter-ring contact is mediated by the C-terminus of MTP (the "C-arm"), which extends in the opposite direction to the stacking loop and fills a crevice on the exterior surface of the adjacent subunit (Fig 3A). No equivalent contact has been seen in other systems. The major tail proteins of phages λ, SPP1 and T5 instead have an additional C-terminal immunoglobulin (Ig)-like domain [33-35].

Table 2. HHpred analysis of 80α baseplate protein sequences. The most relevant hits are shown for each protein, with matched residues, PDB ID and chain identifier, HHpred probability (%), E-value and percent sequence identity in matched region.

Overall, the tail tube domain of 80α MTP differs from
those of λ gpV and SPP1 gp17.1 with Cα RMSD values of 4.2 Å and 5.6 Å, respectively, between the most well-matched residues (S1 Table).

The "distal tail protein" (Dit) constitutes the hub of the baseplate, and is highly conserved between TTCs and baseplates of siphoviruses [32]. HHpred matched 80α gp58 to the Dit proteins of several phages with ≥99.5% probability, including SPP1 gp19.1, TP901-1 ORF46 and p2 ORF15 (Table 2) [23, 25, 26]. Dit forms a connection between the sixfold symmetric tail and the threefold symmetric tail tip (Tal, see below), and provides an attachment site for the six RBP trimers (Fig 2D-2F). Six copies of Dit were built into the hexameric ring of density in the baseplate core just below MTP L (Figs 2E and 3A). Dit consists of two domains: an N-terminal domain (NTD, residues 1-166) with a tail tube fold, and a C-terminal domain (CTD, residues 167-315) with a galectin-like fold (Fig 3A). Both Dit domains are conserved between 80α and SPP1, TP901-1 and p2 [23, 25, 26] (S4D-S4F Fig and S1 Table). The interior surface of the Dit NTD forms a β-barrel that constitutes a continuation of the tail lumen (Figs 2B and 3B). The Dit NTD lacks a C-arm equivalent to that of MTP; instead, an extended insertion loop (the "tail binding loop") between β4 and β5 of Dit fits into the same crevice on MTP L that the C-arm of MTP L occupies on MTP U (Fig 3A and 3C). This topologically distinct Dit loop contains a triplet of residues (Y100, R101, F102) that matches a corresponding triplet in the MTP C-arm (Y159, R160, F161), a striking case of convergent evolution that has not been observed in other systems (Fig 3C). The Dit CTD has an insertion (the "RBP binding loop", residues 182-197), not present in SPP1 and TP901-1, that serves as the attachment site for the RBP trimer (Fig 3D and 3E).
Several aromatic residues in this loop insert into hydrophobic invaginations in the three-helix bundle that makes up the N-terminal end of RBP (Fig 3E). The Dit NTD includes a stacking loop equivalent to that of MTP (previously described as the "belt extension" in SPP1 [25]) that interacts with Tal (Fig 3A, 3F and 3G). The CTDs from all six Dit subunits form a "crown" that grips the outside of the Tal trimer (Fig 3F, see below). The interaction between Dit and Tal is primarily hydrophobic in nature (Fig 3G).

Fig 3 (legend, partial). The surface is colored from -10 (red) to +10 (blue) kcal/(mol·e) according to the color bar. (C) Superposition of MTP U onto MTP L (gray). MTP L (tan) shifted by the same amount then superimposes on Dit (red). The expanded view shows the superposition of the Dit tail binding loop (red) with the MTP C-arm (tan). The side chains of the triplet of residues conserved between the C-arm of MTP (Y159, R160, F161) and the tail binding loop in Dit (Y100, R101, F102) are shown in stick representation and labeled. (D) Detail of the interaction between Dit (red) and the RBP coiled-coil stem region (purple), showing the RBP binding loop in the Dit CTD. (E) Same view as D with RBP shown as a van der Waals surface colored from most hydrophilic (tan) to most hydrophobic (purple), according to the Kyte-Doolittle scale. The rotated view shows the N-terminal end of the RBP trimer and its interaction with the RBP binding loop, with side chains shown in stick representation. (F) Surface representation of Dit (red) and Tal (pink), viewed from the side. (G) Surface representations of Dit (top) and Tal (bottom) rotated 90° in opposite directions to show the interacting surfaces. One subunit each of Dit (red), Tal (pink) and TMP (green) is shown in solid color, the rest are colored by hydrophobicity as in E. https://doi.org/10.1371/journal.ppat.1008314.g003

HHpred analysis matched the NTD of 80α gp59, consisting approximately of residues 1-350, to several proteins related to the gp27 baseplate hub protein from E. coli phage T4 (Table 2) [32]. These proteins are trimers that contain two tail tube folds, resulting in a quasi-C6 structure. The CTD of gp59, comprising residues 390-633, matched various esterases and lipases of the GDSL-hydrolase family and the SGNH-hydrolase subfamily [21] (Table 2), suggesting a role in cell wall degradation. For consistency with what was previously proposed for ϕ11 [21], we will refer to this protein as Tal, for "tail-associated lysin," although its enzymatic activity has not yet been established.

The Tal NTD was initially fitted as a C3 trimer into the cap-like structure at the bottom of the baseplate, directly below the Dit hexamer in the C6 symmetric reconstruction (Fig 2A). Due to the quasi-sixfold nature of the Tal trimer, it matched quite well overall, but much of the density was blurred due to the superposition of non-identical Tal orientations. To obtain a reconstruction with only threefold symmetry imposed, we first subtracted all density not associated with Tal from the baseplate images. These signal-subtracted images were then subjected to symmetry expansion and 3D classification to group together consistently oriented Tal trimers within the larger C3 asymmetric unit. These orientations were used to generate a reconstruction of the whole baseplate with C3 symmetry, which reached a resolution of 4.07 Å (Fig 1G, S1 and S2 Figs). In this reconstruction, the density corresponding to the cap and the rod was well resolved (Fig 4A-4C) and allowed modeling of residues 2-388 of Tal (Fig 4D and 4E). The disorganized density beyond the rod was presumed to correspond to the Tal CTD (residues 390 to 607), but was not interpretable even in the C3 reconstruction (Fig 4C). The Tal NTD (residues 1-331) consists of two tail tube domains (TT1 and TT2) that form a quasi-sixfold adapter to the Dit hexamer (Figs 3F, 3G, 4D and 4E).
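The reason symmetry expansion is needed for the Tal trimer can be illustrated with a deliberately simplified, one-dimensional picture. A C6 refinement determines the rotation of each particle about the tail axis only up to multiples of 60°, whereas a threefold object is invariant only under 120° steps, so each particle has two C3-distinct candidate orientations that classification must choose between. The Python sketch below expands one in-plane angle by the C6 operators and then collapses the result under C3. The function names are hypothetical, real orientations are three-dimensional, and this is not the RELION procedure itself.

```python
def c6_expand(psi_deg):
    """Return the six rotations about the symmetry axis that a C6
    refinement treats as equivalent to psi_deg (degrees)."""
    return [(psi_deg + k * 60.0) % 360.0 for k in range(6)]


def c3_classes(angles_deg):
    """Collapse angles related by the C3 operator (120 degree steps) and
    return the distinct representatives in [0, 120)."""
    return sorted({round(a % 120.0, 6) for a in angles_deg})


expanded = c6_expand(10.0)
print("C6-expanded angles:", expanded)
print("C3-distinct orientations:", c3_classes(expanded))
```

In this toy picture the six expanded copies fall into exactly two C3 classes, offset by 60°, which corresponds to the two Tal settings that are superimposed, and therefore blurred, in the C6 map.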
The two domains merge into a continuous 8-stranded β sheet on one side (Fig 4E). The two domains are separated by an insertion (Ins1) with a fold related to the type 3 secretion system protein EscC (PDB ID: 3GR5). The second tail tube domain has an additional domain (Ins2) inserted in a position that is topologically equivalent to the β2-β3 stacking loop in MTP (Fig 4E). Residues 335-348 form an α-helix that extends into the tail lumen. In the Tal trimer, these helices form a twisted tripod that occludes the opening of the lumen (Fig 4D and 4F). Residues 362-389 from the three subunits make up an α-helical coiled coil that forms the rod and extends to the unresolved CTD (Fig 4D and 4F). The Tal fold is similar to proteins from systems as distant as E. coli phage Mu, Listeria monocytogenes EGD-e prophage A118 and an E. coli T6SS, with RMSDs ranging from 2.6 Å to 3.3 Å between equivalent Cα atoms (S4G-S4I Fig and S1 Table).

After accounting for the plug and rod helices, there were three additional, well-ordered α-helical densities forming a second tripod interlaced with that of the Tal plug (Fig 4F). This density matched the C-terminal residues 1135-1154 of TMP. The TMP triplet fit into a hydrophobic patch on Tal (Fig 4G and 4H). The localization of the TMP C-terminus at the baseplate agrees with previous research on other phages [36]. Additional density attributable to TMP could be seen inside the tail lumen, but was not interpretable in terms of the atomic structure (Fig 4B).

80α gp61 is 97% identical in amino acid sequence to gp45 of the closely related phage ϕ11, for which the crystal structure was previously determined [24]. Gp45 was previously identified as the primary receptor binding protein for ϕ11, with binding affinity for WTA, and was thus denoted RBP [21].
RBP is a trimeric protein consisting of four domains: an N-terminal α-helical coiled-coil "stem" domain (residues 1-142) that is folded by 155° at a central hinge; a five-bladed β-propeller "platform" domain (residues 143-442) that was previously suggested to bind to the GlcNAc residues of WTA on the host cell surface [21, 24]; and two highly similar C-terminal "tower" domains (residues 443-636; S1 Table), each consisting of a bent β-sheet superposed by an α-helix (Fig 5A). Six such trimers surround the baseplate core, making up the bulk of the observed peripheral structures (Fig 2D). As expected, the 80α RBP is very similar to ϕ11 gp45 [24] (Fig 5B, S4J Fig; S1 Table). The greatest difference between the 80α and ϕ11 RBPs is in the orientation of the stem domains relative to the platform and tower domains (Fig 5B). As noted above, the N-termini of the coiled-coil domains from each RBP trimer connect to the RBP binding loop of Dit (Fig 3D and 3E). The RBP hinge interacts with a ring formed by the FibL trimers (Figs 2D and 5D, see below). Deletion of ORF61 led to a complete loss of peripheral structures and of infectivity, although tails still assembled normally (S5 Fig).

The structure of RBP is remarkably similar to the "tail fiber" (gp17) of the distantly related staphylococcal podovirus P68 [37] (Fig 5C, S1 Table), presumably reflecting specificity for S. aureus surface structures. However, in P68 gp17, the stem domain is less bent at the hinge compared to 80α (66° vs. 155°) (Fig 5C). In addition, since P68 lacks a Dit equivalent, the gp17 stem domain is instead attached to the dodecameric portal protein. Consequently, P68 has twelve gp17 trimers, compared to the six RBP trimers in 80α.

Gp62 is highly modular, with HHpred matches to several distinct phage protein structures (Table 2). The first 205 residues matched the N-terminal Ig-like domain and α-helical coiled-coil domain of TP901-1 BppU [23].
The central part of gp62 (residues 146-279) matched several α-helical coiled-coil structures at lower probability (≤93.8%; Table 2). Residues 252-489 were matched to the tower domains from ϕ11 RBP, with lower probability matches of residues 267-366 and 405-491 to a similar domain from a Pseudomonas aeruginosa R2 pyocin fiber protein (Table 2). The C-terminal residues of FibL (387-607) matched the two C-terminal domains of the gp68 receptor binding protein from S. aureus phage K (PDB ID: 5M9F), which comprise an RBP-like tower domain and a distinct β-sandwich domain [38]. Because of the fibrous coiled-coil domain and its formation of the lower of the two exterior baseplate rings, the protein was designated FibL, for lower tail fiber.

Six trimers of the N-terminal Ig-like domain of FibL were built into the lower of the external rings surrounding the baseplate (Figs 2D-2F and 5D-5F). The two proximal subunits (p1 and p2) from each trimer contribute to the ring itself, while the third, distal subunit (d) is attached on the outside of the ring. Each subunit in the trimer contributes an α-helix to the subsequent coiled-coil domain that extends to residue 169 for subunit p2, which contributes the top α-helix, and to 193 and 199 for the p1 and d subunits, respectively (Fig 5E). This organization is similar to that of the BppU protein of TP901-1, except that in TP901-1, one of the α-helices in the coiled coil originates from a subunit in the adjacent trimer [23] (S4L Fig). Due to the differing disposition of the N-terminal domains, the residues of the α-helical domains are shifted relative to one another, inducing asymmetry within the coiled coil (Fig 5D). The FibL ring is not closely associated with either the Dit or MTP hexamers. Although residues 11 and 12 of one of the FibL subunits approach the β4-β5 loop of MTP U to within ~4 Å, most of MTP U is separated from the FibL ring by ~15-20 Å (Fig 5F).
However, the bottom of the Ig-like fold of FibL associates closely with the hinge domain of the RBP stem region (Fig 5D and 5E). This is consistent with the observation that deletion of the gene encoding FibL (ORF62) led to a complete loss of peripheral structures, including RBP (S5 Fig), suggesting that the interaction between FibL and RBP is co-stabilizing. A similar phenotype was observed when the corresponding gene (ORF54) was deleted in ϕ11 [39].

Beyond residue ~180, the density for the FibL coiled-coil and C-terminal domains was attenuated, likely due to flexibility in the coiled-coil domain (Fig 2A). At lower cutoff level, however, FibL density could be seen to wrap around the peripheral structures (Fig 2C). A new reconstruction was made from modified images created by subtracting the density corresponding to MTP, Dit, Tal, RBP and FibU (Fig 5G). This reconstruction reached 5.0 Å resolution (S1 and S2 Figs). In this map, FibL could be seen as an elongated structure with a globular "knee", extending beyond the tips of the RBP trimers, terminating in two additional globular densities. While this map was not fully interpretable in terms of the atomic structure of the proteins, models for the remaining domains of FibL based on the HHpred predictions were rigid body fitted into the density and connected to provide a complete pseudo-atomic model for FibL (Figs 2F and 5H). In this model, the three coiled-coil α-helices continue with no twist between residues 180-238. This is followed by an RBP-like tower domain that was fitted into the knee in the middle of the fiber, followed by another, shorter coiled-coil region. At the C-terminus of FibL, the predicted phage K-like tower and β-sandwich domains matched the disordered densities at the bottom of the baseplate (Fig 5G and 5H).

Like FibL, gp68 was predicted to include a BppU-like N-terminal Ig-like domain (99.6% probability; Table 2). However, gp68 lacks the α-helical coiled-coil portion of FibL.
Instead, residues 178-208 were predicted to match various collagen-like triple helix structures (�95.4% probability), suggesting that the protein forms a collagen-like fiber. The remaining 180 residues did not match any known structure. An I-TASSER model for the gp68 NTD matched the density corresponding to the upper external ring that makes up the peripheral structures of the baseplate. We designated this protein FibU, for upper fiber protein. Despite the similarity between the NTDs of FibU and FibL (21% sequence identity and RMSD = 1.9 Å; S4K Fig and S1 Table) , the two proteins could be unambiguously distinguished based on bulky side chains (S3F and S3G Fig) . Based on its predicted collagen-like domain and similarity to FibL, FibU was expected to form trimers. However, there was only density for twelve copies of FibU in the upper ring, equivalent to the proximal (p1 and p2) subunits of FibL (Fig 5E and 5I ). Like FibL, FibU does not make close contacts with either MTP or Dit, but both FibU subunits contact the distal subunit of FibL ( Fig 5D and 5E ). FibU was not essential for infectivity, and deletion of the ORF68 gene did not affect the structure of the rest of the baseplate (S5 Fig). No density was observed for the predicted fibrous portion of FibU beyond residue 123 ( Fig 2C) . However, 2D class averages representing top views of the baseplate showed six thin fibers extending 9 nm radially from the baseplate and terminating in a 4 nm knob (Fig 1E) . These features might correspond to FibU fibers interacting with the air-water interface when the baseplate is oriented in the plane of the ice. When the baseplate is oriented laterally (Fig 1D) , the FibU fibers might be disordered and thus not observed in the reconstruction. 
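Structural similarity is quoted throughout this section as a Cα RMSD, for example 1.9 Å between the FibU and FibL NTDs. As a minimal sketch of what that number measures, the Python function below computes the root-mean-square deviation between two equal-length lists of Cα coordinates. It assumes the two structures have already been superposed and their residues paired, which is the hard part in practice. The function name `ca_rmsd` and the toy coordinates are illustrative and do not come from the deposited models.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in Å between paired (x, y, z) Cα coordinates.

    Assumes the two coordinate sets are already superposed and listed in
    matching residue order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by exactly 1 Å along z.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(f"RMSD = {ca_rmsd(toy_a, toy_b):.2f} Å")
```

Because the value depends on which residues are paired, RMSDs reported "between the most well-matched residues" are not directly comparable to values computed over whole domains.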
Fig 5 (legend, partial). ... octadecamer (colored as in E) and the MTP hexamer underneath (tan). (G) Isosurface (cutoff 3.0 σ) of a reconstruction made from signal-subtracted images excluding density corresponding to MTP, Dit, Tal, RBP and FibU. FibL density is blue. Additional density in the center (gray) probably corresponds to TMP. (H) Model for the complete FibL protein trimer built into the density from G. The lower panel is colored as in E, with structural elements labeled (CC, coiled coil; ϕK is the phage K gp68-like domain). (I) Top view of the dodecameric FibU ring, colored as in E. https://doi.org/10.1371/journal.ppat.1008314.g005

We have determined the structure of the bacteriophage 80α baseplate at 3.75 Å resolution, allowing atomic models to be built for the major tail protein, part of the tape measure protein, and all of the major baseplate-associated proteins except gp67 (the predicted cell wall hydrolase), the C-terminal domain of Tal, and the C-terminal part of FibU (gp68). This is the first high-resolution structure of a baseplate from a staphylococcal siphovirus. While the 80α baseplate shares features with other bacteriophages, the structure is unlike those of the well-described baseplates of E. coli phage T4 and lactococcal phages p2 and TP901-1 [11, 13]. In particular, the presence of three separate tail fiber/receptor binding proteins and their organization into external rings surrounding the tail are unique features of the 80α baseplate.

The Dit hexamer constitutes the hub of the baseplate, forming an adaptor between the sixfold symmetric tail and the threefold symmetric Tal protein, as well as the attachment point for the six RBP trimers. Our identification of the C-terminus of TMP associated with Tal supports the assumption that tail assembly starts from a nucleus containing Dit, Tal, and TMP. The first MTP ring is added through interactions involving the MTP stacking loop and the Dit tail binding loop (Fig 3).
Additional MTP subunits are added through interactions of stacking loops and C-arms until the full length of the TMP is reached [10]. The peripheral structures are not required for this process, since deletions of the genes encoding RBP, FibL or FibU had no effect on tail formation (S5 Fig). The additions of RBP and FibL to the hub appear to be interdependent, whereas FibU is probably the last protein to be added to the baseplate.

RBP (gp61) is the main receptor binding protein of 80α, and is essential for infectivity. The virtually identical RBP of ϕ11 was shown to bind WTA [21], a polymer present in the cell wall of most Gram-positive organisms. WTA serves as receptor for many phages, but is highly variable between different species. Most strains of S. aureus have WTA that is distinct from that of coagulase-negative staphylococci (CoNS) such as Staphylococcus epidermidis, and is made from ribitol phosphate repeating units modified by GlcNAc [8, 9]. However, some pathogenic S. aureus strains belonging to the ST395 lineage express an altered form of WTA that is more similar to that of CoNS, resulting in altered phage susceptibility and immune system evasion [9]. Host specificity is reflected in the structure of the receptor-binding structures of the infecting phages. The distantly related S. aureus-infecting podovirus P68 has a "tail fiber" protein (gp17) that is remarkably similar to the 80α RBP [37] (25% sequence identity), suggesting that this structure is conserved among phages that infect S. aureus. In contrast, phage Andhra, which is structurally very similar to P68, but infects S. epidermidis [40], has a completely different predicted RBP structure based on a β-helical architecture. Similarly, S. aureus phage ϕ187, which infects the ST395 lineage of S.
aureus, lacks an RBP-like protein, but has a tail fiber protein (gp24) that, by HHpred analysis, resembles parts of FibL, including an N-terminal Ig-like domain, a coiled-coil domain, a tower domain, and a C-terminal domain similar to the phage K tail fiber protein, gp68 (S6 Fig).
A striking feature of the 80α baseplate is the presence of two tail fiber proteins, FibL and FibU, in addition to RBP. These proteins are probably involved in host interactions, although it is not known whether they bind to WTA or other components of the staphylococcal cell wall. Although FibL was essential for 80α infectivity, this could be because deletion of ORF61 also caused loss of RBP (S5 Fig). FibU was not essential, and deletion of ORF68 had no effect on either infectivity or baseplate integrity (S5 Fig). The two C-terminal domains of FibL are similar to those of gp68 of phage K (S6 Fig), a large-genome myovirus that infects S. aureus strains belonging to several lineages [38]. Thus, while FibL and FibU may not be strictly required for infection of RN4220 under standard laboratory conditions, they could facilitate adsorption under certain conditions through low-affinity, reversible binding to surface structures (Fig 6A). FibL and FibU could also be involved in binding divergent S. aureus strains that express altered WTA.
The presence of multiple copies of receptor binding complexes is a common theme among phages of Gram-positive bacteria. 80α has six trimers of RBP, while P68 has twelve [37], and lactococcal phage TP901-1 has 18 trimers of its BppL receptor binding protein [23]. Presumably, the presence of multiple copies of the receptor binding proteins increases the avidity of the low-affinity interaction with WTA and surface polysaccharides. In contrast, phages like λ and SPP1 that use protein receptors (a high-affinity interaction) generally do not have multiple receptor binding protein complexes.
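The avidity argument above can be made concrete with a deliberately simple toy model. The sketch below assumes that each receptor binding trimer engages its receptor independently with the same probability at any instant; both this independence assumption and the value of `P_SINGLE` are hypothetical and chosen only for illustration. The trimer counts are the ones quoted in the text.

```python
def attached_probability(p_single: float, n_complexes: int) -> float:
    """Probability that at least one of n identical, independent complexes is bound."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if n_complexes < 0:
        raise ValueError("n_complexes must be non-negative")
    # The particle is detached only if every complex is unbound at the same time.
    return 1.0 - (1.0 - p_single) ** n_complexes


# Trimer counts quoted in the text (six, twelve and 18 trimers).
TRIMER_COUNTS = {"80alpha": 6, "P68": 12, "TP901-1": 18}

# Hypothetical per-trimer occupancy for a weak interaction (an assumption, not a measurement).
P_SINGLE = 0.1

avidity = {
    phage: attached_probability(P_SINGLE, count)
    for phage, count in TRIMER_COUNTS.items()
}

for phage, value in avidity.items():
    print(f"{phage}: P(at least one trimer bound) = {value:.3f}")
```

Under these assumptions, a weak individual interaction still keeps the particle attached most of the time once enough copies are present. Real binding is cooperative and geometry-dependent, so the numbers carry no quantitative weight.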
The conservation of the FibU and FibL NTDs with proteins from many phages suggests that this fold is optimized for making rings for the attachment of various structures to phage tails [41].
Receptor binding is only the first step in the infection process that is mediated by the baseplate (Fig 6A). Once bound, enzymatic activities associated with the baseplate are used to degrade the cell wall peptidoglycan and penetrate the cell wall. This process most likely requires conformational changes in the baseplate in order to expose the hydrolase domains of Tal and Hyd (Fig 6B). We previously observed that treatment of 80α with heat or low pH led to clustering of baseplates, presumably due to exposure of hydrophobic sequences resulting from a conformational change [16]. The observed conformational differences in the stem between the 80α and ϕ11 RBPs and the phage P68 tail fiber (Fig 5B and 5C) suggest that the hinge might constitute a pivot point that allows the RBPs to rotate. This movement might assist in orienting the baseplate perpendicular to the cell wall. Similarly, in p2, reorientation of its ORF18 receptor binding protein occurred as a consequence of a Ca2+-triggered conformational change in Dit [26]. Straightening of the RBP stem might also lead to release of FibL, which is bound to the RBP hinge (Fig 5D).
Once the peptidoglycan layer is degraded and Tal reaches the plasma membrane, the α-helices in the plug and rod must be removed to permit extrusion of the pore-forming TMP (Fig 6C) [36]. This conformational change is fundamentally different from the opening of the p2 baseplate [26]. Membrane penetration has to be signaled to the capsid in order to initiate DNA release and must be tightly regulated to avoid premature DNA ejection. Other studies have revealed no apparent conformational change in the tail tube associated with the infection process in siphoviruses [32].
Even in the myoviruses, tail contraction itself is not the signal that leads to ejection [11, 35]. Our observation of TMP α-helices associated with Tal suggests a possible solution to this conundrum: upon release by Tal, the TMP α-helices might act as a "pull cord" that drags the DNA along during injection (Fig 6C).
Most temperate bacteriophages with long tails (myoviruses and siphoviruses) share common features in structural and genomic organization, reflecting their common evolutionary origin [42]. According to the "modular theory" of phage evolution, phage genomes evolve by exchanging modular units consisting of protein domains, as opposed to whole genes or transcriptional units. In accordance with this principle, the tail and baseplate proteins in 80α are based on a set of conserved modules that is found in a wide variety of phages. In 80α, MTP, Dit and Tal all incorporate the tail tube fold, one of the most highly conserved bacteriophage structural modules [31]. In contrast, the peripheral structures are far less conserved between 80α and other phages, presumably reflecting different adsorption strategies between bacteriophages of different hosts. Nevertheless, the proteins that make up the peripheral structures share common domains with proteins from several bacteriophages, reflecting the modular evolution of these proteins. Modules such as Ig-like N-terminal domains, RBP tower domains and phage K tail fiber CTDs are repeated across phages from a broad range of hosts (S6 Fig).
Our study provides a structural basis for understanding host specificity and the infection process in staphylococcal bacteriophages. Since bacteriophages are the primary mediators of horizontal gene transfer in staphylococci, this process is critical to the dissemination of virulence factors and the evolution of pathogenicity in S. aureus and other pathogens.
The ability to infect a variety of strains and species will also be important for the development of effective therapeutic strategies utilizing phages.
80α tails were produced by mitomycin C induction of S. aureus strain ST247, which is an 80α lysogen with a deletion of residues 1-13 from the scaffolding protein, gp46 [16]. The tails were purified from the lysate by PEG precipitation and CsCl gradient purification, as previously described. Cryo-EM samples were prepared on nickel Quantifoil R2/1 grids, as previously described [15].
An initial cryo-EM data set consisting of 51 images was collected on SO-163 film on a Philips CM20 FEG microscope. The films were scanned with a Nikon 9000ED film scanner at 4000 dpi, corresponding to 1.27 Å/pixel, and used to generate the initial model for reconstruction. High-resolution data were collected on an FEI Titan Krios microscope equipped with a Direct Electron DE-20 detector at the SECM4 consortium at Florida State University. A smaller dataset of 715 micrographs and a larger dataset of 6,474 micrographs were collected at 29,000x magnification, corresponding to 1.21 Å/pixel at the specimen, and with a typical defocus of 1.0-3.0 μm.
An initial C6 starting model was generated from 4,180 particles picked from the film data, then used to reconstruct and refine 6,906 particles picked from the smaller FSU dataset, using EMAN 1.9 and EMAN 2.2 [17]. The resulting 9.3 Å resolution (FSC 0.143) reconstruction was in turn low-pass filtered and used as an initial model in the reconstruction of the larger FSU dataset in RELION-2.1 [18]. 515,106 particles were picked from the 6,474 micrographs following frame alignment with dose weighting using MotionCor2 and CTF correction using Gctf, all from within RELION. After extensive 2D and 3D classification, the final reconstruction was calculated from 74,294 particles using the gold standard approach and assuming C6 symmetry throughout the reconstruction process.
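The sampling and particle-count figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers stated in the text. The film magnification it prints is derived from the scan step and pixel size and is not a value given by the authors, and the function names are illustrative.

```python
ANGSTROM_PER_INCH = 2.54e8  # 1 inch = 2.54 cm = 2.54e8 Å


def scan_step_angstrom(dpi: float) -> float:
    """Size of one scanned pixel on the film itself, in Å."""
    return ANGSTROM_PER_INCH / dpi


def implied_magnification(dpi: float, pixel_size_angstrom: float) -> float:
    """Magnification implied by a scan resolution and a specimen-level pixel size."""
    return scan_step_angstrom(dpi) / pixel_size_angstrom


def nyquist_limit(pixel_size_angstrom: float) -> float:
    """Finest resolvable spacing for a given sampling (two pixels per period)."""
    return 2.0 * pixel_size_angstrom


def retained_fraction(kept: int, picked: int) -> float:
    """Fraction of picked particles that survive classification."""
    return kept / picked


# Film data: 4000 dpi scan, 1.27 Å/pixel (values from the text).
film_magnification = implied_magnification(4000, 1.27)

# Detector data: 1.21 Å/pixel (value from the text).
detector_nyquist = nyquist_limit(1.21)

# 74,294 particles kept out of 515,106 picked (values from the text).
kept_fraction = retained_fraction(74_294, 515_106)

print(f"Implied film magnification: {film_magnification:,.0f}x")
print(f"Nyquist limit at 1.21 Å/pixel: {detector_nyquist:.2f} Å")
print(f"Particles retained after classification: {kept_fraction:.1%}")
```

The Nyquist limit of the detector data lies well below the reported resolution, so the sampling does not limit the reconstruction.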
Following mask generation and map sharpening in RELION, the overall estimated resolution reached 3.75 Å (FSC 0.143) (Fig 1G and S1 Fig), ranging from 3.1 Å in the baseplate core to >6.0 Å in peripheral parts of the fibers (S2 Fig).
Generation of the C3 reconstruction used for modeling Tal and Dit required separation of the non-identical orientations of the C3 Tal trimer that were superimposed as a result of the initial assumption of C6 symmetry. The C6 reconstruction was segmented in UCSF Chimera [43], and a mask generated from all non-Tal density was used to subtract the non-Tal density from the particle images using RELION-3.0 [19]. The resulting signal-subtracted images were C6 symmetry-expanded using relion_particle_symmetry_expand to create six copies of each particle, so that the Tal trimer, regardless of initial orientation, was superimposed with the 60°-rotated non-identical orientation. Masked 3D classification in RELION-3.0 of the expanded dataset, using the Tal density from the C6 reconstruction as a reference, yielded two good classes related by a 60° rotation and excluded particles with suboptimal Tal density. The more populated class was cleaned of duplicate particles, the orientations were transferred to the original, unsubtracted dataset, and the particles were reconstructed assuming C3 symmetry using RELION-3.0 with the C6 baseplate reconstruction as an initial model. Following post-processing, the resolution of the C3 reconstruction reached 4.07 Å (FSC 0.143, Fig 1G).
Focused reconstructions for RBP and the N-terminal parts of FibL and FibU were done in the same way, but without symmetry expansion and with the application of C6 symmetry to optimize the resolution of each for atomic model refinement. Reconstruction of signal-subtracted images lacking all non-FibL density (Fig 5G) in RELION-3.0 allowed assembly of a full-length FibL pseudo-atomic model by rigid-body fitting of matched structures from HHpred into the density.
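The bookkeeping behind the symmetry expansion and duplicate removal described above can be illustrated with a toy example. This is not RELION code. It assumes that each particle is reduced to an identifier plus a hidden Tal register (0° or 60° relative to the C6 alignment), that a C6 expansion simply adds multiples of 60° to the in-plane angle, and that classification sorts copies perfectly by register. All names are hypothetical.

```python
from collections import defaultdict

C3_PERIOD = 120  # a threefold-symmetric object repeats every 120 degrees


def symmetry_expand(particles, n_fold=6):
    """Create one copy of each particle per Cn operator (k * 360/n degrees)."""
    step = 360 // n_fold
    expanded = []
    for particle_id, tal_register in particles:
        for k in range(n_fold):
            shift = k * step
            expanded.append({
                "id": particle_id,
                "shift": shift,
                # Register of the Tal trimer after applying this operator.
                "tal": (tal_register + shift) % C3_PERIOD,
            })
    return expanded


def classify_by_register(expanded):
    """Stand-in for masked 3D classification: group copies by Tal register."""
    classes = defaultdict(list)
    for copy in expanded:
        classes[copy["tal"]].append(copy)
    return classes


def remove_duplicates(copies):
    """Keep a single copy of each original particle within one class."""
    unique = {}
    for copy in copies:
        unique.setdefault(copy["id"], copy)
    return list(unique.values())


# Toy dataset: (particle id, hidden Tal register in degrees).
particles = [(0, 0), (1, 60), (2, 0), (3, 60), (4, 60)]

expanded = symmetry_expand(particles)
classes = classify_by_register(expanded)
selected = remove_duplicates(classes[0])

print(f"{len(particles)} particles -> {len(expanded)} expanded copies")
print(f"classes found at registers: {sorted(classes)}")
print(f"{len(classes[0])} copies in class 0 -> {len(selected)} unique particles")
```

In this idealized picture every particle contributes three equivalent copies to each of the two classes, which is why a single class must be cleaned of duplicates before the C3 reconstruction. With real data the classes also absorb noise and poorly aligned particles, so they are not equally populated.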
These models were manually mutated to the FibL sequence and joined to each other and to the refined FibL NTD structure (Fig 5H). The FSC curves and ResMap analysis for the various reconstructions are shown in S1 and S2 Fig, respectively.
The initial model for RBP was adapted from the ϕ11 RBP (PDB ID: 5EFV) [24]. Initial models for the remaining proteins were generated using I-TASSER [29]. Those best matching the baseplate reconstruction were fitted into the density using UCSF Chimera [43]. Changes to the protein sequence, extension of the models, and manual adjustments to the protein backbone, including local real-space refinement with secondary structure, torsion angle, rotameric, and Ramachandran restraints, were made in Coot [44]. The positions of large aromatic and basic side chains, as well as predicted secondary structures, were compared to the baseplate protein sequences and initial models to confirm the identity of proteins in the baseplate density and to correctly place the models during manual adjustment. The models were iteratively refined using real-space refinement in PHENIX [30], followed by manual correction and local real-space refinement in Coot guided by geometry reports from MolProbity [45] and EMRinger [46]. All models except Tal and TMP were initially refined into the C6 reconstruction. The final Dit, Tal and TMP models were refined in the C3 reconstructions, while RBP, FibU and FibL were further refined in their respective focused reconstructions. All reconstructions were autosharpened in PHENIX for the final stages of model building. Simulated annealing, rigid body fitting, and morphing were included in early cycles of real-space refinement in PHENIX. Secondary structure restraints and non-crystallographic symmetry restraints generated from the models by PHENIX were included throughout. The model statistics for each protein are listed in S2 Table and the quality of the fit (FSC) is shown in S1 Fig.
Model and map visualization was done in UCSF Chimera [43].
For each refinement, the gold-standard half-map FSC plots for the appropriate maps, calculated in RELION (green curve) and in PHENIX (red curve), are shown. The blue curve represents the corresponding model-to-map FSC plots, calculated in PHENIX. (TIF)
The predicted structure of the ϕ187 tail fiber is also shown. Domains are labeled: CC, coiled coil; P, β-propeller platform domain; T, tower domain; K, phage K gp68 CTD; Ig, immunoglobulin-like domain; βH, beta-helix; col, collagen-like helix. Protein lengths and relevant residue numbers are indicated. (TIF)
S1 Table. Comparison of proteins. The 80α reference protein and residues selected are listed in columns 1 and 2, the matched proteins and residues in columns 3 and 4. (Unless otherwise indicated, 80α proteins are implied.) The first RMSD value is between Cα atoms in all residues considered equivalent by sequence alignment. The number of residues used is listed. The second RMSD value is after pruning to remove atom pairs that are more distant than the indicated cutoff level. The number of residues compared is listed. The percentage is the fraction of equivalent residues used in the pruned set compared to the unpruned set. All data were generated using the MatchMaker function in UCSF Chimera. (PDF)
S2 Table. Reconstruction and modeling statistics. 
(PDF)
[1] Pathogenesis of methicillin-resistant Staphylococcus aureus infection
[2] Vital Signs: Epidemiology and Recent Trends in Methicillin-Resistant and in Methicillin-Susceptible Staphylococcus aureus Bloodstream Infections-United States
[3] Staphylococcus aureus genomics and the impact of horizontal gene transfer
[4] Mobile genetic elements of Staphylococcus aureus
[5] Phages of Staphylococcus aureus and their impact on host evolution
[6] Pirates of the Caudovirales
[7] The Floating (Pathogenicity) Island: A Genomic Dessert
[8] The wall teichoic acid and lipoteichoic acid polymers of Staphylococcus aureus
[9] Wall teichoic acid structure governs horizontal gene transfer between major bacterial pathogens
[10] Long noncontractile tail machines of bacteriophages
[11] Contractile tail machines of bacteriophages
[12] Bacteriophage Taxonomy: An Evolving Discipline
[13] Structures and host-adhesion mechanisms of lactococcal siphophages
[14] The complete genomes of Staphylococcus aureus bacteriophages 80 and 80 alpha: implications for the specificity of SaPI mobilization
[15] Competing scaffolding proteins determine capsid size during mobilization of Staphylococcus aureus pathogenicity islands
[16] Cleavage and Structural Transitions during Maturation of Staphylococcus aureus Bacteriophage 80α and SaPI1 Capsids
[17] Single-Particle Refinement and Variability Analysis in EMAN2
[18] Processing of Structurally Heterogeneous Cryo-EM Data in RELION
[19] New tools for automated high-resolution cryo-EM structure determination in RELION-3
[20] Transducing particles of Staphylococcus aureus pathogenicity island SaPI1 are comprised of helper phage-encoded proteins
[21] An essential role for the baseplate protein Gp45 in phage adsorption to Staphylococcus aureus
[22] A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core
[23] Structure of the phage TP901-1 1.8 MDa baseplate suggests an alternative host adhesion mechanism
[24] Structure of the host-recognition device of Staphylococcus aureus phage ϕ11
[25] Crystal structure of bacteriophage SPP1 distal tail protein (gp19.1): a baseplate hub paradigm in Gram-positive infecting phages
[26] Structure of lactococcal phage p2 baseplate and its mechanism of activation
[27] The peptidoglycan hydrolase of Staphylococcus aureus bacteriophage 11 plays a structural role in the viral particle
[28] Capsid size determination by Staphylococcus aureus pathogenicity island SaPI1 involves specific incorporation of SaPI1 proteins into procapsids
[29] I-TASSER server for protein 3D structure prediction
[30] Real-space refinement in PHENIX for cryo-EM and crystallography
[31] Phages have adapted the same protein fold to fulfill multiple functions in virion assembly
[32] A common evolutionary origin for tailed-bacteriophage functional modules and bacterial machineries
[33] Bacteriophage SPP1 tail tube protein self-assembles into β-structure-rich tubes
[34] The phage lambda major tail protein structure reveals a common evolution for long-tailed phages and the type VI bacterial secretion system
[35] Bacteriophage T5 tail tube structure suggests a trigger mechanism for Siphoviridae DNA ejection
[36] Functional and structural dissection of the tape measure protein of lactococcal phage TP901-1
[37] Structure and genome ejection mechanism of Staphylococcus aureus phage P68
[38] Genome of staphylococcal phage K: a new lineage of Myoviridae infecting gram-positive bacteria with a low G+C content
[39] SaPI DNA is packaged in particles composed of phage proteins
[40] A Novel Staphylococcus podophage encodes a unique lysin with unusual modular design
[41] Ubiquitous Carbohydrate Binding Modules Decorate 936 Lactococcal Siphophage Virions
[42] Bacteriophages and their genomes
[43] Visualizing density maps with UCSF Chimera
[44] Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions
[45] MolProbity: all-atom structure validation for macromolecular crystallography
[46] EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy

We are grateful to John Spear and Duncan Sousa at Florida State University for assistance with the 
high-resolution data collection at the Southeastern Consortium for Microscopy of Macromolecular Machines (SECM4). We are also grateful to Dr. Alasdair C. Steven for access to the EM facility at The National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), where the initial data collection was carried out, to Cynthia M. Rodenburg for technical assistance at the UAB cryo-EM facility, to Dr. José Penadés for providing the 80α baseplate deletion mutants, and to Dr. Pavel Plevka for sharing the atomic coordinates for phage P68. Conceptualization: Terje Dokland.