key: cord-0889146-sij6r0rd authors: Zhang, Xiaodong; Rosenthal, Peter B.; Formanowski, Frank; Fitz, Wolfgang; Wong, Chi-Huey; Meier-Ewert, H.; Skehel, John J.; Wiley, Don C. title: X-ray crystallographic determination of the structure of the influenza C virus haemagglutinin-esterase-fusion glycoprotein date: 2007-09-27 journal: Acta Crystallogr D Biol Crystallogr DOI: 10.1107/s0907444999000232 sha: 9120ffe443d40980ed06e63ecc9afe7890e7fe8a doc_id: 889146 cord_uid: sij6r0rd

The structure of the haemagglutinin-esterase-fusion (HEF) glycoprotein from influenza C virus has been determined to 3.2 Å resolution by X-ray crystallography. A synthetic mercury-containing esterase inhibitor and receptor analogue, 9-acetamidosialic acid α-thiomethylmercuryglycoside, was designed as the single isomorphous heavy-atom derivative. The asymmetric unit of one crystal form (form I; P4(3)22, a = b = 155.4, c = 414.4 Å) contained an HEF trimer. Six mercury sites identifying the three haemagglutination and three esterase sites were located by difference Patterson map analysis of a 6.5 Å resolution derivative data set. These positions defined the molecular threefold-symmetry axis of the HEF trimer. A molecular envelope was defined by averaging a 7.0 Å resolution electron-density map, phased by single isomorphous replacement (SIR), about the non-crystallographic threefold-symmetry axis. Iterative non-crystallographic symmetry averaging in real space, solvent flattening and histogram matching were used to extend the phases to 3.5 Å resolution. Molecular replacement of the model into a second crystal form (form II; P4(3)2(1)2, a = b = 217.4, c = 421.4 Å) containing two HEF trimers per asymmetric unit permitted iterative ninefold averaging of the electron density. The 3.5 Å electron-density map allowed an unambiguous tracing of the polypeptide chain and identification of N-linked carbohydrates. The model has been refined by least squares to 3.2 Å resolution (R(free) = 26.7%).
Influenza C virus is a lipid-enveloped orthomyxovirus which causes respiratory infections in humans. Many strains of influenza C virus appear to circulate simultaneously, causing occasional outbreaks of disease, especially in children, but not the familiar periodic epidemics and pandemics caused by influenza A and B viruses (Katagiri et al., 1983; Air & Compans, 1983).
The virus membrane contains multiple copies of a single 225 kDa trimeric glycoprotein arranged in an open hexagonal array (Herrler et al., 1981; Hewat et al., 1984). This haemagglutinin-esterase-fusion glycoprotein (HEF) has three activities essential for viral infectivity: receptor binding, receptor-destroying enzyme activity and membrane fusion. Like the HA glycoprotein of influenza A and B viruses, HEF is synthesized as a single-chain precursor, HEF0, which is cleaved post-translationally into the disulfide-linked polypeptides HEF1 and HEF2. This cleavage is required for the protein to be able to undergo a conformational change, triggered by the low pH in endosomes, which initiates the fusion of virus and cell membranes that effects viral entry. The HEF1 polypeptide contains the binding site for the 9-O-acetylsialic acid cell receptor and a distinct receptor-destroying, 9-O-acetylsialic acid esterase active site (Pleschka et al., 1995). Thus, HEF contains the three activities which are distributed on two glycoproteins in influenza A and B viruses, where the HA protein has the receptor-binding and membrane-fusion activities and the neuraminidase (NA) glycoprotein has the receptor-destroying activity. Although HEF (641 residues) is similar in structure to the haemagglutinin (HA; 512 residues) of influenza A, the two proteins share less than 15% sequence identity (Nakada et al., 1984; Pfeifer & Compans, 1984). A glycoprotein on the membrane of both coronaviruses and toroviruses, the haemagglutinin-esterase (HE; ~409 residues), has 30% sequence identity to HEF1 of influenza C virus and possesses the same receptor-binding and esterase specificities (Luytjes et al., 1988; Vlasak et al., 1988; King et al., 1985). Coronaviruses cause approximately 25% of common colds in humans (Cornelissen et al., 1997; Monto & Lim, 1974; McIntosh et al., 1970).
Crystallization of HEF and X-ray data collection on the native HEF ectodomain from two tetragonal crystal forms [form I, P4(1)22 or P4(3)22, a = b = 155.4, c = 414.4 Å; form II, P4(1)2(1)2 or P4(3)2(1)2, a = b = 217.4, c = 421.4 Å] using flash cooling and synchrotron radiation have been described previously. The two crystal forms have approximately 80% solvent content and one and two HEF trimers per asymmetric unit, respectively. An enzyme inhibitor and receptor analogue which would bind in both the receptor and enzyme active sites was designed with an acetamido group substituted for the 9-O-acetyl group on the viral receptor 9-O-acetylsialic acid (Imhof et al., 1988; Herrler et al., 1992). A mercury adduct of this inhibitor, 9-acetamidosialic acid α-thiomethylmercuryl glycoside, was synthesized and found to form an isomorphous heavy-atom derivative (Fitz et al., 1996; Rosenthal et al., 1998). The structure of HEF at 3.2 Å has been described elsewhere (Rosenthal et al., 1998); methods used in the structure determination are described here. Attempts to use the low-homology (<15% sequence identity) influenza A virus HA structure for molecular replacement were unsuccessful. Instead, initial SIR phases were calculated at 7.0 Å resolution from six mercury sites located by Patterson map analysis of X-ray data on the mercury derivative collected to 6.5 Å resolution. The non-crystallographic threefold-symmetry axis of the HEF trimer was defined by the mercury positions. Averaging the electron-density map calculated from SIR phases around the molecular threefold-symmetry axis revealed an approximate molecular envelope. Iterative non-crystallographic symmetry averaging in real space, solvent flattening and histogram matching were used to extend the phases from 7.0 to 3.5 Å. The path of the polypeptide could be traced in this electron-density map and an atomic model was built.
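The quoted solvent content can be checked with a Matthews-coefficient estimate. The sketch below is illustrative only and rests on assumptions that are not stated in the text: eight asymmetric units per unit cell for both tetragonal space groups, the conventional protein constant of 1.23 Å^3/Da, and the 225 kDa trimer mass taken as the protein mass per trimer in the crystal.

```python
# Sketch: Matthews coefficient and solvent fraction for the two HEF crystal forms.
# Assumptions (not from the source): Z = 8 asymmetric units per cell and the
# empirical constant 1.23 A^3/Da relating V_M to the protein volume fraction.

def solvent_fraction(a, c, n_asu, mass_da_per_asu):
    """Return (V_M in A^3/Da, estimated solvent fraction) for a tetragonal cell."""
    cell_volume = a * a * c                       # tetragonal: a = b, all angles 90 deg
    v_m = cell_volume / (n_asu * mass_da_per_asu)
    return v_m, 1.0 - 1.23 / v_m

TRIMER_DA = 225_000  # trimer mass quoted in the text (225 kDa)

vm_I, solv_I = solvent_fraction(155.4, 414.4, 8, 1 * TRIMER_DA)    # one trimer per ASU
vm_II, solv_II = solvent_fraction(217.4, 421.4, 8, 2 * TRIMER_DA)  # two trimers per ASU

print(f"form I : V_M = {vm_I:.2f} A^3/Da, solvent fraction = {solv_I:.2f}")
print(f"form II: V_M = {vm_II:.2f} A^3/Da, solvent fraction = {solv_II:.2f}")
```

Under these assumptions both forms come out close to 78% solvent, in line with the approximately 80% stated above.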
A polyalanine model was then used to solve the form II crystal containing two trimers per asymmetric unit, allowing multi-crystal averaging of the electron density. Refinement, using data to 3.2 Å, has resulted in a free R factor of 26.7% and an R(work) of 22.3%.

2.1. Data collection

X-ray diffraction data were collected from two crystal forms of native HEF as described previously on flash-cooled crystals at 108 K, using synchrotron radiation at the CHESS F1 and A1 beamlines and phosphor image-plate detection. Diffraction data to 6.5 Å resolution from a form I crystal soaked in 50 mM 9-acetamidosialic acid α-thiomethylmercurylglycoside were collected with a MAR Research image-plate detector using a GX-13 rotating-anode X-ray source (Elliot Ltd.) collimated with Franks mirrors. The data were indexed and integrated with DENZO (Otwinowski & Minor, 1997) and further analyzed with SCALEPACK and the programs from the CCP4 suite (Collaborative Computational Project, Number 4, 1994). The highest resolution data were obtained from a form I crystal soaked in EMTS (3 mM) which had a small ΔF/F (0.08) similar to that found when comparing native data sets. Data-quality statistics are presented in Table 1.

Figure 1. Stereographic projection of the threefold self-rotation function peaks from form I and form II crystals. Pattersons analyzed were from data in the 7.0–4.0 Å resolution annulus; vectors were restricted to the length range 25–50 Å. The lattices of crystal form I and crystal form II are related by a 45° rotation about the c axis, as shown. Peaks 1, 2 and 3 are packing artifacts (see text). Peak 4 defines the molecular threefold-symmetry axis. A rotation axis is specified by two angles: the first is the angle between the c axis and the rotation axis and the second is the angle between the projection of the rotation axis onto the ab plane and the a axis.

Figure 3. Harker sections in the mercury difference Patterson maps. The mercury compound is described in Fig. 2(b). The Patterson was calculated with 15–6.5 Å data. Peaks on sections w = 1/2 and w = 1/4 are related by mirror symmetry across the diagonal. The asymmetric units of the v = 0 and u = v sections are shown. The Harker peaks from the six mercury positions summarized in Table 2 are indicated by the corresponding numbers. Cross vectors close to the Harker sections are marked as 'X'.

The orientation of the non-crystallographic threefold-symmetry axis in the HEF crystals was identified by analysis of self-rotation functions, but locating the HEF molecule with cross-rotation and translation functions using the influenza A HA structure was unsuccessful. Searches for the non-crystallographic molecular threefold axis were performed using both reciprocal- and real-space methods. [Model calculations using the influenza A HA structure placed in different orientations in a model P4(1)22 or P4(3)22 unit cell indicated that the correct threefold self-rotation function peak could be found in model data using the X-PLOR (Brünger, 1992b) real-space rotation function searching with a 25–50 Å vector annulus of the Patterson.] Fig. 1 shows the top four X-PLOR self-rotation function peaks for HEF crystal forms I and II. The unit cells of these crystal forms are related by a 45° rotation about their common c axes, as described previously. Peaks 1, 2 and 3 in each space group appear to be a consequence of the superposition of crystal packing vectors, as they were also found as the top peaks using the observed intensities randomized with respect to Miller indices (hkl) within resolution shells, thus preserving the intensity profile as a function of resolution similar to the original data (Jones, Walker et al., 1991). This suggested that peaks 1–3 were rotation-function artifacts and that peak 4, common to both space groups, identified the orientations of the molecular threefold-symmetry axes (63° from the c axis and 20° from the a axis in the angle conventions of Fig. 1).
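The randomization control described above (intensities reassigned among Miller indices within resolution shells, so that the fall-off of intensity with resolution is preserved) can be sketched as follows. This is a minimal illustration, not the procedure actually used by the authors; the function name, the data layout and the demo reflections are hypothetical.

```python
import random
from collections import defaultdict

def shuffle_within_shells(reflections, shell_edges, seed=0):
    """Permute intensities among reflections that fall in the same resolution shell.

    reflections: list of (hkl, d_spacing_in_A, intensity) tuples.
    shell_edges: descending d-spacing boundaries, e.g. [7.0, 6.0, 5.0].
    Reflections outside every shell are returned unchanged.
    """
    rng = random.Random(seed)
    shells = defaultdict(list)
    for idx, (_, d, _) in enumerate(reflections):
        for s in range(len(shell_edges) - 1):
            if shell_edges[s + 1] < d <= shell_edges[s]:
                shells[s].append(idx)
                break
    out = list(reflections)
    for members in shells.values():
        intensities = [reflections[i][2] for i in members]
        rng.shuffle(intensities)                 # break the hkl-to-intensity link
        for i, inten in zip(members, intensities):
            hkl, d, _ = reflections[i]
            out[i] = (hkl, d, inten)
    return out

# Hypothetical demo data: three reflections in the 7-6 A shell, two in the 6-5 A shell.
demo = [((1, 0, 0), 6.8, 10.0), ((1, 1, 0), 6.5, 20.0), ((2, 0, 0), 6.1, 30.0),
        ((2, 1, 0), 5.5, 1.0), ((2, 2, 0), 5.2, 2.0)]
shuffled = shuffle_within_shells(demo, [7.0, 6.0, 5.0])
```

Rotation-function peaks that survive this scrambling cannot come from the molecular transform, which is the reasoning used above to dismiss peaks 1–3.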
The subsequent determination of the location of heavy-atom derivatives labeling the three receptor-binding and three enzyme active sites confirmed this orientation of the non-crystallographic symmetry axis.

HEF had been expected to resemble the influenza A virus HA, for which the crystal structure is known (Wiley et al., 1981; Weis et al., 1990), for the following reasons. (i) HEF and HA share the activities of binding a sialic acid and mediating membrane fusion. (ii) Low-resolution image reconstructions (~30 Å) from electron micrographs of the hexagonal glycoprotein arrays on influenza C virus in negative stain (Hewat et al., 1984) suggested that HEF is a trimer similar in size and shape to the 135 Å long and 50 Å wide influenza A haemagglutinin (HA; Wilson et al., 1981). (iii) Although the overall sequence identity is less than 15% and attempts to align the HA1 and HEF1 sequences in the absence of the structure of HEF were unsuccessful (Pfeifer & Compans, 1984; Nakada et al., 1984), HEF2 and HA2 are the same length and share 15% sequence identity (Pfeifer & Compans, 1984; Nakada et al., 1984). The similarity of HEF2 to HA2 includes three conserved cysteines, a similar non-polar fusion peptide at the N-terminus and a membrane-anchor sequence near the C-terminus. The sequence of HEF1 is 100 residues longer than that of HA1. The extra 100 residues had been expected to form the esterase domain unique to HEF1, but they could not be located in the HEF1 sequence by sequence comparisons.

Figure 4. Skewed difference Fourier map (between the derivative and the native data) using phases from the three mercury sites with highest occupancies (Table 2). The map is a view down the molecular threefold-symmetry axis which was defined by the location of the three heavy atoms forming a triangle in Fig. 4(b). (a) Three Hg atoms located at the HEF receptor-binding sites. (b) Three Hg atoms located at the HEF esterase sites (see text). The other nearby peaks in the diagram are related by crystallographic symmetry.

Figure 5. Stereographic drawing of the initial molecular envelope of the HEF trimer, superimposed on the HA Cα model. The Hg-atom positions are marked as red spheres. The HA trimer from influenza A virus was positioned such that its receptor-binding sites are near the Hg atoms forming the smaller triangle. The other Hg atoms are expected to be near the enzyme active sites. The bulge in the envelope relatively near the lower triangle of Hg atoms defines the location of the enzyme domain of HEF.

Cross-rotation functions were calculated using both reciprocal- and real-space methods with the influenza A virus HA trimer as a search model or subsets of the model corresponding to the HA monomer, the HA1 subunits only, the HA2 subunits only and the same models consisting of only Cα atoms. No consistent peaks were found for different models, data-resolution ranges (including the low-resolution 15–8 Å annulus, see below) and Patterson search annuli. Patterson correlation refinement (Brünger, 1990) and translation functions calculated in both P4(1)22 and P4(3)22 using the top 100 peaks for all the rotation searches failed to give a translation-function peak above the noise level. In retrospect, the 9-acetamidosialic acid α-thiomethylmercurylglycoside heavy-atom positions confirmed the self-rotation function analysis. Although several cross-rotation function peaks were near the self-rotation peak, no significant translation-function peaks were found. To investigate whether searches with lower resolution data (not actually measured) might have been successful, low-resolution data were calculated from the final model and a solvent correction was added. A real-space cross-rotation function calculation with model data in the resolution range 40–10 Å using X-PLOR (Brünger, 1992b) contained the approximately correct orientation within the top ten peaks (no solution was evident in 100–10 Å resolution data).
However, no solution closer than 30 Å to the correct location was found by translation-function searches with X-PLOR (Brünger, 1992b). To investigate further why the molecular-replacement searches with the HA structure were unsuccessful, the HA structure was placed at the location of the final HEF structure and refined against the HEF data, allowing each polypeptide to move as a rigid body. This refined model has a correlation coefficient of 0.22 and an R factor of 53% against the HEF data in the resolution range 15.0–3.5 Å. The mean phase error between the refined HEF structure and the HA model is 89°. When the HA model was placed at incorrect locations in the form I unit cell, the correlation coefficient was 0.20 and the R factor was 55%. The structural differences between HEF and HA are significant. Although individual sequence segments of the two proteins can be superimposed quite closely (Rosenthal et al., 1998), when the full structures are compared, inter-domain movements over the length of the large molecule contribute to r.m.s. discrepancies between Cα positions of 11 Å between HEF1 and HA1 and 9 Å between HEF2 and HA2, excluding HEF segments not present in HA. These large structural changes are apparently responsible for the failure of the molecular-replacement searches.

Figure 6. Average phase changes per cycle of iterative threefold real-space averaging at 7.0 Å resolution. Convergence was achieved after five or six cycles.

Figure 7. Flow diagram of the phase-extension procedure from 7.0 to 3.5 Å resolution.

Figure 8. Correlation coefficient of each resolution shell on the final phase-extended electron-density map at 3.5 Å resolution. The correlation coefficient between the structure factors calculated from the phase-extended electron-density map and the observed structure factors is approximately constant to 3.8 Å, after which it decreases. In the highest resolution shell (3.6–3.5 Å), the correlation coefficient is still above 0.5. The overall correlation coefficient of the 3.5 Å resolution phase-extended map is 0.79.

Figure 9. Electron density compared at different stages in the phase-improvement and phase-extension procedures (contoured at 1σ). (a) SIR electron density at 7.0 Å along the long HEF2 helix. (b) Electron density of the same section as in (a) after improvement by iterative cycles of threefold NCS averaging, solvent flattening and histogram matching. (c) 3.5 Å phase-extended electron density of the same section as in (a). (d) Final refined electron density at 3.2 Å resolution. (e) SIR electron density at 7.0 Å resolution viewed from the top of the molecule down the molecular threefold axis. (f) 7.0 Å phase-improved electron density showing the same section as in (e). The protein density is clearly separated from the solvent density and three individual monomers are distinguishable. β-strands appear as continuous sheets. (g) 3.5 Å resolution phase-extended electron density of the same section as in (e). Individual β-strands are distinguishable. (h) Final 3.2 Å refined electron density.

HEF contains a receptor-binding site for 9-O-acetylsialic acid, the virus receptor, and a separate 9-O-acetylesterase enzyme active site, which removes the 9-O-acetyl group as depicted in Fig. 2(a) (Vlasak et al., 1989; Hayes & Varki, 1989; Pleschka et al., 1995). The structural study of HEF in complex with its natural receptor is hindered by the presence of the receptor-destroying function of the HEF esterase. A nonhydrolyzable substrate analogue and enzyme inhibitor, 9-acetamidosialic acid α-methylglycoside, is shown in Fig. 2(b) and has a Ki of 2.8 mM (Imhof et al., 1988). Because this inhibitor also prevents haemagglutination (Fitz et al., 1996), it was expected that it might bind to the receptor-binding site as well as to the enzyme active site.
A mercury adduct of the inhibitor, 9-acetamidosialic acid α-thiomethylmercuryl glycoside (Fig. 2c), was synthesized with the mercury at the glycosidic linkage point, where the natural receptor is linked to a cell-surface oligosaccharide and where the mercury is therefore unlikely to influence binding to HEF. This mercury-substituted sialoside was found to be a competitive inhibitor of enzyme activity with a Ki of 4.2 mM (Fitz et al., 1996). Although this is a weak inhibition constant, HEF crystals could be soaked in 50 mM (10 × KD) solutions of inhibitor to achieve high occupancy. X-ray diffraction data collected to 6.5 Å resolution from a form I crystal soaked in the 9-acetamidosialic acid α-thiomethylmercurylglycoside indicated that the derivative had similar unit-cell constants to the native crystal and a mean isomorphous difference in structure factors (ΔF/F) of 20.6% (Table 1). The difference Patterson map calculated at 6.5 Å resolution was noisy (Fig. 3). In addition to factors such as non-isomorphism, we anticipated a poor signal-to-noise ratio for the mercury difference Patterson owing to the large molecular mass of protein in the asymmetric unit (225 kDa). The six mercury positions were located by a combination of manual and computational searches of the Patterson map. Beginning with some prominent peaks on the Harker sections, a few tentative sites were assigned, which were confirmed by searches for the cross vectors among the sites using the Patterson search program RSPS (Collaborative Computational Project, Number 4, 1994). A combination of further Patterson searches and difference Fourier calculations located the full six expected sites. The locations of the Harker vectors of the six mercury sites are marked on the four Harker sections (Fig. 3). Cross vectors which overlap with or are close to the Harker sections are marked with an 'X' in Fig. 3.

Acta Cryst. (1999). D55, 945–961
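The statement that a 50 mM soak should give high occupancy despite the weak inhibition constant can be made concrete with a single-site binding isotherm. This is a rough sketch under explicit assumptions that go beyond the text: the dissociation constant is taken to equal the measured Ki of 4.2 mM, the ligand is not depleted by binding, and the receptor site is treated as if it had the same affinity as the esterase site.

```python
def fractional_occupancy(ligand_mM, kd_mM):
    """Equilibrium fraction of sites occupied for simple one-site binding."""
    return ligand_mM / (ligand_mM + kd_mM)

# 50 mM soak concentration and Ki = 4.2 mM are from the text; KD = Ki is an assumption.
occ = fractional_occupancy(50.0, 4.2)
print(f"estimated occupancy: {occ:.2f}")
```

With these numbers the estimate is a little over 0.9, which is consistent with the clear mercury peaks found for all six sites.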
Distances between the sites indicated that they formed two equilateral triangles, as anticipated for the two sialoside-binding sites on each monomer of the trimeric HEF. The difference Fourier electron-density maps skewed for viewing normal to the planes of the triangles are shown in Fig. 4. The molecular threefold-symmetry axis defined by the mercury locations is oriented 63° from the c axis and −20° from the a axis, according to the angle conventions of Fig. 1, in agreement with the self-rotation function analysis. The positions of the mercury sites are listed in Table 2. The three mercury sites 55 Å from each other (Fig. 4a) form a triangle similar in size to that formed by the locations of the receptor-binding sites on the influenza A virus HA trimer. When the receptor sites of the HA were positioned near these mercury positions, the second triangle of mercury sites 75 Å from each other (Fig. 4b) fell into space off the edge of the HA model, suggesting the location of the enzyme domain of HEF (Fig. 5). The positions of the heavy-atom sites were refined and SIR phases were calculated with MLPHARE (Otwinowski & Minor, 1997). Statistics are summarized in Table 2.

2.5. Molecular-envelope determination and phase improvement at 7.0 Å resolution

An SIR electron-density map was calculated to 7.0 Å resolution using the six mercury sites. The map revealed long rod-like features reminiscent of the long triple-stranded α-helical coiled coil in HA2. The atomic model of the HA was placed along the trimer axis defined by the heavy-atom positions to facilitate the calculation of approximate transformation matrices which superimposed the electron density of monomers onto each other in the HEF trimer using O (Jones, Zou et al., 1991).
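The geometric argument made earlier in this passage (three symmetry-related mercury sites form an equilateral triangle whose normal through the centroid is the molecular threefold axis) can be sketched as below. The coordinates are hypothetical: the demo triangle is constructed with a 55 Å side and an axis 63° from z and 20° from x purely to exercise the function, assuming orthogonal Å coordinates with x along a and z along c. The sign of the returned axis depends on the order of the three points.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _norm(u):
    return math.sqrt(sum(c * c for c in u))

def threefold_axis(p1, p2, p3):
    """Side lengths, centroid, unit normal and its polar/azimuthal angles (degrees)."""
    sides = (_norm(_sub(p2, p1)), _norm(_sub(p3, p2)), _norm(_sub(p1, p3)))
    centroid = tuple(sum(c) / 3.0 for c in zip(p1, p2, p3))
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = _norm(n)
    axis = tuple(c / length for c in n)
    polar = math.degrees(math.acos(max(-1.0, min(1.0, axis[2]))))  # angle from z (c axis)
    azimuth = math.degrees(math.atan2(axis[1], axis[0]))           # xy projection from x (a axis)
    return sides, centroid, axis, polar, azimuth

def demo_triangle(side, polar_deg, azimuth_deg, centre=(0.0, 0.0, 0.0)):
    """Hypothetical equilateral triangle whose normal has the requested polar angles."""
    th, ph = math.radians(polar_deg), math.radians(azimuth_deg)
    n = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
    u = (-math.sin(ph), math.cos(ph), 0.0)   # unit vector perpendicular to n
    v = _cross(n, u)                         # completes the in-plane basis
    r = side / math.sqrt(3.0)                # circumradius of an equilateral triangle
    pts = []
    for k in range(3):
        t = math.radians(120.0 * k)
        pts.append(tuple(c + r * (math.cos(t) * a + math.sin(t) * b)
                         for c, a, b in zip(centre, u, v)))
    return pts

sides, centroid, axis, polar, azimuth = threefold_axis(*demo_triangle(55.0, 63.0, 20.0))
```

Applying the same function to the second triangle of sites should return the same axis direction if both triangles obey the same threefold symmetry, which is the consistency check implied by the text.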
These approximate non-crystallographic symmetry operators were improved by a real-space six-dimensional search in order to find the maximum correlation between electron densities related by the non-crystallographic symmetry axis using RAVE (Kleywegt & Jones, 1994). To permit iterative real-space averaging, solvent flattening and histogram matching, an initial molecular envelope was constructed from the SIR electron-density map at 7.0 Å resolution, assuming 78% solvent content. An SIR electron-density map was threefold averaged about the molecular symmetry axis without an envelope and folded back into the asymmetric unit using RAVE (Kleywegt & Jones, 1994). This map was back-transformed to calculate structure factors which were input to the program DM for determining the solvent mask in a unit cell (Cowtan, 1994). The mask about one molecule was isolated by averaging the DM solvent mask around a molecular threefold axis using RAVE (Kleywegt & Jones, 1994). Positions in the DM mask not reinforced by the local threefold averaging were set to zero with MAPMAN (Jones, Zou et al., 1991). The mask was enlarged and smoothed by five-point interpolation at its boundaries (MAMA; Kleywegt & Jones, 1996a,b). Internal 'islands' were removed and crystallographic packing overlaps minimized using MAMA. The molecular envelope (Fig. 5) was approximately the size and shape of the influenza HA, but contained a new domain in the vicinity of the second triangle of mercury sites, formed by the additional 100 residues of HEF1. With the molecular envelope and the NCS transformation matrices, iterative threefold real-space averaging with solvent flattening was used to improve the phases to 7.0 Å resolution using RAVE (Kleywegt & Jones, 1994). Convergence was observed after five or six cycles, as indicated by the average phase changes between consecutive cycles dropping from 65° to less than 3° (Fig. 6).
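The convergence criterion used above, the average phase change between consecutive averaging cycles, can be sketched as follows. This is an unweighted illustration; whether the reported values were weighted (for example by amplitude or figure of merit) is not stated, so the plain mean here is an assumption, and the function name is hypothetical.

```python
def mean_phase_change(phases_prev, phases_new):
    """Mean absolute phase difference in degrees, wrapped so 350 -> 10 counts as 20."""
    if len(phases_prev) != len(phases_new) or not phases_prev:
        raise ValueError("phase lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(phases_prev, phases_new):
        diff = (b - a + 180.0) % 360.0 - 180.0   # wrap the difference into [-180, 180)
        total += abs(diff)
    return total / len(phases_prev)

# Two reflections whose phases each move by 20 degrees across the 0/360 boundary.
change = mean_phase_change([350.0, 10.0], [10.0, 350.0])
```

The wrap-around step matters: without it, a small shift across 0° would be counted as a change of nearly 360° and convergence would never appear to be reached.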
The increase in the quality of the electron-density map at 7.0 Å resolution was dramatic, with continuous rods of α-helical density replacing broken rod-like fragments in the stem region of the molecule (Figs. 9a, 9b) and connected electron density clearly defining the top globular domains (Figs. 9e, 9f).

Table 2. Heavy-atom parameters and SIR phasing statistics.

The phases of the improved electron-density map at 7.0 Å resolution were then extended to 3.5 Å resolution by the procedure outlined in Fig. 7. At each resolution step, the electron-density map was back-transformed to calculate phases for slightly higher resolution Fourier terms than were used to calculate the map. Electron-density maps calculated with the incrementally higher resolution terms were improved by 20 cycles of threefold NCS averaging in real space, solvent flattening and histogram matching using DM (Cowtan, 1994). σA weighting (Free-Sim mode in DM) was used to combine phases between the iterative cycles. A total of 50 phase-extension steps were used, which included 20 steps of 0.1 Å from 7.0 to 5.0 Å and 30 steps of 0.05 Å from 5.0 to 3.5 Å (each resolution increment was less than the interval between lattice points along the longest unit-cell axis). At resolutions of 6.0, 5.0 and 4.0 Å, the NCS operators were redetermined using a six-dimensional search for the maximum correlation of NCS-related electron density. The progress of the phase-extension process was monitored by the DM free-R indicator and the correlation coefficient between the observed amplitudes and those calculated from the map (Table 3). The final phase-extended electron density calculated at 3.5 Å had good correlation coefficients in all resolution shells (Fig. 8) and showed clearly interpretable protein features including long α-helices in the stem region (Fig. 9c) and a β-barrel in the top globular domain (Fig. 9g).
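The phase-extension schedule described above (20 steps of 0.1 Å followed by 30 steps of 0.05 Å, with the NCS operators redetermined at 6.0, 5.0 and 4.0 Å) can be written out explicitly. The sketch only enumerates the resolution limits; the averaging, solvent flattening and histogram matching performed by DM at each step are not reproduced, and the function name is hypothetical.

```python
def extension_schedule():
    """Resolution limit (in A) reached after each phase-extension step, in order."""
    limits = []
    d = 700                      # work in hundredths of an A to avoid float drift
    for _ in range(20):          # 20 steps of 0.1 A: 7.0 -> 5.0 A
        d -= 10
        limits.append(d / 100)
    for _ in range(30):          # 30 steps of 0.05 A: 5.0 -> 3.5 A
        d -= 5
        limits.append(d / 100)
    return limits

schedule = extension_schedule()
NCS_REDETERMINE_AT = {6.0, 5.0, 4.0}   # resolutions quoted in the text

for step, limit in enumerate(schedule, start=1):
    note = "  redetermine NCS operators" if limit in NCS_REDETERMINE_AT else ""
    print(f"step {step:2d}: extend to {limit:.2f} A, 20 cycles of averaging{note}")
```

The step counts add up to the 50 steps stated in the text, and the finer steps at high resolution reflect the faster growth in the number of new reflections per step as the limit decreases.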
The hand of the α-helices was used to determine the correct enantiomorph of the space group to be P4(3)22. 97% of the path of the HEF1 chain and 85% of the path of the HEF2 chain were traced into the phase-extended electron-density map at 3.5 Å resolution. A partial molecular model for HEF1, including residues 4–424 (HEF1 has 432 residues), was built using the amino-acid sequence aided by clear electron density for some of the aromatic side chains, seven internal disulfide bridges and N-linked carbohydrate. Only 131 of the 175 residues of the HEF2 polypeptide could be built unambiguously (residues 24–154). However, additional electron density at the N- and C-termini of HEF2 suggested where the remaining residues might be located. The phase-extension protocol, as described above, was repeated with a new envelope calculated from the partial model of HEF1 and HEF2 (Madden, 1992) but extended to include the untraced electron density. The path of HEF2 was indicated for residues 8–154, though side-chain density was poor. Further clarity in the chain tracing came from an analysis of the form II crystal. Diffraction data collected to 4.0 Å resolution from crystal form II, which contains two HEF trimers per asymmetric unit, permitted iterative ninefold averaging to 4.0 Å resolution of electron-density maps from form I and form II crystals. The location of the HEF trimers in the form II crystals was determined by using a polyalanine model of the unrefined HEF trimer and the program AMoRe (Navaza, 1994). A cross-rotation function indicated trimers at the same orientation as in crystal form I but rotated 45° about the c axis, as expected. The top rotation-function peak (8.7σ) gave a translation-function solution in P4(3)2(1)2 with a correlation coefficient of 0.23 (18.9σ). With the first trimer fixed, the second trimer was located in a translation function (49.5σ) at a relative translation vector of (0.481, 0.498, 0.005) in fractional coordinates.
This displacement is a pseudo-C-centering translation (1/2, 1/2, 0) between the two trimers in the asymmetric unit, consistent with a peak in the native Patterson function of the form II crystals. Rigid-body refinement of both trimers yielded a correlation coefficient of 0.38 and an R factor of 0.48. Side-chain electron densities consistent with the HEF sequence were evident in a 2Fo − Fc map calculated from the polyalanine model. A model of the two trimers containing HEF1 residues 6–420 and HEF2 residues 30–154, which were the clearest residues in the electron-density maps of the form I crystal before refinement, was used to calculate a 2Fo − Fc map in the form II crystal to 4.0 Å resolution. Structure factors calculated from this electron-density map were used with the program DM for 20 cycles of iterative sixfold non-crystallographic symmetry averaging in real space, solvent flattening and histogram matching. The molecular envelopes for this procedure were calculated from the model from the form I crystal with ENVAT (Madden, 1992), extended in the regions where the original model was expected to be incomplete with MAMA (Kleywegt & Jones, 1996b), and trimmed to avoid symmetry overlaps using the program NCSMASK (Collaborative Computational Project, Number 4, 1994). A monomer mask was used for non-crystallographic symmetry averaging and a mask surrounding both trimers was used for solvent flattening. The improved sixfold-averaged electron-density map from the form II crystal was combined with the threefold-averaged electron-density map calculated prior to model refinement from the form I crystals in 20 cycles of iterative ninefold averaging in real space, solvent flattening and histogram matching of monomer density in both crystal forms using the program DMmulti (Cowtan, 1994).
The resulting electron-density map, although at lower resolution (4.0 Å) than those from form I alone (3.2 Å), showed connected electron density for HEF1 residues 3–425 and HEF2 residues 6–154. The improved electron density for HEF2 residues 6–24 resolved uncertainty about the path of the main chain in this region, which had been confused by the presence of the electron density from an oligosaccharide attached at HEF1 residue 12.

Figure: Final refined model at 3.2 Å resolution superimposed on a simulated-annealing omit map. The electron-density map was calculated by omitting atoms within an 11 Å radius surrounding residue 53 of HEF1, with an initial annealing temperature of 1000 K (Brünger, 1992b; Hodel et al., 1992). The map was calculated with data between 15 and 3.2 Å resolution and contoured at 1σ.

Figure: Electron density and superimposed carbohydrate model at Asn381. The electron density was calculated by omitting the carbohydrate model (contoured at 1σ).

The initial model had an R factor of 42% for all the intensities in the 15.0–3.5 Å resolution range. 5% of the diffraction data (3700 reflections) were omitted from subsequent refinement and used to calculate the free R factor (Brünger, 1992a). Strict non-crystallographic threefold symmetry was applied in the initial rounds of refinement (Weis et al., 1990; Braig et al., 1995; Brünger, 1992b), with an observations-to-parameters ratio of 4.1 (60 000 reflections, 14 630 atoms). An initial B factor of 40 Å^2 was assigned to every atom; Wilson statistics suggested an overall B factor of 60 Å^2 based on observed diffraction intensities. An initial 150 cycles of positional refinement decreased R(free) to 36.1% and R(work) to 34.2% (X-PLOR; Brünger, 1992b; Table 4). This was followed by simulated-annealing refinement with slow cooling, during which the temperature was decreased from 4000 to 300 K by increments of 25 K every 12.5 fs.
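Two of the numbers in the refinement description above can be reproduced with simple arithmetic. The sketch assumes, as an interpretation rather than a statement from the text, that the 14 630 atoms refer to the whole trimer, that strict threefold symmetry leaves one monomer's atoms independent, and that only positional (x, y, z) parameters are counted. The cooling schedule is likewise read as one 25 K decrement per 12.5 fs of dynamics.

```python
# Observations-to-parameters ratio under strict threefold NCS (positional parameters only).
reflections = 60_000
atoms_in_trimer = 14_630
ncs_copies = 3
independent_atoms = atoms_in_trimer / ncs_copies
parameters = independent_atoms * 3            # x, y, z for each independent atom
ratio = reflections / parameters

# Slow-cooling schedule: 4000 K -> 300 K in 25 K decrements, one per 12.5 fs.
t_start, t_end, t_step, dt_fs = 4000, 300, 25, 12.5
n_steps = (t_start - t_end) // t_step
total_ps = n_steps * dt_fs / 1000

print(f"observations per parameter: {ratio:.1f}")
print(f"cooling steps: {n_steps}, total annealing time: {total_ps} ps")
```

Under this reading the ratio matches the 4.1 quoted in the text; without the strict symmetry constraint the same data would support only about a third as many observations per positional parameter.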
After the simulated annealing, 120 cycles of conventional energy minimization were performed. The R factors decreased by more than 2%, to an Rfree of 33.7% and an Rwork of 31.2% (Table 4). A restrained temperature (B) factor refinement, which restricts the deviation of B factors between adjacent atoms to less than 2σ, was applied after simulated annealing (Table 4). The Ramachandran plot was consulted during refinement using PROCHECK (Laskowski et al., 1993) and residues outside the most probable regions were re-examined and rebuilt if consistent with the electron density. The phases calculated from the refined model were used to calculate a σA-weighted electron-density map using (2mFo − DFc) coefficients (Read, 1986; Kleywegt & Jones, 1996a,b). This electron-density map was threefold averaged and solvent flattened using RAVE and an envelope calculated from the model (ENVAT; Madden, 1992). There was a clear improvement in the electron density as a result of the refinement, including better electron density for HEF1 residues 1–3 and 425–427 and HEF2 residues 8–23 and 155–160, and confirmation of the path of the polypeptide chain segments which had been omitted from the refinement. These additional residues were added to the model and refined together with the rest of the molecule as described above.

Figure 13. Correlation coefficient indicating real-space fit between the 2Fo − Fc electron density and the final model refined at 3.2 Å resolution for main-chain atoms (MC, solid line) or side-chain atoms (SC, dashed line). Solvent-exposed polar side chains are labeled. (a) HEF1, (b) HEF2.

Figure 14. Ramachandran plot of the main-chain torsion angles (φ, ψ) edited from PROCHECK for the final HEF model refined at 3.2 Å resolution. 76% of the residues have dihedral angles in the most favored region (darkest gray) and 0.2% of the residues are in the disallowed region (white). Triangles represent glycines and squares represent all other residues.
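The real-space fit plotted per residue (Fig. 13) is a correlation coefficient between map density and model density. As a minimal sketch, assuming the two densities have already been sampled at the same grid points around a residue, the statistic reduces to a Pearson correlation. The sampling step, which a real program performs on the map grid, is not shown, and the density values below are invented.

```python
import math

def real_space_cc(rho_map, rho_model):
    """Pearson correlation between map and model densities at matching grid points."""
    if len(rho_map) != len(rho_model) or len(rho_map) < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_map = sum(rho_map) / len(rho_map)
    mean_model = sum(rho_model) / len(rho_model)
    cov = sum((a - mean_map) * (b - mean_model) for a, b in zip(rho_map, rho_model))
    var_map = sum((a - mean_map) ** 2 for a in rho_map)
    var_model = sum((b - mean_model) ** 2 for b in rho_model)
    if var_map == 0 or var_model == 0:
        raise ValueError("density sample has zero variance")
    return cov / math.sqrt(var_map * var_model)

# Hypothetical density samples: the model density is the map density scaled by 2,
# so the correlation is perfect even though the absolute scales differ.
map_density = [0.1, 0.5, 0.9, 0.4]
model_density = [0.2, 1.0, 1.8, 0.8]
cc = real_space_cc(map_density, model_density)
print(round(cc, 3))
```

Because the statistic is insensitive to the overall scale and offset of the two densities, it measures agreement in shape, which is why it is convenient for comparing main-chain and side-chain fit residue by residue.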
Electron density for residues 1–5 and 161–175 of HEF2 and residues 427–432 of HEF1, all at termini, remained poor after many cycles of refinement and examination of averaged difference Fourier maps. There are crystal contacts near the HEF1 and HEF2 C-termini which can cause deviations from the non-crystallographic symmetry. A tentative model including residues 1–8 and 161–175 of HEF2 and 427–432 of HEF1 was built into unaveraged 2mFo − DFc Fourier maps. This complete, but tentative, model was refined with a weighted non-crystallographic symmetry restraint (X-PLOR; Brünger, 1992b) applied to the entire model except residues 422–432 of HEF1 and residues 1–10 and 161–175 of HEF2, which were left free of any restraint. No dramatic changes were observed in the unrestrained residues after refinement. In each round of refinement, before simulated annealing, B factors were first refined as a group, with each residue given one B factor for all the main-chain atoms and one for all the side-chain atoms. A restrained B-factor refinement which restricts the deviation of the B factors between adjacent atoms to less than 2σ was applied after simulated-annealing refinement.

There are eight potential N-linked carbohydrate attachment sites per HEF monomer, located at HEF1 residues 12, 47, 117, 130, 176 and 381 and HEF2 residues 106 and 157 (Fig. 10). At each of five of the sites (HEF1 12, 47, 130 and 381, and HEF2 106), clear electron density was observed for a trisaccharide attached to the asparagine (e.g. Fig. 11). A model of MAN–NAG–NAG was built into the electron density for each of these sites (Fig. 11). Refinement of this addition to the model resulted in very small changes in the R factors (Table 4). Extra electron density was observed beyond the core trisaccharide at residues 47, 130 and 381. Some density was observed near residue 176 in one monomer, but no further carbohydrate could be added to the model with confidence.
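Potential N-linked attachment sites such as those listed above are conventionally identified from sequence as Asn-X-Ser/Thr sequons in which X is not proline. The sketch below scans a sequence for that motif and also flags sequons followed by proline, the context the text describes as rarely glycosylated. The example sequence is invented for illustration and is not the HEF sequence.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (position, motif, followed_by_pro) for each N-X-S/T sequon.

    Positions are 1-based and refer to the asparagine.
    """
    hits = []
    for match in SEQUON.finditer(sequence):
        start = match.start()
        motif = match.group(1)
        next_residue = sequence[start + 3:start + 4]
        hits.append((start + 1, motif, next_residue == "P"))
    return hits

# Hypothetical peptide (not from HEF): contains N-W-S-P, N-G-T and N-P-S.
toy_sequence = "MKNWSPANGTLNPSQ"
sequons = find_sequons(toy_sequence)
for position, motif, before_pro in sequons:
    print(position, motif, "followed by Pro" if before_pro else "")
```

The N-P-S stretch is skipped because proline at the X position excludes the motif, while N-W-S is reported but flagged because of the proline that follows it.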
Of the two potential oligosaccharide sites which lacked any potential oligosaccharide electron density, one, Asn117, is in the rarely glycosylated sequence Asn–Trp–Ser–Pro (Shakin-Eshleman et al., 1996; Katsuri et al., 1997; Gavel & von Heijne, 1990) and the other, Asn157 of HEF2, is near the C-terminus of HEF2, where the model B factors are high and the electron density is poor, suggesting that the disorder of this region is obscuring the oligosaccharide electron density.

Refinement was continued by including diffraction data to 3.2 Å resolution. A higher resolution native data set (an EMTS soak at 3 mM, Table 1, ΔF/F = 0.08) consisting of data between 5.5 and 3.2 Å was merged with the native data from 10 to 3.5 Å. The same set of reflections was used for the Rfree set in the 10.0–3.5 Å range, but supplemented to include data from the 3.5–3.2 Å resolution shell. A cycle of positional refinement followed by simulated-annealing refinement with a bulk-solvent correction (X-PLOR; Brünger, 1992b) yielded an Rfree of 26.7% and an Rwork of 22.5% for the 10.0–3.2 Å resolution range (Table 4). In a final round of refinement, a maximum-likelihood target was used as implemented in REFMAC (Collaborative Computational Project, Number 4, 1994). Rfree did not change, although Rwork increased slightly. The improvement in the electron density resulting from refinement is evident in Figs. 9(d) and 9(h). A section of the electron density from a simulated-annealing omit map (Brünger, 1992b; Hodel et al., 1992) with the model superimposed is shown in Fig. 12.

The current model consists of residues 1–427 (of 432) for HEF1 and residues 5–165 (of 175) for HEF2. The final electron-density map has one major break in the main-chain electron density at residues 32–35 of HEF2, where the sequence contains three consecutive glycines. There is poor electron density for 30 side chains, 26 of which are charged surface residues (labeled in Fig. 13).
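The Rwork and Rfree values quoted above are the same residual evaluated on two disjoint sets of reflections: those used in refinement and a small set withheld from it. The sketch below shows that bookkeeping under simplifying assumptions: a single linear scale factor between observed and calculated amplitudes, and a deterministic "every 20th reflection" test set standing in for the random 5% selection normally used. The amplitudes are invented.

```python
def r_factor(f_obs, f_calc):
    """R = sum| |Fobs| - k|Fcalc| | / sum|Fobs|, with one linear scale factor k."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)

def split_work_free(n_reflections, every=20):
    """Index sets for the working and free reflections (free = every 20th, i.e. 5%).

    A real free set is chosen at random; a fixed stride keeps this sketch reproducible.
    """
    free = [i for i in range(n_reflections) if i % every == 0]
    work = [i for i in range(n_reflections) if i % every != 0]
    return work, free

def r_work_and_r_free(f_obs, f_calc, every=20):
    work, free = split_work_free(len(f_obs), every)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free

# Hypothetical amplitudes: the sums are equal, so k = 1 and R = 10/100.
toy_obs = [10.0, 20.0, 30.0, 40.0]
toy_calc = [12.0, 18.0, 33.0, 37.0]
toy_r = r_factor(toy_obs, toy_calc)
print(round(toy_r, 3))
```

Because the free reflections never enter the refinement target, a gap between Rfree and Rwork such as the one reported here gives an indication of how far the model has been fitted to noise in the working set.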
No model has been built for the C-termini of either HEF1 (residues 428–432) or HEF2 (residues 166–175), owing to poor electron density in difference (2Fo − Fc) and omit electron-density maps. After iterated cycles of refinement and rebuilding, continuous but poor electron density was found in omit maps for residues 1–8 of HEF2, which have very high B factors (>100 Å²). The correlation coefficient between the final electron-density map (2Fo − Fc amplitudes, calculated phases) and the model is 0.87 for the HEF1 main chain and 0.86 for HEF1 side chains (Fig. 13a). The correlation coefficient is somewhat lower for the HEF2 chain: 0.82 for the main chain and 0.79 for the side chains (Fig. 13b). The Ramachandran plot (Fig. 14) shows that 76% of the residues have main-chain dihedral angles in the most-favored region and 0.2% are in the disallowed region.

The HEF trimer has an extended stalk-like domain at the membrane-proximal end and globular head-like domains at the membrane-distal end (Fig. 15a). The three head domains are ~60 Å tall and extend ~116 Å in diameter at the widest point of the trimer (Fig. 15b). The head domains consist of only HEF1 residues. They each contain a cell-receptor binding site and a cell-receptor-destroying enzyme active site (Fig. 16). The stem domain contains all of HEF2 and part of HEF1. It has three long α-helices (60 Å) in the HEF2 trimer interface and the membrane-fusion peptide at the N-terminus of HEF2, located about 35 Å from the bottom of the molecule.

The HEF1 polypeptide chain starts from the bottom of the stem domain at the viral membrane end of the trimer and extends directly up to the head domain, except for a transverse β-hairpin loop (residues 14–28) halfway up the stem. HEF1 reaches the top of the stalk at residue 43, 80 Å away from the bottom of the molecule (Fig. 17a). The polypeptide then forms part of the esterase domain (41–73), containing a β-strand followed by the active-site serine 57 (dot in Fig. 17b) and two short helical segments (Fig. 17b). A compact domain from residues 74–150 forms the top side of the esterase domain, situated under the receptor-binding domain (Fig. 17c). One loop from this domain contributes one residue, Tyr127, to the receptor-binding site in the receptor-binding domain located directly above it. The receptor domain, residues 151–310, forms a 'jelly-roll' β-sandwich of nine β-strands at the distal end of the molecule (Fig. 17d). This domain contains the receptor-binding site (Fig. 16). Below the receptor-binding site, the polypeptide extends downward to complete the esterase domain. A compact motif (311–366), containing the active-site triad residues His355 and Asp352 (dots in Fig. 17e), packs against the N-terminal segment of the esterase, residues 41–73, which contains Ser57. The HEF1 subunit ends in a segment, residues 367–432 (Fig. 17f), which runs down the stalk of the molecule, antiparallel to the N-terminal segment, residues 1–43. The five C-terminal residues of HEF1 are not observed in electron-density maps and are presumed to extend into the solvent and be disordered. The topology of the HEF1 subunit is dominated by β-strands (Fig. 18a). In the structure of HEF1, there are six β-sheets arranged from bottom to top of the domain (colored in Fig. 18b).

Figure 16. Surface diagram of HEF monomer. The receptor-binding pocket is located at the top of the head domain, while the enzyme active site is located near the widest part of the molecule. The electrostatic charge was calculated with GRASP (Nicholls et al., 1991); red = negative, blue = positive. Also shown are the cell-receptor analogs bound to both the receptor-binding sites and the enzyme active sites. Trisaccharide models are shown at five of the eight potential glycosylation sites.
The two major segments of sequence difference in HEF1 relative to HA1, residues 41–73 and 311–366, which contain the catalytic triad of the esterase (Rosenthal et al., 1998), are boxed in the topology diagram (Fig. 18a).

3.4. The HEF2 polypeptide

HEF2 is dominated by two antiparallel α-helices, the first preceded by two antiparallel β-strands (Fig. 17g) and the second followed by two antiparallel β-strands (Fig. 17h). The first 24 residues, which precede the two β-strands, contain the non-polar fusion peptide implicated in the membrane-fusion activity of HEF. The electron density for residues 1–5 of HEF2 is poor, probably owing to disorder of this region. Residues 13–18 of HEF2 are sandwiched between N-terminal (9–13) and C-terminal (424–427) residues of HEF1. The two antiparallel β-strands (residues 23–29 and 37–39) are part of a five-stranded β-sheet at the bottom of the HEF stem, the central strand of which is from the N-terminus of HEF1 (residues 3–11; Figs. 17g, 17h; red in Fig. 18b). A 6⅔-turn, 34 Å long α-helix runs towards the top of the stem domain, followed by a 15-residue hairpin loop which reverses the direction of the chain (Fig. 17g). An 11-turn, ~60 Å long α-helix (residues 82–124) runs towards the viral membrane end of the stem. Near the end of the second β-strand following the long helix, Cys137 of HEF2 is disulfide bonded to Cys6 of HEF1, forming the only covalent link between HEF1 and HEF2 (Fig. 17h). An internal disulfide loop from HEF2 145–149 precedes a short α-helix before the polypeptide turns toward the viral end of the molecule. The electron-density map could not be interpreted beyond residue 165. The transmembrane anchor starts about 20 residues after 165.

A single HEF monomer (HEF1 + HEF2) buries 4056 Å² of its solvent-accessible surface area in the HEF trimer interface. About 60% (2488 Å²) of this buried surface is HEF2 and 40% (1566 Å²) is HEF1.
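The figures in this section can be checked with a few lines of arithmetic. The sketch below recomputes the HEF1/HEF2 shares of the buried surface from the quoted areas and compares the quoted helix lengths with those expected for an ideal α-helix. The ideal-helix parameters (3.6 residues per turn, 1.5 Å rise per residue) are standard textbook values and not taken from the paper.

```python
# Buried solvent-accessible surface per monomer, in square angstroms (from the text).
TOTAL_BURIED = 4056.0
HEF2_BURIED = 2488.0
HEF1_BURIED = 1566.0

hef2_share = HEF2_BURIED / TOTAL_BURIED
hef1_share = HEF1_BURIED / TOTAL_BURIED
# The two parts do not sum exactly to the quoted total, presumably because of rounding.
unaccounted = TOTAL_BURIED - (HEF1_BURIED + HEF2_BURIED)

# Ideal alpha-helix geometry (textbook values, an assumption here).
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # angstroms
PITCH = RESIDUES_PER_TURN * RISE_PER_RESIDUE  # angstroms per turn

def ideal_helix_length(turns):
    """Length in angstroms of an ideal alpha-helix with the given number of turns."""
    return turns * PITCH

long_helix = ideal_helix_length(11)        # quoted as ~60 angstroms
short_helix = ideal_helix_length(20 / 3)   # 6 2/3 turns, quoted as 34 angstroms

print(f"HEF2 share {hef2_share:.1%}, HEF1 share {hef1_share:.1%}, unaccounted {unaccounted} A^2")
print(f"ideal lengths: {long_helix:.1f} A (11 turns), {short_helix:.1f} A (6 2/3 turns)")
```

The shares come out close to the rounded 60% and 40% in the text, and the ideal-helix lengths are within a few ångströms of the quoted values.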
The prominent triple-stranded α-helical interface at the core of the stalk domain accounts for only about 13% of the interface (321 Å²) formed by HEF2. This reflects the fact that the central helices diverge from each other over both the top and bottom third of their lengths. The space on the threefold-symmetry axis between the top segments of the central helices is occupied by electron density which appears to be ions or water molecules (Rosenthal et al., 1998). The space between the top of the two long α-helices from adjacent HEF2 monomers is filled by the long loop between the two helices of one HEF2 monomer. This loop forms a substantial part of the HEF2–HEF2 interface (~600 Å²). The space between the long α-helices of HEF2 at the bottom third of these helices is filled by the HEF2 β-strand residues 133–135. After the low-pH-induced conformational change which activates the membrane-fusion activity, the loop and β-sheet residues forming these HEF2–HEF2 interfaces are expected to change positions radically, based on the precedent of the structurally related HA molecule (Bullough et al., 1994).

The trimer interface formed by HEF1 is also composed of three distinct regions. The largest interface, 1240 Å², is at the top of the molecule, between the globular domains of HEF1. The HEF1 globular domain also contacts the HEF2 subunit of the neighboring molecule, forming a 126 Å² interface with the extended loop between the two α-helices of the HEF2 subunit. A third contact (160 Å²) is formed by the HEF1 residues 21 and 22, which are part of a β-loop which projects toward the trimer axis in the stalk region of the molecule. These HEF1 residues contact the HEF2 subunit of an adjacent monomer at residue 106, which is part of the triple-stranded α-helical coiled coil.

Figure 17. Polypeptide chain fold of HEF from N- to C-terminus. The catalytic triad, Ser57, Asp352 and His355, are shown as spheres in panels (b), (e) and (h).
The oligosaccharides seen at five (HEF1 12, 47, 130, 381 and HEF2 106) of the eight potential glycosylation sites (see above) appear to form three rings around the HEF trimer (Fig. 15a), one at the height of the fusion peptide, one near the junction of the top globular domain with the stem domain and one just below the receptor-binding site. The oligosaccharide attached at HEF2 residue 157, which is in an area of poor electron density (see §2.8), occurs near the bottom of the trimer (Fig. 15a). The oligosaccharide at HEF2 residue 106 (on the long α-helix) emerges from near the trimer axis and forms part of the trimer interface in the stalk of HEF (Figs. 10, 15b, 15c). The oligosaccharides linked at HEF1 residues 47 and 130 also both extend toward other monomers, suggesting a possible role in stabilizing the trimer interface in the globular head region (Fig. 15b).

The receptor-binding site is located near the top of the molecule, while the enzyme active site is located at the widest part of the trimer, near the base of the globular head domain (Fig. 16). Both sites are obvious cavities in the surface of the molecule (Fig. 16). Detailed structures of these sites and their interactions with a fragment of the cell receptor and an enzyme inhibitor are described elsewhere (Rosenthal et al., 1998).

Three compact domains, labeled R, E and F, can be identified in a Go distance plot (Go, 1981) of the HEF monomer (Fig. 19a). Only the R domain, which contains the receptor-binding site (Rosenthal et al., 1998), is formed by a continuous segment of the HEF sequence. The extended F (membrane-fusion) domain which forms the stem of HEF is composed of three segments labeled F1, F2 and F3 in Fig. 19(a) and the esterase domain E is composed of E1, E′ and E2. Both the R and E domains (blue and green in Fig. 19b) have the topologies of large loops, with their respective N- and C-termini at almost the same point in the three-dimensional structure.
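A Go distance plot of the kind used above to delineate domains is essentially a residue-by-residue map of Cα–Cα distances, in which compact domains show up as blocks of short distances. The sketch below builds such a map for a toy chain and prints a thresholded version. The coordinates and the 10 Å cutoff are invented for illustration and do not come from the HEF model.

```python
import math

def distance_matrix(coords):
    """All pairwise distances (in angstroms) between C-alpha positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_plot_rows(coords, cutoff=10.0):
    """One text row per residue: '#' where the pair is closer than the cutoff."""
    matrix = distance_matrix(coords)
    return ["".join("#" if d < cutoff else "." for d in row) for row in matrix]

# Hypothetical C-alpha trace: two three-residue clusters about 30 angstroms apart.
toy_ca = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
    (30.0, 0.0, 0.0), (33.8, 0.0, 0.0), (33.8, 3.8, 0.0),
]
rows = distance_plot_rows(toy_ca)
print("\n".join(rows))
```

Each cluster appears as a filled square on the diagonal. A domain assembled from non-contiguous segments, as described for E and F, would in addition produce off-diagonal blocks linking its segments.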
The single-segment R domain is inserted into a surface loop of the esterase domain, and the esterase domain is inserted into a surface loop near the top of the stem domain, F (Rosenthal et al., 1998). A search of the Dali structure database with the HEF enzyme domain, E, identified structural similarity to the esterase from Streptomyces scabies (Wei et al., 1995; Z score = 8.0, sequence identity 10%) and a brain acetylhydrolase (Z score = 8.3, sequence identity 13%; for details, see Rosenthal et al., 1998). The implications of the domain structure of HEF for both the evolution of membrane-fusion proteins and for the evolution of influenza A, B and C virus glycoproteins, as well as the HE (haemagglutinin-esterase) glycoprotein of coronaviruses, have been described elsewhere (Rosenthal et al., 1998).

Figure 19. Domain structure of HEF. (a) Go distance plot (Go, 1981) of the HEF monomer. E and F are composed of non-contiguous sequences labeled F1, F2 and F3, and E1, E′ and E2. (b) HEF monomer colored by domain: blue = R, green = E, red = F. The stem domain F is elongated in the monomer, but compact as a trimer (see Fig

We thank Mia Frayser and Richard Crouse for excellent technical assistance, one reviewer for suggesting additional molecular-replacement calculations and members of the Harrison–Wiley laboratory and the staff of the Cornell High Energy Synchrotron Source for assistance with data collection. XZ was supported by the Howard Hughes Medical Institute (HHMI). PBR was supported by Graduate Research Assistant Support from the Department of Molecular and Cellular Biology, Harvard University, as a graduate student and by a National Institutes of Health (NIH) training grant on the Molecular Basis of Viral Infectivity as a postdoctoral fellow.
This research was supported by the NIH, the Deutsche Forschungsgemeinschaft, the Medical Research Council (UK) and the HHMI. DCW is an investigator of the HHMI.