key: cord-255350-dmbl4emn
authors: Bonsor, Daniel A.; Beckett, Dorothy; Sundberg, Eric J.
title: Structure of the N-terminal dimerization domain of CEACAM7
date: 2015-08-25
journal: Acta Crystallographica Section F Structural Biology Communications
DOI: 10.1107/s2053230x15013576
sha:
doc_id: 255350
cord_uid: dmbl4emn

CEACAM7 is a human cellular adhesion protein that is expressed on the surface of colon and rectum epithelial cells and is downregulated in colorectal cancers. It achieves cell adhesion through dimerization of the N-terminal IgV domain. The crystal structure of the N-terminal dimerization domain of CEACAM7 has been determined at 1.47 Å resolution. The overall fold of CEACAM7 is similar to those of CEACAM1 and CEACAM5; however, there are differences, the most notable of which is an insertion that causes the C′′ strand to buckle, leading to the creation of a hydrogen bond in the dimerization interface. The dimerization constant (K_dimerization) of CEACAM7 determined by sedimentation equilibrium indicates binding that is tenfold tighter than that measured for CEACAM5. These findings suggest that the dimerization affinities of CEACAMs are modulated via sequence variation in the dimerization surface.

Carcinoembryonic antigen-related cell adhesion molecules (CEACAMs) belong to the immunoglobulin (Ig) family and are expressed differentially on the surfaces of cells (Gray-Owen & Blumberg, 2006; Tchoupa et al., 2014). There are 12 CEACAMs found in humans: CEACAM1, CEACAM3–CEACAM8, CEACAM16 and CEACAM18–CEACAM21 (Beauchemin & Arabzadeh, 2013; Tchoupa et al., 2014). Their functions are diverse and include roles in phagocytosis, hearing, proliferation, signaling, tumor suppression and cell adhesion (Oikawa et al., 1989, 1991; Benchimol et al., 1989; Streichert et al., 2001; Pils et al., 2008; Singer et al., 2010; Zheng et al., 2011). CEACAMs are typically dysregulated in cancer and are found to be parasitized by bacteria (e.g.
Neisseria meningitidis, Escherichia coli and Haemophilus influenzae) and, in mice, by viruses (e.g. coronavirus) during infection (Dveksler et al., 1991; Leusch et al., 1991; Bos et al., 1997; Schölzel et al., 2000; Virji et al., 2000; Duxbury et al., 2004; Litkouhi et al., 2008; Obrink, 2008; Singer et al., 2010). CEACAMs contain an N-terminal immunoglobulin variable domain (IgV), a variable number of immunoglobulin constant domains (IgC2) and either a C-terminal transmembrane and cytoplasmic domain or a glycosylphosphatidylinositol (GPI) moiety by which they are anchored to the plasma membrane (Tchoupa et al., 2014). Cell adhesion is achieved through the N-terminal domain of CEACAMs, which can undergo heterodimerization and homodimerization in a cis (on the same cell) or trans (across different cells) fashion (Taheri et al., 2000; Watt et al., 2001).

© 2015 International Union of Crystallography

CEACAM7 is expressed on highly differentiated epithelial cells of the colon and rectum and on the epithelial cells within the ducts of the pancreas (Schölzel et al., 2000). The expression pattern of CEACAM7 suggests a specialized function. In fetal tissues of the colon, CEACAM7 is located at the base of epithelial cells and has been found to migrate to the apical surface a few days after birth (Schölzel et al., 2000). CEACAM7 contains three domains: an N-terminal IgV domain, a single IgC2 domain and a cell-surface GPI anchor domain (Tchoupa et al., 2014). CEACAM7 expression is downregulated during the early development of colorectal tumors, unlike CEACAM5 or CEACAM6, which are typically upregulated, suggesting a tumor-suppression function (Schölzel et al., 2000).
It is currently unknown whether CEACAM7 mediates cell adhesion through homodimerization, or whether structural differences relative to the two known structures, those of CEACAM1 and CEACAM5, could allow CEACAM7 to function as a tumor-suppression molecule (Fedarovich et al., 2006; Korotkova et al., 2008). Here, we report the 1.47 Å resolution X-ray crystal structure of the N-terminal domain of CEACAM7 and characterize its oligomeric state in solution.

The N-terminal domain of CEACAM7 was synthesized as a codon-optimized GeneArt string (Life Technologies), which was digested and ligated into an NcoI/XhoI-cut pET-21d vector without a purification tag. CEACAM7 was expressed in inclusion bodies in E. coli BL21 (DE3) pLysS cells. Briefly, 1 l of culture was grown in LB Miller at 310 K until an OD600 of ~0.6 was attained, prior to induction with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were grown for a further 4 h before harvesting (5000g for 15 min at 277 K). The cells were resuspended in lysis buffer [50 mM Tris-HCl, 500 mM NaCl, 1%(v/v) Triton X-100 pH 7.5] and lysed by sonication. Inclusion bodies were isolated (20 000g for 20 min at 277 K), resuspended in lysis buffer, sonicated and isolated again by centrifugation (20 000g for 20 min at 277 K). Inclusion bodies were washed with a high-salt buffer (50 mM Tris-HCl, 1.0 M NaCl pH 8.0) to remove DNA, followed by lysis buffer without Triton X-100. Inclusion bodies were dissolved in 30 mM Tris-HCl, 150 mM NaCl, 8.0 M urea pH 8.3 (~5 ml per litre of grown cells), refolded by rapid dilution (1:12 ratio) at 277 K into 50 mM CHES-HCl, 500 mM L-arginine pH 9.2 and left for 24 h. Refolded CEACAM7 was dialyzed against 10 mM Tris-HCl pH 8.0 and concentrated by anion-exchange chromatography (Mono Q, GE Healthcare). A linear salt gradient from 0 to 1000 mM was run at 1 ml min⁻¹ over 15 min, with CEACAM7 eluting at between 50 and 100 mM NaCl.
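The refolding dilution and the Mono Q gradient described above reduce to simple proportions, which the following minimal sketch works through. It assumes an ideal linear gradient with no delay or column dead volume, and it treats the '1:12 ratio' as one part denatured protein to twelve parts refolding buffer; the source does not say whether a 12-fold total dilution was meant instead, so the ratio is left as a parameter. All function names are illustrative only.

```python
def time_at_concentration(conc_mM, start_mM=0.0, end_mM=1000.0, length_min=15.0):
    """Time (min) into an ideal linear gradient at which conc_mM is reached."""
    if not min(start_mM, end_mM) <= conc_mM <= max(start_mM, end_mM):
        raise ValueError("concentration lies outside the gradient")
    return (conc_mM - start_mM) * length_min / (end_mM - start_mM)


def diluted_concentration(stock, parts_sample=1.0, parts_buffer=12.0):
    """Concentration after mixing parts_sample of stock with parts_buffer of diluent.

    Assumes the diluent contains none of the solute (the refolding buffer
    listed in the text contains no urea).
    """
    return stock * parts_sample / (parts_sample + parts_buffer)


FLOW_ML_PER_MIN = 1.0  # flow rate quoted in the text

# Elution window of CEACAM7 (50-100 mM NaCl) expressed as time and volume.
window_min = (time_at_concentration(50.0), time_at_concentration(100.0))
window_ml = tuple(t * FLOW_ML_PER_MIN for t in window_min)

# Residual urea after refolding under the two possible readings of "1:12".
urea_one_plus_twelve = diluted_concentration(8.0, 1.0, 12.0)  # 1 part + 12 parts
urea_twelve_fold = diluted_concentration(8.0, 1.0, 11.0)      # 12-fold total dilution

print(f"elution window: {window_min[0]:.2f}-{window_min[1]:.2f} min "
      f"({window_ml[0]:.2f}-{window_ml[1]:.2f} ml into the gradient)")
print(f"residual urea: {urea_one_plus_twelve:.2f} M (1+12) "
      f"or {urea_twelve_fold:.2f} M (12-fold)")
```

Under either reading the residual urea is well below 1 M, and elution occurs within the first tenth of the gradient.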
CEACAM7 was further purified by size-exclusion chromatography (Superdex 200, GE Healthcare) in 50 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA pH 7.5 and fractions were stored at 277 K. Typically, 10 mg of refolded protein per litre was obtained with a refolding efficiency of 10%. Macromolecule-production information is summarized in Table 1.

CEACAM7 was concentrated using a Centricon centrifugal filter unit (10 kDa MWCO, Millipore) and subsequently dialyzed against 20 mM Tris-HCl, 100 mM NaCl pH 7.5. CEACAM7 at 5.9 mg ml⁻¹ was screened against The JCSG+ Suite (Qiagen) using a Crystal Gryphon Protein Crystallography System (Art Robbins Instruments) with sitting drops consisting of 150 nl protein solution and 150 nl reservoir solution equilibrated against 50 µl reservoir solution. A shower of small crystals grew over 5 d in condition D6.

Crystals of CEACAM7 were washed and cryoprotected in mother liquor containing 20%(v/v) glycerol. Data were collected on beamline 23-ID-B at the Advanced Photon Source (APS), Argonne National Laboratory, USA. Data were processed using HKL-2000 (Otwinowski & Minor, 1997). Data-collection and processing statistics are shown in Table 3.

F_obs were obtained using SCALEPACK2MTZ. Molecular replacement was performed using MOLREP (Vagin & Teplyakov, 2010) and a CHAINSAW (Stein, 2008) model of CEACAM5 (PDB entry 2qsq; Korotkova et al., 2008), a protein with 65% sequence identity to CEACAM7. CEACAM7 was refined with REFMAC (Murshudov et al., 2011) and rebuilt in Coot (Emsley et al., 2010). MolProbity (Chen et al., 2010) was used for Ramachandran analysis. Refinement statistics are shown in Table 4.

Sedimentation-equilibrium measurements of CEACAM7 were performed using a Beckman-Coulter XL-I analytical ultracentrifuge equipped with a four-hole An-60 Ti rotor at 20 °C. Prior to centrifugation, CEACAM7 was dialyzed extensively against 50 mM Tris-HCl pH 7.5, 50 mM NaCl.
SEDNTERP (http://sednterp.unh.edu) was used to calculate values for the protein partial specific volume and solvent density from the protein amino-acid sequence and buffer composition, respectively. CEACAM7 at three different concentrations (24.1, 14.5 and 9.6 µM) was loaded into cells equipped with six-hole charcoal-filled Epon centerpieces (1.2 cm path length) with sapphire windows. Centrifugation was carried out at 29 000, 32 000 and 35 000 rev min⁻¹ and scans were acquired at 280 nm with a step size of 0.001 and five averages per step. The data were globally analyzed using the WinNonLin program (Johnson et al., 1981).

A single X-ray diffraction data set was collected to a resolution of 1.47 Å. Initial indexing of the data suggested that the crystal contained a primitive orthorhombic lattice with two molecules in the asymmetric unit. However, attempts to find a molecular-replacement solution using the CEACAM5 monomer as a search model yielded no solutions that would refine in any of the orthorhombic space groups. The data were reprocessed in a primitive monoclinic lattice with a β angle of 89.99°, which led to the correct solution in space group P2₁ with four copies of the search model found in the asymmetric unit. The ⟨|L|⟩ test of the data (0.503) indicates that the data are untwinned; no pseudo-merohedral twinning was detected. All residues were modeled except for the N-terminal alanine of two of the four molecules in the asymmetric unit. The final model contained three chloride ions and 269 waters and was refined to an R_cryst and R_free of 0.143 and 0.193, respectively. The structure factors and model have been deposited in the Protein Data Bank (PDB entry 4y89).

The closest homologs of CEACAM7 in the PDB are CEACAM1 and CEACAM5, which both share 65% sequence identity with CEACAM7, with 38 residues differing between the proteins (Fig. 1a).
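The sequence-identity figure quoted above is a ratio of matching to compared positions in an alignment. The sketch below shows one common convention, identity over columns in which neither sequence has a gap, and back-calculates the aligned length implied by 38 differences at 65% identity. The two sequence fragments are invented for illustration and are not CEACAM sequences; the source does not state which identity convention or alignment length was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def implied_aligned_length(n_differences, identity_percent):
    """Number of compared positions implied by a difference count and an identity."""
    return n_differences / (1.0 - identity_percent / 100.0)


# Hypothetical aligned fragments, for illustration only.
toy_a = "KLTIESTPFNVAEG"
toy_b = "QLTTESMPFNVA-G"
toy_identity = percent_identity(toy_a, toy_b)

# Figures quoted in the text: 38 differing residues at 65% identity.
length_estimate = implied_aligned_length(38, 65.0)

print(f"toy identity: {toy_identity:.1f}%")
print(f"implied aligned length: {length_estimate:.0f} residues")
```

The implied length of a little under 110 residues is what would be expected for a single IgV domain, so the two quoted numbers are mutually consistent under this convention.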
The overall fold of CEACAM7 is similar to those of the other CEACAMs that have been determined previously. The overall topology is that of the V-set fold of the immunoglobulin superfamily, composed of two β-sheets labeled ABED and A′GFCC′C′′. The sheets are connected by the BC, EF, C′′D and AA′ loops (Fig. 1b). The r.m.s.d.s of the CEACAM7 molecules to each other are low (~0.30 Å), indicating no significant structural differences within the asymmetric unit. The four molecules of CEACAM7 form two pairs of dimers (Fig. 1c). The dimer interface is formed from the second β-sheet, A′GFCC′C′′, specifically the GFCC′C′′ strands and the CC′, C′C′′ and FG loops (Fig. 1c). Dimerization buries 1610 Å² of solvent-accessible surface area as calculated by PISA (Krissinel & Henrick, 2007). This is similar to the CEACAM1 and CEACAM5 homodimers, which bury 1600 and 1460 Å² of solvent-accessible surface area, respectively. The shape-complementarity value (Sc; Lawrence & Colman, 1993) is 0.68, which is smaller than those for the other CEACAMs, with values of 0.81 and 0.72 for CEACAM1 and CEACAM5, respectively. CEACAM7 forms nine hydrogen bonds in the dimerization interface. This is more than CEACAM5 (six hydrogen bonds) but fewer than CEACAM1 (16 hydrogen bonds). Of the 38 residues in CEACAM7 that differ from CEACAM1 and CEACAM5, eight are found in the dimerization interface.

Figure 1(d), partial caption: [...] in 50 mM Tris-HCl, 50 mM sodium chloride pH 7.5 with rotor speeds of 29 000, 32 000 and 35 000 rev min⁻¹ (blue, red and green curves, respectively). Bottom, residuals of fitted data for each curve.

The dimerization constant of CEACAM5 was determined previously to be 0.8 µM by analytical ultracentrifugation (Korotkova et al., 2008). CEACAM1 does dimerize, but also forms high-molecular-weight oligomers (Korotkova et al., 2008). The dimerization constant of CEACAM7 was estimated by sedimentation-equilibrium analysis using analytical ultracentrifugation (Fig. 1d).
An estimated average molecular weight of 24.7 ± 0.7 kDa (the theoretical monomer molecular weight is 12 574 Da) and a K_dimerization of 95 nM (+20/−60 nM) were measured, a tenfold increase in affinity when compared with CEACAM5. We observed no higher molecular weight oligomers for CEACAM7 other than the dimer.

The CEACAM7 dimer (A + B) was superposed onto the CEACAM5 dimer (A + B) using one half of each dimer (molecule A), and two r.m.s.d.s were calculated: 0.67 Å for the superposed half (molecule A) and a larger value of 2.70 Å for the second half (molecule B; Fig. 2a). Superposition of CEACAM7 onto the CEACAM1 dimer using the same method reveals similar r.m.s.d.s of 0.83 Å for molecule A and 2.26 Å for molecule B (Fig. 2b). Closer comparison of CEACAM7 with CEACAM5 and CEACAM1 shows two major regions of deviation. The first is the BC loop (residues 23–29), which is not involved in the dimerization interface (Fig. 2c). This region is highly conserved among members of the CEACAM family; however, positions 25–26 of CEACAM7 differ from those of the other CEACAMs. In all other CEACAMs these residues are Leu25 and Pro26, whereas in CEACAM7 they are Glu25 and Ser26. The loss of Pro26 is likely to reduce the rigidity of the loop. The r.m.s.d. of this loop in CEACAM7 is 3.10 Å relative to CEACAM5, and the displacement of this loop causes a slight movement of the N-terminal β-strand. Notably, the preceding residue is Asn24, an N-linked glycosylation site found only in CEACAM7 and CEACAM4. Three other potential glycosylation sites are present in CEACAM7 (Asn52, Asn72 and Asn79). These are highlighted in Fig. 2(d), showing that none are found in the dimerization interface. Glycosylation of CEACAM5 has been shown to be important for interaction with CD8 (Roda et al., 2014) and therefore may also be important for CEACAM7 function.
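The dimerization constants reported earlier in this section can be put on a common footing with a simple monomer–dimer model. The sketch below is not the global WinNonLin fit used in the study. It assumes that K_dimerization is a dissociation constant defined as [M]²/[D], that the loading concentrations (read as 24.1, 14.5 and 9.6 µM) are expressed in monomer units, and that the CEACAM5 value is 0.8 µM, the reading consistent with the stated tenfold difference. Function and constant names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1
TEMPERATURE = 293.15        # K; the 20 degC rotor temperature given in the methods
MONOMER_MASS_DA = 12574.0   # theoretical monomer mass quoted in the text

KD_CEACAM7 = 95e-9          # M; value reported in the text
KD_CEACAM5 = 0.8e-6         # M; assumed reading of the previously published value


def monomer_concentration(total, kd):
    """Free monomer (M) for 2M <-> D, with kd = [M]^2/[D] and total = [M] + 2[D]."""
    if total <= 0 or kd <= 0:
        raise ValueError("total and kd must be positive")
    # Positive root of 2[M]^2 + kd[M] - kd*total = 0.
    return (-kd + math.sqrt(kd * kd + 8.0 * kd * total)) / 4.0


def fraction_in_dimer(total, kd):
    """Fraction of all protomers that are part of a dimer."""
    return 1.0 - monomer_concentration(total, kd) / total


def weight_average_mass(total, kd, monomer_mass=MONOMER_MASS_DA):
    """Weight-average mass (Da) of the monomer-dimer mixture."""
    return monomer_mass * (1.0 + fraction_in_dimer(total, kd))


# Difference in dimerization free energy, CEACAM7 relative to CEACAM5 (kJ/mol).
delta_delta_g_kj = -GAS_CONSTANT * TEMPERATURE * math.log(KD_CEACAM5 / KD_CEACAM7) / 1000.0
print(f"ddG(CEACAM7 - CEACAM5) = {delta_delta_g_kj:.2f} kJ/mol")

for loading_uM in (24.1, 14.5, 9.6):
    total = loading_uM * 1e-6
    print(f"{loading_uM:5.1f} uM: "
          f"dimer fraction CEACAM7 {fraction_in_dimer(total, KD_CEACAM7):.3f}, "
          f"CEACAM5 {fraction_in_dimer(total, KD_CEACAM5):.3f}, "
          f"CEACAM7 weight-average mass {weight_average_mass(total, KD_CEACAM7):.0f} Da")
```

Under these assumptions the difference between the two constants corresponds to about 5 kJ mol⁻¹ of additional binding free energy, and more than 90% of CEACAM7 protomers are dimeric at every loading concentration, which is in line with the measured average molecular weight lying close to twice the monomer mass.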
Figure 2. (a) Superposition of the CEACAM7 dimer (CEA7A and CEA7B; red and pink, respectively) onto the CEACAM5 dimer (CEA5A and CEA5B; light and dark cyan, respectively) through molecule A. (b) Superposition of the CEACAM7 dimer (CEA7A and CEA7B; red and pink, respectively) onto the CEACAM1 dimer (CEA1A and CEA1B; blue and lilac, respectively) through molecule A. (c) Alignment of CEACAM7 and CEACAM5 monomers, colored by r.m.s.d. Dark blue is low r.m.s.d. and red is high r.m.s.d. CEACAM1 is omitted for clarity. (d) Potential N-linked glycosylation sites of the CEACAM7 dimer. All residues are solvent-exposed and are not found in the dimerization interface.

The second region of deviation between CEACAM7, CEACAM5 and CEACAM1 is the C′C′′ loop and the C′′ strand. CEACAM7 is unique compared with other CEACAMs, as the C′C′′ loop contains a single amino-acid insertion (isoleucine) between residues 52 and 53. To accommodate the insertion in the C′C′′ loop without altering the length of the C′C′′ or the C′′D loops, the C′′ strand is found to be distorted relative to CEACAM5 (Fig. 3a) and CEACAM1 (Fig. 3b). The C′′ strand bulges in the center (residues 56–60), causing breakages of hydrogen bonds in the antiparallel β-sheet between the C′ and C′′ strands. The C′′ strand of CEACAM7 is held in place through a hydrogen bond from the O "2 atom of Asn57 to the N atom of Gly48 (Fig. 3c). In CEACAM5 the carbonyl group of Thr57 forms the hydrogen bond to Gly48 (Fig. 3d). Although the C′′ strand does not buckle in CEACAM1, as in CEACAM5, it is found that Thr57 does form a hydrogen bond across the dimerization interface to Asp95 (Fig. 3e).
This buckling of the C′′ strand creates an extra interaction in the dimerization interface, potentially explaining why CEACAM7 forms such tight dimers and has not been found to form heterodimeric CEACAM complexes such as CEACAM6–CEACAM8, CEACAM1–CEACAM8, CEACAM1–CEACAM6, CEACAM3–CEACAM6 and CEACAM5–CEACAM6 (Oikawa et al., 1991; Skubitz & Skubitz, 2008; Singer et al., 2014). Although the C′′ strand of CEACAM1 does not buckle, CEACAM1 too creates an extra interaction across the dimerization interface through reorientation of Asp95. This also suggests that this hydrogen bond is important for a higher affinity interaction.

The structure of CEACAM7 reveals that the dimerization interface is formed by the same face as in other CEACAMs (GFCC′C′′) yet can accommodate 16 differing residues (eight from each monomer), suggesting that these sequence differences can modulate homodimerization to achieve a tenfold increase in affinity.

Figure 3. (a) The polypeptide backbone of the C′ and C′′ strands of CEACAM7 (red) and CEACAM5 (cyan). (b) The polypeptide backbone of the C′ and C′′ strands of CEACAM7 (red) and CEACAM1 (blue). (c) The insertion of Ser54 in CEACAM7 (red) results in breakage of the main-chain antiparallel hydrogen bonds and replacement with a hydrogen bond from the side chain of Asn57. This residue also forms a hydrogen bond across the dimer interface to Asn96 (brown). (d) In CEACAM5 (green), only the main-chain antiparallel hydrogen bonds exist. Thr57 is too short to form a hydrogen bond across the dimer interface to Asp95 (purple). (e) However, this is not the case for CEACAM1 (blue). Thr57 forms a hydrogen bond across the dimer interface to Asp95 (gray).

We thank the staff of the Advanced Photon Source (APS) GM/CA CAT beamline 23-ID-B for their support. The authors declare no competing financial interests.