key: cord-0305492-yy73cxrr authors: Mylemans, Bram; Killian, Theo; Vandebroek, Laurens; Van Meervelt, Luc; Tame, Jeremy R.H.; Parac-Vogt, Tatjana N.; Voet, Arnout R.D. title: Crystal structures of Scone, pseudosymmetric folding of a symmetric designer protein date: 2021-04-12 journal: bioRxiv DOI: 10.1101/2021.04.12.439409 sha: c2bb8bee72bc3215c482ca826ed026cb720a4360 doc_id: 305492 cord_uid: yy73cxrr

Recent years have seen a rise in the development of computationally designed proteins, including symmetric ones. We recently developed a nine-fold symmetric β-propeller protein named Cake. Here we aimed to re-engineer this protein into a three-fold symmetric nine-bladed propeller using computational design. Two nine-bladed propeller proteins were designed, named Scone-E and Scone-R. Crystallography, however, revealed that both designs adopt an eight-fold conformation with distorted termini, yielding a pseudo-symmetric protein. One of the proteins could only be crystallized upon addition of a polyoxometalate, highlighting the usefulness of these molecules as crystallisation additives.

Keywords: Protein design; polyoxometalate; β-propeller; symmetry

Protein design has come a long way since its emergence in the late 1980s, when the first artificial proteins, consisting of helical bundles, were created by the DeGrado group Ho and DeGrado [1987], Korendovych and DeGrado [2020]. Amino acid selection at that time was guided purely by physicochemical principles, since the database of known protein structures was still comparatively small. As the understanding of protein folding improved, new computational techniques (such as side-chain repacking) were developed that refined the modelling and design procedure. These algorithms automated the process Dahiyat and Mayo [1996], allowing a small natural protein domain to be repacked without human input Dahiyat and Mayo [1997].
As many more protein structures became available through the Protein Data Bank, fragment-based methods could be developed that eventually led to the first protein with a novel topology, called TOP7 Kuhlman et al. [2003]. Structure databases continued to expand as computing and gene synthesis became faster and cheaper, allowing the field to grow steadily. With increased efforts to relate protein sequence to structure in a predictive fashion, and the creation of freely available software such as the Rosetta suite Leaver-Fay et al. [2011], protein design evolved from a purely academic field into application-driven development. Recent successes include protein logic gates for gene regulation, a vaccine for respiratory syncytial virus and a protein inhibitor of SARS-CoV-2 Chen et al. [2020], Sesterhenn et al. [2020], Cao et al. [2020]. Many different globular protein folds have been targeted for (re)design, but monomeric folds with internal symmetry have been especially popular, including the β-trefoil fold Lee and Blaber [2011], TIM-barrels Huang et al. [2016] and β-barrels Dou et al. [2018]. The use of symmetry reduces the sequence search space during the design step, as the protein can be assembled from identical repeats. Symmetry can also assist in the bottom-up design of more complex structures such as capsids, arrays or frameworks Hsia et al. [2016], Gonen et al. [2015]. Another protein fold with internal symmetry is the β-propeller fold, which consists of small β-sheets (each with four strands) arranged around a central channel. Natural propeller proteins contain between four and ten repeats of this motif, also called a blade Fülöp and Jones [1999]. The first successful design of a β-propeller was the six-bladed Pizza protein Voet et al. [2014], which has six identical repeats. Interestingly, this protein could self-assemble from two- or three-bladed fragments, as a trimer or dimer respectively.
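As a rough illustration of the search-space argument, the short Python sketch below compares the number of possible sequences for a chain with and without internal symmetry. It assumes, for simplicity, that all 20 canonical amino acids are allowed at every position and ignores rotamers and backbone flexibility; the 360-residue chain (nine blades of about 40 residues) is an illustrative size, not a count taken from the designs. With k identical repeats only n/k positions are chosen independently, so the count falls from 20^n to 20^(n/k).

```python
import math

def log10_sequence_space(n_residues: int, n_repeats: int = 1, alphabet: int = 20) -> float:
    """log10 of the number of sequences for a chain of n_residues built from
    n_repeats identical repeats, with `alphabet` choices per unique position."""
    if n_residues % n_repeats != 0:
        raise ValueError("chain length must be a multiple of the repeat count")
    unique_positions = n_residues // n_repeats  # only one repeat is designed
    return unique_positions * math.log10(alphabet)

# Illustrative size only: nine blades of roughly 40 residues each.
free = log10_sequence_space(360)
three_fold = log10_sequence_space(360, n_repeats=3)
nine_fold = log10_sequence_space(360, n_repeats=9)
print(f"no symmetry: 10^{free:.0f}, three-fold: 10^{three_fold:.0f}, nine-fold: 10^{nine_fold:.0f}")
```

Run as written, it prints `no symmetry: 10^468, three-fold: 10^156, nine-fold: 10^52`; working in log10 avoids handling integers with hundreds of digits.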
The Pizza protein was further functionalized to biomineralize a cadmium chloride nanocrystal Voet et al. [2015], to form multimeric assemblies by incorporation of coiled-coils Vrancken et al. [2020], to bind inorganic polyoxometalates (POMs) that control the crystal packing Vandebroek et al. [2020], and to act as a novel hydrolase Clarke et al. [2019]. These examples show the potential of the β-propeller fold as a functional building block for a variety of materials. Other symmetric proteins have been designed, such as the eight-fold symmetric Tako8 protein and its four-fold variant Ika8. The Tako8 protein is unable to assemble from smaller repeat fragments because of repulsion between the charged blades. To overcome this effect, the protein was redesigned with alternating repeats carrying compensating charges. This restored the ability to self-assemble and improved overall stability. Most recently we designed the Cake protein, which can adopt eight-fold or nine-fold symmetry depending on the number of repeats expressed. While this structural plasticity has shed light on the evolutionary mechanisms that yield diverse repeat numbers in natural β-propellers, it may lead to unexpected results during further functionalization of the Cake protein as a building block Mylemans et al. [2020a]. We therefore set out to redesign Cake9 into a three-fold symmetric nine-bladed protein. The design started from the protein with PDB entry 3hxj, the same protein that was used in the design of the Cake protein Mylemans et al. [2020a]. A three-repeat fragment was isolated, and from this fragment a three-fold symmetric template was created using the SymDock protocol from Rosetta André et al. [2007]. The order of repeats four to seven was permuted in all possible ways to create triple repeats. As one repeat is two amino acids shorter than the others, variants of all repeats were made with the shorter length.
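The text does not spell out how the triple repeats were enumerated. One reading that reproduces the count of 64 sequences reported below is that every ordered triple of the four template blades (4 to 7) was taken with repetition allowed, since 4³ = 64. The sketch below builds those triples under that assumption; the blade strings are hypothetical placeholders, not the real 3hxj blade sequences, and the shorter-length variants are not modelled.

```python
from itertools import product

# Hypothetical placeholder strings standing in for template blades 4-7 of 3hxj.
BLADES = {4: "BLADE4", 5: "BLADE5", 6: "BLADE6", 7: "BLADE7"}

def triple_repeats(blades):
    """Return (order, sequence) for every ordered triple of blades, repeats allowed."""
    return [
        (order, "".join(blades[b] for b in order))
        for order in product(sorted(blades), repeat=3)
    ]

triples = triple_repeats(BLADES)
print(len(triples))    # 64, i.e. 4**3
print(triples[1][0])   # (4, 4, 5)
```

Permutations without repetition (`itertools.permutations` with length 3) would give only 24 triples, which is why the with-repetition reading is the one consistent with the reported count.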
The resulting 64 sequences were aligned and the corresponding phylogenetic tree was calculated. The alignment and tree were used to generate 16,000 putative ancestral consensus repeat sequences using the FastML server Ashkenazy et al. [2012]. These ancestral sequences were mapped onto the symmetric template using a custom PyRosetta script Voet et al. [2017]. After a short relaxation with the FastRelax protocol from Rosetta, a ref2015 energy score was assigned to each sequence. The RMSD from the ideal symmetric backbone was also calculated. Two repeat sequences were selected for experimental validation: the sequence with the lowest energy score (Scone-E) and the sequence with the lowest RMSD (Scone-R). The amino acid sequences were reverse translated, codon-optimized for E. coli and ordered as synthetic DNA fragments (gBlocks, IDT); they were then cloned into a pET-28b(+) vector and transformed into the BL21(DE3) strain for protein expression. After induction with IPTG (0.5 mM), the bacteria were grown overnight at 20 °C and pelleted. The proteins were purified following the same protocol as the Cake protein Mylemans et al. [2020a]. Cells were lysed by resuspension in 30 mL buffer 50 mM NaH2PO4, pH 8, Table S2. Crystals were cryoprotected using the crystallization liquor with added glycerol and flash frozen in liquid nitrogen. X-ray diffraction data were ob- An anomalous map could be created from the diffraction measured at 1 Å. The restraints for the lacunary Keggin POM are based on the ligand structure of the parent silicotungstic acid Keggin (three-letter identifier SIW). The structure and restraints were edited using REEL Moriarty et al. [2017], removing one W-O moiety to form the lacuna, and the resulting CIF was used for structure refinement to describe the lacunary silicotungstic Keggin (three-letter identifier LKE). Diffraction and refinement statistics can be found in Table 1.
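The selection step can be sketched as follows, under stated assumptions. The text does not say how the RMSD to the ideal symmetric backbone was computed; here we assume a Kabsch superposition of matching backbone coordinates before measuring the deviation (if the relaxed models stay in the template frame, a plain coordinate RMSD would be the simpler choice). The candidate names and scores are hypothetical placeholders, not values from the study; the point is only that Scone-E and Scone-R are two different minima taken over the same candidate list.

```python
import numpy as np

def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    p_c = p - p.mean(axis=0)                   # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                            # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # proper rotation mapping p onto q
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))

def pick_designs(candidates):
    """candidates: (name, energy, rmsd) tuples.
    Returns (name with lowest energy, name with lowest RMSD)."""
    candidates = list(candidates)
    lowest_energy = min(candidates, key=lambda c: c[1])[0]
    lowest_rmsd = min(candidates, key=lambda c: c[2])[0]
    return lowest_energy, lowest_rmsd

# Sanity check: a rotated and translated copy superposes back with ~0 RMSD.
rng = np.random.default_rng(0)
template = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = template @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(moved, template), 6))  # 0.0

# Hypothetical (name, energy in REU, RMSD in Å) tuples, for illustration only.
scores = [("seq_001", -812.4, 1.10), ("seq_002", -830.9, 0.95), ("seq_003", -801.7, 0.62)]
print(pick_designs(scores))  # ('seq_002', 'seq_003')
```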
Our design method is based on ancestral sequence reconstruction and a symmetrical template, and was previously used to create the Pizza, Tako and Cake proteins Voet et al. [2017]. A schematic overview is given in Fig. 1. We started from the same four blades of the natural template (PDB entry 3hxj) that were used for the Cake protein design Mylemans et al. [2020a]. Following the modified strategy used to design Ika8, models with three repeats were used to construct a three-fold symmetric protein, instead of a single repeat with nine-fold symmetry. Two options were considered for generating the ancestral sequences. Single-repeat ancestral sequences could be created from an alignment of the four selected blades and then mixed in all combinations to create triple-length repeats. However, this would generate over ten billion sequences, which would be too computationally expensive. Instead, the order of these four repeats was shuffled and a phylogenetic tree was created from the resulting 64 combinations. This tree was used to generate ancestral sequences, which were then mapped onto the symmetric template. After a short relaxation with the Rosetta relax protocol, each model was scored with the ref2015 energy function and its RMS deviation from the symmetrical template was calculated. The sequences with the best energy score (Scone-E) or lowest RMSD (Scone-R) were chosen for experimental validation. Each polypeptide carries three identical repeats of a motif, which itself consists of three similar sequences (about 40 residues long) in tandem. The sequences (presented in Table S1) both show roughly 75% sequence identity to the starting template model (PDB entry 3hxj), and 90% identity to each other. Both designs expressed well, and high purity was achieved in two steps using nickel affinity and gel filtration columns.
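Sequence identities such as the 75% and 90% quoted above depend on the alignment and on how gaps are counted. The sketch below assumes two sequences that are already aligned to equal length, skips gap columns, and reports the percentage of identical residues over the remaining columns. This convention is one of several in use and is our assumption, not necessarily the one applied in the study; the example sequences are arbitrary.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped; identity is the
    share of identical residues among the remaining columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)

print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

Dividing by the full alignment length instead of the gap-free columns would lower the value whenever gaps are present, so the denominator should be reported alongside any identity figure.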
Analytical size exclusion chromatography indicated that the hydrodynamic volumes of both proteins are similar, but smaller than those of Cake8 or Cake9 (Fig. 2). The CD spectra of Scone-E and Scone-R are identical to that of Cake9, indicating folded proteins with similar secondary structure. Thermal unfolding measurements showed that both proteins are less stable than Cake9, with a melting temperature between 70 °C and 80 °C (Fig. 2). Scone-R readily crystallized in space group P2₁, allowing data to be collected to a resolution of 1.5 Å, but Scone-E did not yield crystals under any conditions tested in screening. Scone-E has a highly positively charged central cavity that could lead to intermolecular repulsion, so we attempted to co-crystallize Scone-E with a negatively charged polyoxometalate cluster (POM). POMs have previously been observed to assist the crystallization of various proteins. Anderson-Evans species are the most commonly used Bijelic and Rompel [2017], Aengus et al. [2018], but the phosphotungstic acid Keggin has also been used as a co-crystallization and phasing agent Almo et al. [2007], Ren et al. [2017]. We chose the Keggin-type silicotungstic acid (STA) because its symmetry matches the three-fold symmetry of the central channel, and it is more stable than other Keggin POMs at physiological pH Zhu et al. [2003], Bajuk-Bogdanović et al. [2015]. Addition of STA to Scone-E yielded three different crystal forms, all of which diffracted X-rays to resolution limits between 1.6 Å and 2.2 Å. Crystals in space group P2₁ gave the highest-resolution data, but were distinctly non-isomorphous with the Scone-R crystals (see Fig. S1). Molecular replacement was initially attempted with the designed models, but this was unsuccessful. Different Cake variants were then tested as search models, and the eight-bladed Cake8 protein proved similar enough to give solutions for both Scone-E and Scone-R.
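The melting temperatures are reported only as a range (70 °C to 80 °C), and the analysis behind them is not described here. As one common rough estimate, sketched under that assumption, Tm can be read off as the temperature at which the normalized unfolding signal crosses its midpoint. The function below rescales the signal between its first (folded) and last (unfolded) values and interpolates linearly; the curve is synthetic, with a midpoint placed at 75 °C purely for illustration. Fitting a two-state model with sloping baselines is the more rigorous alternative.

```python
import numpy as np

def midpoint_tm(temps, signal):
    """Estimate Tm as the temperature where the normalized signal first reaches 0.5.

    The signal is rescaled using its first (assumed folded) and last (assumed
    unfolded) values, so rising and falling curves are both handled; the
    crossing is located by linear interpolation between neighbouring points.
    """
    t = np.asarray(temps, dtype=float)
    y = np.asarray(signal, dtype=float)
    if y[-1] == y[0]:
        raise ValueError("no unfolding transition: first and last values are equal")
    frac = (y - y[0]) / (y[-1] - y[0])    # 0 = folded, 1 = unfolded
    above = np.nonzero(frac >= 0.5)[0]
    if above.size == 0 or above[0] == 0:
        raise ValueError("midpoint not crossed inside the scanned range")
    i = above[0]
    t0, t1, f0, f1 = t[i - 1], t[i], frac[i - 1], frac[i]
    return float(t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0))

# Synthetic sigmoid with its midpoint at 75 degrees C (illustration only).
temps = np.arange(20.0, 96.0, 1.0)
signal = 1.0 / (1.0 + np.exp(-(temps - 75.0) / 2.0))
print(round(midpoint_tm(temps, signal), 2))  # 75.0
```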
Every crystal form was found to have a single copy of the protein in the asymmetric unit, and refinement proceeded smoothly in each case (parameters are given in Table 1). The models show no remarkable geometrical deviations, with three Ramachandran outliers in Scone-E. Ser34 and Ser152 were found to have slightly unusual backbone geometry in every model. These residues lie in loop regions between strands, with fewer bonds to neighbouring residues and more flexibility. Scone-E was found bound to STA in two crystal forms (called a and b); no POM was found in the P2₁ crystal form. Scone-R and Scone-E overlay closely, with Cα RMSD between 0.92 and 1.23 Å over 309 ordered residues. The lowest value was found with the crystal form without STA, but the distortions due to the POM molecules are small and localised. The final electron density maps unexpectedly revealed that instead of the nine-bladed design, both proteins adopt an eight-bladed architecture, with the remainder of the chain invisible in the density. A side-by-side comparison of the designs and experimental crystal structures is shown in Fig. 3. In the majority of β-propeller proteins a 1-3 "Velcro" closure is observed, in which one N-terminal strand on the outside complements the inner three C-terminal strands of one blade. This configuration is present in Cake, and Scone was designed to share the same feature. In the crystal structures, however, it is not observed, and the ends of the ordered part of the protein chain are found in different blades (Fig. 4). While the "non-Velcro" configuration does occur in natural propeller proteins, it is rare and has only been reported for the family of prolyl oligopeptidases Rea and Fülöp [2006]. The absence of strand exchange between blades is even more surprising in light of recent studies on a circularly permuted Pizza6 protein, which showed this arrangement to be the least stable Mylemans et al. [2020b].
Scone-R and Scone-E were intended to form nine-bladed propellers with three-fold symmetry, repeating a three-bladed ABC motif (see Fig. S2 and S3). The same C-A interface was expected to form between the first and last blades as between the C and A blades adjacent in the linear protein sequence. Both proteins, however, formed an eight-bladed propeller missing the final C blade, creating a new B-A interface between the first and last (8th) blades of the propeller. We submitted the sequences to modern protein structure prediction servers, excluding the structures of all Cake proteins and of the template protein (PDB entry 3hxj) to avoid bias from prior knowledge. Both I-TASSER Yang et al. [2015] and EVcouplings Hopf et al. [2018] predicted an eight-bladed propeller, but they also did so for the sequence of Cake9. This suggests that they are biased towards the much more common eight-bladed fold and are not yet able to distinguish between eight- and nine-bladed propellers. AI-based algorithms such as AlphaFold Senior et al. [2020] might give an improved prediction. Using the Rosetta interface energy protocol Bazzoli et al. [2017] and ref2015, the same scoring function used for the design, we calculated the energy of each inter-blade interface in the expected models and crystal structures (see Table 2). In the design models, the energy of the C-A interface is noticeably higher than those of the A-B and B-C interfaces, giving an uneven energy distribution around the propeller ring (see Fig. S4). This is not observed in the crystal structures. The missing blade changes the overall arrangement of the blades within the propeller, slightly rearranging the interactions between all blades and leading to very similar interface energies. The proteins therefore adopt a pseudo-symmetric eight-bladed structure that is more rounded (when viewed along the symmetry axis) than the intended propeller, which had a slightly triangular shape.
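The bookkeeping behind the ring closure can be made explicit with a small sketch. Assuming the blades follow the sequence motif A, B, C in order and that each blade packs against the next, with the last blade closing the ring onto the first, the function below lists the interface types around a ring of a given size. For nine blades every interface is one of the three designed types; dropping the final C blade leaves eight blades whose closing interface becomes B-A, as described above.

```python
def ring_interfaces(n_blades, motif="ABC"):
    """Interface labels around a closed propeller whose blades repeat `motif`.

    Blade i packs against blade i + 1, and the last blade closes the ring
    against the first one.
    """
    blades = [motif[i % len(motif)] for i in range(n_blades)]
    return [f"{blades[i]}-{blades[(i + 1) % n_blades]}" for i in range(n_blades)]

print(ring_interfaces(9))  # designed: A-B, B-C, C-A repeated three times
print(ring_interfaces(8))  # observed: ['A-B', 'B-C', 'C-A', 'A-B', 'B-C', 'C-A', 'A-B', 'B-A']
```

Scoring each listed interface separately, as done with the Rosetta interface energy protocol, is what exposes the imbalance: in the nine-bladed design the three C-A interfaces differ from the others, whereas the eight-bladed ring has no repeated closing type.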
Hints of strain are apparent in the design models. Average residue energy scores were calculated to be -3.0 Rosetta Energy Units (REU) for the designed nine-bladed symmetric models, versus a more favourable -3.5 REU for the eight-bladed pseudo-symmetric structures. In comparison, the successfully designed Cake9 and Cake8 both have a score of -3.7 REU. Clearly the designed nine-bladed propeller model was not the lowest-energy configuration available to the protein. In practice, one ninth of the polypeptide remained unfolded and did not contribute any stabilising energy to the structure. One explanation is that the blades of the template protein (PDB entry 3hxj) inherently favour an eight-bladed organisation; the amino acids necessary to enforce nine-fold symmetry, by stabilising the transition interface between the C and A motifs, may therefore be absent from the sample of ancestral sequences. However, the structure of Cake9, which was derived from the same template, suggests that any such bias is unlikely to be very strong. The failure of the design is probably due to the backbone template, which formed a notably distorted propeller with a triangular appearance. This resulted in unfavourable subunit interfaces in the trimeric design, leading to a very uneven distribution of energy between the blades, as well as a high overall energy. These unfavourable interfaces might be caused by larger deviations from the average blade orientation. This suggests that propeller blades might have evolved to stay within limits of the average, allowing them to fit between specific neighbours and maintain an overall average curvature. This might be one of the reasons why only single blades could be identified as repeat units in a bioinformatic study of β-propeller sequences Chaudhuri et al. [2008]. Scone-E could only be crystallized in the presence of STA, and the polyoxometalate could be located in the electron density maps of forms a and b.
The tungsten ions were identified by strong peaks in the anomalous scattering map (see Fig. S5). The additive is found at three different positions on the protein surface and is therefore unlikely to bind tightly to any one site, in contrast to the strong interactions observed between Pizza6-S and STA Vandebroek et al. [2020]. Instead, in crystal form a (grown at pH 7.5) it is bound to the unstructured loops, stabilizing a fragment of the final disordered repeat. In all other crystal structures these termini are too disordered to be seen in the electron density. In crystal form b (grown at pH 6.0), two STA molecules are bound to highly positively charged patches on the protein surface: one inside the protein cavity and one between neighbouring propellers, close to a crystallographic two-fold axis and therefore modelled at half occupancy. Both crystal structures with STA bound to the protein are in space groups with higher symmetry than the structures without, owing to the crystal contacts formed by the POM (Fig. 5). The symmetrical nature of POMs and their ability to bind positively charged surface side chains allow them to facilitate crystallization. While STA was required to grow the monoclinic crystal form of Scone-E, it is not found in the final structure, despite the enormous scattering power of its metal atoms. Apparently STA can help bring the protein molecules together without becoming bound to them itself. Interestingly, while previous crystal structures showed STA in its non-lacunary state, here the POM in form b is observed as a monolacunary species with a sodium ion substituting for the tungsten (see Fig. 6). This is not surprising, as Keggin species are known to be pH sensitive. FT-IR measurements have shown that the parent silicotungstic acid Keggin is highly stable at acidic pH, from pH 1 to 6, but starts to show signs of decomposition in aqueous solution from pH 6.4 upward Bajuk-Bogdanović et al. [2015].
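Locating the tungsten atoms from anomalous peaks amounts to finding map values far above the noise level. The sketch below is a deliberately crude, NumPy-only stand-in for proper peak searching in crystallographic software: it expresses a map in units of its standard deviation and returns the grid points above a cutoff. The 5.5 sigma default mirrors the contour level quoted for the anomalous map in Fig. S5; the synthetic map and the three planted heavy-atom sites are hypothetical.

```python
import numpy as np

def peaks_above_sigma(density, cutoff=5.5):
    """Grid indices of points more than `cutoff` standard deviations above the map mean."""
    rho = np.asarray(density, dtype=float)
    z = (rho - rho.mean()) / rho.std()    # express the map in sigma units
    return [tuple(int(i) for i in idx) for idx in np.argwhere(z > cutoff)]

rng = np.random.default_rng(1)
grid = rng.normal(0.0, 1.0, size=(32, 32, 32))      # synthetic noise map
for site in [(5, 5, 5), (16, 20, 9), (27, 3, 14)]:  # hypothetical heavy-atom sites
    grid[site] += 40.0
print(peaks_above_sigma(grid))  # [(5, 5, 5), (16, 20, 9), (27, 3, 14)]
```

Real workflows operate on map files with proper unit-cell geometry and merge neighbouring grid points into single peaks, which this sketch does not attempt.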
The local environment of the protein, with locally shifted pKa values, can be expected to further influence the local pH and either protect against or promote POM decomposition. This gives rise to the observation of different POM species at different locations on the protein surface in the same crystal structure. In conclusion, the two novel designed proteins presented here, Scone-E and Scone-R, adopt a different backbone architecture from the one intended. Like octarellin Figueroa et al. [2016], which was designed as a TIM-barrel but produced a Rossmann-like fold, the Scone proteins are experimentally found to produce stable, soluble structures amenable to crystallisation. They represent interesting cases where in practice the expected model is not the lowest-energy configuration available to the polypeptide, and may help to improve computational design procedures. Interestingly, the majority of recent de novo designed complexes are very rich in α-helices, whereas failed designs are rich in β-strands. As a guideline for further designs, we propose using a symmetric template whose symmetry order equals the number of blades and varying only the sequence, as was done for the design of Ika8. Alternatively, a three-fold symmetric template might be used with different algorithms that do not depend on sequence information, such as cost function network optimization algorithms Simoncini et al. [2015]. Furthermore, addition of the Keggin polyoxometalate STA was shown to facilitate crystal growth. The Anderson-Evans species has previously been used as a crystallization adjuvant Aengus et al. [2018] because it is a more pH-stable POM. Yet even though the STA became lacunary, it still assisted crystallization. As POMs are of great interest for biomedical research yet are not biocompatible Bijelic et al. [2019], Scone-E with the molecule bound in its cavity may form an interesting platform for the further design of biocompatible POM-protein hybrids, as the STA is largely buried inside the protein.
This could be achieved by further introducing positively charged and/or hydrogen-bonding residues to create a specific high-affinity complex.

The designed model is shown with each repeat coloured individually; repeats with the same sequence share their colour (A purple, B orange and C teal). On the right side the crystal structure is shown following the same colouring by repeat. (B) A close-up of the two terminal blades shows that the protein clearly folds as an eight-bladed propeller instead of a nine-bladed one, with parts of both termini not visible in the electron density shown as dotted lines.

Figure 4: The interface between the N-terminal blade and the C-terminal blade. The 2Fo-Fc electron density map at sigma 1.5 is visualized as a mesh, clearly showing that the final residues on either terminus are unstructured. Arrows indicate the direction of the protein chain from N- to C-terminus.

Figure 5: The crystal packing and the role of STA in providing an interface between the Scone-E proteins. When STA is bound in the crystal, the symmetry increases. Below, a close-up of the area around the STA is shown, with residues within 4 Å shown as sticks. The whole protein is shown as an electrostatic potential surface calculated with APBS at the corresponding pH. In crystal form a, the POM interacts mostly with arginines in the tail end of the unstructured part of the chain, which allows the protein to form a dimer. In crystal form b, two POMs are present. The STA inside the cavity is attracted by the arginines and lysines in the cavity. This might neutralise the charge of the protein, facilitating the formation of a dimer. The POM on the side connects with three protein chains at a positively charged cluster of arginines, lysines and histidines, which might lower the threshold for the proteins to combine as a tetramer. The interacting residues are shown as sticks, with H-bonding interactions shown as dark grey dashes and electrostatic interactions shown as teal dashes.
The POM molecules are shown as sticks, with oxygen in red, tungsten in blue and silicon in wheat. The potassium ion bound to the lacunary POM and Scone-E is shown as a purple sphere.

Figure S4: C-A interface. An alignment between the designed C-A interface (grey) and the C (teal)-A (blue) interface of the crystal structure shows the clear difference caused by the different fold. The C blade of the design is aligned on the C blade of the crystal structure.

Figure S5: Electrostatic potential map of Scone-E with STA. The electrostatic potential map was calculated with the APBS plugin in PyMOL; a zoom-in on the STA binding site is shown. The anomalous map at sigma 5.5 clearly shows the location of the tungsten atoms.

PHENIX: a comprehensive Python-based system for macromolecular structure solution
The crystallization additive hexatungstotellurate promotes the crystallization of the HSP70 nucleotide binding domain into two different crystal forms
Prediction of the structure of symmetrical protein assemblies
FastML: a web server for probabilistic reconstruction of ancestral sequences
Vibration spectroscopy stability investigation of 12-tungstosilicic acid solution
Using homology modeling to interrogate binding affinity in neutralization of ricin toxin by a family of single domain antibodies
Ten good reasons for the use of the tellurium-centered Anderson-Evans polyoxotungstate in protein crystallography
Polyoxometalates as potential next-generation metallodrugs in the combat against cancer
De novo design of picomolar SARS-CoV-2 miniprotein inhibitors
Evolution of the β-propeller fold
De novo design of protein logic gates
Artificial β-propeller protein-based hydrolases
Protein design automation
De novo protein design: fully automated sequence selection
De novo design of a fluorescence-activating β-barrel
Features and development of Coot
How good are my data and what is the resolution?
The unexpected structure of the designed protein octarellin v.1 forms a challenge for protein structure prediction tools
β Propellers: structural rigidity and functional diversity
Design of ordered two-dimensional arrays mediated by noncovalent protein-protein interfaces
Design of a 4-helix bundle protein: synthesis of peptides which self-associate into a helical protein
The EVcouplings Python framework for coevolutionary sequence analysis
Design of a hyperstable 60-subunit protein dodecahedron
De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy
De novo protein design, a retrospective
Design of a novel globular protein fold with atomic-level accuracy
Chapter nineteen - Rosetta3: an object-oriented software suite for the simulation and design of macromolecules
Experimental support for the evolution of symmetric protein architecture from a simple peptide motif
An editor for the generation and customization of geometry restraints
Structural plasticity of a designer protein sheds light on β-propeller protein evolution
Influence of circular permutations on the structure and stability of a six-fold circular symmetric designer protein
Computational design of symmetrical 8-bladed β-propeller proteins
Structure-function properties of prolyl oligopeptidase family enzymes
Structural and biochemical analyses of the DEAD-box ATPase Sub2 in association with THO or Yra1. eLife, 6
Improved protein structure prediction using potentials from deep learning
Guaranteed discrete energy optimization on large protein design problems
Hybrid assemblies of a symmetric designer protein and polyoxometalates with matching symmetry
Biomineralization of a cadmium chloride nanocrystal by a designed symmetrical protein
Computational design of a self-assembling symmetrical β-propeller protein
Evolution-inspired computational design of symmetric proteins
Molecular assemblies built with the artificial protein Pizza
Overview of the CCP4 suite and current developments
DIALS: implementation and evaluation of a new integration package
The I-TASSER suite: protein structure and function prediction
A study of the decomposition behaviour of 12-tungstophosphate heteropolyacid in solution

We would like to thank Els Deridder for helping with the cloning of the constructs. We thank the beamline scientists at the Diamond Light Source macromolecular beamlines for their kind assistance. JRHT thanks OpenEye Scientific Software for financial support. ARDV thanks Research Foundation Flanders for financial support (G0E4717N, G0F9316N and G051917N).