Computational design method
We developed our computational protein design protocols using Rosetta25,26 (2019.01) and PyRosetta4 (release 2019.22)27. Our protocol of parametric repeat protein generation started by building an ideal helix H1 (with a length of 12–28 residues) with the MakeBundleHelix mover in Rosetta25,26 and placing it away from the z axis with a given radius and an angle corresponding to its orientation. A second helix, H2 (with a length of 12–28 residues), was then modeled and placed according to the specification of the six rigid-body degrees of freedom for geometry transformation from H1 to H2. By combining H1 and H2 into one pose, we built the first repeat unit R1. Subsequently, we used user-specified six rigid-body degrees of freedom between repeat units to perform a geometric transformation to obtain the second unit R2. We propagated the repeat units based on the number of repeats desired to generate the helical repeat protein backbones. We then connected pairs of sequence-adjacent helices with loops of three to six residues using ConnectChainMover10. To filter the generated repeat protein backbones, we required a maximum distance of 18 Å between the termini of the helices to be connected by buttressed long loops. We also removed the low-quality backbone models with fewer than 28% of the residues in a buried core.
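The propagation step can be illustrated with a minimal sketch in plain Python. It is not the Rosetta protocol: the six rigid-body degrees of freedom are reduced here to one rotation about the z axis plus a translation, and the function names (`rigid_transform`, `propagate_repeats`, `termini_within`) are hypothetical. The 18 Å default mirrors the termini-distance filter described above.

```python
import math

def rigid_transform(point, rot_z_deg, translation):
    """Rotate a point about the z axis, then translate it (simplified rigid-body move)."""
    theta = math.radians(rot_z_deg)
    x, y, z = point
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    tx, ty, tz = translation
    return (xr + tx, yr + ty, z + tz)

def propagate_repeats(unit, n_repeats, rot_z_deg, translation):
    """Return n_repeats copies of `unit`; each copy is the previous one
    moved by the same rigid-body transformation."""
    repeats = [list(unit)]
    for _ in range(n_repeats - 1):
        previous = repeats[-1]
        repeats.append([rigid_transform(p, rot_z_deg, translation) for p in previous])
    return repeats

def termini_within(end_of_helix, start_of_next_helix, max_dist=18.0):
    """Distance filter between helix termini that a loop must connect (in Å)."""
    return math.dist(end_of_helix, start_of_next_helix) <= max_dist

# Toy repeat unit: two points standing in for backbone coordinates.
unit = [(10.0, 0.0, 0.0), (10.0, 0.0, 1.5)]
repeats = propagate_repeats(unit, n_repeats=4, rot_z_deg=90.0, translation=(0.0, 0.0, 5.0))
```

Applying the same transformation repeatedly is what makes the resulting backbone a regular helical repeat array.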
To design buttressed loops, we developed a hybrid method that assembles native structural motifs via kinematic loop closure. To guide the sampling toward the hairpin-shaped conformations, we constructed a motif library that consists of native β turns. A β-turn motif is defined by having a backbone-to-backbone hydrogen bond between the carbonyl group of residue i and the amine group of residue i + 3 (refs. 28,29). In this work, we searched for native β-turn fragments by mining a set of selected PDBs based on 90% maximum sequence identity and a 1.6 Å resolution cutoff from PISCES30. The collected β turns were further clustered by the K-centers algorithm31 at a maximum cluster distance of 0.63 Å, resulting in 180 motif clusters. Using the same approach, we compiled a library of native helical capping motifs to guide the sampling of loops connecting helices in the repeat proteins.
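As an illustration of clustering with a maximum cluster distance, the sketch below implements a generic greedy farthest-point K-centers procedure. It is a sketch under stated assumptions, not the cited implementation: the distance function is plain Euclidean distance on toy feature vectors, whereas the distance used for β-turn fragments would more plausibly be a backbone RMSD, and the function name `k_centers` is hypothetical.

```python
import math

def k_centers(points, max_radius, distance=math.dist):
    """Greedy farthest-point K-centers.

    Centers are added until every point lies within `max_radius`
    of its nearest center. Returns the indices of the chosen centers.
    """
    if not points:
        return []
    centers = [0]  # the first center is an arbitrary choice
    nearest = [distance(p, points[0]) for p in points]
    while max(nearest) > max_radius:
        new_center = nearest.index(max(nearest))  # farthest point so far
        centers.append(new_center)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], distance(p, points[new_center]))
    return centers

# Toy data: three well-separated groups on a line.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.2, 0.0), (10.0, 0.0)]
toy_centers = k_centers(toy_points, max_radius=0.63)
```

With a radius threshold rather than a fixed number of clusters, the number of clusters (180 in this work) is an outcome of the data and the cutoff.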
We used GeneralizedKIC11 for loop closure. An extended loop fragment was first constructed by stitching native helical capping motifs (four amino acids), β-turn motifs (four amino acids) and KIC residues (five to ten amino acids) with randomized backbone torsion angles. We chose these lengths because we found limited structural diversity for loops with lengths less than nine amino acids. When the loop length exceeded 14 amino acids, it became significantly more difficult to design buttressing interactions to stabilize the entire loop. The torsion angles of β turns were set according to the motifs sampled from the β-turn library, and the Φ/Ψ torsion angles of nonpivot KIC residues were sampled from the Ramachandran distribution, with omega torsion angles fixed at 180°. All the bond lengths were kept fixed at the ideal lengths. The position of the β-turn was randomly sampled in the loop. In each step of GeneralizedKIC, kinematic loop closure was performed to connect the loop to the intended insertion site. Loop conformations were filtered by backbone steric clashes. We further filtered the models by selecting loops with at least two intraloop backbone-to-backbone hydrogen bonds. To avoid helical conformations, we removed the models predicted to have more than five consecutive helical residues by DSSP32. This ensured the extended β-hairpin shape, which contributed to the loop stability and compatibility for buttressing.
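The last two loop filters can be sketched as a simple check on a DSSP secondary-structure string and a hydrogen-bond count. This is only an illustration: the function names are hypothetical, the hydrogen-bond count is assumed to be computed elsewhere, and treating the DSSP codes H, G and I as helical is an assumption not stated in the text.

```python
def longest_helical_run(dssp_string, helix_codes="HGI"):
    """Length of the longest stretch of consecutive helical residues."""
    longest = current = 0
    for code in dssp_string:
        current = current + 1 if code in helix_codes else 0
        longest = max(longest, current)
    return longest

def passes_loop_filters(dssp_string, n_intraloop_bb_hbonds,
                        max_helical_run=5, min_hbonds=2):
    """Keep loops with at least two intraloop backbone hydrogen bonds and
    no more than five consecutive helical residues."""
    return (longest_helical_run(dssp_string) <= max_helical_run
            and n_intraloop_bb_hbonds >= min_hbonds)
```

For example, `passes_loop_filters("LLEEEELTTLEEEELL", 4)` accepts a hairpin-like loop, whereas a loop containing six consecutive `H` residues is rejected regardless of its hydrogen-bond count.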
To install loops of the same conformation in each unit of the repeat proteins, we used the RepeatPropagationMover in Rosetta25,26. After filtering out the loops with steric clashes, we computed three metrics to help select the best loop conformations for buttressing: the number of interloop backbone-to-backbone hydrogen bonds, the loop motif score and the direction score. We required at least one interloop backbone-to-backbone hydrogen bond between each pair of neighboring loops to enhance sequence-independent loop buttressing. To select loops with loop–helix hydrophobic contacts, the motif scores were computed by matching the selected pairs of residues to known contacting native hydrophobic residue pairs (Val, Leu, Ile, Met and Phe) in the PDB33. The scores for the matched residue pairs in the loop regions were then summed into one total score. Only the loops with a negative total motif score were selected. The direction score described the orientation of the loops relative to the rest of the input repeat protein. Specifically, we defined the following two vectors: vector a started from the center of mass of the two loop terminal residues and pointed to the farthest Cα atom of the loop; vector b started from the same point as a but pointed toward the center of mass of the repeat unit. The direction score was computed as the angle between the two vectors.
$$\mathrm{Direction\ score}=\cos^{-1}\left(\frac{\mathbf{a}\cdot\mathbf{b}}{\left|\mathbf{a}\right|\left|\mathbf{b}\right|}\right)$$
The accepted angles ranged from 45° to 135°. We also required at least five loop residues to be within 8 Å of the closest helical residues.
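The direction score translates directly into a few lines of Python. The sketch below assumes that vectors a and b have already been computed from the coordinates; the function names are hypothetical, and whether the 45° and 135° bounds are inclusive is an assumption.

```python
import math

def direction_score(a, b):
    """Angle in degrees between vectors a and b (arccos of the normalized dot product)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def direction_accepted(score_deg, low=45.0, high=135.0):
    """Accept loops whose direction score lies in the stated range (bounds assumed inclusive)."""
    return low <= score_deg <= high
```

A loop pointing perpendicular to the line joining its base to the center of mass of the repeat unit gives a score of 90° and is accepted; a loop pointing straight at the center of mass (0°) or straight away from it (180°) is rejected.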
Next, we performed a fast sequence design task to identify loop conformations compatible with interloop bidentate hydrogen bond networks. From each propagated set of loops, the loop on the second repeat unit was selected for sequence design. One packing step using PackRotamersMover25,26 was conducted separately for each residue on this loop using amino acids that are compatible with forming sidechain-to-backbone bidentate hydrogen bonds: Asn, Asp, Gln or His. We excluded amino acids with longer side chains (Arg and Lys), as their high entropic cost might diminish the free energy contribution of buttressing. After each packing step, bidentate hydrogen bonds between the packed residue and its neighboring residues were counted. A bidentate hydrogen bond was defined as two separate hydrogen bonds formed between atoms in the sidechain functional group of a loop residue and the backbone of a neighboring repeat unit. The selected amino acid was kept only if it formed interloop bidentate hydrogen bonds; otherwise, the original amino acid (by default, Ala) was kept. When the one-step packing approach failed to generate any interloop bidentate hydrogen bonds, we used an alternative three-stage scheme to maximize the sampling efficiency of bidentate hydrogen bonds: identifying pseudo-bidentate hydrogen bonds, performing constrained minimization to build the hydrogen bonds and evaluating the resulting bidentate hydrogen bonds. We defined a pseudo hydrogen bond as one with a donor–acceptor distance <3 Å and a hydrogen bond angle >120°. After propagating the designed residue to all the repeat units, we imposed a harmonic distance constraint between each pair of donor and acceptor atoms with a target distance of 2 Å and an s.d. of 0.5 Å. At the minimization stage, we performed symmetric minimization of the loops to improve the interactions of potential hydrogen bonds. Finally, we used the Rosetta score function to check whether the bidentate hydrogen bonds had formed in the minimized loop conformations.
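The pseudo hydrogen bond criteria and the harmonic restraint can be written out as a short worked example. This is a sketch, not the Rosetta implementation: the function names are hypothetical, and the penalty is assumed to take the common harmonic form ((d − d0)/s.d.)², with d0 = 2 Å and s.d. = 0.5 Å as stated in the text.

```python
def is_pseudo_hbond(distance, angle_deg, max_dist=3.0, min_angle=120.0):
    """Pseudo hydrogen bond: donor-acceptor distance < 3 Å and angle > 120°."""
    return distance < max_dist and angle_deg > min_angle

def harmonic_penalty(distance, target=2.0, sd=0.5):
    """Assumed harmonic restraint penalty, ((d - d0) / sd)^2, in arbitrary units."""
    return ((distance - target) / sd) ** 2
```

Under this form, a donor–acceptor pair at the target distance incurs no penalty, a pair at 2.5 Å incurs a penalty of 1 and a pair at 3 Å incurs a penalty of 4, so minimization pulls candidate pairs toward hydrogen-bonding geometry.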
To guide the sequence design, we used LayerSelector to define the core, boundary and surface layers and specified the allowed amino acids for each layer. We added residue type constraints to fix the identity of the residues participating in loop-buttressing bidentate hydrogen bonds, so that the stabilizing interactions obtained during loop sampling would be maintained throughout sequence design. Next, we performed four rounds of sequence design using the FastDesign mover under repeat-symmetric constraints to ensure that the repeat units had the same structures and sequences. To improve the solubility and folding of the designs, we subsequently performed one round of FastDesign to remove the solvent-exposed hydrophobic residues on the terminal repeat units. Only polar residues such as Glu, Gln, Lys and Arg were allowed for this round of design. The designed structures were then refined by minimization in Cartesian space and subsequently filtered by the number of buried unsatisfied heavy atoms (≤3), hole score normalized by the total number of core residues (≤−0.015), total score normalized by the total number of residues (<−2), packstat (≥0.5) and hydrogen bonding energy of each loop residue (≤−1). The top-scoring 10% of structures were further tested by in silico validation methods such as molecular dynamics simulations (Cα RMSD < 3 Å), AlphaFold12,13 (pLDDT > 80, Cα RMSD < 3 Å) or RoseTTAFold34 (pLDDT > 80, Cα RMSD < 3 Å). Structural similarity between native ankyrin loops and the designed RBL loops was computed by TM-align35.
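The filtering thresholds listed above can be collected into a single predicate, as in the following sketch. The metric names are hypothetical dictionary keys, the values are assumed to have been computed by Rosetta beforehand, and ranking the top 10% by normalized total score (lower is better in Rosetta) is an assumption, since the text does not state which score was used for ranking.

```python
def passes_design_filters(metrics, loop_hbond_energies):
    """Apply the stated thresholds to precomputed design metrics."""
    return (
        metrics["buried_unsat_heavy_atoms"] <= 3
        and metrics["hole_score_per_core_residue"] <= -0.015
        and metrics["total_score_per_residue"] < -2
        and metrics["packstat"] >= 0.5
        # Every loop residue must meet the hydrogen bonding energy cutoff.
        and all(energy <= -1 for energy in loop_hbond_energies)
    )

def top_fraction(designs, fraction=0.10):
    """Keep the best-scoring fraction of designs (assumed ranking: lowest score first)."""
    ranked = sorted(designs, key=lambda d: d["total_score_per_residue"])
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]
```

Expressing the cutoffs in one place makes it easy to see that a design must satisfy all of them simultaneously, and that the per-residue hydrogen bonding criterion applies to each loop residue rather than to the loop average.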
We performed molecular dynamics simulations using GROMACS (2018.4)36 with the Amber99SB-ILDN force field37. The design models were solvated in dodecahedron boxes of the explicit TIP3P38 waters with the net charge neutralized. We treated long-range electrostatic interactions with the Particle-Mesh Ewald method39. Both short-range electrostatic interactions and van der Waals interactions used a cutoff of 10 Å. Energy minimization was performed using the steepest descent algorithm. A 1-ns equilibration under the NPT ensemble was subsequently performed with position restraints on the heavy atoms. We used Parrinello–Rahman barostat40 and velocity-rescaling thermostat41 for pressure coupling (1 atm) and temperature coupling (310 K), respectively. For the production runs, we launched three 20-ns trajectories under the NPT ensemble for each design model. The Cα atom RMSD against the design model was computed for analysis.
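For reference, the Cα RMSD between a trajectory frame and the design model can be computed as below. This is a minimal sketch that assumes the frame has already been superimposed on the design model and that both coordinate lists contain the same Cα atoms in the same order; the function name is hypothetical, and in practice the analysis tools shipped with GROMACS would normally be used instead.

```python
import math

def ca_rmsd(coords_model, coords_frame):
    """Root-mean-square deviation (Å) between two pre-aligned Cα coordinate lists."""
    if len(coords_model) != len(coords_frame) or not coords_model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_model, coords_frame))
    return math.sqrt(squared / len(coords_model))
```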
Protein expression and characterization
Genes encoding the in silico validated designs were synthesized (IDT) and cloned into pET-29b expression vectors. The plasmids were transformed into the E. coli Lemo21(DE3) expression strain (NEB). Protein expression was performed using the auto-induction protocol42 at 37 °C for 24 h in 50 ml or 100 ml cultures. For purification, cells were pelleted at 4,000g for 10 min and resuspended in 25 ml lysis buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl, 30 mM imidazole, 1 mM DNase and 10 mM lysozyme with Pierce Protease Inhibitor Tablets (Thermo Fisher Scientific)). Sonication was subsequently performed for 2.5 min (10 s on and 10 s off per cycle). The lysate was then centrifuged at 16,000g for 30 min. The supernatant was applied to a gravity flow column packed with Ni-NTA resin (Qiagen), followed by 20 ml wash buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl and 30 mM imidazole) and 5 ml elution buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl and 400 mM imidazole). The eluted protein was then concentrated and injected into an Akta Pure FPLC device with a flow rate of 0.75 ml min−1 in the running buffer (25 mM Tris–HCl (pH = 8) and 150 mM NaCl). The typical yield of a monodisperse and thermally stable designed RBL is 1–6 g l−1. To perform SEC–MALS, we prepared the purified protein at ~2 mg ml−1, injected 100 μl of sample into a Superdex 200 10/300GL column and measured the light scattering signals using a miniDAWN TREOS device (Wyatt Technology). To measure the CD signals, we first prepared the sample at ~0.2 mg ml−1 in 25 mM phosphate buffer in a 1 mm cuvette. A Jasco J-1500 CD spectrometer was used for all CD measurements. We scanned wavelengths from 190 nm to 260 nm at three temperature points for each sample (25 °C, 95 °C and after cooling back to 25 °C). We submitted all samples for SAXS43,44 to the Advanced Light Source, LBNL, for data collection at the SIBYLS 12.3.1 beamline.
Design and characterization of repeat peptide-binding proteins
We used the recently developed protein interface design method7 for in silico binder docking and design experiments. Docking of repeat peptides to the binder scaffold was guided by the geometric transformation between native ankyrins and their peptide targets in the crystal structures from PDB15. Symmetric sequence design was performed for each docked peptide–protein pair following the same protocol used for designing RBLs. All the designed complexes were computationally tested by AlphaFold with a cutoff of PAE_interaction ≤15 before experimental characterization.
The split-luciferase assay was performed using the Nano-Glo Luciferase Assay System (Promega). The coding sequence of small-BiT was fused to the gene of the peptide binders, and the coding sequence of large-BiT was fused to the coding sequence of the target peptide (GenScript). The BiT-fused proteins and peptides were expressed and purified with the same protocol used for the RBLs. The purified peptide binders and target peptides were titrated in the presence of Nano-Glo substrate in 96-well plates, and the luminescence was measured on a Synergy Neo2 plate reader (Agilent Technologies). To conduct the fluorescence polarization binding assays, we synthesized the repeat peptide fragments with N-terminal tetramethylrhodamine labels. Fluorescence polarization measurements were performed at 25 °C in a Synergy Neo2 plate reader (Agilent Technologies) with a 530/590 nm filter. A twofold dilution series of the binder–peptide mixture (80 μl per well) was prepared in 25 mM Tris–HCl (pH = 8), 150 mM NaCl and 0.05% (vol/vol) Tween 20 in 96-well assay plates. The protein concentrations ranged from 4 μM to 0.47 pM, and the concentration of the N-terminal tetramethylrhodamine-labeled peptide was kept at 0.3 nM. The samples were incubated for 3 h before measurement.
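The concentration range of the fluorescence polarization titration can be checked with a short calculation. The number of points (24) is not stated in the text; it is inferred here from the stated endpoints, since 4 μM halved 23 times gives approximately 0.48 pM. The function name is hypothetical.

```python
def twofold_series(start_molar, n_points):
    """Concentrations of a twofold serial dilution, highest first."""
    return [start_molar / 2 ** i for i in range(n_points)]

# 4 μM starting concentration; 24 points inferred from the reported range.
series = twofold_series(4e-6, 24)
lowest_pM = series[-1] * 1e12  # convert mol/l to pM
```

The lowest concentration evaluates to about 0.477 pM, consistent with the reported lower bound of 0.47 pM.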
Structural characterization by X-ray crystallography
RBL4 was concentrated to 150 mg ml−1 and crystallized by vapor diffusion. Initial crystals formed in the MCSG-2 crystallization screen (Anatrace) and optimized crystals were grown in 100 mM sodium acetate, pH 4.4, and 2% polyethylene glycol 4000. The crystal was cryoprotected with 30% ethylene glycol and flash-cooled in liquid nitrogen. Diffraction was measured at the Advanced Photon Source beamline 23 ID-B. Reflections were indexed, integrated and scaled with autoPROC (1.0.5)45. The structure was solved by molecular replacement in Phaser (2.8.3)46. Initial attempts using the predicted model were unsuccessful due to clashes. A subsequent search for eight copies of a single helix–loop–helix repeat (76–118 residues) identified two copies of the protein in the asymmetric unit. The model was rebuilt using Phenix AutoBuild (1.18.2_3874)47 and completed by iterative rounds of interactive refinement in Coot (0.9.5)48 and reciprocal space refinement in Phenix (1.19.1_4122)49,50,51,52. The final refinement strategy included reciprocal space refinement, individual atomic displacement parameters, Translation/Libration/Screw refinement using parameters determined with TLSMD (13 June 2012)53 and occupancy refinement of alternate conformations. Model geometry was assessed with MolProbity (implemented in Phenix 1.19.1_4122)54. The final model included 99.5% of residues in the favored region of the Ramachandran plot with no outliers.
RBL7_C2_3 was concentrated to 119 mg ml−1 and crystallized by vapor diffusion in 2.4 M sodium malonate, pH 7.0, using the MCSG-1 crystallization screen (Anatrace). The crystal was cryoprotected by the addition of ten volumes of 3.4 M sodium malonate, pH 7.0, and flash-cooled in liquid nitrogen. Reflections were indexed, integrated and scaled with XDS (5 February 2021)55. To solve the structure by molecular replacement, an ensemble of monomer structures was generated by AlphaFold and used as a search ensemble in Phaser (2.8.3). The solution contained eight molecules that formed four homodimers. The model was rebuilt with Phenix AutoBuild (1.19.2_4158) with morphing and completed by iterative rounds of interactive refinement in Coot (0.9.8.6) and reciprocal space refinement in Buster (2.10.4)56 or Phenix (1.20.1_4487). The final refinement strategy in Phenix included reciprocal space refinement, individual atomic displacement parameters, noncrystallographic symmetry restraints and Translation/Libration/Screw refinement using one group per chain. Model geometry was assessed with MolProbity (implemented in Phenix 1.20.1_4487)54. The final model had 98.22% of residues in the favored regions of the Ramachandran plot with no outliers. Composite omit maps were generated in Phenix by sequentially omitting 5% of the final structure model and performing simulated annealing from 5,000 K. Crystallographic software was installed and maintained using SBGrid57.
Data analysis and visualization were performed using Python (3.7)58, seaborn (0.11.2)59, Matplotlib (3.1.3)60, Pandas (0.24.2)61,62 and PyMOL (2.4.1)63.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.