key: cord-0462091-nz4oe3va authors: Khan, Asif; Cowen-Rivers, Alexander I.; Deik, Derrick-Goh-Xin; Grosnit, Antoine; Dreczkowski, Kamil; Robert, Philippe A.; Greiff, Victor; Tutunov, Rasul; Bou-Ammar, Dany; Wang, Jun; Bou-Ammar, Haitham title: AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation date: 2022-01-29 journal: nan DOI: nan sha: 2f9d41a0c7fe4a28270e74d5d07f610a2852a3c5 doc_id: 462091 cord_uid: nz4oe3va Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies to combat harmful pathogens. However, the combinatorial nature of CDRH3 sequence space makes it impossible to search for an optimal binding sequence exhaustively and efficiently, especially not experimentally. Here, we present AntBO: a Combinatorial Bayesian Optimisation framework enabling efficient in silico design of the CDRH3 region. Ideally, antibodies should bind to their target antigen and be free from any harmful outcomes. Therefore, we introduce the CDRH3 trust region that restricts the search to sequences with feasible developability scores. To benchmark AntBO, we use the Absolut! software suite as a black-box oracle because it can score the target specificity and affinity of designed antibodies in silico in an unconstrained fashion. The results across 188 antigens demonstrate the benefit of AntBO in designing CDRH3 regions with diverse biophysical properties. In under 200 protein designs, AntBO can suggest antibody sequences that outperform the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s and a commonly used genetic algorithm baseline. 
Additionally, AntBO finds very-high affinity CDRH3 sequences in only 38 protein designs whilst requiring no domain knowledge. We conclude AntBO brings automated antibody design methods closer to what is practically viable for in vitro experimentation.

Figure 1: Panel A (left) is a schematic of the structure of an antibody and of antibody-antigen docking. Panel B (right) shows the overall optimisation process of AntBO for antibody design: from a predefined target antigen (discretised from its known PDB structure), binding affinities of antibody CDRH3 sequences to the antigen are simulated using Absolut!, as a surrogate for costly experimental measurements. Absolut! is used as a black-box function to be optimised for high-affinity CDRH3 protein designs within a trust region of acceptable sequences. AntBO iteratively proposes a CDRH3 sequence and requests its affinity from Absolut!, before updating its posterior with the affinity of this sequence. The performance of AntBO or other optimisation tools is measured as the highest affinity achieved and how fast it reaches high affinity.

Antibodies or immunoglobulins (Igs) are a class of biotherapeutics utilised by the immune system to detect, bind and neutralise invading pathogens (Punt, 2018). From a structural perspective, antibodies are mainly large Y-shaped proteins that contain variable regions, enabling specific molecular recognition of a broad range of molecular surfaces (Chothia and Lesk, 1987; Rajewsky et al., 1987; Xu and Davis, 2000) of foreign proteins called antigens. As a result, antibodies are a rapidly growing class of biotherapeutics (Nelson et al., 2010). Monoclonal antibodies now constitute five of the ten top-selling drugs (Walsh, 2003; Kaplon and Reichert, 2018; Urquhart, 2021).
Antibodies are also utilised as affinity reagents in molecular biology research due to their ability to detect low concentrations of target antigens with high sensitivity and specificity (Sela-Culang et al., 2013). A typical antibody structure consists of four protein chains: two heavy and two light chains connected by disulfide bonds. The heavy chains (VH) include three constant domains and one variable domain, while the light chains (VL) possess one constant and one variable domain (Rajewsky et al., 1987; Xu and Davis, 2000; Rees, 2020). Antibodies selectively bind antigens through the tip of their variable regions, called the Fab domain (antigen-binding fragment), which contains six loops, three on the light and three on the heavy chain, called complementarity-determining regions (CDRs) (Xu and Davis, 2000; Kunik et al., 2012). The interacting residues at the binding site between antibody and antigen are called the paratope on the antibody side and the epitope on the antigen side (Xu and Davis, 2000; Kunik et al., 2012). The left side of Figure 1 shows a sketch of an antibody and its binding to an antigen. Since the CDRs mainly define the binding specificity, the overarching goal in computational antibody design is to develop CDRs that can bind to selected antigens, such as pathogens, tumour neoantigens, or therapeutic pathway targets (Cohn et al., 1980; Rajewsky et al., 1987; Norman et al., 2020). Despite the intrinsic variation in all CDRs, the CDRH3 region possesses the highest sequence and structural diversity, giving it a crucial role in the formation of the binding site (Chothia and Lesk, 1987; Xu and Davis, 2000). For this reason, the highly diverse CDRH3 is the most extensively re-engineered component in monoclonal antibody development. Therefore, in this paper, we refer to the design of the CDRH3 region as antibody design.
In practice, the development of antibodies is a complex process that requires various tools for building a structural model of different parts of the antibody (Leem et al., 2016), generating folds from antigen sequences (Compiani and Capriotti, 2013), and docking them (Rawat et al., 2021). Recently, Absolut! (Robert et al., 2021a) was proposed as an in silico framework for simulating antibody-antigen binding. Absolut! is a centralised tool that provides an end-to-end simulation of antibody-antigen binding affinity using coarse-grained lattice representations of proteins while preserving more than eight levels of biological complexity present in experimental datasets: antigen topology, antigen amino-acid (AA) composition, physiological CDRH3 sequences, a vast combinatorial space of possible binding conformations, positional AA dependencies in high-affinity sequences, a hierarchy of antigen regions with different immunogenicity levels, the complexity of paratope-epitope structural compatibility, and a functional binding landscape that is not well described by CDRH3 sequence similarity. We can use Absolut! to evaluate all possible binding conformations of an arbitrary CDRH3 sequence to an antigen of interest and return the optimal binding conformation. However, the combinatorial nature of the CDRH3 sequence space makes it impractical to query Absolut! exhaustively. For a sequence of length $L$ consisting of naturally occurring AAs ($N = 20$), there are $N^L$ possible sequences. Thus, even for a modest length $L = 11$, there are $20^{11} \approx 2 \times 10^{14}$ sequences, too many to search exhaustively. In reality, the search space is even larger since CDR sequence lengths can be up to 36 residues (Branden and Tooze, 2012), and designed proteins are not restricted to naturally occurring AAs (Yang et al., 2019). Furthermore, not all CDRH3 sequences are of therapeutic interest.
A CDRH3 can have a strong binding affinity to a specific target but may cause problems in manufacturing due to its unstable structure or show toxicity towards the patient. Antibodies should be evaluated against typical properties known as developability scores for such reasons. These scores measure properties of interest, such as whether a CDRH3 sequence is free of undesirable glycosylation motifs or the net charge of a sequence is in a prespecified range (Raybould et al., 2019; Bailly et al., 2020) . Therefore, the problem of antibody-antigen binding demands a sample-efficient solution to generate the CDRH3 region that binds an arbitrary antigen of interest while keeping developability constraints. Bayesian Optimisation (BO) (Betrò, 1991; Mockus et al., 1978; Jones et al., 1998; Brochu et al., 2010) offers powerful machinery for aforementioned issues. BO uses Gaussian Processes (GP) as a surrogate model of a black-box that incorporates the prior belief about the domain in guiding the search in the sequence space. The uncertainty quantification of GPs allows the acquisition maximisation to trade-off exploration and exploitation in the search space. This attractive property of BO enables us to develop a sample-efficient solution for antibody design. In this paper, we introduce AntBO-a combinatorial BO framework, for in silico design of a target-specific antibody CDRH3 region. Our key contributions are, • The AntBO framework utilises CDRH3 trust-region in the combinatorial sequence space to facilitate the search of CDRH3s with biophysical properties suitable for therapeutic development. • We demonstrate the application of AntBO on 188 known antigens of therapeutic interest. Our results show the benefits of AntBO for in silico antibody design through diverse developability scores of discovered protein sequences. 
• AntBO substantially outperforms the very high-affinity sequences available out of a database of 6.9 million experimentally obtained CDRH3s, with several orders of magnitude fewer protein designs. • Considering the enormous costs (time and resources) of wet-lab antibody design-related experimentation, AntBO can suggest very high-affinity antibodies while making the fewest queries to the black-box oracle. This result serves as a proof of concept that AntBO can be deployed in the real world where sample efficiency is vital. 2 Related Work Experimental datasets describing antibody binding landscape can be categorised into four ways : (1) Structures of antibody-antigen complexes provide the most accurate description of the binding mode of an antibody and the involved paratope and epitope residues (Schneider et al., 2022b) , which helps to prioritise residues that can modulate binding affinity. Structures do not directly give an affinity measurement but can be leveraged with molecular docking and energy tools to infer approximate binding energy. Only ∼ 1200 non-redundant antibody-antigen complexes are known so far (Schneider et al., 2022b) . (2) Sequence-based datasets contain the results of qualitative screenings of thousands of antibodies (either from manually generated sequence libraries or from ex vivo B cells) (Laustsen et al., 2021) . Typically, millions of sequences can be inserted into carrier cells that express the antibody on their surface. Following repeated enrichment steps for binding to the target antigen, a few thousand 'high affinity' sequences can be obtained (Mason et al., 2021) , and newer experimental platforms will soon allow reaching a few million. As of yet, however, sequencing datasets can only label sequences with «binder» or «non-binder»; or «low affinity», «medium affinity», and «high affinity» classes. 
(3) Affinity measurements are very time consuming because they require the production of one particular antibody sequence as protein before measuring its physico-chemical properties (including other in-vitro measurable developability parameters). Affinity measurements are precise and quantitative, either giving an affinity reminiscent of the binding energy or down to an association and dissociation constant. As an example, the AB-bind database only reports in total 1100 affinities on antibody variants targeting 25 antigens (Sirin et al., 2016) , and a recent cutting-edge study (Mason et al., 2021 ) measured the affinity of 30 candidate antibodies, showing the experimental difficulty to obtain the affinity measurement of many antibodies. Finally, (4) in (and ex) vivo experiments describe the activity of injected antibodies, including in vivo developability parameters (Raybould et al., 2019; Xu et al., 2019a) such as half-life, and toxicity including off-targets. In vivo experiments are restricted to lead candidates due to their high cost and cannot be performed when screening for antibody leads. Although qualitative (sequencing) datasets inform on initial antibody candidates, increasing the activity and specificity of antibody candidates requires many steps to further improve their affinity towards the antigen target while keeping favourable developability parameters. It is the most tedious and time-consuming step. 
Several computational approaches have been developed to support antibody design (Norman et al., 2020; Akbar et al., 2022) , either using physical-based antibody and antigen structure modelling (Fiser and Šali, 2003; Almagro et al., 2014; Leem et al., 2016) and docking (Brenke et al., 2012; Sircar and Gray, 2010) , or using machine learning methods to learn the rules of antibody-antigen binding directly from sequence or structural datasets (Akbar et al., 2022) : (1) Paratope and epitope prediction tools take an antibody and antigen sequence or structure and predict the interacting residues (Soria-Guerra et al., 2015; Lu et al., 2021; Sela-Culang et al., 2015; Jespersen et al., 2019; Krawczyk et al., 2014; Liberis et al., 2018; Ambrosetti et al., 2020; Kunik et al., 2012; Krawczyk et al., 2013; Del Vecchio et al., 2021) . Knowledge of the paratope and epitope does not directly inform affinity but rather helps prioritise important residues to improve affinity. (2) Binding prediction tools, often inspired from Protein-Protein Interaction prediction (PPI) tools (Liu et al., 2018) , predict the compatibility between an antibody and an antigen sequence or structure, as clustering predicted sequences to bind the same target (Wong et al., 2021; Xu et al., 2019b) , paratope-epitope prediction or using ranking of binding poses as to classify binding sequences (Schneider et al., 2022a) . However, predicting antibody binding mimics the experimental screening for antibody candidates but does not directly help to get high affinity and specific antibody sequences. (3) Affinity prediction tools specifically predict the affinity improvement following mutations on antibody or antigen sequences. This work focuses on the affinity prediction problem because it is a major time and cost bottleneck in antibody design. 
When a candidate antibody-antigen complex structure is already known, structural methods predicting affinity change around this starting point (Morea et al., 2000; Clark et al., 2006; Nimrod et al., 2018) are shown to be useful in generating antibodies with higher affinity. As recent examples, (Lippow et al., 2007) combined structure modelling and affinity scoring function to get a 140-fold affinity improvement on an anti-lysozyme antibody. (Kurumida et al., 2020) learnt from the affinity change induced by single-point mutations using an ensemble ML strategy to predict new sequences with improved affinity in comparison to other affinity-based scoring functions. mCSM-AB2 (Myung et al., 2020) uses graph-based signatures to incorporate structural information of antibody-antigen complexes, combined with energy inference using FoldX (Schymkowitz et al., 2005) , as to predict improvements in binding energy. Finally, two methods derived from the general PPI problem have been used on antibody affinity prediction: TopNetTree (Wang et al., 2020) combines a CNN with gradient-boosting trees, and GeoPPI (Liu et al., 2021) uses a graph neural network instead of the CNN. However, there is still a high discrepancy between the results of affinity prediction methods (Guest et al., 2021; Ambrosetti et al., 2020) . Generative ML architectures have been leveraged to generate antibody candidates from sequence datasets. Specifically, an autoregressive model (Sutskever et al., 2011), a variational autoencoder (Kingma and Welling, 2013) or a generative adversarial network (GANs) (Goodfellow et al., 2014) have been used for generating amino-acid sequences of antibodies (Amimeur et al., 2020; Eguchi et al., 2020; Shin et al., 2021; Akbar et al., 2021b; Shuai et al., 2021; Leem et al., 2021) . (Amimeur et al., 2020) also incorporate therapeutic constraints to avoid sampling a non-feasible sequence at inference. 
Ingraham et al. (2019), Koga et al. (2012) and Cao et al. (2021) additionally include the information of a backbone structure. Recently, Jin et al. (2021) proposed an iterative refinement approach to redesign the 3D structure and sequence of antibodies for improving properties such as neutralising score. The generative modelling paradigm can make antibody design more efficient by prioritising the next candidates to be tested experimentally. The application of ML methods for improving antibody affinity has been minimal because current datasets are extremely small compared to the space of possible antibody sequences. Further, the generalisability and interpretability of such methods are difficult to assess, and there is a lack of generative methods that can be conditioned on affinity. Here, we set out to extract the maximal information on antibody sequence affinity from a minimal number of iterative experimental measurements, using BO to generate an informed prediction of potential higher-affinity sequences. We leverage the Absolut! simulator as a black-box oracle to provide a complex antibody-antigen landscape that recapitulates many layers of the experimental complexity of antibody-antigen binding. Not only does AntBO confidently generate diverse new CDRH3 sequences with higher affinity than the previously known ones, but its interpretable architecture offers the possibility to extract the reasons for the higher affinity of sequences generated de novo in silico. For a more detailed overview of related work, we refer readers to Appendix A.

To design antibodies of therapeutic interest, we want to search for CDRH3 sequences that have a strong affinity towards the antigen of interest and satisfy specific biophysical properties, making them ideal for practical applications (e.g., manufacturing, improved shelf life, higher concentration doses). These properties are characterised as "developability scores" (Raybould et al., 2019).
The computation of binding affinity comes with challenges. The energy term does not have a closed-form solution, and the vast space of CDRH3 sequences makes it computationally impractical to search exhaustively for an optimal sequence. Here, we pose the design of the CDRH3 region of antibodies as a black-box optimisation problem. Specific to our work, a black-box refers to a tool that can take an arbitrary CDRH3 sequence as an input and return its binding affinity towards a prespecified antigen. To simulate the high experimental costs faced in the real world, we want a sample-efficient solution that works with a prespecified number of protein designs.

Many developability scores exist, and in general our framework is compatible with arbitrary developability constraints. In this work, however, we use the three most relevant scores identified for the CDRH3 region (Raybould et al., 2019; Jin et al., 2021). First, the net charge of a sequence should be in the range $[-2, 2]$. It is specified as the sum of the charges of the individual residues in the primary amino-acid sequence. Consider a sequence $x = (x_1, \dots, x_L)$, and let $\mathbb{I}[\cdot]$ be the indicator function that takes value 1 if its condition is satisfied and 0 otherwise. The charge of a residue is then defined as $C(x_i) = \mathbb{I}[x_i \in \{R, K\}] + 0.1\,\mathbb{I}[x_i = H] - \mathbb{I}[x_i \in \{D, E\}]$, and that of the sequence as $\sum_{i=1}^{L} C(x_i)$. Second, no residue should occur more than five times in a sequence, $\mathbb{I}[\mathrm{count}(x_i) \le 5 \;\; \forall i \in \{1, \dots, L\}]$. Third, a sequence should not contain a glycosylation motif, that is, a subsequence of the form N-X-S/T.

To formally introduce the problem, consider the combinatorial space $\mathcal{X}$ of protein sequences of length $L$. For 20 unique amino acids, the cardinality of the space is $|\mathcal{X}| = 20^L$.
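The three constraints and the size of the search space described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-residue charge table (R, K: +1; H: +0.1; D, E: -1) is an assumption, the repetition rule is read as a cap on the total count of any residue, and X in the N-X-S/T motif is taken to be any residue.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 naturally occurring amino acids

# Assumed per-residue charges; residues not listed contribute 0.
RESIDUE_CHARGE = {"R": 1.0, "K": 1.0, "H": 0.1, "D": -1.0, "E": -1.0}

# N-X-S/T motif with X read as "any residue" (an assumption; some
# conventions additionally exclude proline at position X).
GLYCOSYLATION_MOTIF = re.compile(r"N[A-Z][ST]")


def search_space_size(length, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences of the given length, N ** L."""
    return alphabet_size ** length


def net_charge(seq):
    """Sum of the per-residue charges of a sequence."""
    return sum(RESIDUE_CHARGE.get(aa, 0.0) for aa in seq)


def max_residue_count(seq):
    """Largest number of occurrences of any single residue."""
    return max(Counter(seq).values()) if seq else 0


def has_glycosylation_motif(seq):
    """True if the sequence contains an N-X-S/T subsequence."""
    return GLYCOSYLATION_MOTIF.search(seq) is not None


def cdrh3_developable(seq, charge_range=(-2.0, 2.0), max_count=5):
    """Return True only if all three constraints hold."""
    low, high = charge_range
    return (
        low <= net_charge(seq) <= high
        and max_residue_count(seq) <= max_count
        and not has_glycosylation_motif(seq)
    )
```

For instance, `search_space_size(11)` evaluates to 204,800,000,000,000, and a check such as `cdrh3_developable("CANASRDYGFW")` fails because the sequence contains the motif N-A-S. Further developability scores could be added as extra conjuncts in `cdrh3_developable`.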
We can consider a black-box function $f$ as a mapping from protein sequences to a real-valued antigen specificity, $f : \mathcal{X} \to \mathbb{R}$, where lower values (binding energies) correspond to stronger binding. An optimum protein sequence under developability constraints is defined as

$$x^* = \arg\min_{x \in \mathcal{X}} f(x) \quad \text{s.t.} \quad \text{CDRH3-Developable}(x) = 1, \qquad (1)$$

where $\text{CDRH3-Developable} : \mathcal{X} \to \{0, 1\}$ is an indicator function that takes a sequence of amino acids and returns a Boolean value for whether the constraints introduced in Section 3.1 are satisfied (1) or unsatisfied (0). An example of a CDRH3 sequence that violates the constraints is shown in Figure 1C.

Figure 2: It takes around 38 steps to design a sequence that surpasses a very high-affinity sequence from the Absolut! 6.9M database and about 100 to outperform a super+ affinity sequence. We run all methods with 10 random seeds and report the mean and 95% confidence interval for the 12 antigens of interest. The title of each plot is a protein data bank (PDB) id followed by the chain of an antigen. The name of the disease associated with the antigen is provided in Table 2 of the Appendix. For results on the other 176 antigens, we refer readers to Appendix F.

Absolut! (Robert et al., 2021a) is a state-of-the-art in silico simulation suite that considers biophysical properties of antigen and antibody to create a simulation of feasible bindings of antigen and antibody. Absolut! is not able to generate antibody-antigen bindings at atomic resolution and therefore cannot predict antibody candidates directly. However, using Absolut!, we can develop methods in simulation and later employ the best method in the complex real-world scenario, with the knowledge that this method already performed well on the levels of complexity embedded in Absolut! datasets. This feature of Absolut! makes it an ideal black-box candidate for the antibody design problem. We note, however, that our framework is agnostic to the choice of tool and can be adapted to other tools provided they can compute binding affinity or other criteria relevant for antibody design. Absolut!
computes the binding affinity in three main steps: i) antibody-antigen lattice representation, ii) discretisation of the antigen and iii) binding affinity computation. We provide a summary of the three steps in Section B of the Appendix.

We next introduce our AntBO framework. As introduced in Section 3, our goal is to search for an instance in the input space, $x^* \in \mathcal{X}$, that achieves an optimum value under the black-box function $f$. In a typical setting, the function $f$ a) is costly to evaluate, b) has no analytical solution, and c) may not be differentiable. To circumvent these issues, we use BO to solve the optimisation problem. BO typically goes through the following loop. We first fit a GP on a random set of data points. Next, we optimise an acquisition function that utilises the GP posterior to propose new samples that improve on previous observations. Finally, these new samples are added to the data points, the GP is refitted, and the acquisition maximisation is repeated, as shown in Figure 1. For a more comprehensive overview of BO, we refer readers to (Snoek et al., 2012; Shahriari et al., 2015; Hernández-Lobato et al., 2016; Frazier, 2018; Cowen-Rivers et al., 2020; Grosnit et al., 2021b; Garnett, 2022).

Let $f : \mathcal{X} \to \mathbb{R}$ be a continuous function. The distribution over the function $f$ is then specified using a GP, that is, $f(x_{1:n}) \sim \mathcal{N}(\mu(x_{1:n}), K(x_{1:n}, x_{1:n}))$, where $\mu$ is a mean function and $K$, with entries $K_{ij} = k(x_i, x_j)$, is a covariance matrix. The standard choice for the mean function is a constant zero, $\mu(x) = 0$ (Rasmussen, 2003), and the entries of the covariance matrix are specified using a kernel function. By definition, a kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ maps a pair of inputs to a real-valued output that measures the correlation between the pair based on the closeness of the points in the input space. As $\mathcal{X}$ is combinatorial, we need particular kernels to obtain a measure of correlation, which we introduce in Section 4.1.2. For the details of GP modelling we refer readers to Appendix C.
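To make the zero-mean GP surrogate concrete, the sketch below computes the posterior mean and variance over sequences with NumPy. It is an illustration under stated assumptions rather than the AntBO code: the function names are hypothetical, the kernel is a simple exponentiated overlap between equal-length sequences chosen as a stand-in categorical kernel, and hyperparameters are fixed instead of learned.

```python
import numpy as np


def overlap_kernel(a, b, theta):
    """Stand-in categorical kernel: exp(sum_i theta_i * [a_i == b_i])."""
    same = np.array([x == y for x, y in zip(a, b)], dtype=float)
    return float(np.exp(np.dot(theta, same)))


def gram(xs, ys, theta):
    """Kernel matrix between two lists of sequences."""
    return np.array([[overlap_kernel(x, y, theta) for y in ys] for x in xs])


def gp_posterior(train_x, train_y, test_x, theta, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test sequences."""
    y = np.asarray(train_y, dtype=float)
    k_tt = gram(train_x, train_x, theta) + noise * np.eye(len(train_x))
    k_ts = gram(train_x, test_x, theta)
    k_ss_diag = np.array([overlap_kernel(x, x, theta) for x in test_x])

    # Cholesky factorisation gives a numerically stable solve.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v * v, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off
```

At a sequence that has already been observed, the posterior mean reproduces the observation and the variance collapses towards the noise level; at a sequence sharing no residues with the data, the variance stays close to the prior variance. This difference in uncertainty is what the acquisition function uses to balance exploration and exploitation.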
We use the following three kernel functions $k$ in our work.

a) Transformed Overlap Kernel (TK). The transformed overlap kernel defines the similarity measure as $k(x, x') = \exp\left(\sum_{i=1}^{L} \theta_i \, \delta(x_i, x'_i)\right)$, where $\{\theta_i\}_{i=1}^{L}$ are the lengthscale parameters that learn the sensitivity of the input dimensions, allowing the GP to learn complex functions.

b) ProteinBERT Kernel (ProtBERT). We introduce a new deep kernel for protein design based on the success of BERT. The ProteinBERT (Brandes et al., 2021) model is a transformer neural network trained on millions of protein sequences over 1000s of GPUs. We use the encoder of the pretrained ProteinBERT model followed by a standard RBF kernel to measure the similarity between a pair of inputs.

c) Fast String Kernel (SSK). The SSK (Leslie et al., 2004) defines the similarity between two sequences by measuring the number of common substrings of order $l$. Let $\Sigma^l$ be the set of all possible ordered substrings of length $l$ over the alphabet; details of the string kernel can be found in Appendix C.2.

The acquisition function refers to the criterion for drawing new samples (in our problem, protein sequences) from the posterior of our Gaussian Process to improve the black-box output (energy). The goal of expected improvement (EI) (Močkus, 1975) is to search for a data point that provides an expected improvement over the already observed data points. Suppose we have observed $n$ data points $D_n = \{(x_1, f(x_1)), \dots, (x_n, f(x_n))\}$. The EI is then defined as an expectation under the GP posterior distribution given $D_n$, $\alpha_{EI}(x \mid D_n) = \mathbb{E}_{f(x) \mid D_n}\left[\max(f(x^*) - f(x), 0)\right]$, where $x^*$ is the best (lowest-energy) point observed so far. We only consider EI; however, many other acquisition functions exist (Grosnit et al., 2021a).

In this work, we take inspiration from the combinatorial trust region (TR) (Wan et al., 2021) to propose a method that utilises the CDRH3 developability constraints in the TR. Specifically, at each iteration $t$ we define a TR around the best point $x^*$ that includes all points that satisfy the antibody design constraints introduced in Section 3.1 and differ from $x^*$ in at most $L_t$ indices.
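The acquisition step can be sketched as follows, assuming a Gaussian posterior at each candidate and the minimisation convention (lower energy is better). The closed-form EI below is the standard expression for that setting; the hill-climbing loop is one plausible reading of a trust-region local search that mutates one residue at a time. All names are hypothetical, and `is_developable` stands for any constraint checker supplied by the caller.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expected_improvement(mean, std, best):
    """Closed-form EI for minimisation under a posterior N(mean, std**2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def in_trust_region(x, centre, radius, is_developable):
    """Membership test: within `radius` mutations of the centre and developable."""
    return hamming(x, centre) <= radius and is_developable(x)


def local_search(centre, radius, acquisition, is_developable, n_steps, rng):
    """Hill-climb the acquisition value inside the trust region."""
    best, best_value = centre, acquisition(centre)
    for _ in range(n_steps):
        pos = rng.randrange(len(best))
        new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != best[pos]])
        candidate = best[:pos] + new_aa + best[pos + 1:]
        if not in_trust_region(candidate, centre, radius, is_developable):
            continue  # reject points outside the constrained trust region
        value = acquisition(candidate)
        if value > best_value:  # keep the candidate only if it improves
            best, best_value = candidate, value
    return best, best_value
```

In a full loop, `acquisition` would wrap the GP posterior, for example `lambda s: expected_improvement(mu(s), sigma(s), f_best)` with `mu` and `sigma` supplied by the surrogate, and `rng` would be a seeded `random.Random` instance for reproducibility.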
We then perform acquisition maximisation within the CDRH3-TR,

$$\mathrm{TR}_{L_t}(x^*) = \left\{ x \;\middle|\; \text{CDRH3-Developable}(x) = 1, \; \sum_{i} \left(1 - \delta(x_i, x^*_i)\right) \le L_t \right\}, \qquad (2)$$

where $\delta(\cdot, \cdot)$ is the Kronecker delta function. We start with the previous best $x^*$ and sample a neighbour point $x^*_{\text{Neigh.}}$ contained within the CDRH3-TR by selecting a random position and replacing its amino acid with a new one. We store the sequence if it improves upon the previous one. Figure 1 illustrates this process. See Appendix 1 for the detailed algorithm.

Here, we outline the baseline methods for comparison and discuss our key findings. We indicate the kernel choice of our framework directly in the label, e.g. AntBO SSK, AntBO TK, AntBO ProtBERT. Baselines: We also compare our work with several other combinatorial black-box optimisation methods such as HEBO. We remind the reader that super+ is the best energy score out of the 6.9 million (6.9M) experimentally obtained murine CDRH3s available from Absolut!. For a fair comparison, we apply the developability criteria defined by the CDRH3 trust region in all methods. For an explanation of the algorithms, including the configuration of hyperparameters and implementation details, we refer readers to Appendix D.

Figure 3: We analyse the developability scores of 200 proteins designed by each method (with a fixed seed of 42) to examine the diversity of suggested proteins across a single trial. Here, we report developability scores for the S protein from the SARS-CoV virus (PDB id: 2DD8). For each method, the landscape of sequences suggested during the optimisation process is shown with their binding affinity and three developability scores (hydropathicity, charge and instability). Interestingly, we observe a positive correlation between hydropathicity and energy. While other methods have a larger spread of charge, AntBO favourably suggests the most points with a neutral charge.
Overall we conclude that energetically favourable sequences still explore a diverse range of developability scores and that protein designs of AntBO are more stable compared to other methods.

For the primary analysis in the main paper, we use twelve core antigens: their protein data bank (PDB) id, antigen chain, and the name of the associated disease are provided in Table 2 of the Appendix. Our choice of antigens is based on their interest in several studies (Akbar et al., 2021b). Precise wet-lab evaluation of an antibody is a tedious process and comes with a significant experimental burden because it requires purifying the antibody and testing its affinity (Rawat et al., 2021). We therefore first investigate the sample efficiency of all optimisation methods. We ran experiments with a pre-specified budget of 200 function calls and report the convergence curve of the number of protein designs versus minimum energy (or binding affinity) in Figure 2. Core antigen experiments are run with ten random seeds, the remaining antigens with three. We report the mean and 95% confidence interval of the results. We observe that AntBO TK achieves the best performance with respect to minimising energy (maximising affinity), typically reaching high affinity within 200 protein designs with no prior knowledge of the problem. AntBO TK can find CDRH3 sequences that achieve significantly better affinity than very-high affinity sequences from the experimentally obtained Absolut! 6.9M database. For the majority of antigens, AntBO outperforms the best CDRH3 sequence evaluated by Absolut!. We also evaluate our approach on the remaining 176 antigens in the Absolut! antibody-antigen binding database. Results for those are provided in Appendix F.

Table 1: We report the average number of protein designs in Absolut! needed to reach low, high, very high and super affinity (top 5%, 1%, 0.1%, 0.01% quantiles from the Absolut! 6.9M database) in successful trials. Note that this analysis does not include the unsuccessful trials that failed to reach the given affinity category. This analysis is presented in Appendix 5. We denote by super+ the number of designs required to beat the best CDRH3 in the 6.9M database. The various binding categories are taken from existing works (Akbar et al., 2021b). We take the best performing AntBO, with a transformed overlap kernel, and compare it against GA and RS across 188 antigens for three seeds per antigen per method. We denote categories in which a method obtained no samples with −. Our results demonstrate that AntBO significantly reduces the number of protein designs required to reach important categories (quantiles) of affinity.

We next investigate how many protein designs it takes on average for AntBO TK to achieve this landmark across all antigens. For this purpose, we take five affinity groups from existing works (Akbar et al., 2021b): low affinity (5%), high affinity (1%), very high affinity (0.1%), super (0.01%) and super+ (best), and report the average number of protein designs needed to suggest a sequence in the respective classes for 188 antigens. In Table 1 we report the results of the best performing AntBO TK and other baselines. AntBO TK reaches the very high-affinity class in around 38 protein designs and the super class in around 50 designs, and needs only 85 designs to outperform the best available sequence. This sample efficiency of AntBO TK demonstrates its superiority and relevance in the practical world. We remark that the optimisation process of AntBO methods starts with random initial points, which allows exploration of the loss landscape, including the space of non-binders. We hypothesise that this awareness of the non-binder landscape gives AntBO better chances of finding very-high affinity sequences, in contrast to protein engineering methods that start only from the knowledge of high-affinity binders.
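The bookkeeping behind the affinity classes of Table 1 can be sketched as follows. This is an illustrative reading, not the authors' evaluation script: it assumes lower energy means higher affinity, derives each class threshold from the corresponding top fraction of a reference database of energies, and counts designs until a trajectory first reaches a threshold. Function names and the toy data in the usage note are hypothetical.

```python
# Size of each affinity class, in sequences per 10,000 database entries
# (5%, 1%, 0.1% and 0.01% of the database respectively).
TOP_PER_10K = {"low": 500, "high": 100, "very high": 10, "super": 1}


def affinity_thresholds(database_energies):
    """Energy threshold for each class; super+ is the best database energy."""
    ordered = sorted(database_energies)  # most negative (best) first
    n = len(ordered)
    thresholds = {}
    for name, per_10k in TOP_PER_10K.items():
        k = max(1, n * per_10k // 10000)  # sequences in the top fraction
        thresholds[name] = ordered[k - 1]
    thresholds["super+"] = ordered[0]
    return thresholds


def designs_to_reach(trajectory, threshold, strict=False):
    """1-based index of the first design at or below the threshold.

    With strict=True the design must beat the threshold, as for super+.
    Returns None for an unsuccessful trial.
    """
    for step, energy in enumerate(trajectory, start=1):
        reached = energy < threshold if strict else energy <= threshold
        if reached:
            return step
    return None
```

With a toy database `[-i for i in range(10000)]`, the `low` threshold is -9500 and the `super+` threshold is -9999. Averaging `designs_to_reach` over seeds and antigens while skipping `None` entries mirrors the "successful trials only" convention stated in the caption of Table 1.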
Due to a limited computational budget, we could not run all methods across all antigens; however, we report similar results in the Appendix (Table 4) for all methods across the 12 core antigens. We noticed that for the S protein of 1NSN, the P protein of 2JEL, and a few other antigens in the Appendix, AntBO gets close to the best experimental sequence but does not beat its affinity. We attribute this result to the complexity of the 3D lattice representation of an antigen, which might require more protein designs to explore the CDRH3 optimisation landscape. We wish to study this effect in future work.

Next, we look at the developability scores of 200 CDRH3 designs on the binding affinity landscape. The developability scores we used in the CDRH3-TR are a few of many biophysical properties. As discussed in Section 3.1, more scores can be added as constraints. However, that also adds an extra computational cost in finding an optimum sequence. Here, in addition to charge, we report hydrophobicity (HP) and the instability index, which have been used in other studies for assessing downstream risks of antibodies (Mason et al., 2021). A smaller instability index value means the sequence has high conformational stability, and in practical scenarios a score of less than 40 is desired. CDRH3 regions tend to aggregate when developing antibodies, which makes the resulting antibodies impractical to develop. This phenomenon is due to the presence of hydrophobic regions. A low value of HP means a sequence has a lower tendency to aggregate. We use the Biopython (Chapman and Chang, 2000) package to compute the HP and instability scores. Figure 3 shows hexagonal binning plots of binding affinity against the developability scores for the S protein of the SARS-CoV virus (PDB id: 2DD8), which is responsible for the entry of the virus into the host cell and is an important therapeutic target for effective neutralisation of the virus.
This analysis helps us understand how optimising energy affects the biophysical properties of protein sequences. The hexagons discretise the space, with binding affinity on the vertical axis and developability score on the horizontal axis. The shade of each hexagon shows how frequently designs fall within the corresponding range of binding affinity and developability score. Above each plot is a histogram of the binding affinity of the 200 designs, and on the right is a histogram of the developability scores. We observe that the distribution of the three scores varies across methods, which shows that their designed sequences are distinct. Interestingly, the performance on developability scores that were not included in the constraints demonstrates that the AntBO methods can identify sequences with diverse developability parameters. This observation suggests that our approach is suitable for exploring sequences towards high affinity and selecting candidates in a desired developability region. Thus, we conclude that AntBO is a more practically viable method for antibody design. We carried out a similar analysis for 11 additional antigens; the results are described in Section F of the Appendix.

We have proposed AntBO, a combinatorial BO framework for designing the CDRH3 regions of antibodies. AntBO uses the developability criteria of antibodies to construct a trust region of feasible sequences in the combinatorial space, thus allowing us to design antibodies with desired biophysical properties. Our results across several antigens demonstrate the efficiency of AntBO in finding sequences that outperform many baselines, including the best CDRH3 obtained from Absolut!'s 6.9 million database. AntBO can suggest very-high-affinity sequences with an average of 38 protein designs and a super binding sequence in under 100. In the future, we wish to investigate AntBO on improved structure prediction and docking simulation models.
The authors would like to thank Puneet Rawat and Rahmad Akbar from the Greiff Lab, University of Oslo, as well as Simon Mathis, Arian Jamasb and Ryan-Rhys Griffiths from the University of Cambridge, for their involvement in discussion and feedback on the paper.

A.1 Methods to improve antibody developability

A list of therapeutically relevant developability parameters with related methods can be found in Akbar et al. (2022). These parameters include solubility, charge, aggregation, thermal stability, viscosity, immunogenicity (i.e., the antibody should not induce an immune response, which might also cause its faster clearance by the body), glycosylation motifs, and the in vivo half-life. Although the whole antibody sequence can be modified to improve developability, the CDRH3 region also seems to have a critical impact on these parameters, beyond affinity and antigen recognition alone (Grevys et al.); it is therefore crucial to include developability constraints in CDRH3 design. Moreover, many parameters can be calculated in advance from the antibody sequence using experimentally validated estimators (Akbar et al., 2022), which allows the boundaries of the search space to be defined according to development needs.

(2021); Weber et al. (2020). Beyond virtually unconstrained size, the other major advantage of simulated data is that they represent ground truth, i.e., the rules underlying the generation of the data are known and can be tested prospectively, for example, for benchmarking model interpretability. In the context of antibody-antigen binding, the simulation suite Absolut! was proposed: an in silico binding affinity simulation framework that allows fast computation of antibody-antigen affinity while providing both sequence and structural information on the antibody-antigen binding site. Its authors also use Absolut! to annotate an extensive database of native human and mouse antibody sequences with antigen-binding affinity. Absolut!
simulated data has proven useful: the authors could benchmark different structure-based or sequence-based ML encoding strategies that improve paratope-epitope prediction, and test under which conditions a deep learning antibody generative model (Akbar et al., 2021b) can generate antibody sequences with a prespecified paratope-epitope affinity.

Protein engineering methods (Romero and Arnold, 2009; Goldsmith and Tawfik, 2017; Zeymer and Hilvert, 2018) use evolutionary methods to explore the combinatorial space of protein sequences. They use directed evolution, an iterative protocol of mutation and selection followed by screening, to identify sequences with improved diversity and functional properties. However, the approach suffers from high experimental costs due to inefficient screening methods. To overcome this experimental hurdle, Yang et al. (2018) propose an ML pipeline for protein engineering. The central idea is to use measurements of known protein sequences to train an ML model that can then guide the evolution of protein sequences. However, these methods have not been used for antibody design due to limited data on antibody specificity.

The Absolut! suite uses Latfit (Mann et al., 2012) to transform the PDB structure of an antigen into 3D lattice coordinates. The PDB structure represents each residue in a protein sequence using 3D coordinates. Latfit maps these coordinates to discretised lattice positions by optimising the dRMSD (Root Mean Square Distance) between the original PDB structure and many possible lattice reconstitutions of the same chains. Specifically, for a sequence of length L, Latfit first assigns a lattice position to a starting residue and then enumerates all neighbouring sites to select the one with the best dRMSD to the PDB coordinate of the next residue. The generated nascent lattice structures are rotated to better match the original PDB structure before the next AA is added.
This process is repeated sequentially, and at each step Latfit keeps track of the N best structures of length K to find the best position of the next residue.

Algorithm 1: Antibody Bayesian Optimisation (AntBO)
Input: Objective function f : X → R, number of evaluations N, alphabet size of categorical variable K.
Randomly sample an initial data set
for i = 1, ..., N do
    Fit a GP surrogate g on D_i
    Construct a CDRH3-TR L_i(x*) around the best point x* = arg min_{x ∈ D_i} g(x) using Equation 2 in the main document.
    Optimise constrained acquisition

Absolut! uses the Ymir framework (Robert et al., 2021b) to represent protein structures as a 3D lattice model. A protein's primary structure is a sequence of amino acids (AAs); in a 3D lattice structure, each AA occupies a single position, and consecutive AAs occupy neighbouring sites. This layout only permits a fixed inter-AA distance with joint angles of 90 degrees. The structure of the protein is specified by a starting position in the grid and a sequence of relative moves (up (U), down (D), left (L), right (R)) that determine the position of the next AA. To specify the sequence of moves, a coordinate system is first defined with the starting point as the observer, and the next move is expressed relative to the observer. A backwards (B) move is possible for the first move only; it is not allowed at other positions, to prevent collisions.

In this stage, the lattice structures of the two proteins are used to compute their binding affinity. Since the structure of the antibody is not known a priori for a specific antigen, all possible foldings of the CDRH3 are generated recursively and stored in memory. As the number of possible foldings grows combinatorially with the length of the sequence, Absolut! restricts the size of the CDRH3 sequence to 11 and limits the search to structures with a realistic minimum number of contact points (10) with the target antigen.
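To make the lattice representation described above concrete, the following sketch checks the two structural rules of a lattice chain: each AA occupies its own site, and consecutive AAs sit on neighbouring sites. It works on absolute integer coordinates rather than on the relative move strings used by Ymir, and the function name is hypothetical.

```python
def is_valid_lattice_chain(coords):
    """Check a chain given as a list of (x, y, z) integer sites, one per AA, in sequence order."""
    # Each AA occupies a single position, so no site may be used twice.
    if len(set(coords)) != len(coords):
        return False
    # Consecutive AAs must occupy neighbouring sites, i.e. differ by
    # exactly one unit step along a single axis.
    for (x1, y1, z1), (x2, y2, z2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) + abs(z1 - z2) != 1:
            return False
    return True

# A three-residue chain with one 90-degree turn.
chain = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(is_valid_lattice_chain(chain))
```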
After we obtain the lattice structure of the antigen and the list of precomputed structures for the CDRH3 sequences, the binding affinity of one structure is described as a sum of three terms: (a) the binding energy, i.e. the interaction between residues of the antibody and residues of the antigen; (b) the folding energy of the antibody, i.e. the interactions among the residues of the antibody; and (c) the folding energy of the antigen, i.e. the interactions among the residues of the antigen. Since the antigen structure is fixed a priori, the third term is constant and can be ignored. Consider the lattice positions and residues of an antigen sequence, $(S, R)$, and of an antibody sequence, $(G, K)$. The binding energy $E_{\mathrm{bind}}$ is defined as the sum of all interaction potentials between antigen and antibody residues, and the folding energy $E_{\mathrm{fold}}$ of an antibody is defined as the sum of the intra-bonds between its AAs,

$$E_{\mathrm{bind}} = \sum_{i=1}^{|S|} \sum_{j=1}^{|G|} I(S_i, G_j)\, A(R_i, K_j), \qquad E_{\mathrm{fold}} = \sum_{j=1}^{|G|} \sum_{k=j+1}^{|G|} I(G_j, G_k)\, A(K_j, K_k),$$

where $A(\cdot, \cdot)$ is the interaction potential of two residues, determined via the Miyazawa-Jernigan interaction potential (Miyazawa and Jernigan, 1996), and $I(a, b)$ is an indicator function that takes the value 1 if $a$ and $b$ are non-covalent neighbours in the lattice and 0 otherwise. To evaluate an arbitrary CDRH3 sequence, the precomputed structures are filled one by one with the residues of the CDRH3, and their total energy is computed as $E_{\mathrm{total}} = E_{\mathrm{fold}} + E_{\mathrm{bind}}$; this step is known as exhaustive docking. The best structure is then selected as the one with the minimum total energy. Absolut! performs this computation for sequences of length 11; if the CDRH3 is longer than 11, the same process is repeated for all subsequences of length 11 with a stride of 1 from left to right.

C GP Fitting

C.1 GP prediction

Let $(X, y)$ be a set of training data points and $X_*$ be a set of test data points. To fit a GP, we parameterise the hyperparameters of a kernel and maximise the marginal log-likelihood (MLL) using the data.
Specifically, we define $K(X, X)$ as the covariance matrix of the training samples, $K(X, X_*)$ and $K(X_*, X)$ as the covariance matrices of train-test pairs and vice versa, and $K(X_*, X_*)$ as the covariance matrix of the test samples. The posterior distribution over the test samples is obtained by conditioning on the training observations,

$$f_* \mid X_*, X, y \sim \mathcal{N}\big(K(X_*, X) K(X, X)^{-1} y,\; K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)\big).$$

GP Training

We fit the GP by optimising the negative MLL using Adam (Kingma and Ba, 2014). The kernel functions in GPs come with hyperparameters that adjust the fit of a GP; for example, the SE kernel described above has a lengthscale hyperparameter that acts as a filter to tune the contribution of the various frequency components in the data. In a standard setup, the optimum value of the hyperparameters is obtained by minimising the negative marginal log-likelihood,

$$-\log p(y \mid X, \theta) = \tfrac{1}{2} \log \big\lvert K_\theta(X, X) + \sigma^2 I \big\rvert + \tfrac{1}{2}\, y^\top \big(K_\theta(X, X) + \sigma^2 I\big)^{-1} y + \tfrac{N}{2} \log(2\pi),$$

where $\theta$ is the set of kernel hyperparameters and $\lvert \cdot \rvert$ is the determinant operator.

where $x_{ij}$ is a length-$j$ subsequence of sequence $x$, $\theta = \{\theta_m, \theta_g\}$ are the kernel hyperparameters, $\theta_m, \theta_g \in [0, 1]$ control the relative weighting of long and non-contiguous subsequences, $I_y[x]$ is an indicator function set to 1 if the strings $x$ and $y$ match and 0 otherwise, and $\phi^{\theta}_{y}(x)$ measures the contribution of subsequence $y$ to sequence $x$.
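The GP posterior and the negative MLL given in this appendix can be sketched numerically as follows. This is a minimal NumPy illustration under stated assumptions: it uses a squared-exponential kernel on real-valued inputs (AntBO itself uses kernels over sequences), adds the noise term $\sigma^2 I$ to $K(X, X)$ in both functions for numerical stability, and all function names are illustrative rather than taken from the AntBO code.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-6, lengthscale=1.0):
    """Posterior mean and covariance of f* given training data (X, y)."""
    K = se_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    K_s = se_kernel(X_star, X, lengthscale)        # K(X*, X)
    K_ss = se_kernel(X_star, X_star, lengthscale)  # K(X*, X*)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    V = np.linalg.solve(L, K_s.T)                  # L^{-1} K(X, X*)
    mean = K_s @ alpha
    cov = K_ss - V.T @ V
    return mean, cov

def negative_mll(X, y, noise=1e-6, lengthscale=1.0):
    """Negative marginal log-likelihood of y under the GP prior."""
    K = se_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log|K| equals the sum of the log-diagonal of the Cholesky factor.
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.5])
mean, cov = gp_posterior(X, y, X)  # predict back at the training inputs
```

Using a Cholesky factorisation instead of an explicit matrix inverse is a common design choice for stability; with a very small noise term, the posterior mean at the training inputs closely reproduces the observations.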
        p_j ← sample with j-th highest fitness from P_i    // Get sample with the next highest fitness
        P_{i+1} ← P_{i+1} ∪ p_j    // Add this sample to the next population
        Q_i ← Q_i ∪ p_j    // Add this sample to the list of parents
    for j = N_elite + 1, ..., N_pop where j increases in steps of 2 do
        q_1, q_2 ∼ Q_i    // Randomly sample two parents
        constraint_satisfied = False
        while not constraint_satisfied do
            η_1, η_2 = crossover(q_1, q_2, p_c)    // Perform crossover to generate two offspring
            η_1, η_2 ← mutate(η_1, p_m), mutate(η_2, p_m)    // Mutate both offspring
            constraint_satisfied = C(η_1) ∧ C(η_2)    // Check that both offspring satisfy all constraints

Given a computational budget of N black-box function evaluations in a constrained optimisation setting, RS samples N candidates that satisfy the specified constraints and evaluates the black-box function at those samples. The best candidate is the one with the minimum cost.

The Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO; Cowen-Rivers et al., 2020) is the winning solution of the NeurIPS 2020 black-box optimisation (BBO) challenge (Turner et al., 2021). HEBO is designed to tackle BBO problems with continuous or categorical variables, and handles categorical values by transforming them into one-hot encodings. On the modelling side, effort is made to correct for the potential heteroscedasticity and non-stationarity of the objective function, which can be hard to capture with a vanilla GP. To improve the modelling capacity, parametrised non-linear input and output transformations are combined with a GP with a constant mean and a Matérn-3/2 kernel. When fitting the dataset of observations, the parameters of the transformations and of the GP are learned together by minimising the negative marginal likelihood using the limited-memory BFGS (LBFGS) optimiser.
When suggesting a new point, HEBO accounts for the imperfect fit of the model, and for the potential bias induced by the choice of a specific acquisition function, by using a multi-objective acquisition framework that looks for a Pareto-front solution. The non-dominated sorting genetic algorithm II (NSGA-II), an evolutionary method that naturally handles constrained discrete optimisation, is run to jointly optimise the Expected Improvement, the Probability of Improvement, and the Upper Confidence Bound. The final suggestion is queried from the Pareto front of the valid solutions found by NSGA-II, which is run with a population of 100 candidate points for 100 optimisation steps. The HEBO results presented in this paper are obtained by running the official implementation by Cowen-Rivers et al. (2020) at https://github.com/huawei-noah/HEBO/tree/master/HEBO.

When optimising high-dimensional black-box functions, BO solvers face the difficulty of finding good hyperparameters to fit a global GP over the entire domain, as well as the challenge of directly exploring an exponentially growing search space. Eriksson et al. (2019) introduce local BO solvers to alleviate these issues. The key idea is to use local BO solvers in separate subregions of the search space, leading to a trust region BO algorithm (TuRBO). A TR is a hyperrectangle characterised by a centre point and a side length $L$, similar to what we describe in Section 4.2. A local GP with constant mean and Matérn-5/2 ARD kernel is fitted to the points lying in the TR to better capture the objective function's behaviour in this subdomain. The GP fit is obtained by optimising the negative MLL using Adam (Kingma and Ba, 2014). The size of the TR is adjusted dynamically as new points are observed. The side length $L$ is doubled (up to $L_{\max}$) after $\tau_{\mathrm{succ}}$ consecutive improvements of the observed black-box values, and is halved after $\tau_{\mathrm{fail}}$ consecutive failures to find a better point in the TR.
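The side-length schedule described in this section (doubling, halving, and a restart once the length shrinks to its minimum) can be sketched as a small state machine. This is an illustrative sketch, not the TuRBO implementation: the class and attribute names are hypothetical, and the restart condition is written as `length <= l_min` based on the description in the text.

```python
from dataclasses import dataclass

@dataclass
class TrustRegion:
    length: float   # current side length L
    l_min: float    # L_min: the TR is terminated once L shrinks to this value
    l_max: float    # L_max: upper bound on L
    l_init: float   # L_init: side length of a freshly initialised TR
    tau_succ: int   # consecutive improvements needed to double L
    tau_fail: int   # consecutive failures needed to halve L
    n_succ: int = 0
    n_fail: int = 0

    def update(self, improved: bool) -> bool:
        """Update counters and side length; return True if the TR was restarted."""
        if improved:
            self.n_succ, self.n_fail = self.n_succ + 1, 0
        else:
            self.n_succ, self.n_fail = 0, self.n_fail + 1
        if self.n_succ == self.tau_succ:      # expand after consecutive successes
            self.length = min(2.0 * self.length, self.l_max)
            self.n_succ = 0
        elif self.n_fail == self.tau_fail:    # shrink after consecutive failures
            self.length /= 2.0
            self.n_fail = 0
        if self.length <= self.l_min:         # terminate and start a new TR
            self.length = self.l_init
            self.n_succ = self.n_fail = 0
            return True
        return False

tr = TrustRegion(length=1.0, l_min=0.125, l_max=2.0, l_init=1.0, tau_succ=2, tau_fail=2)
```

Calling `tr.update(improved)` once per evaluated point keeps the region large while the search keeps improving and contracts it around the incumbent otherwise.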
The TR is terminated whenever $L$ shrinks to a value $L_{\min}$, and a new TR is initialised with side length $L_{\mathrm{init}}$. The next point to evaluate is selected using the Thompson Sampling strategy, which ideally consists of drawing a function $f$ from the GP posterior and finding its minimiser. However, it is impossible to draw a function directly over the entire TR; therefore, a set of min(100d, 5000) candidate points covering the TR is used instead. Function values are sampled from the surrogate model's joint posterior at these candidate points, and the candidate point achieving the lowest sampled value is acquired. In our experiments, we only acquire suggested points that fulfil the developability constraints. We rely on the TuRBO implementation provided in the BBO challenge (Turner et al., 2021) codebase at https://github.com/rdturnermtl/bbo_challenge_starter_kit/tree/master/example_submissions/turbo.

To adapt the BO framework to combinatorial problems, Oh et al. (2019) proposed COMBO, which represents each element of the discrete search space as a node in a combinatorial graph. A GP surrogate model is then trained for the task of node regression using a diffusion kernel over the combinatorial graph. However, the graph grows exponentially with the number of variables, making it impractical to compute its diffusion kernel directly. To address this issue, the authors express the graph as a cartesian product of subgraphs. This decomposition allows the graph diffusion kernel to be computed as a product of kernels on the subgraphs. The diffusion kernel is computed efficiently using the Fourier transform. The hyperparameters of the GP model, such as kernel scaling factors, signal variance, noise variance, and constant mean value, are obtained using 100 slice sampling steps at the beginning and ten slice sampling iterations afterwards.
Once we obtain the GP fit, it remains to optimise an acquisition function over the combinatorial space, which is done by applying a breadth-first local search (BFLS) from 20 starting points selected from 20,000 evaluated random vertices. Since COMBO does not support constraints on the validity of the suggested sequences, we modify the acquisition optimisation to incorporate the CDRH3-TR introduced in 2. We use the default hyperparameters that we provide on

Genetic algorithms (GAs) are inspired by Charles Darwin's theory of natural selection. The idea is to use probabilistic criteria to draw new population samples from the current population. This sampling is generally done via crossover and mutation operations (Sastry et al., 2005). Overall, the primary operations involved in a GA are encoding, crossover, mutation, and selection (Katoch et al., 2021). For encoding, we use a general ordinal encoding scheme that assigns a unique integer to each AA. This scheme is inspired by binary encoding, where each gene represents an integer in 0-1, and by hexadecimal encoding, where each gene represents an integer in 0-15 (or 0-9, A-F) (S.N. Sivanandam, 2008a). In our work, each gene corresponds to one letter of the CDRH3 sequence and takes a value in 0-19. For selection, we use the elitism mechanism (De Jong, 1975), which carries a few of the best solutions in the current population over to the next population. Our mutation operator is inspired by the most commonly used bit-flipping mutation (S.N. Sivanandam, 2008b), which flips a bit of each gene with a given probability. Since our range is different, we instead replace a gene with a random value in 0-19. Finally, for crossover, we use uniform crossover, which offers unbiased exploration and better recombination (Katoch et al., 2021). The pseudocode of the GA is given in Algorithm 2.

We run all our experiments on a Linux server with 87 cores and 12 GB of GPU memory. We use Python for the implementation of our framework and Absolut! for simulating the energy of the antibody-antigen complex.
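The GA baseline described above (ordinal encoding in 0-19, elitism, uniform crossover, random-replacement mutation, and resampling until the constraints hold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in our experiments: `fitness` is any function to be minimised, `constraint` stands for the developability check C, the crossover probability is applied per gene, and the toy objective and constraint in the usage lines are hypothetical.

```python
import random

N_AA = 20  # genes are integers 0-19, one per amino acid

def uniform_crossover(q1, q2, p_c, rng):
    """Swap each gene between the two parents with probability p_c (assumed per gene)."""
    c1, c2 = list(q1), list(q2)
    for k in range(len(c1)):
        if rng.random() < p_c:
            c1[k], c2[k] = c2[k], c1[k]
    return c1, c2

def mutate(seq, p_m, rng):
    """Replace each gene by a random value in 0-19 with probability p_m."""
    return [rng.randrange(N_AA) if rng.random() < p_m else g for g in seq]

def ga_step(population, fitness, constraint, n_elite, p_c, p_m, rng):
    """One generation: keep the n_elite best (lowest fitness), fill the rest with offspring."""
    ranked = sorted(population, key=fitness)
    parents = ranked[:n_elite]             # elites double as the parent pool (needs n_elite >= 2)
    next_pop = [list(p) for p in parents]  # elitism: the best solutions survive unchanged
    while len(next_pop) < len(population):
        q1, q2 = rng.sample(parents, 2)
        while True:  # resample until both offspring satisfy the constraints
            c1, c2 = uniform_crossover(q1, q2, p_c, rng)
            c1, c2 = mutate(c1, p_m, rng), mutate(c2, p_m, rng)
            if constraint(c1) and constraint(c2):
                break
        next_pop.extend([c1, c2])
    return next_pop[:len(population)]

# Toy usage with a hypothetical objective and constraint.
rng = random.Random(0)
toy_fitness = lambda s: sum(s)         # stands in for the binding energy
toy_constraint = lambda s: s[0] != 0   # stands in for the developability check
pop = [[rng.randrange(1, N_AA) for _ in range(11)] for _ in range(10)]
pop = ga_step(pop, toy_fitness, toy_constraint, n_elite=4, p_c=0.5, p_m=0.1, rng=rng)
```

Because the elites are copied unchanged into the next population, the best fitness found so far can never get worse from one generation to the next.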
The architecture of the framework can be seen in part (a) of Figure 4. The dataloader, execution and summariser layers are abstracted and integrated into the training, leaving only the optimiser for developers to design. Developers can also optionally include a Gaussian process, a neural network or an arbitrary model to use with the optimiser. The platform has three important features that facilitate training:

• Distributed Training: Multiple CPU processes sample data in parallel environments, which is especially useful for algorithms with low data efficiency such as deep reinforcement learning. Multiple CPU processes are also used to evaluate the binding energy with Absolut!, which reduces the evaluation time. Multi-GPU training is available for algorithms that use neural networks.

• Real Time Visualisation: The training results of the optimiser are updated in real time. Our framework offers visualisation of the training graph, of the minimum binding energy obtained so far per iteration together with the corresponding sequence, and of the antigen docking.

• Gym Environment: Our framework offers a highly reusable gym environment containing the objective function evaluator via Absolut!. Developers can set the antigen to evaluate and the CDRH3 sequences to bind, and the environment returns the binding energy of the corresponding CDRH3 sequences. The gym environment has two options, SequenceOptim and BatchOptim. For SequenceOptim, the agent fills in one character per step until all characters of the CDRH3 are filled, at which point the episode stops. The reward is zero at every step except the last step of the episode, when the CDRH3 sequence is evaluated and the negative binding energy is returned as the reward. Since binding energies are negative, a lower binding energy corresponds to a higher reward.
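The SequenceOptim reward scheme described above can be illustrated with a small stand-in class. This is a sketch only: it mimics a gym-style `reset`/`step` interface without depending on the gym package, the class name is hypothetical, and `energy_fn` is a placeholder for the Absolut! evaluation rather than its actual interface.

```python
class SequenceOptimSketch:
    """Toy stand-in for the SequenceOptim option (not the framework's actual class)."""

    def __init__(self, energy_fn, length=11):
        self.energy_fn = energy_fn  # hypothetical oracle: sequence -> binding energy
        self.length = length        # number of characters in the CDRH3
        self.sequence = ""

    def reset(self):
        """Start a new episode with an empty sequence."""
        self.sequence = ""
        return self.sequence

    def step(self, amino_acid):
        """Append one character; the reward is 0 until the sequence is complete."""
        self.sequence += amino_acid
        done = len(self.sequence) == self.length
        # Only the final step evaluates the sequence and returns the negative energy.
        reward = -self.energy_fn(self.sequence) if done else 0.0
        return self.sequence, reward, done, {}

# Toy oracle: each alanine lowers the energy by one unit (purely illustrative).
toy_energy = lambda seq: -float(seq.count("A"))
env = SequenceOptimSketch(toy_energy, length=3)
env.reset()
```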
For BatchOptim, each episode has only one step: the agent passes a list of CDRH3 sequences for the antigen to the environment, and the returned reward is a list of binding energies corresponding to those CDRH3 sequences. SequenceOptim is useful for seq2seq optimisation and BatchOptim is useful for combinatorial optimisation.

We outline the hyperparameters used for all methods in Table 3. For BERT, we use the pretrained "prot_bert_bfd" model available from Brandes et al. (2021).

Here we provide additional results to demonstrate the performance of AntBO. We first report the convergence curves of the number of proteins evaluated by Absolut! against the total energy (or binding affinity). We compare the best-performing BO method, AntBO TK, with the other baselines. The results are shown in Figures 5, 6, 7 and 8. We observe that AntBO consistently outperforms the other baseline methods. Our results demonstrate that AntBO can efficiently suggest antibodies for several antigens of interest. On average, it takes only approximately 38 evaluations to find a very-high-affinity sequence. This sample efficiency shows that AntBO is a vital development towards real-world antibody design. We also observe that, for the majority of antigens, AntBO TK outperforms AntBO ProtBERT. This finding is contrary to our assumption that a transformer pretrained on millions of protein sequences would provide a continuous representation that is a good inductive bias for the GP. We believe this could be associated with specific characteristics of antibody sequences that differ from general protein sequences. Nevertheless, AntBO ProtBERT performs on par with the other baselines.

We next look into the range of developability scores for the other 11 antigens identified in Section 5.1 of the paper. As in the analysis of the SARS-CoV virus, we report the diversity plots for these antigens in Figures 9, 10 and 11. A diverse range of scores is preferred. We observe that AntBO consistently designs antibodies with a stable structure.
Table 3: Hyperparameter configuration of the different optimisation methods.

It is known that, for a given antigen, several antibodies can achieve a similar binding affinity while differing in the residues of their primary structure (Akbar et al., 2021b). We run multiple trials of AntBO, each with a distinct set of initial points for the GP. This non-overlapping exploration allows the local search (in the acquisition maximisation step) to follow a different trajectory on the optimisation landscape, leading to a distinct local optimum. In Table 2, we report the binding energy and CDRH3 sequence obtained for twelve antigens under different trials.

Table 4: Here we analyse successful and unsuccessful trials in reaching the binding affinity categories. We report performance across all respective methods (across 10 trials and 12 antigens collectively) in terms of the number of protein designs needed to reach low, high, very high and super affinity (top 5%, 1%, 0.1%, 0.01% quantiles from the Absolut! 6.9M database). We denote by super+ the number of designs required to outperform the best CDRH3 in the 6.9M database. The binding categories are taken from existing works (Robert et al., 2021a; Akbar et al., 2021b). For every affinity class, we report three scores. For a given method, let $TE$ be a matrix of size [12 × 10, 200] (each trial lasts 200 iterations) of all trial affinities, let $I(TE_i \le c)$ be an indicator function that returns 1 if, for a given trial $i$, the method finds any affinity better than the value of affinity category $c$, and let $F$ be a function that returns the minimum number of samples required to reach affinity category $c$, or 0 if the trial did not reach it. In the first column we report the average number of protein designs $\sum_{i=1}^{N} F(TE_i, c)/N$ required to reach the respective affinity quantile value $c$. The second column is the proportion of trials $\sum_{i=1}^{N} I(TE_i \le c)/N$ that output a protein design better than the given affinity category $c$.
Ideally, the best method would attain the lowest value in the first column and a value of 1 in the second column, showing that it reaches the affinity category in all trials and does so in the lowest number of samples on average. Because both measurements are important, in the third column we report the ratio of the two values as an overall measure of performance, in which the reported mean number of samples required to reach an affinity category is penalised by the percentage of trials that failed to reach that category. The categories in which no samples by a method reach the affinity class are denoted by −. Our results demonstrate that AntBO significantly reduces the number of protein designs required to reach the important categories (quantiles) of affinity. We note that the better performance (Low and High Affinity) of TuRBO in the first column is due to its first 23 random exploration points, which are selected from a Latin hypercube. Although TuRBO reaches a lower average number of samples (∼ 82) when it finds a Super+ sequence, it does so rarely across experiments (9%), whereas AntBO TK finds Super+ sequences 54% of the time in ∼ 98 samples. The penalised ratio balances the probability of designing a Super+ sequence against the number of evaluations required, and strongly suggests that AntBO TK is the superior method.

Table 5: Here we analyse successful and unsuccessful trials in reaching the binding affinity categories. We report performance across all respective methods (across 10 trials and 188 antigens collectively) in terms of the number of protein designs needed to reach low, high, very high and super affinity (top 5%, 1%, 0.1%, 0.01% quantiles from the Absolut! 6.9M database). We denote by super+ the number of designs required to outperform the best CDRH3 in the 6.9M database. The binding categories are taken from existing works (Robert et al., 2021a; Akbar et al., 2021b). For every affinity class, we report three scores.
For a given method, let $TE$ be a matrix of size [188 × 10, 200] (each trial lasts 200 iterations) of all trial affinities, let $I(TE_i \le c)$ be an indicator function that returns 1 if, for a given trial $i$, the method finds any affinity better than the value of affinity category $c$, and let $F$ be a function that returns the minimum number of samples required to reach affinity category $c$, or 0 if the trial did not reach it. In the first column we report the average number of protein designs $\sum_{i=1}^{N} F(TE_i, c)/N$ required to reach the respective affinity quantile value $c$. The second column is the proportion of trials $\sum_{i=1}^{N} I(TE_i \le c)/N$ that output a protein design better than the given affinity category $c$. Ideally, the best method would attain the lowest value in the first column and a value of 1 in the second column, showing that it reaches the affinity category in all trials and does so in the lowest number of samples on average. Because both measurements are important, in the third column we report the ratio of the two values as an overall measure of performance, in which the reported mean number of samples required to reach an affinity category is penalised by the percentage of trials that failed to reach that category. The categories in which no samples by a method reach the affinity class are denoted by −. Our results demonstrate that AntBO requires significantly fewer protein designs across trials to reach the important categories (quantiles) of affinity.

Figure 9: Diversity of developability scores. We report the scores over ten runs with different random seeds.

Figure 10: Diversity of developability scores. We report the scores over ten runs with different random seeds.

Figure 11: Diversity of developability scores. We report the scores over ten runs with different random seeds.
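The three scores defined in the captions of Tables 4 and 5 can be computed as in the following sketch. It follows a literal reading of the captions, which is an assumption on our part: lower energy is better, F returns the 1-based index of the first design that reaches the category value c (and 0 if none does), and the third score is the first divided by the second. The function names and the toy trials are hypothetical.

```python
def designs_to_reach(trial_energies, c):
    """F(TE_i, c): 1-based index of the first design with energy <= c, or 0 if never reached."""
    for n, energy in enumerate(trial_energies, start=1):
        if energy <= c:
            return n
    return 0

def summarise(trials, c):
    """Return (mean designs, success proportion, penalised ratio) over all trials."""
    n_trials = len(trials)
    counts = [designs_to_reach(t, c) for t in trials]
    mean_designs = sum(counts) / n_trials                 # first column
    success = sum(1 for k in counts if k > 0) / n_trials  # second column
    # Third column; None plays the role of the "−" entry when no trial succeeds.
    ratio = mean_designs / success if success > 0 else None
    return mean_designs, success, ratio

# Four toy trials of three designs each, evaluated against a category value c = -5.
toy_trials = [[-1, -5, -9], [-2, -3, -4], [-9, -1, -1], [0, 0, 0]]
scores = summarise(toy_trials, c=-5)
```

In the toy example, two of the four trials reach the category (after 2 designs and 1 design respectively), so the success proportion is 0.5 and the mean over all trials is 0.75.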
A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding
In silico proof of principle of machine learning-based antibody design at unconstrained scale
Terje Andersen, and Victor Greiff. Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. mAbs
Second antibody modeling assessment (ama-ii)
Differentiable biology: Using deep learning for biophysics-based and data-driven modeling of molecular mechanisms
proabc-2: Prediction of antibody contacts v2 and its application to information-driven docking
Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks
Predicting antibody developability profiles through early stage discovery screening
Bayesian methods in global optimization
Proteinbert: A universal deep-learning model of protein sequence and function. bioRxiv
Application of asymmetric statistical potentials to antibody-protein docking
A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
Fold2seq: A joint sequence (1d)-fold (3d) embedding-based generative model for protein design
Biopython: Python tools for computational biology
Canonical structures for the hypervariable regions of immunoglobulins
Affinity enhancement of an in vivo matured therapeutic antibody using structure-based computational design
An antibody loop replacement design feasibility study and a loop-swapped dimer structure
Antibodies to watch in 2018
A review on genetic algorithm: Past, present, and future
Adam: A method for stochastic optimization
Auto-encoding variational bayes
Principles for designing ideal protein structures
Antibody i-patch prediction of the antibody binding site improves rigid local antibody-antigen docking
Improving b-cell epitope prediction and its application to global antibody-antigen docking
Paratome: An online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure
Predicting antibody affinity changes upon mutations by combining multiple predictors
Animal immunization, in vitro display technologies, and machine learning for antibody discovery
Simulation intelligence: Towards a new generation of scientific methods
Abodybuilder: Automated antibody structure prediction with data-driven accuracy estimation
Deciphering the language of antibodies using self-supervised learning. bioRxiv
Fast string kernels using inexact matching for protein sequences
Parapred: Antibody paratope prediction using convolutional and recurrent neural networks
Computational design of antibody-affinity improvement beyond in vivo maturation
Machine learning approaches for protein-protein interaction hot spot prediction: Progress and comparative assessment
Deep geometric representations for modeling effects of mutations on protein-protein binding affinity
A structure-based b-cell epitope prediction model through combing local and global features. bioRxiv
CPSPweb-tools: a server for 3D lattice protein studies
One billion synthetic 3d-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction
Ymir: A 3d structural affinity model for multi-epitope vaccine simulations
Exploring protein fitness landscapes by directed evolution
Genetic Algorithms
Dlab: Deep learning methods for structure-based virtual screening of antibodies
Sabdab in the age of biotherapeutics: Updates including sabdab-nano, the nanobody structure tracker
The foldx web server: An online force field
The structural basis of antibody-antigen recognition
Antibody specific epitope prediction-emergence of a new paradigm
Taking the human out of the loop: A review of bayesian optimization
Protein design and variant prediction using autoregressive generative models
Generative language modeling for antibody design. bioRxiv
Snugdock: Paratope structural optimization during antibody-antigen docking compensates for errors in antibody homology models
Ab-bind: Antibody binding mutational database for computational affinity predictions
Terminologies and Operators of GA
Terminologies and Operators of GA
Practical bayesian optimization of machine learning algorithms
An overview of bioinformatics tools for epitope prediction: Implications on vaccine development
Generating text with recurrent neural networks
Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. CoRR, abs/2104.10201
Top companies and drugs by sales in 2020
Think global and act local: Bayesian optimisation over high-dimensional categorical and mixed search spaces
A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation
Immunesim: Tunable multi-feature simulation of b- and t-cell receptor repertoires for immunoinformatics benchmarking
Ab-ligity: Identifying sequence-dissimilar antibodies that bind to the same epitope
Diversity in the cdr3 region of vh is sufficient for most antibody specificities
Structure, heterogeneity and developability assessment of therapeutic antibodies
Functional clustering of b cell receptors using sequence and structural features
Machine learning in protein engineering
Machine-learning-guided directed evolution for protein engineering
Directed evolution of protein catalysts. Annual review of biochemistry