key: cord-0997154-65exfug6 authors: Hoffman, Samuel; Chenthamarakshan, Vijil; Wadhawan, Kahini; Chen, Pin-Yu; Das, Payel title: Optimizing Molecules using Efficient Queries from Property Evaluations date: 2020-11-03 journal: Nature Machine Intelligence DOI: 10.1038/s42256-021-00422-y sha: 1b9591f3665eea89c4071c3440b0c2bc9419ffbd doc_id: 997154 cord_uid: 65exfug6

Machine learning-based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery. Here we propose QMO, a generic query-based molecule optimization framework that exploits latent embeddings from a molecule autoencoder. QMO improves the desired properties of an input molecule based on efficient queries, guided by a set of molecular property predictions and evaluation metrics. We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules for drug-likeness and solubility under similarity constraints. We also demonstrate significant property improvement using QMO on two new and challenging tasks that are also important in real-world discovery problems: (i) optimizing existing potential SARS-CoV-2 Main Protease inhibitors toward higher binding affinity; and (ii) improving known antimicrobial peptides towards lower toxicity. Results from QMO show high consistency with external validations, suggesting an effective means to facilitate material optimization problems with design constraints.

Molecule optimization (MO) for improving the structural and/or functional profile of a molecule is an essential step in many scientific and engineering applications, including chemistry, drug discovery, bioengineering and material science. Without further modeling or use of prior knowledge, the challenge of MO lies in searching over the prohibitively large space of all possible molecules and generating new, valid, and optimal ones. In recent years, machine learning has been shown to be a promising tool for MO by combining domain knowledge and data-driven learning for efficient discovery 1-4. Compared to traditional high-throughput wet-lab experiments or computer simulations that are time-consuming and expensive 5,6, machine learning can significantly accelerate MO by enabling iterative improvements based on instant feedback from real-time model prediction and analysis 7,8, thereby reducing the gap between initial discovery and the subsequent optimization and production of materials for various applications. For example, machine learning-driven MO can enable the prompt design of optimized candidates starting from existing lead molecules, leading to potentially better inhibition of SARS-CoV-2 proteins. It is now well accepted that the majority of existing drugs fail to show the desired binding (and inhibition) to SARS-CoV-2 targets, mostly due to the novel nature of the SARS-CoV-2 virus 9,10. Therefore, optimizing existing lead molecules toward better SARS-CoV-2 target binding affinity while keeping the molecular similarity high appears to be a promising first step for optimal drug design for COVID-19. Similarly, an efficient MO method can guide the design of antimicrobials with lower toxicity to fight resistant pathogens, one of the biggest threats to global health 11. Without loss of generality, we refer to a lead molecule as the starting molecule to be optimized in order to meet a set of desired properties and constraints.
Many recent research studies on machine learning-enabled MO represent a molecule as a string of chemical units. For small organic molecules, the SMILES representation 12 is widely used, whereas for peptide sequences, a text string comprised of amino acid characters is a popular representation. Often, for efficiency reasons, the optimization is performed on a learned representation space of the system of interest, which describes molecules as embedding vectors in a low-dimensional continuous space. A sequence-to-sequence encoder-decoder model, such as a (variational) autoencoder, can be used to learn continuous representations of the molecules in a latent space. Moreover, different optimization or sampling techniques based on the latent representation can be used to improve a molecule with external guidance from a set of molecular property predictors and simulators. The external guidance can be either explicitly obtained from physics-based simulations, (chem/bio-)informatics, or wet-lab experiments, or implicitly learned from a chemical database. Based on the methodology, the related work on machine learning for MO can be divided into two categories: guided search and translation. Guided search uses guidance from predictive models and/or evaluations from statistical models, where the search can be performed either in the discrete molecule sequence space or through a continuous latent space (or distribution) learned by an encoder-decoder. Genetic algorithms 13-15 and Bayesian optimization (BO) 16 have been proposed for searching in the discrete sequence space, but their efficiency can be low in high-dimensional search spaces. Recent works have exploited latent representation learning and different optimization/sampling techniques for efficient search. Examples include the combined use of a variational autoencoder (VAE) and BO 17-20, VAE and Gaussian sampling 21, VAE and sampling guided by a predictor 22,23, VAE and evolutionary algorithms 24, deep reinforcement learning and/or a generative network 25-29, and attribute-guided rejection sampling on an autoencoder 30. On the other hand, translation-based approaches treat molecule generation as a sequence-to-sequence translation problem 31-34. Examples are translation with junction trees 35,36, shape features 18, hierarchical graphs 37, and transfer learning 38. Compared with guided search, translation-based approaches require additional knowledge of paired sequences for learning to translate a lead molecule into an improved molecule. This knowledge may not be available for new MO tasks with limited information.

Figure 1. System illustration of the proposed query-based molecule optimization (QMO) framework. The QMO system progressively optimizes an input lead molecule (e.g., Remdesivir) according to a set of user-specified properties (e.g., binding affinity and Tanimoto similarity) by leveraging the learned molecule embeddings from a pair of pre-trained encoder and decoder (i.e., an autoencoder), and by evaluating the properties of the generated molecules. Given a candidate embedding z^(t) at optimization step t, QMO randomly samples the neighboring vectors of z^(t) in the embedding space, evaluates the properties of the corresponding decoded molecules, and uses the evaluations for gradient estimation (see equation (4)) and query-based gradient descent (see equation (3)) to find the next candidate embedding vector z^(t+1).
For example, in the task of optimizing a set of known inhibitor molecules to better bind to a SARS-CoV-2 target protein sequence while preserving the desired drug properties, a sufficient number of such paired molecule sequences is unavailable. We also note that these two categories are not exclusive: guided search can be used jointly with translation. In this paper, we propose a novel Query-based Molecule Optimization (QMO) framework, as illustrated in Figure 1. In this context, a query to a designed loss function for QMO gives the corresponding numerical value obtained through the associated property evaluations, and efficiency refers to the performance of the optimization results given a query budget. QMO uses an encoder-decoder and external guidance, but it differs from existing works in the following aspects: (i) QMO is a generic end-to-end optimization framework that reduces the problem complexity by decoupling representation learning and guided search. It applies to any plug-in (pre-trained) encoder-decoder with continuous latent representations. It is also a unified and principled approach that incorporates multiple predictions and evaluations made directly at the molecule sequence level into guided search without further model fitting. (ii) To achieve efficient end-to-end optimization with discrete molecule sequences and their continuous latent representations, QMO adopts a novel query-based guided search method based on zeroth order optimization 39,40, a technique that performs efficient mathematical optimization using only function evaluations (see Section 1 in the Supplementary Material for more details). Its query-based guided search enables direct optimization over the property evaluations provided by chemical informatics/simulation software packages or prediction APIs, and it supports guided search with exact property evaluations that operate only at the molecular sequence level, rather than on latent representations or surrogate models. To the best of our knowledge, this work is the first study that facilitates molecule optimization by disentangling molecule representation learning and guided search, and by exploiting zeroth order optimization for efficient search in the molecular property landscape. The success of QMO can be attributed to its data efficiency: it exploits latent representations learned from abundant unlabeled data together with property-prediction guidance trained on relatively limited labeled data. We first demonstrate the effectiveness of QMO through two sets of standard benchmarks. On two existing and simpler MO benchmark tasks, optimizing drug-likeness (QED) 41 and penalized logP (reflecting the octanol-water partition coefficient) 22 under similarity constraints, QMO attains superior performance over existing baselines, showing at least 15% higher success on QED optimization and an absolute improvement of 1.7 on penalized logP. The performance of QMO on these molecular physical property benchmarks shows its potential for optimizing material designs prior to synthesis, which is critical in many applications, such as the food industry, agrochemicals, pesticides, drugs, catalysts, and waste chemicals. Next, as a motivating discovery use case that also, at least to some extent, reflects the complexity of real discovery problems 42, we demonstrate how QMO can be used to improve the binding affinity of a number of existing inhibitor molecules to the SARS-CoV-2 Main Protease (M pro), one of the most extensively studied drug targets for SARS-CoV-2.
As an illustration, Figure 2 shows the top docking poses of Dipyridamole and its QMO-optimized variant with SARS-CoV-2 M pro. We formulate this task as an optimization over predicted binding affinity (obtained using a pre-trained machine learning model) starting from an existing molecule of interest (i.e., a lead molecule). Since experimental IC50 values are widely available, we use them (as pIC50 = −log10(IC50), with IC50 in molar units) as a measure of protein-ligand binding affinity. The pIC50 of the optimized molecule is constrained to be above 7.5 (corresponding to IC50 ≈ 32 nM), a sign of good affinity, while the Tanimoto similarity between the optimized and the original molecule is maximized. Retaining high similarity while optimizing the initial lead molecule means that important chemical characteristics can be maximally preserved. Moreover, a high similarity to existing leads is important for rapid response to a novel pathogen such as SARS-CoV-2, as it is then more likely that existing knowledge and manufacturing pipelines can be leveraged for synthesis and wet-lab evaluation of the optimized variants. In addition, the chance of optimized variants inducing adverse effects is potentially low. Our results show that QMO can find molecules with high similarity and improved affinity, while preserving other properties of interest such as drug-likeness. We also consider the task of optimizing existing antimicrobial peptides toward lower selective toxicity, which is critical for accelerating safe antimicrobial discovery. In this task, QMO shows a high success rate (∼72%) in lowering the toxicity of antimicrobial peptides, and the properties of the optimized molecules are consistent with external toxicity and antimicrobial activity classifiers. Finally, we perform property landscape visualization and trajectory analysis of QMO to illustrate its efficiency and diversity in finding improved molecules with desired properties. We emphasize that QMO is a general-purpose optimization algorithm that enables optimization over discrete spaces (e.g., sequences, graphs) by searching over a latent space of the system using guidance from (expensive) black-box function evaluations. Beyond the organic and biological molecule optimization applications considered in this study, QMO can be applied to the optimization of other classes of materials, e.g., inorganic solid-state materials like metal alloys or metal oxides. In our QMO framework, we model a molecule as a discrete string of chemical or amino acid characters (i.e., a sequence). Depending on the downstream MO tasks, the sequence representation can either be a string of natural amino acids 30,43 or a string designed for encoding small organic chemicals. In particular, the simplified molecular input line entry specification (SMILES) representation 12 describes the structure of small organic molecules using short ASCII strings. Without loss of generality, we define X^m := X × X × ··· × X (m times) as the product space containing every possible molecule sequence of length m, where X denotes the set of all chemical characters. To elucidate the problem complexity, consider the 20 protein-building amino acids as characters in a peptide sequence: the number of possible candidates in the space of sequences of length m = 60 already approaches the number of atoms in the known universe (∼10^80). Similarly, the space of small molecules with therapeutic potential is estimated to be on the order of 10^60 44,45.
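To make the scale of the peptide example explicit, the count follows from simple exponent arithmetic:

$$|\mathcal{X}^m| = |\mathcal{X}|^m = 20^{60} = 10^{60\,\log_{10} 20} \approx 10^{78}.$$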
Therefore, the problem of MO in the ambient space X^m can be computationally inefficient, as the search space grows combinatorially with the sequence length m. To address the issue of a large search space for molecule sequences, QMO adopts an encoder-decoder framework. The encoder Enc : X^m → R^d encodes a sequence x ∈ X^m into a low-dimensional continuous real-valued representation of dimension d, denoted by an embedding vector z = Enc(x). The decoder Dec : R^d → X^m decodes the latent representation z of x back to a sequence representation, denoted by x̂ = Dec(z). We note that, depending on the encoder-decoder implementation, the input sequence x and the decoded sequence x̂ may be of different lengths. On the other hand, the latent dimension d is universal (fixed) for all sequences. In particular, Winter et al. 46 proposed a novel molecular descriptor and used it in an autoencoder to learn latent representations featuring high similarity between the original and the reconstructed sequences. QMO applies to any plug-in (pre-trained) encoder-decoder with continuous latent representations and thus decouples representation learning and guided search, in order to reduce the problem complexity of MO. In addition to leveraging learned latent representations from a molecule encoder-decoder, our QMO framework incorporates molecular property prediction models and similarity metrics at the sequence level as external guidance. Specifically, for any given sequence x ∈ X^m, we use a set of I separate prediction models {f_i(x)}_{i=1}^I to evaluate the properties of interest for MO. In principle, for a candidate sequence x, a set of thresholds {τ_i}_{i=1}^I specifies the property conditions f_i(x) ≥ τ_i for all i ∈ [I], where [I] denotes the integer set {1, 2, ..., I}. Moreover, we can simultaneously impose a set of J separate constraints {g_j(x|S) ≥ η_j}_{j=1}^J in the optimization process, such as molecular similarity, relative to a set of reference molecule sequences denoted by S. Our QMO framework covers two practical cases in MO: (i) optimizing molecular similarity while satisfying desired chemical properties and (ii) optimizing chemical properties under similarity constraints. It can easily be extended to other MO settings that can be formulated via the loss functions defined below. In what follows, we formally define our designed loss function of QMO for Case (i). Given a starting molecule sequence x_0 (i.e., a lead molecule) and a pre-trained encoder-decoder, let x = Dec(z) denote a candidate sequence decoded from a latent representation z ∈ R^d. Our QMO framework aims to find an optimized sequence by solving the following continuous optimization problem:

$$\min_{z \in \mathbb{R}^d} \; \underbrace{\sum_{i=1}^{I} \max\{\tau_i - f_i(\mathrm{Dec}(z)),\, 0\}}_{\text{property validation loss (to be minimized)}} \;-\; \underbrace{\sum_{j=1}^{J} \lambda_j \cdot g_j(\mathrm{Dec}(z)\,|\,S)}_{\text{molecular score (to be maximized)}} \qquad (1)$$

The first term, ∑_{i=1}^{I} max{τ_i − f_i(Dec(z)), 0}, quantifies the loss of the property constraints and is presented as a sum of hinge losses over all property predictions, which approximates the binary property validation relative to the required thresholds {τ_i}_{i=1}^I. It achieves the optimal value (i.e., 0) only when the candidate sequence x = Dec(z) satisfies all the desired properties, which is equivalent to the condition that f_i(Dec(z)) ≥ τ_i for all i ∈ [I]. The second term, ∑_{j=1}^{J} λ_j · g_j(Dec(z)|S), corresponds to a set of molecular similarity scores to be maximized (hence the minus sign in the minimization formulation). The reference sequence set S can be the starting sequence, such that S = {x_0}, or a set of molecules.
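To make the objective in (1) concrete, the following is a minimal Python sketch of the hinge-style loss evaluated at a candidate embedding. The function and argument names (`decode`, `predictors`, `taus`, `similarities`, `lams`) are illustrative placeholders, not identifiers from any released QMO implementation.

```python
def qmo_loss_case1(z, decode, predictors, taus, similarities, lams):
    """Loss (1): hinge property-validation loss minus weighted similarity scores.

    z            -- candidate latent embedding
    decode       -- decoder, maps an embedding to a molecule sequence x = Dec(z)
    predictors   -- property predictors f_i operating on decoded sequences
    taus         -- property thresholds tau_i, one per predictor
    similarities -- similarity scores g_j(x | S), with the reference set S baked in
    lams         -- positive weights lambda_j on the similarity scores
    """
    x = decode(z)
    # Sum of hinge losses: zero exactly when f_i(x) >= tau_i for every property i.
    property_loss = sum(max(tau - f(x), 0.0) for f, tau in zip(predictors, taus))
    # Similarity scores are to be maximized, hence subtracted in the minimization.
    similarity_score = sum(lam * g(x) for g, lam in zip(similarities, lams))
    return property_loss - similarity_score
```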
The positive coefficients {λ_j}_{j=1}^J are associated with the set of molecular similarity scores {g_j(Dec(z)|S)}_{j=1}^J, respectively. It is worth mentioning that the use of the latent representation z as the optimization variable in a low-dimensional continuous space greatly simplifies the original MO problem in a high-dimensional discrete space. The optimization variable z can be initialized as the latent representation of x_0, denoted by z_0 = Enc(x_0). Similarly, for Case (ii), the optimization problem is formulated as

$$\min_{z \in \mathbb{R}^d} \; \underbrace{\sum_{j=1}^{J} \max\{\eta_j - g_j(\mathrm{Dec}(z)\,|\,S),\, 0\}}_{\text{similarity validation loss (to be minimized)}} \;-\; \underbrace{\sum_{i=1}^{I} \gamma_i \cdot f_i(\mathrm{Dec}(z))}_{\text{property score (to be maximized)}} \qquad (2)$$

where {η_j}_{j=1}^J are the similarity score constraints and {γ_i}_{i=1}^I are positive coefficients on the property scores {f_i(Dec(z))}_{i=1}^I.

Query-based Molecule Optimization (QMO) Procedure

Although we formulate MO as an unconstrained continuous minimization problem, solving it for a feasible candidate sequence x = Dec(z) is not straightforward because: (i) the output of the decoder, x = Dec(z), is a discrete sequence, which imposes challenges on any gradient-based (and higher-order) optimization method, since acquiring the gradient with respect to z becomes non-trivial; even resorting to the Gumbel-softmax sampling trick for discrete outputs 47, the large output space of the decoder may render it ineffective; (ii) in practice, many molecular property prediction models and molecular metrics are computed in an access-limited environment, such as prediction APIs and chemical software, which only allow inference on a queried sequence and prohibit other functionalities such as gradient computation. To address these two issues, we use zeroth order optimization in our QMO framework (see the Methods section for the detailed procedure), providing a generic and model-agnostic approach for solving the problem formulations in (1) and (2) using only the inference results of {f_i}_{i=1}^I and {g_j}_{j=1}^J on queried sequences. Let Loss(z) denote the objective function to be minimized, as defined in either (1) or (2). Our QMO framework uses zeroth order gradient descent to find a solution, which mimics the descent steps on the loss landscape in gradient-based solvers but only uses the function values Loss(·) of queried sequences. Specifically, at the t-th iteration of the zeroth order optimization process, the iterate (candidate embedding vector) z^(t+1) is updated by

$$z^{(t+1)} = z^{(t)} - \alpha_t \cdot \widehat{\nabla} \mathrm{Loss}(z^{(t)}) \qquad (3)$$

where α_t ≥ 0 is the step size at the t-th iteration, and the true gradient ∇Loss(z^(t)) (which is challenging or infeasible to compute) is approximated by the pseudo gradient ∇̂Loss(z^(t)). The pseudo gradient ∇̂Loss(z^(t)) is estimated by Q independent random directional queries, defined as

$$\widehat{\nabla} \mathrm{Loss}(z^{(t)}) = \frac{d}{\beta \cdot Q} \sum_{q=1}^{Q} \left[ \mathrm{Loss}(z^{(t)} + \beta \cdot u^{(q)}) - \mathrm{Loss}(z^{(t)}) \right] \cdot u^{(q)} \qquad (4)$$

where d is the dimension of the latent space of the encoder-decoder used in QMO, and β > 0 is a smoothing parameter used to perturb the embedding vector z^(t) for neighborhood sampling with Q random directions {u^(q)}_{q=1}^Q that are independently and identically sampled on a d-dimensional unit sphere. See Figure 1 for an illustration of random neighborhood sampling. In our implementation, we sample {u^(q)}_{q=1}^Q using zero-mean d-dimensional isotropic Gaussian random vectors divided by their Euclidean norms, so that the resulting samples are drawn uniformly from the unit sphere. Intuitively, the gradient estimator in (4) can be viewed as an average of Q random directional derivatives along the sampled directions {u^(q)}_{q=1}^Q, and the scaling factor d in (4) ensures that the norm of the estimated gradient is of the same order as that of the true gradient 39,40.
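Equations (3) and (4) translate directly into code. The sketch below is a generic implementation of the estimator as described (uniform directions on the unit sphere, scaling factor d) together with a plain descent loop using a constant step size; the Adam-based step-size adaptation used in the paper's actual implementation is omitted here.

```python
import numpy as np

def zo_gradient(loss, z, q=50, beta=10.0):
    """Pseudo gradient of eq. (4), using q + 1 loss queries in total."""
    d = z.shape[0]
    base = loss(z)                       # one query at the current iterate
    grad = np.zeros(d)
    for _ in range(q):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)           # uniform direction on the unit sphere
        grad += (loss(z + beta * u) - base) * u
    return (d / (beta * q)) * grad       # factor d matches the true gradient's scale

def zo_gradient_descent(loss, z0, steps=80, q=50, beta=10.0, alpha=0.05):
    """Eq. (3) with a constant step size alpha in place of the schedule alpha_t."""
    z = z0.copy()
    for _ in range(steps):
        z = z - alpha * zo_gradient(loss, z, q=q, beta=beta)
    return z
```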
A schematic example of the QMO procedure is illustrated in Figure 1, using binding affinity and Tanimoto similarity as the property evaluation criteria. Note that, based on the iterative optimization step in (3), QMO only uses function values queried at the original and perturbed sequences for optimization. The number of queries made to the loss function for computing ∇̂Loss(z^(t)) is Q + 1 per iteration. A larger Q further reduces the gradient estimation error, at the price of increased query complexity. When solving (1), an iterate z^(t) is considered a valid solution if its decoded sequence Dec(z^(t)) satisfies the property conditions f_i(Dec(z^(t))) ≥ τ_i for all i ∈ [I]. Similarly, when solving (2), a valid solution z^(t) means g_j(Dec(z^(t))|S) ≥ η_j for all j ∈ [J]. Finally, QMO returns the set of found solutions (returning null if none is found). Detailed descriptions of the QMO procedure are given in the Methods section. In what follows, we demonstrate the performance of our proposed QMO framework on three sets of tasks that aim to optimize molecular properties with constraints, including standard MO benchmarks and challenging tasks relating to real-world discovery problems. The pre-trained encoder-decoder and the hyperparameters of QMO for each task are specified in the Methods section and in Section 3 of the Supplementary Material.

Table 1 (fragment): Method / Success (%): MMPA 33, 32.9; JT-VAE 22, …

We start by testing QMO on two single-property targets: penalized logP and the Quantitative Estimate of Drug-likeness (QED) 41. LogP is the logarithm of the partition ratio of the solute between octanol and water. Penalized logP is defined as the logP minus the synthetic accessibility (SA) score 22. Given a similarity constraint, improving the drug-likeness of compounds as measured by the QED score (from the range [0.7, 0.8] to [0.9, 1.0]) 41 and improving the penalized logP score 22 are two widely used benchmarks. For a pair of original and optimized sequences (x_0, x), we use the QMO formulation in (2) with the Tanimoto similarity (ranging from 0 to 1) over Morgan fingerprints 48 as g_Tanimoto(x|x_0) and the property score of interest (QED or penalized logP) as f_score(x). Following the same setting as existing works, the threshold δ for g_Tanimoto(x|x_0) is set as either 0.4 or 0.6. We use RDKit 1 to compute QED and logP, and use MOSES 49 to compute SA (synthetic accessibility), where f_penalized logP(x) = logP(x) − SA(x). In our experiments, we use the same set of 800 molecules with low penalized logP scores and 800 molecules with QED ∈ [0.7, 0.8], chosen from the ZINC test set 50 as in Jin et al. 22, as our starting sequences. We compare QMO with various guided-search and translation-based methods in Tables 1 and 2. Baseline results are obtained from the literature 35,38 on machine learning methods for the same task. For the QED optimization task, the success rate, defined as the percentage of improved molecules having similarity greater than δ = 0.4, is shown in Table 1. QMO outperforms all baselines by at least 15%. For the penalized logP task, the molecules optimized by QMO outperform the baseline results by a significant margin, as shown in Table 2. The increased standard deviation for QMO is an artifact of having some molecules with greatly improved penalized logP scores (see Section 4 in the Supplementary Material).
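For reference, the benchmark quantities can be computed with standard open-source tooling roughly as follows. RDKit provides QED, logP, and Morgan fingerprints; the SA scorer shown here is the `sascorer` module from RDKit's contrib directory (which MOSES wraps), and importing it requires having that contrib path available. Treat this as a sketch of the evaluation, not the paper's exact scripts.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Crippen, QED
import sascorer  # SA scorer from RDKit's Contrib/SA_Score; must be on the path

def penalized_logp(smiles):
    """Penalized logP as defined in the text: logP(x) - SA(x)."""
    mol = Chem.MolFromSmiles(smiles)
    return Crippen.MolLogP(mol) - sascorer.calculateScore(mol)

def qed_score(smiles):
    """Quantitative estimate of drug-likeness, in [0, 1]."""
    return QED.qed(Chem.MolFromSmiles(smiles))

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity over Morgan fingerprints, in [0, 1]."""
    fp_a = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles_a), radius, nBits=n_bits)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles_b), radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)
```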
Although the above-mentioned molecular property optimization tasks provide well-defined benchmarks for testing our QMO algorithm, it is well recognized that such tasks are easy to solve and do not capture the complexity associated with real-world discovery 51. For example, it is trivial to achieve state-of-the-art results for logP optimization by generating long saturated hydrocarbon chains 52. Coley et al. 42 proposed that molecular optimization goals that better reflect the complexity of real discovery tasks might include binding or selectivity attributes. Therefore, in the remainder of this paper, we consider two such tasks: (1) optimizing the binding affinity of existing SARS-CoV-2 M pro inhibitor molecules and (2) lowering the toxicity of known antimicrobial peptides.

Optimizing Existing SARS-CoV-2 Main Protease Inhibitor Molecules toward better IC50. To provide a timely solution and accelerate drug discovery against a new virus such as SARS-CoV-2, it is a sensible practice to optimize known leads to facilitate design and production as well as to minimize the emergence of adverse effects. Here we focus on the task of optimizing the parent molecule structure of a set of existing SARS-CoV-2 M pro inhibitors. Specifically, we use the QMO formulation in (1), a pre-trained binding affinity predictor 53 f_affinity (whose output is a pIC50 value), and the Tanimoto similarity g_Tanimoto between the original and optimized molecules. Given a known inhibitor molecule x_0, we aim to find an optimized molecule x such that f_affinity(x) ≥ τ_affinity while g_Tanimoto(x|x_0) is maximized. For this task, we start by assembling 23 existing molecules shown to have weak to moderate affinity with SARS-CoV-2 M pro 54,55. These generally have IC50, a measure of inhibitory potency, in the µM range (see Section 3 in the Supplementary Material for experimental IC50 values). We choose the target affinity threshold τ_affinity as pIC50 ≥ 7.5, which implies strong affinity. Table 3 shows the final optimized molecules compared to their initial states (i.e., the original lead molecules). We highlight common substructures and show a similarity map to emphasize the changes. The results for all 23 inhibitors are summarized in Table 5 in the Supplementary Material. Since all of these 23 inhibitors are reported to bind to the substrate-binding pocket of M pro, we investigate possible binding mode alterations of the QMO-optimized molecules. It should be noted that direct comparison of IC50 with binding free energy is not always possible, as the relationship between binding affinity and IC50 for a given compound varies depending on the assay conditions and the compound's mechanism of inhibition 56. Further, high-fidelity binding free energy estimation requires accounting for factors such as conformational entropy and the explicit presence of the solvent. Nevertheless, we report binding free energies and modes for the QMO-optimized variants. For simplicity, we limit the analysis to achiral molecules. First, we run blind docking simulations using AutoDock Vina 57 over the entire structure of M pro with the exhaustiveness parameter set to 8. We further rescore the top three docking poses for each of the original and QMO-optimized molecules using the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) method and the AMBER force field 58, which is known to be more rigorous and accurate than the scoring function used in docking.
Next, we inspect whether any of the top three docking poses of the original and QMO-optimized variants involves the substrate-binding pocket of M pro, as favorable interaction with that pocket is crucial for inhibiting M pro function. As an illustration, Figure 2 shows the top docking poses of Dipyridamole and its QMO-optimized variant in the M pro substrate-binding pocket. Consistent with its more favorable MM/PBSA binding free energy, the QMO-optimized variant forms 14% more contacts (with a 5 Å distance cutoff between heavy atoms) with the M pro substrate-binding pocket compared to Dipyridamole. Some of the M pro residues that explicitly form contacts with the Dipyridamole variant are LEU167, ASP187, ARG188, and GLN192. Similar observations, e.g., a higher number of contacts with the M pro substrate-binding pocket, which involves TYR54, were found for other exemplars of QMO-optimized variants.

[Figure caption fragment] (c) Peptide sequences and their similarity ratio; toxicity prediction 59 and AMP activity prediction with iAMP-2L 60. Color coding follows red = mismatch, blue = match, and black = gap. The similarity ratio is the ratio of the similarity between the original and optimized sequences with respect to the self-similarity of the original sequence. Similarity is estimated using the global alignment function in Biopython.

As an additional motivating use case, rapidly discovering new antibiotics is critical to tackling the looming crisis of a global increase in antimicrobial resistance 11. Antimicrobial peptides (AMPs) are considered promising candidates for next-generation antibiotics. Optimal AMP design requires balancing multiple, tightly interacting attribute objectives 61,62, such as high potency and low toxicity. As an attempt to address this challenge, we show how QMO can be used to find improved variants of known AMPs with reported/predicted toxicity, such that the variants have lower predicted toxicity and high sequence similarity when compared to the original AMPs. For the AMP optimization task, a peptide molecule is represented as a sequence of the 20 natural amino acid characters. Using the QMO formulation in (1), subject to constraints on the toxicity prediction value (f_tox) and the AMP prediction value (f_AMP), we aim to find the most similar molecules for a set of toxic AMPs. The sequence similarity score (g_sim) to be maximized is computed using Biopython 2, which uses global alignment between two sequences (normalized by the length of the starting sequence) to evaluate the best concordance of their characters. See the Methods section for detailed descriptions. The objective of QMO is to search for improved AMP sequences by maximizing similarity while satisfying the AMP activity and toxicity predictions (i.e., being classified as AMP and non-toxic based on predictions from pre-trained deep learning models 30). In our experiments, we use QMO to optimize 150 experimentally verified toxic AMPs, collected from public databases 63,64 by Das et al. 30, as starting sequences. Note that the toxic annotation here does not depend on a specific type of toxicity, such as hemolytic toxicity. Figure 3 shows their cumulative success rate (turning toxic AMPs into non-toxic AMPs) using QMO up to the t-th iteration. Within the first few iterations, more than 60% of the molecules were successfully optimized. Eventually, about 72.66% (109/150) of the molecules are successfully optimized.
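For reference, here is a minimal sketch of the similarity score g_sim described above, using Biopython's global alignment. The paper does not specify the exact alignment scoring parameters, so the simple identity-scoring `globalxx` variant is shown as an illustrative assumption.

```python
from Bio import pairwise2

def similarity_ratio(original, optimized):
    """Global-alignment similarity of (original, optimized), normalized by the
    self-similarity of the original sequence, as described in the text."""
    self_score = pairwise2.align.globalxx(original, original, score_only=True)
    pair_score = pairwise2.align.globalxx(original, optimized, score_only=True)
    return pair_score / self_score
```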
Analysis of all 109 original-improved pairs reveals notable physicochemical changes, e.g., lowering of hydrophobicity and hydrophobic moment in the QMO-optimized AMP sequences (see Figure 4(a) and (b) and also Table 11 in the Supplementary Material). This trend is consistent with the reported positive correlation of hydrophobicity and hydrophobic moment with cytotoxicity and hemolytic activity 65,66. Figure 4(c) shows examples of known AMPs and their QMO-optimized variant sequences. Sequence alignments and similarity ratios relative to the original sequences are also shown, indicating that sequences resulting from QMO can differ widely from the initial ones. Figure 10 in the Supplementary Material depicts the optimization process for some AMP sequences. QMO can further improve similarity while maintaining low predicted toxicity and high AMP values for the specified thresholds after the first success. We perform additional validation of our optimization results by evaluating the QMO-optimized sequences with a number of state-of-the-art AMP and toxicity predictors, i.e., external classifiers not used in the QMO framework. Figure 5 shows the external classifiers' prediction results on the 109 original and improved sequence pairs that were successfully optimized by QMO. We note that these external classifiers vary in training data size and type, as well as in model architecture, and report a range of accuracies. Data and models for the toxicity prediction task are rarer than those for the AMP classification problem. Further, external toxicity classifiers such as HAPPENN 59 and HLPpred-fuse 67 explicitly target the prediction of hemolytic activity. For these reasons, the predictions of the external classifiers on the original lead sequences may vary when compared to ground-truth labels (see the third column in Table 10 of the Supplementary Material). Nonetheless, predictions on the QMO-optimized sequences using external classifiers show high consistency in terms of toxicity improvement when compared with the predictors used in QMO. Specifically, the predictions from iAMP-2L 60 and HAPPENN 59 (hemolytic toxicity prediction) show that 56.88% (62/109) of QMO-optimized molecules are predicted as non-toxic AMPs. In Section 7.1 of the Supplementary Material, we also show that the use of a better encoder-decoder improves the optimization performance of QMO. To gain a better understanding of how QMO optimizes a lead molecule with respect to the property constraints and objectives, we provide a visual illustration of the property landscapes and search trajectories of QMO via a two-dimensional local interpolation of the molecule embedding space. Specifically, given the original embedding z_0 and the embedding of the best candidate z* returned by QMO, we perform local grid sampling along two selected directions v_x and v_y, and then evaluate the properties of the sequences decoded from the sampled embeddings for property landscape analysis. For the purpose of visualizing the property landscape in low dimensions, we project the high-dimensional search trajectories {z_t}_{t=1}^T onto the two directions v_x and v_y. Figure 6 shows the landscape of Tanimoto similarity versus binding affinity prediction when using Remdesivir as the lead molecule, with the optimization objective of maximizing Tanimoto similarity while ensuring the predicted binding affinity is above a defined threshold of 7.5. The two directions are the principal vector z* − z_0 and a random vector orthogonal to the principal vector (see the Methods section for more details).
The trajectory shows how QMO leverages the evaluations of similarity and binding affinity to optimize the lead molecule. Figure 6 also displays the common substructures of candidate molecules in comparison to the Remdesivir molecule in terms of subgraph similarity, along with their predicted properties over sampled iterations of QMO. In addition to demonstrating the efficiency of optimizing lead molecules, we also study the diversity of the optimized molecules by varying the random seed used in QMO for query-based guided search. Figure 7 shows three different sets of trajectories on the landscape of predicted binding affinity when using Remdesivir as the lead molecule (see the Methods section for more details). The optimization objective is the same as that of Figure 6. The visualization suggests that the trajectories are distinct and that the best candidate molecules of the different trajectories are distant from each other in the embedding space, suggesting that QMO can find a diverse set of improved molecules with desired properties. We also provide a quantitative study of the diversity and novelty of the QMO-optimized sequences when varying the similarity threshold in Section 6.1 of the Supplementary Material. Setting a lower similarity threshold in QMO results in more novel and diverse sequences. In this paper we propose QMO, a general-purpose molecule optimization framework that readily applies to any pre-trained molecule encoder-decoder with continuous latent molecule embeddings and any set of property predictions and evaluation metrics. It features efficient guided search with molecular property evaluations and constraints obtained using predictive models and cheminformatics software. More broadly, QMO is a machine learning building block that can be incorporated into different scientific discovery pipelines with deep generative models, such as generative adversarial networks, for efficient guided optimization with constraints. As a demonstration, Sections 6.2 and 6.3 of the Supplementary Material show the QMO results on the SARS-CoV-2 Main Protease inhibitor optimization task with alternative objectives and randomly generated lead sequences, respectively. QMO is able to perform successful optimization with respect to different objectives, constraints, and starting sequences. The proposed QMO framework can in principle be applied to other classes of materials, for example metal oxides, alloys, and genes. On the simpler benchmark tasks of optimizing drug-likeness and penalized logP scores under similarity constraints, QMO demonstrates superior performance over baseline results. We also apply QMO to improve the binding affinity of existing inhibitors of the SARS-CoV-2 Main Protease, and to lower the toxicity of antimicrobial peptides. The QMO-optimized variants of existing drug molecules show favorable binding free energies with the SARS-CoV-2 Main Protease upon blind docking and MM/PBSA rescoring, whereas the QMO-optimized peptides are consistently predicted to be antimicrobial and non-toxic by external peptide property predictors. The property landscape analysis and low-dimensional visualization of the optimization trajectories provide insights into how QMO efficiently navigates the property space to find a diverse set of improved molecules with the desired properties. Our results provide strong evidence that QMO can serve as a novel and practical tool for molecule optimization and other process/product design problems, as well as help accelerate chemical discovery with constraints.
In Section 7 of the Supplementary Material, we provide an ablation study of QMO for additional performance analysis, including the effect of the encoder-decoder, the difference between sequence-level and latent-space classifiers, and a comparison between different gradient-free optimizers. The results show that QMO is a query-efficient end-to-end molecule optimization framework, and that a better encoder-decoder can further improve its performance. Future work will include integrating multi-fidelity expert feedback into the QMO framework for human-AI collaborative material optimization, and using QMO to accelerate the discovery of novel, high-performance and low-cost materials.

Figure 7. Three sets of QMO trajectory visualizations on the landscape of predicted binding affinity when using Remdesivir as the lead molecule. The trajectories differ by the random seed used in QMO for query-based guided search with random samples. The optimization objective is to maximize Tanimoto similarity while ensuring the predicted binding affinity is above a defined threshold of 7.5. The visualization suggests that QMO can find a diverse set of improved molecules.

• Procedure inputs: pre-trained encoder-decoder; molecular property predictors; step sizes {α_t}_{t=0}^{T−1}; starting lead molecule sequence x_0; reference sequence set S; loss function from (1) or (2).
• Procedure initialization: set z^(0) = Enc(x_0).
• Repeat the following steps for T iterations, starting from t = 0:
• Pseudo gradient estimation and descent: estimate ∇̂Loss(z^(t)) via (4) and update z^(t+1) via (3).
• Molecular property and constraint verification: if solving formulation (1), check f_i(Dec(z^(t))) ≥ τ_i for all i ∈ [I]; if solving formulation (2), check g_j(Dec(z^(t))|S) ≥ η_j for all j ∈ [J].
• Update valid molecule sequence: if the check passes, add Dec(z^(t)) to the set of valid solutions.

Inherited from zeroth order optimization, QMO has algorithmic convergence guarantees. Under mild conditions on the true gradient (Lipschitz continuity and bounded gradient), the zeroth order gradient descent following (3) ensures that QMO approaches a local optimum in the loss landscape of a non-convex objective function at a rate of O(d/T) 39,40, where T is the number of iterations. In addition to the standard zeroth order gradient descent method, our QMO algorithm can naturally adopt different zeroth order solvers, such as zeroth order stochastic and accelerated gradient descent. Our implementation of gradient estimation uses Q + 1 loss function queries per iteration. If the decoder outputs a SMILES string, we pass the string to RDKit for validity verification and disregard invalid strings. In our QMO implementation, we use the zeroth-order version of the popular Adam optimizer 68, which automatically adjusts the step sizes {α_t}_{t=0}^{T−1} given an initial learning rate α_0 (see Section 2 in the Supplementary Material for more details). Empirically, we find that Adam performs better than stochastic gradient descent (SGD) in our tasks. The convergence of zeroth order Adam-type optimizers is given in Chen et al. 69. We now specify the experimental settings, data descriptions, and QMO hyperparameters for each task. In all settings, QMO hyperparameters were tuned to a narrow range, and then all the reported combinations were tried for each starting sequence. Among all feasible solutions returned by QMO, we report the one having the best molecular score given the required constraints. The stability analysis of QMO is studied in Section 5 of the Supplementary Material.
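Putting the procedure together, here is a minimal sketch of the QMO loop for formulation (1), using plain zeroth order gradient descent; the Adam step-size adaptation, the RDKit validity filtering, and the restart logic of the actual implementation are omitted, and all function names are placeholders.

```python
import numpy as np

def qmo(x0, enc, dec, loss, is_valid, steps=100, q=50, beta=10.0, alpha=0.05):
    """Query-based molecule optimization: returns the set of valid candidates.

    loss(z)     -- objective (1) or (2) evaluated on the decoded sequence Dec(z)
    is_valid(x) -- constraint check, e.g. f_i(x) >= tau_i for all i in [I]
    """
    z = enc(x0)                                   # initialize at the lead molecule
    d = z.shape[0]
    solutions = []
    for _ in range(steps):
        # Pseudo gradient, eq. (4): q + 1 loss queries per iteration.
        base, grad = loss(z), np.zeros(d)
        for _ in range(q):
            u = np.random.randn(d)
            u /= np.linalg.norm(u)
            grad += (loss(z + beta * u) - base) * u
        z = z - alpha * (d / (beta * q)) * grad   # descent step, eq. (3)
        x = dec(z)
        if x is not None and is_valid(x):         # keep feasible decoded molecules
            solutions.append(x)
    return solutions                              # empty if the search fails
```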
In our experiments, we run the QMO procedure with the reported hyperparameter values and report the results of the best molecule found in the search process. The procedure returns null (that is, an unsuccessful search) if it fails to find a valid molecule sequence.

Benchmarks on QED and penalized logP. The pre-trained encoder-decoder by Winter et al. 46 is used, with latent dimension d = 512. For the penalized logP optimization task, we use Q = 100, β = 10, α_0 = 2.5, γ_penalized logP = 0.04, and T = 80. For the QED task, we use Q = 50, β = 10, α_0 = 0.05, γ_QED = 4, T = 20, and report the best results among 50 restarts. We find that for the QED task, using multiple restarts can further improve the performance (see Section 5 in the Supplementary Material for a detailed discussion). For penalized logP, there is no reason to continue optimizing past 80 iterations, as penalized logP can be increased almost arbitrarily without making the resulting molecule more useful for drug discovery 29, even under similarity constraints, as we find. Therefore, we set T = 80 for the penalized logP task. In Figure 6 and Figure 7, the optimization trajectory of QMO is visualized by projection onto two selected directions v_x and v_y originating from the starting embedding z_0. Specifically, for Figure 6 we set v_x = z* − z_0 and set v_y to be a unit-norm randomly generated vector that is orthogonal to v_x. The two-dimensional local grid in the embedding space is then sampled according to

z_grid(x, y) = z_0 + x · v_x + y · ‖z*‖₂ · v_y,   (6)

where ‖·‖₂ denotes the Euclidean norm, and we sample x and y uniformly from [−0.5, 1.5] and [−2, 2], respectively. Note that, by construction, z_grid(0, 0) = z_0 and z_grid(1, 0) = z*. Then, we evaluate the Tanimoto similarity and binding affinity prediction on the grid and present the results in Figure 6. Similarly, for Figure 7 we set v_x and v_y to be two unit-norm randomly generated vectors, and set z_grid(x, y) = z_0 + x · ‖z_0‖₂ · v_x + y · ‖z_0‖₂ · v_y, where x and y are sampled uniformly from [−1.6, 1.6].

Author contributions: P.-Y. Chen contributed to QMO methodology design and property landscape analysis. V. Chenthamarakshan contributed to common substructure analysis. All authors conceived and designed the research, analyzed the results, and contributed to the paper writing.

Supplementary Material

In contrast to first-order (i.e., gradient-based) optimization, zeroth-order (ZO) optimization uses function values evaluated at queried data points to approximate the gradient and perform gradient descent, which we call pseudo gradient descent 39. It has been widely used in machine learning tasks where function values are observable while the gradient and other higher-order information are either infeasible to obtain or difficult to compute. A recent appealing application of ZO optimization is the efficient generation of prediction-evasive adversarial examples from information-limited machine learning models, known as black-box adversarial attacks 72. For the purpose of evaluating practical robustness, the target model only provides model predictions (e.g., a prediction API) to the attacking algorithm and does not reveal other information. A major benefit of ZO optimization is its ability to adapt techniques from gradient-based methods. Despite using gradient estimates, many ZO optimization algorithms enjoy the same iteration complexity for converging to a stationary solution as their first-order counterparts under similar conditions 40.
However, an additional multiplicative cost polynomial in the problem dimension d (usually O(d) or O(√d)) appears in the convergence rate, due to the nature of query-driven pseudo gradient descent. Algorithm 1 describes the zeroth order version of the Adam optimizer 68 used in our QMO implementation (a minimal code sketch is provided below). We used the continuous and data-driven descriptors (CDDD) model 46 to embed the SMILES representation of a molecule into a continuous latent space. The model consists of an encoder and a decoder, each with three stacked gated recurrent unit (GRU) cells and a fully connected layer with a hyperbolic tangent activation function. The latent representation value in each dimension is thus constrained to the range [−1, 1]. In our QMO implementation, we project the candidate embedding z^(t) onto [−1, 1] in each dimension at every iteration. The training of CDDD is also regularized by molecular property predictors: the complete training of the CDDD model involves minimizing the reconstruction loss and a property prediction loss. The decoder uses a left-to-right beam search for inference. The details of the datasets, models, and hyperparameters used to train the CDDD model are described in the supplementary material of Winter et al. 46.

Antimicrobial Peptide Use Case. We used the pre-trained Wasserstein autoencoder (WAE) from Das et al. 30. For learning meaningful continuous representations from sequences in an unsupervised fashion, the variational autoencoder (VAE) family has proven to be a principled approach 73. However, vanilla VAE models are prone to mode collapse 74. Advanced models such as β-VAE 75 and WAE 76 were proposed to address this issue, and the WAE was found to encode the AMP space better 30. The default WAE architecture involves a bidirectional-GRU encoder and a GRU decoder. The encoder is a bi-directional GRU with a hidden state size of 80. The latent dimension was set to 100. We used the same parameter settings as in Das et al. 30. We used the random feature approximation of the Gaussian kernel with kernel bandwidth σ = 7, as this was reported to perform best. The inclusion of latent space noise log-variance regularization, denoted R(logVar), helped avoid collapse to a deterministic encoder. Among the different regularization weights used, 1e−3 had the most desirable behavior on the reported performance metrics 30. For more details, refer to Das et al. 30.

In Table 4, we list experimental IC50 values for a few of the COVID-19 inhibitor candidates examined in Table 5, compared with their predicted affinity values. These are provided for reference.

Antimicrobial Peptide Use Case. In our experiments, we used the labeled part of a large curated antimicrobial peptide (AMP) database from a recent AI-empowered antimicrobial discovery study 30. The AMP dataset has several attributes associated with the peptides, of which we used the antimicrobial (AMP) and toxicity attributes. The labeled dataset has only linear and monomeric sequences with no terminal modifications and lengths of up to 50 amino acids. The dataset contains 8683 AMP and 6536 non-AMP sequences, and 3149 toxic and 16280 non-toxic sequences. For the starting AMP sequences, we consider sequences of length up to 15 that are labeled as both AMP and toxic. We then filter for sequences on which our toxicity classifier's predictions match the ground truth, obtaining 167 sequences.
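As a concrete reference for Algorithm 1 mentioned above, the following is a minimal sketch of a zeroth order Adam loop. It plugs in the pseudo-gradient estimator of equation (4) (e.g., the `zo_gradient` sketch given earlier), and the final projection onto [−1, 1] reflects the CDDD latent-space constraint described above; the hyperparameter defaults are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def zo_adam(loss, z0, grad_estimate, steps=100, alpha=0.05,
            beta1=0.9, beta2=0.999, eps=1e-8):
    """Zeroth order Adam: standard Adam moment updates driven by the pseudo
    gradient of eq. (4) instead of a true gradient (cf. Algorithm 1)."""
    z = z0.copy()
    m = np.zeros_like(z0)                     # first-moment estimate
    v = np.zeros_like(z0)                     # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_estimate(loss, z)            # pseudo gradient query
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)            # bias-corrected moments
        v_hat = v / (1 - beta2**t)
        z = z - alpha * m_hat / (np.sqrt(v_hat) + eps)
        z = np.clip(z, -1.0, 1.0)             # project onto CDDD's [-1, 1] latent box
    return z
```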
Figure 8 shows the distributions of improvement in penalized logP scores, compared to the original molecules, for the two similarity thresholds. The results show a long tail on the right-hand side, which increases the variance. Table 5 shows the results for improving the binding affinity of the 23 existing SARS-CoV-2 M pro inhibitor molecules. Affinity predictions, Tanimoto similarity, the quantitative estimate of drug-likeness (QED) 41, synthetic accessibility (SA) 78, and the logarithm of the partition coefficient (logP) are reported. The results show that the predicted affinity is improved past the threshold for every starting compound while attaining a similarity of 0.64 on average. QED increases slightly on average, showing that QMO preserves, and in some cases improves, drug-likeness. SA increases only slightly, indicating that synthesizability remains reasonable. Hydrophobicity (logP) decreases slightly, meaning the molecules are more water-soluble. Finally, the variants of all 11 analyzed M pro inhibitors show better or comparable binding free energies when compared to those of the originals. Table 6 shows the original and improved versions of all 23 COVID-19 inhibitor candidates. Table 7 shows the SMILES representations of these same molecules. We also provide the extended results of the docking analysis of the COVID-19 inhibitor candidates and their improved variants in Table 8, showing MM/PBSA energy calculation results for the top three docking poses of each molecule. In addition, we investigate whether the top binding modes revealed in docking correspond to any of the binding pockets reported in ref. 53, which were estimated using PrankWeb 3 and indexed by score (see Figure 9 for the locations of these pockets). Note that pocket 0 corresponds to the substrate-binding pocket of M pro. If the pocket does not change between the original and improved molecules, we can expect a similar mode of inhibition of the target, which is desirable (in the cases where we know the experimentally validated binding pocket of the original drug, e.g., see Figure 2). Finally, Table 9 lists all of the residues that Dipyridamole contacts, in its original and optimized versions, along with the number of heavy-atom contacts. The improved version makes 73 contacts compared to 64 for the original.

Table 7. SMILES representations of original and improved (QMO-optimized) inhibitor molecules. (Footnote 5: the selenium atom in Ebselen is rare for drug molecules and cannot be handled by the encoder/decoder, so it is substituted with sulfur before beginning optimization, as in ref. 54.)

Table 9. Full set of contacts between SARS-CoV-2 M pro and Dipyridamole, original and QMO-optimized variant. Residues that contact either of the molecule variants (listed by number): 41, 46, 49, 140, 141, 142, 143, 144, 145, 163, 164, 165, 166, 167, 168, 186, 187, 188, 189, 190, 191, 192. Total heavy-atom contacts (5 Å cutoff): 64 (original) and 73 (QMO-optimized).

Figure 10 shows the similarity (normalized by the self-similarity of the starting sequence) as QMO continues past the first success, together with the evolution of the sequences at selected iterations. The curve shows the similarity of the best-found valid molecule (highest similarity, predicted as a non-toxic AMP) with respect to the iteration count in QMO. We can see that QMO keeps improving the similarity of the best-found candidate while maintaining low predicted toxicity and high predicted AMP activity as the optimization process continues. For these experiments we used Q = 100, β ∈ {1, 10}, α_0 ∈ {0.1, 0.05, 0.01}, λ_sim = 0.01, and T = 5000 (at Q + 1 = 101 loss queries per iteration, a run of T = 5000 iterations corresponds to a budget of 505,000 loss evaluations).
Table 10 summarizes the results of the AMP optimization displayed in Figure 5. Table 11 shows several physicochemical properties of the 109 original and QMO-optimized sequences displayed in Figure 4. Existing machine learning-based AMP classifiers report a wide range of accuracies. For example, the reported accuracy is 66.8% for iAMPpred 79, 79% for DBAASP-SP 80, which relies on local density-based sampling using physicochemical features, and 94% for the method of Witten et al. 81, which uses a convolutional neural net trained directly on a large corpus of peptide sequences. Our sequence-level LSTM model shows a comparable 88% accuracy. Among the available toxicity predictors, many focus on predicting hemolytic activity. For example, HLPpred-fuse 67 fuses predicted probabilities from six different classifiers into a final model. In contrast, HAPPENN 59 uses a neural net model and reports 84% accuracy. Ref. 82 uses a gradient-boosting-based model and obtains 95% accuracy, similar to our in-house sequence-level toxicity classifier. We emphasize that the goal of this study is not to provide a new antimicrobial or toxicity prediction method that outperforms existing machine learning-based classifiers. Rather, our goal is to leverage feedback from a set of reliable predictors to guide searches in the latent space. It should be noted that comparing different AMP (as well as toxicity) prediction models is non-trivial, as different methods vary widely in training dataset size, sequence length, the definition of positive and negative sequences, and other data curation criteria.

In this section, we selected two tasks to study the stability of QMO, in terms of runtime and the effect of the number Q of per-iteration random directions used in QMO. The total runtime for the QED task with T = 20 iterations and 50 random restarts was approximately 487 CPU/GPU hours or, spread over 32 cores, 15.2 hours of wall time. We complete 15 random restarts at a success rate of 85.9% in roughly 8 hours of wall time on 32 cores, the same time as reported in ref. 28. We ran all our experiments on machines with Intel Xeon E5-2600 CPUs and NVIDIA K80 GPUs.

Figure 11. Cumulative success rate as a function of the number of random restarts for the QED task: (a) 20 steps per restart (50×20); (b) 300 steps per restart (4×300).

In examining the effect of multiple restarts, we show two configurations for running QMO on the QED task in Figure 11. Restarts happen if the algorithm is unable to find a successful candidate within the allotted number of steps. For this task, the algorithm stops after the first successful candidate is found. After 50 restarts at 20 steps (Figure 11a), we reach a success rate of 92.8%, whereas after 4 restarts at 300 steps (Figure 11b) we only achieve 90.9% success despite taking about as long: 447 hours (4×300) compared to 487 hours (50×20). We conclude that for relatively easy tasks (single-constraint optimization) using a small number of steps T, restarting with a new random seed can be very effective, since at early iterations the (effective) step size is relatively large and the guided search may tend to overshoot. Figure 12 compares the effect of the number Q of per-iteration random directions in QMO on the cumulative success of the AMP experiments. We observe that setting a larger Q value can improve the cumulative success rate, but at the cost of an increased number of property evaluations.
The performance becomes similar once the Q value is sufficiently large, suggesting the stability of QMO.

6 Additional Analysis on Diversity, Novelty, and Optimization Objectives

Following the same experimental setup, Table 12 shows the performance gains of QMO on the logP task for 80 randomly sampled lead molecules as the similarity constraint is relaxed. Internal diversity is computed as the average Tanimoto similarity between all pairs of molecules in the set of 80 (1 run). The internal diversity of the original molecules is 0.89. It is observed that the internal diversity decreases as the threshold decreases. This is because the algorithm is incentivized to find similar substructures that dominantly improve logP, and this gets easier with a lower similarity threshold. On the other hand, after running QMO 100 times, we find that the fraction of unique final molecules for each initial point, averaged over all 80 initial molecules, increases dramatically as the similarity constraint is relaxed. In other words, QMO is able to generate novel molecules when given different random seeds for the same starting point if the search space is expanded. QMO works with a variety of optimization objectives, including those that do not require a similarity constraint. In these cases, the choice of starting sequence is less important, since the optimizer is able to move arbitrarily far from the initial point (see Section 6.3). In Table 13, we show the results on two tasks without similarity constraints: minimizing SA while maintaining binding affinity with M pro ≥ 7.5, and maximizing QED while maintaining the same binding affinity threshold. In the SA minimization task, we note that the internal diversity of the improved molecules is low, and we observe that the exact same final molecule is often found when starting from different initial points. In this case, the task is relatively simple and the optimizer is able to find the same minimum regardless of where it starts. The QED maximization task shows more diverse results. For full results, see Tables 17 and 18. In addition to starting from known lead molecules for SARS-CoV-2 inhibition, we also experimented with starting from molecules drawn uniformly at random from the latent space. The same set of 23 random molecules is used to reproduce the experiments. The results in Table 14 show that the utility of QMO is not restricted to just searching the space near a known lead molecule, and that QMO can be used for random generation as well. QMO is able to consistently achieve results similar to those of Section 6.2 without starting from a predetermined initial point. Full results can be found in Tables 19 and 20 below. Here we compare QMO performance using the different β-VAE 75 models and the default WAE 76 model studied in Das et al. 30 for the AMP task. For β-VAE, we used three models, changing the Kullback-Leibler (KL) annealing term β from 0 to 0.03, 0 to 0.3, and 0 to 1.0. For the WAE, we used the random feature approximation of the Gaussian kernel with kernel bandwidth σ = 7, as this was found to be the best-performing model 30. Additionally, the latent space noise log-variance regularization R(logVar) was imposed to prevent the encoder from mode collapse. Figure 13 shows the QMO success rate plot for these model variants. All QMO hyperparameters are fixed across all sets of experiments, with the encoder-decoder model being the only change. The results show that QMO performance increases with an improved underlying autoencoder model.
QMO works with a variety of optimization objectives, including those that do not require a similarity constraint. In these cases, the choice of starting sequence is less important, since the optimizer is able to move arbitrarily far from the initial point (see Section 6.3). In Table 13, we show the results on two tasks without similarity constraints: minimizing SA while maintaining binding affinity with Mpro ≥ 7.5, and maximizing QED while maintaining the same binding affinity threshold. In the SA minimization task, we note that the internal diversity of the improved molecules is low, and we observe that the exact same final molecule is often found when starting from different initial points. In this case, the task is relatively simple and the optimizer finds the same minimum regardless of where it starts. The QED maximization task shows more diverse results. For full results, see Tables 17 and 18.

In addition to starting from known lead molecules for SARS-CoV-2 inhibition, we also experimented with starting from molecules drawn uniformly at random from the latent space. The same set of 23 random molecules is used to reproduce both tasks. The results in Table 14 show that the utility of QMO is not restricted to searching the space near a known lead molecule; it can be used for random generation as well. QMO consistently achieves results similar to those in Section 6.2 without starting from a predetermined initial point. Full results can be found in Tables 19 and 20 below.

Here we compare QMO performance using different β-VAE 75 models and the default WAE 76 model studied in Das et al. 30 for the AMP task. For β-VAE, we used three models, changing the Kullback-Leibler (KL) annealing term β from 0 to 0.03, 0 to 0.3, and 0 to 1.0. For WAE, we used a random feature approximation of the Gaussian kernel with kernel bandwidth σ = 7, as this was found to be the best model 30. Additionally, latent space noise log variance regularization, R(logVar), was imposed to prevent the encoder from mode collapse. Figure 13 shows the QMO success rate plot for these model variants. All QMO hyperparameters are fixed across all sets of experiments; only the encoder-decoder model is changed. QMO performance increases with an improved underlying autoencoder model. The cumulative success rate aligns well with the performance metric evaluations reported in Table 15. Our default WAE model performs better than the β-VAE variants, as it has lower reconstruction error and higher BLEU score. The results show that a better encoder-decoder can lead to improved QMO performance. We also note that setting a larger latent dimension may improve the performance of the encoder-decoder at the price of requiring more queries per iteration for accurate gradient estimation. Given the same number of queries per iteration, Figure 15 compares QMO performance on the AMP task using WAE models with the different latent dimensions listed in Table 15. One can observe that models with low BLEU scores and large reconstruction errors (e.g., setting the latent dimension to 50 or changing the log variance regularization to R(logVar) = 1e-2) lead to low success rates due to poor molecule representation learning by the encoder-decoder. Increasing the latent dimension may slightly improve encoder-decoder performance, but it may not improve QMO performance, since more queries per iteration are then needed for zeroth-order gradient descent to reach the same level of gradient estimation accuracy 40.

We test the impact of switching from sequence-level property predictors to latent-space (z-space) predictors on QMO optimization for the AMP task. For the latent-space classifiers, we use logistic regression with inverse regularization strength C = 1.0 and 200 limited-memory BFGS (L-BFGS) optimization iterations, trained on the latent representations from the same underlying encoder-decoder; a configuration sketch is given below. Datasets for the AMP and toxicity attributes are taken from Das et al. 30. Test accuracies of the latent-space classifiers are 80.27% for AMP and 87.65% for toxicity. Figure 15 shows the performance of QMO using sequence-level and latent-space classifiers. All hyperparameter settings are the same except for the classifiers used for property prediction. We find that QMO with sequence-level classifiers performs better: the final success rate is 109/150 with sequence-level classifiers versus 92/150 with latent-space classifiers. Moreover, in the latter case, 20 out of the 92 lead sequences yield an "improved" candidate identical to the original, which we attribute to the latent-space classifiers misclassifying those lead sequences. Excluding the initially misclassified sequences, the success rate of QMO using latent-space classifiers drops to 72/150. The performance difference can be explained by the fact that latent representations are information-compressed proxies of molecules, so latent-space classifiers provide weaker guidance for optimization than sequence-level classifiers.
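The latent-space classifier above maps directly onto a standard scikit-learn configuration. The sketch below uses the stated hyperparameters (C = 1.0, 200 L-BFGS iterations); the variable names and the probability-based scoring at the end are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression

def train_latent_classifier(X_latent, y):
    """z-space property classifier with the settings stated above:
    logistic regression, inverse regularization strength C = 1.0,
    optimized with 200 L-BFGS iterations.

    X_latent: encoder embeddings of labeled sequences, shape
              (n_samples, latent_dim); y: binary attribute labels
              (e.g., AMP / non-AMP). Names are illustrative.
    """
    clf = LogisticRegression(C=1.0, solver="lbfgs", max_iter=200)
    return clf.fit(X_latent, y)

# During guided search, the predicted class probability of a candidate
# embedding z would serve as the property score, e.g.:
# score = clf.predict_proba(z.reshape(1, -1))[0, 1]
```

Because such a classifier sees only the compressed embedding rather than the decoded sequence, its guidance inherits any information lost by the encoder, consistent with the lower success rate reported above.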
To demonstrate the effect of the search radius/smoothing parameter β in the QMO algorithm, we show results from the QED task with varying β values in Figure 16. At β = 1 and below, QMO was unable to improve QED above the threshold for any molecule: the search radius is too small to approximate an accurate gradient, so the search is essentially random. At β = 100 and above, all of the unsuccessful molecules were due to the decoder returning invalid molecules; in this case, the search radius is too large and the resultant z points lie at the edges of the latent space. Between these extremes the effect is milder, and we found β = 10 to work well in many cases with this decoder model.

Figure 16. Success rate versus β value in QMO for the QED optimization task.

Table 16. Performance on the drug-likeness (QED) task with Tanimoto similarity constraint δ = 0.4. QMO-SPSA denotes solving the QMO loss function using the simultaneous perturbation stochastic approximation (SPSA) method 84.

Method     Success (%)
QMO-SPSA   80.1
QMO        88.4

Table 16 compares the results on the QED optimization task between our QMO zeroth-order optimizer and SPSA 84. QMO-SPSA is identical to Algorithm 1 except that the gradient estimation step is replaced with the SPSA estimate

$$\widehat{\nabla}\,\mathrm{Loss}(z^{(t)}) = \frac{\mathrm{Loss}(z^{(t)} + \beta \cdot u) - \mathrm{Loss}(z^{(t)} - \beta \cdot u)}{2\beta \cdot u},$$

where the division by the random perturbation vector u is element-wise. QMO shows a significant (10.4% relative) increase in success rate over this baseline while requiring an equivalent number of estimator queries. For SPSA, we use a fixed perturbation coefficient, equivalent to β. In this case, we run QMO with Q = 10 and the other parameters the same as in the main paper. We also run SPSA with (Q + 1)/2 times as many steps for parity in the number of estimator queries; hence, we use T = 110 for SPSA.
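For comparison with the multi-query estimator above, SPSA needs only two loss queries per step. A minimal sketch follows; the Rademacher (±1) perturbation is a common SPSA choice and an assumption here, as the text only specifies a fixed perturbation coefficient equivalent to β.

```python
import numpy as np

def spsa_gradient_estimate(loss_fn, z, beta=10.0, rng=None):
    """Two-query SPSA gradient estimate, as in the QMO-SPSA baseline.

    The +/-1 perturbation distribution is an illustrative assumption;
    the division by u is element-wise, matching the equation above.
    """
    rng = rng or np.random.default_rng()
    u = rng.choice([-1.0, 1.0], size=z.shape)        # simultaneous perturbation
    delta = loss_fn(z + beta * u) - loss_fn(z - beta * u)
    return delta / (2.0 * beta * u)                  # element-wise division
```

With 2 queries per step, running SPSA for (Q + 1)/2 times as many steps as QMO (T = 110 versus T = 20 at Q = 10) equalizes the total number of loss evaluations, which is the parity used in Table 16.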
Figure 17 demonstrates one advantage of our algorithm over molecule swarm optimization (MSO) 24 in terms of query efficiency: QMO reaches a higher success rate with a low number of queries, i.e., QMO finds successful candidates more quickly than MSO. The task has two thresholds, binding affinity with SARS-CoV-2 Mpro ≥ 7.5 and Tanimoto similarity to the initial molecule ≥ 0.6, and the algorithm is successful as long as both criteria are met. For this task, we limit the number of queries allowed to the respective optimization objective function at intervals of 1000. Specifically, for MSO we use binding affinity weight = 5, particles = 50, and steps = [19, 39, 59, 79, 99, 199], and for QMO we use α_0 = 0.02, β = 10, Q = 9, λ = 1, and T = [100, 200, 300, 400, 500, 1000].

Figure 17. Comparison of molecular optimization methods at equivalent numbers of loss function queries on the Mpro-affinity ≥ 7.5 and similarity ≥ 0.6 task. We use the same 23 lead molecules from Section 4.2.

The key insights drawn from the corresponding QMO results in Section 7 are: (i) encoder-decoders with lower reconstruction errors lead to better optimization performance; (ii) using sequence-level classifiers gives a higher success rate than latent-space classifiers; (iii) setting the search radius too small or too large degrades the optimization performance; and (iv) QMO outperforms other gradient-free optimizers, including particle swarm optimization and simultaneous perturbation stochastic approximation.

Table 20. Results of QMO optimization on 23 randomly generated molecules with Mpro-affinity ≥ 7.5, max-QED objective.

References

Machine learning unifies the modeling of materials and molecules
Machine learning for chemical discovery
Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis
Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks
Estimation of the size of drug-like chemical space based on GDB-17 data
Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry
Machine learning-assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials
Exploiting machine learning for end-to-end drug discovery and development
Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods
Molecular interaction and inhibition of SARS-CoV-2 binding to the ACE2 receptor
Novel classes of antibiotics or more of the same?
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
Multi-objective molecular de novo design by adaptive fragment prioritization
LigBuilder 2: a practical de novo drug design approach
Augmenting genetic algorithms with deep neural networks for exploring the chemical space
Bayesian optimization of small organic molecules with synthesizable recommendations
Automatic chemical design using a data-driven continuous representation of molecules
Shape-based generative modeling for de novo drug design
Constrained Bayesian optimization for automatic chemical design using variational autoencoders
DeltaDelta neural networks for lead optimization of small molecule potency
OptiMol: optimization of binding affinities in chemical space for drug discovery
Junction tree variational autoencoder for molecular graph generation
Automatic molecule optimization using copy & refine strategy
Efficient multi-objective molecular optimization in a continuous latent space
Molecular de-novo design through deep reinforcement learning
HAPPENN is a novel tool for hemolytic activity prediction for therapeutic peptides which employs neural networks
iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types
Discovering de novo peptide substrates for enzymes using machine learning
In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design
SATPdb: a database of structurally annotated therapeutic peptides
DBAASP v.2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides
Origin of low mammalian cell toxicity in a class of highly active antimicrobial amphipathic helical peptides
Characterization of the bioactivity and mechanism of bactenecin derivatives against food-pathogens
HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation
A method for stochastic optimization
ZO-AdaMM: zeroth-order adaptive momentum method for black-box optimization
Amino acid substitution matrices from protein blocks
ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models
Auto-encoding variational Bayes
Generating sentences from a continuous space
β-VAE: learning basic visual concepts with a constrained variational framework
Wasserstein auto-encoders
Identification of antiviral drug candidates against SARS-CoV-2 from FDA-approved drugs
Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions

Acknowledgements

The authors thank Prof. Tingjun Hou and Zhe Wang for their help with the binding free energy calculations using the FarPPI webserver. The authors also thank Bhanushee Sharma for her help in improving the presentation of the system plot (Figure 1).