title: When Machine Learning Meets Multiscale Modeling in Chemical Reactions
authors: Yang, Wuyue; Peng, Liangrong; Zhu, Yi; Hong, Liu
date: 2020-06-01

Due to the intrinsic complexity and nonlinearity of chemical reactions, direct applications of traditional machine learning algorithms may face many difficulties. In this study, through two concrete examples with biological background, we illustrate how the key ideas of multiscale modeling can greatly reduce the computational cost of machine learning, and how machine learning algorithms automatically perform model reduction in a time-scale-separated system. Our study highlights the necessity and effectiveness of integrating machine learning algorithms with multiscale modeling in the study of chemical reactions.

Accompanying matter synthesis and decomposition, energy storage and release, and biofunction activation and deactivation, chemical reactions play a fundamental role in multiple disciplines1, including biology, chemical engineering, materials science and so on. They help to model complicated natural phenomena through explicit reaction networks, to interpret observed data through quantitative mathematical equations, and to translate varied experimental conditions into tunable reaction rates and reaction orders. Due to their high complexity and nonlinearity, previous studies of chemical reactions have relied heavily on sophisticated mathematical analysis and first-principles calculations, such as quantum chemistry2. The first mission in the study of chemical reactions is to obtain a proper mathematical model that can interpret the observed phenomena and data.
Even though there are empirical laws and some prior knowledge of the reaction networks that may help to build the model, parameters such as the reaction rates are usually deeply buried inside massive data. The recent rapid development of various machine learning algorithms, especially deep neural networks, makes inferring reaction networks and parameters both possible and efficient. Mangan et al.3 proposed an implicit sparse identification of nonlinear dynamics to infer hidden biochemical reaction networks, with emphasis on the rational nonlinear forms of the governing dynamics. Hu et al.4 constructed the so-called ODENet (short for Ordinary Differential Equations Network), which was used for explicitly modeling Lotka-Volterra-type dynamics and actin growth in the presence of medium-level noise. From a stochastic perspective, the chemical reaction system was modeled as a continuous-time Markov chain, whose propensity function was reconstructed as a combination of pre-designed basis functions by maximizing the log-likelihood function5. Costello and Martin6 showed that a supervised learning method can predict metabolic pathway dynamics from proteomics data, which may be used to design various bioengineered systems. Yang et al.7 revealed that the aggregation rates of amyloid proteins can be reliably estimated with a feedforward fully connected neural network combined with feature selection. In organic chemistry, given some reactants and external conditions, all possible reactions were ranked by a machine learning approach consisting of a reactive-site classifier and a ranking model, with the top-ranked mechanism corresponding to the major products8. Gaussian process regression was utilized to construct the potential energy surface of the HOx system9, which reduces the computational cost while guaranteeing convergence with fewer training points.
For more applications of machine learning to chemical reactions, including supervised and unsupervised learning, see e.g. Refs. 10-14. These successful attempts at machine-learning-based modeling pave a new way to understand the complicated dynamics of chemical reactions. However, most chemical reactions involve plenty of reactants, multiple potential reaction routes, diverse reaction rates and so on. Without considering this intrinsic multi-component and multiscale nature of the system, direct applications of machine learning algorithms may face inevitable difficulties (see the examples below for details). Motivated by the requirements of real complex systems, especially the simultaneous maintenance of the efficiency of macroscopic models and the accuracy of microscopic models, the viewpoint of multiscale modeling is introduced. It focuses on a proper separation of the system or phenomenon into several scales with minimum overlap, a correct characterization of the relation between different levels of physical models, and a systematic procedure of coarse-graining15. Multiscale modeling offers a unified way to examine a system of chemical reactions, by looking into the reactions occurring at different time scales and the relations between them. Therefore, it is expected that a proper integration of machine learning algorithms with the ideas and methodology of multiscale modeling and analysis will shed light on this field. This is the major motivation of our current study. To be concrete, we justify our arguments from two aspects: (1) By using the explicit correspondence between mesoscopic chemical master equations and macroscopic mass-action equations in Kurtz's limit, the challenging task of learning the detailed probability distribution function (PDF) is converted into learning low-order moments; the latter is obviously much easier.
In this case, the computational cost of direct machine learning is greatly reduced by incorporating multiscale modeling. (2) When fast and slow reactions appear simultaneously in the same system, meaning there is a time-scale separation among the chemical reactions, the ODENet, a machine learning algorithm with sparse identification, shows an astonishing ability to derive simplified models under the quasi-steady-state approximation (QSSA) automatically. Therefore, machine learning can help to model multiscale chemical reactions too. These two examples clearly demonstrate that machine learning and multiscale modeling are closely related to each other, and that a proper integration of the two approaches will greatly facilitate the study of chemical reactions. The whole paper is organized as follows. The basic architecture of the ODENet, a special kind of machine learning algorithm designed to derive the explicit form of ODEs from pre-given time series data, is introduced in Section II, along with the basic ideas and techniques of multiscale modeling and analysis for chemical reactions, including Kurtz's limit from chemical master equations to mass-action equations and the quasi-steady-state approximation. In Section III, we illustrate our key ideas through two examples: the development and differentiation of cells, and self-regulatory gene transcription and translation. The usefulness of integrating machine learning and multiscale modeling can thus be clearly seen. The last section contains some discussions. The ordinary differential equations network was proposed4,16 as a continuous version of the famous ResNet17 for dealing with time series data modeled by ordinary differential equations (ODEs).
Mathematically, each of the consecutively repeating building blocks (layers) of a residual network can be expressed as y_{k+1} = y_k + f(y_k; θ_k), where y_k is the output of the k-th hidden layer, y_{k+1} is the output of the (k+1)-th hidden layer, and f(y_k; θ_k) represents the function of a network layer parameterized by θ_k. After a simple algebraic transformation, we get y_{k+1} − y_k = f(y_k; θ_k), which is exactly the forward-Euler discretization scheme (with unit step size) of the ordinary differential equation dy/dt = f(y(t); θ(t)). As a consequence, the forward propagation of a residual network is equivalent to the numerical solution of a group of corresponding ordinary differential equations. Conversely, if we use an ODE solver to integrate the ODEs directly, the forward propagation of the residual network is accomplished as well. This significant finding lays the theoretical foundation of ODENet. The use of ODE solvers makes it easy to cope with input data sampled at unequal time intervals, to fight against medium-level noise, to control the numerical errors, and to dynamically adjust the convergence criteria. To enhance the ability to learn the explicit governing ODEs from pre-given time series data, in a previous work we combined the ODENet with symbolic regression and sparse identification: in the loss function L, an additional regularization term ‖θ‖₁ is added in order to remove as many redundant free parameters θ as possible, so that the loss function contains two parts,

L = ‖ŷ − y‖₂ + ε‖θ‖₁,

where the first part controls the difference between the training data y and the predicted data ŷ, and ε is a hyperparameter. To obtain the optimal parameters θ, the classical back-propagation (BP) algorithm18 is adopted for the update, which is repeated for many iterations until the loss function converges or falls below a threshold. Please see Fig. 1 for the flowchart of ODENet or refer to Ref. 4 for further details. Without loss of generality, we consider a chemical system with N species S_1, …, S_N and M reactions19,

ν_{1j} S_1 + ν_{2j} S_2 + ⋯ + ν_{Nj} S_N →(k_j) ν′_{1j} S_1 + ν′_{2j} S_2 + ⋯ + ν′_{Nj} S_N,  j = 1, …, M,   (3)

where k_j > 0 denotes the rate constant of reaction j.
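The residual-block/Euler correspondence and the two-part sparse loss described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy dynamics dy/dt = θ₀y + θ₁y², the grid, and all function names are assumptions introduced here.

```python
import numpy as np

def odenet_loss(theta, f, t_grid, y_obs, eps=1e-2):
    """Two-part ODENet-style loss: data mismatch plus an L1 penalty on theta
    that drives redundant free parameters to zero (sparse identification).
    The prediction uses the forward-Euler step y_{k+1} = y_k + dt*f(y_k; theta),
    i.e. exactly the residual-block update with step size dt."""
    y_pred = np.empty_like(y_obs)
    y_pred[0] = y_obs[0]
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        y_pred[k + 1] = y_pred[k] + dt * f(y_pred[k], theta)  # Euler / ResNet step
    return float(np.sum((y_pred - y_obs) ** 2) + eps * np.sum(np.abs(theta)))

# hypothetical scalar dynamics dy/dt = theta[0]*y + theta[1]*y^2
f = lambda y, th: th[0] * y + th[1] * y ** 2
t = np.linspace(0.0, 1.0, 101)
y_obs = np.exp(-0.5 * t)  # exact solution of dy/dt = -0.5*y with y(0) = 1

good = odenet_loss(np.array([-0.5, 0.0]), f, t, y_obs)  # true parameters
bad = odenet_loss(np.array([0.5, 0.0]), f, t, y_obs)    # wrong sign
```

A gradient-based optimizer (back-propagation in the network view) would minimize this loss over θ; the L1 term is what zeroes out coefficients of absent reaction channels.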
The nonnegative integers {ν_{ij}} and {ν′_{ij}} denote the stoichiometric coefficients of the reactants and products, respectively. The stoichiometric matrix U then has entries u_{ij} = ν′_{ij} − ν_{ij}. We focus on the molecular numbers of the species (S_1, S_2, …, S_N), represented by a stochastic variable n = (n_1, n_2, …, n_N)^T, in a reaction vessel of volume V. When the magnitude of (n_1, n_2, …, n_N)^T is relatively small compared with Avogadro's constant, randomness comes into play due to the intrinsic stochasticity of molecular collisions. From the perspective of ensemble averages, we denote the probability of finding the system in state n by p(n, t), where the time dependence is usually omitted as p(n). With respect to the reactions in (3), the probability distribution obeys the following chemical master equations (CMEs) in compact form,

∂p(n)/∂t = Σ_{j=1}^{M} [Φ_j(n − u_j) p(n − u_j) − Φ_j(n) p(n)],   (4)

accompanied by the initial condition p(n)|_{t=0} = p_0(n). Here u_j is the j-th column of the stoichiometric matrix U = (u_1, u_2, …, u_M), and Φ_j(n) is the mesoscopic propensity function, characterizing the probability Φ_j(n)dt that the j-th reaction occurs once within the time interval [t, t + dt). In general, the state-dependent mesoscopic propensity function Φ_j(n) of the CMEs is assumed to follow the law of mass action, i.e. it is the product of the rate coefficient k_j and a polynomial in the molecular numbers,

Φ_j(n) = k_j V ∏_{l=1}^{N} n_l! / [(n_l − ν_{lj})! V^{ν_{lj}}],  for n_l ≥ ν_{lj}, ∀l = 1, 2, …, N.

When there are not enough particles to form a reactant, say S_l with n_l < ν_{lj}, the propensity reduces to zero, Φ_j(n) = 0. In most cases the chemical master equations in (4) form a huge group of ordinary differential equations, which is computationally very demanding, so efficient sampling algorithms are needed instead. The Gillespie algorithm (GA)20, which generates typical time-evolution trajectories of the species according to the reaction mechanisms and reaction rates in a stochastic way, is perhaps the most famous one.
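The mass-action propensity described above can be evaluated directly from the reactant stoichiometric coefficients. Below is a minimal sketch; the dimerization/decay system at the bottom (A + A → B, B → ∅, with rates 2.0 and 1.0) is a hypothetical example, not one of the paper's networks.

```python
import numpy as np

def mass_action_propensity(n, nu, k, V):
    """Mesoscopic mass-action propensities Phi_j(n) for all M reactions.
    n:  molecular numbers, shape (N,)
    nu: reactant stoichiometric coefficients nu[l, j], shape (N, M)
    k:  rate constants k_j, shape (M,)
    Phi_j = k_j * V * prod_l n_l! / ((n_l - nu_lj)! * V**nu_lj),
    and Phi_j = 0 whenever some n_l < nu_lj (not enough particles)."""
    N, M = nu.shape
    phi = np.zeros(M)
    for j in range(M):
        if np.any(n < nu[:, j]):
            continue  # not enough particles to form a reactant: Phi_j = 0
        val = k[j] * V
        for l in range(N):
            for m in range(nu[l, j]):
                # falling factorial n_l!/(n_l - nu_lj)!, divided by V^{nu_lj}
                val *= (n[l] - m) / V
        phi[j] = val
    return phi

# hypothetical system: A + A -> B (k = 2.0) and B -> 0 (k = 1.0)
nu = np.array([[2, 0],
               [0, 1]])
phi = mass_action_propensity(np.array([10, 3]), nu, np.array([2.0, 1.0]), V=1.0)
```

For n = (10, 3) and V = 1 this gives Φ₁ = 2·10·9 = 180 and Φ₂ = 1·3 = 3, the inputs a stochastic simulation algorithm would consume.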
Gillespie implemented two stochastic simulation algorithms: the direct method (DM) and the first-reaction method (FRM). The two methods are theoretically equivalent, so here we implement only the FRM. The FRM generates a putative firing time for every reaction and executes the reaction whose putative time comes first, i.e. before any other reaction would occur. By running the Gillespie algorithm independently again and again, statistics over the resulting stochastic trajectories converge to the correct probability distribution given by the chemical master equations. These trajectories constitute the training data set fed into the machine learning algorithms. Although CMEs provide a relatively accurate way to model general chemical reaction systems, they impose a heavy burden in both modeling and experiments, since the dimensionality of the transition matrix is usually extremely high. Moreover, the time-consuming numerical simulation of CMEs becomes a common bottleneck when the number of species or reactions is large. To make a simplification, we turn to the mean density of the species when the molecular numbers of reactants become large. To deduce the macroscopic kinetics of the concentration c_i, we multiply (4) by the number density V^{−1} n_i and sum over all admissible states {n} on both sides, which yields

d⟨V^{−1} n_i⟩/dt = Σ_{j=1}^{M} V^{−1} u_{ij} ⟨Φ_j(n)⟩,

where in the last step we have used the variable substitution n − u_j → n and neglected the boundary terms. Direct calculation shows that the volume density of the mesoscopic propensity function converges to the macroscopic reaction rate, V^{−1} Φ_j(n) → φ_j(c) = k_j ∏_l c_l^{ν_{lj}}. Taking the limit V → +∞, n → +∞ while keeping c = V^{−1} n finite, we arrive at the following mass-action equations (MAEs),

dc_i/dt = Σ_{j=1}^{M} u_{ij} φ_j(c),  i = 1, 2, …, N,   (8)

on the nonnegative continuous state space {c | c ∈ R^N_{≥0}}. The MAEs in (8) are the macroscopic description derived from the mesoscopic CMEs of the reaction system (3). A rigorous mathematical justification of the above limit process was first given by Kurtz in the 1970s21.
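Both points above, the first-reaction method and the convergence of ensemble averages to the mass-action prediction, can be illustrated together on a hypothetical first-order decay S → ∅. Everything below (rates, population size, trajectory count) is an assumption chosen for illustration, not taken from the paper.

```python
import numpy as np

def first_reaction_method(n0, stoich, propensity, t_max, rng):
    """One trajectory of Gillespie's first-reaction method: draw a putative
    firing time for each reaction and execute the earliest one.
    Returns the state at time t_max."""
    n, t = np.array(n0, dtype=int), 0.0
    while True:
        a = propensity(n)                         # propensities Phi_j(n)
        if not np.any(a > 0):
            return n                              # nothing can fire any more
        tau = np.full(len(a), np.inf)             # putative times tau_j ~ Exp(a_j)
        tau[a > 0] = rng.exponential(1.0 / a[a > 0])
        j = int(np.argmin(tau))                   # earliest reaction fires
        if t + tau[j] > t_max:
            return n
        t += tau[j]
        n = n + stoich[:, j]                      # apply column u_j of U

# hypothetical decay S -> 0 at rate k; in Kurtz's limit the ensemble mean of
# the density n/V follows the mass-action equation dc/dt = -k*c
k, n0, t_max = 0.5, 200, 1.0
stoich = np.array([[-1]])
prop = lambda n: np.array([k * n[0]])
rng = np.random.default_rng(2)
samples = [first_reaction_method([n0], stoich, prop, t_max, rng)[0]
           for _ in range(300)]
mean_c = np.mean(samples) / n0                    # density, taking V ~ n0
```

With these numbers the ensemble mean density at t = 1 should sit close to the mass-action value e^{-0.5} ≈ 0.607, which is the CME-to-MAE correspondence exploited in Section III.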
A similar procedure can be carried out for higher-order moments of the PDF, such as the second-order variance studied in the first example of Section III.

Remark II.1. According to the results proved by Kurtz21, the convergence from the CMEs to the MAEs holds rigorously in the large-volume limit described above.

QSSA is a very classical model-reduction approach and has been widely used in the study of chemical reactions; see e.g. Ref. 22 for details.

A. Single Proliferative Compartment Model

In the first example, the single proliferative compartment model (SPCM) of interfollicular epidermis (IFE) maintenance considered by Clayton et al.23 is adopted to illustrate how multiscale modeling helps to reduce the computational cost of machine learning when inferring the detailed reaction mechanisms and reaction rates. According to the observations of Clayton et al.23, the clone fate of proliferating epidermal progenitor cells (EPCs) plays an essential role in adult epidermal homeostasis, and the key quantity, the clone size distribution, is modeled by chemical master equations, whose explicit form is the major goal of machine learning. By exploiting the explicit correspondence between mesoscopic chemical master equations and macroscopic mass-action equations in Kurtz's limit, the challenging task of learning the detailed probability distribution function is converted into learning low-order moments; the latter is obviously much easier. A similar idea has previously been applied by one of the authors to investigate the kinetics of amyloid aggregation, but without referring to machine learning24,25. Consider the two reactant species in the single proliferative compartment model: proliferating EPCs (denoted as A) and post-mitotic cells in the basal layer (B). There are four reactions, involving symmetric and asymmetric cell divisions. As shown in Fig. 2a, A has an unlimited self-renewal potential (A → A + A) at rate r_1 λ in order to maintain the epidermis, where λ is the integrated division rate of the proliferating EPCs A. A can also differentiate into A + B or B + B at rate (1 − r_1 − r_2)λ or r_2 λ, respectively.
Meanwhile, B cells in the basal layer leave the clone-size distribution at rate Γ. The above reaction system reduces to the one studied by Clayton et al.23 when the two symmetric division rates are equal, r_1 = r_2. With respect to the SPCM and the coefficients given in Fig. 2, 10^6 independent stochastic simulations are performed using the Gillespie algorithm; they constitute the training data set fed into our ODENet-based machine learning procedure. Here our major goal is to recover the SPCM in Fig. 2a. We now implement the ODENet to learn the dynamics in (11). Clearly, not all candidate reaction rate constants will appear in the final model; the redundant coefficients are picked out by ODENet and removed through sparse identification. After training and regression, only three non-zero coefficients, α_11 = 0.0079, α_21 = 1.0903 and α_22 = −0.3094, remain in the final result. Since all coefficients in front of the quadratic terms in (11) are removed during the learning procedure of ODENet, we can conclude that only first-order reactions are present in the current system. With respect to the learned dynamics and coefficients, the desired single proliferative compartment model shown in Fig. 2a is recovered. However, additional first-order reactions with rate constants k_1 and k_2 (also indicated in Fig. 2a) could not be excluded in principle, which means that at the moment the probability distribution of A-type and B-type cells follows the chemical master equations in (12); here the same notations are borrowed just for simplicity. It should be noted that at this point we still have no precise knowledge of all six reaction rate constants k_1, k_2, r_1, r_2, λ and Γ. To determine the unknown coefficients, we further go to the second order of the PDF, namely the variance of the cell numbers. Based on (12), the first-order moments (averages) and second-order moments (variances) of n_A and n_B evolve according to a closed system of linear moment equations, where V_A = ⟨n_A²⟩ − ⟨n_A⟩², V_B = ⟨n_B²⟩ − ⟨n_B⟩² and Cov = ⟨n_A n_B⟩ − ⟨n_A⟩⟨n_B⟩.
Furthermore, we have β_11 = k_1 + k_2 + (r_1 + r_2)λ, β_12 = −2(k_1 + k_2) + 2(r_1 − r_2)λ, β_21 = k_2 + (1 − r_1 + 3r_2)λ, β_22 = Γ, β_23 = −2Γ, β_24 = 2((1 − r_1 + r_2)λ + k_2), β_31 = −(2r_2 λ + k_2), β_32 = (1 − r_1 + r_2)λ + k_2, β_33 = (r_1 − r_2)λ − k_1 − k_2 − Γ. Following the same procedure, the unknown values of the β's can be learned by ODENet; they are summarized in the SI. The relations between the desired rate constants k_1, k_2, r_1, r_2, λ, Γ and the learned parameters α's and β's are stated through the linear system (14). Direct calculation shows that the rank of the augmented matrix, rank(V|b) = 7, is larger than that of the coefficient matrix, rank(V) = 6, meaning the linear equations in (14) constitute an overdetermined system, which can be solved by the least-squares method. The unique least-squares solution is û = (VᵀV)⁻¹ Vᵀ b̂, whose relative errors with respect to the true values are less than 8%. Even though k_1 and k_2 are not identified as exactly zero, their values are about one order of magnitude smaller than the others. In this sense, we have successfully reconstructed the original SPCM from the stochastic time trajectories of n_A and n_B in the training data set. As further validated in Fig. 3, the joint probability distribution of the desired single proliferative compartment model is faithfully reproduced (see the SI for the marginal probability distributions), which highlights the efficiency and effectiveness of our integrated approach of ODENet with multiscale modeling in the study of chemical reactions. In the second example, we show how machine learning can be used for model reduction, an important aspect of multiscale modeling with vast applications in chemical reactions. To illustrate our ideas, let us consider a gene network with autoregulatory negative feedback, which includes five reactants: the gene (G), mRNA (M), protein (P), and two gene-protein complexes (GP, GP_2).
Among them, there are eight reactions (see Fig. 4a). k_0, k_s, k_dm are the rate constants of transcription from the gene G, translation into the protein P, and mRNA degradation, respectively. The gene can bind with either one or two proteins, whose forward and backward reaction rate constants are denoted as k_1, k_{−1}, k_2 and k_{−2}, respectively. Furthermore, it is assumed that GP produces mRNA at the same rate k_0 as the transcription rate of G alone. Macroscopically, the gene network in Fig. 4a is described by the chemical mass-action equations (15). Note that the total gene concentration is a constant due to the conservation law, i.e. c_G + c_GP + c_GP2 = const. Due to the existence of time-scale separation, it is possible to simplify the reaction system in (15). In the first step, with the help of traditional classification algorithms, such as correlation analysis based on Pearson's coefficient between concentration derivatives (see Table II), the fast and slow variables can easily be separated into two groups. Inspired by the classical results of Michaelis-Menten kinetics, we suppose the three fast variables c_G, c_GP, c_GP2 (see Figs. 4b-4e) are characterized by fractional functions, whose numerators and denominator are polynomials of c_P (up to second order in the current study). In contrast, c_M does not appear in the fractional functions, since the last three formulas in (15) contain no terms in c_M. Consequently, the simplified model we seek is given by (16), where Ω = β_1 + β_2 c_P + β_3 c_P², Ω_1 = α_11 + α_21 c_P + α_31 c_P², Ω_2 = α_12 + α_22 c_P + α_32 c_P², Ω_3 = α_13 + α_23 c_P + α_33 c_P². β_1, …, β_3 and α_11, …, α_33 are twelve free parameters to be specified. As summarized in Table III, the simplified model learned by ODENet is very close to that obtained by QSSA (see the next section).
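The correlation-based separation of fast and slow variables described above can be sketched as follows. This is a hedged illustration: the grouping rule, the threshold of 0.9, and the synthetic two-timescale trajectories are assumptions introduced here, not the paper's gene-network data.

```python
import numpy as np

def correlation_groups(derivs, threshold=0.9):
    """Group species whose time derivatives are strongly correlated
    (|Pearson r| >= threshold), as a crude fast/slow classifier.
    derivs: array of shape (n_species, n_times)."""
    r = np.corrcoef(derivs)
    n = derivs.shape[0]
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [j for j in range(n) if abs(r[i, j]) >= threshold]
        assigned.update(group)
        groups.append(group)
    return groups

# synthetic example: two fast variables relaxing on timescale 0.01 and one
# slow variable on timescale 1.0 (illustrative, not the gene network of Fig. 4)
t = np.linspace(0.0, 0.5, 500)
fast1 = np.exp(-t / 0.01)
fast2 = 2.0 * np.exp(-t / 0.01)
slow = np.exp(-t / 1.0)
derivs = np.gradient(np.vstack([fast1, fast2, slow]), t, axis=1)
groups = correlation_groups(derivs)
```

The derivatives of the two fast variables are exactly proportional (|r| = 1), so they land in one group, while the slow variable's derivative is only weakly correlated with them and forms its own group, mimicking the {c_G, c_GP, c_GP2} versus {c_M, c_P} split obtained from Table II.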
In particular, the terms α_21 c_P, α_31 c_P², α_12, α_32 c_P², α_13 and α_23 c_P are removed by sparse identification during the learning procedure. A major difference between the two simplification methods lies in the four extra underlined terms on the right-hand side of the first formula in (16): in QSSA these four terms cancel each other exactly, whereas in the ODENet-aided simplification we can only conclude that their sum is quite small rather than exactly zero (see the SI). Our ODENet-aided model reduction is thus consistent with the classical quasi-steady-state approximation. Since G, GP, GP_2 are considered fast intermediates, in contrast to the slow species P and M, a direct application of QSSA to (15) leads to

c_G ∝ K_3/Ω,  c_GP ∝ K_2 c_P/Ω,  c_GP2 ∝ c_P²/Ω,

where K_2, K_3 are combinations of the binding constants and Ω = K_3 + K_2 c_P + c_P². The corresponding reduced equations for the slow variables c_M and c_P, obtained by substituting these quasi-steady-state expressions back into (15), have been used to evaluate the performance of our ODENet-aided model reduction. Nowadays, various machine learning algorithms, like deep learning and reinforcement learning, have found applications in diverse fields with great success, while in the field of chemical reactions related studies are only beginning to emerge and are still quite few. In the current paper, through two concrete biochemical examples, the single proliferative compartment model and a gene network with autoregulatory negative feedback, we present our key ideas on how machine learning and multiscale modeling can help each other in the study of chemical reactions. We believe an effective integration of the two approaches will be crucial for the success of related studies in this direction. Potential generalizations of our current work include, but are not limited to: (1) The spatial heterogeneity of chemical reactions. In the current study, all reactions are assumed to proceed under well-mixed conditions, which allows a relatively simple ODE-based description.
However, it is well known that spatial heterogeneity can produce far more complicated and interesting phenomena1, such as Turing patterns, phase separation, active matter, etc. How to generalize our results to PDEs would therefore be of general interest; recently, PDE-based machine learning algorithms27,28 have shed light on this aspect. (2) Bistability, oscillation and bifurcation of chemical reactions. Even restricted to ODEs, a chemical reaction system can possess more complex dynamical behaviors, such as bistability, oscillation, bifurcation and blow-up, than one can imagine29,30. In the presence of noise, the situation becomes even more complicated. The high nonlinearity of chemical reactions poses great challenges to our ODENet-based model derivation and model reduction.

Bureau (2020-XG-002). The authors would like to thank Dr. Pipi Hu for helpful discussions. All data that support the findings of this study are available from the corresponding author upon reasonable request.
References

[1] Mathematical Models of Chemical Reactions: Theory and Applications of Deterministic and Stochastic Models
[3] Inferring biological networks by sparse identification of nonlinear dynamics
[4] Revealing hidden dynamics from time-series data by ODENet
[5] Learning chemical reaction networks from trajectory data
[6] A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data
[7] Prediction of amyloid aggregation rates by machine learning and feature selection
[8] Learning to predict chemical reactions
[9] Revisiting the Gaussian process regression for fitting high-dimensional potential energy surface and its application to the OH + HO2 → O2 + H2O reaction
[10] Reverse engineering and identification in systems biology: strategies, perspectives and challenges
[11] Discovering governing equations from data by sparse identification of nonlinear dynamical systems
[12] Sparse learning of stochastic dynamical equations
[13] Robust approaches to generating reliable predictive models in systems biology
[14] Automated adaptive inference of phenomenological dynamical models
[15] Principles of Multiscale Modeling
[16] Neural ordinary differential equations
[17] Deep residual learning for image recognition
[18] Backpropagation applied to handwritten zip code recognition
[19] Analysis of complex reaction networks
[20] Exact stochastic simulation of coupled chemical reactions
[21] The relationship between stochastic and deterministic models for chemical reactions
[22] The quasi-steady-state assumption: A case study in perturbation
[23] A single type of progenitor cell maintains normal epidermis
[24] Simple moment-closure model for the self-assembly of breakable amyloid filaments
[25] Modeling fibril fragmentation in real-time
[26] Partial equilibrium approximations in apoptosis. II. The death-inducing signaling complex subsystem
[27] Data-driven discovery of partial differential equations
[28] Numerical Gaussian processes for time-dependent and nonlinear partial differential equations
[29] Stochastic bifurcation, slow fluctuations, and bistability as an origin of biochemical complexity
[30] Stochastic bistability and bifurcation in a mesoscopic signaling system with autocatalytic kinase
[31] Information theory and statistical mechanics
[32] Logistic and Ordinal Regression, and Survival Analysis