Epidemic control via stochastic optimal control

Andrew Lesniewski

April 14, 2020

Abstract. We study the problem of optimal control of the stochastic SIR model. Models of this type are used in mathematical epidemiology to capture the time evolution of highly infectious diseases such as COVID-19. Our approach relies on reformulating the Hamilton-Jacobi-Bellman equation as a stochastic minimum principle. This results in a system of forward backward stochastic differential equations, which is amenable to numerical solution via Monte Carlo simulations. We present a number of numerical solutions of the system under a variety of scenarios.

The classic SIR model, originally proposed in [15], is the standard tool of mathematical epidemiology [4] for quantitative analysis of the spread of an epidemic. It describes the state of the affected population in terms of three state variables, traditionally denoted by S, I, and R:

(i) 0 ≤ S ≤ 1, the fraction of individuals who are susceptible to the disease;
(ii) 0 ≤ I ≤ 1, the fraction of individuals who are infected with the disease;
(iii) 0 ≤ R ≤ 1, the fraction of individuals who have been removed and are immune to the disease.

Note that, in this simplified model, R includes the individuals who have left the population through mortality. The model dynamics is given by the following dynamical system:

  Ṡ(t) = -βS(t)I(t),
  İ(t) = βS(t)I(t) - γI(t),      (1)
  Ṙ(t) = γI(t),

where the constant parameters β > 0 and γ > 0 are called the infection rate and the recovery rate, respectively. The evolution above is subject to the initial conditions

  S(0) = S_0, I(0) = I_0, R(0) = R_0.      (2)

Notice that this dynamics obeys the conservation law

  S(t) + I(t) + R(t) = 1,      (3)

consistent with the assumption that the variables S, I, and R represent population fractions. This means that the variable R is, in a way, redundant: its current value does not affect the dynamics of S and I, and it can be computed in a straightforward manner from (3). It is thus natural to consider the two dimensional system defined by S and I only.

Eventually every epidemic comes to a natural halt, but its impact on the population may be very serious. The overall objective of epidemic control is to slow down the spread of the infection so that it does not overwhelm the healthcare system and allows the economy to function. All of this should be done within the limits of available resources.

In this note we study the problem of optimal control of an epidemic modeled by means of a stochastic extension of the SIR model (see Section 2 for its definition). We assume that the controlling agent ("government") has the ability to impact the spread of the disease through one of the following policies, or a combination of both:

(i) Vaccination of susceptible individuals, with rate v. This makes the fraction vS of the susceptible population immune to the disease.
(ii) Isolation of infected individuals, with rate i. This removes the fraction iI of the infected population and prevents it from spreading the disease.

The controlled dynamics of the SIR model reads [1], [13]:

  Ṡ(t) = -βS(t)I(t) - v(t)S(t),
  İ(t) = βS(t)I(t) - (γ + i(t))I(t),      (4)
  Ṙ(t) = v(t)S(t) + (γ + i(t))I(t).

For convenience, we will be using the notation X_1 = S and X_2 = I throughout the remainder of the paper.
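For illustration, the controlled system (4) can be integrated with a standard explicit Euler step. The sketch below is our own code, not the paper's; it assumes constant control rates, uses the hypothetical parameter values quoted in Section 5, and the initial condition I_0 = 0.001 is our assumption.

```python
import numpy as np

def sir_controlled(beta, gamma, v, i_rate, S0, I0, T=1.0, n_steps=365):
    """Explicit Euler integration of the controlled SIR system (4).

    v and i_rate are constant vaccination and isolation rates;
    setting both to 0 recovers the classic SIR model (1).
    """
    dt = T / n_steps
    S, I = np.empty(n_steps + 1), np.empty(n_steps + 1)
    S[0], I[0] = S0, I0
    for n in range(n_steps):
        S[n + 1] = S[n] + (-beta * S[n] * I[n] - v * S[n]) * dt
        I[n + 1] = I[n] + (beta * S[n] * I[n] - (gamma + i_rate) * I[n]) * dt
    R = 1.0 - S - I  # conservation law (3), assuming R(0) = 1 - S0 - I0
    return S, I, R

# Uncontrolled epidemic with R_0 = beta/gamma of about 3.3:
S, I, R = sir_controlled(beta=38.0, gamma=11.5, v=0.0, i_rate=0.0,
                         S0=0.999, I0=0.001)
```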
Mathematical models are only as good as (i) their analytic specifications, and (ii) the data that fuel them. During the initial phases of an epidemic, the available data tend to be of limited usefulness: in the absence of reliable large scale testing, it is not really known what fractions of the population fall into the compartments S, I, and R. This may lead to a panic reaction of the population and a chaotic and economically devastating public health response to the epidemic. In the absence of an effective vaccine (which would make it possible to immunize a portion of the population), the optimal policy is to isolate at least a significant fraction of the infected individuals, so that the basic reproduction ratio R_0 is brought significantly below one. Since it is not known who is infected and who is not, the public health response is to try to isolate everyone, whether susceptible, infected, or immune. These circumstances impose a serious limitation on the practical applicability of the approach to optimal epidemic control discussed in this paper, as well as of other quantitative approaches. Unless the inputs to the model (β, γ, and the current snapshots of S and I) can reliably be estimated from the available data, the model's output is unreliable².

² "This is indeed a mystery," Watson remarked. "What do you imagine that it means?" "I have no data yet," Holmes replied. "It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." [8]

The paper is organized as follows. In Section 2 we review the formulation of the continuous time stochastic SIR model. The optimal control problem for the stochastic SIR model is formulated in Section 3, and it is then recast as a stochastic minimum principle problem and formulated in terms of a system of forward backward stochastic differential equations (FBSDEs). We present an algorithm for solving this system in Section 5. That section also presents the results of a number of numerical experiments involving the three mitigation policies within various cost function regimes.

Acknowledgement. I would like to thank Nicholas Lesniewski for numerous discussions.

We consider a continuous time stochastic extension of the deterministic SIR model, see [3] and references therein. Let W_t denote the standard Brownian motion, and let Ẇ_t denote the white noise process. We assume that the infection rate β is subject to random shocks, rather than being constant, namely

  β → β + σẆ_t,

while the recovery rate γ remains constant. Here, σ > 0 is a constant volatility parameter. This leads to the following system of stochastic differential equations (SDEs), driven by W_t:

  dX_{1,t} = -X_{1,t}X_{2,t}(β dt + σ dW_t),
  dX_{2,t} = X_{1,t}X_{2,t}(β dt + σ dW_t) - γX_{2,t} dt,      (6)

with the initial conditions

  X_{1,0} = S_0, X_{2,0} = I_0.      (7)

The third component of the process, X_3 = R, follows the dynamics

  dX_{3,t} = γX_{2,t} dt,

which implies that the conservation law X_{1,t} + X_{2,t} + X_{3,t} = 1 continues to hold in the stochastic model.

Notice that, under the stochastic SIR model (6), an epidemic eventually comes to a natural end. More precisely, the solution (X_{1,t}, X_{2,t}) = (0, 0) to (6) is stable in probability [16]. To see this, we consider a suitable function V_ρ(X_1, X_2), defined for 0 ≤ X_i ≤ 1, i = 1, 2, and a fixed 0 < ρ < β/γ. Then V_ρ(0, 0) = 0, and V_ρ(X_1, X_2) > 0 in a neighborhood of (0, 0). Furthermore, denoting by L the generator of the process (6), we verify that LV_ρ(X_1, X_2) ≤ 0 in that neighborhood. In other words, V_ρ(X_1, X_2) is a Lyapunov function for (6), and our claim follows from Theorem 5.3 in [16].

The model (6) is a one-factor model, driven by a single source of randomness. There is a natural two-factor version of the stochastic SIR model [18], in which the recovery rate γ is also allowed to be subject to white noise shocks. For simplicity, our analysis will focus on the one-factor model (6).
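As a quick illustration, the one-factor model (6) can be simulated with an Euler-Maruyama step. This is our own minimal sketch with illustrative parameters; the scheme actually used later in the paper works in canonical variables instead.

```python
import numpy as np

def stochastic_sir_paths(beta, gamma, sigma, S0, I0, T=1.0,
                         n_steps=365, n_paths=2000, seed=42):
    """Euler-Maruyama simulation of the stochastic SIR model (6).

    S and I are driven by the same Brownian shock: the infection flux
    leaving S is exactly the flux entering I, as in (6).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full((n_steps + 1, n_paths), float(S0))
    I = np.full((n_steps + 1, n_paths), float(I0))
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        flux = S[n] * I[n] * (beta * dt + sigma * dW)  # infections S -> I
        S[n + 1] = np.clip(S[n] - flux, 0.0, 1.0)
        I[n + 1] = np.clip(I[n] + flux - gamma * I[n] * dt, 0.0, 1.0)
    return S, I

S, I = stochastic_sir_paths(beta=38.0, gamma=11.5, sigma=3.1,
                            S0=0.999, I0=0.001)
```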
We frame the problem of epidemic control as a stochastic control problem [2]. We denote by u = (u_1, u_2) ≡ (v, i) the vaccination and isolation controls, and we denote the controlled process by X^u_t. Generalizing the deterministic specification (4) to the stochastic case, we assume that the dynamics of X^u_t is given by:

  dX^u_{1,t} = -X^u_{1,t}X^u_{2,t}(β dt + σ dW_t) - u_{1,t}X^u_{1,t} dt,
  dX^u_{2,t} = X^u_{1,t}X^u_{2,t}(β dt + σ dW_t) - (γ + u_{2,t})X^u_{2,t} dt,      (14)

subject to the initial conditions (7). Two special cases of the controlled process are of interest. If a vaccine against the disease is unavailable, we set u_1 = 0 in the equation above; we will refer to the resulting policy as an isolation policy. Similarly, we can consider a vaccination policy, for which u_2 = 0.

We assume a finite time horizon T < ∞. The controlling agent's objective is to minimize the expected total of a running cost c(X_t, u_t) and a terminal value G(X_T). In other words, we are seeking a policy u* such that

  u* = arg min_u E[ ∫_0^T c(X^u_t, u_t) dt + G(X^u_T) ].      (15)

We consider the following cost function:

  c(x, u) = c_1(u_1, x_1) + c_2(u_2, x_2),

where

  c_i(u_i, x_i) = (L_i u_i² + M_i u_i + N_i) x_i, for i = 1, 2.

In other words, the running cost of vaccination is assumed to be proportional to the number of susceptible individuals, while the cost of isolation is assumed to be proportional to the number of infected individuals. The coefficients L_i > 0, M_i, N_i are determined by the overall cost of following the mitigation policy. In particular, they should be selected so that the running cost functions are strictly positive. As the terminal value function we take the transmission rate of the infection [13], namely

  G(x) = βx_1x_2.

We now invoke stochastic dynamic programming, see e.g. [10], [21]. The key element of this approach is the value function J(t, x), with x = (x_1, x_2). It is determined by two requirements:

(B1) It satisfies Bellman's principle of optimality,

  J(t, x) = min_u E_t[ ∫_t^s c(X_r, u_r) dr + J(s, X_s) ], for all 0 ≤ t < s ≤ T,

where E_t denotes conditional expectation with respect to the information set at time t, and where the minimum is taken over all admissible controls u_t, [10].

(B2) It satisfies the terminal condition

  J(T, x) = G(x).

Using Ito's lemma, we verify that these two conditions lead to the following nonlinear partial differential equation for the value function, namely the stochastic Hamilton-Jacobi-Bellman (HJB) equation:

  ∂_t J + min_u { -(βx_1x_2 + u_1x_1)∂_1 J + (βx_1x_2 - (γ + u_2)x_2)∂_2 J + ½σ²x_1²x_2²(∂²_{11}J - 2∂²_{12}J + ∂²_{22}J) + c(x, u) } = 0,      (21)

subject to the terminal condition J(T, x) = G(x). As the first step towards solving the HJB equation (21), we let u = u* denote the minimizer of the expression inside the curly brackets in (21). This leads to the following first order condition on u*:

  u*_i = (∂_i J - M_i) / (2L_i), for i = 1, 2.

Substituting u* back into the HJB equation yields a fully nonlinear equation (24) for J. We do not believe that the solution to this equation can be explicitly represented in terms of standard functions. This is a three dimensional partial differential equation, and solving it numerically may pose challenges. Rather than following this path, we shall invoke the stochastic minimum principle and reformulate the problem as a system of FBSDEs. Among the advantages of this approach is that it might be amenable to a deep learning approach via the method advocated in [14].

The stochastic minimum principle, see [21] and references therein, offers an alternative approach to stochastic optimal control. It is a stochastic version of Pontryagin's minimum principle, introduced in the context of deterministic optimal control [22]. It also offers an effective numerical methodology for solving the HJB equation (24) via Monte Carlo simulations.
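Note that the first order condition derived above reduces the optimization at each point to a simple linear map of the gradient of J (or, in the minimum principle formulation below, of the adjoint process Y). Here is a minimal sketch, assuming the quadratic running cost introduced above; the flooring of the rates at zero is our own addition, since the controls are rates and should be nonnegative.

```python
import numpy as np

def optimal_feedback(grad_J, M, L, cap=None):
    """Pointwise minimizer of the HJB bracket under the quadratic cost
    c_i(u_i, x_i) = (L_i u_i^2 + M_i u_i + N_i) x_i:

        u_i* = (dJ/dx_i - M_i) / (2 L_i),  i = 1, 2.

    grad_J, M, L are length-2 arrays for the controls (v, i).
    """
    u = (np.asarray(grad_J) - np.asarray(M)) / (2.0 * np.asarray(L))
    u = np.maximum(u, 0.0)                 # rates cannot be negative
    return u if cap is None else np.minimum(u, cap)
```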
In this approach, the key object is the Hamiltonian function H = H(x, u, y, z) of four arguments, together with a system of stochastic differential equations, both forward and backward, determined by H. Specifically, for the case of the controlled stochastic SIR model (14), we have x, u, y, z ∈ R², and the Hamiltonian function reads:

  H(x, u, y, z) = -(βx_1x_2 + u_1x_1)y_1 + (βx_1x_2 - (γ + u_2)x_2)y_2 + σx_1x_2(z_2 - z_1) + c(x, u).

We consider the following system of stochastic Hamilton's equations:

  dX_t = ∇_y H(X_t, u_t, Y_t, Z_t) dt + Σ(X_t) dW_t,  X_0 = (S_0, I_0),      (26)

and

  dY_t = -∇_x H(X_t, u_t, Y_t, Z_t) dt + Z_t dW_t,  Y_T = ∇G(X_T),      (27)

where Σ(x) = σx_1x_2(-1, 1) is the diffusion coefficient of (14). Equation (26) is merely an alternative way of writing the underlying controlled diffusion process (14), while equation (27) reflects the dynamics of the control variables given the running cost function. Note that while the first of the equations (26) is a standard (forward) SDE, the second one is a backward stochastic differential equation (BSDE), see e.g. [9]. Explicitly, these four equations can be stated as

  dX_{1,t} = -(βX_{2,t} + u_{1,t})X_{1,t} dt - σX_{1,t}X_{2,t} dW_t,
  dX_{2,t} = (βX_{1,t} - γ - u_{2,t})X_{2,t} dt + σX_{1,t}X_{2,t} dW_t,
  dY_{1,t} = [(βX_{2,t} + u_{1,t})Y_{1,t} - βX_{2,t}Y_{2,t} + σX_{2,t}(Z_{1,t} - Z_{2,t}) - (L_1u_{1,t}² + M_1u_{1,t} + N_1)] dt + Z_{1,t} dW_t,      (29)
  dY_{2,t} = [βX_{1,t}Y_{1,t} - (βX_{1,t} - γ - u_{2,t})Y_{2,t} + σX_{1,t}(Z_{1,t} - Z_{2,t}) - (L_2u_{2,t}² + M_2u_{2,t} + N_2)] dt + Z_{2,t} dW_t,

subject to the boundary conditions in (26) and (27). Suppose that u* minimizes the Hamiltonian along the solution of this system. Then u* is an optimal control, i.e. it satisfies the optimality condition (15). In fact, there is a direct link between the Hamiltonian function H and the value function J (also known in classical dynamics as the action function). Namely, we set [21], [23]:

  Y_t = ∇_x J(t, X_t),
  Z_t = ∇²_x J(t, X_t)Σ(X_t).

If u* is an optimal control and X* denotes the corresponding optimal diffusion process, then the pair (Y_t, Z_t), evaluated along X*, is the solution to the BSDE (27).

Going back to the main line of reasoning, we find that u* has to satisfy

  u*_{i,t} = (Y_{i,t} - M_i) / (2L_i), for i = 1, 2.      (32)

From equations (32) we see that, up to a simple linear transformation, Y_t is essentially the optimal policy. Furthermore, the process Z_t can be thought of as the sensitivity of the optimal policy to the underlying process X_t (multiplied by the instantaneous volatility of that process). Substituting this expression into (29), we find that the optimal process has to follow the dynamics (34), namely the system (29) with u_{i,t} replaced by u*_{i,t} of (32), subject to the boundary conditions in (26) and (27). In particular, the isolation only and vaccination only policies are given by the analogous systems of FBSDEs, obtained by additionally setting u*_1 = 0 and u*_2 = 0, respectively. Even though derived in the context of a meaningful underlying dynamics, there is no a priori reason why these equations should have solutions. The drivers of the backward equations above contain terms quadratic in Y, and so the standard existence theorems [9] do not apply. We proceed in the following assuming that these systems do, in fact, have solutions.

In this section we discuss a numerical algorithm for solving the system (34). Applying this methodology in a number of numerical experiments, we present compelling evidence that solutions to (34) exist and are meaningful over a wide range of model parameters.

We start by describing a method for discretizing the basic FBSDE system (34). We notice first that the two dimensional system defined by the state variables S, I is in fact Hamiltonian [17]. Namely, we define the new (canonical) variables

  p = -log I, q = -log S,

and the Hamiltonian function

  H(p, q) = -β(e^{-p} + e^{-q}) - γq.

It is easy to verify that (d/dt)H(p_t, q_t) = 0 under the dynamics (1). The system (1) can be explicitly written in the Hamilton form

  q̇ = ∂H/∂p,  ṗ = -∂H/∂q.

Notice also that the Hamiltonian function is separable, i.e. it is of the form H(p, q) = T(p) + V(q).

A convenient discretization of (6) can be formulated in terms of the canonical variables (p_t, q_t) = (-log X_{2,t}, -log X_{1,t}) as follows [18]. Choose the number of steps N, define the time step δ = T/N, set t_n = nδ for n = 0, 1, ..., N, and use the simplified notation (p_n, q_n) ≡ (p_{t_n}, q_{t_n}). This yields the following Euler scheme:

  q_{n+1} = q_n + (βe^{-p_n} + ½σ²e^{-2p_n})δ + σe^{-p_n}ΔW_n,
  p_{n+1} = p_n + (-βe^{-q_n} + γ + ½σ²e^{-2q_n})δ - σe^{-q_n}ΔW_n,

for n = 0, 1, ..., N - 1. Here, the Brownian motion increments ΔW_n = W_{(n+1)δ} - W_{nδ} are independent variates drawn from the normal distribution N(0, δ). At each step we also have to floor q_{n+1} and p_{n+1} at 0,

  q_{n+1} → max(q_{n+1}, 0),  p_{n+1} → max(p_{n+1}, 0),

so that e^{-q_n}, e^{-p_n} ≤ 1.
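In code, the forward half of the scheme, including the flooring just described, might look as follows. This is our own sketch of the discretization of the uncontrolled process (6); the drift terms follow from Itô's lemma applied to p = -log X_2 and q = -log X_1.

```python
import numpy as np

def forward_euler_canonical(beta, gamma, sigma, S0, I0, T=1.0,
                            n_steps=365, n_paths=2000, seed=0):
    """Euler scheme for (6) in the canonical variables p = -log X_2,
    q = -log X_1, floored at 0 so that X_1 = e^{-q}, X_2 = e^{-p} <= 1."""
    rng = np.random.default_rng(seed)
    delta = T / n_steps
    q = np.full((n_steps + 1, n_paths), -np.log(S0))
    p = np.full((n_steps + 1, n_paths), -np.log(I0))
    dW = rng.normal(0.0, np.sqrt(delta), size=(n_steps, n_paths))
    for n in range(n_steps):
        X1, X2 = np.exp(-q[n]), np.exp(-p[n])
        q[n + 1] = q[n] + (beta * X2 + 0.5 * sigma**2 * X2**2) * delta \
                        + sigma * X2 * dW[n]
        p[n + 1] = p[n] + (-beta * X1 + gamma + 0.5 * sigma**2 * X1**2) * delta \
                        - sigma * X1 * dW[n]
        q[n + 1] = np.maximum(q[n + 1], 0.0)  # floor q at 0
        p[n + 1] = np.maximum(p[n + 1], 0.0)  # floor p at 0
    return p, q, dW  # dW is needed later by the backward sweep
```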
Approximating the backward equations of the system (34) leads to the following backward Euler scheme:

  Y_n = Y_{n+1} + f(X_n, Y_n, Z_n)δ - Z_nΔW_n,      (42)

where f = (f_1, f_2) denote the generators (drivers) of the two BSDEs (43), i.e. the drift terms -∂H/∂x_i of (34). Starting with the terminal condition Y_N = ∇G(X_N), we move backward in time, computing Y_n and Z_n for n = N - 1, ..., 0. Notice that two difficulties arise while doing so: (i) the Y_n's in (42) are not necessarily adapted, and (ii) they depend on Z_n. These two problems can be solved by taking conditional expectations E_n(·) = E(· | X_0, X_1, ..., X_n) at time t_n. This leads to the following condition:

  Y_n = E_n(Y_{n+1} + f(X_n, Y_n, Z_n)δ).

This is an implicit scheme, which may slow down the computations. However, we can easily replace it with an explicit scheme of the same order, in which Y_n inside the generator is replaced with Y_{n+1}:

  Y_n = E_n(Y_{n+1} + f(X_n, Y_{n+1}, Z_n)δ).

In order to determine Z_{i,n}, i = 1, 2, we multiply (42) by the increment ΔW_n and take conditional expectations. Since Y_n and f(X_n, Y_n, Z_n) are measurable at time t_n, this yields

  Z_{i,n}δ = E_n(Y_{i,n+1}ΔW_n),

and hence we obtain the following expression for Z_{i,n}:

  Z_{i,n} = δ⁻¹ E_n(Y_{i,n+1}ΔW_n).

In summary, we are led to the following discrete time scheme for solving (42):

  Z_{1,n} = δ⁻¹ E_n(Y_{1,n+1}ΔW_n),
  Z_{2,n} = δ⁻¹ E_n(Y_{2,n+1}ΔW_n),      (45)
  Y_n = E_n(Y_{n+1}) + E_n(f(X_n, Y_{n+1}, Z_n))δ,

for n = N - 1, ..., 0. Note that simulating this system requires numerical estimation of the conditional expected values E_n(·) in the formulas above. We discuss this issue in the following section.

A practical and powerful method of computing the conditional expected values in (45) is the Longstaff-Schwartz regression method, originally developed for pricing American options [20]. We use a variant of this method that involves the Hermite polynomials, and that was used for a similar purpose in [19]. This choice is natural, as conditional expectations of Hermite polynomials of a Gaussian random variable lead to simple closed form expressions.

Let He_k(x), k = 0, 1, ..., denote the k-th normalized Hermite polynomial corresponding to the standard Gaussian measure dµ(x) = (2π)^{-1/2}e^{-x²/2}dx. These functions form an orthonormal basis for the Hilbert space L²(R, dµ). The key property of He_k(x) is the following addition formula, valid for χ ∈ [0, 1] and w, x ∈ R:

  He_k(√χ w + √(1-χ) x) = Σ_{j=0}^{k} (k choose j)^{1/2} χ^{j/2} (1-χ)^{(k-j)/2} He_j(w) He_{k-j}(x).

Consequently, integrating over x with respect to the measure µ yields the following conditioning rule:

  E(He_k(√χ w + √(1-χ) x) | w) = χ^{k/2} He_k(w).      (47)

Here, w and x are independent standard normal random variables. We shall use this relation in order to estimate the conditional expected values in (45). We set W_{t_i} = √(t_i) w_i, for i = 1, ..., N, where w_i is a standard normal random variable. We notice that

  w_{i+1} = √(χ_i) w_i + √(1-χ_i) X_i,

where χ_i = t_i/t_{i+1}, and where X_i is standard normal and independent of w_i. In the following, we shall use this decomposition in conjunction with (47). Now, we assume the following linear architecture:

  Y_{i+1} = Σ_{k=0}^{K} g_k He_k(w_{i+1}),      (49)

where K is the cutoff value of the order of the Hermite polynomials. This is simply a truncated expansion of the random variable Y_{i+1} in terms of the orthonormal basis He_k(w_{i+1}). The values of the Fourier coefficients g_k are estimated by means of ordinary least squares regression. Then, as a consequence of the conditioning rule (47),

  E(Y_{i+1} | w_i) = Σ_{k=0}^{K} g_k χ_i^{k/2} He_k(w_i).      (50)

In other words, conditioning Y_{i+1} on w_i is equivalent to multiplying its Fourier coefficients g_k by the factor χ_i^{k/2}. This allows us to calculate the first term on the right hand side of the third equation in (45). In order to calculate the second term, we substitute the explicit formula for Z_n into the generators (43) and repeat the calculations in (49) and (50) with Y_{n+1} replaced by Y_{n+1} + f(X_n, Y_{n+1}, Z_n)δ.
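A compact implementation of this regression step might look as follows (our own sketch; the function and argument names are ours). It regresses the simulated values of Y_{i+1} across paths on the normalized Hermite basis in w_{i+1}, applies the conditioning rule (47) to the fitted coefficients, and evaluates the conditioned expansion at w_i:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermevander

def herm_basis(w, K):
    """Normalized probabilists' Hermite polynomials He_k(w)/sqrt(k!)."""
    norm = np.array([1.0 / np.sqrt(factorial(k)) for k in range(K + 1)])
    return hermevander(w, K) * norm           # shape (n_paths, K + 1)

def conditional_expectation(Y_next, w_next, w_curr, chi, K=4):
    """Longstaff-Schwartz estimate of E_n(Y_{i+1}) on each path.

    Y_next: Y_{i+1} across paths; w_next, w_curr: the standard normal
    variates w_{i+1} and w_i; chi = t_i / t_{i+1}.
    """
    # Fourier coefficients g_k of (49), by ordinary least squares
    g, *_ = np.linalg.lstsq(herm_basis(w_next, K), Y_next, rcond=None)
    # Conditioning rule (47): scale the k-th coefficient by chi^(k/2)
    g_cond = g * chi ** (0.5 * np.arange(K + 1))
    return herm_basis(w_curr, K) @ g_cond     # expansion (50) at w_i
```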
For the numerical experiments within the framework developed above, we assume a time horizon of 1 year (365 days), and choose the SIR model parameters as follows:

  β = 38.0, γ = 11.5, S_0 = 0.999, I_0 = 0.001.

These parameters are purely hypothetical, and they do not arise from calibration to any actually reported data. The corresponding value of the basic reproduction ratio, R_0 = S_0β/γ = 3.30, indicates a highly infectious disease such as COVID-19. We choose the diffusion parameter

  σ = 3.1.

A typical scenario generated by this model is graphed in Figure 1.

For solving the FBSDE system (34) we generate 2,000 scenarios (Monte Carlo paths) using the low variance spectral decomposition method. Each scenario is based on daily sampling, i.e. N = 365. For calculating the conditional expectations in the Longstaff-Schwartz regression we use the cutoff value K = 4. The expectation behind this choice is that we obtain accuracy at the four sigma level.

We attempt to construct a numerical solution to (34) through the following iterative procedure. We start by generating initial Monte Carlo paths X^{(0)}_t using the (discretized version of the) raw process (6). For the initial guess of Y_t we take Y^{(0)}_t = ∇G(X^{(0)}_T), for all 0 ≤ t ≤ T; notice that this is not a solution to the backward equations in (34), it merely satisfies the terminal condition. After this, we iterate for k = 0, 1, ..., until the stopping criterion is satisfied or the maximum number of iterations is reached. For the stopping criterion we choose the condition that the average L²-norm change of a Monte Carlo path falls below a tolerance threshold of 10⁻⁸.

We make various choices of the coefficients L, M, N defining the running cost functions. Depending on the values of these coefficients, the iterative process described above either converges to a meaningful solution or it diverges. At this point, it is unclear which choices of the coefficients lead to which outcomes, but it appears that there are well defined basins of convergence in the space of the parameters.

We first consider a high cost policy. Figures 2-5 show the graphs of the solutions for a corresponding choice of the parameters of the quadratic polynomial in the running cost function c_2(u_2, x_2). Unlike Figure 1, the curves in all graphs below show the averages of the corresponding quantities over the 2,000 Monte Carlo paths. Under this running cost function, the optimal policy is to implement a draconian isolation regime, which leads to a rapid drop in infections, while keeping the susceptible fraction of the population at a very high level.

On the other hand, Figures 6-9 show the plots of the solutions for a low cost policy, with correspondingly lower parameters of the quadratic polynomial in the running cost function c_2(u_2, x_2). Under this running cost function, the optimal policy is a moderate isolation regime: the isolation rate is initially high, while the infections are low, and it then declines over time as the epidemic develops. Unlike the policy above, this leads to a gradual decline in both the infected and susceptible fractions of the population.

Consider now the case of an optimal vaccination strategy. Again, we make two choices of the running cost function: high cost and low cost. The results of Monte Carlo simulations for the high cost function are plotted in Figures 10-13. They parallel the results presented above in the case of high cost isolation mitigation: the optimal policy is a massive vaccination campaign that dramatically reduces the susceptible fraction of the population and leads to significantly lower infections.
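For completeness, the iterative procedure used in the experiments above can be organized as a simple Picard-type loop. The sketch below is our own rendering of the algorithm as described in the text; the helper signatures (simulate_forward, backward_sweep, terminal_Y) are assumptions, not the paper's interface.

```python
import numpy as np

def solve_fbsde(simulate_forward, backward_sweep, terminal_Y,
                max_iter=50, tol=1e-8):
    """Picard-type iteration for the FBSDE system (34).

    simulate_forward(Y) -> (X, dW): forward Monte Carlo paths; with
        Y=None it simulates the raw process (6), otherwise it applies
        the control u* implied by Y via (32).
    backward_sweep(X, dW) -> Y: the backward scheme (45) along X, with
        conditional expectations computed by the Hermite regression.
    terminal_Y(X) -> Y: initial guess satisfying only the terminal
        condition Y_T = grad G(X_T).
    """
    X, dW = simulate_forward(None)       # initial paths: raw process (6)
    Y = terminal_Y(X)
    for _ in range(max_iter):
        X_new, dW = simulate_forward(Y)
        # stopping criterion: average L2-norm change of a Monte Carlo path
        change = np.sqrt(((X_new - X) ** 2).sum(axis=0)).mean()
        X = X_new
        Y = backward_sweep(X, dW)
        if change < tol:
            break
    return X, Y
```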
References

[1] Population biology of infectious diseases: Part I
[2] Dynamic Programming
[3] Time-optimal control strategies in SIR epidemic models
[4] Mathematical Models in Epidemiology
[5] Stochastic epidemic models: A survey
[6] Deterministic and stochastic models for recurrent epidemics
[7] Discrete Time Approximation and Monte-Carlo Simulation of Backward Stochastic Differential Equations, Stochastic Processes and their Applications
[8] A Scandal in Bohemia (1891), included in The Adventures of Sherlock Holmes
[9] Backward stochastic differential equations in finance
[10] Controlled Markov processes and viscosity solutions
[11] A stochastic differential equation SIS epidemic model
[12] Geometric numerical integration illustrated by the Störmer-Verlet method
[13] Optimal control of epidemics with limited resources
[14] Solving high-dimensional partial differential equations using deep learning
[15] A contribution to the mathematical theory of epidemics
[16] Stochastic Stability of Differential Equations
[17] Simulating Hamiltonian Dynamics
[18] Options on infectious diseases
[19] Managing counterparty credit risk via backward stochastic differential equations
[20] Valuing American Options by Simulation: A Simple Least-Squares Approach
[21] Continuous-time Stochastic Control and Optimization with Financial Applications
[22] The Mathematical Theory of Optimal Processes
[23] Stochastic Controls: Hamiltonian Systems and HJB Equations