key: cord-0511064-4giw3s5k
authors: Vijayan, Sushant
title: Differential Games in the spread of Covid-19
date: 2021-09-27
journal: nan
DOI: nan
sha: a2ea19ea74585b4908338db8b5f9b99b4d33d2fa
doc_id: 511064
cord_uid: 4giw3s5k

Given the ongoing Covid-19 pandemic, it is of interest to understand how infections spread as the combined result of measures taken by central planners (governments) and individual behavior. In this work, the spread of Covid-19 is modelled as a differential game between the planner and the population, with appropriate disease spread dynamical equations. We first characterise the equilibrium dynamics of the population alone with modified Susceptible-Infected-Recovered (SIR) equations to highlight the qualitative nature of the equilibrium. Using this result, we formulate the joint equilibrium exposure profile between the planner and the population. Additionally, as in the case of Covid-19, asymptomatic carriers and inadequacies in testing, contact tracing and quarantining can lead to a significant underestimate of the true infected numbers relative to the detected numbers. Therefore, it is vital to model the true infected numbers within the context of choices made by individuals within the population. To incorporate this, we extend our framework by modifying the dynamics to include additional sub-compartments of 'undetected infected' and 'detected infected'. The individuals make their own estimates of the total infected from the detected numbers and base their strategies on those estimates. We show that these considerations lead to a retarded optimal control problem for the players. We present simulation results based on this analysis to demonstrate how population behavior, planner control, detection rates and trust in the reported numbers play a key role in how the disease spreads.

Infectious diseases spread because of interactions between the infected and the susceptible. At an individual level, a simple strategy to reduce the possibility of transmission is to voluntarily reduce one's interaction with others, i.e., to practise social distancing. A central planner aims to impose constraints on individual behavior so as to maximise the total societal welfare. To model the disease spread effectively it is important to combine the choices of both the planner and the individuals in a unified way. The main goal of this work is to formulate a game theoretic framework in which one can analyse and characterise the resulting equilibrium between the individuals and the planner. A secondary goal is to modify the disease spread dynamics so as to account for the spread of disease by infected individuals who are not detected and isolated from the susceptibles. This also requires incorporating the individual's estimate of infection into the disease spread model. Using this, we present some simulations to qualitatively demonstrate the impact of planner control, population choices, detection rates and trust in the detected numbers on the spread of the disease.

In mathematical epidemiology the spread of diseases is often modelled through various compartmental models. The simplest of them is the SIR model [1]. A considerable literature has been built up to include many extensions and variations of this basic model (see for example [2], [3], [4] and references therein). An issue with these models is that they capture neither government interventions nor individual choices. These decisions can have a significant impact on the disease spread trajectory.

Prior to the outbreak of Covid-19, applications of optimal control in the field of epidemiology included masking rates to prevent swine flu [5], treatment rates in dengue transmission [6], etc. Since the Covid-19 pandemic began there have been a considerable number of works which formulate optimal Non-Pharmaceutical Intervention (NPI) as a control problem. [7] proposed a lockdown that tapers down gradually. [8] has a multi-group SIR model in which the authors look at the optimal control problem for a social planner who can impose age-specific targeted lockdowns. They show that the optimal solution is to enforce stringent lockdowns for the older section of the population. [9] shows that intermittent lockdowns may be better than the moderate measures suggested above for a class of utility functions, particularly in low sero-prevalence scenarios. In a line of work closely related to the current work, [10] and [11] combine game theoretic equilibrium analysis based on utility considerations of the individuals with the SIR model. It is clear from the disease trajectories of many countries that it is not sufficient to study a control problem for a planner or the equilibrium strategies of the population separately. None of these models studies the combined interaction between individual choices and governmental policies. We find that the planner can try to take advantage of the social distancing tendencies of individuals to control the spread of diseases. The above models also do not incorporate how perceptions of the extent of disease spread affect the further spread of the disease.

We assume that the game is played till a finite horizon time T . This T can be interpreted as the idealized vaccine arrival time wherein the vaccine affects the entire population instantly and puts an end to the disease.

The planner tries to control the spread of disease by imposing constraints on individual exposure choices. The individual tries to modify their behavior by either reducing or increasing their exposure fraction at any time t while complying with the constraints imposed by the planner.

A strategy $A \in C[0, T]$ of the planner is such that $A_t \in [0, 1]$, $\forall t \in [0, T]$. $A_t$ sets the maximum permissible exposure of an individual at time $t$. It represents the restrictions imposed by the planner on the individual's exposure profiles in the form of lockdowns, closing down schools, restricting public transport and other NPIs. $A_t$ can vary within $[0, 1]$ but, as will be argued later, a natural upper bound for $A_t$ to be binding is the so-called 'population equilibrium' formed by individuals amongst themselves. In a realistic setting it is unacceptable (due to public disapproval of harsh lockdowns) for the planner to keep the threshold extremely low for extended periods of time. To capture this we extend the model so that certain average threshold limits are imposed on the planner.

An individual is assumed to respond to the spread of the disease by modifying their exposure to other individuals while complying with the threshold $A_t$ imposed by the planner. To keep the analysis simple, each individual is considered indistinguishable from another (one could consider the more realistic setting of several distinct groups) and is assumed to symmetrically employ an exposure strategy (or individual control or exposure profile) $g \in C[0, T]$ with $g_t \in [0, A_t]$, $\forall t \in [0, T]$. $g_t$ represents the reduced exposure relative to a normal baseline of unity prior to the onset of the disease. For the purposes of calculating the equilibrium we shall, at times, also consider the strategy $g^\alpha$ of a canonical individual α differing from the symmetric strategy $g$ employed by the rest of the population.

In the simplest setting we assume that infected individuals are not isolated from the rest of the population and know the total number of infected at any given time. Later, we remove these restrictions by modelling the detecting of infection by introducing new infection compartments of 'undetected infected' and 'detected infected'.

The evolution of the susceptible fraction $S_t \in [0, 1]$ and the infected fraction $I_t \in [0, 1]$ is given by:

$$\frac{dS_t}{dt} = -\beta g_t^2 S_t I_t, \qquad \frac{dI_t}{dt} = \beta g_t^2 S_t I_t - \gamma I_t, \tag{1}$$

with the initial conditions $S_0 = 1 - \epsilon$, $I_0 = \epsilon$. Here $\epsilon$ is the initial infected fraction of the population, $\beta$ is the probability of getting infected per interaction with an infected individual and $\gamma$ is the recovery rate from infection. If the entire population plays a uniform exposure fraction $g_t$, the effective susceptible and infected fractions are $g_t S_t$ and $g_t I_t$ respectively, and the total number of interactions between the susceptibles and the infected is $g_t^2 S_t I_t$, hence the quadratic dependence on $g_t$. We note that if all individuals played $g_t = 1$, $\forall t \in [0, T]$, then we get the standard SIR model. A canonical individual α will get infected at an unknown random time $\tau_\alpha$. We assume that each individual has an estimate of their own survival probability, which they obtain through a hazard rate model

$$\frac{d}{dt} P(\tau_\alpha > t) = -\beta g^\alpha_t g_t I_t \, P(\tau_\alpha > t), \qquad P(\tau_\alpha > 0) = 1. \tag{2}$$

Here $g^\alpha_t$ is the exposure fraction strategy played by the individual α and $P(\tau_\alpha > t)$ is the survival probability at time $t$. This estimate of the infection/survival probability and the trade-off between the benefits and risks of exposure will drive the individual's exposure strategy. To model the benefits and risks for the players we next define the cost functionals which they each will seek to minimise.
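As a rough illustration of the exposure-modified dynamics and the hazard-rate survival estimate described above, the following sketch integrates the three quantities $S_t$, $I_t$ and $P(\tau_\alpha > t)$ with a forward-Euler scheme. The constant exposure levels, the Euler scheme, the function name and the parameter values are all assumptions made for illustration; they are not the paper's solver or its Table I values.

```python
def simulate_exposure_sir(beta, gamma, eps, g, g_alpha, T, dt=0.01):
    """Forward-Euler integration of the exposure-modified SIR system.

    dS/dt = -beta * g**2 * S * I
    dI/dt =  beta * g**2 * S * I - gamma * I
    dP/dt = -beta * g_alpha * g * I * P   (hazard-rate survival model)

    g is the (constant) population exposure and g_alpha the (constant)
    exposure of one canonical individual; constant controls are an
    illustrative assumption. Returns a list of (t, S, I, P) tuples.
    """
    S, I, P = 1.0 - eps, eps, 1.0
    path = [(0.0, S, I, P)]
    for k in range(int(round(T / dt))):
        new_infections = beta * g * g * S * I
        hazard = beta * g_alpha * g * I
        S, I, P = (S - dt * new_infections,
                   I + dt * (new_infections - gamma * I),
                   P - dt * hazard * P)
        path.append(((k + 1) * dt, S, I, P))
    return path


# Illustrative parameter values only (not taken from the paper).
no_distancing = simulate_exposure_sir(0.3, 0.1, 1e-3, g=1.0, g_alpha=1.0, T=200.0)
half_exposure = simulate_exposure_sir(0.3, 0.1, 1e-3, g=0.5, g_alpha=0.5, T=200.0)
print("final susceptible fraction, g = 1.0:", round(no_distancing[-1][1], 4))
print("final susceptible fraction, g = 0.5:", round(half_exposure[-1][1], 4))
```

When the individual plays the same exposure as the population, the survival probability and the susceptible fraction decay by the same factor at every step, so $P(\tau_\alpha > t) = S_t / S_0$ in this sketch.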

For simplicity, we assume α gets a linear rate of benefit $B$ per unit time from interacting with other individuals in the society. We also assume that upon contracting the disease, the individual suffers a one-time cost $C$. If α survives till $T$, there is a reward $R$ for surviving. Thus, α minimises the following cost functional $J_\alpha$ (in what follows $1_{\{E\}}$ is the characteristic function of a set $E$):

$$J_\alpha(g, g^\alpha) = E\left[ -\int_0^{\tau_\alpha \wedge T} B g^\alpha_t \, dt + C \, 1_{\{\tau_\alpha \le T\}} - R \, 1_{\{\tau_\alpha > T\}} \right].$$

The expectation above is with respect to α's survival probability defined in (2). An equivalent formulation of the functional, better suited to the optimal control methods we seek to apply, is

$$J_\alpha(g, g^\alpha) = \int_0^T P(\tau_\alpha > t) \, g^\alpha_t \left( -B + C \beta g_t I_t \right) dt - R \, P(\tau_\alpha > T).$$

In the objective functionals above, we impose the restriction $g^\alpha_t, g_t \le A_t$. From an exposure strategy $g_t$ in the interval $[t, t + \Delta t]$, an individual incurs a cost of $-B g_t \Delta t$ (i.e., a benefit) and an expected cost of infection $C \beta g_t^2 I_t \Delta t$. Thus, a measure of the cost borne by the entire society during this interval is $(-B + C \beta g_t I_t) g_t S_t \Delta t$. Similarly, the total societal reward for surviving is $R S_T$. The above discussion motivates the planner's functional to be

$$J_P(g, A) = \int_0^T \left( -B + C \beta g_t I_t \right) g_t S_t \, dt - R \, S_T.$$

We shall first describe the result when there is no planner control ($A_t = 1$, $\forall t$). The result highlights the qualitative nature of the equilibrium among the individuals of the population alone. It will also serve as a natural constraint on $A_t$ when we consider the more involved case with planner control. This allows one to take a symmetric view of strategies: the choices of the planner constrain the choices of the individual and vice versa.
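To make the planner's objective concrete, the sketch below evaluates the societal cost described earlier, i.e. a running cost of $(-B + C\beta g_t I_t) g_t S_t$ per unit time and a terminal reward $R S_T$, along a trajectory with a constant exposure level. The constant exposure, the left Riemann sum, the Euler integration and the numerical values are illustrative assumptions rather than anything specified in the text.

```python
def planner_cost(beta, gamma, eps, B, C, R, g, T, dt=0.01):
    """Approximate J_P = int_0^T (-B + C*beta*g*I) * g * S dt - R * S_T.

    The trajectory (S, I) follows the exposure-modified SIR dynamics with a
    constant exposure g (an assumption made only for this illustration).
    """
    S, I = 1.0 - eps, eps
    running = 0.0
    for _ in range(int(round(T / dt))):
        # Left Riemann sum of the societal running cost.
        running += dt * (-B + C * beta * g * I) * g * S
        new_infections = beta * g * g * S * I
        S, I = S - dt * new_infections, I + dt * (new_infections - gamma * I)
    return running - R * S


# Illustrative values: compare a few constant exposure levels.
for g in (0.25, 0.5, 1.0):
    cost = planner_cost(beta=0.3, gamma=0.1, eps=1e-3,
                        B=1.0, C=100.0, R=50.0, g=g, T=200.0)
    print("g =", g, "-> planner cost", round(cost, 3))
```

With $g = 0$ there are no interactions, so the running cost vanishes and the functional reduces to $-R S_0$, which is a convenient sanity check.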

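We shall assume that there is no control from the planner, i.e., $A_t = 1$ for all $t \in [0, T]$, so that the game is played only between the individuals of the population. An equilibrium result similar to the one in this subsection can be found in [10]. To derive the equilibrium exposure strategy we use the Pontryagin Minimum Principle (PMP) (see section 3.3 in [12]). Let α play the strategy profile $g^\alpha_t$ and let the rest of the population play $g_{eq,t}$. Then in equilibrium we have

$$J_\alpha(g_{eq}, g^\alpha_{eq}) \le J_\alpha(g_{eq}, g^\alpha) \ \ \forall g^\alpha, \qquad g^\alpha_{eq,t} = g_{eq,t}. \tag{4}$$

The first condition is the definition of a (Nash) equilibrium, while the second follows from the symmetry assumption on the individuals' exposure profiles.

Theorem 1: For the dynamical game without a central planner the equilibrium exposure profile must be of the form:

$$g^\alpha_{eq,t} = g_{eq,t} = \min\left\{ \frac{B}{\beta I_t (C - \lambda_t)}, 1 \right\} 1_{\{C > \lambda_t\}} + 1_{\{C \le \lambda_t\}}. \tag{5}$$

The corresponding dynamics are governed by the equations:

$$\frac{dS_t}{dt} = -\beta g_{eq,t}^2 S_t I_t, \qquad \frac{dI_t}{dt} = \beta g_{eq,t}^2 S_t I_t - \gamma I_t, \qquad \frac{d\lambda_t}{dt} = g_{eq,t} \left( B - \beta g_{eq,t} I_t (C - \lambda_t) \right),$$

with boundary conditions:

$$S_0 = 1 - \epsilon, \qquad I_0 = \epsilon, \qquad \lambda_T = -R.$$

Proof: The Hamiltonian (the dynamical system is (1)-(2) and (3) is the cost functional) for the individual α's minimisation problem when all the other individuals play the profile $g_{eq,t}$ is

$$H(S_t, I_t, P_t, g^\alpha_t, \lambda_t, \mu_t, \nu_t) := P_t g^\alpha_t (-B + C \beta g_{eq,t} I_t) + \lambda_t (-\beta g^\alpha_t g_{eq,t} I_t P_t) + \mu_t (-\beta g_{eq,t}^2 S_t I_t) + \nu_t (\beta g_{eq,t}^2 S_t I_t - \gamma I_t),$$

where $\lambda_t, \mu_t, \nu_t$ are the adjoint functions. We know that the optimal control $g^\alpha_t$ minimises the Hamiltonian, and combining this with (4) gives

$$g^\alpha_{eq,t} = \arg\min_{g^\alpha_t \in [0, 1]} P_t g^\alpha_t \left( -B + \beta g_{eq,t} I_t (C - \lambda_t) \right)$$

$$\implies g^\alpha_{eq,t} = \min\left\{ \frac{B}{\beta I_t (C - \lambda_t)}, 1 \right\} 1_{\{C > \lambda_t\}} + 1_{\{C \le \lambda_t\}}.$$

The only adjoint variable in the expression above is $\lambda_t$, and from PMP its dynamical equation is

$$\frac{d\lambda_t}{dt} = -\frac{\partial H}{\partial P_t} = g_{eq,t} \left( B - \beta g_{eq,t} I_t (C - \lambda_t) \right),$$

with the additional boundary condition for this equation supplied by the transversality condition of PMP, i.e., $\lambda_T = -R$.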

From standard results in optimal control (see [15]) we can associate the adjoint variable with the partial derivative of the value function with respect to the state variables. Here, $\lambda_t$ is the partial derivative of the value function with respect to $P_t$. This leads us to conclude (see Appendix A for proof) that

Thus $\lambda_t$ can be interpreted as the future expected cost given that the individual has survived till time $t$. With this interpretation, the equilibrium profile in Theorem 1 implies that whenever either the total number of infections or the future expected cost is high, the population starts social distancing until the number of infections decreases.

Lemma 2: $\lambda_t$ is a non-decreasing function of $t$. More precisely, $\lambda_t$ is constant whenever $g_{eq,t} < 1$ and strictly increasing whenever $g_{eq,t} = 1$.

Proof: This is easily verified by looking at the expressions for $g_{eq,t}$ and $\frac{d\lambda_t}{dt}$ from Theorem 1. Intuitively, this monotonicity holds because over any arbitrarily small interval the equilibrium strategy must accrue more benefit than a strategy of complete social distancing, i.e., $g_t = 0$, within the same time interval.
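The switching behaviour behind Lemma 2 can be checked numerically with a short sketch. It assumes the equilibrium profile $g_{eq,t} = \min\{B / (\beta I_t (C - \lambda_t)), 1\}$ for $C > \lambda_t$ (and $g_{eq,t} = 1$ otherwise), and it assumes the adjoint rate $d\lambda_t/dt = g_{eq,t}(B - \beta g_{eq,t} I_t (C - \lambda_t))$, which is what differentiating the individual's Hamiltonian with respect to $P_t$ would give. The function names and the numerical values are hypothetical.

```python
def equilibrium_exposure(I, lam, B, C, beta):
    """Population equilibrium exposure (assumed form of the Theorem 1 profile)."""
    if C <= lam or I <= 0.0:
        # No perceived net infection cost, or no infection: normal behaviour.
        return 1.0
    return min(B / (beta * I * (C - lam)), 1.0)


def adjoint_rate(I, lam, B, C, beta):
    """Assumed adjoint dynamics d(lambda)/dt evaluated at the equilibrium exposure."""
    g = equilibrium_exposure(I, lam, B, C, beta)
    return g * (B - beta * g * I * (C - lam))


# Illustrative values: a high infection level triggers social distancing
# (g < 1, lambda stays constant), a low one does not (g = 1, lambda increases).
for I in (0.1, 0.001):
    g = equilibrium_exposure(I, lam=-50.0, B=1.0, C=100.0, beta=0.4)
    rate = adjoint_rate(I, lam=-50.0, B=1.0, C=100.0, beta=0.4)
    print("I =", I, "exposure =", round(g, 4), "d(lambda)/dt =", round(rate, 6))
```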

Lemma 2 indicates that in equilibrium the dynamics is a hybrid one with the value of g eq,t triggering the switch between the states of social distancing and normal behavior. When g eq,t = 1 we have normal behavior with no social distancing and the dynamics of S t , I t is just the same as the SIR model. We characterise the dynamics in social distancing regime (g eq,t < 1) with the following lemma:

Lemma 3: In the social distancing regime we have the following relation between S t and I t

where L n (x) and K n (x) denote the n th order modified Bessel functions of the first and the second kind, respectively.

. C 1 is determined by the initial values S 0 , I 0 , λ 0 at the onset of social distancing.

Proof: In the social distancing regime, Theorem 1 gives $g_{eq,t} = \frac{B}{\beta I_t (C - \lambda_t)}$ with $\lambda_t$ constant. Substituting this into the dynamics of $S_t$ and $I_t$ and dividing the two equations, we have:

$$\frac{dI_t}{dS_t} = -1 + \frac{\gamma \beta (C - \lambda_t)^2}{B^2} \, \frac{I_t^2}{S_t}.$$

This is a Riccati equation and the standard reduction of a Riccati equation to a linear second order ODE (see [16]) gives the result. We shall use the results in this subsection to characterise the more complicated equilibrium that exists between the planner and the population of individuals.

In this case the planner is trying to optimally set a threshold to minimise its own cost functional. The constraint on the planner is that if it sets a high enough threshold, the individuals' behavior may follow the pure population equilibrium from the previous subsection (and hence the threshold $A_t$ becomes non-binding). We have the constraints $0 \le g_t \le A_t$ and $0 \le A_t \le g^{pop}_{eq,t}$, where $g^{pop}_{eq,t}$ refers to the pure population equilibrium described in (5). In this case at equilibrium we must have

$$J_\alpha(g_{eq}, g^\alpha_{eq}, A_{eq}) \le J_\alpha(g_{eq}, g^\alpha, A_{eq}), \qquad J_P(g_{eq}, A_{eq}) \le J_P(g_{eq}, A), \qquad g^\alpha_{eq,t} = g_{eq,t}. \tag{7}$$

As the set of admissible controls for the players varies with both time and state, we use a generalised version of PMP (Theorem 3.1 in [13]). We have the following result characterising the equilibrium between population and planner:

Theorem 4: The population-central planner game has the following equilibrium profile:

$$g_{eq,t} = A_{eq,t} = \min\left\{ \frac{B}{2 \beta I_t (C + \lambda_{2,t} - \lambda_{1,t})}, \ g^{pop}_{eq,t} \right\} 1_{\{C + \lambda_{2,t} > \lambda_{1,t}\}} + g^{pop}_{eq,t} \, 1_{\{C + \lambda_{2,t} \le \lambda_{1,t}\}},$$

with $g^{pop}_{eq,t}$ as defined in (5) of Theorem 1. The corresponding dynamics is given by:

with boundary conditions:

The individual is trying to minimise J α given the strategies g eq,t and A eq,t . Assuming A eq,t to be a given function of time we can parametrize the admissible set of controls as Q 1 ≤ 0 where:

This has a non-zero derivative with respect to the control and hence we can apply Theorem 3.1 from [13] for the individual's control problem. The difference from Theorem 1 is that the controls are restricted dynamically. The Hamiltonian has to be minimised only within this dynamically changing feasible control set. The Hamiltonian in this case is

$$H(S_t, I_t, P_t, g^\alpha_t, \lambda_t, \kappa_t, \iota_t, \mu_t) := P_t g^\alpha_t (-B + C \beta g_{eq,t} I_t) + \lambda_t (-\beta g^\alpha_t g_{eq,t} I_t P_t) + \kappa_t (-\beta g_{eq,t}^2 S_t I_t) + \iota_t (\beta g_{eq,t}^2 S_t I_t - \gamma I_t) + \mu_t (g^\alpha_t - A_{eq,t}),$$

where $\lambda_t$, $\kappa_t$, $\iota_t$ and $\mu_t$ are adjoint variables. Minimising the Hamiltonian with (7) gives

$$g^\alpha_{eq,t} = \arg\min_{g^\alpha_t \in [0, A_{eq,t}]} P_t g^\alpha_t \left( -B + C \beta g_{eq,t} I_t - \lambda_t \beta g_{eq,t} I_t \right)$$

$$\implies g^\alpha_{eq,t} = g_{eq,t} = \min\left\{ \frac{B}{\beta I_t (C - \lambda_t)}, \ A_{eq,t} \right\} 1_{\{C > \lambda_t\}} + A_{eq,t} \, 1_{\{C \le \lambda_t\}}.$$

In the first step above µ t doesn't appear because of complementarity condition µ t (g α t − A eq,t ) = 0. The exposure profile in this case is the population equilibrium with upper threshold now set to A eq,t rather than 1.

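For the planner's threshold to be binding, it must be set no higher than the population equilibrium profile (see (5)):

$$g^{pop}_{eq,t} = \min\left\{ \frac{B}{\beta I_t (C - \lambda_t)}, 1 \right\} 1_{\{C > \lambda_t\}} + 1_{\{C \le \lambda_t\}}.$$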

Hence, for the planner, the state variables are S t , I t , λ t . The planner minimises J P given the population strategy g eq,t . The dynamical equations relevant to the planner are:

The set of admissible controls for the planner can be summarised by Q 2 ≤ 0 where

Thus we can again invoke Theorem 3.1 from [13] for the planner's control problem. The Hamiltonian is given by

Minimising the Hamiltonian along with (7) gives A eq,t = arg min

The adjoint variables for the planner are denoted by λ 1,t , λ 2,t , λ 3,t . The adjoint equation of λ 3,t becomes: λ 3,t = −βA 2 t S t I t with the boundary condition λ 3,T = 0. But as S t , I t are positive, the only way this boundary condition can be satisfied is when λ 3,t = 0, ∀t. Using this in the minimum principle we get:

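$$A_{eq,t} = \arg\min_{A_t \in [0, \, g^{pop}_{eq,t}]} \left[ -B A_t S_t + \beta A_t^2 S_t I_t (C + \lambda_{2,t} - \lambda_{1,t}) \right]$$

$$\implies A_{eq,t} = \min\left\{ \frac{B}{2 \beta I_t (C + \lambda_{2,t} - \lambda_{1,t})}, \ g^{pop}_{eq,t} \right\} 1_{\{C + \lambda_{2,t} > \lambda_{1,t}\}} + g^{pop}_{eq,t} \, 1_{\{C + \lambda_{2,t} \le \lambda_{1,t}\}}.$$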

The adjoint equations become:

It can be easily seen that $A_{eq,t} = g_{eq,t}$ and hence the planner's threshold is always binding. The exposure profile of the population is thus the net result of the strategic choices of both the planner and the population. Additionally, we impose an average threshold constraint on the planner, i.e.,

$$\int_0^T A_t \, dt > C_1.$$

This is to prevent the planner from accessing strategies which entail harsh thresholds over an extended period. Constraints of this type are called "isoperimetric constraints" and are handled in a standard way in the optimal control literature (see [14]).
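As a small illustration of the average threshold constraint, the sketch below approximates the integral of a sampled threshold profile with the trapezoidal rule and checks it against the bound $C_1$. The uniform time grid, the function names and the sample profile are assumptions for illustration; how the constraint enters the optimisation itself (via an extra multiplier, as in [14]) is not shown.

```python
def integrated_threshold(A, dt):
    """Trapezoidal approximation of int_0^T A_t dt for samples A[k] = A(k * dt)."""
    if len(A) < 2:
        raise ValueError("need at least two samples")
    return dt * (sum(A) - 0.5 * (A[0] + A[-1]))


def satisfies_threshold_budget(A, dt, C1):
    """True when the sampled threshold profile meets int_0^T A_t dt > C1."""
    return integrated_threshold(A, dt) > C1


# Hypothetical profile: a strict threshold of 0.2 for 30 days, then 0.8 for 70 days.
profile = [0.2] * 30 + [0.8] * 71
print("integral of A_t:", integrated_threshold(profile, dt=1.0))
print("meets C1 = 50:", satisfies_threshold_budget(profile, dt=1.0, C1=50.0))
```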

As mentioned earlier, it is important to model the group of undetected infectious individuals who spread the disease. This framework also allows us to model the estimates of infection spread made by individuals and the planner. In this section we partition the infected group $I_t$ into two subgroups: $I_{u,t}$, the undetected group of infected, and $I_{d,t}$, the detected group of infected. We have:

We assume that once the infected are detected they are effectively quarantined and no longer infect the susceptibles. Hence, an infected individual either recovers without being detected or gets quarantined after detection. An infected individual is modelled to remain infectious for a period of $1/\gamma$ and has a probability of being detected in this period.

For an individual α, conditioned on the event $\tau_\alpha = t$, we assume a probability density of detection over the period $(t, t + 1/\gamma]$. $\tau_d$ denotes the random time of detection once α is infected. Thus $\tau_d \in [0, 1/\gamma]$. For simplicity, we shall assume that the probability of detection is uniform over $[0, 1/\gamma]$. Thus, the individual's objective functional becomes:

We can re-write the first term as:

We have ∀t ∈ [0, T ]:

Assuming that $\tau_\alpha$ has a density $f$ and the uniform conditional density for $\tau_d$ is $\gamma/\eta$, we rewrite the second term in the RHS of (9) as:

Here $1/\eta$ (with $\eta > 1$) captures the probability of detection and is a parameter in the model. Setting $M_t := \gamma \int_{t - 1/\gamma}^{t} P(\tau_\alpha > r) \, dr$, we rewrite the individual's objective functional as (superscript d stands for detected):

The individual α has knowledge only of $I_{d,t}$ and makes an estimate of $I_{u,t}$ from $I_{d,t}$. For simplicity, we assume that the estimate has the form:

κ encapsulates the trust the population has in the reported detected numbers. Now, as $P(\tau_\alpha > t)$ is linked to the individual's perception of infection, we must modify (2) to:

. The state equations for the individual are:

New infections are caused by the interaction between the susceptibles and the undetected infected. These new infections are initially always assumed to be undetected. Then, some of the undetected infected move to $I_{d,t}$ according to the detection density $\gamma/\eta$. The control formulation now has constant delays in the state variable $P_t$ ($:= P(\tau_\alpha > t)$) in both the objective functional and the state equations. Control problems of this type are called Retarded Optimal Control Problems (ROCP). We shall use a version of the minimum principle for this ROCP (see theorem 4.2 in [13]). Although one can in principle also include the planner's control in this more elaborate model, the resulting profile is rather messy and unwieldy. This joint equilibrium profile can be derived in a manner analogous to section III-B and is omitted. We shall assume a control on the part of the planner and present only the resulting population equilibrium under this control.
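The delayed quantity $M_t = \gamma \int_{t - 1/\gamma}^{t} P(\tau_\alpha > r) \, dr$ introduced above is a moving average of the survival probability over one infectious period, which is where the constant delay enters. A minimal sketch of how it could be evaluated on a uniform grid is given below. The trapezoidal rule, the convention $P(\tau_\alpha > r) = 1$ for $r < 0$ and the requirement that $1/\gamma$ be a whole number of grid steps are assumptions of this sketch, not statements from the text.

```python
def windowed_survival(P, dt, gamma):
    """M_t = gamma * int_{t - 1/gamma}^{t} P(tau > r) dr on a uniform grid.

    P[k] approximates P(tau > k * dt). Before time 0 nobody has been
    infected, so P is taken to be 1 there (assumption). The window length
    1 / gamma is assumed to be a whole number of grid steps.
    """
    w = int(round(1.0 / (gamma * dt)))  # grid steps in one infectious period
    M = []
    for k in range(len(P)):
        window = [P[j] if j >= 0 else 1.0 for j in range(k - w, k + 1)]
        integral = dt * (sum(window) - 0.5 * (window[0] + window[-1]))
        M.append(gamma * integral)
    return M


# Illustrative survival curve that decays linearly from 1.0.
survival = [1.0 - 0.01 * k for k in range(50)]
M = windowed_survival(survival, dt=0.1, gamma=0.5)
print("M at the last grid point:", round(M[-1], 4))
```

Because the survival probability is non-increasing, the windowed average never falls below the current value $P(\tau_\alpha > t)$.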

Theorem 5: The population game with detection has the following equilibrium profile:

The equilibrium dynamics is given by (12) . Additionally, the equation for the adjoint variable λ t is given by

Proof: We apply theorem 4.2 from [13] to the ROCP of the individual α. Compared to Theorems 1 and 4, the Hamiltonian also incorporates the delayed state variables. Consequently, the adjoint equations and the optimal control depend on these delayed variables. The Hamiltonian is given by:

Minimising the Hamiltonian as a function of g α and using g eq,t = g α eq,t , we get g α eq,t = arg min

which is the profile in (13) . As the exposure profile depends only on λ t , which in turn depends on λ 4,t , we consider differential equations of only these two variables. We can explicitly solve for λ 4,t with the condition λ 4,T = 0. We have

The differential equation for λ t is then given by:

Bg eq,t+ 1

IV. SIMULATION RESULTS

The theorems derived in section III provide a basis for simulating a dynamical system with initial infection. The equilibrium solution to the game leads to solving a system of differential equations with a two-point boundary condition. This is fairly typical in optimal control and is due to the PMP. The equilibrium in the model with detection leads to a two-point boundary value problem in a system of advanced-delay differential equations. We solve this system only approximately, using a cubic extrapolation for the advanced term (see [13], [17] for other numerical examples).
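The text does not spell out the extrapolation scheme, so the following is only one plausible reading: fit the Lagrange cubic through the four most recent grid values of a variable and evaluate it at the advanced time. The function name and the example data are hypothetical.

```python
def cubic_extrapolate(ts, ys, t):
    """Evaluate at t the Lagrange cubic through the four points (ts[i], ys[i]).

    Used here to approximate a state value at an advanced time t > max(ts)
    from already computed values; the choice of a Lagrange cubic through the
    last four grid points is an assumption of this sketch.
    """
    if len(ts) != 4 or len(ys) != 4:
        raise ValueError("exactly four nodes are required for a cubic")
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (t - ts[j]) / (ts[i] - ts[j])
        total += term
    return total


# Example: values known at t = 0, 1, 2, 3; estimate the value at t = 3.5.
times = [0.0, 1.0, 2.0, 3.0]
values = [0.90, 0.85, 0.78, 0.70]
print("extrapolated value at t = 3.5:", round(cubic_extrapolate(times, values, 3.5), 4))
```

Extrapolation of this kind is exact for cubic polynomials but can degrade quickly as the advanced time moves away from the last node, which is one reason the resulting solution is only approximate.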

The boundary value problem was solved using a shooting approach coupled with an initial value differential and delay-differential equation solver. This reduces the problem to solving a nonlinear problem (see [18]) of finding the appropriate initial values for the adjoint variables. The values of the various parameters in the simulation are given in Table I above. The parameters η, the probability of detection, and κ, the trust in detected numbers, are varied to give the various scenarios shown in figures 4, 5 and 6. In figures 1, 2 and 3, we plot the susceptible fraction, exposure and infected fraction, respectively, versus time. SIR (blue) shows an exponential decrease in susceptibles at peak infection with almost no susceptibles remaining at the end. This is the worst case scenario: no social distancing, no detection, no quarantining and no planner control. It has the highest peak infection, whose onset is advanced compared to the other scenarios.
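The shooting idea can be sketched for the simpler population game without detection. The sketch assumes the equilibrium profile $g_{eq,t} = \min\{B/(\beta I_t (C - \lambda_t)), 1\}$ and the adjoint equation $d\lambda_t/dt = g_{eq,t}(B - \beta g_{eq,t} I_t (C - \lambda_t))$ with $\lambda_T = -R$; it uses forward Euler and bisection on $\lambda_0$ instead of the solvers used for the paper's figures, and all parameter values are illustrative. The bracket $[-R - BT, -R]$ relies on $0 \le d\lambda_t/dt \le B$, which holds for these assumed dynamics while $\lambda_t < C$.

```python
def terminal_adjoint(lam0, beta, gamma, eps, B, C, T, dt):
    """Integrate (S, I, lambda) forward from a guessed lambda_0; return lambda_T."""
    S, I, lam = 1.0 - eps, eps, lam0
    for _ in range(int(round(T / dt))):
        if C > lam and I > 0.0:
            g = min(B / (beta * I * (C - lam)), 1.0)  # social distancing if < 1
        else:
            g = 1.0
        new_infections = beta * g * g * S * I
        dlam = g * (B - beta * g * I * (C - lam))     # assumed adjoint dynamics
        S, I, lam = (S - dt * new_infections,
                     I + dt * (new_infections - gamma * I),
                     lam + dt * dlam)
    return lam


def shoot_initial_adjoint(beta, gamma, eps, B, C, R, T, dt=0.1, iters=60):
    """Bisection on lambda_0 so that the terminal condition lambda_T = -R holds."""
    lo, hi = -R - B * T, -R   # residual is <= 0 at lo and >= 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if terminal_adjoint(mid, beta, gamma, eps, B, C, T, dt) + R > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Illustrative parameters only (not the values from Table I).
params = dict(beta=0.4, gamma=0.1, eps=1e-3, B=1.0, C=100.0, T=100.0, dt=0.1)
lam0 = shoot_initial_adjoint(R=50.0, **params)
print("lambda_0 found by shooting:", round(lam0, 4))
print("terminal residual:", terminal_adjoint(lam0, **params) + 50.0)
```

For the model with detection the forward integration would also have to carry the delayed and advanced terms, which this sketch deliberately leaves out.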

The case with population equilibrium (black) shows a

In the case with planner control (red) the planner initially sets a moderate threshold (see figure 2) to control the spread of the disease. This results in a delayed infection peak. As the infection numbers inevitably rise, the population voluntarily reduces exposure below even the planner's threshold. This results in the peak infection becoming plateaued in a manner similar to the population equilibrium. The social distancing and thresholding are gradually reduced as we approach the vaccine arrival (which is assumed to instantly stop the infection). Compared to the population equilibrium case, the peak infection is delayed and the economic impact (as measured by exposure time) is reduced.

The case with high detection (η = 1) and high trust (κ = 1) leads to a significantly lower peak than SIR but, unlike the population or population-planner cases, the peak is not prolonged (though the peak infection itself is higher). There is no social distancing due to the high levels of trust, detection and quarantining. The total susceptible fraction surviving at the end is similar to the population or population-planner case. This seems to be the most preferable case: the peak infection is delayed and not prolonged, and the economic impact is minimal, assuming the higher peak infection can be managed.

In figures 4, 5 and 6, we plot the effects of different detection rates (η) and trust parameters (κ) on the spread of disease. We have already discussed the case with high detection and high trust in the paragraph above. In the case where the detection rates are high (η = 1) but trust (κ = 32) is low (dashed dark green), then we observe that (see figure 5 ) as soon as infected numbers peak the population completely reduces exposure to zero. This completely stops the disease spread. This is due to the low trust in the detected numbers. The population believes the planner is doing a poor job of the detection even though in reality the detection rates are high. This leads to unnecessary loss of exposure benefits.

In the low detection (η = 5) but high trust (κ = 1) (light green) case, the disease spread curve is very close to the SIR situation. This is expected since in this case there is poor detection and yet the population trusts the detected numbers are an accurate measure of disease spread. This leads to the undetected infected comprising the entirety of the infected numbers while the population seeing the low detection numbers chooses not to socially distance. This is similar to the SIR situation where there is no planner control nor any social distancing.

Finally, in the low detection (η = 5) and low trust (κ = 32) (dashed light green) case, just as in the high detection low trust case, the population reduces exposure completely as soon as the infection numbers start to peak. However, compared to the high detection setting, the total number of infected is higher because most of the infected are not detected and help spread the disease.

The various comparisons seem to suggest that, ideally, controlling disease spread requires high detection rates together with transparency in the reported numbers, so that the population has confidence in them. If the detection numbers are low and the planner tries to underplay the poor detection, it could result in the worst possible scenario: an unmitigated disease spread which could stress medical resources at peak infection. In practice, it is often difficult to achieve very high detection rates due to asymptomatic carriers, testing errors etc., and hence detection must be combined with some moderate amount of planner control in the form of lockdowns and social restrictions. The population also has an important role to play by voluntarily reducing exposure and adopting other NPIs (wearing masks, sanitisation, adhering to restrictions etc.). The role of trust/confidence in the reported numbers is also demonstrated: too little confidence can lead to a panic in which societal intermingling stops, causing other adverse effects such as economic collapse, while too much confidence can lead to scenarios where the confidence is unfounded, causing unmitigated spread of the disease.

A unified game theoretic framework incorporating the interventions of a planner, the behavioral choices of individuals, detection rates and trust in reported numbers has been developed. Both the planner and the population begin to favor moderate social distancing when the infection numbers begin to peak. Detection and the subsequent trust in the reported numbers also play a crucial role in the spread of the disease. Too little detection coupled with unfounded confidence can lead to an unmitigated spread of the disease, while too little confidence when detection is reasonably high leads to unnecessary loss of economic activity. Simulation results supporting these conclusions are presented.

The author acknowledges the useful discussions he had with Prof. Sandeep K. Juneja on this topic. This work was supported by the Department of Atomic Energy, Government of India, under project no. RTI4001.

Proof of (6).

In what follows we have set P t := P(τ > t). Define Z t , ∀t ∈ [0, T ] as follows :

Observe that Z 0 is the loss functional J α . For any time t < T we have: Now since time t was arbitrary we can write the same expression for t + ∆t < T with ∆t > 0. Thus:

Bf s ds) + P t Z t =H t+∆t + P t+∆t ( t+∆t 0 Bg s ds)

Solving for Z t we get:

Plugging in the definition of H t and doing some straightforward but tedious algebra we get: This combined with continuity of g t and the fact P(τ > t + ∆t|τ > t) → 1 as ∆t → 0 gives us : This shows that the limit in term A exists and is equal to dZt dt . Hence we have the differential equation:

We also observe from definition of Z t that Z T = 0. This is the same equation as the adjoint variable λ t in Theorem 1 with the relation λ t = −Z t − R.

[1] Containing papers of a mathematical and physical character

[2] Lecture notes in mathematical epidemiology

[3] Modeling epidemics with differential equation

[4] Analysis of an SIR epidemic model with pulse vaccination and distributed time delay

[5] Optimal control problem in preventing of swine flu disease transmission

[6] An optimal control problem arising from a dengue disease transmission model

[7] A Simple Planning Problem for COVID-19 Lockdown

[8] A multi-risk SIR model with optimally targeted lockdown

[9] Strict physical distancing may be more efficient: A mathematical argument for making lockdowns count

[10] Internal and external effects of social distancing in a pandemic

[11] Cambridge Working Papers in Economics 2021

[12] Dynamic optimization and differential games

[13] Optimal control problems with delays in state and control variables subject to mixed control-state constraints

[14] Nonlinear and dynamic optimization: From theory to practice

[15] Calculus of variations and optimal control theory

[16] Ordinary Differential Equations

[17] Numerical solution of a nonlinear advance-delay-differential equation from nerve conduction theory

[18] Numerical Solution of Two Point Boundary Value Problems