key: cord-0721537-4d6xy05o
authors: Sameni, Reza
title: Model-based Prediction and Optimal Control of Pandemics by Non-pharmaceutical Interventions
date: 2021-02-12
journal: IEEE Journal on Selected Topics in Signal Processing
DOI: 10.1109/jstsp.2021.3129118
sha: 514fe94ec5859423771c0a103dd57390f3624909
doc_id: 721537
cord_uid: 4d6xy05o

A model-based signal processing framework is proposed for pandemic trend forecasting and control, by using non-pharmaceutical interventions (NPI) at regional and country levels worldwide. The control objective is to prescribe quantifiable NPI strategies at different levels of stringency, which balance between human factors (such as new cases and death rates) and cost of intervention per region/country. Due to infrastructural disparities and differences in priorities of regions and countries, strategists are given the flexibility to weight between different NPIs and to select the desired balance between the human factor and overall NPI cost. The proposed framework is based on a textit{finite-horizon optimal control} (FHOC) formulation of the bi-objective problem and the FHOC is numerically solved by using an ad hoc textit{extended Kalman filtering/smoothing} framework for optimal NPI estimation and pandemic trend forecasting. The algorithm enables strategists to select the desired balance between the human factor and NPI cost with a set of weights and parameters. The parameters of the model are partially selected by epidemiological facts from COVID-19 studies, and partially trained by using machine learning techniques. The developed algorithm is applied on ground truth data from the Oxford COVID-19 Government Response Tracker project, which has categorized and quantified the regional responses to the pandemic for more than 300 countries and regions worldwide, since January 2020. The dataset was used for NPI-based prediction and prescription during the XPRIZE Pandemic Response Challenge.

The COVID-19 pandemic highlighted the fact that social life-as a dynamic system-is always in a metastable condition, which is continuously prone to pandemic outbreaks (regardless of the severity or geographical origin of pandemics). Parallel to medical solutions and vaccinations against known viruses, the rapid and effective response to future pandemics requires proactive planning and interdisciplinary collaborations between different scientific communities. Specifically, some of the prominent contributions, which can be made by the signal processing and data science communities include: 1) developing accurate spatio-temporal forecasting models at different levels of abstraction, which Manuscript Copyright (c) 2021 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org. simulate pandemic outbreaks and trends, 2) identifying quantifiable non-pharmaceutical intervention (NPI) plans, with fact-based estimates of the impact and cost of each NPI [1] , and 3) simulated multi-objective pandemic response strategies, which balance between NPI cost and effectiveness, to help governments and policymakers in resource allocation and factbased decision making to control new pandemic waves.

In this context, NPIs refer to actions and policies adopted by individuals, authorities or governments that help slowing down the spread of epidemic diseases. Enforcement of social distancing, face covering, restrictions on social events and public transportation, etc. were among the NPIs that were experienced during the COVID-19 pandemic. NPIs are among the best ways of controlling pandemic diseases when vaccines or medications are not yet available 1 .

During the COVID-19 pandemic, several attempts were made to categorize and quantify the various NPIs of different regions and nations, which was essential for comparing the effectiveness of regional policies in containing the pandemic spread. By using signal processing and machine learning techniques, the quantified NPI can be used to forecast the future trends of the pandemic and to simulate "what if scenarios" for the better management of human and medical resources, and to eventually prescribe appropriate NPI for controlling the pandemic [2] . The overall pandemic monitoring, forecasting and control cycle can be seen as a closed-loop system consisting of social and algorithmic elements, as shown in Fig. 1 .

The Oxford COVID-19 Government Response Tracker (Ox-CGRT) was one of the NPI tracking projects, which were launched and regularly updated during the COVID-19 pandemic [3] . OxCGRT has been used by the data science community for NPI-based prediction and prescription planning. Specifically, the XPRIZE Pandemic Response Challenge addressed the problem of predicting future trends of the pandemic in different regions and countries, and prescribing efficient NPIs that compromise between the number of new cases and a weighted-cost of intervention [4] , [2] . The challenge was motivated by the fact that due to disparities in infrastructures, available resources and priorities, policymakers worldwide tend to give different weights to each NPI and are interested to know the impacts and consequences of each policy in advance. During this challenge, the Alphanumerics Team from the Department of Biomedical Informatics at Emory University, adopted a model-based signal processing approach, based on estimation theory and finite-horizon optimal control, to address NPIs practiced by the society. The monitoring entities (healthcare system and other organizations) observe the "ground truth" effective NPIs, the number of new cases, hospitalized and death cases, which contain inevitable errors and uncertainties. Data analysts contribute to this loop by forecasting the future trend of the pandemic and providing fact-based prescriptions of future NPI to control the pandemic, while satisfying other socioeconomic objectives.

the problem of weighted NPI prescription. Our team was among the finalists of this challenge. The notion of multi-objective finite-horizon pandemic control has also been considered by other researchers in simulated scenarios [5] , [6] , and also on real data [7] . The latter is closely related to the hereby developed bi-objective optimization via NPI. Over the past year, several other control-theoretical approaches have emerged for pandemic control [8] , [9] , [10] , which have been mainly tested in simulated scenarios or over regional data.

In order to develop a general framework, which is applicable for all regions worldwide, the OxCGRT dataset was used for the current study. Since the only globally registered NPIs in this dataset are the total confirmed cases, the total confirmed deaths and the daily NPIs, we adopted an extension of a generic susceptible-infected (SI) compartmental model from our previous work, as the base model for all regions/countries [11] . The proposed model parameters are trained on historic data and used to predict future trends from input NPI by using an extended Kalman filter. It is further shown that the forecasting model can be integrated with a finite-horizon optimal controller to find the optimal daily NPIs with arbitrary NPI cost weight vectors. It should be noted that since the XPRIZE Challenge was held before the global availability of COVID-19 vaccines, the interventions considered for all nations do not include vaccinations. This fact is also reflected in the compartmental model detailed in the sequel. Nevertheless, the proposed framework is generic and can be extended to other compartmental models, when more accurate data exist for a specific region/nation.

While the majority of recent pandemic data analysis research-including the winners of the XPRIZE Pandemic Response Challenge [12] , [13] -are based on data-driven machine learning (ML) techniques and deep neural networks, this research demonstrates how classical signal processing and optimal control theories can be used and combined with ML techniques for solving pandemic trend forecasting and NPI prescription. The advantages of the proposed approach are manifold, including: theoretical support for the predictions/prescriptions, ease of interpretation through fact-based data models, quantitative performance bounds, and lower computational/training cost as compared with fully data-driven ML-based approaches.

In Section II, the OxCGRT NPI database is explained. Section III details the data model. The developed finite-horizon optimal NPI prescription framework is elaborated in Section IV. This framework is combined with an extended Kalman filter/smoother for pandemic forecasting in Section V, followed by the details of model training and implementation in Section VI. The results on real data from the COVID-19 pandemic and a detailed discussion on the proposed method are presented in Sections VIII and IX, followed by concluding remarks and future directions. The source codes of the developed models and algorithms are provided online for reference [14] .

To date, the Oxford COVID-19 Government Response Tracker (OxCGRT) is an ongoing project [1] , which categorizes and quantifies the NPI policies of different regions/nations since the beginning of the pandemic. The dataset was used in the XPRIZE Pandemic Response Challenge [4] , which addressed the problem of pandemic trend forecasting in different regions/countries under social interventions (e.g., social distancing, mandatory mask wearing, social gathering prohibitions, closure of schools and public transportation limitations), and prescribing efficient NPIs that compromise between human factors (infection and death rates) and a weighted cost of intervention. The subset of OxCGRT NPI categories used by the XPRIZE Challenge and the current study are listed in Table I . Note that the OxCGRT dataset uses the Johns Hopkins Coronavirus dataset for US states [15] , which also provides state-level information for the US. In OxCGRT, the level of NPI stringency for different social activities are quantified by integer values ranging from zero (minimum or no stringency), to two or four (for a maximum stringency, depending on the index). The total social stringency can be considered as a weighted combination of the different indexes. Throughout the XPRIZE Challenge, the weight vector was considered as a design parameter, which enables strategists to prioritize certain NPIs over the others.

III. DATA MODEL The two major classes of methods for epidemic disease spread modeling are: 1) Compartmental models, which split the total population of a region into various compartments (groups) such as susceptibles, exposed, infected, recovered, vaccinated, deceased, etc. These compartments are used to form parametric differential/difference equations, which are fit on real data and are analytically or numerically solved to predict future trends of the disease spread. 2) Agent-based models, which model the behaviors of individuals and their interactions at a simplified level-ofabstraction. Using these models, large groups of agents are generated in stochastic simulated environments, as they randomly move, interact and probabilistically pass the infection to one another, recover, pass away, etc. The population-level properties are obtained by ensemble averaging over the entire population. Each approach has its advantages and limitations. For large population sizes at regional or national levels-which is the scope of the current study-the first approach is asymptotically accurate and is more advantageous as it can be analytically studied in a rigorous mathematical framework and combined with state estimation techniques for forecasting, and optimal control theories for NPI prescription. Therefore, the first approach was adopted for this study, by using a contactcontrolled time-variant version of the so-called susceptibleinfected (SI) compartmental model shown in Fig. 2 . This model is a simplified variant of the general multi-compartment models studied in our previous research [11] . Apparently, more accurate models can be used for the regions that additional data such as the number of recovered, hospitalized, vaccinated, or the age pyramid of the population are available. However, for the current study, since the global data provided in the OxCGRT dataset were the number of daily confirmed cases, total death cases, and the regional NPIs, the same SI model is used for all regions and countries.

The nonlinear dynamic equations corresponding to the proposed compartmental model are:

where • s(t) is the fraction of population in a region/country that is susceptible at time t (i.e., the susceptible population divided by the population size N );

• i(t) is the fraction of population that is infected and contagious at t (i.e., the infected contagious population divided by the regional population size N ); • u(t) ∈ R L is the NPI vector considered as an exogenous control input (L = 12, for the list of NPIs in Table I, used for the XPRIZE Challenge [4] ). The full description of the OxCGRT data NPI set is detailed in [3] ; • α(t) is the time-variant contagion rate, with inverse time unit; • h[u(t)] is a causal monotonic function of the NPI, which maps the NPI to the contagion rate; • β is the rate of elimination from the contagious group (through quarantine, recovery, or death), assumed to be constant in the simplified case; • γ is the action to effect rate (or the inverse of the NPI lag to the inter-individual contact rate), which accounts for the delay between adopting an NPI policy and the onset of its practical effectiveness in the pandemic trend. The third

As a corner case, γ → ∞ represents zero latency between action and effect, resulting in α(t) = h[u(t)]. The parameters β, γ and the function h[u(t)] require learning using the observed variables, as detailed in Section VI. Furthermore, in [11] we showed how the infection reproduction rate R t can be calculated from α(t) and β. Specifically, using the eigenanalysis-based definition of the reproduction rate proposed in [11] , during the pandemic outbreak, when only several percents of the population are infected and herd immunity has not been reached, we have:

where ∆ is the reproduction rate generation time-unit. Finally, for estimation purposes, the dynamic equations in (1) can be related to real-world reports of the fractions of new cases:

or through the fraction of total confirmed cases:

where v(t) is measurement noise due to case report errors (which inevitably existed during the COVID-19 global reports in all regions/nations), and s(t 0 ) is the initial susceptible population fraction at the beginning of the pandemic (very close or equal to 1, for an un-vaccinated initial population).

From (1), the total number of new infections over an arbitrary time window [t 0 , t 1 ] is:

and the total cost of NPIs over the same time period is where w(t) is the NPI weight vector given as input. The motivation for the user-selected weight vector w(t) is that the cost of intervention is different across regions. A stereotypical example considered in the XPRIZE Challenge was that "closing public transportation may be much costlier in London than it is in Los Angeles. Such preferences are expressed as weights associated with each intervention plan dimension, given to the prescriptor as input for each region [4] ." With these assumptions, the optimal NPI prescription problem can be formulated as a bi-objective optimization problem, with a total cost:

where ǫ ∈ [0, 1] is a free parameter that compromises between the human factor (J 0 ) and the NPI cost (J 1 ), and Γ is the set of admissible inputs:

where the vectors u min and u max are (element-wise) the minimum and maximum ranges of the NPI indexes in the OxCGRT dataset (the "values" column in Table I) . Accordingly, u min = 0 corresponding to no stringency, and u max = [3, 3, 2, 4, 2, 3, 2, 4, 2, 3, 2, 4] T for the maximum stringency of each NPI index. For a given pair of design parameters {ǫ, w(t)}, the objective is to find u * (t) for all t ∈ [t 0 , t 1 ], such that:

The problem (9) can be solved by finite-horizon optimization [16] . In optimal control theory, the inputs which satisfy this equation are known as Pareto optimal (efficient). In fact, for an arbitrary weight vector w(t), by sweeping ǫ over [0, 1], the Pareto-optimal front of the optimization problem is found, from which pandemic strategists can select the desired free parameter ǫ that determines the desired operation point to balance between NPI effectiveness and cost (and its corresponding optimal NPI u * (t) to be adopted by the country/region).

To solve (9), first the corresponding Hamiltonian function is formed [16, Ch. 2] :

where λ 1 (t), λ 2 (t) and λ 2 (t) are known as co-states. According to Pontryagin's minimum principle, the co-states and the optimal solution u * satisfy [16, Ch. 6]:

When the inputs are unconstrained, the Hamiltonian minimizer input u * , in the last condition of (11), can be found by solving

where ∇ u H denotes the Hamiltonian gradient with respect to the input vector u, and the condition should hold elementwise. In this case, a sufficient condition for the existence of a solution is to have ∇ 2 u H(u * ) ≻ 0 (where ∇ 2 u denotes the Hessian operator with respect to the input vector u and ≻ 0 denotes positive-definiteness). In the constrained-input caseas in this problem-where the inputs are confined to the admissible set (8) , while the global solution of (12) might not exist or belong to the admissible set (8), a Hamiltonian minimizer optimal input still exists. In either case, the optimal input is found as a parametric function of the costate λ 3 (t) and the other model parameters.

The parametric optimal input found from (12) is next combined with (11) and (1) to calculate the states, using the initial conditions and appropriate boundary conditions (also known as the transversality conditions) on the co-states and the Hamiltonian. The desired boundary conditions, which satisfy the pandemic control problem are:

The conditions in (13) are the general free end-point conditions of finite-horizon optimization problems, which match the objectives of the pandemic control problem. Alternative transversality conditions that can be studied within the proposed framework are [16, Section 2.7]: 1) When the end-time t 1 is not fixed, but we require that i(t 1 ) reaches below i max by the end of the control period (infinite-horizon scenario). This requires the additional condition: H(t 1 ) = 0. 2) Assuming that the objective of any NPI policy over a reasonable time period [t 0 , t 1 ] (long enough to make the NPIs effective) is to bring the number of active cases down to i(t 1 ) ≤ i max , where i max is some target fraction of active cases (ideally zero). In this case, the second condition in (13) can be replaced by:

The solution of the NPI optimization problem depends on the choice of h[u(t)], i.e. the NPI to inter-human contact mapping model. Intuitively, h[u(t)] is expected to be a monotonically decreasing function of the input NPI vector u(t). In other words, more strict restrictions on social contact (corresponding to the higher values in Table I ) should overall reduce the person-to-person contact rates at the population level (this was the globally accepted rationale behind the social restrictions during the COVID-19 pandemic). However, the exact shape of h[u(t)] generally requires learning from historic data, where the monotonic decreasing assumption acts as a constraint during learning.

Based on this intuitive assumption, we study the following two cases, which lead to closed form solutions for the optimal input as functions of the model co-states.

where a is a vector of input influence weights and b is a constant bias (intercept value). The least absolute shrinkage and selection operator (LASSO) method falls into this category. In addition, adding the constraint a ≥ 0 guarantees the monotonically decreasing relationship between the NPI and α. In other words, more stringent NPI policies have a nonincreasing effect on the human interactions parameter α (i.e. the NPI do not have any counter-impacts on the contact rates).

Inserting h[u(t)] in (10) we find:

Now, since ∇ u H(u) is independent of u, depending on its sign, the Hamiltonian which is a linear function of u, admits its minimum at one of the extreme ends of the admissible input ranges (8) . This eventually results in

for k = 1, . . . , L.

2) Quadratic regression: In the second case, we assume

where a ≥ 0 and S ∈ R L×L is a positive-definite matrix. These assumptions guarantee the monotonically decreasing relationship between the NPI and α(t). Multivariate constrained polynomial fitting can be used to find S, a and b from historic NPI and case-report data 2 . Therefore, the required constraint polynomial fitting is straightforward to implement by conventional least squares solvers. In this case, we have: (18) and setting ∇ u H(ũ) = 0 gives

From the second partial derivative test, since ∇ 2 u H = γλ 3 (t)S, and the fact that S is assumed to be positive definite, three cases may occur: 1) if λ 3 (t) > 0,ũ is a local minimum, 2) if λ 3 (t) < 0, it is a local maximum, and 3) λ 3 (t) = 0 results in a saddle point. Therefore, by applying Pontryagin's minimum principle and considering the admissible input range (8) , after some algebraic simplifications we find:

whereũ k and s k are the k th entries of the vectorsũ and S(u max − u min ), respectively. Comparing (20) and (16), it is clear how the quadratic case simplifies to the linear case when S → 0. It is also seen that in the linear case, the "optimal NPI" is always one of the extreme cases u min k (no action) or u max k (maximum stringency). But when h(·) is nonlinear, intermediate interventions may also be in the optimal NPI set.

For an ideal deterministic model, the state and co-state dynamic equations detailed in Section IV can be solved with numerical toolboxes for finite-horizon optimal control (cf. [17] for a MATLAB-based solution). However, for the application of interest, there are major issues which limit the numerical performance, including: 1) model inaccuracies, 2) noisy observations (inaccurate case reports), 3) missing reports (during holidays), 4) unknown or variable parameters (which is inevitable for a highly dynamic complex system, such as a global pandemic), 5) the difficulty of incorporating the startand end-point boundary conditions from (13) .

Due to these issues, we propose a novel technique based on optimal state estimation. Accordingly, we integrate the finite-horizon NPI optimizer and the pandemic trend predictor in a classical extended Kalman filter (EKF) and extended Kalman smoother (EKS) [18] . Using (1), (3) and (11), the state-augmented dynamic equations for the EKF are:

where the first six equations are the state and co-state dynamics, the last equation is the observation equation and h[u * (t)] is the impact of the optimal control calculated from (16) or (20) . The terms w s (t), w i (t), w α (t), η 1 (t), η 2 (t) and η 3 (t) in (21) represent process noises (due to model inaccuracies), and v(t) is observation noise. Note that for an estimation based on the total number of confirmed cases (instead of the new cases), the last observation equation in (21) can be replaced with (4). The dynamic system (21) may now be numerically solved by using standard EKF and EKS equations. The discretized version of (21), which is required for the discrete-dime implementation of the EKF and EKS is detailed in the Appendix.

With the above formulation, the finite-horizon optimal NPI prescription and the EKF/EKS-based forecasting problems are unified in a single model. The overall scheme is summarized in Algorithm 1. The MATLAB implementation of this algorithm is available in our online repository [14] . Note however that the prediction and prescription schemes may still be used independently. Specifically, the proposed optimal NPI prescriptor may be combined with any reliable forecasting scheme, including the data-driven LSTM-based models developed during the XPRIZE Pandemic Response Challenge [2] , [12] , [13] .

VI. MODEL TRAINING The model parameters h[u(t)], β, γ and the EKF/EKS parameters require region-wise training or fact-based selection. Train the compartmental model parameters over historic NPI and case reports (or a predictor model).

Use EKF and EKS for prediction and prescription of finite-horizon optimal control inputs u * (t).

For this study, we used classical techniques for Kalman filter engineering, based on monitoring the properties of the innovations process of the Kalman filter to select and automatically adapt the Kalman filter parameters (initial/final states and covariance matrices) over time [19, Ch. 8] . The parameters related to the social and epidemic aspects of the model are explained in the sequel.

The mapping h[u(t)] was trained over the OxCGRT dataset historic cases. For this, the developed filter was first applied to the historic data by neglecting the explicit relation between the NPI and contact rates (equivalent to h[u(t)] = 0). Referring to (21) , this assumption is equivalent to considering the input-driven fluctuations of α(t) in the process noise w α (t). Therefore, the entry of the process noise covariance matrix, which corresponds to w α (t) is increased (as compared with the expected α(t) error) to account for the inaccuracy of the model due to neglecting h[u(t)]. The resulting EKS gives a primary estimate of α(t) over the training period, which in the next stage is given to a constrained LASSO or quadratic polynomial fitter (for the linear and quadratic forms presumed in Section IV-C) to estimate h[u(t)] using the historic NPI data. Next, the trained model h[u(t)], together with the historic data is used in a second round of EKS, this time by using the historic NPI and apparently a smaller a priori assumption for the variance of w α (t) (as it no longer accounts for neglecting h[u(t)]). After the secondary EKS, the new estimates of α(t) are once more used to refine the model parameters of h[u(t)]. The refined parameters are stored per country/region for utilization during the prescription phase over real or synthetic scenarios.

The action-to-effect rate parameter γ was selected intuitively. From various social experiences, it is reasonable to expect a smooth transition in α(t) due to any change in the NPI. This is based on the social experience that imposing any policy on a complex social system is rarely abrupt. Although the transition is region and NPI dependent, in order to reduce the model complexity, we have fixed γ to 1/(7 days)=0.1429 days −1 for all regions/countries.

The recovery parameter β was selected by educated guesses from the Centers for Disease Control and Prevention (CDC) reports regarding recovery and contagion periods 3 . Accord-ingly, multiple scientific studies worldwide have reported that an exposed subject is no longer infectious after three to four weeks. Of course, this is a stochastic range. To clarify, with an exponential model such as the SI model, in absence of new infected cases (α = 0), we find the ratio i(t 0 + T )/i(t 0 ) = exp(−βT ), which can be considered as an exponential law for the probability of infectiousness after T time-units (days). Combining the model with the CDC reports, we derive the following rule of thumb for setting β: β = − log(contagion probability after time T )/T (22) For the later presented results, we have set the probability of contagion to 0.01 at T =21 days, resulting in β=0.219 days −1 . Following recent studies [20] , the reproduction rate of the pandemic during outbreak was taken to be R 0 =2.5 with ∆=1 day, which using (2) together with β were used to initialize α(t 0 ), i.e. the contact rate during outbreak. Note that one of the advantages of the EKF/EKS framework is that the model parameters can also be considered as state variables and be state-augmented with the other equations to be estimated (or updated over time). This approach can be used for both γ and β to refine the initial educated guesses.

Finally, the regional/national population sizes, as required for normalizing the total and new contaminated cases to the normalized variables of the SI model were obtained from the United Nations' World Population Prospects 2019 dataset [21] , and was assumed to remain fix over the study, i.e. immigration, inter-border travels, natural birth/deaths have been neglected throughout the study.

Kalman filters have intrinsic mechanisms for monitoring their performance and the consistency of their selected parameters. The so-called innovation process is at the heart of performance monitoring. It is defined:

wherex(t) is an estimate of the desired observation x(t), corresponding to the total number of cases or the new cases, as defined in (4) or (3), respectively. For a well-functioning Kalman filter with a single observation (the total number of cases or the new cases), the innovation process has the following properties [18, Ch 5.3] , [22] :

, whereP(t) is the covariance matrix of the state vector estimation error before the t th measurement, c(t) is the stateto-observation map Jacobian, and r(t) is the observation noise variance (defined in Appendix B). P1 and P2 guarantee that the process noise is zero-mean and spectrally white, and P3 assures that the Kalman filter's estimate of the observation noise is in accordance with the presumed model parameters. The violation of any of these properties is an indication of parameter mis-selection, e.g., the state or observation noise covariance/variance. Although the above properties originally belong to the linear Kalman filter, they can be equally used Fig. 3 . New cases trend tracking using the proposed EKS on daily reported cases of the US, since Mar 4 th 2020 (the 100 th case report). The raw noisy daily reports have been adopted from the OxCGRT dataset [3] . The smoothing period is up to day 250 and used to forecast the trends thereafter.

to monitor the well-functioning of the EKF/EKS. In other contexts, indexes have been proposed based on the above properties for tracking the performance of Kalman filters and the selection of their parameters [22] , [18, Ch. 6] , [23] . These monitoring indexes are provided in the open-source implementation of the hereby developed EKF [14] , and are useful for tuning the EKF/EKS model parameters per region/country. Fig. 3 shows the smoothing result of the number of new cases using the proposed EKS on daily reported cases of the US from March 4 th 2020 (the 100 th case report date), until November 9 th 2020. These 250 days of smoothing is followed by 40 days of forecasting by using an EKF. The ±3σ standard deviation envelopes obtained from the error covariance matrix estimates of the EKF are also plotted on the trend estimates. It is seen that as we estimate the number of cases farther in the future, the confidence intervals enlarge, i.e. the forecasts become less reliable.

In order to show the forecasting accuracy, another experiment was conducted, in which the EKF/EKS NPI prescription model was trained by the real US new cases and NPI data from March 4 th 2020 through December 3 rd 2020. The trained model was applied to the ground truth data from December 4 th 2020 to March 4 th 2021, to forecast the number of new cases from 1 to 60 days ahead, having only the actual NPIs as the input. Fig. 4 shows the percentage of forecasting error for different start dates from December 3 rd 2020 to March 4 th 2021, versus the number of look-ahead forecasting days. As expected, the model is generally more accurate for short-term forecasting as compared to long-term forecasting. Note that although we can see that the look-ahead forecasting errors have slightly decreased beyond 30 days, the observation is not generalizable and may not be associated to the accuracy of the forecasting model, as it highly depends on the dynamics of the pandemic, ground-truth data, NPI policies adopted by a country, and many other socioeconomic factors. We further elaborate on this point in the discussion.

The OxCGRT dataset has above 300 countries and regions (states). Due to inconsistencies in the reported COVID- 19 cases, some of the countries/regions were omitted from the study during the XPRIZE Challenge and the proposed prediction-prescription algorithm was trained and applied to a total number of 235 regions/countries, with arbitrary NPI cost weights. The training period for the model was from January 1 st 2020 to Feb 7 th 2021, and the test phase was from February 8 th 2021 up to May 7 th 2021 (the preparation date of the manuscript). As proof of concept, the bi-objective optimization space of the NPI cost J 1 vs the human factor cost J 0 are shown in Fig. 5 , for several countries worldwide. Accordingly, each point in this figure corresponds to a (J 0 , J 1 ) pair for a sequence of NPI scenarios over the test phase. In Fig. 5 , the red points are the result of the proposed method for different values of ǫ (the bi-objective optimization free parameter). The black crosses correspond to the scenario of continuing the latest NPI policy of each government at the end of the training date, up to the end of the testing date. Finally the blue points correspond to a pool of random constant stringencies u(t) = κ, κ ∼ U [u min , u max ], and random variable stringencies u(t) ∼ U [u min , u max ], i.e. uniformly distributed between u min and u max . In all cases, the user defined NPI weight vector was chosen to be equal for all NPI (w(t) = 1), i.e., the NPI were considered equally important for the policymaker.

As a bi-objective problem, the Pareto efficient front comprises of the NPI points which either have a smaller value of J 0 or J 1 , while the non-efficient solutions are the ones for which there exists at least a point that gives a smaller cost of both J 0 and J 1 . In other words, a Pareto efficient solution should outperform any other solution either in its cost or efficiency. As trivial cases, the maximum stringency case ǫ = 0 (maximal enforcement of social limitations, to minimize human cost) and the minimum stringency case ǫ = 1 (no social constraints, to minimize costs of intervention) are both Pareto efficient; the former minimizes the human losses and the latter minimizes the socioeconomic cost of intervention. Apparently, policymakers prefer a balance between these two objectives. Therefore, the Pareto efficient NPI policies should be close to the origin or along one of the left or bottom axes. From Fig. 5 , it is seen that none of the NPI policies adopted by countries/regions were optimal (assuming equal NPI weights), and despite the significant disparities between the studied countries, the Pareto optimal points all belong to the proposed algorithm. Note that the "optimal point" is clearly a function of the bi-objective parameter ǫ, eventually selected by policymakers.

A similar performance was obtained for all the 235 studied countries and regions. As a case study, Fig. 6 demonstrates the estimates number of new case in the US for different NPI scenarios, by using seven-day averages of official reports from March 4 th 2020 (the US 100 th case report date) through January 15 th 2021 for NPI model training. The trained model was applied to the ground-truth data from January 16 th 2021 to March 15 th 2021 for test. Apparently, the hypothesized NPI scenarios may only be evaluated by simulation using our trained EKF forecasting model or other models. The compared scenarios are: A) optimal NPIs corresponding to 500 random ǫ ∈ [0, 1], B) fixed NPI (no changes in the NPI after January 15 th 2021), C) maximum stringency, D) zero stringency, and E) a compromised NPI on the bi-objective space Pareto front corresponding to the NPI scenario with the smallest normalized distance from the US bi-objective plane origin in Fig. 5 (l). The latter point is a compromise between NPI cost and human factors, and was numerically found to correspond to ǫ = 8.235 × 10 −8 . In Fig. 6 , the ground-truth new number of cases (the result of the actual US NPI policy over the test period) is also shown for comparison. Accordingly, the human cost of the last scenario, which is Pareto efficient is very close to the full-stringency case, without imposing the maximum stringency socioeconomic cost.

Since the Pareto front solutions are found by mathematical derivation (rather than trial and error or cumbersome searches), the proposed framework is extremely computationally efficient and the run-time for testing arbitrary scenarios is minimal. The MATLAB version of the codes applied to all regions and countries (235 in total), takes less than 30 s to train over the historic cases, on a MacBook Pro laptop with 2.3 GHz Quad-Core Intel Core i7 and 32 GB of memory, without notable optimizations. The run-time on the test scenarios takes about 15 s in total for all regions, as it contains only one EKS stage during the test period, per region/country.

The proposed algorithm for predicting pandemic trends and prescribing Pareto optimal NPI policies has multiple advantages over mere data-driven machine learning algorithms. The highlights of this algorithm include:

• The method is based on theoretical derivations and within the scope of the proposed compartmental model accuracy (which is asymptotically accurate for region/country-level population sizes), it gives accurate Pareto efficient solutions. technique, which accurately predicts pandemic trends from historic data (see for example [2] ). • This framework can be used for targeted pandemic control, where strategists can target specific infection bounds that match the medical resources of a country/region, over a fixed or maximally bounded period of time. Therefore, apart from the optimal NPI and fatality rate objectives, such scenarios can also be considered: "how to bring the pandemic reproduction rate below 0.8 by a specific time?", or "how to bring the new cases below 200 per day in less than two months?" The training phase of the pandemic over historic data together with the forecasting model can be used to study the feasibility of such scenarios and the prescription of the required NPI policies that would achieve such goals. • Both the model parameters and NPI cost weights can be updated over time. Therefore, unprecedented events such as vaccination or virus mutation effects can be integrated in the model with appropriate training. According to the socalled principle of optimality, "any portion of an optimal control trajectory is optimal [16, Sec 6.4]," which implies that optimality of future actions is independent of the past. Therefore, the prescribed optimal control strategy may be adopted at any point, regardless of the past actions of a region/nation. • The computational efficiency of this algorithm permits its combination with other machine learning methods to reduce the search space and to improve the accuracy on other datasets and under more complicated models such as the long short-term memory (LSTM), as in [2] . This feature is specifically useful for training the NPI to inter-human contact map h(·), which requires learning. • The predictor part of the model gives confidence intervals during both the prediction and prescription steps of the algorithm. Therefore, the performance and well-function of the algorithm can be continuously monitored and adapted by using the innovation process, as detailed in Section VII. • The proposed framework is extendable to pharmaceutical intervention plans and vaccinations, whenever sufficient data Fig. 6 . The number of new case estimates in the US for different NPI scenarios: optimal NPIs corresponding to 500 random ǫ ∈ [0, 1], fixed NPI (no changes in the last training date NPI), full NPI (maximum stringency), no NPI (no stringency), and optimal NPI (numerically found to be ǫ = 8.235 × 10 −8 corresponding to the NPI scenario with closest normalized distance from the bi-objective origin in Fig. 5(l) , which is a compromise between NPI cost and human factors), compared to the ground-truth (official reports is available to design and train alternative compartmental models for these factors. As a point of reservation, we should note that the scope of algorithmic and machine learning-based pandemic trend forecasts and prescriptions should not be overestimated or exaggerated. As an extreme case, consider a forecasting model which predicts that the daily new cases will reach below a certain threshold in several years. We can debate that such long-term speculations are neither scientific nor of any practical use. Because the intrinsic dynamics of pandemics (with or without NPIs, vaccination or herd immunity), guarantee that at some point in the future, there will no longer be any susceptible groups (neglecting the virus mutations). It is therefore essential to demonstrate that any forecasting is factbased, nontrivial, and predictable from previous observations (in the causal sense).

In this research, a model-based approach was used for the prediction and prescription of NPIs that best balance between an arbitrary weighted-cost of interventions and the human factors (number of new cases) during a pandemic. The proposed algorithm and the prescribed NPIs were proved to be Pareto optimal, within the scope of the utilized compartmental model accuracy. Software implementations of the proposed algorithms are online available at [14] . In future studies, different aspects of this framework can be extended and improved. Specifically, advanced ML algorithms can be used for training the NPI to contact rate function h(·). The LSTM is specifically a promising approach for this purpose.

For regions which have access to additional data (e.g., the number of hospitalized, number of vaccinated, fatality rate of the virus, the population age pyramid, etc.), more accurate models such as the fatal susceptible-exposed-infectedrecovered (SEIR) can be used [11] . Other indexes such as the daily death reports can be augmented as additional observation equations in the dynamic model (21) , and will help to increase the EKF/EKS accuracies. Theoretical aspects of the proposed EKF/EKS frameworks, including stability conditions, parameter identifiability and robustness to parameter and modeling errors also require further studies.

For a discrete-time implementation of the EKF/EKS, the discrete form of the dynamic system (21) is required. Accordingly, we define the discrete variables s k = [s(k∆), i(k∆), α(k∆)] T w k = [w s (k∆), w i (k∆), w α (k∆), η 1 (k∆), η 2 (k∆), η 3 (k∆)] T n k = n(k∆), c k = c(k∆), v k = v(k∆)

where ∆ is the discretization time unit. Assuming that ∆ is small as compared with the variations of the pandemic trends, a first order discrete approximation of (21) is found as follows: s k+1 = s k − ∆α k s k i k + ∆w sk i k+1 = i k + ∆α k s k i k − ∆βi k + ∆w ik α k+1 = α k − ∆γα k + ∆γh[u * k ] + ∆w αk λ 1,k+1 = λ 1k + ∆[λ 1k −λ 2k − 1+ǫ]α k i k + ∆η 1k λ 2,k+1 = λ 2k + ∆[λ 1k −λ 2k − 1+ǫ]α k s k + ∆βλ 2k + ∆η 2k λ 3,k+1 = λ 3k + ∆[λ 1k −λ 2k − 1+ǫ]s k i k + ∆γλ 3k + ∆η 3k n k = α k s k i k + v k (24) which can be formulated in a compact form:

s k+1 = f (s k , w k ; h(u k )) n k = g(s k ) + v k

where f (·) and g(·) represent the nonlinear equations in (24) . Following (4) , if the number of confirmed cases is used as the observation, the second equation in (25) is replaced by

which is a linear function of the state vector. It is straightforward to linearize the discrete-time dynamic model (24) by calculating its Jacobian matrices, as required for the implementation of the EKF/EKS. An alternative approach is to use a continuous-dynamics discrete-observations approach, which is a classical method in optimal state estimation [24] , [18] . Accordingly, to implement the EKF/EKS, the state equations are updated by using the continuous version of the dynamic model (21) , while the observations are only updated on discrete-time intervals (e.g., on a daily basis).

With the state vectors and discretized dynamic models defined in Appendix A, defining the noisy regular reports x k (number of new cases n k , or alternatively the total confirmed cases c k ) with observation noise variance r k , process noise covariance matrix Q k , initial state estimateŝ + 0 with covariance P + 0 , the recursive equations for the EKF are listed in Algorithm 2. In this algorithm, A k = ∂f /∂s k | s=ŝ + k and c k = ∂g/∂s k | s=ŝ − k are the linearized forms of the system's dynamics equations, andw k = E{w k }.

Step 10 of the algorithm corresponds to enforcing hard constraints on the estimated variables (e.g. positiveness or range) and

Step 11 corresponds to Kalman filter sanity checks (cf. Section VII).

Algorithm 2 An extended Kalman filter for simultaneous compartment variable and model parameter tracking Input: The noisy regular reports x k (number of new cases n k , or alternatively the total confirmed cases c k ) Input: Initial conditions: Q, R,x + 0 , P + 0 Output:ŝ + k (vector of state and model parameter estimates) 1: for k = 0, · · · , T do 2:

State prediction:

Measurement update: 6 :

Check and enforce variable and parameter ranges using hard-constraints 11: Performance monitoring 12: end for ACKNOWLEDGMENT The author sincerely thanks Prof. Christian Jutten, Emeritus Professor of Université Grenoble Alpes (UGA) for the very insightful and motivating comments on the first versions of this work. The author also acknowledges the SEEPIA COVID-19 working group at UGA, directed by Prof. Didier Georges, for the fruitful (virtual) scientific meetings during the pandemic.

A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker)

From prediction to prescription: Evolutionary optimization of nonpharmaceutical interventions in the covid-19 pandemic

Oxford COVID-19 Government Response Tracker, 2020, Blavatnik School of Government

The XPRIZE Pandemic Response Challenge

Optimal control applied to a seir model of 2019-ncov with social distancing

Transport effect of covid-19 pandemic in france

Optimal control of the COVID-19 pandemic with non-pharmaceutical interventions

Optimal covid-19 epidemic control until vaccine deployment

Optimal timing for social distancing during an epidemic

Control strategies to curtail transmission of COVID-19

Mathematical Modeling of Epidemic Diseases; A Case Study of the COVID-19 Coronavirus

Open Data Science to Fight COVID-19: Winning the 500k XPRIZE Pandemic Response Challenge

Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19

Open-access codes for the mathematical modeling of epidemic diseases

An interactive web-based dashboard to track COVID-19 in real time

Optimal control systems

Solving optimal control problems with MATLAB: Indirect methods

Optimal Filtering

Global positioning systems, inertial navigation, and integration

Comparing SARS-CoV-2 with SARS-CoV and influenza pandemics

United Nations, Department of Economic and Social Affairs

Theory and Practice Using MATLAB

Temporally Nonstationary Component Analysis; Application to Noninvasive Fetal Electrocardiogram Extraction

Applied Optimal Estimation