Structural identifiability and observability of compartmental models of the COVID-19 pandemic
Gemma Massonis, Julio R. Banga, Alejandro F. Villaverde
Annual Reviews in Control, 2020-12-21. DOI: 10.1016/j.arcontrol.2020.12.001

The recent coronavirus disease (COVID-19) outbreak has dramatically increased the public awareness and appreciation of the utility of dynamic models. At the same time, the dissemination of contradictory model predictions has highlighted their limitations. If some parameters and/or state variables of a model cannot be determined from output measurements, its ability to yield correct insights, as well as the possibility of controlling the system, may be compromised. Epidemic dynamics are commonly analysed using compartmental models, and many variations of such models have been used for analysing and predicting the evolution of the COVID-19 pandemic. In this paper we survey the different models proposed in the literature, assembling a list of 36 model structures and assessing their ability to provide reliable information. We address the problem using the control-theoretic concepts of structural identifiability and observability. Since some parameters can vary during the course of an epidemic, we consider both the constant and time-varying parameter assumptions. We analyse the structural identifiability and observability of all of the models, considering all plausible choices of outputs and time-varying parameters, which leads us to analyse 255 different model versions. We classify the models according to their structural identifiability and observability under the different assumptions and discuss the implications of the results. We also use an example to illustrate several alternative ways of remedying a model's lack of observability.
Our analyses provide guidelines for choosing the most informative model for each purpose, taking into account the available knowledge and measurements.

The current coronavirus disease (COVID-19) pandemic, caused by the SARS-CoV-2 virus, continues to wreak unparalleled havoc across the world. Public health authorities can use mathematical models to answer critical questions related to the dynamics of an epidemic (severity and time course of infected people), its impact on the healthcare system, and the design and effectiveness of different interventions [1] [2] [3] [4]. Mathematical modeling of infectious diseases has a long history [5, 6]. Modeling efforts are especially important in the context of COVID-19 because its dynamics can be particularly complex and counter-intuitive due to the uncertainty in the transmission mechanisms, possible seasonal variation in both susceptibility and transmission, and their variation within subpopulations [7]. The media have given extensive coverage to analyses and forecasts using COVID-19 models, with increased attention to cases of conflicting conclusions, giving the impression that epidemiological models are unreliable or flawed. However, a closer look reveals that these modeling studies followed different approaches, handled uncertainty differently, and ultimately addressed different questions on different time-scales [8].

Broadly speaking, data-driven models (using statistical regression or machine learning) can be used for short-term forecasts (one or a few weeks). Mechanistic models based on assumptions about transmission and immunity […]

In this paper we assess the structural identifiability and observability of a large set of COVID-19 mechanistic models described by deterministic ordinary differential equations, derived by different authors using the compartmental modeling framework [31].
Compartmental models are widely used in epidemiology because they are tractable and powerful despite their simplicity. We collect 36 different compartmental models, of which we consider several variations, making up a total of 255 different model versions. Our aim is to characterize their ability to provide insights about their unknown parameters (i.e. their structural identifiability) and unmeasured states (i.e. their observability). To this end we adopt a differential geometry approach that considers structural identifiability as a particular case of nonlinear observability, which allows both properties to be analysed jointly. We define the relevant concepts and describe the methods used in Section 2. Then we provide an overview of the different types of compartmental models found in the literature in Section 3. We analyse their structural identifiability and observability and discuss the results in Section 4, where we also show different ways of remedying lack of observability using an illustrative model. Finally, we conclude our study with some key remarks in Section 5.

We consider models M of the form

$$\dot{x}(t) = f\left(x(t), u(t), \theta, w(t)\right) \qquad (1)$$

$$y(t) = h\left(x(t), u(t), \theta, w(t)\right) \qquad (2)$$

where f and h are analytical (generally nonlinear) functions of the states $x(t) \in \mathbb{R}^{n_x}$, known inputs $u(t) \in \mathbb{R}^{n_u}$, unknown constant parameters $\theta \in \mathbb{R}^{n_\theta}$, and unknown inputs or time-varying parameters $w(t) \in \mathbb{R}^{n_w}$. The output $y(t) \in \mathbb{R}^{n_y}$ represents the measurable functions of model variables.

Definition 1 (Observability). A state $x_i(\tau)$ is observable if it can be determined from the observations of the model output $y(t)$ and input $u(t)$ in the interval $t_0 \le \tau \le t \le t_f$, for a finite $t_f$. Otherwise, $x_i(\tau)$ is unobservable. A model is called observable if all its states are observable. We also say that M is invertible if it is possible to infer its unknown inputs $w(t)$, and we say that $w(t)$ is reconstructible in this case.
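As a minimal numerical illustration of Definition 1, the sketch below assumes the basic SIR model (Ṡ = −βIS, İ = βIS − γI, Ṙ = γI, introduced in Section 3) with known, hypothetical rates β = 0.3 and γ = 0.1 per day, population fractions as states, and the single output y = I. It integrates the model with a hand-written fixed-step fourth-order Runge-Kutta scheme (chosen only to keep the example dependency-free) from two initial conditions that differ only in R(t0). Because R does not feed back into S or I, the two outputs coincide exactly, so R(t0) cannot be inferred from y; no extra constraint, such as a known total population, is imposed here.

```python
def sir_rhs(x, beta, gamma):
    """Basic SIR right-hand side (assumed mass-action form without division by N)."""
    S, I, R = x
    infections = beta * S * I
    return [-infections, infections - gamma * I, gamma * I]


def rk4(rhs, x0, t_end, dt, *args):
    """Fixed-step classical Runge-Kutta integrator; returns the list of states."""
    x = list(x0)
    traj = [x]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, *args)
        k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], *args)
        k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], *args)
        k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)], *args)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(x)
    return traj


beta, gamma = 0.3, 0.1                     # hypothetical known rates (1/day)
# Two initial conditions that differ only in the unmeasured state R(t0)
traj_a = rk4(sir_rhs, [0.99, 0.01, 0.0], 100.0, 0.1, beta, gamma)
traj_b = rk4(sir_rhs, [0.99, 0.01, 0.4], 100.0, 0.1, beta, gamma)

y_a = [x[1] for x in traj_a]               # output y = I
y_b = [x[1] for x in traj_b]
print(y_a == y_b)                          # True: the outputs cannot tell them apart
print(traj_a[-1][2], traj_b[-1][2])        # ...although R(t) differs
```

The same experiment with y = R would distinguish the two runs immediately, which is the numerical counterpart of adding R to the output set.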
Structural identifiability can be seen as a particular case of observability [33] [34] [35], by augmenting the state vector with the unknown parameters θ, which are now considered as state variables with zero dynamics, $\tilde{x} = (x^T, \theta^T)^T$. The reconstructibility of unknown inputs $w(t)$, which is also known as input observability, can be cast in a similar way, although in this case their derivatives may be nonzero. To this end, let us augment the state vector further with $w$ as additional states, as well as their derivatives up to some non-negative integer $l$:

$$\tilde{x}^{[l]}(t) = \left(x(t)^T, \theta^T, w(t)^T, \dot{w}(t)^T, \ldots, w^{(l)}(t)^T\right)^T$$

The l-augmented dynamics is:

$$\dot{\tilde{x}}^{[l]}(t) = \tilde{f}^{[l]}\left(\tilde{x}^{[l]}(t), u(t)\right) = \left(f\left(x(t), u(t), \theta, w(t)\right)^T, 0^T, \dot{w}(t)^T, \ddot{w}(t)^T, \ldots, w^{(l+1)}(t)^T\right)^T$$

leading to the l-augmented system:

$$\tilde{M}^{[l]}: \begin{cases} \dot{\tilde{x}}^{[l]}(t) = \tilde{f}^{[l]}\left(\tilde{x}^{[l]}(t), u(t)\right) \\ y(t) = h\left(\tilde{x}^{[l]}(t), u(t)\right) \end{cases}$$

Remark 1 (Unknown inputs, disturbances, or time-varying parameters). In Section 4, when reporting the results of the structural identifiability and observability analyses, we will explicitly consider some parameters as time-varying. In the model structure defined in equations (1-2) the unknown parameter vector θ is assumed to be constant. To consider an unknown parameter as time-varying we include it in the "unknown input" vector $w(t)$. Thus, changing the consideration of a parameter from constant to time-varying entails removing it from θ and including it in $w(t)$.

Definition 2 (FISPO). Let M be a model of the form (1-2). We augment its state vector as $z(t) = \left(x(t)^T, \theta^T, w(t)^T\right)^T$ (4), which leads to its augmented form (5). We say that M has the FISPO (full input, state, and parameter observability) property if, for every $t_0 \in I$, every model unknown $z_i(t_0)$ can be inferred from $y(t)$ and $u(t)$ in a finite time interval $[t_0, t_f] \subset I$. Thus, M is FISPO if, for every $z(t_0)$ and for almost any vector $z^*(t_0)$, there is a neighbourhood $N(z^*(t_0))$ such that, for all $\hat{z}(t_0) \in N(z^*(t_0))$, the following property is fulfilled:

$$y\left(t, \hat{z}(t_0)\right) = y\left(t, z^*(t_0)\right) \Rightarrow \hat{z}_i(t_0) = z^*_i(t_0), \quad 1 \le i \le n_x + n_\theta + n_w.$$

In this paper we analyse input, state, and parameter observability, that is, the FISPO property defined above, using a differential geometry framework.
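The l-augmentation can be sketched symbolically. The helper `l_augment` below is a hypothetical illustration written with SymPy (it is not part of the STRIKE-GOLDD toolbox): constant parameters receive zero dynamics, and each unknown input is followed by its first l derivatives, whose last time derivative is left as a fresh unknown symbol. As an assumed example, the basic SIR model is used with a time-varying transmission rate β(t) placed in w, a constant recovery rate γ, and a hypothetical reporting fraction k (which would enter an output such as y = kI) placed in θ.

```python
import sympy as sp


def l_augment(f, states, params, unknown_inputs, l):
    """Hypothetical sketch of the l-augmented state vector and dynamics.

    Parameters get zero dynamics; each unknown input w is followed by
    w^(1), ..., w^(l), and the dynamics of w^(l) is the unknown w^(l+1).
    """
    aug_x = list(states) + list(params)
    aug_f = list(f) + [sp.Integer(0)] * len(params)
    for w in unknown_inputs:
        chain = [w] + [sp.Symbol(f"{w.name}_d{i}") for i in range(1, l + 2)]
        aug_x += chain[:l + 1]        # w, w^(1), ..., w^(l)
        aug_f += chain[1:]            # their time derivatives w^(1), ..., w^(l+1)
    return aug_x, aug_f


S, I, R, gamma, k, beta = sp.symbols("S I R gamma k beta")
f = [-beta * S * I, beta * S * I - gamma * I, gamma * I]   # beta treated as time-varying
xt, ft = l_augment(f, [S, I, R], [gamma, k], [beta], l=1)
for x, dx in zip(xt, ft):
    print(f"d{x}/dt = {dx}")
```

With l = 1 the augmented vector is (S, I, R, γ, k, β, β_d1), and the last entry of the augmented dynamics is the unknown second derivative β_d2; moving β back into `params` recovers the constant-parameter case of Remark 1.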
The analyses performed in this paper are structural and local. By structural we refer to properties that are entirely determined by the model equations; thus we do not consider possible deficiencies due to insufficient or noise-corrupted data. By local we refer to the ability to distinguish between neighbouring states (similarly, parameters or unmeasured inputs), even though they may not be distinguishable from other distant states. This is usually sufficient, since in most (although not all, see e.g. [37]) applications local observability entails global observability. This specific type of observability has sometimes been called local weak observability [38].

This approach assesses structural identifiability and observability by calculating the rank of a matrix that is constructed with Lie derivatives. The corresponding definitions are as follows (in the remainder of this section we omit the dependency on time to simplify the notation):

Definition 3 (Extended Lie derivative [39]). Consider the system M (1-2) with augmented state vector (4) and augmented dynamics (5). Assuming that the inputs u are analytical functions, the extended Lie derivative of the output along $\tilde{f} = \tilde{f}(\cdot, u)$ is:

$$L_{\tilde{f}} h(\tilde{x}, u) = \frac{\partial h(\tilde{x}, u)}{\partial \tilde{x}} \tilde{f}(\tilde{x}, u) + \sum_{j=0}^{\infty} \frac{\partial h(\tilde{x}, u)}{\partial u^{(j)}} u^{(j+1)}$$

The zero-order derivative is $L^0_{\tilde{f}} h = h$, and the i-order extended Lie derivatives can be recursively calculated as:

$$L^i_{\tilde{f}} h(\tilde{x}, u) = \frac{\partial L^{i-1}_{\tilde{f}} h(\tilde{x}, u)}{\partial \tilde{x}} \tilde{f}(\tilde{x}, u) + \sum_{j=0}^{i-1} \frac{\partial L^{i-1}_{\tilde{f}} h(\tilde{x}, u)}{\partial u^{(j)}} u^{(j+1)}, \quad i \ge 1$$

Definition 4 (Observability-identifiability matrix [36]). The observability-identifiability matrix of the system M (1-2) with augmented state vector (4), augmented dynamics (5), and analytical inputs u is the following $m\, n_{\tilde{x}} \times n_{\tilde{x}}$ matrix, where $m = n_y$ is the number of outputs:

$$O_I(\tilde{x}, u) = \begin{pmatrix} \frac{\partial}{\partial \tilde{x}} L^0_{\tilde{f}} h(\tilde{x}, u) \\ \frac{\partial}{\partial \tilde{x}} L^1_{\tilde{f}} h(\tilde{x}, u) \\ \vdots \\ \frac{\partial}{\partial \tilde{x}} L^{n_{\tilde{x}}-1}_{\tilde{f}} h(\tilde{x}, u) \end{pmatrix}$$

The FISPO property of M can be analysed by calculating the rank of the observability-identifiability matrix:

Theorem 1 (Observability-identifiability condition, OIC [39]).
If the observability-identifiability matrix of a model M satisfies $\mathrm{rank}\left(O_I(\tilde{x}_0, u)\right) = n_{\tilde{x}} = n_x + n_\theta + n_w$, with $\tilde{x}_0$ being a (possibly generic) point in the augmented state space, then the system is structurally locally observable and structurally locally identifiable.

[…] which the population is divided into three classes:

• Susceptible: individuals who have no immunity and may become infected if exposed.
• Infected and infectious: individuals who have contracted the disease. Since an infected individual can transmit the disease, he/she is also infectious.
• Recovered: individuals who are immune to the disease and do not affect its transmission.

Another class of models, called SEIR, includes an additional compartment to account for the existence of a latent period after the transmission:

• Exposed: individuals who have contracted the disease but are not yet infectious (i.e. they are in the latent period).

These idealized models differ from reality. Contact tracing, screening, or changes in habits are some dif[…] Individuals who recover leave the infectious class at rate γ, where 1/γ is the average infectious period. The set of differential equations describing the basic SIR model is given by:

$$\dot{S} = -\beta I S, \qquad \dot{I} = \beta I S - \gamma I, \qquad \dot{R} = \gamma I$$

As mentioned above, compartmental models can be extended to consider further details. We have found models that incorporate the following features: asymptomatic individuals, births and deaths, delay-time, lock-down, quarantine, isolation, social distancing, and screening. Figure 1 shows a classification of the SIR models reviewed in this article, and Table 1 […]

(Table 1, partially recovered row: model 21 [10], with states S, I, D, C, R.)

Individuals in the SEIR model are divided into four compartments: Susceptible (S), Exposed (E), Infected (I) and Recovered (R).
Compared to the SIR models, the additional compartment E allows for a more accurate description of diseases in which the incubation period and the latent period (i.e. the time between infection and becoming infectious) do not coincide. This is why SEIR models are in principle best suited to epidemics with a long incubation period such as COVID-19 [50]. Susceptible individuals move to the exposed class at a rate βI(t), where β is the transmission rate parameter. Exposed individuals become infected at rate κ, where 1/κ is the average latent period. Infected individuals recover at rate γ, where 1/γ is the average infectious period. Thus, the set of differential equations describing the basic SEIR model is:

$$\dot{S} = -\beta I S, \qquad \dot{E} = \beta I S - \kappa E, \qquad \dot{I} = \kappa E - \gamma I, \qquad \dot{R} = \gamma I$$

Existing extensions of SEIR models may incorporate some of the following features: asymptomatic individuals, […]

(Table fragment: SEIR-type models with states S, E, I, S_q, E_q, H, R [60]; S, E, I, A, J, R [64]; and S, L, E, I, Q, R [51].)

The transmission and recovery rates (β, γ) are the two parameters common to all SIR models. […]

For the SEIR models, considering the β parameter as an unknown input function follows a similar trend to that of the SIR models, with the exception of model 38, which gains both observability and identifiability and becomes FISPO. Considering the recovery rate γ (Fig. 7) or the latent period κ (Fig. 6) […] (which describes the proportion of exposed/latent individuals who become clinically infectious) is considered time-varying, all parameters become identifiable (including ρ) and six states become observable (all except R, which is never observable unless it can be directly measured, as we have already mentioned).
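To see what the latent compartment changes in practice, the sketch below integrates the basic SIR model (Ṡ = −βIS, İ = βIS − γI, Ṙ = γI) and the basic SEIR model (Ṡ = −βIS, Ė = βIS − κE, İ = κE − γI, Ṙ = γI) with the same hypothetical rates β = 0.3, γ = 0.1, and κ = 0.2 per day (a 5-day average latent period), using population fractions and a hand-written fixed-step RK4 integrator. With these assumed values the SEIR epidemic grows more slowly and its infected peak occurs later; the compartment totals stay equal to one because the right-hand sides sum to zero.

```python
def rk4(rhs, x0, t_end, dt, *args):
    """Fixed-step classical Runge-Kutta integrator; returns the list of states."""
    x = list(x0)
    traj = [x]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, *args)
        k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], *args)
        k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], *args)
        k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)], *args)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(x)
    return traj


def sir_rhs(x, beta, gamma):
    S, I, R = x
    infections = beta * S * I                  # S -> I
    return [-infections, infections - gamma * I, gamma * I]


def seir_rhs(x, beta, kappa, gamma):
    S, E, I, R = x
    infections = beta * S * I                  # S -> E; E -> I at rate kappa
    return [-infections, infections - kappa * E, kappa * E - gamma * I, gamma * I]


def peak_time(traj, idx, dt):
    """Time at which component idx of the trajectory reaches its maximum."""
    i_max = max(range(len(traj)), key=lambda i: traj[i][idx])
    return i_max * dt


dt = 0.1
beta, gamma, kappa = 0.3, 0.1, 0.2             # hypothetical rates (1/day)
sir = rk4(sir_rhs, [0.99, 0.01, 0.0], 200.0, dt, beta, gamma)
seir = rk4(seir_rhs, [0.99, 0.0, 0.01, 0.0], 200.0, dt, beta, kappa, gamma)
print("SIR peak of I at day", peak_time(sir, 1, dt))
print("SEIR peak of I at day", peak_time(seir, 2, dt))
```

Simulation says nothing by itself about identifiability; it only shows the dynamical role of κ, whose identifiability is the subject of the analyses above.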
The fact that allowing an unknown quantity to change in time can improve its observability, and also the observability of other variables in a model, may seem paradoxical. An intuitive explanation can be obtained from the study of the symmetries in the model structure. The existence of Lie symmetries amounts to the possibility of transforming parameters and state variables while leaving the output unchanged; i.e. their existence amounts to a lack of structural identifiability and/or observability [72]. The STRIKE-GOLDD toolbox used in this paper includes procedures for finding Lie symmetries [73]. Let us use the SIR 15 model as an example. This model has five parameters (τ, β, ρ, µ, d), of which only τ is identifiable if assumed constant. The model contains the following symmetry: […] where ε is the parameter of the Lie group of transformations. Thus, there is a symmetry between ρ and µ that makes […]

Let us now illustrate how the results of this study may be applied in a realistic scenario. We use as an example the model SIR 26, which has 6 states (S, I, R, A, Q, J) and 16 parameters ($d_1, d_2, d_3, d_4, d_5, d_6, k_1, k_2, \lambda, \gamma_1, \gamma_2, \varepsilon_a, \varepsilon_q, \varepsilon_j, \mu_1, \mu_2$); its equations are shown in Table 1. This model includes the following additional features with respect to the basic SIR model: birth/death, asymptomatic individuals (A), quarantine (Q), and isolation (J). In its original publication two states were measured (Q, J). With these two states as outputs the model has five identifiable parameters ($d_1, d_5, \varepsilon_q, k_2, \mu_1$) and two observable states (A, I); thus, there are two unobservable states (S, R) and ten unidentifiable parameters.

If we are interested in estimating, e.g., the number of susceptible individuals (S), this model would not be appropriate. How should we proceed in that scenario? One way of improving observability is to include more outputs (option 1).
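The link between Lie symmetries, the rank condition of Theorem 1, and option 1 can be checked on a small case. The sketch below is not the STRIKE-GOLDD implementation; `oi_matrix` is a hypothetical SymPy helper for the autonomous case (no known inputs, so the extended Lie derivative reduces to (∂h/∂x̃)f̃). It assumes the basic SIR model with unknown constant β and γ and a hypothetical reporting fraction k in the output y = kI. Evaluated at an arbitrary rational point that is assumed to be generic, O_I has rank 4 out of 6, and the generator of the scaling symmetry (S, I, R) → λ(S, I, R), β → β/λ, k → k/λ lies in its null space. Adding R as a second output breaks that symmetry and the rank becomes full.

```python
import sympy as sp


def oi_matrix(outputs, f, xt):
    """Observability-identifiability matrix for dxt/dt = f(xt) (no known inputs).

    Rows are the gradients w.r.t. xt of L^0 h, ..., L^(n-1) h for every output.
    """
    n = len(xt)
    lie = list(outputs)                                   # L^0 h = h
    rows = []
    for order in range(n):
        rows.extend([sp.diff(L, v) for v in xt] for L in lie)
        if order < n - 1:                                 # L^(i+1) h = (dL^i h / dxt) f
            lie = [sp.expand(sum(sp.diff(L, v) * fv for v, fv in zip(xt, f)))
                   for L in lie]
    return sp.Matrix(rows)


S, I, R, beta, gamma, k = sp.symbols("S I R beta gamma k")
xt = [S, I, R, beta, gamma, k]                    # states, then constant parameters
f = [-beta * S * I, beta * S * I - gamma * I, gamma * I, 0, 0, 0]
point = {S: 7, I: 3, R: 2, beta: sp.Rational(5, 11),
         gamma: sp.Rational(2, 7), k: sp.Rational(3, 5)}   # hypothetical generic point

O1 = oi_matrix([k * I], f, xt).subs(point)        # output y = k*I
O2 = oi_matrix([k * I, R], f, xt).subs(point)     # option 1: also measure R

# Infinitesimal generator of the scaling symmetry, ordered like xt
v = sp.Matrix([S, I, R, -beta, 0, -k]).subs(point)

print(O1.rank(), O2.rank())                                       # 4 6
print(all(e == 0 for e in O1 * v), all(e == 0 for e in O2 * v))  # True False
```

Exact rational arithmetic is used so that the rank is not blurred by floating-point tolerances; a symbolic rank would avoid the genericity assumption but is considerably slower.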
In the SIR 26 model, for example, since there is a separate class for asymptomatic individuals (A), the infected compartment (I) considers only individuals with symptoms, and we could assume that they can be detected. By including I in the output set, the structural identifiability and observability of the model improve: six more parameters are identifiable ($\lambda, \varepsilon_a, \varepsilon_j, d_4, k_1, \mu_2$) and the state in which we are interested (S) becomes observable.

However, including more outputs is not always realistic. Another possibility would then be to reduce the complexity of the model by decreasing the number of additional features (option 2). For example, leaving out the asymptomatic compartment leads to the following model: […]

A third possibility is to simplify the parametrization of the model (option 3). This model considers a different death rate for every compartment ($d_i$, i = 1, …, 6). With some loss of generality, we could consider a specific death rate for infected individuals, $d_I = d_2$, and a general death rate d for all non-infected and asymptomatic individuals, […] This reduction of the number of parameters improves the observability of the model: the only unidentifiable parameters are $d_2$, $\gamma_1$, and $k_1$, and the only non-observable state is R. Thus, this option also makes S observable.

[…] it is still impossible to estimate either the transmission rate or the ratio of reported cases. In addition, the three states would also have to be scaled by one of the above-mentioned parameters, thus losing their original meaning.

Our analyses have shown that a fraction of the models found in the literature have unidentifiable parameters. Key parameters such as the transmission rate (β), the recovery rate (γ), and the latent period (κ) are structurally identifiable in most, but not all, models.
The transmission and recovery rates are identifiable in roughly two thirds of the models, and the latent period in almost all (> 90%) of them. Likewise, the states corresponding to the number of susceptible (S) and exposed (E) individuals are non-observable in roughly one third of the model versions analysed in this paper. The number of infected individuals (I) can usually be directly measured, but it is non-observable in one third of the model versions in which it is not measured. The situation is worse for the number of recovered individuals (R), which is almost never observable unless it is directly measured. Many models include other states in addition to S, E, I, and R, which are not always observable either.

Even when it is not possible (or practical) to avoid non-observability or non-identifiability by any means, the model may still be useful, as long as it is only used to infer its observable states or identifiable parameters. For example, we may be interested in determining the transmission rate β but not the number of recovered individuals R; in such a case it is fine to use a model in which β is identifiable even if R is not observable. Of course, this means that, to ensure that a model is properly used, it is necessary to characterize its identifiability and observability in detail, to know whether the quantity of interest is observable/identifiable.
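Checking a single quantity of interest can be done one unknown at a time: with the rank test, an unknown is locally identifiable (or observable) exactly when deleting its column from O_I lowers the rank, because then every null-space direction of O_I has a zero component for that unknown. The sketch below uses a hypothetical SymPy helper for the autonomous case and applies this column-deletion criterion to the basic SIR model with unknown β and γ and a hypothetical reporting fraction k, output y = kI, evaluated at an assumed generic rational point. Only γ is found identifiable, so in the spirit of the paragraph above such a model could still be used to estimate the recovery rate, but not S, I, R, β, or k.

```python
import sympy as sp


def oi_matrix(outputs, f, xt):
    """Gradients of the Lie derivatives L^0 h, ..., L^(n-1) h (no known inputs)."""
    n = len(xt)
    lie, rows = list(outputs), []
    for order in range(n):
        rows.extend([sp.diff(L, v) for v in xt] for L in lie)
        if order < n - 1:
            lie = [sp.expand(sum(sp.diff(L, v) * fv for v, fv in zip(xt, f)))
                   for L in lie]
    return sp.Matrix(rows)


def identifiable_unknowns(O, xt):
    """Unknowns whose column cannot be removed without lowering rank(O)."""
    full = O.rank()
    all_rows = list(range(O.rows))
    found = []
    for i, var in enumerate(xt):
        other_cols = [j for j in range(O.cols) if j != i]
        if O.extract(all_rows, other_cols).rank() < full:
            found.append(var)
    return found


S, I, R, beta, gamma, k = sp.symbols("S I R beta gamma k")
xt = [S, I, R, beta, gamma, k]
f = [-beta * S * I, beta * S * I - gamma * I, gamma * I, 0, 0, 0]
point = {S: 7, I: 3, R: 2, beta: sp.Rational(5, 11),
         gamma: sp.Rational(2, 7), k: sp.Rational(3, 5)}   # hypothetical generic point

O = oi_matrix([k * I], f, xt).subs(point)         # output y = k*I
print(identifiable_unknowns(O, xt))               # [gamma]
```

Applying the same helper to a model that is FISPO returns every unknown, since removing any column of a full-rank O_I lowers its rank.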
The contribution of this work has been to provide such a detailed analysis of the structural identifiability and […]

References

Opinion: Mathematical models: A key tool for outbreak response
An introduction to mathematical modeling of infectious diseases
How simulation modelling can help reduce the impact of COVID-19
Special report: The simulations driving the world's response to COVID-19
Mathematical Epidemiology
An introduction to mathematical epidemiology
Modeling infectious disease dynamics
Wrong but useful - what COVID-19 epidemiologic models can and cannot tell us
On the predictability of infectious disease outbreaks
Predictability: Can the turning point and end of an expanding epidemic be precisely forecast?
Sensitivity analysis for uncertainty quantification in mathematical models, in: Mathematical and statistical estimation approaches in epidemiology
Asymptotic estimates of SARS-CoV-2 infection counts and their sensitivity to stochastic perturbation
COVID-19 outbreak in Wuhan demonstrates the limitations of publicly available case numbers for epidemiological modelling
Fitting dynamic models to epidemic outbreaks with quantified uncertainty: a primer for parameter uncertainty, identifiability, and forecasts
Why is it difficult to accurately predict the COVID-19 epidemic?
A simple planning problem for COVID-19 lockdown
A multi-risk SIR model with optimally targeted lockdown
Can the COVID-19 epidemic be controlled on the basis of daily test reports?
Practical unidentifiability of a simple vector-borne disease model: Implications for parameter estimation and intervention assessment
The structural identifiability of a general epidemic (SIR) model with seasonal forcing
The structural identifiability of the susceptible infected recovered model with seasonal forcing
The structural identifiability of susceptible-infective-recovered type epidemic models with incomplete immunity and birth targeted vaccination
Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease
Integrating measures of viral prevalence and seroprevalence: a mechanistic modelling approach to explaining cohort patterns of human papillomavirus in women in the USA
Population modeling of early COVID-19 epidemic dynamics in French regions and estimation of the lockdown impact on infection rate
Structural and practical identifiability analysis of outbreak models
Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models
Influencing public health policy with data-informed mathematical models of infectious diseases: Recent developments and new challenges
Parameter identifiability of fundamental pharmacodynamic models
Compartmental Models in Epidemiology
Dynamic systems biology modeling and simulation
New results for identifiability of nonlinear systems
A probabilistic algorithm to test local algebraic observability in polynomial time
Observability and structural identifiability of nonlinear biological systems
Full observability and estimation of unknown inputs, states, and parameters of nonlinear biological models
Local identifiability analysis of nonlinear ODE models: how to determine all candidate solutions
Nonlinear controllability and observability
An efficient method for structural identiability analysis of large dynamic systems
Structural identifiability of dynamic systems biology models
GenSSI 2.0: multi-experiment structural identifiability analysis of SBML models
A new version of DAISY to test structural identifiability of biological models
SIAN: software for structural identifiability analysis of ODE models
On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: A novel web implementation
Total variation regularization for compartmental epidemic models with time-varying dynamics
Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
A simple SIR model with a large set of asymptomatic infectives
Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic
A feedback SIR (fSIR) model highlights advantages and limitations of infection-based social distancing
Construction of compartmental models for COVID-19 with quarantine, lockdown and vaccine interventions
Models of SEIRS epidemic dynamics with extensions, including network-structured populations, testing, contact tracing, and social distancing
A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics
SEIAR model with asymptomatic cohort and consequences to efficiency of quarantine government measures in COVID-19 epidemic
Research about the optimal strategies for prevention and control of varicella outbreak in a school in a central city of China: based on an SEIR dynamic model
Epidemic analysis of COVID-19 in China by dynamical modeling
Mathematical modeling of epidemic diseases
To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic
SEIR transmission dynamics model of 2019 nCoV coronavirus with considering the weak infectious ability and changes in latency duration
Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model
Modeling the control of COVID-19: impact of policy interventions and meteorological factors
Modelling the transmission dynamics of COVID-19 in six high burden countries
Mathematical model of transmission dynamics with mitigation and health measures for SARS-CoV-2 infection in European countries
A novel COVID-19 epidemiological model with explicit susceptible and asymptomatic isolation compartments reveals unexpected consequences of timing social distancing
A mathematical model of epidemics with screening and variable infectivity
Dynamic models for the analysis of epidemic spreads
Effects of quarantine in six endemic models for infectious diseases
Introduction to SEIR models
A time-dependent SIR model for COVID-19 with undetectable infected persons
A periodic SEIRS epidemic model with a time-dependent latent period
Finding and breaking Lie symmetries: Implications for structural identifiability and observability in biological modelling
Extensions to a procedure for generating locally identifiable reparameterisations of unidentifiable systems
Higher-order Lie symmetries in identifiability and predictability analysis of dynamic models
Data-based identifiability analysis of non-linear dynamical models
Minimal output sets for identifiability

This research has received funding from the Spanish Ministry of Science, Innovation and Universities and the European Union FEDER under project grant SYNBIOCONTROL (DPI2017-82896-C2-2-R) and the CSIC intramural project grant MOEBIUS (PIE 202070E062). The funding bodies played no role in the design of the study, the collection and analysis of the data or in the writing of the manuscript.
Highlights
• Structural identifiability and observability are desirable model properties.
• They describe a model's ability to inform about unmeasured parameters and states.
• We collect and analyse hundreds of compartmental models of the COVID-19 pandemic.
• We show which parameters and states can be determined from output measurements.
• We discuss how to choose the most informative model for the available knowledge.