key: cord-0457778-tcg2ezjx authors: Deva, Ayush; Shingi, Siddhant; Tiwari, Avtansh; Bannur, Nayana; Jain, Sansiddh; White, Jerome; Raval, Alpan; Merugu, Srujana title: Interpretability of Epidemiological Models : The Curse of Non-Identifiability date: 2021-04-30 journal: nan DOI: nan sha: ff70770c11ad45078bb55dd1b53bb85c2011b589 doc_id: 457778 cord_uid: tcg2ezjx Interpretability of epidemiological models is a key consideration, especially when these models are used in a public health setting. Interpretability is strongly linked to the identifiability of the underlying model parameters, i.e., the ability to estimate parameter values with high confidence given observations. In this paper, we define three separate notions of identifiability that explore the different roles played by the model definition, the loss function, the fitting methodology, and the quality and quantity of data. We define an epidemiological compartmental model framework in which we highlight these non-identifiability issues and their mitigation. The global COVID-19 pandemic has spurred intense interest in epidemiological compartmental models (Thompson, 2020; Brauer, 2008) . The use of epidemiological models is driven by four main criteria: expressivity to faithfully capture the disease dynamics; learnability of parameters conditioned on the available data; interpretability to understand the evolution of the pandemic; and generalizability to future scenarios by incorporating additional information. Compartmental models are popular because they are relatively simple and known to be highly expressive and generalizable. However, their interpretability and learnability depend strongly on the alignment between data observations and model complexity. Different choices of model parameters can often lead to (approximately) the same forecast case counts, leading to what is commonly referred to as non-identifiability (Raue et al., 2009; Jacquez & Perry, 1990) . The problem of nonidentifiability is only exacerbated with increased model complexity. This lack of identifiability is detrimental because (a) parameter distributions estimated from the observed data tend to be biased with large variances, thus precluding easy interpretation, and (b) nonidentifiable models typically have reduced accuracy on long-time forecasts due to the high parameter variance. This phenomenon is illustrated in Figure 1 , where the forecasting errors between a nonidentifiable model and its reparametrized version (later shown to be identifiable) are compared. Non-identifiability in epidemiological models is rooted in the model dynamics, in the fitting loss function and methodology, and in the quality and quantity of data available. Identifiability is typically broadly classified into structural (i.e., purely model-dependent) (Reid, 1977; Massonis et al., 2020) and practical (i.e., data-, loss-and fitting methodology-dependent), with the latter often defined vaguely, and in the context of specific loss functions (Raue et al., 2009; Wieland et al., 2021) . Contributions. This paper delineates various general notions of model identifiability that are contextualized in compartmental epidemiological models, including a novel notion of statistical identifiability that depends on the loss function optimized in estimation, and an empirical framework to assess practical identifiability in terms of the highest posterior density intervals. We study these ideas in the specific context of a SEIR-like compartmental model (SEIARD) that was deployed in a major densely populated city in India for case forecasting to inform capacity and policy decisions. 2 NOTIONS OF IDENTIFIABILITY Consider a dynamical system M characterized byẋ = f (x, θ), where x is the system state,ẋ the time derivative, and θ the parameters. Let y(t) = g(x(t), θ) be the observation function that maps state x to observations at time t. Combining these, we may express y(t) = h(x 0 , θ, t), where x 0 is the initial state. Epidemiological compartmental models are dynamical systems in which states correspond to compartmental population counts, with a subset or aggregates of these counts being observed. We consider three notions of identifiability for such a system (Table 1) . Structural Identifiability. We say that a parameter θ i in a model M is structurally identifiable if for any θ in the domain, h(x 0 ,θ, t) = h(x 0 , θ, t) ∀t =⇒θ i = θ i , i.e., distinct parameter choices result in distinct observation series. Note that in partially observable systems, this notion of identifiability also applies to components of the initial state x 0 that are not observed. In linear timeinvariant systems (LTI), whereẋ = B(θ)x and y = C(θ)x, the resulting solution takes the form x = e B(θ)t x 0 . Structural identifiability for this case is characterized in terms of the properties of B(θ) and C(θ) (Kalman, 1959; Sontag, 2013) . Recent works (Martinelli, 2020; Villaverde, 2019; Massonis et al., 2020) extend these results to nonlinear systems and SEIR model variants. Statistical Identifiability. We propose a new notion of identifiability that depends both on the parametric form of the model and the statistical estimation process. Let y M (t) = h(x 0 , θ, t) denote the model output and y D (t) the observations over a finite time horizon t ∈ [t b , t e ]. Parameter estimation given the observations is carried out by optimizing a loss function L (·), i.e., θ * = arg min θ∈Θ L D (θ) = arg min θ∈Θ te t=t b L (y D (t), y M (t)) . We consider a parameter-wise loss function, called the profile-likelihood (PL) of θ i (Raue et al., 2009) , defined as where θ −i = θ \ θ i denotes the complementary set of parameters. We say a parameter θ i is statistically identifiable within a domain Θ i if L i D (θ i ) is strictly convex with respect to θ i ∈ Θ i . Though this ensures a unique global optimum, it does not necessarily imply joint convexity of L D with respect to θ. Clearly, statistical identifiability implies structural identifiability, but the converse is not always true. For example, a loss function with a flat extended minimum will lead to lack of statistical identifiability even though the model itself is structurally identifiable. However, structural identifiability does imply statistical identifiability for a LTI system with a strictly convex loss function, when the number of observations at distinct times in D is at least equal to the rank of the observability matrix (Villaverde, 2019; Kalman, 1959) . The proof follows from the strict convexity of the loss function, the convexity of the solution e B(θ)t x 0 itself, and the fact that the composition of a convex function with a convex non-decreasing function remains convex . Practical Identifiability Intervals. Models could be both structurally and statistically identifiable, but parameter fitting could nevertheless present practical challenges, related to application-specific tolerance to error on the fitted parameters (Roosa & Chowell, 2019) . This situation is captured through the notion of practical identifiability. Given a model M , data D, loss function L D , and fitting method H, let p D (θ i ) be the resulting posterior distribution of θ i . Following Raue et al. (2009) , we characterize practical identifiability in terms of the α-identifiability intervals of the parameters, in two different ways. The first approach defines this interval in terms of level sets of the loss function, i.e., The second, Bayesian approach defines the α-identifiability interval J i P P (α, L D , H) as a highest posterior density interval (HPDI) (Wright, 1986) with probability mass α. When L i D (·) is strictly convex in θ i and the marginal posterior distribution p i D (θ i ) is monotonically decreasing with respect to L i D (·), the two definitions can be shown to be equivalent based on the convexity properties and one-one association of the corresponding level sets . Empirical analysis in Raue et al. (2013) suggests a similar relationship. See Section A.4 for a detailed analysis. SEIARD Model. Figure 2 shows the SEIARD model, which is an extension of the well-known SEIR model (Hethcote, 2000) . The S (Susceptible), E (Exposed), and I (Infectious) compartments and their associated parameters (transmission rate β, incubation period T inc = σ −1 , and infectious period T inf = γ −1 ) are defined as in the SEIR model. The post-infectious stage consists of parallel paths through A fatal or A recov , depending on the eventual outcome: fatality (D) with probability P fatal or recovery (R), with T fatal and T recov being the respective time durations. Observed case counts are mapped to compartment populations as follows: recovered cases (R)= |R|, deceased cases (D)= |D|, active cases (A)= |A recov | + |A fatal |. Defining state x = (S, E, I, A recov , A fatal , R, D) and observations y = (A recov + A fatal , R, D), the model dynamics ( Figure 3 ) reduces to that of an LTI (Kalman, 1959) in the early stages of the epidemic, because one can then set S N to first approximation, thus yieldingṠ = −βSI/N −βI. Parameter Estimation. In our experiments, synthetic data was simulated from the SEIARD model using realistic parameters (Section A.3). We treat the initial state values of E (E 0 ) and I (I 0 ) compartments as unobserved parameters. We estimate model parameters (including E 0 and I 0 ) by optimizing the mean absolute percentage error (MAPE) between the true and predicted values for each of the three observed time series in y as well as that of the total count (sum of these three). Fitting is performed using two different approaches: Sequential Model Based Optimization (SMBO) based on Tree-structured Parzen Estimators as implemented in the HyperOpt library (Bergstra et al., 2013) , and an MCMC-based sampling approach (see Section A.2). Identifiability in SEIARD Model. Structural identifiability analysis similar to that in Massonis et al. (2020) indicates that the SEIARD model is structurally non-identifiable. This may be attributed to the unobserved compartments (E, I) as well as the split between A recov and A fatal . A larger fatality rate with a larger delay T fatal would be indistinguishable from a smaller fatality with a smaller delay just by observing the active, deceased and recovery counts. This structural non-identifiability also manifests as statistical non-identifiability, as can be seen in the shape of the profile likelihood curves corresponding to the non-reparametrized (Original) SEIARD model in Figure 4 (a) and 4(b) . Similarly, the confidence intervals of the parameters of this model, as shown in Figure 4 (a), 4(b), and 4(d), are broad, indicating considerable practical non-identifiability. Improving Identifiability in SEIARD Model. Non-identifiability in dynamical systems can often be attributed to correlated parameters. Figure 4 (e) shows the correlation between parameters constructed using samples from the joint posterior distribution. A common way (Joubert et al., 2020) to reduce non-identifiability is thus through reparameterization of models by constraining a parameter subset. The choice of parameters to constrain may be informed by secondary information sources. For epidemiological models, parameters such as the incubation period T inc , infectious period T inf , (d) and the time to death T fatal can be estimated from medical literature or line lists of representative case data. A model where these parameters are fixed to their true values (see Section A.3) is referred to as the reparameterized (Reparam.) SEIARD model, which can be shown to be structurally identifiable (Massonis et al., 2020) . The PL curves for the reparameterized model in Figure 4 (a) and 4(b), and the posterior density plot in Figure 4 (d) have relatively tighter confidence intervals, indicating an improvement in practical identifiability. Additionally, Figure 4 (c) empirically indicates that the PL curves and the negative logarithm of the posterior density give similar information. Figure 4 (f) shows the PL curves for the reparameterized model with various training durations, indicating increased practical identifiability with more observations. A larger set of observations may therefore serve to further fine-tune parameters in such a setting. We presented a novel framework to analyze identifiability of model parameters in epidemiological models, using the various lenses of model structure, loss functions and fitting methodology, and data. While the notions of structural and statistical identifiability are useful for detecting and resolving non-identifiability in an analytic fashion via reparameterization, practical identifiability intervals function as an effective tool for fine-tuning the parameter estimation process. We present these ideas and empirical results in the specific context of the SEIARD compartmental model. In the future, we plan to explore connections between different types of identifiability and empirically analyze the identifiability of multiple SEIR variants on real and synthetic data. where N is the city population, β, σ, and γ are the standard epidemiological parameters for a SEIR model, P fatal is the transition probability to the mortality branch, and Trecov and T fatal are timescales that govern transitions out of the Arecov and A fatal compartments. Variables S, E, I, Arecov, A fatal , R, D denote the populations of the similarly named compartments. In the early stages of the epidemic, when S N , the model dynamics reduces to that of an LTI, withẋ = B(θ)x and y = C(θ)x. For our model, we have B(θ) and C(θ) as shown below: Likelihood function. We assume a likelihood function of the form where N (z | µ, σ 2 ) denotes the Normal distribution pdf with mean µ and variance σ 2 following an appropriate conjugate prior. Further, and s is the variance of the normal likelihood function, which is discussed later in this section. Proposal distribution. At iteration k, we generate the samples from the proposal distribution for accept-reject step as θ ∼ Q(θ k−1 , Σprop, θmin, θmax) where θ k−1 is the parameter vector chosen at k−1, and Q(·) is the pdf of a multivariate truncated Gaussian with the parameter range [θmin, θmax] and covariance matrix Σprop as listed in Table 2 (MCMC Proposal Variance). We further assume that s has the conjugate prior s ∼ InvGamma(u, v) where u = 40 and v = 2/700 are hyperparameters. Thus, it is straightforward to show, by multiplying the Normal likelihood with the prior, that if the MCMC chain has sampled parameters θ k , the sample s k ∼ P (s k | θ k , X) is also drawn from an Inverse Gamma distribution with parameters Table 3 : Hyperparameters used to generate the synthetic dataset. We briefly outline the connection between the Bayesian and the loss function-based notions of practical identifiability intervals in Section 2 for clarity. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures Convex optimization Compartmental models in epidemiology The mathematics of infectious diseases Parameter estimation: local identifiability of parameters An efficient procedure to assist in the re-parametrization of structurally unidentifiable models On the general theory of control systems Rank conditions for observability and controllability for time-varying nonlinear systems Structural identifiability and observability of compartmental models of the COVID-19 pandemic Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood For a strictly convex Lipschitz continuous real-valued function LD : Θ → R, with a global minimum c * , its level sets {θ ∈ Θ|LD(θ) ≤ c} for c ∈ [c * , ∞) are (possibly multi-dimensional) convex regions nested within each other Given a probability distribution q(θ) that is non-zero on Θ, every level set of LD(θ) has a welldefined probability mass resulting in a bijection Q L : [0, 1] → [c * , ∞) For any function h : Θ → R that can be expressed as h(θ) = g(LD(θ)) for a strictly monotonically increasing function g(·), the level set {θ ∈ Θ|LD(θ) ≤ c} is the same as {θ ∈ Θ|(θ) ≤ g(c)} for c ∈ [c * , ∞). Due to the mapping between the level sets, the uniqueness of regions with probability mass When LD(θ) is the sum of the negative log-likelihood and appropriate regularization terms corresponding to the prior on θ, then the posterior distribution pD(θ) is strictly monotonically decreasing with respect to LD(θ) since −pD(θ) turns out to be a strictly monotonically increasing function of LD(θ), the level sets map to each other On the other hand, the projection of the credible intervals from pD(θ) on to the i th dimension do not exactly correspond to that of the marginal posterior p i D (θi) This study is made possible by the generous support of the American People through the United States Agency for International Development (USAID). The work described in this article was implemented under the TRACETB Project, managed by WIAI under the terms of Cooperative Agreement Number 72038620CA00006. The contents of this manuscript are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. This work is co-funded by the Bill and Melinda Gates Foundation, Fondation Botnar and CSIR -Institute of Genomics and Integrative Biology. This work was enabled by several partners through the informal COVID19 data science consortium, most notably a group of volunteer data scientists under the umbrella group 'Data Science India vs. COVID'. We would like to thank the Smart Cities Mission for providing early impetus to this work, the Brihanmumbai Municipal Corporation and the Integrated Disease Surveillance Program (Jharkhand) for their support.