Hamelin, Frédéric; Iggidr, Abderrahman; Rapaport, Alain; Sallet, Gauthier. Observability, Identifiability and Epidemiology: A survey. 2020-11-20.

Abstract. In this review, we recall the concepts of Identifiability and Observability of dynamical systems, and analyse them in the framework of Mathematical Epidemiology. We show that, even for simple and well-known models of the literature, these properties are not always fulfilled. We then consider the problem of practical identifiability and observability, which are connected to sensitivity and numerical condition numbers. We also recall the concept of observers to reconstruct state variables of the model which are not observed, and show how it can be used with epidemiological models.

1.1. Prologue. Many papers in Mathematical Epidemiology have the following structure:
- a model is proposed,
- some parameters are given, extracted from the literature,
- then the remaining unknown parameters are estimated by fitting the model to some observed data.
Fitting is usually done with an optimization algorithm, using for example a least squares method or maximum likelihood estimation. To validate the parameter estimation, one can use noisy synthetic data simulated from the model for given values of the parameters, and check that the algorithm is able to reconstruct these parameter values from the data with accuracy. One objective of this paper is to show that this procedure is not always safe and that an examination of the identifiability of parameters is a prerequisite before a numerical determination of parameters. We will review different methods to study identifiability and observability and then consider the problem of numerical identifiability.
Our touchstone will be the most famous, however simple, model in Mathematical Epidemiology, the SIR model of Kermack and McKendrick [112]. This model has received renewed attention with the COVID-19 pandemic [72, 163]. Parameter identifiability analysis addresses the problem of which unknown parameters of an ODE model can be uniquely recovered from observed data. We will show that, even for very simple models, identifiability is far from being guaranteed. The problem of identifiability for epidemiological models is relatively rarely addressed. For instance, a search in the Mathematical Reviews of the American Mathematical Society 1 with epid* AND identifiability gives only 4 papers, while epidem* AND parameter returns 68 publications. Only a small part of the latter publications address the problem of identifiability. The following publications consider the problem of identifiability in epidemiological models: [10, 13, 24, 40, 41, 73, 77, 103, 119, 134, 140, 149, 167, 166, 187, 188, 189, 194, 204, 206]. However, the majority of these papers are published elsewhere than in Biomathematics journals. The question of observability, i.e. the ability to reconstruct the state variables of the model from measurements, is often considered separately from the problem of identifiability: either the model parameters are known, or the identifiability analysis is performed prior to the study of observability. Indeed, the concepts of identifiability and observability are closely related, as we recall below. However, we shall show that for certain models it is possible to reconstruct state variables with observers even though the model is not identifiable. In other situations, we show that considering identifiability and observability jointly, with observers, can be a way to solve the identifiability problem. This is another illustration of the interest of observers, and it is why we shall dedicate a fair part of this review to the concept of observers and their practical constructions. Definitions.
The question of parameter identifiability originates from control theory and is related to observability and controllability [174]. Its first appearance is in Kalman [109], sixty years ago. Identifiability is related to observability: the observability of a model is the ability to reconstruct the state of a system from the observation. In the language of dynamical systems with inputs and outputs, which is the standard paradigm in control systems theory, an input-output relation is defined. The inputs, also called the controls, are considered as known. We will only consider systems without control, which is a particular case where the set of controls is a singleton. When controls are known, with more information, observability/identifiability is sometimes easier. These problems have rarely been considered for uncontrolled systems, whereas many methods have been developed for controlled systems. To be more precise, let us consider the following system in R^n:

(1.1) Σ : ẋ(t) = f(x(t)),

where ẋ(t) denotes dx/dt(t). The ordinary differential equation (ODE) ẋ = f(x) is the dynamics and x is called the state of the system. To avoid technical details, we will assume that for any initial condition x_0 there exists a unique solution, denoted x(t, x_0), such that x(0, x_0) = x_0 and d/dt x(t, x_0) = f(x(t, x_0)). We will assume that this solution x(t, x_0) is defined for any time t ≥ 0. This is often the case with epidemiological models, for which the state space is a compact positively invariant set. We will therefore assume that the system is defined on a positively invariant compact set Ω, which means that any solution starting in Ω stays in Ω; this implies that the solution is defined for any t ≥ 0. This situation is also often encountered in biological systems. The output (or "observation") of the system is given by h(x), where h is a differentiable function h : x ∈ R^n → h(x) ∈ R^m. The set R^m is the observation space.
We will denote by h(t, x_0) or y(t, x_0) the observation at time t for an initial condition x_0.

Definition 1.1 (Observability). The system (1.1) is observable if for any two distinct initial conditions x_1 ≠ x_2 there exists a time t ≥ 0 such that h(x(t, x_1)) ≠ h(x(t, x_2)).

Two states x_1, x_2 are called indistinguishable if for any t ≥ 0 we have h(x(t, x_1)) = h(x(t, x_2)). Indistinguishability means that it is impossible to differentiate the evolution of the system, from two distinct initial conditions, by considering only the observation. Now we consider a system depending on a parameter θ ∈ R^p:

(1.2) Σ_θ : ẋ(t) = f(x(t), θ), y(t) = h(x(t), θ).

Identifiability is the ability to recover the unknown parameter from the observation. We denote by x(t, x_0, θ) the solution of (1.2) for an initial condition x_0.

Definition 1.2 (Identifiability). System (1.2) is said to be identifiable if for any distinct parameters θ_1 ≠ θ_2, there exists t ≥ 0 such that h(x(t, x_0, θ_1)) ≠ h(x(t, x_0, θ_2)).

There is an obvious similarity between observability and identifiability. Actually we will say that (1.2) is observable and identifiable if the augmented system

(1.3) ẋ = f(x, θ), θ̇ = 0, y = h(x, θ),

with state (x, θ), is observable. Actually, for an epidemiological model it is unlikely that the initial condition is known, and it has long been recognized that initial conditions play a role in identifying the parameters [67, 121, 186, 206, 169]. What we have called identifiability is also known as structural identifiability. This expression was coined by Bellman and K.J. Åström [23] in 1970, to stress that identifiability depends only on the dynamics and the observation, under ideal conditions of noise-free observations and an error-free model. This is a mathematical and a priori problem [103]. The observability concept was introduced by Kalman [109] in the sixties for linear systems. For nonlinear systems, observability was characterized circa the seventies [90, 94]; the definition was given by Hermann and Krener in the framework of differential geometry.
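As a toy illustration of Definition 1.1 (our own, not taken from the survey), consider the scalar system ẋ = −x observed through y = x²: the initial conditions x_0 and −x_0 are indistinguishable, so the system is not observable. A minimal numerical check using the closed-form solution:

```python
import numpy as np

def output(x0, t):
    # closed-form solution of xdot = -x, observed through y = x^2
    return (x0 * np.exp(-t)) ** 2

t = np.linspace(0.0, 5.0, 200)
y_plus = output(1.0, t)    # initial condition x0 = +1
y_minus = output(-1.0, t)  # initial condition x0 = -1

# identical outputs for all t: +1 and -1 are indistinguishable
print("max |y(+1) - y(-1)| =", np.abs(y_plus - y_minus).max())
```

Since (x_0 e^{−t})² = (−x_0 e^{−t})², the two output curves coincide exactly.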
Identifiability and structural identifiability were introduced in compartmental analysis in 1970 by Bellman and Åström [23], in a paper that appeared in a bio-mathematics journal. The problem of identifiability is now addressed in textbooks [120, 200, 198, 197]. Numerical identifiability of linear control systems is implemented in software such as MATLAB and Scilab. Identifiability of nonlinear systems has been addressed in different contexts, and the first systematic approach is by Tunali and Tarn in 1987 [186] in the differential geometry framework. The introduction of the concepts of differential algebra in control theory is due to Fliess around 1990 [67, 68, 79], followed by Glad [81, 121]. Identifiability is a general problem which has received different names depending on the community:
-observation, identification,
-data assimilation,
-inverse problem,
-parameter estimation.
"Data assimilation" is mainly used in meteorology and oceanography [115, 181]. A direct (as opposed to inverse) problem considers a model which, given an input, produces an observed output; the parameters are considered as known. Conversely, the "inverse problem" is to reconstruct the parameters from the knowledge of the output [183]. Finally, "parameter estimation" is used in the probability and statistics domains [4, 27, 116, 117, 144, 162].

1.4. Identifiability in mathematical epidemiology. Identifiability has been well known in bio-mathematics since the seventies, as already mentioned with the paper of Bellman and Åström [23]. However, considering identifiability in mathematical epidemiology is relatively recent [187, 149, 206, 167, 134, 73, 77]. The first paper, to our knowledge, considering the identifiability of an intra-host model of HIV is by Xia and Moog [206], published in 2003 in a journal of automatic control.

1.5. The concept of observers.
The construction of an observer is based on an estimation approach different from statistical methods: it consists in determining a dynamical system (called an "observer") whose input is the vector y(·) of measurements acquired over time, and whose state is an estimate x̂(t) of the (unknown) true state x(t) of the system at time t. An observer estimates x(t) continuously over time and without anticipation, in the sense that the estimate x̂(t) is updated at each instant t through its dynamics as the measurement y(t) becomes available, without requiring the knowledge of any future measurement. This is why an observer is sometimes also called a "software sensor". Since the estimate x̂(t) is given by the solution of a system of differential equations, the main idea behind an observer is the paradigm of integrating, instead of differentiating, the signal y(·). Note that although an observer is primarily devoted to state estimation, an observer can also aim at reconstructing state and parameters simultaneously, when some parameters are unknown (in this case a parameter vector p is simply considered a part of the system dynamics with ṗ = 0). The most well-known observer is the so-called Luenberger observer [122], which is recalled in Section 4 and has inspired most of the existing observers (several of them are discussed in Section 4). However, observers are still relatively unpopular in Mathematical Epidemiology, compared to other application domains (such as mechanics, aeronautics, the automotive industry, etc.). The aim of the present review is also to promote the development and use of observers for epidemiological models. Section 4 presents the theoretical background of observer construction and convergence, as estimators based on the model equations, independently of the quality of real data. In a complementary way, Section 5 discusses some implementation issues when observers are used with real-world data that may be corrupted by noise.

2. Mathematical foundations.

2.1. Observability.
We consider the following system

(2.1) Σ : ẋ = f(x), y = h(x),

where we assume, to avoid technical details, that f and h are C^∞ functions. The function f : R^n → R^n is called a vector field. The classical definition of the Lie derivative of a C^∞ function g : R^n → R with respect to the vector field f is

L_f g(x) = ⟨∇g(x) | f(x)⟩,

where ∇g is the gradient of g and ⟨· | ·⟩ the inner product of R^n.

2.1.1. Observability with Differential Geometry. The components of the observation map h are denoted by h = (h_1, · · · , h_m). Each h_i is a C^∞ function from the state space R^n to R. We denote by O the observation space, i.e., the vector space of functions spanned by the iterated Lie derivatives L_f^k h_i (k ≥ 0, i = 1, …, m). We have the following result.

Theorem 2.2. For an analytic system (i.e., f and h are analytic functions), observability is equivalent to the separation of the points of the state space R^n by O, i.e., if x_1 ≠ x_2 there exists g ∈ O such that g(x_1) ≠ g(x_2).

Proof. By analyticity, h(x(t, x_0)) is the sum of its Taylor series in t; and, by induction, one has d^k/dt^k h(x(t, x_0))|_{t=0} = L_f^k h(x_0). Then a necessary and sufficient condition to distinguish x_1 ≠ x_2 is that there exists k such that L_f^k h(x_1) ≠ L_f^k h(x_2).

We have defined a global observability concept, but it might be necessary to travel a considerable distance, or for a long time, to distinguish between points of R^n. Therefore a local concept of observability is introduced [94].

Definition 2.3. The system (2.1) Σ is said to be locally observable if, for any x_0 and any open set U containing x_0, x_0 is distinguishable from all the points of U for the restricted system Σ|U. The system (2.1) Σ is locally weakly observable if for any x_0 there exists an open set U containing x_0 such that, for any neighborhood V with x_0 ∈ V ⊂ U, x_0 is distinguishable for Σ|V from all the points of V.

Intuitively, a system is locally weakly observable if one can instantaneously distinguish each point from its neighbors. Local weak observability can be characterized through the space dO of differentials dψ(x) of elements ψ ∈ O, where dψ(x) is the differential of ψ at x.

Definition 2.5. A system Σ is said to satisfy the observability rank condition at x_0 if dim dO(x_0) = n, where dO is generated by the gradients of the L_f^k h.
Theorem 2.6 (Hermann-Krener [94]). If Σ satisfies the observability rank condition (ORC) at x_0 then Σ is locally weakly observable at x_0.

Proof. Since dim(dO(x_0)) = n, there exist n functions ϕ_1, · · · , ϕ_n ∈ O such that the gradients dϕ_1, · · · , dϕ_n are linearly independent. Therefore the function Φ : x → (ϕ_1(x), · · · , ϕ_n(x)) has a non-singular Jacobian at x_0. As a consequence there exists an open set U containing x_0 on which Φ is a bijection. On any open set V ⊂ U, assume that we have h(x(t, x_0)) = h(x(t, x_1)) for x_0 ≠ x_1 and t ≥ 0 sufficiently small. Then, by differentiation at t = 0, all the Lie derivatives L_f^k h coincide at x_0 and x_1, hence Φ(x_0) = Φ(x_1), which contradicts the injectivity of Φ on U.

Proposition 2.7. For an analytic system, if the observability rank condition is satisfied everywhere, the system is locally observable, hence observable.

Proof. This is due to the fact that h(x(t, x_0)) is the sum of its Taylor series. The rank condition implies, for the same reason as before, that the coefficients of the Taylor series separate points.

Example 2.8. We consider the SIR model, for which the parameters β, γ are assumed to be known, restricted to the (S, I) dynamics:

Ṡ = −β S I, İ = β S I − γ I, y = I.

The system is observable on the set {(S, I) : I > 0}: the gradients are dh = (0, 1) and dL_f h = (β I, β S − γ), and the matrix of gradients has determinant −β I ≠ 0, hence full rank.

2.1.2. Observability and differential algebra. From the 1970s, differential geometry was the tool for studying nonlinear systems in control theory. Circa 1985, Fliess proposed to use Differential Algebra for the analysis of nonlinear systems. Intuitively, system (2.1) is observable when the state x can be expressed as an algebraic expression of y and its time derivatives y^(k) [68, 67].

Definition 2.9 ([68, 67]). A system is said to be algebraically observable if the states can be reconstructed by solving algebraic equations depending only on the observation and its time derivatives. Note that the systems under consideration are rational systems, i.e., the functions f and h are rational functions [67, 68]. A more precise definition can be given using the formalism of Differential Algebra.
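The rank computation of Example 2.8 is easy to reproduce with a computer algebra system. The sketch below (assuming, as above, the mass-action form Ṡ = −βSI, İ = βSI − γI with y = I) builds the gradients of h and L_f h and checks that their matrix is non-singular whenever I > 0:

```python
import sympy as sp

# note: the local symbol I is the infectious compartment, not sympy's imaginary unit
S, I, beta, gamma = sp.symbols('S I beta gamma', positive=True)
x = sp.Matrix([S, I])
f = sp.Matrix([-beta * S * I, beta * S * I - gamma * I])  # reduced (S, I) dynamics
h = I                                                     # observation y = I

def lie(fun):
    # Lie derivative of a scalar function along the vector field f
    return (sp.Matrix([fun]).jacobian(x) * f)[0]

dO = sp.Matrix([h, lie(h)]).jacobian(x)  # gradients of h and L_f h
print(dO)
print(dO.det())                          # determinant is -beta*I, nonzero when I > 0
```

The same loop over `lie` extends directly to higher-dimensional models, where hand computation of the gradients becomes tedious.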
Differential Algebra can be considered as a generalization of the concepts of commutative algebra and algebraic geometry to differential equations. This theory, founded by Ritt, is an appropriate framework for defining algebraic observability. The interested reader can consult Fliess' publications for more details [78, 80]. One recovers N as a rational expression of y, ẏ, ÿ, and then also S and R; the system is algebraically observable in K. Consider a state-output system ẋ = f(x), y = h(x), where f and h are polynomial or rational functions in x. We have two rational relations ẋ − f(x) = 0 and y − h(x) = 0. The last relation can be differentiated with respect to time; by induction we obtain a sequence of rational relations Q_j(x, y) = 0. It can be shown [10, 68, 171] that to obtain algebraic observability it is sufficient to consider the matrix whose rows are the gradients of the iterated Lie derivatives of h, and to check that its rank is n, the dimension of the state space. This is the Hermann-Krener criterion for local weak observability. The number of Lie derivatives to be considered is bounded: it is proved that it is sufficient to compute no more than n − 1 of them [10, 68, 171].

Proof. A detailed proof is given in the cited references; an outline goes like this. Assume that k is the first integer such that {dh, dL_f h, · · · , dL_f^k h} are linearly dependent with rational coefficients. Then there exist k + 1 rational functions g_i, not all zero, such that

∑_{i=0}^{k} g_i dL_f^i h = 0.

By the definition of k, we have g_k ≠ 0. We apply L_f to this relation. It is well known that d and L_f commute, therefore L_f dL_f^i h = dL_f^{i+1} h. Because the system is rational, each L_f g_i is a rational function. Since g_k ≠ 0, this proves that dL_f^{k+1} h is a linear combination of the previous gradients of Lie derivatives, which ends the proof.

The advantage of the Differential Algebra method promoted by Fliess and others is the possibility of implementing it inside a computer algebra software.
Actually, as soon as the system is in high dimension, the computations rapidly become involved. There is now software for local or global observability [171, 24, 40, 110, 194, 131, 97, 157, 15, 16].

Remark 2.12. For linear systems ẋ = Ax, y = Cx, all the definitions of observability are equivalent to the observability matrix [C; CA; · · · ; CA^{n−1}] having full rank. This result extends to observation vectors y ∈ R^m with m > 1 (see for instance [108]). The observability analysis can also be a way to choose the right sensor, as illustrated in the following example.

Example 2.13. Consider a population model structured in five age classes, whose population sizes are x_1 for juveniles, x_2 for subadults capable of reproduction when adults, x_3 for subadults not capable of reproduction when adults, x_4 for adults capable of reproduction, x_5 for adults not capable of reproduction, and whose dynamics is linear (where α is an aging rate, m_1, m_2 are mortality rates, and β is a fecundity rate). If only one sub-population x_i can be targeted for measurement, one can easily check that the only possibility for the system to be observable is to measure the variable y = x_5.

Since very often the initial conditions are not known, or only partially known, we will consider in the following the problem of joint identifiability and observability, considering the augmented system (1.3). Note that identifiability-only problems are a special case in which y = x. We consider a polynomial system

(2.5) ẋ = f(x, θ), y = h(x, θ),

with x ∈ R^n, θ ∈ R^p and y ∈ R^m. We will consider differential polynomials, i.e., polynomials in n + m variables and their derivatives, with coefficients in R. For example, ẋ − f(x, θ) and y − h(x, θ) are differential polynomials belonging to R[x, y] (the set of polynomials in (x, y) with real coefficients). We consider the parameter θ ∈ R^p as a constant, i.e., θ̇ = 0. We have n + m differential polynomial equations, n states and m observations.
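For linear systems, the rank test of Remark 2.12 and the sensor-selection question of Example 2.13 are easy to automate. The sketch below uses a hypothetical three-stage chain (young → subadult → adult, with illustrative rates of our own choosing), not the five-class model of the example, whose matrix is not reproduced above; here measuring the last class is the only choice that makes the system observable:

```python
import numpy as np

def observability_rank(A, C):
    # Kalman observability matrix O = [C; CA; ...; CA^(n-1)]
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.linalg.matrix_rank(np.vstack(blocks))

# hypothetical linear chain: x1 young -> x2 subadult -> x3 adult
a1, a2, m1, m2, m3 = 0.4, 0.3, 0.1, 0.1, 0.2
A = np.array([[-(a1 + m1), 0.0, 0.0],
              [a1, -(a2 + m2), 0.0],
              [0.0, a2, -m3]])

ranks = []
for i in range(3):
    C = np.zeros((1, 3))
    C[0, i] = 1.0                     # measure only x_{i+1}
    ranks.append(observability_rank(A, C))
print(ranks)   # [1, 2, 3]: only y = x3 gives a full-rank observability matrix
```

Intuitively, information flows down the chain, so only the last compartment "sees" all the others, mirroring the conclusion y = x_5 of Example 2.13.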
Intuitively, by differentiating and multiplying by arbitrary differential polynomials, we can obtain an infinity of new equations. In other words, the n + m equations generate a differential ideal I. Any (x, y) satisfying (2.5) will satisfy all the equations in the ideal I. The idea behind the differential algebra techniques is to get a finite set of differential polynomials which describes and summarizes all the information in the generated ideal. Such a set C is called a characteristic set. The details of the complete algorithm for constructing a characteristic set are rather involved and can be found in the references [113, 121, 159]. Among all the polynomials in I we can consider the set I_c of differential polynomials involving only the observation y. Since I_c = R[y] ∩ I, this set is an ideal. It is possible to obtain a characteristic set for I_c, namely C ∩ R[θ][y]. This set is obtained by eliminating the state variables from the equations [63]. Actually, since we have no input, this characteristic set is the differential polynomial in y with the lowest order [106]. This set is also called, in the literature, the input-output relation of the system [13, 24, 130, 105]. Making the polynomials of the input-output relations monic gives a set of coefficients c_i(θ) and a map c : R^p → R^ν (for some ν), which is called the exhaustive summary of the model [142, 166, 169, 13]. The injectivity of c on the parameter space is only a necessary condition for identifiability [169]. Indeed, the input-output relations do not depend on the initial conditions; since identifiability does depend on the initial conditions, it can happen that the system is not identifiable even though c is injective. Some authors [132, 187, 73] use the injectivity of c to ascertain identifiability for almost all initial conditions. The following example shows that the injectivity of c is not sufficient for all initial conditions [169, Section 3.3].

Example 2.14 (the input-output relation is not sufficient).
Consider the following compartmental system. The application (a_21, a_12) → (a_12 + a_21, a_12 a_21) is clearly injective, and however the system is not identifiable if x_2 = 0. This can be seen in two ways:
1. We need (ẏ² − y ÿ) ≠ 0 to recover the parameters, which requires x_2 ≠ 0. We also need a_12 ≠ 0 for observability, since a_12 x_2 = ẏ + a_21 x_1.
2. The second way is to compute the Jacobian of (y, ẏ, ÿ, y^(3)) for local identifiability and observability; one easily checks that it is singular when x_2 = 0.
In the case of analytic systems, an answer has been given by D. Aeyels [3, 2] and E. Sontag [177]: for an analytic system with r parameters it is sufficient to randomly choose 2r + 1 measurements to distinguish two different states. This means that generically (hence the term randomly) 2r + 1 measurements are sufficient. Let us stress that the whole state vector x(·) is assumed to be measured, and therefore the initial state x(0) is known. It can also happen that a system is identifiable and yet not observable.

Example 2.15. The following academic model is identifiable but not observable. We have ẏ = −α y, ÿ = α² y and y^(p) = (−1)^p α^p y. The Jacobian is clearly of rank 2. The parameter α is differentially algebraic over the field R⟨y⟩.

2.4. Identifiability via a change of variables. We show here that a change of variables can help to show the identifiability of a model. Let us consider a system in R^n parameterized by θ ∈ Θ ⊂ R^p:

(2.8) ẋ = f(x, θ), y = h(x),

where X ⊂ R^n is positively invariant for any θ ∈ Θ.

Proposition 2.16. Assume that the following properties hold.
1. The map f verifies
(2.9) for all θ_1, θ_2 ∈ Θ: ( f(x, θ_1) = f(x, θ_2) for all x ∈ X ) ⟹ θ_1 = θ_2.
2. There exist smooth maps g, g̃, l such that
(a) for any solution x(·) of (2.8) in X, w(t) := g(x(t), y(t)) ∈ W ⊂ R^m verifies ẇ(t) = l(w(t), y(t));
(b) for any x ∈ X, one has w = g(x, h(x)) ⟺ x = g̃(w, h(x)).
Then the system (2.8) is identifiable over Θ for any initial condition in X.

Proof. Let x_0 ∈ X and denote by x_θ(·) the solution of (2.8) for the parameter θ ∈ Θ. Let θ_1, θ_2 be in Θ.
If one has h(x_θ1(t)) = h(x_θ2(t)) = y(t) for any t ≥ 0, then the solution w(·) of ẇ = l(w, y(t)) for the initial condition w(0) = g(x_0, y(0)) verifies w(t) = g(x_θ1(t), y(t)) = g(x_θ2(t), y(t)), and thus, by property (b), x_θ1(t) = g̃(w(t), y(t)) = x_θ2(t) for any t > 0. From hypothesis (2.9), we deduce that one has necessarily θ_1 = θ_2, which shows the identifiability of the system.

Let us illustrate this result on an intra-host model for malaria infection [25], where S is the concentration of uninfected erythrocytes in the blood, I_i are the concentrations of infected erythrocytes in different age classes, and M is the concentration of free merozoites. The dynamics is given by system (2.11), whose parameters are:
Λ: recruitment of healthy red blood cells (RBC);
β: rate of infection of RBC by merozoites;
µ_S: natural death rate of healthy cells;
µ_i: natural death rate of the i-th stage of infected cells;
γ_i: transition rate from the i-th stage to the (i + 1)-th stage of infected cells;
r: number of merozoites released by the late stage of infected cells;
µ_M: natural death rate of merozoites.
The first two stages of infected erythrocytes (I_1 and I_2) correspond to the concentration of free circulating parasitized erythrocytes that can be observed (seen on peripheral blood smears), i.e. the quantity I_1(t) + I_2(t) is measured at time t. The model (2.11) takes the form ẋ = Ax + βSM e + Λe_1, y = Cx, for suitable matrices A, C, E and vectors e, e_1. Among the parameters in (2.11), some (µ_i, γ_i, and r) are known or at least widely accepted by the community, but the infection rate β is unknown and cannot be estimated by biological considerations. Note that one has ECE = E. Therefore one can consider the variable w = x − Ey = (I − EC)x, whose dynamics is independent of the non-linear term βSM:

ẇ = Āw + ĀEy + Λe_1, where we set Ā = A − ECA.

The state x is then given by x = w + Ey. So we are exactly in the conditions of Proposition 2.16 for θ = β, if one considers X = (R_+ \ {0})^7. We immediately deduce that the parameter β is identifiable.
This example illustrates the interest of exploiting identifiability and observability jointly to solve the identifiability problem.

3. The SIR model of Kermack and McKendrick. The SIR model [112] is certainly one of the most famous models in Epidemiology. It is given and studied in all the classic books of Mathematical Epidemiology. This model appears in the book of Bailey, which is probably the first book in Mathematical Epidemiology. Some examples can be found in [6, 30, 31, 32, 54, 118, 127, 137]. The figure in the original paper, fitting the model to plague data in Bombay during the year 1906, is one of the most famous pictures in Epidemiology. A search for SIR in MathSciNet returns 11,106 articles. In the quoted books the SIR model is fitted to data in the following ways:
• in [30, 31, 32] the model is fitted to the plague in Eyam (in the year 1666);
• in [54] the model is fitted to an influenza epidemic in England and Wales;
• in [118] a fitting is done with simulated noisy data;
• in [127, 30], in a chapter devoted to fitting epidemiological models to data, a SIR model is fitted to an influenza outbreak in an English boarding school.
More recently, two publications [14, 124] revisit the fit of the Kermack-McKendrick SIR model to the plague in Bombay. As already mentioned, before attempting to adjust parameters, an identifiability analysis should be performed.

3.1. The different forms of the SIR model. The original model [112] is

(3.1) Ṡ = −β̂ S I, İ = β̂ S I − γ I, Ṙ = γ I,

where S, I, R represent respectively the numbers of susceptible, infectious and removed individuals. This model can also be found in the slightly different form

(3.2) Ṡ = −β S I/N, İ = β S I/N − γ I, Ṙ = γ I,

where N = S + I + R is the total population. Obviously, one can pass from one model to the other through β̂ = β/N. Both models are mathematically equivalent as long as N is constant. However, we stress that identifying β̂ does not allow one to estimate the parameters β and N separately. For instance, estimating β̂ and γ only (without knowing N or β) does not allow one to estimate the basic reproduction number R_0 = β/γ = β̂ N/γ.

3.2.
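The impossibility of separating β and N from β̂ = β/N can be checked numerically: two different pairs (β, N) with the same ratio produce exactly the same epidemic curve from the same initial state. A simulation sketch with arbitrary illustrative values (our own, not from the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, x, beta, gamma, N):
    # frequency-dependent form: only the ratio betahat = beta/N enters the dynamics
    S, I = x
    return [-beta * S * I / N, beta * S * I / N - gamma * I]

t_eval = np.linspace(0.0, 30.0, 301)
x0 = [990.0, 10.0]   # same observed initial state in both scenarios
gamma = 0.25

# two different (beta, N) pairs sharing the same betahat = 2.5e-4
# (different total populations N, hence different unobserved R(0))
sol1 = solve_ivp(sir, (0.0, 30.0), x0, args=(0.5, gamma, 2000.0),
                 t_eval=t_eval, rtol=1e-9)
sol2 = solve_ivp(sir, (0.0, 30.0), x0, args=(1.0, gamma, 4000.0),
                 t_eval=t_eval, rtol=1e-9)

# identical epidemic curves: beta and N cannot be estimated separately
print("max |I1 - I2| =", np.abs(sol1.y[1] - sol2.y[1]).max())
```

However well the data are fitted, any (β, N) pair along the line β = β̂N explains them equally well.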
Observability and identifiability of the SIR model. Quite surprisingly, the observability and identifiability of the original Kermack-McKendrick SIR model have not been studied much, although this model is commonly used to model epidemics. Interestingly, the observability and identifiability of the SIR model with births and deaths, constant population, and an observation y = kI, were first studied in 2005 [77], on a SIR model with demography where µ is the renewal rate of the population. The article [77] concludes that this system is neither observable nor identifiable. In [188] the identifiability of (3.2) is addressed assuming (i) that the initial conditions (and therefore N = S(0) + I(0) + R(0)) are known, and (ii) that y = kI is observed with k = 1, using only the input-output relation to conclude. Under assumptions (i) and (ii), identifiability is quite immediate, as we shall see, but of limited interest. Consider the SIR model

(3.4) Ṡ = −β S I/N, İ = β S I/N − γ I, y = k I.

The last equation has been omitted since R = N − S − I. The observation is y = kI; in other words, only a fraction of the infectious individuals is observed. This situation is used for example in [124, 163].

Theorem 3.1. The variables and parameters to be estimated are S, I, N, β, γ, and k. System (3.4) is neither observable nor identifiable. The quantities kI, βS/N, β/(kN), and γ are identifiable. In particular, if N is known and if k = 1 or k = γ, the system is identifiable and observable. If k = γ then S, I, γ, and β/N are identifiable.

Remark 3.2. One could believe that if k = γ, with N unknown, then (3.1) is observable. This is wrong. Certainly S and I are observable, but not R. Therefore N = S + I + R is inaccessible. As a consequence, R_0 = β̂N/γ is not identifiable.

Theorem 3.1 can be obtained from [77] by setting µ = 0. However, we provide a short and elementary proof.

Proof. We will show that some parameters and state functions can be expressed from the observation and its derivatives [68, 67]; otherwise identifiability does not hold. We consider the system on the open set U = {(S, I) : S > 0, I > 0}, which is positively invariant.
At any equilibrium point (S_0, 0) the system is not observable. Therefore we assume y ≠ 0 and also, for the same reason, S ≠ 0. We then compute (with S ≠ 0 and I ≠ 0) the successive derivatives of y: all of them can be expressed as rational expressions of kI, β/(kN), βS/N and γ. Therefore the only information obtained from the observation is kI, β/(kN), βS/N, γ, which are identifiable functions. Now, if N is known with k = 1 or k = γ, the parameters β, γ and the states S, I, R are rational functions of (y, ẏ, ÿ), hence the system is observable and identifiable.

3.2.1. Using input-output relations. We proceed by elimination. From y = kI one has ẏ/y = βS/N − γ; differentiating once more eliminates S and yields the input-output relation

y ÿ − ẏ² + (β/(kN)) y² (ẏ + γ y) = 0.

Its monic coefficients involve β/(kN) and γ and, with the notation of the previous proof, this shows again that kI, β/(kN), βS/N, γ are identifiable.

3.2.2. Using the Jacobian. Let Φ : (S, I, β, γ, k) → (y, y^(1), y^(2), y^(3), y^(4)) and denote by Jac Φ = ∂Φ/∂(S, I, β, γ, k) the Jacobian of Φ. Then det(Jac Φ) = 0, which proves that the system is neither identifiable nor observable. On the other hand, with Ψ : (S, I, β, γ) → (y, y^(1), y^(2), y^(3)) and denoting by Jac Ψ = ∂Ψ/∂(S, I, β, γ) the Jacobian of Ψ, we have det(Jac Ψ) = −k⁴ β⁴ S I⁶ / N⁵ ≠ 0. This proves, with N and k known, local observability and identifiability.

3.2.3. SIR with cumulative incidence. Very often, the observations are the cumulative numbers of infectious individuals. We study how this changes the observability and identifiability of the SIR model. We consider system (3.4) where the observation is now given by the cumulative incidence

y(t) = k ∫₀ᵗ β S(s) I(s)/N ds.

This problem has been addressed for the SIR model with demography and constant population in [77]. Identifiability with known initial conditions for (3.4) is also considered in [188] using input-output relations.

Theorem 3.3. The system (3.4) with cumulative incidence observation is neither observable nor identifiable. The quantities kS, kI, γ, and β/(kN) are identifiable. When N is known and k = 1 or k = γ, the system is observable and identifiable.

Proof.
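Both determinant claims can be verified with a computer algebra system. The sketch below reproduces the computation for the model with observation y = kI: it forms y, …, y⁽⁴⁾ by Lie differentiation and evaluates the two Jacobians at a generic rational point in exact arithmetic (the 5×5 determinant vanishes identically; the 4×4 one does not):

```python
import sympy as sp

# symbols: the compartment I, not sympy's imaginary unit
S, I, beta, gamma, k, N = sp.symbols('S I beta gamma k N', positive=True)
f = {S: -beta * S * I / N, I: beta * S * I / N - gamma * I}  # SIR, R omitted

def dot(expr):
    # time derivative along the dynamics; parameters are constants
    return sum(sp.diff(expr, v) * rhs for v, rhs in f.items())

ys = [k * I]                              # observation y = k I
for _ in range(4):
    ys.append(sp.expand(dot(ys[-1])))     # y', y'', y''', y''''

# evaluate at a generic rational point (exact arithmetic, no rounding)
pt = {S: 3, I: 2, beta: sp.Rational(1, 2), gamma: sp.Rational(1, 4),
      k: sp.Rational(3, 4), N: 10}

J5 = sp.Matrix(ys).jacobian([S, I, beta, gamma, k]).subs(pt)
print("det wrt (S, I, beta, gamma, k):", J5.det())   # 0: not identifiable

J4 = sp.Matrix(ys[:4]).jacobian([S, I, beta, gamma]).subs(pt)
print("det wrt (S, I, beta, gamma):  ", J4.det())    # nonzero: OK if k, N known
```

The 5×5 determinant is zero at every point because all derivatives of y are functions of the four quantities kI, βS/N, β/(kN), γ, so the Jacobian can never reach rank 5.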
A straightforward computation gives ẏ = k β S I/N. Differentiating this relation, when ẏ² − y ÿ ≠ 0, one can express kS, kI, γ and β/(kN) from y and its first derivatives. Since the zeroes of an analytic function are isolated, it remains to prove that ẏ² − y ÿ ≢ 0. If ẏ² − y ÿ ≡ 0, then d/dt (ẏ/y) = 0 and y would be exponential in time, which is incompatible with y(0) = 0 and y ≢ 0; this proves our claim.

4. Observers synthesis. So far we have studied observability as the property that a measured "signal" y(·) allows one to reconstruct the state of the system uniquely, by studying the information provided by the successive derivatives of y(·). Formally, when the system is observable and we know (perfectly) enough derivatives of y(·) at a given time t, we just have to invert the map x ↦ (h(x), L_f h(x), · · ·) at (y(t), ẏ(t), ÿ(t), · · ·) to reconstruct the state variable x(t). In practice, it is known that numerically calculating derivatives of raw signal data is imprecise and sensitive to measurement noise, especially if several successive derivatives have to be determined. It is generally preferable to use a "filter" to smooth the data, for example with polynomial functions or splines that approximate the measurements obtained over time with more regularity, on which the derivative calculations can be performed before the inversion operation. In this section, we study the construction of observers and their theoretical convergence, without considering their practical performance in the presence of measurement noise. This point will be addressed in the more practical Section 5. Observers for the dynamics ẋ = f(x) with observation y = h(x), of the form

x̂̇ = f(x̂) + G (y − h(x̂)),

where G is a constant matrix, are called Luenberger observers [122]. Note that this construction consists in a copy of the original dynamics f plus a correcting term which depends on the innovation, that is the difference between the expected output ŷ = h(x̂(t)), if the true state were x̂(t), and the effectively measured output y(t). Therefore, if x̂(t) and x(t) are equal at a certain time t, they remain identical at any future time.
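The remark on numerical differentiation can be illustrated by a small experiment: differentiating noisy data by finite differences amplifies the noise, while differentiating a smooth polynomial fit of the same data does not (a toy signal of our own, with 1% additive noise):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)
y_true = np.sin(2 * np.pi * t)
y_noisy = y_true + rng.normal(0.0, 0.01, t.size)   # 1% measurement noise
dy_true = 2 * np.pi * np.cos(2 * np.pi * t)

# naive finite differences amplify the noise (dividing by a small step) ...
dy_naive = np.gradient(y_noisy, t)

# ... while differentiating a smooth polynomial fit of the data does not
coeffs = np.polyfit(t, y_noisy, deg=7)
dy_fit = np.polyval(np.polyder(coeffs), t)

err_naive = np.abs(dy_naive - dy_true).max()
err_fit = np.abs(dy_fit - dy_true).max()
print(f"max derivative error, naive: {err_naive:.3f}, smoothed fit: {err_fit:.3f}")
```

The effect worsens quickly for second and third derivatives, which is precisely why observers integrate the signal instead of differentiating it.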
The main point is that the matrix G has to be chosen such that the estimation error x̂(t) − x(t) converges to 0. When f and h are linear and the system is observable, the theory of linear automatic control teaches us that there always exists a G such that the convergence speed of the estimator can be chosen arbitrarily fast (see e.g. [8]). Obviously, epidemiological models are rarely linear. However, looking for a Luenberger observer is often a first attempt before considering more sophisticated estimators. Indeed, we shall see that for certain nonlinear dynamics such observers do the job, and in other cases observers can be inspired by this form. Let us begin with some simple cases of observers for particular dynamics, and then we shall present a more general framework. The choice of the gain vector G providing convergence of e(t) to 0 comes directly from the pole placement technique of the theory of linear systems, which we recall below. The following lemma is well known and often used in automatic control. Specifically, if π_A(ξ) = ξ^n + a1 ξ^(n−1) + · · · + a(n−1) ξ + an is the characteristic polynomial of A, then one has an explicit formula for G in terms of the ai, where the σk designate the symmetric functions of the roots λ1, . . . , λn. Remark. This result can be generalized to vectorial observations, i.e. for matrices C with m > 1 rows and n columns. Finally, by choosing numbers λi with negative real parts, one can make the convergence of the error, given by the exponentially decreasing dynamics (4.3), as fast as desired. The observer (4.2) is thus adjustable. It should be noted that when the difference ẑ1(t) − y(t) (usually called the "innovation") becomes and remains close to 0, the trajectories of the observer follow those of the system: we can then consider that the observer has practically converged. The innovation is thus very useful in practice because it provides information on the current state of convergence of the estimate. We illustrate this technique on a population model with age classes. Example 4.2.
Let us consider a population structured in three stages: young, subadult and adult, of stocks x1, x2, x3 respectively. It is assumed that only adults x3 can reproduce, giving birth to young x1. The coefficients ai are the transition rates between age classes, mi are the mortality rates of each class, and r(·) is the reproduction function (usually non-linear and seasonally dependent). Here, it is assumed that only the size of the adult class is measured over time. The aim is to estimate the stocks of young and subadults over time. The model (4.5) is of the form (4.1), where we set the matrices accordingly. One can check that the observability matrix O is full rank, or alternatively directly check that the system is observable. Indeed, using the expression of ẋ3, and then that of ẋ2, one recovers x2 and then x1 from y and its derivatives. Therefore, the following system, with well-chosen numbers G1, G2, G3, is an observer for the dynamics (4.5), with exponential convergence. It may also happen that the estimation error of an observer is only partially assignable, as we shall see in the next example. Observers via change of coordinates. Consider, as in Section 2.4, systems for which there exists a change of coordinates x → w = g(x, h(x)) such that one has ẇ = l(w, h(x)) for any solution x(·), with the properties stated there, where g, g̃, l and k are smooth maps. Then, one can look for an observer ŵ(·) of the system ẇ = l(w, h(x)), y = k(w), and take, as an estimator of x(·), (4.7) x̂(t) = g̃(ŵ(t), y(t)). There is an advantage in considering such a change of coordinates when the maps g, g̃, l and k are independent of a parameter θ present in the expression of f, as in Proposition 2.16 of Section 2.4. However, the estimator (4.7) does not filter the measurement y(·) and might be sensitive to noise. Let us illustrate this approach on the malaria model (Example 2.17): ẇ = Āw + ĀEy + Λe1, y = Cx = Cw, and one can consider the following observer for system (2.12), where L is a vector in R^7 to be chosen.
The dynamics of the error e(t) = x̂(t) − x(t) is given by ė = (Ā − LC)e. Note that one has CĀ = 0. Therefore the rank of the observability matrix of the pair (Ā, C) is equal to one, and 0 is an eigenvalue of Ā. The choice of L then allows one to assign only one eigenvalue of Ā − LC, equal to −(L2 + L3), the other eigenvalues remaining negative. Therefore (4.8) is an observer for system (2.12) with exponential convergence, that does not use the unknown parameter β. Remark 4.4. Unlike the observers of Section 4.1, one cannot expect a convergence speed of the observer (4.8) faster than that of the dynamics (2.11), because the error dynamics is not completely assignable. However, the convergence is exponential. This is illustrated with numerical simulations in Section 5.2.2. A typical situation is when one can operate a state decomposition as follows: 1. decompose the state vector x (possibly at the price of a change of variables), where xu represents the unmeasured variables; 2. look for an auxiliary variable (that we call z) whose dynamics ż = g(z, y) is independent of xu and asymptotically stable, and such that xu can be globally expressed as xu = l(z, y), where l is a smooth map (say C^1). Then, the dynamics ż̂ = g(ẑ, y(t)), x̂u = l(ẑ, y(t)) is an asymptotic observer, whose error convergence x̂u − xu → 0 is simply provided by the asymptotic convergence of ẑ − z to 0, whatever the initial condition ẑ(0). When the convergence speed of an estimator cannot be adjusted, it is usually called an asymptotic observer, in contrast to the previous section, for which the error convergence can be made arbitrarily fast. Note that, unlike in the previous section, these observers have no tuning parameters and are not driven by innovation terms. These estimators are reduced-order observers when the variable z is of lower dimension than x. An advantage of such observers is that they can possess robustness features when the maps g and l are independent of some terms or parameters of the dynamics ẋ = f(x).
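The state-decomposition scheme above can be sketched numerically on a SIR model with unknown time-varying rates. The measurement model (y1 = I and y2 = ρ(t)I, the rate of newly cured individuals), the particular fluctuation laws and all numerical values are illustrative assumptions; the point is that the auxiliary state Z estimates R through Ż = y2 − µZ, so the error obeys ė = −µe, and Ŝ = N − y1 − Z, without any knowledge of β(·) or ρ(·).

```python
import numpy as np

N, mu = 10_000.0, 0.05          # total population, birth = death rate
dt, T = 0.01, 200.0

# Fluctuating rates, unknown to the observer (illustrative waveforms)
def beta(t): return 0.3 + 0.2 * np.sin(0.10 * t)
def rho(t):  return 0.08 + 0.04 * np.sin(0.07 * t + 1.0)

S, I, R = 7000.0, 1000.0, 2000.0   # true state; S and R initially unknown
Z = 0.0                            # observer internal state, estimates R

for k in range(int(T / dt)):
    t = k * dt
    y1, y2 = I, rho(t) * I         # measured: infected stock, cure rate
    # reduced-order asymptotic observer: dZ/dt = y2 - mu*Z
    Z += dt * (y2 - mu * Z)
    # true SIR dynamics with demography (constant total population N)
    dS = mu * N - beta(t) * S * I / N - mu * S
    dI = beta(t) * S * I / N - rho(t) * I - mu * I
    dR = rho(t) * I - mu * R
    S += dt * dS; I += dt * dI; R += dt * dR

S_hat = N - I - Z                  # estimate of S from Z and y1
print(abs(Z - R), abs(S_hat - S))  # both small after t = 200
```

The convergence rate is exactly µ and cannot be tuned, matching the "asymptotic observer" terminology, but the estimate is insensitive to the fluctuations of β and ρ.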
Let us illustrate this feature on the Kermack-McKendrick model with fluctuating rates, where the parameters β and ρ fluctuate unpredictably over time. We assume, for simplicity, that the birth rate ν is equal to the death rate µ, so that the total population remains constant of size N = S + I + R (assumed to be known). Let us suppose that the size of the infected population is monitored over time, as well as the number of newly cured individuals, which amounts to considering the corresponding observation vector at time t. The stocks of classes S and R are not initially known. Then, the system (4.10) is an observer allowing one to estimate S and R without knowing β(·) and ρ(·). Indeed, the dynamics of the estimators ensures the convergence of the estimates Ŝ and R̂. Note that the internal dynamics of the observer is here of smaller dimension than the system, and that the estimate of the unmeasured state variable S is a function of the internal state Z of the observer and the observation y1. The speed of convergence of this observer is not adjustable, but it has the advantage of being perfectly robust to any (unknown) variations of the terms β(·) and ρ(·). This is illustrated with numerical simulations in Section 5.2.3. In a general way, we will retain that an observer for a system ẋ = f(t, x), y = h(t, x) is an input-output system (input: y, output: x̂) of the form ξ̇ = g(t, ξ, y(t)), x̂ = l(t, ξ, y(t)), such that the coupled system verifies the property lim t→+∞ ||l(t, ξ(t), h(t, x(t))) − x(t)|| = 0 for any initial conditions x(0), ξ(0). In the two previous examples, the dynamics of the estimation error was linear. For an observable non-linear system, the existence of an observer whose estimation error is linear is not guaranteed. This is a difficult problem. However, one can consider the (nonlinear) observability canonical form [85] (given here for a scalar output), where the map ψ is Lipschitz on R^n.
Then, one can show that there exists an observer of the Luenberger form with exponential convergence when G is a well-chosen gain vector. When an observable system is not in normal form, but the map φ is a diffeomorphism from R^n into R^n and the map ψ(z) := L_f^m h ∘ φ^(−1)(z) is Lipschitz on R^n, then the observer (4.13) can be written in the x coordinates, where Jφ(x) denotes the Jacobian matrix of φ at x. The observer preserves the Luenberger structure but with variable gains. Let us first note that the pair (A, C) as defined in (4.11)-(4.12) is observable. Indeed, we have O = Id. Thus, according to Lemma 4.1, one can freely assign the spectrum of A + GC by choosing the vector G. We now show how to choose the eigenvalues of A + GC to ensure the convergence of the non-linear observer (4.13). To do this, we begin by giving some properties of the Vandermonde matrices. Moreover, for any C > 0 and θ > 0, there exist λn < λ(n−1) < · · · < λ1 < 0 such that the stated inequality holds. The proof of Lemma 4.6 is given in Appendix B. We are now ready to show the convergence of the observer (4.13) in the coordinates z, for a gain vector G such that A + GC has n distinct eigenvalues λ1, ..., λn with negative real parts. Denote the error e = ẑ − z. We have ė = (A + GC)e + B(ψ(ẑ) − ψ(z)). Let ξ = Ve, where V designates the Vandermonde matrix V(λ1,···,λn). Thanks to Lemma 4.6, we obtain the dynamics of ξ, where ∆ is the diagonal matrix diag(λ1, · · · , λn). Multiplying on the left by ξᵀ, one obtains a differential inequality, where L is the Lipschitz constant of ψ. Thus the norm of ξ satisfies it and, by Gronwall's Lemma, we obtain the bound. Finally, for any θ > 0, Lemma 4.6 gives the existence of numbers λn < λ(n−1) < · · · < λ1 < 0 such that ||ξ(t)|| ≤ ||ξ(0)|| e^(−θt) for any t > 0, which guarantees the exponential convergence of the error e to 0. The observer (4.13) with the gain vector Gθ is called a high-gain observer [85], because the value of θ must be "sufficiently" large, and its successive powers might take large values. Remark 4.7.
In practice, the map ψ is not necessarily globally Lipschitz on R^n. Nevertheless, if there exists a compact subset K of R^n that is forward invariant by the dynamics (4.11), one can consider an extension of ψ outside K that is globally Lipschitz on R^n, and then define the observer on the whole of R^n (see [152]). Let us now illustrate this construction on the Kermack-McKendrick model. It is also assumed that the size of the total population N = S + I + R is known. To put the system in canonical form, we write the variables z accordingly. We can then reconstruct the stocks I, S and R from the variables z. Note that ψ is not globally Lipschitz on R^3, and has a singularity at z2 = 0. Nevertheless, we notice that the term z3/z2 can be bounded, and one then has a bound on ż. We can therefore consider a modified expression ψ̃ obtained by saturating this term with min(b, ·). Finally, we choose the gains Gi of the observer such that Sp(A + GC) = {λ1, · · · , λn} with λn < λ(n−1) < · · · < λ1 < 0 and λ1 sufficiently negative. This amounts to taking Gi = σi({λ1, · · · , λn}). One thus obtains the internal dynamics of the observer. 4.5. Discussion. The construction of an observer can, in certain situations, avoid the need to study identifiability. For instance, in Sections 4.2 and 4.3, the observers do not require the knowledge of all the parameters of the model, nor even of some functions involved in the model. This is why such observers are also called unknown-input observers. The theory of unknown-input observers has mainly been studied for linear systems [99, 39, 139]. Very few general results are available for nonlinear systems (this research field is today largely open).
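The gain choice for the canonical form (4.11) (A the upper shift matrix, y = z1) can be checked numerically. In the small sketch below the gains are obtained from the coefficients of the desired characteristic polynomial ∏(ξ − λi), which agree with the symmetric functions σi up to signs; the sign convention used here is the one that makes the characteristic polynomial of A + GC match, and the eigenvalue set {−θ, −2θ, −3θ} is an illustrative high-gain choice.

```python
import numpy as np

n = 3
A = np.diag(np.ones(n - 1), 1)   # canonical form: z1' = z2, z2' = z3, ...
C = np.zeros(n); C[0] = 1.0      # scalar output y = z1

def high_gain(poles):
    # With this (A, C), the characteristic polynomial of A + G C is
    # x^n - g1 x^(n-1) - ... - gn, so matching prod(x - lambda_i)
    # gives G as minus the non-leading coefficients.
    return -np.poly(poles)[1:]

theta = 5.0
poles = [-theta, -2 * theta, -3 * theta]
G = high_gain(poles)
print(G)                          # gains grow like theta, theta^2, theta^3
print(np.sort(np.linalg.eigvals(A + np.outer(G, C)).real))  # ~ [-15, -10, -5]
```

The growing powers of θ in the gain entries illustrate why this construction is called "high gain": pushing the spectrum further left inflates the gains polynomially, which is what amplifies measurement noise in practice.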
The existence of observers without the possibility of fixing the speed of convergence arbitrarily, as in Examples 4.3 and 4.5, is connected to the property of detectability (see for instance [135, 9]), which is weaker than observability: a system (S) is detectable if, for any pair of solutions xa(·), xb(·) of (S) producing the same output, one has xa(t) − xb(t) → 0 as t → +∞. For linear dynamics ẋ = Ax, y = Cx that are not observable, there exists a Kalman decomposition [109], i.e. an invertible matrix P such that, where l < n is the rank of the observability matrix O recalled in (2.4), the subsystem ż = A11 z, y = C1 z is observable. Then, the system is detectable when the matrix A22 is Hurwitz. This is exactly the case of Example 4.3 with l = 1. 5.1. Practical identifiability. Until now we have studied structural identifiability/observability. While structural identifiability is a property of the model structure, given a set of outputs, practical identifiability is related to the actual data. In particular, it is a measure of the amount of information contained in those data. A model can be structurally identifiable, but still be practically unidentifiable due to poor data quality, e.g., bad signal-to-noise ratio, errors in measurement or sparse sampling [155]. Structural identifiability means that parameters are identifiable with ideal (continuous, noise-free) data. While structural identifiability is a prerequisite for parameter identification, it does not guarantee that parameters are practically identifiable with a finite number of noisy data points. Moreover, parameter estimation requires using numerical optimization algorithms. The distance, for the problem considered, to the nearest ill-posed problem [60, 96], i.e., the conditioning of the problem, can challenge the convergence of algorithms. Another source of practical unidentifiability is a lack of information in the data, i.e., the signal from the data does not satisfy persistence of excitation [121].
This is the case when the observation is near an equilibrium [19]. In this section we use sensitivity analysis and results from asymptotic statistical theory to study practical identifiability. We refer to previous surveys and papers on the topic [45, 17, 18, 55, 20]. Our purpose here is to give an intuitive account of these techniques. 5.1.1. Rationale for using sensitivity analysis. Practical identifiability is often assessed in terms of confidence intervals on parameters [203]. Confidence intervals can be derived from the Fisher Information Matrix (FIM) [27]. More specifically, the covariance matrix Σ of the estimated parameters may be approximated by the inverse of the FIM. The diagonal elements of Σ ≈ FIM^(−1) correspond to the variances of the parameter estimates. Their square roots (the standard deviations) give confidence intervals on the parameters, thus providing information on practical identifiability. In the least-squares framework, the Fisher Information Matrix can be expressed in terms of sensitivity matrices, which we define below. We consider that the initial condition x0 is unknown. Unless otherwise specified, the term "parameter" now refers to both the parameter θ and the initial condition x0, i.e. Θ = (θ, x0). We make explicit the dependence of the state variables x and y on Θ to clarify the following derivations, with x ∈ R^n, y ∈ R^m and θ ∈ R^p. We wish to quantify how the observed variable y(t, Θ) changes for a small parameter variation ∆Θ. We denote the Jacobian of the observation y(t, Θ) with respect to the parameter Θ as χ(t, Θ). This m × (n + p) matrix is called the sensitivity matrix. By linearization (first-order Taylor approximation), one can write ∆y(t, Θ) ≈ χ(t, Θ) ∆Θ. Side remarks. Reid [158] defined a parameter vector as "sensitivity identifiable" if the above equation can be solved uniquely for ∆Θ.
This linear problem is well known: if χ has maximal rank, then the solution is given by means of the Moore-Penrose pseudo-inverse χ⁺ = (χᵀχ)^(−1) χᵀ: ∆Θ = χ⁺(t, Θ) ∆y(t, Θ). It is also well known [86] that the sensitivity of this solution is governed by the condition number κ2(χ) = σmax/σmin, with σmax and σmin respectively the greatest and smallest singular values of χ (whose squares are the eigenvalues of χᵀχ). Ordinary Least Squares. Now we consider a set of M observations Yi, i = 1, . . . , M, that have been obtained at times ti. We assume that the observation is given by the model output plus an error Ei, assumed to be a random variable satisfying the following assumptions: • the errors Ei have mean zero, E[Ei] = 0; • the errors have a constant variance, var(Ei) = σ²; • the errors are independent and identically distributed. The Fisher Information Matrix, for the observations defined above, is defined accordingly. Solving the ordinary least squares (OLS) equations gives an estimator Θ̂_OLS of the parameter Θ. Even though the error's distribution is not specified, asymptotic statistical theory can be used to approximate the mean and variance of the estimated Θ (a random variable) [170, 20]; the bias-adjusted approximation for σ² uses n + p "parameters". 5.1.5. Confidence intervals. The above approximation of the error variance can be used to further approximate the parameter covariance matrix Σ. The standard error (SE) for Θ̂_OLS can be approximated by taking the square roots of the diagonal elements of the covariance matrix Σ, for all k = 1, . . . , n + p. Finally, to compute the 95% confidence interval for the k-th component of the parameter vector Θ̂_OLS with n + p "parameters", one may use the Student's t-distribution with M − (n + p) degrees of freedom. From these formulas it appears that the conditioning of the Fisher Information Matrix plays an essential role.
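The pseudo-inverse and the condition number discussed above can be sketched in a few lines; the sensitivity matrix below is a random illustrative stand-in, not one computed from an epidemiological model.

```python
import numpy as np

rng = np.random.default_rng(0)
chi = rng.normal(size=(8, 3))            # illustrative 8x3 sensitivity matrix

# Moore-Penrose pseudo-inverse in the maximal-column-rank case
chi_plus = np.linalg.inv(chi.T @ chi) @ chi.T
assert np.allclose(chi_plus, np.linalg.pinv(chi))

# Condition number kappa_2 = sigma_max / sigma_min from the singular values
s = np.linalg.svd(chi, compute_uv=False)
kappa = s.max() / s.min()
assert np.isclose(kappa, np.linalg.cond(chi, 2))

# With exact data, Delta-theta is recovered exactly; noise in Delta-y is
# amplified by up to kappa in relative terms
dtheta = chi_plus @ (chi @ np.ones(3))
print(np.round(dtheta, 6), round(float(kappa), 2))
```

A large κ2(χ) thus signals that the linearized estimation problem is ill-conditioned even when χ has full rank, which is the practical-identifiability warning the text describes.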
Huge confidence intervals give indications about the lack of practical identifiability. 5.1.6. Computing the sensitivity matrix. The sensitivity matrix χ(t, Θ), with Θ = (θ, x0), is obtained by integrating an ODE. The components of the ODE to be integrated depend on whether one differentiates with respect to θ or x0. Differentiating with respect to θ. The first part of the ODE is given by the chain rule. The Jacobian ∂h/∂x is an m × n matrix while ∂h/∂θ is an m × p matrix. We then have to compute the n × p matrix z = ∂x/∂θ. Let A(t, Θ) and B(t, Θ) be the time-dependent n × n and n × p matrices ∂f/∂x and ∂f/∂θ, respectively. It is well known [93] that z(t) is the solution of the linear matrix equation ż = Az + B, with the initial condition z(0, Θ) = 0_(n×p) (a zero matrix of size n × p). Differentiating with respect to x0. The second part of the ODE concerns w = ∂x/∂x0. Based on the same reference [93], w(t, Θ) is the solution of the linear matrix ODE ẇ = Aw, with the initial condition w(0, Θ) = Id_(n×n) (the identity matrix of size n). Full system. To summarize, one has to solve a system of ODEs of dimension n + np + n². For large systems, the computation of the different Jacobians can be prohibitive; in this case automatic differentiation software has to be used. In this section, we consider two classical examples as case studies. These examples have been used in many books of mathematical epidemiology, e.g. [137]. Case 1. Influenza in a boarding school. Our first example is an outbreak of influenza in a United Kingdom boarding school which occurred in 1978 [38]. In [137] the parameters β, γ are identified by an unspecified "best-fit" algorithm. A more complete analysis, using sensitivity analysis and asymptotic statistical theory, is done in [30]. In [127] the same example is considered. Different sources exist for the data [31, 30, 59], with small differences. Using the figure in [38] and the Plot Digitizer software, we obtained an approximation of the data.
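The sensitivity system ż = Az + B and the FIM-based standard errors can be sketched on a boarding-school-like SIR setting (model (3.4) with k = 1, N = 763, S0 = 762, I0 = 1, with known initial conditions so only θ = (β, γ) is estimated). The parameter values echo the estimates reported below, but the noise level σ and the daily sampling are illustrative assumptions, so the numbers produced are not those of the paper.

```python
import numpy as np

# Boarding-school-like SIR: theta = (beta, gamma), initial conditions known.
N, S0, I0 = 763.0, 762.0, 1.0
beta, gamma, sigma = 1.9605, 0.4752, 5.0
dt, days = 1e-3, 14
per_day = int(round(1.0 / dt))

S, I = S0, I0
z = np.zeros((2, 2))               # z = dx/dtheta, zdot = A z + B, z(0) = 0
rows = []                          # chi(t_i) = dI/d(beta, gamma), sampled daily
for k in range(days * per_day + 1):
    if k % per_day == 0:
        rows.append(z[1].copy())   # y = I, so chi is the second row of z
    A = np.array([[-beta * I / N, -beta * S / N],
                  [ beta * I / N,  beta * S / N - gamma]])   # df/dx
    B = np.array([[-S * I / N, 0.0],
                  [ S * I / N, -I ]])                        # df/dtheta
    z = z + dt * (A @ z + B)       # Euler step of the sensitivity ODE
    S, I = S + dt * (-beta * S * I / N), I + dt * (beta * S * I / N - gamma * I)

chi = np.array(rows)               # M x p stacked sensitivities
FIM = chi.T @ chi / sigma**2
cov = np.linalg.inv(FIM)           # Sigma ~ FIM^(-1)
se = np.sqrt(np.diag(cov))         # standard errors on (beta, gamma)
print(se, np.linalg.cond(FIM))
```

The same loop extended with ẇ = Aw, w(0) = Id would handle unknown initial conditions, as in the Bombay plague example treated in Appendix D.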
It was reported that N = 763, and the conditions at the start of this outbreak were S0 = 762 and I0 = 1. We used the following data, in which time t is in days and i(t) denotes the number of infectious people at time t. Specifically, we considered model (3.4) with k = 1 (all infectious individuals are assumed to be observed). We obtained the following OLS estimation, as given by the Scilab software: β ≈ 1.9605032, γ ≈ 0.4751562 (see the numerical code in Appendix C). The fit is shown in Figure 1. We computed the 95% confidence intervals using the formulas given in the preceding section, (5.2), (5.3), (5.4), (5.5), (5.6), up to a few changes due to the fact that the initial conditions x0 = (S0, I0) are assumed to be known in this example (see Appendix C). We find: β ≈ 1.9605032 ± 0.0731602, γ ≈ 0.4751562 ± 0.0408077. One can obtain approximately the same results using the likelihood profile method to compute confidence intervals [27], assuming normally distributed errors. However, it is well known that the profile method quickly becomes impractical for models with more than two parameters [27], which is the rule rather than the exception, as will be the case in the following example. This is why we stick to the FIM method. Lastly, we note that the condition number of the FIM is approximately equal to 3.80 in this example. Case 2. Plague in Bombay. Our second example is the Bombay plague of 1905-1906 [112]. We collected the data from [53, Table IX], over the same period as [112] (Dec. 17 to Jul. 21). The form of the data is presented in the following table, in which time t is in weeks, and ṙ(t) denotes the number of deaths per week at time t. We consider that the number of deaths per week is the same as Ṙ(t) = γI(t), meaning that all infections lead to death, which is a reasonable assumption in this context [14]. Therefore, we consider model (3.4) with k = γ.
In this example, not only the parameters β and γ, but also the size of the population N, as well as the initial conditions S0 and I0, are unknown [14]. According to Theorem 3.1 (and Remark 3.2), the model is neither observable nor identifiable. However, the model is partly identifiable in the sense that S0, I0, γ and β̄ = β/N are structurally identifiable. Starting from an arbitrary initial guess, we obtained the OLS estimation given by the Scilab software. Proceeding as in the previous example, we obtained the following 95% confidence intervals: β̄ ≈ 0.0000855 ± 0.0015784, γ ≈ 3.7161743 ± 25.255243, S0 ≈ 48113.13 ± 593794.26. The confidence intervals are huge, which means that we can have absolutely no confidence in the estimated values of the parameters, even though the fit looks good and these parameters are structurally identifiable in principle. In practice, one can show that many other, very different, combinations of the parameters yield approximately the same fit. Note that if we treated the initial conditions as known, the confidence intervals on β̄ and γ would be reasonable, as in the previous example. In this example, the condition number of the FIM is approximately 9.14 × 10^24, meaning that the problem is "sloppy" [42]; note however that while it is usual for a model to be both sloppy and practically non-identifiable, this is not always the case [52]. Altogether, we can conclude that there is a severe practical identifiability issue in this classical example. 5.1.8. Discussion. The SIR model of Kermack-McKendrick has been studied in a series of papers [46, 45, 47, 35] where the problem of observability/identifiability is approached from the statistical point of view: addressing parameter identifiability by exploiting properties of both the sensitivity matrix and uncertainty quantification in the form of standard errors. In this series of papers, structural observability and identifiability were not explicitly addressed.
For example, in [47] the authors identify (S0, I0, β/N, γ) based on incidence observations, akin to equation (3.5) with k = 1, which we have proved to be structurally identifiable (see Theorem 3.3). Similarly, in [35] the authors seek to identify (S0, I0, β, γ), which, with N known, are structurally identifiable in principle. However, the authors encounter practical identifiability issues. This is a typical example of strictly practical unidentifiability (as in our second example, the plague in Bombay). Although a structural observability and identifiability analysis should be done as a prerequisite to a practical identifiability analysis, it does not suffice. Moreover, when doing practical identifiability analyses, the error structure of the data should be considered. For instance, sensitivity analyses can be extended to non-constant error variance through Generalized Least Squares (GLS), which makes it possible to test different ways of weighting errors [46]. An additional issue may occur when the output signal is not sufficiently informative (i.e., not persistently exciting [121]), for example when the data correspond to states near unobservability, e.g., near an equilibrium. In those cases, one has to wait to have data sufficiently far from equilibrium. To conclude, the problem of observability and identifiability, either structural or practical, is far from simple, even in relatively simple SIR models with seemingly good-quality data [46]. Of course, the more complex the model, the more parameters there are to identify, and the more serious the problem of identifiability becomes. In this section, we show how the various observers presented in Section 4 behave in practice, and the role of the tuning parameters. Up to now, we have assumed the measurements to be perfect, i.e. not tainted with any noise. Since integration has good "averaging" properties, an observer is expected to filter noise or inaccuracies in the measurements.
However, we will see that the filtering capacity of an observer is related to its convergence speed, which often leads to a "precision-speed" dilemma in the choice of the observer or its settings. Let us underline that when identifiability/observability cannot be proved theoretically, or is too difficult to prove analytically, one can still look for an observer and study its asymptotic convergence, theoretically or numerically. Observers with linear assignable error dynamics. We illustrate the observer (4.6) of the age-structured model (4.5) on simulations, for the following values of the parameters. The following code is used to compute the gain vector G for a set of desired eigenvalues. Figure 3 shows convergence for a moderately negative spectrum, while Figure 4 shows the acceleration of convergence obtained for a spectrum located further to the left in the complex plane. For the same choice of gains, Figures 5 and 6 show the effect of noise on the measurements y(·). It can be seen that a faster convergence is more sensitive to noise and loses accuracy. In practice, one often has to make a compromise in the choice of the observer's settings. The convergence of the observer (4.8) cannot be tuned as fast as desired. However, this is quite satisfactory in practice. Let us also underline that the observer does not require the reconstruction of the parameter β, although this parameter is identifiable (see Section 2.4). This is a strength of this observer, because the parameter β could switch or fluctuate with time. We illustrate on simulations the behavior of the asymptotic observer (4.10) of the SIR model with fluctuating rates (4.9), for the following values of the parameters. Here β and ρ are functions of time chosen randomly between the bounds given in the table. Figures 8 and 9 show that the observer's convergence is relatively insensitive to measurement noise, but the speed of convergence is slow because the exponential decay rate of the error is equal to µ, which is not adjustable.
Unlike the observers in the previous sections, let us underline that the present observer is not based on a ŷ − y innovation. Therefore, one is not informed of the quality of the estimate over time, which is the price to pay for an observer insensitive to unknown variations of the epidemic parameters β, ρ. High gain observer. The non-linear observer (4.15)-(4.15) of the classical SIR model (4.14) is illustrated on simulations for the following values: β = 0.4, ρ = 0.1, N = 10000, where the cumulative measures y were made discretely every day (rounded to the nearest integer). In order to obtain a time-continuous signal y(·), we performed an interpolation by cubic splines. Figure 10 shows the convergence of the observer for the eigenvalues {−2, −2.2, −2.4}. We also simulated the observer when the measurements are corrupted by random counting errors of up to ±5 individuals per day (see Figure 11). The proof is adapted from [8]. Let us first consider pairs (Ā, C̄) of the canonical form known as Brunovsky's form (ones on the subdiagonal, last column (−an, −a(n−1), . . . , −a1)ᵀ), where the ai are any numbers. Their observability matrices are lower triangular. It is easy to see that the characteristic polynomial of the matrix Ā is given by π_Ā(ξ) = ξ^n + a1 ξ^(n−1) + · · · + a(n−1) ξ + an. Indeed, if X is a left eigenvector of Ā for an eigenvalue λ (possibly complex), XĀ = λX gives X2 = λX1, . . . , Xn = λX(n−1) = λ^(n−1) X1, and −an X1 − a(n−1) X2 − · · · − a1 Xn = λXn. Thus the row vector X is of the form X = (1, λ, λ², · · · , λ^(n−1)) X1 with X1 ≠ 0, and λ verifies (λ^n + a1 λ^(n−1) + a2 λ^(n−2) + · · · + a(n−1) λ + an) X1 = 0. Since X1 is non-zero, we deduce that the eigenvalues are roots of the polynomial λ^n + a1 λ^(n−1) + a2 λ^(n−2) + · · · + a(n−1) λ + an = 0, which is of degree n and whose coefficient of λ^n is equal to 1.
The characteristic polynomial of the matrix Ā + ḠC̄, where Ḡ is a vector of R^n with elements denoted ḡi, is written as follows: π_(Ā+ḠC̄)(ξ) = ξ^n + (a1 − ḡn) ξ^(n−1) + · · · + (a(n−1) − ḡ2) ξ + (an − ḡ1). Thus, one can arbitrarily choose the n coefficients of this polynomial by choosing the n elements of Ḡ, and thus freely assign the spectrum of the matrix Ā + ḠC̄. For any set Λ = {λ1, · · · , λn} of n real, or complex pairwise conjugate, numbers, one has just to identify the coefficients of the polynomial π_(Ā+ḠC̄) with those of ∏i (ξ − λi). Let us now show that for any pair (A, C) such that O is full rank, there is an invertible matrix P such that P^(−1)AP = Ā and CP = C̄, where the pair (Ā, C̄) is in Brunovsky's form. Consider the vector L and the matrix P consisting of the concatenation of the columns L, AL, . . . , A^(n−1)L. The matrix OP then has a triangular structure, which shows that P is indeed an invertible matrix. Finally, the columns of the matrix AP are AL, A²L, . . . , A^n L, and its first n − 1 columns are written accordingly. By the Cayley-Hamilton Theorem, we have π_A(A) = 0, which allows us to write the last column of AP as A^n L = −an L − a(n−1) AL − · · · − a1 A^(n−1) L = P (−an, −a(n−1), . . . , −a1)ᵀ. The proof is adapted from [76]. Let X be a left eigenvector of A + GC for the eigenvalue λi. By writing X(A + GC) = λi X, we obtain n − 1 relations. Thus Xn is necessarily non-zero and can be taken equal to 1. We then obtain the n rows of the matrix V(λ1,···,λn), which defines a change-of-basis matrix that diagonalizes the matrix A + GC. The conditions (B.1) amount to writing Pj(λi) = δij, i.e. the polynomial Pj has the n − 1 roots λi for i ≠ j, and Pj(λj) is equal to 1, so it has the corresponding product expression. By identifying its coefficients with those of the expression (B.2), we obtain the entries in terms of the symmetric functions σk defined in (4.4).
Let ϕ(λ1, · · · , λn) = λ1 + C ||V^(−1)(λ1,···,λn)||∞ + θ. The expression (B.3) shows that the norm ||V^(−1)(λ1,···,λn)||∞ becomes arbitrarily large when λi − λj approaches 0 (for i ≠ j), which ensures the existence of numbers λn < λ(n−1) < · · · < λ1 < 0 such that ϕ(λ1, · · · , λn) > 0. For λi = −α^i (i = 1, · · · , n), we obtain, for any j, lim α→+∞ wij = 0 for i < n and 1 for i = n, and ||V^(−1)(−α,−α²,···,−α^n)||∞ thus tends towards 1 as α tends towards +∞, which shows the existence of numbers λn < λ(n−1) < · · · < λ1 < 0 such that ϕ(λ1, · · · , λn) < 0. Finally, by continuity of ϕ, we deduce the existence of λn < λ(n−1) < · · · < λ1 < 0 such that ϕ(λ1, · · · , λn) = 0. Appendix C. Implementation of the "Boarding School" example. C.1. Derivation of the Fisher Information Matrix. In this example, we consider x ∈ R^n with n = 2, y ∈ R^m with m = 1, i.e., x(t) = (S(t), I(t)), y(t) = I(t), and θ ∈ R^p with p = 2, i.e., θ = (β, γ). We consider the following model, equivalent to model (3.4) with k = 1. We disregard Θ = (θ, x0) since the initial conditions are assumed to be known in this example. The Jacobian of the observation with respect to the parameter θ is given accordingly; σ² is defined as the sum of the squared errors (SSE) divided by M − p instead of M − (n + p) as in equation (5.4), since the initial conditions are assumed to be known in this example. Computing Fisher's Information Matrix. The matrix z can be computed by numerically solving the following system of ODEs, which is a subsystem of (5.7) since the initial conditions are assumed to be known in this example (i.e., we disregard w). In the following code, the entries of x and z are indexed so that χ = (z4, z6). C.2. Numerical implementation. The code has been written in the Scilab language and executed under Scilab 6.0.0. It consists of a function for identifying β and γ, using the lsqrsolve function, which implements the Levenberg-Marquardt algorithm to perform ordinary least squares.
We could have chosen the fminsearch function, which is an implementation of the Nelder–Mead algorithm, but this gives exactly the same results. For solving ODEs, Scilab uses the lsoda solver of ODEPACK, which automatically selects between the non-stiff predictor-corrector Adams method and the stiff backward differentiation formula (BDF) method: it starts with the non-stiff method and dynamically monitors the data in order to decide which method to use. We define the following functions in the Scilab environment:

Appendix D. Implementation of the "Plague in Bombay" example.

D.1. Derivation of the Fisher Information Matrix. In this example, we consider x ∈ R^n with n = 2 and y ∈ R^m with m = 1, i.e.,

x(t) = (S(t), I(t))^T, y(t) = γ I(t),

and θ ∈ R^p with p = 2, i.e., with β̄ = β/N, θ = (β̄, γ)^T. We consider the same SIR model as in the previous example, written with β̄. We consider Θ = (θ, x_0) since the initial conditions are assumed to be unknown in this example. The Jacobian of the observation with respect to the parameter Θ is

χ(t) = ∂y/∂Θ (t).

This Jacobian has dimension m × (p + n) = 1 × 4. Let {t_i}, i = 0, 1, 2, . . . , M, be the sampling times. Fisher's Information Matrix is defined as

F = (1/σ^2) Σ_{i=0}^{M} χ(t_i)^T χ(t_i),

where σ^2 is defined as in equation (5.4). The FIM can be computed by numerically solving the following system of ODEs:

ẋ = f(x, θ), x(0) = x_0,
ż = A z + B, z(0) = 0_{n×p},
ẇ = A w, w(0) = Id_{n×n},

which repeats equation (5.7). In the following code, the entries of x, z, and w are indexed as

x = (x_1, x_2)^T, z = ( z_3 z_5 ; z_4 z_6 ), w = ( w_7 w_9 ; w_8 w_10 ),

leading to χ = (γ z_4, γ z_6 + x_2, γ w_8, γ w_10). Although the code is very similar to the one provided in the previous example (Appendix C), we provide it for convenience, as it required a number of small changes. We define the following functions in the Scilab environment:

References.

Observer design for a class of nonlinear piecewise systems.
Application to an epidemic model with treatment
Generic observability of differentiable systems
On the number of samples necessary to achieve observability
A new look at the statistical model identification
An observer-based vaccination control law for an SEIR epidemic model based on feedback linearization techniques for nonlinear systems
Non-linear phenomena in host-parasite interactions
Control Theory for Engineers: A Primer
Observability necessary conditions for the existence of observers
Observability and identifiability of nonlinear systems with applications in biology
Minimal output sets for identifiability
Systems biology: parameter estimation for biochemical models
Global identifiability of nonlinear models of biological systems
The model of Kermack and McKendrick for the plague epidemic in Bombay and the type reproduction number with seasonality
AMIGO, a toolbox for advanced model identification in systems biology using global optimization
AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology
Parameter selection methods in inverse problem formulation
An inverse problem statistical methodology summary
Standard errors and confidence intervals in inverse problems: sensitivity and associated pitfalls
Modeling and inverse problems in the presence of uncertainty
Mathematical and experimental modeling of physical and biological processes
Macdonald's model and the transmission of bilharzia
On structural identifiability
DAISY: a new software tool to test global identifiability of biological and physiological systems
On the estimation of sequestered infected erythrocytes in Plasmodium falciparum malaria patients
Locally parameter identifiable systems are generic
Ecological models and data in R
Prevalence of locally parameter identifiable systems
Computing representations for radicals of finitely generated differential ideals
Mathematical models in population biology and epidemiology
Mathematical Epidemiology, no. 1945 in Lecture Notes in Mathematics
Practical identifiability analysis of large environmental simulation models
Estimation of parameters in a structured SIR model
Parameter estimation and uncertainty quantification for an epidemic model
Parameter estimation of some epidemic models. The case of recurrent epidemics caused by respiratory syncytial virus
Synthesis of nonlinear observers: a harmonic-analysis approach
Influenza in a boarding school
Unknown input observer design for a class of nonlinear systems: an LMI approach
GenSSI: a software toolbox for structural identifiability analysis of biological models
Structural identifiability of systems biology models: a critical comparison of methods
On the relationship between sloppiness and identifiability
Recent developments in parameter estimation and structure identification of biochemical and genomic systems
A sensitivity matrix based methodology for inverse problem formulation
The estimation of the effective reproductive number from disease outbreak data
Parameter and structural identifiability concepts and ambiguities: a critical review and analysis
Identifiability of compartmental systems and related structural properties
Controllability, observability and structural identifiability of multi input and multi output biological compartmental systems
On the structural identifiability of biological compartmental systems in a general input-output configuration
Parameter redundancy and identifiability
XXII. Epidemiological observations made by the Commission in Bombay City
Epidemic modelling: An introduction
Nonlinear Models for Repeated Measurement Data
How does transmission of infection depend on population size?
Observer-based vaccination strategy for a true mass action SEIR epidemic model with potential estimation of all the populations
Virus dynamics: A global analysis
Quantitative modeling with mathematical and computational methods
On condition numbers and the distance to the nearest ill-posed problem
A necessary condition, a sufficient condition for structural identifiability
An easy to check criterion for (un)identifiability of uncontrolled systems and its applications
Some effective approaches to check the identifiability of uncontrolled nonlinear systems
Observer design for a schistosomiasis model
Elimination in control theory
Differential-algebraic decision methods and some applications to system theory, Theoret.
Nonlinear observability, identifiability, and persistent trajectories, in Proceedings 36th IEEE CDC
On nonlinear observability
On parameter and structural identifiability: nonunique observability/reconstructibility for identifiable systems, other ambiguities and new definitions
Dynamic systems biology modeling and application
Limits of variance-based sensitivity analysis for non-identifiability testing in high dimensional dynamic models
Identifying the number of unreported cases in SIR epidemic models
Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease
Identifiability of uncontrolled nonlinear rational systems
Structural indistinguishability between uncontrolled (autonomous) nonlinear analytic systems
A Luenberger-like observer for nonlinear systems
The structural identifiability of the susceptible infected recovered model with seasonal forcing
Quelques définitions de la théorie des systèmes à la lumière des corps différentiels
Nonlinear control theory and differential algebra
Automatique et corps différentiels
Reconstructeurs d'états
Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems
Observability for systems with more outputs than inputs and asymptotic observers
Deterministic Observation Theory and Applications
A model for estimating total parasite load in falciparum malaria patients
The regulation of malaria parasitaemia: parameter estimates for a population model
Estimating sequestered parasite population dynamics in cerebral malaria
On the observability of nonlinear systems. I
Introductory overview of identifiability analysis: A guide to evaluating whether you have the right type of data for your modeling purpose
Parameter identification in epidemic models
Ordinary differential equations
Nonlinear controllability and observability
Van Ark, Epidemiological models for heterogeneous populations: proportionate mixing, parameter estimation, and immunization programs
Matrix nearness problems and applications
SIAN: a tool for assessing structural identifiability of parametric ODEs
Global identifiability of differential models
Design of observers for linear systems with unknown inputs
High-dimensional Bayesian parameter estimation: case study for a model of JAK2/STAT5 signaling
Stability analysis and observer design for discrete-time SEIR epidemic models
Numerical parameter identifiability and estimability: Integrating identifiability, estimability and optimal sampling design
Parameter identifiability of fundamental pharmacodynamic models
Three novel approaches to structural identifiability analysis in mixed-effects models
Algebraic Methods for Modeling and Design in Control
Some remarks about an identifiability result of nonlinear systems
Linear systems
Mathematical description of linear dynamical systems
An efficient method for structural identifiability analysis of large dynamic systems
Identification of silent infections in SIR epidemics
A contribution to the mathematical theory of epidemics
Differential algebra and algebraic groups
The identification of structural characteristics
Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects
Stochastic differential equations as a tool to regularize the parameter estimation problem for continuous time dynamical systems given discrete time measurements
Fitting mechanistic epidemic models to data: a comparison of simple Markov chain Monte Carlo approaches
An introduction to mathematical modeling of infectious diseases
On the identifiability of transmission dynamic models for infectious diseases
System identification: Theory for the user
On global identifiability for arbitrary model parametrizations, Automatica J. IFAC
An Introduction to Observers
The dynamics of helminth infections, with special reference to schistosomes
The parameter identification problem for SIR epidemic models: identifying unreported cases
Differential algebra methods for the study of the structural identifiability of rational function state-space models in the biosciences
Structural identifiability analysis of some highly structured families of state space models using differential algebra
An introduction to mathematical epidemiology
How should pathogen transmission be modelled?
Alternative to Ritt's pseudodivision for finding the input-output equations of multi-output models
An algorithm for finding globally identifiable parameter combinations of nonlinear ODE models using Gröbner bases
On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: a novel web implementation
Algebraic tools for the analysis of state space models
A Differential Algebra Method for Eliminating Unidentifiability
On identifiability of nonlinear ODE models and applications in viral dynamics
Global observability and detectability analysis of uncertain reaction systems
Parameter identification in dynamical models of anaerobic waste water treatment
Mathematical Biology I: An introduction
A general procedure for accurate parameter estimation in dynamic systems using new estimation errors
The Unknown Input Observer and its Advantages with Examples
Ebola virus infection modeling and identifiability problems
Virus dynamics: Mathematical principles of immunology and virology
Le problème de l'identifiabilité structurelle globale : étude théorique, méthodes effectives et bornes de complexité
Standard bases of differential ideals, in Applied algebra, algebraic algorithms and error-correcting codes
A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods
Multi-experiment parameter identifiability of ODEs and model theory
Information sensitivity functions to assess parameter information gain and identifiability of dynamical systems
An information-theoretic approach to assess practical identifiability of parametric dynamical systems
Parameter estimation in ordinary differential equations for biochemical processes using the method of multiple shooting
Identifiability analysis of an epidemiological model in a structured population
System identifiability based on the power series expansion of the solution
On parameter estimation approaches for predicting disease transmission through optimization, deep learning and statistical inference methods
Design of exponential observers for nonlinear systems by embedding
Identifiability and observability analysis for experimental design in nonlinear dynamical models
Comparison of approaches for parameter identifiability analysis of biological systems
Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood
Addressing parameter identifiability by model-based experimentation
Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems
Structural identifiability in linear time-invariant systems
Differential algebra
An epidemic model with noisy parameters
Bayesian inference for dynamical systems
Why is it difficult to accurately predict the COVID-19 epidemic?
Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems
The prevention of malaria
Parameter identifiability of nonlinear biological systems
An effective automatic procedure for testing parameter identifiability of HIV/AIDS models
Structural vs practical identifiability in systems biology
Parameter identifiability of nonlinear systems: the role of initial conditions
Nonlinear regression, Wiley Series in Probability and Mathematical Statistics
A probabilistic algorithm to test local algebraic observability in polynomial time
Studying the identifiability of epidemiological models using MCMC
On the length of inputs necessary in order to identify a deterministic linear system
Mathematical control theory: deterministic finite dimensional systems
Spaces of observables in nonlinear control
Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets
For differential equations with r parameters, 2r + 1 experiments are enough for identification
Dynamic compensation, parameter identifiability, and equivariances
I/O equations for nonlinear systems and observation spaces
Parameter estimation and site-specific calibration of disease transmission models
On the mathematics of data assimilation
Parameters and state estimation for dengue epidemic model
Inverse problem theory and methods for model parameter estimation
Generalized sensitivity functions in physiological system identification
Profile likelihood-based analyses of infectious disease models
New results for identifiability of nonlinear systems
Structural and practical identifiability issues of immuno-epidemiological vector-host models with application to Rift Valley Fever
Structural and practical identifiability analysis of outbreak models
Structural and practical identifiability analysis of Zika epidemiological models
Similarity transformation approach to identifiability analysis of nonlinear compartmental models
Remarks on modeling host-viral dynamics and treatment, in Mathematical approaches for emerging and reemerging infectious diseases: An introduction
Observability and structural identifiability of nonlinear biological systems
Reverse engineering and identification in systems biology: strategies, perspectives and challenges
Structural identifiability of dynamic systems biology models
Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models
Identifiability of state space models
Global approaches to identifiability testing for linear and nonlinear state space models
Identification de Modèles Paramétriques à partir de Données Expérimentales
Identifiabilities and nonlinearities
From experimental data, translated from the 1994 French original and revised by the authors
On two definitions of observation spaces
Orders of input/output differential equations and state-space dimensions
On structural and practical identifiability
Parameter identifiability and estimation of HIV/AIDS dynamic models
Estimation of HIV/AIDS parameters
Identifiability of nonlinear systems with application to HIV/AIDS models
Structural identifiability and indistinguishability of compartmental models

Acknowledgments. The authors are grateful to P.-A. Bliman, N. Cunniffe, C. Lobry, J. Harmand, T. Sari and M. Souza for exchanges and fruitful discussions.