key: cord-0473408-r3by7e0i
authors: Kouye, Henri Mermoz; Mazo, Gildas; Prieur, Cl'ementine; Vergu, Elisabeta
title: Exploiting deterministic algorithms to perform global sensitivity analysis for continuous-time Markov chain compartmental models with application to epidemiology
date: 2022-02-15
journal: nan
DOI: nan
sha: eee5e7e686310e4e5bf5bd2f2ede9369071b5af0
doc_id: 473408
cord_uid: r3by7e0i

In this paper, we develop an approach of global sensitivity analysis for compartmental models based on continuous-time Markov chains. We propose to measure the sensitivity of quantities of interest by representing the Markov chain as a deterministic function of the uncertain parameters and a random variable with known distribution modeling intrinsic randomness. This representation is exact and does not rely on meta-modeling. An application to a SARS-CoV-2 epidemic model is included to illustrate the practical impact of our approach.

The increasing use of mathematical modeling leads to an enhanced complexity of computational models. These models can be seen as mappings which take inputs and return random or deterministic outputs. If the output is random, two evaluations of the model at the same input generate different realizations: the model is said to be stochastic. Otherwise, the model is deterministic.

Stochastic models are often used in epidemiology. Indeed, in order to study and control the spread of infectious diseases in populations (humans, animals or plants), stochastic compartmental models enable to describe epidemic dynamics by incorporating randomness associated with biological and contact events. They consist in dividing the population into disjoint groups with respect to the different health statuses that are considered. The groups form the compartments of the model. As health statuses of individuals change over time, there are transitions between compartments. These transitions occur at random times and depend on numerous uncertain parameters (also called the input hereafter). Even by fixing uncertain parameters, the number of individuals in each compartment varies randomly over time. Therefore, the corresponding process is stochastic and under some modeling assumptions it is a continuous-time Markov chain (CTMC).

Very often, transition parameters of CTMC are poorly known. In order to better characterize and predict epidemic spread and assess corresponding control strategies, it is important to identify key parameters of the infection spread accounting parameter uncertainty. For this purpose, global sensitivity analysis (GSA) can be used. GSA enables to assess influence of uncertain parameters on the model output. However, performing GSA for stochastic model output is challenging. Unlike deterministic models, stochastic models include two sources of uncertainty: parameter uncertainty and intrinsic randomness. Indeed, intrinsic randomness originates from latent random variables that are generally assimilated to noise when performing sensitivity analysis. So far, several approaches have been introduced.

The pragmatic approach for this purpose consists in performing GSA on both conditional expectation and conditional variance of the model output with respect to uncertain parameters. Both quantities are averaged quantities over the intrinsic randomness of the stochastic model. In practice, this comes down to estimate Sobol' indices (Sobol' [1993] ) for two deterministic models. This approach is often used in practice in various applications, for instance in: Courcoul et al. [2011] to identify key parameters of a model describing the spread of an animal disease in a cattle herd; Rimbaud et al. [2018] for a model describing the spatio-temporal spread of plant pathogens; Richard et al. [2021] for a SARS-CoV-2 spread model, Cristancho Fajardo et al. [2021] for a theoretical metapopulation model. However, this approach can suffer from inconsistent conclusions. Since GSA is performed separately on conditional mean and conditional variance, a parameter can appear to be important for a quantity and not for the other one. One can check this on the toy example given by Y = X 1 + X 2 Z with X 1 , X 2 , Z i.i.d. under standard normal distribution such that X 1 , X 2 are the inputs and Z stands for the intrinsic randomness variable. In this example, the first-order indices of X 1 and X 2 of the conditional expectation are respectively S X1 = 1 and S X2 = 0 whereas those of the conditional variance are S X1 = 0 and S X2 = 1. Moreover, in Mazo [2021] it is shown that the Monte-Carlo estimator of first order Sobol' indices for conditional mean with respect to input is biased. So, estimation accuracy issues arise.

In addition, conditional expectations with respect to inputs require an averaging over the intrinsic randomness variable since the law of the latter is unknown. This induces a loss of information as this averaging affects inputs that interact with intrinsic randomness. In order to avoid this loss, Hart et al. [2017] introduced new sensitivity indices. For this purpose, Sobol'-Hoeffding decomposition is interpreted as a random decomposition, where the randomness is due to the intrinsic noise; corresponding Sobol' indices can also be considered as random. The sensitivity indices proposed by Hart et al. [2017] are then defined as expectations of the random Sobol' indices. However, they do not fully reflect interactions between intrinsic randomness and uncertain parameters.

Recently, new methods have been developed for stochastic models. They rely on a paradigm shift in the way of dealing with intrinsic stochasticity. The stochastic model is interpreted as a deterministic model with values in the set of probability distributions on the original output space. Fort et al. [2021] and da Veiga [2021] consider this approach and define new sensitivity indices well-suited to deterministic models with output valued in a set of probability distributions. This approach has the advantage of setting a framework in which stochastic output models are deterministic but it does not allow to assess the interactions between the intrinsic randomness and uncertain parameters.

Furthermore, it should be noted that even though stochastic models are less studied than deterministic ones, not all types of stochastic models have received the same attention from the sensitivity analysis community. The most studied stochastic models are based on stochastic differential equation (see, e.g. Le Maître and , Jimenez et al. [2017] , Étoré et al. [2020] ). For these types of models, methods based on Polynomial Chaos Expansion meta-modeling have been proposed. Conversely, there are few GSA methods for models based on jump processes (Poisson processes, Markov chains, piecewise-deterministic jump processes). Generally, for these models, blackbox or meta-model based GSA are proposed (e.g. Marrel et al. [2012] , Zhu and Sudret [2021] ).

In this paper, we focus on continuous-time Markov based models. We develop an approach for performing GSA for such models. This approach consists first in representing the model outputs as deterministic functions of a random vector of uncertain parameters and intrinsic randomness with known probability distribution and then in exploiting the resulting representation for GSA. This enables to put model output under a deterministic form so that contribution of uncertain parameters and intrinsic randomness as well as interaction of both can be assessed. Concretely, we study the continuous-time Markov chain given by the stochastic process that counts over time the number of individuals in each compartment of epidemic compartmental models. We rely on Gillespie Stochastic Simulation Algorithm (Gillespie [1976] ) allowing exact simulation of Markov chains from which we derive a new representation. Furthermore, we include a second representation, the random time change introduced by Kurtz (Ethier and Kurtz [1986] ) and studied in Navarro Jimenez et al. [2016] so as to achieve comparison of GSA results between the two representations. We apply the two approaches to a SARS-CoV-2 spread model. The paper is organized as follows. In Section 2 we set the framework of stochastic models and associated representations. The definition of Sobol' indices is reminded and dependence of GSA results on the choice of the representation is discussed. Section 3 is dedicated to the description of representations of continuous-time Markov chains. For this purpose, we describe compartmental models and we discuss two different representations: Gillespie and Kurtz. In Section 4, we present the application of our approach to a SARS-CoV-2 spread model. This section includes a description of the model, the GSA results and comparison elements between the results obtained with the two different representations of the model introduced in Section 3.

This section is devoted to introducing (Subsection 2.2) the concept of stochastic model representation under a deterministic form. Under such a form, GSA methods for deterministic models can be applied to perform sensitivity analysis. For this, the definition of the so-called Sobol' indices is briefly reminded in Subsection 2.1 in the context of deterministic models with scalar or functional outputs. In Subsection 2.3, we discuss the question of the dependence of GSA results to the choice of the representation of the stochastic model.

Let us consider a deterministic model g with input X = (X 1 , · · · , X m ) and output Y = g (X) so that E g (X) 2 < +∞ and X 1 , · · · , X m are mutually independent. For such a model, the definition of first-order and total Sobol' indices (Homma and Saltelli [1996] ) is reminded according to two frameworks: a scalar output Y = g (X) ∈ R and a functional output g (X) = (q(t, X); t ∈ {t 0 , · · · , T }) for which dynamical and aggregated Sobol' indices are introduced.

In scalar case, first-order and total Sobol' indices (Sobol' [1993] ) associated to each input X j , j = 1, · · · , m are defined as:

Var (g (X)) ,

) .

In the functional case, we focus on two types of indices: dynamical Sobol' indices and aggregated Sobol' indices. Considering each random variable q(t, X), sensitivity indices such as first-order Sobol' indices and total Sobol' indices can be obtained as in scalar case for each time t as E q(t, X) 2 is finite. Regarding aggregated Sobol' indices, note that output g(X) is multidimensional so that according to Lamboni et al. [2011] and Gamboa et al. [2014] , first-order and total Sobol' indices associated to each input X j , j = 1, · · · , m are respectively given by:

where Var (E [g (X) | X j ]) and Var (g (X)) are the variance-covariance matrices of the random vectors E [g (X) | X j ] and g(X), respectively.

In the following, a stochastic model g is defined as a random function taking θ ∈ Θ ⊂ R p as input and producing an output g(θ) which is a random variable with values in a set Y. If f : Θ×Z −→ Y is a deterministic function and Z is a random element valued in a set Z such that (X, g (X)) and (X, f (X, Z)) are identically distributed, then the couple (f, Z) is said to be a deterministic representation of the stochastic model g. An example of stochastic model with two representations is provided in Example 1.

. Consider the stochastic model g with input X and output g(X) = X + Z. This model can be represented by using f (X, Z 1 , Z 2 ) = X + 1

The function f is not necessarily explicit. It can correspond to an algorithm. Proposition 1 (see Appendix B for the proof) provides a sufficient condition for a couple (f, Z) to be a representation of a stochastic model g.

Proposition 1. Assume that X and Z are independent and that for all θ ∈ Θ, the probability distributions of g(θ) and f (θ, Z) are identical. Then (f, Z) is a deterministic representation of the stochastic model g.

Representing stochastic models under deterministic form is useful in GSA. If both the probability distribution of Z and the function f are known, then the stochastic model becomes a deterministic model with inputs (X, Z). Hence, all the standard methods of GSA such as those presented in Subsection 2.1 can be applied. Therefore, contribution of Z and its interaction with uncertain parameters can be assessed.

However, as already discussed (see Example 1), a stochastic model may admit several representations. Indeed, the way to simulate a random variable is not unique, leading to different stochastic simulators. Considering two distinct representations (f, Z) and (f ′ , Z ′ ) of the same stochastic model necessarily yields that the joint probability distributions of (X, f (X, Z)) and (X, f ′ (X, Z ′ )) are identical. However, the intrinsic randomness elements Z and Z ′ and functions f and f ′ may differ from one representation to the other. A natural question is then whether GSA results depend on the chosen representation. The answer is yes; this point is discussed in Subsection 2.3.

In Subsection 2.2, it appears that representation always preserves the probability distribution of (X, g(X)). Our aim in this section is to prove that the results of GSA depends on the choice of representation.

Let us consider (f, Z) and (f ′ , Z ′ ) two distinct representations of the stochastic model g with input X = (X 1 , · · · , X m ) and output g(X). By definition, (X, g(X)) ∼ (X, f (X, Z)) ∼ (X, f ′ (X, Z ′ )). If u is a subset of {1, · · · , m}, then: Mazo [2021] ). So, it yields:

It implies that closed Sobol' indices associated to X u , u ⊆ {1, · · · , m} are representationfree, i.e. they do not depend on the chosen representation. A straightforward consequence is the invariance of first-order Sobol' indices associated to each X j , j = 1, · · · , m, and of the total Sobol' index associated to Z with respect to the choice of the representation.

However

, the way each function f or f ′ combines its relative intrinsic randomness variable with input X to generate outputs is different. So, differences can appear on some quantities such as conditional expectations with respect to a group of random variables that includes the intrinsic randomness variable. This is illustrated in Example 2.

. So (f, Z) and (f ′ , Z ′ ) represent the same stochastic model but:

First-order Sobol' indices of intrinsic randomness in the two representations are given by:

And total Sobol' indices of X in the two representations are different:

Consequently, first-order Sobol' index associated to intrinsic randomness depends on the choice of the representation. Via a counter-example (see Example 2) and theoretical elements, we showed that GSA results depend on the choice of the representation. This point is illustrated in Section 4 in the context of GSA of continuous-time Markov chain based models, which are the models of interest in this paper.

In this section, we discuss two different representations for CTMC stochastic compartmental models. The section is organized as follows. In Subsection 3.1, we provide a description of the CTMC under study by using graph formalism to represent compartmental models. In Subsection 3.2, we present the representation detailed in Navarro Jimenez et al. [2016] and then we introduce a new representation based on Gillespie Stochastic Simulation Algorithm (SSA).

Consider a closed population that includes N individuals (i.e. N remains constant over time). Assume that an epidemic outbreaks within this population. Individuals can be susceptible or at various stages of infections. So, at each time, each individual is in a certain health status. Then, individuals are grouped according to their health status. The resulting groups form a partition (compartment). Let V be the set of compartments. As health status of each individual can change over time, transitions can take place between compartments. A transition always involves two different compartments, say α, β ∈ V such that α = β. A pair of compartments (α, β) between which transitions are possible defines a type of transition occurring in the direction α → β. So, the pair (α, β) forms an arrow. An individual can move from a compartment α to another β only if there is an arrow from α to β. To each arrow is associated a vector u α,β ∈ {−1, 0, +1} card V whose components are zero except at the components corresponding to α and β which are equal to −1 and +1 respectively. Denote E the set of arrows and n E its cardinal. The couple G = (V, E) is a directed graph with vertices V and edges E.

The intensities of transitions between compartments depend on the specific parameters θ of the epidemic. As θ is generally unknown, let Θ ⊂ R d be the set of all possible parameters θ. We are interested in the dynamics of the number of individuals in each compartment over time, which are stochastic since transitions occur at random times. Let W θ α (t) denote the number of individuals in compartment α at time t and suppose that the initial state ξ 0 ∈ {0, · · · , N } card V of the process W θ := W θ α (t) α∈V ; t ≥ 0 is known. Assume that for each θ ∈ Θ, the stochastic process W θ is a homogeneous continuous-time Markov chain with state space E defined as the the smallest subset of {0, · · · , N } card V that contains all the vectors of the

where n ∈ N and α (i) , β (i) ∈ E for all i = 1, · · · , n. The generator Q θ of W θ is characterized by nonnegative rate functions g α,β defined by:

so that the element of the generator at the row correspoding to ξ ∈ E and column corresponding to ξ ′ ∈ E is given by:

Example 3. Let us consider the classical SIR model:

There are three compartments V = {S, I, R} and two types of transitions: infection (S, I) and removal (I, R) so that E = {(S, I), (I, R)}. Infection is characterized by transition vector u S,I = (−1, +1, 0) and rate function g S,I = β N W I W S . Removal has transition vector u I,R = (0, −1, +1) and rate function g I,R = γ I W I .

Commonly, Gillespie direct method is used to simulate continuous-time Markov chains. Algorithm 1 provides instructions intended to simulating paths of the process W θ for a given θ ∈ Θ.

Assume that the random vector X models the parameter uncertainty. Then, consider the stochastic model (X, W X ). For simplicity, let denote W X by W. We seek deterministic representations (f, Z) of (X, W), i.e. such that the probability distribution of Z is known. For this aim, thanks to Proposition 1, it is sufficient to find Z independent of X and f such that: for all θ ∈ Θ,

In the following, we discuss two representations of (X, W). We present the random-time change representation studied in Le Maître et al. [2015] and Navarro Jimenez et al. [2016] and introduce the new representation based on Gillespie algorithm.

The random time change representation is based on the random time change decomposition of the process W θ for each θ ∈ Θ. This decomposition has been introduced by Ethier and Kurtz [1986] . Consider the vector Z K = (Z α,β ) (α,β)∈E of independent unit-rate Poisson processes Z α,β (·). Kurtz [1982] showed that there exists a function f K satisfying

which defines a continuous-time Markov chain with initial state ξ 0 and generator Q θ . So, f K (·, θ, Z K ) ∼ W θ for all θ ∈ Θ. In addition, Z K does not depend on Θ. By construction, X and Z K are independent. Hence (f K , Z K ) defines a representation of (X, W).

For each (α, β) ∈ E, Z α,β stands for the intrinsic noise of the reaction or type of transition (α, β). So, Z K includes intrinsic noise of each reaction channel or type of transition. Since the Poisson processes Z α,β , (α, β) ∈ E that compose Z K are mutually independent, then this representation enables to assess contribution of intrinsic noise of each reaction channel or type of transition as well as that of the whole intrinsic randomness Z K . Le Maître et al. [2015] and Navarro Jimenez et al. [2016] used this representation to perform GSA for chemical reaction network models in order to estimate contribution of uncertain parameters and reaction channel intrinsic noises. In practice, simulation of this representation relies on the Modified Next Reaction Method (MNRM) developed by Anderson [2007] and provided in Algorithm 2 (Navarro Jimenez et al. [2016] ).

Data

Gillespie SSA is widely used to simulate Markov chains. Especially, in epidemiology, many CTMC model simulators are based on this algorithm. Both Gillespie SSA and the Modified Next Reaction Method generate statistically exact paths of CTMC including W θ : they are stochastically equivalent. However, representations of (X, W) derived from these two algorithms can lead to different sensitivity analysis results corresponding to different information as the interpretation of intrinsic as noise is not the same for both representations. In the following, we introduce a representation of (X, W) based on Gillespie SSA.

Gillespie SSA simulates two processes: the jump process that provides the jump times of W θ and the embedded chain that describes successive states of W θ . Given two independent sequences of i.i.d. standard uniform variables U 1 = (U 1 1 , U 1 2 , · · · ) and U 2 = (U 2 1 , U 2 2 , · · · ), assume that the jump process is simulated from U 1 and the embedded chain from U 2 . In practice, U 1 and U 2 are respectively assimilated to two numbers RG 1 and RG 2 which are used as a seed for the random number generator so that setting seed to a value enables to stream random numbers. The resulting algorithm that is detailed in Algorithm 3 is a modification of the Gillespie SSA provided in Algorithm 1: 

is such that the continuous-time Markov chains W θ and f G (·, θ, Z G ) := f G t, θ, Z G ; t ≥ 0 have the same finite-dimensional distributions.

The proof of Proposition 2 is detailed in Appendix B. Note that Z G does not depend on θ so that Z G and X are independent. Therefore, Propositions 1 and 2 ensure that (f G , Z G ) is a representation of (X, W). The random vector Z G = U 1 , U 2 stands for the intrinsic randomness variable. Z G includes two intrinsic noises U 1 and U 2 that correspond to intrinsic noises of the jump time process and the embedded discrete chain.

Algorithm 3 aggregates all the types of transition processes to generate the sequence of jump times and the discrete chain. So, intrinsic noise associated to each type of transition (α, β) ∈ E cannot be identified with this algorithm. Therefore, it is not possible to assess contribution of intrinsic noises of type of transition processes. To overcome this insufficiency, we can rely on the first reaction method studied by Gillespie (1976) to build a representation allowing to separate the intrinsic noises associated to each type of transition processes.

For this purpose, let Z G be the random vector (U α,β , (α, β) ∈ E) where each U α,β is a sequence of i.i.d. standard uniform variables and assume that each component of Z G is identified to a number RG j with j = 1, · · · , n E . A modification of Gillespie first reaction method algorithm in a similar way as Gillespie direct method yields Algorithm 4: 

is such that the continuous-time Markov chains W θ and f G (·, θ, Z G ) := f G t, θ, Z G ; t ≥ 0 have the same finite-dimensional distributions.

Proposition 3 provides a new representation of (X, W) since X and Z G are independent. This representation allows to separate intrinsic randomness into independent intrinsic noises of type of transition processes. The proof of Proposition 3 is detailed in Appendix B.

The intrinsic randomness captures all the variability of the model which does not depend on epidemic parameters. From an epidemiological point of view, it encompasses the random behavior of individuals and their social interactions, as well as the biological variability between individuals with respect to infection.

In this section, we implement our method on a parsimonious SARS-CoV-2 spread model inspired by the literature (Cazelles et al. [2021] ). We perform variance-based GSA using the two representations: the one analyzed in Navarro Jimenez et al. [2016] and based on the random time change (described in Section 3.2.1) and the new one we introduced in Section 3.2.2 based on the modification of Gillespie algorithm provided in Proposition 3. The model considered in this section, although not necessarily the most refined from the point of view of application, remains very relevant. Moreover, it should be noted that our approach is generic and can therefore be applied to any other compartmental model.

The compartmental model in Figure 1 is used to describe the spread of SARS-CoV-2 within a closed population with size N . This model includes seven compartments corresponding to seven health states and nine transitions between these states. Indeed, in this model, an individual can be susceptible (S), exposed (E) (i.e. infected but not yet infectious), asymptomatic infectious (A), symptomatic infectious (I), hospitalized (H), recovered (R) or dead (D). Two modeling assumptions can be mentioned. First, infection is neglected within hospitals so that hospitalized individuals cannot infect. Secondly, it is assumed that recovered individuals get perfectly immunized, so they cannot be susceptible after recovering. This assumption is valid on short time intervals. 

that takes values in E ⊂ {0, · · · , N } 7 and counts over time the number of individuals in each compartment, where N is the size of the population and T ∈ (0, +∞] is the final time of the study. The set of compartments is given by V = {S, E, A, I, H, R, D}. The different types of transitions and their characteristics (rate function g α,α ′ and transition vector u α,α ′ , where (α, α ′ ) denotes a type of transition of the model) are described in Table 1 .

Rate function (g α,α ′ ) Rate functions depend on the vector parameter θ = β, γ E , γ A , γ I , γ H , p (E,A) , p (I,H) , p (I,D) , p (H,D) . The interpretation of each of these parameters and their ranges of variation in the sensitivity analysis are provided in Table 2 p (E,A) probability for an exposed to become asymptomatic 0.6 (0.3, 0.7) p (I,H) probability for a symptomatic to be hospitalized 0.15 (10 −3 , 0.2) p (I,D) probability for a symptomatic to die 0.05 (10 −3 , 0.1)

probability for a hospitalized to die 0.08 (0.001, 0.1) Knock et al. [2021] ).

The process under study is:

Let Θ be the set of all possible values of θ. For all θ ∈ Θ, W θ is assumed to be a continuous-time Markov chain.

Let denote p I the vector (p (I,H) , p (I,D) ). For GSA purposes, we focus on the group of inputs p I instead of p (I,H) and p (I,D) separately for two reasons. The first reason is related to the assumption of mutual independence of inputs that is necessary for Sobol'-Hoeffding decomposition. Since p (I,H) and p (I,D) are correlated, they are grouped as one input in order to ensure input independence. The second reason is that we are interested in the global influence of the probability distribution (p (I,H) , p (I,D) , 1 − p (I,H) − p (I,D) ) that is represented by p I . Therefore, in the context of sensitivity analysis, there are eight parameter inputs considered: β, γ E , γ A , γ I , γ H , p (E,A) , p I , p (H,D) . Let us model the uncertainty by introducing the random vector X = β, γ E , γ A , γ I , γ H , p (E,A) , p I , p (H,D) such that components are mutually independent with uniform distributions on intervals specified in Table 2 . Specifically, the vector input p I is drawn from a multidimensional uniform on [10 −3 , 0.2] × [10 −3 , 0.1].

We study two types of outputs that are our quantities of interest (QoIs): a scalar QoI and a functional one. The scalar QoI considered is the extinction time Y ext of the epidemic. Y ext is defined as the first time when there are no more individuals E able to get infected or individuals A or I able to maintain infection by infecting new susceptible individuals:

Note that for all θ ∈ Θ, Y ext is well-defined, i.e. Y ext < +∞. Indeed, by considering the compartmental model described in Figure 1 , after a finite number of transitions, the stochastic process will necessarily reach an absorbing state where there is no individual in compartments E, A and I. A boxplot of n = 2000 simulations of Y ext with parameter inputs set to nominal values given in Table 2 is showed in Figure S1 .

The functional QoI is given by the dynamic of the number of symptomatic infectious individuals:

The quantities Y ext and Y I are functions of the random field W = {W X (t), t ∈ [0, T ]}, so they inherit its representations. By relying on Subsection 3.2, we obtained two representations of W: the Gillespie representation (X, W) ∼ X, f G X, Z G where Z G = U 1 , · · · , U 9 is a vector of nine independent sequences (one for each type of transition) of i.i.d. standard uniform variables and the Random time change representation (X, W) ∼ X, f K X, Z K where Z K = Z 1 , · · · , Z 9 is a vector of nine independent unit-rate Poisson processes. For sensitivity analysis, intrinsic randomness variables Z G and Z K are treated as one input each. We are interested in the global influence of (X, Z) where Z denotes Z G or Z K depending on the representation.

We consider a population of N = 2005 individuals including five exposed individuals at the start of the epidemics t = 0, so that for all θ ∈ Θ the CTMC W θ has the initial state:

For the quantity of interest Y ext , the final time T of the study is set to T = +∞.

In practice, this means that trajectories of W θ are simulated until they reach absorbing states, i.e. extinction. Concerning Y I , T is set to T = 60 days. This choice is motivated by the outputs of Y I obtained with the nominal values of parameters and displayed in Figure S2 . This figure shows that trajectories do not much vary after 60 days and about two-thirds of them reach extinction right after T = 60 days which is the 32th percentile of extinction times (see boxplot of Figure S1 ). Therefore, Y I is studied on the interval [0, 60] that is discretized into 1000 equidistant time points for GSA purposes.

Regarding sensitivity index estimation, we drawn n = 2000 samples of each input with respect to uniform distributions over ranges specified in Table 2 . The parameter space exploration is performed by using Latin Hypercube Sampling as the dimension of Θ is quite large. We simulate W θ through Algorithm 2 for random time change representation and Algorithm 4 for Gillespie representation. In both cases, the intrinsic randomness variable is a vector of nine components. Nine integers are drawn independently and uniformly in {1, · · · , 10 9 } to serve as seeds for the random number generator with the aim to stream random numbers.

In practice, simulations are carried out with R. We use the R package DiceDesign (Dupuy et al. [2015] ) for Latin Hypercube Sampling. Sensitivity indices are estimated by using functions soboljansen and sobol2007 of the R package sensitivity (Iooss et al. [2021] ). The function soboljansen is used for total Sobol' index estimation while sobol2007 is used for first-order Sobol' index estimation.

GSA is performed for the two outputs. For Y ext , first-order and total Sobol' indices of the nine inputs are estimated. For each representation, 50 replications of these indices are provided. Regarding Y I , dynamical and aggregated Sobol' indices are estimated. Figure 2 suggests that, considering first-order indices, only two parameters influence the variability of Y ext : β and γ E . Also, there is no significant difference in the input sensitivity indices between the two model representations. Regarding total indices, results point out strong interactions between inputs. Thus, one can notice that almost all the inputs have non-zero total effects except p (H,D) in the case of the Gillespie representation. Two main points can be emphasized: first, the total Sobol' index estimations of Z for the two representations are very close. Indeed, the total Sobol' index of Z is theoretically the same for the two representations (see Section 2.3). That is why estimations are equal up to sampling errors. Secondly, there are significant differences between the estimations for Gillespie and Random time change representations, and the three most influential inputs (more than 25% of variance each) are: β, γ E and Z.

The intrinsic randomness interacts strongly with uncertain parameters. This explains the difference that appears between main and total effects. The ranking of inputs with respect to total Sobol' indices for Gillespie representation yields β, Z, γ E , γ A , γ I , p (E,A) , p I , γ H , p (H,D) while that of Random time change representation is β, γ E , Z, γ A , p (E,A) , γ I , p I , γ H , p (H,D) . From one representation to another, rankings of inputs γ E , Z, γ I , p (E,A) have switched. So, in practice, GSA conclusions relative to ranking depend on representation.

In order to confirm statistically the difference between indices, we carry out asymptotic statistical tests of comparison of means of two samples. For each input, the test consists in comparing the total Sobol' indices renormalized by the total variance (i.e. numerator of total Sobol' index formula defined in Subsection 2.1) obtained for the two representations. The samples derive from the total Sobol' index samples used for the boxplots in Figure 2 .

The null hypothesis (H 0 ) is rejected for inputs γ E , γ A , γ I , γ H , p EA , p I and p HD with p-values less than 2.2 × 10 −16 but not for inputs β and Z as their corresponding p-values 0.75 and 0.85 are greater than 5%. There is no surprise that the test for β is inconclusive since boxplots overlap (see Figure 2b) .

Results provided in Subsection 4.3 reveal that differences can occur between GSA responses for the two representations. In particular, for total Sobol' indices of uncertain parameters, intrinsic randomness distributions are explicitly involved in formulas and consequently estimations for the two representations do not seem to be distributed around the same theoretical value. Even though these differences may legitimately exist, they can have significant repercussions in practice. For this model, we could indeed choose to adopt Gillespie representation as interactions with intrinsic noise are much less important for this representation. However, the most important message to keep in mind is that the intrinsic randomness varies from one representation to another, thus different representations yield different information. And in practice, this can impact GSA conclusions and even resulting decisions.

As showed in Figure 3 , both dynamical first-order and total Sobol' index estimation indicate that the most important input over time is β, followed by γ E , and p (E,A) , except at the very beginning of the epidemic where γ E is the most important parameter. These conclusions are valid for both representations. First-order and total Sobol' indices of Z are the highest from the start until day 5, meaning that the intrinsic stochasticity influence exceeds that of uncertain parameters at the beginning of epidemics. This reflects the fact that intrinsic stochasticity rules the dynamics especially at the beginning of outbreaks when the number of infectious is low.

Random time change representation 

The aggregated Sobol' indices enable to summarize over the whole time interval the impact of inputs. So, for both first-order and total indices, the transmission parameter β remains the most important input, followed by γ E , p (E,A) and Z. The impact of Z is again noticeable when considering total effects, due to its interactions with other inputs. With the aim of validating the differences between total Sobol' indices for the two representations, statistical tests are carried out as in the case of the scalar output Y ext by using samples that allowed to generate boxplots of Figure 4 . The null hypothesis that states the equality between total Sobol' indices of input parameters for the two representations is rejected with p-values smaller than 2.2 × 10 −16 except parameters β and γ E for which the p-values are respectively 2.24×10 −4 and 2.6×10 −14 . The test for Z does not reject the null hypothesis with a p-value of 0.76. Overall, the conclusions of these tests perfectly match with the theoretical analysis detailed in Subsection 2.3.

In this work, we developed an approach of sensitivity analysis for stochastic compartmental models described by continuous-time Markov chains. This approach consists in separating intrinsic randomness from parameter uncertainty by building exact deterministic representations of the model outputs as functions of uncertain parameters and explicit intrinsic randomness. For this purpose, we rely on Gillespie SSA to propose a deterministic representation of continuous-time Markov chains. We present two versions of the new Gillespie SSA based representation: the direct method (see Proposition 2) and the first reaction method (see Proposition 3). The latter representation enables to highlight the impact of intrinsic noise of each type of transition or reaction channel of the model. Regarding the representation based on direct method, it has the advantage to be computationally faster and most commonly used in practice. This approach is applied to a stochastic compartmental model of SARS-CoV-2 spread and is compared to an approach based on random time change analyzed in Navarro Jimenez et al. [2016] . GSA is performed for two quantities of interest. The first QoI studied is the extinction time of epidemics. The Gillespie and random time change representations reveal that the transmission parameter (β) is the most important input and that the intrinsic randomness (Z) much interacts with uncertain parameters. The second QoI is functional and given by the dynamics of number of symptomatic infectious individuals. For both representations, sensitivity analysis highlights again the main role of transmission parameter (β) and incubation mean duration parameter (γ E ) and the slight influence of the intrinsic randomness (Z), except at the very beginning of the epidemic. Estimating Sobol' indices of intrinsic randomness allowed to point out not only the different phases of the epidemic, regarding the influence of the intrinsic randomness, but also its global influence on the whole epidemic dynamic.

Overall, our approach allows to estimate sensitivity indices for the main effect of intrinsic randomness as well as for its interactions with uncertain parameters. This additional information is complementary to the one on the influence of uncertain parameters. We also highlighted the fact that GSA results depend on the chosen representation. In practice, this impacts conclusions resulting from GSA. Nevertheless, the different results between the representations provide interesting information; for instance, choosing a representation which has lower interaction between the intrinsic randomness and the uncertain parameters. This may allow to estimate uncertain parameter values with a better accuracy.

In future research, the approach proposed in this study could be extended to compartmental models based on non-Markovian stochastic processes based on Sellke construction (Sellke [1983] ).

A Some simulations of QoIs Y ext and Y I

For uncertain parameters set to nominal values (see Table 2 ), we simulate n = 2000 outputs of Y ext . The outcomes are presented under the form of boxplot in Figure S1 . Figure S2 shows 20 trajectories of the number of symptomatic infectious individuals corresponding to 20 outputs of the model for parameters set to nominal values (see Table 2 ). The proof is provided for a case of stochastic process so that it can be adapted to real or multidimensional output stochastic models. Assume that for all θ ∈ Θ, g(θ) is a stochastic process {g(θ, t), t ≥ 0} over a discrete state space E. Let A 1 , · · · , A d and B be Borel sets. Assume X has a distribution µ. Consider 0 ≤ t 1 ≤ · · · ≤ t d . Note that P (X ∈ B, g (X, t 1 ) ∈ A 1 , · · · , g (X,

For all θ ∈ Θ, W θ is a CTMC on the finite state space E with generator Q θ . The jump sequence {T n ; n ≥ 0} of W θ is defined as:

The sequence {W θ (T n ); n ≥ 0} is the embedded chain of W θ i.e. a discrete-time Markov chain on E with initial state ξ 0 and transition probabilities given by:

otherwise.

(1) Therefore, W θ can be rewritten under the form:

Consider a discrete-time Markov chain D θ = {D θ n ; n ≥ 0} with transition probabilities (p ξ,ξ ′ ) (ξ,ξ ′ )∈E×E defined in Eq. (1) and initial state ξ 0 . If the sequence {τ n ; n ≥ 0} satisfies:

(α,β)∈E g α,β (θ, D θ n ) Conditionally to D θ , (τ n+1 − τ n ) are mutually independent (2) then the stochastic process H θ = +∞ n=0 D θ n ·1 [τn,τn+1[ (t); t ≥ 0 defines a continuoustime Markov chain with generator Q θ , state space E and initial state ξ 0 . Since E is finite then:

Equation (3) implies that the two processes are stochastically equivalent. The discrete chain D θ and sequence {τ n ; n ≥ 0} can be arbitrarily chosen. Let us construct an example of D θ and sequence {τ n ; n ≥ 0} using two independent sequences U 1 = U 1 1 , U 1 2 , · · · and U 2 = U 2 1 , U 2 2 , · · · of i.i.d. standard uniform variables. Note that E is finite as V is finite. Recall that n E the cardinal of E. So, we order the elements of E with respect to an arbitrarily order: E = {(α (1) , β (1) ), · · · , (α (nE ) , β (nE ) )}. Let us introduce the function:

The discrete chain built by the recursion:

is a Markov chain with transition probabilities (p ξ,ξ ′ ) (ξ,ξ ′ )∈E×E . Furthermore, the sequence: 

satisfies conditions of Eq. (2.) Therefore the discrete chain in Eq. (5) and the sequence in Eq. (6) define a CTMC stochastically equivalent to W θ in the sense of finite dimensional distributions. Let define the function:

Φ θ : L ([0, 1]) −→ E N u = (u 1 , u 2 , · · · , ) −→ Φ θ (u) = ξ 0 , φ θ (ξ 0 , u 1 ), φ θ (φ θ (ξ 0 , u 1 ), u 2 ), · · · By construction Φ θ U 1 = {Φ θ n U 1 = D θ n , n ≥ 0}. If λ n = (α,β)∈E g α,β θ, Φ θ n U 1 then the following function:

is such that f G (, θ, Z G ) is stochastically equivalent to W θ with Z G = U 1 , U 2 . Moreover, since X and Z G are independent and f G (, θ, Z G ) ∼ W θ then the equivalence (X, W) ∼ X, f G (·, X, Z G ) is proved in the same way as in Proposition 1.

We consider the vector of sequences of i.i.d. standard random variables: (U α,β , (α, β) ∈ E)

where U α,β = U 1 α,β , U 2 α,β , · · · . Given the following recursive system: 

(U α,β , (α, β) ∈ E) is a vector of sequences of i.i.d. standard random variables where U α,β = U 1 α,β , U 2 α,β , · · · . We have the following system: it yields D n = ϕ θ (D n−1 , U n ). So, {D θ n , n ≥ 0} defines a Markov chain with initial state ξ 0 . Let us find the transition probabilities of this chain. Let ξ ′ and ξ be two states such that ξ ′ = ξ. We shall discuss two cases.

• Case 1: ξ ′ = ξ + u α,β for all (α, β) ∈ E. In this case, we have: P D θ n+1 = ξ ′ | D θ n = ξ = P ξ ′ = ξ + u α,β 1 s n α,β =min{s n α,β ;(α,β)∈E} | D θ n = ξ = P ξ ′ = ξ + u α,β 1 s n α,β =min{s n α,β ;(α,β)∈E} = 0

• Case 2: ∃ (α,β) ∈ E such that ξ ′ = ξ + uα ,β . In this case, we have: P D θ n = ξ ′ | D θ n−1 = ξ = P ξ ′ = ξ + u α,β 1 s n α,β =min{s n α,β ;(α,β)∈E} | D θ n−1 = ξ = P uα ,β = u α,β 1 s n α,β =min{s n α,β ;(α,β)∈E} | D θ n−1 = ξ = P s n α,β = min{s n α,β ; (α, β) ∈ E} | D θ n−1 = ξ = P s n α,β ≤ s n α,β ; (α, β) ∈ E \ (α,β) | D θ n−1 = ξ

It turns out that {D θ n , n ≥ 0} has transition matrix (p ξ,ξ ′ ) (ξ,ξ ′ )∈E×E .

2. It is straightforward since {U n α,β ; (α, β) ∈ E} are i.i.d. and s n α,β are exponential variables.

3. The construction of f G is similar to the one of the proof of Proposition 2 by using Z G = (U α,β , (α, β) ∈ E) and φ θ = ϕ θ .

A modified next reaction method for simulating chemical systems with time dependent propensities and delays

A mechanistic and data-driven reconstruction of the time-varying reproduction number: Application to the covid-19 epidemic

Modelling the effect of heterogeneity of shedding on the within herd coxiella burnetii spread and identification of key parameters by sensitivity analysis

Accounting for farmers' control decisions in a model of pathogen spread through animal trade

Kernel-based anova decomposition and shapley effects -application to global sensitivity analysis

DiceDesign and DiceEval: Two R packages for design and analysis of computer experiments

Markov processes -characterization and convergence

Global Sensitivity Analysis for Models Described by Stochastic Differential Equations. Methodology and Computing in Applied Probability

Global sensitivity analysis and wasserstein spaces

Sensitivity analysis for multidimensional and functional outputs

A general method for numerically simulating the stochastic time evolution of coupled chemical reactions

Efficient computation of sobol' indices for stochastic models

Importance measures in global sensitivity analysis of nonlinear models. Reliability Engineering and System Safety

sensitivity: Global Sensitivity Analysis of Model Outputs

Nonintrusive Polynomial Chaos Expansions for Sensitivity Analysis in Stochastic Differential Equations

The 2020 sars-cov-2 epidemic in england: key epidemiological drivers and impact of interventions. medRxiv

Representation and approximation of counting processes

Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models

Pc analysis of stochastic differential equations driven by wiener noise

Variance decomposition in stochastic simulators

Global sensitivity analysis of stochastic computer models with joint metamodels

A trade-off between explorations and repetitions for estimators of two global sensitivity indices in stochastic models induced by probability measures

Global sensitivity analysis in stochastic simulators of uncertain reaction networks

Age-structured non-pharmaceutical interventions for optimal control of covid-19 epidemic

Using sensitivity analysis to identify key factors for the propagation of a plant epidemic

On the asymptotic distribution of the size of a stochastic epidemic

Sensitivity analysis for non-linear mathematical models

Global sensitivity analysis for stochastic simulators based on generalized lambda surrogate models