key: cord-0004733-7c4jsvlz
authors: Simon, Péter L.; Taylor, Michael; Kiss, Istvan Z.
title: Exact epidemic models on graphs using graph-automorphism driven lumping
date: 2010-04-28
journal: J Math Biol
DOI: 10.1007/s00285-010-0344-x
sha: 450a282e43d38ff85826f975228d51c650361b56
doc_id: 4733
cord_uid: 7c4jsvlz

The dynamics of disease transmission strongly depends on the properties of the population contact network. Pair-approximation models and individual-based network simulation have been used extensively to model contact networks with non-trivial properties. In this paper, using a continuous time Markov chain, we start from the exact formulation of a simple epidemic model on an arbitrary contact network and rigorously derive and prove some known results that were previously mainly justified based on some biological hypotheses. The main result of the paper is the illustration of the link between graph automorphisms and the process of lumping whereby the number of equations in a system of linear differential equations can be significantly reduced. The main advantage of lumping is that the simplified lumped system is not an approximation of the original system but rather an exact version of this. For a special class of graphs, we show how the lumped system can be obtained by using graph automorphisms. Finally, we discuss the advantages and possible applications of exact epidemic models and lumping.

county etc), disease characteristics and various control programmes that are aimed at halting disease transmission or bringing infection prevalence to as low a level as possible (Anderson and May 1991) . The main aim of many models is to gain insight into how diseases transmit and to identify the most effective strategies for their prevention and control. The early work by Kermack and McKendrick (1927) forms the basis of differential-equation-based models which lie at the heart of modern quantitative epidemiology. Traditionally, mathematical epidemiology is based on differential equation models (Anderson and May 1991; Diekmann and Heesterbeek 2000) and these operate on the basis of some strong simplifying assumptions about the behaviour of the individuals and the biology of the disease. While these models do not explicitly model contact-network structure, they allow us to improve our understanding of the disease transmission process and to derive threshold quantities such as the basic reproduction number R 0 and critical vaccination threshold (Diekmann and Heesterbeek 2000) in terms of key biological parameters such as rate of infection and infectious period.

A key component in any disease transmission model is the population contact structure (i.e. the collection of disease causing contacts between individual epidemiological units) (Ball et al. 1997; Diekmann et al. 1998) . In most cases, this is highly heterogeneous with strong correlations and non-trivial large scale structure (Hufnagel et al. 2004; Kao et al. 2006; Kiss et al. 2006a) which can make it difficult to use compartmental differential equations. In an attempt to increase model accuracy, in the context of ecological and epidemiological modelling, various pair-approximation (Keeling 1999; Keeling et al. 1997; Rand 1999; Sato et al. 1994; van Baalen 2000) and individual-based models (Keeling and Eames 2005) have been developed. These more sophisticated models can more accurately capture contact network properties but often at the expense of limited analytical tractability. Pair-approximation models focus on the interaction between pairs of individuals and assume that changes in the status of individuals depends on the status of their neighbours. A key factor in such models is the closing relation which is used to truncate and close equations at the level of pairs. This is a useful approach which works well for certain classes of networks, although its precise domain of applicability remains unknown.

Individual-based models where each individual is represented as a node and potentially infectious links between these as edges is a more flexible approach that can account for many complexities in the contact structure (Keeling and Eames 2005; Kenah and Robins 2007; Newman 2002) . However, this comes at the price of little analytical tractability, unless some strong limiting assumptions (i.e. proportionate mixing), are made. Individual-based network models fall in two broad categories. First, those where network data is available and can be used to specify the contact network (Dent et al. 2008; Green et al. 2006; Kao et al. 2006; Kiss et al. 2006a) , and second, theoretical network models that focus on understanding the impact of specific network properties on outbreak threshold, final epidemic size or prevalence level and the efficacy of control measures (Keeling 1999 (Keeling , 2005 Kiss et al. 2005 Kiss et al. , 2008 May and Lloyd 2001) . The latter models are used to establish some general principles while the former are driven, at least partially, by the real-time predictive modelling of human (SARS, Hufnagel et al. 2004; Lipsitch et al. 2003; Meyers et al. 2005 ; and the current swine-flu outbreak, Smith et al. 2009 ) and animal disease (foot-and-mouth disease, Ferguson et al. 2001; Kao et al. 2006; Avian Influenza, Dent et al. 2008) outbreaks.

Modelling disease transmission in finite populations using stochastic simulations alone poses many challenges. For example, to tease apart rare events from average system behaviour many replicate simulations are needed. A more rigorous modelling alternative as discussed by Keeling and Ross (2008) is the use of Markov chains (Kemeny and Snell 1976) where the future state of the population depends on current state only (memoryless process) and the probability of the population being in any of the possible states at a given time is determined by a system of linear differential equations that can be constructed based on the rates of transition between states. In this paper, using a continuous time Markov chain (CTMC), we start from the exact formulation of a simple epidemic model on an arbitrary contact network. We rigorously derive and prove some known results that were previously mainly justified directly from biological hypotheses rather than being deduced from an exact system of equations (Rand 1999) . We revisit lumping (see Appendix A in Heesterbeek 2000 and Filliger and Hongler 2008; Jacobi and Görnerup 2009) and illustrate how graph automorphisms can be used to guide the lumping process which leads to a significant reduction in the large number of equations (2 N − 1 for a model with two disease states, such as the SIS model, on a network with N nodes) of the original exact system. For a special class of graphs, we illustrate how lumping can be directly inferred from graph automorphisms and we combine lumping on a complete graph with simple mathematical techniques to show that, in the limit of large networks, the solution of the exact Markov chain model converges to that of the ODE-based mean-field model. In addition, we discuss the advantages, disadvantages and possible applications of exact epidemic models and lumping.

The S I S type dynamics (Anderson and May 1991) is considered on a network with N nodes and with adjacency matrix G = (g i j ) i, j=1,2,...,N ∈ {0, 1} N 2 where g i j = 1 if node i and j are connected, and g i j = 0 otherwise. Here, we only consider networks with bi-directional edges and without self-loops. This requirement translates to the following two constraints upon G: G = G T and g ii = 0. The dynamics of transmission has two key stages: transmission of the disease and recovery of infectious individuals. Infection starting from an initial index case is transmitted at rate τ across every (S, I ) edge. This is followed by the recovery of infectious individuals at rate γ . Upon recovery, infectious individuals become susceptible again. Both infection and recovery are modelled as independent Poisson processes. For example, in a small time interval δt, a susceptible individual with k I infectious neighbours becomes infected with probability 1 − exp(−k I τ δt). Similarly, 1 − exp(−γ δt) represents the recovery probability of a single infectious individual, and this is independent of the status of its neighbours.

At any point in time, every node can be either susceptible (S) or infected and infectious (I ), and hence, the state of the system is given by a vector of length N with all entries either S or I (or zero and one). The state space of system is given by S = {S, I } N or S = {0, 1} N and the transmission dynamics on the network can be formulated in terms of a transition matrix between all possible states. In the case of continuous time, this matrix is also known as the infinitesimal generator matrix (Brauer et al. 2008; Daley and Gani 1999; Kemeny and Snell 1976) . Based on the generator matrix, it is straightforward to write down the Kolmogorov differential equations that uniquely determine the probability of the system being in a particular state at a given time (Brauer et al. 2008; Daley and Gani 1999) . However, in practice even for small network size, this approach becomes unfeasible due to the large number of equations (i.e. 2 N − 1). As detailed in the introduction, using various techniques, it is possible to reduce the number of states (i.e. number of equations) and derive models that are either equivalent to the original system or are approximations that in the limit of large graphs (i.e. N → ∞) become exact.

The 2 N elements of the state space can be conveniently divided into N + 1 subsets as follows: (a) S 0 is the subset with one single element, namely the state with all nodes susceptible: S 0 = (S, S, . . . , S), (b) S k is the subset of all states with k infected nodes (on a graph with N nodes, k infected nodes can be arranged in N k distinct configurations), and (c) S N is the subset with one single element, namely the state with all nodes infected: S N = (I, I, . . . , I ). The elements of the subset S k are denoted by S k 1 , S k 2 , . . . , S k c k , where c k = N k . The status of the lth node of state S k j will be denoted by S k j (l), thus S k j (l) = S or S k j (l) = I . The state of the system can change in two ways:

-A node becoming infected: an S node becomes an I node, that is an S k j → S k+1 i type transition, where j and i are chosen such that ∃ l for which S k j (l) = S, S k+1 i (l) = I , and S k j (m) = S k+1 i (m) for ∀m = l. Moreover, ∃ r = l such that S k j (r ) = I and g lr = 1 (i.e. there is an (S, I ) type edge between nodes labelled l and r ).

-The recovery of a node: an I node becomes an S node, that is an S k j → S k−1 i type transition, where j and i are chosen such that ∃ l for which S k

This means that states S k j and S k−1 i may differ only at one position, this is the l'th position.

The evolution in the state space can be described by a continuous time Markov-process. Let us denote the probability of the system being in state S k j at time t by X k j (t). Let

be a c k -dimensional vector for k = 0, 1, . . . , N . The above transitions determine the Kolmogorov equations (i.e. a system of linear differential equations) for the probability functions X k j (t). In the general case of an arbitrary graph, the Kolmogorov equations can be written in terms of a matrix with the following block tridiagonal forṁ

where A 0 and C N are zero matrices. Thus Eq. 1 can be written in the following forṁ

We note that often the transpose of P is used, and then X is a row vector, not a column. However, here we use this formulation since it is more convenient from a dynamical system point of view. The A k matrices capture the infection while the C k matrices describe the recovery process. These matrices depend on the structure of the network, and the transmission and recovery rates. The dimension and the entries of these matrices can be obtained using some simple bookkeeping rules.

The entry in the ith row and jth column of the matrix A k is denoted by A k i, j . This element gives the rate of transition from S k−1 j to S k i . In the S k−1 class there are c k−1 elements, and in the S k class there are c k elements, hence matrix A k has c k rows and c k−1 columns. The entry A k i, j is non-zero only in the case when the states S k−1 j and S k i differ at one position, say at position l and S k−1 j (l) = S, S k i (l) = I , and S k−1 j (m) = S k i (m) for ∀m = l. Moreover, we also require that there ∃ r = l such that S k−1 j (r ) = I and g lr = 1 (i.e. there is an (S, I ) type edge between nodes labelled l and by r ). In this case

In order to better understand the role of the A k matrix let us consider a state S k−1 j and choose an index l such that the node at position l is S (i.e. S k−1 j (l) = S). Next, find an index i ∈ {1, 2, . . . , c k } such that states S k−1 j and S k i only differ at position l, that is S k i (l) = I and S k−1 j (m) = S k i (m) for ∀m = l. Let us denote by q the number of nodes in state S k−1 j which are I and are connected to node l which is S. Then A k i, j = τ q. Repeating this for all l ∈ {1, 2, . . . , N }, such that S k−1 j (l) = S, it can be observed that the total number of (S, I ) edges in the S k−1 j state multiplied by τ is equal to the sum of the elements in the jth column of matrix A k , that is for ∀ j ∈ {1, 2, . . . , c k−1 } the following equality holds

where N S I (S k−1 j ) denotes the number of (S, I ) edges in state S k−1 j . The entry in the ith row and jth column of matrix C k is denoted by C k i, j . This element gives the rate of transition from S k+1 j to S k i . In the S k+1 class there are c k+1 elements, and in the S k class there are c k elements, hence matrix C k has c k rows and c k+1 columns. The entry C k i, j is non-zero only in the case when the states S k+1 j and S k i differ at one position, say at position l such that S k+1

In this case C k i, j = γ . In state S k+1 j , k + 1 nodes of the graph are in state I , hence in the jth column of matrix C k there are k +1 entries that are equal to γ and the remaining entries are zero. Hence, for all j ∈ {1, 2, . . . , c k+1 } the following equality holds

The matrix B k is a diagonal matrix with the number of rows and columns equal to c k . This is due to B k only accounting for the rate of the S k i → S k j type transitions, where the rate of a transition from S k i to S k j is equal zero if i = j. The diagonal elements of B k are determined such that the sum of entries in each column of P sum to zero. This gives the following expression for the entries of B,

In the Appendix, the rules detailed above are used to derive the Kolmogorov equations for a complete graph with N = 3. We note that the above rules can be conveniently programmed in a code, using Matlab, that will automatically generate and provide numerical solution to the full set of differential equations.

The system given by Eq. 1 consists of 2 N linear differential equations that are impractical to solve or cannot be solved for large N . However, it is not always necessary to determine all probability functions, and in many situations, the expected values of the number, or proportion, of susceptible (S) and infectious (I ) nodes or individuals is equally valuable. These expected values at time t are denoted by [S](t) and [I ](t) and can be expressed as follows

There are three different approaches to determine or approximate these exact values as given by the system in Eq. 1. The simplest way is through individual-based network simulation (Keeling and Eames 2005; Kiss et al. 2005 Kiss et al. , 2008 . However, this offers little insight into the dynamics, and results obtained from simulation are difficult to generalise. An alternative is the derivation of mean-field type equations upon using pair or triple approximations (House et al. 2009; Keeling 1999; Keeling et al. 1997; Rand 1999; Sato et al. 1994; van Baalen 2000) . Finally, where appropriate, by using the technique of lumping (see Appendix A in Heesterbeek 2000 and Filliger and Hongler 2008; Jacobi and Görnerup 2009) , the 2 N equations can be reduced to a tractable number of equations that are not an approximation of the original system but an exact equivalent. The first two techniques are well known and widely used in Mathematical Epidemiology where a considerable amount of research focuses on models of disease transmission within populations with complex contact structure.

In this section, we report on various approximation techniques and for some meanfield type models we show how these can be derived directly from the Kolmogorov equations. We also formalise rigorously the validity of approximation models and discuss the limit in which approximations become exact.

The idea of deriving mean-field equations is based on the observation that [S] and [I ] are linear combinations of the probability functions in Eq. 1. Hence, based on Eq. 1, it is feasible to try to derive a new system of differential equations that will uniquely define [S] and [I ]. Introducing [S I ] as the expected value of (S, I ) pairs (directed edges), that is

we obtain the following system:

Despite this being a well-known system (Rand 1999) , we provide a proof of this Lemma because of two reasons. On one hand, this system is usually not derived from the differential equations of the Markov-process, and on the other hand, the proof illustrates the usefulness of the block tridiagonal form of the transition matrix P.

Proof Let us introduce the row matrix S k = (1 1 . . . 1) with c k columns. Then c k j=1 X k j = S k X k (here X k is a column vector). Hence from Eq. 6 we obtain

Using this notation, Eq. 5 takes the following form,

which is true for ∀i = 1, . . . c k . Thus the following equation is obtained

and this holds for ∀k = 0, 1, . . . , N . We note that, when indices are out of the relevant range (i.e. A N +1 and C −1 ) matrices should be set to zero. Differentiating [I ](t) and using Eq. 1 we obtaiṅ

Now upon using Eq. 11, we obtaiṅ

The statement for [I ](t) follows from the Proposition below. The proof for [S](t) is similar.

Proof 1. According to Eq. 4, for all j ∈ {1, 2, . . . , c k } the following equality holds

and this implies that S k−1 C k−1 = γ kS k , since the jth coordinate of the left and right-hand side are equal. 2. The second statement is a direct consequence of the first part of the Proposition and Eq. 10. 3. According to Eq. 3, for all j ∈ {1, 2, . . . , c k } the following equality holds

We note that the Lemma above holds for any network. However, the system given by Eqs. 8-9 is not closed since it contains the new variable [S I ]. This problem can be solved in two ways. First, we can aim to provide an approximation for the expected number of [S I ] pairs in terms of [S] and [I ] (i.e. pair approximation), and second, we can attempt to derive a differential equation for [S I ].

The simplest closing relation is [S I ] [S] [I ] , and this is based on the statistical independence in the state of individuals (i.e. the expected number of [S I ] pairs is equivalent to the product of the expected number of [S] and [I ]). Applying this relation in Eqs. 8-9, the well known mean-field model is obtaineḋ

Such a closure relation leads to the problem of identifying graphs for which the approximation above holds. It is known that for a large, complete graph this is a good approximation when τ = β/N for some fixed value β (see Appendix A in Heesterbeek 2000 and Kurtz 1971) . In order to formalize this statement more precisely it is useful to use the scaled variables:

In the new variables, with τ = β/N , the system given by Eqs. 12-13 takes the following formṡ

It is generally believed that, under appropriate conditions,ĩ is a good approximation of [i]. Before formalising this in a mathematically rigorous way we give details of a key aspect which seems to contradict our previous statement. This observation also justifies the need to formalise rigorously the meaning ofĩ being a good approximation of [i].

It is known, and it can also be easily derived from Eqs. 15-16, that in the case of β > γ the following holds

and in the case when β ≤ γ , this limit is zero. However, the corresponding limit for

as shown first in Picard (1965) . Nåsell (1996) 

The proof of Theorem 1, as outlined in the Appendix, is based on the fact that in the case of a complete graph the system given by Eq. 1 takes the form given in Lemma 4 below. This significant reduction in dimensionality can be obtained intuitively, or by using the more rigorous formalism of lumping as explained later in the paper. Kurtz (1970) proved that the Markov chain given by the lumped system converges in probability to the solution of the mean-field equation. This statement is stronger than Theorem 1 since it implies that not only the expected value converges to the solution of the mean-field equation but also the distribution. However, the proof of this general statement uses abstract tools, namely an extension of Trotter's operator semigroup approximation theorem. Later Kurtz (1971) formalised the statement in a more abstract setting and proved a central limit theorem using martingale theory. Our proof of Theorem 1 is an alternative proof that does not use abstract semigroup and martingale theory and it is based on ideas outlined in the Appendix A in Diekmann and Heesterbeek (2000) . 

While we will not provide a proof for the above Lemma, we believe that a proof similar to that given for Lemma 1 is possible. Such a proof would again provide a rigorous derivation from the exact epidemic model as given by Eq. 1. Rand (1999) and others (Keeling et al. 1997; Keeling 1999; van Baalen 2000) provide the derivation of the above equations based on biological hypotheses and without rigorous derivation from some form of master equation.

We can observe that in the equation for [S I ] new unknowns, the triples appear. Hence we have two possibilities again. Either derive differential equations for the triples, or approximate the triples with pairs and singles. These approximations are called closing relations. For randomly mixed and clustered homogeneous graphs, where every node has the same number of edges, the following closing relation has been proposed and has been extensively used in disease modelling

Here n is the node degree and φ is the clustering coefficient (i.e. the ratio of triangles to triples over the whole graph (Keeling 1999) It is known that closing relations depend on the structure of the graph and these relations have been discussed at length elsewhere (Keeling 1999; Keeling et al. 1997; Rand 1999; Sato et al. 1994; Sharkey 2008; van Baalen 2000) . To evaluate the performance of such closing relations, the typical approach in the literature is the following. A closing relation yielding a closed ODE system is introduced heuristically, then its validity is justified numerically by comparing the values of [S](t) and [I ](t) obtained numerically from Monte Carlo simulation to those obtained by numerically solving the ODE system. According to our knowledge, finding the exact class of graphs for which the above closing relations or others hold is an open problem. Our approach presented in this paper makes an attempt towards providing rigorous proof for the validity of some closing relations (i.e. see Theorem 1).

In this section we revisit the process of lumping linear ordinary differential equations and illustrate the link between network/graph automorphisms and lumping. For some particular graph types, and for an S I S type dynamics, we use lumping to derive exact models where the number of equations compared to the original complete system is significantly reduced. In this case, the lumped system is a reduced but exact version of the original system.

We start by defining lumping in general. Let us consider an n-dimensional linear systemẊ = AX, where A is an arbitrary n × n matrix.

The linear systemẊ = AX (or equivalently the corresponding Markov-process) is called lumpable, if there is a partition {L 1 , L 2 , . . . , L m } of the set {1, 2, . . . , n} (or equivalently of the state space) satisfying the following property.

For any j and l in {1, 2, . . . , m} there exist a number B jl such that

that is the sum does not depend on r whenever r ∈ L l . The m × m matrix B is called the lumping of matrix A.

In simple terms, lumping is equivalent to defining new variables based on additive combinations of the original variables. For example, for the S I S-type dynamics on a complete graph, the new variables in the lumped system could be defined as Y k = X k 1 + X k 2 + · · · + X k c k for ∀k = 0, 1, . . . , N . In general, the condition in the Definition above amounts to requiring that the sum of the rates of all possible transitions from any individual element of any newly defined variable to all individual elements (rates of transmission to some elements might be zero) of another specified new variable should be the same. To formalise lumping and to derive the precise grouping of the original variables, as given by the partition of the state space, the following two useful propositions are given.

Proof Let us define matrix T by T ji = 1 when i ∈ L j , and let all the other entries of T be zero. Then the element in the jth row and r th column of T A is

where l is the index for which r ∈ L l .

The element in the jth row and r th column of BT is

where l is the index such that r ∈ L l given that in every column of T there is a single entry which is 1, the others are zero. It is worth noting that the crucial ingredient for lumping is the partition of the state space. This can either be derived intuitively or attempting to search for partitions that satisfy the property stated in the Definition of lumping. Once the partition is known both B and T follow. The lumpability condition on P is a non-trivial requirement and not all systems will be lumpable.

In the previous section the lumping procedure for a general linear ODE was defined and detailed. We now consider the linear system given by Eq. 1, where matrix P has a special block tridiagonal structure. In the previous general case, the state space was given as the set {1, 2, . . . , n}. For the epidemic model, the state space has 2 N elements but instead of the set {1, 2, . . . , 2 N }, we denote the state space by S as given in Sect. 2. The reason for this notation is motivated by the special structure of the state space S = ∪S k which is worth conserving throughout the lumping process. This is essential when using the solution of the lumped systemẋ = Bx to determine the expected values [S](t) and [I ](t). There are choices of lumping such that the lumped systemẋ = Bx cannot be used to determine [S](t) and [I ](t). For example, the trivial lumping, when T is a row matrix full of ones, is not a useful lumping. In this case, all variables are lumped into one single variable and the lumping of matrix A (i.e. B) will be a single row and single column matrix with a single zero entry. This just shows that the system is closed and that at any one point in time, the probabilities of being in any of the specified 2 N states sum to one. In order to use the lumped system to determine [S](t) and [I ](t), the states with different number of I nodes cannot be lumped together. Hence we will use the following more restrictive definition of lumping.

Definition 2 The linear system given by Eq. 1 (or equivalently the corresponding Markov-process) is called lumpable, if there is a partition {L 1 , L 2 , . . . , L m } of the state space S satisfying the following two properties.

1. For any l there exist k, such that L l ⊆ S k . 2. For any j and l there exist a number Q jl such that

that is the sum does not depend on r whenever r ∈ L l .

The m × m matrix Q is called the lumping of matrix P.

We note that in Part 1 of the Definition, the state space is considered to be S, however in Part 2, the state space is given by the set {1, 2, . . . , 2 N }. In order to avoid this ambivalence we will introduce the following more detailed notation of the lumping partition or classes. The lumping classes that are subsets of S k will be denoted by L k 1 , L k 2 , . . . , L k l k with the condition that S k = L k 1 ∪ L k 2 ∪ · · · ∪ L k l k , and the total number of lumping classes is m = l 0 + l 1 + · · · + l N . It is worth noting that for any arbitrary lumping l 0 = l N = 1 since the classes S 0 and S N contain a single state or element. Using this new notation for the lumping classes, the first assumption of the Definition is automatically fulfilled and the second assumption can be expressed in terms of matrices A k and C k as follows. Eq. 1 is lumpable, if for ∀k ∈ {0, 1, 2, . . . , N } there is a partition L k 1 , L k 2 , . . . , L k l k of the set S k satisfying the following properties. For any p ∈ {1, 2, . . . , l k−1 } and r ∈ {1, 2, . . . , l k } there exist a number A k r p such that

that is the sum does not depend on j whenever S k−1 j ∈ L k−1 p . For any p ∈ {1, 2, . . . , l k+1 } and r ∈ {1, 2, . . . , l k } there exist a number C k r p such that

that is the sum does not depend on j whenever S k+1 j ∈ L k+1 p .

Proof As mentioned above we only have to prove that the second condition in the definition of lumping is fulfilled. Let us take two lumping classes L j and L l . According to the new ordering of the lumping classes, there exists k ∈ {0, 1, 2, . . . , N } and r ∈ {1, 2, . . . , l k } such that L j = L k r and there exist h ∈ {0, 1, 2, . . . , N } and p ∈ {1, 2, . . . , l h } such that L l = L h p . Because of the tridiagonal structure of matrix P and based on Eq. 1, there are four possible cases: h = k − 1, h = k, h = k + 1 or h has a different value. For all four cases, we will separately prove that Eq. 22, given in the definition of lumping, holds. In the case of h = k − 1, the part of matrix P corresponding to the index set in Eq. 22 is part of matrix A k , hence Eq. 22 is equivalent to Eq. 23. In the case of h = k + 1 the part of matrix P corresponding to the index set in Eq. 22 is part of matrix C k , hence Eq. 22 is equivalent to Eq. 24. In the case of h = k the part of matrix P corresponding to the index set in Eq. 22 is part of matrix B k , hence Eq. 22 is equivalent to the following. For any p ∈ {1, 2, . . . , l k } and r ∈ {1, 2, . . . , l k } there exist a number B k r p such that

This equation is a consequence of Eqs. 23 and 24, and taking into consideration Eq. 11 (i.e. the sum of every column of P is zero). Finally, if h = k − 1, h = k and h = k + 1, then the part of matrix P corresponding to the index set in Eq. 22 has only zero entries and hence, Eq. 22 obviously holds. It is worth noting that the analysis above is based on considering all transitions that takes the system to a state that belongs to S k (i.e. through an infection from S k−1 , through a recovery from S k+1 , nothing happening and zero rate transitions from S l , when l ∈ {0, 1, 2, . . . , N }\{k − 1, k, k + 1}).

Before formulating one of our main results, we recall the Definition of a graph automorphism and the automorphism group of the graph. and only if ( (x), (y) ) ∈ E(G) is an automorphism of graph G. The set of all automorphisms of G, under the composition of maps, forms the automorphism group denoted by Aut (G) (Yap 1986 ).

For graphs with labelled vertices (i.e. nodes or vertices labelled with S or I ), a graph automorphism needs to also conserve the labelling of the nodes in the sense that for ∀x, y ∈ V (G) it must be that the state of nodes x and (x), and y and (y) are the same. We will say that the automorphism takes the state S k i to the state S k j if S k i (l) = S k j ( (l)). Now, we can formulate our main results that connects the automorphism group of the graph to the lumping of the Markov chain.

Proof Let us denote the lumping classes as in Lemma 3. We will prove that Eq. 23 holds. The proof of Eq. 24 is similar, hence it is not detailed here. Let k ∈ {0, 1, 2, . . . , N }, p ∈ {1, 2, . . . , l k−1 } and r ∈ {1, 2, . . . , l k } be arbitrary numbers. Let S k i 1 ∈ L k r , S k−1 j 1 ∈ L k−1 p and let z = A k i 1 j 1 = 0. This means that the states S k i 1 and S k−1 j 1 differ only at one node, say at node numbered by u, and S k i 1 (u) = I, S k−1 j 1 (u) = S. Moreover, according to Eq. 2 the number of I nodes in the state S k−1 j 1 that are connected to the node numbered by u is equal to z/τ (see the definition of matrix A). Now let us consider another element in the lumping class L k−1 p , that is let S k−1 j 2 ∈ L k−1 p . Then there is an automorphism of the graph that takes the state S k−1 j 1 into the state S k−1 j 2 . Let us apply this automorphism to the state S k i 1 . Then we get the state S k i 2 ∈ L k r . This means that the relation between the states S k i 2 and S k−1 j 2 is the same as the relation of the states S k i 1 and S k−1 j 1 . Hence we have A k i 2 j 2 = z. Thus, if there is a non zero element z in the j 1 th column of the matrix A k in a row belonging to the lumping class L k r , then this element z appears also in the j 2 th column of the matrix A k in a (possibly different) row belonging to the same lumping class L k r , which proves Eq. 23.

This Theorem proves that graph automorphisms can be used to identify states that can be lumped together. This can be formalised further using group theory arguments. In the case when the automorphism group (i.e. Aut (G)) is known, finding the lumping classes is equivalent to working out the orbits of the elements of the state space with respect to the group Aut (G). For example, the orbit of an element S k j ∈ S k is given by the set

The orbit of the element will contain all the individual states that can be lumped together and thus defines a lumping class. Finding further lumping classes can be done by working out the orbits of elements in S k that are not yet part of any previously computed orbits. The number of elements in an orbit is bounded from above by the number of elements in the automorphism group (i.e. |Aut (G)|). Thus, the number of elements in a lumping class is also bounded from above by |Aut (G)| which in turn means that the number of lumping classes is bounded from below by 2 N |Aut (G)| . It is worth noting that this is a weak lower bound and it is only meaningful when 2 N > |Aut (G)| (this is not the case for a completely connected graph). When the inequality holds, the lower bound can serve as an indicator of the effectiveness of lumping in reducing the dimensionality of the original exact system.

In this section we show some applications of the above theorem and show how the lumping can be carried out when the graph has some special structure.

First we show that for a complete graph the 2 N -dimensional system given by Eq. 1 can be lumped to an N + 1-dimensional system. The lumped system is well-known in the literature, however, according to our knowledge, its derivation from the full 2 N -dimensional system is not available.

The automorphism group of the complete graph is the permutation group S N , hence there is an automorphism between any two states in S l . Hence, the orbit of any element from S l is the same and is equal to S l itself. Therefore all the states with l infected nodes can be lumped together. This means that there are N + 1 lumping classes: L l = S l for ∀l ∈ {0, 1, . . . , N }. That is we have m = N + 1 and the lumping matrix T takes the form 1, 1, . . . , 1) is a vector of length c k . Thus the new (scalar valued) unknown functions are introduced as

Lemma 4 If G is a complete graph, then the x k functions satisfy the following differential equations.

for k = 1, 2, . . . , N − 1.

Proof The system for X k in Eq. 1 consists of c k equations. Adding these equations we obtainẋ

Now the statement follows from Proposition 1 and from the Proposition below.

Proof 1. In a complete graph for ∀ j ∈ {1, 2, . . . , c k } we have that N S I (S k j ) = k(N − k) since every node in state I is connected to every other node in state S. Hence according to Eq. 3, for ∀ j ∈ {1, 2, . . . , c k−1 } the following relation holds

This proves that S k A k = (k − 1)(N − k + 1)τ S k−1 since the jth coordinate of the left and right-hand sides are equal. 2. The second statement follows from Proposition 1 and from Eq. 11.

In the Appendix, the lumping process is derived and explained for a complete graph with N = 3 nodes.

Let us consider a star like graph with N nodes. In this graph a single central node is connected to all other nodes with no further connections. Let the N th node be the center of the star. Thus, for example SS . . . S I denotes the state when the central node is infected and the other nodes are susceptible. We will show that in the case of a star graph, the 2 N -dimensional system as defined by Eq. 1 can be lumped to a 2N -dimensional system.

The automorphism group of the star graph is the permutation group S N−1 , since an automorphism leaves the central node unchanged and it can permutate the remaining N − 1 nodes in an arbitrary way. However, the labelling of the nodes have to be taken into account. Therefore two states can be taken into each other by an automorphism if and only if the center is in the same state and the number of I 's among the non-central nodes is the same. Hence, for l = 1, 2, . . . , N −1 the set S l of states of the graph, such that there are l nodes in state I , can be lumped into two classes: in the first class the central node is I and there are l −1 non-central I nodes, in the second class the central node is S and there are l non-central I nodes. In the case of l = 0 and l = N there is obviously only one class. This means that altogether there are 2N = 2(N − 1) + 2 lumping classes:

Thus the lumped variables can be introduced as

for k = 1, . . . , N − 1. Similarly to Lemma 4, we can prove the following Lemma,

If G is a star graph, then the x k i functions satisfy the following differential equations.

for k = 1, 2, . . . , N − 1, where the following notations are used:

Let us consider the simplest graph with a so-called household structure. This graph consists of two types of nodes, inner and outer nodes. Outer nodes have only within household connections while inner nodes have within household as well as connections to other households. We consider the simplest case where each household has two nodes, an inner and outer node. The inner nodes of all households form a complete graph with N /2 nodes (N is an even number), and every outer node is connected to an inner node. Thus the degree of all inner nodes is N 2 and the degree of all outer nodes is 1. We will show that in the case of this household-type graph the 2 N -dimensional system given by Eq. 1 can be lumped to an N /2+3 3 -dimensional system. The automorphism group of this graph is the permutation group S N/2 , since an automorphism can permutate the inner nodes in an arbitrary way, and once the automorphism is given on the inner nodes, its effect on the outer nodes is determined uniquely. In order to determine the lumping classes let us note first that there may be four different types of households in this graph: S I households, in which the inner node is S and the outer node is I, I S households, SS households and I I households. Therefore two states of the whole graph can commute through an automorphism if and only if the number of S I, I S, SS and I I type households is the same in the two distinct states of the complete system. Hence, to obtain all different states we have to choose (with repetition) N /2 households out of the four different types. Thus, the number of different states (i.e. states where the number of SS, S I, I S and I I -type households is different but their distribution can be arbitrary) can be obtained as the number of combinations with repetitions: 4+N /2−1

3 . Hence, states with the same number of SS, S I, I S and I I -type households can be lumped into one newly defined lumped variable.

For the completely connected and star graph, lumping can also be carried out intuitively. However, lumping based on intuition alone can become unwieldy even for a relatively simple graph such as the cycle graph (C N ) where N nodes are connected in a close chain such that each node connects to the two nearest neighbours only. The automorphisms of the cycle graph can be given in terms of all possible rotations and reflections of the graph. These together, N of each, give 2N automorphisms, and hence |Aut (C N )| = 2N . The automorphism group of the cycle graph is also known as the dihedral group D N . Carrying out lumping based on intuition alone is prone to error and it is desirable to use the automorphism group to work out lumping classes rigorously. For small graphs, these calculations can be done on paper or for larger graphs calculations can be easily carried out using some suitable software or programming language. To illustrate the process, let us consider the case of a cycle graph with N = 5 nodes. Here, |D 5 | = 10 is made up of five rotations including the identity and five reflections. The lumping classes L i (i = 1, 2, . . . , m), with m yet to be determined, have to be such that for ∀i = 1, 2, . . . , m, ∃ k ∈ {0, 1, . . . , N } such that L i ⊆ S k (as required by Definition 2). Based on this the first lumping class is trivial L 1 = {(SSSSS)}. This is followed by considering Based on similar argument four more lumping classes can be identified. This means that the original system with 2 5 = 32 equations can be reduced to a system with only 8 equations.

Using similar arguments, it is easy to show that for N = 6, N = 7 and N = 8 the exact system can be lumped from 64, 128 and 256 to 13, 18 and 30 equations, respectively. Given that for the cycle graph 2 N > |Aut (G)| = 2N , the argument presented in Sect. 4.2 can be used to show that the number of lumping classes for the cycle graph is bounded from below by 2 N 2N = 2 N −1 (1/N ) . This indicates that the number of equations in the lumped system is much larger than polynomial in N .

In this paper, using a continuous time Markov chain, we have formulated an SIS type epidemic model on an arbitrary graph and discussed various approximation methods that allow us to reduce the number of equations from 2 N − 1 to a feasible number that can either be solved analytically and/or numerically. Exploring the special block tridiagonal nature of the transition matrix P, we have rigorously derived mean-field approximations that previously have only been derived heuristically based on biological hypotheses. The key result of the paper is the precise formulation of lumping in terms of graph automorphisms and their group. Lumping is a powerful method that can be used to reduce large linear ODE models to a tractable number of equations which are still exact and can be used to evaluate the time evolution of quantities of interest as required by the specific modelling context. For example, Table 1 illustrates that for graphs with a high number of automorphisms, the exact equations given by Eq. 1 can be more effectively lumped into fewer equations that are easier to analyse.

For the complete and star graph, the time evolution of [i] = [I ]/N is illustrated in Fig. 1 . This plot is based on the lumped system which is used to compute the expected level of prevalence over time. This is in contrast with many such results which are either based on some form of mean-field equations or individual-based network simulations. Figure 1 shows that the complete graph allows the infection to spread very quickly to a high proportion of nodes. For the star graph, the spread is much slower and only a much smaller proportion of the network will become infected. Here, the difference between prevalence levels can be mainly explained by the marked difference in average connectivity, N +2 4 for the complete graph versus 2 − 2/N for the star graph. The use of lumping enabled us to compare the disease spread on different graphs and to investigate the effect of the structure of the graph on the spread of the infection. Based on mean-field approximations, equations for the prevalence level can be derived. For the complete and star graph, these read as follows: It is worth noting that even for a small graph with N = 50, the numerical values of the exact solution [i](t) and that of the mean-field approximationĩ(t) are close to each other. In the case of a complete graph, the quasi steady state of [i] is 0.9796 and the steady state ofĩ is lim t→∞ĩcomp (t) = 1 − γ N τ = 0.9800, when γ = 0.1 and τ = 0.1. In the case of a star graph with the same parameters, these values are 0.4967 versus 0.4992. In Fig. 2 , we show that even for small networks N = 20, results from the mean-field and exact models agree well, with a small further increase in N making the two almost indistinguishable. This is somewhat in contrast with the general belief that mean-field and pair-approximation models become exact in the limit of large graphs. Due to their great flexibility network models are used widely in many different areas ranging from neuro-networks to evolutionary dynamics. For example, Broom and Rychtář (2008) have recently considered the fixation probability of a mutant on special classes of non-directed graphs and have shown that the total number of mutantresident formations is proportional to the inverse of the cardinality of Aut (G) (i.e. |Aut (G)|). In this paper, in line with results derived by Broom and Rychtář (2008) , we have also illustrated how the richness of the automorphism group impacts on the reduction in dimensionality of the exact system (see Table 1 ). For graphs with a high number of automorphisms, such as completely connected or star graphs, the 2 N equations of the exact system can be reduced to O(N ). It is worth noting that for a more complex dynamics such as an S I R model, the number of equations in the reduced system on a completely connected graph will be of O(N 2 ). While for graphs with less symmetry (i.e. smaller number of automorphisms) the reduction in dimensionality will not be as significant, the use of automorphism-driven lumping still provides a rigorous way to identify the lumping classes. Even though constructing the automorphism group of an arbitrary graph is a challenging problem and belongs to the class NP of computational complexity (Brandes and Erlebach 2005) , there are many special classes of graphs which are both relevant from an application point of view and possess a computable or known automorphism group. More importantly exact models for small populations such as households, hospital wards, schools, workplaces and small holding farms can play an important role in accounting correctly for stochastic effects that are more important in small populations. Therefore, exploiting the structure of the graph is key when formulating models where the derivation of analytical results is needed to strengthen or confirm simulation results.

It is not often that mean-field and pair-approximation models are compared directly to the solution of the exact system (Keeling and Ross 2008) and, at least for small or large graphs with special structure, graph automorphism driven lumping can be used to rigorously test the performance of various approximation models. These models can both simplify calculations and underpin the derivation of some general theoretical results. Hence, improving the testing and validation of such models is an important part of the modelling exercise in many areas such as epidemiology, ecology and other areas where network models are used. This study also links well-known and well-studied problems in graph theory (Gross and Yellen 2003; Yap 1986 ) (e.g. graph automorphisms and their group) to real-life applications and we foresee that other results from graph theory will become applicable in the context of network model formulation and analysis.

6.1 Kolmogorov equations in the case of a complete graph with N = 3

The aim is to derive the full set of Kolmogorov differential equations using the special block tridiagonal matrix form introduced in Sect. 2.2.1. This small graph has a state space with 2 3 elements, and following previously introduced notation, this can divided in the following subsets, X 0 = X SSS , X 1 = (X SS I , X S I S , X I SS ), X 2 = (X S I I , X I SI , X I I S ), X 3 = X I I I with X = (X 0 , X 1 , X 2 , X 3 ). For N = 3, P is given by

Taking into account the structure of the network (i.e. each node connected to every other node) and using the bookkeeping rules presented in Sect. 2.2.1, the entries of matrix P are given by

For example, here the entries in the first column of matrix A 2 (τ, τ, 0) correspond to the rates of the following transitions, SS I → S I I, SS I → I SI and SS I → I I S. Given that our graph is a completely connected triangle with starting state SS I , the transition through an infection always happens at rate τ since there is only one source of infection. Based on the definition of the sub-matrices above,Ẋ = P X is equivalent to the following systemẊ 0 = B 0 X 0 + C 0 X 1 ,

In terms of the most intuitive notation, this system is equivalent tȯ X SSS = γ (X SS I + X S I S + X I SS ), and from either formulation, the transition matrix is

6.2 Lumping in the case of a complete graph with N = 3

Here, we show how lumping is carried out in the case of a complete graph with three nodes (N = 3). In this case, the differential equations for X 0 , X 1 , X 2 , X 3 are given above. Let us introduce the new (scalar valued) unknown functions

By adding the differential equations corresponding to X 1 1 , X 1 2 , X 1 3 , and those corresponding to X 2 1 , X 2 2 , X 2 3 , we obtain the following four dimensional system for x 0 , x 1 , x 2 , x 3 :ẋ 0 = γ x 1 ,

It is easy to see that matrix Q, as defined in Eq. 22, is given by

with entries based on the sum of columns in matrices A k , B k and C k as they appear in the original matrix P. For example, if only the x 1 → x 2 transition is considered, the original differential equations reduce tȯ

Our aim here is to investigate the limit of large N and compare these to the solution of the scaled mean-field equations as given by Eqs. 15-16.

The idea of comparison for large N is to introduce a continuous, time dependent density function ρ(t, z) instead of the discrete distribution x k (t), with the following formal relation, z = k/N . Following this, in Eq. 26 we can formally changeẋ k to ∂ t ρ(t, z), x k (t) to ρ(t, z), x k−1 (t) to ρ(t, z − 1/N ) and x k+1 (t) to ρ(t, z + 1/N ). This leads to the following partial differential equation ∂ t ρ(t, z) = (N z + 1)γρ(t, z + 1/N ) + (N z − 1)(N − N z + 1)τρ(t, z − 1/N ) −(N z(N − N z)τ + N zγ )ρ(t, z). Now using the approximations ρ(t, z + 1/N ) = ρ(t, z) + ∂ z ρ(t, z)/N , ρ(t, z − 1/N ) = ρ(t, z) − ∂ z ρ(t, z)/N , and after some algebra we obtain ∂ t ρ(t, z) = (N z + 1)γ ∂ z ρ(t, z)/N + (2N z − N − 1)τρ(t, z) −(N z − 1)(N − N z + 1)τ ∂ z ρ(t, z)/N + γρ(t, z).

Substituting τ = β/N , neglecting the 1/N and 1/N 2 terms and writing ρ instead of ρ(t, z), we obtain the following first order partial differential equation for ρ ∂ t ρ = zγ ∂ z ρ + (2z − 1)βρ − z(1 − z)β∂ z ρ + γρ.

Introducing the function g(z) = γ z − βz(1 − z), the equation for ρ becomes ∂ t ρ = ∂ z (gρ).

This first order partial differential equation needs an initial condition of the following type ρ(0, z) = ρ 0 (z).

The desired initial condition can be obtained from the initial condition of Eq. 26. This latter initial condition can be written as

x m (0) = 1, for some m, x l (0) = 0, for l = m,

that is at the initial instant there are m infected nodes. Since the formal relation between the variables is z = k/N , the above initial condition yields ρ 0 (z) = 1 for m N < z < m + 1 N and ρ 0 (z) = 0 otherwise.

Finally, we want to determine the expected value of the infected and susceptible nodes from the first order PDE. Thus we have to find the functions corresponding to [s](t) and

[i](t) in Eq. 27. Using z = k/N and changing the term x k (t) to ρ(t, z), we note that the sums in Eq. 27 correspond to integrals. Namely, [i](t) corresponds to

and this sum is an approximation of the integral ( 31) The mean-field equation (Eq. 16) can be solved explicitly and the solution is given bỹ

where i 0 =ĩ(0) is the initial condition and

The first order PDE (Eq. 28) can also be solved explicitly, and then Eq. 31 yields

.

Having these explicit formulas for i * (t) andĩ(t), it is easy to see that i * is not a solution of the mean-field equation (Eq. 16) but it can be proved that as N → ∞ it tends to the solution of Eq. 16. Namely, we have the following Lemma.

Lemma 6 Let ρ be the solution of the system given by Eq. 28 with initial condition given by Eq. 29. Let i * (t) be defined by Eq. 31. Letĩ(t) be the solution of the scaled mean-field equation given by Eq. 16 with initial conditionĩ(0) = m/N . Then for any t ≥ 0 we have The Lemma can be proved by using the explicit formulas for i * (t) andĩ(t).

Now the proof of the Theorem can be concluded as follows. We want to prove that the scaled expected value [i](t) tends to the solutionĩ(t) of the scaled mean-field equation as N → ∞. In order to prove this, we introduced a first order PDE that can be considered the limit of Eq. 26 as N → ∞. Using this PDE, we defined the function i * (t) that corresponds to [i](t). According to Lemma 6, i * (t) is close toĩ(t) for large N . Hence, we only have to show finally that [i](t) is close to i * (t). Thus the proof of Theorem 1 will be complete if the following Lemma is verified.

Lemma 7 Let x k be the solution of Eq. 26 satisfying the initial condition given by Eq. 30, and let ρ be the solution of Eq. 28 with initial condition given by Eq. 29. Let [i](t) and i * (t) be defined by Eq. 27 and by Eq. 31. Then for any t ≥ 0 we have

The proof of the Lemma is based on the fact that the lumped system seen in Eq. 26 can be considered as the discretisation of the first order PDE seen in Eq. 28 in the variable z. It is known even for more general PDEs, see Chaps. 3 and 4 in Hundsdorfer and Verwer (2003) , that the solution of the discretised system tends to that of the PDE as the step size of the discretisation goes to zero, that is in our case N tends to infinity.

A threshold limit theorem for the stochastic logistic epidemic

Epidemics with two levels of mixing

Network analysis: methodological foundations

An analysis of the fixation probability of a mutant on special classes of non-directed graphs

Contact structures in the poultry industry in Great Britain: exploring transmission routes for a potential avian influenza virus epidemic

Mathematical epidemiology of infectious diseases: model building, analysis and interpretation

A deterministic epidemic model taking account of repeated contacts between the same individuals

The foot-and mouth epidemic in Great Britain: pattern of spread and impact of interventions

Lumping complex networks

Modelling the initial spread of foot-and-mouth disease through animal movements

A motif-based approach to network epidemics

Forecast and control of epidemics in a globalized world

A spectral method for aggregating variables in linear dynamical systems with application to cellular automata renormalization

Demographic structure and pathogen dynamics on the network of livestock movements in Great Britain

The effects of local spatial structure on epidemiological invasions

The implications of network structure for epidemic dynamics

Networks and epidemic models

On methods for studying stochastic disease dynamics

Correlation models for childhood epidemics

Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing

A contribution to the mathematical study of epidemics

Disease contact tracing in random and clustered networks

The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread

The effect of network heterogeneity and multiple routes of transmission on final epidemic size

The effect of network mixing patterns on epidemic dynamics and the efficacy of disease contact tracing

Solutions of ordinary differential equations as limits of pure jump Markov processes

Limit theorems for sequences of jump Markov processes approximating ordinary differential processes

Transmission dynamics and control of severe acute respiratory syndrome

Infection dynamics on scale-free networks

Network theory and SARS: predicting outbreak diversity

The quasi-stationary distribution of the closed endemic SIS model

The spread of epidemic disease on networks

Sur les modèles stochastique logistiques en démographie

Correlation equations for spatial ecologies. In: McGlade J (ed) Advanced ecological theory

Pathogen invasion and host extinction in lattice structured populations

Deterministic epidemiological models at the individual level

Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic

Pair approximations for different spatial geometries

Some topics in graph theory

We thank the referees for their careful consideration of our paper and for their constructive comments. Michael Taylor acknowledges full support from EPSRC. Istvan Z. Kiss acknowledges support from EPSRC (EP/H001085/1) and LMS (4901). Péter L. Simon acknowledges support from LMS (4901).