key: cord-272759-dqkjofw2
authors: Small, Michael; Tse, C.K.; Walker, David M.
title: Super-spreaders and the rate of transmission of the SARS virus
date: 2006-03-15
journal: Physica D
DOI: 10.1016/j.physd.2006.01.021
sha:
doc_id: 272759
cord_uid: dqkjofw2

We describe a stochastic small-world network model of transmission of the SARS virus. Unlike the standard Susceptible-Infected-Removed models of disease transmission, our model exhibits both geographically localised outbreaks and "super-spreaders". Moreover, the combination of localised and long range links allows for more accurate modelling of partial isolation and various public health policies. From this model, we derive an expression for the probability of a widespread outbreak and a condition to ensure that the epidemic is controlled. Moreover, multiple simulations are used to make predictions of the likelihood of various eventual scenarios for fixed initial conditions. The main conclusions of this study are: (i) "super-spreaders" may occur even if the infectiousness of all infected individuals is constant; (ii) consistent with previous reports, extended exposure time beyond 3–5 days (i.e. significant nosocomial transmission) was the key factor in the severity of the SARS outbreak in Hong Kong; and, (iii) the spread of SARS can be effectively controlled by either limiting long range links (imposing a partial quarantine) or enforcing rapid hospitalisation and isolation of symptomatic individuals.

The SARS virus first appeared during October 2002 in the Guangdong province of southern China. It then passed over the border to Hong Kong and from there spread to Europe, Africa, Asia, Australia and the Americas [1]. The outbreak in 2003 infected 8422 people, killing 916 [2]. In this paper we focus on modelling the transmission of SARS within Hong Kong. Besides mainland China, Hong Kong suffered the greatest casualties [2].
In addition, the epidemiological data currently available for Hong Kong is far superior to that of the Chinese mainland.

Two characteristic features were observed during the SARS outbreak in Hong Kong in 2003 (see Fig. 1) [3, 4]: so-called super-spread events (SSE), in which a single individual initiates a large number of cases; and persistent transmission within the community. Two widely cited SSEs were observed early in the epidemic and have been the subject of much attention: at the Amoy Gardens housing estate and at the Prince of Wales hospital. Epidemiological studies [5, 1] have found that in Hong Kong:
• the fatality rate was approximately 17% (compared to 11% globally);
• the mean incubation period was 6.4 days (range 2-10 days) [6];
• the duration between onset of symptoms and hospitalisation was 3-5 days; and,
• the mean number of individuals infected by each case during the initial phase of the epidemic (excluding SSEs) was 2.7 [4].

Standard deterministic SIR (susceptible-infected-removed) models of the spread of infectious diseases [7] make several important assumptions. An alternative approach [8], particularly popular for the study of sexually transmitted diseases [9-11], is to build an explicit network and model disease transmission along the links. Under certain network structures, it is then possible to obtain a closed-form description of the underlying transmission pattern [10, 12]. In particular, Ref. [12] studied the transmission of an epidemic among a population whose individuals are connected both locally and globally. From this, the authors obtained an approximately Poisson contact distribution. In [13], Hufnagel and co-workers also studied the local and non-local transmission of epidemics. The main features modelled by [13] are local SIR infection (with the usual stochastic differential equations) and long distance complex network links to model global aviation routes.
They show that transmission dynamics with similar geographical dispersion to the 2003 SARS outbreak can develop in the model. In contrast to [13] , we focus on abstract network models within a particular community. A similar global model of SARS transmission could be achieved by introducing an aviation network model on top of the models presented in this paper. Recently, both small-world (SW) and scale-free (SF) networks have been observed in many areas of natural and physical science, including social relationships [14, 15] . The important feature of both SW and SF networks is that they are highly connected: the average path length between random individuals is relatively short. Moreover, for SF networks, the node degree distribution follows a scale-free distribution. Hence, in many areas of natural and physical science, this new model structure has unveiled a rich range of behaviours. In this paper, we apply these methods to the modelling of the spread of SARS in Hong Kong; transmission is only allowed to occur along a limited number of direct links between individuals. By doing this, we will avoid making one of the assumptions underlying standard Susceptible-Infected-Removed (SIR) models: a homogeneous fully connected population. The SIR model assumes that all individuals are susceptible to the disease and all suffer an equal, small, positive probability of contracting the virus. This homogeneous model leads to a continuous and smooth inter-day distribution of infections. Irregularities in this are usually attributed to random variation and non-stationarity in the model parameters. In proposing an alternative to the standard SIR model, we do not claim that the SIR model has failed. Certainly, the power of any model lies in its simplicity, and its ability to capture the important features of a system. Hence, like the SIR model, the complex network model that we describe here is very simple and is described by a very small number of parameters. 
Analysis of the spread of SARS with SIR models shows good correlation between decreasing infection rate r and the introduction of various governmental control measures: quarantine and public awareness campaigns [16] . However, these results are based on a very simplistic model, and localised outbreaks (such as the incident at Amoy Gardens) are not modelled very well. While it is impossible to predict the occurrence of such outbreaks, SW-SF models provide a more realistic physical model of the relationships between individuals in a community and will therefore provide a better picture of the true disease dynamics. Note that, since one cannot actually predict when a particular SSE will occur, one cannot accurately model the timing of the peak in the time series of Fig. 1 . Moreover, it is not possible to accurately model the initial SSE at the Prince of Wales hospital, which "kick started" the SARS outbreak in Hong Kong. To include the same initial spread of SARS in Hong Kong, via a rather singular SSE, that SSE must be included explicitly in the model. In other areas where SW-SF models have been applied, a rich range of behaviours has been observed. Unlike standard differential equation based models, SW and/or SF structures model the underlying network of connections between individuals directly. Our model is designed to capture explicitly the small-world features of social interaction. Our model is not scale-free. To generate a scale-free model for disease epidemics, one needs a power-law distribution of infection links; we consider the theoretical implementation of such a model in a separate paper [16] . Finally, we will also provide one possible answer to the question posed by Galvani and May [17] : "Were SARS superspreaders anomalies, or are super-spreaders characteristic of most infectious diseases?" [17] . We show that, even with uniform rates of infection, super-spreaders will occur, to varying degrees, in a small-world or scale-free [16] network. 
The implication is that super-spreaders are not (necessarily) a result of variable rate of infection. Nonetheless, we argue that the SARS outbreak in Hong Kong was initiated by a single, rather unfortunate, super-spreader event. In the next section, we describe our model and study its behaviour analytically. Subsequent sections present some numerical simulations and a summary of our results. In the following subsections, we define our model structure (Section 2.1) and derive some analytical results concerning the likelihood of a widespread outbreak (Section 2.2). Our aim is to capture accurately the qualitative features of the SARS epidemic with the simplest model (the fewest parameters). We propose four distinct states. Individuals can be susceptible (S), prone (P), infected (I), or removed (R). The transmission path is depicted in Fig. 2 . Infected individuals can cause susceptible individuals, to whom they are linked, to become prone with some probability ( p 1 or p 2 ). By infection we mean the transition from the susceptible state to the prone state. Infected individuals can cause their immediate neighbours to become infected with probability p 1 ; long range links cause infection with probability p 2 . Prone individuals become infected with probability r 0 and, finally, infected individuals become removed with probability r 1 . Just as in the SIR model, we do not distinguish fatalities from recoveries: in either case, the individuals are assumed to have acquired immunity. The model states that we describe here bear a close correspondence to the four states of the SEIR model. The prone state is analogous to the exposed (E) state in the SEIR model. We choose to use the new term P because the epidemiological state is slightly different. By prone we mean both the infected but not infectious (incubation) period and the pre-symptomatic period. For SARS, all infections in Hong Kong could be traced to contact with a symptomatic individual. 
Hence, for the purposes of modelling transmission, we assume that, during the pre-symptomatic period, there is no chance of transmission. Moreover, to calibrate our model with the observed data (which, by definition, are hospital admissions), we further prescribe that the time between the onset of symptoms and hospitalisation is constant (or, equivalently, follows a stationary unimodal distribution).

Fig. 2. The top panel depicts the transmission state diagram: S to P based on the SW structure and the infection probabilities $p_{1,2}$; P to I with probability $r_0$; and I to R with probability $r_1$. The lower panel shows the arrangement of nodes in a small-world network and depicts the distinction between "local" (i.e. short range) and "non-local" (long range) network links. The black (infected) node may infect its four immediate neighbours with probability $p_1$ and three other nodes (hashed) with probability $p_2$.

In our model we explicitly model the geographical structure of the population. We include both "local" and "non-local" links. Because of common transmission of SARS within specific housing estates and districts in Hong Kong, and the (both real and perceived) risk of transmission in workplaces (primarily hospitals and schools) or other public areas, we model these two types of transmission separately. The geographical arrangement of nodes represents the residence of each individual. So, by "local" transmission, we mean only transmission within a family unit (i.e. residents of a single flat), or between adjacent flats. Hence, "non-local" transmission refers to transmission between non-family members due to the mixing of individuals in public spaces. In the context of the SARS outbreak in Hong Kong, this would include transmission within hospitals, schools and public spaces. In our model, we expect SSE to be represented through a single node with a large number of non-local connections.
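The four-state scheme described above (S to P along infectious links, P to I with probability $r_0$, I to R with probability $r_1$) can be sketched as a one-day update for a single node. This is a minimal illustration rather than the authors' implementation: the names `State` and `next_state`, and the choice to treat each infectious link as an independent daily trial, are assumptions made for the example.

```python
import random
from enum import Enum


class State(Enum):
    S = "susceptible"
    P = "prone"
    I = "infected"
    R = "removed"


def next_state(state, n_local_inf, n_remote_inf, p1, p2, r0, r1, rng=random):
    """Return the state of one node after one day (hypothetical helper).

    n_local_inf / n_remote_inf: number of currently infected nodes that
    have a local / non-local link pointing at this node.
    """
    if state is State.S:
        # Assumption: each infectious link is an independent trial, so the
        # node escapes infection only if every trial fails.
        escape = (1.0 - p1) ** n_local_inf * (1.0 - p2) ** n_remote_inf
        return State.P if rng.random() >= escape else State.S
    if state is State.P:
        # Prone nodes become infectious with probability r0 per day.
        return State.I if rng.random() < r0 else State.P
    if state is State.I:
        # Infectious nodes are removed (hospitalised) with probability r1.
        return State.R if rng.random() < r1 else State.I
    return State.R  # removed nodes stay removed
```

Because `rng.random()` lies in [0, 1), a susceptible node with no infectious contacts can never become prone, and setting a rate to 0 or 1 makes the corresponding transition deterministic, which is convenient for checking the logic.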
4 As we will see in what follows, the number of local connections is constant for all nodes.

We fix the population $N$ and assume that there are no other additions to, or deletions from, the population from any other cause for the duration of each simulation. The population of $N$ nodes is arranged in a regular grid, of side length $L$ ($L^2 = N$), and each node is connected directly to $n_1$ immediate neighbours. An infected individual will infect each of its $n_1$ neighbours (provided that they are still susceptible) with probability $p_1$. Furthermore, each node has $n_2$ non-local (i.e. long distance) links (see Fig. 2). These are links to nodes that are geographically remote from one another; infection occurs along these pathways with probability $p_2$. For each node $i$, the number $n_2^{(i)}$ is fixed, and so are the links to its $n_2^{(i)}$ remote neighbours once they are established. The number $n_2^{(i)}$ is chosen to be proportional to a decaying exponential $f_X(x) \propto e^{-x/\mu}$ with parameter $\mu$ proportional to the expected (average) number of links to remote nodes, where $n_2^{(i)}$ ...

5 The grid is assumed to be topologically equivalent to the surface of an annulus in three dimensions, that is, nodes $(1, 1)$ and $(L, L)$ are neighbours.

It is the inclusion of non-local links with a random number of links that can give rise to the network's SW (and, in other cases not considered here, SF) structure. In this paper, we assign an exponentially decaying probability distribution to any number of links, and (for uni-directional links) this is sufficient to generate the necessary SW properties. An SF network requires a power-law distribution of the number of links (which can consequently lead to more nodes with many more links) and we do not examine that case here. The SF distribution of the induced network of actual infections has been considered elsewhere [16]. It is worth considering that, for the model that we present here, the links between nodes are uni-directional.
That is, infection only spreads in one direction. Clearly, the true network of social interaction consists of bi-directional links. But, for the purposes of simulating disease transmission, uni-directional links appear to be a sufficient approximation. The consequence of this is that it becomes easier to generate the small-world (and elsewhere the scale-free) network. Finally, for each simulation we seed the model with one randomly chosen initial infection.

We expect that computational simulations of this network will show that infection spreads locally, just as SARS spread within particular geographical regions of Hong Kong. Moreover, the system can also exhibit non-local infection, as a single individual may infect individuals in distant communities. Occasionally, individuals will infect a large number of others, exactly as was observed at the start of the SARS epidemic in Hong Kong (an SSE). However, unlike the start of the SARS epidemic in Hong Kong, because we seed our population with a single infectious individual, we do not expect to see the initial SSE triggered by that individual (except by chance). Therefore, the initial growth of transmission in our model is exponential rather than an SSE. The only way to overcome this is to include explicitly the "seed" SSE in the model, regardless of network topology.

The epidemic will eventually be contained if the rate of infection is lower than the rate of removal. Intuitively, provided that $(n_1 p_1 + \mu p_2) \gg r_1$, one would expect the disease to become endemic; conversely, if $(n_1 p_1 + \mu p_2) \ll r_1$, the disease will be contained. In what follows, we study this condition more precisely. Moreover, with this model we can analytically compute the probability of an outbreak being self-terminating.
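The network of Section 2.1 and the intuitive comparison of the daily infection rate with the removal rate $r_1$ can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, $n_1 = 4$ nearest neighbours with wrap-around edges is assumed, the number of non-local links is taken to be an exponential variate with mean $\mu$ rounded down, and remote targets are drawn uniformly at random (so a target may occasionally coincide with a local neighbour or the node itself). The value of `p1` in the usage lines is purely illustrative.

```python
import random


def build_links(L, mu, rng):
    """Build uni-directional local and non-local links on an L x L grid."""
    nodes = [(i, j) for i in range(L) for j in range(L)]
    local, remote = {}, {}
    for i, j in nodes:
        # Four nearest neighbours, wrapping around at the grid edges.
        local[(i, j)] = [((i - 1) % L, j), ((i + 1) % L, j),
                         (i, (j - 1) % L), (i, (j + 1) % L)]
        # Assumption: discretised exponential number of long range links.
        n2 = int(rng.expovariate(1.0 / mu))
        remote[(i, j)] = [rng.choice(nodes) for _ in range(n2)]
    return local, remote


def mean_daily_infections(local, remote, p1, p2):
    """Empirical counterpart of n1*p1 + mu*p2 for a generated network."""
    n = len(local)
    mean_n1 = sum(len(v) for v in local.values()) / n
    mean_n2 = sum(len(v) for v in remote.values()) / n
    return mean_n1 * p1 + mean_n2 * p2


if __name__ == "__main__":
    rng = random.Random(0)
    local, remote = build_links(L=50, mu=7.0, rng=rng)
    rate = mean_daily_infections(local, remote, p1=0.05, p2=0.006)
    r1 = 0.25
    print(f"daily infection rate {rate:.3f};",
          "below r1" if rate < r1 else "above r1")
```

Rounding the exponential variate down makes the empirical mean number of remote links somewhat smaller than $\mu$, which is in keeping with $\mu$ being described only as proportional to the expected number of links.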
For a single infectious node, the probability of no further infections on a given day is given by [...]. Hence the probability of no further infections from this node can be closely approximated by the infinite geometric series using the average $P_{no1}$ computed in Eq. (2): [...] (3), provided that $|P_{no1}(1 - r_1)| < 1$. Upon substitution of Eq. (2) into (3), we find that [...] (4). Eq. (4) is the probability of no infections from a given individual, and is therefore a weak lower bound on the probability of no general outbreak.

Now, let us denote the probability of no further infections occurring, given that there are $k$ infectious nodes, by

$P^k_{none} = P^{(k)}_{none} = \mathrm{Prob}(\text{no further infection} \mid k \text{ infectious nodes})$

where, for notational convenience, we will drop the subscript on $P_{none}$. Treating infections as discrete events (i.e. they occur one at a time), we have that $(1 - P^k)$ is the probability of at least one further infection from $k$ infectious nodes. The probability that the epidemic will terminate is given by

$P_{safe} = 1 - \prod_{k=1}^{\infty} (1 - P^k)$ (5)

where $P = P_{none}$ is given by Eq. (4). By expanding Eq. (5) and comparing to the Pentagonal Number Theorem, we find that Eq. (5) can be rewritten as an infinite sum

$P_{safe} = 1 - \prod_{n=1}^{\infty} (1 - P^n) = P + P^2 - P^5 - P^7 + \cdots$ (6)

where the sequence of indices 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, ... is the generalised pentagonal numbers (described as sequence A001318 in [18]). Eq. (6) may also be re-written in terms of the Dedekind eta function, but for the purposes of this discussion it is unnecessary to do so. Nonetheless, for $0 \le P < 1$ this series converges fairly rapidly as the order of the exponent increases.

6 This result was originally proved by Euler in 1775.

The exact probability of a general outbreak can alternatively be obtained by using a branching process method. Following [19], we define the probability generating function for the number of secondary cases produced by a single infectious case in a day.
Then the probability generating function for the overall number of secondary infections from a single primary case is [...] (8). One can then obtain the probability of no general outbreak (i.e. the probability of the disease not becoming endemic) as the smallest solution $x \in [0, 1]$ of $g(x) = x$.

Unfortunately, Eq. (8) cannot be readily used for further analysis. Similarly, although Eq. (5) can be computed easily, it is not in a form that is immediately amenable to further analysis. However, since $P_{safe} \ge P_{none}$, it is clear that only $P_{none} \ll 1$ will make $P_{safe} \approx 0$. Hence, either $\mu \gg 1$ or $p_2 \approx 1$ will lead to widespread infection (as expected). Differentiating (5) with respect to $(1 - p_1)^{n_1}$, we can easily verify that $P_{safe}$ is a monotonic function of both $p_1$ and $n_1$. One can therefore observe that $P_{safe} \approx 0$ if $p_1 \approx 1$ or $n_1 \gg 1$.

The most severe limitation on Eq. (5), and also Eqs. (7) and (8), is that we assume that no infected nodes have common neighbours, and that all of the neighbours are susceptible. In reality, the number of potential infections is limited by the fact that some of the potential neighbours are already infected. It is therefore important to estimate the number of neighbours of an infected node that have not been infected. This is equivalent to estimating the ratio of local and non-local infections in an epidemic. One can consider the network of infected individuals as consisting of a number of "clumps": one clump for each non-local infection (i.e. each clump is seeded by a non-local transmission; all other transmissions within that clump are local).

7 Because of the assumption that infections occur individually and sequentially, the branching process in Eq. (5) is only an approximation to the solution of Eq. (8).

8 We can achieve this as follows. Suppose that there are no non-local infections (i.e. $p_2 = 0$) and that infections grow in a single (roughly spherical) "clump". Then, if the clump consists of $I(t)$ individuals, then the radius of this
clump will be $\sqrt{I(t)/\pi}$ and the number of susceptible individuals is $2\sqrt{I(t)\pi}$. Now, further suppose that all nodes in the clump are infectious (i.e. $r_1 = 0$), then the mean number of links per infected individual is $2\sqrt{\pi/I(t)}$. Even with $r_1 > 0$, as the clump grows, there are, on average, fewer potential infection paths.

Provided $p_2 > 0$, this implies that, as the clump gets bigger, the probability of any given infection being a long range infection will increase. Conversely, as the number of clumps increases, the probability of local infection (relative to non-local infection) will increase.

Let us now estimate the expected number of connections from an infected node. Let $N_S$ denote the expected number of susceptible nodes linked to a random node. If this node is the result of a non-local infection, then we suppose that $N_S = n_1 + \mu$; however, if this is the result of a short range infection, then this number should be lower (certainly no more than $\mu + n_1 - 1$). Now,

$N_S = k n_1 + \mu$ (9)

where $k$ is the proportion of local links that support possible infection and $0 < k \le \frac{n_1 - 1}{n_1}$. Hence, [...] (10). From the preceding geometric argument, if infection grows in a single clump, then $k \approx \frac{1}{2}$. Moreover, $k < \frac{1}{2}$ only if nodes remain infected when they are on the interior of such a "clump" (i.e. when $r_1$ is very low). We would therefore expect that $\frac{1}{2} \le k \le \frac{n_1 - 1}{n_1}$. Note that $k$ is not a model parameter, but rather it is a term in the model that will both depend on the various model parameters and vary with time.

Finally, we now consider the rate of transmission. Let $P(t)$, $I(t)$ and $R(t)$ be the number of prone, infected and removed individuals at time $t$ (in days). The probabilities $r_0$ and $r_1$ can therefore be considered as the rates at which prone nodes become infectious and infectious nodes become removed (respectively). Similarly, $(n_1 p_1 k + \mu p_2) S(t) I(t)$ is the expected number of new infections. Suppose that $S(t) \gg R(t) + I(t) + P(t)$ [...]

9 Moreover, one can estimate the number of clumps $K$.
Observe that $K \approx \frac{\mu p_2}{n_1 p_1 + \mu p_2} \times$ (number of infections). More precisely, n 1 p 2 (n 1 k 2 p 1 + µp 2 ) + µp 2 (n 1 kp 1 + µp 2 ) × I where $I$ is the total number of infections.

where $n_k = (n_1 p_1 k + \mu p_2)$ is the expected number of links for each infectious node. We are now modelling the inter-day process assuming discrete day-to-day dynamics. The reason for this approximation is that the available time series data (which will be the basis of our comparison) is similarly coarse-grained. Assuming that the population is seeded with a single infectious individual, the solution of Eq. (11) is given by [...] (12), in terms of the matrix of eigenvectors and the matrix formed from the corresponding eigenvalues, given by [...]. It then follows that the system has a marginally stable focus (i.e. the epidemic will terminate) if $|\lambda_{2,3}| < 1$, i.e.

$n_k < r_1$ (13)
$n_k r_0 < (2 - r_0)(2 - r_1)$. (14)

The second condition (14) is only violated if $n_k > 1$, which would also violate condition (13). Therefore, the epidemic is controllable provided that $n_k = n_1 p_1 k + \mu p_2 < r_1$. The left hand side of this inequality is the rate of infection and the right hand side is the rate of removal, as expected. In fact, this result is exactly analogous to the equivalent result for the continuous SIR model [7]. Moreover, [...] (15). Computationally, we can see that, as $r_0$ or $n_k$ increases, the rate of growth of the epidemic also increases. Conversely, as $r_1$ increases, the rate of growth decreases. This is as one would expect, as increasing $r_1$ will decrease the number of infectious individuals while increasing either $r_0$ or $n_k$ increases this quantity.

In the following subsections, we confirm the preceding relationships and numerically explore the behaviour of our models under a variety of conditions. As stated, our model has seven explicit parameters: $L$, $n_1$, $\mu$, $p_1$, $p_2$, $r_0$, and $r_1$. For the population of Hong Kong, we set $L = 2700$ ($N = L^2 = 7{,}290{,}000$).
We arbitrarily choose $n_1 = 4$ and set $E(n_2) = \mu = 7$ [16]. One further parameter is the time interval between successive steps in the discrete time simulations from our model. We choose the natural scale of one day. However, it is not obvious that this is the best choice. Given that the data is discretised to this interval, it is perhaps the best choice in this situation. Nonetheless, to confirm that this choice is appropriate, we have repeated our analysis with both longer and shorter time steps. The results are equivalent.

The average incubation period between infection and an individual becoming symptomatic is 6.4 days [6]. The number of days in the prone state can therefore be modelled as the result of a series of independent Bernoulli trials with mean $1/r_0$, and so it follows a geometric distribution $f_X(x) = (1 - p)^{x-1} p$ with $p = r_0$. For a general disease model, it would possibly be more appropriate to have a separate state for the pre-symptomatic but infectious period. Although contact tracing of all SARS patients in Hong Kong has demonstrated that this is not a significant period for SARS, we have repeated the simulations with this state. If the time spent in this state is relatively short, the results are not altered significantly.

In a similar spirit, the time before hospitalisation is 3-5 days, and we model this as a series of independent Bernoulli trials. In an effort to establish the degree to which hospital transmission in our model matches what was observed, we first assume that hospitalisation is equivalent to isolation of infectious individuals. That is, we suppose that infectious individuals are only infectious for 3-5 days. The current weight of evidence suggests that hospital transmission was a crucial factor for the SARS outbreak in Hong Kong during 2003. We will show that this is also the case for our model. Hence, we suppose for now that the average amount of time prior to isolation is 4 days (we will consider larger values later).
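The Bernoulli-trial description of the time spent in a state can be illustrated with a short sketch. It assumes the convention of the geometric distribution quoted above, $f_X(x) = (1 - p)^{x-1} p$ for $x \ge 1$, so that the mean sojourn time is $1/p$; the helper name `days_in_state` is hypothetical.

```python
import random


def days_in_state(rate, rng):
    """Count daily Bernoulli trials up to and including the first success.

    The result follows a geometric distribution on {1, 2, ...} with
    success probability `rate`, and therefore has mean 1 / rate.
    """
    days = 1
    while rng.random() >= rate:  # trial fails with probability 1 - rate
        days += 1
    return days


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [days_in_state(1 / 6.4, rng) for _ in range(20000)]
    # The sample mean should be close to the 6.4-day incubation period.
    print(sum(samples) / len(samples))
```

Only the current state of each node needs to be stored, not the day on which it entered that state, which is why this scheme is cheap in memory.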
In our model, the number of days prior to hospitalisation is also made to follow a geometric distribution with mean $1/r_1$. Hence, the only free parameters are $\mu$, $p_1$ and $p_2$. Without active control, we also know that the average number of new infections per case (excluding SSEs) is 2.7 [4]. In this state, each infectious individual will infect, on average, $n_1 p_1 + E(n_2) p_2$ individuals per day. Since $E(n_2) = \mu$ and we suppose that the time before hospitalisation is $d$ days, we have

$d\,(n_1 p_1 + \mu p_2) = 2.7$.

We set $d = 4$ and therefore $p_1 \approx \frac{1}{n_1}(0.675 - \mu p_2) = 0.135 - \frac{\mu}{n_1} p_2$. In subsequent simulations, we also consider $d = 3, 4, 6$ and also a larger number of new infections than 2.7. In each case, the results are equivalent to those presented here. However, we maintain the values described above in the following discussion. Any deviation of results between these and other values will be highlighted in the text.

10 We do it this way simply because it is convenient to do so. For simulation purposes, this sequence of Bernoulli trials is both easy to implement and (more importantly) requires very modest computational storage resources.

Our initial model parameters are therefore [16]: [...]. Note that, because we have the possibility of P to I transition after zero days, $r_0 = \frac{1}{7.4}$ rather than $\frac{1}{6.4}$. This does not have a significant effect on our results; it is merely a computational convenience.

Now, from Eq. (15) we can deduce that the rate of growth of infection is approximately [...], which, for $r_1 = 0.25$, yields either a growth rate significantly less than exhibited in the data or rates of infection significantly greater. Even for reasonable variation of $d$ and the average number of secondary infections, we obtain similar results. Hence, we conclude that the assumption of no nosocomial transmission is inconsistent with the observed data. Increasing the average infectious time to 6 days gives a substantially higher rate of infection: consistent with the observed data.

Subject to Eq.
(5), we explore which parameter values give a significant probability of the epidemic becoming endemic. From Eq. (13), we have that $P_{safe} = 1$ if $n_k < r_1$. Consistent with the discussion of the preceding section, we start with the assumption that $k = \frac{n_1 - 1}{n_1}$ and set $n_1 = 4$ and $\mu = 7$. Fig. 3 is a plot of the probability of complete infection for various parameter values estimated from the data and Eq. (5). We see that there is close agreement between the theoretical and experimental results. Moreover, we note that a smaller value of $k$ is appropriate for scenarios with a relatively high proportion of local spreading (i.e. $p_1$ larger and $p_2 \approx 0$). This is consistent with the case of spreading within a single clump, and we approach the situation of $k = \frac{1}{2}$. However, in these simulations the best choice is $k > \frac{1}{2}$ in all cases. Only for extremely small values of $r_1$ and $p_2$ would we expect smaller values of $k$. Typically, $k = \frac{n_1 - 1}{n_1} = \frac{3}{4}$ seems to be a good choice.

Furthermore, in Fig. 3 it is evident that the greatest probability of an epidemic becoming endemic is when there is both local and non-local infection. In the model that we have constructed here, local infection spreads approximately geometrically, while non-local infection spreads exponentially. Yet, it is some combination of both that provides the greatest possibility of an outbreak spreading without control. Clearly, exponential growth will lead (all else being equal) to more rapid growth of an epidemic than geometric growth. But, by combining some geometric growth, the epidemic becomes even more dangerous. It seems that the additional local transmissions allow each non-local transmission the possibility of seeding a new infection cluster (rather than just a single point), and each new cluster is more difficult to eradicate than a single point infection. Fig. 4 depicts level curves for the probability of fixed levels of infection.
That is, we compute the parameter values $p_{1,2} \in [0, 1]$ and $r_1 \in [0.1, 0.5]$ required to achieve specific values of $P_{safe}$. We see that only for $p_{1,2} \lesssim 0.2$ is the outbreak likely to be controlled. Moreover, the variation with $r_1$ is not critical, i.e. the range of behaviour for specific $p_{1,2}$ is not great. Finally, in Fig. 5 we plot the probability of the epidemic being controlled for various values of $\mu$ with $p_{1,2} \in [0, 1]$, $r_0 = 0.1$ and $r_1 = 0.25$. Consistent with Fig. 4, results for different values of $r_{0,1}$ did not change significantly. Moreover, from Fig. 5 we see that only for $\mu > 2$ does the infection probability $p_2$ have a significant effect. That is, to limit long range infection, one should aim to reduce the average number of long range links below 2. We find that, for all values of $p_{1,2}$ and $\mu > 2$, there was remarkably little variation in the value of $P_{safe}$. However, the rate of infection (Eq. (15)) did change significantly.

Subject to the choice of parameters in the previous sections, we now simulate the expected dynamics and compare this to the theoretical bounds of Section 2.2. According to theory, restricting the infectious period to five or fewer days does not yield a growth rate large enough to be consistent with the observed data. We test that assertion numerically and find that the observed data (over 1000 individuals infected after 50 days) is inconsistent with the simulation for $r_1 = 0.25$. However, by lowering $r_1$ to 0.165 (and therefore increasing the mean infectious period to six days), we achieve results more consistent with the observed data. We see that only with $r_1 \le 0.165$ do we obtain results for which the true data is not statistically atypical. Moreover, this result is robust to moderate changes of the other relevant parameters, as illustrated in Fig. 6. We conclude that, to obtain results for which the true data is not atypical, we require $r_1 \le 0.165$ (i.e. a mean exposure time of six or more days). Moreover, we see from Fig.
6 that widespread infection is associated with a large number of clusters. The results of the next section corroborate this. To examine the role of clustering more closely, Fig. 7 is a snapshot of a single simulation for parameter values r 1 = 0.25 and p 2 = 0.006. This simulation shows SSE resulting from clustering and highly connected nodes. Moreover, the gradual spread of the disease within a single cluster is evident; one can see that the number of clusters is far less than the number of infections. Time series of the infection total appear qualitatively similar to the Hong Kong SARS data. Specifically, burstiness typical of SSEs is evident. However, the quantitative behaviour of this model is remarkably different to the dynamics observed in the SARS outbreak. The daily number of reported infections in Fig. 1 far exceeds the total number of active infected and prone individuals in Fig. 7 . Finally, we provide simulations of the Hong Kong epidemic and from multiple simulations estimate the likelihood of various outcomes based on the model. We initiate the model with a single infected individual and a relatively low removal rate r 1 . Fig. 8 depicts our results. We can see from Fig. 8 that many of the features of the true data are reproduced well in the simulations. However, two important aspects of the simulations are not sufficiently similar to the data. Firstly, the initial spreading of the disease is exponential, rather than the single SSE observed in the real data. Secondly, the magnitude of the SSEs in the simulations is somewhat smaller than the largest SSEs in the data. This second aspect can be overcome by simply altering the distribution of non-local links. In [16] we describe how a power-law distribution of links can lead to many more extreme events. Conversely, the initial SSE in the data cannot be modelled well by our simulations, except by chance. 
Therefore, to achieve similar initial events, we would have to execute many simulations (and choose only those that suit our purpose), or simply build the SSE into the model. Neither of these approaches is desirable. We prefer to focus on the possibility of SSEs occurring randomly, without explicitly adding them to the model. Hence, we do see SSEs, but not necessarily immediately after the start of the simulation (as we suppose occurred in the true data).

We now wish to demonstrate how the SW model described here improves on stochastic SIR-type models [7]. One very simple way to achieve this is to compare the distributions of statistics computed from simulations of each type of model with the true data. In Fig. 9 we do this, and we find that the small-world model exhibits statistical properties much closer to the observed data. In Fig. 9 we compare the results for the SIR-type model (with stochastic inputs) to the SW model described in this paper. By generating multiple simulations of both models and estimating simple statistical measures from both sets of simulations, we see that the model dynamics of the SW model are much closer to the true data.

Assuming that this model and parameters are accurate, or at least appropriate, we compute the likely behaviour for various epidemics. We generated 1000 realisations of the model depicted in Fig. 8 and computed the total number of casualties. Fig. 10 is a plot of the probability distribution of the number of fatalities for these simulations, and Fig. 11 is the probability distribution for the daily number of infections. We found that the probability of infecting fewer than 20 people was approximately 0.18, while the probability of infecting more than 1000 was 0.27. One can see that, with respect to these gross statistics, the true situation for Hong Kong (1755 casualties) is quite typical (see footnote 12). It is interesting to note, however, that there is a large variation in the number of casualties.
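The outcome probabilities quoted above are empirical fractions taken over repeated realisations of a stochastic model. The following is a minimal Python sketch of that bookkeeping only. The simulator `toy_outbreak_size` is a hypothetical stand-in (a simple discrete-time branching process with illustrative parameters), not the small-world model of this paper, and the function names are our own.

```python
import random


def toy_outbreak_size(rng, contacts=4, p=0.08, r=0.25, cap=2000, max_days=365):
    """Stand-in simulator (NOT the model of this paper).

    Each day every infectious individual meets `contacts` others, infects each
    with probability p, and is removed with probability r. Returns the total
    number ever infected; the run stops once `cap` is reached.
    """
    infected, total = 1, 1
    for _ in range(max_days):
        if infected == 0 or total >= cap:
            break
        new = sum(rng.random() < p for _ in range(infected * contacts))
        removed = sum(rng.random() < r for _ in range(infected))
        infected += new - removed
        total += new
    return total


def outcome_probabilities(final_sizes, small=20, large=1000):
    """Fractions of realisations with fewer than `small` and more than `large` cases."""
    n = len(final_sizes)
    p_small = sum(s < small for s in final_sizes) / n
    p_large = sum(s > large for s in final_sizes) / n
    return p_small, p_large


def fraction_exceeding(final_sizes, observed):
    """Fraction of realisations whose total exceeds the observed total."""
    return sum(s > observed for s in final_sizes) / len(final_sizes)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    sizes = [toy_outbreak_size(rng) for _ in range(200)]
    print(outcome_probabilities(sizes), fraction_exceeding(sizes, observed=1755))
```

If the observed total sits well inside the simulated distribution, so that `fraction_exceeding` is neither close to 0 nor close to 1, the observation is typical of the model in the sense used here.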
In all cases, the parameter values of Fig. 8 provided effective control of SARS transmission after approximately 150 days. From these simulations, we therefore conclude that, with effective control measures in place, the likelihood of a significant outbreak is low.

Footnote 12: From 1000 simulations, 106 exhibited a larger number of casualties.

We summarise the main results of the preceding sections as follows.
• The probability of the disease spread being controlled, P safe , is approximately
• The epidemic is under control if n k = n 1 kp 1 + µp 2 < r 1 (Section 2.2).
• For the endemic case, the rate of growth of infection is
• This model exhibits behaviour consistent with both SSEs and persistent localised transmission (Section 3.4). More extreme SSEs can be observed simply by fattening the tail of the distribution of links. By assigning a power-law distribution to the number of nonlocal links that a node has, one can readily obtain SSE involving many hundreds of secondary infections from a single source.
• An SSE does not imply highly infectious individuals, only highly connected ones (Section 3.4).
• Theoretical results and model simulations are unlike the true data, unless exposure time is significantly greater than an average of three days (Sections 2.2 and 3.3). If the exposure time is three days or less, the rate of growth of the epidemic is significantly lower than that observed in the true data.
• Nosocomial transmission was therefore a key factor in the acuteness of the SARS epidemic in Hong Kong in 2003. Effective control of hospital transmissions would have prevented a serious outbreak (Section 3.3). With respect to nosocomial transmission, our model therefore confirms what has been observed independently, and suggested by many authors.
• Simulations of our model, with minimal parameter variation, produced dynamics indistinguishable from the true data. These same parameter values exhibit a wide variety of behaviours (Section 3.5).
Hence, any effort to actually obtain maximum likelihood parameter estimates from the observed data is futile. Despite this, our calculations show that the true data is certainly typical of our models. Moreover, our models exhibit a marked long-tailedness both in infection times and epidemic lifetimes.
• The likelihood of infecting fewer than 20 people was approximately 0.18, and the likelihood of infecting more than 1000 was 0.27 (Section 3.5).

Apart from deriving analytic expressions for the spreading and control of an epidemic, the main result of this study, when applied to the SARS epidemic in Hong Kong in 2003, is that our data is consistent with the observation that the epidemic that occurred was largely preventable. The primary factors in the severity of the outbreak were poor infection control in the hospital setting and the delayed introduction of quarantine and community isolation practices. This is consistent with various discussions presented elsewhere. However, we note that nosocomial transmission was considered only by supposing (admittedly erroneously) that the time between becoming symptomatic (and infectious) and hospitalisation is constant. Nosocomial transmission is simulated by changing the expected duration between becoming symptomatic (I) and becoming removed (R). Although this is perhaps unconventional from an epidemiological viewpoint, it is probably the best that we can do, as the only data that we have to work from is the date of hospital admission.

This conclusion is in contrast with work done by Pastor-Satorras and colleagues [20, 21] with scale-free SIS type [7] disease models. In that case, they found that the disease would almost always persist and that random immunisation was ineffective [21]. The model presented here emphasised the small-world transmission dynamics created by providing a small number of highly connected nodes.
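To make the structure of such a model concrete, the following is a minimal Python sketch of a discrete-time epidemic on a network with local and long-range links. It is an illustration under stated assumptions rather than the implementation used for the results in this paper: the ring-lattice geometry, the geometric distribution of the number of long-range links (with mean `mu`), the population size and the value of `p1` are all our own illustrative choices. Only the state sequence (susceptible, prone, infectious, removed), the roles of p 1 , p 2 , r 0 , r 1 and µ, and the one-directional long-range links follow the description in the text.

```python
import random

S, P, I, R = 0, 1, 2, 3  # susceptible, prone, infectious, removed


def build_links(n, n_local, mu, rng):
    """Local ring-lattice neighbours plus one-directional long-range links."""
    local = [[(i + d) % n for d in range(-n_local, n_local + 1) if d != 0]
             for i in range(n)]
    far = []
    for _ in range(n):
        # ASSUMPTION: number of long-range links is geometric with mean mu.
        k = 0
        while rng.random() < mu / (1.0 + mu):
            k += 1
        far.append([rng.randrange(n) for _ in range(k)])
    return local, far


def simulate(n=2000, n_local=2, mu=2.0, p1=0.08, p2=0.006,
             r0=0.1, r1=0.25, days=200, seed=0):
    """Return (daily new infections, secondary cases per node, final states)."""
    rng = random.Random(seed)
    local, far = build_links(n, n_local, mu, rng)
    state = [S] * n
    state[rng.randrange(n)] = I      # a single initial infectious individual
    secondary = [0] * n              # infections caused directly by each node
    daily_new = []
    for _ in range(days):
        nxt = state[:]
        new = 0
        for i in range(n):
            if state[i] == I:
                # Every infectious node has the same per-link infectiousness.
                for links, p in ((local[i], p1), (far[i], p2)):
                    for j in links:
                        if state[j] == S and nxt[j] == S and rng.random() < p:
                            nxt[j] = P
                            secondary[i] += 1
                            new += 1
                if rng.random() < r1:    # hospitalised or otherwise removed
                    nxt[i] = R
            elif state[i] == P and rng.random() < r0:
                nxt[i] = I               # prone individual becomes infectious
        state = nxt
        daily_new.append(new)
    return daily_new, secondary, state
```

In this sketch, any spread in `secondary` across nodes arises from differences in the number of links and from chance, since the per-link infection probabilities are identical for every node. Replacing the geometric draw with a heavier-tailed one would be one way to explore the power-law extension discussed in the text.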
We model the inter-connection between nodes with one-directional links (rather than the bi-directional links we would expect for social contacts) because this makes the network much easier to construct, without sacrificing any realism. This model could easily be extended to exhibit scale-free characteristics, by simply altering the distribution of non-local links (to follow a power-law distribution). Doing so would produce simulations with a larger number of highly connected individuals, and therefore we could simulate the largest SSEs in the data. The results presented in this paper show small SSEs (fewer than 100 individuals); extension to the power-law distribution would provide SSEs with a larger number of secondary infections.

We should note that it is very difficult to reliably and accurately fit even a moderate number of parameters to a stochastic model from such limited data, especially when the model, for the same parameter values, can exhibit a wide variety of behaviour. To overcome this, we (a) make our model so simple that the number of parameters is very few (certainly comparable in number to ordinary SIR), and (b) only "fit" the model with typical parameter values and test that, for these parameter values, the observed data is typical. Certainly, we cannot exclude the (quite likely) possibility that other parameter values also produce behaviour of which the observed data is typical. Therefore, in this paper we only draw conclusions based on our observations of parameters that produce behaviours that are either typical or (more importantly) atypical of the observed data.

Although we have observed that our model exhibits more realistic dynamics than SIR models, it is only a model. And, like the SIR model, our model is a compromise between complexity and realism. We intend to address this problem in the future by applying Markov chain Monte Carlo methods [22] to this sparse data [23]. Even our choice of a discrete time model is open to debate.
The choice, with a time interval chosen as one day, was motivated by the data and what was more intuitive to us. We have conducted calculations with both longer and shorter inter-interval time scales and find that the results are invariant under a suitable change in time scale (i.e. we get equivalent results by using a shorter time step).

Moreover, recent epidemiological case studies have found that, overall, in Hong Kong 8% of individuals sharing a flat with a SARS patient contracted SARS [24]. Given an average infectious period of four days (prior to hospitalisation), this implies that the daily probability of transmission is approximately 0.02. This is approximately consistent with the values of p 1 used in this study, and therefore provides further support for our results. Conversely, analysis of the SSE at the Prince of Wales Hospital found that, for a group of medical students, the probability of direct contact with the index patient leading to SARS infection was 10/27 [5]. As this study dealt with a single meeting, this implies a daily infection probability of approximately 0.37: significantly higher than p 2 in our simulations. Hence, although we have concluded that SSEs may occur without individuals necessarily being highly infectious, epidemiological evidence suggests that, in some cases, individuals may indeed be highly infectious.

Finally, we note that, although the methods presented here are applied only to the SARS outbreak in Hong Kong in 2003, these methods are not limited to this situation. Apart from repeating, or modifying, this analysis for different infectious diseases (such as HIV/AIDS), we imagine that these techniques could be useful in the theoretical study of quarantine and isolation practices, as well as disease transmission among isolated communities. Quarantine can be effectively modelled as a limitation of long range transmission. One can easily model a change in quarantine by varying the parameter µ (or alternatively, but not equivalently, p 2 ).
This should provide a new, simpler approach to the study of quarantine to supplement compartmental models such as those described in [25].

World Health Organisation, Consensus document on the epidemiology of SARS
Report of the severe acute respiratory syndrome expert committee
Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions
Cluster of SARS among medical students exposed to single patient
Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
Mathematical Biology
Modeling infection transmission
More realistic models of sexually transmitted disease transmission dynamics: Sexual partnership networks, pair models and moment closure
A versatile ODE approximation to a network model for the spread of sexually transmitted diseases
Likelihood-based inference for stochastic models of sexual network formation, Theor
Poisson approximation for epidemics with two levels of mixing
Forecast and control of epidemics in a globalized world
Six Degrees: The Science of a Connected Age
Collective dynamics of 'small-world' networks
Small world and scale free model for transmission of SARS
Dimensions of superspreading
On-line encyclopedia of integer sequences, Online
Mathematical Epidemiology of Infectious Diseases
Epidemic spreading in scale-free networks
Epidemic dynamics and endemic states in complex networks
Bayesian inference for partially observed stochastic epidemics
Stochastic modelling of ecological processes using hybrid Gibbs samplers
Probable secondary infections in households of SARS patients in Hong Kong
Simulating the effect of quarantine on the spread of the 1918-19 flu in central Canada

This work is supported by funding from the Health, Welfare and Food Bureau of the government of the Hong Kong Special Administrative Region under the Research Fund for Control of Infectious Diseases (RFCID).
DW thanks the Scottish Executive Environment and Rural Affairs Department (SEERAD) for support.