key: cord-0005448-aa2h3bqy
authors: de Menezes, Renée X.; Ortega, Neli R. S.; Massad, Eduardo
title: A Reed-Frost model taking into account uncertainties in the diagnostic of the infection
date: 2004
journal: Bull Math Biol
DOI: 10.1016/j.bulm.2003.10.003
sha: 8a48241d708dc039322571dc468ac9aae45677a2
doc_id: 5448
cord_uid: aa2h3bqy

In this paper, we model the epidemic course of a pathogen infection within a semi-closed group which generates clinical signals which do not necessarily permit its ready and certain identification. Typical examples of such a pathogen are influenza-type viruses. We allow for time-varying infectivity levels among individuals, and model the probability of infection per contact as a function of the clinical signals. In order to accomplish this, we introduce a modified chain-binomial Reed-Frost model. We obtain an expression for the basic reproduction ratio and determine conditions which guarantee that the epidemic does not survive in the long term. Since these conditions are functions of the signal's distribution, they can be used to design and evaluate interventions, such as treatment protocols.

The Reed-Frost model was proposed by Reed and Frost in a series of lectures held at Johns Hopkins University (Abbey, 1952). It is a particular case of a chain-binomial model, in which it is assumed that each infected individual infects susceptible individuals independently, and that all individuals have the same contact rate with each other. If $p$ denotes the probability that a contact between a susceptible and an infected individual results in a new case, then, at time $t$, the probability $c_t$ that a susceptible individual becomes infected is equal to the probability of at least one infectious contact, i.e.,

$$c_t = 1 - (1 - p)^{I_t}, \tag{1}$$

where $I_t$ is the number of infected individuals at time $t$.
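As an illustration of the classic chain-binomial mechanism, the following minimal sketch evaluates the infection probability of equation (1) and simulates a few generations. It assumes the usual classic setting in which individuals infected in one generation are removed in the next; the function names, the seed and the parameter values are hypothetical choices made for this example only.

```python
import random


def infection_probability(p, infected):
    """Equation (1): probability of at least one infectious contact."""
    return 1.0 - (1.0 - p) ** infected


def simulate_reed_frost(susceptible, infected, p, generations, seed=0):
    """Simulate (susceptible, infected) counts over discrete generations.

    Assumption: infected individuals are removed after one generation,
    as in the classic Reed-Frost formulation.
    """
    rng = random.Random(seed)
    history = [(susceptible, infected)]
    for _ in range(generations):
        c_t = infection_probability(p, infected)
        # Each susceptible becomes infected independently with probability c_t.
        new_cases = sum(rng.random() < c_t for _ in range(susceptible))
        susceptible -= new_cases
        infected = new_cases
        history.append((susceptible, infected))
    return history


if __name__ == "__main__":
    for s, i in simulate_reed_frost(susceptible=20, infected=1, p=0.1, generations=5):
        print(s, i)
```

Each generation is a single binomial draw, which is why the model is called chain-binomial.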
Time t is assumed to be a discrete variable, and an individual's classification, as either susceptible, infected or resistant, can only change when time changes from t to t + 1. Other assumptions are that the probability of an infectious contact is fixed along the epidemic course and is the same, for all individuals. The Reed-Frost model (1) can be used to describe the spread of any infectious disease affecting closed, uniformly-mixed groups. The group has a constant and small size N , is homogeneous, both from the susceptibility and infectivity viewpoints [see Bailey (1975) ], with individual members spending a significant and constant part of the day in close contact. From the infection viewpoint, the infectious period is assumed to be short compared to the incubation period which, in its turn, is taken as constant. The Reed-Frost model assumes that individuals are classified according to their disease status: susceptible and infected and, in some cases, also resistant or immune. No error involved in the classification process, such as a truly infected individual being classified as susceptible, is considered in the model. For a great number of infectious diseases, however, such a diagnostic test is neither readily nor easily available: examples are influenza and several other viral and bacterial infections. The corresponding diagnostic process involves uncertainty, and is based upon a set of clinical characteristics, often subjective, which we call signals. It is then important to consider, in the epidemic model, the uncertainty involved in the classification process. The homogeneity assumption is unlikely to hold in real epidemics, especially in large groups [see, for example, Becker (1979) ]. In certain cases, the assumption of time-invariant susceptibility/infectivity levels does not hold either. Each individual may have a varying susceptibility level to infections, depending on physical and psychological factors, even within a short-lasting epidemic. 
Infectivity levels may also vary according to similar factors. Indeed, the capacity of an infected individual to produce an infectious contact may depend upon the set of signals developed. Some signals, such as sneezing and coughing in influenza-type infections, may increase the probability that a contact is infectious, while others, such as fever, may decrease it by making the host less prone to contacts.

In this paper, we shall consider studies involving small groups, within which both homogeneous mixing and homogeneous susceptibility still hold. We consider the clinical signals involved in the classification process in the study of the epidemic course. These clinical signals may include symptoms and the results of laboratory and physical examinations. We assume that, after being infected, no resistance is gained and the individual becomes susceptible again.

In Section 2 we model an individual's infectivity as a function of the signals, therefore allowing for time-dependent, heterogeneous infectivity. We consider this model in the context of both prospective and retrospective studies. For each study, we obtain expressions for the epidemic basic reproduction ratio in Section 3, and for its probability function in Section 4. In Section 5 conditions are given under which the proposed model is reduced to the classic Reed-Frost model. Finally, in Sections 6 and 7 we illustrate how the results can be applied to study long-term disease establishment and impact of intervention, respectively.

2.1. Overview.

Clinical signals are recorded and are taken into account in the epidemic course via a signal summary, both as part of the classification process and to define the probability of an infectious contact. It is assumed that the higher the signal summary, the higher the probability that a contact is infectious.
The probability that an individual has at least one infectious contact, which is the core of the Reed-Frost model, is then computed taking into account the heterogeneous infectivity in the group.

The model can include both signals linked to an increased infectiousness and signals linked to a decreased infectiousness. Both types of signals enter the signal summary, affecting it in opposite directions. Distinct signals can have different weights in the summary, reflecting the impact they are believed to have on both the classification process and the infectious contact probability.

A probability distribution is assigned to the signal summary, conditioning on the previous probability of at least one infectious contact. This distribution is a mixture of the distribution given that the individual is infected and the distribution given that the individual is susceptible. The classification is seen as a probabilistic step conditioned on the signal summary. The probability of an infectious contact is taken as a deterministic, polynomial function of the signal summary.

In this formulation, we take a susceptible individual as reference, and construct a generalized Reed-Frost model. Hereafter, upper case letters such as $(S_{ij,t}, D_{i,t}, G_{i,t}, P_{ij,t}, C_{i,t})$ represent random variables, while lower case letters $(s_{ij,t}, d_{i,t}, g_{i,t}, p_{ij,t}, c_{i,t})$ represent values the corresponding variables may assume. Greek letters $(\alpha, \beta, \eta)$ represent possibly unobservable quantities.

We first assume that, at time $t$, each individual $i$ has a true health status represented by $\eta_{i,t}$, which takes value 1 if the individual is infected at $t$, and 0 if the individual is susceptible. Thus, the number of individuals infected at $t$ is given by

$$I_t = \sum_{i=1}^{N} \eta_{i,t}. \tag{2}$$

Each individual has one or more clinical signals, which can be summarized by one variable $D_{i,t}$, taking values between 0 and 1.
At time $t$, the probability $P_{il,t}$ that a contact between a susceptible individual $i$ and an infected individual $l$ results in a new case is a function of the signals of the infected individual only, $D_{l,t}$, as a consequence of the susceptibility homogeneity assumption. We assume in particular that this function can be written as a polynomial of degree $M$. That is,

$$P_{il,t} \equiv P_{l,t} = P(D_{l,t}) = \sum_{j=1}^{M} \phi_j D_{l,t}^{\,j}, \tag{3}$$

where $0 \le \phi_j \le 1$ and $\sum_j \phi_j = 1$, that is, $P_{l,t}$ is a convex combination of the powers $D_{l,t}^{\,j}$, guaranteeing that $P_{l,t} \in [0, 1]$ for all $l, t$. Then the probability that a susceptible individual has, at time $t$, at least one infectious contact defines our Reed-Frost model as

$$C_t = 1 - \prod_{l=1}^{N} (1 - P_{l,t})^{\eta_{l,t}}. \tag{4}$$

We can interpret $C_t$ here as the probability that an individual is infected at time $t + 1$, as in the classic Reed-Frost model, and we write $C_t = P\{\eta_{i,t+1} = 1\}$.

In some instances $\eta_{i,t}$ is unknown, so individuals have to be diagnosed as either infected or susceptible. This consists of a classification procedure which takes into account the clinical signals or, for simplicity, the signals summary $D_{i,t}$, and is defined outside the model, probably by specialists. Let $G_{i,t} = 1$ indicate that individual $i$ is diagnosed as infected at $t$, and $G_{i,t} = 0$ indicate that the individual is diagnosed as susceptible. The number of individuals diagnosed as infected at time $t$ is an estimator of the number of infected individuals at $t$, and is given by

$$\hat{I}_t = \sum_{i=1}^{N} G_{i,t}. \tag{5}$$

The probability that a contact between a susceptible individual $i$ and an infected individual $l$ results in a new case is defined as before by (3) and, in this case, (4) is estimated as

$$\hat{C}_t = 1 - \prod_{l=1}^{N} (1 - P_{l,t})^{G_{l,t}}. \tag{6}$$

Thus, $\hat{C}_t$ here is the estimated probability that an individual is infected at time $t + 1$.

This approach can be used in two related contexts. One is that of a retrospective study, in which patients' health statuses are observable and modelled as random variables. The objective of such a study is typically to estimate the parameters of the signals' distributions, and it involves relations (2)-(4).
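The generalized contact probability can be sketched in a few lines. The example below evaluates the polynomial of (3) and the probability of at least one infectious contact of (4); it assumes that the polynomial has no constant term (powers 1 to M), and the function and argument names are hypothetical. Passing the true statuses gives (4), while passing the diagnosed statuses gives the estimate (6).

```python
def contact_probability(d, phi):
    """P(D) as a polynomial in the signal summary d.

    Assumption: phi[0] multiplies d**1, phi[1] multiplies d**2, and so on.
    """
    return sum(weight * d ** (j + 1) for j, weight in enumerate(phi))


def at_least_one_infectious_contact(signals, status, phi):
    """1 - product of (1 - P(D_l)) over individuals flagged as infected.

    `status` holds 0/1 flags: true health statuses for expression (4),
    or diagnoses for its estimate, expression (6).
    """
    escape = 1.0  # probability of escaping every infectious contact
    for d, flag in zip(signals, status):
        if flag:
            escape *= 1.0 - contact_probability(d, phi)
    return 1.0 - escape


if __name__ == "__main__":
    signals = [0.5, 0.2, 0.9]  # hypothetical signal summaries
    status = [1, 0, 1]         # hypothetical health statuses
    print(at_least_one_infectious_contact(signals, status, phi=[1.0]))
```

With `phi=[1.0]` the function reduces to the simple case P(D) ≡ D used later in the text.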
Once these parameter estimates are available, the approach can be used in a prospective study, where the true $\{\eta_{i,t}\}$ are not known, due to either time or cost constraints. Such a study could include as an objective evaluating the function $P(\cdot)$, and it involves expressions (3), (5) and (6). It consists of recording patients' clinical signals over a certain period of time, and then estimating their true health statuses using the model and the classification process.

Each infectious disease produces clinical signals with varying degrees of severity, which depend upon both pathogenic and individual variability. Susceptible individuals may also present some of these signals, for reasons other than infection by the pathogen considered, but it is expected that they do so with a lower severity than if they were infected.

The true health status $\eta_{i,t}$ is a binary variable. For $t > 1$ and given all the epidemic information up to $t - 1$, $P\{\eta_{i,t} = 1\}$ is equal to the probability of having at least one infectious contact at $t - 1$, $C_{t-1}$, which is the same for all individuals due to the homogeneous mixing assumption. For $t = 1$, we define $P\{\eta_{i,1} = 1\} \equiv \theta_1$ as the a priori probability that any individual is infected at the epidemic onset, which must be evaluated via population measurements (e.g., the estimated prevalence of the pathogen).

For the pathogen under study, represent by $X_I$ the clinical summary for any infected individual, and by $X_S$ the clinical summary for any susceptible individual. Given an individual's health status $\eta$, $X_I$ and $X_S$ are random variables intrinsically linked to the pathogen, their distributions remaining unaffected by the epidemic progress. Here we shall assume that they take a value within the interval $[0, 1]$ with a distribution within the beta family, as follows:

$$X_I \sim \mathrm{Beta}(\alpha_I, \beta_I), \qquad X_S \sim \mathrm{Beta}(\alpha_S, \beta_S).$$

We define $\mu_I = E(X_I)$ and $\mu_S = E(X_S)$. Note that all moments $E(X_I^k)$, $E(X_S^k)$ of $X_I$, $X_S$ are finite, for all $k = 1, 2, \ldots$ (see appendix for their expressions).
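Because the later results depend on the signal distributions only through their moments, it can help to see how these are computed. The sketch below uses two standard facts about the beta distribution: the raw moment $E(X^k) = \prod_{r=0}^{k-1} (\alpha + r)/(\alpha + \beta + r)$, and the method-of-moments map from a mean and variance to $(\alpha, \beta)$. The function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
def beta_params_from_mean_var(mean, var):
    """Return (alpha, beta) of the beta distribution with this mean and variance."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * total, (1.0 - mean) * total


def beta_raw_moment(alpha, beta, k):
    """E(X^k) for X ~ Beta(alpha, beta), as a finite product."""
    moment = 1.0
    for r in range(k):
        moment *= (alpha + r) / (alpha + beta + r)
    return moment


if __name__ == "__main__":
    a, b = beta_params_from_mean_var(mean=0.5, var=0.05)  # hypothetical values
    print(a, b, beta_raw_moment(a, b, 1), beta_raw_moment(a, b, 2))
```

Since every moment is a finite product, the expectation of any polynomial in the signal summary is available in closed form.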
We also define $\Delta = X_I - X_S$ and $\delta \equiv E(\Delta) = \mu_I - \mu_S$. The observed clinical summary for individual $i$ at time $t$, $D_{i,t}$, is equal to $X_I$ if the individual is infected and it is equal to $X_S$ if the individual is susceptible. So, we can write

$$D_{i,t} = \eta_{i,t} X_I + (1 - \eta_{i,t}) X_S = X_S + \eta_{i,t} \Delta,$$

for all $i = 1, \ldots, N$ and all $t \ge 1$. For $t > 1$, the conditional mean of $D_{i,t}$, given $C_{t-1}$, is given by

$$E(D_{i,t} \mid C_{t-1}) = \mu_S + \delta C_{t-1}. \tag{8}$$

Similarly for $t = 1$,

$$E(D_{i,1}) = \mu_S + \delta \theta_1. \tag{9}$$

In other words, the expected value of the signal summary at time $t$ is a convex combination of the mean signals for infected and susceptible individuals, based upon the probability of being truly infected.

Given $C_{t-1}$, the number of infected, $I_t$, defined by (2), is a sum of conditionally independent binary variables, all with the same probability of success: $C_{t-1}$ for $t > 1$, and $\theta_1$ for $t = 1$. Thus, $I_t$ is in this case a binomial random variable with probability $C_{t-1}$ and sample size $N$.

Since the probability of an infectious contact $P_{l,t}$ is a deterministic function of $D_{l,t}$, it is a constant when $D_{l,t}$ is given. Unconditionally, a probability distribution is effectively assigned to each probability of an infectious contact, $P_{l,t}$, for each individual $l$ at each generation $t$, $l = 1, \ldots, N$, $t > 0$. The probability of at least one infectious contact, $C_t$, does not have a well-known probability distribution, but its conditional expected value, given the clinical summaries $\{D_{i,t}\}$, can be computed.

In the above, we have preferred to perform calculations in terms of $\mu_S$, $\delta$ instead of $\mu_S$, $\mu_I$. This separates the contribution of the signal's distribution, which can be treated as unaffected by the disease spread, from the probability that each individual is infected. Therefore, it incorporates treatment effects naturally as a reduction of the difference between mean signals, $\delta = \mu_I - \mu_S$.

In a prospective study context, most of the structure introduced in Section 2.4 applies, but the diagnostic uncertainty must be included.
When the $\{\eta_{i,t}\}$ are unknown, patients must be classified as either infected or susceptible. The classification is represented by $G_{i,t}$ and defined as a simple process: given the clinical summaries $\{D_{i,t}\}$, each individual is independently classified as infected with probability

$$P\{G_{i,t} = 1 \mid D_{i,t}\} = D_{i,t}. \tag{10}$$

We can interpret the outlined probabilistic structure for $D_{i,t}$, $G_{i,t}$ as defining a conditional binomial distribution for $G_{i,t}$, given the probability of success $D_{i,t}$, which itself has a conditional beta distribution, given the health status $\eta_{i,t}$.

The conditional probability that an individual is classified as infected in generation $t$, given the probability of at least one infectious contact in the previous generation, $C_{t-1}$, is given by

$$P\{G_{i,t} = 1 \mid C_{t-1}\} = E(D_{i,t} \mid C_{t-1}), \tag{11}$$

where either (8) or (9) can be used to re-express this as a function of the signals. The number of patients diagnosed as infected, $\hat{I}_t$, defined by (5), is again a sum of conditionally independent binary variables, given $\{D_{i,t}\}$, but since each one of these has a different probability of success, the distribution of $\hat{I}_t$ is not the usual binomial.

3.1. Definition.

We shall now study the basic reproduction ratio, $R_0$. It is defined as the number of secondary infections resulting from a single case in an entirely susceptible group, during its infectious period (Anderson and May, 1991). In our context, $R_0 = I_2$, given that $I_1 = 1$. Note that this definition of $R_0$ is coherent with the Diekmann et al. (1990) definition of the next generation operator.

In this case, we have

$$E(R_0) = E(I_2 \mid I_1 = 1) = N\,E(C_1 \mid I_1 = 1). \tag{12}$$

The same result is obtained by considering that, given $C_1$, $I_2$ has binomial distribution with mean $N C_1$. Using the definition of $I_1$, we can re-write $E(C_1 \mid I_1 = 1)$ as

$$E(C_1 \mid I_1 = 1) = \sum_{j=1}^{N} E(C_1 \mid \eta_{j,1} = 1, I_1 = 1)\, P\{\eta_{j,1} = 1 \mid I_1 = 1\}. \tag{13}$$

All individuals are equally likely to be the first one infected, so $P\{\eta_{j,1} = 1 \mid I_1 = 1\} = 1/N$. Moreover, given that $\eta_{j,1} = 1$ and all other $\eta_{k,1}$ are equal to zero, we have from (4) that $C_1 = P(D_{j,1}) = P(X_I)$, since this individual is infected.
Therefore, we can re-write (13) as

$$E(C_1 \mid I_1 = 1) = \sum_{j=1}^{N} \frac{1}{N}\, E\{P(X_I)\} = E\{P(X_I)\},$$

and thus

$$E(R_0) = N\,E\{P(X_I)\}. \tag{14}$$

In the simple case where $P(D) \equiv D$, we get

$$E(R_0) = N \mu_I.$$

3.3. Expected $R_0$ in prospective study.

In a prospective study, the diagnosis is made with a certain margin of error, and arguments similar to the ones used in Section 3.2 can be used to derive an expression for the expected value of $R_0$. Indeed, we now have

$$E(R_0) = E(\hat{I}_2 \mid I_1 = 1) = \sum_{j=1}^{N} E(G_{j,2} \mid I_1 = 1),$$

where we must of course be certain that the first recorded case is indeed an infection. By conditioning on $D_{j,2}$, we get $E(G_{j,2} \mid I_1 = 1) = E(D_{j,2} \mid I_1 = 1)$, for all $j = 1, \ldots, N$. Now conditioning on $C_1$, we get $E(D_{j,2} \mid I_1 = 1) = \mu_S + \delta E(C_1 \mid I_1 = 1)$, for all $j$. This means that

$$E(R_0) = N\{\mu_S + \delta E(C_1 \mid I_1 = 1)\}.$$

In Section 3.2 we saw that $E(C_1 \mid I_1 = 1) = E\{P(X_I)\}$, and the same still holds here since, conditioning on having one infected individual at $t = 1$, there is no diagnostic uncertainty at $t = 1$. Thus, we obtain

$$E(R_0) = N[\mu_S + \delta E\{P(X_I)\}]. \tag{17}$$

In particular, when $P(D) \equiv D$, we get

$$E(R_0) = N(\mu_S + \delta \mu_I). \tag{18}$$

4. PROBABILITY FUNCTION FOR $R_0$

4.1. Motivation.

Our main objective with the evaluation of $R_0$ and related functions is to define criteria yielding clues as to the long-term disease establishment. While the expected value gives some clues, it is also important to evaluate how far the value of $R_0$ is likely to spread around it.

The traditional statistical approach is to construct a confidence interval for $R_0$, and check whether the interval contains values leading to long-term disease establishment, in this case any value greater than, or equal to, 1. In our problem, however, this is of limited use: $R_0$ being a random variable assuming only nonnegative integer values $0, 1, 2, \ldots$, all of these values have positive probability mass and, thus, any confidence interval is likely to include the value $R_0 = 1$ at least. A more useful measurement seems to be the probability that $R_0 \ge 1$.
While taking uncertainty into account, this is perhaps more useful: if, under certain conditions, it is known that the probability that $R_0 \ge 1$ is about 20%, then only 1 in 5 independent initial cases is likely to propagate the disease. In order to evaluate the uncertainty around the $R_0$ estimate, we must evaluate its probability distribution.

We start with the retrospective study. We have:

$$P\{R_0 = z\} = P\{I_2 = z \mid I_1 = 1\} = E\left[\binom{N}{z} C_1^{z} (1 - C_1)^{N-z} \,\Big|\, I_1 = 1\right],$$

where we have used the fact that, given $C_1$, $I_2$ has a binomial distribution, with $\binom{N}{z}$ representing the binomial coefficient equal to $N!/[z!(N - z)!]$. If we then condition on the true health status $\{\eta_{j,1}\}$, we can write

$$P\{R_0 = z\} = E\left[\binom{N}{z} P(X_I)^{z} \{1 - P(X_I)\}^{N-z}\right].$$

In particular for $z = 0$,

$$P\{R_0 = 0\} = E\left[\{1 - P(X_I)\}^{N}\right]. \tag{19}$$

For the prospective study, we have

$$P\{R_0 = z\} = E\left[P\Big\{\sum_{j=1}^{N} G_{j,2} = z \,\Big|\, \{D_{i,2}\}\Big\} \,\Big|\, I_1 = 1\right], \tag{20}$$

where the fact that, given $\{D_{i,2}\}$, the random variables $\{G_{i,2}\}$ are conditionally independent of $I_1 = 1$ is used. Since each $G_{i,2}$ is a binary random variable, $\sum_{j=1}^{N} G_{j,2} = z$ occurs whenever exactly $z$ of the $\{G_{i,2}\}$ are equal to 1.

Define $J_z$ as a set of exactly $z$ indices, ranging from 1 to $N$. That is, $J_z$ is a subset of exactly $z$ elements of the discrete set $\{1, 2, \ldots, N\}$. Note that there exist $\binom{N}{z} = N!/[z!(N - z)!]$ such subsets. Let these be represented by $J_{z,1}, J_{z,2}, \ldots, J_{z,\binom{N}{z}}$. Given $\{D_{i,2}\}$, the conditional probability that only the variables within the subset $\{G_{i,2}, i \in J_{z,l}\}$ are equal to 1 is equal to

$$\prod_{i \in J_{z,l}} D_{i,2} \prod_{i \notin J_{z,l}} (1 - D_{i,2}). \tag{21}$$

When taking the conditional expectation given $C_1$, we use the fact that the $\{D_{i,2}\}$ are conditionally independent and we get that (21) is

$$(\mu_S + \delta C_1)^{z} (1 - \mu_S - \delta C_1)^{N-z},$$

which means that (20) becomes

$$P\{R_0 = z\} = E\left[\binom{N}{z} (\mu_S + \delta C_1)^{z} (1 - \mu_S - \delta C_1)^{N-z} \,\Big|\, I_1 = 1\right]. \tag{22}$$

Note that we have hereby shown that, given $C_1$, $R_0$ has conditional binomial distribution with mean $N(\mu_S + \delta C_1)$. For $z = 0$ the right-hand side of (22) becomes $E[(1 - \mu_S - \delta C_1)^{N} \mid I_1 = 1]$ and, given that only one individual is infected, $C_1 = P(X_I)$ as before. Thus, we can write

$$P\{R_0 = 0\} = E\left[\{1 - \mu_S - \delta P(X_I)\}^{N}\right]. \tag{23}$$

Now we use to re-express (24) as Thus, we get

an infective contact given the signal, $P(D)$, is likely to be other than the identity, reflecting lower contact rates between individuals.
For the retrospective study, we compute E(R 0 ) and P{R 0 = 0} using (14) and (19), respectively, whilst for the prospective study we use (17) and (23). In Fig. 1(a) we can see E(R 0 ) as a function of δ in both kinds of study, assuming the variances for both X S , X I are fixed as equal to 0.06, µ S = 0.1 and µ I varies from 0.13 to 0.83 by 0.1. First, we note that the expected value in the retrospective study is an upper bound for the one in the prospective study. This suggests that the uncertainty involved in the diagnostic process implies an underestimation of E(R 0 ). We can also note from Fig. 1 (a) that the two quantities are similar for values of δ = µ I − µ S in the extremes of the range considered, differing more for intermediate values. This was expected: first, when µ I → 1, µ S +δµ I tends to µ S +δ = µ I and, thus E(R 0 ) in the prospective study (17) tends to N µ I , which is E(R 0 ) in the retrospective study, (14). On the other hand, when µ I → µ S , we also have δ → 0 and µ S + δµ I → µ S , and thus in both studies we have E(R 0 ) → N µ S . Note, however, that in spite of the two expectancies converging to the same value as δ decreases, there is a discordance region where one of them is above 1, while the other is below. In Fig. 1(b) we have the computed probabilities of R 0 being greater than zero, as a function of δ, for the same parameter values considered. Differently from the expected values, the probabilities for one study type are not consistently greater than those for the other study type. Perhaps the most important aspect highlighted by this figure is the fact that P{R 0 ≥ 1} may solve apparent discordances between the expected values from different study types. Indeed, for the points within the discordance region in Fig. 1(a) , the computed probabilities lie between 0.5 and 0.7 for both study types indicating that, while long-term disease establishment is not entirely without doubt, the chance is far from negligible in both study types. 
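The comparison described above can be explored numerically with a short script. The sketch below assumes $P(D) \equiv D$, so that $E(R_0)$ is $N\mu_I$ in the retrospective study and $N(\mu_S + \delta\mu_I)$ in the prospective study, and evaluates $P\{R_0 = 0\}$ as $E[(1 - X_I)^N]$ and $E[(1 - \mu_S - \delta X_I)^N]$ respectively, using beta moments. The group size `N = 10` is a hypothetical choice, since the value behind Fig. 1 is not stated in this text, and all function names are illustrative.

```python
from math import comb


def beta_params(mean, var):
    """Beta parameters (alpha, beta) matching a given mean and variance."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * total, (1.0 - mean) * total


def beta_raw_moment(alpha, beta, k):
    """E(X^k) for X ~ Beta(alpha, beta)."""
    moment = 1.0
    for r in range(k):
        moment *= (alpha + r) / (alpha + beta + r)
    return moment


def expected_r0(n, mu_s, mu_i, prospective):
    """E(R0) under P(D) = D for either study type."""
    if prospective:
        return n * (mu_s + (mu_i - mu_s) * mu_i)
    return n * mu_i


def prob_r0_zero(n, mu_s, mu_i, var_i, prospective):
    """P{R0 = 0} under P(D) = D, with X_I beta distributed."""
    alpha, beta = beta_params(mu_i, var_i)
    if not prospective:
        # 1 - X_I ~ Beta(beta, alpha), so E[(1 - X_I)^n] is its n-th raw moment.
        return beta_raw_moment(beta, alpha, n)
    delta = mu_i - mu_s
    base = 1.0 - mu_s
    # Binomial expansion of (base - delta * x)^n, integrated term by term.
    return sum(
        comb(n, k) * base ** (n - k) * (-delta) ** k * beta_raw_moment(alpha, beta, k)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    N = 10  # hypothetical group size
    for step in range(8):
        mu_i = 0.13 + 0.1 * step
        for prospective in (False, True):
            e = expected_r0(N, 0.1, mu_i, prospective)
            p = 1.0 - prob_r0_zero(N, 0.1, mu_i, 0.06, prospective)
            print(f"mu_I={mu_i:.2f} prospective={prospective} E(R0)={e:.3f} P(R0>=1)={p:.3f}")
```

The alternating sum in the prospective case is adequate for small groups; for large `N`, numerical integration against the beta density would likely be more stable.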
The classic Reed-Frost model can be seen as a particular case of both Reed-Frost models introduced in Section 2. Let us consider the more general model for a prospective study; the result follows for the retrospective study model.

Suppose that $\mu_S = 0$ and $\mu_I = \delta = 1$, so that the variances of $X_I$, $X_S$ are both equal to zero. This implies that $D_{i,t} = X_I = \mu_I = 1$ whenever individual $i$ is infected at $t$, and $D_{i,t} = X_S = \mu_S = 0$ otherwise. Then, from (10), $P\{G_{i,t} = 1 \mid D_{i,t}\} = 1$ for an infected individual, and 0 otherwise, with no uncertainty, implying that $\sum_l G_{l,t} = I_t$ in this case. Suppose also that $M = 1$ in (3), meaning that $P_{l,t} = \phi_1 \equiv P$ for all infected individuals, whilst $P_{l,t} = 0$ for all susceptible individuals, for all $t$. Then expression (4) becomes:

$$C_t = 1 - (1 - P)^{I_t},$$

which is equation (1).

6.1. The proposed model.

Conditions under which the disease establishes itself in the group can be obtained by determining conditions under which $E(R_0)$ is greater than, or equal to, 1. Assume for simplicity $P(D) \equiv D$. In a retrospective study, we use expression (14) to say that the disease establishes itself in the population whenever

$$N \mu_I \ge 1.$$

In a prospective study, we use (18) to say that the disease establishes itself in the population whenever

$$N(\mu_S + \delta \mu_I) \ge 1.$$

In this context, it is also interesting to consider $P\{R_0 \ge 1\}$ as a stochastic way of evaluating how likely the disease is to establish itself in the long term. For that, we simply compare the value obtained for $P\{R_0 \ge 1\}$ to a pre-specified threshold; when the probability is below the threshold, the disease may be said to be unlikely to establish itself. This threshold may vary according to disease and context.

We fall into the classic Reed-Frost model when $\mu_S = 0$ and $\mu_I = \delta = 1$ and, in this case, from (26) the epidemic establishes itself whenever $E(C_1 \mid I_1 = 1) \ge 1/N$, or $P \ge 1/N$, which implies that, for large $N$, any epidemic with nonnegligible probability of an infectious contact, $P$, establishes itself in the population.
It is important to point out, however, that this remark is of purely theoretical interest: in practice, when $N$ is large the homogeneous mixing assumption rarely holds.

An important practical application of the models presented here is to studies aiming at intervention design. An intervention may involve simply a change in risk behaviour, thus changing $P(D)$, or it may involve administering treatment, which might affect both $P(D)$ and the signals summary distribution of infected individuals. The signals summary distribution for susceptible individuals, including the mean $\mu_S$, is assumed not to be affected, as it represents the population distribution of the signals summary under study, due to causes other than the disease. For simplicity, we assume that $P(D) = \phi D$.

First, let us consider the impact of a risk-behaviour reducing intervention, which can be represented by a change from $P(D)$ to $P^*(D) = \phi^* D$, where $\phi^* < \phi$. In a retrospective study, the post-intervention $R_0$ expected value is given by $E(R_0^*) = N E[P^*(X_I)] = N \phi^* \mu_I$, from (14). A desirable intervention yields $E(R_0^*) < 1$, which is guaranteed to hold if $\phi^* < (N \mu_I)^{-1}$. In a similar way, such an intervention can be designed to guarantee that

$$P\{R_0^* \ge 1\} < p_0, \tag{27}$$

where $p_0$ is a pre-specified threshold. Indeed, using (19) we get that (27) is satisfied by designing the intervention so that

$$E\left[(1 - \phi^* X_I)^{N}\right] > 1 - p_0$$

holds.

Similar conditions can be obtained to evaluate a priori the intervention impact in a prospective study. Indeed, using (17) we can conclude that the intervention will generate an expected basic reproduction ratio smaller than 1 whenever

$$N[\mu_S + \delta E\{P^*(X_I)\}] < 1.$$

No analytical expressions are available for the roots of this polynomial on $\phi^*$ for general $N$ and, thus, in practice this condition can be used mainly to check whether or not a specific value of $\phi^*$ satisfies it. The post-intervention probability of long-term disease establishment can be evaluated by replacing $P(X_I)$ by $P^*(X_I) = \phi^* X_I$ in (25), in the same way as before.
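A candidate intervention can be screened with a few lines of code. The sketch below assumes $P^*(D) = \phi^* D$, uses the retrospective bound $\phi^* < (N\mu_I)^{-1}$ stated above, and, for the prospective case, evaluates $N(\mu_S + \delta\phi^*\mu_I)$ as obtained by substituting $P^*$ into (17). The function names and the numerical values are hypothetical.

```python
def max_phi_retrospective(n, mu_i):
    """Upper bound on phi* that keeps the retrospective E(R0*) below 1."""
    return 1.0 / (n * mu_i)


def expected_r0_after_intervention(n, mu_s, mu_i, phi_star, prospective=False):
    """Post-intervention E(R0*) under P*(D) = phi_star * D."""
    if prospective:
        delta = mu_i - mu_s
        return n * (mu_s + delta * phi_star * mu_i)
    return n * phi_star * mu_i


if __name__ == "__main__":
    n, mu_s, mu_i = 10, 0.1, 0.5  # hypothetical group size and mean signals
    print("retrospective bound on phi*:", max_phi_retrospective(n, mu_i))
    for phi_star in (0.05, 0.1, 0.3):
        retro = expected_r0_after_intervention(n, mu_s, mu_i, phi_star)
        prosp = expected_r0_after_intervention(n, mu_s, mu_i, phi_star, prospective=True)
        print(phi_star, retro, prosp)
```

Under these formulas the prospective expectation cannot fall below $N\mu_S$, so an intervention acting only on $\phi$ cannot bring it under 1 when $N\mu_S \ge 1$.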
In the same vein, the impact of treatment affecting only the signals distribution can be evaluated prior to introduction by replacing $\delta$ by $\delta^*$ in the expressions for $E(R_0)$ and $P\{R_0 \ge 1\}$. For example, in a retrospective study such a treatment generates on average fewer than one new infected case for each first case whenever

$$N \phi (\mu_S + \delta^*) < 1.$$

A treatment can also affect both $\delta$ and $P(D)$. Expressing the treatment impact in the same way as before, the condition guaranteeing that on average fewer than one new infected case is generated for each first case is

$$N \phi^* (\mu_S + \delta^*) < 1.$$

The treatment impact on the probability of long-term disease establishment, as well as in the prospective study case, can be evaluated in the same way. The impact of a treatment affecting both the signals summary distribution and the probability of an infectious contact can be similarly evaluated, by combining the ideas above.

Several variants of the proposed model can be obtained. Consider first the probability of an infectious contact, which is assumed to be a function of the signals. This function can have any polynomial form, and as such can potentially include any desired function: by assigning beta distributions to the individual signals, not only is a flexible distribution family used, but also one for which all moments are available, so there is no limitation on the polynomial degree. The function of the signals can also be extended to allow some of the signals to yield an increase, and others a decrease, in the infection probability. Furthermore, it can be generalized to take the probability of an infectious contact as a probabilistic, rather than deterministic, function of the signals, or it can even consider frailty, or varying susceptibility levels.

Here we considered signals mainly as being disease symptoms, but in general these may include demographic variables, such as gender and age, as well as behavioural patterns.
These may help in estimating model parameters, such as those involved in the signals distributions and in the classification process, as well as in yielding a better understanding between signals and the probability of an infectious contact. Other variants of the proposed model can be obtained by considering more sophisticated classification procedures, which effectively suggests separating the clinical signals effect on different aspects of the epidemic: the one on the classification process may well be distinct from the one on the infectious contact probability. In the above formulation it was assumed that the infectious period is of constant duration across individuals. If infectious period duration varies across individuals, but its duration can also be seen as a function of clinical signals, then different lags may be used for different individuals. The probability of an infectious contact is thus the product between the probability per time unit and the infectious period duration. The same conditional probability properties can be used as above to derive useful relations between the basic reproduction ratio and the signals' distribution parameters. Some diseases are known to have an infectious period starting before clinical signals onset. This information, if known, can be used in the retrospective study model to yield better estimates for the signals distributions parameters, as well as for the probabilities of infectious contacts. In a prospective study context, however, when decisions must be made with regards to treatment, this information is less useful, as no signals exist to mark the infectious period onset. The main idea behind our model is to use information available on signals to assess both the probability of an infectious contact and the diagnostic procedure. 
The basic reproduction ratio, R 0 , is seen as the number of secondary infections caused over one generation, after the introduction of a single infected individual in an entirely susceptible population. This definition is coherent with another, introduced by Diekmann et al. (1990) , based upon the next generation operator. We believe it is interpretable and suitable for our purposes. Within the context of the proposed model, it is a random variable and, as such, moments and a probability distribution function are available. There have been several attempts to generalize the Reed-Frost model so as to consider a nonhomogeneous group, either from the susceptibility or from the infectivity viewpoint (Maia, 1952; Scalia-Tomba, 1986; Lefèvre and Picard, 1990; Picard and Lefèvre, 1991) . In all these, the homogeneity assumption is relaxed by dividing the main group into subgroups, and considering that there is homogeneous mixing within each subgroup. Subgroups are closed and individuals remain within the same subgroup for the entire duration of the epidemics, which means that an individual's susceptibility and infectivity levels are taken as constant throughout the epidemic course. The Reed-Frost model proposed here handles varying infectivity levels by assuming that infectivity is determined by observable and quantifiable clinical signals, effectively assigning a probability distribution to each individual's infectious contact probability. We do not associate individuals with fixed sub-groups, as this is of limited practical interest in small groups studies, our focus here. Moreover, individual infectiousness may vary with time, a possibility not included in other generalizations. Some of the possible applications involve disease spread within classrooms, hospital wards and work groups. 
Extensions to larger group studies may involve subdivision into subgroups to ensure that homogeneous mixing within subgroups still holds, while allowing for infectiousness heterogeneity within subgroups as proposed here.

The inclusion of heterogeneity in individual susceptibility to infections gives a number of qualitative differences compared to ordinary methods. At present practical applications are not yet available, but the heterogeneity in frailty explains some unexpected results. For instance, in Coutinho et al. (1999) it is shown that large heterogeneity in individual susceptibility to infection results in a decreasing population force of infection with age. This is due to the fact that the population is subject to a heavy selection of highly susceptible individuals, the remaining ones being less and less susceptible with age.

Disease studies to which the proposed model can be applied include all fast-propagating infectious diseases, in the sense that the disease propagates at a faster rate than its diagnosis and control can be performed. Examples of such diseases are the influenza-types, such as SARS, and meningitis. Applications also include several kinds of confinement, such as the forced confinement of hospital wards, and the weather-imposed confinement of classrooms in winter. The minimum required degree of confinement is such that, apart from the initial cases, all infections are acquired within the group. Thus, applications to diseases propagating within a classroom, for example, could only account for new cases within the same classroom.

Finally, we should make clear that our approach does not consist simply of generalizing the classical Reed-Frost model formulation. The inclusion of signals not only yields a more flexible model with heterogeneous susceptibilities, but also leads to conditions for effective intervention designs.
Moreover, it naturally incorporates existing differences among individuals in order to make it applicable to real epidemic scenarios. On the one hand, there are several infections, like influenza, which, besides being transmitted among small groups of individuals, produce highly heterogeneous clinical pictures. On the other hand, the huge amount of genetic information provided by the emerging field of genomics (and proteomics) generates clinical information which may sharply distinguish individuals. These tailor-made diagnostic techniques make obvious the necessity of new tools to deal with heterogeneities.

The authors are grateful to Dr Clarisse V Machado and Dr Ana V A Mendes for many fruitful discussions in the preparation of this work, and for comments from Dr Nico Nagelkerke and from both referees, which helped clarify the text. This work was partially funded by PRONEX, CNPq and LIM01/HCFMUSP.

REFERENCES

An examination of the Reed-Frost theory of epidemics
Infectious Diseases of Humans
The Mathematical Theory of Infectious Diseases and its Applications
Analysis of Infectious Disease Data
Heterogeneities in individual frailties in epidemic models: a theoretical framework
On the definition and the computation of the basic reproduction ratio $R_0$ in models for infectious diseases in heterogeneous populations
A non-standard family of polynomials and the final size distribution of Reed-Frost epidemic processes
Some mathematical developments on the epidemic theory formulated by Reed and Frost
The dimension of Reed-Frost epidemic models with randomized susceptibility levels
Asymptotic final size distribution of the multitype Reed-Frost process

Beta distribution. For any random variable $X$ with beta distribution with parameters $(\alpha, \beta)$, its probability density function is

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad 0 < x < 1,$$

where $\Gamma(x)$ is the gamma function, defined as

$$\Gamma(x) = \int_0^{\infty} u^{x - 1} e^{-u}\, du.$$

Its moments are

$$E(X^k) = \frac{\Gamma(\alpha + \beta)\,\Gamma(\alpha + k)}{\Gamma(\alpha)\,\Gamma(\alpha + \beta + k)}, \qquad k = 1, 2, \ldots$$

In particular,

$$E(X) = \frac{\alpha}{\alpha + \beta}, \qquad \mathrm{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.$$