title: SARS incubation and quarantine times: when is an exposed individual known to be disease free?
authors: Farewell, V. T.; Herzberg, A. M.; James, K. W.; Ho, L. M.; Leung, G. M.
date: 2005-10-19
journal: Stat Med
DOI: 10.1002/sim.2206

The setting of a quarantine time for an emerging infectious disease will depend on current knowledge concerning incubation times. Methods for the analysis of information on incubation times are investigated with a particular focus on inference regarding a possible maximum incubation time, after which an exposed individual would be known to be disease free. Data from the Hong Kong SARS epidemic are used for illustration. The incorporation of interval-censored data is considered and comparison is made with percentile estimation. Results suggest that a wide class of models for incubation times should be considered because the apparent informativeness of a likelihood depends on the choice and generalizability of a model. There will usually remain a probability of releasing from quarantine some infected individuals and the impact of early release will depend on the size of the epidemic. Copyright © 2005 John Wiley & Sons, Ltd.

Control of infectious diseases is a major public health concern. After an individual's exposure to infection, opposing biological processes take place both in the infecting organism and in the host, and these result either in that individual's development of clinical evidence of the disease or in an imperceptible host-victory. During this variable period of time, the individual may in turn become infectious to others and thus play a part in generating or perpetuating an epidemic. Historically, attempts have been made to prevent and control epidemics by isolating any individuals who might be incubating the disease for an arbitrary period of time, after which the biological struggle could be assumed complete.
The word 'quarantine', derived from the Latin word quaresma, means 40 and reflects the origin of the practice in the 40-day period of compulsory isolation of ships arriving in Venice in the 14th century. As more has been learned about the different infections, quarantine periods have varied, but when a hitherto unknown disease appears it is extremely difficult to decide what arbitrary period should be applied. And yet this is especially important if there should be no effective treatment for the disease or its infectious state. Controlling or preventing an epidemic then depends solely on releasing no infectious individuals into the general community. But, as was noted earlier, the period of unperceived changes in the individual is variable.

Quarantine was one of the key aspects of infection control introduced during the recent severe acute respiratory syndrome (SARS) epidemic. Individuals who may have been exposed to the SARS virus were quarantined for a fixed period of time, most commonly 10 days. The premise was that those who may have been exposed, but who showed no signs of illness after 10 days, were unlikely to come down with the disease. Since SARS was previously unknown, a quarantine policy offered the only control.

An important paper on epidemiological aspects of SARS was that of Donnelly et al. [1], which made use of data from the Hong Kong experience with SARS. The estimation of the incubation period in this paper was based on only '57 patients with only one exposure to SARS over a limited time scale with recorded start and end dates'. Donnelly et al. [1] assumed a gamma distribution for the incubation times, implicitly therefore assuming the possibility of very long incubation periods. The work reported here arose from a question related to the confidence a community should have that an individual who has passed through the SARS quarantine period is disease-free, and how long the quarantine period should be to make the probability of this very high.
The concept of a maximum incubation time could be relevant to these considerations. There are many issues to be considered in setting a quarantine time, for example the extent of disruption to individuals' lives. Also, and quite sensibly, it can be argued that there is unlikely to be a 'true' maximum incubation time. However, one motivation for a quarantine policy is the assumption that there is a reasonably well-behaved distribution of incubation times and some maximum time beyond which it is biologically quite implausible that symptoms may arise. This time could be the basis for setting a quarantine time. Whether it is helpful to think about quarantine in this way is debatable. To inform this debate, we investigated what might reasonably be inferred about such a maximum incubation time based on the moderately sized samples that would typically be available in the early course of an epidemic. For comparison, brief consideration is also given to the estimation of tail behaviour in untruncated distributions.

Our general premise is that careful specification of the available knowledge concerning the incubation distribution must be central to public health decisions to control epidemics. The work reported here should be viewed primarily as an exploration of statistical methodology that might be useful for this purpose, not as a critique of other approaches or specific estimates, such as those for SARS.

To illustrate the general principles involved, we follow Donnelly et al. [1] and consider a gamma distribution for incubation times. Thus if $T$ is the random variable representing an incubation time, with an observed value of $T = t$, then a gamma distribution for $T$ is specified by the probability density function

$$g(t; a, s) = \frac{t^{a-1} \exp(-t/s)}{s^{a}\,\Gamma(a)},$$

where $t > 0$, $a > 0$ and $s > 0$. The expectation of this distribution is $as$ and the variance is $as^{2}$.
However, we now introduce the assumption that this distribution is truncated at some time $M$, so that $0 < T < M$, and that the density function for $T$ now becomes

$$f(t; a, s, M) = \frac{g(t; a, s)}{G(M; a, s)}, \qquad 0 < t < M,$$

where $g(t; a, s)$ is the untruncated gamma density and $G(M; a, s) = \int_0^M g(u; a, s)\,du$ is its cumulative distribution function evaluated at $M$.

Assume that data are available on $n$ incubation times $t_1, t_2, \ldots, t_n$. Maximum likelihood estimation of the parameters $a$, $s$ and $M$ can then be based on the likelihood function

$$L(a, s, M) = \prod_{i=1}^{n} f(t_i; a, s, M) = \frac{\prod_{i=1}^{n} g(t_i; a, s)}{G(M; a, s)^{n}}.$$

Standard asymptotic distributional results for MLEs will not be applicable for the parameter $M$.

In the consideration of inferential statements concerning $M$, there are parallels with Jeffreys' 'Bus problem' or, more accurately, 'tramcar problem', raised in a letter to Fisher on 10 April 1934 [2, p. 163]. A brief summary is that in a town it is known that tramcars are numbered consecutively and that a new arrival in the town observes a tramcar numbered 100. Can the new arrival infer anything about the number of tramcars, say $N$, in the town? The problem can be extended by allowing the observation of more than one tramcar. Jeffreys considered the use of a prior proportional to $1/N$, after showing that a constant prior leads to no useful inferential statements. A very similar problem is the estimation of $N$ in binomial $(N, p)$ models. In both situations, the choice of the prior can be shown to be highly influential inferentially. In the tramcar problem, the maximum observed number is the MLE for $N$ and is sufficient for its estimation if a uniform distribution is assumed for the observed numbers. It is, however, a biased estimate. A unique unbiased estimate can be derived but the question of optimal interval estimation remains. For the binomial problem, it has been shown that no unbiased estimator of $N$ exists [3].

For the purpose of this paper, we simply define the MLE of $M$ and make no claims for its optimality in any sense. For public health purposes, the upper end-point of some interval of plausible values is more likely to be useful for decision making than a point estimate of the parameter.
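The tramcar analogy can be made concrete with a minimal sketch. It assumes, purely for illustration, that the observed numbers are independent draws (with replacement) from a uniform distribution on 1, ..., N, so that the likelihood is proportional to N^(-n) for N at least as large as the largest observed number. The function name `tramcar_summary` and the 10 per cent cutoff are hypothetical choices, not taken from the paper.

```python
def tramcar_summary(observed, cutoff=0.1):
    """Summarise likelihood inference about N from tramcar numbers.

    Assumes independent uniform draws from 1..N (an illustrative assumption).
    Returns the MLE, the largest N whose relative likelihood is still at
    least `cutoff`, and the relative likelihood function itself.
    """
    n = len(observed)
    m = max(observed)

    def relative_likelihood(N):
        # L(N) is proportional to N**(-n) for N >= m and is 0 otherwise,
        # so dividing by L(m) gives (m / N)**n.
        return (m / N) ** n if N >= m else 0.0

    # The likelihood decreases in N, so the MLE is the largest observed number.
    mle = m

    # Walk upwards until the relative likelihood falls below the cutoff.
    upper = m
    while relative_likelihood(upper + 1) >= cutoff:
        upper += 1
    return mle, upper, relative_likelihood


if __name__ == "__main__":
    mle, upper, _ = tramcar_summary([100])
    print("MLE:", mle, "upper end of 10% likelihood interval:", upper)
```

Under these assumptions the point estimate sits at the lower edge of the plausible range, which is why the upper end-point of the interval, rather than the MLE, is the quantity of interest.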
We consider the likelihood function simply as representing the information available from the data for inference concerning the unknown parameters. Comparison of the shape of the likelihoods is sufficient for the issues considered here and the likelihood function, particularly through providing ratios of likelihoods, is simply regarded as giving the relative plausibilities of parameter values [4, p. 50].

The profile likelihood for $M$ is obtained by maximizing over the remaining parameters,

$$L_P(M) = \max_{a,\,s} L(a, s, M).$$

Since, by definition, $M$ cannot be smaller than the largest observed incubation time $t_{(n)}$, $L_P(M)$ can be defined for $t_{(n)} \le M < \infty$. It will thus provide some indication of the values of $M$ which are plausible given the observed data. It is frequently convenient to standardize this function so that the maximum value is one by dividing by the value of the likelihood function at the MLEs. This function can then be defined as

$$L_P^{*}(M) = \frac{L_P(M)}{L_P(\hat{M})},$$

where $\hat{M}$ is the MLE of $M$. While the MLE of $M$ will be the same, namely $t_{(n)}$, irrespective of the distributional assumption made concerning $T$, the shape of the profile likelihood for $M$, and therefore the range of plausible values for $M$, will depend on the assumption and, in particular, on assumptions about the tails of the truncated distribution.

While the gamma model is well known in epidemic theory, motivated by regarding the incubation period as a fixed number of independent and successive stages of infection, each exponentially distributed, alternatives to the gamma distribution should be considered from a model fitting perspective at least. For illustration, we consider the log-normal distribution. A log-normal regression model can be written as a location scale model $y = \log(t) = \mu + \sigma e$, where $e$ follows a standard normal distribution, $f(e) = (2\pi)^{-1/2} \exp(-e^{2}/2)$, and where $\mu \in \mathbb{R}$ and $\sigma > 0$. The development of a truncated log-normal model follows the development for the truncated gamma given in Section 2.1, as does the likelihood development with $(\mu, \sigma, M)$ replacing $(a, s, M)$ as the set of model parameters. The use of this model is also considered in Section 3.
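A standardized profile likelihood for the truncation time can be sketched for the truncated log-normal model using only the Python standard library. This is an illustrative sketch, not the authors' implementation: the maximization over the location and scale parameters is replaced by a crude grid search, and the names `relative_profile`, `example_times` and `example_grid`, as well as the data values, are hypothetical.

```python
import math


def lognormal_logpdf(t, mu, sigma):
    """Log density of an untruncated log-normal incubation time."""
    z = (math.log(t) - mu) / sigma
    return -0.5 * z * z - math.log(t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Cumulative distribution function of the untruncated log-normal."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_loglik(times, mu, sigma, M):
    """log L = sum_i log g(t_i) - n log G(M); requires M >= max(times)."""
    n = len(times)
    log_g = sum(lognormal_logpdf(t, mu, sigma) for t in times)
    return log_g - n * math.log(lognormal_cdf(M, mu, sigma))


def profile_loglik(times, M, grid):
    """Maximise the log-likelihood over a grid of (mu, sigma) for fixed M."""
    return max(truncated_loglik(times, mu, sigma, M) for mu, sigma in grid)


def relative_profile(times, M_values, grid):
    """Profile likelihood for M divided by its value at the MLE of M."""
    t_max = max(times)  # the MLE of M is the largest observed time
    best = profile_loglik(times, t_max, grid)
    return [math.exp(profile_loglik(times, M, grid) - best) for M in M_values]


# Hypothetical incubation times in days, for illustration only.
example_times = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0]
# Coarse grid: mu from 0.5 to 2.5, sigma from 0.3 to 1.5.
example_grid = [(m / 20.0, s / 20.0) for m in range(10, 51) for s in range(6, 31)]

if __name__ == "__main__":
    for M, r in zip([14, 16, 18, 20, 25],
                    relative_profile(example_times, [14, 16, 18, 20, 25], example_grid)):
        print(M, round(r, 3))
```

Because the likelihood decreases in M for any fixed location and scale, the grid-based profile is non-increasing in M. A numerical optimizer would sharpen the values but should not change that shape.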
More general distributions than the truncated gamma and the truncated log-normal can also be considered. A convenient choice is the so-called log-gamma distribution of Farewell and Prentice [5], which represents a reparameterization and extension of a generalized gamma distribution. With $\mu, q \in \mathbb{R}$ and $\sigma > 0$, the log-gamma model can be written as the location scale model $y = \log(t) = \mu + \sigma w$, where the density $f(w; q)$ for $w$ is

$$f(w; q) = \frac{|q|\,(q^{-2})^{q^{-2}} \exp\{q^{-2}(qw - e^{qw})\}}{\Gamma(q^{-2})}$$

if $q \ne 0$ and, when $q = 0$, is the standard normal distribution. The cumulative distribution function can be written as

$$F(w; q) = \begin{cases} I(q^{-2},\, q^{-2} e^{qw}), & q > 0, \\ \Phi(w), & q = 0, \\ 1 - I(q^{-2},\, q^{-2} e^{qw}), & q < 0, \end{cases}$$

where $I(k, x) = \Gamma(k)^{-1} \int_0^x u^{k-1} e^{-u}\,du$ is the incomplete gamma integral and $\Phi$ is the standard normal cumulative distribution function.

The log-gamma distribution includes the Weibull ($q = 1$) and exponential ($q = \sigma = 1$) distributions as special cases as well as the gamma ($q = \sigma$) and log-normal ($q = 0$). The distribution of $W$ is negatively skewed for $q > 0$ and positively skewed for $q < 0$. Another alternative truncated distribution for incubation times is, therefore, the truncated log-gamma. The development of a profile likelihood for $M$ will follow as in Sections 2.1 and 2.2 with maximization over $\mu$, $\sigma$ and $q$. The use of this more general distribution is also illustrated in Section 3.

We consider data from 128 SARS cases, a subset of 1755 cases in a Hong Kong Hospital Authority database, for which some information was available on time of infection. The data consist of the date of the appearance of the symptoms of SARS and an earliest and latest possible date of exposure. Initially, we restrict attention to 67 cases whose interval of possible exposure times is less than 5 days and also exclude 10 cases recorded as having first symptoms on the date of exposure. These may represent questionable records or cases related to an unusually high level of exposure, possibly hospital acquired, not of general relevance for setting quarantine times for controlling community outbreaks. Relatively short intervals of exposure times are used to provide some reasonably precise information concerning incubation, as is done in AIDS seroconverter cohorts [6].
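The shape of the log-gamma error density can be explored with a short sketch. It assumes the standard Farewell and Prentice form, with density proportional to exp{q^(-2)(qw - e^(qw))} for nonzero q and the standard normal at q = 0. The function names are hypothetical, and the check that the density integrates to one uses a simple trapezoidal rule rather than anything from the paper.

```python
import math


def log_gamma_density(w, q):
    """Density of the log-gamma error term w for shape parameter q.

    Assumes the Farewell-Prentice parameterization; q = 0 is the
    standard normal limit.
    """
    if q == 0:
        return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    k = 1.0 / (q * q)
    # Work on the log scale to avoid overflow when |q| is small (k large).
    log_f = (math.log(abs(q)) + k * math.log(k)
             + k * (q * w - math.exp(q * w)) - math.lgamma(k))
    return math.exp(log_f)


def integrate_density(q, lower=-20.0, upper=20.0, steps=4000):
    """Trapezoidal approximation to the integral of the density."""
    h = (upper - lower) / steps
    total = 0.5 * (log_gamma_density(lower, q) + log_gamma_density(upper, q))
    total += sum(log_gamma_density(lower + i * h, q) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    for q in (-0.5, -0.13, 0.0, 0.61, 1.0):
        print(q, round(integrate_density(q), 6))
```

With q = 1 the density reduces to exp(w - e^w), the extreme value form underlying the Weibull model, and for q close to zero it is numerically close to the standard normal.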
Table I provides some comparison of the 67 cases with infection intervals less than 5 days with the cases with longer intervals. The variables examined were age, sex, health care worker status, vital status on hospital discharge and lactate dehydrogenase (LDH) level, where higher values of LDH reflect more severe disease. It can be seen that while the cases are similar in age, sex and worker status, there is a higher death rate and some evidence of more severe disease in the cases with the longer possible infection intervals. This may reflect the fact that more severe cases arriving at a hospital might well have had a longer period with the disease and be less able to characterize precisely their possible time of infection. The impact of extending the allowed interval size is examined later. There remains, of course, the implicit assumption that the cases with some information on infection time are a random sample of the entire distribution of cases. However, the possibility of biases in reporting, heterogeneity in routes of transmission or varying infectious doses of the SARS coronavirus remains.

Table II presents the longest and shortest possible incubation times for these patients as well as the average of these two times, rounded to the nearest day since that is how the data would normally be recorded. We consider first the averaged times. For the data set of averaged times, Figure 1 presents the profile likelihoods, $L_P^{*}(M)$, based on the gamma, log-normal and log-gamma models discussed in Section 2. Figure 1 allows the comparison of the apparent information in the data set under the different modelling assumptions. For the truncated gamma model, the profile likelihood never drops below 60 per cent, suggesting that any value of $M$ greater than the maximum time observed, 14, is plausible. Thus it appears that the data are uninformative with respect to the maximum incubation time if a gamma distribution is assumed.
However, the situation is different for the truncated log-normal model. While the MLE for $M$ is again 14 days for this model, any value for $M$ greater than 19.5 days makes the data more than 10 times less plausible than does the MLE of 14 days. For public health purposes, it could therefore be argued, based only on such data and an assumed log-normal model, that a quarantine time of 20 days might be necessary to ensure that SARS cases were not released 'too early'. Recall that the focus here is on the upper limit of an interval of plausible values rather than any specific estimator for the maximum incubation time.

A possible reason for the widely different behaviour of the profile likelihood for the two models is a difference in model fit. If we consider the more general log-gamma model that includes both of the other models as special cases, the profile likelihood for $M$ is more informative than that based on a gamma model, but it never falls below a value of 20 per cent. Thus the apparent ability to rule out larger values of $M$ under the log-normal model is not present if a less restrictive model assumption is made. This is true even though the maximum likelihood estimate of $q$ is −0.13, a value close to the value $q = 0$ corresponding to the log-normal model. The hypothesis of a truncated gamma distribution would not be supported within this class of models.

The maximum likelihood estimates of the various models are given in Figure 2 along with a histogram of the data. The estimated log-normal and log-gamma distributions are quite similar while the truncated gamma does not appear to fit the data very well. All the models fail to some extent in reflecting the preponderance of short incubation times.
Since the use of the log-gamma model suggests there is little information for the estimation of a maximum incubation time, this may raise doubts about the assumption of a [...]

To illustrate the different behaviour of the profile likelihoods for the log-normal and log-gamma models, Figure 3 plots the estimated log-gamma and log-normal models fitted when the truncation time is taken to be $M = 18$ days. This shows that the lack of fit is much more pronounced for the log-normal distribution than for the log-gamma, thus reducing the plausibility of $M = 18$ under the log-normal model.

In general, as for the SARS cases in Hong Kong, it will be difficult to specify precisely when the exposure leading to a case occurred. As with many other diseases, therefore, the usual data on incubation times will derive from cases in which the exposure is known to be within a small window of time. This will generate interval-censored incubation time information for each case. Assume that such information leads to a set of data $\{(t_{Li}, t_{Ui});\ i = 1, \ldots, n\}$ where, for individual $i$, the incubation time is known to lie between a lower limit, $t_{Li}$, and an upper limit, $t_{Ui}$. If we assume that $f(t) = g(t)/G(M)$ is the probability density function for the actual incubation time, as in Section 2 but where $g(t)$ is not restricted to be a gamma distribution, then, with only minor modifications, the development given there can be followed for interval-censored data.

The assumption of a maximum possible incubation time $M$ creates some complication because it will limit the intervals of possible incubation times. It is also convenient to assume that all incubation times are interval-censored and that $M$ is only allowed to take values greater than $\max(t_{Li})$. This avoids any possibility of a case contributing to the likelihood via its probability density function for the smallest value of $M$ and via a probability value otherwise.
In principle, other cases could be taken to have a known incubation time if such times were below any plausible values for $M$, but in practice such accuracy does not exist in any event. This type of consideration arises in other non-standard likelihood inference problems [7]. The likelihood function for the estimation of the parameters of $g(t)$ and $M$ can then be written

$$L = \prod_{i=1}^{n} \frac{G(\min(t_{Ui}, M)) - G(t_{Li})}{G(M)}, \qquad M > \max_i(t_{Li}).$$

A profile likelihood for $M$ can be defined in the usual manner. However, it is not possible to determine immediately the MLE, $\hat{M}$, which will lie somewhere between the lowest allowed value, just above $\max(t_{Li})$, and $\max(t_{Ui})$.

To illustrate the effect of interval-censoring, we consider the data in Table II which show the lower and upper limits of the incubation times for the 67 SARS cases, for which average rounded times were used in Section 3. We have subtracted 0.5 from the lowest time in days and added 0.5 to the highest to give appropriate intervals in continuous time and to make all observations interval-censored. As outlined earlier, it is convenient mathematically to make all observations interval-censored. Observations with a single day of presumed exposure are given an interval of width 1 day in our analysis but, in principle, a much narrower interval could be used if the precision could be justified. A brief exploration suggests that this would have little impact on the likelihood.

Figure 4 presents profile likelihoods for $M$ based on the gamma, log-normal and log-gamma models of Section 2. These plots are based on calculations of the likelihoods for values of $M$ at intervals of 0.25 and beginning at $\max(t_{Li}) + 0.5 = 13$. For convenience, the MLE of $M$ has been taken to be the value among these which gives the largest likelihood. Further precision could be achieved but is not likely to be important. It can be seen that while the general pattern of the likelihoods is similar to that in Figure 1, with interval-censoring not even the log-normal likelihood drops to less than the 10 per cent level.
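The interval-censored likelihood with truncation can be sketched as follows, assuming each case contributes the probability that its incubation time falls in its interval, with the upper limit capped at M, divided by G(M). The log-normal choice for G, the function name `interval_censored_loglik` and the example intervals are illustrative assumptions rather than the authors' code or data.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """Untruncated log-normal cumulative distribution function G(t)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_censored_loglik(intervals, mu, sigma, M):
    """Log-likelihood for interval-censored times truncated at M.

    Each (lower, upper) pair contributes
        log{ [G(min(upper, M)) - G(lower)] / G(M) }.
    M must exceed every lower limit, so that each case keeps positive
    probability.
    """
    if M <= max(lower for lower, _ in intervals):
        raise ValueError("M must be greater than the largest lower limit")
    log_G_M = math.log(lognormal_cdf(M, mu, sigma))
    total = 0.0
    for lower, upper in intervals:
        capped = min(upper, M)  # truncation limits the possible interval
        prob = lognormal_cdf(capped, mu, sigma) - lognormal_cdf(lower, mu, sigma)
        total += math.log(prob) - log_G_M
    return total


# Hypothetical intervals in days, already widened by 0.5 at each end.
example_intervals = [(1.5, 4.5), (2.5, 3.5), (4.5, 8.5), (12.5, 15.5)]

if __name__ == "__main__":
    for M in (13.0, 14.0, 16.0, 20.0):
        print(M, round(interval_censored_loglik(example_intervals, 1.5, 0.6, M), 4))
```

Evaluating this function on a grid of M values, maximizing over the other parameters at each one, would give a profile likelihood of the kind described above. Here the location and scale are simply held fixed.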
This is, of course, reasonable in the sense that much less precise information is being assumed about the incubation times and this must impact the precision of inferences. In spite of this slight, but perhaps important, change in the likelihoods, the fitted distributions are not much altered by the interval-censoring. For example, with the log-gamma model and [...] Table II and (1.33, 0.81, −0.10) for the interval-censored data in Table II.

Finally, to show the effect of more extreme interval-censoring, we consider extending the set of data in Table II by including additional SARS cases from Hong Kong whose period of possible exposure, which defines the width of the interval within which their incubation time lies, is thought to be less than 10 days rather than 5 days. This produces a set of data of 86 cases and Figure 5 presents the relative likelihoods for the three models based on these data. The profile likelihoods are seen to be substantially less informative, with the gamma likelihood being virtually flat for $M$ values greater than 16. Note that one of the additional cases has an interval of (13.5, 19.5) for their incubation time in days. The use of censoring intervals of width 10 days is quite large in the context of SARS and could not be recommended in practice.

Consideration of models for incubation times which incorporate truncation may provide valuable information for public health purposes. Nevertheless, as is illustrated in the earlier sections, there might often be insufficient evidence to be very confident about a maximum incubation time, even within the context of a particular model. In this situation, an alternative approach is to set a quarantine time on the basis of percentile estimation, i.e. a quarantine time might be set as the time below which 95 per cent of cases are expected to develop. For comparison with the analyses presented earlier, the use of parametric models for this purpose is considered here.
Model choice will be important since the behaviour of a distribution in the tail is very model dependent. Thus, the log-gamma model, which incorporates a significant component of model choice through the parameter $q$, might be recommended. A more ad hoc approach to model choice could be adopted although the uncertainty involved in the choice might be more difficult to incorporate into inferences.

Figure 6 illustrates the best fitting log-gamma and log-normal distributions, not involving truncation, to the average incubation time data in Table II. The slightly better fit of the log-gamma at shorter times can be seen and there is some difference in the tails. For the log-gamma, the probability of an incubation time greater than 14 days is 0.013 while, for the log-normal, it is 0.032. The MLE for the log-gamma, in contrast to the case with truncated distributions, is further from the log-normal model with $\hat{q} = 0.61$. Essentially this reflects the need for the distribution to drop more quickly at larger values of $T$.

The estimated 95th percentiles for the log-gamma and log-normal distributions are 10.66 and 12.09. Confidence intervals for these values can be derived by simulating from the estimated asymptotic distribution of the MLEs to produce an interval within which 95 per cent, say, of the corresponding simulated percentiles lie. This methodology has been compared with a delta method and a non-parametric bootstrap and performed well for the estimation of a complicated function of MLEs [8]. Based on a simulation of 1000 values, the corresponding 95 per cent intervals are (9.24, 13.68) and (9.95, 15.34). Interestingly, these values suggest the commonly adopted quarantine time for SARS of 10 days is associated with the possibility of 'releasing' approximately 5 per cent of patients 'too early'. In fact, to ensure that this is the maximum fraction released, consideration should be given to longer quarantine times reflecting the upper endpoint of the estimation intervals.
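The simulation-based interval for a percentile can be sketched for the untruncated log-normal model, where the MLEs and their asymptotic variances have closed forms. The sketch assumes the location and scale estimates are asymptotically independent normal with variances sigma^2/n and sigma^2/(2n). The function names, the seed and the data values are hypothetical, and the log-gamma case would need a numerically estimated covariance matrix instead.

```python
import math
import random
from statistics import NormalDist


def lognormal_mle(times):
    """Closed-form MLEs of (mu, sigma) for an untruncated log-normal."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


def lognormal_percentile(mu, sigma, p=0.95):
    """p-th quantile of the log-normal: exp(mu + sigma * z_p)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def simulated_percentile_interval(times, p=0.95, level=0.95, draws=1000, seed=0):
    """Point estimate and simulation interval for the p-th percentile.

    Parameters are drawn from the estimated asymptotic normal distribution
    of the MLEs; non-positive sigma draws are discarded (a pragmatic choice).
    """
    mu, sigma = lognormal_mle(times)
    n = len(times)
    rng = random.Random(seed)
    sims = []
    while len(sims) < draws:
        mu_s = rng.gauss(mu, sigma / math.sqrt(n))
        sigma_s = rng.gauss(sigma, sigma / math.sqrt(2 * n))
        if sigma_s > 0:
            sims.append(lognormal_percentile(mu_s, sigma_s, p))
    sims.sort()
    tail = (1.0 - level) / 2.0
    lower = sims[round(tail * draws)]
    upper = sims[round((1.0 - tail) * draws) - 1]
    return lognormal_percentile(mu, sigma, p), (lower, upper)


# Hypothetical incubation times in days, for illustration only.
example_times = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0]

if __name__ == "__main__":
    point, (lo, hi) = simulated_percentile_interval(example_times)
    print(round(point, 2), (round(lo, 2), round(hi, 2)))
```

For quarantine purposes the upper end of the returned interval is the quantity of interest, in line with the remark above about reflecting the upper endpoint of the estimation intervals.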
Note that if the interval-censored data in Table II are used to fit the log-gamma model, then the estimated 95th percentile is 10.2 with a confidence interval of (8.64, 13.68), an interval 14 per cent longer than that for the average data.

The present paper explores methodology to characterize the available knowledge on incubation times early in an infectious epidemic. Issues such as different routes of infection or different subsets of infectious individuals have not been discussed. In principle, the models used could be extended to incorporate explanatory variables defined by such factors. Preliminary investigations of possible explanatory variables in the Hong Kong data did not reveal any strong relationships.

We have made pragmatic decisions as to which data to include for model fitting. These might warrant revisiting in a more comprehensive analysis. Also, since infection events cannot be observed, some data on incubation times will inevitably be 'guesses'. Many aspects of the comparison of methodologies will not be altered by this, but such data will naturally give rise to interval-censoring, which the methodologies discussed here do allow. A further extension is to consider individuals with more than one period of possible exposure prior to the development of symptoms. Meltzer [9] considers a simple simulation approach to this.

Definitive conclusions about the choice of statistical methodology are not warranted based on the investigations reported here. In the early days of an epidemic this will usually be the case. Thus, the range of inferences based on different methodologies will often be the basis of decisions. Nevertheless, some comments can be made. Inference concerning a truncation parameter is apparently more informative the stronger the assumptions made about the form of the incubation distribution.
In the absence of independent reasons to make such an assumption, however, the use of a general model, such as the log-gamma, for inference should be considered, at least as part of a sensitivity analysis. The key aspect of such inferences will be the shape of the tails encompassed in the model for the incubation times.

In the absence of precise information on a truncation time, estimation of percentiles provides a natural way to fix quarantine times. It can also be argued that this approach is less risky, and more realistic, than making the assumption of a truncated distribution. Because of its flexibility in the tails, the log-gamma can also be recommended for percentile estimation. Investigation of other methods is warranted. Possibilities would include the use of sample quantiles to define non-parametric confidence intervals for population quantiles [10, Chapter XI, Section 3.1] or the asymptotic distribution of sample quantiles [4, Appendix A.2.3]. Whatever method is adopted, the uncertainty involved in any estimation of percentiles should be incorporated into public health decisions.

In the setting of quarantine times, other factors must also be considered. Meltzer [9] presents evidence for some SARS incubation times greater than 10 days. It appears, based on the data presented here, that a quarantine time of 10 days for SARS might release one infectious patient in twenty. Therefore, for a quarantined population of 200, this would correspond to 10 individuals, but the larger the quarantined population, the larger the number of released infectious individuals. Thus the length of a quarantine period might well be set in light of the expected number of quarantined individuals. Also, consideration of the psychological and economic impact of quarantine on individuals and the population as a whole must be balanced against the risks associated with early release of infected individuals.
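The arithmetic behind the "one in twenty" statement can be sketched in a few lines. Following the text, the fraction 0.05 is applied directly to the quarantined population. The second function additionally assumes that individuals are released independently, which is an illustrative assumption and not a claim made in the paper; both function names are hypothetical.

```python
def expected_early_releases(n_quarantined, fraction_late=0.05):
    """Expected number released before symptoms would have appeared."""
    return n_quarantined * fraction_late


def prob_any_early_release(n_quarantined, fraction_late=0.05):
    """Probability of at least one early release, assuming independence."""
    return 1.0 - (1.0 - fraction_late) ** n_quarantined


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, expected_early_releases(n), round(prob_any_early_release(n), 4))
```

The expected count grows linearly with the size of the quarantined population, which is the sense in which the acceptable quarantine length depends on the size of the epidemic.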
Finally, note that the implicit assumption in setting a quarantine time is that quarantine is isolation of $x$ days from the supposed day of contact, whereas it is often implemented as isolation of $x$ days from the first day on which an individual is identified as having been exposed to the disease. This may build in an additional margin of safety from the public health perspective.

[1] Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
[2] Statistical Inference and Analysis. Selected Correspondence of R.A
[3] On maximum likelihood estimation of the binomial parameter n
[4] Theoretical Statistics
[5] A study of distributional shape in life testing
[6] Extending public health surveillance of HIV disease
[7] On a singularity in the likelihood for a change-point hazard rate model
[8] A Markov model for HIV disease progression including the effect of HIV diagnosis and treatment: application to AIDS prediction in England and Wales
[9] Multiple contact dates and SARS incubation periods
[10] McGraw-Hill Kogakusha: Tokyo, 1974.

We thank the referees for their comments that led to an improved presentation. This work was supported by the Medical Research Council (U.K.), the National Science and Engineering Research Council (Canada) and the Research Fund for the Control of Infectious Diseases of the Health, Welfare and Food Bureau of the Hong Kong SAR Government.