key: cord-0995245-cok7vbmg authors: Soubeyrand, Samuel; Beaudouin, Rémy; Desassis, Nicolas; Monod, Gilles title: Model-based estimation of the link between the daily survival probability and a time-varying covariate, application to mosquitofish survival data date: 2007-07-13 journal: Math Biosci DOI: 10.1016/j.mbs.2007.06.005 sha: 6a901d907b7fa1df10287e916e9b3d61e95b8119 doc_id: 995245 cord_uid: cok7vbmg The survival probability in a group of individuals may evolve in time due to the influence of a time-varying covariate. In this paper we present a model-based approach allowing the estimation of the functional link between the survival probability and a time-varying covariate when data are grouped and time-period censored. The approach is based on an underlying model consisting in non-stationary Markov processes and describing the survival of individuals. The underlying model is aggregated in time and at the group level to handle the group structure of data and the censoring. The aggregation yields a generalized non-linear mixed model. Then, a Bayesian procedure allows the estimation of the model parameters and the description of the link between the survival probability and the time-varying covariate. This approach is applied in order to explore the relationship between the daily survival probability of mosquitofish (Gambusia holbrooki) and their time-varying lengths (small mosquitofish die with a higher rate than large ones because they are more affected by predation, cannibalism and environmental stress). The daily survival probability (DSP) is defined as the conditional probability that an individual survives at the day of interest given it was alive the day before. The DSP is one minus the so-called hazard probability in discrete-time survival analysis [2] . Note that the content of this paper applies for other time scales: one can consider survival probabilities for a second, a week, or a year. Moreover, one can consider, instead of the dichotomy alive/dead, dichotomies such as healthy/infected, student/worker, or not parent/parent. Here, we consider the daily time scale and the dichotomy alive/dead because of the application. Depending on the context, the DSP can depend on time-varying (or time-dependent) covariates; in such a case the DSP is non-stationary in time. For example, consider the outbreak of the severe acute respiratory syndrome (SARS) in 2003, the DSP within the population of SARS in-patients was expected to be positively influenced by improvements, with time, of the knowledge about the disease and the management of the authorities [28, 29] . Other examples arise in prey-predator systems. Indeed, the survival probability of preys may be affected for instance by (i) the time-varying count of predators, (ii) the time-varying count of preys, (iii) the time-varying size of preys. In the case of infectious diseases, the probability for an individual to stay healthy may be affected by the count of infectious individuals at the date before (autoregressive process). In social science, the probability for an individual to stay a student may be affected by the time-varying policy of the government. In order to learn about the relationship between a DSP and a time-varying covariate, one can collect data related to the phenomenon of interest and perform a statistical analysis of these data. But collecting daily data and repeating the measures a large number of days may not be possible. Indeed, each measure may be time and money consuming, and may affect the survival of individuals. That is the reason why, instead of daily measures, one can choose to carry out for several groups of individuals a few measures separated by several days, e.g. one initial measure plus a few additional measures. To get data corresponding to a large range of the time-varying covariate, the groups have to be sampled for different ranges of the covariate. Such a structure for the data generates two problems: (i) the observations are time-period (or interval) censored and (ii) there may be group effects. These two points should be taken into account when a statistical analysis of the data is performed. Some of the models which have been developed in order to explore the link between survival of individuals and time-varying covariates account for time-period censoring; see for instance [8, 17, 19, 20, 23] when time is discrete and [6, 13, 21] when it is continuous. Among the references adapted to discrete time, some ( [17, 20, 23] ) also deal with group effects. In this paper we base our approach on a model which can be viewed as an extension of the one proposed in [23, chapter 3] in the nest survival context, and we combine it with a Bayesian estimation procedure. The model was built by deriving a model at the individual level and day scale: daily survival processes of individuals were modeled as non-stationary Markov processes depending on the DSP; then the survival processes were aggregated at the group level and in time to obtain, for each group, the conditional distribution of the count of surviving individuals at time t 2 given the count of surviving individuals at time t 1 < t 2 . The DSP was modeled as a function of (i) a random group effect, constant with time, and (ii) the time-varying covariate. The resulting model for the count of surviving individuals conditional on the count of surviving individuals at the previous observation day is a generalized non-linear mixed model [7, 14, 25] . This model was used to study the variations in the DSP of mosquitofish (Gambusia holbrooki) with fish length: small mosquitofish die with a higher rate than large ones because they are more affected by predation, cannibalism and environmental stress [1] . Accurate quantitative knowledge about the link between the DSP and the length of mosquitofish is needed (i) for improving the knowledge about life history traits of mosquitofish and, more specifically, (ii) for building an individual-based model describing the dynamics of mosquitofish populations in experimental ecosystems (mesocosms). This model, still under construction in research units of the National Institute for Agricultural Research (INRA, France), is being developed to carry out studies in ecotoxicology (mosquitofish is an experimental animal commonly used in ecotoxicology; see for instance [5, 9, 10, 15, 24] ). A first version of the individual-based model, presented in [12] , describes at a daily time step the early stages of the establishment of a mosquitofish population. In that earlier version, the link between fish length and fish survival was modeled such that below a fixed threshold a fish had constant DSP p 1 and above the threshold a fish had constant DSP p 2 > p 1 . A more realistic description of variations in the DSP with fish length is needed in order to improve the individual-based model in its realism. We precisely propose such a description in this article by performing a statistical analysis of experimental data using the model described in the previous paragraph. As we will see in this paper, the accompanying Bayesian procedure allows us to give strong insight into the relationship between the fish DSP and the fish length. Section 2 presents the grouped and time-period censored data which were collected for the study of the relationship between mosquitofish survival and fish length. Presenting the motivating data allows us to highlight the problems occurring in the statistical analysis when the data are grouped and censored. The survival model and the estimation procedure are detailed in Section 3. The methodology is applied to the mosquitofish survival data set in Section 4. The paper is ended by a discussion in Section 5. Survival and mean length were measured for I = 38 groups of mosquitofish introduced in mesocosms settled in the years 2002, 2003, 2004 and 2005 in the Agrocampus of Rennes, Brittany, France (48N/1.7W). The mesocosms consisted of circular metal tanks (3.0 m diameter, 0.6 m height) filled with various elements for building fish habitats. In January, the tanks were lined up with polyethylene film (0.2 mm thick), filled with tap water (water depth: 45 cm), and sediments were introduced at a uniform layer (about 5 cm thick). The tanks then sat for a few months to allow for phyto-and zoo-plankton as well as macroinvertebrates to naturally settle in the tanks. In April, pieces of Ludwigia peploide were introduced into the mesocosms. Then, at different times in the summer, groups of mosquitofish were introduced in the mesocosms, one group per mesocosm, and nets with small mesh were installed to protect the fish from bird predation. The initial sizes of the groups of mosquitofish varied from 5 to 150 individuals, and the initial mean lengths of fish varied from 6.6 to 29.7 mm (see Table 1 ). In each group, individuals had the same age and thermal history, so their lengths were about the same. For 6 groups (i P 33 in Table 1 ), fish were captured, counted, measured for length and released in the mesocosms two or three times. Such an experimental procedure is particularly time-consuming because, as mosquitofish survival can be affected by the stress of captures, fish must be captured and manipulated with care. That is the reason why, for 32 other groups (i 6 32 in Table 1 ), fish were captured, counted, measured for length only one time after their introduction in the mesocosms. Forty six captures were done in total for the 38 groups of fish. For each capture we know: the mesocosm and the duration in days since the last capture (or fish introduction), the initial and final fish counts, and the initial and final mean fish lengths (see Table 1 ). If daily data would have been obtained for each mesocosm i, i.e. if the counts N i,t and the average lengths X i;t of the surviving fish would have been measured at times t i , . . . , t i + q i , then we could have estimated the DSP p i,t in mesocosm i at day t = t i + 1,. . . , t i + q i bŷ Then, the functional link between the DSP and the expected length of fish could be inferred, for example, by fitting a regression curve [18] to scatterplot S 0 ¼ fð Unfortunately, the procedure for counting fish and measuring their lengths has many difficulties. First, fish were captured by using fish traps, and capturing all the fish of a mesocosm can take a whole day. Second, the fish can hardly support the stress caused by frequent captures and manipulations. Therefore, the data were obtained for each mesocosm i at days t i;1 < Á Á Á < t i;K i where t i,1 represents the day of fish introduction into the mesocosm. The number of captures, K i , was often two and no more than four. The time between observations varied from 8 to 127 days. Thus, the observations of the survival process N i = {N i, t : t = 0,1,2, . . .} and the time-varying covariate . . .g were time-period censored. Despite the censoring, we could think about adapting the method proposed in the case of daily observations by making the two following assumptions. First, assume that the DSP is constant between consecutive observation days, then the fish DSP in mesocosm i at day t between the kth and (k + 1)-th captures, which occur, respectively, at times t i,k and t i,k+1 (k < K i ), can be estimated bỹ Table 1 Mosquitofish survival data set Id., identification number; i, mesocosm; N 0 and N 1 , initial and final counts of surviving fish; X 0 and X 1 , initial and final mean lengths of surviving fish; D t , duration in days separating the initial and final measurements. Second, assume that the fish length increases linearly between consecutive captures, then the expected length EðX i;t Þ for fish in mesocosm i at day t 2 {t i,k , . . . , t i,k+1 } can be estimated bỹ (a more realistic growth assumption will be made in Section 3.4). Using this material, we can plot the fish DSP against the mean fish length: Fig. 1 . It also shows scatterplot S 2 (circles) derived from S 1 by only keeping, for each couple (i, k), the point ðX i;tÀ1 ;p i;t Þ for which t is the median day of {t i,kÀ1 , . . . , t i,k }. The estimation of the functional link between the DSP and the expected length of fish could be done, as in the case of daily observations, by fitting a regression curve to scatterplot S 1 or S 2 . However, fitting a regression curve to scatterplots S 1 or S 2 , as well as S 0 in the hypothetical case of daily observations, is problematic since there exists a temporal dependence between points corresponding to the same mesocosm. In addition, the daily survival probability of fish might be submitted to a mesocosm effect. Indeed, the survival of fish is known to be affected, by the densities of predators and vegetation (filamentous algae principally) [1, 26] . These densities were precisely varying from mesocosm to mesocosm in our experiments since the settlement process of predators and vegetation was natural. Consequently, motivated by building a realistic simulation model for the mosquitofish life, we want to characterize the variability of the DSP due to the possible mesocosm effect. We now describe the construction of the model and the estimation method which allowed us to analyze the data set detailed above. Underlying assumptions are labelled A 1 , A 2 , A 3,aÀc , A 4 . The model and the estimation method allow the handling of time-period censoring and group effects. In the following the context of fish survival is used to present the model; nevertheless, as mentioned in the introduction, the model can be applied in other contexts. Let S i = {S i, t : t = 0,1, . . .} be the survival process of any fish in mesocosm i: at day 0 the fish is alive and S i,0 = 1, then S i,t = 1 as long as the fish is alive and 0 afterward. The process S i is assumed to be (A 1 ) a non-stationary Markov process with transition probabilities at day t = 1,2, . . . where p i,t is the survival probability for any fish in mesocosm i between days t À 1 and t; it is called the daily survival probability (DSP) in ecology [4] . Note that 1-p i,t is the so-called hazard probability in discrete-time survival analysis [2] . Remark. p i,t does not depend on the fish but only on the mesocosm where the fish is and the day; so, fish in a given mesocosm are submitted to the same DSP. Let us now determine the conditional distribution of the count N i;t 2 of surviving fish in mesocosm i at day t 2 given the count N i;t 1 of surviving fish in the mesocosm at day t 1 < t 2 . From the Markovian assumption A 1 , S i;t 2 given S i;t 1 follows a Bernoulli distribution whose probability is the conditional probability that a fish in mesocosm i is alive at day t 2 given it was surviving at day t 1 . This probability is given by Assuming that (A 2 ) survival processes of fish are independent, N i;t 2 given N i;t 1 follows a binomial distribution with size N i;t 1 and probability The DSP p i,t is modeled as (assumption A 3,a ) a function of the expected length EðX i;tÀ1 Þ of fish in mesocosm i at day t À 1 where a i is a real parameter reflecting survival features in mesocosm i and f b is a real-valued function parameterized by a vector b. The observed mesocosms are considered as elements of a population and we are interested in characterizing this population. Consequently, a 1 , . . . , a I are viewed as random effects [14] . They are modeled as (assumption A 3,b ) independent normal variables with mean 0 and standard deviation r. The function f b is assumed (A 3,c ) to take the form Note that when b 2 is zero, then b 3 is no more identifiable and viceversa. Using Eq. (4) for f b provides a flexible form for the DSP in Eq. (3). Thus, the model for the counts of surviving fish is a random-effects model, and Eq. (2) becomes This model is a generalized non-linear mixed model [7, 14, 25] . The non-linearity is with respect to both the random effects and the regression parameters. It is due to the product of the DSPs from day t 1 + 1 to day t 2 , and to parameter b 3 . Compared with the model proposed in [23, chapter 3], b 3 is an additional parameter which provides more flexibility and allows a better description of the relationship between the DSP and the time-varying covariate. (see Section 4). Notation. For n 1 , n 2 in N and a in R, let p t 1 !t 2 i ðn 2 jn 1 ; a; bÞ denote the conditional probability, under model (5) , that N i;t 2 ¼ n 2 given N i;t 1 ¼ n 1 and a i = a. In this subsection, we motivate the choice of a Bayesian estimation procedure based on an MCMC algorithm [7, 16] to infer on the parameters of the survival model. The R code of the estimation procedure can be asked to the corresponding author. Under assumptions (A 1 ) and (A 2 ) set in Section 3.2, conditional variables N i;t i;k jN i;t i;kÀ1 , k = 1,. . . , K i , are independent random variables, and the counting processes of surviving fish are independent from mesocosm to mesocosm. Consequently, the likelihood for r and b is where N ¼ fN i;t : i ¼ 1; . . . ; I; t ¼ t i;1 ; . . . ; t i;K i g and /(AE;r) denotes the density function of a normal random variable with mean 0 and standard deviation r. In Eq. (6), Q K i À1 k¼1 p t i;k !t i;kþ1 i ðN i;t i;kþ1 jN i;t i;k ; a; bÞ is the probability of the survival sequence N i;t i;1 ; . . . ; N i;t i;K i observed in mesocosm i, given a and b. Since a is considered as a random effect, this probability is integrated with respect to a. Thus, the integrated probability depends on b and the standard deviation r of the random effects. We then multiply the integrated probabilities for all the mesocosms to obtain Eq. (6). In statistics, the likelihood is often used to estimate model parameters. But two problems arise for evaluating the likelihood given by Eq. (6) . First, the probability p ðN i;t i;kþ1 jN i;t i;k ; a; bÞ depends on the daily expected values of the fish length EðX i;t Þ, t = t i,k , . . . , t i,k+1 , that are unknown for the experiments. We only observe the sample means X i;t i;k and X i;t i;kþ1 of the fish lengths at the capture days t i,k and t i,k+1 . So, p ðN i;t i;kþ1 jN i;t i;k ; a; bÞ is computed by approximating This approximation was obtained by assuming that (A 4 ) the expected fish length satisfies a Bertalanffy's relationship where X 1 is the maximum fish length and s is the growth rate. So, expression (7) corresponds to a concave growth curve which interpolates observed mean lengths. Second, although probabilities p ðN i;t i;kþ1 jN i;t i;k ; a; bÞ can be computed using the proposal made above, likelihood (6) cannot be evaluated analytically because of the integrals over a. This difficulty is circumvented in [23, chapter 3] by maximizing an approximation of the likelihood where the integrals are assessed using Gaussian quadrature. Instead, we specify prior distributions for r and b and apply an MCMC algorithm yielding posterior joint distributions for r, b but also the group effects a i (i = 1,. . . , I). The MCMC algorithm includes a Metropolis-Hasting within Gibbs algorithm [16] . At each iteration, the unknowns (i.e. log r, b 1 , b 2 , b 3 and the a i s) are updated using Gaussian transition kernels. The kernels are centered on the values of the unknowns at the previous iteration and their standard deviations were 0.1 for log r, b 1 , b 3 and the a i s, and 5 for b 2 . The regression parameters b 1 , b 2 , b 3 are updated simultaneously as well as the group effects a i s. Compared with the estimation procedure used in [23] , the Bayesian approach we propose has two advantages. First, one obtains estimates for the random effects which can be used in a postanalysis (using additional covariates) to better understand the mesocosms characteristics. Second, the assessment of the uncertainty and the correlation structure of the parameter estimators are not based on asymptotic results. Remark. The model can also be fitted via a Monte-Carlo expectation-maximization (MCEM) algorithm [27] . The uncertainty and the correlations of the parameter estimators can be assessed by applying a parameteric bootstrap procedure [11] . Such an approach has the same advantages than the Bayesian approach presented above, but it is much more time consuming. We applied the Bayesian estimation procedure using constant improper priors for log r, b 1 , b 2 and b 3 . Fig. 2 shows the two-dimensional posterior distributions of the parameters. In each panel, the posterior mode is marked by a white dot. The four-dimensional posterior mode corresponds tor ¼ 0:630,b ¼ ðb 1 ;b 2 ;b 3 Þ ¼ ð7:30; À295; À2:25Þ. Note on the bottom right panel that the MCMC runs for b 2 and b 3 are strongly dependent. This feature will be discussed in the perspective of sensibility analysis (see Section 4.3). Using the Bayesian procedure, we fitted the submodel with r = 0 fixed (no mesocosm effect) and the submodel with b 3 = 1 fixed (f b is linear). We compared the fits using two criteria, the deviance information criterion [DIC, 22] and the posterior expectation of the Akaike's information criterion [EAIC, 3] ; lower these criteria are, better the fit is. We computed the DIC and AIC differences, say DDIC and DEAIC, between the complete model and each of the two submodels. For the complete model against the submodel with r = 0: DDIC = À 39 and DEAIC = À 49. For the complete model against the submodel with b 3 = 1: DDIC = À15 and DEAIC = À11. Obviously the complete model has lower criteria, and according to the rule given in [22] , the differences are large enough to state that the submodels have considerably less support than the complete model. Consequently, both the mesocosm effects and the non-linearity of f b allows a more accurate description of the relationship between the fish DSP and the fish length. We performed a residual analysis to check whether the parametric link we specified in Eqs. (3) and (4) is suitable or not. We used standardized residuals for the ratios N i;t k =N i;t kÀ1 (i = 1,. . . , I and k = 2,. . . , K i ) and we plotted them against the estimated fish lengths (given by (7)) at the mean time between t kÀ1 and t k . Given a i and b, we define where (see Eq. (5)) Remark. There are three clusters of points at the right hand side of each residual plot. These clusters correspond to observations with 0 (top cluster), 1 (middle cluster) or 2 (bottom cluster) dead fish in small groups of fish (see Table 1 ). Let g b denote the conditional DSP for a fish with expected length x given the mesocosm effect is zero EðX i;t Þ ¼ EðX i;tÀ1 Þ þ sfX 1 À EðX i;tÀ1 Þg; where the initial length is EðX i;0 Þ ¼ 7 mm, the growth rate is s = 0.013 mm per day and the maximum length is X 1 = 50 mm. Fig. 5 shows the daily evolution of the fish length (top left). It also shows simulated paths of the count of surviving fish: in the top right panel 20 mesocosm effects were generated from the law N ð0;rÞ and for each mesocosm effect a path was simulated; in the bottom panels, the mesocosm effect was fixed at different quantiles of the law N ð0;rÞ and 20 paths were simulated for each value of the mesocosm effect. Under the fitted model, the paths of the count of surviving fish can be very variable. For instance, in the top right panel, there is a path without death during 200 days and another one with about 25 deaths during the same period. The bottom panels show the strong influence of the mesocosm effect. Including mesocosm effects in the individual-based model of Ref. [12] will allow the authors to reflect a significant source of variability of the population dynamics. We have presented a generalized nonlinear mixed model and proposed a Bayesian estimation procedure using an MCMC algorithm in order to learn about the relationship between the survival probability of individuals and a time-varying covariate when data are grouped and time-period censored. The model was built by aggregating in time and at the group level an underlying model adapted to the individual and day scale. Moreover, the model integrates a flexible parametric form for the daily survival probability (DSP). Thus, the link between the DSP and the time-varying covariate can be more accurately explored. The model we used can be viewed as an extension of the nest survival model first developed in [8] and improved in [23] . We have applied this approach in order to explore the relationship between the mosquitofish DSP and the fish length. This study will be useful for improving the simulation model proposed in [12] which includes, at present, a quite rough description of this function. The characterization of the mesocosm effect will also be useful for improving the ''realism'' of the simulation model. Besides, we saw that the results obtained can be used in the perspective of a sensibility analysis. It would be interesting to assess what covariates could be linked with the mesocosm effects. Plotting the estimated mesocosm effects against the experimentation years or the experimentation months did not result in a significant relationship (not shown). The possible link between the mesocosm effect and the predator and vegetation densities [see Section 2 and references 1,26] remain to be explored. In the model construction we assumed that survival processes of individual fish are independent (assumption A 2 in Section 3.2). In the context of mosquitofish this assumption is not strong, but in other contexts it can be. For instance, instead of the occurrence of the death of a mosquitofish, consider the occurrence of a transmissible disease affecting humans (e.g. AIDS, SARS), animals (e.g. avian flu) or plants (e.g. wheat rusts), and suppose one aims to link the DSP (the time step of interest can be different from the day) with the time-varying intensity of prevention measures for example. In such a context, assumption A 2 is clearly not suitable. Indeed, the ''survival'' process of an individual can affect the ''survival'' processes of other individuals since the disease is transmissible. The model might be adapted by conditioning the transition probabilities (1) not only on the past state of the individual of interest but also on the past states for the other individuals. The consequences on the construction of the model and on the estimation procedure of such a modification should be explored. Population development of the mosquitofish, Gambusia affinis, in rice fields Event History Modeling, A Guide for Social Scientists Discussion on the paper by spiegelhalter, best, carlin and van der linde Group size and ectoparasitism affect daily survival probability in a colonial bird Modeling mosquitofish (Gambusia holbrooki) response to genapol 0xd-080, a non ionic surfactant, in rice fields Modeling grouped survival data with time-dependent covariates Nonlinear Models for Repeated Measurement Data Adv anced techniques for modeling avian nest survival Sexual behavior and impregnation success of adult male mosquitofish following exposure to 17 beta-estradiol Effects of 4-nonylphenol on sex differentiation and puberty in mosquitofish (Gambusia holbrooki) An Introduction to the Bootstrap Combined use of local and anova-based global sensitivity analyses for the investigation of a stochastic dynamic model: application to the case study of an individual-based model of a fish population Hazard regression with interval-censored data Generalized, Linear and Mixed Models Genetic and demographic responses of mosquitofish (Gambusia holbrooki girard 1859) populations stressed by mercury Modeling nest-survival data: a comparison of recently developed methods that can be implemented in mark and sas Semiparametric regression Survival of cub and yearling grizzly bears in the Greater Yellowstone Ecosystem A unified approach to analyzing nest success Parameteric survival models for interval-censored data with time-dependent covariates Bayesian measures of model complexity and fit (with discussion) The influence of landscape characteristics on duck nesting success in the Missouri Coteau Region of North Dakota Disturbed sexual characteristics in male mosquitofish (Gambusia holbrooki) from a lake contaminated with endocrine disruptors Linear and Nonlinear Models for the Analysis of Repeated Measurements Implications of laboratory mosquitofish experiments for population development in rice fields A monte carlo implementation of the em algorithm and the poor man's data augmentation algorithms A comparison study of realtime fatality rates: severe acute respiratory syndrome in hong kong, singapore, taiwan, toronto and beijing, china A chain multinomial model for estimating the real-time fatality rate of a disease, with an application to severe acute respiratory syndrome The authors would like to thank G. Bounaud and C. Sevellec for technical assistance for the experiments. This work was supported in part by the Programme National d'Ecotoxicologie (PNETOX., France).