key: cord-254729-hoa39sx2 authors: Wan, Wai-Yin; Chan, Jennifer So-Kuen title: Bayesian analysis of robust Poisson geometric process model using heavy-tailed distributions date: 2011-01-01 journal: Comput Stat Data Anal DOI: 10.1016/j.csda.2010.06.011 sha: doc_id: 254729 cord_uid: hoa39sx2 We propose a robust Poisson geometric process model with heavy-tailed distributions to cope with the problem of outliers as it may lead to an overestimation of mean and variance resulting in inaccurate interpretations of the situations. Two heavy-tailed distributions namely Student’s [Formula: see text] and exponential power distributions with different tailednesses and kurtoses are used and they are represented in scale mixture of normal and scale mixture of uniform respectively. The proposed model is capable of describing the trend and meanwhile the mixing parameters in the scale mixture representations can detect the outlying observations. Simulations and real data analysis are performed to investigate the properties of the models. Most time series measured over continuous range assume a normal error distribution. These traditional time series models are however vulnerable to outliers. Outlier in time series has been realized as an influential factor in model fitting and forecasting. Failure to downweigh the outlying effects may lead to a poor model fit, an overestimation of variance, an inappropriate interpretation and an inaccurate prediction. This issue has received a great deal of attention, therefore several approaches have been developed to reduce the influence of outliers and the distributional deviation in the data analysis. Over the past decades, two main approaches are considered to cope with the overdispersion caused by the outliers. The first approach simply incorporates mixture effects to account for the heterogeneity in the distribution of the data. This can be viewed as a missing data problem assuming that the membership of the data from one of the distributions is unknown and has to be estimated. The mixture model is usually implemented using expectation-maximization (EM) algorithm or Markov chain Monte Carlo (MCMC) sampling algorithm. The second approach is to adopt a heavy-tailed distribution instead of the commonly used Gaussian distribution as the error distribution of the data. Some popular choices of heavy-tailed distributions include the Student's t-distribution and a more general class of distributions is the Pearson Type IV distribution (Johnson et al., 1995) . Alternatively, the exponential power (EP) distribution which can describe a leptokurtic (positive excess kurtosis) or platykurtic (negative excess kurtosis) shape is another good choice. However, the implementation of these distributions is difficult because the derivation of the marginal posterior distributions of the parameters is intractable using conventional numerical and analytic approximations (Choy and Smith, 1997) . To overcome this problem, Box and Tiao (1973) proposed a new exponential power family of normal scale mixtures (SMN) and later Qin et al. (1998) pioneered the scale mixtures of uniform (SMU) which replaced the normal distribution in the SMN form by uniform distribution. Theoretically, any distribution that can be expressed by SMN form, also has a SMU representation. The hierarchical structure of the SMN or SMU representation possesses two prominent advantages: (1) the resulting density contains a mixing parameter which can accommodate the extra-Poisson variation and help to identify the extreme values in outlier diagnosis and (2) the parameter estimation can be simplified by sampling from normal or uniform distribution using Markov Chain Monte Carlo (MCMC) algorithms such as Gibbs sampling. The recent emergence of the software WinBUGS which performs the Bayesian statistical inferences using MCMC algorithms also facilitates the implementation of these representations, thus enhancing their popularity in the context of Bayesian modelling. This hierarchical structure is very practical in insurance applications because it is well known that the normal error distribution falls short of allowing for irregular and extreme claims and hence contaminates the estimation procedure and leads to poor estimation. For instances, Choy and Chan (2003) applied the Student's t-and EP distributions in SM representations to predict the insurance premiums to be charged on the policyholders in credibility analysis and Chan et al. (2008) predicted and projected the loss reserves data with various heavy-tailed distributions under the generalized-t distribution family expressed in SMU representation. So far, the techniques of using scale mixture representation apply solely to continuous time series. Yet, discrete count time series is observed in many occasions especially in a medical context. In clinical trials, patients usually have longitudinal measurements and sometimes the appearance of outlying observations in the data set may inflate the mean and variance of the data distribution and have an adverse effect on both the parameter estimation and prediction. Despite overdispersion, trend is often observed in time series and examining the trend patterns provides useful information on the movement of outcomes over time. To cope with this problem, Thall and Vail (1990) proposed adding cluster-specific and time-specific random effects into the mean link function to give extra-Poisson variation to the data. Thereon, Jowaheer and Sutradhar (2002) incorporated the gamma random effects in the Poisson mixed model and later, Jowaheer et al. (2009) adopted a Poisson mixed model with two independent sources of normal random effects to deal with the problem. Nevertheless, the mixed effects approach which assumes the mean of the Poisson distribution follows gamma or log-normal distribution maybe inadequate to cast light on the outlying effect caused by the extreme observations. To tackle this pitfall, plenty of researches tried various mixing distributions. Some useful ones include generalized inverse Gaussian, generalized gamma, generalized exponential, inverse gamma, etc. Refer to (Gupta and Ong, 2005) for more details. In this paper, we seek a new direction to model overdispersed longitudinally time series of counts due to the presence of outliers. Our proposed model is an extension of the generalized mixture Poisson geometric process (GMPGP) model proposed by Wan and Chan (2009) which is developed from the geometric process (GP) model originated by Lam (1988a,b) . The GMPGP model is a two-component mixture model with a mean function to analyze the covariate effect and a ratio function to describe the time effect. Meanwhile, the mixture effects in both mean and ratio functions can capture the population heterogeneity and overdispersion in the data. See Wan (2006) and Wan and Chan (2009) for more details. In this paper, we introduce a robust mixture Poisson geometric process (RMPGP) model using some heavy-tailed distributions in scale mixture representation. We assume that the outcome variable has a Poisson distribution with a stochastic mean which forms a latent GP, and the mean of the GP after geometrically discounted by the ratio follows a heavy-tailed distribution such as log-t distribution or log-exponential power (log-EP) distribution represented in SMN and SMU forms respectively. Under the scale mixture representation, the model parameters can be simulated using the MCMC algorithms and the mixing parameters help to identify the extreme values in the outlier diagnosis. To our knowledge, this is a pioneering work in adopting Student's t-or EP distributions for Poisson time series. Besides, our proposed GP models have a trend component (ratio function) which enables the study of trends of the outcomes over time and can accommodate the clustering effect using a mixture of homogeneous distributions. These make our model more advantageous than many existing Poisson time series models. To demonstrate the characteristics and application of our models, the paper is presented as follows. First, the development of the robust mixture Poisson geometric process (RMPGP) model using Student's t-and EP distributions from the GP model is described in Section 2. Next, Section 3 introduces the scale mixture representation of the two heavy-tailed distributions and their implementation in the RMPGP model. Besides, the hierarchical structure and MCMC algorithms of the models are given followed by the introduction of the model assessment criterion. Furthermore, Section 4 consists of a simulation study of the robust Poisson geometric process (RPGP) models and Section 5 demonstrates an application of the proposed models using the epilepsy data studied by Leppik et al. (1985) with discussion. Lastly, a brief summary is given in Section 6. Lam (1988a,b ) first proposed to model positive continuous data with monotone trend, such as inter-arrival times, by a monotone process called the geometric process (GP) defined as: Definition. Let X 1 , X 2 , . . . be a sequence of non-negative random variables. If there exists a positive real number a such that {Y t = a t−1 X t , t = 1, 2, . . .} forms a renewal process (RP), then the stochastic process {X t , t = 1, 2, . . .} is called a geometric process (GP) and the real number a is called the ratio of the GP. The GP model asserts that if the ratio a discounts the tth outcome X t geometrically by t − 1 times, the resulting process {Y t } becomes stationary and forms a RP, which may follow some parametric distributions f (y t ) with E(Y t ) = µ and Var (Y t ) = σ 2 . Hence, the mean and variance of the GP model are: E(X t ) = µ a t−1 and Var (X t ) = σ 2 a 2(t−1) respectively. With the ratio parameter, the GP model allows the mean and variance to change over time. In fact, a GP is a monotonic increasing sequence of non-negative random variables if a < 1, and monotone decreasing if a > 1. When a = 1, it becomes a stationary renewal process (RP) which is independently and identically distributed with the same distribution f (y t ). Over the past decade, GP models have been applied to various fields of research (Chan et al., 2006; Lam and Chan, 1998; Lam and Zhang, 2003; Lam et al., 2004) and have been extended to fit different types of data. For example, Chan et al. (2006) introduced the threshold GP model to study the trends of the daily number of infected cases for the severe acute respiratory syndrome (SARS) epidemic outbreak in 2003. Chan and Leung (2010) initiated the binary GP model to study the trends of methadone treatment outcomes. And, Wan and Chan (2009) introduced the generalized mixture Poisson GP (GMPGP) model to analyze the new tumour counts of the bladder cancer patients which is extended from the PGP model initiated by Wan (2006) . Let W it denote the count for subject i at time t, i = 1, . . . , m, t = 1, . . . , n i and n = m i=1 n i . Following the framework of GP model, W it is assumed to follow a Poisson distribution f P (w it |x it ) with mean X it which forms a latent GP. Then, we further assume the stochastic process {Y it = a t−1 it X it } follows some lifetime distributions f (y it ), such as exponential distribution in Wan (2006) and gamma distribution in Wan and Chan (2009) , with mean µ it , the resultant model is called Poisson geometric process (PGP) model. Without loss of generality, we assume that the logarithm of (1) The resultant model is named as robust Poisson GP (RPGP) model. It is essentially a state space model with state variables X it and has time-evolving mean and ratio functions to accommodate the exogenous effects and non-monotone trends. The mean µ * it and ratio a it are identity-linked and log-linked respectively to linear functions of covariates defined respectively below: ln a it = β a0 + β a1 z a1it + · · · + β aq a z aq a it where z jkit , k = 1, . . . , q j are some time-evolving covariates. Hence, {Y it } is no longer a RP but becomes a stochastic process. Similarly, the non-constant ratio function a it allows a non-monotone trend and different values for β ak , k = 0, . . . , q a can describe a variety of trend patterns. Refer to Wan (2006) for more details. Apparently, by mixing the Poisson distribution with some heavy-tailed distributions, extra variability will be added to the Poisson distribution which enables the model to accommodate the enlarged variance caused by some extreme observations. In this paper, we consider the Student's t-and EP distributions because they have different shapes including heavier-thannormal to normal tails in the former as well as the platykurtic shape and leptokurtic shape with a kink in the latter. The various shapes allow the model to be more flexible to capture different kurtoses in the data. If Y * it has a Student's t-distribution with mean µ * it and variance ν ν−2 σ 2 , the probability density function of Student's tdistribution f T (y * it ) is given by where µ * it is given by (2), ν > 2 and is the degrees of freedom which controls the tails. A small ν gives a heavier tail and Student's t-distribution converges to normal distribution when ν → ∞. The kurtosis is 6 ν−4 + 3 for ν > 4 which is greater than 3, the kurtosis of the normal distribution. To study the pmfs of the proposed RPGP-t model, we assume q µ = q a = 1, z µ1 = b = 0, 1 as the covariate effect and z a1t = t as the time-evolving effect in (2) and (3). Hence the mean function µ t = exp(β µ0 + β µ1 b) and ratio function a t = exp(β a0 + β a1 t). Fixing b = 1 and t = 2, we change the values of one of the scale, shape or location parameters each time while keeping the other parameters constant and approximate the pmf in (1) using Monte Carlo integration as described below. Conditional on covariate b and time t, the marginal pmf estimatorf bt (w), in general, can be obtained by: where M = 10 000, the latentŷ * (j) bt are simulated from the Student's t-distribution in (4) given the parameters µ * it , σ and ν. Besides the mean and variance, we also study the kurtosis of the pmfs using the method in Gupta and Ong (2005) for a discrete distribution. The relative long-tailedness of the distribution is defined as lim w→∞fbt (w + 1)/f bt (w) where the limit is zero for Poisson distribution. The marginal pmfs are displayed in Fig. 1 (a)-(d) with their means, variances and kurtoses summarized in Table 1 below. Note that since different pmfs are drawn on the same graph for comparison, we use curves instead of bars to represent marginal pmfs for better visualization. Results reveal that in general the location, variability and the tailedness of the marginal pmf depend on parameters β jk , j = µ, a; k = 0, 1, in which a larger β µ0 and a smaller β a0 lead to a larger mean, variance and kurtosis. Whereas, ν and σ control the spread and the tail behaviour of the distribution without altering its mean. A smaller ν and a larger σ contribute to a larger variability and a heavier tail and thus the model can accommodate the outlying effect due to extreme values while keeping the mean unchanged. Since the variance of each distribution in Table 1 is substantially larger than the mean, the RPGP-t model is capable of fitting data with overdispersion due to outliers as well as data with equidispersion when σ is small. If Y * it has an exponential power (EP) distribution, also known as generalized error distribution, with mean µ * it and variance σ 2 , it has a probability density function where c 0 = Γ (3ν/2)/Γ (ν/2), c 1 = c 1/2 0 /(νΓ (ν/2)) and ν ∈ (0, 2] is a shape parameter which controls the kurtosis. This family subsumes a range of symmetric distributions such as uniform (ν → 0) with kurtosis equal to 1.8, normal (ν = 1) with kurtosis equal to 3 and double exponential (ν = 2) with a kurtosis of 6. Its tails can be more platykurtic when ν < 1 or more leptokurtic when ν > 1 compared with the normal tail (ν = 1). By fixing the parameters at β a0 = 0.5, β a1 = −0.1, β µ0 = 3.0, β µ1 = −0.2, ν = 1, σ = 0.5 but leaving one floating, the marginal pmfs are again approximated using Monte Carlo integration specified in (5) but replacing f T (y * (j) bt |µ * t , σ , ν) with f EP (y * (j) bt |µ * t , σ , ν). The effects of different parameters on the resulting pmfs are illustrated in Table 2 Different moments of marginal pmfs for RPGP-EP model under a set of floating parameters with fixed values of ν = 1, σ = 0.5, β a0 = 0.5, β a1 = −0.1, β µ0 = 3, β µ1 = −0.5. Overdispersion may arise due to clustering effects, hence an alternative way to tackle overdispersion is to add mixture effects into the mean and ratio functions. Suppose that there are G groups of subjects who have different trend patterns and each subject has the probability π l of coming from group l, l = 1, . . . , G. Conditional on group l, the marginal pmf for W it is given by: and the group-specific mean µ itl and ratio a itl functions become µ * itl = β µ0l + β µ1l z µ1it + · · · + β µq µ l z µq µ it , and ln a itl = β a0l + β a1l z a1it + · · · + β aq a l z aq a it respectively. The resulting model is named as robust mixture PGP (RMPGP) model in which f (y * itl ) is given by (4) or (6) for the RMPGP-t or RMPGP-EP model. To illustrate the distribution of a 2-group RMPGP-EP model, its pmf ( f 2 (w it )) where f l (w it ), l = 1, 2 is given by (7) is plotted in Fig. 3 by assuming G = 2, q µ = q a = 1, t = 1, z µ1i = 1 and z a1it = t. For l = 1, we set π 1 = 0.8, β a01 = −0.1, β a11 = 0.05, β µ01 = 3, β µ11 = −0.2, ν 1 = 0.2 and σ 1 = 0.2 while for l = 2, we use β a02 = 0.1, β a12 = −0.01, β µ02 = 5, β µ12 = −0.4, ν 2 = 1.9 and σ 2 = 0.01. Fig. 3 clearly displays the two distinct modes in the distribution with a larger mode (l = 1) at smaller values of W and a smaller one (l = 2) at larger values of W representing the outliers. This explains how the incorporation of mixture effects in the RMPGP-EP model can accommodate overdispersion due to clustering effects. Performing statistical inference using classical methods like maximum likelihood approach is cumbersome when the data distribution has no closed form because the likelihood function involving high-dimensional integration is intractable. To avoid such numerical difficulties, we use a Bayesian approach via MCMC algorithms to convert the optimization problem to a sampling problem. Since the non-conjugate structure in the posterior distribution for both RMPGP models and the absolute term in the density function of the EP distribution complicate the sampling algorithms, representing the heavy-tailed distributions in a scale mixture form produces a simpler set of full conditional posterior distributions for the parameters and alleviates the computational burden of the Gibbs sampler in the MCMC algorithms. Choy and Smith (1997) has shown that Student's t-and EP distributions can be expressed in scale mixture representation to facilitate the simulation in the MCMC algorithms via a Bayesian hierarchical structure. However, the ways they handle outliers are different. Choy and Walker (2003) revealed that the former downweighs the extreme values, whereas the latter merely downweighs or bounds the influence of the outliers. Thus, it will be interesting to study their performances in outlier diagnosis. In the following, the Student's t-distribution expressed in SMN form and the EP distribution represented in SMU form will be discussed in details. Assume that a continuous random variable Y has a Student's t-distribution f T (y) with location µ, scale σ 2 and degrees of freedom ν. The probability density function of Y is said to have a SMN representation if it can be expressed as where f N (.| c, d) represents a normal distribution with mean c and variance d, f G (.| c, d) refers to a gamma distribution with mean c/d and variance c/d 2 , ν is a shape parameter and λ is a mixing parameter which can be used to identify outlier. An outlier is indicated if λ is substantially small as small value implies that the normal distribution in (10) has an inflated variance and hence helps to downweigh its influence on the variance σ 2 . Applying the SMN form (10) for (4) in the RMPGP-t model, the marginal pmf for W it in (7) becomes: Theoretically, any distribution that can be expressed in SMN form, also has a SMU representation (Qin et al., 1998) . To simplify the implementation of the MCMC sampling algorithm, Walker and Gutiérrez-Peña (1999) first proposed to express the EP distribution in SMU representation. In a slightly different form, Chan et al. (2008) write the EP distribution in SMU form as follow: where f U (.|c, d) is a uniform distribution on the interval (c, d) and again λ is a mixing parameter. Different from the Student's t-distribution, the larger the λ, the wider is the range of the uniform distribution to accommodate a possible outlier. In the RMPGP-EP model, if we replace (6) with (11), the marginal pmf for W it in (7) becomes: To implement the MCMC algorithms, WinBUGS, an interactive Windows version of the BUGS program for Bayesian analysis of complex statistical models using MCMC techniques is used. For the RMPGP-t and RMPGP-EP models, the hierarchical structure under the Bayesian framework is outlined in the following: for RMPGP-t model; for RMPGP-EP model. where µ * itl and a itl are given by (8) and (9) and I il is the group membership indicator for subject i such that I il = 1 if he/she comes from group l and zero otherwise. In order to construct the posterior density, some prior distributions are assigned to the model parameters as follows: β jkl ∼ f N (0, τ 2 jkl ), j = µ, a; k = 0, 1, . . . , q j ; l = 1, . . . , G for RMPGP-EP model (14) (I i1 , . . . , I iG ) T ∼ multinomial (1, π 1 , . . . , π G ) where c l , d l , h l are some positive constants, f IG (c, d) denotes the inverse gamma density given by and f Dir (α) represents a Dirichlet distribution, a conjugate to multinomial distribution, with parameters α = (α 1 , . . . , α G ). In case of a 2-group (G = 2) mixture model, (15) can be simplified to I i1 ∼ Bernoulli(π 1 ), I i2 = 1 − I i1 and (16) becomes a uniform prior f U (0, 1) for π 1 with π 2 = 1 − π 1 . With the posterior meansÎ il of the group membership indicators I il , patient i is classified to group l ifÎ il = max lÎil . According to Bayes' theorem, the posterior density is proportional to the joint densities of complete data likelihood and prior probability distributions. For the RMPGP-t and RMPGP-EP models, the complete data likelihood functions L T (θ) and L EP (θ) for the observed data w it and missing data {y * itl , λ itl , I il } are: The vector of model parameters is θ = (θ T 1 , . . . , θ T G , π 1 , . . . , π G−1 ) T where θ l = (β µl , β al , σ l , ν l ) T ; i = 1, . . . , m, t = 1, . . . , n i , l = 1, . . . , G, β µl = (β µ0 l , . . . , β µq µ l ) and β al = (β a0l , . . . , β aq a l ). Treating {y * itl , λ itl , I il } as missing observations, the joint posterior density of the RMPGP-EP model is illustrated as follows: where w = (w 11 , w 12 , . . . , w mn m ) T , I = (I 11 , I 12 , . . . , I mG ) T , y * = (y * 111 , y * 121 , . . . , y * mn m G ) T , λ = (λ 111 , λ 121 , . . . , λ mn m G ) T , β = (β a01 , . . . , β aq a G , β µ01 , . . . , β µq µ G ) T , σ = (σ 1 , . . . , σ G ) T , ν = (ν 1 , . . . , ν G ) T and π = (π 1 , . . . , π G ) T . The complete data likelihood L EP (θ) of the RMPGP-EP model is given by (17) and the priors are given by (12)-(16). In Gibbs sampling, the unknown parameters are simulated iteratively from their univariate conditional posterior distributions which are proportional to the joint posterior density of complete data likelihood and prior densities. The univariate full conditional posterior densities for each of the unknown model parameters θ = (β, σ, ν, π, λ) and the latent group indicator I il are given by: and I − and are vectors of β, y * , λ, σ, ν, π and I excluding β jkl , y * itl , λ itl , σ l , ν l , π l and I i respectively. The MCMC algorithms are implemented using WinBUGS where 55 000 iterations are executed for each model and the first 5000 iterations are discarded as the burn-in period. Thereafter parameters are sub-sampled from every 50th iteration to reduce the auto-correlation in the sample. This results in M = 1000 simulated posterior samples of every parameter and parameter estimates are given by their sample means or medians. History plots and auto-correlation function (ACF) plots of each parameter are examined to ensure convergence and independence amongst the parameters. In our analysis, we adopt the deviance information criterion (DIC ), originated by Spiegelhalter et al. (2002) , as the model selection criterion. Deviance information criterion (DIC ) is the sum of posterior mean deviance D(θ) measuring the model fit and an effective dimension p D which accounts for the model complexity. For the RMPGP model, the DIC is defined as where D = T or EP, f D (.) are densities given by (4) and (6) respectively, and θ (j) andθ represent the jth posterior sample and posterior mean of parameter θ , , andĪ il is defined in a similar way by replacing y * (j) itl and θ (j) withȳ * itl andθ. The rule of thumb is that the smaller the DIC , the better is the model. To investigate the properties of the RPGP-t and RPGP-EP models, we conduct a simulation study in which r = 100 data sets are simulated from each of the RPGP models based on a set of true parameters. We set q µ = q a = 1 and each data set contains m = 80 time series of length n i = 8 from m 0 = 40 subjects in the control group (b = 0) and another m 1 = 40 subjects in the treatment group (b = 1) with z µ1it = b in the mean function (2). The degrees of freedom ν is set to include a b heavy (ν = 2.5) and light (ν = 50) tails for Student's t-distribution and platykurtic (ν = 0.1) and leptokurtic (ν = 1.8) shapes for EP distribution. The parameters β ak in the ratio function (3) with z a1it = t are also set to include different trend patterns by varying the sign and magnitude of the true values. Afterwards both models are fit to each data set using the Bayesian approach implemented by WinBUGS and R2WinBUGS and the parameter estimateθ is given by the average of the M = 1000 posterior mediansθ j . To examine the bias and precision of the MCMC sampling algorithms, the standard deviation (SD) ofθ j over r = 100 simulated data sets for each parameter θ is reported. To assess the accuracy of parameter estimates when the data set is simulated and fitted to the same model, we calculate the mean squared error (MSE) for each parameter θ which is given by: For model selection, the average DIC and average squared error (ASE) proposed by Wegman (1972) is used to assess the quality of the density estimator on the true pmf. Denote the true pmf of the RMPGP model by f bt (w) at time t with covariate b, the ASE is used to compare the performance of the two models on the same simulated data set and is defined as wheref jbt (w) is the pmf estimator of (1) using Monte Carlo integration described in (5) for counts w in the jth simulated set at time t with treatment effect b and ψ b = m b /m is the weight associated with the control (b = 0) or treatment group (b = 1). Clearly, the smaller the ASE, the closer is the estimated pmf to the true one and thus the better is the model performance. Table 3 summarizes the results of the four set of simulation experiments with the first two data sets simulated from the RPGP-t model and the next two simulated from the RPGP-EP model. In general, the MCMC algorithms give unbiased and precise results as both MSE and SD of most parameters are reasonably small. Moreover, the values of the shape parameter ν of the two models match with each other in terms of the tailedness. For example, the smallν = 5.286 of the Student's t-distribution agrees withν = 1.8234 which is close to 2 in the EP distribution. However, it is noticed that ν of the RPGP-t model has relatively lower precision and higher bias reflecting the higher level of difficulty in estimating the tailedness of the heavy-tailed Student's t-distribution. In model comparison, although the RPGP-t model has a slightly smaller ASE (0.04504 versus 0.04817 averaged over the four simulated sets), the RPGP-EP model outperforms the RPGP-t model in DIC (1968 DIC ( .4 versus 1979 . All in all, the simulation experiment shows that the performance of the MCMC algorithms for the two models is satisfactory and the estimated pmfsf jbt (w) approximate the true pmfs f bt (w) reasonably well. While EP distribution can be platykurtic or leptokurtic with a kink and Student's t-distribution can give a very heavy tail when the outlying effect is tremendous, the two RPGP models are suitable under different circumstances. We illustrate the usefulness of our proposed models through the epilepsy data which can be found in Thall and Vail (1990) . The data were collected from a clinical trial of 59 epileptics by Leppik et al. (1985) . In the randomized controlled trial, = 59 patients suffering from simple or complex partial seizures were assigned to either an antiepileptic drug progabide (z µi1 = 1) or a placebo (z µi1 = 0) with no intrinsic therapeutic value. The seizure counts were recorded at a two-week interval for an eight-week period (n i = 4) with no dropout or missing cases. As shown in Table 4 , the seizure counts exhibit a prominent extra-Poisson variation with large variance to mean ratios at all time t due to some outlying observations as displayed in Fig. 4(a) and (b) . To assess the overdispersion in the data, we fit a simple Poisson regression model using a mean link function η it = exp(β 0 + β 1 z µi1 + β 2 t) and a PGP model using a mean function µ it = exp(β µ0 + β µ1 z µi1 ) and a ratio function a it = exp(β a0 +β a1 t). The mean and variance under both models (indicated by ' * ' in Table 4 ) are equivalent and are given by η it and µ it a t−1 it respectively. Obviously, neither of the two simple models, as restricted by their equidispersed property, can capture the overdispersion. Besides, the higher mean seizure counts of the placebo group indicate that 'treatment group' is a feasible covariate. Moreover, the gradually decreasing seizure counts over time for both placebo and progabide groups suggests that time t maybe a possible time-evolving effect. In addition, population heterogeneity in terms of trend pattern Table 4 Mean and variance for the observed epilepsy data and under two simple fitted and the best models. Time Overall and count level are also detected intuitively. In consideration of these and the clinical interest of examining trend patterns, we adopt the RMPGP models to analyze the epilepsy data. Referring to the MCMC algorithms detailed in Section 3.2, our prior specifications are mostly non-informative except for ν l in the RMPGP-t model. In both RMPGP models, we assign τ 2 jkl = 1000 in (12), c l = d l = 0.001 in (13) and α l = 1/G in (16) for a G-group (G ≥ 2) model. For ν l in the RMPGP-t model, we take h l = 20 since there is a high degree of overdispersion in the seizure counts. After implementing the MCMC algorithms in WinBUGS, the posterior sample means are adopted as parameter estimates since the posterior densities of most model parameters are highly symmetric and the posterior sample mean are close to the posterior sample median. Table 5 summarizes the parameter estimates, standard errors (SE), 95% credibility interval (CI) and model selection criterion of the fitted models. We first fitted a simple RPGP model with treatment group (z µ1it = 0, 1) as the covariate in the mean function µ itl in (8) and two-week interval (z a1it = t = 1, 2, 3, 4) as the time-evolving effect in the ratio function a itl in (9). The negative β µ1 in both RPGP models indicate the treatment effect that patients receiving progabide are associated with lower seizure counts. However, within the treatment group, it is explicit that some of these patients have abnormally high seizure counts. Simply fitting a simple RPGP model may fail to allow for the clustering effect among patients receiving the same treatment. We therefore fitted a 2-group RMPGP model and also attempted to fit a 3-group RMPGP model using both Student's t-and EP distributions but the results indicated that one of the groups in the 3-group RMPGP model degenerated and hence the models were discarded. Not surprisingly, both RMPGP models give parallel results as they share some common model properties except the shape of the distribution of y * itl . In the RMPGP-t model, two distinct groups of patients were identified with the first group of patients having generally higher seizure counts and is named as the high-level group (l = 1). Within this group, 54% of the patients are receiving progabide and they have lower seizure counts in general (β µ11 < 0) than those receiving placebo. Whereas in the low-level (l = 2) group, 49% of the patients belong to the progabide group and again they generally have less epileptic seizures during the studying period (β µ12 < 0). Besides, the ratio function a it1 in (9) reveals that there is a slightly decreasing trend in the seizure counts in the high-level group while a it2 indicates that no obvious trend is detected in the low-level group. In addition, comparing with the low-level group, the relatively smaller ν 1 shows that the high-level group has a higher degree of overdispersion in the seizure counts due to the existence of some abnormally large observations as revealed in Fig. 4 (a) and (b). As expected, the RMPGP-EP model gives consistent results in terms of trend pattern and treatment effect. Moreover, the group membership of the patients has a close affinity with that of RMPGP-t model and two diverse groups, high-level and low-level groups, are recognized as well. But for the high-level group, despite the comparable mean level, the ratio function shows that the seizure count increases in the second 2-week interval before it drops in the next two 2-week intervals. Coherently, the estimate of ν 1 = 1.49 agrees with the small ν 1 = 7.21 in the RMPGP-t model indicating that the distribution of the seizure counts in the high-level group has a heavier tail to account for the higher degree of overdispersion. On the other hand, the smaller ν 2 = 0.24 indicates the data distribution is more uniform in the low-level group. For model selection, the smaller DIC given by (18) as shown in Table 5 for the RMPGP-EP model manifests its better model fit on the epileptic seizure counts after accounting for the model complexity. A plausible explanation is that the EP distribution has a more flexible tail behaviour and thus provides a better fit to the data. To further investigate this, their observed and fitted pmfs for the low-level group are illustrated in Fig. 5 (a)-(d) at different time points, in which the observed pmf f tl (w) for group l at time t is generally given by where I(W it = w) is an indicator which returns 1 when W it = w for patient i at time t in group l and 0 otherwise, I(z µ1i = b) indicates the treatment group b of patient i,Î il is the posterior mean of the group membership indicator and ψ b is the weight associated with the placebo (b = 0) or progabide group (b = 1). On the other hand, the fitted pmff tl (w) is simply obtained by the numerical approximation described in (5) based on the parameter estimates θ l and is weighted by ψ b . Based on Fig. 5 (a)-(d), both models imitate the observed pmfs pretty well. However the estimated trend in the ratio function cannot accommodate the upsurge in observation w = 4 at t = 2 resulting in a discrepancy between the observed and fitted pmfs in Fig. 5(b) . It is not surprised to find that the two RMPGP models give similar pmfs as both have the capability of modelling highly overdispersed data. Nevertheless, despite of the affinity, the slightly heavier tail in the distribution of the RMPGP-EP model possibly gives rise to the better DIC . In the best model, the RMPGP-EP model, since extra variation is added to the mean of the Poisson distribution, the variances of the estimated pmff tl (w) of each treatment group and the overall variances which comprises the variance of expectation and the expectation of variance conditional on the mixture group being known, show a dramatic improvement and are reported in Table 4 . Advantageously, implementing the model using Bayesian approach enables us to study the latent stochastic process y * itl and the mixing parameters λ itl which can be output in the coda in WinBUGS. Under the RMPGP-EP model, to examine the density of the unobserved y * itl which are simulated from an EP distribution with location parameter µ * itl , scale parameter σ 2 l and shape parameter ν l , we compare the densities of the posterior samples of y * itl with the normal distribution which has mean and standard deviation of the posterior samples and the EP distribution with same mean and standard deviation and a shape parameterν l . Four selected y itl from each cluster group l and treatment group z µ1i are illustrated in the following Fig. 6 . Obviously, y it1 's have a leptokurtic shape whereas y it2 's appear to be more uniform than the normal distribution. These agree with the results in Table 5 that the shape parameter of the high-level group is larger than that of the low-level group (ν 1 > ν 2 ) and explain how the EP distribution can downweigh the outlying effect. Last but not least, the outlier diagnosis is performed using the mixing parameters in the better model, the RMPGP-EP model. For each cluster group l, the mixing parametersλ itl are plotted with the standardized observations w itl of patients who are classified to group l in Fig. 7 . For fair comparison, the seizure count is standardized as w itl = |w it −ŵ itl |/σ w itl wherê w itl andσ w itl are respectively the mean and standard deviation of the estimated pmf f tl (w) in (19) at time t under group l of the RMPGP-EP model. An unusually large λ itl indicates that the w it is possibly an outlier under group l. Both mixing parameters λ itl and standardized seizure counts w itl are sorted by groups (l = 1, 2) for better visualization and are graphed in Fig. 7. a b c d Clearly, all the top 10 (5%) outlying counts with the first 10 largest λ itl locate in the high-level group (l = 1) due to the presence of some extreme observations. They are highlighted in Fig. 7 with the corresponding rank and observation in parentheses. Not surprisingly, the correlation between λ it1 and w it1 is high (r = 0.9731) which signifies the appropriateness of using the mixing parameters in outlier diagnosis. While the outlying effect is not substantial in the low-level group (l = 2), λ it2 appears to be relatively smaller. The top 4 outlying values come from the same patient with ID 49 who has abnormally high seizure counts at each 2-week interval. Besides, 2 large observations are identified from patients with ID 8, and 25. In spite of large observation, four small seizure counts are also classified as outlier from patients with ID 10, 24, 39 and 56. Knowing which patients are associated with abnormal seizure counts, specialists can pay more attention to their abnormalities and alternative treatments maybe considered. At the same time, the RMPGP-EP model has downweighed the outlying effect and thus the general trend pattern is not distorted by the extreme observations. In this paper, we propose using a Poisson geometric process (PGP) model to analyze repeated measurements of counts over time. The model is essentially a state space model with a ratio function which describes the direction and strength of the trend and a mean function which reveals the initial level and studies the covariate effect. However, the PGP model fails to allow for extra-Poisson variation when extreme observations appear in the data. Ignoring the outlying effects may also lead to overestimated mean and variance resulting in invalid interpretation and prediction. As remedies, two methods are suggested to account for overdispersion which include adopting a heavy-tailed distribution and incorporating mixture effect. The latter can handle clustering effect in the data and overdispersion arisen from that may also be captured. As a new direction to account for population heterogeneity, we apply the heavy-tailed distributions in the modelling of time series of counts and pioneer the robust Poisson geometric process (RPGP) model. This model allows the mean X itl of the Poisson distribution to follow a GP and the logarithm of the underlying stochastic process {Y it = a t−1 it X it } follows a heavytailed distribution. The resultant model is called the RPGP model. By varying a set of model parameters, the properties of the RPGP models are reported in Tables 1 and 2 and their pmfs are revealed in Figs. 1 and 2. Although the marginal pmfs do not have a closed-form, Monte Carlo integration in (5) can be used to approximate the pmf, and hence the mean as well as the variance. Tables 1 and 2 show that the model can accommodate both equidispersed and overdispersed data with varying degrees of kurtosis. The RPGP models and its extension, the RMPGP models to allow clustering effects are implemented using a Bayesian approach. Expressing the heavy-tailed distributions in a scale mixture form facilitates the model implementation using MCMC algorithms and the mixing parameters enables us to perform outlier diagnosis as shown in the real data analysis. Here, the Student's t-distribution in scale mixture of normal (SMN) and exponential power (EP) distribution in scale mixture of uniform (SMU) are adopted. The resultant models can be efficiently implemented via the user-friendly software, WinBUGS. Moreover, the posterior densities for the MCMC algorithm are derived for the RMPGP-EP model. The simulation study shows that the performances of the two RMPGP models are comparable and satisfactory. In case of data with very long tail, the RMPGP-t model seems to fit better since Student's t-distribution allows a much heavier than normal distribution. On the other hand, the RMPGP-EP model gives a better fit in the analysis of epilepsy data with diverse degree of overdispersion across mixture groups as EP distribution has a more flexible tail which can be either leptokurtic or platykurtic. One pitfall of our proposed model is that taking log-transformation of the latent Y itl inevitably causes those data associated with close-to-zero means being identified as outliers. Hence when zero is dominant in the data, the proposed RMPGP models can be extended to include a zero-altered component (Wan and Chan, 2009 ). Bayesian Inference in Statistical Analysis Robust Bayesian analysis of loss reserve data using the generalized-t distribution Binary geometric process model for the modelling of longitudinal binary data with trend Modelling SARS data using threshold geometric process Scale mixtures distributions in insurance applications Hierarchical models with scale mixtures of normal distributions The extended exponential power distribution and Bayesian robustness Analysis of long-tailed count data by Poisson mixtures Continuous Univariate Distributions Analysing longitudinal count data with overdispersion On familial Poisson mixed models with multi-dimensional random effects Geometric process and replacement problem A note on the optimal replacement problem Statistical inference for geometric processes with lognormal distribution A geometric-process maintenance model for a deteriorating system under a random environment Analysis of data from a series of events by a geometric process model A double-blind crossover evaluation of progabide in partial seizures Uniform scale mixture models with applications to Bayesian inference Bayesian measures of model complexity and fit (with discussion) Some covariance models for longitudinal count data with overdispersion Robustifying Bayesian procedures Analysis of Poisson count data using geometric process model A new approach for handling longitudinal count data with zero-inflation and overdispersion: Poisson geometric process model Nonparametric probability density estimation: a comparison of density estimation methods See Tables 1-5 and Figs. 1-7.