key: cord-0486084-1wzxy8g9
authors: Doukhan, Paul; Leucht, Anne; Neumann, Michael H
title: Mixing properties of non-stationary INGARCH(1,1) processes
date: 2020-11-11
journal: nan
DOI: nan
sha: 4ca72b3e5ef3ad9f918540b806cb4628c9bc64ad
doc_id: 486084
cord_uid: 1wzxy8g9

We derive mixing properties for a broad class of Poisson count time series satisfying a certain contraction condition. Using specific coupling techniques, we prove absolute regularity at a geometric rate not only for stationary Poisson-GARCH processes but also for models with an explosive trend. We provide easily verifiable sufficient conditions for absolute regularity for a variety of models including classical (log-)linear models. Finally, we illustrate the practical use of our results for hypothesis testing.

Conditionally heteroscedastic processes have become quite popular for modeling the evolution of stock prices, exchange rates and interest rates. Starting with the seminal papers by Engle (1982) on autoregressive conditional heteroscedastic models (ARCH) and Bollerslev (1986) on generalized ARCH, numerous variants of these models have been proposed for modeling financial time series; see for example Francq and Zakoïan (2010) for a detailed overview. More recently, integer-valued GARCH models (INGARCH) which mirror the structure of GARCH models have been proposed for modeling time series of counts; see for example Fokianos (2012) and the recently edited volume by Davis, Holan, Lund, and Ravishanker (2016) .

We consider integer-valued processes where the count variable Y t at time t, given the past, has a Poisson distribution with intensity λ t . The intensity λ t itself is random and it is assumed that λ t = f t (Y t−1 , λ t−1 , Z t−1 ), for some function f t , i.e. λ t is a function of lagged values of the count and intensity processes and a covariate Z t−1 . Mixing properties of such processes have been derived for a first time in Neumann (2011) , for a time-homogeneous transition mechanism with λ t = f (Y t−1 , λ t−1 ). This has been generalized by Neumann (2020) to a GARCH structure of arbitrary order. In both cases a contractive condition on the intensity function f was imposed which resulted in an exponential decay of the coefficients of absolute regularity. Under a weaker semi-contractive rather than a fully contractive condition on the intensity function, Doukhan and Neumann (2019) also proved absolute regularity of the count process, this time with a slower subexponential decay of the mixing coefficients. In the present paper we extend these results in two directions. We include an exogeneous covariate process in the intensity function and we also drop the condition of time-homogeneity. This allows us to consider "weakly non-stationary" processes, e.g. with a periodic pattern in the intensity function. Moreover, we also allow for a certain explosive behavior which could e.g. result from a deterministic trend. As shown in the text, this requires certain modifications of the techniques used in our previous work.

In the next section, we state the precise conditions, describe our approach of deriving mixing properties, and state the main results. In Section 3 we apply these results to time-homogeneous and time-inhomogeneous linear INGARCH models as well as to the loglinear model proposed by Fokianos and Tjøstheim (2011) . In Section 5 we sketch possible applications of our results and discuss a particular one in detail. All proofs a deferred to a final Section 6.

We derive mixing properties of an integer-valued process Y = (Y t ) t∈N 0 defined on a probability space (Ω, F, P), where, for t ≥ 1,

(2.1a) λ t = f t (Y t−1 , λ t−1 , Z t−1 ),

(2.1b) and F s = σ(Y 0 , λ 0 , Z 0 , . . . , Y s , λ s , Z s ). Here, λ = (λ t ) t∈N 0 is the process of random (nonnegative) intensities and Z = (Z t ) t∈N 0 is a sequence of R d -valued covariates. We assume that Z t is independent of F t−1 and Y t . We do not assume that the Z t 's are identically distributed since we want to include cases with a possibly unbounded trend. Note that with a slight abuse of notation and to avoid unnecessary case-by-case analysis Pois(0) denotes the Dirac measure in 0.

In what follows we derive conditions which allow us to prove absolute regularity (βmixing) of the process X = (X t ) t∈N 0 , where X t = (Y t , Z t ). In contrast, the intensity process (λ t ) t∈N 0 is not mixing in general; see Remark 3 in Neumann (2011) for a counterexample. We will show that, in case of a stationary two-sided stationary process, λ t = g(X t−1 , X t−2 , . . .), for a suitable function g. This allows us to conclude that the intensity process, and the joint process ((Y t , λ t , Z t )) t∈Z as well, are ergodic.

Let (Ω, A, P ) be a probability space and A 1 , A 2 be two sub-σ-algebras of A. Then the coefficient of absolute regularity is defined as

For the process X = (X t ) t∈N 0 on (Ω, F, P), the coefficients of absolute regularity at the point k are defined as

and the (global) coefficients of absolute regularity as

Our approach of proving absolute regularity is inspired by the fact that one can construct, on a suitable probability space (Ω,F ,P) two versions of the process X, (X t ) t∈N 0 and (X ′ t ) t∈N 0 , such that (X 0 , . . . ,X k ) and (X ′ k+n ,X ′ k+n+1 , . . .) are independent and β X (k, n) =P X k+n+r ≠X ′ k+n+r for some r ≥ 0 .

(2.2) Indeed, for given (X t ) t∈N 0 , it follows from Berbee's lemma (see Berbee (1979) or Rio (2017, Lemma 5.1), for a more accessible reference) that one can construct (X ′ t ) t≥k+n following the same law as (X t ) t≥k+n and being independent of (X 0 , . . . ,X k ) such that (2.2) is fulfilled. Using the correct conditional distribution we can augment (X ′ t ) t≥k+n withX ′ 0 , . . . ,X ′ k+n−1 such that (X ′ 0 , . . . ,X ′ k ) is independent of (X 0 , . . . ,X k ), as required. Such an ideal coupling is usually hard to find and we do not see a chance to obtain this in the cases we have in mind. However, any coupling with (X 0 , . . . ,X k ) and (X ′ 0 , . . . ,X ′ k ) being independent provides an estimate of the mixing coefficient since then β X (k, n) ≤P X k+n+r ≠X ′ k+n+r for some r ≥ 0 ; see our arguments below.

We obtain the following estimate of the coefficients of absolute regularity at the point k.

m ∈ N} is the system of cylinder sets. Note that the last but one equality in (2.3) follows since the process ((Y t , λ t , Z t )) t∈N 0 is Markovian and since the conditional distribution of (Y t , λ t , Z t ) under F t−1 depends only on λ t . Since a purely analytic approach to estimate the right-hand side of (2.3) seems to be nearly impossible, we use a stepwise coupling method to derive the desired result. Suppose that we have two versions of the original process ((Y t , λ t , Z t )) t∈N 0 , ((Ỹ t ,λ t ,Z t )) t∈N 0 and ((Ỹ ′ andλ ′ k+1 are independent underP, then we obtain from (2.3) the following upper estimate of the coefficients of absolute regularity at time k:

Thus we have just proved the following result.

) t∈N 0 , of the process ((Y t , λ t , Z t )) t∈N 0 defined by (2.1a) and (2.1b) which are both defined on the same probability space (Ω,F,P) such thatλ k+1 andλ ′ k+1 are independent underP, then

The close relationship between absolute regularity and coupling has been known for a long time. Berbee (1979, Theorem 2) showed that, for two random variables X and Y defined on the same probability space, the latter one can be replaced by an random variable Y * being independent of X and following the same distribution as Y such that the probability that Y * differs from Y is equal to the coefficient of absolute regularity between X and Y ; see also Doukhan (1994, Theorem 1.2.1.1) for a more accessible reference. In our paper, we go the opposite way: Starting from a coupling result we derive an upper estimate of the coefficients of absolute regularity.

In what follows we develop a coupling strategy to to keep the right-hand side of (2.4) small. To this end, we coupleZ k+n+r andZ ′ k+n+r (r ∈ N 0 ) such that they are equal with probability 1, and we apply the technique of maximal coupling to the count variablesỸ k+n+r andỸ ′ k+n+r . If Q 1 and Q 2 are two probability distributions on (N 0 , 2 N 0 ), then one can construct random variablesX 1 andX 2 on a suitable probability space

denotes the total variation distance between Q 1 and Q 2 . (An alternative representation is given by d

.) In our case, we have to couple among othersỸ k+n andỸ ′ k+n . We denote byF s = σ(Ỹ 0 ,λ 0 ,Z 0 ,Ỹ ′ 0 ,λ ′ 0 ,Z ′ 0 , . . . ,Ỹ s ,λ s ,Z s ,Ỹ ′ s ,λ ′ s ,Z ′ s ) the σ-algebra generated by all random variables up to time s. We constructỸ k+n andỸ ′ k+n such that, conditioned onF k+n−1 , they have Poisson distributions with respective intensitiesλ k+n and λ ′ k+n andP

Examples for such distances are given by d(λ, λ ′ ) = 2 e √ λ − √ λ ′ (see e.g. Roos (2003, formula (5) ) or Exercise 9.3.5(b) in Daley and Vere-Jones (1988, page 300) ), and d(λ, λ ′ ) = λ − λ ′ . Hence, we can constructỸ k+n andỸ ′ k+n such that

Since Z t is by assumption independent of F t−1 and Y t , we chooseZ k+n andZ ′ k+n such that they are equal with probability 1. In view of the other terms on the right-hand side of (2.4), we impose the following condition.

(A1) There exists some L 1 < 1, such that the following condition is fulfilled:

Then, if we continue to use maximal coupling,

. Proceeding in the same way we obtain that

holds for all r ∈ N. It follows from (2.4) to (2.6) that

(2.7)

To proceed, we have to find an upper estimate ofẼ[d(λ k+n ,λ ′ k+n ) ], still under the condition thatλ k+1 andλ ′ k+1 are independent, having the same distribution as the frequency λ k+1 of the original process. We make the following assumption.

(A2) There exists some L 2 < 1, such that the following condition is fulfilled. If λ, λ ′ ≥ 0, then there exists a coupling of (Y, Z) and

If (A2) is fulfilled, we obtain that

. Therefore, we obtain in conjunction with (2.7) that

Finally, in order to obtain a good bound for β X (n) we have to ensure that sup{Ẽd(λ k+1 ,λ ′ k+1 )∶ k ∈ N 0 } < ∞. Recall that, with the above method of estimating β X (k, n),λ k+1 andλ ′ k+1 have to be independent, following the same distribution as λ k+1 . In the case of a stationary process, an upper bound may follow from the fact that the intensities λ k are stochastically bounded in an appropriate sense. Such an argument, however, cannot be used if the process has an explosive behavior which means that we genuinely have to derive an upper bound forẼd(λ k+1 ,λ ′ k+1 ), with an appropriately chosen distance d; see the examples in the next section for the necessity of a tailor-made way of handling this problem. As above, it seems to be difficult to derive an upper bound forẼd(λ k+1 ,λ ′ k+1 ) in a purely analytical way. Therefore, we employ once more a coupling idea and the desired upper bound will be obtained by observing two independent versions (λ t ) t∈N 0 and (λ ′ t ) t∈N 0 of the original intensity process. We impose the following condition.

holds for all k ∈ N. Now we obtain from (2.8) and (2.9) the following result.

(ii) Suppose in addition that ((Y t , λ t , Z t )) t∈Z is a two-sided strictly stationary version of the process. Then there exists a

is the system of cylinder sets, such that

The process ((Y t , λ t , Z t )) t∈Z is ergodic.

Remark 1. As it can be seen from the proofs of Corollaries 3.1 and 3.2 below, the broad applicability of this result is assured by flexibility in the choice of the metric d in (A1) to (A3); for details see discussion the below Corollary 3.1.

In retrospect, we note that our coupling method which delivers an upper estimate for β X (k, n) consists of three phases: (2.8) shows that the upper estimate depends on the expectation of d(λ, λ ′ ), where λ and λ ′ are independent versions of λ k+1 . Since this expectation can hardly be computed analytically we consider two independent versions, (λ t ) t∈N 0 and (λ ′ t ) t∈N 0 , of the intensity process and we derive recursively an upper estimate ofẼd(λ k+1 ,λ ′ k+1 ). Condition (A3) ensures boundedness of this expectation. Once we have a uniform bound forẼd(λ k+1 ,λ ′ k+1 ), we start a second coupling mechanism which keeps the probability ofX k+n ≠X ′ k+n small; see (2.4) for how this enters the upper estimate for β X (k, n). This is accomplished by a coupling which leads to an exponential decay of d(λ k+r ,λ ′ k+r ) as r → ∞; (A2) serves this purpose. And finally, it can also be seen

contributes to the upper estimate for β X (k, n). For this we have to take care thatX k+n+r differs fromX ′ k+n+r with a small probability, givenX k+n =X ′ k+n , . . . ,X k+n+r−1 =X k+n+r−1 . Condition (A1) is intended to keep the probability of these undesired events small.

In this section we discuss some of the most popular specifications for INGARCH(1,1) processes. We begin with a linear INGARCH(1,1) process allowing for real-valued covariates, where

(3.1)

Without covariates, this model has become popular for modeling count data. Rydberg and Shephard (2000) proposed such a model for describing the number of trades on the New-York stock exchange in certain time intervals and called it BIN(1,1) model. Stationarity and other properties for this model where derived by Streett (2000), Ferland et al. (2006) who referred to it as INGARCH(1,1) model, and Fokianos et al. (2009) . Agosto et al.

(2016) generalized model (3.1) by augmenting a covariate process and coined the term PARX (Poisson autoregression with exogeneous covariates). These authors also proved, in the case of a t = a, b t = b ∀t, the existence of a stationary distribution. We study first the non-explosive case.

Then the process (X t ) t∈N 0 is absolutely regular with coefficients satisfying

The proof of Corollary 3.1 relies on the application of Theorem 2.1 with the simple metric d(λ, λ ′ ) = λ − λ ′ . In case of an explosive INGARCH(1,1) process, however, it could well happen that this distance is no longer appropriate. To see this, consider the simple case of a specification λ t+1 = aY t + C t , where 0 < a < 1 and C t being an arbitrarily large non-negative constant. Recall that our estimate (2.8) of the local coefficients of absolute regularity β X (k, n) contains the factor

which means that assumption (A3) will be violated. We show that the alternative distance

The use of such a square root transformation should not come as a big surprise.

On the other hand, it is well known that a square root transformation on Poisson variates has the effect of being variance-stabilizing. McCullagh and Nelder (1989, p. 96) . This transformation is similar to the Anscombe transform (x ↦ 2 x + 3 8) which is also a classical tool to treat Poisson data. On the other hand, for small values of λ and λ ′ , the distance λ − λ ′ turns out to be more suitable when a contraction property has to be derived; see the proof of Corollary 3.2 below. In view of this, we choose

where a suitable choice of the constant M ∈ (0, ∞) becomes apparent from the proof of Corollary 3.2 below.

Note that the random variable Z t may get arbitrarily large as t increases, for example, it could represent a trend. Hence, we allow for nonstationary, explosive scenarios here.

3.2. Non-linear INGARCH processes. Next, we consider the log-linear model proposed by Fokianos and Tjøstheim (2011) .

( 3.4) where d ∈ R and a + b < 1, and (Z t ) t∈N 0 are i.i.d. random variables such that E Z 0 < ∞. Then (i) there exists a (strictly) stationary version of ((Y t , λ t , Z t )) t , (ii) if additionally E[e 2Z 0 ] < ∞, then the process (X t ) t is absolutely regular with exponentially decaying coefficients.

Finally, we mention that other popular models where the Poisson distribution is replaced by a zero-inflated Poisson or a negative binomial distribution can be included. In both cases, we specify our model

t 's are obviously assumed to be non-negative.

t−1 ), Y t has a zero-inflated Poisson distribution and (ii) if (A3) holds and f satisfies (A1) and (A2), then β X (n) = O(ρ n ) for some ρ ∈ (0, 1).

Remark 2. The second part of Corollary 3.3 holds irrespective of the distribution of ε t . Suppose that ε t has a Gamma distribution with parameters a, b > 0 and

instead. Then, conditioned on F (1) t−1 as above, Y t has a negative binomial distribution. Indeed, since a Gamma(a,b) distribution has a density p with

This is the probability mass function of a NB(a, b (λ + b)) distribution.

In the context of stationary INGARCH processes, absolute regularity with a geometric decay of the mixing coefficients of the count process has already been proved in Neumann (2011) under a fully contractive condition,

where a and b are non-negative constants with a + b < 1. Doukhan and Neumann (2019) proved absolute regularity with a somewhat unusual subgeometric decay of the coefficients for GARCH and INGARCH processes of arbitrary order p and q under a weaker semicontractive condition,

for all y 1 , . . . , y p ∈ N 0 ; λ 1 , . . . , λ q , λ ′ 1 , . . . , λ ′ q ≥ 0, where c 1 , . . . , c q are non-negative constants with c 1 + ⋯ + c q < 1.

For the specification (3.3) and without a covariate (Z t = 0 ∀t), conditions (4.1) and (4.2) are both fulfilled. However, in case of a non-stationary covariate process (Z t ) t∈N 0 , stationarity of the process ((Y t , λ t )) t∈N 0 might fail and the results in the above mentioned papers cannot be used. More seriously, in case of an explosive behavior, e.g. if Z t is nonrandom with Z t → ∞ as t → ∞, the stability condition (2.5) in Neumann (2011) as well as the drift condition (A1) in Doukhan and Neumann (2019) are violated and a direct adaptation of the proofs in those papers seems to be impossible.

In case of a specification λ t = (a

IfỸ t andỸ ′ t are equal but large, then the right-hand side of this equation will be dominated by the term 2ab Ỹ t λ t − λ′ t which shows that both (4.1) and (4.2) are violated. However, Theorem 2.1 is applicable. One can follow the lines of the proof of Corollary 3.1 to verify the validity of (A1) to (A3) for d(λ,

We would like to mention that similar results as in our paper are possible if the Poisson distribution is replaced by a mixed Poisson or a switching Poisson specification. Moreover, standard GARCH models with a normal distribution can be treated by this approach as well. There is an ongoing project where the Poisson distribution is replaced by the distribution of the difference of two independent Poisson variates (special case of a Skellam distribution); see Doukhan, Mamode Khan, and Neumann (2020).

COVID-19 data 5.1. Statistical study. Suppose that we observe Y 0 , . . . , Y n of a linear INARCH process as in (3.1) with b t = 0, ∀t. We aim to test stationarity versus the presence of an isotonic trend. Thus, the null hypothesis will be that EY 1 = ⋯ = EY n while the alternative can be characterized by EY 1 ≤ EY 2 ≤ ⋯ ≤ EY n with at least one strict inequality in this chain of inequalities. When we fit a linear model

with a possibly non-stationary sequence of innovations (ε t ) t , then the null hypothesis corresponds to θ 1 = 0 and the alternative to θ 1 > 0. (Even if the above linear model is not adequate, a projection will lead to θ 1 > 0.) The following discussion will be simplified when we change over to an orthogonal regression model,

Then the columns in the corresponding design matrix are orthogonal and the l 2 norm of the vector composed of the entries in the second column is equal to 1. Therefore, we obtain for the least squares estimatorθ 1 of θ 1 that

where w t = (t − n+1 2 ) ∑ n s=1 (s − n+1 2 ) 2 . As before, we have θ 1 ∶= Eθ 1 > 0 if there is any positive (linear or nonlinear) trend and θ 1 = 0 under the null hypothesis. Therefore,θ 1 can be used as a test statistic.

Proposition 5.1. Suppose that Y 0 , . . . , Y n is a stretch of observations of a stationary INARCH(1) with constant coefficients such that

We show that the test statistic is asymptotically unbounded for the special case of a linear trend component in the intensity function. Other situations such as general polynomial trends can be treated similarly.

Proposition 5.2. Suppose that Y 0 , . . . , Y n is a stretch of observations of a nonstationary INARCH(1) with constant coefficients and trend such that λ t = aY t−1 + b 0 + b 1 t with a ∈ (0, 1), b 0 ≥ 0 and b 1 > 0, t ∈ N 0 and λ 0 has a finite absolute fourth moment. Then, for any K > 0

Hence, a test rejecting the null ifθ 1 σ > z 1−α is asymptotically of size α and consistent. Here, z 1−α denotes the (1−α) quantile of N (0, 1). In practice, σ is unknown and has to be estimated consistently. For our simulations and the data example presented below, we used the corresponding OLS-estimatorsâ andb 0 to obtainσ 2 =b 0 (1 −â) 3 . More precisely, we considered the model stated in Proposition 5.2 and calculated With similar arguments as in the proof of Lemma 5.1 it can be shown that the OLS estimators of a and b 0 are also consistent under the alternative described in Proposition 5.2. 5.2. Numerical study. Next, we investigate the finite sample behavior of the proposed test. Considering low and moderate levels of persistence (a = 0.2 and a = 0.5) we increase the effect of a linear trend from b 1 = 0 (null hypothesis) to b 1 = 0.1 holding the intercept fixed (b 0 = 1). We vary the sample size n = 50, 100, 200 for a = 0.2 and n = 100, 250, 500 for a = 0.5. The results for α = 0.1 using 5000 Monte Carlo loops are displayed in Figure 1 . The power properties of our test are very convincing however it tends to reject a true null too often in small samples. In particular, note that for a = 0.5 increasing the sample size from 250 to 500 improves the performance of the test under the null but barely influences the behavior under the alternative (the solid and dashed line in Figure 1 nearly coincide).

Analysis of COVID-19 data. We applied our test to investigate daily COVID-19 infection numbers as well as the cases of deaths related to COVID-19 in France and Germany from July 15 to September 15, 2020 using a data set published by the European Centre for Disease Prevention and Control (2020); see Figure 2 . Obviously, no test is required to observe an increasing trend in the daily infection numbers in France as well as in Germany. Our test clearly rejects the null in both cases (France:θ 1 σ = 317433, Germany:θ 1 σ = 69.19). However, the situation changes if we look at the cases of deaths. Again, the null is rejected for France (θ 1 σ = 7.53) at any reasonable level. Contrary, evaluating the test statistic based on the number of deaths in Germany that are related to COVID-19, we obtainθ 1 σ = −0.37, that is, the null hypothesis of no trend is not rejected at any reasonable level. We also studied a shift of the window of observation of 16 days, i.e. we considered the period from August 1 to September 30. Then, unfortunately, the null is rejected for both countries for the number of daily infections as well as for the COVID-19 related number of deaths.

Proof of Theorem 2.1. The proof of assertion (i) is given in the running text of Section 2. To prove (ii), we first identify a function g, which will satisfy the required equality (2.10). We consider backward iterations g [k] , where (with x i = (y i , z i )) g [1] (x 1 , λ 1 ) ∶= f (y 1 , λ 1 , z 1 ) and, for k ≥ 2, g [k] (x 1 , . . . , x k , λ k ) ∶= f (y 1 , g [k−1] (x 2 , . . . , x k , λ k ), z 1 ). Using the idea of iterations as in exercise 46 of Doukhan (2018) , we consider and we set precise approximations to λ t , λ

t ) P → 0 and, therefore,

By taking an appropriate subsequence (k n ) n∈N of N we even obtain

(6.1)

In order to obtain a well-defined function g, we define, for any sequence x 1 , x 2 , . . ., g(x 1 , x 2 , . . .) = lim sup n→∞ g [kn] (x 1 , . . . , x kn ,λ).

As a limit of the measurable functions g [kn] , g is also (σ(Z) − B)-measurable. From (6.1) we conclude that

holds with probability 1, as required.

Since absolute regularity of the process (X t ) t∈Z implies strong mixing (see e.g. Doukhan (1994, p. 20) ) we conclude from Remark 2.6 on page 50 in combination with Proposition 2.8 on page 51 in Bradley (2007) that any stationary version of this process is also ergodic. Finally, we conclude from (6.1) by proposition 2.10(ii) in Bradley (2007, p. 54 ) that also the process ((Y t , λ t , Z t )) t∈Z is ergodic.

Proof of Corollary 3.1. We choose the distance d as d(λ, λ ′ ) = λ − λ ′ and verify that conditions (A1) to (A3) are fulfilled.

(A1): We construct the coupling such thatZ t =Z ′ t . Then

It follows from (3.1) that Eλ k+1 ≤ (a k + b k )Eλ k + EZ k , which implies that

We construct the coupling such thatZ t =Z ′ t . Since

On the other hand, the inequality λ t+1 −λ ′

(A2): We couple the covariates such thatZ t =Z ′ t . For the count variables, we use an additive coupling as described in the proof of Corollary 3.1(A2). This yields in particular thatỸ t ≥Ỹ ′ t ifλ t ≥λ ′ t andỸ t ≤Ỹ ′ t ifλ t ≤λ ′ t . We will show that, for some

provided that the constant M in (3.2) is chosen appropriately. To this end, we distinguish between two cases:

Furthermore, we drop the index t with a t and b t . Again, we have to distinguish between two cases. a):

√ λ ′ In this case the proof of (6.2) is almost trivial. We have

Here, the second inequality follows by Jensen's inequality since x ↦ √ x is a concave function. b):

√ λ ′ This case requires more effort. We split up

say. Then

Note that the last inequality follows from 2

(6.8)

To sum up, we conclude from (6.3) to (6.8) that (6.2) is fulfilled for

Choosing now the constant M sufficiently large we obtain that ρ < 1, as required. (A3): Part (i) of (A3) is fulfilled by assumption.

Assume that the processes ((Ỹ t ,λ t ,Z t )) t∈N 0 and ((Ỹ ′ t ,λ ′ t ,Z ′ t )) t∈N 0 are independent copies of the original process ((Y t , λ t , Z t )) t∈N 0 . We have that

say. We obtain that (6.10) and, for the same reason,Ẽ

Finally, we have that

It follows from (6.9) to (6.12) that part (ii) of condition (A3) is fulfilled with

Proof of Proposition 3.1.

(i) First of all, note that the process (V t ) t∈N 0 with V t = (log(λ t ), log(Y t + 1), Z t ) forms a time-homogeneous Markov chain. Let S = R × log(N) × R be the state space of this process. In order to derive a contraction property, we choose the metric

where κ 1 and κ 2 are strictly positive constants such that a ≤ κ 1 , b ≤ κ 2 , and κ ∶= κ 1 + κ 2 < 1. We show that we can couple two versions of the process (V t ) t∈N 0 ,

We couple the corresponding covariate processes such that they coincide, i.e.Z t = Z ′ t ∀t ∈ N 0 . Let v = (x, y, z), v ′ = (x ′ , y ′ , z ′ ) ∈ S be arbitrary. We assume that

) as follows. According to the model equation (3.4) we set log(λ t+1 ) = d + ay + bx +Z t and log(λ ′ t+1 ) = d + ay ′ + bx ′ +Z ′ t . Conditioned onṼ t andṼ ′ t , the random variablesỸ t+1 andỸ ′ t+1 have to follow Poisson distributions with intensitiesλ t+1 andλ ′ t+1 , respectively. At this point we employ a coupling such thatỸ t+1 −Ỹ ′ t+1 has with probability 1 the same sign asλ t+1 −λ ′ t+1 . This implies in particular that

(6.14)

To estimate the term on the right-hand side of (6.14), we show that, for

To see this, suppose that Y (λ) ∼ Pois(λ) and Y ( ) ∼ Pois( ) are independent. Then

as well as

that is, (6.15) holds true. Hence, we obtain from (6.14) that

Recall that we have, by construction,Z t+1 =Z ′ t+1 . Using this and the above calculations we obtaiñ

It remains to translate this contraction property for random variables into a contraction property for the corresponding distributions. For the metric ∆ on S, we define

where z 0 ∈ S is arbitrary. For two probability measures Q, Q ′ ∈ P(S), we define the Kantorovich distance based on the metric ∆ (also known as Wasserstein L 1 distance) by

where the infimum is taken over all random variables V and V ′ defined on a common probability space (Ω,F,P ) with respective laws Q and Q ′ . We denote the Markov kernel of the processes (V t ) t∈N 0 by π V . Now we obtain immediately from (6.17) that

The space P(S) equipped with the Kantorovich metric K is complete. Since by (6.18) the mapping π V is contractive it follows by the Banach fixed point theorem that the Markov kernel π V admits a unique fixed point Q V , i.e. Q V π V = Q V . In other words, Q V is the unique stationary distribution of the process (V t ) t∈N 0 . Therefore, the process ((Y t , λ t , Z t )) t∈N 0 has a unique stationary distribution as well. (ii) In this case, we do not use Theorem 2.1 to prove absolute regularity, but Proposition 2.1. To this end, we make use of a contraction property on the logarithmic scale and change over to the square root scale afterwards. As above, we construct on a suitable probability space (Ω,F,P) two versions of the three-dimensional process, ((Ỹ t ,λ t ,Z t )) t∈N 0 and ((Ỹ ′ t ,λ ′ t ,Z ′ t )) t∈N 0 where these two processes evolve independently up to time k. Thenλ k+1 andλ ′ k+1 are independent, as required. For t = k + 1, . . . , k + n − 1, we couple these processes such thatZ t =Z ′ t as well asỸ t ≥Ỹ ′ t ifλ t ≥λ ′ t and vice versaỸ t ≤Ỹ ′ t ifλ t ≤λ ′ t . We obtain from (6.16) that

holds for all t ∈ {k + 1, . . .}. Using this inequality (n − 1)-times we obtain that

For t = k + n, k + n + 1, . . ., we use a maximal coupling of the count variables, that is,P

. This implies by Proposion 2.1 that

Finally, it remains to make the transition from our estimates of log(λ t ) − log(λ ′ t ) to the above total variation distances. Since x ↦ e x 2 is a convex function we have, for 0 ≤ x ≤ y, e x 2 − e y 2 = ∫ y 2

x 2 e u 2 2 du ≤ e x 2 +e y 2 8 x − y , which implies that

Using this and the estimate d T V (Pois(λ), Pois(λ ′ )) ≤ 2 e

and, analogously,

It remains to show that E[λ 2 k+n ] is finite. If Y ∼ Pois(λ), then E[(Y + 1) 2 ] = λ 2 + 3λ + 1. This implies E λ 2 t+1 λ t = e 2d E[e 2Z 0 ] λ 2a t [(λ t + 2)(λ t + 1)] b ≤ C 1 λ 2(a+b) t + 1 , for some C 1 < ∞. Therefore we obtain that E λ 2 t+1 λ t ≤ C 0 λ 2 t + C 2 , for appropriate C 0 < 1 and C 2 < ∞. From this recursion we conclude that E[λ 2 k+n ] is finite. (6.22) and (6.23) yield that

Proof of Corollary 3.3. Part (i) is an immediate consequence of (3.5). For (ii), we define a coupling such thatε t =ε ′ t , theñ E d(ε t f (Ỹ t ,λ t ,Z t ),ε t f (Ỹ ′ Proof of Proposition 5.2. We split up

(6.24)

First, note that the second sum tends to infinity. To see this, rewrite

As ∑ n s=1 (s − n+1 2 ) 2 ≥ C 1 n 3 2 , we obtain sup w t ≤ C 2 n −1 2 which implies w t t a t − 1 a − 1 = o(n) + b 1 − a n t=1 t w t = C 3 n 3 2 + o(n 3 2 ) for some positive, finite constants C 1 , C 2 , C 3 . It remains to show that the first sum in (6.24) is o P (n 3 2 ). To this end, we consider

applying the covariance inequality for α-mixing processes in Doukhan (1994) , Theorem 3 (1), or Theorem 1.1 in Rio (2017) , the fact that the α-mixing coefficients can be bounded from above by the corresponding β-mixing coefficients and Corollary 3.2. Recall that the 2 nd and the 3 rd central moment of a Poisson, Pois(λ), distributed random variable is just λ while the fourth central moment is λ 2 + 3λ. Using the binomial theorem and Eλ 2 s = O(S 2 ), we can further bound

Iterating these calculations yields that E(Y s − EY s ) 4 = O(s 2 ) which concludes the proof.

Proof of Lemma 5.1. Rewrite Y t = aY t−1 + b 0 + η t with η t = Y t − λ t , t = 1, . . . , n,. Using the corresponding matrix notation and the definition of X, we have to show that (X T X) −1 X T η = o P (1), where η = (η 1 , . . . , η n ) T . We proceed in 2 steps. First, we show that N X T η = o P (1) with N = diag(n −1 , n −1 , n −2 ). Second, we show that (N X T X) −1 = O P (1). For the first part, straight-forward calculations show that

For the second part, we rewrite (N X T X) −1 = M (N X T XM ) −1 with M = diag(0, 0, n −1 ) and show that N X T XM converges stochastically to an invertible matrix. To this end, note that

t=0 Y t n (n + 1) 2 n −1 ∑ n−1 t=0 t Y t (n + 1) 2 (n + 1)(2n + 1) (6n)

due to the exponentially decaying autocovariance function of (Y t ) t . Finally, straight-forward calculations show that the determinant of the remaining matrix is positive which concludes the proof.

Modeling corporate defaults: Poisson autoregressions with exogenous covariates (PARX)

Random walks with stationary increments and renewal theory

Generalized autoregressive conditional heteroskedasticity

Introduction to Strong Mixing Conditions, Volume I

An Introduction to the Theory of Point Processes

Handbook of Discrete-Valued Time Series

Mixing: Properties and Examples

Stochastic Models for Time Series. Mathematics and Applications

Mixing properties of Skellam-GARCH processes

Absolute regularity of semi-contractive GARCH-type processes

Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation

Integer-valued GARCH processes

Time Series: Methods and Applications, Handbook of Statistics 30

Nonlinear Poisson autoregression

Log-linear Poisson autoregression

GARCH Models: Structure, Statistical Inference and Financial Applications

Generalized Linear Models

Absolute regularity and ergodicity of Poisson count processes

Bootstrap for integer-valued GARCH(p,q) processes

About the Lindeberg method for strongly mixing sequences

Asymptotic Theory of Weakly Dependent Random Processes. Probability Theory and Stochastic Modelling 80

Improvements in the Poisson approximation of mixed Poisson distributions

A modeling framework for the prices and times of trades made on the New York Stock Exchange

Some observation driven models for time series of counts

An introduction to discrete valued time-series

Acknowledgment. This work was funded by CY Initiative of Excellence (grant "Investissements d'Avenir" ANR-16-IDEX-0008) Project "EcoDep" PSI-AAP2020-0000000013 (first and third authors) and within the MME-DII center of excellence (ANR-11-LABEX-0023-01), and the Friedrich Schiller University in Jena (for the first author).

Proof of Theorem 5.1. First, note that the contraction condition a ∈ (0, 1) assures existence of a strictly stationary version of the process with β-mixing coefficients tending to zero at a geometric rate (see Corollary 3.1 and Theorem 2.1 in Neumann (2011)). (alternatively, since we are in the stationary case, Theorem 3.1 in Neumann (2011) containing both results) Moreover, all moments of Y t are finite, see e.g. Weiß (2018, Example 4.1.6). Asymptotic normality ofθ 1 can be deduced from Application 1 in Rio (1995) setting a i,n = w i andTo this end, note that from ∑ n t=1 w t = 0 and stationarity, we getθAdditionally, straight-forward calculations yield (1 − a) 2 (1 + a) which gives and finally yields the desired result.