key: cord-0836415-g0qulvxs
authors: Otto, Sven; Breitung, Jorg
title: Backward CUSUM for Testing and Monitoring Structural Change with an Application to COVID-19 Pandemic Data
date: 2020-03-05
journal: Econometric Theory
DOI: 10.1017/s0266466622000159
sha: 0712191c60aadde0c92910ae1539922e9a67e941
doc_id: 836415
cord_uid: g0qulvxs

It is well known that the conventional cumulative sum (CUSUM) test suffers from low power and large detection delay. In order to improve the power of the test, we propose two alternative statistics. The backward CUSUM detector considers the recursive residuals in reverse chronological order, whereas the stacked backward CUSUM detector sequentially cumulates a triangular array of backwardly cumulated residuals. A multivariate invariance principle for partial sums of recursive residuals is given, and the limiting distributions of the test statistics are derived under local alternatives. In the retrospective context, the local power of the tests is shown to be substantially higher than that of the conventional CUSUM test if a break occurs in the middle or at the end of the sample. When applied to monitoring schemes, the detection delay of the stacked backward CUSUM is found to be much shorter than that of the conventional monitoring CUSUM procedure. Furthermore, we propose an estimator of the break date based on the backward CUSUM detector and show that in monitoring exercises this estimator tends to outperform the usual maximum likelihood estimator. Finally, an application of the methodology to COVID-19 data is presented.

Cumulative sums have become a standard statistical tool for testing and monitoring structural changes in time series models. The CUSUM test was introduced by Brown et al. (1975) as a test for structural breaks in the coefficients of a linear regression model $y_t = x_t'\beta_t + u_t$ with time index $t$, where $\beta_t$ denotes the coefficient vector, $x_t$ is the vector of regressor variables, and $u_t$ is a zero-mean error term. Under the null hypothesis, there is no structural change in $\beta_t$, while, under the alternative hypothesis, the coefficient vector changes at an unknown time $T^* \leq T$.

Sequential tests, such as the CUSUM test, consist of a detector statistic and a critical boundary function. The CUSUM detector sequentially cumulates standardized one-step ahead forecast errors, which are also referred to as recursive residuals. The detector is evaluated for each time point within the testing period, and, if its path crosses the boundary function at least once, the null hypothesis is rejected. If the endpoint of the sample is fixed and the test is applied once to the full sample by comparing the path of the detector with the boundary function, the test is called a retrospective test (henceforth: R-test). A variety of R-tests have been proposed in the literature (for recent reviews, see Robbins et al., 2011, Aue and Horváth, 2013, and Casini and Perron, 2019).

Since the seminal work of Chu et al. (1996), increasing interest has been focused on monitoring structural stability in real time. Sequential monitoring procedures (henceforth: M-tests) consist of a detector statistic and a boundary function that are evaluated for periods beyond some historical time span. The monitoring time span with $t > T$ can either have a fixed endpoint $M < \infty$ or an infinite horizon. In the fixed endpoint setting, the monitoring period starts at $T + 1$ and ends at $M$, while the boundary function depends on the ratio $m = M/T$. In case of an infinite horizon, the monitoring time span does not need to be specified before the monitoring procedure starts. These two monitoring schemes are also referred to as closed-end and open-end procedures (see Kirch and Kamgaing, 2015). The null hypothesis of no structural change is rejected whenever the path of the detector crosses some critical boundary function for the first time. Monitoring procedures for a fixed endpoint were proposed in Leisch et al. (2000), Zeileis et al. (2005), Wied and Galeano (2013), and Dette and Gösmann (2020), whereas Chu et al. (1996), Horváth et al. (2004), Aue et al. (2006), Fremdt (2015), and Gösmann et al. (2021) considered an infinite monitoring horizon.

In recent years, M-tests have become popular as tests for speculative bubbles in financial markets (e.g., Phillips et al., 2011, Homm and Breitung, 2012, Astill et al., 2018). A well-known drawback of the conventional CUSUM R-test is its low power, whereas the CUSUM M-test may exhibit large detection delays. This is due to the fact that the pre-break recursive residuals are uninformative, as their expectation is equal to zero up to the break date, while the recursive residuals have a non-zero expectation after the break. Hence, the cumulative sums of the recursive residuals contain a large number of uninformative residuals that only add noise to the statistic. In contrast, if one cumulates the recursive residuals backwardly from the end of the sample to the beginning, the cumulative sum collects the informative residuals first, and the likelihood of exceeding the critical boundary will typically be larger than when cumulating residuals from the beginning onwards. In this paper, we show that backward CUSUM test procedures may indeed have a much higher power and lower detection delays than the conventional CUSUM R- and M-tests. Phillips and Shi (2018) proposed a reverse sample scheme for the PSY procedure that is used for detecting crises (bubble collapses). The main difference from our approach is that their regression is performed in reverse order, while our approach estimates the model in the original time order but reverses the order of the (recursive) residuals.

Another way of motivating the backward CUSUM testing approach is to consider the simplest possible situation, where, under the null hypothesis, it is assumed that the process is generated as $y_t = \mu + u_t$, with $\mu$ and $\sigma^2 = \mathrm{Var}(u_t)$ assumed to be known. To test the hypothesis that the mean changes at $T^*$, we introduce the dummy variable $D_t^*$, which is unity for $t \geq T^*$ and zero elsewhere. The uniformly most powerful test statistic is the t-statistic for the hypothesis $\delta = 0$ in the regression $y_t - \mu = \delta D_t^* + u_t$, which is given by $\sigma^{-1} (T - T^* + 1)^{-1/2} \sum_{t=T^*}^{T} (y_t - \mu)$. If $\mu$ is unknown, we may replace it by the full sample mean $\bar{y}$, resulting in the backward cumulative sum of the OLS residuals from period $T$ through $T^*$. If $T^*$ is unknown, the test statistic is computed for all possible values of $T^*$, whereas the starting point $T$ of the backward cumulative sum remains constant.

Since the sum of the OLS residuals is zero, it follows that the test is equivalent to a test based on the forward cumulative sum of the OLS residuals. In contrast, if we replace $\mu$ with the recursive mean $\hat{\mu}_{t-1} = (t - 1)^{-1} \sum_{i=1}^{t-1} y_i$, we obtain a test statistic based on the backward cumulative sum of the recursive residuals (henceforth: backward CUSUM). In this case, however, the test is different from a test based on the forward cumulative sum of the recursive residuals (henceforth: forward CUSUM). This is due to the fact that the sum of the recursive residuals is an unrestricted random variable. Accordingly, the two versions of the test may have quite different properties. In particular, it turns out that the backward CUSUM is much more powerful than the standard forward CUSUM at the end of the sample. This version of the CUSUM test procedure is therefore better suited for real-time monitoring, where it is crucial to be powerful at the end of the sample.
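The equivalence for OLS residuals, and its failure for recursive residuals, can be written out in one step. The following is a short worked sketch that uses only the definitions above; the data $y = (1, 2, 4)$ with $T = 3$ are a hypothetical illustration, and the recursive forecast errors are shown before standardization.

```latex
% OLS residuals sum to zero, so backward and forward sums differ only in sign:
\sum_{t=1}^{T}(y_t-\bar y)=0
\quad\Longrightarrow\quad
\sum_{t=T^*}^{T}(y_t-\bar y) = -\sum_{t=1}^{T^*-1}(y_t-\bar y).
% Hence |backward sum| = |forward sum| for every candidate T^*.

% Recursive forecast errors for the hypothetical data y = (1, 2, 4):
y_2-\hat\mu_1 = 2-1 = 1, \qquad
y_3-\hat\mu_2 = 4-\tfrac{3}{2} = \tfrac{5}{2}, \qquad
(y_2-\hat\mu_1)+(y_3-\hat\mu_2) = \tfrac{7}{2} \neq 0 .
```

Because no adding-up constraint ties the recursive residuals together, reversing the order of cumulation produces a genuinely different statistic.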

An additional problem of the conventional CUSUM test is that it has no power against alternatives that do not affect the unconditional mean of y t (see Krämer et al., 1988) . For both retrospective testing and monitoring, we propose a multivariate sequential statistic in the fashion of the score-based cumulative sum statistic of Hansen (1992) and the tests by Jiang and Kurozumi (2019) . The maximum vector entry of the multivariate statistic yields a detector and a sequential test that has power against a much larger class of structural breaks than when using conventional CUSUM detectors.

We also suggest a new estimator for the break date based on backwardly cumulated recursive residuals. This estimator outperforms the conventional estimator constructed by the sum of squared residuals whenever the break occurs close to the end of the sample, which is the relevant scenario for on-line monitoring. This paper is organized as follows. In Section 2, the limiting distribution of the multivariate CUSUM process is derived under both the null hypothesis and local alternatives.

Section 3 introduces the backward CUSUM and the stacked backward CUSUM tests for both retrospective testing and monitoring. While the backward CUSUM is only defined for t ≤ T and can thus be implemented only for retrospective testing, the stacked backward CUSUM cumulates recursive residuals backwardly in a triangular scheme and is therefore suitable for real-time monitoring. The local powers of the tests are compared in Section 4.

In the retrospective setting, the powers of the backward CUSUM and the stacked backward CUSUM tests are substantially higher than that of the conventional forward CUSUM test if a single break occurs after one third of the sample size. In the case of monitoring, the detection delay of the stacked backward CUSUM under local alternatives is shown to be much lower than that of the monitoring CUSUM detector by Chu et al. (1996) . In Section 5 we present a strong invariance principle for the multivariate CUSUM process and propose an infinite horizon monitoring procedure. Section 6 considers the estimation of the break date based on backwardly cumulated recursive residuals. We present an estimator, which is more accurate than the conventional maximum likelihood estimator if the break is located at the end of the sample. Section 7 presents Monte Carlo simulation results, in Section 8 we provide a real-data example on monitoring SARS-CoV-2 infections during the COVID-19 pandemic, and Section 9 concludes.

Throughout the paper, we use the following notation: $\|a\| = \max_{i=1,\ldots,k} |a_i|$ denotes the maximum norm and $\|A\|_M = \max_{i=1,\ldots,k} \sum_{j=1}^{l} |A_{i,j}|$ denotes the maximum absolute row sum norm, where $a \in \mathbb{R}^k$ and $A \in \mathbb{R}^{k \times l}$. We use $\overset{p}{\longrightarrow}$ to denote convergence in probability as $T \to \infty$. Processes are treated as elements of the space of càdlàg functions on $[0, m]$. The space is equipped with the Skorokhod metric (see Billingsley, 1999), and the symbol "$\Rightarrow$" denotes weak convergence with respect to this metric.

We consider the multiple linear regression model

$$y_t = x_t'\beta_t + u_t, \qquad (1)$$

where $y_t$ is the dependent variable, and $x_t = (1, x_{t2}, \ldots, x_{tk})'$ is the vector of regressor variables including a constant. The $k \times 1$ vector of regression coefficients $\beta_t$ depends on the time index $t$, and $u_t$ is an error term. The time point $T$ divides the time horizon into the retrospective time period $t \leq T$ and the monitoring period $t > T$. We impose the following assumptions on the regressors and the error term.

< ∞ for all t, and the sample covariance matrices

Model (1) allows for conditionally heteroskedastic errors and local non-stationary regressors, provided that a global long-run covariance matrix Ω exists. The regressors can contain lagged dependent variables such as in autoregressive distributed lag models. We focus on models with a correctly specified dynamic structure and uncorrelated errors.

Assumption 2. The error process u t is a martingale difference sequence with respect to

Following Brown et al. (1975), this assumption rules out autocorrelated error processes. In practice this may require a dynamic specification with a suitable lag distribution of the variables. In Remark 1 we show that autocorrelated errors can be accommodated by replacing the ordinary covariance matrix by a (consistent estimate of the) long-run covariance matrix. Since the estimation of long-run covariances can lead to finite sample size distortions (see, e.g., Casini, 2021), Assumption 2 is a common and convenient assumption in practice. The expression of the global covariance matrix simplifies to $\Omega = \sigma^2 C$ under Assumption 2.

Recursive residuals for linear regression models were introduced by Brown et al. (1975) as standardized one-step ahead forecast errors, and are defined as

Using recursive residuals instead of ordinary OLS residuals as in Ploberger and Krämer (1992) has a number of advantages. First, the recursive residuals behave exactly as under the null hypothesis until the parameters change, whereas a structural break affects all OLS residuals in a different manner. Second, under Assumptions 1 and 2, the recursive residuals form a martingale difference sequence regardless of the estimation error in the recursive residuals. By contrast, the OLS residuals are (slightly) autocorrelated, which only disappears if the sample size gets large.
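To make the notion of a standardized one-step ahead forecast error concrete, the following is a minimal sketch in plain Python. It assumes the usual Brown et al. (1975) form, $w_t = (y_t - x_t'\hat{\beta}_{t-1}) / \sqrt{1 + x_t'(\sum_{j<t} x_j x_j')^{-1} x_t}$ for $t > k$, where $\hat{\beta}_{t-1}$ is the OLS estimate from the first $t-1$ observations. The function names `solve` and `recursive_residuals` are illustrative, and the first $k$ regressor rows are assumed to be linearly independent.

```python
import math

def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def recursive_residuals(X, y):
    """Return w_{k+1}, ..., w_T for regressor rows X (each of length k) and responses y."""
    k = len(X[0])
    A = [[0.0] * k for _ in range(k)]                # running sum of x_j x_j'
    c = [0.0] * k                                    # running sum of x_j y_j
    w = []
    for t, (x, yt) in enumerate(zip(X, y), start=1):
        if t > k:
            beta = solve(A, c)                       # OLS estimate from observations 1..t-1
            a_inv_x = solve(A, x)                    # (sum x_j x_j')^{-1} x_t
            error = yt - sum(xi * bi for xi, bi in zip(x, beta))
            scale = math.sqrt(1.0 + sum(xi * vi for xi, vi in zip(x, a_inv_x)))
            w.append(error / scale)
        for i in range(k):                           # add observation t to the running sums
            c[i] += x[i] * yt
            for j in range(k):
                A[i][j] += x[i] * x[j]
    return w
```

Re-solving the normal equations at every step keeps the sketch short; a rank-one update of the inverse would be the more efficient choice for long samples.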

The conventional CUSUM detector is given by

denotes the sample variance of $\{w_{k+1}, \ldots, w_T\}$. Under the null hypothesis $H_0: \beta_t = \beta_0$ for all $t$, the univariate CUSUM process obeys the functional central limit theorem $S_{\lfloor rT \rfloor, T} \Rightarrow W(r)$, where $W(r)$ is a standard Brownian motion (see Sen, 1982). The univariate CUSUM R-test of Brown et al. (1975) rejects the null hypothesis if the path of $|S_{t,T}|$ exceeds the linear critical boundary function $b_t = \lambda_\alpha d_{\mathrm{lin}}(t/T)$ for at least one time index $t = 1, \ldots, T$, where

$$d_{\mathrm{lin}}(r) = 1 + 2r. \qquad (2)$$

The critical value $\lambda_\alpha$ is the $(1 - \alpha)$ quantile of $\sup_{0 \leq r \leq 1} |W(r)|/d_{\mathrm{lin}}(r)$ and determines the significance level $\alpha$, which accounts for the multiplicity issue of the sequential test procedure.
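The decision rule above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the detector is taken to be $S_{t,T} = (\hat{\sigma}\sqrt{T})^{-1} \sum_{j=k+1}^{t} w_j$ with $\hat{\sigma}^2$ the sample variance of the recursive residuals (divisor $T - k - 1$ is an assumption), and the critical value `lam` must be supplied by the user. The function names are hypothetical.

```python
import math

def d_lin(r):
    """Linear boundary shape d_lin(r) = 1 + 2r."""
    return 1.0 + 2.0 * r

def forward_cusum_path(w, k=1):
    """Scaled partial sums S_{t,T} for t = k+1, ..., T, given w_{k+1}, ..., w_T."""
    T = len(w) + k
    mean_w = sum(w) / len(w)
    sigma = math.sqrt(sum((v - mean_w) ** 2 for v in w) / (len(w) - 1))
    path, running = [], 0.0
    for v in w:
        running += v
        path.append(running / (sigma * math.sqrt(T)))
    return path

def forward_cusum_test(w, lam, k=1):
    """Return (max_t |S_{t,T}| / d_lin(t/T), reject?) for a user-supplied critical value lam."""
    T = len(w) + k
    path = forward_cusum_path(w, k)
    stat = max(abs(s) / d_lin(t / T) for t, s in zip(range(k + 1, T + 1), path))
    return stat, stat > lam
```

Expressing the test through the maximum of the boundary-scaled path makes it a one-shot comparison with a single critical value, which is the same device used for the multivariate tests later in the paper.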

In the monitoring context, Chu et al. (1996) considered the radical type boundary function $b_{\mathrm{rad}}(r) = r^{1/2} (\log(r) - \log(\alpha^2))^{1/2}$, which is derived from the boundary crossing probability for a Brownian motion (see Robbins and Siegmund, 1970). The conventional univariate CUSUM M-test rejects the null hypothesis if the detector statistic $|S_{t,T} - S_{T,T}|$ exceeds $b_t = b_{\mathrm{rad}}(t/T)$ for some $t > T$.
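A minimal sketch of this monitoring rule, following the boundary exactly as stated in the text, is given below. The function names and the list-based interface are hypothetical; `s_path` is assumed to hold the detector values $S_{T+1,T}, S_{T+2,T}, \ldots$ as they become available.

```python
import math

def b_rad(r, alpha):
    """Radical boundary b_rad(r) = sqrt(r * (log(r) - log(alpha^2)))."""
    return math.sqrt(r * (math.log(r) - math.log(alpha ** 2)))

def monitoring_cusum_alarm(s_path, s_T, T, alpha):
    """Return the first t > T with |S_{t,T} - S_{T,T}| > b_rad(t/T), or None if no crossing."""
    for offset, s in enumerate(s_path, start=1):
        t = T + offset
        if abs(s - s_T) > b_rad(t / T, alpha):
            return t
    return None
```

The loop stops at the first crossing, which mirrors how a monitoring procedure is run in real time: once the boundary is exceeded, the null hypothesis is rejected and monitoring ends.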

A weakness of univariate CUSUM tests is that they focus on breaks in the intercept. Ploberger and Krämer (1990) studied local alternatives of the form $\beta_t = \beta_0 + T^{-1/2} g(t/T)$, where $g: \mathbb{R} \to \mathbb{R}^k$ is piecewise constant and bounded. The authors showed that $S_{\lfloor rT \rfloor, T} \Rightarrow W(r) + \pi h(r)$, where $\pi = e_1' C$, $e_1 = (1, 0, \ldots, 0)'$, and

Consequently, univariate CUSUM tests have no power if g(r) is orthogonal to π. To sidestep this difficulty, we follow Jiang and Kurozumi (2019) and consider the multivariate statistic

Under Assumption 1, the multivariate series $x_t u_t$ obeys a multivariate functional central limit theorem (see Phillips and Durlauf, 1986), which also applies to the multivariate CUSUM process of recursive residuals.

Theorem 1. Let Assumptions 1 and 2 hold true. If β t = β 0 for all t, then

for any m < ∞, as T → ∞, where W (k) (r) is a k-dimensional standard Brownian motion.

If β t = β 0 + T −1/2 g(t/T ), where g(r) is piecewise constant and bounded, then

This is an extension of the results in Jiang and Kurozumi (2019) , who considered slightly stronger assumptions and no local alternatives. Note that the function g(r) is constant if and only if β t is constant. If β t = β 0 for all t, we have h(r) = 0. By contrast, under a local alternative with a non-constant break function g(r), it follows that h(r) is non-zero, and, consequently, C 1/2 h(r) is non-zero, since C 1/2 is positive definite. Hence, sequential tests that are based on Q T (r) have power against a larger class of alternatives than the tests of Brown et al. (1975) and Chu et al. (1996) .

Therefore, we consider R- and M-tests that are based on the multivariate detector $Q_{t,T} = Q_T(t/T)$. Note that $Q_{t,T} = S_{t,T}$ if there is only an intercept in the model. The multivariate forward CUSUM R-test is defined by the following rule: the null hypothesis is rejected if the path of $\|Q_{t,T}\|$ exceeds the boundary function $b_t = \lambda_\alpha d(t/T)$ for at least one index $t = 1, \ldots, T$. Equivalently, we can express this sequential test as a one-shot test, where $H_0$ is rejected if the maximum statistic $Q_T = \max_{t=1,\ldots,T} \|Q_{t,T}\|/d(t/T)$ exceeds the critical value $\lambda_\alpha$, which is the $(1 - \alpha)$ quantile of its limiting null distribution.

Assumption 3. The boundary function is of the form $b(r) = \lambda_\alpha d(r)$, where $d(r)$ is continuous. There exists $\varepsilon > 0$ such that $d(r) > \varepsilon$ for all $r \geq 0$.

By Theorem 1 and the continuous mapping theorem it follows that

under the null hypothesis. The multivariate forward CUSUM M -test with fixed endpoint

where, under H 0 ,

Remark 1. If the dynamics of the model are not specified correctly, the errors may be autocorrelated and Assumption 2 does not apply. In this case, the limiting distribution differs from that in (4) and depends on the global long-run covariance matrix $\Omega$. Under additional strong mixing assumptions, the process $x_t u_t$ obeys the multivariate functional central limit theorem (see Wooldridge and White, 1988). To obtain the same limiting distribution as in Theorem 1, we may consider the modified multivariate CUSUM detector

where Ω T is some consistent estimator for Ω. Suitable choices are the long-run covariance estimators of Newey and West (1987) and Andrews (1991) . An alternative is the double kernel HAC estimator by Casini (2021) , which performs well in the presence of locally stationary regressors. In the Appendix we show that, if Assumption 1 holds and if there exists

Therefore, all R- and M-tests can also be constructed based on the modified detector $Q_{t,T} = Q_T(t/T)$.

A second approach to dealing with possible autocorrelation was proposed by Robbins et al. (2011). Their two-step adjustment approach first employs an ARMA model in order to obtain pre-whitened residuals, which in turn replace the original residuals in the detector. In our case the ARIMA pre-whitening may be performed in a recursive fashion. As shown by Robbins et al. (2011), the resulting detector possesses asymptotic properties similar to those of the original detector (apart from a scaling factor that depends on the long-run variance).

Remark 2. In practice, partial or one-sided tests can be beneficial in terms of a more powerful test if one is interested in breaks in certain coefficients or directions. For testing the partial hypothesis H 0 : H β t = H β 0 , where H is a k×l matrix with full column rank, we con-

, under H 0 and the conditions of Theorem 1. The R-test by Brown et al. (1975) and the M -test by Chu et al. (1996) are partial structural break tests for which the matrix H coincides with the first unit vector. In case of one-sided tests, e.g. H 1 : H β t > H β 0 , the maximum norm can be replaced by the simple maximum, so that H 0 is rejected if p(H Q t,T ) exceeds the respective boundary function, where p(

Table 1: Asymptotic critical values for $Q_T$ and $BQ_T$ (columns $k = 1, \ldots, 10$).

An alternative approach is to cumulate the recursive residuals in reversed order. Suppose there is a single break in $\beta_t$ at time $t = T^*$. Then, $\{w_t, t < T^*\}$ are the residuals from the pre-break period, and $\{w_t, t \geq T^*\}$ are those from the post-break period. As the pre-break recursive residuals are not affected by a violation of the null hypothesis, they do not provide useful information about a subsequent break. Accordingly, the partial sum process $T^{-1/2} \sum_{j=1}^{t} w_j$ behaves like a pure random walk for $t < T^*$, and cumulating those residuals brings nothing but noise to the detector statistic. In contrast, the post-break residuals have nonzero mean and reveal relevant information about a possible break. In order to focus on the post-break residuals, we therefore consider backwardly cumulated partial sums of the

We define the retrospective backward CUSUM detector as

for at least one time index t. The maximum statistic is given by

and, under the local alternatives defined in Theorem 1, the continuous mapping theorem

Hence, the limiting distribution of $BQ_T$ under $H_0$ coincides with that of $Q_T$. Simulated asymptotic critical values under the linear boundary (2) are presented in Table 1. Under local alternatives, the limiting distributions of $BQ_T$ and $Q_T$ differ. Illustrative detector paths, together with the linear boundary (2) of Brown et al. (1975), are depicted in Figure 2, in which two processes with $k = 1$ and a single break in the mean at 3/4 and 1/4 of the sample are simulated.

Figure 2 note: The process $y_t = \mu_t + u_t$, $t = 1, \ldots, T$, is simulated for $T = 100$ with $\mu_t = 0$ for $t < \tau^* T$, $\mu_t = 1$ for $t \geq \tau^* T$, and $u_t \sim NID(0, 1)$, where $\tau^* = 0.75$ in the upper panels and $\tau^* = 0.25$ in the lower panels. The bold solid line paths are the trajectories of $\|Q_{t,T}\|$ and $\|BQ_{t,T}\|$, where the detectors are univariate such that the norm is just the absolute value. In the background, the recursive residuals are plotted. The dashed and dash-dotted lines correspond to the linear boundary $d_{\mathrm{lin}}(r)$ with significance levels $\alpha = 5\%$ and $\alpha = 0.1\%$, respectively.
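The simulation design described in the figure note can be reproduced in outline with the following sketch for the intercept-only case ($k = 1$). Several pieces are assumptions, since the displayed definitions are not available here: the backward detector is taken to be $(\hat{\sigma}\sqrt{T})^{-1} \sum_{j=t}^{T} w_j$, and its boundary is taken to be the time-reversed linear boundary $d_{\mathrm{lin}}((T - t + 1)/T)$, which is consistent with the two maximum statistics sharing the same null limit. All function names are hypothetical.

```python
import math
import random

def d_lin(r):
    """Linear boundary shape d_lin(r) = 1 + 2r."""
    return 1.0 + 2.0 * r

def recursive_residuals_mean(y):
    """Recursive residuals w_2, ..., w_T for the intercept-only model."""
    w, running_sum = [], y[0]
    for t in range(2, len(y) + 1):                     # t is the 1-based time index
        prev_mean = running_sum / (t - 1)
        w.append((y[t - 1] - prev_mean) / math.sqrt(1.0 + 1.0 / (t - 1)))
        running_sum += y[t - 1]
    return w

def detector_maxima(w, k=1):
    """Return (forward, backward) maxima of the boundary-scaled detector paths."""
    T = len(w) + k
    mean_w = sum(w) / len(w)
    sigma = math.sqrt(sum((v - mean_w) ** 2 for v in w) / (len(w) - 1))
    scale = sigma * math.sqrt(T)
    fwd, running = 0.0, 0.0
    for idx, v in enumerate(w):                        # cumulate from the start of the sample
        t = k + 1 + idx
        running += v
        fwd = max(fwd, abs(running) / scale / d_lin(t / T))
    bwd, running = 0.0, 0.0
    for idx in range(len(w) - 1, -1, -1):              # cumulate from the end of the sample
        t = k + 1 + idx
        running += w[idx]
        bwd = max(bwd, abs(running) / scale / d_lin((T - t + 1) / T))
    return fwd, bwd

def simulate_mean_shift(T=100, tau=0.75, shift=1.0, seed=0):
    """Simulate y_t = mu_t + u_t with a mean shift at tau * T and standard normal errors."""
    rng = random.Random(seed)
    return [(shift if t >= tau * T else 0.0) + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]
```

A typical call would be `detector_maxima(recursive_residuals_mean(simulate_mean_shift(tau=0.75)))`, whose two entries can then be compared with a critical value of the kind reported in Table 1.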

Unlike the forward CUSUM detector, the backward CUSUM detector is not measurable with respect to the filtration of available information at time t and is therefore not suitable for a monitoring procedure. The path of BQ t,T cannot be monitored in real-time, as it is only defined for t ≤ T with fixed endpoint T . To obtain a feasible M -test in practice, we resort to a triangular backward inspection scheme of recursive residuals, which is discussed below.

The idea of the stacked backward CUSUM scheme is to compute this statistic sequentially for each t, yielding BQ T (1), . . . , BQ T (T ).

The corresponding maximum statistic SBQ T is the maximum among this sequence of backward CUSUM statistics. An important feature is that this sequence is measurable with respect to the filtration of information at time t, so that BQ T (t) is itself a sequential statistic. Stacking all backward CUSUM statistics on one another leads to a triangular array structure given by

which is denoted as the stacked backward CUSUM detector. We reject $H_0$ if $SBQ_{s,t,T}$ exceeds the triangular boundary $b_{s,t} = b(t/T, (s-1)/T)$ for some $t = 1, \ldots, T$ and $s = 1, \ldots, t$.
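The triangular scheme can be sketched for the scalar case as follows. This is an illustration under assumptions: the array entry is taken to be $SBQ_{s,t,T} = |Q_{t,T} - Q_{s-1,T}|$, that is, the backward cumulation from $t$ down to $s$, and the boundary shape `d` is supplied by the caller. The shape $1 + 2(r - s)$ used in the usage example is an assumed linear triangular form, not one taken from the text.

```python
def stacked_backward_max(q, d):
    """Maximum of |Q_t - Q_{s-1}| / d(t/T, (s-1)/T) over the triangle 1 <= s <= t <= T.

    q: list with q[t] = Q_{t,T} for t = 0, ..., T and q[0] = 0 (scalar detector).
    d: boundary shape d(r, s), defined for 0 <= s <= r.
    """
    T = len(q) - 1
    best = 0.0
    for t in range(1, T + 1):
        for s in range(1, t + 1):
            best = max(best, abs(q[t] - q[s - 1]) / d(t / T, (s - 1) / T))
    return best

def linear_triangular(r, s):
    """Assumed linear triangular boundary shape 1 + 2(r - s)."""
    return 1.0 + 2.0 * (r - s)
```

For a path such as `q = [0.0, 0.0, 0.0, 1.0, 2.0]`, the maximum is attained by the increment that starts right where the path begins to drift, which is exactly the behaviour the stacked scheme is designed to exploit. The double loop costs $O(T^2)$ operations, which is unproblematic for samples of moderate length.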

Assumption 4. The triangular boundary function is of the form

The stacked backward CUSUM R-test can be equivalently expressed in terms of a maximum

Under the local alternatives defined in Theorem 1, it follows that

d(r, s) .

Since the triangular detector (6) is measurable with respect to the information set at time $t$, it can be monitored on-line across all time points $t > T$. The null hypothesis is rejected if $SBQ_{s,t,T}$ exceeds $b_{s,t} = b(t/T, (s-1)/T)$ at least once for some $t \geq T + 1$ and $s = T + 1, \ldots, t$. The M-test maximum statistic with a fixed horizon $m < \infty$ is given by

where, analogously to (5), 

are presented in Table 2 .

In order to illustrate the advantages of the backward CUSUM tests, we consider the simple local break model β t = β 0 +T −1/2 g(t/T ) with g(r) = c1 {r≥τ * } , where c ∈ R k , and τ * denotes 

and, under the linear boundaries (2) and (7), the R-tests satisfy

Asymptotic local power curves from the limiting distributions in (9) for the case k = 1 are presented in Figure 3 . The (2, 3)-element of the panel of figures shows that for a fixed break size the backward CUSUM and the stacked backward CUSUM outperform the forward CUSUM if a single break τ * is located after 15% of the sample size. If the break date τ * tends to the end of the sample, the power gain of BQ T and SBQ T increases substantially.

For the M -test statistics with fixed endpoint m = 2, the limiting distributions of Q T,m and SBQ T,m for a break at τ * ∈ (1, 2) coincide with those for the R-tests presented in (9) for a break at τ * ∈ (0, 1). Hence, the power of SBQ T,m is higher than that of Q T,m if breaks are located after 15% of the pre-monitoring sample.

Another important performance measure for M -tests is the delay between the actual break and the detection time point. Aue and Horváth (2004) 

where $\lambda_{\alpha,m,Q}$ and $\lambda_{\alpha,m,SBQ}$ are the corresponding critical values. Under the same setting as in (9), the relative detection stopping times satisfy

Moreover, the asymptotic mean local delay of $SBQ_{T,m}$ slowly decreases in $\tau^*$ and is much lower than that of $Q_{T,m}$, except for early breaks.

Remark 3. While, for one-shot tests, the critical value determines the type I error, sequential testing involves two degrees of freedom. Besides the test size, which is controlled asymptotically by an appropriately chosen value for $\lambda_\alpha$, the shape of the boundary determines the distribution of potential relative crossing time points $r$. As already noted by Brown et al. (1975), the forward CUSUM with the linear boundary (2) puts more weight on detecting breaks that occur early in the sample (cf. Figure 3). In Figure 4 we present the distributions of the first boundary crossing under the null hypothesis, which is also referred to as the "distribution of the size" (see Anatolyev and Kosenok, 2018). The results indicate that the size is skewed for the forward and backward CUSUM tests and almost evenly distributed for stacked backward CUSUM tests, which is due to the weighting scheme of the linear triangular boundary function (7). There is no consensus on which distribution should be preferred, since the regions of rejection time points that deserve more weight depend on the particular application. However, Zeileis et al. (2005) and Anatolyev and Kosenok (2018) argue that if no further information is available, one might prefer a uniform distribution to a skewed one.

Figure note: The linear boundaries (2) and (7) are considered in the first five plots, and the radical boundary by Chu et al. (1996) is used in the last plot.

The functional central limit theorem given by Theorem 1 is not suitable for analyzing the asymptotic behavior of an infinite horizon monitoring statistic, since the variance of $Q_T(r)$ is unbounded as $r \to \infty$, and $\sup_{r \geq 1} \|Q_T(r) - W^{(k)}(r)\|$ might not converge in general.

Instead, we need an almost sure invariance principle, which is specified by the following high level condition:

Assumption 5. There exists a $k$-dimensional Brownian motion $W^{(k)}(t)$ such that $\sum_{j=1}^{t} x_j u_j = \Omega^{1/2} W^{(k)}(t) + o(t^{1/2})$ (a.s.).

Almost sure invariance principles were first studied by Strassen (1967), who verified Assumption 5 under the additional assumption that $x_t u_t$ is a stationary and ergodic martingale difference sequence. Optimal rates were first derived by Komlós et al. (1975). Aue and Horváth (2004)

Theorem 2. Let Assumptions 1, 2, and 5 hold true and let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$. There exists a $k$-dimensional standard Brownian motion $W^{(k)}(r)$, such that, as $T \to \infty$,

.

Analogously to the fixed horizon case, $H_0$ is rejected if the statistic exceeds the $(1 - \alpha)$ quantile of its limiting null distribution. Since a maximum over a non-compact set can be unbounded, we need further restrictions on the boundary functions for infinite horizon monitoring.

Assumption 6. The boundary functions that are defined in Assumptions 3 and 4 satisfy

Under this assumption we show the following theorem:

Theorem 3. Let β t = β 0 for all t, and let Assumptions 1-6 hold true. Then, as T → ∞,

The linear boundary $d_{\mathrm{lin}}(r)$ (see equation (2)) satisfies Assumption 6, whereas for the linear triangular boundary $d_{\mathrm{sbq}}(r, s)$ (see equation (7)) it is not satisfied. Instead, the boundary must be at least of order $\sqrt{r}$ uniformly among all $s$, which motivates the alternative boundary $d_{\mathrm{inf}}(r, s) = \sqrt{r}\,(1 + 2(r - s))$, $0 \leq s \leq r$.
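The alternative boundary and an open-ended monitoring loop built on it can be sketched as follows for a scalar detector. The stopping rule assumes the triangular entries $|Q_{t,T} - Q_{s-1,T}|$ with $s = T + 1, \ldots, t$; the critical value `lam` must be supplied by the user (for example from Table 3), and the function names are hypothetical.

```python
import math

def d_inf(r, s):
    """Infinite horizon boundary shape sqrt(r) * (1 + 2(r - s)) for 0 <= s <= r."""
    return math.sqrt(r) * (1.0 + 2.0 * (r - s))

def first_alarm(q, T, lam):
    """Return the first t > T at which the stacked statistic exceeds lam, or None.

    q: list with q[t] = Q_{t,T} for t = 0, ..., N, where N > T (scalar detector).
    """
    for t in range(T + 1, len(q)):
        for s in range(T + 1, t + 1):
            if abs(q[t] - q[s - 1]) / d_inf(t / T, (s - 1) / T) > lam:
                return t
    return None
```

Since $1 + 2(r - s) \geq 1$ on the triangle, this shape is never smaller than $\sqrt{r}$, which is the growth rate the text requires uniformly in $s$.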

Simulated critical values for the M -tests under the boundaries d lin (r) and d inf (r, s) are presented in Table 3 .

As soon as the testing procedure has indicated a structural instability in the coefficient vector, the next step is to locate the break point. In the single break model with $\beta_t = \beta_0 + \delta 1_{\{t \geq T^*\}}$, where $\delta \neq 0$, Horváth (1995) suggested estimating the relative break date $\tau^* = T^*/T$ by the relative time index for which the likelihood ratio statistic is maximized.

As an asymptotically equivalent estimator, Bai (1997) proposed the maximum likelihood estimator

where R 1 (t) is the OLS residual sum of squares when using observations until time point t and R 2 (t) is the OLS residual sum of squares when using observations from time t + 1 onwards. In case of monitoring, Chu et al. (1996) considered

to estimate $\tau^*_{\mathrm{mon}} = T^*/T_d$, where $T_d$ denotes the detection time point, which is the stopping time at which the detector statistic exceeds the boundary function for the first time. The maximum likelihood estimator is very accurate if the breakpoint is located in the middle of the sample. However, by construction, the true breakpoint $T^*$ tends to be close to the stopping time $T_d$, and $R_2(T^*)$ is computed from very few observations, which may lead to a large finite sample estimation error for the maximum likelihood estimator. A theoretical explanation for this effect is given in Casini and Perron (2018, 2021), where the finite-sample distribution of the least squares estimator is investigated using a continuous record asymptotic framework.
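For the mean-shift special case, the residual-sum-of-squares criterion behind this estimator can be sketched as below. It assumes the standard least squares form, in which the split point minimizes $R_1(t) + R_2(t)$; the function names and the `trim` parameter (minimum number of observations per regime) are hypothetical.

```python
def rss(segment):
    """Residual sum of squares from fitting a constant mean to the segment."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)

def ls_break_date(y, trim=1):
    """Return (t_hat, t_hat / T), where t_hat minimizes R_1(t) + R_2(t).

    R_1(t) uses observations 1..t and R_2(t) uses observations t+1..T,
    so t_hat is the last observation of the first regime.
    """
    T = len(y)
    best_t, best_val = None, float("inf")
    for t in range(trim, T - trim + 1):
        val = rss(y[:t]) + rss(y[t:])
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_t / T
```

When the break lies close to the end of the sample, `rss(y[t:])` is computed from only a handful of points, which is the source of the finite sample imprecision described above.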

To bypass this problem, we use backwardly cumulated recursive residuals to estimate the relative break location. In the single break model, $\|BQ_{\lfloor rT \rfloor, T}\|$ is asymptotically proportional to $\|h(1) - h(r)\|$, which is constant in the pre-break period and decreases to zero in the post-break period. When scaled by its asymptotic standard deviation, the detector is asymptotically proportional to $\|h(1) - h(r)\| / \sqrt{1 - r}$, which in turn (see equation (8)) is where the maximum is attained at $r = \tau^*$. Accordingly, we consider

Theorem 4. Let $\beta_t = \beta_0 + \delta 1_{\{t/T \geq \tau^*\}}$, where $\delta \neq 0$, and let Assumption 1 hold true. If

This result implies that the breakpoint estimators (13) are consistent, as T → ∞.
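The idea behind the backward CUSUM break date estimator can be sketched for a scalar detector as follows. Because the displayed estimator (13) is not available here, the code assumes the form suggested by the scaling argument: the estimate is the time index that maximizes $|\sum_{j=t}^{T} w_j| / \sqrt{T - t + 1}$. The function name and interface are hypothetical, and the variance scaling is omitted because it does not affect the location of the maximum.

```python
import math

def backward_cusum_break(w, first_t):
    """Return (t_hat, t_hat / T) maximizing |sum_{j=t}^{T} w_j| / sqrt(T - t + 1).

    w: recursive residuals w_{first_t}, ..., w_T in chronological order.
    first_t: 1-based time index of the first residual (k + 1 in the regression model).
    """
    T = first_t + len(w) - 1
    best_t, best_val, running = None, -1.0, 0.0
    for idx in range(len(w) - 1, -1, -1):              # cumulate from the end of the sample
        t = first_t + idx
        running += w[idx]
        val = abs(running) / math.sqrt(T - t + 1)
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_t / T
```

Unlike the residual-sum-of-squares criterion, this estimator never fits a separate model to the short post-break segment, which is why it can remain stable when the break is close to the end of the sample.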

We illustrate the finite sample performance of the R-tests and M -tests for the models y t = γ t + u t , (model I)

where γ t = 0.8 · 1 {t/T ≥τ * } , u t and e t are independent and N ID(0, 1), and

where L is the lag operator. For model I and model II we consider the full structural break tests, and for model III partial break tests with H = (1, 0) are considered (see Remark 2). 

In Table 4 the empirical sizes and powers of the retrospective tests are compared with those of the sup-Wald test of Andrews (1993). First, we observe that $BQ_T$ and $SBQ_T$ outperform $Q_T$, except for the case $\tau^* = 0.1$. Second, while $Q_T$ has much lower power than the sup-Wald test, the reversed order cumulation structure in $BQ_T$ and $SBQ_T$ seems to compensate for the weakness of $Q_T$. Andrews (1993) showed that the sup-Wald test is weakly optimal in the sense that, in the case of a single structural break, its asymptotic local power curve approaches the power curve from the infeasible point optimal maximum likelihood test, as the significance level tends to zero. Within the framework of the considered models, $BQ_T$ performs about as well as the sup-Wald test and thus has power properties comparable to those of the weakly optimal test. In contrast to $SBQ_T$, the sup-Wald test is not suitable for monitoring since its statistic is not measurable with respect to the filtration of information at time $t$.

Fixed endpoint M-tests are particularly useful when the monitoring period is short. M-tests with infinite horizon can be used for long monitoring periods of arbitrary length. To evaluate the performance of the M-tests for finite samples, we simulate the same models as for the R-tests for time points $t = 1, \ldots, mT$, where we specify $m = 2$ for the fixed endpoint tests and $m = 20$ for the infinite horizon tests. The results in Table 5 show that the mean delay for $SBQ_{T,m}$ is much lower than that of $Q_{T,m}$ and is almost constant across the breakpoint locations.

Note: The tests of Chu et al. (1996) and Fremdt (2015) with boundary parameter 0.25 are denoted as CSW and FR, respectively.

For infinite horizon monitoring, SBQ T,∞ performs similarly well compared to conventional tests (see Table 6 ). The detection delay of Q T,∞ is much higher than that of SBQ T,∞ , and the gap increases further with increasing τ * . Compared to the tests of Chu et al. (1996) and Fremdt (2015) , we find a similar picture. Note that the two alternative tests have no power in model II and are therefore omitted for this case.

To compare the breakpoint estimator in equation (13) with its maximum likelihood benchmark in (11) and (12), we present Monte Carlo simulation results in Table 7. If the break τ * is located after 85% of the sample, the estimator based on backwardly cumulated recursive residuals has a much lower bias and root mean squared error than the maximum likelihood estimator, because the post-break period contains only a few observations.

Note: The bias and root mean squared error (RMSE) for the break date estimators (11) and (13)

we consider the seasonally differenced series y t = y t − y t−7 . Weekly differences are used to account for the seasonalities resulting from the weekly reporting pattern of COVID-19 data with lower numbers on weekends. We estimate the dynamic model

where x t = (1, y t−1 , y t−7 ) , and β = (φ 0 , φ 1 , φ 2 ) . The parameters for lags 2 and 7 are the only significant autoregressive parameters.
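The data preparation described above can be sketched as follows. This is a minimal illustration, not the code of the accompanying R package: the names `daily_cases`, `weekly_difference`, and `build_regressors` are hypothetical, the input series is made up, and the lag set is left as a parameter (defaulting to the lags 1 and 7 that appear in the stated regressor vector).

```python
def weekly_difference(raw, lag=7):
    """Return raw_t - raw_{t-lag} for every t at which both terms exist."""
    return [raw[t] - raw[t - lag] for t in range(lag, len(raw))]


def build_regressors(y, lags=(1, 7)):
    """Pair each y_t with the regressor vector (1, y_{t-l}) for l in lags."""
    start = max(lags)
    rows = []
    for t in range(start, len(y)):
        x_t = [1.0] + [float(y[t - l]) for l in lags]
        rows.append((x_t, float(y[t])))
    return rows


# Hypothetical daily counts: a weekly pattern with a weekend dip plus a linear trend.
weekly_pattern = [100, 110, 120, 115, 118, 60, 55] * 3
daily_cases = [c + 2 * i for i, c in enumerate(weekly_pattern)]

y = weekly_difference(daily_cases)   # the weekly pattern cancels, the trend remains
design = build_regressors(y)         # list of (x_t, y_t) pairs
```

In this toy series the seven-day difference removes the weekend dip entirely and leaves a constant, which is the reason for differencing at the weekly lag.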

Neither the Ljung-Box statistic nor the robust Q-statistic of Dalla et al. (2020) indicates any significant autocorrelation in the residuals for the pre-break training periods (see Figure ).

Note: The cumulative Ljung-Box (LB) and robust Q-statistics (Q) of Dalla et al. (2020) are plotted for the residuals of model (14). The left plot shows the statistics using the sample from the first pre-monitoring training period (April 10 until May 21) and the right plot for the second pre-monitoring training period (July 20 until August 30). The dashed line indicates the 5% critical values. The plots are created using the R-package testcorr provided by Dalla et al. (2020).

We consider the infinite-horizon stacked backward CUSUM statistic for a break in the intercept and the infinite-horizon forward CUSUM of Chu et al. (1996). We are interested in detecting positive changes in the intercept φ 0 and apply one-sided infinite-horizon monitoring statistics with a significance level of 5%. The critical values for the partial right-sided tests are given by those of the full test with α = 0.1 and k = 1 (see Remark 2). To compare the detector statistics, we scale each of them by its boundary and critical value, so that H 0 is rejected in favor of a positive change in φ 0 if the scaled detector exceeds unity. An alternative to the dynamic modeling in (14) is to apply autocorrelation robust M-tests to the model

where we replace σ T with the long-run variance estimator by Newey and West (1987) (see Remark 1).

The results are presented in Figure 6 . Both monitoring procedures find an indication for a rise in SARS-CoV-2 infections in the US at the end of June and the end of September.

The stacked backward CUSUM detects the breaks much earlier and becomes significant between 2 and 16 days before the forward CUSUM becomes significant. This confirms our theoretical analysis and shows that precious time can be saved by applying the backward monitoring scheme. 
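Because both detectors are scaled so that rejection occurs once they exceed unity, the detection date is simply the first crossing time of the scaled path. The following sketch illustrates this with made-up detector paths; the function name `first_crossing` and all numbers are hypothetical and are not taken from the application.

```python
def first_crossing(scaled_detector, threshold=1.0):
    """Index of the first monitoring time at which the scaled detector
    exceeds the threshold, or None if it never does."""
    for i, value in enumerate(scaled_detector):
        if value > threshold:
            return i
    return None


# Hypothetical scaled detector paths over six monitoring days.
backward_path = [0.2, 0.4, 0.9, 1.1, 1.5, 1.8]
forward_path = [0.1, 0.2, 0.4, 0.7, 0.95, 1.2]

# Number of days by which the first path signals earlier than the second.
lead_days = first_crossing(forward_path) - first_crossing(backward_path)
```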

In this paper we propose two alternatives to the conventional CUSUM detectors by Brown et al. (1975) and Chu et al. (1996) . It has been demonstrated that cumulating the recursive residuals backwardly results in much higher power than using forwardly cumulated recursive residuals, in particular if the break is located at the end of the sample. Accordingly, the backward scheme is especially attractive for on-line monitoring. To this end, the stacked triangular array of backwardly cumulated recursive residuals is employed and we find that this approach yields a much lower detection delay than the monitoring procedure by Chu et al. (1996) . Due to the multivariate nature of our tests, they also have power against structural breaks that do not affect the unconditional mean of the dependent variable. We also propose a new break date estimator which outperforms conventional estimators if the break is located at the end of the sample.
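To make the difference between forward and backward cumulation concrete, the following sketch computes recursive residuals in the simplest special case of an intercept-only model (x_t = 1, k = 1), where the estimate from the first t − 1 observations is their mean and the scaling factor reduces to (1 + 1/(t − 1))^(−1/2). The function names and the toy data are assumptions for illustration; the variance normalization and the boundary functions of the tests are deliberately omitted.

```python
def recursive_residuals_mean(y):
    """Recursive residuals for the intercept-only model x_t = 1 (k = 1)."""
    w = [0.0]  # w_1 = 0 because the indicator 1{t > k} vanishes for t = 1
    running_sum = y[0]
    for t in range(2, len(y) + 1):
        beta_prev = running_sum / (t - 1)        # estimate from y_1, ..., y_{t-1}
        f_t = (1.0 + 1.0 / (t - 1)) ** -0.5      # scaling factor with C_{t-1} = 1
        w.append(f_t * (y[t - 1] - beta_prev))
        running_sum += y[t - 1]
    return w


def forward_cusum(w):
    """Partial sums w_1 + ... + w_t for t = 1, ..., T."""
    out, total = [], 0.0
    for value in w:
        total += value
        out.append(total)
    return out


def backward_cusum(w):
    """Partial sums w_t + ... + w_T for t = 1, ..., T."""
    return forward_cusum(w[::-1])[::-1]


# Noise-free toy series with a mean shift in the last two observations.
y = [0.0] * 8 + [1.0] * 2
w = recursive_residuals_mean(y)
fwd = forward_cusum(w)
bwd = backward_cusum(w)
```

In this toy series all pre-break residuals are zero, so the backward sum started at the break date already contains the full post-break signal after only two terms, whereas the forward sum reaches the same value only at the very end of the sample.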

A Appendix: Technical proofs

We first present some auxiliary lemmas which we require for the main proofs.

Lemma 1. Let Assumptions 1 and 2 hold true. Then, for any fixed m < ∞, as T → ∞,

Proof. Note that $M^{-1/2} \sum_{t=1}^{\lfloor sM \rfloor} x_t u_t \Rightarrow \Omega^{1/2} W^{(k)}(s)$, $s \in [0, 1]$, as $M \to \infty$, by Theorem 7.19 in White (2001). Then, on the space $D([0, m])^k$, as $T \to \infty$,

and (x t , u t ) is strong mixing of size −κ/(κ − 6).

Lemma 2. Let Assumptions 1 and 7 hold true. Then, for any fixed m < ∞, as T → ∞,

Proof. We have $M^{-1/2} \sum_{t=1}^{\lfloor sM \rfloor} x_t u_t \Rightarrow \Omega^{1/2} W^{(k)}(s)$, $s \in [0, 1]$, as $M \to \infty$, by Corollary 4.2 in Wooldridge and White (1988). The result follows by analogous arguments as in (15).

Lemma 3. Let β t = β 0 for all t ∈ N, and let Assumption 1 hold true. Moreover, let either Assumption 2 or 7 hold true. Then, for any m < ∞, as T → ∞,

where $X_t = \sum_{j=1}^{t} x_j w_j$ and $Y_t = \sum_{j=1}^{t} x_j u_j$.

Proof. Let $f_j = (1 + (j-1)^{-1} x_j' C_{j-1}^{-1} x_j)^{-1/2} 1_{\{j>k\}}$. Since $\hat\beta_j = \beta_0 + j^{-1} C_j^{-1} Y_j$, we can represent the recursive residuals as

The multivariate cumulative sum can be written as

and we have

It remains to show that $\max_{1 \le t \le mT} \|T^{-1/2} A_t\| = o_P(1)$ and $\max_{1 \le t \le mT} \|T^{-1/2} B_t\| = o_P(1)$.

For the first part, note that, for j > k,

which implies that $\sqrt{j}(1 - f_j) = O_P(1)$, as $j \to \infty$. Since $C_j$ is uniformly positive definite, there exists a uniformly minimal eigenvalue $\lambda_{\min} > 0$, which is defined as the infimum of all eigenvalues of the matrices $\{C_j\}_{j>k}$. Any Rayleigh quotient of $C_{j-1}^{-1}$ is bounded above by $\lambda_{\min}^{-1} < \infty$. Therefore, for any $j$ and $\delta > 0$,

Hence, by the Hölder inequality,

for any $0 < \epsilon \le (\kappa - 6)/3$. Then,

where

which is $O(\ln(mT)/T)$ since the harmonic series satisfies $\sum_{j=1}^{mT} j^{-1} = O(\ln(mT))$, and

which is zero under Assumption 2. Under Assumption 7, we apply Corollary 14.3 of Davidson (1994). Let $\alpha(\tau)$ be the $\alpha$-mixing sequence of $(x_t, u_t)$, and consider $0 < \epsilon \le (\kappa - 6)/3$.

Then, by (18), there exists a constant K < ∞, such that

since $1 - 2/(2 + \epsilon) \le (\kappa - 6)/\kappa$. Consequently, $\max_{1 \le t \le mT} \|A_{t,T}\| = o_P(1)$ by Chebyshev's inequality. For the second term, we consider the decomposition $B_t = B_{t,1} + B_{t,2}$, where

Note that, by (17), (1), as j → ∞. Thus, by Lemmas 1 and 2, for any s > 0, plim

which is o P (1) by the continuous mapping theorem. For B t,2 , we apply Abel's formula of summation by parts, which is given by

By setting $A_j = x_{j+1} x_{j+1}' C^{-1} - I_k$ and $b_j = j^{-1} T^{-1/2} Y_j$ we get

x j x j . Lemmas 1 and 2, the continuous mapping theorem, and the fact that plim t→∞ C t = C imply max 1≤t≤mT B * t,T,1 = sup

For the last term, note that $(C_j C^{-1} - I_k) T^{-1/2} x_{j+1} u_{j+1}$ is either a martingale difference sequence or strong mixing. Then, by the fact that we have bounded eighth moments, and by Theorems 24.3 and 24.6 in Davidson (1994), we have $\max_{1 \le t \le mT} \|B^*_{t,T,3}\| = o_P(1)$, and the assertion is shown.

Lemma 4. Let Assumptions 1 and 5 hold true, and let β t = β 0 for all t ∈ N. Moreover, let either Assumption 2 or 7 hold true. Then, as T → ∞,

Proof. Analogously to the proof of Lemma 3, it remains to show that

where A t and B t are defined in (16), which satisfy sup t≤T T −1/2 A t + B t . Thus, for any

and the assertion follows.

Lemma 5. Let Assumption 1 hold true, let β t = β 0 for all t, and let m < ∞. Under Assumption 2,

and, under Assumption 7,

where W (k) (r) is a k-dimensional standard Brownian motion.

Proof. Lemma 3 implies that

where $X_t = \sum_{j=1}^{t} x_j w_j$ and $Y_t = \sum_{j=1}^{t} x_j u_j$. Under Assumption 2, Lemma 1, the continuous mapping theorem, and (23) imply

Analogously, under Assumption 7,

by Lemma 2. It remains to show that, for any r ≥ 0,

Let $W_j(r)$ and $B_j(r)$ be the $j$-th component of $W^{(k)}(r)$ and $B^{(k)}(r)$, respectively. We show the identities for each $j = 1, \ldots, k$ separately. Using the Cauchy-Schwarz and Jensen inequalities, we obtain $\int_0^r z^{-1} E[|W_j(z)|] \, dz < \infty$ as well as $\int_0^r z^{-1} E[|W_j(r) W_j(z)|] \, dz < \infty$, which justifies the application of Fubini's theorem in the subsequent steps. Since both $W_j(r)$ and $F(W_j(r)) = W_j(r) - \int_0^r z^{-1} W_j(z) \, dz$ are Gaussian with zero mean, it remains to show that their covariance functions coincide. Let w.l.o.g. $r \le s$. Then,

= (2r + r ln(s) − r ln(r)) − (r + r ln(s) − r ln(r)) − r = 0, and the assertion follows.
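For reference, the covariance calculation behind the last display can be spelled out as follows. This is a sketch that uses only $E[W_j(a) W_j(b)] = \min(a, b)$ and Fubini's theorem for $r \le s$; the grouping of the intermediate terms is an assumption about the omitted steps and may differ from the original layout.

```latex
\begin{align*}
E\big[F(W_j(r))\,F(W_j(s))\big]
 &= E[W_j(r)W_j(s)]
    - \int_0^s \frac{E[W_j(r)W_j(z)]}{z}\,dz
    - \int_0^r \frac{E[W_j(s)W_j(z)]}{z}\,dz
    + \int_0^r\!\!\int_0^s \frac{E[W_j(z)W_j(y)]}{zy}\,dy\,dz, \\
\int_0^s \frac{\min(r,z)}{z}\,dz
 &= \int_0^r 1\,dz + \int_r^s \frac{r}{z}\,dz
  = r + r\ln(s) - r\ln(r), \\
\int_0^r \frac{\min(s,z)}{z}\,dz
 &= \int_0^r 1\,dz = r, \\
\int_0^r\!\!\int_0^s \frac{\min(z,y)}{zy}\,dy\,dz
 &= \int_0^r \big(1 + \ln(s) - \ln(z)\big)\,dz
  = 2r + r\ln(s) - r\ln(r), \\
E\big[F(W_j(r))\,F(W_j(s))\big]
 &= r - \big(r + r\ln(s) - r\ln(r)\big) - r + \big(2r + r\ln(s) - r\ln(r)\big)
  = r = E[W_j(r)W_j(s)].
\end{align*}
```

Hence the difference between the two covariance functions is zero, in line with the display above.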

Lemma 6. Let h be a R k -valued function of bounded variation, and let {A t } t∈N be a sequence

Proof. By the application of Abel's formula of summation by parts, which is given in (22),

The fact that $h(r)$ is of bounded variation yields $\sup_{r \in [0,m]} \|h(r)\| = O(1)$ as well as

Then, by the triangle inequality, the assertion follows.
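The summation-by-parts step used in this proof can be checked numerically. The sketch below uses the standard scalar form of Abel's formula, $\sum_{j=1}^{n} a_j b_j = A_n b_n - \sum_{j=1}^{n-1} A_j (b_{j+1} - b_j)$ with $A_j = \sum_{i=1}^{j} a_i$; this indexing is an assumption and may differ from the version stated in (22). The function name `abel_rhs` and the input sequences are hypothetical.

```python
def abel_rhs(a, b):
    """Right-hand side of summation by parts:
    A_n * b_n - sum_{j=1}^{n-1} A_j * (b_{j+1} - b_j), with A_j the partial sums of a."""
    partial, running = [], 0.0
    for value in a:
        running += value
        partial.append(running)
    n = len(a)
    correction = sum(partial[j] * (b[j + 1] - b[j]) for j in range(n - 1))
    return partial[-1] * b[-1] - correction


# Arbitrary test sequences; b is monotone and hence of bounded variation.
a = [0.5, -1.0, 2.0, 0.25, -0.75]
b = [1.0, 0.9, 0.7, 0.4, 0.0]

lhs = sum(x * y for x, y in zip(a, b))   # direct evaluation of sum a_j b_j
rhs = abel_rhs(a, b)                     # evaluation via summation by parts
```

The identity is what allows a weighted sum to be bounded through the partial sums of one factor and the total variation of the other, which is how the bounded-variation assumption on h enters.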

Proof of Theorem 1

Consider the auxiliary sequence $y_t^* = x_t' \beta_0 + u_t$, which coincides with $y_t$ if $\beta_t = \beta_0$. Moreover, define $\hat\beta_{t-1}^* = (\sum_{j=1}^{t-1} x_j x_j')^{-1} \sum_{j=1}^{t-1} x_j y_j^*$ and $f_t = (1 + (t-1)^{-1} x_t' C_{t-1}^{-1} x_t)^{-1/2} 1_{\{t>k\}}$. Then, $w_t^* = f_t (y_t^* - x_t' \hat\beta_{t-1}^*)$ are recursive residuals from a regression without any structural break in the coefficients. If $\beta_t = \beta_0 + T^{-1/2} g(t/T)$, we have $y_t = x_t' \beta_t + u_t = y_t^* + T^{-1/2} x_t' g(t/T)$, and $\hat\beta_{t-1} = \hat\beta_{t-1}^* + T^{-1/2} (t-1)^{-1} C_{t-1}^{-1} \sum_{j=1}^{t-1} x_j x_j' g(j/T)$, which implies that $w_t = w_t^* + f_t T^{-1/2} x_t' g(t/T) - f_t T^{-1/2} (t-1)^{-1} x_t' C_{t-1}^{-1} \sum_{j=1}^{t-1} x_j x_j' g(j/T)$.

Consequently, $\sup_{r \in [0,m]} \|S_{2,T}(r) + S_{3,T}(r) - \sigma C h(r)\| = o_P(1)$, and, by Slutsky's theorem, $S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r) \Rightarrow \sigma C^{1/2} W^{(k)}(r) + \sigma C h(r)$.

Finally, $Q_T(r) = \sigma_T^{-1} C_T^{-1/2} (S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r)) \Rightarrow W^{(k)}(r) + C^{1/2} h(r)$, since $\sigma_T^2$ is consistent for $\sigma^2$ (see Krämer et al. 1988). Note that, if we replace Assumption 2 by Assumption 7, the corresponding limiting result for the autocorrelation robust statistic $Q_T(r)$ under $H_0$ follows analogously. Moreover, under the local alternatives we have $S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r) \Rightarrow \Omega^{1/2} W^{(k)}(r) + \sigma C h(r)$ in equation (26), so that $Q_T(r) \Rightarrow W^{(k)}(r) + \sigma \Omega^{-1/2} C h(r)$.

By Assumption 5 there exists a k-dimensional standard Brownian motion $W^{(k)}(t)$ such that $\sup_{t>T}$

By Lemma 4, from the fact that $T^{-1/2} W^{(k)}(t) \overset{d}{=} W^{(k)}(t/T)$, and from (24), it follows that there exists another k-dimensional standard Brownian motion $W^{(k)}(t)$, such that $\sup_{r>1} r^{-1/2} \|T^{-1/2} X_{\lfloor rT \rfloor} - \Omega^{1/2} W^{(k)}(r)\| = o_P(1)$, and the assertion follows, since $\Omega = \sigma^2 C$ under Assumption 2, and since $\sigma_T^2$, $C_T$, and $\Omega_T$ are consistent estimators for their population counterparts.

By the triangle inequality, we have

for any r and s. Thus,

for some k-dimensional standard Brownian motion W (k) (r), which implies that

We transform the expression into a supremum over the unit interval. A k-dimensional Brownian motion $W^{(k)}(r)$ and a k-dimensional Brownian bridge $B^{(k)}(r)$ have the distributional relation $B^{(k)}(r)$

.

Adopting the notation of the local break in Theorem 1, we have β t + T −1/2 g(t/T ) with g(t/T ) = T 1/2 δ1 {t≥T * } . Note that By equation (3) and Theorem 1,

Then, by the continuous mapping theorem,

ln(1) − ln(τ * ) 1 {1≥τ * } − ln(r) − ln(τ * ) 1 {r≥τ * } √ 1 − r + o P (1) = τ * + o P (1), since − ln(τ * )/ √ 1 − r is strictly increasing for r ∈ (0, τ * ) and − ln(r)/ √ 1 − r is strictly decreasing for r ∈ [τ * , 1). Analogously, if τ * ∈ (1, τ d ],

References

Sequential testing with uniformly distributed size

Heteroskedasticity and autocorrelation consistent covariance matrix estimation

Tests for Parameter Instability and Structural Change with Unknown Change Point

Real-time monitoring for explosive financial bubbles

Delay time in sequential detection of change

Structural Breaks in Time Series

Change-point monitoring in linear models

Delay times of sequential procedures for multiple time series regression models

Estimation of a change point in multiple regression models

Split invariance principles for stationary processes

Komlós-Major-Tusnády approximation under dependence

Convergence of Probability Measures

Techniques for Testing the Constancy of Regression Relationships Over Time

Theory of evolutionary spectra for heteroskedasticity and autocorrelation robust inference in possibly misspecified and nonstationary models

Continuous record asymptotics for structural change models

Structural breaks in time series

Continuous record Laplace-based inference about the break date in structural change models

Monitoring structural change

Robust tests for white noise and crosscorrelation. Forthcoming at: Econometric Theory

Stochastic limit theory: an introduction for econometricians

A likelihood ratio approach to sequential change point detection for a general class of parameters

Page's sequential procedure for change-point detection in time series regression

A new approach for open-end sequential change point monitoring

Testing for parameter instability in linear models

Testing for speculative bubbles in stock markets: a comparison of alternative methods

Detecting changes in linear regressions

Monitoring changes in linear models

Power properties of the modified CUSUM tests

On the use of estimating functions in monitoring time series for change points

An approximation of partial sums of independent RV'-s, and the sample DF. I. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete

Testing for structural change in dynamic models

Monitoring structural changes with the generalized fluctuation test

A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix

Multiple time series regression with integrated processes

Financial bubble implosion and reverse regression

Explosive behavior in the 1990s Nasdaq: When did exuberance escalate asset values?

The local power of the CUSUM and CUSUM of squares tests

The CUSUM test with OLS residuals

Boundary crossing probabilities for the wiener process and sample sums

Mean shift testing in correlated data

Invariance principles for recursive residuals

Almost sure behavior of sums of independent random variables and martingales

Asymptotic theory for econometricians

Monitoring correlation change in a sequence of random variables

Some invariance principles and central limit theorems for dependent heterogeneous processes

Strong invariance principles for dependent random variables

Monitoring structural change in dynamic econometric models

We are thankful to Holger Dette, Josua Gösmann, Alexander Mayer, Dominik Wied and three referees for their very helpful comments and suggestions which helped to improve the paper a lot. Furthermore, the usage of the CHEOPS HPC cluster for parallel computing is gratefully acknowledged.

An accompanying R-package for all methods presented in this article is available online at https://github.com/ottosven/backCUSUM.

We decompose $T^{-1/2} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t = S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r)$, where

$x_j x_j' g(j/T)$.

Lemma 5 yields $S_{1,T}(r) \Rightarrow \Omega^{1/2} W^{(k)}(r)$, where $\Omega^{1/2} = \sigma C^{1/2}$ under Assumption 2. Analo-

Since $g(r)$ is piecewise constant and therefore of bounded variation, Lemma 6 yields $\sup_{r \in [0,m]}$

For the third term, let

From Assumption 1(b) and Lemma 6, $\|p_{1,T} - p_{3,T}\| \le \|p_{1,T} - p_{2,T}\| + \|p_{2,T} - p_{3,T}\| = o_P(1)$.

Consequently,

Since $p_3(r) = p_{3,\lfloor rT \rfloor}$ is a partial sum of a piecewise constant function, it is of bounded variation, and, together with (