key: cord-0130410-9lm83q02
authors: McGonigle, Euan T.; Cho, Haeran
title: Robust multiscale estimation of time-average variance for time series segmentation
date: 2022-05-23
journal: nan
DOI: nan
sha: b603eeafd3a7ba7301f15761c0916cd2331a7291
doc_id: 130410
cord_uid: 9lm83q02

There exist several methods developed for the canonical change point problem of detecting multiple mean shifts, which search for changes over sections of the data at multiple scales. In such methods, estimation of the noise level is often required in order to distinguish genuine changes from random fluctuations due to the noise. When serial dependence is present, using a single estimator of the noise level may not be appropriate. Instead, we propose to adopt a scale-dependent time-average variance constant that depends on the length of the data section in consideration, to gauge the level of the noise therein, and propose an estimator that is robust to the presence of multiple change points. We show the consistency of the proposed estimator under general assumptions permitting heavy-tailedness, and discuss its use with two widely adopted data segmentation algorithms, the moving sum and the wild binary segmentation procedures. We illustrate the good performance of the proposed estimator through extensive simulation studies and on applications to the house price index and air quality data sets.

Dating back to the 1950s (Page, 1954) , change point analysis has a long tradition in statistics.

The area continues to be an active field of research due to its importance in many applications where data is routinely collected in highly nonstationary environments, examples of which include genetics (Hocking et al., 2013) and climatology (Reeves et al., 2007) . Data segmentation, a.k.a. multiple change point detection, enables partitioning of a time series into stationary regions and thus provides a simple framework for modelling nonstationary time series.

In this paper, we consider the problem of detecting multiple change points in the piecewise constant mean of an otherwise stationary time series. We briefly review the existing literature on multiple change point detection in the presence of serial dependence; we refer to Aue and Horváth (2013) and Truong et al. (2020) for a comprehensive overview. One line of research takes a parametric approach and simultaneously estimates the serial dependence and change points. For example, Chakar et al. (2017) , Fang and Siegmund (2020) and Romano et al. (2021) assume an autoregressive (AR) model of order one, while Lu et al. (2010) , Cho and Fryzlewicz (2021) and Gallagher et al. (2021) permit an AR model of arbitrary order.

Another line of research focuses on applying the methodologies developed for independent data to time series settings. Lavielle and Moulines (2000) and Cho and Kirch (2021a) extend the use of an information criterion (Yao, 1988) to data exhibiting serial correlations and heavy tails, which requires the choice of an appropriate penalty that depends on the level of noise.

Tecuapetla-Gómez and Munk (2017), Eichinger and Kirch (2018) , Dette et al. (2020) and Chan (2022) propose estimators of the long-run variance (LRV) for quantifying the level of noise that are robust to the presence of multiple mean shifts. Such estimators are then combined e.g. with a moving sum procedure or the simultaneous multiscale change point estimator algorithm (Frick et al., 2014) , for multiple change point detection. We also note that Wu and Zhou (2020) and Zhao et al. (2021) extend self-normalisation-based change point tests to the data segmentation problem.

In this paper, we propose a robust estimator of the scale-dependent time-average variance constant (TAVC, Wu, 2009) . It is closely related to the literature on estimation of the LRV, namely σ 2 = lim N →∞ Var(N −1/2 N t=1 ε t ) for a stationary time series {ε t } t∈Z , but distinct in that our interest lies in estimating

for a given scale L. We argue that such a scale-dependent TAVC estimator is well-suited to be combined with a class of multiscale change point detection methodologies, examples of which include the moving sum (MOSUM) procedure (Eichinger and Kirch, 2018) and the wild binary segmentation (WBS, Fryzlewicz, 2014) algorithm. Such approaches locally apply change point tests for single change point detection to data sections of varying lengths and achieve good adaptivity in multiple change point detection (Cho and Kirch, 2021a) . We motivate the use of scale-dependent TAVC in (1) in combination with such multiscale methods in the following example.

Example 1. Consider an MA(1) process ε t = W t − 0.9W t−1 , where {W t } t∈Z is a white noise process with Var(W t ) = 1. Then, we have σ 2 = 0.01, while the TAVC at scales L = 50 and L = 100 are σ 2 50 = 0.1 and σ 2 100 = 0.055 respectively, which are several factors larger than the LRV. On the other hand, consider an AR(1) process ε t = 0.9ε t−1 + W t , with W t defined as above. Then, σ 2 = 100 while σ 2 50 ≈ 81.16 and σ 2 100 ≈ 90.53, both of which are smaller than the LRV.

This demonstrates that adopting the global LRV may fail to reflect the degree of variability in the local data sections that are involved in multiscale algorithms, which in turn may result in false negatives (failure in detecting genuine changes) or false positives (detection of spurious estimators). Moreover, when the LRV is close to zero as in the first instance of Example 1, some estimators of the LRV have been observed to take negative values (Hušková and Kirch, 2010) , which further makes their use in change point problems undesirable.

To ensure that the scale-dependent TAVC estimator is robust to the mean shifts, we adopt the robust M -estimation framework of Catoni (2012) , which was first proposed in the independent setting for mean and variance estimation and further extended to the serially dependent setting for LRV estimation in Chen et al. (2021) . We establish the consistency of the proposed robust estimator of scale-dependent TAVC under general conditions permitting heavy tails and serial dependence decaying at a polynomial rate. Then, we discuss its application with multiscale change point detection methods such as the MOSUM procedure and the WBS algorithm, and provide a heuristic approach to accommodate local stationarity in the data.

The remainder of the article is organised as follows. Section 2 introduces the scale-dependent TAVC and its robust estimator and establishes its consistency. Section 3 discusses its application with multiscale data segmentation algorithms and an extension to local stationarity. In Section 4, we examine the performance of the proposed methodology on simulated data sets and two real data examples on house price index and air quality. Section 5 concludes the paper. All proofs, algorithmic descriptions of multiscale change point methods and additional numerical results are given in the appendix. Accompanying R software implementing the methodology is available from https://github.com/EuanMcGonigle/TAVC.seg.

2 Scale-dependent TAVC and its robust estimation

We consider the problem of multiple change point detection under the following model:

Under Model (2), the piecewise constant signal f t contains q change points at locations τ i , i = 1, . . . , q, with the notational convention that τ 0 = 0 and τ q+1 = n. The errors {ε t } n t=1 are assumed to be a (weakly) stationary time series satisfying E(ε t ) = 0 with finite LRV σ 2 = lim N →∞ Var(N −1/2 N t=1 ε t ) ∈ (0, ∞), and are permitted to be serially correlated and heavytailed (see Assumption 1 below for details). Our aim is to consistently estimate the total number and the locations of the change points.

A common approach to this problem is closely related to the change point testing literature, which scans the data for the detection and estimation of multiple change points by locally applying a test well-suited for detecting a single change. Such a procedure typically involves comparing a test statistic of the form σ −1 s,e |T s,k,e | to a threshold, say D. Here,

denotes a change point detector evaluated at some locations 0 ≤ s < k < e ≤ n, which are determined in a method-specific way (see Section 3 below), σ 2 s,e denotes a measure of variability in the data section {X t } e t=s+1 , and σ 2 s,e its estimator. Under the stationarity assumption on {ε t } n t=1 , a natural choice is σ 2 s,e = σ 2 e−s , the scale-dependent TAVC defined in (1) with L = e − s as the scale. Then, if {X t } e t=s+1 does not contain any change point well within the interval, we expect σ −1 s,e |T s,k,e | ≤ D, while it signals the presence of such a change point when σ −1 s,e |T s,k,e | > D. The key challenge lies in separating genuine mean shifts from the natural fluctuations due to serial correlations, which requires a careful selection of the estimator σ 2 s,e that correctly captures the variability in the section of the data under consideration.

It is well-documented that multiscale application of such a test on data sections of varying lengths, improves the adaptivity of the change point methodology to detect both large, frequent changes and small changes over long stretches of stationarity (Cho and Kirch, 2021a) . For such a multiscale procedure, Example 1 demonstrates the potential pitfalls associated with using an estimator of the global LRV in place of σ 2 s,e , regardless of the length of the interval on which the detector statistic in (3) is computed. In the next section, we propose an estimator of the scale-dependent TAVC σ 2 L in (1) that is robust to the presence of multiple mean shifts, for the standardisation of multiscale change point detectors.

For notational convenience, suppose that L is an even number, and let G = L/2 denote the block size. Then, for some starting point

Analogously, we definē

Then, the following sum

takes into account the temporal dependence in the local data sections of length L = 2G. Further,

, on the other hand, is typically biased due to the mean shifts and thus is inappropriate as an estimator of the scale-dependent TAVC.

To obtain an estimator that is robust to multiple mean shifts, we adopt the robust M -estimation framework of Catoni (2012) . Let φ denote a non-decreasing influence function as

The robust estimator of the TAVC at scale L and starting point b, denoted σ 2 L,b , is defined as the solution of the M -estimation equation

where φ v (x) = v −1 φ(vx) for some v > 0; we specify the condition on v later. If there are multiple solutions to Equation (6), any of them may be chosen.

We establish the consistency of the scale-dependent TAVC estimator under the following assumption on the error process {ε t } n t=1 .

Assumption 1.

(i) We assume that ε t = ∞ k=0 a k η t−k , where {η t } t∈Z is a sequence of i.i.d. random variables and |a k | ≤ Ξ(k + 1) −β for some constants β > 2.5 and Ξ > 0 for all k ≥ 0.

(ii) There exists a fixed constant c σ > 0 such that σ 2 = ( k≥0 a k ) 2 satisfies σ 2 ≥ c σ .

(iii) We operate under either of the following two conditions on the distribution of {η t } t∈Z .

(a) There exists a fixed constant r > 4 such that η 1 r = (E(|η 1 | r )) 1/r < ∞.

(b) There exist fixed constants C η > 0 and κ ≥ 0 such that η 1 r ≤ C η r κ for all r ≥ 1.

Assumption 1 (i) is a mild condition that permits the temporal dependence to decay at a polynomial rate, and (ii) is made to ensure that the LRV is well-defined. Assumption 1 (iii) (a) allows heavy-tailed {ε t } n t=1 , while (b) assumes a stronger condition that requires sub-Weibull (Wong et al., 2020) tail behaviour on {η t } t∈Z which includes sub-Gaussian (κ = 1/2) and sub-exponential (κ = 1) distributions as special cases.

The following theorem establishes the consistency of the estimator of the scale-dependent TAVC, see Appendix C for the proof.

Theorem 1. Suppose that Assumption 1 holds, and define

ε t for L = 2G. Then, provided that v G/n, the estimator σ 2 L,b satisfies

under Assumption 1 (iii) (a), and

under Assumption 1 (iii) (b), for any fixed b ∈ {0, . . . , G − 1}. In addition, σ 2 L satisfies

Theorem 1 shows that the proposed robust estimator consistently estimates the TAVC at scale L. The estimation error is decomposed into the error from approximating σ 2 L by σ 2 L in (9), and that in estimating σ 2 L by σ 2 L,b . In deriving the second error, we make explicit the influence of multiple mean shifts on the estimator by the term (Lq/n) 1/2 in (7)-(8), as well as the effect of the innovation distribution in the remaining terms. A careful examination of the proof of Theorem 1 shows that

i.e. as L increases, the scale-L TAVC approximates the global LRV as expected.

Remark 1 (Maximum time-scale for TAVC estimation). The error due to approximating σ 2 L with σ 2 L decreases with L as in (10). On the other hand, the error of estimating σ 2 L with σ 2 L,b

increases with L as in (7) 

We now describe explicitly how the robust estimator of the scale-dependent TAVC proposed in Section 2, is applied within the algorithms that scan moving sum (MOSUM) and cumulative sum (CUSUM) statistics of the form (3), for multiple change point detection.

The MOSUM procedure (Chu et al., 1995; Eichinger and Kirch, 2018) evaluates the change point detector T s,k,e over a moving window. For a given bandwidth G, the MOSUM detector for a change in mean at time point k is given by Eichinger and Kirch (2018) propose to estimate the total number and the locations of multiple change points by identifying all significant local maximisers of T G (k), say k, satisfying

for some η ∈ (0, 1). Here, D n (G; α) denotes a critical value at a significance level α ∈ (0, 1), which is drawn from the asymptotic null distribution of the MOSUM test statistic max G≤k≤n−G σ −1 |T G (k)| obtained under mild conditions permitting heavy-tailedness and serial dependence, and takes the form

where a G,n and b G,n are known constants depending on G and n only. The single-bandwidth MOSUM procedure achieves consistency in estimating the total number and the locations of multiple change points, provided that min 1≤i≤q µ 2 i G → ∞ as n → ∞ sufficiently fast while 2G ≤ min 0≤i≤q (τ i+1 − τ i ), see Theorem 3.2 of Eichinger and Kirch (2018) and Corollary D.2 of Cho and Kirch (2021b) for explicit conditions. The requirement on G indicates that the singlescale MOSUM procedure performs best with the bandwidth chosen as large as possible while avoiding situations where there are more than one change point within the moving window at any time. Consequently, it lacks adaptivity when the data sequence contains both large changes over short intervals and small changes over long intervals.

Multiscale extensions of the single-bandwidth MOSUM procedure, i.e. applying the MOSUM procedure with a range of bandwidths and then combining the results, alleviate the difficulties involved in bandwidth selection and provide adaptivity. In this paper, we consider the multiscale MOSUM procedure combined with the 'bottom-up' merging as proposed by Messer et al. (2014) (see also Meier et al. (2021) ). Denoting the range of bandwidths by G = {G h , 1 ≤ h ≤ H :

Then, we accept all estimators in C(G 1 ) returned with the finest bandwidth G 1 to the set of final estimators C and, sequentially for h = 2, . . . , H, accept k ∈ C(G h ) if and only if min k∈C | k − k| > ηG h (with η identical to that in (12)). That is, we only accept the estimators that do not detect the change points which have previously been detected at a finer scale.

We propose to apply the multiscale MOSUM procedure with bottom-up merging, in combination with the robust estimator of multiscale TAVC as follows. For each G h ∈ G, the TAVC at

Here, M denotes the maximum scale which is set in relation to the sample size n, see Remark 1. Then, we use σ 2G h in place of the global estimator σ in (12) to standardise the MOSUM detector T G h (k). When 2G h > M , we use σ M in place of σ 2G h for MOSUM detector standardisation. In doing so, we ensure that change point detectors at multiple scales are standardised by the scale-dependent TAVC that accurately reflects the degree of variability over the moving window, while taking into account the presence of possibly multiple mean shifts therein. We refer to Algorithm 1 in Appendix A for the pseudocode of the multiscale MOSUM procedure with the robust estimator of scale-dependent TAVC.

The binary segmentation algorithm (Scott and Knott, 1974; Vostrikova, 1981) and its extensions, such as wild binary segmentation (WBS, Fryzlewicz, 2014; 2020) and seeded binary segmentation (Kovács et al., 2020) , recursively search for multiple change points using the CUSUM statistic of the form (3), with s and e that are identified iteratively. These methods have primarily been analysed for the data segmentation problem under (2) assuming i.i.d. Gaussianity on the {ε t } n t=1 . Consequently, some robust estimators of Var(ε 1 ) have been considered for standardising the CUSUM statistic. Here, we discuss the application of the WBS2 algorithm (Fryzlewicz, 2020) in the time series setting with the proposed robust estimator of the scale-dependent TAVC.

Let A s,e = {( , r) ∈ Z 2 : s ≤ < r ≤ e, r − > 1} denote the collection of all intervals within {s + 1, . . . , e} for some 0 ≤ s < e ≤ n, and R s,e denote its subset selected either randomly or deterministically (see Cho and Fryzlewicz (2021) for one approach to deterministic grid selection) with |R s,e | = min(R, |A s,e |) for some given R ≤ n(n − 1)/2. Starting with (s, e) = (0, n), we identify

for some threshold D and σ 2 r− denoting the proposed robust estimator of the TAVC at scale L = r − (when r − is odd, we use σ 2 r− −1 instead). As in Section 3.1, a maximum scale M is set so that the CUSUM statistic over any interval of length greater than M is standardised using σ M . Following the recommendation made in Fryzlewicz (2014), we adopt the threshold

If (s • , k • , e • ) that fulfils (13) exists, it signals the presence of a change point so that the data is partitioned into {X t } k• t=s+1 and {X t } e t=k•+1 , and the same step of detecting and identifying a single change point is repeated on each partition separately. If no such (s • , k • , e • ) exists, or when the user-specified minimum segment length is reached, then the search for change points

We provide a pseudocode of the WBS2 algorithm with the robust estimator of scale-dependent TAVC in Algorithm 2 of Appendix A.

We propose a heuristic extension of the robust estimator of scale-dependent TAVC to the setting where the second-order structure of {ε t } n t=1 varies smoothly over time. Suppose that there exists an appropriately chosen window size W such that {ε t } k+ W/2 t=k− W/2 +1 may be regarded as being approximately second-order stationary over all k. Then, we propose to perform the robust estimation described in Section 2.2 in a localised fashion.

To this end, define the time-varying TAVC at scale L and time k by

For notational convenience, we set W = N 2 L for some integer N 2 , and let N 3 = (W − G)/G be the number of blocks for window size W . For k ∈ {W/2, . . . , n − W/2}, we estimate σ 2 L (k) by σ 2 L (k), the solution of the following M -estimation equation

with v (G/W ) 1/2 . We apply a boundary extension so that σ 2

The estimator of the local TAVC at time k and scale L is obtained analogously as that of the global TAVC at scale L described in Section 2.2, except that we only use the windowed data region starting at time k − W/2 and ending at k + W/2 for the estimation of the former. Then, the MOSUM detector T G (k) and the CUSUM statistic T s,k,e described in Sections 3.1-3.2 are standardised in a time-dependent way using σ 2G (k) and σ 2 (e−s) (k), respectively. In practice, we observe that taking the running median of { σ 2

b=0 as an estimator of σ 2 L (k) further improves the performance, as it 'smoothes' out the local estimators and enhances the robustness to mean changes.

We illustrate the benefit of adopting the time-varying adaptation of the proposed robust estimator using the following example. Consider a time series of length n = 1000, where the errors {ε t } t∈Z follow a time-varying AR(1) model: ε t = a 1 (t)ε t−1 + W t , with a 1 (t) = 0 for t ≤ 500, a 1 (t) = 0.7 for t ≥ 501, and W t ∼ i.i.d. N (0, 1). There are two changes in the mean at τ 1 = 300 and τ 2 = 700, with change sizes 1 and 2, respectively. In Figure 1 , we show the MOSUM detector statistic in (11) calculated using bandwidth G = 100. We also plot the threshold D n (G, α) at the significance level α = 0.05, multiplied by the square root of the global TAVC estimator at scale L = 2G (i.e. σ 2 2G ) in dashed blue line, and that multiplied by the square root of the local estimators of the scale-L TAVC (i.e. σ 2 2G (k)) in solid red line. We see that using the global approach misses the change at time τ 1 = 300 due to the global scale-L TAVC estimator being too large, whilst the localised approach successfully detects both changes.

Lastly, we mention that the robust estimation of time-varying and scale-dependent TAVC is of independent interest beyond the context of change point analysis, with possible extensions including the estimation of other second-order properties. For example, the procedure can be used to obtain a robust estimator of the spectrum of a locally stationary wavelet process (Nason et al., 2000) while the time series undergoes shifts in the mean. 4 Numerical results

We provide guidance on the selection of tuning parameters required for the proposed robust TAVC estimator and its application with multiscale change point detection methods.

Parameter v in (6). 

where

. Another approach is to use an appropriately scaled median of

, in a similar fashion to McGonigle et al. (2021) . In this case, the necessary scaling constant to ensure unbiasedness can be chosen by noting thatX j,b is asymptotically Gaussian as L → ∞, and thus ξ j,b is asymptotically scaled χ 2 1 . This leads to the choiceξ

where K = 2.125. In simulation studies, we report the results obtained withξ [ ],b , = 1, 2, in setting the parameter v b where we observe that both choices return similarly good results.

Parameters b and b max in (6). As a further step to ensure greater robustness of the estimator, we obtain the estimator σ 2 L,b for a range of b ∈ {0, . . . , b max } and take their median as the final estimator σ 2 L . Informally, we may get unlucky with some starting value b that leads to many of the ξ j,b contaminated by the mean changes, and taking the median over a range of values of b helps in alleviating this. We take b max = G − 1 in practice which yields good performance.

Maximum time-scale M . We recommend M = 2.5 √ n as the coarsest scale at which the scale-dependent TAVC is estimated. This choice is made to balance between mitigating the effect of change points, and ensuring that the TAVC at coarser scales is well-approximated by

Tuning parameters for the multiscale MOSUM procedure. We follow Cho and Kirch (2021b) (2014) and Cho and Fryzlewicz (2021) respectively. As we permit the presence of serial correlations, it is reasonable to impose a minimum length requirement on the intervals considered in the WBS2 algorithm. We set this minimum length to be 2G 1 , with G 1 the finest scale considered by the MOSUM procedure.

Window size W for time-varying TAVC estimation. We utilise a scale-dependent window size W L . Setting W L = N 2 L gives N 3 = 2N 2 − 1 data points used in the solving of the M -estimation equations. We advise setting N 2 = 5, which ensures that the influence of change points is negated and that the window size is large enough to include enough data points for reliable estimation of the TAVC.

In this section, we evaluate the performance of the proposed robust estimator of scale-dependent TAVC applied with the two multiscale change point detection procedures discussed in Sections 3.1-3.2. We compare with other methods that account for serial dependence under (2) and whose implementations are readily available in R, with a variety of scenarios for generating serially correlated {ε t } t∈Z .

We assess the performance of different methods both in the case of no changes (q = 0) and multiple changes (q ≥ 1), under a variety of error structures. Unless stated otherwise, we

(M3) AR(1) model: ε t = a 1 ε t−1 + W t , with a 1 = 0.9 and σ w = 1 − a 2 1 .

(M4) AR(2) model: ε t = a 1 ε t−1 +a 2 ε t−2 +W t , with a 1 = 0.5 and a 2 = 0.3, with σ w = 0.6676184.

with a 1 (t) = 0.5 cos(2πt/n) and We assess the performance of the methods both when q = 0 and q ≥ 1. In the latter case, the time series contains the q change points at τ i = n/(q + 1) · i , i = 1, . . . , q, with the (signed) change size µ i = µ(τ i ) · (−1) i+1 . In (M1)-(M4) and (M6), we set µ(τ i ) = σ and in the case of (M5), we set µ = 1. In (M7)-(M9), we set µ(τ i ) = σ(τ i ) where σ 2 (t) denotes the time-varying LRV.

We implement the robust TAVC estimation within both the multiscale MOSUM and WBS2 Additionally, we consider DepSMUCE (Dette et al., 2020) , DeCAFS (Romano et al., 2021) and WCM.gSa (Cho and Fryzlewicz, 2021) for comparison. DepSMUCE extends SMUCE (Frick et al., 2014) to dependent data using a difference-type estimator of the LRV. Although not its primary objective, DeCAFS detects multiple change points in the mean assuming that the noise is a stationary AR(1) process. The WCM.gSa method performs model selection on the sequence of models generated by the WBS2 algorithm, using an information criterion-based model selection strategy which assumes that {ε t } t∈Z follows an AR model of an arbitrary order.

For DepSMUCE, we consider significance levels α ∈ {0.05, 0.2}. Other tuning parameters not mentioned here are chosen as recommended by the authors. Table 1 When q = 0, we report the proportion of falsely detecting any change point out of the 1000 realisations (see the column 'Size' in Table 1 ). When q ≥ 1, we report the relative mean squared

where f t is the piecewise constant signal constructed with the estimated change points, and f * 

We apply MOSUM.TAVC and WBS2.TAVC, the multiscale procedures combined with the robust estimator of scale-dependent TAVC, to two data examples. We select the parameter v in (6) using (16) and select other tuning parameters as described in Section 4.1 unless specified otherwise.

We analyse the monthly percentage changes in UK house price index (HPI), which provides insight into the estimated overall changes in house prices across the UK. The data are available from https://www.gov.uk/government/statistical-data-sets/, and a detailed description of the calculation of the HPI can be found from UK Land Registry (2021). The HPI series for various regions of the UK have previously been analysed in Baranowski et al. (2019) and McGonigle et al. (2021) . We analyse the HPI for detached properties in the area of Somerset

West and Taunton between April 1995 and February 2022 (n = 323).

We set the tuning parameters as described in 1999 -05, 2003 -01, 2008 -08, 2009 -01 WCM.gSa 1999 -05, 2003 -01, 2007 -09, 2009 -01, 2021-08

We analyse the daily average concentrations of nitrogen dioxide (NO 2 ), measured in µg/m 3 , recorded at Marylebone Road in London, UK. The measurements were taken from January 1st, 2000 until October 31st, 2021 (n = 7734). The data set is available from https://www. londonair.org.uk and a similar dataset was analysed for shifts in the mean in Cho and Fryzlewicz (2021) using the WCM.gSa method. The data take positive values and display both seasonality and effects due to bank holidays, as the main source of NO 2 emissions at the site is likely to be road traffic. To mitigate these effects, we take the square root transform of the data and remove seasonality as described in Cho and Fryzlewicz (2021) .

We apply WBS2.TAVC and MOSUM.TAVC using the global TAVC estimator in (6) TAVC 2003 -01-31, 2010 -07-25, 2019 -03-10, 2020 -03-18 MOSUM.TAVC 2003 -01-11, 2010 -03-06, 2018 -12-30, 2020 -03-18 DepSMUCE(0.05) 2003 -01-31, 2010 -07-25, 2018 -10-14, 2020 -03-18 DepSMUCE(0.2) 2003 -01-31, 2008 -08-31, 2012 -10-04, 2018 -10-14, 2020 -03-18 WCM.gSa 2003 -01-31, 2009 -12-09, 2018 -10-14, 2020 

We propose an estimator of scale-dependent TAVC that is robust to the presence of (possibly) multiple mean shifts. It is readily combined with multiscale change point detection methodolo- (12) for k ∈ C(G) do Add ( k, G) to P end for k • ∈ P in increasing order with respect to G do if min k∈C | k • − k| ≥ ηG then Add k • to C end Output: C

Algorithm 2 provides a pseudocode for the WBS2 algorithm combined with the robust TAVC estimation.

In this section, we provide further information on the simulation study carried out in Section 4.2 and report additional numerical results to demonstrate the performance of the proposed robust estimator of the scale-dependent TAVC.

The covering metric (CM) used to assess the quality of the segmentation produced by the detected change point is defined as follows. The true change locations {τ i } q i=1 define a partition P of the interval {1, 2, . . . , n} into disjoint sets A i such that A i is the segment {τ i−1 + 1, . . . , τ i }. Similarly, the estimated change locations { τ i } q i=1 yield a partition P of segments A i . (6) Table B .1: Performance comparisons when n = 500. We report the size, the proportion of realisations where change points are falsely detected when q = 0, and the distribution of the estimated number of change points, covering metric (CM) and relative MSE (RMSE) over 1000 realisations when q = 3. The modal value of the distribution of the number of estimated change points for each method is shown in bold. The best performing method according to each metric when q = 3 is underlined. Table B .2: Performance comparisons when n = 2000. We report the size, the proportion of realisations where change points are falsely detected when q = 0, and the distribution of the estimated number of change points, covering metric (CM) and relative MSE (RMSE) over 1000 realisations when q = 6. The modal value of the distribution of the number of estimated change points for each method is shown in bold. The best performing method according to each metric when q = 6 is underlined.

For sequences of positive numbers {a n } and {b n }, we write a n b n , or a n = O(b n ), if there exists some constant C > 0 such that a n /b n ≤ C as n → ∞. We write a n b n if there exists some positive constants C 1 and C 2 such that C 1 ≤ a n /b n ≤ C 2 as n → ∞. Without loss of generality, we set b = 0 and drop the dependence on b for notational simplicity; analogous arguments are applicable when other fixed values of b ∈ {0, . . . , G − 1} are used.

We denote by B j = {t ∈ {1, . . . , n} : jG + 1 ≤ t ≤ (j + 1)G} the set of indices of the j-th block of data for some j = 0, . . . , n/G − 1. We adapt the proof of Theorem 5 in Chen et al.

(2021) with modifications to our case with the TAVC estimator. We denote by

Then, |S| ≤ 2q. First, we consider the influence function constructed using the blocks which do not contain any change points:

Then, it can be shown that E(h L (u)) satisfies the envelope property

To see this, note that since φ(x) ≤ x + x 2 /2 by Equation (5), we have

Similarly, from the fact that φ(x) ≥ x − x 2 /2, the lower bound follows.

Next, we show thath L (u) is concentrated about its mean. We deal with the different cases,

Assumptions 1 (iii) (a) and (b), separately, in order to prove Equations (7) and (8) respectively.

Applying

Step 2 of the proof of Theorem 5 in Chen et al. (2021) , we have that, for C 0 > 0 and

where A n is the δ-net for {u : |u − σ 2 L | ≤ C 0 } with δ = σ 2 L x/(2N 0 ) and |A n | = O(N 0 /x). For any random variable X, denote by E 0 (X) = X − E(X) the centering operator, and let ∆X j =X j −X j−1 . Let F k = {η t , t ∈ ∪ l≤k B l }, and F k,{m} , m ≤ k, be F k with η t therein replaced with its independent and identically distributed copy η t for all t ∈ B m . Then for any

Proof of (7). Under Assumption 1 (iii) (a), we show that ζ j (u) = φ v (ξ j −u) satisfies appropriate functional dependence properties. Denote

Since |φ | ∞ ≤ 1, we have for any ∈ N and u ∈ R,

and we define

Noting that

Next, by Assumption 1 (i), we have

for any arbitrarily small > 0. Then, using (C.4),

(1 ∨ (jG + 1 − k)) −2(β−1− ) G, (C.6)

where the constants involved in depend on , β and Ξ. Since we only consider j / ∈ S, we bound I 1 in (C.3) for ≥ 1 as

where the first inequality follows from Hölder's inequality, the second from Burkholder's inequality (see e.g. Lemma 2 of Chen et al. (2021) ) with C r = max( √ r − 1, 1/(r − 1)), the last from (C.5)-(C.6) and µ r = η 1 r . Similarly, when = 0, we have that I 1 G −1 C 2 r µ 2 r . We can bound the term I 2 in (C.3) in a similar fashion, to obtain δ ,r/2 (G ) −(β−1− ) I( > 1) + I( ≤ 1) C 2 r µ 2 r .

(C.7)

Therefore, the dependence-adjusted norm w r/2,α = sup

for α = β − 2 − 2 > 1/2 − 2/r with β > 2.5. Under Assumption 1 (iii) (a), having shown the bound on w r/2,α , we apply Lemma C.2 of Zhang and Wu (2017) and yield, for any u ∈ A n , P h L (u) − E(h L (u)) ≥ x 2N 0 N 0 x −r/2 + exp − Recalling that |S| ≤ 2q and |φ(x)| ≤ log(2), we have that Under Assumption 1, we have for some j / ∈ S,

by (C.6) and Burkholder's inequality. Therefore, from (C.13) and that u + ≥ ξ = σ 2 L ,

A similar bound can be obtained for u − , the largest root of B −1 (u, ∆) and under Equation (C.11), we have that u − ≤ σ 2 L ≤ u + . Then, setting v (Gq/n) 1/2 , we have (C.12) holds, and

, G log(n) n .

Proof of (8). We proceed analogously as in the proof of (7), except that we control for the dependence-adjusted sub-exponential norm of ζ j (u), as where the first inequality follows from (C.7) and the second from Assumption 1 (i). Therefore, we havew f < ∞ with f = 2κ + 1. Then applying Lemma C.4 of Zhang and Wu (2017) with (C.2), we obtain P sup

where c depends on κ and β throughw f . Taking x log 2κ+3/2 (n)N 1/2 0 , we have that

(C.14)

Then, by the analogous arguments as those adopted in the proof of (7) with (C.14) replacing (C.9), we derive (8).

Proof of (9). Recalling that U j = G −1 (j+1)G t=jG+1 ε t , we have

ε t = σ 2 G + GE(U j U j−1 ).

By Assumption 1 (i), ρ k = E(ε 0 ε k ), k ≥ 0, satisfy

Therefore,

Lastly, to prove the second statement in Equation (10), we show that 

Nitrogen dioxide in the United Kingdom

Contour detection and hierarchical image segmentation

Structural breaks in time series

Narrowest-over-threshold detection of multiple change points and change-point-like features

Challenging the empirical mean and empirical variance: A deviation study

A robust approach for estimating change-points in the mean of an AR(1) process

Optimal difference-based variance estimators in time series: A general framework

Inference of breakpoints in high-dimensional time series

Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm

Data segmentation algorithms: Univariate mean change and beyond

Two-stage data segmentation permitting multiscale change points, heavy tails and dependence

MOSUM tests for parameter constancy

Multiscale change point detection for dependent data

A MOSUM procedure for the estimation of multiple random change points

Detection and estimation of local signals

Multiscale change point inference

Wild binary segmentation for multiple change-point detection

Detecting possibly frequent change-points: Wild binary segmentation 2 and steepest-drop model selection

Autocovariance estimation in the presence of changepoints

UK COVID-19 lockdown: 100 days of air pollution reduction?

Learning smoothing models of copy number profiles using breakpoint annotations

A note on studentized confidence intervals for the changepoint

Seeded binary segmentation: A general methodology for fast and optimal change point detection

Least-squares estimation of an unknown number of shifts in a time series

An MDL approach to the climate segmentation problem

Detecting changes in mean in the presence of time-varying autocovariance

Trend locally stationary wavelet processes

mosum: A package for moving sums in change-point analysis

A multiple filter test for the detection of rate changes in renewal processes with varying variance

Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum

Continuous inspection schemes

A review and comparison of changepoint detection techniques for climate data

Detecting abrupt changes in the presence of local fluctuations and autocorrelated noise

A cluster analysis method for grouping means in the analysis of variance

Autocovariance estimation in regression with a discontinuous signal and m-dependent errors: A difference-based approach

House prices shoot up in UK towns as 'race for space' continues apace

Selective review of offline change point detection methods

UK house price index

An evaluation of change point detection algorithms

Detecting 'disorder' in multidimensional random processes

Lasso guarantees for β-mixing heavy-tailed time series

Multiscale jump testing and estimation under complex temporal dynamics

Recursive estimation of time-average variance constants

Estimating the number of change-points via Schwarz' criterion

Gaussian approximation for high dimensional time series

Segmenting time series via self-normalization