Robust Narrowest Significance Pursuit: inference for multiple change-points in the median

Piotr Fryzlewicz

6 September 2021

Abstract: We propose Robust Narrowest Significance Pursuit (RNSP), a methodology for detecting localised regions in data sequences which each must contain a change-point in the median, at a prescribed global significance level. RNSP works by fitting the postulated constant model over many regions of the data using a new sign-multiresolution sup-norm-type loss, and greedily identifying the shortest intervals on which the constancy is significantly violated. By working with the signs of the data around fitted model candidates, RNSP is able to work under minimal assumptions, requiring only sign-symmetry and serial independence of the signs of the true residuals. In particular, it permits their heterogeneity and arbitrarily heavy tails. The intervals of significance returned by RNSP have a finite-sample character, are unconditional in nature and do not rely on any assumptions on the true signal. Code implementing RNSP is available at https://github.com/pfryz/nsp.

1 Introduction

There is a growing interest in the statistical literature in the problem of uncertainty quantification for possibly multiple parameter changes in time- or space-ordered data, motivated by the practical question of whether any suspected changes are "real" (i.e. can be attributed to structural changes in the underlying stochastic model) or due to random fluctuations.
General types of approaches to this problem include confidence sets associated with simultaneous multiscale change-point estimation (Frick et al., 2014; Pein et al., 2017), post-selection inference (Hyun et al., 2018, 2021; Jewell et al., 2020; Duy et al., 2020), inference without selection and post-inference selection via Narrowest Significance Pursuit (Fryzlewicz, 2021), asymptotic confidence intervals conditional on the estimated change-point locations (Bai and Perron, 1998; Eichinger and Kirch, 2018; Cho and Kirch, 2021), False Discovery Rate control (Hao et al., 2013; Li and Munk, 2016; Cheng et al., 2020), and Bayesian inference (Lavielle and Lebarbier, 2001; Fearnhead, 2006). These approaches go beyond mere change-point detection and offer statistical significance statements regarding the existence and locations of change-points in the statistical model underlying the data. However, the vast majority of the existing works, with only a handful of exceptions which we review in more detail later, do so under strong assumptions on the innovations entering the model, such as their homogeneity over time, and/or finite moments up to a certain order, and/or a specific known distribution or distributional family. This is a limitation, because change-point modelling and inference for data that do not necessarily fall into these categories is of interest in various domains, including bioinformatics (Muggeo and Adelfio, 2011; Arlot and Celisse, 2011), biophysics (Pein et al., 2017), cancer medicine (Kim et al., 2000), economics (Bai and Perron, 2003) and finance (Oomen, 2019), amongst others. In this paper, we are concerned with the following problem: given a sequence of noisy independent observations, automatically determine localised regions of the data which each must contain a change-point in the median, at a prescribed global significance level.
The methodology we introduce, referred to as Robust Narrowest Significance Pursuit (RNSP), achieves this for the piecewise-constant median model, capturing changes in the level of the median. It does so with practically no distributional assumptions on the data, other than serial independence of the signs of the true residuals. In contrast to the existing literature, RNSP requires no knowledge of the distribution on the part of the user, and permits arbitrarily heavy tails, heterogeneity over time and/or lack of symmetry. In addition, RNSP is able to handle distributions that are continuous, continuous with mass points, or discrete, where these properties may also vary over the signal. To situate RNSP in the context of the existing literature, we now review the few existing approaches that permit uncertainty statements for multiple change-point locations under the assumption of possible heterogeneity in the distribution of the noise. H-SMUCE (Pein et al., 2017) is a change-point detector in the heterogeneous Gaussian piecewise-constant model Y_i = f_i + σ_i ε_i, where f_i is a piecewise-constant signal, ε_i are i.i.d. N(0, 1) variables, and σ_i is also piecewise-constant in such a way that it can only change when f_i does. In their Lemma C.7, the authors provide a theoretical construction of confidence intervals for the locations of the change-points in f_i which works for a fixed significance level α. However, their result assumes that the number of change-points has not been underestimated, and therefore necessarily assumes certain (and in this particular case somewhat strong) minimum distance and minimum jump size conditions, which are typically difficult to verify in practice. In addition, it offers confidence intervals of the same length regardless of the individual prominence of each change-point.
Finally, the practical use of this result requires the knowledge of the number of change-points, a lower bound on the minimum spacing between each pair of change-points, and the minimum jump size, all of which will often be unknown to the analyst. However, Pein et al. (2017) also provide a different, practical, algorithmic construction of confidence intervals, in their Section A.1. This requires having an estimate of the number of change-points, and involves screening the data for short intervals over which a constant signal fit is unsuitable and they must therefore contain change-points. Crucially, this algorithmic construction relies on the knowledge of scale-dependent critical values (for measuring the unsuitability of a locally constant fit), which are not available analytically but only by simulation, and therefore the method would not extend automatically to unknown noise distributions (as the analyst needs to know what distribution to sample from). In Section 4, we show that H-SMUCE suffers from inflated type I error rates in the sense that the thus-constructed confidence intervals, in the examples of Gaussian models shown, do not all contain at least one true change-point each in more than 100(1 − α)% of the cases, contrary to what this algorithmic construction sets out to do. H-SMUCE is also prone to failing if the model is mis-specified, e.g. if the distribution of the data has a mass point (which is unsurprising in view of its assumption of Gaussianity). In contrast to H-SMUCE, RNSP does not assume Gaussianity, and works with any (possibly heterogeneous) noise distributions, as long as they are sign-symmetric around the median, a very weak requirement which is immaterial for continuous distributions (as all, even non-symmetric, continuous distributions are sign-symmetric). 
The noise distribution does not need to remain constant between each consecutive pair of change-points, can change arbitrarily, and can be continuous, discrete, or continuous with mass points. The execution of RNSP does not rely on having an estimate of the number of change-points. Critical values needed by RNSP do not depend on the noise distribution (which can be unknown to the analyst), and can be accurately approximated analytically. RNSP explicitly targets the shortest possible intervals of significance, and its algorithmic construction ensures exact finite-sample coverage guarantees. The non-robust predecessor of RNSP, Narrowest Significance Pursuit (NSP; Fryzlewicz, 2021), is able to handle heterogeneous data in a variety of models, including the piecewise-constant signal plus independent noise model. However, NSP requires that the noise, if heterogeneous, satisfies the central limit theorem (i.e. is within the domain of attraction of the normal distribution) and is symmetric, neither of which is assumed in RNSP. Crucially, noise heterogeneity is dealt with in NSP via self-normalisation using the functional-analytic framework of Rackauskas and Suquet (2003). The self-normalised statistic used in NSP includes a term resembling an estimate of the local variance of the noise, which, however, is only unbiased under the null hypothesis of no change-point being present locally. The fact that the same term over-estimates the variance under the alternative hypothesis reduces the power of the detection statistic, which leads to typically long intervals of significance. We illustrate this issue with the self-normalised non-robust NSP for heterogeneous data in Section 4 and show that RNSP offers a significant improvement. The Bayesian approach of Fearnhead (2006) provides an inference framework for independent observations drawn from a density f(·|θ_j) within the jth segment, with θ_j possibly vector-valued, which permits some types of heterogeneity.
Operating in this framework requires the choice of either of two types of priors: one involving a prior on the number of change-points plus a conditional prior on their locations, and the other an instance of the product-partition model. In addition, the distribution family f is assumed known, and a prior needs to be specified for the θ_j parameters. While self-contained and not overly computationally intensive, this approach suffers from the fact that the estimation of the number of change-points is a difficult statistical problem, and therefore so is choosing a good prior for this quantity; for the same reason, it is a choice that can significantly affect the inferential conclusions. Bai and Perron (1998) and Bai and Perron (2003), in seminal papers studying least-squares estimation of multiple change-points in regression models under possible heterogeneity of the errors, describe a procedure for computing confidence intervals conditional on detection. For an unknown distribution of the errors, the limiting distribution of each estimated change-point location converges to an appropriate functional of the Wiener process only under the assumption that the corresponding break size goes to zero with the sample size. The asymptotic validity of the resulting confidence interval relies on the consistency of the estimators of the unknown quantities involved (such as the local variance of the innovations or the break size); it is therefore a large-sample, asymptotic construction. Crucially, it does not take into account the uncertainty associated with detection, which can be considerable, especially for the more difficult problems (for an example, see the "US ex-post real interest rate" case study in Bai and Perron (2003), where there is genuine uncertainty between models with 2 and 3 change-points; we revisit this example in Section 5.1).
By contrast, RNSP produces intervals of significant change in the median that are not conditional on detection, have a finite-sample nature and are valid regardless of the size of the breaks. The paper is organised as follows. Section 2 motivates RNSP and sets out its general algorithmic framework. Section 3 describes how RNSP measures the local deviation from model constancy, its crucial ingredient. It also gives theoretical performance guarantees for RNSP. Section 4 contains numerical examples and comparisons. Section 5 includes two examples showing the practical usefulness of RNSP. Section 6 concludes with a brief discussion. Software implementing RNSP is available at https://github.com/pfryz/nsp.

2 Motivation and review of RNSP algorithmic setting

2.1 RNSP: context and modus operandi

Robust NSP (RNSP) differs from NSP as defined in Fryzlewicz (2021) in terms of the data functional it targets, and in how it measures local deviation of this data functional from the baseline model. RNSP discovers regions in the data in which the median departs from the baseline model. This is in contrast to NSP, which targets the mean. In RNSP, we do not make any moment assumptions about the data, and therefore the median is a natural measure of data centrality. In RNSP, the baseline model is constant, and therefore the algorithm will look to discover segments in the data in which the median must experience abrupt departures from constancy (change-points in the level), at a certain global significance level. In this section, we review the components of the algorithmic framework that are shared between NSP and RNSP, with the generic measurement of local deviation from the constant model as one of its building blocks. In Section 3, we will introduce the particular way in which local deviation from the constant model is measured in RNSP, which is appropriate for the median and hence fundamentally different from NSP.
RNSP operates in the signal plus noise model

Y_t = f_t + Z_t, t = 1, . . . , T, (1)

where the variables {Z_t}_{t=1}^T and the signal {f_t}_{t=1}^T satisfy, respectively, the assumptions below. Before stating them, we define the sign(x) function as follows:

sign(x) = 1 for x > 0; sign(x) = 0 for x = 0; sign(x) = −1 for x < 0. (2)

Assumption 2.1 In (1), (a) the signs {sign(Z_t)}_{t=1}^T are serially independent; (b) each Z_t has median zero; (c) each Z_t is sign-symmetric, i.e. P(sign(Z_t) = 1) = P(sign(Z_t) = −1).

The variables {Z_t}_{t=1}^T do not have to be identically distributed, and can have arbitrary mass atoms, or none, as long as their distributions satisfy (b) and (c) in Assumption 2.1. We do not impose any moment assumptions. Any continuous distribution (even one with an asymmetric density function) is sign-symmetric. The distribution(s) of {Z_t}_{t=1}^T can be unknown to the analyst. Note that the requirement of the independence of {sign(Z_t)}_{t=1}^T is significantly weaker than that of the independence of {Z_t}_{t=1}^T itself: for example, if Z_t is a (G)ARCH process driven by symmetric, independent innovations, then sign(Z_t) is serially independent, while Z_t is not.

Assumption 2.2 In (1), f_t is a piecewise-constant vector with an unknown number N and locations 0 = η_0 < η_1 < · · · < η_N < η_{N+1} = T of change-points, so that f_t is constant on each segment [η_j + 1, η_{j+1}], j = 0, . . . , N, and f_{η_j} ≠ f_{η_j + 1} for j = 1, . . . , N.

In this work, RNSP does not venture beyond the piecewise-constant model; this is in contrast to the non-robust NSP (Fryzlewicz, 2021), applicable also to piecewise-polynomial models of higher degrees, and to piecewise-stationary regression models with arbitrary covariates (including with autoregression). The reason is that, for robustness, RNSP works with the signs of the data, both at the stage of fitting the best constant model on each interval, and in examining the empirical residuals from the fits. Due to the non-linearity of the sign operator, RNSP cannot use linear programming for the constant model fits (unlike the non-robust NSP), and has to manually try all possible constant model fits on each interval, to be able to select one that gives the minimum deviation, as this is required for its coverage guarantees. This is a problem of dimensionality one in the piecewise-constant model, which makes it computationally straightforward and fast (details are in Section 3.2).
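As a quick illustration of why sign-symmetry is immaterial for continuous distributions, the following small Monte Carlo sketch (a Python illustration of ours, not part of the RNSP software) checks that Exponential(1) observations centred at their median log 2 have balanced signs, even though their density is highly asymmetric:

```python
import math
import random

def sign(x):
    # sign function as in (2): crucially, sign(0) = 0
    return (x > 0) - (x < 0)

random.seed(0)
n = 100_000
# Exponential(1) is continuous but asymmetric; its median is log 2.
# The residuals Z_t = Y_t - median are sign-symmetric:
# P(sign(Z_t) = 1) = P(sign(Z_t) = -1) = 1/2.
signs = [sign(random.expovariate(1.0) - math.log(2)) for _ in range(n)]
prop_pos = signs.count(1) / n
prop_neg = signs.count(-1) / n
```

Both empirical proportions come out close to 1/2, as the sign-symmetry property predicts; a distribution with an atom at its median would instead place positive mass on sign zero.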
However, in models of parameter dimension two or higher (such as the piecewise-linear model), an analogous exact solution would be too computationally demanding, hence the restriction to the piecewise-constant model in the current work. As we explain in detail later on, our coverage guarantees for RNSP in the piecewise-constant median model are exact and hold for any sample size. RNSP achieves the high level of generality in terms of the permitted distributions of Z t stated in Assumption 2.1 thanks to its use of the sign transformation. The use of 0 in sign(x) is critical for RNSP's objective to provide correct coverage guarantees also for discrete distributions and continuous distributions with mass points. We illustrate the importance of this point in Section 3.2. The generic algorithmic framework underlying both RNSP and the non-robust NSP in Fryzlewicz (2021) is the same and is based on recursively searching for the shortest sub-samples in the data with globally significant deviations from the baseline model. In this section, we introduce this shared generic framework. In the following sections, we show how RNSP diverges from NSP through its use of a robust measure of deviation from the baseline model, suitable for the very broad class of distributions specified in Assumption 2.1. We start with a pseudocode definition of the RNSP algorithm, in the form of a recursively defined function RNSP. 
In its arguments, [s, e] is the current interval under consideration and at the start of the procedure, we have [s, e] = [1, T]; Y (of length T) is as in the model formula (1); M is the minimum guaranteed number of sub-intervals of [s, e] drawn (unless the number of all sub-intervals of [s, e] is less than M, in which case drawing M sub-intervals would mean repetition); λ_α is the threshold corresponding to the global significance level α (typical values for α would be 0.05 or 0.1); and τ_L (respectively τ_R) is a functional parameter used to specify the degree of overlap of the left (respectively right) child interval of [s, e] with respect to the region of significance identified within [s, e], if any. The no-overlap case would correspond to τ_L = τ_R ≡ 0. In each recursive call on a generic interval [s, e], RNSP adds to the set S any globally significant local regions (intervals) of the data identified within [s, e] on which Y is deemed to depart significantly (at global level α) from the baseline constant model. We provide more details underneath the pseudocode below. In the remainder of the paper, the subscript [s, e] relates to a constant indexed by the interval [s, e], whose value will be clear from the context. The RNSP algorithm is launched by the pair of calls below. On completion, the output of RNSP is in the variable S. We now comment on the RNSP function line by line. In lines 2-4, execution is terminated for intervals that are too short. In lines 5-10, a check is performed to see if M is at least as large as the number of all sub-intervals of [s, e]. If so, then M is adjusted accordingly, and all sub-intervals are stored for examination. In lines 11-13, each sub-interval [s_m, e_m] is checked to see to what extent the response on this sub-interval (denoted by Y_{s_m:e_m}) conforms to the baseline constant model. This core step of the RNSP algorithm will be described in more detail in Section 3.
In line 14, the measures of deviation obtained in line 12 are tested against the threshold λ_α, chosen to guarantee global significance level α. How to choose λ_α is independent of the distribution of Z_t if it is continuous, and there is also a simple distribution-independent choice of λ_α for discrete distributions and continuous distributions with probability masses; see Section 3.3. The shortest sub-interval(s) [s_m, e_m] for which the test rejects the baseline model at global level α are collected in the set M_0. In lines 15-17, if M_0 is empty, then the procedure decides that it has not found regions of significant deviations from the constant model on [s, e], and stops on this interval as a consequence. Otherwise, in line 18, the procedure continues by choosing the sub-interval, from among the shortest significant ones, on which the deviation from the baseline constant model has been the largest. The chosen interval is denoted by [s_{m_0}, e_{m_0}]. In line 19, [s_{m_0}, e_{m_0}] is searched for its shortest significant sub-interval, i.e. the shortest sub-interval on which the hypothesis of the baseline model is rejected at global level α. Such a sub-interval certainly exists, as [s_{m_0}, e_{m_0}] itself has this property. The structure of this search again follows the workflow of the RNSP procedure; more specifically, it proceeds by executing lines 2-18 of RNSP, but with s_{m_0}, e_{m_0} in place of s, e. The chosen interval is denoted by [s̃, ẽ]. The importance of this second-stage search in RNSP's pursuit to produce short intervals is discussed in detail (in the context of NSP) in Fryzlewicz (2021). In line 20, the selected interval [s̃, ẽ] is added to the output set S. In lines 21-22, RNSP is executed recursively to the left and to the right of the detected interval [s̃, ẽ]. However, we optionally allow for some overlap with [s̃, ẽ].
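The line-by-line description above can be condensed into the following schematic skeleton. This is our own reconstruction in Python, not the authors' pseudocode or software: the deviation measure is abstracted behind the `dev` argument, and the second-stage search of line 19 and the overlap functionals τ_L, τ_R are omitted for brevity (the no-overlap case τ_L = τ_R ≡ 0):

```python
import random

def rnsp(s, e, y, M, lam, S, dev):
    """Schematic RNSP recursion; dev(a, b, y) measures the deviation of
    y[a:b+1] from the best local constant fit (0-based, inclusive)."""
    if e - s + 1 < 2:                                  # lines 2-4: too short
        return
    subints = [(a, b) for a in range(s, e + 1) for b in range(a + 1, e + 1)]
    # lines 5-10: draw M sub-intervals, or all of them if there are fewer
    draws = subints if M >= len(subints) else random.sample(subints, M)
    devs = [(dev(a, b, y), a, b) for a, b in draws]    # lines 11-13
    sig = [t for t in devs if t[0] > lam]              # line 14: threshold
    if not sig:                                        # lines 15-17: stop
        return
    shortest = min(b - a for _, a, b in sig)
    d0, a0, b0 = max(t for t in sig if t[2] - t[1] == shortest)  # line 18
    # line 19 (second-stage search within [a0, b0]) omitted in this sketch
    S.append((a0, b0))                                 # line 20: record
    rnsp(s, a0 - 1, y, M, lam, S, dev)                 # lines 21-22: recurse
    rnsp(b0 + 1, e, y, M, lam, S, dev)                 # (no overlap)
```

With a toy deviation measure such as dev(a, b, y) = max(y[a:b+1]) − min(y[a:b+1]) and a single jump in y, this skeleton returns the two-point interval straddling the jump and finds nothing elsewhere, mirroring the narrowest-interval behaviour described above.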
The overlap, if present, is a function of [s̃, ẽ] and, if it involves detection of the location of a change-point within [s̃, ẽ], then it is also a function of Y. An example of the relevance of this is given in Section 5.1.

3 Robust NSP: measuring deviation from the constant model

The main structure of the DeviationFromConstantModel(s_m, e_m, Y) operation is as follows:

1. Fit the best, in the sense described precisely later, constant model to Y_{s_m:e_m}.

2. Examine the signs of the empirical residuals from this fit. If their distribution is deemed to contain a change-point (which indicates that the constant model fit is unsatisfactory on [s_m, e_m] and therefore the model contains a change-point on that interval), the value returned by DeviationFromConstantModel(s_m, e_m, Y) should be large; otherwise small.

For RNSP to be able to control, globally and non-trivially, the probability of spurious detection, the deviation measure it uses must meet the two desiderata below.

Desideratum A. If there is no change-point on [s_m, e_m], then the value of DeviationFromConstantModel(s_m, e_m, Y) should be bounded from above by the same deviation measure for the true model on [s_m, e_m] (the latter is an oracular quantity, unknown to the analyst), and this in turn should be bounded from above, uniformly over all [s_m, e_m], by a random variable involving the signs of the true residuals Z only, such that either its distribution is known or its quantiles can easily be bounded, sharply, from above.

Desideratum B. The deviation measure on [s_m, e_m] cannot be made smaller by proposing unrealistic constant model fits on that interval; otherwise it would be easy to force non-detection on any interval.
This is an important desired property, as our deviation measure will need to 'try' all possible constant model fits and choose one for which the deviation measure is the smallest, to ensure that Desideratum A (the part relating to boundedness from above by the deviation measure for the true model) is met. As in Fryzlewicz (2021), a key ingredient of our measure of deviation, which helps it achieve the above desired properties, is a multiresolution sup-norm, introduced below. The fundamental difference with Fryzlewicz (2021) is that RNSP uses it on the signs of the input rather than in the original data domain. The basic building block of the multiresolution sup-norm is a scaled partial sum statistic, defined for an arbitrary input sequence {x_t}_{t=1}^T and an interval [s, e] ⊆ [1, T] by

U_{[s,e]}(x) = |x_s + . . . + x_e| / (e − s + 1)^{1/2}.

We define the multiresolution sup-norm (Nemirovski, 1986; Li, 2016) of an input vector x (of length T) with respect to the interval set I as

‖x‖_I = max_{[s,e] ∈ I} U_{[s,e]}(x).

The set I used in RNSP contains intervals at a range of scales and locations. A canonical example of a suitable interval set I is the set I^a of all subintervals of [1, T]. We will use I^a in defining the largest acceptable global probability of spurious detection. However, for computational reasons, DeviationFromConstantModel will use a smaller interval set (we give the details below). This will not affect the exactness of our coverage guarantees because, naturally, if J ⊆ I, then ‖x‖_J ≤ ‖x‖_I. We also define the restriction of I to an interval [s, e], denoted by I_{[s,e]}, as the set of those intervals in I that are contained within [s, e]. There are a number of reasons for the use of the multiresolution sup-norm in our deviation measure. One is that, thanks to the particular construction of the RNSP algorithm and our deviation measure, in order for us to be able to control spurious detections at a global level α, it suffices to know, or be able to bound from above (as sharply as possible), the 1 − α quantile of the distribution of ‖sign(Z)‖_{I^a}. However, relatively sharp results of this type are either well-known or easy to obtain, as we show in Section 3.3.
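The multiresolution sup-norm over the canonical set I^a can be computed by brute force in O(T^2) partial sums. The sketch below is a Python illustration of ours (not the RNSP software), assuming the standard |x_s + · · · + x_e|/(e − s + 1)^{1/2} scaling of the partial sums:

```python
import math

def scaled_partial_sum(x, s, e):
    # |x_s + ... + x_e| / sqrt(e - s + 1), with 0-based inclusive indices
    return abs(sum(x[s:e + 1])) / math.sqrt(e - s + 1)

def msup_norm(x, intervals=None):
    # ||x||_I: maximum of the scaled partial sums over the interval set I;
    # by default I = I^a, the set of all subintervals of the whole sequence
    n = len(x)
    if intervals is None:
        intervals = [(s, e) for s in range(n) for e in range(s, n)]
    return max(scaled_partial_sum(x, s, e) for s, e in intervals)
```

For a ±1 sign sequence of length n with all entries equal, the norm attains its maximum possible value n/√n = √n; and restricting to a smaller interval set J ⊆ I can only decrease the norm, which is the monotonicity property used above.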
Another reason is that the multiresolution sup-norm of the signs of the empirical residuals from a postulated baseline model fit is naturally large for proposed model fits that are unrealistic, which penalises them, thereby ensuring that Desideratum B above is met. This important property would not be available if, instead of the multiresolution sup-norm of the residual signs, we chose to use, for example, maximum-likelihood or other contrast-based approaches to test deviations from the baseline (the simplest example of such a test is the CUSUM test). We discuss this point in more detail in Section 3.2.1, where we also give further reasons supporting the use of the multiresolution sup-norm in our context. We now define the deviation measure D_{[s_m,e_m]}, returned by the DeviationFromConstantModel function from line 12 of the RNSP algorithm. A natural starting point is the oracular quantity ‖sign(Y_{s_m:e_m} − f_{[s_m,e_m]})‖, computed with the true constant f_{[s_m,e_m]}, which satisfies the desired condition (6). Of course f_{[s_m,e_m]} is not known, but a key observation is that there are only at most 2(e_m − s_m) + 3 different possible constants f̂_{[s_m,e_m]} leading to different sequences {sign(Y_t − f̂_{[s_m,e_m]})}_{t=s_m}^{e_m}. To see this, sort the values of Y_{s_m:e_m} in non-decreasing order to create Y_(1), Y_(2), . . . , Y_(e_m−s_m+1). Take candidate constants f̂^{j}_{[s_m,e_m]}, j = 1, . . . , 2(e_m − s_m) + 3, defined as follows: for even j, f̂^{j}_{[s_m,e_m]} = Y_(j/2), i.e. the even-indexed candidates lie at the sorted data points; for odd j with 3 ≤ j ≤ 2(e_m − s_m) + 1, f̂^{j}_{[s_m,e_m]} lies between Y_((j−1)/2) and Y_((j+1)/2) (but otherwise arbitrary); and the two extreme candidates satisfy f̂^{1}_{[s_m,e_m]} < Y_(1) and f̂^{2(e_m−s_m)+3}_{[s_m,e_m]} > Y_(e_m−s_m+1) (but otherwise arbitrary). We have the following simple result.

Proposition 3.1 Let f_{[s_m,e_m]} be an arbitrary real constant and let f̂^{j}_{[s_m,e_m]} be as in (7). There exists a j_0 ∈ {1, . . . , 2(e_m − s_m) + 3} such that

sign(Y_t − f̂^{j_0}_{[s_m,e_m]}) = sign(Y_t − f_{[s_m,e_m]}) for all t = s_m, . . . , e_m, (8)

Proof. If Y_(1) ≤ f_{[s_m,e_m]} ≤ Y_(e_m−s_m+1), then f_{[s_m,e_m]} either equals one of the sorted values or lies strictly between two consecutive ones; the corresponding candidate (with an even index in the former case and an odd index in the latter) has the same sign pattern, and (8) is satisfied. Otherwise, it must be that either f_{[s_m,e_m]} < Y_(1) or f_{[s_m,e_m]} > Y_(e_m−s_m+1); in the former case take j_0 = 1 and in the latter, j_0 = 2(e_m − s_m) + 3, and (8) is satisfied by a similar argument.

We now define our measure of deviation D_{[s_m,e_m]}, and prove its key property as a corollary to Proposition 3.1.

Definition 3.1 Let the constants f̂^{j}_{[s_m,e_m]}, j = 1, . . . , 2(e_m − s_m) + 3, be defined as in (7).
We define

D_{[s_m,e_m]} = min_{j=1,...,2(e_m−s_m)+3} ‖sign(Y_{s_m:e_m} − f̂^{j}_{[s_m,e_m]})‖_{I^a_{[s_m,e_m]}}. (9)

Essentially, D_{[s_m,e_m]} tries all possible baseline constant model fits on [s_m, e_m] and chooses the one for which the multiresolution norm of the signs of the residuals from the fit is the smallest. Choosing the fit that minimises the multiresolution norm is essential for guaranteeing coverage properties, as we will see below. If there is a change-point on [s_m, e_m], the hope is that even the minimum-multiresolution-norm fit as returned by D_{[s_m,e_m]} will be large enough for RNSP to indicate a change-point on that interval. In other words, we have the following corollary.

Corollary 3.1 The deviation measure defined in (9) satisfies the desired property (6).

Proof. Let the index j_0 be as in the statement of Proposition 3.1; as D_{[s_m,e_m]} is a minimum over all candidate constants, it is bounded from above by the term corresponding to j_0, which by (8) equals the deviation measure computed with the true constant.

This leads to the following guarantee for the RNSP algorithm.

Theorem 3.1 Let S = {S_1, . . . , S_R} be the set of intervals returned by the RNSP algorithm. The following guarantee holds:

P(every interval S_i, i = 1, . . . , R, contains at least one change-point) ≥ 1 − P(‖sign(Z)‖_{I^a} > λ_α).

Proof. On the set {‖sign(Z)‖_{I^a} ≤ λ_α}, each interval S_i must contain a change-point, as if it did not, then by Corollary 3.1 and inequality (5), we would have to have

D_{S_i} ≤ ‖sign(Z)‖_{I^a} ≤ λ_α. (11)

However, the fact that S_i was returned by RNSP means, by line 14 of the RNSP algorithm, that D_{S_i} > λ_α, which contradicts (11). This completes the proof.

Theorem 3.1 should be read as follows. Let α = P(‖sign(Z)‖_{I^a} > λ_α). For a set of intervals returned by RNSP, we are guaranteed, with probability of at least 1 − α, that there is at least one change-point in each of these intervals. Therefore, S = {S_1, . . . , S_R} can be interpreted as an automatically chosen set of regions (intervals) of significance in the data. In the no-change-point case (N = 0), the correct reading of Theorem 3.1 is that the probability of obtaining one or more intervals of significance (R ≥ 1) is bounded from above by P(‖sign(Z)‖_{I^a} > λ_α).
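A direct, if naive, computation of the deviation measure in (9) can be sketched as follows. This is a Python illustration of ours, not the RNSP software: for clarity it uses the full interval set I^a rather than the reduced set used for speed in Section 3.2, and the 2(e_m − s_m) + 3 candidate levels are built from the sorted data as in (7), with midpoints standing in for the arbitrary in-between levels:

```python
import math

def sign(v):
    # sign function as in (2), with sign(0) = 0
    return (v > 0) - (v < 0)

def msup_norm_all(x):
    # multiresolution sup-norm over all subintervals:
    # max over [s, e] of |partial sum| / sqrt(interval length)
    n = len(x)
    return max(abs(sum(x[s:e + 1])) / math.sqrt(e - s + 1)
               for s in range(n) for e in range(s, n))

def deviation_from_constant(y):
    # candidate constants: one below the minimum, one at each sorted data
    # point, one between each pair of consecutive sorted points, and one
    # above the maximum -- 2(e_m - s_m) + 3 levels in total
    ys = sorted(y)
    cands = [ys[0] - 1.0]
    for i, v in enumerate(ys):
        cands.append(v)                            # at a data point
        if i + 1 < len(ys):
            cands.append((v + ys[i + 1]) / 2.0)    # between sorted points
    cands.append(ys[-1] + 1.0)
    # D: smallest multiresolution norm of residual signs over all candidates
    return min(msup_norm_all([sign(v - c) for v in y]) for c in cands)
```

For constant data the measure is zero, because the candidate placed at the common value yields all-zero signs; for a clear level shift, every candidate leaves a long run of equal non-zero signs on one side of the shift, so the minimum stays large.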
We emphasise that Theorem 3.1 does not promise to detect all the change-points, or to do so asymptotically as the sample size gets larger: this would be impossible without assumptions on the strength of the change-points (involving the spacing between neighbouring change-points and the sizes of the jumps). Such assumptions are typically impossible to verify in practice and we do not make them in this work. Instead, Theorem 3.1 promises that every interval of significance returned by RNSP must contain at least one change-point, with a certain global probability adjustable by the user. The intervals of significance returned by RNSP have an "unconditional confidence interval" interpretation: they are not conditional on any prior detection event, but indicate regions in the data each of which must unconditionally contain at least one change in the median of Y_t, with a global probability of at least 1 − α. Therefore, as in NSP (Fryzlewicz, 2021), RNSP can be viewed as performing "inference without selection" (where "inference" refers to producing the RNSP intervals of significance and "selection" to the estimation of change-point locations, absent from RNSP). This viewpoint also enables "post-inference selection" or "in-inference selection" if the exact change-point locations (if any) are to be estimated within the RNSP intervals of significance after or during the execution of RNSP. See Fryzlewicz (2021) for a further discussion of these aspects. We also emphasise that the statement of Theorem 3.1 is of a finite-sample character and holds exactly and for any sample size. We now comment on a number of aspects of the deviation measure D_·.

Method of computation. In the non-robust NSP (Fryzlewicz, 2021), the analogue of D_{[s_m,e_m]} was amenable to computation via linear programming; the non-robust NSP with self-normalisation also used linear programming as the main step in its computation.
By contrast, in RNSP, measuring the deviation from the baseline model involves examining the signs of the residuals rather than the residuals themselves, as was the case in NSP. Because of the non-linearity of the sign function, it is not possible to use linear programming for the computation of D_{[s_m,e_m]} in RNSP, and (9) needs to be computed by manually trying out all candidate constants f̂^{j}_{[s_m,e_m]}. While this may appear expensive, it is the only option, as (a) trying only 'realistic' constant model fits would require an additional global parameter describing what it means for a local constant fit to be realistic, and (b) carrying out the fitting in an ad hoc way, e.g. by only fitting the empirical median of Y_{s_m:e_m} (rather than each of the constants f̂^{j}_{[s_m,e_m]}), would violate the desired property (6) and therefore not be able to lead to exact coverage guarantees.

Achieving computational savings without affecting coverage guarantees. Although the operation of trying each constant f̂^{j}_{[s_m,e_m]} in (9) is relatively fast, in order to accelerate it further, we introduce the two computational savings below, which do not increase D_{[s_m,e_m]} and therefore respect the crucial inequality (10) and hence also our coverage guarantees in Theorem 3.1.

Reducing the set I^a_{[s_m,e_m]}. To accelerate the computation of (9), we replace the set I^a_{[s_m,e_m]} in D_{[s_m,e_m]} with the smaller set I^lr_{[s_m,e_m]} of "left" and "right" intervals, i.e. those sharing their left endpoint with s_m or their right endpoint with e_m. In the noiseless case, the norm in (9) is maximised at one of the "left" or "right" intervals in I^lr_{[s_m,e_m]}, and we are happy to sacrifice potential negligible differences in the noisy case in exchange for the substantial computational savings.

Limiting interval lengths. In practice, the analyst may not be interested in excessively long RNSP intervals of significance, or in other words in change-points so weak that only a very long interval around them is required to be able to detect them.
With this in mind, our software at https://github.com/pfryz/nsp provides an option of limiting the length of the intervals considered, in the sense that D_{[s_m,e_m]} automatically returns zero if e_m − s_m > L for a user-specified maximum length L.

Unsuitability of CUSUM or similar contrasts in the deviation measure. We note that it would be impossible to replace the multiresolution sup-norm in D_{[s_m,e_m]} by, for example, the CUSUM statistic or a similar contrast measure, such as that described in Ellenberger et al. (2021): minimised over the candidate constant fits, such a contrast would not be able to offer detection under any circumstances, as it would always equal zero (for instance, a candidate fit placed below the data minimum makes all residual signs equal to 1, so any contrast between them vanishes). This is an example of a construction that violates Desideratum B.

Importance of zero in the sign function. For valid coverage guarantees in the presence of mass points in Z_t (or if the distribution of Z_t is discrete), it is crucial for the sign function defined in (2) to return zero if its argument is zero. To illustrate this point, define sign_inv(x) = 1 for x ≥ 0 and sign_inv(x) = −1 for x < 0, where "inv" stands for "invalid", and consider the trivial case Z_t ≡ 0. For the no-change-point input data Y_{s_m:e_m} = (f_{[s_m,e_m]}, . . . , f_{[s_m,e_m]}), there are only three different constants f̂^{j}_{[s_m,e_m]} (see definition (7)). With sign_inv(x) in place of sign(x), each of the three candidates would produce residual signs identically equal to 1 or identically equal to −1, and therefore we would have D_{[s_m,e_m]} = (e_m − s_m + 1)^{1/2}, the largest value D_{[s_m,e_m]} can possibly take, which would therefore have to lead to the (spurious) designation of [s_m, e_m] as containing a change-point, for e_m − s_m suitably large. Naturally, the analogous argument would also apply if sign_inv(0) = −1 rather than 1. By contrast, note that the use of the (correct) sign function leads to D_{[s_m,e_m]} = 0, as the candidate placed at the common data value produces residual signs identically equal to zero.

Importance of test levels f̂^{j}_{[s_m,e_m]} at the data points. It is important that the list of test levels f̂^{j}_{[s_m,e_m]} (see definition (7)) includes those placed at the data points themselves (i.e. those indexed by even values of j in definition (7)).
Indeed, continuing the example directly above, if we were to exclude the constant f_{[sm,em]} from the list of test levels considered, we would then have D_{[sm,em]} = (e_m − s_m + 1)^{1/2} even with the use of the correct function sign (and not only with sign_inv), which would again lead to spurious detection.

Importance of test levels f^{(j)}_{[sm,em]} in between sorted data points. It is equally important that the test levels f^{(j)}_{[sm,em]} should include those placed in between the sorted data points (i.e. those indexed by odd values of j in definition (7)). This can be seen e.g. by considering Z_t such that P(Z_t = 1) = P(Z_t = −1) = 1/2; for brevity, we omit the full discussion.

To make Theorem 3.1 operational, we need to obtain an understanding of the distribution of ||sign(Z)||_{I^a}, so that we are able to choose λ_α such that P(||sign(Z)||_{I^a} > λ_α) = α (or approximately so) for a desired global significance level α. Initially, we consider Z_t such that P(sign(Z_t) = 1) = P(sign(Z_t) = −1) = 1/2, i.e. P(sign(Z_t) = 0) = 0 for all t (the general case P(sign(Z_t) = 0) ≥ 0 is covered in the next paragraph). From Theorem 1.1 in Kabluchko and Wang (2014), we have that

P(||sign(Z)||_{I^a} ≤ a_T + τ/a_T) → exp{−2Λ exp(−τ)} as T → ∞, (13)

where a_T = {2 log(T log^{−1/2} T)}^{1/2} and Λ is a constant. As Kabluchko and Wang (2014) do not unambiguously state the value of Λ, we use simulation over a range of values of T and τ to determine a suitable value of Λ as 0.274. The practical choice of the significance threshold λ_α then proceeds as follows: (a) fix α at the desired level, for example 0.05 or 0.1; (b) obtain the value of τ by equating 1 − exp(−2Λ exp(−τ)) = α; (c) obtain λ_α = a_T + τ/a_T.

Suppose now that P(sign(Z_t) = 0) = ρ_t ≥ 0; note that the sign-symmetry Assumption 2.1(c) implies P(sign(Z_t) = 1) = P(sign(Z_t) = −1) = (1 − ρ_t)/2. Construct the variable Z̃_t = Z_t | Z_t ≠ 0. As P(sign(Z̃_t) = 1) = P(sign(Z̃_t) = −1) = 1/2, the limiting statement (13) applies to sign(Z̃). However, we have the double inequality (14), in which I = [1, 2, . . .
, T_1], where T_1 = |{t ∈ [1, . . . , T] : Z_t ≠ 0}|. The first inequality in (14) holds because every constituent scaled partial sum of ||sign(Z)||_{I^a} has a corresponding partial sum in ||sign(Z̃)||_{I^a_I} that is larger than or equal to it in magnitude, constructed by removing the zeros from its numerator and decreasing (or not increasing) its denominator, as it contains fewer (or the same number of) terms. As an illustrative example, suppose the sequence of sign(Z_t) starts −1, 0, 1, 1. The absolute partial sum |−1 + 0 + 1 + 1|/√4, a constituent of ||sign(Z)||_{I^a}, is majorised by the absolute partial sum |−1 + 1 + 1|/√3, a constituent of ||sign(Z̃)||_{I^a_I}, where the latter sum has been constructed by removing the 0 from −1, 0, 1, 1 and adjusting for the number of terms (now 3 instead of 4). The second inequality in (14) holds simply because T_1 ≤ T. The implication of (14) is that ||sign(Z)||_{I^a} for ρ_t ≥ 0 is majorised by ||sign(Z)||_{I^a} for ρ_t = 0, the case handled by (13). Therefore, the threshold λ_α obtained as a consequence of (13) can also be applied in the general case ρ_t ≥ 0.

In this section, we demonstrate numerically that the guarantee offered by Theorem 3.1 holds for RNSP in practice over a variety of homogeneous and heterogeneous models for which the variables Z_t satisfy Assumption 2.1. We also investigate the circumstances under which similar guarantees are not offered by H-SMUCE (Pein et al., 2017) or by the self-normalised version of NSP (SN-NSP), suitable for heterogeneous data (Fryzlewicz, 2021). In this section, we use the acronyms RNSP and SN-NSP to denote the versions of these respective procedures with no interval overlaps, i.e. τ_L = τ_R = 0. Later in this section, we introduce notation for versions with overlaps. Both RNSP and SN-NSP use M = 1000 intervals, the default setting. For H-SMUCE, the function call we use is stepR::stepFit(x, alpha = 0.1, family = "hsmuce", confband = TRUE).
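Before moving to the results, we note that the practical threshold recipe (a)-(c) above can be expressed in a few lines of code. The following is a minimal sketch: the function name is ours, and Λ = 0.274 is the simulated value quoted earlier.

```python
import math

def rnsp_threshold(T, alpha=0.1, Lambda=0.274):
    """Threshold lambda_alpha via steps (a)-(c): with
    a_T = {2 log(T log^{-1/2} T)}^{1/2}, solve
    1 - exp(-2 * Lambda * exp(-tau)) = alpha for tau
    and return a_T + tau / a_T."""
    a_T = math.sqrt(2.0 * math.log(T / math.sqrt(math.log(T))))
    # inverting 1 - exp(-2 * Lambda * exp(-tau)) = alpha:
    tau = -math.log(-math.log(1.0 - alpha) / (2.0 * Lambda))
    return a_T + tau / a_T
```

As expected, a smaller α or a longer sequence yields a larger threshold.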
We begin with null models, by which we mean models (1) for which f_t is constant throughout, i.e. N = 0. For null models, Theorem 3.1 promises that RNSP at level α returns no intervals of significance with probability at least 1 − α. In this section, we use α = 0.1. There are similar parameters in H-SMUCE and SN-NSP, and they are also set to 0.1. The null models used are listed in Table 1. For one of these models, RNSP keeps its 90% coverage promise exactly, rather than generously as in the other models. H-SMUCE behaves correctly for the two Gaussian models, but fails for the discrete distributions and for model Mix 1, which contains mass points. It is unexpectedly successful in the Plain Cauchy model, but this is perhaps because it has very limited detection power in the Cauchy model with change-points (more on this model below). It also underperforms for model Mix 2, which is continuous (and within the domain of attraction of the Gaussian distribution) but multimodal. SN-NSP fails for the discrete distributions, which is a consequence of the (asymptotically guaranteed) closeness of the self-normalised deviation measure to the appropriate functional of the Wiener process not "kicking in" in these instances (due to the relatively small sample sizes).

We now discuss performance for signals with change-points (N > 0). Table 3 defines our models. Theorem 3.1 promises that any intervals of significance returned by RNSP at level α are such that, with probability at least 1 − α, they each contain at least one true change-point. In addition to RNSP, H-SMUCE and SN-NSP, we also test versions of RNSP and SN-NSP with the overlap functions defined in (15).

Table 4: Results for each model+method combination: "coverage" is the number of times, out of 100 simulated sample paths, that the respective model+method combination did not return a spurious interval of significance; "prop. genuine interv."
is the average (over 100 simulated sample paths) proportion of genuine intervals out of all intervals returned, if any (if none are returned, the corresponding 0/0 ratio is ignored in the average; N/A means there were no detections in any of the 100 sample paths); "no. of genuine intervals" is the average (over 100 sample paths) number of genuine intervals returned; "avg. genuine interv. length" is the average (over 100 sample paths) length of a genuine interval returned in the respective model+method combination.

For each model+method combination, we report the following: coverage (i.e. whether at least (1 − α)100% of the simulated sample paths are such that any intervals of significance returned contain at least one true change-point each); if any intervals are returned, the proportion of those that are genuine (i.e. the proportion of those intervals returned that contain at least one true change-point); the number of genuine intervals (i.e. the number of those intervals returned that contain at least one true change-point); and the average length of genuine intervals (i.e. the average length of those intervals returned that contain at least one true change-point).

RNSP and RNSP-O significantly outperform SN-NSP and SN-NSP-O in three out of the four scenarios tested: Blocks, Cauchy and Poisson. In Blocks and Cauchy, the RNSP methods achieve more detections and shorter intervals of significance (so better localisation). In Poisson, in addition, they achieve much better coverage (the SN-NSP methods are misled by the discrete nature of this relatively low-intensity Poisson dataset, for which their required asymptotics do not kick in, which results in a very large number of spurious detections). However, the SN-NSP methods work better for the Bursts data, in the sense that they lead to more detections. The underlying reason is that the signal level in this model is linearly proportional to the standard deviation of the noise, which particularly suits the self-normalised SN-NSP methods. Note that the "no. of genuine interv."
value of 5.26 for SN-NSP-O in Bursts, which is greater than the true number of change-points (5), should not be surprising here, despite the value of "coverage" being 100, because SN-NSP-O is a method that can produce overlapping intervals, so two different intervals of significance can on occasion correspond to the same change-point.

We re-analyse the time series of the US ex-post real interest rate (the three-month treasury bill rate deflated by the CPI inflation rate) considered in Garcia and Perron (1996), Bai and Perron (2003) and Fryzlewicz (2021). The dataset is available at http://qed.econ.queensu.ca/jae/datasets/bai001/. The time series, shown in Figure 1, is quarterly and its range is 1961:1-1986:3, so t = 1, . . . , T = 103. RNSP appears to be an appropriate tool here, as the data display heterogeneity and possibly some heavy-tailed movements towards the latter part. We run the RNSP algorithm with the default setting of M = 1000, with α = 0.1 and with overlaps as defined in (15). The procedure returns two intervals of significance: [23, 75] and [65, 91]. These are shown in Figure 1, together with their mid-points, which can serve as estimates of the change-point locations. As with any RNSP execution with non-zero overlaps, one question that may be asked is whether the two intervals may be indicating the same change-point, but this, visually, is extremely unlikely here (the reason for using non-zero overlaps is simply to provide larger samples for RNSP following the detection of the first interval; using zero overlaps means the samples are too short, and RNSP with zero overlaps does not pick up the second change-point). Therefore, the solution points to a model with at least two change-points. This is consistent (or at least not inconsistent) with both Garcia and Perron (1996) (who also settle on a model with two change-points) and Bai and Perron (2003) (who prefer a three-change-point model, not excluded by RNSP here).
The difference between those two earlier analyses and ours is that those two (a) were based on asymptotic arguments (and were therefore valid asymptotically, for unspecified large samples) and (b) were conditional, in the sense that the confidence regions for change-point locations in those two works were conditional on the detection event. By contrast, our analysis via RNSP has a finite-sample nature and the intervals of significance have an unconditional character. Importantly, we do not make any distributional assumptions besides independence and sign-symmetry, both of which are likely to be acceptable for this dataset. The analysis via RNSP is unaffected by the likely heterogeneity in the data.

We analyse the weekly interest in the search term "data science" from Google Trends, in the US state of California. The link to obtain the data (accessed on 24 August 2021) was https://trends.google.com/trends/explore?date=today%205-y&geo=US-CA&q=data%20science. Google Trends describe the data as follows. "Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term." Weeks in this data series start on Sundays and the dataset spans the weeks from w/c 28th August 2016 to w/c 15th August 2021 (so almost five years' worth of data). The observations are discrete (integers from 22 to 100), which would likely pose difficulties for the competing methods, as outlined earlier. We execute the RNSP procedure with the default setting of M = 1000 and with α = 0.1, with no overlaps, which returns the three intervals of significance shown in Figure 3. An online report (https://bit.ly/3kqwDWO) states that "2020 was the first year since 2016, Data Scientist was not the number one job in America, according to Glassdoor's annual ranking.
That title would belong to Front End Engineer, followed by Java Developer, followed by Data Scientist." However, visually, a similar decline in interest is observed e.g. in the analogous Google Trends series for the term "Java" (not shown). To offer some comparative perspective, there is also a slight decline in the search intensity for the term "artificial intelligence" (also picked up by RNSP; not shown), but it is much less pronounced than those in "data science" or "Java". The role of the Covid-19 pandemic in these declines, their timings or relative magnitudes is not immediately clear.

Note that Corollary 3.1, crucial to the success of RNSP, can be rewritten as

D_{[sm,em]} ≤ ||sign{X_{sm:em} − Q_{1/2}(X_{sm:em})}||_{I^a_{[sm,em]}}, (16)

where Q_q(·) is the population q-quantile. However, the left-hand side of (16) was not specifically constructed with q = 1/2 in mind, and therefore the inequality is true for any q ∈ (0, 1). This shows that RNSP can equally be used for significant change detection in any quantile, and not just the median. However, for q ≠ 1/2, the challenge is to obtain the null distribution of ||sign{X − Q_q(X)}||_{I^a}. Even if this challenge is overcome (e.g. by simulation), RNSP as defined in this work may not be effective for change detection in quantiles "far" from the median, due to the particular way in which D_{[sm,em]} is constructed (involving minimisation over all levels). For RNSP to be a successful device for change detection in other quantiles, the definition of D_{[sm,em]} would have to be modified to only minimise over 'realistic' candidate levels not far from the population q-quantile under the local null.
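If the null distribution of the sign norm around the q-quantile were to be obtained by simulation, one possible sketch is as follows (function names are ours; the sketch uses the fact that, for continuous noise under the local null, the signs around the population q-quantile are independent, equalling +1 with probability 1 − q and −1 with probability q):

```python
import numpy as np

def mres_supnorm(s):
    """Multiresolution sup-norm of a sign sequence: the maximum, over all
    sub-intervals, of |partial sum| / sqrt(interval length)."""
    cs = np.concatenate([[0.0], np.cumsum(s)])
    n = len(s)
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            best = max(best, abs(cs[j] - cs[i]) / np.sqrt(j - i))
    return best

def simulate_quantile_threshold(T, q, alpha=0.1, reps=200, rng=None):
    """Monte Carlo approximation of the level-alpha threshold for change
    detection in the q-quantile: simulate independent signs that equal +1
    with probability 1 - q and -1 with probability q, and take the
    (1 - alpha)-quantile of the resulting sup-norms."""
    rng = np.random.default_rng(rng)
    stats = [mres_supnorm(np.where(rng.random(T) < 1.0 - q, 1.0, -1.0))
             for _ in range(reps)]
    return float(np.quantile(stats, 1.0 - alpha))
```

The cost is O(T^2) per replication in this naive form; for q = 1/2 the simulated thresholds can be checked against the asymptotic recipe based on (13).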
References

Segmentation of the mean of heteroscedastic data via cross-validation
Estimating and testing linear models with multiple structural changes
Computation and analysis of multiple structural change models
Multiple testing of local extrema for detection of change points
Bootstrap confidence intervals for multiple change points based on moving sum procedures
Computing valid p-value for optimal changepoint by selective inference using dynamic programming
A MOSUM procedure for the estimation of multiple random change points
Exact change point detection with improved power in small-sample binomial sequences
Exact and efficient Bayesian inference for multiple changepoint problems
Multiscale change-point inference (with discussion)
Wild Binary Segmentation for multiple change-point detection
Narrowest Significance Pursuit: inference for multiple change-points in linear models
An analysis of the real interest rate under regime shifts
Multiple change-point detection via a screening and ranking algorithm
Exact post-selection inference for the generalized lasso path
Post-selection inference for changepoint detection algorithms with application to copy number variation data
Testing for a change in mean after changepoint detection
Limiting distribution for the maximal standardized increment of a random walk
Permutation tests for joinpoint regression with application to cancer rates
An application of MCMC methods for the multiple changepoints problem
Variational Estimators in Statistical Multiscale Analysis
FDR-control in multiscale change-point segmentation
Efficient change point detection for genomic sequences of continuous measurements
Nonparametric estimation of smooth regression functions
Price signatures
Heterogeneous change point inference
Invariance principle under self-normalization for nonidentically distributed random variables

Acknowledgements. I wish to thank Shakeel Gavioli-Akilagun and Zakhar Kabluchko for helpful discussions.