key: cord-0504461-yjb4qm2g
authors: Tomlinson, Matthew F.; Greenwood, David; Mucha-Kruczynski, Marcin
title: 2T-POT Hawkes model for dynamic left- and right-tail quantile forecasts of financial returns: out-of-sample validation of self-exciting extremes versus conditional volatility
date: 2022-02-02
journal: nan
DOI: nan
sha: 0e941c621bbbfd428c6dd938b56de413c015fdee
doc_id: 504461
cord_uid: yjb4qm2g

Dynamic extreme value analysis offers a promising approach to the forecasting of the extreme tail events that often dominate systemic risk. We extend the two-tailed peaks-over-threshold (2T-POT) Hawkes model as a tool for dynamic quantile forecasting for both the left and right tails of a univariate time series; this is applied to the daily log-returns of six large cap indices using a wide range of exceedance thresholds (from the 1.25% to 25.00% mirrored quantiles). Out-of-sample convergence tests find that the 2T-POT Hawkes model offers more accurate and reliable forecasting of next step ahead extreme left- and right-tail quantiles -- as measured by (mirrored) value-at-risk and expected shortfall at the 2.5% coverage level and below -- compared against GARCH-type models. Quantitatively similar asymmetries in the parameters of the Hawkes arrival process are found across all six indices, adding further empirical support to a temporal leverage effect in which the impact of losses is not only greater but also more immediate. Our results suggest that asymmetric Hawkes-type arrival dynamics are a better approximation of the true data generating process for extreme daily log-returns than GARCH-type variance dynamics and, therefore, that the 2T-POT Hawkes model presents a better performing alternative to GARCH-type models.

A notable feature of many complex systems is that local trends are influenced more by rare extreme events than by more typical fluctuations [1] . As a result, these extreme tail events often dominate the associated systemic risk, which makes accurate forecasting of them a vital objective in many disciplines. Extreme value theory (EVT) presents an approach to this problem in which asymptotic tail behaviour is modelled independently from the typical "bulk" fluctuations, under the justification that the two are often generated by distinct mechanisms [2] [3] [4] . A cornerstone of this approach is the Gnedenko-Pickands-Balkema-de Haan (GPBH) theorem, which states that the limiting distribution for extreme events that exceed some threshold is the generalized Pareto (GP) distribution [1, 5, 6] . The simplest application is realized when the data generating process for extreme events is stationary. In this case, the GP tail distribution is unconditional and the arrival process of events from this distribution is a homogeneous Poisson point process. Together, these two conditions mean that exceedances of any arbitrary threshold within the tail distribution arrive in time as a homogeneous Poisson point process. However, in some systems of interest, complex dynamics and feedbacks give rise to non-stationary behaviour that means these conditions cannot be assumed to hold. Applications to these systems demand the development of non-stationary EVT methods that can account for the time-clustering of extreme events. * mft28@bath.ac.uk A natural approach for such systems is presented by the peaks-over-threshold (POT) Hawkes model, which combines a GP tail distribution with a self-exciting inhomogeneous point process to describe the arrivals of tail events. The Hawkes point process, which first emerged as a stochastic model for the self-reflexive pattern of foreshocks and aftershocks that decorate major seismic activity [7] [8] [9] [10] , is defined by past events causing a timedecaying increase in the arrival rate of future events [11, 12] . This has found broad application to many systems characterized by activity bursts, including neural networks [13, 14] , crime [15, 16] , conflict [17, 18] , epidemics [19] , social media [20] , and financial markets [12, [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] . An intriguing class to consider are systems that experience extreme events from more than one tail. For instance, the fluctuations within a one-dimensional driftdiffusion-like process, such as a financial price time series, will experience both left-and right-tail extremes whose arrivals are mutually exclusive in time. This case is addressed by the two-tailed peaks-over-threshold (2T-POT) Hawkes model developed in [21] , wherein left-and righttail threshold exceedances are described by an asymmetric self-and cross-exciting Hawkes-type arrivals process combined with asymmetric conditional GP tails. This can account for the time clustering of threshold exceeding extremes from both tails, the correlation between the arrival intensity and the magnitude of exceedances, the heavy tailed distributions of those excess magnitudes, and the propensity for all these features to exhibit leftversus-right tail asymmetry -all of which are observed in the fluctuations of financial asset prices as measured by daily log-returns [21, [36] [37] [38] [39] [40] .

In this paper, we develop the 2T-POT Hawkes model as a tool for dynamic quantile forecasting for both the left and right tails of the same real univariate discrete time series X t . This model is applied to the daily log-returns of six international large cap equity indices and an extensive out-of-sample validation is used to evaluate the accuracy of its next step ahead forecasts of left-and right-tail quantile-based risk measures compared against equivalent forecasts produced by industry standard generalized autoregressive conditional heteroskedasticity (GARCH) models [38] , including the hybrid GARCH-EVT model that incorporate asymmetric GP tails [41, 42] . Much of the financial risk literature focuses exclusively on left-tail risk. Indeed, the risk measures that we evaluate here for each tail are typically defined with respect to the left-tail only. These measures are popularly known as the valueat-risk (VaR), which is a conditional quantile for the return of an investment over a given holding period, and the expected shortfall (ES), which is the expected value of the return given that it is less than the VaR. However, this previous focus on the left-tail disguises that the equivalent right-tail extremes, which are highly correlated with their left-tail counterparts [43] , can either quickly mitigate the impact of their mirror opposites or present an equivalent risk (or opportunity) under certain investment strategies. This is especially apparent in the wake of the Covid-19 pandemic and the impact its outbreak had on global financial markets. On 2020-03-12, the S&P 500 (SPX) index suffered its worst daily loss since the infamous 1987 Black Monday crash, declining by -9.5%. Five of the next twelve trading days saw losses of -12.0%, -5.2%, -4.3%, -2.9%, and -3.4%. Crucially, however, the same period also saw daily gains of +9.3%, +6.0%, +9.4%, +6.2%, and +3.4%. Without these strong upswings, the index would have lost a staggering 31.1% of its value over this period rather than the 4.2% aggregate drop that it did experience.

Our results show that, out of the tested models, the 2T-POT Hawkes model produces most reliably accurate forecasts in both the extreme left-tail (2.5% quantile or less) and the extreme right-tail (97.5% quantile or greater). This is not only of practical use, but also suggests that the asymmetric Hawkes-type arrival dynamics -which are found to be quantitatively similar to those reported in [21] across a wide range of threshold levels for five of the six indices -are a better approximation of the true data generating process for extreme daily logreturns than GARCH-type variance dynamics.

In Section II, we introduce and reparametrize the 2T-POT Hawkes model and extend its support to the full distribution of X t by introducing a non-stationary parametric intra-threshold bulk distribution subordinated to the arrival dynamics; we also specify the GARCH and GARCH-EVT models used as an industry standard benchmark for quantile forecasting [38] . In Section III, we describe the financial data employed in our analysis. Then, in a novel step, we fit the models using a wide range of exceedance thresholds (from the 1.25% to 25.0% mirrored quantiles). The asymmetries found in the in-sample parameter estimates for the 2T-POT Hawkes model are then compared against those reported in [21] . An extensive out-of-sample validation and comparison of the left-and right-tail quantile-based risk measure forecasts produced by the specified models is performed through convergence tests in Section IV.

In Appendix A, we derive the reparametrization of the Hawkes model in terms of the expected intensity. In turn, Appendix B defines the likelihood function and optimization procedure for the 2T-POT Hawkes model and shows the results of the likelihood ratio tests used for model selection.

We first explicitly define the quantile-based risk measures that are subject to validation in Section IV so that they can be explicitly related to each model specified later in this section. When applied to the left-tail, these measures are known in the financial risk literature as value-atrisk and expected shortfall. However, because we generalize these to include equivalent measures for the right-tail, we adopt the more generic names: conditional quantile and conditional violation expectation.

If a discrete stochastic time series X t is generated according to the conditional cumulative density function (cdf) F X,t , then the left-tail conditional quantile (i.e. the value-at-risk) at the coverage level a q over the holding period from t − 1 to t is

such that a fraction a q of X t are less than Q ← aq,t . Violations of the left-tail conditional quantile are indicated by the left-tail violation series

where

is the Heaviside step function. The right-tail conditional quantile is similarly defined

such that a fraction a q of X t are right-tail violations, as identified by the series, I → aq,t = I[+(X t − Q aq,t )]. Note that the superscripts ← and → are used to denote the left-and right-tail, respectively; henceforth the superscript is used to represent either tail (i.e. either ← or →) in generic expressions and the superscript ↔ denotes a union of the left and right tails.

The conditional quantile has the limitation of providing no information on potential values beyond itself. In the context of finance, this has drawn the interest of many practitioners to expected shortfall (known as lefttail conditional violation expectation under our nomenclature) as an alternative risk measure. Indeed, expected shortfall is now recommended as a risk measure by the Basel Committee on Banking Supervision [44] . The left-and right-tail conditional violation expectations are defined as the conditional expectation of X t given a leftor right-tail violation, respectively:

is the expectation operator and f X,t = dF X,t /dX is the conditional probability density function (pdf) of X t . Note that throughout this paper F and f are used to denote cumulative density functions and probability density functions, respectively. These are accompanied with subscripts to denote the variable or model associated with the distribution (conditional distributions are also indexed in time with the subscript t).

B. Two-tailed peaks-over-threshold (2T-POT)

Hawkes model

The two-tailed peaks-over-threshold (2T-POT) Hawkes model developed in [21] defines two sets of exceedance events in {X t }.

Left-tail exceedances events are defined as the values of {X t } that are less than left-tail exceedance threshold u ← ; this series of events, which is indexed with k, is fully described by the series of left-tail excess magnitudes

Similarly, right-tail exceedance events are defined with respect to the righttail exceedance threshold u → , and are fully specified by the series

In the most general description, the arrivals of left-and right-tail exceedance events are counted by two distinct point processes, which can be viewed as the components of the bivariate point process

where δ(t ) is the Dirac delta function. The arrival rate at time t of events within either point process is the conditional intensity for that process,

The explicit time-dependence of λ (t|H t ) specifies N (t) to be inhomogeneous point processes; Hawkestype behaviour is specified by the conditional dependence on the event history up to the present time t, H t = {(t k , M k ) : t k < t}. If N ← and N → are treated as distinct point processes, the conditional probability of a left-tail (right-tail) exceedance event occurring at time t is

Note, however, that this is incompatible with the fact that the arrivals of these events within X t are mutually exclusive. Moreover, this treatment cannot natively forbid the case where the probability of an exceedance from either tail occurring at time t is found to be greater than one, i.e. p ↔ t = p ← t + p → t > 1. It is established in [21] that, in the context of financial log-returns, these two requirements can be enforced at an insignificant cost to the goodness of fit by using the common intensity 2T-POT Hawkes model in which both types of extreme are counted within the same common point process N ↔ (t), whose arrival rate is given by the one-dimensional common conditional intensity, λ ↔ . Each exceedance event within N ↔ (t) is then stochastically drawn from either tail upon arrival. In the simplest case, it is assumed that it is equally likely that the event is drawn from either tail, in which case the conditional probability of a lefttail (right-tail) exceedance event occurring at time t is

The Hawkes-type arrival dynamics of the common intensity process are constructed through a constrained bivariate model that still allows for asymmetric self-and crossexcitement between asymmetric tails:

where θ u is the parameter vector for the Hawkes exceedance model, µ ↔ is the constant exogenous background intensity for the common arrival process, γ ↔ ≡ [γ ↔← , γ ↔→ ] T is the branching vector, and χ(t|θ u ; H t ) ≡ [χ ← , χ → ] T is the vector of endogenous excitements generated by the arrivals of left-and right-tail exceedance events.

As is illustrated in Fig. 1 , the self-exciting dynamics of the Hawkes process can be understood as a branching process, in which daughter events are triggered by the additional endogenous intensity produced by the arrival of prior mother events. γ ↔ ≡ [γ ↔← , γ ↔→ ] T is called the branching vector, because γ ↔ is the mean number of . Illustration of Hawkes-type self-exciting exceedance arrivals as a branching process. The drift-diffusion-like discrete time series Pt (top panel) is transformed into the pseudo-stationary series of log-differences, Xt = ∆tln Pt = ln Pt − ln Pt−1 = ln Pt/Pt−1 (second panel from top), with threshold-exceeding "extreme" values marked in bolder shading. The 2T-POT Hawkes exceedance model (third panel) describes the common conditional intensity λ ↔ (light orange and dark blue dashed) of left-and right-tail exceedance events as a linear sum of the endogenous excitements χ generated by the arrival of past left-(light red) and right-tail (dark green) exceedances. This may be understood as a branching process (fourth panel) in which the daughter events in generation n + 1 are spawned from the endogenous intensity produced by mother events in generation n.

daughter events in N ↔ that are triggered by a mother event in N . This is so, because the endogenous excitement χ is normalized, such that the expected lifetime contribution of each event in N to χ is one. This normalization also guarantees that the model is uniquely fitted. The process is subcritical (i.e. non-explosive) provided the aggregate branching ratio (γ ↔← + γ ↔→ )/2 is less than 1 [45] .

The components of the endogenous excitement vector χ are the sums of contributions from all past events in each tail, (11) where the time kernel φ is a monotonically decreasing function of the time between the arrival of the past event, t k , and the present, t. Here, this is taken as an exponential decay with constant β ,

This allows for Eq. (11) to be recast in Markov form,

The conditional impact function κ (M |t) is a monotonically increasing function of the excess magnitude M k .

Following the approach of [31, 33] , this is defined so that the intensity jump from the exceedance event arriving at time t k is determined by the contemporaneous value of the conditional cumulative distribution function of excess magnitudes for that tail

where the mark parameter α ≥ 0. When α = 0, larger magnitude events produce greater jumps in the excitement. Crucially, this reduces sensitivity on the choice of threshold value u , since κ (M |t) → (1 + α ) −1 as M → 0. Conversely, when α = 0, κ t becomes unity and an unmarked Hawkes process -in which χ is independent of the magnitudes of past events -is recovered.

Note that E[κ (M |t)] ≡ 1 for all values of α and t. Also note that superscript symbols are used to specify parametric probability distributions: N for the normal distribution, t for the Student's t-distribution, and P for the generalized Pareto distribution.

In accordance with the GPDH theorem [5, 6] , it is assumed that the excess magnitudes are distributed according to a conditional GP distribution, as specified by the cdf F P,

where the shape parameter ξ can specify a range of tail heaviness over three distinct phases: from the finite decay of the Weibull distribution (ξ < 0), through the exponential decay of the Gumbel distribution (ξ = 0), to the increasingly leptokurtic power-law decay of the Fréchet distribution (ξ > 0) [1, 31] . Conditional dependence on the excess (i.e. non-background) intensity of the Hawkes process is introduced via the conditional scale parameter

Thus, when η > 0, larger magnitude events become more likely when the arrival rate of exceedances is high, as is generally observed in financial returns [36] . Conversely, when η = 0 the excess magnitudes are drawn from an unconditional GP distribution with a fixed scale parameter ς . In this paper, the 2T-POT Hawkes model is reparametrized in terms of the expected average intensity a ↔ λ = E[λ ↔ ], with the background intensity being calculated as

The above relationship is derived in Appendix A. It is more efficient to use a ↔ λ as a fitting parameter instead of µ ↔ , since the former is orthogonal to γ ↔ . More significantly, it is typical in the peaks-over-threshold Hawkes literature for the threshold values u to be set equal to the value of in-sample quantiles. Here, this is expressed by the threshold level a u ∈ [0, 0.5], such that the left-tail threshold is equal to the in-sample a u -quantile, u ← =X au ≡ F −1 X,in (a u ), and the right-tail threshold is equal to its mirrored in-sample quantile, u → =X 1−au ≡ F −1 X,in (1 − a u ). It follows that each component of the insample expected average intensity a ↔ λ is asymptotically equal to 2a u d t , where d t is the unit measuring one step in discrete time (in the case of daily log-returns, d t denotes trading days), as the size of the in-sample period tends to infinity. Thus, a ↔ λ = 2a u d t can be used either as an initial value in parameter estimation or, as is done in this paper, used as a fixed constraint. It is shown by likelihood ratio tests in Appendix B that this constraint achieves a parameter reduction of one at negligible cost to the goodness of fit.

The common intensity 2T-POT Hawkes exceedance model is fully specified by the set of parameters, θ u = {γ ↔ , β, ξ, ς, η, α; a u }, where vector quantities are of the form, β ≡ [β ← , β → ] T . This model is hereafter denoted as H 2 (a u ). Note that, if all parameters in θ u are constrained to be symmetric (so that the left-and righttail components of all vector parameters are equal), then the common intensity 2T-POT model is equivalent to the classical one-tailed peaks-over-threshold (1T-POT)

Hawkes model applied to the absolute values of a copy of the original time series that is centred on the midpoint between the thresholds. That is, the set of abso-

and a univariate Hawkes model applied to this exceedance series describes equal self-and crossexcitations between left and right tails that are symmetric in all properties. Hereafter, this is refereed to as the 1T-POT Hawkes exceedance model and is denoted as H 1 (a u ).

It is simple to calculate left-tail (right-tail) conditional quantile at the coverage level a q over the holding period t − 1 to t, i.e. to calculate Q aq,t , using H 2 (a u ) or H 1 (a u ) provided that the conditional probability of a left-tail (right-tail) exceedance estimated by the model is greater than or equal to the coverage level, i.e. p t ≥ a q . In this case, Q aq,t will lie within the GP tail distribution, F P, M,t , specified by the exceedance model. Thus,

Hence,

The conditional violation expectation over the same holding period can then be calculated as

(20) If, however, the left-tail (right-tail) exceedance probability is less than the coverage level (i.e. p ,t < a q ), then the hypothetical conditional quantile would lie within the intra-threshold bulk -outside of the native support of F P, M,t . Hence, it is desirable to extend the support of the POT Hawkes models to the full distribution of X t by including a conditional distribution for the intra-threshold bulk. Even when solely concerned with the forecasting of extreme returns, this bulk distribution becomes increasingly necessary as the coverage level a q approaches the threshold level a u from below, and even more so if forecasts are to be made over more than a single time step. Moreover, the validation methods used in Section IV require that Q aq,t and E aq,t are defined at all values of t; this can only be guaranteed when the full distribution of X t is supported. We therefore introduce a parametric bulk distribution with dynamic parameters that are constrained by the conditional intensity of threshold exceedances. The dynamic bulk distribution is described by the conditional cdf F D B,t , where the superscript D is replaced by the symbol of the specified parametric distribution (N for the normal distribution and t for the Student's tdistribution). F D B,t is transformed by the conditional location and scale parameters, m t and s t , so that its value at the thresholds u matches the probability of an exceedance event at time t as determined by the Hawkes arrival process,

where θ D B is a vector containing the additional unconstrained parameters of the chosen parametric distribution D. The conditional pdf for X t at time t is then a weighted piecewise union of the bulk and tail distributions: C. Generalized autoregressive conditional heteroscedasticity (GARCH)

In this paper, the H D 2 (a u ) and H D 1 (a u ) Hawkes models are compared against the family of generalized autoregressive conditional heteroscedasticity (GARCH) models, which are the standard reduced form models for log-returns in financial forecasting [38] . GARCH models have earned this status due to their parsimonious description of two key stylized facts of financial returns: volatility clustering, which states that the standard deviation of log-returns (known as the volatility) is non-constant and exhibits significant positive autocorrelation, and the leverage effect, which states that volatility increases when returns become more negative [36, 38] . These models take the form of a conditional volatility process in which a real univariate discrete time series is generated as

where µ is the unconditional mean, σ t is the conditional volatility, and t are independent and identically distributed (i.i.d.) random innovations drawn from a parametric distribution F D , typically with zero mean and unit variance. The GARCH(p, r, q) model describes the conditional variance σ 2 t with an autoregressive moving average (ARMA) process

where ω is the minimum conditional variance and {α i , β j , γ k } are the ARCH coefficients. When r > 0, Eq. (24) is a GJR-GARCH model [46] that accounts for the leverage effect through the Heaviside step function.

In this paper, we consider the GARCH(p = 1, r, q = 1) model with (r = 1) and without (r = 0) the leverage effect. For the innovation distribution, we consider the unit normal distribution (F N = F N0,1 ) and the unit Student's t-distribution with ν degrees of freedom (F t = F t0,1,ν ). Hereafter, we label these specifications of the GARCH model as G D r (0).

When comparing the Hawkes models specified in Section II B to the standard non-EVT GARCH models specified in Section II C 1 there are two main sources of difference. First, the dynamics of the Hawkes model are driven solely by the excitation from exceedance events, with the location and scale parameters of the bulk distribution being wholly subordinated to the conditional intensity. Conversely, GARCH models assume that each innovation in t contributes to the conditional volatility in proportion to its magnitude. The second source of difference is that the Hawkes models incorporate asymmetric extreme value distributions in the form of GP tails, whereas the GARCH models assume that X t is fully supported by one continuous (typically symmetric) parametric distribution modulated by the conditional volatility.

The first distinction can be isolated by using a GARCH-EVT model [41, 42] to negate the second. In this updated model, the innovation series t is now described by a stationary piecewise distribution of the form

where f D B is the pdf of a continuous parametric distribution of zero mean and unit variance 1 and the GP tail distributions f P, M are parameterized equivalently to 1 Note that f D will have a different variance compared to f D B .

Eq. (15) except that the scale parameter remains constant and is therefore denoted as ς . Thus,

The innovation thresholds are set according to the threshold level a u , such that u ← ≡ F D B −1 (a u ) and

. Thus, when a u = 0, the standard non-EVT GARCH models are recovered. Accordingly, the GARCH-EVT model is labelled as G D r (a u ). Estimates for the innovation tail parameters,ξ andς , are obtained by maximum likelihood (ML) estimation over the estimated innovationsˆ t = (X t −μ)/σ t [41] (here, we use hat accents to denote observed or estimated values).

The models specified in Section II are applied to the daily log-returns of six international large cap equity indices: the S&P 500 (SPX) and Dow Jones Industrial Average (DJI) from the U.S.A., the DAX 30 (DAX) of Germany, the CAC 40 (CAC) of France, the Nikkei 225 (NKX) of Japan, and the Hang Seng index (HSI) of Hong Kong. These series are widely investigated financial benchmarks that are often perceived as proxies for the broader equity market in the world's three major financial centres: North America, Western Europe, and East Asia. We specifically use the daily log-returns over the period beginning 1975-01-01 2 and ending 2021-11-20. This is divided into an in-sample training period and an out-of-sample forecasting period beginning 2015-01-01. The in-sample period is a forty-year span that includes among other notable episodes the 1987 Black Monday crash, the 1997-8 Asian Financial Crisis, the 2000-2002 Tech Bubble crash, and the 2007-8 Global Financial Crisis. The out-of-sample period encompasses more than 82 months of financial data and includes the severe fluctuations caused by the outbreak of the Covid-19 pandemic in March 2020 and its aftermath. The data were sourced from the stooq.pl online database [47] ; summary statistics are provided in Table I . We note that, for both the in-and out-of-sample periods, each of the six series are not of the exact same length T due to idiosyncratic holidays and suspensions. 2 The official base date for both the DAX 30 and CAC 40 indices is 1987-12-31; however, both of these series can be extended backwards using their direct predecessors: the former DAX index and the Insee de la Bourse de Paris, respectively.

As is discussed in Section II B 1, the exceedance thresholds u of the POT Hawkes models are determined by the choice of threshold level a u , which is regarded as a tuneable parameter and is expressed as a quantile. In the simplest EVT applications, where an unconditional GP distribution is fitted to a stationary tail, threshold selection is considered as a classic bias-variance tradeoff: a less extreme threshold (i.e. higher threshold level a u ) ensures a greater number of observations in the tail; reducing noise and improving the stability of parameter estimates. However, the asymptotic nature of the GPDH theorem means that the exceedance distribution is better approximated by the GP distribution as the threshold becomes more extreme (i.e. as the threshold level a u is lowered to 0). Diagnostic tools exist for the latter issue, and these are often used to inform threshold selection in this simplest case. However, this is complicated by the introduction of Hawkes-type arrival dynamics: now, the threshold additionally determines not only the expected distribution of the conditional intensity λ ↔ , but also which values of X t provide information to the arrivals process and which do not. At one level, this complicates the aforementioned bias-variance trade-off due to the inhomogeneous arrivals of tail events (and thus of information density). At the same time, this may also introduce distinct phases as a function of a u in which the 2T-POT Hawkes model identifies signals from the data generating mechanisms of the underlying system. This could be evidenced through a phase transition along a u in the fitted parameters or in the forecasting accuracy, either of which could provide a physically meaningful definition of an extreme event in the combined context of arrivals and magnitudes. To explore this, we calibrate the threshold-based models across a wide range of threshold levels: a u = 0.0125k u , where k u ∈ Z ∪ [1, 20] . We believe this is a novel contribution in the application of POT Hawkes models to financial returns, since, to our knowledge, previous literature has only considered a single (arbitrary) threshold level per study, typically within the range 0.025 ≤ a u ≤ 0.1.

The H 2 (a u ) and H 1 (a u ) Hawkes models were calibrated across this range of threshold levels using the estimation procedure described in Appendix B. Likelihood ratio tests were used to select which variant of the model would be used in the later forecasting validation: this selected for the common intensity model with the a ↔ λ = 2a u d −1 t constraint applied and for a Student's tdistributed bulk. Tables containing the results of these likelihood ratio tests are included in Appendix B along with a graphical presentation of the estimated parameters (with standard errors) of the selected H t 2 (a u ) model as a function of a u . The fit of the exceedance model can be assessed through the residual analysis methods used in [21] . These are summarized graphically in Fig. 2 for the H 2 (a u = 0.025) exceedance model fitted to the DAX daily log-returns. A further aim of fitting the Hawkes exceedance models across a wide range of threshold levels a u is to verify whether the asymmetric Hawkes arrival dynamics reported in [21] (γ ↔← /γ ↔→ = 2.2 ± 0.5 andβ ← /β → = 4.6 ± 1.2) persist across this range. It is anticipated that the parameters will converge to symmetry as a u → 0.5, since, in this limiting case, every value within X t will qualify as an exceedance event: thus, the conditional intensity would be fixed and entirely attributed to the exogenous background, making excitation from either tail equally redundant. Fig. 3 shows that, with some exceptions, the estimated asymmetries in the arrival parameters are remarkably stable along a u and between the different indices. A slight convergence towards symmetry in both parameters is generally observed as a u increases, but there is no evidence of a sharp transition within the investigated range. The left-over-right component ra-tios are quantitatively similar to those reported in [21] in the vast majority of cases and are never observed to invert. A notable outlier is the HSI branching vector, which converges to symmetry at a u = 0.1. This possibly reflects that the HSI had a notably higher average volatility in the in-sample period compared to the other indices, which corresponds to the fact that the HSI was a less developed market in this period and, therefore, possibly should be regarded as a separate class. It is also noted that the NKX and HSI decay constant asymmetriesβ ← /β → are much larger at low a u compared to the other indices.

Overall, these results add further empirical support to hypothesis of [21] that there is a temporal aspect to the leverage effect observed in financial daily log-returns, such that the impact of losses on future extremes is not only greater but also more immediate. This import- 

Asymmetries in the in-sample estimated Hawkes arrival parameters for H2(au). Estimated left-(light orange) and right-tail (dark blue) components of the branching vector γ ↔ (first row, thin black line is the branching ratio) and decay constant β (third row) are shown above the left-over-right component ratios (second and fourth rows).

ant structural feature contributes to the forecasting of exceedance arrivals, which will be seen in the relative forecasting performance of H 2 (a u ) compared to the fully symmetric H 1 (a u ) model in Section IV.

Six variants of the models specified in Section IIthe H t 2 (a u ) and H t 1 (a u ) Hawkes models, the G t 1 (a u ) GARCH-EVT model, and three non-EVT GARCH mod-

-are used to produce next step ahead forecasts of the left-and right-tail conditional quantiles (Q aq,t ) and conditional violation expectations (Ê aq,t ); the accuracy of these forecasts is then validated and compared through out-of-sample convergence tests. Forecasts are evaluated across the range of coverage levels: a q = 0.0025k q , where k q ∈ Z ∪ [1, 60] . This is a much wider and finer range of coverage levels than has been investigated in similar analysis in previous literature [35, 42, 48, 49] : it is intended to more fully compare where the relative advantages of each model lie and to explore how this depends on the threshold level a u . Since each of the convergence tests are defined identically for the left-and the right-tail, the superscript is dropped from I aq,t , Q aq,t , and E aq,t in the remainder of Section IV for clarity unless explicitly required.

The first set of convergence tests evaluate the accuracy of the conditional quantile forecastsQ aq,t by considering the series of observed violationsÎ aq,t . If the conditional quantile forecasts are perfectly accurate (i.e.Q aq,t = Q aq,t ∀t), then the arrival process of violations is a Poisson point process,

All of the convergence tests in Section IV A derive their null hypothesis from Eq. (27).

The unconditional convergence (UC) test [50] interrogates the null hypothesis that the proportion of violations π is equal to the assumed coverage level a q (H 0 : π = a q ). The UC test is formulated as a likelihood ratio test which compares two Bernoulli likelihood functions. Asymptotically, as the total number of observations in the sample, T , goes to infinity, the test statistic LR UC is distributed as χ 2 with one degree of freedom:

where T 1 = tÎ aq,t is the observed number of violations in the sample of length T . H 0 is rejected if the con-ditional quantile forecastsQ aq,t have a significant bias, such that they consistently under-or overestimate the true conditional quantiles Q aq,t in the sample.

The conditional coverage (CC) test [51] seeks to verify both the correct coverage and the independence of violations over consecutive observations. The process of violations is described by a first-order Markov model, with the CC test based upon the estimated transition matrix π 00π01 π 10π11 = T 00 / (T 00 + T 01 ) T 01 / (T 00 + T 01 )

whereπ ij denotes the estimated conditional probability of there being a violation (j = 1) or no violation (j = 0) at t given that there was (i = 1) or was not (i = 0) a violation at t−1, and where T ij , for i ∈ {0, 1}, j ∈ {0, 1}, is the number of observations whereÎ aq,t = j|Î aq,t−1 = i. The null hypothesis of the CC test is that the conditional probability of a violation at t with and without a violation at t − 1 are both equal to the assumed coverage level, i.e. H 0 : π 01 = π 11 = a q . As the total number of observations T goes to infinity, the test statistic LR CC is asymptotically distributed as a χ 2 with two degrees of freedom:

T00πT 01 01 (1 −π 11 )

T10πT 11 11 ∼ χ 2 2 .

(30) H 0 is rejected if either the conditional quantile forecasts at time t that follow a violation at t−1 (Q aq,t |Î aq,t−1 = 1) or that follow no violation at t−1 (Q aq,t |Î aq,t−1 = 0) have a significant overall bias in the sample.

The dynamic quantile (DQ J ) test [52] is designed to detect higher-order autocorrelation in the series of violations as well as dependence on other explanatory variables. The test is based upon the hit function,

Under the correctly specified model, the seriesĤit aq,t should be i.i.d. and have zero mean. Accordingly,Ĥit aq,t should be independent of lagged values of itself and also independent of the contemporaneous conditional quantilê Q aq,t , such that the conditional expectation of Hit aq,t should be 0 regardless of any such information available at t − 1. The DQ J test used here is derived as the Wald statistic from an auxiliary regression:

where the DQ J null hypothesis is H 0 : φ k = 0 ∀k ∈ Z 0 ∪ [0, 1 + J]. In other words, the null states that the observed violation coverage probability is equal to the assumed coverage level (φ 0 = 0) and that there is no dependence of Hit aq,t on the 1 + J explanatory variables. The DQ test statistic is asymptotically χ 2 distributed with 2 + J degrees of freedom:

whereĤit aq is a 1×T vector containing the series Hit aq,t observed within the sample and A is a T × (2 + J) matrix comprised of the observed sample series of each explanatory variable.

B. Conditional violation expectation convergence tests

The conditional violation expectation forecastsÊ aq,t are evaluated by testing the null hypothesis that the standardized discrepancies

at violations (i.e. when I aq,t = 1) have zero mean [41] , i.e. H 0 : t D aq,t I aq,t /T 1 = 0. Assumptions about the distributions of the standardized discrepancies are avoided by employing the dependent circular block bootstrap used in [49, 53] . The null hypothesis is subject to a two-tailed test against the alternative hypothesis, H 1 : t D aq,t I aq,t = T 1 = 0. Thus, the null is rejected when there is a significant bias in the conditional violation expectation forecastsÊ aq,t over the observed violations (Î aq,t = 1) within the sample.

The results of the convergence tests are summarized in Table II , which compares for each test the average p-values under each model within six bands of the coverage level a q . We are primarily focused on the results for the lowest coverage band (0 < a q ≤ 0.025), which corresponds to forecasts of the most extreme log-returns and therefore to the greatest risk. For the non-EVT (a u = 0) GARCH models, Table II simply quotes the mean p-values averaged over all values of a q within the coverage band a 0 q < a q ≤ a 1 q across all six indices. For the EVT (a u > 0) models, the p-values must be aggregated in a way that allows for meaningful comparisons against the non-EVT models while also indicating the sensitivity of forecasting accuracy on threshold selection. We therefore use the following procedure. Table II . Convergence test p-values averaged within coverage level bands a 0 q < aq ≤ a 1 q and between the six indices. For the EVT (au > 0) models, the median average p-value across au within each coverage band is taken and averaged across all six indices; the corresponding average 10% and 90% quantiles are shown below this in parentheses. Higher average p-value corresponds to the better performance on the test; the highest within each coverage band in each tail are highlighted in bold. 1. The mean p-value within each coverage band is calculated independently for each model at each value of the threshold level a u for each of the six indices.

2. For each model, index, and coverage band, step 1 produces 20 mean p-values corresponding to each value of threshold level a u . The 10% quantile, 90% quantile, and median from these 20 values are taken; these represent the forecasting accuracy when the threshold level is poorly selected, well selected, and neither well nor poorly selected, respectively.

3. The quantiles calculated in step 2 are averaged across the six indices to produce the average pvalues quoted in Table II . Table II , it is observed for the left-tail that, within the class of GARCH models, the more complex ones do produce more accurate forecasts -as measured by higher average p-values -within the lowest (i.e. most extreme) coverage level band. This demonstrates that each of the additional features -the leptokurtic innovation distribution [G t 0 (0)], the leverage effect in the variance dynamics [G t 1 (0)], and the asymmetric GP tails [G t 1 (a u )] -capture significant aspects of the data generating process for extreme left-tail daily log-returns. However, for the right-tail extremes, the simplest G N 0 (0) model outperforms the more complex variants in all of the tests based on the conditional quantile (UC, CC, and DQ 4 ) within the lowest coverage band. It is evident that the Student's t-distribution used for the innovations t in the more complex GARCH models overestimates the heaviness of the right-tail and so provides a worse approximation than the normal distribution. This is an expected problem for a symmetric innovation distribution that is fitted to the full distribution of observations in one step: the fit to the right-tail is compromised by the significantly heavier left-tail. By contrast, the asymmetric GP tails of the GARCH-EVT model are fitted to each tail independently, and so the negative impact on the forecasting accuracy of G t 1 (a u ) in both tails is mitigated. Within the lowest coverage level band for each tail, H t 2 (a u ) is consistently found to have the highest average p-values for the UC and CC tests compared to all other models. Against the other EVT models, H t 2 (a u ) returns higher average p-values at each of the performance quantiles along a u in Table II , showing that it retains some advantage regardless of whether the thresholds of these models are well chosen or not. Under the median performing thresholds, H t 2 (a u ) greatly outperforms all non-EVT GARCH models in the left-tail and all but the G N 0 (0) model in the right-tail [H t 2 (a u ) regains a significant supremacy in the right-tail when the thresholds are better selected]. Simply stated, the results for the UC and CC tests show that, on average across the six indices, the asymmetric H t 2 (a u ) Hawkes model produced the most reliably accurate forecasts of left-and righttail extreme conditional quantilesQ aq≤0.025,t within the out-of-sample period. The advantage that H t 2 (a u ) holds over H t 1 (a u ) within this most extreme coverage band is especially pronounced, meaning that the asymmetries in the arrival dynamics and GP tail distributions identified by the 2T-POT Hawkes model provide a significant gain in predictive power with respect to extreme daily logreturns. That H t 2 (a u ) outperforms G t 1 (a u ) highlights the significance of the asymmetric arrival dynamics.

Within the higher coverage level bands, different trends in the average UC and CC test p-values emerge within each tail. In both tails, the gulf in performance between the H t 2 (a u ) and H t 1 (a u ) Hawkes models vanishes (and even inverts on occasion) as a q increases; this is slightly more pronounced in the right-tail. In the lefttail, the POT Hawkes models become less accurate than the GARCH and GARCH-EVT models for a q > 0.025 except at the most well chosen thresholds. Conversely, in the right-tail, the POT Hawkes models retain an advantage over the GARCH models in all coverage bands -one that is notably larger for the CC test than for the UC test. That is, the most accurate forecasts of nonextreme conditional quantilesQ aq>0.025,t were produced by G t 1 (a u ) in the left-tail and by H t 2 (a u ) in the right-tail. To understand the aggregated results for the UC and CC tests in Table II , we examine the visualization of the full results in Figs. 4 and 5. A consistent tuning relationship is observed in the left-tail for the POT Hawkes models across the six indices, wherein there is an approximately linear relationship between coverage level a q and the optimal threshold level a u for forecasting accuracy at that coverage level. By contrast, the pattern of forecasting accuracy for G t 1 (a u ) is much less consistent across the different indices: while less dependent on the tuning of a u , peak accuracy is much more inconsistent along a q , typically being concentrated within limited ranges that are different for each of the indices. Practically, the stable tuning relationship observed for H t 2 (a u ) is of greater utility to forecasters, since it assures reliably accurate forecasting of left-tail quantiles over a much larger range of coverage levels a q , and particularly at the most extreme edge (a q → 0). Moreover, since both models feature asymmetric conditional GP tails, this difference must be attributed to the description of asymmetric Hawkes-type arrival dynamics versus GARCH-type variance dynamics.

In the right-tail, the tuning relationship for the POT Hawkes models is found to be inverted, such that the most accurate forecasts for high coverage levels are achieved when the threshold level a u is low and vice versa. Considering the asymmetries in the arrival parameters reported in Section III B, this likely reflects that less extreme (a q > 0.05) right-tail observations are frequently triggered by the preceding arrivals of more extreme (a u < 0.05) left-tail observations. Despite this unexpected pattern, the right-tail forecasting accuracy of H t 2 (a u ) retains many of the practical advantages compared to G t 1 (a u ) as observed in the left-tail, namely greater consistency between the different indices and approximate tuning relationships that allow for maximally effective forecasting over a wider range of coverage levels a q . Indeed, it is clear that the average p-values of G t 1 (a u ) for the right-tail reported in Table II these observations are robust with respect to lag-1 violation independence.

G t 1 (0) G t 0 (0) G 0 (0) a u G t 1 (a u ) a u H t 1 (a u )

G t 1 (0) G t 0 (0) G 0 (0) a u G t 1 (a u ) a u H t 1 (a u ) a u H t 2 (a u )(b) CAC G t 1 (0) G t 0 (0) G 0 (0) a u G t 1 (a u ) a u H t 1 (a u )

It is expected that the DQ 4 test might be less favourable to the POT Hawkes models due to their inherently less smooth response to extreme log-returns. Violations at time t − 1 (I aq,t−1 = 1) are more likely to also be exceedance events within N ↔ than are nonviolations (I aq,t−1 = 0). This would tend to cause larger increases in the forecast conditional quantileQ aq,t at time t due to the jump-like response of the Hawkes model to exceedances, causing a negative bias in the coefficient φ 1 in Eq. (32) . Nevertheless, the average p-values in Table II show that H t 2 (a u ) and G t 1 (a u ) perform comparably well in this test within the lowest coverage band. The most significant distinctions are found in the righttail, where H t 2 (a u ) performs notably better under well chosen thresholds, but notably worse under poorly chosen thresholds. As the coverage level increases, the performance of H t 2 (a u ) quickly declines relative to G t 1 (a u ) in almost all conditions. The exception is found in the righttail under well chosen thresholds, where H t 2 (a u ) retains a significant advantage for a q ≤ 0.075. Thus, as with the UC and CC tests, the 2T-POT Hawkes model is found to be especially proficient at forecasting right-tail conditional quantilesQ → aq,t compared against the GARCH-EVT model.

The ZMD test is the only test concerned with the accuracy of the conditional violation expectation forecastŝ E aq,t ; consequently, it places more emphasis on the accuracy of the tail distribution compared to the others. This is evident when comparing the average ZMD test pvalues under H t 2 (a u ) than H t 1 (a u ) in Table II : the former are consistently higher in both tails because of the significant asymmetries found in the shape parameterξ of the GP tail distributions (see Fig. 7 in Appendix B). In the lowest Table II coverage band, H t 2 (a u ) and G t 1 (a u ) show comparable performance in the ZMD test: in the left-tail, the two are only distinguished by a slight advantage for H t 2 (a u ) under well selected thresholds; in the right-tail, the two models are equally accurate under well selected thresholds, but the relative performance of H t 2 (a u ) declines as threshold selection becomes increasingly poor. Putting this in practical terms, the asymmetric 2T-POT Hawkes model at peak performance (i.e. under well chosen thresholds) was found to produce the most accurate next step ahead forecasts of the extreme (0 < a q ≤ 0.025) left-tail conditional violation expect-ationÊ ← aq,t (a.k.a. expected shortfall) and the near-joint most accurate forecasts (next to G t 1 (a u ) at peak performance) of the equivalent extreme right-tail conditional violation expectationÊ → aq≤0.025,t . Within the higher coverage bands, H t 2 (a u ) maintains a significant advantage over G t 1 (a u ) in the left-tail, while the opposite holds in the right-tail. Curiously, this inverts the pattern observed in the three previous convergence tests, wherein H t 2 (a u ) maintained a relative advantage over G t 1 (a u ) in the righttail as a q increased but not so in the left-tail. This is likely due to the simple fact the other tests -which examine the conditional quantile forecastsQ aq,t -consider all observations in X t , whereas the ZMD test only considers observations that are also violations (Î aq,t = 1). Fig. 6 visualizes the complete ZMD test p-values for three of the indices. In comparison to the UC test visualized in Figs. 4 and 5 , the ZMD test performance shows much weaker sensitivity on the threshold level a u in all cases. It is also apparent that the test is affected by low sample size at the smallest values of a u . Indeed, there is sometimes no defined value for the test statistic at the lowest values of a q because there are not enough violations at this coverage level within the out-of-sample period to perform the circular block bootstrap. Notably, this is only observed in the right-tail -indicating an asymmetric bias in all models that results in a systemic overestimation of the most extreme right-tail conditional quantilesQ → aq≤0.005,t (and perhaps also a systemic underestimation ofQ ← aq≤0.005,t as well).

We have developed the 2T-POT Hawkes model [21] as a dynamic extreme value analysis approach to quantile forecasting for both the left and right tails of the same real univariate discrete time series X t . By introducing a non-stationary parametric bulk distribution subordinated to the arrival dynamics, we extended the support of the 2T-POT Hawkes model to the full conditional distribution of X t ; this guarantees that left-and right-tail quantile-based risk measures at the coverage level a q are always defined ∀a q ∈ [0, 1]. Our reparameterization in terms of the expected average intensity a λ constrained by the threshold level a u achieves a dimension reduction of one at a negligible cost to the goodness of fit, as verified by in-and out-of-sample likelihood ratio tests.

We fitted the 2T-POT Hawkes model to the daily log-returns of six international high cap equity indices over the in-sample period 1975-01-01 to 2015-01-01 across a wide range of exceedance thresholds, as set by the threshold level a u = 0.0125k u , where k u ∈ Z ∪ [1, 20] . The significant asymmetries in the estimated branching vectorγ ↔ and the decay constantβ reported in [21] were reproduced here and were quantitatively similar across the six indices; moreover, these asymmetries were found to be stable over the majority of the tested threshold levels a u . This adds further empirical support to notion of a temporal leverage effect in which the impact of losses is not only greater but also more immediate.

We compared the next step ahead quantile forecasting accuracy of the asymmetric 2T-POT Hawkes model against both the symmetric single-tailed peaks-overthreshold Hawkes model and against industry standard reduced form models in the form of GARCH and GARCH-EVT. This was performed through an extensive out-of-sample (2015-01-01 to 2021-11-20) validation using convergence tests that evaluated the accuracy and independence of forecasts of the left-and right-tail conditional quantilesQ aq,t and the conditional violation ex-pectationsÊ aq,t . This greatly expanded upon similar analysis in previous literature [35, 42, 48, 49] : both by extending the analysis to the right-tail and by evaluating forecasts over a much wider and finer range of coverage levels a q = 0.0025k q , where k q ∈ Z ∪ [1, 60].

Overall, our asymmetric 2T-POT Hawkes model [H t 2 (a u )] was found to produce the most reliably accurate out-of-sample forecasts of bothQ aq,t andÊ aq,t at the most extreme coverage levels (a q ≤ 0.025) in both tails; moreover, our model was also the most consistent in its performance across the different indices. The comparison against the symmetric 1T-POT Hawkes model [H t 1 (a u )] confirms that incorporating the left-right-tail asymmetries within a POT Hawkes model provides a general and demonstrable dividend in predictive power. By controlling for the effect of asymmetric generalized tails, the comparison with the GARCH-EVT model highlights that the improved forecasting accuracy offered by the 2T-POT model in the case of extreme quantiles can be attributed to its asymmetric Hawkes-type arrival dynamics being a better approximation of the true data generating process for these observations than GARCH-type variance dynamics.

Possible future developments of this work include the extension of the analysis to multi-step ahead aggregate forecasts of the same quantile-based risk measures. This is a natural application for the 2T-POT Hawkes model, since accurate single-step forecasts for one tail would have a compounding impact on the multi-step aggregate forecasts of both tails. To further improve forecasting accuracy, the 2T-POT Hawkes model could be developed through the incorporation of explanatory variables as sources of non-constant exogenous intensity. Candidates relevant to the equity indices discussed here include volatility indices such as the CBOE Volatility Index (VIX). Finally, possible generating mechanisms for the universal asymmetries observed across the six indices may be explored by fitting the 2T-POT Hawkes model to the output of relevant agent-based models (ABMs). In the univariate Hawkes process, λ = µ + γχ, each mother event spawns γ daughter events on average. If the original (0-th) generation of events is triggered by the exogenous background intensity µ and every subsequent generation by the endogenous excitement produced by the preceding generation, then the n-th generation events will arrive with an average intensity γ n µ. Thus, by summation, the expected average intensity for the whole process is

In Eq. (A1), the infinite sum ∞ n=0 γ n = (1 − γ) −1 is the average number of events in a full lineage. Equivalently, it may be said that 1 − γ is the proportion of all events that are in the 0-th generation, hence

In the multivariate case, λ = µ + Γχ, the full lineage of the branching matrix is

where Q and D are the matrices of eigenvectors and eigenvalues of Γ, respectively, and I is the identity matrix. By analogy with Eq. (A1),

and, therefore,

The common intensity model is implemented as a constrained case of the bivariate process,

in which a ← λ = a → λ = a ↔ λ /2, µ ← = µ → = µ ↔ /2, and γ ← = γ → = γ ↔ /2. Substituting these into Eq. (A5) yields Eq. (17) . (B5) Fig. 7 shows the estimated parameters of the H t 2 (a u ) model as a function of a u . The standard errors of the parameter estimates are obtained by finite difference approximation of the Hessian matrix.

Alternative parameterizations of the 2T-POT Hawkes model are compared through likelihood ratio tests in order to select the appropriate parameterization for the analysis in Section IV. These tests, which are summarized in Tables III to V, select for the common intensity model with the a ↔ λ = 2a u d −1 t constraint applied and for a Student's t-distributed bulk. 

Rejections of H0 at the 95% confidence level are highlighted in bold. au SPX DJI DAX CAC NKX HSI SPX DJI DAX CAC NKX HSI 0.0125 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 8.0E-01 9.2E-01 8.3E-01 8.1E-01 7.1E-01 0.0250 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 7.2E-01 7.6E-01 8.8E-01 7.8E-01 8.1E-01 7.3E-01 0.0375 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.6E-01 6.9E-01 8.3E-01 7.8E-01 8.5E-01 7.0E-01 0.0500 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.3E-01 6.5E-01 8.3E-01 6.6E-01 8.8E-01 7.1E-01 0.0625 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.4E-01 6.2E-01 8.0E-01 5.9E-01 8.9E-01 6.7E-01 0.0750 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 7.8E-01 5.8E-01 1.0E+00 6.5E-01 0.0875 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.2E-01 7.2E-01 7.8E-01 5.4E-01 9.3E-01 6.2E-01 0.1000 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.3E-01 6.8E-01 7.4E-01 5.5E-01 8.9E-01 6.6E-01 0.1125 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.1E-01 6.3E-01 7.3E-01 1.0E+00 9.8E-01 5.6E-01 0.1250 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 5.6E-01 5.5E-01 7.4E-01 4.3E-01 9.6E-01 4.6E-01 0.1375 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 5.3E-01 4.6E-01 7.1E-01 4.2E-01 1.0E+00 1.0E+00 0.1500 1.0E+00 1.0E+00 1.0E+00 1.0E+00 6.2E-230 1.0E+00 6.5E-01 1.0E+00 6.6E-01 3.4E-01 2.1E-44 3.9E-01 0.1625 1.0E+00 1.0E+00 1.0E+00 1.0E+00 2.9E-302 1.0E+00 1.0E+00 3.0E-01 1.0E+00 1.0E+00 1.8E-58 3.4E-01 0.1750 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 4.7E-01 2.8E-01 6.4E-01 2.3E-01 1.0E+00 2.9E-01 0.1875 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 4.0E-01 1.9E-01 5.7E-01 2.0E-01 1.0E+00 2.4E-01 0.2000 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.6E-01 5.0E-01 1.8E-01 1.0E+00 1.8E-01 0.2125 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 3.2E-01 1.8E-01 4.4E-01 1.3E-01 1.0E+00 1.7E-01 0.2250 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 3.5E-01 1.0E+00 3.9E-01 1.1E-01 1.0E+00 1.3E-01 0.2375 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.0E+00 2.6E-01 1.0E+00 2.9E-01 8.3E-02 1.0E+00 1.0E-01 0.2500 1.0E+00 6.4E-06 1.0E+00 1.0E+00 1.0E+00 1.0E+00 1.7E-01 1.1E-02 3.0E-01 6.5E-02 1.0E+00 7.5E-02 Figs. S1 to S6 provide a visual summary of the residual analysis for the H 2 (a u = 0.025) Hawkes exceedance model fitted to each of the six indices. Fig. S3 is included in the main paper as Fig. 2 . Figs. S11 and S12 for the dynamic quantile (DQ 4 ) test, and Figs. S13 and S14 for the zero mean discrepancy (ZMD) test. Figs. S7, S8, and S14 are included in the main paper as Figs. 4-6. 

Critical Phenomena in Natural Sciences

An Introduction to Statistical Modeling of Extreme Values

Extreme Value Theory

A review of extreme value threshold estimation and uncertainty quantification

Statistical Inference Using Extreme Order Statistics

Residual Life Time at Great Age

Point Spectra of Some Mutually Exciting Point Processes

Spectra of some self-exciting and mutually exciting point processes

Cluster models for earthquakes: Regional comparisons

Forecasting the magnitude of the largest expected earthquake

A Review of Self-Exciting Spatio-Temporal Point Processes and Their Applications

Hawkes processes and their applications to finance: a review

Recurrent interactions in spiking networks with arbitrary topology

Theory of nonstationary Hawkes processes

Marked point process hotspot maps for homicide and gun crime prediction in Chicago

Improving social harm indices with a modulated Hawkes process

Gang rivalry dynamics via coupled point process networks

Selfexciting point process models for political conflict forecasting

Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates

Identifying exogenous and endogenous activity in social media

Asymmetric excitation of left-and right-tail extreme events probed using a Hawkes model: Application to financial returns

Hawkes jump-diffusions and finance: a brief history and review

Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data

Hawkes processes in finance

Estimating value-at-risk: a point process approach

High-frequency financial data modeling using Hawkes processes

Modelling security market events in continuous time: Intensity based, multivariate point process models

Quantifying reflexivity in financial markets: Toward a prediction of flash crashes

Critical reflexivity in financial markets: a Hawkes process analysis

Branching-ratio approximation for the self-exciting Hawkes process

Modeling multivariate extreme events using self-exciting point processes

Modeling foreign exchange market activity around macroeconomic news: Hawkes-process approach

Interpreting financial market crashes as earthquakes: A new Early Warning System for medium term crashes

Forecasting the variance of stock index returns using jumps and cojumps

Looking at Extremes without Going to Extremes: A New Self-Exciting Probability Model for Extreme Losses in Financial Markets

Empirical properties of asset returns: Stylized facts and statistical issues

Stylized Facts and Simulating Long Range Financial Data

Statistics and Data Analysis for Financial Engineering

Wiley Series in Probability and Statistics

Copulas and time series with long-ranged dependencies

Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach

Value at Risk Estimation Using the GARCH-EVT Approach with Optimal Tail Selection

Multivariate Hawkes processes: an application to financial data

Basel Committee, Minimum capital requirements for Market Risk Contents

The endo-exo problem in high frequency financial price fluctuations and rejecting criticality

On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks

Forecast combinations for value at risk and expected shortfall

Predicting tail-related risk measures: The consequences of using GARCH filters for non-GARCH data

Techniques for Verifying the Accuracy of Risk Measurement Models

Evaluating Interval Forecasts

Automatic Block-Length Selection for the Dependent Bootstrap

The log-likelihood of the 2T-POT Hawkes exceedance model under the parameters θ u over the data X 0:T = {X t |t ∈ Z ∪ [0, T )} can be expressed as the sum over the tails The log-likelihood components for each tail can be further separated: