key: cord-0300804-5kbw0uuv
authors: Merlo, Luca; Petrella, Lea; Raponi, Valentina
title: Forecasting VaR and ES using a joint quantile regression and implications in portfolio allocation
date: 2021-06-11
journal: nan
DOI: 10.1016/j.jbankfin.2021.106248
sha: 2d23aa9408d5837724f0d4313a0c4b0dc2fae281
doc_id: 300804
cord_uid: 5kbw0uuv

In this paper we propose a multivariate quantile regression framework to forecast Value at Risk (VaR) and Expected Shortfall (ES) of multiple financial assets simultaneously, extending Taylor (2019). We generalize the Multivariate Asymmetric Laplace (MAL) joint quantile regression of Petrella and Raponi (2019) to a time-varying setting, which allows us to specify a dynamic process for the evolution of both VaR and ES of each asset. The proposed methodology accounts for the dependence structure among asset returns. By exploiting the properties of the MAL distribution, we then propose a new portfolio optimization method that minimizes the portfolio risk and controls for well-known characteristics of financial data. We evaluate the advantages of the proposed approach on both simulated and real data, using weekly returns on three major stock market indices. We show that our method outperforms other existing models and provides more accurate risk measure forecasts compared to univariate ones.

The events of the ongoing credit crisis and past financial crises have emphasized the necessity for appropriate risk measures. The use of quantitative risk measures has become an essential management tool providing advice, analysis and support for financial and asset management decisions to market participants and regulators. The most widely used risk measure is Value at Risk (VaR).

VaR measures the maximum loss which a financial operator can incur over a defined time horizon and for a given confidence level. Its clear meaning and computational ease made it very popular among practitioners, so much so that it has largely contaminated the banking regulatory framework. However, VaR has a number of drawbacks (Artzner et al. 1997 (Artzner et al. , 1999 . First, VaR does not account for tail risk, i.e. it does not warn us about the size of the losses that occur with a probability lower than the predetermined confidence level. Second, VaR is not a "coherent" risk measure (Artzner et al. 1999 ) since it does not satisfy the sub-additivity property, and hence, it does not take into consideration the benefits of diversification. As a result, investors and risk managers are likely to construct positions with unintended weaknesses that result in greater losses under conditions beyond the VaR level (Yamai and Yoshiba 2005) . Market participants could solve such problems by adopting the Expected Shortfall (ES) risk measure, which is defined as the conditional expectation of exceedances beyond VaR (see Acerbi and Tasche (2002) and Rockafellar and Uryasev (2000) ). Unlike VaR, ES is a coherent risk measure and provides more information on the shape and the heaviness of the tails of the loss distribution. Therefore, ES has gained increasing attention from risk managers, banking regulators and investors as an alternative measure of risk, complementing the VaR measure.

However, despite its interesting properties, and in contrast with VaR, little work exists on modeling ES. This is in part due to the fact that ES is not an "elicitable" measure, in the sense that there does not exist a loss function such that ES is the solution that minimizes the expected loss. Several works have been proposed in the literature to overcome the problem of elicitability (see, e.g., Engle and Manganelli (2004) , Cai and Wang (2008) , Taylor (2008) , Zhu and Galbraith (2011) , Du and Escanciano (2017) , Patton et al. (2019) , Bu et al. (2019) ). Recently, using the results of Fissler and Ziegel (2016) , who show that ES is jointly elicitable with VaR, Taylor (2019) uses the Asymmetric Laplace (AL) distribution to jointly estimate dynamic models for both VaR and ES. In particular, Taylor (2019) shows that the negative of the log-likelihood associated to the AL distribution belongs to the class of loss functions presented in Fissler and Ziegel (2016) , and hence it can be used to estimate and forecast VaR and ES measures in one step. In his paper, the joint estimation of VaR and ES is obtained in a univariate quantile regression framework, exploiting the interesting result that ES can be expressed in terms of the scale parameter of the AL density.

The literature mentioned above, however, has mainly focused on univariate time series, which completely disregards the strong interrelation among assets in financial markets. To capture the degree of tail interdependence between assets, several quantile-based methods have also been proposed to estimate VaR, without however specifying a model for the ES component; see, for example, the relevant works by Baur (2013) ; Bernardi et al. (2015) ; White et al. (2015) ; Kraus and Czado (2017) and Bonaccolto et al. (2019) .

In this paper, we extend the univariate approach of Taylor (2019) to a multivariate framework, with the objective of obtaining joint estimates of both VaR and ES for multiple financial assets simultaneously, accounting for their dependence structure. To this end, we generalize the Multivariate Asymmetric Laplace (MAL) quantile regression approach of Petrella and Raponi (2019) to a time-varying setting, by allowing the parameters of the MAL to vary over time. For each asset, we model the evolution of VaR and ES as functions of the location and scale parameters of the distribution. In particular, for the VaR component, we adopt a Conditional Autoregressive Value at Risk (CAViaR) specification (Engle and Manganelli (2004) ).

The advantages of our methodology are manifold. Firstly, our approach is a joint modelling framework where both the model parameters and the pair (VaR, ES) of multiple returns are estimated simultaneously, generalizing the univariate results of Bassett et al. (2004) and Taylor (2019) . Secondly, our theory captures empirical characteristics of financial data such as peakedness, skewness, and heavy tails (see e.g., Kraus and Litzenberger (1976) ; Friend and Westerfield (1980) and Barone-Adesi (1985) ), without relying on the limitation of normally distributed returns.

The inferential problem is solved by developing a suitable Expectation-Maximization (EM) algorithm, which exploits the mixture representation of the MAL distribution (see Petrella and Raponi (2019) ) properly generalized to the case of time-varying parameters. The finite sample properties of the proposed estimation method are also evaluated using a simulation exercise, where we show the validity and the robustness of our procedure under different data generating processes.

A further contribution of the paper concerns the evaluation of VaR and ES in the context of portfolio optimization (see, e.g, Yiu (2004) and Alexander and Baptista (2008) ). In recent years, the MAL density has attracted wide attention in the literature for its flexibility in modeling financial data (Mittnik and Rachev (1991) ; Kotz et al. (2012) and Paolella (2015) ) and for its interesting properties that can be exploited to derive optimal portfolio allocations (see Zhao et al. (2015) and Shi et al. (2018) ). In the classic Mean-Variance (MV) methodology of Markowitz (1952) , portfolio risk is measured using the standard deviation of the portfolio. However, the MV approach is reasonably applicable only in cases where either returns follow a Gaussian distribution or the investors utility function is quadratic. Given the empirical evidence showing that market participants have a preference for positive skewness and they are more concerned about the downside risk (see Arditti (1971) and Konno and Suzuki (1995) among others), the MAL distribution could represents a more effective tool to select optimal portfolio allocations in the case of risk-averse agents. Therefore, in this paper we exploit the MAL properties to incorporate skewness directly into the portfolio optimization method and to identify the optimal allocation weights. We then compute the corresponding portfolio VaR and ES as a function of the multivariate structure of the data. We prove how this result follows directly from the fact that any linear combination of the MAL components is still AL distributed, with location, skew and scale parameters that are functions of the MAL parameters and the portfolio weights. Therefore, once we obtain the Maximum Likelihood (ML) estimates of the MAL parameters from the proposed dynamic quantile regression model, we fix a desired level of risk for any target portfolio and search for the optimal allocation weights according to the adopted strategy.

Specifically, we consider the Skewness Mean-Variance (SMV) strategy of Zhao et al. (2015) , where the optimal allocation is obtained by minimizing the portfolio variance, while controlling for the skewness of asset returns. However, Zhao et al. (2015) Patton et al. (2019) and Taylor (2019) and extend their approach by introducing a new scoring function based on the MAL distribution. We find that our multivariate method always provides more accurate VaR and ES predictions compared to other well-known approaches, like the Quantile AutoRegression of Koenker and Xiao (2006) and the dynamic quantile regression of Taylor (2019) . Moreover, in line with Taylor (2019) , our results show that the Asymmetric Slope CAViaR specification of Engle and Manganelli (2004) yields the best VaR and ES forecasts for all the three indices at different quantile levels, confirming the existence of relevant asymmetries in the impact of positive and negative returns.

In a second empirical analysis, we aggregate the stock market indices to form a financial portfolio with a predetermined level of risk, and estimate its optimal allocation weights by implementing our new optimization procedure. We then compute the out-of-sample portfolio's VaR and ES and evaluate the predictions using the univariate backtesting procedures of Taylor (2019 ), Nolde et al. (2017 and Patton et al. (2019) . The empirical analysis reveals that our multivariate method produces the lowest average losses compared to other existing strategies based on the multivariate Normal and t-distributions, regardless of the scoring function being used. In addition, we find that the proposed methodology overall yields the highest Sharpe Ratio and the least concentrated portfolio allocations.

The rest of the paper is organized as follows. In Section 2, we introduce the dynamic multiple quantile regression and propose a joint model for VaR and ES. We then illustrate the EM-based ML approach for the simultaneous estimation of VaR and ES. Section 3 develops the portfolio allocation problem. Section 4 introduces a new scoring function for the joint evaluation of VaR and ES forecasts. In Section 5 we discuss the main empirical results, while Section 6 concludes.

All the proofs are provided in Appendix A, while the simulation study is presented in Appendix B.

In this paper we generalize the univariate regression approach of Taylor (2019) . Specifically, by extending the MAL density of Petrella and Raponi (2019) -allowing the location and scale parameters of the MAL to vary over time -we estimate the pair of VaR and ES associated to each asset using a joint quantile regression framework. In this way, we are able to calculate the time-varying VaR and ES simultaneously for all marginal response variables, accounting for possible correlation among the considered assets. For the VaR components, we assume a CAViaR specification (see Engle and Manganelli (2004) ). Parameter estimation is carried out using a suitable EM algorithm as in Petrella and Raponi (2019) , properly extended to deal with the time-varying setting. In this way, the estimated parameters account for tail interdependence among multiple returns and convey this information onto the VaR and ES estimates.

We start by introducing the time-varying joint quantile regression model in Section 2.1, where we consider a dynamic generalization of the MAL density proposed in Petrella and Raponi (2019) .

We then show in Section 2.2 how the resulting time-varying scale parameter of the MAL can be used to model the ES vector and derive a parsimonious approach for simultaneous estimation of VaR and ES in a multidimensional setting. Parameter estimation and the EM algorithm are described in Section 2.3.

be a p-variate response variable and denote by Q Ytj (τ j |F t−1 ) the τ jquantile function of each of the j-th component of Y t , conditional on the information set F t−1 available at time t − 1, for j = 1, ..., p and t = 1, ..., T . Then, for a given τ j , we consider the following autoregressive dynamic:

where ω j = ω j (τ j ), η j = η j (τ j ) and β j = β j (τ j ) = [β 1j , ..., β Kj ] are model parameters that depend on the chosen level τ j and where we suppress the index τ j for simplicity of notation. The dynamic specification in (1) is well-known in the literature as the CAViaR model of Engle and Manganelli (2004) , which attempts to compute the τ -th level VaR by estimating the τ -th level quantile of the asset returns through a conditional autoregressive structure. The function (·) represents the so-called News Impact Curve (NIC), originally introduced by Engle and Ng (1993) . For each j-th component, the NIC function essentially feeds back the last available observation (Y t−1j ) into the present value of the conditional quantile, through the K × 1 parameter vector β j . Following the CAViaR literature, we will consider different specifications for (·) to model the marginal quantiles, which will be described in the next section.

Using matrix notation, the representation in (1) can be embedded in the following multivariate linear regression model:

where t denotes a p×1 vector of error terms, having each marginal quantile (at fixed levels τ 1 , .., τ p , respectively) equal to zero, to ensure that µ t = Q Y t (τ |F t−1 ).

To estimate the regression model in (2), we consider a dynamic generalization of the MAL distribution introduced in Petrella and Raponi (2019) and Kotz et al. (2012) , i.e. we consider the time-varying distribution MAL p µ t , D tξ , D tΣ D t , with density function:

In (3), µ t represents the location parameter vector, D tξ ∈ R p is the scale (or skew) parameter, with D t = diag[δ t1 , δ t2 , ..., δ tp ], δ tj > 0 andξ = [ξ 1 ,ξ 2 , ...,ξ p ] , having generic elementξ j = 1−2τj τj (1−τj ) . Σ is a p × p positive definite matrix such thatΣ =ΛΨΛ, with Ψ having the structure of a correlation matrix 1 andΛ = diag[σ 1 ,σ 1 , ...,σ p ], withσ 2 j = Moreover, as clarified in their paper, the constraintsξ j = 1−2τj τj (1−τj ) andσ 2 j = 2 τj (1−τj ) need to be imposed to guarantee model identifiability (see Petrella and Raponi (2019) , Proposition 2), and also to ensure that the dynamic quantile specification in (1) holds, i.e. P(Y tj < µ tj ) = τ j holds for each j = 1, 2, ..., p.

In addition, when such constraints are satisfied, then each marginal component of the MAL in (3) follows a univariate AL distribution, that is Y tj ∼ AL(µ tj , τ j , δ tj ), where δ tj represents the time-varying scale parameter of Y tj . This allows us to exploit the result of Taylor (2019) , who shows the link between the scale parameter of the AL distribution and the ES risk measure in a univariate framework. By extending these results, we provide new insights on how to estimate conditional VaR and ES jointly in a multidimensional setting, that accounts for correlations between marginals. This is explained in details in the next section.

Following Engle and Manganelli (2004) , the CAViaR specification in (1) allows us to derive the VaR of an asset at level τ j by estimating the corresponding quantile at the τ j -th level, through a conditional autoregressive structure. In the following, we consider several CAViaR formulations, depending on the choice of the NIC function (·). We then extend the idea of Taylor (2019) to a multivariate setting, in order to model and estimate the ES component dynamically.

The CAViaR specifications that we consider are the following:

where ω = [ω 1 , ..., ω p ] , η = [η 1 , ..., η p ] and β = [β 1 , ..., β p ] , with β j = [β 1j , β 2j ] , are unknown parameters to be estimated, and where y + = max(y, 0) and y − = − min(y, 0), denote the positive and negative part of y, respectively.

For the ES component, we exploit the interesting link provided in Bassett et al. (2004) , which relates univariate quantile regression to conditional ES through the following relation:

with 1(·) being the indicator function. Following Taylor (2019), the expression in (7) can be rearranged so that the conditional ES can be expressed in terms of the conditional AL scale parameter δ tj . Specifically, recalling that each marginal of the MAL distribution has a univariate AL density with conditional scale parameter equal to δ tj = E[(Y tj − Q Ytj )(τ j − 1 (Ytj <Q Y tj ) )], then equation (7) reduces to:

.., T, j = 1, 2, ..., p

implying that

In order to ensure that each estimated ES does not cross the corresponding estimated quantile, we model ES in (9) as the product of the quantile and a constant factor (see, e.g., Gourieroux et al. (2012) and Taylor (2019)) as follows:

where γ 0j is an unconstrained parameter to be estimated such that 1+e γ0j is greater than 1 and Y t is assumed with zero mean. We collect the unknown parameters in the vector γ 0 = [γ 01 , ..., γ 0p ] .

As explained in Taylor (2019) , this formulation correctly describes the relationship between ES and VaR for different data generating processes, such as a GARCH process with a Student-t distribution. Therefore, the representation in (10) provides a simple and parsimonious approach to estimate VaR and ES simultaneously in a dynamic framework.

In (10), however, only the quantile is dynamic, while the factor 1+e γ0j remains constant through time. Therefore, in order to generalize this approach, we also consider the alternative formulation for the ES presented in Taylor (2019) , where the difference between the ES and the VaR is modeled using an AutoRegressive (AR) specification as follows:

ES tj = Q Ytj (τ j ) − x tj , t = 1, 2, ..., T, j = 1, 2, ..., p,

where we define the nonnegative parameter γ = [γ 1 , ..., γ p ] , with γ j = [γ 1j , γ 2j , γ 3j ] , to ensure that the VaR and ES estimates do not cross.

In the next section, we show how to estimate model parameters using a ML approach based on a dynamic modification of the EM algorithm proposed by Petrella and Raponi (2019).

Before describing the main steps of the EM algorithm, we first introduce the notation Q t = Q Y t (τ |F t−1 ), D t (γ) andΣ(Ψ) to make clear that the vector Q Y t (τ |F t−1 ) and the matrices D t andΣ depend on the unknown parameters ω, η, β, γ and Ψ, respectively. The derivation of the EM algorithm is based on the Proposition 3 of Petrella and Raponi (2019) , properly extended to deal with the autoregressive structure of the quantile function Q t and the time dependency of the scale matrix D t (γ).

Let Φ = {ω, η, β, γ, Ψ} denote the global set of parameters and defineΦ = {ω,η,β,γ,Ψ} as the corresponding set of parameter estimates. For a given vector τ = [τ 1 , τ 2 , ..., τ p ] , the expected complete log-likelihood function (up to additive constants), given the observed data Y t and the parameter estimatesΦ, is:

where

and where W t follows a standard exponential distribution.

For a given vector τ , the expected complete log-likelihood in (13)-(15) is then maximized with respect to the parameter set Φ, yielding to the M-step updatesΦ = {ω,η,β,γ,Ψ}. Notice that, unlike Petrella and Raponi (2019) , closed form solutions forω,η,β andγ do not exist, due to the autoregressive structure of the data and, therefore, numerical optimization is required. Updated estimates ofΣ(Ψ) can be instead derived using the following expression:

Therefore, the EM algorithm can be implemented as follows:

E-step: Set the iteration number h = 1. Fix the vector τ at the chosen quantile levels τ 1 , ..., τ p of interest and initialize the parameter set Φ = {ω, η, β, γ, Ψ}. Then, givenΦ =Φ

}, at each iteration h calculate the weights:

)] with respect to Φ, and obtain the updated set of parameter estimatesΦ (h+1) .

The optimization procedure is iterated until convergence, that is when the difference between the likelihood function evaluated at two consecutive iterations is smaller than 10 −5 . We initialize the EM algorithm by providing the univariate parameter estimates of Taylor (2019) on each asset, while the initial value for the correlation matrix Ψ in (3) is calibrated using the empirical correlation matrix of the data. We fit the univariate models following the estimation procedure in Engle and Manganelli (2004) and Taylor (2019) . In addition, we also consider a multiple random starts strategy with 100 different starting points to better explore the parameter space, and retain the solution corresponding to the maximum likelihood value. This strategy would prevent convergence issues and avoid the algorithm to be trapped at some local maxima. From an algorithmic point of view, the EM method exploits the Nelder-Mead and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization routines to obtain the updated estimates of β and γ, meanwhile it uses (19) to compute the updated estimate ofΣ.

The computational analysis has been conducted using the R (version 4.0.2) software where the functions to update β, γ andΣ were coded in efficient C++ object-oriented programming.

The validity and the performance of the proposed EM algorithm have been assessed using also a simulation exercise (see Appendix B).

In this section we approach the problem of portfolio allocation. Particularly, we construct the Skewness Mean-Variance (SMV) portfolio of Zhao et al. (2015) , taking into account both for the multivariate structure and the skewness of asset returns. Following Stolfi et al. (2018) and Zhao et al. (2015) , we exploit an interesting property characterizing the MAL distribution in (3). Specifically, we show that any linear combination of its marginal components follows a univariate AL distribution, whose parameters are a function of the MAL parameters in (3). Note that, while the MAL density has so far been regarded as a convenient tool for estimating the marginal quantiles, in this section the MAL distribution is used as a data driven assumption to describe the empirical characteristics of asset returns. As already stated in the Introduction, this choice has been positively accepted in the recent financial literature to detect peakedness, fat-tail, and skewness of financial assets, overcoming possible deficiencies of standard approaches relying, for example, on the Gaussian distribution assumption. We then evaluate the riskiness of the selected portfolio by calculating its corresponding VaR and ES, using the results of Section 2.2.

Let assume that Y t is a p-dimensional random variable describing the joint dynamics of p variables at time t. Let us consider a linear combination (with weights to be determined) of each component of Y t . Then, the following proposition holds.

Assume that Y t follows the distribution in (3). Given such specification, at each time t, investors might be interested in deriving a portfolio Y b t = p j=1 b tj Y tj , by investing a portion b tj of their capital on the asset Y tj , such that p j=1 b tj = 1. Then, under this setting, the result of Proposition 1 can be easily applied, yielding a portfolio with location, skewness and scale parameters equal to, respectively, µ t , τ t and δ t as in (26). Typically, in risk management applications, the skewness parameter is fixed a priori by the researcher at a certain level (constant over time) τ t =τ , as it essentially measures the overall riskiness of a financial product (a portfolio, in our case). Therefore, once we estimate the timevarying MAL parameters from the quantile regression model in (2), for a fixed level of riskτ , the investor's portfolio decision is based on the solution of the selected portfolio strategy. As stated above, to obtain the optimal portfolio allocation, we adopt the SMV strategy of Zhao et al. (2015) , which seeks to minimize the portfolio variance and at the same time control for the skewness of asset returns. Formally, according to Proposition 1, the SMV portfolio solves the following constrained optimization problem:

whereΣ has been introduced in (3) and accounts for the covariance matrix of the returns, while b t denotes the portfolio's weights at time t held by the investor over the period [t, t + 1).

From an empirical point of view, the constraint in (27b) implies that the portfolio weights must be adjusted at each holding period to guarantee that the VaR of the portfolio has constant levelτ ,

Once we obtain the optimal portfolio weights for the period [t, t+1), we can then compute the conditional portfolio's VaR and ES at levelτ by simply applying the result in (8) to the univariate case.

As already explained above, since the parameters µ t , τ t , δ t depend on the parameter estimates of the MAL, information on the dependence structure and on the empirical characteristics embedded in the data is channeled through such estimates into the portfolio's VaR and ES forecasts. This motivates our approach even further, since it could offer an operative and useful tool to help investors and asset managers in deriving optimal portfolio allocations and, at the same time, monitoring multiple VaR and ES jointly.

To assess the performance of VaR and ES predictions jointly, we introduce a new backtesting procedure, based on the multivariate approach discussed in Section 2.

Backtesting techniques are based on quantitative tests which scrutinize model performance in terms of accuracy and precision with respect to a defined criterion. Existing approaches, however, rely on tests that analyze VaR and ES predictions separately, i.e. they only focus on the individual evaluation of one risk measure or the other. VaR evaluation is typically based on coverage tests, which measure the percentage of times that the returns have exceeded the estimated VaR at a chosen probability level τ (see, e.g., the unconditional coverage (LR uc ) test of Kupiec (1995) , the conditional coverage (LR cc ) test of Christoffersen (1998) and the Dynamic Quantile (DQ) test of Engle and Manganelli (2004) ).

To evaluate ES forecasts, the backtesting analysis becomes more complicated since ES is not an elicitable measure (Gneiting, 2011) and, therefore, suitable scoring functions cannot be determined (Taylor, 2019) . The test of McNeil and Frey (2000) is commonly used in this context, which is based on the discrepancy between the observed return and the ES forecast for the periods in which the return exceeds the VaR forecast. Another suitable option is the backtesting procedure of Du and Escanciano (2017) (2019)).

In the following, we provide a new scoring rule that can be used in a multivariate setting, to jointly evaluate VaR and ES forecasts of multiple (and possibly correlated) financial assets. To provide support for our proposal of estimating multiple VaR and ES by maximizing the MAL likelihood, we define a new scoring function (S M AL ) using the negative of the MAL log score:

where • denotes the Hadamard product and I p represents the identity matrix of order p.

Notice that, when p is equal to 1, the S M AL in (28) reduces to the AL log score of Taylor (2019). When p > 1, the loss function S M AL allows us to: (i) perform a joint assessment of the pairs (VaR, ES) specific to each asset and, at the same time, (ii) control for the existing correlation among returns.

In this section we apply the methodology presented in Sections 2 and 3 to real data in order to evaluate and compare its empirical implications with the ones obtained by using a univariate framework. Particularly, we follow Taylor (2019) (2002) test. In a second empirical exercise, we aggregate the market indices to form a financial portfolio and determine its optimal allocation weights by solving the optimization problem described in Section 3.2. We finally compute and assess the resulting portfolio's conditional VaR and ES for the out-of-sample period, which consists of the last 368 observations of the sample.

Our sample is collected from Bloomberg and it consists of 1868 weekly returns for each of the three stock indices. The main summary statistics are displayed below in Table 1 , providing evidence of the well-known stylized facts on fat tails, high kurtosis, serial and cross-sectional correlation that typically characterize financial assets. Moreover, all series exhibit a negative skewness, the Jarque-Bera test significantly rejects the normality assumption, the Ljung-Box test advocates the presence of serial correlation and the Augmented Dickey-Fuller test supports the hypothesis of absence of unit roots. These results clearly motivate us to consider a quantile regression approach as investigative tool.

Using the approach introduced in Section 2, in this section we derive a joint estimation of VaR and ES for the three stock market indices described above. Particularly, we estimate the outof-sample series of VaR and ES by considering the three different specifications in (4), (5) and (6), with both the multiplicative factor in (10) and the AR formulation in (11) To jointly evaluate VaR and ES forecasts associated to each stock market index, in addition to the results of the coverage tests, Table 3 reports the values of the loss functions S F ZN of Nolde et al.

(2017) and S F Z0 of Patton et al. (2019), averaged over the out-of-sample period where:

The losses in (29) and (30) Finally, to reinforce our analysis, we evaluate the forecasting performance of the three CAViaR competing models using the scoring function in (28). Specifically, at each time t, and for the specified level τ , we define by S

M ALt (τ ) the scoring function associated to model j, and denote the difference between the scoring function of model i and model j by ∆

where i, j = 1, 2, 3. We then test for the null hypothesis that E[∆

M AL,t ] < 0 using the Diebold and Mariano (2002) test, for all the pairs of models i and j. If the null hypothesis is rejected, then forecasts delivered by model i are more accurate than those of model j, and therefore model i is preferable than model j. The results of the test, together with the corresponding p-values, are reported in Table 4 . The table clearly shows that the CAViaR-AS specification outperforms both the CAViaR-IG and CAViaR-SAV models at all the three quantile levels and for both the adopted ES formulations of constant multiplicative factor (Panel A) and AR dynamics (Panel B). Therefore, in order to select the best performing model between the two CAViaR-AS in Panels A and B, we apply again the Diebold and Mariano (2002) test on the two competing CAViaR-AS specifications. The results are reported in Table 5 and suggest that the CAViaR-AS model with the ES specified as a constant multiple of the VaR provides the most accurate predictions at all the three quantile levels. This is in line with Taylor (2019) , who also find that the same specification not only produces the smallest losses, but it also delivers the most accurate predictions than all the other competing CAViaR dynamics 2 . These results corroborate the fact that accounting for asymmetries in the autoregressive process of a given quantile improves the model's forecasting ability (see, e.g., Engle and Manganelli (2004) , Xiliang and Xi (2009), Taylor (2005) and Laporta et al. (2018)).

To show the advantages and the different implications of our approach, we compare our results with the ones obtained by considering each asset separately, as if we ignored their possible dependence structure. Specifically, the three CAViaR models are estimated individually for each stock market index using the univariate approach of Taylor (2019) . To assess the performance of the three models and to combine the individual forecasts of the three indices in a single value, we use the sum of the three corresponding AL log scores (see Taylor (2019)) as consistent scoring rule.

That is, at each time t, and for each model j, we define the following scoring function:

where S

ALp,t (τ p ) denotes the AL log-score of Taylor (2019), corresponding to model j and asset p:

As explained in Frongillo and Kash (2015) , summing up the three AL scoring functions would produce a consistent scoring rule in this case, since each function S (j)

ALp,t (τ p ) elicits the pair (VaR, ES) for the corresponding p-th asset (see Fissler and Ziegel (2016) and Taylor (2019)).

Then, as before, we define the difference between the scoring function of model i and model j by

ALt (τ ) and apply the Diebold and Mariano (2002) test to look for the best model in terms of forecasting accuracy. The results are reported in Table 6 (Panels A and B). As shown in the table, the conclusion of the test is now less clear and does not provide any significant evidence in favor of a particular model. This is one of the first advantages of our approach, as we are able to identify a clear hierarchy among competing models.

A second question of interest concerns the "efficiency gain" of the multiple approach compared to the univariate one. In this sense, we would like to test whether taking into account the association 2 To further justify this choice, we also compare the CAViaR-AS model with the Quantile AutoRegression of Koenker and Xiao (2006) . Specifically, we estimate the regression model of Petrella and Raponi (2019) using the lagged returns (at lag 1) of each asset as covariates. Comparing these two models would allow one to evaluate the potential contribution of assuming a CAViaR specification in the quantile dynamics. According to the coverage tests and the scoring functions defined in (29) and (30), we still find a better performance of the CAViaR-AS specifications. structure among the market indices would provide us with better predictions in terms of VaR and ES. To do that, we use the backtesting procedure to identify the most "efficient" model, that is, the model producing the best forecasts according to the Diebold and Mariano (2002) test. To measure the efficiency gain we analyze the difference, if any, in the predictive accuracy between the forecast (VaR, ES) produced by our multivariate approach and the univariate ones. Therefore, for a given CAViaR specification, we test for the difference between the scores obtained with the scoring rule in (31) and the ones obtained with (28). The null hypothesis is that, on average, the difference is not statistically different from zero, i.e. the two approaches have the same forecasting performance.

The alternative hypothesis is that the difference is smaller than zero, i.e. the multivariate approach delivers significantly better predictions (smaller losses). Table 7 shows the resulting test statistics and the corresponding p-values for each of the possible pairs of competing models. Interestingly, for all the considered risk levels and for all the three CAViaR specifications, we are always able to reject the null hypothesis at 5% level, providing evidence of the efficiency gain of our proposed joint approach 3 .

To offer a graphical intuition of the results, Figure 1 shows the time series of the difference between the scoring function in (31) (4), while the red and the blue lines refer to the CAViaR-AS and CAViaR-IG dynamics, respectively. The efficiency gain of the multivariate approach clearly emerges from these pictures. Indeed, for all the considered risk levels, and regardless of the dynamic specification of the quantiles, the difference between the two approaches is almost always positive. This confirms the idea that the losses associated with the univariate model can be very large if the dependence structure of the data is not accounted for.

In Figure 2 As a further robustness check, we have also considered the best CAViaR specification for each asset and then applied the Diebold and Mariano (2002) test. The proposed joint method still appears to be more efficient than the univariate one of Taylor (2019), making our findings unchanged. each stock index. In all the cases, the estimates of VaR and ES produced by our joint approach lie below the corresponding values obtained with the univariate setting, suggesting that our proposed method could lead to more conservative results.

Finally, to get a more intuitive representation of the relationship between the estimated VaR and the ES over time and across the quantile levels, Figure 3 displays the absolute difference between the out-of-sample VaR and ES forecasts for each of the three stock indices. The plots in the first row are obtained by assuming the CAViaR-AS specification with the ES modeled as in (10) Indeed, we find that the difference between the estimated risk measures is typically smaller in calm periods and larger in period of turbulent markets, with more pronounced upward spikes when the AR dynamics for the ES is used. High volatility is also clearly evident in correspondence and in the aftermath of major economic and financial crises, such as, for example, the Chinese stock market crash at the start of 2016, the Brexit in 2018 and the outbreak of COVID-19 pandemic in 2020.

Based on these considerations, in the next section we consider the CAViaR-AS specification in (5) with the ES expressed as in (10), to implement the portfolio optimization procedure.

In this section we use the three stock market indices of FTSE 100, NIKKEI 225 and S&P 500 to build a SMV portfolio that delivers a certain fixed level of riskτ . The optimal allocation weights are determined by solving the optimization problem described in Section 3.2, using the parameter estimates provided by the CAViaR-AS specification in Section 5.2.

We evaluate the benefits of our approach by considering alternative strategies. First, we use the estimation method of Zhao et al. (2015) , where the covariance matrixΣ in (27a) is estimated using the sample variance and the sample mean of the return series. We call this strategy Moment-SMV.

Second, we evaluate the classic MV of Markowitz (1952) . In this case, we model the conditional covariance of asset returns using several well-known autoregressive dynamics, i.e. the multivariate GARCH Dynamic Conditional Correlation model (Engle 2002) , under both the multivariate Normal We jointly estimate VaR and ES of the resulting portfolios and analyze their out-of-sample performance using the last 368 returns of the sample. The backtesting results for the considered strategies are shown in Table 8 , where we report the AL log-score of Taylor (2019) , together with the S F ZN and S F Z0 loss functions in (29) and (30) for the joint evaluation of the pair (VaR, ES).

The results clearly make our approach stand out compared to the other strategies. Indeed, the strategies based on the multivariate Normal and t-distributions, and their skewed counterparts, produce highly volatile VaR forecasts and suffer from larger average losses over the out-of-sample period. On the other hand, the SMV and Moment-SMV models deliver better performance gains over the MV portfolios, with the SMV being preferred at the three VaR levels, especially at the most extreme case of τ = 0.01. Such gain may be traced back to the higher efficiency in the estimation procedure based on the ML approach proposed in Section 2.3. It is worth noticing that these conclusions remain still valid even when we use the S F Z0 and the S F ZN scoring rules, which do not directly depend on the AL likelihood function. Table 8 we also evaluate the risk-adjusted returns of the competing portfolios, measured by the Sharpe Ratio (SR) and the Herfindahl-Hirchman Index of weights concentration (HHI). We find that the SMV strategy delivers the portfolio with the highest SR and the least concentrated portfolios at both τ = 0.1 and τ = 0.05, with an average HHI of 0.558 and 0.557, respectively. On the other hand, when τ = 0.01, the MV strategy seems to yield the portfolios with the highest SR, while the Mom-SMV strategy produces the lowest degree of weights concentration.

A graphical representation of the SMV portfolio weights and their evolution over time is provided in Figure 4 . In each plot of the figure, the blue line denotes the allocation weights assigned to the FTSE 100, the red line refers to the NIKKEI 225, while the allocation weights of the S&P 500 are displayed in orange. The left panel considers τ = 0.1, the center one τ = 0.05, while in the right-hand side we plot the results for τ = 0.01. As one can see, the SMV strategy tends to invest mainly in the FTSE 100 and in the S&P 500, while it always tends to hold only a small short position on the NIKKEI 225. It is also interesting to notice that the portfolio weights exhibit the highest volatility in periods of high uncertainty in the market, such as, for example at the end of 2016 and during the ongoing COVID-19 pandemic. Finally, we evaluate the evolution of the wealth generated by the portfolios at the three risk levels during the out-of-sample period. We show that our approach can offer several important advantages, both theoretical and practical. First, it provides a significant gain in terms of estimation efficiency, as it allows us to estimate multiple VaR and ES in just one step. Second, it can significantly improve forecasts accuracy, since it accounts for the dependence structure among financial assets, that cannot be detected by univariate methods. These results are also confirmed empirically. Indeed, using three stock market indices as in Taylor (2019), we estimate the pairs of (VaR, ES) for each of the three assets and evaluate the forecasts using a new scoring function based on the MAL density, which allows us to account for the dependence structure among the considered assets at each point in time. The forecasts of VaR and ES have been compared with the ones obtained by the univariate approach of Taylor (2019), i.e. by considering the three stock market indices separately, as they were independent to each other. What we find is a significant gain in terms of forecasting accuracy using the proposed multivariate framework, leading to more reliable risk measure estimates.

Following Zhao et al. (2015) , we also exploit the properties of the time-varying MAL distribution to derive a new portfolio optimization method, where the optimal allocation weights are adjusted at each holding period to ensure that the portfolio meets a predetermined level of risk. Empirically, we find that our optimization method produces a portfolio with less concentrated allocation weights and higher Sharpe Ratio compared to other existing strategies.

Several extensions and generalizations could be analyzed, leaving space for future research.

Although we have only focused on CAViaR models, one could consider other VaR based models in the quantile regression framework and/or specify different ES dynamics where the factor (1 + e γtj ) varies over time according to an autoregressive process for γ tj . Another interesting research problem would involve the evaluation of the portfolio performance when a larger set of indices is considered, which may help us in providing an empirical ranking based on their VaR and ES forecasts. In this case, a penalized approach, as done, for instance, by Petrella and Raponi (2019) , could be adopted to deal with the curse of dimensionality, improve estimation, gain in parsimony and conduct a variable selection procedure. Finally, other portfolio strategies can be implemented as well, where the choice of the weights might be motivated by other practical considerations or regulatory restrictions. Diebold and Mariano (2002) pairwise test between competing CAViaR models in predicting one-week-ahead returns using the joint approach with the multiplicative factor in (10) 

CAViaR-AS -1.997 -6.273 -5.029

(0.023) (0.000) (0.000) Table 5 : Test statistics and p-values (in parentheses) of the Diebold and Mariano (2002) pairwise test between the CAViaR-AS specifications using the joint approach with the constant multiplicative factor in (10) and the AR formulation in (11)-(12) for the ES component in predicting one-week-ahead returns. The null hypothesis is that the two approaches have the same forecasting performance. Diebold and Mariano (2002) pairwise test between competing CAViaR models in predicting one-week-ahead returns using the univariate approach of Taylor (2019) Diebold and Mariano (2002) pairwise test between the competing joint and univariate approaches in predicting one-week-ahead returns with the multiplicative factor in (10) (Panel A) and the AR formulation in (11)-(12) (Panel B) for the ES. The null hypothesis is that the two approaches have the same forecasting performance. Nolde et al. (2017) and Taylor (2019) in (30), (29) and (32), respectively.

SR and HHI denote the portfolio Sharpe Ratio and the averaged Herfindahl-Hirschman Index. The grey bands correspond to the recession dates and to various economic and financial crises occurred in : 2014,03-2015,02; 2015,07-2016,09; 2018,01-2018,06; 2018,08-2019,03; 2020,02-2020,03 . : 2014,03-2015,02; 2015,07-2016,09; 2018,01-2018,06; 2018,08-2019,03; 2020,02-2020,03 . : 2014,03-2015,02; 2015,07-2016,09; 2018,01-2018,06; 2018,08-2019,03; 2020,02-2020,03. Appendix A. Proof of Proposition 1

As stated in Petrella and Raponi (2019) , the MAL p (µ, Dξ, DΣD) in (3) can be written as a location-scale mixture, having the following representation:

whereξ = [ξ 1 ,ξ 2 , ...,ξ p ] , having generic elementξ j = 1−2τj

τj (1−τj ) , j = 1, ..., p.Σ is a p × p positive definite matrix such thatΣ =ΛΨΛ, with Ψ being a correlation matrix andΛ = diag[σ 1 ,σ 1 , ...,σ p ], withσ 2 j = 2 τj (1−τj ) , j = 1, ..., p. Finally, Z ∼ N p (0 p , I p ) denotes a p-variate standard Normal distribution and W ∼ Exp(1) has a standard Exponential distribution, with Z being independent of W . Notice that, under the constraints imposed onξ andΛ, the representation in (33) implies that:

Let φ Y (t) denote the characteristic function of Y , with t ∈ R p . Using the result in (33), it follows that:

Now, using the conditional distribution of Y given W in (34), we have that:

Substituting this result into (35) yields:

Finally, integrating over W , we obtain:

Now, let b = (b 1 , ..., b p ) ∈ R p be a p × 1 vector such that b = 0 p and consider a new random variable

The characteristic function in (38) resembles the characteristic function of the AL univariate distribution discussed in Yu and Moyeed (2001) and Kozumi and Kobayashi (2011) where µ , τ , δ are the scale, skewness and scale parameters, respectively. Therefore, the characteristic function of Y b in (38) can be rewritten as the characteristic function of a univariate AL distribution with parameters:

In conclusion, we obtain that Y b ∼ AL(µ , τ , δ ) and P(Y b < µ ) = τ . Consequently, the parameter τ controls the probability assigned to each side of Y b and µ is the corresponding quantile at level τ . Notice that the denominator 2(b DΣDb) + (b Dξ) 2 in (39) , which implies that the distribution of Y b is symmetric around µ .

In this Appendix we conduct a simulation study to evaluate the finite sample properties of the proposed method and its ability to jointly estimate the pair (VaR, ES) for multiple correlated assets.

This simulation exercise addresses the following issues. First, we consider different distributional choices for the error term to investigate the behaviour of the model in the presence of non-Gaussian errors. Second, we evaluate the bias and accuracy of the ML estimators when the interest of the research is focused upon the lower tails of the distribution. Finally, we inspect the impact of dimensionality on both the estimated parameters and the computational burden of the optimization routine.

In the first experiment, we consider a sample size of T = 1500 and set p = 3. The observations are simulated using the following data generating process:

where Q Y t (τ |F t−1 ) is generated according to the three different CAViaR specifications described in (4)-(6). For the ES component we adopted both the multiplicative factor specification in (10) and the AR formulation in (11)-(12). Following Petrella and Raponi (2019), two different simulation scenarios are considered for the error terms t in (40):

(i) a multivariate Normal distribution (N 3 ) with zero mean and variance-covariance matrix equal to D tΣ D t , that is it ∼ N 3 (0, D tΣ D t );

(ii) a multivariate Student-t distribution (T 3 ) with 5 degrees of freedom, scale parameter D tΣ D t and non centrality parameter equal to D tξ , that is it ∼ T 3 (5, D tξ , D tΣ D t ).

The true values of the CAViaR model and the ES dynamics are calibrated using the real data in the empirical application. Specifically, we set For the CAViaR-IG dynamic, each element of the vector ω is considered in absolute value to guarantee that the autoregressive process in (6) Table 9 : Bias% and RMSE (in brackets) of point estimates for ω, η and β of the three CAViaR specifications in (4)-(6) with the ES modeled as a multiple of VaR as in (10), under the N 3 -and T 3 -scenarios. The last two rows of each Panel show the median number of iterations and CPU Time (in seconds) required to fit the model using a single run of the EM algorithm. (4)-(6), with the AR process for the ES as in (11)-(12). Panels A and B refer to the N 3 -and T 3 -scenarios, respectively, where the last two rows of each panel show the median number of iterations and CPU Time (in seconds) required to fit the model using a single run of the EM algorithm.

On the coherence of Expected Shortfall

Active portfolio management with benchmarking: Adding a Value-at-Risk constraint

Another look at mutual fund performance

Thinking coherently

Coherent measures of risk

Arbitrage equilibrium with skewed asset returns

Pessimistic portfolio allocation and choquet expected utility

The structure and degree of dependence: A quantile regression approach

A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models

Bayesian tail risk interdependence using quantile regression

Decomposing and backtesting a flexible specification for CoVaR

Dynamic Expected Shortfall: A spectral decomposition of tail risk across time horizons

Nonparametric estimation of conditional VaR and Expected Shortfall

Evaluating interval forecasts

Comparing predictive accuracy

Backtesting Expected Shortfall: accounting for tail risk

Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models

CAViaR: conditional autoregressive Value at Risk by regression quantiles

Measuring and testing the impact of news on volatility

Higher order elicitability and Osband's principle

Expected Shortfall is jointly elicitable with Value at Risk -implications for backtesting

Co-skewness and capital asset pricing

Vector-Valued property elicitation

Making and evaluating point forecasts

Converting Tail-VaR to VaR: An econometric study

Quantile autoregression

Portfolio selection

Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach

Alternative multivariate stable distributions and their applications to financial modeling

Elicitability and backtesting: Perspectives for banking regulation

Multivariate asset return prediction with mixture models

Dynamic semiparametric models for Expected Shortfall (and Value-at-Risk)

Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress

Optimization of conditional Value-at-Risk

Portfolio selection based on asymmetric Laplace distribution, coherent risk measure, and expectation-maximization estimation

The sparse method of simulated quantiles: an application to portfolio optimization

Generating volatility forecasts from Value at Risk estimates

Estimating Value at Risk and Expected Shortfall using expectiles

Forecasting Value at Risk and Expected Shortfall using a semiparametric approach based on the asymmetric Laplace distribution

VAR for VaR: Measuring tail dependence using multivariate regression quantiles

Estimation of Value-at-Risk for energy commodities via CAViaR model

Value-at-Risk versus Expected Shortfall: A practical perspective

Optimal portfolios under a Value-at-Risk constraint

Bayesian quantile regression

A mean-CVaR-skewness portfolio optimization model based on asymmetric Laplace distribution

Modeling and forecasting Expected Shortfall with the generalized asymmetric Student-t and asymmetric exponential power distributions