Title: Volatility forecasting with machine learning and intraday commonality
Authors: Chao Zhang; Yihuang Zhang; Mihai Cucuringu; Zhongmin Qian
Date: 2022-02-08

Abstract. We apply machine learning models to forecast intraday realized volatility (RV), by exploiting commonality in intraday volatility via pooling stock data together, and by incorporating a proxy for the market volatility. Neural networks dominate linear regressions and tree models in terms of performance, due to their ability to uncover and model complex latent interactions among variables. Our findings remain robust when we apply trained models to new stocks that have not been included in the training set, thus providing new empirical evidence for a universal volatility mechanism among stocks. Finally, we propose a new approach to forecasting one-day-ahead RVs using past intraday RVs as predictors, and highlight interesting diurnal effects that aid the forecasting mechanism. The results demonstrate that the proposed methodology yields superior out-of-sample forecasts over a strong set of traditional baselines that only rely on past daily RVs.

Forecasting and modeling stock return volatility has been of interest to both academics and practitioners over the past years. Recent advances in high-frequency trading (HFT) highlight the need for robust and accurate intraday volatility forecasts. Intraday volatility estimates are important for pricing derivatives, managing risk, and devising quantitative strategies. To the best of our knowledge, and in contrast to daily volatility forecasting, intraday volatility forecasting has received scant attention in the research literature. It has been pointed out that conventional parametric models, such as GARCH and stochastic volatility models, may be inadequate for modeling intraday returns [1]. In recent works [4, 20], high-frequency data are used to estimate daily realized volatility (RV) by summing squared intraday returns. However, these methods are potentially restrictive and are often difficult to apply when forecasting intraday volatility. Some related literature is reviewed briefly in the next section. In the present paper, we study and analyze various non-parametric machine learning models for forecasting multi-asset intraday and daily volatilities by using high-frequency data from the U.S. equity market. We demonstrate that, by taking advantage of commonality in intraday volatility, the models' forecasting performance can be significantly improved. The data we explore spans the period from July 2011 to June 2021, and includes the top 100 most liquid components of the S&P 500 index. In our approach, the 10-min, 30-min, 65-min, and daily (without overnight information) forecasting horizons are studied.

Main contributions. The main contributions of our work can be summarized as follows. First, we propose a measure for evaluating the commonality in intraday volatility, namely the adjusted R-squared value from linear regressions of a given stock's RVs against the market RVs. We demonstrate that commonality at the daily horizon is turbulent over time, whereas commonality in intraday RVs is strong and stable. Moreover, the analysis of high-frequency data from the real market reveals an interesting intraday pattern: during a trading session, commonality peaks near the close, in contrast to the diurnal volatility pattern.
Second, in order to assess the benefits of incorporating commonality into models aimed at predicting intraday volatility, we train multiple machine learning algorithms (including HAR, OLS, LASSO, XGBoost, MLP, and LSTM) under three different schemes: (a) Single: training specific models for each asset; (b) Universal: training one model with pooled data for all assets; (c) Augmented: training one model with pooled data and adding an additional predictor which takes into account the impact of the market realized volatility. We find that for most models, the incorporation of commonality, through pooling data together and adding the market volatility as an additional feature, leads to better out-of-sample performance. In addition, the empirical results we present in the paper demonstrate that neural networks (NNs) are, in general, superior to other techniques. This provides new empirical evidence of the capability of NNs to handle complex interactions among predictors. Furthermore, to alleviate concerns of overfitting, we conduct a stringent out-of-sample test, using the trained models to forecast the volatility of completely new stocks that are not included in the training sample. Our results reveal that NNs still outperform other approaches (including the OLS models trained for each new stock), thus presenting empirical evidence for a universal volatility mechanism among stocks, similar to the findings in Sirignano and Cont [58] concerning universal features of price formation in equity markets. We conclude the paper by proposing a new approach for predicting daily volatility, in which the past intraday volatilities, rather than the past daily volatilities, are used as predictors. This approach fully utilizes the available high-frequency data, and improves upon traditional methods for modeling daily volatility. In other words, the results presented in this paper demonstrate that machine learning models in which past intraday volatilities are used as predictors generally outperform the traditional models with past daily volatilities (e.g. HAR [20], SHAR [54], HARQ [12]). To the best of our knowledge, this is the first line of work that studies the effectiveness of past intraday volatilities in forecasting future daily volatility.

Paper outline. The remainder of this paper is structured as follows. We begin in Section 2 by reviewing the related literature. Section 3 describes the data and the definition of realized volatility. Section 4 discusses the commonality in intraday volatility. Section 5 introduces various machine learning models and three training schemes for predicting future intraday volatility. Section 6 provides the forecasting results and discusses the empirical findings. In Section 7, we introduce a new approach to forecasting daily volatility using past intraday volatilities as predictors. Finally, we conclude our analysis in Section 8.

Our study builds on several research streams developed by various authors over recent years. The first stream is related to the research on commonality in financial markets. Chordia et al. [18] have recognized the existence of commonality in liquidity, and Karolyi et al. [43] have suggested that commonality in liquidity is related to market volatility, and in particular to the presence of international investors and trading activity. Dang et al. [23] have observed that news commonality is associated with stock return co-movement and liquidity commonality.
The co-movement in daily volatility is well known from the previous literature. Traditional GARCH and stochastic volatility models (e.g. Andersen et al. [2], Calvet et al. [14]) all make use of volatility spillover effects. Herskovic et al. [38] have provided empirical evidence of the co-movement in volatility across the equity market. Bollerslev et al. [11] have observed strong similarities in daily realized volatility and have utilized them to forecast daily realized volatility. Engle and Sokalska [29] have emphasised that pooled data is useful for intraday volatility forecasting, and Herskovic et al. [39] have reported that volatilities co-move strongly over time. However, there is still a void of research related to commonality in intraday volatility and its implications for managing intraday risks, especially for forecasting purposes.

Second, there are numerous contributions by many researchers on the topic of forecasting daily volatility. However, most methods proposed for modeling and forecasting return volatility largely rely on parametric GARCH or stochastic volatility models, which provide forecasts of daily volatility from daily returns. As pointed out by Andersen et al. [4, 2] and Engle and Patton [28], the models employed to predict daily volatility cannot take advantage of high-frequency data, and suffer from the curse of dimensionality when dealing with multiple assets simultaneously. Due to the availability of high-frequency data, realized volatility (RV), computed by summing squared intraday returns, has gained popularity in recent years. Andersen et al. [4] have proposed an ARFIMA model for forecasting daily RVs, which outperforms conventional GARCH and related approaches. Corsi [20] has put forward a parsimonious AR-type model, termed Heterogeneous Autoregressive (HAR), for predicting daily RVs using realized volatility components over different time horizons. Recently, Izzeldin et al. [42] have compared the forecasting performance of ARFIMA and HAR, and have concluded that their performance is essentially indistinguishable. See Section 7 for further models for predicting daily volatility. On the other hand, little attention has been paid to forecasting intraday volatility. Taylor and Xu [59] proposed an hourly volatility model based on an ARCH specification, and Engle and Sokalska [29] constructed a GARCH model for intraday financial returns, by specifying the variance as a product of daily, diurnal, and stochastic intraday components. Such models, as with traditional GARCH and stochastic volatility approaches, are potentially restrictive due to their parametric nature, and are not able to effectively take into account the non-linear and highly complex relationships among different financial variables.

Third, machine learning (ML) models have demonstrated great potential in finance, for example in applications to asset pricing. The high-dimensional nature of ML methods allows for better approximations to unknown and potentially complex data-generating processes, in contrast with traditional economic models. Gu et al. [31] have pointed out the superior performance of ML models for empirical asset pricing. Sirignano and Cont [58] have used LSTMs to forecast high-frequency price movements, and have provided empirical evidence for the existence of a universal and stationary price formation mechanism.
Recently, Xiong et al. [60] have applied LSTMs to forecast S&P 500 volatility, with Google domestic trends as predictors, and Bucci [13] has demonstrated that RNNs are able to outperform all the traditional econometric methods in forecasting the monthly volatility of the S&P index. More recently, Rahimikia and Poon [55] have compared machine learning models with HAR models for forecasting daily realized volatility by using variables extracted from limit order books and news. Li and Tang [48] have proposed a simple average ensemble model combining multiple machine learning algorithms for forecasting daily (and monthly) realized volatility, and Christensen et al. [19] have examined the performance of machine learning models in forecasting one-day-ahead realized volatility with firm-specific characteristics and macroeconomic indicators. The main goal of the present paper is to assess the usefulness of non-parametric ML models through the lens of forecasting multi-asset intraday volatilities.

We use the Nasdaq ITCH data from LOBSTER to compute intraday returns via mid-prices. We select the top 100 components of the S&P 500 index, for the period between 2011-07-01 and 2021-06-30. After filtering out the stocks for which the dataset does not span the entire sample period, we are left with 93 stocks. Table 1 presents the number of stocks in each sector, according to the GICS sector division.

In a general form, let P_{i,t} denote the price process of a financial asset i; it follows

dP_{i,t} / P_{i,t} = \mu_i \, dt + \sigma_{i,t} \, dW_t, (1)

where \mu_i is the drift, \sigma_{i,t} is the instantaneous volatility, and W_t is a standard Brownian motion. The theoretical integrated variance (IV) of stock i during (t - h, t] is defined as

IV_{i,t}(h) = \int_{t-h}^{t} \sigma_{i,s}^2 \, ds, (2)

where h is the look-back horizon, such as 10 minutes, 30 minutes, 1 day, etc. Throughout this paper, we consider the minutely logarithmic return for asset i during (t - 1, t] as

r_{i,t} = \log P_{i,t} - \log P_{i,t-1}. (3)

Here, P_{i,t} is the mid-price at time t, i.e. P_{i,t} = (P^a_{i,t} + P^b_{i,t}) / 2, where P^b_{i,t} (respectively, P^a_{i,t}) represents the best bid (respectively, ask) price. Andersen et al. [3] and Barndorff-Nielsen and Shephard [9] showed that the sum of squared intraday returns is a consistent estimator of the unobservable IV. Because of the availability of high-frequency intraday data, we choose to compute realized volatility as a proxy for the square root of the unobserved IV (see Bollen and Inder [10], Hansen and Lunde [35], Andersen et al. [3]). To reduce the impact of extreme values, we consider the logarithm, in line with Andersen et al. [4], Bucci [13], Herskovic et al. [38]. Specifically, during a period (t - h, t], the realized volatility of stock i is defined as

RV_{i,t} = \log \sqrt{ \sum_{t-h < s \le t} r_{i,s}^2 }, (4)

where the sum runs over the one-minute returns within (t - h, t].

As pointed out by Pascalau and Poirier [51], there are no conclusive methods to incorporate the overnight session's information content into the daily volatility. In line with Engle and Sokalska [29], overnight information is excluded from our empirical analysis of daily volatility. For simplicity, we refer to this daily scenario (excluding the overnight session) as the "1-day" scenario throughout the rest of this paper. To mitigate the effect of possibly spurious data errors, for each stock, we set return/volatility observations below the 0.5th percentile to the 0.5th percentile, and observations above the 99.5th percentile to the 99.5th percentile, a process commonly referred to as winsorization.
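To make the construction above concrete, the following is a minimal sketch, not the authors' code, of how the log-RV measure in Eqn (4) and the winsorization step can be computed from one-minute mid-prices with pandas; the tickers, the simulated price paths, and the function names are purely illustrative.

```python
import numpy as np
import pandas as pd

def log_realized_volatility(mid: pd.DataFrame, horizon: str = "30min") -> pd.DataFrame:
    """Log realized volatility per stock: log sqrt(sum of squared 1-min log returns)
    within each non-overlapping intraday bucket of length `horizon`."""
    r = np.log(mid).diff()                       # one-minute log returns from mid-prices
    rv = (r ** 2).resample(horizon).sum()        # sum of squared returns per bucket
    rv = np.log(np.sqrt(rv.replace(0.0, np.nan)))
    return rv

def winsorize(df: pd.DataFrame, lower: float = 0.005, upper: float = 0.995) -> pd.DataFrame:
    """Clip each stock's series at its 0.5th and 99.5th percentiles, per the paper."""
    lo, hi = df.quantile(lower), df.quantile(upper)
    return df.clip(lower=lo, upper=hi, axis=1)

# Example usage with simulated one-minute mid-prices for two hypothetical tickers
idx = pd.date_range("2021-06-01 09:30", "2021-06-01 16:00", freq="1min")
mid = pd.DataFrame(
    100 * np.exp(np.cumsum(0.0005 * np.random.randn(len(idx), 2), axis=0)),
    index=idx, columns=["AAA", "BBB"],
)
rv = winsorize(log_realized_volatility(mid, "30min"))
print(rv.head())
```

In the paper, winsorization is applied to both returns and volatilities; here it is shown for the RV panel only.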
Figure 1 illustrates the pairwise Pearson and Spearman correlations of returns and realized volatilities; it depicts the empirical distribution of the pairwise correlation coefficients over the entire sample period. We observe generally higher correlations in realized volatility than their counterparts in returns. Figure 1 also reveals that, on average, as the horizon gets longer, the correlations in realized volatility increase from 0.598 (10-min) to 0.731 (30-min) to 0.766 (65-min). However, when turning to daily realized volatility, the correlations in RVs become weaker, with an average of 0.514. This indicates that the connections between stocks in terms of intraday volatility may be more stable and tight than those in daily volatility.

Following the commonality-in-liquidity literature, we adopt an analogous procedure to estimate the commonality in volatility. Specifically, we use the average adjusted R-squared value from the following regressions across stocks as a measure of commonality:

RV_{i,t} = \alpha_i + \beta_i RV_{M,t} + \varepsilon_{i,t}, (5)

where RV_{M,t} is the contemporaneous market volatility during (t - h, t] for stock i, which is calculated as the equally weighted average of all individual stock volatilities during (t - h, t]. Figure 4 presents the commonality in realized volatility, averaged across stocks for each month. To create this figure, we use the observations within each month to obtain the R-squared value from Eqn (5). We notice that commonality effects in intraday scenarios (especially 30-min and 65-min) are substantially larger than the daily ones. For example, as reported in Table 2, the average commonality in the 65-min data is around 74.3%, while it is only 35.5% in the daily data. Moreover, R^2(h) is much more turbulent at the daily frequency. Table 2 also reports the relation between the average commonality and the market volatility. As the horizon extends, the average commonality has a higher correlation with the market volatility (we refer the reader to additional analysis on commonality in Appendix A). Turning to the intraday pattern (Figure 5), we observe a gradual increase in commonality throughout the trading session as we get closer to the market close, in sharp contrast to the diurnal volatility pattern in Figure 3.
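As an illustration of the commonality measure in Eqn (5), the following sketch, assuming the winsorized log-RV panel `rv` from the previous example, regresses each stock's RV on the equally weighted market RV and averages the adjusted R-squared values; whether stock i itself is excluded from the market average is a detail glossed over here.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def commonality_r2(rv: pd.DataFrame) -> float:
    """Average adjusted R^2 from regressing each stock's RV on the
    equally weighted market RV (Eqn (5)), over the rows of `rv`."""
    market = rv.mean(axis=1)                     # equally weighted average across stocks
    r2 = []
    for col in rv.columns:
        X = sm.add_constant(market.rename("RV_M"))
        res = sm.OLS(rv[col], X, missing="drop").fit()
        r2.append(res.rsquared_adj)
    return float(np.mean(r2))

# Monthly commonality: apply the measure within each calendar month of intraday RVs
# monthly = rv.groupby(rv.index.to_period("M")).apply(commonality_r2)
```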
In this section, we leverage commonality for the task of predicting cross-asset volatility. We construct the prediction model as follows:

RV_{i,t+h} = F(x_{i,:t}, x_{M,:t}; \theta) + \varepsilon_{i,t+h}, (6)

where x_{i,:t} denotes the individual features of stock i up to time t (such as its own past RVs), and x_{M,:t} is a vector of features for all stocks in the studied universe up to t, denoted as market features, such as the market volatility. \theta refers to the parameters that need to be estimated. We aim to find a function of these variables which minimizes the out-of-sample errors for realized volatility. This section summarizes the six models employed in our numerical experiments.

Corsi [20] proposed a volatility model, termed the Heterogeneous Autoregressive (HAR) model, which considers realized volatilities over different interval sizes. HAR has shown remarkably good forecasting performance on daily data [54, 42]. For day t, the forecast of HAR is based on

RV_{i,t+1d} = \beta_0 + \beta^{(d)} RV^{(d)}_{i,t} + \beta^{(w)} RV^{(w)}_{i,t} + \beta^{(m)} RV^{(m)}_{i,t} + \varepsilon_{i,t+1d}, (7)

where RV^{(d)}_{i,t} (RV^{(w)}_{i,t}, RV^{(m)}_{i,t}) denotes the daily (weekly, monthly) realized volatility in the past day (week, month), respectively. The choice of a daily, weekly and monthly lag aims to capture the long-memory dynamic dependencies observed in most realized volatility series. However, very little attention has been paid to forecasting intraday volatility with HAR. One closely connected model is that of Engle and Sokalska [29], who proposed an intraday volatility forecasting model in which the conditional volatility of high-frequency returns is interpreted as a product of daily, diurnal, and stochastic intraday components. After the decomposition of raw returns, the authors apply a GARCH model [27] to learn the stochastic intraday volatility components.

Following the spirit of Engle and Sokalska [29], we extend the daily HAR model to intraday scenarios by adding diurnal-effect features, as follows:

RV_{i,t+h} = \beta_0 + \beta^{(D)} D_{i,\tau_{t+h}} + \beta^{(d)} RV^{(d)}_{i,t} + \beta^{(w)} RV^{(w)}_{i,t} + \beta^{(m)} RV^{(m)}_{i,t} + \varepsilon_{i,t+h}, (8)

where D_{i,\tau_{t+h}} denotes the average diurnal realized volatility in the bucket-of-the-day \tau_{t+h}, computed from the last 21 days. For example, when t = 10:30 and h = 30 minutes, then \tau_{t+h} corresponds to the bucket 10:30-11:00. RV^{(d)}_{i,t} (RV^{(w)}_{i,t}, RV^{(m)}_{i,t}) denotes the aggregated daily (weekly, monthly) realized volatility. When we consider the daily scenario, Eqn (8) becomes the standard HAR model (Eqn (7)) by removing the diurnal term. For simplicity, we denote this model as HAR-d in the following experiments.

Instead of using aggregated realized volatility, we apply OLS to the original features. Let u = (u_1, . . . , u_p) represent the vector of input features (the lagged RVs); the forecast is

RV_{i,t+h} = \beta_0 + \sum_{k=1}^{p} \beta_k u_k + \varepsilon_{i,t+h}, (9)

with the loss function being the sum of squared errors. When the number of predictors approaches the number of observations, or there are high correlations among predictor variables, the OLS model tends to overfit noise rather than signal. This is particularly burdensome for the volatility forecasting problem, where the features can be highly correlated. LASSO is a linear regression method that can avoid overfitting by adding a penalty on the parameters to the objective function. As pointed out by Hastie et al. [37], LASSO performs both variable selection and regularization, thereby enhancing the prediction accuracy and interpretability of regression models. The objective function of LASSO is the sum of squared residuals with an additional l_1 constraint on the regression coefficients, as shown in Eqn (10):

\min_{\beta} \sum_{i,t} \Big( RV_{i,t+h} - \beta_0 - \sum_{k=1}^{p} \beta_k u_k \Big)^2 + \lambda \sum_{k=1}^{p} |\beta_k|. (10)

Here, the hyperparameter \lambda controls the penalty weight. In our experiments, we provide a set of hyperparameter values, and then choose the one with the best performance on the validation data as our forecasting model.

Linear models are unable to capture possible non-linear relations between the dependent variable and the predictors, or the interactions among predictors. One way to add non-linearity and interactions is the decision tree; see more in Hastie et al. [37]. XGBoost is a decision-tree-based ensemble algorithm, implemented under a distributed gradient boosting framework by Chen and Guestrin [16]. There is abundant empirical evidence showing the success of XGBoost, such as in a large number of Kaggle competitions. In this work, we only review the essential idea behind XGBoost, namely the tree boosting model. For more details about other important features of XGBoost, such as its scalability in various scenarios, parallelization, distributed computing, feature importance to enhance interpretability, etc., the reader may refer to [16]. Let u represent the vector of input features; the ensemble prediction is

\widehat{RV}_{i,t+h} = \sum_{b=1}^{B} f_b(u), \quad f_b \in \mathcal{F}, (11)

where \mathcal{F} is the space of regression trees. An example of the tree ensemble model is depicted in Figure 6. The tree ensemble model in Eqn (11) is trained sequentially. Boosting (see Friedman [30]) means that new models are added to minimize the errors made by existing models, until no further improvements are achieved.

Figure 6: Illustration of a tree ensemble model. B is the number of trees, and the final prediction is the sum of the predictions from each tree, as shown in Eqn (11).
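To illustrate how the linear benchmarks above can be fitted on lagged RV features, here is a minimal sketch using scikit-learn's Lasso under the Single scheme; the lag count, the candidate penalty values, and the validation split are illustrative, not the paper's exact choices, and an xgboost.XGBRegressor (or any other regressor) could be dropped into the same loop in place of the Lasso.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

def make_lagged_features(rv: pd.Series, n_lags: int):
    """Design matrix of past RVs (lags 1..n_lags) and the current RV as target."""
    X = pd.concat({f"lag_{k}": rv.shift(k) for k in range(1, n_lags + 1)}, axis=1)
    data = pd.concat([rv.rename("target"), X], axis=1).dropna()
    return data.drop(columns="target"), data["target"]

# Fit LASSO for one stock under the Single scheme; lambda (alpha) is chosen on a
# validation split, mirroring the paper's procedure (values here are illustrative).
X, y = make_lagged_features(rv["AAA"], n_lags=12)   # rv: log-RV panel from the earlier sketch
split = int(0.8 * len(X))
best_alpha, best_mse = None, np.inf
for alpha in [1e-4, 1e-3, 1e-2, 1e-1]:
    model = Lasso(alpha=alpha).fit(X.iloc[:split], y.iloc[:split])
    mse = mean_squared_error(y.iloc[split:], model.predict(X.iloc[split:]))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(best_alpha, best_mse)
```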
Another non-linear method is the neural network (NN), which has become increasingly popular for machine learning problems, e.g. in computer vision and natural language processing, due to its flexibility in learning complex interactions. However, NNs also suffer from a number of problems, such as a lack of robustness, transparency, and interpretability, over-parameterization, etc.

MLP is a class of feedforward neural networks and a "universal approximator" that can approximate any smooth function (see Hornik et al. [41]). MLPs have been applied in many fields, e.g. computer vision, natural language processing, etc. MLPs are composed of an input layer that receives the raw features, an output layer that makes forecasts about the input, and, in-between those two, an arbitrary number of hidden layers that apply non-linear transformations. The parameters in MLPs can be updated via stochastic gradient descent (SGD); in this work, we use Adam [45], which is based on adaptive estimates of lower-order moments. Let u \in R^p represent the input variables. An MLP with L hidden layers computes

h^{(0)} = u, \quad h^{(l)} = \sigma(W^{(l)} h^{(l-1)} + b^{(l)}), \ l = 1, . . . , L, \quad \widehat{RV} = W^{(L+1)} h^{(L)} + b^{(L+1)},

where \theta = \{W^{(l)}, b^{(l)}\}, l = 1, . . . , L + 1, are the parameters to be estimated, n_l denotes the number of units in layer l, and n_0 = p. For the activation function \sigma(\cdot), we choose the rectified linear unit (ReLU), i.e. \sigma(x) = max(x, 0).

LSTM, proposed by Hochreiter and Schmidhuber [40], is an artificial recurrent neural network (RNN) architecture, which is well-suited to classifying, processing and making predictions based on time series data. For simplicity, we consider the time series for a given stock and remove the subscript for stock identity. The standard transformation in each unit of an LSTM is defined as follows (for a more detailed discussion, we refer the reader to [40]):

f_t = \sigma_g(W_f u_t + U_f h_{t-1} + b_f),
i_t = \sigma_g(W_i u_t + U_i h_{t-1} + b_i),
o_t = \sigma_g(W_o u_t + U_o h_{t-1} + b_o),
\tilde{c}_t = \sigma_c(W_c u_t + U_c h_{t-1} + b_c),
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
h_t = o_t \odot \sigma_h(c_t),

where u_t is the input vector, f_t is the forget gate's activation vector, i_t is the update gate's activation vector, o_t is the output gate's activation vector, \tilde{c}_t is the cell input activation vector, c_t is the cell state vector, and h_t is the hidden state vector, i.e. the output vector of the LSTM unit. \sigma_g is the sigmoid function, and \sigma_c, \sigma_h are hyperbolic tangent functions. The W, U, and b terms refer to the weight matrices and bias vectors that need to be estimated.

Motivated by the strong commonality in volatility across stocks, we consider the following three different schemes for model training.

• Single denotes that we train customized models F_i for each stock i, as in [13, 34]. We use a stock's own past RVs only as predictor features, namely x_{i,t} = (RV_{i,t}, RV_{i,t-h}, . . . , RV_{i,t-(p-1)h}), and no market features, where p represents the number of lags.

• Universal denotes that we train models with the pooled data of all stocks in our universe. That is, F_i is the same for all stocks in Eqn (6). As in the Single scheme, we use a stock's own past RVs only as predictor features and no market features. Sirignano and Cont [58] showed that a model trained on pooled data outperforms asset-specific models trained on the time series of any given stock, in the sense of forecasting the direction of price moves. Bollerslev et al. [11], Engle and Sokalska [29] suggested that models estimated under the Universal setting yield superior out-of-sample risk forecasts, compared to models under the Single setting, when forecasting daily realized volatility.

• Augmented denotes that we train models with the pooled data of all stocks in our universe, but in addition, we also incorporate a predictor which takes into account the impact of the market realized volatility (e.g. Bollerslev et al. [11]), in order to leverage the commonality in volatility shown in Section 4. Namely, F_i is the same for all stocks in Eqn (6). We use both the individual features x_{i,t} and the market features x_{M,t} = (RV_{M,t}, RV_{M,t-h}, . . . , RV_{M,t-(p-1)h}) as predictors. Note that for HAR-d models under the Augmented setting, we include aggregated market features as additional features, and use OLS to estimate the parameters.

A schematic sketch of the three training schemes is given below.
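The sketch referenced at the end of the list above illustrates the three training schemes with a small scikit-learn MLP (which uses ReLU and Adam by default); the hidden-layer sizes, the single market lag, and the panel `rv` carried over from the earlier sketches are illustrative assumptions rather than the paper's exact configuration.

```python
import pandas as pd
from sklearn.neural_network import MLPRegressor

def lagged(panel: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Stack per-stock lagged RVs into long format: one row per (time, stock)."""
    frames = []
    for stock in panel.columns:
        df = pd.DataFrame({f"lag_{k}": panel[stock].shift(k) for k in range(1, n_lags + 1)})
        df["market_lag_1"] = panel.mean(axis=1).shift(1)   # lagged market RV feature
        df["target"] = panel[stock]
        df["stock"] = stock
        frames.append(df.dropna())
    return pd.concat(frames)

data = lagged(rv, n_lags=12)        # rv: log-RV panel from the earlier sketches
feat = [c for c in data.columns if c.startswith("lag_")]

# Single: one MLP per stock, trained on its own lags only
single = {s: MLPRegressor(hidden_layer_sizes=(32, 16), random_state=0)
             .fit(g[feat], g["target"]) for s, g in data.groupby("stock")}

# Universal: one MLP on the pooled data, own lags only
universal = MLPRegressor(hidden_layer_sizes=(32, 16), random_state=0)
universal.fit(data[feat], data["target"])

# Augmented: pooled data plus the lagged market RV as an extra predictor
augmented = MLPRegressor(hidden_layer_sizes=(32, 16), random_state=0)
augmented.fit(data[feat + ["market_lag_1"]], data["target"])
```

For brevity, only one market lag is included here, whereas the text uses p lags of the market RV under the Augmented scheme.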
In summary, compared to the benchmark Single setting, we gradually incorporate cross-asset and market information into the training of the models. The hyperparameters for each model are summarized in Appendix B.

To assess the predictive performance of the RV forecasts, we compute the following metrics on the out-of-sample data (see [52, 29, 51, 11, 13, 55]).

• Mean squared error (MSE): MSE = \frac{1}{N \cdot \#T_{test}} \sum_{i=1}^{N} \sum_{t \in T_{test}} ( RV_{i,t} - \widehat{RV}_{i,t} )^2, where \widehat{RV}_{i,t} represents the predicted value of RV_{i,t} on the test data, N is the number of stocks in our universe, T_{test} is the testing period, and \#T_{test} is the length of the testing period.

Diebold-Mariano (DM) test. This test is used to assess whether differences in forecasting accuracy between two time series models are statistically significant [26, 25]. Denote the loss associated with forecast error e_t by L(e_t), e.g. L(e_t) = e_t^2. Then the loss difference between the forecasts of models a and b is given by d_t = L(e^{(a)}_t) - L(e^{(b)}_t), where e^{(a)}_t (e^{(b)}_t) represents the forecast error from model a (b), respectively. The DM test assumes that the loss differential d_t is covariance stationary. Under this assumption, the test statistic DM = \bar{d} / \hat{\sigma}_{\bar{d}} is asymptotically standard normal, where \bar{d} is the sample mean of d_t and \hat{\sigma}_{\bar{d}} is a consistent estimate of its standard error. Following Gu et al. [31], we apply a modified DM test to make pairwise comparisons of the models' performance when forecasting multi-asset volatility. In other words, the modified DM test compares the cross-sectional average of prediction errors from each model, rather than comparing errors for each individual stock, i.e. d_t = \frac{1}{N} \sum_{i=1}^{N} \big[ (e^{(a)}_{i,t})^2 - (e^{(b)}_{i,t})^2 \big], where e^{(a)}_{i,t} denotes the forecast error of model a for stock i at time t.

Table 3: Results for predicting future realized volatility over multiple horizons using different models under three training schemes. For each horizon, the model with the best (second best) out-of-sample performance in MSE is highlighted in red (blue), respectively.

Based on Table 3, our main findings are as follows.

• For OLS, pooling the data (Universal) yields only marginal gains, while significant benefits arise from Augmented.

• We observe similar findings for LASSO as for OLS, suggesting that regularization does not further aid performance.

• XGBoost slightly underperforms the linear regressions, possibly due to overfitting. The best out-of-sample performance of XGBoost is achieved under the Universal setting; incorporating the market volatility does not provide additional predictive power.

• MLPs and LSTMs achieve state-of-the-art accuracy across all measures and intraday horizons, suggesting the presence of complex interactions among predictors. Further analysis is provided in Section 6.3.

• Linear models slightly outperform MLPs and LSTMs at the 1-day horizon. This is perhaps expected, and might be due to the availability of only a small amount of data at the 1-day horizon, causing the neural networks to underperform due to the lack of training data.
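Returning to the modified DM test described before Table 3, the following is a minimal sketch under simplifying assumptions: the loss differential is the cross-sectional average of squared-error differences, and its standard error is computed without the HAC (Newey-West) correction that a careful implementation would typically include.

```python
import numpy as np
from scipy import stats

def modified_dm_test(err_a: np.ndarray, err_b: np.ndarray):
    """Modified Diebold-Mariano test in the spirit of Gu et al.: compare the
    cross-sectional average squared errors of two models over time.
    err_a, err_b: (T, N) arrays of forecast errors (time x stocks)."""
    d = (err_a ** 2).mean(axis=1) - (err_b ** 2).mean(axis=1)  # loss differential per period
    T = len(d)
    dbar = d.mean()
    se = d.std(ddof=1) / np.sqrt(T)          # simple (lag-0) variance of the mean
    dm_stat = dbar / se
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
    return dm_stat, p_value

# Example with synthetic errors: a negative statistic means model a has lower loss
rng = np.random.default_rng(0)
e_a, e_b = rng.normal(0, 1.0, (500, 93)), rng.normal(0, 1.1, (500, 93))
print(modified_dm_test(e_a, e_b))
```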
Let us now consider the OLS model as an illustrative example for understanding the relative reduction in error. We compare its mean squared errors under the three schemes, at a monthly level, as shown in Figure 7. For better readability, we report the reduction in error of Universal relative to Single (denoted as Univ-Single), the reduction of Augmented relative to Universal (denoted as Aug-Univ), and the reduction of Augmented relative to Single (denoted as Aug-Single). Note that Aug-Single = (Aug-Univ) + (Univ-Single). Negative values of ∆MSE indicate an improvement on out-of-sample data, and positive values indicate a degradation. To arrive at this figure, we average the ∆MSE values in each month, across stocks. Figure 7 reveals that the improvement of Universal compared to Single is relatively small but consistent. In terms of the benefits of Augmented, it is typically the case that incorporating the market volatility as an additional feature helps improve the forecasting performance, especially during turmoil periods.

An interesting question to investigate is whether the improvements of Universal or Augmented for individual stocks are associated with their commonality with the market volatility. To this end, we present the results in Figure 8 for each quintile bucket, sorted by stock commonality (computed from Eqn (5)). From this figure, we observe that the reduction of Augmented in out-of-sample MSE relative to Universal is explained by commonality to a large extent. Generally, the out-of-sample MSE is expected to decline steadily for stocks with higher commonality. Another interesting result from Figure 8 concerns the stocks with extreme values of the commonality.

This section provides intuition for why neural networks perform as strongly as they do, with an eye towards explainability. Due to the use of non-linear activation functions and multiple hidden layers, neural networks enjoy the benefit of allowing for potentially complex interactions among predictors, albeit at the cost of considerably reducing model interpretability. To better understand such a "black-box" technique, we provide the following analysis to help illustrate the inner workings of neural networks and explain their competitive performance.

Relative importance of predictors. In order to identify which variables are the most important for the prediction task at hand, we construct a metric (see Sadhwani et al. [56]) based on the sum of absolute partial derivatives (Sensitivity) of the predicted volatility. In particular, to quantify the importance of the k-th predictor, we compute

Sensitivity(k) = \sum_{i,t} \Big| \frac{\partial F(u)}{\partial u_k} \Big|_{u = u_{i,t}},

where F is the fitted model under the Augmented scheme, u represents the vector of predictors, u_k is the k-th element in u, and u_{i,t} represents the input features of stock i at time t. We normalize all variables' sensitivities such that they sum up to one. In the special case of a linear regression, the sensitivity measure is the normalized absolute slope coefficient. Considering the 65-min scenario as an example, Figure 9 reveals that, for both OLS and MLP, the sensitivity of the lagged features tends to decline as the lag increases. Additionally, we observe that the sensitivity values rise to a local peak every 6 lags, corresponding to 1 day. A distinct difference between the sensitivity values implied by OLS and the ones implied by MLP is that the latter places more weight on the lag-1 individual RV (Sensitivity = 0.90) and less on the lag-1 market RV (Sensitivity = 0.059). On the other hand, for OLS, the sensitivities of the lag-1 individual (resp. market) RV are 0.081 (resp. 0.069).

Interaction effects. To analyze the interactions between the two most significant features implied by the neural networks, we adopt an approach (e.g. Gu et al. [31], Choi et al. [17]) that focuses on the partial relations between a pair of input variables and the response, while fixing the other variables at their mean values; that is, we vary one of the two predictors while setting the other to a quantile value q of its distribution, and fix the remaining predictors at their means. Figure 9(b) first reveals that the predicted volatility is non-linear in the lag-1 individual RV. Moreover, as the individual RV increases in Figure 9(b), the distances between the curves become relatively smaller, conveying the message that, when an individual stock is very volatile, the market effect on it weakens.
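The sensitivity measure above is based on exact partial derivatives of the fitted model; the sketch below approximates them with one-sided finite differences so that it works with any fitted regressor (e.g. the pooled MLP from the earlier sketch), which is a simplification rather than the paper's exact computation.

```python
import numpy as np

def sensitivities(model, X: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Normalized importance of each predictor: average absolute finite-difference
    partial derivative of the fitted model's prediction, rescaled to sum to one."""
    base = model.predict(X)
    sens = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        bumped = X.copy()
        bumped[:, k] += eps
        sens[k] = np.mean(np.abs(model.predict(bumped) - base) / eps)
    return sens / sens.sum()

# Example: importance of lagged features for the pooled (Augmented) MLP fitted earlier
cols = feat + ["market_lag_1"]
print(dict(zip(cols, sensitivities(augmented, data[cols].to_numpy()))))
```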
To examine the models' ability to generalize and to address concerns about overfitting, we perform a stringent out-of-sample test, i.e. we use the trained models to forecast the volatility of new stocks that have not been included in the training sample, in the spirit of Sirignano and Cont [58], Choi et al. [17]. For better distinction, we denote the stocks used for estimating the machine learning models as raw stocks, and those new stocks not in the training sample as unseen stocks. We follow the procedure of training, validation, and testing periods described in Section 6.1. Specifically, to predict the RVs of unseen stocks in a particular year, we train and validate the models using the past data of raw stocks exclusively. In this experiment, we choose OLS models trained for each unseen stock as the baseline. The results are shown in Table 4. Note that models trained under Single cannot be applied to forecast unseen stocks, since they are trained for each specific raw stock individually. From Table 4, we observe that the neural networks trained on the raw stocks still outperform the other approaches, including the OLS baselines trained for each unseen stock.

Given the fact that intraday volatility exhibits a high and stable commonality (see Sections 3 and 4), we are interested in the potential benefits of using past intraday RVs to forecast daily RVs. Generally speaking, there are two broad families of models used to forecast daily volatility: (i) GARCH and stochastic volatility (SV) models that employ daily returns; and (ii) models that use daily RVs (e.g. ARFIMA [4], HAR [20], SHAR [54], HARQ [12]). Previous well-established studies have shown that, due to the utilization of the available intraday information, daily realized volatility is a superior proxy for the unobserved daily volatility, when compared to the parametric volatility measures generated from the GARCH and SV models of daily returns (see [4, 9, 42]). It is worth noting that, in the traditional daily RV forecasting models, only past daily RVs (or their alternatives) are included as predictors. Even though this is a mainstream approach in the literature, it does not benefit to the full extent from the availability of intraday data. In this section, we introduce a set of commonly used models in which daily variables (such as RV and semi-RV [54]) are employed as predictors. For simplicity, we refer to these models as traditional approaches.

OLS. The first benchmark explored is the OLS model with different lengths of lagged daily RVs, as shown in Eqn (9). Two specifications of OLS are considered: (1) OLS(1d) denotes only using the lag-one RV as the predictor; (2) OLS(21d) denotes using the RVs of the last month as predictors.

HAR. See 5.1.1.

SHAR. Recently, Patton and Sheppard [54] proposed the Semi-variance-HAR (SHAR) model as an extension of the standard HAR model (see further details in 5.1.1), in order to exploit the well-documented leverage effect [8] by decomposing the total RV of the first lag via signed intraday returns, as shown in Eqn (18). In other words, the lag-one RV in SHAR (Eqn (19)) is split into the sum of squared positive returns and the sum of squared negative returns, as follows:

RV^{+}_{i,t} = \sum_{s} r_{i,s}^2 \, 1\{ r_{i,s} > 0 \}, \quad RV^{-}_{i,t} = \sum_{s} r_{i,s}^2 \, 1\{ r_{i,s} < 0 \}, (18)

RV_{i,t+1d} = \beta_0 + \beta^{+} RV^{+}_{i,t} + \beta^{-} RV^{-}_{i,t} + \beta^{(w)} RV^{(w)}_{i,t} + \beta^{(m)} RV^{(m)}_{i,t} + \varepsilon_{i,t+1d}. (19)

Recall that RV^{(w)}_{i,t} (resp. RV^{(m)}_{i,t}) denotes the aggregated weekly (resp. monthly) realized volatility.

HARQ. Bollerslev et al. [12] pointed out that the beta coefficients in the HAR model can be affected by measurement errors in the realized volatilities. By exploiting the asymptotic theory for high-frequency realized volatility estimation, the authors propose an easy-to-implement model, termed HARQ (Eqn (21)). The realized quarticity (RQ) is estimated according to Eqn (20), aiming to correct the measurement errors:

RQ^{(d)}_t = \frac{M}{3} \sum_{j=0}^{M-1} r^4_{t-j\Delta}, (20)

RV_{t+1d} = \alpha + \big( \beta^{(d)} + \beta^{(d)Q} \sqrt{RQ^{(d)}_t} \big) RV^{(d)}_t + \beta^{(w)} RV^{(w)}_t + \beta^{(m)} RV^{(m)}_t + \varepsilon_{t+1d}, (21)

where M is the number of intraday returns per day and \Delta is the sampling interval.
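As an illustration of the HARQ regressors in Eqns (20)-(21), the sketch below builds the daily, weekly (5-day), and monthly (21-day) RV terms plus the sqrt(RQ) x RV interaction from synthetic daily RV and realized-quarticity series, and fits the model by OLS; whether RV enters in levels or logs, and the exact aggregation windows, follow the paper's conventions only approximately.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def harq_features(rv_d: pd.Series, rq_d: pd.Series) -> pd.DataFrame:
    """HARQ regressors: lagged daily/weekly/monthly RV plus the sqrt(RQ) x RV term."""
    X = pd.DataFrame({
        "rv_d": rv_d.shift(1),
        "rv_w": rv_d.rolling(5).mean().shift(1),
        "rv_m": rv_d.rolling(21).mean().shift(1),
    })
    X["rq_rv"] = np.sqrt(rq_d.shift(1)) * X["rv_d"]   # measurement-error correction term
    return X

# Synthetic daily series purely for demonstration
dates = pd.bdate_range("2020-01-01", periods=300)
rv_daily = pd.Series(np.abs(np.random.randn(300)) * 0.01, index=dates)
rq_daily = pd.Series(np.abs(np.random.randn(300)) * 1e-6, index=dates)
X = sm.add_constant(harq_features(rv_daily, rq_daily))
fit = sm.OLS(rv_daily, X, missing="drop").fit()
print(fit.params)
```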
Previous sections concluded that the most recent RV plays a more important role in forecasting future volatility. Motivated by the fact that intraday volatility has a high and stable commonality, we propose a new prediction approach for forecasting daily volatility, denoted as the Intraday2Daily approach. Recall that RV_{i,t}(h) denotes the intraday realized volatility of stock i at time t, computed for intervals of length h; in the Intraday2Daily approach, the non-overlapping intraday RVs of the previous day are used as predictors in place of the lag-one daily RV. Figure 11 illustrates the comparison between the traditional approaches and our Intraday2Daily approach.

The forecasting performance of the benchmark models with daily variables is summarized in Table 5. From this table, we find that the SHAR model generally performs as well as the standard HAR model, in line with Bollerslev et al. [12]. HARQ outperforms the other commonly used models, including HAR and SHAR, when applied to the individual stocks studied in the present paper. Table 6 reports the results of the models combined with the Intraday2Daily approach. In other words, the models in Table 6 use sub-sampled intraday RVs rather than the lag-one total RV. For example, regarding HARQ (Eqn (21)) in Panel A of Table 6, the lag-one total RV is replaced by non-overlapping intraday RVs. For the other machine learning models in Panel B of Table 6, the intraday RVs of the last day and the daily RVs of the previous month are input as predictors.

• By comparing Table 5 with Panel A of Table 6, we establish that the Intraday2Daily approach generally helps improve the out-of-sample performance of the benchmark models. For example, under the Single setting, compared to OLS(1d) using daily RVs (MSE = 0.284), 65-min RVs (MSE = 0.267) improve the out-of-sample performance.

• OLS(21d) (resp. HARQ) is the best (resp. second best) benchmark model when combined with the Intraday2Daily approach.

• MLPs with intraday RVs again achieve the best out-of-sample performance.

To offer a more comprehensive understanding of the performance of time-of-day dependent RVs, we examine the coefficients of the Intraday2Daily OLS model trained under Augmented. Recall that, before we input features into the model, we rescale them to have a mean of zero and a standard deviation of one; hence we can compare the coefficients of the different lagged variables. For better readability, we only report the first 13 (= 390/30) coefficients of the OLS model using 30-min features in Figure 12, corresponding to the observations of RV in the most recent day. We observe that the RV of the most recent half-hour (15:30-16:00) carries the largest coefficient.

To explain why the most recent half-hour RV is the most important predictor for forecasting the next day's volatility, we provide a handful of perspectives. According to [7], a significant fraction of the total daily trading volume occurs in the last half-hour of the trading day. For example, for the first few months of 2020 in the US equity market, about 23% of the trading volume in the 3,000 largest stocks by market value took place after 15:30. We also conclude from Figure 5 that the market achieves the highest level of consensus near the close. Therefore, volatility near the close in the previous trading day might contain more useful information for predicting the next day's volatility.
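Before concluding, here is a minimal sketch of the Intraday2Daily feature construction: the previous day's intraday RVs (one column per time-of-day bucket) replace the lag-one daily RV, alongside the daily RVs of the previous month; the long-format input with `date`, `bucket`, and `rv` columns, and the simple daily aggregation used in the demo, are hypothetical choices rather than the paper's data structure.

```python
import numpy as np
import pandas as pd

def intraday2daily_features(rv_intraday: pd.DataFrame, rv_daily: pd.Series,
                            n_daily_lags: int = 21) -> pd.DataFrame:
    """Predictors in the spirit of Intraday2Daily: the previous day's intraday RVs
    (one column per bucket) plus lagged daily RVs (lags 2..n_daily_lags)."""
    buckets = rv_intraday.pivot(index="date", columns="bucket", values="rv").shift(1)
    buckets.columns = [f"intra_{b}" for b in buckets.columns]
    daily = pd.concat({f"daily_lag_{k}": rv_daily.shift(k)
                       for k in range(2, n_daily_lags + 1)}, axis=1)
    return pd.concat([buckets, daily], axis=1).dropna()

# Tiny synthetic example: 60 days x 13 half-hour buckets for one stock
days = pd.bdate_range("2021-01-04", periods=60)
long = pd.DataFrame({
    "date": np.repeat(days, 13),
    "bucket": list(range(13)) * 60,
    "rv": np.abs(np.random.randn(60 * 13)),
})
rv_day = long.groupby("date")["rv"].mean()   # stand-in daily RV aggregation
X = intraday2daily_features(long, rv_day)
print(X.shape)   # target: today's daily RV, to be fitted with any of the models above
```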
In this paper, the commonality in intraday volatility over multiple horizons across the U.S. equity market is studied. By leveraging the information content of commonality, we have demonstrated that, for most machine learning models in our analysis, pooling stock data together (Universal) and adding the market volatility as an additional predictor (Augmented) generally improves the out-of-sample performance, in comparison with asset-specific models (Single). We show that neural networks achieve superior performance, possibly due to their ability to uncover and model complex interactions among predictors. To alleviate concerns of overfitting, we perform a stringent out-of-sample test, applying the trained models to unseen stocks, and conclude that neural networks still outperform traditional models. Lastly, and perhaps most importantly, motivated by the high commonality in intraday volatility, we propose a new approach (Intraday2Daily) to forecast daily RVs using past intraday RVs. The empirical findings suggest that the proposed Intraday2Daily approach generally yields superior out-of-sample forecasts. We further examine the coefficients in the Intraday2Daily OLS models, and the results suggest that volatility near the close (15:30-16:00) in the previous day (lag = 1) is the most important predictor.

There are a number of interesting avenues to explore in future research. One direction pertains to the assessment of whether other characteristics, such as sector RVs, can improve the forecast of future realized volatility, since in the present work we have only considered the individual and market RVs. Another interesting direction is to apply the underlying idea of the Intraday2Daily approach to other risk metrics, e.g. Value-at-Risk, which could potentially benefit from time-of-day dependent features.

Previous studies, especially in the behavioural finance field, have shown that investor sentiment can affect stock prices [5, 46, 21, 11, 22, 43, 32]. Keynes [44] argued that animal spirits affect consumer confidence, thereby moving prices in times of high levels of uncertainty. De Long et al. [24], Shleifer and Summers [57], Kogan et al. [46] found that investor sentiment induces excess volatility. Karolyi et al. [43] considered the investor sentiment index as an important source of commonality in liquidity. Bollerslev et al. [11] found a monotonic relationship between volatility and sentiment, possibly driven by correlated trading. In this section, we are interested in the relation between investor sentiment and commonality in volatility. Traditionally, there are two approaches to measuring investor sentiment [22], i.e. market-based measures and survey-based indices. Following Baker and Wurgler [5], we consider the daily market volatility index (VIX) from the Chicago Board Options Exchange as the market sentiment measure. We use the Consumer Sentiment Index (CSI) by the University of Michigan's Survey Research Center as a proxy for survey-based indices (see Carroll et al. [15], Lemmon and Portniaguina [47]). Generally speaking, the CSI is a consumer confidence index, calculated by subtracting the percentage of unfavorable consumer replies from the percentage of favorable ones. Following Da et al. [22], we also include a news-based index, EPU, proposed by Baker et al. [6] to measure policy-related economic uncertainty. As suggested by Morck et al. [50], the raw monthly commonality measures R^2_{(h),m} (computed based on Eqn (5)) are inappropriate to use as the dependent variable in regressions, because they are bounded by 0 and 1.
Consistent with [50, 43, 23], we take the logistic transformation of R^2_{(h),m}, i.e. \log\big( R^2_{(h),m} / (1 - R^2_{(h),m}) \big), denoted by (R^2_{(h),m})^L, in the following empirical analysis. To explain the commonality in volatility, we regress (R^2_{(h),m})^L against the aforementioned three indices, as shown in Eqn (23):

(R^2_{(h),m})^L = \alpha + \beta_1 CSI_m + \beta_2 VIX_m + \beta_3 EPU_m + \varepsilon_m. (23)

Table 7: Results of time series regressions of the average commonality in volatility, (R^2_{(h),m})^L, against three sentiment measures: VIX, CSI, and EPU. Superscript * denotes significance at the 5% level.

Besides the market volatility (VIX), we also find a significant effect of consumer sentiment (CSI) on the commonality of volatility over every studied horizon. The level of commonality is higher in times of higher market volatility and higher consumer sentiment. In addition, we observe that the coefficients of VIX and CSI for commonality in intraday volatility (especially for 30-min and 65-min) are substantially smaller than those in the daily case.

There is no hyperparameter to tune in HAR-d and OLS. For LASSO, we use the standard 5-fold cross-validation method to determine \lambda. Hyperparameters for the other models are summarized as follows.

References:
• Intraday periodicity and volatility persistence in financial markets
• Volatility and correlation forecasting. Handbook of Economic Forecasting, Vol. 1
• The distribution of realized stock return volatility
• Modeling and forecasting realized volatility
• Investor sentiment in the stock market
• Measuring economic policy uncertainty
• The 30 minutes that can make or break the trading day
• Measuring downside risk: realised semivariance
• Econometric analysis of realized volatility and its use in estimating stochastic volatility models
• Estimating daily volatility in financial markets utilizing intraday data
• Risk everywhere: Modeling and managing volatility
• Exploiting the errors: A simple approach for improved volatility forecasting
• Realized volatility forecasting with neural networks
• Volatility comovement: A multifrequency approach
• Does consumer sentiment forecast household spending? If so, why?
• XGBoost: A scalable tree boosting system
• Alpha go everywhere: Machine learning and international stock returns. Available at SSRN 3489679
• Commonality in liquidity
• A machine learning approach to volatility forecasting
• A simple approximate long-memory model of realized volatility
• In search of attention
• The sum of all fears: Investor sentiment and asset prices
• Commonality in news around the world
• Noise trader risk in financial markets
• Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests
• Comparing predictive accuracy
• Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation
• What good is a volatility model? In Forecasting Volatility in the Financial Markets
• Forecasting intraday volatility in the US equity market: Multiplicative component GARCH
• Greedy function approximation: A gradient boosting machine
• Empirical asset pricing via machine learning
• Stock market declines and liquidity
• Neural network ensembles
• A forecast comparison of volatility models: Does anything beat a GARCH(1,1)?
• Realized variance and market microstructure noise
• A transaction data study of weekly and intradaily patterns in stock returns
• The elements of statistical learning: Data mining, inference, and prediction
• The common factor in idiosyncratic volatility: Quantitative asset pricing implications
• Firm volatility in granular networks
• Long short-term memory
• Multilayer feedforward networks are universal approximators
• Forecasting realised volatility using ARFIMA and HAR models
• Understanding commonality in liquidity around the world
• The general theory of employment, interest, and money
• Adam: A method for stochastic optimization
• The price impact and survival of irrational traders
• Consumer confidence and asset prices: Some empirical evidence
• Forecasting realized volatility: An automatic system using many features and many machine learning algorithms
• Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes
• The information content of stock markets: Why do emerging markets have synchronous stock price movements?
• Increasing the information content of realized volatility forecasts
• Volatility forecast comparison using imperfect volatility proxies
• Evaluating volatility and correlation forecasts
• Good volatility, bad volatility: Signed jumps and the persistence of volatility
• Machine learning for realised volatility forecasting. Available at SSRN 3707796
• Deep learning for mortgage risk
• The noise trader approach to finance
• Universal features of price formation in financial markets: perspectives from deep learning
• The incremental volatility information in one million foreign exchange quotations
• Deep learning stock volatility with Google domestic trends