FX Market Volatility Modelling: Can we use low-frequency data?

Authors: Štefan Lyócsa, Tomáš Plíhal, Tomáš Výrost
Date: 2020-09-30
Journal: Financ Res Lett
DOI: 10.1016/j.frl.2020.101776

Abstract: High-frequency data tend to be costly, subject to microstructure noise, difficult to manage, and lead to high computational costs. Is it always worth the extra effort? We compare the forecasting accuracy of low- and high-frequency volatility models on the market of six major foreign exchange (FX) pairs. Our results indicate that for short forecast horizons, high-frequency models dominate their low-frequency counterparts, particularly in periods of increased volatility. With an increased forecast horizon, low-frequency volatility models become competitive, suggesting that if high-frequency data are not available, low-frequency data can be used to estimate and predict long-term volatility in FX markets.

Highlights:
• We study 1-to-66 day-ahead volatility forecasts of six major FX pairs.
• For short forecast horizons, high-frequency models dominate low-frequency models.
• High-frequency models are more accurate during market distress.
• For longer forecast horizons, low-frequency volatility models become competitive.
• Low-frequency data can be used to accurately predict long-term volatility.

1. Introduction

The turmoil in global financial markets in 2008, the European debt crisis, (geo)political uncertainties, the oil-price wars in 2019 and 2020, and the outbreak of COVID-19 in 2020 have resulted in surges of volatility in financial markets worldwide. Volatility forecasts matter to a wide range of market participants. For example, investors use volatility estimates for pricing financial derivatives. Fund managers might set specific risk levels that are, in turn, influenced by the predicted level of volatility. Risk levels are also targeted by banks to fulfill specific Basel criteria. Volatility might even be traded, using options or artificial indices linked to market volatility (Poon and Granger, 2003).
Times of extreme volatility also create pressure to rebalance portfolios, and the likelihood of contagion between markets increases (Kodres and Pritsker, 2002). Market participants are thus interested in measuring, managing, and forecasting market volatility to determine the value of their investments and to prepare and communicate their planned market decisions. The literature on volatility forecasting is rich and unfolds around the available volatility estimators. Initially, volatility was calculated from low-frequency, daily data. The first generation of generalized autoregressive conditional heteroscedasticity (GARCH) models (Bollerslev, 1986) emerged in the 1990s and early 2000s and is represented by numerous variations using low-frequency data, e.g., EGARCH, GJR-GARCH, AP-ARCH, N-GARCH, NA-GARCH, I-GARCH, and FIGARCH (for an earlier review, see Poon and Granger, 2003). The GARCH class of models offers competitive forecasts and can capture many stylized facts about volatility, particularly the volatility clustering effect. With the greater availability of high-frequency data in the late 2000s, research shifted toward high-frequency (intraday) volatility estimators and models. The heterogeneous autoregressive (HAR) models of Corsi (2009) utilized high-frequency data and the realized volatility estimator of Andersen and Bollerslev (1998) and Andersen et al. (2001). The empirical evidence suggests that models of volatility based on high-frequency estimators provide superior forecasts to models based on low-frequency data (e.g., Andersen et al., 2007, Koopman et al., 2005, Corsi et al., 2010, Busch et al., 2011, Horpestad et al., 2019). Although the basic HAR model of Corsi (2009) is appealingly simple and appears to capture the short- and long-term dependency of the volatility process adequately (e.g., Andersen et al., 2007, Vortelinos, 2017), the literature has raised several issues related to the effect of microstructure noise.¹
Previously, Andersen et al. (2001) acknowledged that for the realized volatility (a high-frequency estimator of daily volatility) to be efficient and unbiased, one needs high-quality data from actively traded assets. As a response, alternative estimators have emerged (e.g., Ait-Sahalia et al., 2005, Bandi and Russell, 2008, Barndorff-Nielsen et al., 2008, Andersen et al., 2011, Liu et al., 2015a). The second generation of GARCH models bridges these two strands of the literature by relying on the latent volatility model (the GARCH concept) while also using high-frequency data. The key ideas of the realized-GARCH model were presented by Hansen et al. (2012), and several alternative models emerged thereafter (e.g., Wu and Xie, 2019, Xie and Yu, 2019). Despite the wide interest of academia, the existing literature provides evidence only that i) volatility estimators based on high-frequency data are theoretically preferred (Andersen et al., 2001) and ii) in the day-ahead predictive setting, models using high-frequency data provide superior performance (e.g., Andersen et al., 2007, Koopman et al., 2005, Corsi et al., 2010, Busch et al., 2011, Horpestad et al., 2019). Over longer horizons, averaging daily low-frequency volatility estimators across multiple days should reduce the effect of noise. Intuitively, intraday price fluctuations should not greatly contribute to month-ahead volatility forecasts.

¹ The basic specification of the HAR model has also been enhanced, e.g., by the inclusion of semivariances (Patton and Sheppard, 2015), the disentanglement of the realized volatility into continuous and jump components (e.g., Andersen et al., 2012), the introduction of the measurement error of the realized volatility into the HAR model as in Bollerslev et al. (2016), the inclusion of nontrading volatility components (Lyócsa and Molnár, 2017, Lyócsa and Todorova, 2020), and the use of hidden Markov chains (Luo et al., 2019).
Therefore, with an increasing forecast horizon, the difference between using high- or low-frequency volatility estimators should decrease, at which point low-frequency volatility models should tend to provide forecasts similarly accurate to those of high-frequency volatility models. Evidence on the relative (un)importance of low-frequency volatility models for multiple-day-ahead forecasts is lacking, which is intriguing given that the heterogeneity of market participants has increased (with different needs and investment horizons, Wooldridge, 2019) and that in many real-world scenarios, market participants, e.g., derivative traders, are more interested in long-term forecasts. We fill this gap in the literature. In a recent study, Ma et al. (2018) showed that when low- and high-frequency volatility forecasts are combined appropriately, the accuracy increases for the Shanghai Stock Exchange Composite Index and the S&P 500 index. Therefore, low-frequency data could provide additional information complementary to the available high-frequency data. Nevertheless, the study of Ma et al. (2018) is centered around day-ahead forecasts, where high-frequency volatility models should have the edge. In this study, we present the results from a volatility forecasting framework that compares the forecasting accuracy of several low- and high-frequency volatility models as a function of the forecast horizon. Our market of interest is represented by six major currency pairs.² The implications of our research could be substantial for some market participants. If low-frequency volatility models provide competitive performance, one could argue that high-frequency data are not always worth the much higher costs. Daily foreign exchange data are freely available from various sources³, but the availability of high-frequency foreign exchange data depends on the policy of the given broker or bank, and the data are not always free.⁴
Even if data are available⁵ for free, they are subject to various constraints, e.g., limited licensing (e.g., use for academic purposes only) or availability only for short time periods or at a specific sampling frequency. Moreover, the use of high-frequency data raises other issues; most notably, working with high-frequency data requires appropriate cleaning and processing of the data. For example, the approximate size of the daily EUR/USD data from 2005 to 2019 is 120 kB, that of the 5-second data is 350 MB, and that of the tick-by-tick data is 15 GB. Processing daily data and estimating the models is overall much faster than processing and estimating models that use high-frequency data, where one needs to clean and prepare each line of the 15 GB of data.⁶ Therefore, the processing, data management, and computational demands are much higher for high-frequency data and might not be worth the greater effort. Our results illustrate the dominance of high-frequency estimators for forecasting one-day-ahead volatility: models that utilize high-frequency data or their combinations provide superior results. However, for longer forecast horizons, the combination of low-frequency volatility models provides forecasts statistically comparable to those of high-frequency volatility models and their combinations.

² Equities and commodities are addressed in a separate study and show qualitatively similar results.
³ E.g., finance.yahoo.com, investing.com.
⁴ For example, the well-known provider of high-frequency data, Tick Data (www.tickdata.com), provides tick-by-tick quote data (bid and ask prices) that are already cleaned and processed. Moreover, these data come from multiple contributors (banks and other market participants). The dataset that we used in our paper would cost approximately 8,100 USD after all discounts (July 2020).
⁵ E.g., Oanda, Dukascopy.
Our results suggest that for most foreign exchange (FX) pairs, low-frequency data represent a sufficient replacement for high-frequency data for forecast horizons of 5 or more days. Our study might therefore provide practitioners and policymakers with evidence supporting the use of high- or low-frequency volatility models in a particular setting.

2.1. Volatility estimators

2.1.1. High-frequency estimator

Given 5-minute intraday continuous returns r_{t,j} for day t = 1, 2, ..., T and intraday period j = 1, 2, ..., N, the usual realized variance estimator⁷ (e.g., Andersen and Bollerslev, 1998, Andersen et al., 2001) is defined as:

RV_t = Σ_{j=1}^{N} r_{t,j}^2

Many alternative estimators of quadratic variation exist to address the inherent microstructure noise (e.g., Zhang et al., 2006, Jacod et al., 2009, Andersen et al., 2012). Our choice of the 5-minute realized variance estimator is motivated by Liu et al. (2015b), who compared the empirical accuracy of several estimators across many assets⁸ and found that consistently outperforming the simple 5-minute realized variance is difficult.

2.1.2. Low-frequency estimator

As an alternative, low-frequency estimator, we use range-based estimators, which are more efficient than the usual daily squared return (e.g., Molnár, 2012). Motivated by Patton and Sheppard (2009), we increase the efficiency of the estimation process by combining three range-based estimators via a simple average. Specifically, given the natural logarithms of the opening (O_t), high (H_t), low (L_t), and closing (C_t) prices on day t, the Parkinson (1980) estimator is:

RB_t^P = (H_t − L_t)^2 / (4 ln 2)

The Garman and Klass (1980) estimator is:

RB_t^GK = 0.5 (H_t − L_t)^2 − (2 ln 2 − 1)(C_t − O_t)^2

Both estimators assume that the price follows a driftless geometric Brownian motion.

⁶ One needs to do this only once, but we want to stress that different types of skills and experience are also required to work with high-frequency data.
⁷ In the following text, we use the terms variance and volatility interchangeably.
⁸ Their comparison also included foreign exchange market futures.
Allowing for an arbitrary drift, Rogers and Satchell (1991) derived the following estimator:

RB_t^RS = (H_t − O_t)(H_t − C_t) + (L_t − O_t)(L_t − C_t)

The range-based estimator used in our empirical setting is the average (following Patton and Sheppard, 2009) of the above three estimators:

RB_t = (RB_t^P + RB_t^GK + RB_t^RS) / 3

The motivation behind using the (naive) equally weighted average is the assumption that we have no prior information on which estimator might be more accurate for a given trading day.⁹ Should this simplified approach lead to competitive multiple-day-ahead volatility forecasts, it follows that a more sophisticated combination of low-frequency estimators might make the results even stronger.

2.2. Volatility models

In this section, we describe what we refer to as high- and low-frequency volatility models. As the name suggests, high-frequency volatility models utilize the realized variance as the estimator of volatility, whereas low-frequency models use the range-based estimator. We use three classes of models: the heterogeneous autoregressive (HAR) model of Corsi (2009), the autoregressive fractionally integrated (ARFIMA) model, and the realized generalized autoregressive conditional heteroscedasticity (realized-GARCH) model of Hansen et al. (2012). These models were selected because they can use either high- or low-frequency volatility estimators in a straightforward manner. Moreover, all these models have been proven capable of replicating long-memory and volatility-clustering effects.

⁹ The development and statistical verification of a method that continuously updates weights is left for further research. However, motivated by reviewer insights, we ran our analysis and compared the results with low-frequency volatility models that use each of the three range-based estimators separately. A short discussion is presented in Section 4.3, 'Individual range-based low-frequency volatility forecasts'.
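To make the estimators above concrete, the following is a minimal Python sketch (the function names and the NumPy implementation are ours, not from the paper) of the 5-minute realized variance and the equally weighted range-based estimator computed from log prices:

```python
import numpy as np

def realized_variance(intraday_returns):
    """5-minute realized variance: the sum of squared intraday log returns."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))

def range_based_variance(o, h, l, c):
    """Equally weighted average of the Parkinson, Garman-Klass and
    Rogers-Satchell daily variance estimators.

    Inputs are natural logarithms of the opening, high, low and closing prices.
    """
    parkinson = (h - l) ** 2 / (4.0 * np.log(2.0))
    garman_klass = 0.5 * (h - l) ** 2 - (2.0 * np.log(2.0) - 1.0) * (c - o) ** 2
    # Rogers-Satchell allows for a drift in the price process
    rogers_satchell = (h - o) * (h - c) + (l - o) * (l - c)
    return (parkinson + garman_klass + rogers_satchell) / 3.0
```

Because both functions operate on log prices and log returns, they return daily variance estimates on the same scale, which is what allows the high- and low-frequency estimates to be compared and, later, combined.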
In the past decade, the simple HAR model proposed by Corsi (2009) has gained popularity, since it is easy to estimate and tends to perform better than competing first-generation GARCH models (Horpestad et al., 2019). Let RV_{t+1,t+h} be the daily average realized variance calculated over the next h days. In this study, we are especially interested in the role of low-frequency estimators in multiple-day-ahead volatility forecasts. We employ 1-to-66 trading-day-ahead forecasts. According to the recent Bank for International Settlements (BIS) survey, in 2019, 78% of the over-the-counter (OTC) foreign exchange derivatives had a maturity of less than one year.¹⁰ For the low-frequency volatility models to be useful to a wide array of participants, they should produce competitive forecasts up to a forecast horizon of one year or less. As our analysis shows that after a few weeks, the low-frequency volatility models tend to provide competitive forecasts across all FX pairs, we have used 66 trading days (three months) as a compromise between a few weeks and one year. Our baseline HAR model is therefore specified as:

RV_{t+1,t+h} = β_0 + β_1 RV_t + β_2 RV_{t,t−4} + β_3 RV_{t,t−21} + β_4 RV_{t,t−65} + ε_{t+1,t+h}    (6)

where RV_t is the realized variance, and RV_{t,t−4}, RV_{t,t−21}, and RV_{t,t−65} are average realized variances calculated over the past 5, 22 and 66 days, respectively. The multiple-component volatility structure in (6) is standard in the HAR literature (e.g., Vortelinos, 2017); our specification differs only in that, in addition to the one-month component, we also incorporate a three-month volatility component, which is motivated by the fact that we are also predicting three-month (66-day-ahead) volatility. The model is denoted RV-HAR, and the corresponding low-frequency, range-based version is denoted RB-HAR. We consider two other popular versions of the HAR model that aim to model the asymmetric volatility observed in financial markets.
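As an illustrative sketch of how the baseline HAR regression can be estimated by ordinary least squares and used for direct h-day-ahead forecasting (the function name and the NumPy implementation are our assumptions; the paper does not prescribe a particular implementation):

```python
import numpy as np

def har_forecast(rv, h, lags=(1, 5, 22, 66)):
    """Direct h-day-ahead HAR forecast.

    rv   : 1-D array of daily realized (or range-based) variances
    h    : forecast horizon; the target is the average variance over the next h days
    lags : averaging windows for the daily, weekly, monthly and quarterly
           volatility components (the paper's four-component specification)

    Returns the OLS forecast of mean(rv[T+1 : T+h]) made on the last day T.
    """
    rv = np.asarray(rv, dtype=float)
    T = len(rv)
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag - 1, T - h):
        # regressors: average variance over the past 1, 5, 22 and 66 days
        x = [rv[t - k + 1 : t + 1].mean() for k in lags]
        rows.append([1.0] + x)
        # target: average variance over the next h days (direct projection)
        targets.append(rv[t + 1 : t + 1 + h].mean())
    X, y = np.array(rows), np.array(targets)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # forecast from the most recent observations
    x_last = [1.0] + [rv[T - k : T].mean() for k in lags]
    return float(np.dot(x_last, beta))
```

Direct projection, as above, re-estimates the regression for each horizon h; this matches the paper's statement that HAR forecasts are produced directly rather than by recursive iteration.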
Let NSV_t and PSV_t denote the negative and positive semivariances, respectively (e.g., Barndorff-Nielsen et al., 2010, Patton and Sheppard, 2015):

NSV_t = Σ_{j=1}^{N} r_{t,j}^2 I[r_{t,j} < 0],    PSV_t = Σ_{j=1}^{N} r_{t,j}^2 I[r_{t,j} > 0]

where I represents an indicator function that returns one if the condition in square brackets holds and zero otherwise. The HAR model is then defined as:

RV_{t+1,t+h} = β_0 + β_1^− NSV_t + β_1^+ PSV_t + β_2 RV_{t,t−4} + β_3 RV_{t,t−21} + β_4 RV_{t,t−65} + ε_{t+1,t+h}

We use only one-day lags of NSV_t and PSV_t to limit the number of estimated parameters, which might otherwise deteriorate the forecasting performance in an out-of-sample context. Such simplified models were also considered by Patton and Sheppard (2015) and Bollerslev et al. (2016). This model is denoted SV-RV-HAR. As its low-frequency, range-based counterpart, we use the following specification:

RB_{t+1,t+h} = β_0 + β_1 RB_t + β_2 RB_{t,t−4} + β_3 RB_t × I[R_t < 0] + β_4 RB_{t,t−21} + β_5 RB_{t,t−65} + ε_{t+1,t+h}

where R_t is the daily return, and the term β_3 RB_t × I[R_t < 0] captures the asymmetric volatility response. The model is denoted ARB-RB-HAR. The final two specifications are also motivated by the asymmetric volatility literature, namely, Corsi and Reno (2009) and Horpestad et al. (2019); in these specifications, the coefficient β_4 captures the asymmetric effect, and β_3 controls for the size effect (see Preve (2019) for a discussion of estimating HAR models).

We next use an ARFIMA-GARCH model, for which the mean equation models the variance:

(1 − L)^d (RV_t − μ) = ε_t,    ε_t = √(v_t) η_t

where d is the differencing parameter (e.g., Granger and Joyeux, 1980), v_t is the time-varying volatility¹¹, and η_t is an iid variable following the flexible distribution of Johnson (1949a,b). The variance equation is the exponential GARCH model of Nelson (1991):

ln v_t = ω + β ln v_{t−1} + α z_{t−1} + γ (|z_{t−1}| − E|z_{t−1}|)

The sign and the size effects are captured by α and γ, and z_t is the standardized innovation. The high-frequency volatility model employs the realized variance and is denoted RV-ARFIMA-GARCH, and the range-based version is denoted RB-ARFIMA-GARCH. Finally, due to their popularity and the development of more sophisticated second-generation GARCH models, we use the realized-GARCH model of Hansen et al. (2012), which can be adjusted to work with high- or low-frequency volatility estimators.
The mean equation models daily returns:

R_t = μ + √(h_t) z_t,    z_t ~ iid(0, 1)

The variance and the measurement equations are:

log h_t = ω + β log h_{t−1} + γ log x_{t−1}
log x_t = ξ + φ log h_t + τ(z_t) + u_t

where x_t denotes the volatility estimator and τ(z_t) is a leverage function. Originally, Hansen et al. (2012) used the realized variance (x_t = RV_t), in which case we denote the model realized-GARCH. If the range-based estimator is used instead, the model is called range-GARCH.

The forecasting procedure uses a rolling-window framework. The algorithm is as follows:
1. Select observations t = 1, 2, ..., T_e.
2. Estimate the volatility models.
3. Using the estimated parameters and observations, predict volatility at T_e + 1. For HAR models, multiple-day-ahead forecasts are predicted directly, while for ARFIMA-GARCH and realized-GARCH models, multiple-day-ahead forecasts are calculated recursively.
4. Shift the estimation window by using observations t = 2, 3, ..., T_e + 1 and repeat steps 2 to 4 until the end of the sample.
The estimation window size is set to T_e = 1000.

We draw on the ideas of Bates and Granger (1969) and use simple combination techniques to mitigate model uncertainty (Timmermann, 2006). Forecasts are combined across all high-frequency volatility models, across all low-frequency volatility models, and across all ten high- and low-frequency volatility models. To combine forecasts, we use weighted averages, where the weights are given by the discounted forecast error. Let F_t^m denote the forecast from model m and F_t the corresponding proxy, the realized variance RV_t. Our first combination is a simple average across all M forecasts:

C_{H,t}^{Ave} = (1/M) Σ_{m=1}^{M} F_t^m

Here, the subscript H means that we averaged across high-frequency models. For low-frequency models, we use the subscript L, and for a combination across both classes of forecasts, we use HL. The loss (to be defined in the next section) is L_t(F_t^m, F_t) and, for simplicity, is denoted L_t.

¹¹ In this case, it is the time-varying volatility of the variance.
We use the discounted forecast error to weight each loss value such that recent losses have a higher weight than losses further in the past, and we calculate the average loss over a period of T (out-of-sample) observations:

L̄^m = Σ_{i=0}^{T−1} δ^i L_{t−i} / Σ_{i=0}^{T−1} δ^i

where δ is the weighting parameter. With δ = 1, all losses have equal weights; the lower δ is, the higher the relative weight of the most recent losses. We choose δ = 0.975 and observe almost no qualitative change in the results for δ = 0.950 or δ = 0.900. The weighted losses are calculated from the most recent 200 predictions, which we refer to as the size of the calibration sample. Thus, the first combination forecast is available for the 1201st observation of the initial sample (estimation window + calibration sample + 1). Our second combination is formed as the weighted trimmed mean:

C_{H,t}^{Trim} = Σ_{m=2}^{M−1} L^{(m),*} F_t^{(m)}

Here, F_t^{(m)} represents the ordered forecasts, i.e., the lowest and the highest forecasts are excluded, and L^{(m),*} are the weights derived from the corresponding discounted losses, rescaled to sum to one. The final combination is a weighted average across the three best-performing models and is denoted C_H^{Top}. As noted in the previous section, our proxy is the realized variance, RV_t, which in the subsequent equations is denoted F_t. This approach clearly places low-frequency models at a disadvantage, but we argue that this is the only meaningful way to test whether low-frequency models can achieve performance comparable to that of high-frequency models. We evaluate the forecasts of our model specifications using two statistical loss functions and the model confidence set (MCS). According to Patton (2011), the mean square error (MSE) and quasi-likelihood (QLIKE) loss functions provide a consistent ranking of forecasts, even if the proxy of the underlying latent volatility is measured with noise. As the QLIKE loss function is less sensitive to extreme values and penalizes the underestimation of volatility more strongly, we use it to present our key results.¹²
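The discounted-forecast-error weighting and the QLIKE loss can be sketched as follows. This is an illustrative implementation under our own assumptions: the helper names are ours, the QLIKE form is the standard Patton (2011) normalization, and we turn discounted losses into weights via their inverses (so that recently accurate models receive more weight), which is one common reading of discounted-forecast-error combination:

```python
import numpy as np

def qlike(forecast, proxy):
    """QLIKE loss (Patton, 2011): penalizes under-prediction of variance
    more heavily than over-prediction."""
    ratio = proxy / forecast
    return ratio - np.log(ratio) - 1.0

def dfe_weights(losses, delta=0.975):
    """Discounted-forecast-error combination weights.

    losses : (T, M) array of past losses for M models over the last T
             calibration observations (the paper uses T = 200), oldest first.
    delta  : discount factor; delta = 1 weights all losses equally, and
             lower values emphasize the most recent losses.
    """
    losses = np.asarray(losses, dtype=float)
    T = losses.shape[0]
    disc = delta ** np.arange(T - 1, -1, -1)   # the most recent loss gets weight 1
    mean_loss = disc @ losses / disc.sum()     # discounted average loss per model
    w = 1.0 / mean_loss                        # accurate models -> larger weight
    return w / w.sum()                         # rescale to sum to one

def combine(forecasts, weights):
    """Weighted-average combination forecast for one day."""
    return float(np.dot(weights, forecasts))
```

For the trimmed variant, one would sort the M forecasts, drop the lowest and highest, and renormalize the remaining weights before calling `combine`.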
Statistical evaluation is conducted based on the MCS proposed by Hansen et al. (2011). This algorithm is suitable when models are nested, when a benchmark model is not specified, and when multiple models are evaluated, i.e., it controls for data-snooping bias. The MCS algorithm finds the 'superior set of models', which comprises the models with the same predictive ability at the selected confidence level.

We study the foreign exchange market, the market with the largest turnover in the world. We collect data from OANDA using a 5-minute calendar sampling scheme over a 24-hour trading window that starts at 22:00 UTC (the end of the New York session). Due to low liquidity, weekends are removed from the analysis to avoid estimation bias, as is standard in the literature (e.g., Dacorogna et al., 2001, Andersen et al., 2007, Aloud et al., 2013, Gau and Wu, 2017).

[Table 1 about here]

The descriptive statistics for our daily volatility measures and returns are presented in Table 1. We note several interesting differences between the high- and low-frequency variance estimates. First, the distribution of the low-frequency variance estimates shows a higher spread of values, which we would expect from a noisier estimate. Specifically, the low-frequency variance estimate has an approximately 30% larger standard deviation, with more skew and higher kurtosis. Second, on average, the low-frequency estimator is slightly smaller than its high-frequency counterpart.¹⁴ Third, the persistence of the high-frequency estimator is higher and shows longer memory. This characteristic might prove useful in HAR models, which specifically exploit this persistence. Fourth, the correlation between the daily high- and low-frequency variance estimates is high across all pairs (between 0.83 and 0.96; see the notes to Table 1).

For illustration purposes, Figures 1 and 2 plot the daily realized variance for the six FX pairs and the corresponding day-ahead forecasts from the ARFIMA-GARCH models, which tend to produce the most accurate day-ahead forecasts for both the high- and low-frequency volatility models.
The forecasts tend to follow the realized variances but are unable to replicate sudden spikes in volatility, a phenomenon also visible in other forecasting studies.

[Figure 3 and Table 2 about here]

Our key results are visualized in Figure 3; the values in bold and with the dagger symbol represent the models that belong to the MCS, i.e., the predictive abilities of the models in bold are considered to be equally good. For example, the best models for forecasting one-day volatility for EUR/USD (second column in Table 2) are C_H^{Ave} and C_H^{Trim}, which combine the forecasts from the high-frequency models (Panel C).

[Table 6 about here]

The estimated coefficients reported in Table 6 show that, over time, the high-frequency models tend to produce more precise forecasts for the AUD/USD, EUR/USD, GBP/USD, and USD/CAD FX pairs, and their accuracy also increases during more volatile periods. The opposite is true for USD/JPY, and the results are nonsignificant for USD/CHF. These results suggest that more accurate forecasting models could be designed with a conditional combination that exploits the level of market volatility.

4.3. Individual range-based low-frequency volatility forecasts

Up to now, for our low-frequency volatility models, we have assumed that we do not have any ex ante information about which of the range-based estimators leads to more accurate volatility forecasts. Here, we discuss the results from low-frequency volatility models estimated separately with the Garman and Klass (1980), Parkinson (1980) and Rogers and Satchell (1991) estimators. Detailed tabulated results are available upon request. Our general observation does not change: increasing the forecast horizon leads to more competitive forecasts from the low-frequency volatility models regardless of the range-based estimator employed. Among the individual range-based estimators, the Garman and Klass (1980) estimator leads to lower forecast errors than those generated by the volatility models based on the equally weighted average of the range-based estimators.
However, this does not mean that one should blindly prefer the Garman and Klass (1980) estimator, as there are two caveats. First, using only one range-based estimator has occasionally led to very inaccurate forecasts, which could successfully be avoided by using the average of the three range-based estimators; for example, in the day-ahead setting for the GBP/USD and USD/CAD pairs, the forecast errors from the RB-HAR models based on the Garman and Klass (1980) estimator alone were occasionally much larger. These examples suggest that in many practical scenarios, using the average across estimators should be preferred to using the individual estimators.

5. Conclusion

As many subjects interact with the FX market, predicting the market's uncertainty is crucial for improved risk management. While high-frequency data lead to superior volatility estimates, the acquisition, data management and computational costs associated with such data cannot be covered by all market participants. Moreover, low-frequency data are publicly available and much easier to work with. This leads to the question in our title: 'Can we use low-frequency data?'. In this paper, we compare the forecasting performance of several volatility models that use low-frequency volatility estimates, high-frequency volatility estimates, or both. On the basis of a sample of six major currency pairs, our results suggest that for short forecast horizons (from 1 to 5 days), high-frequency models dominate their low-frequency counterparts. As the forecast horizon increases, the advantage of the high-frequency models disappears, and the low- and high-frequency forecasts become statistically comparable. The answer to the question posed in the title is thus: if high-frequency data are not available, then low-frequency data can be used to estimate and predict long-term market volatility. Moreover, regardless of whether one relies on high- or low-frequency volatility models, one should utilize combination forecasts.
The Mincer and Zarnowitz (1969) tests further suggest that at least part of the inaccuracy of the low-frequency volatility forecasts is due to bias. Finally, we find that high-frequency models tend to be superior during periods of increased volatility. These results have implications for researchers and investors alike, as they demonstrate that low-frequency volatility models can, under some circumstances, provide performance competitive with that of high-frequency models. Our study notes that high-frequency data might not always be worth the much higher acquisition, data management and processing costs, especially if the forecast horizon of interest is sufficiently long.

Note: ρ(.) is the value of the autocorrelation coefficient at the given lag. SD is the standard deviation. The correlations between the high- and low-frequency variance estimators are 0.90, 0.86, 0.96, 0.83, 0.88, and 0.89 for AUD/USD, EUR/USD, GBP/USD, USD/CAD, USD/CHF, and USD/JPY, respectively.

Notes: The values in bold and with the † symbol denote the model confidence set for the given currency pair. In other words, we cannot reject the hypothesis that these models have the same predictive performance at the level of α = 0.15. All models and forecast combinations are described in Section 2.
Note: The results correspond to modelling the loss differential between the C_H^{Trim} and C_L^{Trim} forecasting models by means of the lagged realized variance and a trend variable. All coefficients are multiplied by 10^4. Significance is based on the variance-covariance matrix estimated using a quadratic spectral weighting scheme and Newey-West automatic bandwidth selection. */**/*** correspond to the 10%, 5% and 1% significance levels.

References

- Ait-Sahalia, Y., Mykland, P.A., Zhang, L. How often to sample a continuous-time process in the presence of market microstructure noise. The Review of Financial Studies.
- Aloud, M., et al. Stylized facts of trading activity in the high-frequency FX market: An empirical study.
- Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H. The distribution of realized stock return volatility.
- Andersen, T.G., Bollerslev, T. Answering the skeptics: Yes, standard volatility models do provide accurate forecasts.
- Andersen, T.G., Bollerslev, T., Diebold, F.X. Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics.
- Andersen, T.G., Bollerslev, T., Meddahi, N. Realized volatility forecasting and market microstructure noise.
- Andersen, T.G., Dobrev, D., Schaumburg, E. Jump-robust volatility estimation using nearest neighbor truncation.
- Bandi, F.M., Russell, J.R. Microstructure noise, realized variance, and optimal sampling.
- Barndorff-Nielsen, O.E., Kinnebrock, S., Shephard, N. Measuring downside risk: realised semivariance.
- Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise.
- Bates, J.M., Granger, C.W.J. The combination of forecasts.
- Bollerslev, T. Generalized autoregressive conditional heteroskedasticity.
- Are low-frequency data really uninformative? A ...

CRediT author statement

Štefan Lyócsa: Conceptualization, Methodology, Software, Data curation, Formal analysis, Writing - Original draft preparation, Writing - Review & Editing, Visualization, Funding acquisition, Project administration. Tomáš Plíhal: Conceptualization, Methodology, Data curation, Investigation, Writing - Original draft preparation.

Acknowledgements

Financial support from the ...(to be added later)... is acknowledged gratefully.