key: cord-0166419-btiafish
authors: Hecq, Alain; Voisin, Elisa
title: Predicting crashes in oil prices during the COVID-19 pandemic with mixed causal-noncausal models
date: 2019-11-25
journal: nan
DOI: nan
sha: f739fcbd660564392297989eea4c345bd5f37c9a
doc_id: 166419
cord_uid: btiafish

This paper sheds light on how transforming or detrending a series can substantially affect the predictions of mixed causal-noncausal (MAR) models, namely dynamic processes that depend not only on their lags but also on their leads. MAR models have been successfully applied to commodity prices because they can generate nonlinear features such as locally explosive episodes (denoted here as bubbles) in a strictly stationary setting. We consider multiple detrending methods and investigate, using Monte Carlo simulations, to what extent they preserve the bubble patterns observed in the raw data. MAR models rely on the dynamics observed in the series alone and do not require an economic background to construct a structural model, which can sometimes be intricate to specify or which may lack parsimony. We investigate oil prices and estimate probabilities of crashes before and during the first 2020 wave of the COVID-19 pandemic. We consider three different mechanical detrending methods and compare them to a detrending performed using the level of strategic petroleum reserves.

This paper aims at forecasting Brent and WTI oil price series during the first wave of the COVID-19 pandemic outbreak in 2020 using the recent literature on mixed causal-noncausal autoregressive models (hereafter MAR), namely time series processes with both lag and lead components and non-Gaussian errors. This new specification can, in a parsimonious way, model locally explosive episodes in a strictly stationary setting. It can therefore capture nonlinear features such as bubbles (which are defined here as a persistent increase followed by a sudden crash), often observed in commodity prices, while standard linear autoregressive models (e.g. ARMA models) cannot do so. MAR models have successfully been implemented on several commodity price series (see, inter alia, Hecq, Issler, and Telg, 2020; Fries and Zakoïan, 2019; Gouriéroux and Zakoïan, 2017; Cubadda, Hecq, and Telg, 2019; Lof and Nyberg, 2017; Karapanagiotidis, 2014). 1 Similarly to Gouriéroux and Zakoïan (2013), our goal when introducing a lead component in oil prices is not to provide an economic justification for the existence of a rational bubble. However, the link with a present value model between prices and dividends (Campbell and Shiller, 1987) can enrich the discussion, and it also explains why economic fundamentals for oil prices are difficult to find. This motivates our choice to use proxies such as technical methods to extract the bubble component. Let us indeed consider a general model (see Diba and Grossman, 1988) in which the real current stock price $P_t$ is linked to the present value of next period's expected stock price $P_{t+1}$, dividend payments $D_{t+1}$ and an unobserved variable $u_{t+1}$,

$$P_t = \frac{1}{1+r} E_t\left(P_{t+1} + D_{t+1} + u_{t+1}\right), \qquad (1)$$

with $E_t$ the conditional expectation given the information set known at time $t$. The discount factor is $\frac{1}{1+r}$, with $r$ being a time-invariant interest rate. The general solution of (1) is given in, e.g., Diba and Grossman (1988).

1 An alternative strategy to ours is to consider autoregressive processes with breaks in coefficients. Indeed, autoregressive processes with successively unit roots, explosive and stable stationary episodes are also able to capture locally explosive episodes. See among many others Phillips, Wu, and Yu (2011) and the survey papers by Homm and Breitung (2012) or Bertelsen (2019) . Yet, for the purpose of forecasting, we argue for the choice of a model with constant coefficients as more adequate.

As can be seen in Figure 3 in Section 4, oil price series do not appear to be stationary over time. Consequently, before estimating MAR models we intend to extract a smooth time-varying trend to render the series stationary without affecting the dynamics. By extracting a trend from the series we do not claim to identify the fundamental values of oil prices but instead detrend the series while preserving the dynamics of the prices in the remaining cycle and more specifically the noncausal component. As such, we obtain stationary series that retain their forward-looking aspect and which can be modeled as MAR processes. Obviously, a wrong detrending can give misleading results if it alters the dynamics of the cycle. Consequently, investigating the impact of different technical detrending filters on the identification of MAR models is the first contribution of this paper. Similarly to what Canova (1998) does for business cycles, we investigate the extent to which the identification of causal and noncausal dynamics is sensitive to different filters. We then study the consequences on the predictive densities of oil prices after applying different detrending methods. Inspired by the work of Kilian and Murphy (2014), who constructed a structural VAR model of the global market for crude oil, we make use of the US crude oil Strategic Petroleum Reserve (SPR), a sub-part of total petroleum stocks, as a potential trend in oil prices in Section 4. Hence, the second contribution of this paper is to compare the MAR estimations and predictions of oil price series after using technical detrending with the results obtained after detrending with the SPR levels.

The rest of this paper is organized as follows. Section 2 describes mixed causal-noncausal models and explains the different technical detrending methods employed in this analysis, leaving the locally explosive components in the cycle. In Section 3, the impact of the different detrending filters on model identification is investigated using a Monte Carlo study, based on trends estimated in oil price series. We investigate the identification of the models but also the magnitude of the estimated coefficients, as they are the main drivers of the predictions. Section 4 analyzes the impact of these filters on the WTI and the Brent crude oil price series for ex-post and real-time analyses. We compare the results with those obtained after detrending with US SPR levels. We show how each detrending approach affects the probabilities of oil price crashes in the period capturing the first 2020 wave of the COVID-19 pandemic. Section 5 concludes.

2 Mixed causal-noncausal models and filtering

2.1 The model

MAR(r, s) denotes dynamic processes that depend on their r lags, as for usual autoregressive processes, but also on their s leads in the following multiplicative form

$$\Phi(L)\Psi(L^{-1})y_t = \varepsilon_t,$$

with $L$ the backward operator, i.e., $Ly_t = y_{t-1}$ gives lags and $L^{-1}y_t = y_{t+1}$ produces leads. When $\Psi(L^{-1}) = (1 - \psi_1 L^{-1} - \dots - \psi_s L^{-s}) = 1$, namely when $\psi_1 = \dots = \psi_s = 0$, the univariate process $y_t$ is a purely causal autoregressive process, denoted MAR(r,0) or simply AR(r) model, $\Phi(L)y_t = \varepsilon_t$. Reciprocally, the process is a purely noncausal MAR(0,s) model, $\Psi(L^{-1})y_t = \varepsilon_t$, when $\Phi(L) = (1 - \phi_1 L - \dots - \phi_r L^{r}) = 1$, namely when $\phi_1 = \dots = \phi_r = 0$. The roots of both the causal and noncausal polynomials are assumed to lie outside the unit circle, that is, $\Phi(z) = 0$ and $\Psi(z) = 0$ only for $|z| > 1$. These conditions imply that the series $y_t$ admits a two-sided moving average representation $y_t = \sum_{j=-\infty}^{\infty} \gamma_j \varepsilon_{t-j}$, such that $\gamma_j = 0$ for all $j < 0$ implies a purely causal process $y_t$ (with respect to $\varepsilon_t$) and $\gamma_j = 0$ for all $j > 0$ a purely noncausal one (Lanne and Saikkonen, 2011). Error terms $\varepsilon_t$ are assumed iid (and not only weak white noise) and non-Gaussian (with potentially infinite variance) to ensure the identifiability of the causal and the noncausal parts (Breidt, Davis, Li, and Rosenblatt, 1991; Gouriéroux and Zakoïan, 2015). While noncausal models are strictly stationary, their conditional moments are time-varying. A stationary purely noncausal MAR(0,1) Cauchy-distributed process has a unit root in its conditional mean and exhibits ARCH-type effects (see Zakoïan, 2017 and Cavaliere, Nielsen, and Rahbek, 2018). Figure 1 shows purely causal (a) and purely noncausal (b) trajectories induced by the same Student's t(2)-distributed errors, both with coefficient 0.8 and 200 observations. For the purely causal process, a shock is unforeseeable and affects the series only once it has happened, inducing a large jump in the series. On the other hand, for purely noncausal processes, a shock impacts the process ahead of time, mirroring the purely causal trajectory. Indeed, we see that the series already reacts to a positive shock by increasing until a sudden crash, creating bubble patterns. This anticipative aspect is widely observed in financial and economic time series. The detrended Brent crude oil prices shown in Figure 5 noticeably exhibit such features, the most apparent episode being the 2008 financial crisis. A combination of causal and noncausal dynamics consequently creates some asymmetry around a shock, varying with the magnitude of the respective coefficients.
The advantage with oil prices is that they have already undergone bubbles in the past, and those previous locally explosive episodes help identify MAR models. When a series follows a long and abnormal increase for the first time, an explosive process is difficult to distinguish from a stationary locally explosive one.
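The mirroring between causal and noncausal trajectories described above can be illustrated with a minimal simulation sketch. It is not the code used for Figure 1: the function names, the zero starting values and the way Student's t draws are built from the standard library are assumptions made for illustration only.

```python
import random


def student_t(df, rng):
    """Draw one Student's t variate as Z / sqrt(chi2 / df) (illustrative helper)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(df/2, scale=2)
    return z / (chi2 / df) ** 0.5


def simulate_causal(errors, phi):
    """y_t = phi * y_{t-1} + e_t, started from zero (assumed initial value)."""
    path, prev = [], 0.0
    for e in errors:
        prev = phi * prev + e
        path.append(prev)
    return path


def simulate_noncausal(errors, psi):
    """y_t = psi * y_{t+1} + e_t, built backward from a zero terminal value."""
    path, nxt = [0.0] * len(errors), 0.0
    for t in range(len(errors) - 1, -1, -1):
        nxt = psi * nxt + errors[t]
        path[t] = nxt
    return path


rng = random.Random(0)
errors = [student_t(2, rng) for _ in range(200)]
causal = simulate_causal(errors, 0.8)        # jumps after a shock, then decays
noncausal = simulate_noncausal(errors, 0.8)  # builds up before a shock, then crashes
```

With a single unit shock at date 5, the causal path is zero before the shock and decays afterwards, while the noncausal path rises towards the shock and drops to zero right after it.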

The focus of this paper is on the probabilities of crashes. Predictions are performed using the approximation methods of Gourieroux and Jasiak (2016) and Lanne, Luoto, and Saikkonen (2012), since no closed form of the predictive density exists when the errors of the process follow a Student's t distribution. For a detailed analysis of the two approximation methods see . 2

The requirement that $y_t$ be stationary for both lag and lead polynomials gave rise to different strategies to transform nonstationary series into stationary ones. Hecq et al. (2020) and Cubadda et al. (2019) assume 3 that their commodity price series are I(1) and work with the returns $\Delta y_t$. However, this operation eliminates most of the locally explosive behaviors, and the transformed series consist of many spikes instead.

In this paper, we capture the trending behavior of the observed series, denoted $\tilde{y}_t$, in different ways using the general form

$$\tilde{y}_t = f_t + y_t.$$

In this framework, $\tilde{y}_t$ is the (potentially nonstationary) observed series and $f_t$ a generic trend function. The deviation of $\tilde{y}_t$ from its trend is an MAR(r, s) process. Several authors, although sometimes not explicitly, use this decomposition. Cavaliere et al. (2018) opt for a particular time period with no trend and hence use only an intercept, $f_t = \mu$. Hencic and Gouriéroux (2015) detrend $\tilde{y}_t$ using a polynomial trend function of order three. In summary, we could consider several choices among the following deterministic trends,

2 A description of how the methods are used in this analysis can be found in Appendix A in the online material.

3 The locally explosive features of the data make unit root tests doubtful.

with k some positive integer and t = 1, 2, . . . , T.

Note (see Section 4) that since a higher polynomial order allows for more flexibility, we consider polynomial trends of order four and six for the trending pattern of the monthly oil price series considered in this analysis. More complex trends, constructed as a combination of the aforementioned examples, could also be considered, such as (multiple) breaks in trends for instance. The Hodrick-Prescott filter (HP) has also been used before detecting bubbles in Nickel monthly prices. The HP filter, as opposed to the aforementioned deterministic trends, extracts the trend process $f^{(4)}_t$ via a minimization that relies on a penalizing parameter denoted $\lambda$.
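For reference, the minimization behind the HP filter is usually written as the penalized least-squares problem below. This is the standard textbook form (Hodrick and Prescott, 1997), expressed in the notation of this section; the index ranges are an assumption about the exact convention.

```latex
\{f^{(4)}_t\}_{t=1}^{T} = \arg\min_{\{f_t\}_{t=1}^{T}}
\left\{ \sum_{t=1}^{T} \left(\tilde{y}_t - f_t\right)^2
+ \lambda \sum_{t=2}^{T-1} \left[ (f_{t+1} - f_t) - (f_t - f_{t-1}) \right]^2 \right\}
```

The first sum penalizes deviations of the trend from the data, and the second penalizes changes in the slope of the trend.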

The larger this parameter, the smoother the trend component is (that is, with $\lambda$ approaching infinity, the extracted trend becomes linear). For details about the HP filter see Hodrick and Prescott (1997). It is now commonly accepted to use $\lambda = 1\,600$ for quarterly data. For other frequencies, the rule of thumb consists in adjusting the parameter to the frequency relative to quarterly data, $\lambda = \left(\frac{\text{number of observations per year}}{4}\right)^{i} \times 1\,600$, with either $i = 2$ (Backus and Kehoe, 1992) or $i = 4$ (Ravn and Uhlig, 2002), yielding respectively a penalizing parameter of 14 400 and 129 600 for monthly series. Most criticisms of the HP filter concern its application to series with complicated stochastic and deterministic trends. Phillips and Shi (2019) propose an adaptation of the filter improving its accuracy for such series. 4 We investigate in Section 3 the potential dynamic distortions that can be induced by HP filtering (see among others Hamilton, 2018) but find no significant distortions of the mixed causal-noncausal dynamics.
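As a worked illustration of the rule of thumb and of the filter itself, the sketch below rescales the quarterly benchmark and computes an HP trend. It uses the standard first-order condition $(I + \lambda D'D) f = \tilde{y}$, with $D$ the second-difference matrix. The function names `hp_lambda` and `hp_trend` are hypothetical, and the plain-Python banded solver is an assumption chosen to keep the example dependency-free; it is not the implementation used in the paper.

```python
def hp_lambda(obs_per_year, power):
    """Rescale the quarterly benchmark lambda = 1600 to another sampling frequency."""
    return (obs_per_year / 4) ** power * 1600


def hp_trend(series, lam):
    """Return the HP trend f solving (I + lam * D'D) f = y."""
    n = len(series)
    # Build A = I + lam * D'D as a dense matrix (fine for a few hundred points).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coef = (1.0, -2.0, 1.0)  # one row of the second-difference matrix D
    for row in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[row + p][row + q] += lam * coef[p] * coef[q]
    rhs = [float(v) for v in series]
    # Gaussian elimination without pivoting: A is symmetric positive definite
    # and pentadiagonal, so only two rows below the pivot need updating.
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            factor = a[i][k] / a[k][k]
            for j in range(k, min(k + 3, n)):
                a[i][j] -= factor * a[k][j]
            rhs[i] -= factor * rhs[k]
    # Back substitution on the remaining upper band.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(a[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = (rhs[i] - acc) / a[i][i]
    return trend


print(hp_lambda(12, 2), hp_lambda(12, 4))  # 14400.0 129600.0
```

The cycle is then obtained as the observed series minus `hp_trend(series, lam)`. A linear series has zero second differences, so it is returned unchanged as its own trend whatever the value of the penalizing parameter.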

Note that we are not interested in the exact value of a forecast but rather in its direction and potential magnitude. This is why we extract smooth trends to preserve the dynamics in the series. This allows us to estimate predictive densities of oil prices based on the statistical properties of the data alone in a parsimonious way, and not from the construction of complicated structural models. However, wrongly detrending the series could have a significant impact on the estimation of the noncausal dynamics of the process, which could in turn strongly under- or overestimate the longevity of explosive episodes and therefore the probabilities of crashes and of turning points.

The aim of this section is to analyze the effect of wrongly detrending a series, both on the identification of the MAR model and on the subsequent predictions performed with the resulting model. We base this analysis on stylized facts observed in oil prices series.

We simulate 5 000 trajectories for 12 distinct data generating processes (hereafter dgp), composed of a trend and a stationary dynamic process denoted as cycle. All dgps are generated by Student's t-distributed errors with 2 degrees of freedom, a value frequently observed in financial time series, and with 400 observations. For the cycles, we consider purely noncausal processes with a lead coefficient of 0.8, purely causal processes with a lag coefficient of 0.6 and mixed causal-noncausal processes with a lag coefficient of 0.6 and a lead coefficient of 0.8. The heavy-tailed distribution generates extreme values, inducing bubble-like phenomena in processes with noncausal components. We are interested in mostly forward-looking processes characterized by long-lasting bubbles, hence the choice of coefficients. We consider three different deterministic trends: a linear trend with breaks (denoted breaks) and two trend polynomials up to orders 4 and 6 (denoted respectively $\tau_4$ and $\tau_6$ for simplicity). The coefficients of the trends were estimated on the monthly WTI crude oil price series between 1986 and 2019. Figure 2 depicts the three mentioned trends to which purely causal, noncausal and mixed causal-noncausal trajectories are added. Additionally, we consider processes with an intercept only. This results overall in 12 sets of 5 000 trajectories of the form $\tilde{y}_t = f_t + y_t$. Four detrending methods are employed for each trajectory, with the general form $\tilde{y}_t = \hat{f}_t + \hat{y}_t$. Estimated polynomial trends of orders 4 and 6 and HP filters with $\lambda = 14\,400$ and $\lambda = 129\,600$ are applied (respectively denoted $t_4$, $t_6$, $HP_1$ and $HP_2$). 5 To gauge and compare the accuracy of the detrending methods, Table 1 shows the average mean squared errors (MSE) between the true cycle of $\tilde{y}_t$ ($y_t$) and the one obtained after detrending ($\hat{y}_t$). The average MSEs are computed over the 5 000 replications of each dgp and for the four detrending approaches,

$$MSE_{k,d} = \frac{1}{5\,000} \sum_{i=1}^{5\,000} \frac{1}{T} \sum_{t=1}^{T} \left(y^{(k)}_{i,t} - \hat{y}^{(k,d)}_{i,t}\right)^2,$$

where k indicates the dgp, d the detrending method used, and i the i -th replication with 1 ≤ i ≤ 5 000.
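A minimal sketch of this accuracy measure, for one dgp and one detrending method, is given below. The function name `average_mse` and the list-of-lists layout are assumptions for illustration; the per-replication MSE is taken to be the time average of squared differences between the true and the estimated cycle.

```python
def average_mse(true_cycles, estimated_cycles):
    """Average, over replications, of the per-replication MSE between cycles.

    true_cycles and estimated_cycles are lists of equally long trajectories,
    one pair per replication (layout assumed for illustration).
    """
    total = 0.0
    for y, y_hat in zip(true_cycles, estimated_cycles):
        total += sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return total / len(true_cycles)


# Two toy replications of length two: per-replication MSEs are 1.0 and 2.0.
print(average_mse([[0, 0], [1, 1]], [[1, 1], [1, 3]]))  # 1.5
```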

The MSEs are minimized when the correct polynomial trend is employed or, in the absence of a trend in the dgp, when the lower order (4 in this case) is employed. However, underestimating the order of the polynomial trend leads to significantly larger discrepancies. Distortions between the true cycle and the detrended series are larger for mixed causal-noncausal processes than for purely causal or noncausal processes. Furthermore, in the presence of noncausal dynamics the HP filter with $\lambda = 14\,400$ ($HP_1$) distorts the series more than $HP_2$. Hence, we can expect that a low penalizing parameter in the HP filter mostly captures some of the noncausal dynamics. However, $HP_1$ distorts the least the cycles to which the linear trend with breaks was added. It is the method that best manages to mimic this non-smooth trend, owing to the flexibility induced by its low penalizing parameter.

Notes: The table reports the average MSEs over 5 000 trajectories with sample size T = 400. $HP_1$ corresponds to the HP filter with $\lambda = 14\,400$ and $HP_2$ to the HP filter with $\lambda = 129\,600$.

To investigate the impact of detrending on dynamic processes, we perform MAR estimations on the raw and detrended series from each dgp. The estimation of MAR models first consists in estimating the pseudo causal lag order. Since mixed, purely causal and purely noncausal processes share the same autocorrelation structure, we can estimate the order of autocorrelation (p) by OLS using information criteria. Once this order p is estimated, the identification of the lag and lead orders (r and s respectively) is performed by maximum likelihood among all MAR(r,s) models such that r + s = p (Lanne and Saikkonen, 2011). We do so using the MARX package in R (Hecq, Lieb, and Telg, 2017). Table 2 presents the frequencies of identifying wrong models in each of the 12 dgps, based on the detrending methods, with a maximum pseudo causal lag order of 4. 6 The proportions of wrongly identified pseudo lag orders in the first step of the estimation using BIC are reported (p = 1 and p = 2), as well as the proportions of wrongly identified MAR models, namely when at least one of the lag or lead orders is misidentified. We also report the frequency with which no noncausal dynamics are identified (s = 0). For the purely causal processes we only report, in the last column (s > 0), the frequency with which spurious noncausal dynamics are detected.
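The second step of this procedure can be sketched as a search over all pairs (r, s) with r + s = p. The snippet below is only an outline under stated assumptions: `log_likelihood` is a hypothetical user-supplied callable returning the maximized log-likelihood of an MAR(r, s) model, and it does not reproduce the interface of the MARX package.

```python
def mar_candidates(p):
    """All (r, s) pairs of lag and lead orders with r + s = p."""
    return [(r, p - r) for r in range(p + 1)]


def select_mar(p, log_likelihood):
    """Pick the candidate with the largest maximized log-likelihood.

    log_likelihood(r, s) is a hypothetical callable supplied by the user,
    e.g. a wrapper around a Student's t maximum likelihood routine.
    """
    return max(mar_candidates(p), key=lambda rs: log_likelihood(*rs))


print(mar_candidates(2))  # [(0, 2), (1, 1), (2, 0)]
```

For a pseudo causal lag order p = 2, the candidates are therefore the purely noncausal MAR(0,2), the mixed MAR(1,1) and the purely causal MAR(2,0).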

Let us first focus on the models with noncausal dynamics (the MAR(0,1) and MAR(1,1) dgps), for which we report the frequencies with which we over- or underestimate the pseudo causal lag order in the first step of the estimation. We can see that $HP_1$ under-performs relative to the other approaches. Indeed, around twice as many lag orders are wrongly estimated in the first step on average, with a maximum of 22.84% for the MAR(1,1) processes with breaks in the linear trend. However, this non-smooth trend seems to be difficult to capture with the filters considered in this analysis. We can see from the last five rows of Table 2 that detrending this type of process with breaks, with any of the four methods employed here, does not improve the correct identification of the orders of the model, and can even make it worse for MAR(1,1). This can be explained by the construction of the trend, which somehow mimics a bubble pattern, with a long and persistent expansion when the linear trend is present, followed by a sudden crash when the series returns to a stationary process. This might be mistaken for noncausal dynamics, ensuring a nonzero lead order identification when the series is not detrended. This claim is supported by the results in the last column, indicating large proportions of wrongly detected noncausal dynamics for each detrending approach, with 7.54% for $HP_1$ and more than 28% for the others. For the dgps with other trends (or only an intercept), $HP_1$ wrongly estimates the pseudo causal lag order at most 10.78% of the time. For the three other detrending methods, the pseudo lag order is wrongly identified in less than 7.3% of the cases. Note that when the lag order is wrongly identified, it is almost always due to over-identification. The discrepancy between the two HP filters is explained by the low penalizing parameter in $HP_1$, which allows the trend to mimic the series too closely. As a result, some of the dynamics of the MAR process are absorbed by the trend.

It is notably more harmful not to detrend when necessary than the contrary. As can be seen in the upper rows of Table 2, applying polynomial trends or $HP_2$ does not increase the proportions of wrongly identified models by more than 1.6% compared to estimations on the raw series. However, when the existing trend is ignored, the pseudo lag order is wrongly estimated twice as often on the raw series as on the detrended series, and the MAR models are wrongly identified up to 6 times more often than with the best performing detrending method. Furthermore, the incorrect identification of the pseudo lag order p accounts for most of the proportion of wrongly identified MAR models. If p is correctly estimated, the model is also correctly identified in more than 99% of the cases. Note that the pseudo causal lag order identified is never zero, meaning that no detrending completely absorbs all dynamics. Besides, the detrending methods removed the noncausal dynamics in no more than 0.62% of the cases, as indicated by the columns s = 0.

Let us now consider the last column, displaying the results for purely causal processes. We here investigate whether detrending can create spurious noncausal dynamics (s > 0). We find that (ignoring the dgp composed of the trend with breaks) as long as the polynomial trend order is not underestimated, in less than 3.46% of the cases noncausal dynamics was wrongly detected. For the processes with a polynomial trend of order 6, detrending with a polynomial trend of order 4 creates spurious noncausal dynamics in 60.02% of the cases.

Overall, for a dgp with noncausal dynamics, the impact of ignoring a trend is quite significant, while detrending when not necessary has negligible effects on model identification. Both the polynomial trends and the HP filter with $\lambda = 129\,600$ ($HP_2$) perform equally well with respect to identifying the correct orders of the model. Choosing a penalizing parameter $\lambda$ that is too low alters the dynamics of the process, as shown by the results from $HP_1$. All of the approaches almost always retain the noncausal dynamics, and rarely create spurious noncausal dynamics when nonexistent in the dgp (except when the polynomial trend order is underestimated). The lead order is not always the correct one, but in all cases no noncausal dynamics are found less than 0.62% of the time. The presented results only report identification of the model lag and lead orders. To better understand the impact of the detrending methods on the dynamics, the focus needs to be put on the estimated coefficients and parameters of the identified models.

Detailed results on the impact on estimated coefficients are available in Appendix B in the online material. Overall, we find that due to low penalization, $HP_1$ absorbs too much of the dynamics (mostly the noncausal ones) in the resulting trend. Hence, for monthly data, we advise using the HP filter with penalization parameter 129 600. It is also rather harmful to underestimate the order of the polynomial trend, which results in a significantly larger lead coefficient. When the fundamental trend consists of breaks (mimicking bubbles), the smooth detrending methods do not succeed in capturing the trend, and this translates into much more persistent noncausal dynamics. We also investigate the effect of detrending white noise series; while 6.82% of the models were identified with dynamics for the raw series, 7.34% were identified with dynamics for the HP-filtered series with penalizing parameter 129 600. Hence we find no significant creation of dynamics when applying the HP filter to white noise.

This section investigates the impact of detrending both for in-sample and real-time analyses. WTI and Brent crude oil monthly price series are employed, ranging from June 1987 to December 2020. The series consist of end-of-period prices, which enables us to adequately time our analysis based on the outbreak of the COVID-19 pandemic and the appearance of worldwide regulations and lock-downs to counter its spread. Figure 3 shows that both series are characterized by bubble episodes, which we define in this paper as rapidly increasing episodes followed by a sharp decline, the main one being during the financial crisis in 2008. The series are also characterized by various sudden crashes. The highlighted gray bar represents the period of interest in this analysis. The earliest point of the period is December 2019; at this point almost no information was available on the coronavirus and no worldwide outbreak had yet taken place. Then, we can see that as the outbreak started and regulations were increasingly being imposed worldwide, the price of crude oil dropped significantly. Brent crude oil prices fell from around $68 at the end of December 2019 to around $15 by the end of March 2020, the point at which most European countries imposed national lock-downs. The restrictions of movement within and between countries thus induced a sharp and sudden decrease in the demand for crude oil.

As shown in Figure 3 , the series are probably nonstationary but considering their growth rate would eliminate the locally explosive episodes that are interesting to exploit. The two series appear almost identical until the 2008 financial crisis, period from which we can observe more apparent discrepancies. The last part of the samples is rather noisy and volatile, and estimating a trend on such a part is not straightforward. 7 We seek to extract a smooth trend without affecting the dynamics of the series. Based on the findings of Section 3, we consider the deterministic polynomial trends of orders 4 and 6 as well as the HP filter with λ = 129 600 (denoted t 4 , t 6 and HP respectively). We furthermore employ an economic variable -described in the following section -as another trend to compare economically motivated detrending with mechanical detrendings. The analysis focuses on the probabilities for oil prices to drop and investigates the potential magnitude of such decrease. We first consider an in-sample analysis, that is, the trends and the MAR models are estimated over the whole sample, from June 1987 to December 2020. Then, we fix the estimated parameters and use this information to perform one-month ahead density forecasts for the months of January, February, March and April 2020. The in-sample analysis includes as much information as possible and therefore reduces estimation uncertainty. We then compare the in-sample analysis to a real-time forecast exercise. In the real time analysis, we re-estimate the trends and the MAR models at each point of the period of interest. That is, we consider an expanding sample and perform one-month ahead density forecasts for points that are out-of-sample. 
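The difference between the two exercises lies in what is re-estimated at each forecast origin. A minimal sketch of the real-time (expanding sample) loop is given below; `fit` and `forecast` are hypothetical callables standing in for the detrending and MAR estimation step and for the one-month ahead density forecast, and are not functions from any package used in the paper.

```python
def expanding_window_forecasts(series, origins, fit, forecast):
    """Re-estimate on an expanding sample and forecast one step ahead.

    For each origin, only observations series[:origin] are used, so the
    target series[origin] is out-of-sample. fit(sample) returns a model and
    forecast(model, sample) returns the one-step-ahead prediction; both are
    hypothetical placeholders.
    """
    results = []
    for origin in origins:
        sample = series[:origin]
        model = fit(sample)
        results.append((origin, forecast(model, sample)))
    return results
```

In the in-sample analysis, by contrast, `fit` would be called once on the whole series and the resulting parameters reused at every origin.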

There is an extensive literature on modeling oil prices using economic variables. As an example, Kilian and Murphy (2014) construct a structural VAR model for the real price of oil, making use of stationary transformations of economic variables, namely the real economic activity index constructed in Kilian (2009) as well as inventories and production of crude oil. In this analysis, however, we do not construct a structural model for the price of oil; instead we investigate ways of detrending prices without altering the inherent dynamics of the process. As such, we suggest employing the US crude oil Strategic Petroleum Reserve (hereafter SPR) levels. These reserves were established primarily to reduce the impact of disruptions in supplies of petroleum stocks (Kilian and Zhou, 2020a). This variable therefore incorporates not only expectations regarding economic activity but also regarding the production of crude oil. US SPR stock is depicted against WTI crude oil prices in Figure 4.

Figure 4: Raw WTI prices and US crude oil SPR stocks

SPR is significantly less volatile than total crude oil stocks, as it is a last-resort reserve that is rarely used because its release requires approval of the US President. 8 This characteristic of the series makes it a good candidate for the smooth trend we intend to extract from oil price series. We hence detrend prices (both nominal and real) by taking the residuals from a standard OLS regression of prices on crude oil SPR levels. 9
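The regression-based detrending described above can be sketched as follows. The function name `ols_detrend` is hypothetical, and the inclusion of an intercept in the regression is an assumption, since the text only mentions a standard OLS regression of prices on SPR levels.

```python
def ols_detrend(prices, spr):
    """Regress prices on SPR levels (with an intercept, by assumption) and
    return the intercept, the slope and the residuals used as the cycle."""
    n = len(prices)
    mean_x = sum(spr) / n
    mean_y = sum(prices) / n
    sxx = sum((x - mean_x) ** 2 for x in spr)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spr, prices))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - intercept - slope * x for x, y in zip(spr, prices)]
    return intercept, slope, residuals
```

The fitted values play the role of the trend and the residuals that of the detrended series on which the MAR model is estimated.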

8 Limited release can be allowed by the Secretary of Energy for crude oil loans to nongovernmental entities, as described by the US Department of Energy.

9 As shown in Figure 3, WTI and Brent price series seem to follow a similar trend; we therefore also employ US SPR stocks to detrend Brent prices. We find statistical support for cointegration between prices (both nominal and real prices of WTI and Brent) and US SPR levels. Hence the remaining cycles are, as intended, stationary.

10 Data and results for the price-adjusted series that are not presented here are available upon request.

To save space, Figure 5 only depicts the detrended Brent series, 10 after the polynomial trends, the HP filter and SPR levels were used to detrend the whole sample. The SPR-detrended series consists of the residuals obtained from a standard OLS regression of the prices on the SPR levels. We can see that the HP-detrended series (black solid line) and the $t_6$-detrended series (dashed line) are very much alike over the majority of the sample. The polynomial trend of order 4 (dotted line), however, seems to induce somewhat more variation than the other two mechanical detrending methods. This is especially visible at the beginning and at the end of the sample, stemming from the lack of flexibility of such a trend due to its lower order. This latter detrending method suggests that the end of 2020 is as extreme as the period between 2010 and 2014, during which prices were in fact twice as large. We can see that overall the SPR-detrended series follows a pattern similar to the others but shows slightly more persistent dynamics. This could stem from the fact that the SPR series, while being rather smooth, still displays more dynamics than the three other trends considered here. Hence, it could slightly alter the dynamics in the remaining cycle. The SPR-detrended series has a correlation of 0.84 with both the HP- and $t_6$-detrended series. Note that until the end of the 1980s, there was a persistent increase in SPR due to the creation and initial filling of the reserves, which started in 1977, explaining the induced downward trend at the beginning of the detrended sample. We estimate MAR models with Student's t-distributed errors and set the maximum pseudo lag length in the first stage of the estimation to 4. All resulting models are MAR(1,1) and are reported in Table 3. We report the lag and lead coefficients as well as the degrees of freedom of the distribution and their respective standard errors in parentheses. Models estimated on series that were detrended with a polynomial trend of order 6 and with the HP filter are the most similar, as suggested by Figure 5.
Models estimated after detrending with the polynomial trend of order 4 slightly deviate from the two others and always have a larger lead coefficient, hence indicating more persistence in the explosive episodes. Recall that Section 3 suggests that underestimating the trend order in mixed causal-noncausal models induces on average an overestimation of the noncausal coefficient. All series are mostly forward looking, as the lead coefficients are at least 0.8 while the lag coefficients are at most 0.31. 11 We can see that, as expected, SPR-detrended series are slightly more persistent in their noncausal dynamics, with a lead coefficient up to 0.1 larger than for other detrending methods, and also have slightly larger degrees of freedom induced by more persistent extreme events. The identification of the dynamics is overall consistent across series and their transformations. Note that adjusting the series for inflation leads to larger estimated degrees of freedom for the Student's t distribution but overall to similar dynamics.

Notes: The models are obtained with a maximum pseudo lag order of 4, and for each series the model identified was an MAR(1,1). φ is the lag coefficient, ψ is the lead coefficient and γ the degrees of freedom of the Student's t distribution. The polynomial trends include terms up to the order indicated, and the HP filtering is performed with a penalization parameter λ = 129 600. Standard errors of the estimated coefficients, obtained with the MARX package (Hecq et al., 2017), are reported in parentheses.

Lacking closed-form expressions for the predictive densities, we use the two data-driven approaches mentioned in Section 2. We employ the simulations-based approach of Lanne et al. (2012), which depends only on the estimated model and the last observed point and approximates the density by means of simulations. We compare this method with the sample-based approach of Gourieroux and Jasiak (2016), which uses past values in the forecasting step to approximate the conditional density. Table 4 shows the one-month-ahead probabilities that the series will decrease (hence be lower than its last observed value) and the probabilities that the series will drop by more than 1 standard deviation (the standard deviations are calculated empirically over the whole sample). Forecasts are performed for January, February, March and April 2020 and results from the two prediction methods are reported for each of the detrended nominal series. We focus on the nominal series as they are the prices people observe and because the estimated models for real series are noticeably similar. While we advocate the use of predictive densities to get the best picture of potential future prices, we choose two arbitrary probabilities to present for the sake of comparison and to save space. Nonetheless, the probabilities for any event can be computed from the methods used here, and they could for instance be employed in the construction of risk measures.
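To make the sample-based idea concrete, the sketch below approximates the one-step-ahead density of a purely noncausal AR(1) component, u_t = ψ u_{t+1} + ε_t, on a grid and integrates it to obtain the two kinds of probabilities reported in Table 4. It is an illustration under stated assumptions, not the authors' implementation: the conditional density is written as g(u_T − ψ u_{T+1}) times a ratio of sample averages of g that estimate the marginal density of u, which is our reading of the Gourieroux and Jasiak (2016) approach; the helper names, the grid and the simulated input are hypothetical. For an MAR(1,1) with lag coefficient φ, the event y_{T+1} ≤ c corresponds to u_{T+1} ≤ c − φ y_T.

```python
import math
import numpy as np

def student_t_pdf(x, df, scale=1.0):
    """Density of a scaled Student's t distribution, evaluated elementwise."""
    z = np.asarray(x, dtype=float) / scale
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return np.exp(log_c - (df + 1) / 2 * np.log1p(z ** 2 / df)) / scale

def trapezoid(f, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def sample_based_density(grid, u, psi, pdf):
    """Approximate density of u_{T+1} given u_T, using the observed u_2, ..., u_T."""
    u = np.asarray(u, dtype=float)
    u_T, past = u[-1], u[1:]

    def marginal(v):
        # sample average of g(v - psi * u_i): estimate of the marginal density at v
        return pdf(np.subtract.outer(v, psi * past)).mean(axis=-1)

    dens = pdf(u_T - psi * grid) * marginal(grid) / marginal(np.array([u_T]))
    return dens / trapezoid(dens, grid)          # renormalise on the finite grid

def prob_below(grid, dens, x):
    """P(u_{T+1} <= x) by numerical integration of the gridded density."""
    keep = grid <= x
    return trapezoid(dens[keep], grid[keep])

# Hypothetical input: a simulated noncausal AR(1) with psi = 0.8 and t(3) errors.
rng = np.random.default_rng(1)
eps = rng.standard_t(3.0, size=400)
u_sim = np.zeros(400)
for t in range(398, -1, -1):
    u_sim[t] = 0.8 * u_sim[t + 1] + eps[t]
u_sim = u_sim[:300]                              # drop the end affected by the terminal value

grid = np.linspace(u_sim.min() - 50.0, u_sim.max() + 50.0, 4001)
dens = sample_based_density(grid, u_sim, 0.8, lambda e: student_t_pdf(e, 3.0))
p_decrease = prob_below(grid, dens, u_sim[-1])
p_sharp_drop = prob_below(grid, dens, u_sim[-1] - u_sim.std())
```

Working on a grid keeps the whole predictive density available, so probabilities for any other event can be read off the same object.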

At the end of December 2019, oil prices were around $60 per barrel; they had been fluctuating around this price over the last three years. All detrending methods yield values for December that are above the 90th percentile of the samples, suggesting high but not extreme levels. At that point in time, no international alerts regarding the risk of a pandemic had been made yet.

Probabilities that prices will drop in January are roughly 0.4 for all series and for both forecasting methods. However, probabilities that prices will drop by more than 1 standard deviation are at most 0.052. This confirms that crude oil prices are in a period of volatile and rather high prices, but it does not suggest a bubble behavior with a potential large drop. This can also be seen from the difference between the sample-based and simulations-based predictions, as discrepancies between the two approaches have been shown to arise mostly during extreme episodes. Here, they do not differ by more than 3.3% for the probabilities of a decrease, and by no more than 0.9% for the probabilities of a sharper decrease.

At the end of January 2020, international alerts regarding the spread of the novel coronavirus had been made, which induced an unforeseeable drop in prices. Yet, the $t^4$-detrended series only fell by half a standard deviation and the other two by 75% (resp. 80%) of a standard deviation for the Brent (resp. WTI) series. Values remained, however, above median values. Forecasts based on both methods suggest a continued decrease for February, with probabilities ranging from 0.76 to 0.88, yet they indicate almost zero probability that the drop will be substantial (more than a standard deviation). They hence suggest a return to median values, meaning a return to fundamental prices. Both prediction methods again provide results diverging by no more than 3.3%.

By the end of February 2020, mass gatherings started to be forbidden and the first advice for the quarantine of individuals to contain the spread of the virus had been made. The increasing worldwide pressure hence kept pushing prices down. Yet, no decrease in the detrended series was larger than 60% of a standard deviation, which was once again in line with the predictions. The series reached their median levels, and forecasts for March suggested that the series would remain stable around those values, while favoring a further slight decrease as prices had been declining for the last three consecutive periods. Probabilities of a sharp drop decreased even more towards zero and both prediction methods again yielded similar probabilities.

In March 2020 the worldwide situation worsened significantly and the World Health Organization declared COVID-19 a global pandemic. Many countries imposed strict movement restrictions within and across borders, and curfews and lock-downs were implemented. This sudden drop in crude oil demand led to a considerable fall in prices: WTI prices fell by 55% and Brent prices by 71%. Values of the detrended series fell by more than 2 standard deviations and reached the 2nd and 3rd percentiles for HP-, SPR- and $t^6$-detrending. This indicates a negatively explosive episode, and therefore a negative bubble below fundamental prices. The $t^4$-detrending values correspond to at least the 10th percentile, suggesting a less extreme episode compared to the previous behavior of the series. Until this point both prediction methods yielded similar probabilities. However, the discrepancy between the probabilities now reaches 0.24, where the simulations-based probabilities of a decrease are always larger than the sample-based probabilities. The discrepancies between the sample- and simulations-based approaches have been shown to widen during explosive episodes. This is why probabilities for the $t^4$-detrended series are still very similar across the forecasting methods, as opposed to the other detrending methods. It has also been shown that the larger the lead coefficient, the more the sample-based results tend to yield larger probabilities of a turning point than those computed with simulations. This stems from the fact that the series had reached this point a few times before (in 2008 and in 2015) and turned back towards median values. It is therefore, based on the learning mechanism of the sample-based approach, less likely that the series will keep on decreasing. It is important to notice that even though prices dropped significantly, probabilities that they will keep on decreasing are lower than before for the HP- and $t^6$-detrended series as well as for SPR-detrended Brent. However, compared to previous forecasts, probabilities now suggest that if the series actually kept on decreasing, it could likely be by more than 1 standard deviation, as it has now entered an explosive episode. The SPR-detrended WTI series has larger probabilities of a decrease than for the previous month; however, as can be noticed, the probabilities of the sharper decrease for both SPR-detrended series are much closer to 0 than with the other detrending methods. This stems from the larger degrees of freedom as well as the larger lead coefficient and slightly lower lag coefficient.

Figure 6 illustrates the evolution of the predictive densities of the HP-detrended Brent series over the time span. On the x-axis are the predictions and on the y-axis their corresponding probability density. The vertical dashed line corresponds to the last value, that is, in graph (a), the vertical line is the detrended value of Brent prices for December 2019. We can clearly observe the bi-modality of the distribution when the series deviates from median values, as shown for the forecasts of January and April, which is exacerbated during the negative explosive episode. The range and shape of the density also explain the discrepancies between probabilities of a decrease and probabilities of a decrease of more than 1 standard deviation. 13

Figure 6: One-step ahead predictive densities of HP-detrended Brent prices obtained with the sample-based prediction method. Panels: (a) January 2020, (b) February 2020, (c) March 2020, (d) April 2020.

13 Results for all other series, available upon request, follow a similar pattern.

To illustrate the valuable information provided by the predictive densities of MAR models, graph (a) of Figure 7 depicts the predictive density for April 2020 using a Gaussian AR(2) model instead of an MAR(1,1) on HP-detrended Brent prices. The predictive density is obtained using the closed form of the conditional normal distribution. We can see that the mode of the density corresponds to a further decrease, but it now lacks the bi-modality and therefore does not suggest a return to central values as does the MAR predictive density shown in graph (d) of Figure 6. As such, once the series enters a locally explosive (here negative) episode, the AR(2) only predicts a continuing decrease of the prices. Graph (b) of Figure 7 displays the sample-based predictive density of the SPR-detrended Brent series. We can see that the larger lead coefficient implies a lower rate of decrease (this can be seen from the distance between the two modes), but it indicates larger probabilities of a further decrease, as large lead coefficients imply longer-lasting explosive episodes. This is why the right mode, which corresponds to a return to central values, has a much lower weight in the density.

The MAR models employed here are univariate, hence no exogenous information is incorporated, as opposed to MARX models (see Hecq et al. (2020) and Hecq, Issler, and Voisin (2021)). Disregarding exogenous variables facilitates forecasting but can sometimes lead to a consequential lack of information. For instance, it is expected that crude oil prices should be lower-bounded as they cannot decrease indefinitely and become increasingly negative. Simulations-based probabilities cannot take that into account as they are based only on the estimated model. Sample-based probabilities, however, since prices have never become negative (or at least not long enough to be visible on monthly series), will tend to limit the probabilities that it will happen in the future, based on the learning mechanism, even without incorporating additional information within the model.
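For comparison with the MAR densities, the Gaussian AR(2) benchmark mentioned above has a closed-form one-step-ahead distribution: conditional on the last two observations it is normal with mean φ1 y_T + φ2 y_{T−1} and variance σ². The sketch below shows how the two probabilities of interest follow from that; the coefficient values and data points in the usage lines are purely hypothetical and are not estimates from the paper.

```python
import math

def ar2_gaussian_forecast(y_T, y_Tm1, phi1, phi2, sigma):
    """Return the conditional mean and cdf of y_{T+1} under a Gaussian AR(2)."""
    mean = phi1 * y_T + phi2 * y_Tm1

    def cdf(x):
        # normal cdf written with the error function
        return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

    return mean, cdf

# Hypothetical values: a detrended series that has just fallen from -1.0 to -2.0.
mean, cdf = ar2_gaussian_forecast(y_T=-2.0, y_Tm1=-1.0, phi1=1.3, phi2=-0.4, sigma=0.5)
p_decrease = cdf(-2.0)            # P(y_{T+1} < y_T)
p_sharp_drop = cdf(-2.0 - 1.0)    # P(drop of more than one unit, standing in for 1 s.d.)
```

Because this density is a single normal curve centred on the conditional mean, it is unimodal by construction and cannot place a second mode at a return to central values.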

Overall, HP- and $t^6$-detrending provide similar results both for estimation and predictions. SPR-detrending, as mentioned earlier, yields slightly different dynamics, which might stem from the dynamics that are inherent to the stock variable itself. Detrending with $t^4$ yields slightly different results for the estimation, which in turn translate into quite different results for predictions. We saw in Figure 5 that detrending with a polynomial trend of order 4 induced different dynamics in the remaining cycle than the other mechanical detrending methods. This also corroborates the results found in Section 3 about the risks of underestimating the order of a polynomial trend on the dynamics of the series. We can also see in Figure 5 that the main differences between all detrending methods appear at the end of the sample. HP and SPR detrending are almost identical while $t^6$ provides slightly lower values. On the other hand, $t^4$-detrending yields significantly larger values than the others for the end of the sample.

To illustrate the difficulties and the limitations of detrending and forecasting in real time, we compare the results obtained in real time to the ones obtained in-sample for Brent prices with $t^4$, $t^6$ and HP detrending. We did not include SPR detrending in this section as we are interested in detrending methods that are affected by sample expansion: with SPR detrending we still need to re-estimate the model at each point, but the trend itself does not change. Table 5 shows the estimated MAR models for the expanding samples after each detrending. We can see that the expansion of the sample, even with the inclusion of the large drop of March 2020, did not affect the identification of the model nor the dynamics. Lead and lag coefficients vary by no more than 0.03. The estimated degrees of freedom of the Student's t distribution are rather stable until the data point of March is included, which induces a decrease of between 0.07 and 0.1 for all series, bringing them closer to the parameter estimated ex post. This stability in the estimation of the models suggests that probabilities should not significantly differ either.

To investigate the sensitivity of the detrending methods to the addition of new data points, Figure 8 shows how the detrended series vary based on the stopping point of the sample. The dashed line corresponds to the ex-post detrended series, hence when all data points until December 2020 are included. Then, the expanding samples are depicted from the light blue curve (sample stopping in December 2019) to the black curve (sample until March 2020). While detrending with $t^4$ induced the most spurious dynamics over the sample, it seems, like the HP filter, to be less affected by the addition of the new points than the $t^6$-detrending. In graph (a), we can see that the four detrended series are almost identical, even once the point for March is added. In graph (c), corresponding to the HP-detrended series, we can see that the first three detrended series are almost identical but that the inclusion of March creates a slight shift in the detrended series. In this case also, the inclusion of even later points will induce further shifts of the estimated trend. However, for the polynomial trend of order 6, as depicted in graph (b), we can see that the inclusion of each point creates a noticeable shift in the estimated trend. From this, we expect the $t^6$-detrended series to be the ones for which the probabilities differ the most from the in-sample probabilities. Indeed, even if the estimated model is almost identical, the substantial discrepancies between the real-time and ex-post detrended series may impact probabilities, especially during (mildly) explosive episodes.

In Figure 9, the solid lines are the probabilities of a decrease and the dashed lines are the probabilities of a decrease of more than 1 standard deviation. Graph (a) (resp. (b)) represents the sample-based (resp. simulations-based) probabilities. As expected, the simulations-based probabilities are the least affected by the re-estimation of the model at each point, since we did not observe significant alteration in the estimations. However, as shown in Figure 8, it is indeed the $t^6$-detrending that is the most sensitive to the expansion of the sample. Furthermore, we can see that it is mostly the probabilities of a decrease that are affected, as the probabilities of a drop of more than 1 standard deviation do not deviate significantly from the in-sample probabilities. Overall, this indicates that real-time forecasting would have indicated on average lower probabilities of a decrease, at each point and for both approaches. Yet, it would have indicated equal, if not slightly higher, probabilities for the larger drop. Hence, probabilities of more extreme events, namely the tails of the predictive densities, seem to be the least affected by alteration of the trend.
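The real-time exercise described above can be mimicked with an expanding-window loop: detrend each truncated sample, keep its last value, and compare it with the value the full-sample (ex-post) detrending assigns to the same date. The following is a minimal sketch for polynomial trends only; the function names and the choice of reporting the end-point revision are illustrative assumptions and not the authors' procedure.

```python
import numpy as np

def poly_cycle(y, order):
    """Cycle left after removing an OLS polynomial time trend of given order."""
    t = np.linspace(-1.0, 1.0, len(y))
    P = np.polynomial.polynomial
    return y - P.polyval(t, P.polyfit(t, y, order))

def real_time_revisions(y, order, first_end):
    """Real-time minus ex-post detrended value at each sample end T >= first_end."""
    ex_post = poly_cycle(y, order)
    ends = range(first_end, len(y) + 1)
    return np.array([poly_cycle(y[:T], order)[-1] - ex_post[T - 1] for T in ends])
```

If the series is exactly a polynomial of no higher order than the fitted trend, every revision is zero; for real data, the size of these revisions is one way to quantify how much new observations shift the end of the detrended series.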

Figure 9: Evolution of in-sample (black solid and dashed lines) and real-time (orange solid and dashed lines) probabilities over time. Panels: (a) Sample-based probabilities, (b) Simulations-based probabilities.

Overall, it seems that the HP filter is the least sensitive to the change of sample size within this analysis. Results with $t^4$-detrending also emphasize the risks of underestimating the order of the trend. Moreover, while the HP filter and the polynomial trend of order 6 perform similarly in this analysis, assuming the order of a trend polynomial requires additional understanding regarding the deviations of the series from its fundamental trend. Deterministic trends also appear to be more sensitive to the addition of points in a real-time exercise than the HP filter. Furthermore, while simulations-based probabilities are not characterized by the learning mechanism of the sample-based approach, they are less affected by expanding samples, as long as the estimated model remains consistent. However, as mentioned earlier, with a model that lacks exogenous information, the sample-based approach, relying more on past behavior, can potentially offset the shortcomings.

This paper aims at shedding light upon how transforming or detrending a series can substantially impact predictions of mixed causal-noncausal models.

Assuming a polynomial trend of order 4 for WTI and Brent series probably alters the dynamics in the remaining cycle. The HP filter (with penalizing parameter λ = 129 600) does not require any further assumptions with respect to the trend and can therefore be an adequate filter in cases where the trend is unknown. Knowing the actual trend or using exogenous variables for it is also not straightforward. We use US crude oil strategic petroleum reserves (SPR) to detrend oil price series to illustrate this option. We show that by detrending with SPR we obtain results similar to those of the HP and polynomial trend of order 6 detrending. However, detrending with a variable that has seasonality or dynamics will alter the dynamics left in the cycle. Overall, caution is needed when detrending a series, and some filters, such as polynomial trends, may require additional understanding regarding the deviations of the series from its fundamental trend. Nonetheless, once the series is detrended, resulting in a stationary series, using MAR models is a straightforward approach to model nonlinear time series. They capture the locally explosive episodes observed in oil prices in a strictly stationary setting. While the bi-modality of the predictive density would not be detected with standard Gaussian ARMA models, it could be detected with complex nonlinear models, but such models lack the parsimony of MAR models. The data-driven prediction methods may lack theoretical grounds but provide valuable information based on the estimated model and on past behaviors of the series in a parsimonious way. This paper focuses on one-step ahead predictions of decreases in crude oil prices during the first wave of the COVID-19 pandemic.

The focus of this paper is on predicting probabilities of turning points, for example the probabilities of a crash or of entering a positive or a negative bubble. For such inquiries, density forecasts are therefore more adequate than point forecasts. However, the anticipative aspect of MAR models complicates their use for predictions. An MAR(r,s) model can also be expressed as a causal AR model,

where u t is the forward-looking component of the error term. Hence, it can itself be expressed in a purely noncausal AR process,

If the model is correctly identified and the parameters consistently estimated, it is therefore sufficient to forecast the purely noncausal process u t to forecast the variable of interest y t . However, only a few specifications admit a closed-form conditional density (see for instance Gouriéroux and Zakoïan, 2013 for the MAR(0,1) Cauchy-distributed process). The assumption of other fat-tailed distributions, such as Student's t, can lead to the absence of closed-form expressions for the conditional moments and densities. Two approximation methods have been developed to estimate these predictive densities, for any distribution, also allowing for a larger lead order. The first method, based on simulations, was developed by Lanne et al. (2012). The second approach uses the information carried by the sample and was developed by Gourieroux and Jasiak (2016). For a detailed description and guidance in using those approximation methods, see . We focus on processes with a unique lead as this is what we identify on the WTI and Brent series in Section 4.
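The two displayed equations referred to at the start of this section are not reproduced in this version of the text. As a hedged reconstruction in the notation commonly used for MAR(r,s) models (an assumption on our part: L denotes the lag operator, Φ and Ψ the lag and lead polynomials, and ε t the i.i.d. non-Gaussian error), the decomposition would read:

```latex
\begin{align*}
\Phi(L)\, y_t &= u_t, & \Phi(L) &= 1 - \phi_1 L - \dots - \phi_r L^{r},\\
\Psi(L^{-1})\, u_t &= \varepsilon_t, & \Psi(L^{-1}) &= 1 - \psi_1 L^{-1} - \dots - \psi_s L^{-s}.
\end{align*}
```

For the MAR(1,1) models identified on the oil series, this would reduce to $y_t = \phi\, y_{t-1} + u_t$ and $u_t = \psi\, u_{t+1} + \varepsilon_t$, with φ the lag coefficient and ψ the lead coefficient.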

The purely noncausal component of the errors, u, assumed with one lead, can be expressed as an infinite sum of future error terms in its MA representation. Lanne et al. (2012) base their methodology on the fact that there exists an integer M large enough so that any future point of the noncausal component can be approximated by the following finite sum,

for any forecast horizon h ≥ 1, and where ψ is the lead coefficient of the purely noncausal MAR(0,1) process u t .
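The finite-sum approximation can be sketched as follows for a single lead. We assume, consistently with the MA representation described above, that the truncated sum is Σ_{i=0}^{M−h} ψ^i ε*_{T+h+i}; the function name and the input layout are hypothetical.

```python
import numpy as np

def truncated_u(eps_future, psi, h=1):
    """Approximate u*_{T+h} from M simulated future errors.

    eps_future[i-1] holds eps*_{T+i} for i = 1, ..., M; the value returned is
    sum_{i=0}^{M-h} psi**i * eps*_{T+h+i}.
    """
    eps_future = np.asarray(eps_future, dtype=float)
    tail = eps_future[h - 1:]                     # eps*_{T+h}, ..., eps*_{T+M}
    return float(np.sum(psi ** np.arange(len(tail)) * tail))
```

With a lead coefficient of 0.8 and M = 100, the first omitted weight is 0.8^100, roughly 2e-10, which gives a sense of why a moderate truncation parameter can suffice when ψ is not too close to one.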

Let $\varepsilon^{*(j)}_{+} = \left(\varepsilon^{*(j)}_{T+1}, \ldots, \varepsilon^{*(j)}_{T+M}\right)$, with $1 \le j \le N$, be the j-th simulated series of M independent errors, randomly drawn from the chosen distribution of the process with estimated parameters (whose probability density function (hereafter pdf) is denoted by g). We are interested in the conditional cumulative probabilities,

where the indicator function 1() is equal to 1 when the condition is met and 0 otherwise. The variable $y^*_{T+h}$ is replaced by an approximation using recursive substitution of its companion form with truncation parameter M and the following stacked form,

Given the information set known at time T, the indicator function in (7) is only a function of the M future errors, $\varepsilon^*_+$. Let us denote this indicator function by $q(\varepsilon^*_+)$. Assuming that the number of simulations N and the truncation parameter M are large enough, the conditional cumulative probabilities of MAR(r,1) processes can be approximated as follows (Lanne et al., 2012),

compared to theoretical probabilities as no closed-form expression exists. In such cases an approximation of theoretical results can be derived using the simulations-based approach presented above to gauge how much of the probabilities are induced by the underlying process and by past behaviors.

For values around the median of the series, both methods yield identical results. Discrepancies widen as the level of the series increases. Additionally, the larger the lead coefficient, the more the sample-based method tends to overestimate probabilities of a crash. That is, for low lead coefficients, they on average yield very similar results, even for explosive episodes, while for large lead coefficients probabilities induced by the two methods can be considerably different. Overall, both methods depend on the whole sample since they both depend on the estimated coefficients. Hence a wrong detrending would affect both methods. Overestimating the lead coefficient, for instance, would imply lower probabilities of a crash. For Cauchy-distributed processes, one-step ahead probabilities of a crash tend to (1 − ψ) during explosive episodes. Thus, identifying a model with a lead coefficient of 0.9 instead of 0.7, for instance, would induce a difference of 20 percentage points in the theoretical probabilities. The sample-based probabilities could be even more distorted based on past behaviors, or on the contrary past behaviors could potentially alleviate the impact of the wrong detrending, but this is case-specific. This is why it is important to investigate the effects of various detrending methods on model identification and on the estimation of the dynamics. Note that formulas for higher lead orders can be found in the respective articles of Lanne et al. (2012) and Gourieroux and Jasiak (2016).
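As a worked illustration of this sensitivity, using only the limiting one-step-ahead crash probability 1 − ψ stated above for Cauchy-distributed processes:

```latex
\begin{align*}
\psi = 0.9 &: \quad P(\text{crash}) \to 1 - 0.9 = 0.1,\\
\psi = 0.7 &: \quad P(\text{crash}) \to 1 - 0.7 = 0.3.
\end{align*}
```

The gap is 0.2 in absolute terms, which means that the crash probability implied by the smaller lead coefficient is three times the one implied by the larger coefficient.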

trending method are shown in the columns 'wrong MAR' of Table 2. Hence, proportions of correctly identified models range between 76.76% and 96.3% of the 5 000 replications, but are almost always above 90%. Figure 10 reports the box plots of estimated coefficients for the purely noncausal (left column) and mixed causal-noncausal (center and right columns, for the lag and lead coefficients respectively) processes after each of the four detrending approaches is applied. We indicate the true coefficients, 0.6 and 0.8 for the lag and lead respectively, by the vertical dotted line. The box plots indicate the minimum, the maximum, the interquartile range and the median. The HP$_1$-filtered series (with λ = 14 000) are on average characterized by lower estimated lead and lag coefficients than the other detrended series. This is due to the low penalization of the filter, which captures too much of the dynamics and reduces the persistence of the true noncausal process. Furthermore, we can see that using polynomial trends does not affect estimations of the coefficients, on average, as long as the order of the trend estimated is at least that of the true trend. That is, underestimating the order of the trend leads to an alteration of the dynamics and, in our case, to more persistent noncausal dynamics. The HP$_2$ filter performs similarly to $t^6$, but we can expect that if the true trend were of a higher order, HP$_2$ would perform better. The constructed linear trend with breaks leads to much larger noncausal coefficients for all detrending methods. The second break in the trend mimics the crash of a bubble and the long expansion preceding it leads to the identification of the model with a larger lead coefficient, which corroborates the earlier findings. Importantly, lag coefficients are on average correctly identified (the distributions of the estimated degrees of freedom, available upon request, show that they are not significantly affected by the detrending methods either). A wrong detrending therefore mostly affects the noncausal dynamics of the processes.

Notes: For the simulations-based approach (sims.) the truncation parameter is M = 100 and 1 000 000 simulations were used. Standard deviations (s.d.) are calculated over the detrended samples.
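For readers who want to reproduce the flavor of such a Monte Carlo exercise, one replication of an MAR(1,1) process can be generated by building the noncausal component backward in time and the causal part forward. This is a minimal sketch under our own assumptions (burn-in length, zero start and terminal values, and 3 degrees of freedom are illustrative choices); the lag and lead coefficients 0.6 and 0.8 are the true values quoted above.

```python
import numpy as np

def simulate_mar11(n, phi, psi, df, burn=200, seed=0):
    """Simulate y_t = phi*y_{t-1} + u_t with u_t = psi*u_{t+1} + eps_t, eps ~ t(df)."""
    rng = np.random.default_rng(seed)
    total = n + 2 * burn
    eps = rng.standard_t(df, size=total)
    u = np.zeros(total)
    for t in range(total - 2, -1, -1):            # backward recursion, terminal u = 0
        u[t] = psi * u[t + 1] + eps[t]
    y = np.zeros(total)
    for t in range(1, total):                     # forward recursion, initial y = 0
        y[t] = phi * y[t - 1] + u[t]
    keep = slice(burn, burn + n)                  # discard both ends
    return y[keep], u[keep], eps[keep]

y, u, eps = simulate_mar11(n=300, phi=0.6, psi=0.8, df=3.0)
```

A deterministic trend can then be added to `y` and removed with a candidate detrending method before re-estimating the model, which is the comparison the box plots summarize.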

What do we learn from the price of crude oil futures

Forecasting the price of oil

International evidence on the historical properties of business cycles

Forty years of oil price fluctuations: Why the price of oil may still surprise us

Mixed causal-noncausal autoregressions: Bimodality issues in estimation and unit root testing

Comparing tests for identification of bubbles

Maximum likelihood estimation for noncausal autoregressive processes

Booms and busts in commodity markets: bubbles or fundamentals?

Cointegration and tests of present value models

Detrending and business cycle facts

Bootstrapping noncausal autoregressions: with applications to explosive bubble modeling

Detecting co-movements in non-causal time series

Explosive rational bubbles in stock prices?

Conditional moments of noncausal alpha-stable processes and the prediction of bubble crash odds

Mixed causal-noncausal AR processes and the modelling of explosive bubbles

Filtering, prediction and simulation methods for noncausal processes

Stationary bubble equilibria in rational expectation models

Explosive bubble modelling by noncausal process. CREST

On uniqueness of moving average representations of heavy-tailed stationary processes

Local explosion modelling by non-causal process

Why you should never use the Hodrick-Prescott filter

That's the limit! evaluation of the Brazilian inflation targeting system using mixed causal-noncausal models

Mixed causal-noncausal autoregressions with exogenous regressors

Simulation, estimation and selection of mixed causal-noncausal autoregressive models: The marx package

Forecasting bubbles with mixed causal-noncausal autoregressive models

Noncausal autoregressive model in application to bitcoin/USD exchange rates

Postwar US business cycles: an empirical investigation

Testing for speculative bubbles in stock markets: a comparison of alternative methods

Dynamic modeling of commodity futures prices

Not all oil price shocks are alike: Disentangling demand and supply shocks in the crude oil market

The role of inventories and speculative trading in the global market for crude oil

Does drawing down the US Strategic Petroleum Reserve help stabilize oil prices?

The Econometrics of Oil Market VAR Models

Optimal forecasting of noncausal autoregressive time series

Noncausal autoregressions for economic time series

Noncausality and the commodity currency hypothesis

Explosive behavior in the 1990s Nasdaq: When did exuberance escalate asset values? International economic review

The present value model of rational commodity pricing

On adjusting the Hodrick-Prescott filter for the frequency of observations

The authors would like to thank Francesco Giancaterini, an anonymous referee and the editors for valuable comments and suggestions. All remaining errors are ours.

By computing its value for all possible x covering the range of potential values for $y^*_{T+h}$, we can obtain the whole conditional cumulative distribution function (hereafter cdf) of $y^*_{T+h}$. With Cauchy-distributed errors, this approach has been shown to be a good estimator of theoretical probabilities, although it is significantly sensitive to the number of simulations N chosen. For Student's t distributions, however, results cannot be compared to theoretical ones, but as the number of simulations gets very large, the derived densities converge to a unique function. Moreover, analogously to theoretical probabilities, once the series has significantly departed from its central values and diverges, the probabilities of a crash at a given horizon tend to a constant.
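A compact sketch of the simulations-based estimator for a purely noncausal component with one lead is given below. It follows our reading of the Lanne et al. (2012) construction and should be treated as an assumption-laden illustration: each simulated path of future errors is weighted by the density g of the current error it implies, u_T − Σ_{i=1}^{M} ψ^i ε*_{T+i}, and the probability is the weighted share of paths satisfying the event. The function names, the unit scale and the small default N are our choices; the paper reports using M = 100 and 1 000 000 simulations.

```python
import math
import numpy as np

def student_t_pdf(x, df):
    """Density of a standard Student's t distribution, evaluated elementwise."""
    z = np.asarray(x, dtype=float)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return np.exp(log_c - (df + 1) / 2 * np.log1p(z ** 2 / df))

def sim_prob_below(x, u_T, psi, df, h=1, M=100, N=20_000, seed=0):
    """Approximate P(u_{T+h} <= x | u_T) by weighted simulation of future errors."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_t(df, size=(N, M))            # column i-1 holds eps*_{T+i}
    powers = psi ** np.arange(M)
    u_future = eps[:, h - 1:] @ powers[: M - h + 1]  # truncated sum for u*_{T+h}
    eps_T = u_T - eps @ (psi * powers)               # current error implied by each path
    weights = student_t_pdf(eps_T, df)
    return float(np.sum(weights * (u_future <= x)) / np.sum(weights))
```

Evaluating `sim_prob_below` over a grid of x values with a fixed seed traces out the conditional cdf; rerunning with different seeds or larger N is a simple way to see the sensitivity to the number of simulations noted above.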

As an alternative to using simulations, Gourieroux and Jasiak (2016) employ all past observed values of the noncausal process. The predictive density function of a purely noncausal process with one lead is approximated as follows,

where g is the pdf of the assumed error distribution. With this method, the predicted probabilities are a combination of theoretical probabilities and probabilities induced by past events. Results are therefore case-specific and are based on a sort of learning mechanism. If this method is used when errors are Cauchy distributed, results can be compared to the theoretical predictive distribution to evaluate the influence of past behaviors on the obtained probabilities. However, if the errors follow a Student's t distribution for instance, results cannot be

We now investigate the persistence of the dynamics from the magnitude of the estimated coefficients. For instance, a lower lead coefficient will indicate shorter-lived bubbles compared to the true generated process and thus increases the probabilities of a crash during an explosive episode. The same goes for larger degrees of freedom when the errors follow a Student's t distribution: larger degrees of freedom correspond to thinner tails, hence rarer extreme values, and thus make long-lasting explosive episodes less probable. We investigate the distribution of the estimated coefficients given a correctly identified model. Frequencies of wrongly identified models per dgp and de-