key: cord-0793872-258fygq9
authors: Weng, Futian; Zhang, Hongwei; Yang, Cai
title: Volatility forecasting of crude oil futures based on a genetic algorithm regularization online extreme learning machine with a forgetting factor: The role of news during the COVID-19 pandemic
date: 2021-05-21
journal: Resour Policy
DOI: 10.1016/j.resourpol.2021.102148
sha: ee5ec03e3e45f538d69aaf73bb8335368418a73a
doc_id: 793872
cord_uid: 258fygq9

The outbreak of news and opinions during the COVID-19 pandemic is unprecedented in this age of rapid dissemination of information. The ensuing uncertainty has led to heightened volatility in the prices of crude oil futures. This research examines whether such news has predictive value for the volatility of crude oil futures during the COVID-19 pandemic. We propose a modeling framework, the genetic algorithm regularization online extreme learning machine with forgetting factor (GA-RFOS-ELM), to estimate the effects of news during the COVID-19 pandemic on the volatility of crude oil futures. GA-RFOS-ELM can learn block-by-block with fixed or varying block size while taking into account each block's own valid period. The experimental results show that news during the COVID-19 pandemic carries additional predictive information, which is crucial for short-term volatility forecasting of crude oil futures. The approach also shows that an online update learning ability is needed during the COVID-19 pandemic and that it can be effective and efficient in volatility forecasting of crude oil futures. These findings can help investors and administrators predict and understand the behavior of volatility during the COVID-19 pandemic.

As information technology is upgraded substantially, flows of news and opinions are essentially instantaneous. Currently, the use of mobile devices to interact and make financial decisions based on the news and opinions has become a feasible and conventional trading strategy (Groß-Klußmann and ) . More recently, the outbreak of coronavirus (COVID-19) attracted widespread attention from media worldwide, which was heavily inclined towards emphasizing the severity (Blendon et al., 2004; Mairal, 2011; Young et al., 2013) . The news and opinions related to coronavirus can cause public panic and influence investors' sentiments (Tetlock, 2007) . In particular, the crude oil future market suffered huge losses due to the COVID-19 pandemic and overwhelming related news. For example, West Texas Intermediate (WTI) crude oil futures were trading at $25.16 per barrel, with Brent at $27.91 on March 19, 2020, a historic low since 2002. Even more shockingly, an unprecedented event occurred in which the crude oil prices plunged to zero for the first time and then to negative values at -$37.63. This was a rare occurrence in the global crude oil trading market and highlighted the acute imbalance between supply and demand and the imminent collapse of inventories, which exert enormous impacts on each participant in the market.

The impact of news reports on the crude oil market has attracted increasing attention from scholars in the last few years. The consensus is that news has predictive power and cannot be ignored when predicting crude oil futures market dynamics, especially in times of economic uncertainty (Narayan, 2019). In fact, the price pressure hypothesis points out that individual investors do not have enough time, knowledge, experience, or energy to examine all crude oil futures, and that they generally buy oil futures that attract their attention or are being widely discussed (Nofsinger and Sias, 1999; Barber and Odean, 2008). It can reasonably be concluded that news attracts investors' attention and tends to generate abnormal returns (Takeda and Wakao, 2014). Furthermore, the rationale behind network analysis emphasizes that individual investors rely primarily on news feedback strategies to judge the future prospects of a stock (Bange, 2000).

Considerable news is generally related to major events, and the news caused by major events can break out in an instant. On the one hand, major events could affect the crude oil future market. Previous studies have identified crude oil market responses to disasters, political events and so on (Kowalewski and Ś piewanowski, 2020; Bash and Alsaifi, 2019; Shanaev and Ghimire, 2019) . Furthermore, research on the effects of pandemic diseases on the crude oil future market is scant, especially with respect to the more recent COVID-19 contagious infectious disease. The COVID-19 pandemic has endured for a long time, and the investment and business environments are bombarded with a mass of public news related to the COVID-19 pandemic. A growing body of empirical and theoretical studies has proven that news during the COVID-19 pandemic was valuable in volatility forecasting of crude oil futures. However, to the best of our knowledge, the more challenging question of how much predictive power news exhibits is not yet satisfactorily answered.

For example, some scholars attempted to examine the relationship between stock market returns and COVID-19 related news (Albulescu, 2020; Baker et al., 2020; Lopatta et al., 2020; Onali, 2020). Additionally, COVID-19 related news is associated with volatility in the stock markets (Haroon and Rizvi, 2020). Cepoi (2020) argued that the stock markets exhibited asymmetric dependencies with COVID-19 related news. However, the earlier literature concentrated predominantly on the role of news during the COVID-19 pandemic with respect to the stock markets; its predictive ability and its application to other markets, especially crude oil futures, remain to be explored. This study aims to close this gap by focusing on volatility forecasting of crude oil futures, which acts as a barometer of stress, financial risk, or uncertainty of financial investments. The study helps all stakeholders in the crude oil market to understand the behavior of that market during the COVID-19 pandemic.

Various classic econometric and statistical models have been adopted to forecast volatility in the crude oil market, such as the GARCH model (Herrera et al., 2018), the heterogeneous autoregressive (HAR) model (Luo et al., 2019), the vector autoregression (VAR) model (Salisu and Oloko, 2015) and Markov models (Zhang and Wang, 2015). Taking into account the non-linear and non-stationary patterns implicit in the crude oil price series, artificial intelligence models are also used to predict crude oil volatility. For example, Huang et al. (2004) proposed the extreme learning machine (ELM) method, which performs well in predicting non-linear and non-stationary time series and has been widely used in the study of short-term price fluctuations in the oil market (Wang et al., 2018). An online sequential extreme learning machine (OS-ELM) was subsequently proposed, which is effective in non-linear and non-stationary time series forecasting (Pan and Zhao, 2013). However, the hidden layer parameters are randomly set. Zhang et al. (2020b) combine OS-ELM with a genetic algorithm (GA) to search for the optimal parameters of the hidden layer, yielding the genetic algorithm online sequential extreme learning machine (GA-OS-ELM). GA-OS-ELM has since been widely used to identify crack behavior in concrete dams (Dai et al., 2019), for fault diagnosis (Zhang et al., 2020b) and for gold price forecasting (Weng et al., 2020).

Based on this analysis, we propose a prediction modeling framework for forecasting the volatility of the crude oil futures market by using news during the COVID-19 pandemic. News during the COVID-19 pandemic comes from real-time media analytics, which capture panic, media hype, fake news and national sentiment. The timeliness of news during the COVID-19 pandemic is considered: each datum has its own valid period, and the online sequential extreme learning machine (OS-ELM) can learn block-by-block with fixed or varying block size (Liang et al., 2006). Additionally, with the ongoing coronavirus outbreak, the crude oil price exhibits pronounced nonlinear and nonstationary behavior, as seen in Fig. 2. Therefore, new data should contribute increasingly to representing the impact on recent investor behavior, and old data should contribute less. In such cases, an outdated training datum loses its validity after several units of time and should be abandoned in subsequent learning. The forgetting factor is therefore introduced in this work to gradually eliminate outdated data that could become misleading, which better suits the timeliness of news (Paleologu et al., 2008). Furthermore, owing to the large changes in the parameters of different data in different periods, problems such as output instability and output matrix singularity arise (Man et al., 2011). In this research, a regularization factor is added to the OS-ELM algorithm, which ensures and improves its stability and generalization ability (Huynh and Won, 2011; Weng et al., 2020; Chen et al., 2020), so that structural and empirical risk can be reduced and balanced. To confront these challenges, this paper proposes the novel genetic algorithm regularization online extreme learning machine with forgetting factor (GA-RFOS-ELM). In detail, in the initial stage, the algorithm selects the optimal weight matrix and hidden neuron thresholds by means of the genetic algorithm. In the next stage, it improves stability and generalization ability by using a regularization factor.

In order to investigate the effects and predictive power of news during the COVID-19 pandemic on the volatility of crude oil futures, this paper uses COVID-19 news-related variables from real-time media analytics of announcements describing essential coronavirus issues: the Panic Index (PI), the Media Hype Index (HY), the Fake News Index (FNI), and the Country Sentiment Index (CSI), which draw on StockTwits, Dow Jones Newswire and The Wall Street Journal, among others (Blitz et al., 2019). The empirical results indicate that the proposed GA-RFOS-ELM model statistically outperforms all considered benchmark models in terms of forecasting accuracy. The empirical findings also indicate that news is valuable for volatility forecasting of crude oil futures, which is consistent with the existing literature (Haroon and Rizvi, 2020). In particular, the forgetting mechanism is appropriate and necessary for volatility forecasting of crude oil futures using news during the COVID-19 pandemic.

In general, the main contributions of this research are as follows. First, we propose a novel OS-ELM model integrating regularization and genetic algorithm with forgetting factor by including news during the COVID-19 pandemic to further improve the accuracy of prediction for the future volatility. Offline learning ability was taken into consideration for some existing studies using the ELM model. However, news during the COVID-19 pandemic is seldom considered in the time-variant system due to the limited period of validity of each datum of news during the COVID-19 pandemic. Moreover, news during the COVID-19 pandemic may vary considerably on a daily basis. It is almost impossible to exactly predict the volatility when the model considers responses of time-varying nature. Furthermore, existing studies about OS-ELM models rarely consider other additional variables (i.e., news during the COVID-19 pandemic). Second, we evaluate the role of news during the COVID-19 pandemic on volatility forecasting of crude oil futures by respectively comparing the forecasting performances of the various models with those of different benchmarking prediction models, which highlights the superiority and robustness of the proposed GA-RFOS-ELM model. This study offers novel insight into the impact of news during COVID-19 pandemic volatility forecasting of the crude oil market and enlarges our understanding, which is in fact consistent with the finding of (Cepoi, 2020) . Third, we extend the research about the responses of stock market returns to major unconventional emergencies, especially the interaction of the pandemic diseases with stock returns. The primary measure of the COVID-19 pandemic is the number of infected cases (Al-Awadhi et al., 2020) . 
However, we suggest that COVID-19 related news would be a good measurement of the COVID-19 pandemic to estimate the effect on crude oil future market returns, because it may affect the expectations of individual investors regarding future crude oil futures.

The remainder of this research is organized as follows. Section 2 presents the existing OS-ELM model and develops the GA-RFOS-ELM model. Section 3 provides the data of the measures of various variables, such as crude oil futures volatility and COVID-19 related news. Section 4 describes the empirical results and interpretive analysis about the forecasting performance of the GA-RFOS-ELM model. Section 5 discusses the conclusion, limitations, and future work.

GA is a search heuristic approach motivated by natural evolution theory, an operational model of global search based on probability conversion Holland et al. (1992) . The notion of a natural selection process can be used for a search problem: that is, selecting the best one from a series of solutions for a problem Mitchell (1998) . There are several phases in a genetic algorithm, including coding, initial population, fitness function, and genetic operator.

The chromosome, composed of an arrangement of genes, is the starting material for biological genetics research. The first step of a genetic algorithm is coding, which abstracts the problem into a string of specific symbols through a certain mechanism. The binary system is a common coding method for the genetic algorithm, where the corresponding real value in the problem interval is converted into a binary string $(b_i b_{i-1} \dots b_0)$, where $i$ denotes the number of binary encoding bits.
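As a minimal sketch of binary coding, the following Python snippet maps a real value in an interval to a fixed-length bit string and back. The function names, the interval bounds, the bit length and the uniform-grid mapping are illustrative assumptions and are not taken from the paper.

```python
def encode(value, low, high, n_bits):
    """Map a real value in [low, high] to a binary string of n_bits bits."""
    levels = (1 << n_bits) - 1                     # largest representable integer
    index = round((value - low) / (high - low) * levels)
    return format(index, f"0{n_bits}b")


def decode(bits, low, high):
    """Map a binary string back to a real value in [low, high]."""
    levels = (1 << len(bits)) - 1
    return low + int(bits, 2) / levels * (high - low)


# Hypothetical example: one input weight searched in [-1, 1] with 8 bits.
chromosome = encode(0.25, -1.0, 1.0, 8)
restored = decode(chromosome, -1.0, 1.0)
```

The round trip is only exact up to the grid resolution, `(high - low) / (2**n_bits - 1)`, so more bits give a finer search grid at the cost of a longer chromosome.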

After coding, a genetic algorithm usually uses a random method to generate a set of individuals as the initial population. The quality of the genetic algorithm is estimated by the fitness function for the individual (solution). A larger fitness function value indicates a better solution quality. The fitness function is the driving force of the genetic algorithm evolution process, which is generally determined by combination with the requirements of solving, and can be set as the error function.

The genetic operator selects some individuals from the parent population and passes them on to the next-generation population. This operator uses the roulette selection approach, also known as the proportional selection operator, in which the probability of each individual being selected is proportional to its fitness function value. Supposing that the population size is $n$ and $F_i$ denotes the fitness function of individual $i$, the probability $P_i$ that individual $i$ is selected for the next-generation population is:

$$P_i = \frac{F_i}{\sum_{j=1}^{n} F_j}$$
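The proportional selection rule can be sketched in Python as follows. The sketch assumes non-negative fitness values where larger is better, and draws with replacement; the function names and the toy population are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Return P_i = F_i / sum_j F_j for non-negative fitness values."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    return [f / total for f in fitness]


def roulette_select(population, fitness, n_select, rng=random):
    """Draw n_select individuals with replacement, proportionally to fitness."""
    probs = selection_probabilities(fitness)
    return rng.choices(population, weights=probs, k=n_select)


# Hypothetical population of four binary chromosomes.
population = ["0101", "1100", "0011", "1111"]
fitness = [1.0, 3.0, 2.0, 4.0]
probs = selection_probabilities(fitness)
parents = roulette_select(population, fitness, n_select=4, rng=random.Random(0))
```

When the fitness is defined as an error to be minimized, it would first have to be transformed (for example by inversion) so that better individuals receive larger selection probabilities; that transformation is not specified in the text.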

Specifically, reproduction, crossover, and mutation are the basic operations of a genetic algorithm. The replication operation can select the best chromosome from the previous population, and it cannot be creative. The crossover operation is able to produce new excellent varieties through chromosome exchange. In addition, mutation simulates gene mutation caused by various accidental factors in natural genetics and randomly changes the value of the genetic gene with a certain probability. In the binary coding of chromosomes, mutation randomly transforms genes from 1 to 0, or from 0 to 1. Mutation can prevent the model from falling into the local optimum and terminating the course in the early stage of the operation process. Furthermore, it can obtain the optimal solution with high quality in the largest possible solution space.
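The crossover and mutation operations on binary chromosomes can be illustrated with a short Python sketch. Single-point crossover is an assumption here, since the text does not state which crossover variant is used; the function names are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Exchange the tails of two equal-length bit strings at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently (1 to 0, or 0 to 1) with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in chromosome)


rng = random.Random(42)
child_a, child_b = single_point_crossover("00000000", "11111111", rng)
mutant = bit_flip_mutation(child_a, rate=0.1, rng=rng)
```

Crossover recombines existing genetic material, whereas mutation is the only operator here that can introduce bit values absent from both parents, which is why it helps the search escape local optima.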

The online sequential extreme learning machine (OS-ELM) is a fast and effective algorithm which can update network parameters through an online learning mechanism (Liang et al., 2006) . This algorithm primarily contains two parts. The first part is the initial stage, that is, to obtain the output weight β of the model using a small-scale dataset. The output weight β of the model learned in the initial part will be updated through a fixed or varying chunk size learning mechanism in the online learning stage.

Thus, the OS-ELM model can be described as the following processes:

(1) Initialization stage.

Step 1 Determine the initial input weights $a_i$ and hidden layer thresholds $b_i$, $i = 1, \dots, L$, randomly based on an initial training dataset.

Step 2 Obtain the initial hidden layer output matrix $H_0$.

Step 3 Calculate the output weight $\beta^{(0)}$.

(2) Sequential update stage.

Step 4 Calculate the hidden layer output matrix $H_k$ based on Equ.2

Step 5 Calculate the output of the model: $t_k = h_k \beta^{(k-1)}$.

Step 6 Update the output weight $\beta^{(k)}$.

⋅ Chunk-by-chunk learning mechanism:

⋅ One-by-one learning mechanism:

Step 7 Return to step 5.
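The steps above can be sketched with NumPy. The update equations are not reproduced in this text, so the sketch assumes the standard recursive least-squares form of OS-ELM described by Liang et al. (2006) with a sigmoid hidden layer; all function names, dimensions and the random data are hypothetical.

```python
import numpy as np


def hidden_output(X, a, b):
    """Sigmoid hidden layer output H for inputs X (n x d), weights a (d x L), biases b (L,)."""
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))


def os_elm_init(X0, T0, a, b):
    """Initialization stage: beta_0 = P_0 H_0^T T_0 with P_0 = (H_0^T H_0)^{-1}."""
    H0 = hidden_output(X0, a, b)
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta


def os_elm_update(P, beta, Xk, Tk, a, b):
    """Sequential stage: update P and beta with a new chunk (a single row gives one-by-one learning)."""
    H = hidden_output(Xk, a, b)
    gain = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ gain @ H @ P
    beta = beta + P @ H.T @ (Tk - H @ beta)
    return P, beta


rng = np.random.default_rng(0)
d, L = 3, 5                                   # hypothetical input size and neuron count
a, b = rng.normal(size=(d, L)), rng.normal(size=L)
X, T = rng.normal(size=(60, d)), rng.normal(size=(60, 1))

P, beta = os_elm_init(X[:20], T[:20], a, b)
for start in range(20, 60, 10):               # chunk-by-chunk with chunk size 10
    P, beta = os_elm_update(P, beta, X[start:start + 10], T[start:start + 10], a, b)
```

With this form, the sequentially updated weights coincide with the batch least-squares solution on all data seen so far, provided the initial block has at least as many samples as hidden neurons and $H_0^T H_0$ is invertible. The inversion of $H_0^T H_0$ is the step that becomes fragile when the hidden layer matrix is singular or ill-conditioned.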

The OS-ELM model involves the calculation of inversion in the update process, for which it has been proven that its generalization ability will be seriously reduced once a singular or ill-conditioned hidden layer matrix appears (Huynh and Won, 2011) . To overcome this challenge, we develop a novel hybrid approach (GA-RFOS-ELM) to forecast the volatility of crude oil prices.

Previous researchers have shown that the stability of the extreme learning machine (ELM) can be effectively improved by obtaining a high-quality feature map in the first stage (Huang et al., 2015) : the same is true for the OS-ELM model. Therefore, we introduce the genetic algorithm to determine optimal input weight and the threshold of the hidden layer in the initial stage of online learning.

In the beginning, the optimal input weights and thresholds of the hidden layer are obtained through the genetic algorithm. In detail, the candidate solutions for crude oil futures volatility forecasting are considered as a population in the genetic algorithm. We regard the input weights and hidden layer biases as the genes of the chromosome. In addition, we use the sum of absolute errors as the fitness function:

$$F = \sum_{i=1}^{N_0} \left| X_i - \hat{X}_i \right|$$

where $N_0$ is the number of samples in the initial sample set of OS-ELM, and $X_i$ and $\hat{X}_i$ denote the actual values and the outputs of the model, respectively. Therefore, finding the optimal input weights and hidden layer biases can be converted into the objective of reducing the fitness function and selecting the best chromosome. That is, we obtain the input layer weights $a_i$ and hidden layer biases $b_i$ by the genetic algorithm, not randomly.

More importantly, after adding the forgetting factor and regularization mechanism, the loss function of the model is as follows (Celaya and Agostini, 2015; Guo et al., 2018) :

where l is the forgetting factor parameter, and δ denotes the regularization coefficient. According to literature (Guo et al., 2018) , β k in the sequential update stage of the GA-RFOS-ELM algorithm can be deduced.

Thus, suppose that there are N random training samples (

The GA-RFOS-ELM algorithm can be described as the following steps:

(1) Initialization stage.

Given an initial training subset $S_{k-1} = \{(x_j, t_j)\}_{j=1}^{N_{k-1}}$ and $L$ hidden layer neurons.

Step 1 Use the genetic algorithm to calculate the input weights $a_i$ and hidden layer thresholds $b_i$ based on an initial training dataset.

Step 2 Determine the output matrix of the hidden layer $H_{k-1}$.

Step 3 Obtain the output weight $\beta_{k-1}$.

(2) Sequential update stage.

Step 4 Calculate the output matrix $H_k$ of the hidden layer.

Step 5 Calculate the predicted crude oil volatility output of the model: $t_k = H_k \beta_{k-1}$.

Step 6 Update the output weights $\beta_k$ according to Equ.8.

Step 7 Return to step 5.
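A minimal NumPy sketch of the sequential stage with forgetting and regularization is given below. The paper's loss function and Equ.8 are not reproduced in this text, so the sketch assumes the classical exponentially weighted recursive least-squares update with a regularized initialization; the form derived by Guo et al. (2018) may differ, for example in how the regularization term is kept active over time. The names `lam` (forgetting factor) and `delta` (regularization coefficient), the value of `delta`, the random input weights (which the paper obtains from the genetic algorithm) and the random data are all assumptions.

```python
import numpy as np


def sigmoid_hidden(X, a, b):
    """Sigmoid hidden layer output for inputs X, input weights a and biases b."""
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))


def rfos_elm_init(X0, T0, a, b, delta):
    """Regularized initialization: P_0 = (H_0^T H_0 + delta I)^{-1}, beta_0 = P_0 H_0^T T_0."""
    H0 = sigmoid_hidden(X0, a, b)
    P = np.linalg.inv(H0.T @ H0 + delta * np.eye(H0.shape[1]))
    return P, P @ H0.T @ T0


def rfos_elm_step(P, beta, Xk, Tk, a, b, lam):
    """Predict the new chunk first, then update with forgetting factor lam in (0, 1]."""
    H = sigmoid_hidden(Xk, a, b)
    prediction = H @ beta                      # forecast made before Tk is used
    gain = np.linalg.inv(lam * np.eye(H.shape[0]) + H @ P @ H.T)
    P = (P - P @ H.T @ gain @ H @ P) / lam     # older samples are down-weighted by lam
    beta = beta + P @ H.T @ (Tk - prediction)
    return P, beta, prediction


rng = np.random.default_rng(1)
d, L, lam, delta = 9, 30, 0.95, 1e-2           # delta is an assumed value
a, b = rng.uniform(-1, 1, size=(d, L)), rng.uniform(-1, 1, size=L)
X, T = rng.normal(size=(80, d)), rng.normal(size=(80, 1))

P, beta = rfos_elm_init(X[:40], T[:40], a, b, delta)
forecasts = []
for k in range(40, 80):                        # one-by-one learning
    P, beta, y_hat = rfos_elm_step(P, beta, X[k:k + 1], T[k:k + 1], a, b, lam)
    forecasts.append(float(y_hat[0, 0]))
```

Each forecast is produced before the corresponding target is used for the update, which mirrors the predict-then-learn order of Steps 5 and 6. With `lam = 1` and `delta = 0` the update reduces to the plain OS-ELM recursion.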

The forecasting framework of GA-RFOS-ELM is represented in Fig. 1. Previous price volatility and several indices that quantify the influence of the COVID-19 pandemic are considered as input variables to predict future crude oil price volatility. After correlation analysis, the final input variables are determined. In the initialization stage, the genetic algorithm is used to optimize the initial parameters and calculate the initial output weight, which determines the initial network. In the sequential update stage, the GA-RFOS-ELM model predicts while it learns: once the price volatility at time k is predicted, the variables and actual values available before time k + 1 are used to update the network, and this continues until all forecasts of price volatility have been output.

It is not certain which criteria are more appropriate to evaluate the predictions of volatility models (Lopez, 2001) . We choose four different functions, including the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) (Wang et al., 2018) , and median error (MdE).
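For reference, the four criteria can be computed as in the following Python sketch. The formulas are the usual textbook definitions; reading MdE as the median of the absolute errors and reporting MAPE as a fraction rather than multiplied by 100 are assumptions, since the paper's own definitions are not reproduced here. The sample values are hypothetical.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mde(actual, predicted):
    """Median absolute error (assumed reading of 'median error')."""
    return statistics.median(abs(a - p) for a, p in zip(actual, predicted))


actual = [1.0, 2.0, 4.0, 8.0]          # hypothetical values
predicted = [1.5, 2.0, 3.0, 6.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "MdE": mde(actual, predicted),
}
```

RMSE penalizes large errors more heavily than MAE, while the median-based measure is insensitive to a few extreme days, which is one reason to report several criteria side by side.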

The effectiveness of the proposed forecasting model is validated on volatility forecasting for the WTI crude oil dataset. We use the daily crude oil closing prices from January 3, 2006, to July 24, 2020, which were obtained from the Wind website. The absolute returns of the crude oil price $r_t$ are employed as the volatility: that is, $r_t = 100 \times [\ln(P_t / P_{t-1})]$, where $P_t$ indicates the closing price on day $t$ (Wang et al., 2018). Fig. 2 illustrates the volatility of crude oil prices during 2006-2020; the red line marks the onset of the COVID-19 pandemic, after which huge fluctuations occur in the market.
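The volatility proxy can be computed as in the short Python sketch below. Because the text describes absolute returns, the sketch takes the absolute value of the scaled log return; that reading, the function name and the sample prices are assumptions.

```python
import math


def absolute_log_returns(prices):
    """r_t = 100 * |ln(P_t / P_{t-1})| for consecutive closing prices."""
    return [100.0 * abs(math.log(p_t / p_prev))
            for p_prev, p_t in zip(prices, prices[1:])]


closes = [60.0, 61.2, 59.5, 59.5]      # hypothetical prices, not WTI data
volatility = absolute_log_returns(closes)
```

The series is one element shorter than the price series, and the logarithm requires strictly positive prices.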

Some events clearly cause fluctuations in the crude oil market, and the novel coronavirus epidemic in particular has affected crude oil volatility. Moreover, four COVID-19 news-related variables from January 1, 2020, to July 24, 2020, are introduced to forecast the crude oil price volatility for the first time. All of the data can be obtained from RavenPack (https://coronavirus.ravenpack.com/). Table 1 reports the descriptive statistics for these four variables and for crude oil volatility during the novel coronavirus period, and the indices are plotted against the future crude oil price volatility in Fig. 3.

The Panic Index (PI) estimates the degree of discussion in the media that references terror and coronavirus disease. A higher index value indicates that more panic-related content appears in the news. Fig. 3(a) compares the crude oil futures price volatility with PI and shows that the fluctuations of price volatility coincide with those of PI. This suggests that the PI index is correlated with the future volatility of crude oil prices.

The extent of news discussing coronavirus disease is evaluated by the Media Hype Index (HY). The index values range between 0 and 100. A value of 5 means that 5 percent of news is discussing the novel virus. Fig. 3(b) represents the HY index versus the crude oil futures price volatility, which expresses greater coincidence between HY and the future oil price volatility during the middle and later periods.

The Fake News Index (FNI) estimates the extent of discussion in the media about COVID-19 that refers to 19-nCoV disinformation. Values are also defined between 0 and 100. The higher index indicates that more fake news appears. The FNI index is multiplied by 10 to make observation easier and enable comparison with the volatility of future crude oil prices. The high correlation between FNI and future crude oil price volatility is displayed in Fig. 3(c) .

The degree of sentiment about substances referred to in the media alongside COVID-19 is estimated by the Country Sentiment Index (CSI). Values range between 0 and 100, which indicates the extent of sentiment. Different from the previous three indices, the CSI index appears to exhibit a significantly consistent negative correlation with the future volatility (see Fig. 3(d) ).

In this study, we removed abnormal data, such as the crude oil price on April 20, 2020, the day on which the crude oil price fell off a cliff. To make the result more reliable, the obtained dataset is divided into two parts. One part is from January 3, 2006 to December 31, 2019, which represents the dataset before the novel coronavirus, and the other indicates the epidemic period from January 1, 2020, to July 24, 2020.

Market participants generally care more about how well they can do in the future through the volatility forecasting approaches. Thus, the crude oil price volatility of the previous five days is regarded as the input variable to predict future volatility before January 1, 2020. During the 2019-nCoV, four quantitative indicators (PI, HY, FNI, and CSI) related to novel coronavirus pandemic news are appended to the input variables to verify their roles in the future crude oil volatility forecast. To make the results more reliable, we split these datasets into two groups, with 70% forming the training set, and the rest used as the test set. It is common to divide the training and testing datasets as 7:3 (Alpaydin, 2020). The input datasets are normalized to speed up the convergence, and the Sigmoid (Equ.16) is chosen as the activation function in our models.
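The input construction described above can be sketched in plain Python. The chronological (unshuffled) split and min-max scaling with bounds taken from the training set are assumptions, since the text states only the 7:3 ratio and that the inputs are normalized; the sigmoid is written in its usual form $1/(1+e^{-z})$. All names and the toy series are hypothetical.

```python
import math


def make_lagged_samples(series, n_lags=5):
    """Pair each value with the n_lags values that precede it."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.7):
    """Keep time order: the first 70% for training, the rest for testing."""
    cut = int(round(len(inputs) * train_fraction))
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


def min_max_scale(rows, low, high):
    """Scale every feature to [0, 1] using the supplied per-feature bounds."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, low, high)] for row in rows]


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


series = [float(v) for v in range(1, 26)]          # hypothetical volatility values
X, y = make_lagged_samples(series, n_lags=5)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
low = [min(col) for col in zip(*train_X)]          # bounds from the training set only
high = [max(col) for col in zip(*train_X)]
train_X_scaled = min_max_scale(train_X, low, high)
test_X_scaled = min_max_scale(test_X, low, high)
```

For the pandemic-period experiments with news variables, the four index values (PI, HY, FNI and CSI) would be appended to each row of five lags, giving nine inputs per sample.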

GA is introduced to obtain the optimal initial weights of input layer and hidden layer bias in the GA-RFOS-ELM model, the relevant parameters of which are illustrated in Table 2 . Actually, there is no exact method to determine the parameters of the genetic algorithm. All parameters are determined through related literature (Leardi et al., 2002; Chung and Shin, 2020; Zhou et al., 2020) and experiments. We adopt binary encoding and set the size of the population as 20. Crossover and mutation probability are the most important parameters of the genetic algorithm, determined as 0.9 and 0.1, respectively. The terminal condition is the maximum iteration: that is, the optimization will be stopped when reaching the maximum iteration.

The number of hidden layer neurons is significant for the SLFN model and for the GA-RFOS-ELM crude oil forecasting model. Through cross-validation, this value is ultimately determined as 30. The forgetting factor parameter is set to 0.95 according to (Soares and Araújo, 2016) .

In this paper, the GA-RFOS-ELM method proposed above is compared with several econometric models and machine learning algorithms, including autoregressive (AR), regression trees (RT), Bayesian regression (Bayes), support vector regression (SVR), extreme learning machine (ELM), online sequential extreme learning machine (OS-ELM) and genetic algorithm online sequential extreme learning machine (GA-OS-ELM). Among these, AR is an econometric model that is usually employed to solve time series problems. RT, Bayes and SVR are classic machine learning models based on different theories, which are commonly used as machine learning benchmarks. In addition, we consider ELM and OS-ELM, two significant algorithms in the development of the ELM family, which are also the basis of our proposed model. GA-OS-ELM is likewise introduced as a benchmark for comparison with the proposed model. All of the above models are implemented in Python and executed on a Dell server with 16 GB RAM.

To sum up, to verify the validity of the models and the effect of COVID-19 news on crude oil price volatility, simulation experiments are implemented on three different datasets. Dataset 1 represents crude oil volatility from January 3, 2006, to December 31, 2019, and uses the crude oil price volatility of the previous five days as the input variables. Dataset 2 includes the crude oil volatility from January 1, 2020, to July 24, 2020, during the coronavirus outbreak, and also uses the crude oil price volatility of the previous five days as input variables. Different from dataset 2, dataset 3 additionally considers four quantitative indicators, PI, HY, FNI and CSI, from January 1, 2020, to July 24, 2020, in the prediction models. Therefore, a total of nine indicators are considered as input variables for dataset 3. Figs. 4-6 show the evolution curves of the genetic algorithm for these three datasets. The error converges within 100 iterations for all three datasets, which indicates that the optimal parameters are obtained. We consider the sum of L2 error as the fitness function; therefore, the error of dataset 1 is relatively larger than the other two due to its larger sample size (see Fig. 4). The genetic algorithm is used to obtain better parameters than random determination would, so the main concern is whether its error converges.

Table 2. Parameters of genetic algorithm.

Parameter                       Value
Size of population              20
Crossover probability           0.9
Mutation probability            0.1
Maximum number of iterations    100
Terminal condition              100

AR, RT, Bayes, SVR, ELM, OS-ELM and GA-OS-ELM are considered as benchmark models for comparison with our proposed GA-RFOS-ELM model. The order of the AR model is determined by minimizing the Akaike information criterion (AIC) (Akaike, 1974). For RT, the minimal cost-complexity pruning algorithm (Loh, 2011) is used to prune the tree and avoid overfitting. The four hyperparameters $\alpha_1$, $\alpha_2$, $\lambda_1$ and $\lambda_2$ of the gamma prior distributions in the Bayes model are usually chosen to be noninformative. We select the RBF kernel in the SVR model, where the proper choice of $c$ and $\gamma$ is critical to its performance; the best pair of parameters is obtained by means of a grid search. For the ELM and OS-ELM methods, the number of hidden layer neurons is set to 30, the same as for GA-OS-ELM and GA-RFOS-ELM. Moreover, the prediction accuracy of ELM, OS-ELM, GA-OS-ELM and our proposed GA-RFOS-ELM model is averaged over 200 runs.

Table 3 reports the loss functions, such as RMSE, MAE and MdE, that measure predictive accuracy for the proposed GA-RFOS-ELM model and the seven benchmark models on the test samples of dataset 1. The proposed GA-RFOS-ELM algorithm performs better than the benchmark models in terms of RMSE, MAE, MAPE and MdE, yielding values of 1.1693, 0.8924, 2.1639 and 0.6414, respectively. In terms of RMSE, the GA-OS-ELM and Bayes models are second only to GA-RFOS-ELM, with scores of 1.3533 and 1.4262, respectively. In terms of MAE and MdE, the SVR model is better than the other six benchmark models. If computation time is not taken into account, SVR is generally considered to be a good choice for solving various problems.

The GA-RFOS-ELM model also produces more accurate forecasts than the other seven comparison models for dataset 2 (see Table 4). Compared with the AR, RT, Bayes, SVR and ELM models, the results of OS-ELM, GA-OS-ELM and GA-RFOS-ELM are relatively similar to one another. As mentioned above, dataset 1 and dataset 2 have the same input-output structure: both use the crude oil price volatility of the previous five days as input variables to predict the future volatility. However, dataset 2 is a small sample compared with dataset 1, and the crude oil price fluctuated more sharply during the epidemic period. Non-online machine learning (ML) algorithms, such as RT, Bayes, SVR and ELM, may benefit from large datasets and use global information to obtain relatively high prediction accuracy. Nevertheless, they are usually unable to capture changes in crude oil price volatility over time, and they must be retrained when the underlying distribution of the data changes dramatically, as it did during the COVID-19 pandemic. Under such conditions, global learning may be less suitable than local learning. These may be the primary reasons why the online models, OS-ELM, GA-OS-ELM and GA-RFOS-ELM, perform better than the other models for dataset 2.

Fig. 4. Evolution curve of genetic algorithm for dataset 1.

Table 5 presents the forecasting results of the proposed GA-RFOS-ELM model and the benchmark models (RT, Bayes, SVR, ELM, OS-ELM and GA-OS-ELM) for dataset 3. In terms of RMSE, MAE, MAPE and MdE, GA-RFOS-ELM yields the best performance for this task. In addition, OS-ELM and GA-OS-ELM, like GA-RFOS-ELM, are better than the four non-online models. The AR model is more suitable for univariate prediction problems, and thus we did not consider it in this case.

In contrast to dataset 2, four indices of news during the COVID-19 pandemic are combined with the input variables to predict the future price volatility of crude oil. The mark * in Table 5 denotes loss-function values for which combining news during the COVID-19 pandemic with the input variables achieves better performance. Combining the novel coronavirus pneumonia data with the news, we find that the models mostly outperform the basic models on these four evaluation measures. This suggests that news about COVID-19 draws the attention and shapes the expectations of the crude oil futures market, thus playing an integral role in oil price volatility analysis and forecasting.

Forecast accuracy is further compared using the DM test (Diebold and Mariano, 1995) with the modification suggested by Harvey et al. (1997). A small p value (usually less than 0.05) indicates that the proposed GA-RFOS-ELM model significantly outperforms the model named in the column header.

The p values of the DM tests between GA-RFOS-ELM and AR, RT, Bayes, SVR, ELM, OS-ELM and GA-OS-ELM are all below the significance level, which indicates that the proposed GA-RFOS-ELM model outperforms the other models in predicting the future price volatility of crude oil. These experiments show the superiority of the proposed model and support the role of news during the COVID-19 pandemic.
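For readers unfamiliar with the DM test, the following Python sketch computes the Diebold-Mariano statistic with the small-sample correction of Harvey et al. (1997). Several choices here are assumptions rather than details from the text: the function name dm_test_hln is hypothetical, the loss is taken to be squared error by default, the alternative is one-sided (model A more accurate than model B), and the p value uses a normal approximation to keep the sketch dependency-free, whereas Harvey et al. recommend comparing against a Student-t distribution with n - 1 degrees of freedom.

```python
import math


def dm_test_hln(errors_a, errors_b, horizon=1, power=2):
    """Return (statistic, one-sided p value) for H1: model A beats model B."""
    n = len(errors_a)
    if n != len(errors_b) or n <= horizon:
        raise ValueError("need equal-length error series longer than the horizon")
    # Loss differential: negative values favour model A.
    d = [abs(ea) ** power - abs(eb) ** power for ea, eb in zip(errors_a, errors_b)]
    d_bar = sum(d) / n

    def autocov(lag):
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar) for t in range(lag, n)) / n

    # Long-run variance uses autocovariances up to lag horizon - 1.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = d_bar / math.sqrt(long_run_var / n)
    # Harvey-Leybourne-Newbold small-sample correction factor.
    correction = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    stat = correction * dm
    # Normal approximation to the lower-tail probability (assumption, see lead-in).
    p_value = 0.5 * math.erfc(-stat / math.sqrt(2.0))
    return stat, p_value
```

A strongly negative statistic with a p value below 0.05 would correspond to the situation described above, in which the first model's forecast errors are significantly smaller than those of the second.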

To summarize, COVID-19 related news, which is the subject of a growing body of empirical and theoretical studies on public news sentiment, is crucial for the volatility prediction of crude oil futures, just as it has proved valuable with respect to stock returns. On the other hand, global learning achieves lower prediction accuracy when confronted with time-varying variables, which are better suited to local learning with long- and short-term memory. These empirical results show that news during the COVID-19 pandemic affects individual investors' decisions, but that news exhibits timeliness because of the information explosion, which is also consistent with basic human cognition.

The influence of public news on the crude oil futures market has attracted increasing attention over the last several years. There is a growing body of theoretical and empirical research on the relationship between socially, economically or politically driven news and changes in financial markets (Smales, 2014; Broadstock and Zhang, 2019; Shi and Ho, 2020). This paper further studies how news generated by coronavirus-related events is associated with the volatility of crude oil futures. Although the current COVID-19 pandemic has been associated with great losses for investors all over the world, existing studies on the crude oil futures market and the COVID-19 pandemic are limited. Therefore, weighing the predictive power of the COVID-19 pandemic has become a profound and urgent issue.

This paper establishes a novel model and offers empirical evidence on volatility forecasting of crude oil futures using COVID-19 related news. The GA-RFOS-ELM model shows that crude oil futures exhibit dependencies on contagion, media coverage, fake news and other information related to the COVID-19 pandemic. This result suggests that COVID-19 related news affects the price of crude oil futures and has a certain explanatory power for their volatility. These results parallel the dependence between news during the COVID-19 financial turmoil and stock market returns (Cepoi, 2020), and they are consistent with the finding that COVID-19 related news is associated with volatility in the equity markets (Haroon and Rizvi, 2020). Therefore, we suggest that governments and social media make more intensive use of positive information and suitable interactive communication platforms to mitigate the financial market turmoil related to the ongoing coronavirus outbreak.

This paper makes two primary contributions. On the one hand, extending the rich literature on volatility forecasting, we introduce a novel GA-RFOS-ELM model. News during the COVID-19 pandemic exhibits timeliness: new data attract more emphasis, while older data are gradually forgotten. Considering the timeliness of COVID-19 related news, the GA-RFOS-ELM model strengthens the optimal search ability of the genetic algorithm. A chunk-by-chunk learning mechanism with fixed or varying chunk size may be effective and efficient, which illustrates that an online update learning ability is needed. Additionally, the crude oil price is nonlinear and nonstationary. To address these issues, this paper investigates the impact of the COVID-19 pandemic on volatility forecasting of crude oil futures by applying the GA-RFOS-ELM model to news during the COVID-19 pandemic. The empirical results suggest that the forecasting performance of the model is superior to that of basic forecasting techniques. We demonstrate the power of news during the COVID-19 pandemic to improve forecasting performance, which suggests that such news is a practical way to assist the prediction of volatility in the crude oil market and to quantify investor emotion. On the other hand, this paper adds to the research on the response of commodity futures markets to the COVID-19 pandemic. There is plenty of evidence that news of major events generally contains important incremental predictive information on future market returns (Albulescu, 2020; Zhang et al., 2020a; Al-Awadhi et al., 2020). This paper concludes that COVID-19 related news provides vital clues about the volatility of crude oil futures.

The research has significant implications. With respect to policy, regulators must consider the impact of news during the COVID-19 pandemic when formulating measures to mitigate the turbulence and instability of the crude oil market. With respect to the market, investors should be conscious of the degree of news during COVID-19 and assess the potential future returns of the crude oil futures market.

GA-OS-ELM 3.54e-2 2.64e-3 5.44e-3 1.74e-2

Generally, this study is valuable and helpful for establishing more accurate forecasting models for crude oil futures with internet information. In addition, the GA-RFOS-ELM model could be applied to the prediction of other temporal variables or other markets, such as corn, copper and gold, thus providing more accurate models for volatility forecasting in future research, especially when other variables are considered. Additionally, constructing other indices by text mining of COVID-19 related news would be of particular interest.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

A new look at the statistical model identification

Death and contagious infectious diseases: impact of the COVID-19 virus on stock market returns

Coronavirus and Financial Volatility: 40 Days of Fasting and Fear arXiv preprint

Introduction to Machine Learning

The Unprecedented Stock Market Impact of COVID-19

Do the portfolios of small investors reflect positive feedback trading?

All that glitters: the effect of attention and news on the buying behavior of individual and institutional investors

Fear from uncertainty: an event study of Khashoggi and stock market returns

The public's response to severe acute respiratory syndrome in Toronto and the United States

Media attention and the volatility effect

Social-media and intraday stock returns: the pricing power of sentiment

Online EM with weight-based forgetting

Asymmetric dependence between stock market returns and news during COVID-19 financial turmoil

A deep residual compensation extreme learning machine and applications

Genetic algorithm-optimized multi-channel convolutional neural network for stock market prediction

Improved online sequential extreme learning machine for identifying crack behavior in concrete dam

When machines read the news: using automated text analytics to quantify high frequency news-implied market reactions

Online sequential extreme learning machine with generalized regularization and adaptive forgetting factor for time-varying system prediction

COVID-19: media coverage and financial markets behavior-a sectoral inquiry

Testing the equality of prediction mean squared errors

Forecasting crude oil price volatility

Adaptation in Natural and Artificial Systems: an Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence

Trends in extreme learning machines: a review

Extreme learning machine: a new learning scheme of feedforward neural networks

Extreme learning machine: theory and applications

Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks

Stock market response to potash mine disasters

Variable selection for multivariate calibration using a genetic algorithm: prediction of additive concentrations in polymer films from Fourier transform-infrared spectral data

A fast and accurate online sequential learning algorithm for feedforward networks

Classification and regression trees

To Report or Not to Report about Coronavirus? The Role of Periodic Reporting in Explaining Capital Market Reactions during the Global COVID-19 Pandemic

Evaluating the predictive accuracy of volatility models

Forecasting realized volatility of agricultural commodity futures with infinite hidden Markov HAR models

The history and the narrative of risk in the media

A new robust training algorithm for a class of single-hidden layer feedforward neural networks

An Introduction to Genetic Algorithms

Can stale oil price news predict stock returns?

Herding and feedback trading by institutional and individual investors

COVID-19 and Stock Market Volatility

A robust variable forgetting factor recursive least-squares algorithm for system identification

Online sequential extreme learning machine based multilayer perception with output self feedback for time series prediction

Modeling oil price-US stock nexus: a VARMA-BEKK-AGARCH approach

Is all politics local? Regional political risk in Russia and the panel of stock returns

News sentiment and states of stock return volatility: evidence from long memory and discrete choice models

News sentiment and the investor fear gauge

An adaptive ensemble of on-line extreme learning machines with variable forgetting factor for dynamic system prediction

Google search intensity and its relationship with returns and trading volume of Japanese stocks

Giving content to investor sentiment: the role of media in the stock market

Crude oil price forecasting based on internet concern using an extreme learning machine

Gold price forecasting research based on an improved online extreme learning machine algorithm

The influence of popular media on perceptions of personal and population risk in possible disease outbreaks

Financial Markets under the Global Pandemic of COVID-19

Fault diagnosis method of analog circuit based on GA-OS-ELM

Exploring the WTI crude oil price bubble process using the Markov regime switching model

Hybrid genetic algorithm method for efficient and robust evaluation of remaining useful life of supercapacitors

The authors gratefully acknowledge the financial support provided by the Natural Science Foundation of Hunan Province (2020JJ578) and the Innovation-Driven Project of Central South University (No. 2020CX049).