Abstract
Variability in wind intensity and direction causes unstable electricity supply to the power system, making the integration of this energy into the electrical system a significant challenge for operations and planning practices. Ensembles can be used as an alternative to address the complex patterns over time in wind speed time series. The appropriate size of the training partition for ensemble models depends on dataset characteristics, and minimizing repetitive data enhances model training efficiency. Maintaining a concise training set also makes it easier to meet software and hardware constraints, which is increasingly relevant given the growing interest in deploying machine learning models on edge devices. For these reasons, this paper introduces a new method called Local Distribution (LocDist) to predict wind speed, utilizing local pattern recognition based on data distribution. In tests with three wind speed time series, LocDist created a compact training subset with less than 20% of the training data. The Diebold-Mariano hypothesis test was employed to assess the significance of the forecast errors of the proposal compared to individual and bagging methods that use the entire training set. The LocDist method with long short-term memory (LSTM) and gated recurrent unit (GRU) won in 100% of the cases. Additionally, LocDist with extreme learning machines (ELM) and autoregressive integrated moving average (ARIMA) won or tied in 83% and 66% of the cases, respectively.
1 Introduction
The use of wind energy is increasing worldwide each year. Over the past 20 years, wind energy has become the primary clean and cost-competitive energy globally [1]. In 2023, wind energy reached the historic milestone of 1 TW of installed capacity, and the coming years will mark a crucial transition period for the global wind industry. It took 40 years to reach this milestone, but the next 1 TW will take less than a decade [2]. The Brazilian Northeast is a region with some of the best winds in the world for wind energy production, as they are more constant, have stable speeds, and do not frequently change direction. Therefore, 90% of Brazilian wind farms are located in the Northeast Region [3]. Although wind energy is considered an attractive option due to its abundance and ecological benefits, several challenges are still faced in its exploitation, such as the variability in wind intensity and direction, causing unstable electricity supply to the power system [4]. Integrating this energy into the electrical system represents a significant challenge for operations and planning practices [5]. Hence, wind energy systems’ planning, scheduling, maintenance, and control depend on wind speed forecasting, making it essential to obtain wind speeds in advance [4]. For these reasons, the analysis and evaluation of this type of energy have attracted the attention of researchers worldwide [6].
Various machine learning methods have already been applied to wind speed forecasting, such as support vector regression (SVR) [7,8,9], extreme learning machines (ELM) [10], long short-term memory (LSTM) [11, 12], and gated recurrent unit (GRU) [12]. Each model can recognize nonlinear patterns in a series and has its particularities. However, wind speed time series display complex patterns [6]. Factors such as regions’ surface roughness, vegetation variability, and land use influence wind behavior [13]. Furthermore, research shows that wind speed data can exhibit a wide range of distributions [14], and these time series may also reveal hidden patterns with chaotic characteristics [5]. As a result, accurately forecasting wind speed poses a significant challenge [6].
An alternative to modeling complex patterns over time in wind speed time series is using ensembles [6]. For example, applying traditional procedures to a time series with a mixed distribution can lead to misleading results [15]. Therefore, we understand that ensembles offer superior prediction accuracy and reduced uncertainty compared to individual models, as they combine the different characteristics of individual prediction models [16,17,18,19,20], especially in the field of wind speed forecasting [6, 13, 21,22,23]. An ensemble is a technique used to combine prediction models [6]. It involves at least two steps: generation and integration. As stated in [6], individual models (also known as base models) use time series as input to produce predictions in generation. In integration, the predicted outputs of the base models are combined to obtain the final prediction. Diversity among predictions is crucial to recognizing different patterns in the time series, and it is one of the primary objectives of using an ensemble [20]. The authors in [17] have reported that the appropriate size of the training partition for ensemble models varies depending on dataset characteristics. It is highlighted in [24] that to enhance model training efficiency, it is important to minimize repetitive data. By maintaining a concise training set, it becomes easier to meet the requirements of software and hardware constraints. Additionally, efficient artificial intelligence (AI) deployed everywhere is a widely pursued goal. As a result, there is a growing interest in deploying machine learning models on edge devices (computation and data storage closer to the sources of data) that have stringent constraints on resources and energy [25].
For these reasons, studies on the ensemble generation stage that appropriately define partitions for the base models are desirable in wind speed prediction, particularly because [26] has shown that it is achievable to obtain a subset of wind speed data partitions that adequately represent the data dynamics throughout the entire analysis period. This fact suggests that wind speed time series exhibit redundancy in their patterns, rendering it unnecessary to utilize the entire training set for model training. Thus, this paper introduces a new generation method for homogeneous ensembles, composed of a single type of base model, to predict wind speed, called Local Distribution (LocDist), which utilizes local pattern recognition based on data distribution. This method generates partitions based on local data distribution using a divide-and-conquer strategy. Instead of using a single prediction model to identify patterns in different distributions in the time series, base models of an ensemble are applied to separate partitions selected from the time series to identify distribution patterns. The predictions of the base models are then combined to aggregate the mapping of different distribution patterns of the time series into an ensemble. Our hypothesis is that generating expert models on partitions whose data distributions differ from one another will result in models with diverse predictions. Moreover, combining these models will create an ensemble of specialists, each well-versed in a different distribution of the time series data, containing concise information about the data patterns. Thus, the primary contribution of this proposal is to create a compact training subset that minimally represents the data distribution of the time series. The training phase can be executed more efficiently by reducing the amount of training data.
The outline of the paper is organized as follows. Section 2 describes the related works. Section 3 introduces the proposal in detail. Section 4 is the description of the processes through which results are generated. In Sect. 5, results are presented by performing a comparison among individual models, ensembles and the proposal. Finally, in Sect. 6, conclusions are drawn.
2 Related Works
The use of individual models for wind speed time series forecasting has been widespread [8, 10, 12, 27]. In [27], it was found that autoregressive moving average (ARMA) models are more effective than the persistence model in predicting future observations. The persistence model simply repeats the most recent past observation. Furthermore, in [8], it was shown that support vector regression (SVR) outperforms multilayer perceptron (MLP). In [10], the extreme learning machines (ELM) demonstrated competitive prediction errors compared to MLP and SVR while requiring lower computational costs. Reference [12] introduced a new seasonal autoregressive integrated moving average (ARIMA) model for predicting wind speed time series and compared it with long short-term memory (LSTM) and gated recurrent unit (GRU). Despite the more advanced machine learning algorithms of LSTM and GRU, the seasonal ARIMA model exhibited superior predictive performance and faster training. These individual models employ a global mapping of a time series, using the entire training set for model training. However, this approach is not the optimal choice due to the weaknesses, disadvantages, and limitations of modeling complex energy systems [6].
Some methods for the generation stage in ensembles have been proposed to identify complex patterns in time series data. In [21,22,23], wind speed time series are decomposed into a finite and small number of intrinsic mode functions (IMFs) using empirical mode decomposition (EMD). These IMFs contain information about local trends and fluctuations at different scales of the original signal, which is valuable for understanding the actual physical significance of the signal [23]. A homogeneous ensemble utilizing a resampling technique is developed in a different application in [19] and used on the M3-Competition dataset. The time series is bootstrapped using the moving block bootstrap (MBB). This process creates new time series from the original, intending that the predictors of these time series will make the final forecast resilient to the uncertainty observed in the data [18]. This method has a significant computational cost, involving training 100 base models on the entire training set.
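The moving block bootstrap at the core of this resampling scheme is compact enough to sketch. The following Python fragment is an illustration only (the function name is ours; the method of [19] additionally applies a Box-Cox transform and STL decomposition and bootstraps the remainder component rather than the raw series):

```python
import numpy as np

def moving_block_bootstrap(series, block_size, seed=None):
    """One bootstrap replicate: draw overlapping blocks of length
    `block_size` uniformly at random and concatenate them until the
    replicate matches the original series length."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series)
    n = len(series)
    n_blocks = -(-n // block_size)  # ceiling division
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [series[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]

# an ensemble then trains one base model per replicate and averages
replicates = [moving_block_bootstrap(np.sin(np.arange(200) / 5.0), 24, seed=k)
              for k in range(10)]
```

Because blocks are drawn with their internal order preserved, short-range autocorrelation inside each block survives the resampling, which is what distinguishes the MBB from naive i.i.d. bootstrapping of observations.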
However, some generation methods have been proposed focusing on identifying different local patterns in a time series, thus providing specialized local predictors [17]. Therefore, in [17], the authors propose a homogeneous ensemble for various time series applications. They create training partitions of equal size, allowing for an overlap between two adjacent partitions, which enables the model to adapt to progressive pattern changes. The authors also investigate four different partition sizes. In [28], a generation strategy is used to obtain partitions of the training set with the most informative data through clustering. This approach can accelerate the learning phase without compromising accuracy or improving it. It is particularly beneficial when the training dataset is large, as it can significantly reduce computation time by avoiding the processing of redundant data [28]. However, in the proposed method of [28], it is initially necessary to train 200 prediction models on training partitions before applying clustering. The authors argue that the execution time of their method decreases because these 200 models can be run in parallel.
In most of the previously cited works, excluding [28], no analysis is conducted on the partition data to identify different data patterns or to remove redundancy. There is no evidence that the constructed partitions have different data patterns to meet the requirement of generating diverse models, nor any analysis on the most appropriate size or region for the training partitions. In [25], it is reported that developing an automatic solution for model compression is challenging. The aim is to achieve a higher compression ratio while minimizing accuracy loss, posing a challenging trade-off. Therefore, further research is desirable to investigate the ensemble generation step by analyzing the data patterns in the partitions. Mainly because in [26], it was demonstrated that it is possible to obtain a subset of wind speed data partitions that reasonably reflect the data dynamics over the entire analysis period. This fact indicates that wind speed time series may exhibit redundancy in their patterns, making it unnecessary to use the entire training set for training a model.
3 Proposed New Ensemble
The proposal is a homogeneous ensemble comprising two stages: generation and integration. We define partitions with different data distributions and eliminate other ones with redundant data distributions within the training set during the generation stage. Therefore, a concise training set that reflects the data distribution patterns is obtained. This process enables the creation of a pool of trained models \(P=\{m_1,m_2,\ldots ,m_N\}\), that is, a group of N local expert predictors, which have been trained on partitions generated with different data distributions validated by the Kolmogorov-Smirnov (KS) hypothesis test. The null hypothesis of KS test states that the two samples share the same distribution [29]. Subsequently, the predictors are combined in the integration stage to achieve the final forecasting of the ensemble.
Figure 1(a) shows the training phase of the LocDist method. The inputs consist of the training and validation sets of the time series, the significance levels of the KS hypothesis test, \(\alpha _1\) and \(\alpha _2\), the minimum partition size T, the sliding window size W, the base model M with a technique for hyperparameter selection G, the evaluation measure E for the evaluation of M and, lastly, a combination method C for integration. The output is the pool of trained models \(P=\{m_1,m_2,\ldots ,m_N\}\) of N local expert models and combination method C.
First, the ensemble generation step is employed through partitioning, which is constructed as follows: (i) Windows \(w_t\) are created in the training set with a fixed size W. The difference between two adjacent windows \((w_t - w_{t-1})\) is only one observation. The first window \(w_1\) is defined as the initial reference window \(Ref_1\); (ii) Subsequently, a significance level \(\alpha _1\) is defined for applying the KS hypothesis test between the reference window and each subsequent window sequentially; (iii) The method constructs the partition by joining adjacent windows that have a p-value \(>\alpha _1\), otherwise a new partition is initiated with a new reference window. In the end, we will have the defined partitions and their respective reference windows; (iv) Next, partitions with a size smaller than T are eliminated; (v) Return to step (iii) applying a grid search with different significance levels for \(\alpha _1\), aiming to maximize the expression
\(\frac{1}{B_{u,2}}\sum _{i=1}^{u-1}\sum _{j=i+1}^{u}KS(Ref_i,Ref_j)\)     (1)
where \(B_{u,2}\) is the number of two-by-two combinations, in the combinatorial sense, of the u reference windows remaining at the end of step (iv). Therefore, \(B_{u,2}\) represents the total number of possible pairs of reference windows of the partitions. KS refers to the Kolmogorov-Smirnov hypothesis test, so \(KS(Ref_i,Ref_j)\) is equal to 1 if the reference windows i and j have different distributions (p-value \(\le 0.05\)), and equal to 0 otherwise. Maximizing expression (1) is a key approach for identifying the largest number of partitions with different distributions in the training set; (vi) After determining the best significance level for \(\alpha _1\) in the grid search to maximize expression (1), the KS test is applied once again to eliminate partitions with redundant distributions at a significance level of \(\alpha _2=0.05\). At the end of step (vi), the disjoint partitions D with solid evidence of different data distributions are obtained, and the partitioning procedure is concluded. After generating the disjoint partitions D with different distributions, time series prediction models M are trained on these partitions using technique G and evaluation metric E for hyperparameter selection. Linear models are chosen based on the lowest error E in the training partition, while nonlinear models are selected based on the lowest error E in the validation partition to avoid overfitting. This process creates a pool \(P=\{m_1,m_2,\ldots ,m_N\}\) consisting of N local expert predictors specialized in different distribution patterns of the time series. This pool P is combined using a combination method C, which can be a statistical measure, such as the average, or a nonlinear model M trained with technique G and evaluation metric E for hyperparameter selection. The best combination method C is selected based on the lowest error E in the validation set.
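The partitioning steps above can be sketched as follows. This is an illustrative Python re-implementation (the paper's experiments use R); the exact placement of partition boundaries and the handling of the final window are simplifying assumptions on our part:

```python
import numpy as np
from scipy.stats import ks_2samp

def build_partitions(train, W, alpha1, T):
    """Steps (i)-(iv): slide windows of size W one observation at a time,
    start a new partition (and a new reference window) whenever the KS
    test rejects equality between the current window and the reference
    window (p-value <= alpha1), then drop partitions shorter than T."""
    parts, refs = [], []
    start, ref_start = 0, 0
    for t in range(1, len(train) - W + 1):
        ref = train[ref_start:ref_start + W]
        window = train[t:t + W]
        if ks_2samp(ref, window).pvalue <= alpha1:
            parts.append(train[start:t])   # close the current partition
            refs.append(ref)
            start, ref_start = t, t        # new partition, new reference
    parts.append(train[start:])
    refs.append(train[ref_start:ref_start + W])
    keep = [i for i, p in enumerate(parts) if len(p) >= T]
    return [parts[i] for i in keep], [refs[i] for i in keep]

def distinct_pair_fraction(refs):
    """Objective of the alpha1 grid search, expression (1): fraction of
    reference-window pairs whose distributions differ (p-value <= 0.05)."""
    u = len(refs)
    pairs = [(i, j) for i in range(u) for j in range(i + 1, u)]
    if not pairs:
        return 0.0
    hits = sum(ks_2samp(refs[i], refs[j]).pvalue <= 0.05 for i, j in pairs)
    return hits / len(pairs)
```

The grid search over \(\alpha _1\) then amounts to calling `build_partitions` for each candidate level and keeping the one with the highest `distinct_pair_fraction`.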
Finally, Fig. 1(b) shows the testing phase of the LocDist method. The input involves the testing set of the time series, the pool P of N local expert predictors, and the combination method C for integration. The output is the wind speed forecasting. This phase consists of testing the predictive capacity of the proposal with the models P and the combination method C on an out-of-sample set.
4 Experimental Protocol
The databases considered in this work are from the Brazilian Institute of Space Research (INPE, in Portuguese), obtained through sensors at a height of 50 m [30]. The Northeastern Region of Brazil has some of the world’s best winds for wind energy production, which is why 90% of Brazilian wind parks are located in this region [3]. Database 1 refers to Petrolina-PE, while Databases 2 and 3 were obtained in Triunfo-PE. The first database covers the period from July to September 2010. Databases 2 and 3 cover the periods from October to December 2006 and March to May 2006, respectively. There are 13,248 instances at ten-minute intervals between consecutive instances for each database, and no missing values. Every six instances were averaged, resulting in 2,208 instances for each time series with one-hour time intervals to reduce the size of the database. Subsequently, each time series was fully normalized to the interval [0, 1]. Thus, the input data for the neural networks are appropriately mapped according to the codomain of the sigmoid activation function. Then, each time series was divided into three subsets: the first 50% of the data for training, then 25% for validation, and the last 25% for testing. Table 1 presents the statistical summary of the three pre-processed and non-normalized databases.
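The preprocessing just described (hourly averaging, min-max normalization, 50/25/25 split) can be reproduced in a few lines. This sketch assumes the raw data arrive as a flat array of 10-minute readings:

```python
import numpy as np

def preprocess(series_10min):
    """Average every 6 ten-minute readings into hourly values,
    min-max normalize to [0, 1], and split 50/25/25 in time order."""
    series_10min = np.asarray(series_10min, dtype=float)
    n_hours = len(series_10min) // 6
    hourly = series_10min[:n_hours * 6].reshape(n_hours, 6).mean(axis=1)
    norm = (hourly - hourly.min()) / (hourly.max() - hourly.min())
    n_train = int(0.50 * n_hours)
    n_val = int(0.25 * n_hours)
    train = norm[:n_train]
    val = norm[n_train:n_train + n_val]
    test = norm[n_train + n_val:]
    return train, val, test
```

Applied to the 13,248 ten-minute instances of each database, this yields 2,208 hourly observations split into 1,104 training, 552 validation, and 552 test observations, matching the partition sizes used in the experiments.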
The LocDist method is a homogeneous ensemble and was separately analyzed with the following base models: ARIMA, ELM, SVR, LSTM, and GRU. The value 24 was chosen for the sliding window size W due to the seasonality of the time series, which is 24 observations (24 h). The grid search for the values of \(\alpha _1\) was conducted in the range [0.05, 0.1, 0.15, ..., 0.3]. Values of \(\alpha _1>0.3\) tend to construct minimal partitions, so the search ends at \(\alpha _1=0.3\). Furthermore, the behavior of LocDist was evaluated for different values of the variable T, which is the minimum allowed size for the generated partitions. In [31], it is mentioned that 66 observations are considered a minimal amount for adjusting the parameters of a machine learning model. In this manner, the variable T was evaluated with the values 45, 60, and 75. For values above 60, such as 75, the generation method could not find partitions with this minimum size for all time series considered in this work. Thus, the evaluation was limited to the values \(T=45\) and \(T=60\). The elimination of partitions with redundant data distributions was done so that the less recent distributions were discarded. It was found that maintaining the more recent distributions resulted in better LocDist predictive performance compared to the approach that discards the more recent redundant partitions. The ARIMA base model used the training partition for training and parameter adjustment. The ELM, SVR, LSTM, and GRU models used the training partition for training and the validation set for parameter adjustment to avoid overfitting. At last, in the integration stage, LocDist used the simple average or the ELM nonlinear model as the combination method. The simple average is a statistical measure known in the ensemble literature for its robustness [20].
The combination with a nonlinear model ELM is a more sophisticated approach that can surpass the simple average combination in the presence of predictors with significant differences in accuracy [32]. The validation set was used to select the best combination approach according to the lowest RMSE.
Firstly, we compared LocDist with the respective individual models: ARIMA, SVR, ELM, LSTM, or GRU. Each model was globally trained on the complete training set. Subsequently, the LocDist was compared to the respective homogeneous ensemble using a resampling technique through bagging [19]. The resampling technique was applied to ARIMA, ELM, and SVR base models. Thus, three homogeneous ensembles were obtained, and the base models were globally trained one hundred times on resampled series [19]. This approach has a high computational cost, and because of that, LSTM and GRU were not used in this resampling technique. In the integration stage, the combination method of the ensemble with bagging was the simple average.
The same parameter selection technique was applied to the individual and base models of the ensembles. For the ARIMA model, the integer parameter d was chosen according to the augmented Dickey-Fuller test to identify stationarity. Meanwhile, p and q were obtained in the range [0, 1, ..., 24], as there is a seasonality of 24 lags in the hourly time series. The conditional sum of squares method was applied for model fitting, and the model with the lowest corrected Akaike information criterion was selected. The grid search technique was applied to the remaining models. The SVR type was \(\epsilon \)-regression with kernel radial basis function (RBF). The input used the sliding window method whose width values were taken from [1, 6, 12, 18, 24]. The regularization parameter was adjusted in the range \([2^0,2^1,...,2^{10}]\), while the tolerance region \(\epsilon \) and the parameter \(\lambda \) from kernel RBF were obtained from the values \([2^{-10},2^{-9},...,2^{0}]\). For ELM, the parameters were selected in the range [1, 2, ..., 24] for the input and hidden layer nodes. The activation functions for hidden and output layers were the sigmoid and the identity, respectively. However, during the integration stage with the ELM model in the LocDist, the number of neurons in the hidden layer was selected from the range [1, 2,..., 42] to enhance the combination. Concerning LSTM and GRU, a stateful network with parameter values taken from the set [1, 6, 12, 18, 24] was assumed for the input nodes. The network parameters were obtained from the set [6, 12, 18, 24] for the hidden layer nodes. Still, for the hidden layer, the activation function was the sigmoid. The LSTM and GRU were trained through 100 epochs and batch size equal to one. Finally, the optimization solver adopted was Adam.
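As an illustration of the ELM base model and its hyperparameter search, the sketch below follows the architecture described above: a random, untrained hidden layer with sigmoid activation and an identity output layer fitted by least squares. It is a pure-NumPy stand-in for the elmNNRcpp package used in the experiments, with function names of our own:

```python
import numpy as np

def make_windows(series, width):
    """Sliding-window inputs: each row holds `width` past values,
    and the target is the next observation."""
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    return X, series[width:]

def elm_fit(X, y, n_hidden, seed=None):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # identity output layer
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta

def select_elm(train, val, widths=range(1, 25), hidden=range(1, 25), seed=0):
    """Pick input width and hidden-layer size by lowest validation RMSE,
    mirroring the ranges [1, 24] described in the text."""
    best = (np.inf, None)
    for w in widths:
        Xt, yt = make_windows(train, w)
        # validation windows draw their first lags from the end of training
        Xv, yv = make_windows(np.concatenate([train[-w:], val]), w)
        for h in hidden:
            m = elm_fit(Xt, yt, h, seed)
            r = np.sqrt(np.mean((elm_predict(m, Xv) - yv) ** 2))
            if r < best[0]:
                best = (r, (w, h, m))
    return best
```

The same sliding-window construction applies to the SVR, LSTM, and GRU inputs; only the model fitted on each window matrix changes.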
Table 2 shows all individual models and ensembles implemented and their respective acronyms. All simulations were performed for one-step-ahead forecasting using the R language with the forecast [33] library for ARIMA, e1071 [34] for SVR, elmNNRcpp [35] for ELM, and Keras [36] for LSTM and GRU. The grid search technique selected the nonlinear models' parameters based on the smallest RMSE in the validation set. The RMSE is defined as
\(RMSE=\sqrt{\frac{1}{L}\sum _{t=1}^{L}\left( y_t-\widehat{y}_t\right) ^2}\)
where L is the size of the series, \(y_t\) is the actual value of the time series in period t, and \(\widehat{y}_t\) is the predicted value of the time series in period t. The RMSE was also employed to assess predictive performance on the test set, along with three other metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and predicted change in direction (POCID). For MAE and MAPE metrics, the lower their values, the better the model’s accuracy. In the case of POCID, the higher the value, the better the model’s performance. The MAE is defined by
\(MAE=\frac{1}{L}\sum _{t=1}^{L}\left| y_t-\widehat{y}_t\right| \)
where \(|\cdot |\) is the absolute value operator. The MAPE metric is given by
\(MAPE=\frac{100}{L}\sum _{t=1}^{L}\left| \frac{y_t-\widehat{y}_t}{y_t}\right| \)
while the POCID is designated by
\(POCID=100\cdot \frac{\sum _{t=2}^{L}D_t}{L-1}\)
where
\(D_t={\left\{ \begin{array}{ll}1,&{}\text {if }(y_t-y_{t-1})(\widehat{y}_t-\widehat{y}_{t-1})>0\\ 0,&{}\text {otherwise.}\end{array}\right. }\)
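Taken together, the four metrics are direct to implement. The following Python transcription treats POCID as the percentage of the \(L-1\) consecutive direction changes that are predicted correctly:

```python
import numpy as np

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def mape(y, yhat):
    return 100.0 * np.mean(np.abs((y - yhat) / y))

def pocid(y, yhat):
    """Percentage of correctly predicted changes in direction: the
    actual and predicted consecutive differences share the same sign."""
    same_direction = (y[1:] - y[:-1]) * (yhat[1:] - yhat[:-1]) > 0
    return 100.0 * np.mean(same_direction)
```

Note that RMSE, MAE, and MAPE measure error magnitude (lower is better), while POCID measures directional accuracy (higher is better), which is why the two kinds of metrics are interpreted with opposite orientations in the results tables.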
Finally, the Diebold-Mariano (DM) hypothesis test [37] was applied between LocDist and the global mapping methods of the respective base model, with the squared-error (RMSE) loss function considered for the errors of both competing predictors. The null hypothesis \(H_0\) assumes equality in forecasting accuracy for both predictors. For \(H_0\) to be rejected, the p-value must be less than the statistical significance level \(\alpha =0.05\). The following signs are used to interpret the results in Tables 3, 4 and 5 of Sect. 5. The sign “\(+\)” indicates that \(H_0\) was rejected and the proposed method outperforms the method used for comparison. For instance, if \(LocDist_{ARIMA}\) outperformed ARIMA with statistical significance, the sign “\(+\)” appears in the row of ARIMA. The sign “−” indicates that the proposed method underperforms the method used for comparison. The sign “\(=\)” indicates that the null hypothesis was not rejected, i.e., equality in forecasting accuracy for both predictors.
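A minimal sketch of the DM test under squared-error loss, using the asymptotic normal approximation, is given below. Published implementations often add a small-sample (Harvey-Leybourne-Newbold) correction, which this illustration omits; `h` is the forecast horizon, equal to one for the one-step-ahead setting used here:

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, h=1):
    """Two-sided DM test on the squared-error loss differential
    d_t = e1_t^2 - e2_t^2.  A negative statistic favors predictor 1.
    Returns the DM statistic and its normal-approximation p-value."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    n = len(d)
    dbar = d.mean()
    # HAC variance of dbar using h-1 autocovariance lags (none when h=1)
    gamma = [np.sum((d[k:] - dbar) * (d[:n - k] - dbar)) / n for k in range(h)]
    var_dbar = (gamma[0] + 2.0 * sum(gamma[1:])) / n
    dm = dbar / np.sqrt(var_dbar)
    p_value = 2.0 * stats.norm.sf(abs(dm))
    return dm, p_value
```

Called with the out-of-sample error series of LocDist and a competing method, rejection of \(H_0\) with a negative statistic corresponds to the “\(+\)” entries in the results tables.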
5 Results
The best results were obtained for all three databases with the value \(T=60\), which is the minimum allowed size for the generated partitions. In Database 1, LocDist generated three partitions with the following observations: 69, 62, and 96. This corresponds to 20.6% of the entire training set, which consists of 1104 observations. Table 3 shows the evaluation measurements and DM hypothesis test for the testing set obtained from the wind speed Database 1. The first column indicates the five model types, ARIMA, ELM, SVR, LSTM, and GRU, which were used in the methods of this research. The second column lists the constructed methods, such as individual, ensemble with bagging, and the proposed LocDist, according to the respective model type. Then, the four evaluation metrics, RMSE, MAE, MAPE, and POCID, are presented for each forecasting method. Finally, the DM hypothesis test result and the respective p-value are shown. The best measure among the forecasting methods for each model type is highlighted in bold. In the case of ARIMA, the \(LocDist_{ARIMA}\) achieved the best results in RMSE and POCID, though with no statistical significance compared to individual ARIMA and \(BAGG_{ARIMA}\). About ELM, \(LocDist_{ELM}\) did not stand out in any metric, but the DM hypothesis test demonstrated no statistical significance in the errors compared to individual ELM and \(BAGG_{ELM}\). In the case of SVR, \(LocDist_{SVR}\) lost to individual SVR and \(BAGG_{SVR}\) with statistical significance. Concerning LSTM, the \(LocDist_{LSTM}\) outperformed individual LSTM in all metrics with statistical significance. The last type of model is GRU, and the \(LocDist_{GRU}\) achieved the best results in RMSE, MAE, and MAPE with statistical significance compared to individual GRU. The integration step selected the combination via ELM for the ARIMA, LSTM, and GRU approaches, and the combination by averaging for the ELM and SVR.
Regarding Database 2, LocDist generated two partitions with the following observations: 122 and 89. This corresponds to 19.1% of the entire training set. Table 4 illustrates the evaluation measures and the DM hypothesis test for the wind speed Database 2 testing set. About ARIMA, \(LocDist_{ARIMA}\) achieved the best results in RMSE and MAE, with statistical significance compared to individual ARIMA. In the case of ELM, \(LocDist_{ELM}\) achieved the best results in all metrics but with no statistical significance compared to the individual ELM and \(BAGG_{ELM}\). About SVR, \(LocDist_{SVR}\) lost to individual SVR and \(BAGG_{SVR}\) with statistical significance. Concerning LSTM, the \(LocDist_{LSTM}\) achieved the best results in RMSE, MAE, and MAPE with statistical significance compared to individual LSTM. Finally, the \(LocDist_{GRU}\) method outperforms the individual GRU in all metrics and with statistical significance. The integration step selected the combination via ELM for all models except for the SVR, which used the average.
In Database 3, LocDist generated three partitions with the following observations: 66, 65, and 71. This corresponds to 18.3% of the entire training set. Table 5 exhibits the results for the testing set acquired from the wind speed time series Database 3. \(LocDist_{ARIMA}\) reached the best result in POCID, though it lost to individual ARIMA and \(BAGG_{ARIMA}\) with statistical significance. Regarding ELM, the \(LocDist_{ELM}\) did not exhibit superior performance in any metric but showed statistical significance in the DM hypothesis test, indicating lower forecasting errors compared to the individual ELM but higher errors compared to \(BAGG_{ELM}\). In the case of SVR, the \(LocDist_{SVR}\) achieved the best result in RMSE, MAE, and MAPE, and with statistical significance. Concerning LSTM, the \(LocDist_{LSTM}\) achieved the best results in RMSE, MAE, and MAPE with statistical significance compared to individual LSTM. At last, the \(LocDist_{GRU}\) method once again outperformed the individual GRU in all four metrics with statistical significance. The integration step selected the combination via ELM for all models.
Concerning the results of the DM hypothesis test in Tables 3, 4 and 5, we can see that the methods \(LocDist_{LSTM}\) and \(LocDist_{GRU}\) won in 100% of the cases. In addition, the \(LocDist_{ELM}\) and \(LocDist_{ARIMA}\) won or tied in 83% and 66% of the cases, respectively. These results corroborate our hypothesis that the LocDist method can generate a concise and representative training set compared to the complete training dataset, potentially enabling more appropriate recognition of local patterns. \(LocDist_{SVR}\) lost in 66% of the cases. This reveals some limitations, indicating that it is not always the best approach for all models.
We then chose some models, such as ARIMA, ELM, LSTM, and GRU, for graphical evaluation of the methods in each database. The SVR was excluded from the analysis based on the results of the DM hypothesis test. Figures 2(a) and 2(b) show, respectively, the one-step-ahead prediction using the method LocDist built from GRU and LSTM models. The superiority of LocDist is noticeable compared to the individual methods. In Fig. 3, we have Database 2, and the chosen models were ARIMA and ELM. We can observe that the local mapping of LocDist exhibits similar or even smaller prediction errors than the global mapping of individual and bagging methods. Finally, Fig. 4 illustrates the results on Database 3 for the LSTM and GRU models. It is also observed that LocDist exhibits lower prediction errors compared to methods trained on the entire training set.
6 Conclusions
The LocDist introduces a new generation method for ensembles to predict wind speed, which utilizes local pattern recognition based on data distribution. It creates a compact training subset that minimally represents the data distribution of the time series. The results show promise for the LocDist applied to three wind speed time series. Creating a training subset for the LocDist ensemble was possible using less than 20% of the training data. The predictive performance of LocDist was competitive with or even better than that of the individual and bagging models trained on the entire training set. Concerning the Diebold-Mariano hypothesis test applied in the RMSE, the LocDist method with LSTM and GRU won in 100% of the cases. In addition, the LocDist with ELM and ARIMA won or tied in 83% and 66% of the cases, respectively, while the proposed ensemble method with SVR lost in 66% of the cases, revealing some limitations. For future work, it would be interesting to investigate the behavior of this new generation method with more sophisticated ensemble approaches, such as dynamic selection.
References
GWEC: Global Wind Report 2021. Global Wind Energy Council, Brussels, Belgium (2021)
GWEC: Global Wind Report 2023. Global Wind Energy Council, Brussels, Belgium (2023)
ABEEólica (Brazilian Wind Energy Association): Abeeólica | Infovento. INFOVENTO 31, 15 June 2023 (2023)
Qu, Z., Mao, W., Zhang, K., Zhang, W., Li, Z.: Multi-step wind speed forecasting based on a hybrid decomposition technique and an improved back-propagation neural network. Renew. Energy 133, 919–929 (2019)
Jiang, P., Wang, B., Li, H., Lu, H.: Modeling for chaotic time series based on linear and nonlinear framework: application to wind speed forecasting. Energy 173, 468–482 (2019)
Ahmadi, M., Khashei, M.: Current status of hybrid structures in wind forecasting. Eng. Appl. Artif. Intell. 99, 104133 (2021)
Mohandes, M.A., Halawani, T.O., Rehman, S., Hussain, A.A.: Support vector machines for wind speed prediction. Renew. Energy 29(6), 939–947 (2004)
Salcedo-Sanz, S., Ortiz-García, E.G., Pérez-Bellido, Á.M., Portilla-Figueras, A., Prieto, L., et al.: Short term wind speed prediction based on evolutionary support vector regression algorithms. Expert Syst. Appl. 38(4), 4052–4057 (2011)
Kong, X., Liu, X., Shi, R., Lee, K.Y.: Wind speed prediction using reduced support vector machines with feature selection. Neurocomputing 169, 449–456 (2015)
Saavedra-Moreno, B., Salcedo-Sanz, S., Carro-Calvo, L., Gascón-Moreno, J., Jiménez-Fernández, S., Prieto, L.: Very fast training neural-computation techniques for real measure-correlate-predict wind operations in wind farms. J. Wind Eng. Ind. Aerodyn. 116, 49–60 (2013)
Memarzadeh, G., Keynia, F.: A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers. Manage. 213, 112824 (2020)
Liu, X., Lin, Z., Feng, Z.: Short-term offshore wind speed forecast by seasonal ARIMA - a comparison against GRU and LSTM. Energy 227, 120492 (2021)
Ferreira, M., Santos, A., Lucio, P.: Short-term forecast of wind speed through mathematical models. Energy Rep. 5, 1172–1184 (2019)
Wu, J., Li, N.: Impact of components number selection in truncated gaussian mixture model and interval partition on wind speed probability distribution estimation. Sci. Total Environ. 883, 163709 (2023)
Robinson, P.: Analysis of time series from mixed distributions. Ann. Stat., 915–925 (1982)
Santos Júnior, D.S.O., de Mattos Neto, P.S.G., de Oliveira, J.F.L., Cavalcanti, G.D.C.: A hybrid system based on ensemble learning to model residuals for time series forecasting. Inf. Sci. 649, 119614 (2023)
de Mattos Neto, P.S., Cavalcanti, G.D., Firmino, P.R., Silva, E.G., Nova Filho, S.R.V.: A temporal-window framework for modelling and forecasting time series. Knowl. Based Syst. 193, 105476 (2020)
Petropoulos, F., Hyndman, R.J., Bergmeir, C.: Exploring the sources of uncertainty: why does bagging for time series forecasting work? Eur. J. Oper. Res. 268(2), 545–554 (2018)
Bergmeir, C., Hyndman, R.J., Benítez, J.M.: Bagging exponential smoothing methods using STL decomposition and Box-Cox transformation. Int. J. Forecast. 32(2), 303–312 (2016)
Sergio, A.T., de Lima, T.P., Ludermir, T.B.: Dynamic selection of forecast combiners. Neurocomputing 218, 37–50 (2016)
Hu, J., Wang, J., Zeng, G.: A hybrid forecasting approach applied to wind speed time series. Renew. Energy 60, 185–194 (2013)
Ruiz-Aguilar, J.J., Turias, I., González-Enrique, J., Urda, D., Elizondo, D.: A permutation entropy-based EMD-ANN forecasting ensemble approach for wind speed prediction. Neural Comput. Appl. 33(7), 2369–2391 (2021)
Jiang, Z., Che, J., Wang, L.: Ultra-short-term wind speed forecasting based on EMD-VAR model and spatial correlation. Energy Convers. Manage. 250, 114919 (2021)
Bowden, G.J., Maier, H.R., Dandy, G.C.: Optimal division of data for neural network models in water resources applications. Water Resour. Res. 38(2), 2–1 (2002)
Deng, L., Li, G., Han, S., Shi, L., Xie, Y.: Model compression and hardware acceleration for neural networks: a comprehensive survey. Proc. IEEE 108(4), 485–532 (2020)
Ribeiro, R., Fanzeres, B.: Identifying representative days of solar irradiance and wind speed in Brazil using machine learning techniques. Energy AI 15, 100320 (2024)
Torres, J.L., Garcia, A., De Blas, M., De Francisco, A.: Forecast of hourly average wind speed with ARMA models in Navarre (Spain). Sol. Energy 79(1), 65–77 (2005)
Tetko, I.V., Villa, A.E.: Efficient partition of learning data sets for neural network training. Neural Netw. 10(8), 1361–1374 (1997)
Hodges, J., Jr.: The significance probability of the Smirnov two-sample test. Ark. Mat. 3(5), 469–486 (1958)
INPE: Rede do sistema de organização nacional de dados ambientais (2020). http://sonda.ccst.inpe.br/index.html. Accessed 27 Jul 2023
Cerqueira, V., Torgo, L., Soares, C.: A case study comparing machine learning with statistical methods for time series forecasting: size matters. J. Intell. Inf. Syst. 59(2), 415–433 (2022)
Andrawis, R.R., Atiya, A.F., El-Shishiny, H.: Forecast combinations of computational intelligence and linear models for the NN5 time series forecasting competition. Int. J. Forecast. 27(3), 672–688 (2011)
Hyndman, R.J., et al.: Package 'forecast'. Online (2024). https://cran.r-project.org/web/packages/forecast/forecast.pdf
Meyer, D., et al.: e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071), TU Wien. R package version (2023)
Mouselimis, L., Gosso, A., Jonge, E.: elmNNRcpp: the extreme learning machine algorithm. R package (2023)
Chollet, F., et al.: Keras (2015). https://keras.io
Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 13(3), 253–263 (1995)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Almeida, D.M., de Mattos Neto, P.S.G., Cunha, D.C. (2025). A Data Distribution-Based Ensemble Generation Applied to Wind Speed Forecasting. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15412. Springer, Cham. https://doi.org/10.1007/978-3-031-79029-4_2
Print ISBN: 978-3-031-79028-7
Online ISBN: 978-3-031-79029-4