key: cord-0074794-ffn7gi3f
title: Deep learning-based load forecasting considering data reshaping using MATLAB/Simulink
authors: Hamad, Zhalla; Abdulrahman, Ismael
date: 2022-02-16
journal: Int J Energy Environ Eng
DOI: 10.1007/s40095-022-00480-x
sha: 1fa8dc65b4938e01e3c6670b1ed9e40a517b5aa7
doc_id: 74794
cord_uid: ffn7gi3f

Load forecasting is a nonlinear and complex task that plays a key role in power system planning, operation, and control. A recent study proposed a deep learning approach called historical data augmentation (HDA) to improve the accuracy of the load forecasting model by dividing the input data into several yearly sub-datasets. When the original data exhibits large time-step changes from one year to the next, the approach is less effective for long-term forecasting because it disconnects the time-series information between the end of one yearly sub-dataset and the beginning of the next. Alternatively, this paper proposes the use of 2-year sub-datasets in order to connect the two ends of the yearly subsets. A correlation analysis is conducted to show how the yearly datasets are correlated to each other. In addition, a Simulink-based program is introduced to simulate the problem, which has the advantage of visualizing the algorithm. To increase the model's generalization, several inputs are considered, including the load demand profile, weather information, and important categorical data such as weekday and weekend indicators, which are embedded using the one-hot encoding technique. The deep learning methods used in this study are the long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks, which have been increasingly employed in recent years for time-series and sequence problems. To provide a theoretical background on these models, a detailed pictorial description is presented. The proposed method is applied to the Kurdistan regional load demands and compared with classical methods of data inputting, demonstrating improvements in both model accuracy and training time.

Load forecasting is a method to predict future load demands by analyzing historical data and finding dependency patterns in its time-step observations. It has many applications in power system operation and planning, including demand response, scheduling, unit commitment, energy trading, system planning, and energy policy [1]. Accurate load forecasting helps power companies and decision-makers balance supply and demand, prevent power interruptions due to load shedding, and avoid excess reserves of power generation. The load forecasting problem is a challenging task due to its complexity, its uncertainty, and the variety of factors affecting the prediction. It is a type of time-series problem that needs special treatment. Depending on its application, load forecasting can be classified into very-short-term load forecasting (VSTLF), short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF). VSTLF is used in demand response and real-time operation problems that require a time horizon of a few minutes to several hours ahead. Forecasting the load demand from one day to several days ahead is called STLF, whereas forecasting from one week to several weeks ahead is known as MTLF. These two types of forecasting cover the majority of load-forecasting studies in the literature and are mainly used in scheduling, unit commitment, and energy marketing.
Lastly, LTLF refers to forecasting with a time frame of up to several years ahead; it is useful for planning and energy-trading purposes [1, 2]. Several recent studies have comprehensively reviewed the state-of-the-art techniques used in load forecasting [3-12]. These techniques can be broadly classified into two groups: statistical and machine learning. Statistical methods are classical models that map the input data to the output; autoregressive integrated moving average (ARIMA), linear regression, and exponential smoothing are examples of this kind. Statistical techniques are relatively fast, easy to set up, and computationally inexpensive. However, they suffer from uncertainty and low accuracy on highly nonlinear systems. On the other hand, techniques based on machine learning, such as artificial neural networks, deep learning, and recurrent neural networks, have a more complex setup and longer training times, but they are relatively more accurate and perform better. Among the second type of approaches, the long short-term memory (LSTM) and its newer variant, the gated recurrent unit (GRU), are very popular and widely used in recent studies [13, 14].

In [14], a deep neural network with historical data augmentation (DNN-HDA) is proposed for data with high year-to-year correlation, showing a great improvement in accuracy. The method divides the input data into multiple sequences, each representing one year of data. However, when the data is divided into multiple parts, the information connecting the end of one part to the beginning of the next is lost. For some datasets and load-forecasting problems this can be unproblematic, as shown in that paper. However, when the nature of the data changes and includes high uncertainty and fluctuations in the time-step information, this approach struggles to predict future load demand, especially for long-term forecasting. In [15], a sequence-to-sequence recurrent neural network approach is proposed to capture the time dependencies of the input data. References [16-19] propose multiple channels and features to extract useful information from the historical data. Most recent studies [13, ...] use LSTM as the main deep neural network, or within a hybrid model, to develop a better STLF network. Some of these studies [33] add the impact of COVID-19 on load forecasting by using lockdown information as an additional input sequence. Others [40] use a bidirectional LSTM as the learning component. Concerning previous studies applied to the test data of the Kurdistan regional load demand, several works [46, 47] exist; the methods used in these papers are either statistical approaches or simple neural network models.

It should be noted that all the aforementioned studies mainly use MATLAB to implement the proposed models. Simulink is a graphical environment built on top of MATLAB and bidirectionally connected to it, and it has several advantages. For instance, one can see how the algorithm works by inspecting the block diagram, which acts as a flow chart of the problem. One can easily set up a new built-in or custom component and add it to the model. Blocks and signals can hold values in the form of scalars or vectors. In addition, the for-loop required in MATLAB to update the network at each time step can be replaced with a vectorized feedback model, removing the need for explicit for-loops.
This study proposes a reshaping of the forecasting input data to obtain better performance and to solve certain complex problems that the 1-year data-augmentation approach fails to predict accurately. Not only one-day or one-week-ahead forecasting is addressed, but also a 365-day-ahead prediction. Five scenarios are used for comparison: classical single-variable input one-day ahead, single-variable input 365-day ahead, single-variable multi-sequence inputs 365-day ahead, classical multi-variable input 365-day ahead, and multi-variable multi-sequence-per-variable inputs 365-day ahead. This paper also fills a gap in the current programs used for load forecasting by introducing a Simulink model of the prediction.

The rest of the paper is organized as follows. In the next section, a theoretical background is presented on the long short-term memory and gated recurrent unit neural networks. In Sect. 3, the dataset under study is analyzed using a correlation function of the input time-series observations. The forecasting methodology used in this study is described in Sect. 4, whereas the Simulink program developed for this work is introduced in Sect. 5. The results are discussed in Sect. 6. Finally, the conclusion is presented in Sect. 7.

Conventional neural networks such as the multilayer perceptron (MLP) can be applied to sequence-based and time-series problems, but in practice they have several major limitations: a stateless structure, messy scaling, fixed-size inputs and outputs, and unawareness of time-related structure [48]. A better alternative for these types of problems is the recurrent neural network (RNN). An RNN is a feedforward multilayer network with additional feedback connections from previous time steps, used to store temporal information as internal states. A recurrent network adds a memory state to learn the sequence order of the input data and to extract the dependencies among the input observations. However, plain RNNs are nowadays almost always replaced with the long short-term memory (LSTM) or gated recurrent unit (GRU) to solve their major shortcoming: vanishing and exploding gradients. When the RNN weights are updated, the changes quickly become either too small (vanishing gradients) or too large (exploding gradients). The result is a short-term memory that makes it extremely hard for the RNN to learn the dependencies between observations at earlier time steps and later ones.

The LSTM model was developed to overcome this drawback by adding a memory, or cell state, to the network. The cell state is responsible for adding or removing past information based on its relevance and importance to the prediction. The structure of an LSTM is more complex than that of an RNN: it has S cell blocks connected in series, where S is the total number of time steps, i.e., the length of the input data. Figure 1 shows the architecture of an LSTM with C features and D hidden units; the former is equivalent to the number of neurons in a classical neural network. Each LSTM cell contains three adjusting gates that regulate its state. The gates are simple neural networks composed of weights, biases, and activation functions, and can be described as follows:

1. Forget gate: This gate determines what information from the cell state c_{t-1} (the top horizontal line in Fig. 2, colored in orange) should be thrown away, using information from the previous hidden state h_{t-1} and the current input x_t.
The current cell input x_t is multiplied by the weight matrix W_f, whereas the previous hidden state h_{t-1} is multiplied by the recurrent weight matrix R_f. The resulting products are added to a bias vector b_f, and a sigmoid function g is then applied to give the output vector f_t, whose entries vary between 0 and 1. The value "0" means no information from the previous time step of the cell state is allowed to flow (not important), whereas "1" means all previous information of the memory is allowed to flow (extremely important); partially relevant information yields a value between 0 and 1. Mathematically, this description can be written as:

f_t = g(W_f x_t + R_f h_{t-1} + b_f)    (1)

where g denotes the gate activation function, and all other parameters and variables are defined above. If the input x_t is a vector of sequences with C features, and each cell has D hidden units, then the weights W_f and R_f are matrices with dimensions D×C and D×D, respectively, whereas the bias b_f is a vector with D elements. As a result, the gate output f_t is a vector with D elements.

2. Update gate: This gate updates the cell state, or memory, that was regulated by the forget gate in the previous step. It is composed of two neural networks: the input gate i_t and the candidate state g_t, both fed with the same inputs used for the forget gate (h_{t-1} and x_t). However, the weights, biases, and activation functions of the i_t and g_t branches are different. For the input-gate branch i_t, the inputs x_t and h_{t-1} are weighted by the matrices W_i and R_i, respectively, biased with b_i, and finally activated using the sigmoid function g. The same is repeated for the candidate-state branch g_t, with the subscript g instead of i, and with the sigmoid replaced by a hyperbolic tangent (tanh, denoted s) that squashes the data between -1 and 1. The input branch controls how much of the squashed data, the candidate state, passes through. Finally, the outputs of these two networks, i_t and g_t, are multiplied to produce the output of the update gate. Mathematically, the two networks can be written as:

i_t = g(W_i x_t + R_i h_{t-1} + b_i)    (2)
g_t = s(W_g x_t + R_g h_{t-1} + b_g)    (3)

where s denotes the state activation function.

3. Output gate: This gate computes the current hidden state h_t. A copy of the combined input (x_t and h_{t-1}) is passed to a sigmoid function g after being multiplied by the respective weights W_o and R_o and added to the bias b_o. The resulting output o_t is multiplied by the current cell state c_t after the latter is squashed to the range [-1, 1] using the tanh function s. Mathematically, the output gate can be described as:

o_t = g(W_o x_t + R_o h_{t-1} + b_o)    (4)

The equations of the new cell state c_t and hidden state h_t are:

c_t = f_t .* c_{t-1} + i_t .* g_t    (5)
h_t = o_t .* s(c_t)    (6)

where the operator .* refers to the Hadamard (element-wise or pointwise) product. Since MATLAB is used to train the network, the same variable and parameter names used by the software are employed in this study.

To summarize, the forget gate determines which information from the old memory is relevant to keep and forgets the irrelevant parts. The input gate updates the relevant memory and generates the current memory used in the next block. The output gate computes the output of the current block and the next hidden state. Note that the three gates share the same input, the previous hidden state combined with the current input (the bottom line in Fig. 2). The top line of Fig. 2 is the LSTM memory or cell state, which the network uses to learn the sequence order of the input data.
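To make the gate algebra concrete, the following minimal MATLAB sketch evaluates Eqs. (1)-(6) for a single time step with random parameters. Everything here (the dimensions C and D, the zero initial states, the variable names) is chosen arbitrarily for illustration and is not taken from the paper.

```matlab
% One LSTM cell step following Eqs. (1)-(6), for illustration only.
C = 3;  D = 4;                      % C features, D hidden units
xt  = randn(C,1);                   % current input x_t
htm = zeros(D,1); ctm = zeros(D,1); % previous hidden and cell states

g = @(z) 1./(1 + exp(-z));          % gate activation (sigmoid)
s = @(z) tanh(z);                   % state activation (tanh)

% Random weights: W* are DxC, R* are DxD, b* are Dx1
Wf = randn(D,C); Rf = randn(D,D); bf = randn(D,1);
Wi = randn(D,C); Ri = randn(D,D); bi = randn(D,1);
Wg = randn(D,C); Rg = randn(D,D); bg = randn(D,1);
Wo = randn(D,C); Ro = randn(D,D); bo = randn(D,1);

ft = g(Wf*xt + Rf*htm + bf);        % forget gate,    Eq. (1)
it = g(Wi*xt + Ri*htm + bi);        % input gate,     Eq. (2)
gt = s(Wg*xt + Rg*htm + bg);        % candidate,      Eq. (3)
ot = g(Wo*xt + Ro*htm + bo);        % output gate,    Eq. (4)

ct = ft .* ctm + it .* gt;          % new cell state,   Eq. (5)
ht = ot .* s(ct);                   % new hidden state, Eq. (6)
```

Stacking S such steps, passing h_t and c_t forward from one step to the next, yields the chain of cell blocks shown in Fig. 1.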
The GRU model (Fig. 3) is a simplified and newer version of the LSTM. It is composed of two gates and one candidate-state network, namely the reset gate r_t, the update gate z_t, and the candidate state h̃_t. The update gate used by the GRU is equivalent to the forget and input gates of the LSTM combined into a single network; it determines what information to remove or add. The reset gate determines how much information from the previous state to forget. In contrast to the LSTM, there is no separate cell state in the GRU network; in other words, the previous hidden state h_{t-1} plays that role. The GRU has fewer network parameters than the LSTM, and hence requires less training time to learn the dependencies among the time-step observations of the sequence data. Mathematically, the reset and update gates, the candidate state, and the hidden state are given, respectively, by:

r_t = g(W_r x_t + R_r h_{t-1} + b_r)    (7)
z_t = g(W_z x_t + R_z h_{t-1} + b_z)    (8)
h̃_t = s(W_h x_t + r_t .* (R_h h_{t-1}) + b_h)    (9)
h_t = (1 - z_t) .* h̃_t + z_t .* h_{t-1}    (10)

In this study, a historical dataset is collected for the Kurdistan regional power system containing load profiles for each governorate over the years 2015-2020 [49, 50]. The map of this region is shown in Fig. 4. The data is divided into two subsets: training and test. The first five years of the data are used for training the network, whereas the last year is used for testing the trained network. We call these two datasets XTrain and XTest, respectively; they represent the predictors, or independent variables, for the training and test sets. The predictors are shifted by one time step to generate the responses, or dependent variables, YTrain and YTest.

In order to see how the dataset for one year is correlated with the other yearly datasets of the same time-series sequence, a correlation analysis is conducted on the sample data. Figure 5 shows the correlation among pairs of time-series variables expressing the daily load demand of the Kurdistan region (Erbil governorate) for 6 years. The diagonal plots in Fig. 5 display the histograms of the data, whereas the off-diagonal plots show the scatter plots of the variable pairs. The correlation coefficients for each pair of variables are highlighted on the graph and listed in Table 1. It can be observed from these plots and the table that the yearly loads used in this study are highly correlated: the minimum and maximum correlation coefficients are 0.777 and 0.9167, respectively, and the average of the off-diagonal values is 0.8991. These implicit relationships motivate us to exploit this property of the historical data to improve the load forecasting.

As mentioned earlier, a recent study [14] observed this correlation in another dataset and introduced the concept of historical data augmentation (HDA). However, for highly uncertain data with fast changes in the time-step information, using 1-year sub-datasets to train a long-term model is problematic: the data at the end of one year has no connection to the beginning of the next year's dataset. In fact, if the historical data starts on the first day of a year, as is common, then each sub-dataset starts in the middle of a winter season whose load-profile shape is similar to that of the loads at the end of the previous year. Therefore, in the results section, a simple modification of this method is proposed to remove this shortcoming and accelerate the process.
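As a minimal sketch of how such an analysis can be reproduced in MATLAB, the snippet below reshapes a daily demand series into yearly columns and computes the pairwise correlation coefficients. The synthetic stand-in data, the variable names, and the leap-day handling are assumptions for illustration; corrplot (Econometrics Toolbox) produces a histogram/scatter matrix like Fig. 5, while corrcoef yields coefficients like those in Table 1.

```matlab
% Pairwise correlation between yearly sub-datasets (illustration).
% Synthetic stand-in for 6 years of daily load, leap days ignored so
% each year has 365 samples; replace with the real series.
demand = 2500 + 500*sin((1:2190)'*2*pi/365) + 60*randn(2190,1);

nYears = 6;
Y = reshape(demand(1:365*nYears), 365, nYears);  % one column per year

R = corrcoef(Y);                    % 6x6 correlation matrix
offDiag = R(~eye(nYears));          % off-diagonal coefficients
fprintf('min %.4f  max %.4f  mean %.4f\n', ...
        min(offDiag), max(offDiag), mean(offDiag));

corrplot(Y);  % scatter/histogram matrix (requires Econometrics Toolbox)
```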
To predict future values of load demand, we can implement one-time-step-ahead forecasting (OTSAF) or multiple-time-steps-ahead forecasting (MTSAF). Both approaches start their future prediction from an initial value computed from the last time step of the historical load demands. The difference between OTSAF and MTSAF is in the way the network is updated for the next predictions: OTSAF updates the network using the current value of the test data, whereas MTSAF updates the network using the current predicted value. In other words, in MTSAF the test dataset is no longer used for future time-step prediction except for the first step; for the remaining predictions, we loop over the predicted values one at a time until the end of the forecast horizon.

In this study, several forecasting scenarios are considered, including single-sequence and multi-sequence input-output forecasting. For the single-sequence prediction, the input is the historical load demand. For the multi-sequence case, the inputs consist of load demands, weather data, and weekday and holiday information. In addition to the classical method of inputting the data as one full sequence of time steps, a modification of the input data is proposed that divides the data into several subsets covering a two-year period each, instead of one-year subsets. The results are presented and compared in the following sections.

This section presents the Simulink models developed for load forecasting, applied to both the OTSAF and MTSAF methods. Figures 6a, b show the block diagrams of these models. Since OTSAF requires a value from the test dataset at each time step to predict the next load demand, the network has a vector of test inputs (or a matrix in the case of multiple sequences), and there is no feedback loop from the output of the prediction block to its input. However, as can be observed in Fig. 6b, in MTSAF the current output is fed back to the input of the prediction block to be used for forecasting the next time step of load demand. This feedback loop effectively replaces the for-loop command required in MATLAB to achieve the same task programmatically. It is also more instructive to see visually how the algorithm works, with the main steps shown as connected blocks. From the figure, we can see three main steps in the program: standardization, prediction, and un-standardization. For MTSAF, in addition to these three steps, there is an updating loop signal; to avoid an algebraic loop in the model, a memory block is added between the two ends of the prediction block. For multiple-sequence problems, we can keep all output sequences unchanged and plot the results, or we can evaluate a statistic of these outputs such as their average, minimum, or maximum. Clock and switch blocks are added to switch the input from the first-time-step value obtained from the test data to the subsequent values obtained from the prediction.

This study includes several different models with separate network training settings. The models are chosen empirically, starting from a single LSTM layer with default values; the number of recurrent layers is then increased gradually until a satisfying result is obtained. Most of the models need at least three blocks of LSTM, GRU, or a combination of them, followed by a fully connected layer, to reach an acceptable accuracy. The gradient threshold is set to 1.0 to avoid any exploding updates of the network, and the initial learning rate is chosen in the range 0.001-0.01 to balance training time against model accuracy; reducing the learning rate during training further accelerates convergence.
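A minimal MATLAB sketch of this setup, assuming the Deep Learning Toolbox, is given below. The layer sizes follow the hybrid network described in the results section (two LSTM and two GRU layers with 128, 64, 32, and 16 hidden units); the synthetic data, the learning-rate-drop settings, and the variable names are placeholders rather than the paper's exact configuration. The closed loop at the end is the textual equivalent of the MTSAF feedback loop realized graphically in Simulink.

```matlab
% Hybrid LSTM-GRU network for 365-day-ahead (MTSAF) forecasting: sketch.
% Synthetic stand-in for five years of daily demand; replace with real data.
raw = 2000 + 400*sin((1:1825)*2*pi/365) + 50*randn(1,1825);
muX = mean(raw);  sigX = std(raw);
z = (raw - muX)/sigX;                    % standardization
XTrain = z(1:end-1);  YTrain = z(2:end); % responses shifted one step

layers = [ ...
    sequenceInputLayer(1)                % one feature: load demand
    lstmLayer(128)                       % LSTM blocks on top
    lstmLayer(64)
    gruLayer(32)                         % GRU blocks on the bottom
    gruLayer(16)
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions('adam', ...
    'MaxEpochs',1500, ...                % as in the MTSAF scenario
    'GradientThreshold',1, ...           % avoid exploding updates
    'InitialLearnRate',0.01, ...
    'LearnRateSchedule','piecewise', ... % reduce the rate during training
    'LearnRateDropPeriod',500, ...       % drop settings assumed
    'LearnRateDropFactor',0.2, ...
    'Verbose',0);

net = trainNetwork(XTrain, YTrain, layers, options);

% MTSAF: seed with the last training value, then feed predictions back.
yPred = zeros(1,365);
[net, yPred(1)] = predictAndUpdateState(net, XTrain(end));
for t = 2:365
    [net, yPred(t)] = predictAndUpdateState(net, yPred(t-1));
end
yForecast = sigX*yPred + muX;            % un-standardize the forecast
```

For OTSAF, the loop body would instead pass the observed test value XTest(t-1) to predictAndUpdateState, which is exactly the open-loop variant of Fig. 6a.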
We first start with the results obtained from the classical OTSAF model, where the input data is given as a single time series of load demands, without dividing it into subsets and without considering other input variables such as weather or calendrical data. Figure 7a shows the training data in blue for the years 2015 to 2019, followed by the 2020 test data; finally, the forecast values are plotted over the test data for comparison. The x-axis is the day index, starting from day one (01-Jan-2015) and ending on 31-Dec-2020, whereas the y-axis is the load demand in MW. The network, selected empirically, is a deep neural network with three LSTM layers and 128 hidden units each, which is a default setting. Figure 7b shows the observed and predicted results for the last year of the dataset (2020) together with the difference errors in MW. The root mean square error (RMSE) for this scenario is 83.0345 MW, and the relative percentage error is 83.0345/2696, or 3.08%. Note that the MATLAB and Simulink programs give the same results. For a network with OTSAF, a maximum of 100 epochs (Fig. 7c) was found sufficient to reach this accuracy; the algorithm required around four minutes to train the network on a regular computer.

Next, we implement the MTSAF approach on the same data and design a network to learn from the five-year training data and predict the load demands for the next year. The network architecture, selected experimentally, consists of two LSTM layers on the top connected to two GRU layers on the bottom, with 128, 64, 32, and 16 hidden units, respectively, chosen empirically. A maximum of 1500 epochs is used to train the network, with a learning rate of 0.01 that is reduced during training to accelerate the process. The results shown in Fig. 8a-c demonstrate that the network learned from the training data and predicted the next-year forecast given only a single-day initial value, looping over the predictions until the end of the year. Compared with the OTSAF case, the relative RMSE error is 215.4212/2696, or 7.99%, which is higher. This is expected, since OTSAF is a one-day-ahead forecast whereas MTSAF here is a 365-day-ahead forecast. It is worth pointing out that this method of updating the network parameters requires a relatively long training time: compared with OTSAF, MTSAF consumes around eight times more time to train the network, even though the learning rate has already been reduced.

The next scenario is the case where the input data is divided into multiple subsets, each covering a two-consecutive-year period (or one year repeated twice), so that the two ends of each year are connected. As a result, we have a model with five input sequences and five output sequences. An LSTM-GRU hybrid network is chosen for this scenario, with the number of layers and hidden units selected empirically as in the previous cases. The errors for the five sequences are 434.2163, 287.8786, 174.3753, 186.2802, and 243.1786 MW, respectively, with corresponding relative errors of 16.11%, 10.68%, 6.47%, 6.91%, and 9.02%. We see that the third subset has the lowest error (6.47%), which is smaller than the error in the previous single-sequence case (7.99%). Not only is the error lower, but the training time is also much less than in the previous case. The results are plotted in Fig. 9a-c (a: load forecasting; b: model output errors; c: training and loss errors; all for the single-variable multi-sequence case). It is worth mentioning that the previous data-augmentation forecasting method failed here to learn from the data and predict the next 365-day demands using the exact training settings and network structure above; the gap of information between the starting and ending points of the yearly dataset had a significant impact on model accuracy.
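The paper describes each subset as covering "two consecutive-year periods (or one year repeated twice)" without spelling out the exact pairing, so the following MATLAB sketch shows one plausible construction under that assumption; the stand-in data and variable names are likewise hypothetical.

```matlab
% Build two-year training sub-sequences from five years of daily data.
% Synthetic stand-in: 5 years x 365 days; replace with the real series.
demandTrain = 2000 + 400*sin((1:1825)'*2*pi/365) + 50*randn(1825,1);

nYears = 5;
Y = reshape(demandTrain, 365, nYears);      % one column per year

XTrain = cell(nYears,1);                    % five two-year sequences
YTrain = cell(nYears,1);
for k = 1:nYears
    kNext = min(k+1, nYears);               % last year paired with itself
    seq = [Y(:,k); Y(:,kNext)]';            % 1x730 row sequence
    XTrain{k} = seq(1:end-1);               % predictors
    YTrain{k} = seq(2:end);                 % responses, shifted one step
end
% XTrain/YTrain can be passed to trainNetwork as cell arrays of sequences,
% giving the five-input/five-output model described above.
```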
So far, the only input variable used for prediction is the load demand. The network can also be trained using multi-variable inputs, including weather data and weekday and weekend information. The necessary data for the average daily temperature of the region is collected and preprocessed. The one-hot encoding technique is used for the calendrical variables so that no particular weekday is given a larger numerical weight; however, the weekend days receive different one-hot values owing to the reduction in power consumption on those days. The network is trained with the above four input variables, which are expanded into 11 input sequences: one sequence each for the demand and temperature variables, seven sequences for the weekday variable, and two sequences for the weekend days. The corresponding errors for the output variables are 206.5580, 3.6647, 0.5346, 0.5345, 0.5347, 0.5350, 0.5347, 0.5356, 0.5349, 0.5355, and 0.5321, respectively. The relative percentage error for the load demand is 7.66%, and the results are plotted and displayed in Fig. 10a-c.

We can also forecast future load demands using multi-input data augmentation by dividing the demand sequence into several yearly training subsets. The same input variables used in the previous scenario are employed for this case study: load demand, averaged daily temperature, weekday information, and weekend-day data.

All the networks designed so far are for input data with a sampling rate of one prediction per day. It is also useful to investigate the problem with the same input data but different sampling rates, such as one prediction per week. For comparison purposes, the data is smoothed using a Gaussian function in MATLAB. The OTSAF example analyzed previously is repeated here with the new sampling rate and smoothed data. The network is a deep learning model with the same structure as the one used for OTSAF, and the results are shown in Fig. 12a-c. The relative error is computed to be 0.3485%, which is quite small and sufficient for accurate load forecasting. The same procedure can be repeated for the other models in this study. It should be noted that in Fig. 12 the load demand increases from one year to another by an average scale value, computed over the length of the data, of 12.35%.
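A short sketch of this preprocessing is given below. smoothdata with a Gaussian kernel is the natural MATLAB route for such smoothing, but the paper does not state the window length, so the 14-day window, the stand-in data, and the variable names are assumptions.

```matlab
% Gaussian smoothing and weekly resampling of the daily demand series.
% Synthetic stand-in for 6 years of daily load; replace with real data.
demand = 2500 + 500*sin((1:2190)'*2*pi/365) + 60*randn(2190,1);

sm = smoothdata(demand, 'gaussian', 14);   % 14-day window (assumed)

% One prediction per week: average each block of 7 consecutive days.
nWeeks = floor(numel(sm)/7);
weekly = mean(reshape(sm(1:7*nWeeks), 7, nWeeks), 1)';  % nWeeks x 1
```

The weekly series can then be fed to the same OTSAF network, with the forecast horizon reinterpreted in weeks rather than days.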
In this section, the different models discussed so far are compared with respect to their errors and training times. The OTSAF model requires less training time than the same network using the MTSAF approach, owing to its shorter forecasting time window; the ratio is 270/2053, which is around 13%. The error is also different for these methods, at 3.08% for OTSAF and 7.99% for MTSAF. However, when the data is smoothed and the sampling rate is changed from one day to one week per prediction, additional improvements in the training time and model error of OTSAF are obtained: 93 s (down from 270 s) and 0.3485% (down from 3.08%). For the rest of the models, the data is left unchanged to reflect the original data received from the source.

Next, we compare the classical method of inputting the data as one sequence with the proposed data-augmentation technique. The main difference is in the training time, where the proposed model requires only 26% (551/2053) of the training time of the classical model. The error is also improved by 23.5% (7.99/6.47). Another significant improvement is that the previous one-year data-division model in the literature fails to accurately predict the 365-day-ahead demand on this dataset. For the multi-variable models, the proposed data augmentation improves the accuracy with 18% less error (7.66/6.48) and accelerates the learning process to about 2.36 times faster than the classical inputting with one sequence per variable.

This paper presented an improved historical data-augmentation approach proposed to enhance load forecasting performance, accuracy, and training-time speed. Deep learning networks are built using the LSTM and GRU techniques, which are state-of-the-art approaches for time-series and sequence-based problems. Multiple input sequences are employed to increase the generality of the model, including load demands, temperature data, and important calendrical data such as weekday and weekend information. While the literature mainly uses MATLAB coding for forecasting load demands, this study introduces MATLAB and Simulink programs that present the algorithm in a visualized way. The test data employed in this paper is the load profile of the Kurdistan regional power system. The relationship between observations in the input data is examined using a correlation analysis, which showed high correlation among the time-series observations. While the previous data-augmentation approach was unsuccessful in training the network for several cases, the proposed method demonstrates its ability to forecast the next 365 days of load demand in a comparatively short training time and with better accuracy.

References

Forecasting and Assessing Risk of Individual Electricity Peaks
Short-term load forecasting using convolutional neural networks in COVID-19 context: the Romanian case study
A comprehensive review of residential electricity load profile models
Comparative analysis of load forecasting models for varying time horizons and load aggregation levels
Review of low voltage load forecasting: methods, applications, and recommendations
A scoping review of deep neural networks for electric load forecasting. Energy Inform
Electrical load forecasting models for different generation modalities: a review
A comprehensive review of the load forecasting techniques using single and hybrid predictive models
A survey on investment demand assessment models for power grid infrastructure
Electrical load forecasting models: a critical systematic review
Energy forecasting: a review and outlook
Comprehensive review on electricity market price and load forecasting based on wind energy
Electric load forecasting model using a multicolumn deep neural networks
Load forecasting based on deep neural network and historical data augmentation
Deep learning for load forecasting: sequence to sequence recurrent neural networks with attention
Multi-convolution feature extraction and recurrent neural network dependent model for short-term load forecasting
Multimodal feature extraction and fusion deep neural networks for short-term load forecasting
Multi-scale convolutional neural network with time-cognition for multi-step short-term load forecasting
Multi-step short-term power consumption forecasting using multi-channel LSTM with time location considering customer behavior
A novel hybrid short-term load forecasting method of smart grid using MLR and LSTM neural network
A novel short-term load forecasting method by combining the deep learning with singular spectrum analysis
A short-term load forecasting method using integrated CNN and LSTM network
An ensemble approach for multi-step ahead energy forecasting of household communities
An extensible framework for short-term holiday load forecasting combining dynamic time warping and LSTM network
Deep learning for daily peak load forecasting: a novel gated recurrent neural network combining dynamic time warping
Domain fusion CNN-LSTM for short-term power consumption forecasting
EMD-Att-LSTM: a data-driven strategy combined with deep learning for short-term load forecasting
From load to net energy forecasting: short-term residential forecasting for the blend of load and PV behind the meter
Hybrid CNN-LSTM model for short-term individual household load forecasting
Hybrid multitask multi-information fusion deep learning for household short-term load forecasting
Short-term residential load forecasting based on LSTM recurrent neural network
Adaptive methods for short-term electricity load forecasting during COVID-19 lockdown in France
Missing-insensitive short-term load forecasting leveraging autoencoder and LSTM
On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach
A hybrid residual dilated LSTM and exponential smoothing model for midterm electric load forecasting
Enhanced deep networks for short-term and medium-term load forecasting
Improving load forecasting process for a power distribution network using hybrid AI and deep learning algorithms
Recurring multi-layer moving window approach to forecast day-ahead and week-ahead load demand considering weather conditions
Short-term energy forecasting framework using an ensemble deep learning approach
Short-term load forecasting based on PSO-KFCM daily load curve clustering and CNN-LSTM model
Short-term load forecasting of power system based on neural network intelligent algorithm
Short-term non-residential load forecasting based on multiple sequences LSTM recurrent neural network
Ultra-short-term industrial power demand forecasting using LSTM based hybrid ensemble learning
Ensemble learning for load forecasting
Midterm load forecasting analysis for Erbil Governorate based on predictive model
Load and demand forecasting in Iraqi Kurdistan using time series modelling. Degree Project in Engineering
Long short-term memory networks with Python: develop sequence prediction models with deep learning
Kurdistan central dispatch control, Ministry of Electricity, Kurdistan Regional Government
Directory of dispatch control in Erbil, Ministry of Electricity, Kurdistan Regional Government

Acknowledgements: The authors would like to thank the directory of the Erbil control center and the Kurdistan central dispatch control for their help in providing the necessary data for this study.
Funding: Not applicable.
Competing interests: There are no competing interests in this manuscript.