key: cord-0044734-7gdkyzlg authors: Bot, Karol; Ruano, Antonio; da Graça Ruano, Maria title: Forecasting Electricity Consumption in Residential Buildings for Home Energy Management Systems date: 2020-05-18 journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems DOI: 10.1007/978-3-030-50146-4_24 sha: 1bf81e63d33200cd299b7029627c5a44d19a03d7 doc_id: 44734 cord_uid: 7gdkyzlg

Prediction of energy consumption is a key aspect of home energy management systems, whose aim is to increase the occupants' comfort while reducing energy consumption. This work, employing three years of measured data, uses radial basis function neural networks, designed using a multi-objective genetic algorithm (MOGA) framework, for the prediction of total electric power consumption, HVAC demand and other loads demand. The desired prediction horizon is 12 h, obtained with a 15 min one-step-ahead model applied in a multi-step-ahead fashion. To reduce the uncertainty, a model ensemble technique is proposed that makes use of the preferred set output by MOGA; it achieves excellent forecast results and compares very favorably with existing approaches.

The consumption of energy has increased substantially in the building sector in the past years, fueled primarily by the growth in population, households and commercial floor space. For this reason, Home Energy Management Systems (HEMS) are becoming increasingly important to reverse this continuously increasing trend. HEMS offer advantages to both building occupants and electricity suppliers. For the former, they are a means to reduce energy consumption in a household or, perhaps more important to the occupants, to reduce their electricity bill. For suppliers, making use of smart grid technology, HEMS enable the implementation of several Demand Response (DR) mechanisms [1].
If the HEMS is able to control the operation of devices in a home, it is necessary to separate the consumption of non-deferrable (or non-schedulable) appliances from that of deferrable (or schedulable) devices. As the efficiency of DR techniques can be improved by making use of forecasts of electricity consumption (and of electricity generation, if renewables are employed), the consumption of both schedulable and non-schedulable appliances must be predicted [2]. Methodologies based on computational intelligence are the ones most used for short-term load forecasting. However, a certain degree of uncertainty is typically found in those forecasts [3], which can usually be reduced using ensembles of models [4]. Computational intelligence models are developed by measuring the inputs and outputs of the system and fitting a linear or non-linear mathematical model to approximate the operation of the building [5]. These models implement a function deduced only from samples of training data describing the behavior of a specific system, which makes them well suited to cases where the physical relations are not known [6, 7]. For buildings, the advantage of computational intelligence models over physical methods is that the former do not require knowledge of the building geometry and physical phenomena to deduce an accurate prediction model. However, the lack of proper data can become an issue for computational learning methods [7], because their accuracy depends strongly on the quality and amount of available data. Reviews of the prediction of energy consumption in buildings with computational intelligence methods can be found, for instance, in [8, 9]. According to these works, in which more than 100 cases were analyzed, these techniques have proven to be very effective. Among these methods, Artificial Neural Networks (ANN) are the primary models employed to evaluate and predict energy consumption [10-12].
The main input data used to feed this technique may be segmented into two main categories: weather-related parameters and building-related parameters. Concerning the weather-related parameters, atmospheric temperature is the most used exogenous variable, but solar radiation availability and relative humidity are also employed. Among the building-related parameters, the total building energy consumption is the most used variable (as an endogenous variable), followed by parameters such as occupancy, usage of devices, indoor temperatures and fenestration characteristics. Depending on the partition of electricity considered, the prediction of electric energy consumption may have a different focus. Most studies deal with the whole-building energy consumption [13-15]; others focus only on heating demand [12, 16], only on cooling demand [17], on both heating and cooling [18], or on a detailed segmentation considering devices and other uses such as water heating [19]. The prediction horizon of the reviewed studies was segmented into hourly fractions, hour, day, month and year, with varying prediction time steps (mostly hourly for a one-day prediction horizon, and daily for a one-month horizon). The validation methods of the prediction models also varied between analytical proofs, experimental analysis, model comparison, reference comparison and simulation comparison, the first two being the most used. Additionally, an extensive review concerning the categorization of forecasting parameters may be found in [20, 21].

This work proposes the use of an ensemble of models to produce forecasts of home electric consumption data, covering total consumption, the consumption of schedulable equipment, and that of non-schedulable devices, to be employed in HEMS schemes. The models employed are Radial Basis Function (RBF) neural network models, designed using a Multi-Objective Genetic Algorithm (MOGA).
MOGA has been employed successfully in a variety of applications (see, for instance, [22-26]). In all these works, results obtained with MOGA have been compared with other available methods relevant to the application at hand. The objective of this paper is not to compare MOGA with other methods for forecasting energy, but to verify whether MOGA results can be improved with ensemble averaging of the models in the non-dominated set. Experimental data obtained from the Honda Smart Home US, located in Davis, United States, are employed as a case study. The paper is divided into five sections. Section 1 introduces the scope of the work, its objectives and organization, and a brief literature review. Section 2 presents the description of the case study. Section 3 introduces the MOGA methodology and its use for ensemble averaging. Section 4 presents and discusses the results. Conclusions are drawn in Sect. 5.

This work uses data obtained in the Honda Smart Home (HSM) US [22]. This building is located on the West Village campus of the University of California, Davis. The building is classified as a Net Zero Energy Building, uses sustainable construction materials, and has a radiant floor and night ventilation. Electric appliances and lighting have high efficiency, and the HVAC system employs a ground-source heat pump. The household has a complex home energy management system to control the electric systems. Details about the construction, electric appliances and data acquisition system can be found on its website [22]. The group responsible for the HSM makes experimental data available every six months. Based on the publicly available data, some studies have been developed, focused mainly on the integration between electric vehicles and the smart home, on the home management systems of the HVAC solutions, and on construction practices. The present work uses the HSM data to design the prediction models and test their accuracies (Fig. 2).
To develop the present study, four variables are used from the HSM data set. They are the total average electric power demand, the HVAC power demand, all the "other" electric loads except the HVAC (equipment, lighting, energy management system equipment, and other miscellaneous loads), and the outdoor temperature. The data set is composed of 15 min averages of the power consumption of the HSM (total, HVAC, and others) and of the outdoor temperature, over three years (2016, 2017 and 2018). Additionally, a codification of each day within a week, considering holidays and their position within the week [28], was employed to associate the patterns of consumption with the calendar days. The models are intended to predict the power consumption over a prediction horizon of 12 h, using steps of 15 min, in a multi-step fashion. RBF models are employed, in a Nonlinear AutoRegressive with eXogenous inputs (NARX) configuration. Two exogenous variables (v2, outdoor temperature; v3, day code) and their delays are used as inputs, together with delays of the modelled variable (v1, electric power). Three problems are considered, aiming to model the total demand (P1), the HVAC demand (P2) and the "other" demand (P3). As will be explained later, two different models are designed for each problem. As those models will subsequently be used in a predictive control scheme, their main goal is to obtain a small Root Mean Square Error (RMSE) over the chosen prediction horizon. Notice that a 15 min time step is employed to meet the technical requirements for interchanging energy information between prosumers and energy suppliers [29] in the Portuguese market.

This work uses the ApproxHull algorithm, proposed in [30], to select data for the training, testing and validation sets used in model design.
ApproxHull is an incremental randomized approximate convex hull (CH) algorithm, applicable to high-dimensional data, that treats memory and computation time efficiently. The convex hull vertices obtained are compulsorily included in the training set, so that the model can be designed with data covering the whole operational range. Very briefly, ApproxHull starts with an initial convex hull, which subsequently grows as new vertices are added to it. A pre-processing phase is performed on the original data set before applying the convex hull, scaling all data to the range [−1,1]. The maximum and minimum of each dimension form the initial convex hull vertices. Then, the algorithm generates a population of k facets based on the current convex hull, selects the furthest points from the current facets population as new vertices of the convex hull, and integrates them into the current convex hull. A detailed explanation of ApproxHull may be found in [30].

The data set for the P1 problem is composed of three full years of data, from January 2016 to December 2018, while the data sets for P2 and P3 start in April 2016, due to the lack of HVAC data in the first three months of 2016. For each variable (v1, v2, v3), the admissible lags are grouped into three periods: period 1 (lags immediately before the current sample), period 2 (lags around one day before), and period 3 (lags around one week before). A training set ($S_{tr}$) and a testing set ($S_{te}$) are used in the MOGA execution (see the next section). When MOGA stops its execution, the non-dominated or preferable (if restrictions are employed) set of models is evaluated on a third data set, the validation set ($S_{va}$). The size of $S_{tr}$ is 60% of the whole set, and $S_{te}$ and $S_{va}$ have a size of 20% each. All convex hull points are incorporated in the training set. These sets are supplied to MOGA.
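The data preparation described above can be illustrated with a short sketch. It is not the authors' implementation: the series are synthetic, the lag choices are hypothetical, and the full ApproxHull algorithm is replaced by a much simpler stand-in that only forces the per-dimension minimum and maximum samples (the initial hull vertices mentioned in the text) into the training set before a random 60/20/20 split.

```python
import numpy as np

def build_narx_matrix(v1, v2, v3, lags_v1, lags_v2, lags_v3):
    """Rows are [v1(k-l)..., v2(k-l)..., v3(k-l)...]; the target is v1(k)."""
    max_lag = max(list(lags_v1) + list(lags_v2) + list(lags_v3))
    k = np.arange(max_lag, len(v1))           # samples for which every lag exists
    cols = [v1[k - l] for l in lags_v1]
    cols += [v2[k - l] for l in lags_v2]
    cols += [v3[k - l] for l in lags_v3]
    return np.column_stack(cols), v1[k]

def scale_to_unit_range(a):
    """Column-wise min-max scaling to [-1, 1]."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # guard against constant columns
    return 2.0 * (a - lo) / span - 1.0

def split_with_extremes(X, rng, frac_tr=0.6, frac_te=0.2):
    """Random split; per-dimension extreme samples always go to training.
    Simplified stand-in for ApproxHull, which finds many more hull vertices."""
    n = len(X)
    extremes = np.unique(np.concatenate([X.argmin(axis=0), X.argmax(axis=0)]))
    rest = rng.permutation(np.setdiff1d(np.arange(n), extremes))
    n_tr = max(int(round(frac_tr * n)), len(extremes))
    n_te = int(round(frac_te * n))
    n_fill = n_tr - len(extremes)
    idx_tr = np.concatenate([extremes, rest[:n_fill]])
    idx_te = rest[n_fill:n_fill + n_te]
    idx_va = rest[n_fill + n_te:]
    return idx_tr, idx_te, idx_va

# Synthetic 15 min series (96 samples per day); all values are illustrative.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
v2 = 15.0 + 10.0 * np.sin(2 * np.pi * t / 96)        # toy outdoor temperature
v3 = ((t // 96) % 7).astype(float)                   # toy day code
v1 = 1.0 + 0.05 * v2 + 0.1 * rng.standard_normal(n)  # toy electric power

# Hypothetical lags: period 1 (1, 2), period 2 (96 = one day), period 3 (672 = one week).
X, y = build_narx_matrix(v1, v2, v3, [1, 2, 96, 672], [1, 96], [1])
Xs = scale_to_unit_range(X)
idx_tr, idx_te, idx_va = split_with_extremes(Xs, rng)
```

With 15 min sampling, a lag of 96 corresponds to one day and a lag of 672 to one week, which is how the three lag periods map onto sample indices.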
The model design is considered as a multi-objective optimization problem, with possible restrictions and priorities associated with the objectives. Genetic algorithms can evolve trained model structures that meet pre-specified design criteria in acceptable computing time. Globally, the ANN structure optimization problem can be viewed as a sequence of actions undertaken by the model designer, which should be repeated until the pre-specified design goals are achieved. These actions can be grouped into three major categories: problem definition, solutions generation and analysis of results (for a detailed explanation of the MOGA design framework, see [6]). In this problem, for the first category, the objectives to minimize are the RMSEs of the training set and of the testing set, the model complexity (O(l)) and the forecasting error ($e_p$). This last criterion is obtained as described in Eq. 1, where D is an additional simulation set, with p data points, and E is an error matrix (Eq. 2).

The output of MOGA is not a single solution, but a set of non-dominated models (or preferable models, if restrictions are used). This last set of models can be employed for ensemble averaging. As the forecasting criterion (1) is not used as a MOGA objective, in a few situations models within the set can deliver a bad prediction performance. This can be solved if the median of the results obtained in the non-dominated (or preferable) set, and not their mean value, is used as the output of the ensemble.

The results obtained by the ApproxHull algorithm are presented in Table 1. As previously explained, the number of samples corresponds to the available data for each problem description. The number of features is equal for the three problems, and the ratio used for the distribution of the sets is constant as well.
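The reason for preferring the median over the mean as ensemble output can be shown with a small numerical sketch. The data below are synthetic and the function names are hypothetical; the example only assumes that each member of the non-dominated (or preferable) set produces a forecast over the same 48 steps, and that one member performs badly.

```python
import numpy as np

def combine(member_forecasts, how="median"):
    """Combine forecasts of shape (n_models, n_steps) into one (n_steps,) forecast."""
    f = np.asarray(member_forecasts, dtype=float)
    if how == "median":
        return np.median(f, axis=0)
    if how == "mean":
        return f.mean(axis=0)
    raise ValueError("how must be 'median' or 'mean'")

def rmse(y_true, y_pred):
    """Root mean square error between two equally sized arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
steps = np.arange(48)                          # 12 h horizon at 15 min steps
target = np.sin(2 * np.pi * steps / 96)        # toy scaled load profile

good = target + 0.05 * rng.standard_normal((4, 48))  # four reasonable members
bad = target + 5.0                                    # one badly behaved member
members = np.vstack([good, bad])

rmse_median = rmse(target, combine(members, "median"))
rmse_mean = rmse(target, combine(members, "mean"))
print(f"median ensemble RMSE: {rmse_median:.3f}")
print(f"mean ensemble RMSE:   {rmse_mean:.3f}")
```

In this toy setting the single poor member shifts the mean by roughly one fifth of its offset at every step, whereas the median stays within the spread of the well-behaved members.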
Considering scaled data within the range [−1,1], the minimum results of $e_{tr}$, $e_{te}$ and $e_{va}$ for the non-dominated models (P*-a) or preferable sets (P*-b) are presented in Table 2. There, the mean value is used for O(l). It is possible to conclude that the largest RMSEs are obtained for the P2 problem, reflecting the difficulty of modelling the HVAC operation. The smallest values are obtained for P3; however, it should also be noted that the values for P1 are only slightly higher than those for P3. Equations (3) to (8) present the selected models for P1-a, P1-b, P2-a, P2-b, P3-a and P3-b, respectively. Further details and performance values obtained with the selected models are presented in Table 3. In this table, $\lVert w \rVert_2$ denotes the 2-norm of the linear parameters, which is related to the model condition.

$$v1(k) = f\big(v1(k-1), v1(k-2), v1(k-3), v1(k-4), v1(k-18), v1(k-96), v1(k-669), v2(k-3), v2(k-6), v2(k-10)\big) \tag{3}$$

$$v1(k) = f\big(v1(k-1), v1(k-12), v1(k-13), v1(k-17), v1(k-18), v1(k-92), v1(k-96), v1(k-97), v1(k-672), v1(k-673), v1(k-675), v2(k-2), v2(k-5), v1(k-11), v2(k-12), v2(k-16), v2(k-20), v2(k-99)\big) \tag{4}$$

$$v1(k) = f\big(v1(k-1), v1(k-14), v3(k-1)\big) \tag{5}$$

$$v1(k) = f\big(v1(k-1), v1(k-2), v1(k-97), v1(k-676), v2(k-5), v2(k-6), v2(k-11), v2(k-13), v2(k-14)\big) \tag{6}$$

$$v1(k) = f\big(v1(k-1), v1(k-3), v1(k-4), v1(k-5), v1(k-95), v1(k-100), v1(k-671), v2(k-11), v2(k-15), v2(k-20), v2(k-94)\big) \tag{7}$$

$$v1(k) = f\big(v1(k-1), v1(k-5), v1(k-92), v1(k-94), v1(k-95), v1(k-98), v1(k-668), v1(k-676), v2(k-3), v2(k-11), v2(k-20), v2(k-93)\big) \tag{8}$$

It should be noted that, apart from model (5), all models use samples of the modelled variable from around 1 day and 1 week before. Most models use the outside temperature, but only model (5) uses the day code. To analyze the prediction results, a one-month period (October 2017) was employed.
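The way a one-step model such as those in Eqs. (3) to (8) is applied over the prediction horizon can be sketched as follows. This is only an illustration under stated assumptions: `toy_model` is a hypothetical linear stand-in with the input structure of model (5) minus one lag, not a trained RBF network; the exogenous variable is assumed to be known over the whole horizon; and the data are synthetic and noise-free, so the forecasts should reproduce the series exactly.

```python
import numpy as np

def multi_step_forecast(f, v1_meas, v3, k0, lags_v1, lags_v3, horizon=48):
    """Forecast v1(k0), ..., v1(k0 + horizon - 1) with a one-step model f.
    Measured v1 is used up to k0 - 1; later lags are filled with predictions."""
    v1 = np.asarray(v1_meas[:k0], dtype=float).tolist()
    out = []
    for step in range(horizon):
        k = k0 + step
        x = [v1[k - l] for l in lags_v1] + [v3[k - l] for l in lags_v3]
        y_hat = float(f(np.array(x)))
        v1.append(y_hat)          # feed the prediction back as a lagged input
        out.append(y_hat)
    return np.array(out)

def horizon_error_matrix(f, v1, v3, origins, lags_v1, lags_v3, horizon=48):
    """Row i holds the errors of the forecast started at origins[i];
    column j holds the errors made j + 1 steps ahead."""
    rows = []
    for k0 in origins:
        y_hat = multi_step_forecast(f, v1, v3, k0, lags_v1, lags_v3, horizon)
        rows.append(v1[k0:k0 + horizon] - y_hat)
    return np.array(rows)

# Synthetic series generated by the same recursion the toy model implements.
n = 600
v3 = ((np.arange(n) // 96) % 7) / 6.0        # toy day code scaled to [0, 1]
v1 = np.zeros(n)
for k in range(1, n):
    v1[k] = 0.9 * v1[k - 1] + 0.1 * v3[k - 1]

def toy_model(x):
    """Hypothetical one-step model; x = [v1(k-1), v3(k-1)]."""
    return 0.9 * x[0] + 0.1 * x[1]

E = horizon_error_matrix(toy_model, v1, v3, range(100, 500),
                         lags_v1=[1], lags_v3=[1], horizon=48)
rmse_per_step = np.sqrt(np.mean(E ** 2, axis=0))   # RMSE evolution over the horizon
```

Each column of `E` collects the errors at a fixed number of steps ahead, so `rmse_per_step` corresponds to an RMSE curve along the 48-step horizon. With real data and an imperfect model these values would not be zero.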
A prediction horizon of 12 h was considered, which means that 48 steps-ahead forecasts were employed. Figure 3 presents the plots of the real measured data (denoted as Target) and the one-step-ahead predictions for P3-b (the best model selected considering the prediction error), for just one week of the prediction period. To better represent graphically the comparison between target and predicted values, the 1-step prediction errors for all problems, for that particular week, are shown in Figs. 4, 5 and 6. The evolution of the scaled prediction RMSE along the prediction horizon (PH) is presented in Fig. 5 for the 6 problems. With the exception of P2, the models obtained with a constrained formulation present smaller RMSE values.

The results presented so far were obtained with the single model selected for each of the six different cases. As explained in Sect. 4, the ensemble output is the median of the outputs of the models in the non-dominated (or preferable) set. Table 4 presents the ensemble RMSEs (P-ensemble) for the training, testing and validation sets, as well as the respective differences (D(P)) with respect to the results obtained with the selected models, shown in Table 3. The last column additionally shows the prediction error obtained for the whole month of October 2017. It can be seen that, for the RMSEs, the ensemble performs better than the selected models in most cases. In terms of the RMSE evolution over PH, which is the most important goal, all ensemble values are significantly better than those of the selected models.

Besides the analysis made in this work, it is important to compare the obtained results with those of related studies on the prediction of energy demand in buildings. It is, however, quite difficult to perform a quantitative assessment of the proposed techniques, since their performance depends on the training data used as input [7]. Additionally, it is not common to find results of load demand forecasting over a prediction horizon, and they are even harder to find for individual households.
If we narrow this analysis to the forecasting of different load classes (total, schedulable and non-schedulable), to the best of our knowledge there are no available results. In [21], different prediction models (ANN-NAR, Hidden Markov Models, Support Vector Machines (SVM), MultiLayer Perceptrons and Deep Belief Networks (DBN)) were designed for one-step daily and weekly forecasts. Eight weeks of 1-hour data were extracted from the Pecan Street database, in 4 different scenarios. For daily forecasts, the RMSEs varied between 4.02 kW (ANN-NAR) and 1.48 kW (DBN). Much better results were obtained in the present work, using three years of data, although a maximum forecast horizon of half a day is considered. The authors of [31] compared the forecasting performance of ANNs, SVMs and Least-Squares SVMs, with different data resolutions and forecasting horizons, using several models, each applied to a different load profile obtained by clustering. As in the previous work, these are one-step-ahead forecasts, although with different forecasting horizons. For a house with a similar load profile, the best results obtained were RMSEs in the range of 0.8 to 1.6 kW, for a time resolution of 30 min and a 12-hour forecast. Again, the results presented in this paper compare very favorably with these values.

This work focused on improving the accuracy of predictive models for the energy demand in buildings, using ensembles of RBF models designed with a MOGA framework. Real data, obtained from the Honda Smart Home US over three years, were used in this work. Three problems were analyzed, each one in two design versions (unconstrained and constrained). For a common prediction horizon of 12 h, it was shown that the best results were obtained for the problem in which the predicted variable is the power consumption of "other" loads (not considering the HVAC), followed by the problem in which the total demand is the predicted variable.
The problem in which the HVAC demand is the modelled variable obtains the lowest accuracy, due to the higher volatility of the time series. Comparing MOGA designs, the best forecasting results were obtained with a constrained formulation, except for the HVAC modelling. The model ensemble approach obtained, for all cases considered, the best prediction results. This scheme is obviously applicable to all classification, prediction and forecasting problems. For the case at hand, although a quantitative comparison is impossible, the prediction accuracy obtained in this work compares favorably with other existing approaches. Future work will employ these forecasting models for model predictive scheduling of a real household in the South of Portugal, with PV energy production and electricity storage.

Table 4. $e_{tr}$, $e_{te}$, $e_{va}$ and $e_p$: ensemble (P-ensemble) and best models.

Smart home energy management
NILM techniques for intelligent home energy management and ambient assisted living: a review
Forecasting and uncertainty: a survey
Ensemble models with uncertainty analysis for multi-day ahead forecasting of chlorophyll a concentration in coastal waters
Ten questions concerning model predictive control for energy efficient buildings
Evolutionary multiobjective neural network models identification: evolving task-optimised models
State of the art in building modelling and energy performances prediction: a review
Big data in building design: a review
A review of data-driven approaches for prediction and classification of building energy consumption
A comprehensive overview on the data driven and large scale based approaches for forecasting of building energy demand: a review
Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks
Household power demand prediction using evolutionary ensemble neural network pool with multiple network structures
Optimal operations management of residential energy supply networks with power and heat interchanges
Evaluation of the causes and impact of outliers on residential building energy use prediction using inverse modeling
A prediction methodology of energy consumption based on deep extreme learning machine and comparative analysis in residential buildings
A novel cost-optimizing demand response control for a heat pump heated residential building
Development of a thermal control algorithm using artificial neural network models for improved thermal comfort and energy efficiency in accommodation buildings
Operational thermal load forecasting in district heating networks using machine learning and expert advice
A study and a directory of energy consumption data sets of buildings
A review of data-driven building energy consumption prediction studies
Statistical learning versus deep learning: performance comparison for building energy prediction methods
GPR target detection using a neural network classifier designed by a multi-objective genetic algorithm
An intelligent support system for automatic detection of cerebral vascular accidents from brain CT images
On the possibility of noninvasive multilayer temperature estimation using soft-computing methods
An intelligent weather station
A comparison of four data selection methods for artificial neural networks and support vector machines
Evolving RBF predictive models to forecast the Portuguese electricity consumption
A convex hull-based data selection method for data driven models
Short-term forecasting of individual household electricity loads with investigating impact of data resolution and forecast horizon