1 Introduction

Data is continuously being generated, stored, and processed. Many applications produce data in large volumes, at high frequency, and at high speed. In this paper, we focus on such scenarios, the so-called data streams. Data streams are, by definition, potentially unbounded and non-stationary data sequences [10]. Consequently, extracting useful insights and patterns from such data requires mining algorithms tailored to these scenarios.

In contrast to traditional machine learning, the data stream mining area has long been concerned with the trade-off between accuracy and computational resources, i.e., processing time and memory consumption [5]. Nonetheless, recent research has shown that more significant steps toward sustainability are still needed. For instance, the authors of [28] show that training a neural network can emit more CO2 than a car does over its entire lifetime. Furthermore, there is no clear relationship between processing time, memory consumption, and energy consumption. Even though these components are tied to one another, multi-threading, compilers, and other low-level aspects of the computational architecture have been shown to impact the entire process; consequently, energy consumption cannot be directly estimated from processing time and memory alone [13, 18].

Motivated by both sustainability and the lack of generic tools for quantifying energy consumption in streaming scenarios, we bring forward a tool that allows researchers and practitioners to investigate how data stream mining algorithms behave under different streaming settings, e.g., with and without concept drifts, under different validation schemes, etc. In contrast to previous works [15, 16], our tool is generic: it can be coupled with any classifier and data stream available in the Massive Online Analysis (MOA) framework [6], the de facto off-the-shelf solution for implementing and testing streaming methods. We experimentally validate our tool against a hardware solution and assess the energy consumption of different classifiers under different streaming settings. Finally, the tool is made publicly available to the scientific community as a byproduct of our research.

The remainder of this paper is organized as follows. Section 2 describes data stream mining and brings forward the main concepts in energy consumption. Section 3 discusses related works that lie at the intersection of energy consumption and data stream mining. Section 4 describes our tool for energy consumption measurements and how it has been combined with the Massive Online Analysis (MOA) [6] framework. Section 5 discusses the analysis conducted to validate our tool and to assess different classifiers under different experimental conditions. Finally, Sect. 6 concludes this work and outlines envisioned future work.

2 Data Stream Mining and Energy Consumption

Data streams are potentially unbounded data sequences made available over time, which may be non-stationary. Consequently, storing an entire data stream is unfeasible since it is not entirely available at once and would not fit in memory [10]. As a result, researchers and practitioners have devoted efforts to developing efficient algorithms for processing and mining data that arrive sequentially over time. Data stream mining is thus understood as the investigation of patterns, anomalies, and correlations in streaming data. In particular, in this work, we focus on classification, the most popular task in data stream mining, which conveys the prediction of a discrete output given a set of input variables. More formally, we denote a data stream S as providing instances \(i^t = (\boldsymbol{x}^t, y^t)\) at timestamps t. Classification is then the task of learning a predictive model \(f:\boldsymbol{x} \rightarrow y\), where y is a discrete label in Y. In practice, we expect the predictions \(\hat{y}\) to be accurate with respect to the ground-truth values y.
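As a minimal illustration of this setup, a test-then-train loop over such a stream can be sketched as follows. The toy stream and the `MajorityClassLearner` below are our own illustrative assumptions, not MOA components:

```python
from collections import Counter

class MajorityClassLearner:
    """Toy incremental learner: always predicts the most frequent label seen."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training instance arrives, there is nothing to predict.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1

def prequential(stream, learner):
    """Test-then-train: each instance is first used for testing, then training."""
    correct, seen = 0, 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        seen += 1
    return correct / seen

# Toy stream in which roughly 80% of the instances belong to class 1.
stream = [((i,), 1 if i % 5 else 0) for i in range(1000)]
acc = prequential(stream, MajorityClassLearner())
```

Any learner exposing `predict` and `learn` can be plugged into the same loop, which is the essence of the prequential protocol used later in this paper.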

One of the main challenges in data stream mining is concept drift [31], which regards changes in the data distribution that may render a classifier obsolete. Formally, a concept \(C = \bigcup _{y_i \in Y}{\{(P[y_i], P[\boldsymbol{x} \vert y_i])\}}\) is a set of class priors and class-conditional probability density functions [10]. Therefore, a concept drift is said to occur between two timestamps \(t_i\) and \(t_j\) if \(C^{t_i} \ne C^{t_j}\) [12]. Consequently, classifiers for data streams must be adaptive, which means that f may be adjusted when newly labeled instances are made available.
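To make this definition concrete, consider a toy stream (all numbers are illustrative) whose class priors \(P[y]\) change halfway through; under the definition above, the two halves already encode different concepts:

```python
from collections import Counter

def class_priors(labels):
    """Estimate P[y] from a window of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}

# A stream whose concept changes at t = 500: class 1 becomes dominant.
labels = [0] * 400 + [1] * 100 + [1] * 450 + [0] * 50

before = class_priors(labels[:500])   # P[0] = 0.8, P[1] = 0.2
after = class_priors(labels[500:])    # P[0] = 0.1, P[1] = 0.9
drift = before != after               # C^{t_i} != C^{t_j}: a concept drift
```

A classifier trained only on the first window would keep favoring class 0 after the drift, illustrating why adaptive models are required.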

2.1 Classifiers

Over the years, different approaches have been developed for the classification task in data streams. In practice, these classifiers are variants of traditional classifiers available for batch scenarios.

A popular approach for classification in streaming scenarios is the Incremental Naive Bayes [24]. Like its batch counterpart, it assumes that input features are independent. With the arrival of a training instance, all probabilities are updated accordingly. Since probabilities are based on counters and there is no need to store instances, Naive Bayes has constant memory consumption and processing time. Nonetheless, it has no mechanism to identify and adapt to concept drifts.
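A counter-based sketch of such an incremental Naive Bayes for categorical features follows. This is a simplified illustration under our own naming, with Laplace smoothing added to avoid zero probabilities; MOA's implementation also handles numeric attributes:

```python
from collections import defaultdict

class IncrementalNaiveBayes:
    """Counter-based Naive Bayes: constant-time updates, no instance storage."""
    def __init__(self):
        self.class_counts = defaultdict(int)
        # feature_counts[class][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int)))

    def learn(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[y][i][v] += 1

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for y, cc in self.class_counts.items():
            score = cc / total  # class prior P[y]
            for i, v in enumerate(x):
                # Laplace smoothing for unseen feature values.
                score *= (self.feature_counts[y][i][v] + 1) / (cc + 2)
            if score > best_score:
                best, best_score = y, score
        return best

nb = IncrementalNaiveBayes()
for x, y in [(("sunny", "hot"), 0), (("rainy", "cool"), 1), (("rainy", "mild"), 1)]:
    nb.learn(x, y)
pred = nb.predict(("rainy", "cool"))
```

Because only counters are kept, each update touches a fixed number of entries, which is why memory and processing time stay constant per instance.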

The most common approach for learning from data streams is decision trees. In particular, Hoeffding Trees [9] are the most popular approach, as they branch over time when enough data (controlled by the grace period, \(n_{\min }\)) and statistical evidence have been gathered, according to the Hoeffding Bound [22]. The Very Fast Decision Tree (VFDT) is a popular implementation of incremental Hoeffding Trees, meaning that it continuously branches as new data becomes available and does not revisit the quality of previously created split nodes. In contrast, revisiting split nodes is a characteristic of Hoeffding Adaptive Trees [4], in which each split node is coupled with an ADWIN drift detector [3]. Whenever a drift is flagged, the corresponding split node is replaced by a leaf node, which can branch again if the Hoeffding inequality is met. Even though Hoeffding Adaptive Trees significantly improve accuracy rates compared to incremental trees, even better results are obtained by creating ensembles of Hoeffding Trees. A state-of-the-art exemplar of such ensembles is the Adaptive Random Forest (ARF) [19], in which randomized Hoeffding Trees are trained in parallel and coupled with drift detectors to identify and rapidly adapt to concept drifts. ARF adjusts the sampling process with Poisson(\(\lambda =6\)) so that instances have higher chances of being used during training, thus speeding up the drift adaptation process. In the test step, the classifiers’ votes are combined using weighted majority voting, i.e., classifiers with higher accuracy have a higher impact on the final prediction. Consequently, ARF is a strong learner that achieves state-of-the-art accuracy, yet at a high expense of computational resources.
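The Hoeffding Bound states that, after n observations of a random variable with range R, the true mean differs from the sample mean by at most \(\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}\) with probability \(1-\delta\). The split decision it supports can be sketched as follows (the values are illustrative, and MOA's actual split heuristics differ in details such as tie-breaking):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon such that the true mean is within epsilon of the sample mean
    with probability 1 - delta, after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Branch only when the observed gain advantage exceeds the bound."""
    return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)

# With more observations the bound shrinks, so splits become easier to justify.
eps_small_n = hoeffding_bound(1.0, 1e-7, 200)
eps_large_n = hoeffding_bound(1.0, 1e-7, 20000)
```

For instance, a 0.2 gain advantage between the two best split candidates does not yet justify a split after 200 observations, but it does after 20,000.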

2.2 Requirements

Throughout the training and test steps of data stream classification, streaming classifiers must meet certain requirements [5, 6]:

  • Requirement #1: Process an example at a time, and inspect it only once (at most);

  • Requirement #2: Use a limited amount of memory;

  • Requirement #3: Work in a limited amount of time;

  • Requirement #4: Be ready to predict at any point; and

  • Requirement #5: Detect and adapt to concept drifts.

This list has been extended in the works of García-Martín et al. [13, 14, 17], in which energy consumption is highlighted as a relevant aspect of data stream mining, since several classifiers have been tailored with a sole focus on accuracy while overlooking sustainability. This is one of the main drivers of our work: to allow researchers and practitioners to quantify the energy consumption of data stream classifiers and determine under which conditions they fail to meet energy sustainability criteria.

3 Related Works

Over the years, different approaches to measuring energy consumption have been proposed. This section highlights approaches for measuring energy consumption in data stream mining and for decreasing energy usage. A first significant study is [16], in which the authors used PowerAPI [30] to quantify the energy consumption of Hoeffding Trees, despite acknowledging that it overlooks RAM consumption. The same authors changed their approach in [15], in which Jalen (now called JoularJX) [26] was used to quantify the energy consumption of Hoeffding Trees on a per-function basis. This allowed the authors to identify bottlenecks in the existing Hoeffding Tree implementation available in MOA.

Finally, the authors of [13] and [14] used Intel’s RAPL [7] to quantify the energy consumed by the DRAM and the processor based on accesses to the processor’s performance counters. Even though Intel’s RAPL code is not available, the authors disclose that its accuracy has been verified in [21] and that it does not introduce processing overheads. It is also relevant to highlight that the work in [14] introduces a Hoeffding Tree variant in which the grace period (\(n_{\min }\)) is adjusted so that branching is only attempted according to a user-given threshold. The results showed that the proposed Hoeffding Tree variant converges to comparable accuracy while consuming approximately 65% less energy. In turn, the work in [13] introduces a framework to quantify the energy consumption of Hoeffding Tree ensembles while accounting for decision tree learning, drift detection, and tree replacement.

A significant drawback of all the works mentioned above is that energy consumption has been tackled solely for Hoeffding Trees, and no general open-source tool makes energy consumption measurements easily available to researchers and practitioners. Our proposal, brought forward in the next section, circumvents these problems.

4 Proposal

In this section, we detail our tool for quantifying the energy consumption of data stream mining algorithms. Our tool is embedded within the Massive Online Analysis (MOA) framework [6], yet its rationale can be applied to other tools such as River [25].

Our tool uses Intel’s RAPL [7] and can be seen as a plugin to the Massive Online Analysis (MOA) framework. The general idea of our tool is given in Fig. 1: RAPL is used to quantify energy consumption during data acquisition and during the models’ training and testing phases. Once an experiment starts, energy measurements are initialized. As instances arrive, energy readings are taken before and after each processing step, i.e., the testing and training steps, and these readings are used to compute energy consumption rates throughout the experiment. Each of these stages can be measured separately. The framework controls the flow of the data stream so that energy measurement starts right before the model begins processing samples. At each cycle, a measurement is taken and presented to the user in real time, and at the end of the process, a graph showing the instantaneous measurement for each cycle is displayed.

Fig. 1. Overview of the proposed energy measuring framework. The tool measures energy consumption during data acquisition and model training and testing.

Since our tool is based on Intel RAPL, Fig. 2 details how our plugin interacts with MOA, RAPL, and the Linux kernel. As new data becomes available for processing, the plugin requests measurements from the Linux kernel, receiving in response how much energy has been spent during the testing and training phases. These values are summed and made available whenever the evaluation interface (via the so-called evaluation frequency parameter) requests an energy consumption rate. Figure 3 exemplifies the energy consumption rates measured by the proposed tool and how they are reported in MOA alongside other evaluation metrics.
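On Linux, such kernel readings come from the powercap sysfs interface, where RAPL exposes a cumulative counter in microjoules (e.g., /sys/class/powercap/intel-rapl:0/energy_uj) that wraps around at the value reported in max_energy_range_uj. The before/after bookkeeping can be sketched as follows; the paths and the max-range value are illustrative, and this is not our plugin's actual code:

```python
def read_energy_uj(path):
    """Read a cumulative RAPL energy counter (in microjoules) from sysfs."""
    with open(path) as f:
        return int(f.read().strip())

def energy_delta_uj(before, after, max_range_uj):
    """Energy spent between two readings, handling counter wrap-around."""
    if after >= before:
        return after - before
    return (max_range_uj - before) + after

# A reading is taken before and after each test-then-train step; the
# difference (here converted to joules) is accumulated per evaluation cycle.
joules = energy_delta_uj(1_000_000, 1_250_000, 262_143_328_850) / 1e6
```

Handling the wrap-around matters for long-running streams, since the cumulative counter overflows and restarts from zero during extended experiments.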

Fig. 2. Measuring energy consumption using RAPL.

Fig. 3. Screenshot of the Massive Online Analysis (MOA) framework with results provided by the proposed tool.

The source code of our proposal and experimentation can be found at https://github.com/ericonuki/moa-bringing-awareness-green-ict.

5 Experiments

This section presents the experiments conducted to assess our proposed tool for measuring energy consumption in data streams. In particular, this section is divided into two experiments. First, we validate our tool against a hardware solution, which serves as a gold standard for the energy consumed. Next, we analyze the energy consumed by different classifiers under Prequential test-then-train validation in stationary and non-stationary scenarios. All tests were performed on a desktop computer running Ubuntu Desktop 18.04 LTS, with an Intel Core i7-2600 (Sandy Bridge) CPU, 4 GB of RAM, and a 250 GB HDD.

5.1 Experiment 1 - Validation Against a Hardware Solution

The first experiment aimed to assess whether energy consumption measurements obtained via hardware and software were equivalent or, at least, correlated. In this experiment, both the software solution and a hardware solution (a TP-Link HS110 smart wall plug) were connected to the computer, and a CPU stress tool called stress-ng [23] was used while quantifying energy consumption rates.

Initially, it was necessary to assess whether hardware and software measurements of energy consumption are equivalent, i.e., whether relying on the software tool (the plugin) alone would not compromise the results of this study. The stress tool was configured to generate a 10% load for 10 min and to progressively increase the load in 10% increments until 100% was reached. Once 100% load was reached on one core, the test was extended to use an extra processor core, and this process was repeated until all four cores were allocated. The entire testbed encompassed five runs, and the average results are given in Fig. 4. Both lines depict the various stress levels on the computer: the blue line corresponds to the software power measurements and the green line to the hardware results.

These results show that the software readings do not match the hardware readings. This is expected, since the hardware readings account for the entire computer, i.e., the operating system and any other software being run; thus, it is reasonable to assume that the stress tool accounts for only part of the overall energy being consumed. Even though the absolute values clearly do not match, the two series exhibit a 99.97% correlation, i.e., a very strong correlation. Therefore, we verify that although the software solution does not accurately describe a computer’s total energy consumption, its results correlate with the actual consumption measured by a hardware tool, as also observed in [8, 27].
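For reference, such a correlation can be computed as a Pearson coefficient over paired readings; the numbers below are synthetic stand-ins for the actual testbed series:

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally sized series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Software readings track hardware readings up to an offset (the OS and other
# processes), which keeps the correlation high although absolute values differ.
software = [10, 20, 30, 40, 50]
hardware = [35, 44, 56, 65, 74]
r = pearson(software, hardware)
```

A near-constant offset between the two series barely affects the coefficient, which is why the readings can correlate strongly while still disagreeing in absolute terms.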

Fig. 4. Comparison of the energy consumption quantified by hardware and software.

5.2 Experiment 2 - Analyzing Different Classifiers in Stationary and Non-Stationary Environments

In this experiment, we used our proposed tool to quantify the energy consumption of different classifiers in stationary and non-stationary environments. Synthetic data streams were created using the Massive Online Analysis (MOA) framework with the Agrawal (AGR) [1], Assets Negotiation (AN) [2], and SEA [29] generators. Each stream comprised 1 million instances, and two variants were created: one with a concept drift located at the middle of the stream (drift position equal to 500,000) and another without concept drift. Experiment variants marked with a -D suffix denote drifting experiments. These data streams were used to assess the Naive Bayes (NB), Hoeffding Tree (HT), Hoeffding Adaptive Tree (HAT), and Adaptive Random Forest (ARF) classifiers. All classifiers used the default parameters available in MOA, except for ARF, which used 100 ensemble members (each with an individual thread) and a grace period \(n_{\min }=50\). All experiments were conducted using the Prequential validation scheme proposed in [11], i.e., each instance was first used for testing and then for training. The source code to reproduce this experimentation is also available in the code repository.

Discussion. The results obtained are given in Tables 1, 2, 3, and 4, which provide accuracy, memory consumption (in GB-Hours), processing time (in seconds), and energy consumption rates (in Watts), respectively. First, we highlight that Naive Bayes (NB) has the lowest accuracy rates in all scenarios. This result corroborates that decision trees and their ensembles are more interesting when this particular evaluation metric is pursued. We highlight, for instance, the results obtained by ARF in drifting experiments, which are expected since it couples multiple learners with drift detectors to detect and adapt to such changes. Nonetheless, decision trees bring forward computational overheads that are quantified by the remaining metrics. First, we see that Hoeffding Tree (HT), Hoeffding Adaptive Tree (HAT), and Adaptive Random Forest (ARF) are slower than Naive Bayes (NB): their processing times are 7, 11, and 44,751 times those of NB, respectively. Similar results are observed for RAM consumption, in which HT, HAT, and ARF consume more RAM than NB. Again, we highlight the RAM consumption observed for ARF, which is approximately \(10^{8}\) times higher than that of its counterparts. Finally, the energy consumption values show that NB is the least energy-consuming algorithm, with the exception of HAT in the AGR-D experiment. These results are expected since NB is much faster and less memory-consuming; thus, less energy is required to finish an experiment. Regarding the AGR-D experiment, it is relevant to emphasize that HAT’s energy consumption decreased because its drift detector restarted the entire tree learning process, i.e., it replaced the entire tree with a single decision stump, and thus its computational cost after the drift greatly decreased.
This is a relevant scenario in which energy consumption is not directly related to processing time and memory consumption, and it enables a better analysis by researchers and practitioners of which classifier should be used in a specific scenario. Focusing on ARF, we also highlight that its energy consumption rates are not directly related to either processing time or memory consumption: despite taking much more time and memory to run, its energy consumption was roughly twice that of its counterparts. This result can be explained by ARF’s multi-threaded implementation; even though it combines 100 learners, its energy consumption is not 100 times greater than that of its counterparts.

Table 1. Accuracy results obtained during experimentation.
Table 2. Memory consumption results obtained during experimentation.
Table 3. Processing time results obtained during experimentation.
Table 4. Energy consumption results obtained during experimentation.

6 Conclusion

In this work, we brought forward a tool for quantifying energy consumption in data stream mining. Our tool is embedded within the Massive Online Analysis (MOA) framework, thus allowing researchers to rapidly quantify the energy consumption of different classification methods under different streaming settings and validation processes. To validate our proposal, we first conducted a testbed using a processor stress tool to compare the proposed software readings against a hardware wall plug. Next, we tested different classifiers in stationary and drifting scenarios under a Prequential validation scheme. Results showed that our tool allows the identification of the energy consumption rates of different classifiers under different scenarios, i.e., drifting and non-drifting data streams. The energy consumption rates allow a more fine-grained analysis of the classifiers, as energy consumption is not directly tied to processing time and memory consumption, especially under concept drift and multi-threaded implementations.

In future work, we plan to extend our tool to encompass different classification, regression, and clustering validation schemes, including more datasets and scenarios. We also plan to port our tool to Python-based frameworks, such as River [25]. Finally, we plan to make our tool available on AMD and Apple ARM platforms and to add support for GPU consumption, since neural networks are increasingly used in streaming settings [20].