Abstract
Hierarchical data stream classification inherits the properties and constraints of hierarchical classification and data stream classification simultaneously. Therefore, it requires novel approaches that (i) can handle class hierarchies, (ii) can be updated over time, and (iii) are computationally lightweight regarding processing time and memory usage. In this study, we propose the Gaussian Naive Bayes for Hierarchical Data Streams (GNB-hDS) method: an incremental Gaussian Naive Bayes for classifying potentially unbounded hierarchical data streams. GNB-hDS uses statistical summaries of the data stream instead of storing actual instances. These statistical summaries allow more efficient data storage, keep computational time and memory constant, and enable calculating the probability of an instance belonging to a specific class via Bayes’ Theorem. We compare our method against a technique that stores raw instances, and results show that our method obtains equivalent prediction rates while being significantly faster.
Supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
1 Introduction
Hierarchical classification is required on problems where instances are labeled with classes that are related to one another in a hierarchy, such as in recognition of music genres and subgenres [12], computer-aided diagnosis where diseases are categorized by their etiology [43], recognition of animals, which are organized in a taxonomy [28, 42], and, recently, even helping in COVID-19 identification using the hierarchical etiology of pneumonia [29].
However, classification techniques often assume that data samples of a particular problem are static and fully available to a learning model in a well-defined training step [26]. This assumption does not reflect many of the real-world scenarios in which classification is applied. The ever-increasing volume of data from diverse sources such as the Internet, wireless sensors, mobile devices, or social networks produces massive large-scale data streams [23, 27, 32].
Data streams are potentially unbounded over time and hence cannot be stored in memory. Also, as the time component is intrinsic in data streams, these are expected to be transient, i.e., the underlying data distribution is ever-changing, thus resulting in variations in the target concept, a phenomenon named concept drift [10, 19, 21, 39].
When merged, hierarchical classification and data stream classification combine their properties and introduce new challenges in a roughly unexplored area: the hierarchical classification of data streams. Consequently, novel algorithms for hierarchical data stream classification must: (i) handle class hierarchies, (ii) be updatable over time, (iii) detect and adapt to changes in data behavior, and (iv) be computationally lightweight regarding processing time and memory consumption [10, 19, 31].
In this study, we propose the GNB-hDS method: an Incremental Gaussian Naive Bayes for classifying potentially unbounded hierarchical data streams. GNB-hDS uses statistical summaries of the data stream instead of storing raw instances.
Despite the relevant application of Bayesian classifiers in hierarchical and data stream classification tasks separately, they have not yet been adapted to their intersection. Therefore, to the best of our knowledge, this is the first method that combines incremental Bayesian learning with hierarchical classification. These statistical summaries allow more efficient data storage, hold computational time and memory usage constant, and permit calculating the probability of a given instance belonging to a specific class via Bayes’ Theorem.
The novel contributions of this work are as follows:
-
We qualify Gaussian Naive Bayes, a well-known classification technique [11], to work with potentially unbounded hierarchical data streams and in an incremental fashion by using updatable statistical summaries related to a class hierarchy.
-
We propose GNB-hDS, a method for the hierarchical classification of data streams using summarization techniques. The model is incremental and handles potentially unbounded data streams with constant memory usage.
Furthermore, as a byproduct of this research, we make the source code for the proposed method, as well as the datasets used in the experiments, available for reproducibility.
The remainder of this paper is organized as follows. Section 2 describes the problem of hierarchical classification of data streams and Sect. 3 brings forward related works. Section 4 describes the proposed incremental Gaussian Naive Bayes for the hierarchical classification of data streams. Section 5 comprises the experimental protocol and the discussion of the results obtained. Finally, Sect. 6 concludes this paper and states envisioned future works.
2 Problem Statement
As mentioned above, in this paper, we are particularly interested in hierarchical data stream classification. This specific task combines characteristics and challenges from two different areas, and thus, it differs from classical classification in two key aspects.
First, concerning hierarchical classification, instances of a problem are assigned to a label path that belongs to a hierarchically structured set of classes instead of one single independent label [35]. Figure 1 compares a general approach of (a) flat (non-hierarchical) classification, and (b) hierarchical classification in an illustrative problem. In flat classification, the decision must be made while considering all the classes of the problem (all the possible song genres). Meanwhile, hierarchical classification concerns an existing class taxonomy, which can be used to make first smaller and generic decisions about the problem (in the example, decide first between Rock and R&B genres), and then more specific ones.
Second, concerning data stream classification, there is not the concept of a complete and fully available dataset; instead, instances of a problem are provided to the model sequentially over time [19]. Figure 2 compares (a) a traditional classification process and (b) a data stream classification process. In traditional (or batch) classification, the dataset is assumed to be static and completely available to the model at the training step. Next, the dataset is divided into training and test subsets; the training data is submitted to the learning model that reviews them as many times as necessary until obtaining a single satisfactory test model. This final model is then applied to the subset of test data and provides predictions and, consequently, accuracy estimates.
In contrast, in streaming scenarios, data is made available sequentially over time, possibly one instance at a time. Each instance is first tested by the model, resulting in a prediction, and only afterwards is it incorporated into the model as training data. This process, entitled ‘test-then-train’, is repeated for each instance, or chunk of instances, gathered from the stream. Since the data stream is potentially unbounded, every processed instance must eventually be discarded so that the model remains able to process new instances.
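The ‘test-then-train’ protocol described above can be sketched as a short loop. This is a minimal illustration, not the paper’s implementation; the `predict`/`update` method names stand in for whatever interface an incremental classifier exposes:

```python
# Sketch of the prequential 'test-then-train' protocol: each instance is
# first used for testing, then for training, and is discarded afterwards.
# `model` is any incremental classifier (method names are illustrative).
def prequential(stream, model):
    correct = total = 0
    for x, y in stream:           # instances arrive one at a time
        y_hat = model.predict(x)  # 1. test on the yet-unseen instance
        correct += (y_hat == y)
        total += 1
        model.update(x, y)        # 2. only then train on it
        # 3. the instance is discarded; only the model state persists
    return correct / total        # running accuracy estimate
```

Because every instance is tested before training, the resulting accuracy estimate uses the whole stream without a held-out test set.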
Thus, the hierarchical classification of data streams concerns learning models that use hierarchical data streams as input to their learning processes: they effectively process portions of the data over time, under the premise that no complete dataset exists, and effectively use the class taxonomy in their decision processes.
More formally, we let hDS define a hierarchical data stream in the \([(\vec {x}^t, \vec {y}^t)]_{t=0}^{\infty }\) format providing instances \((\vec {x}^t, \vec {y}^t)\) on a specific timestamp t, where \(\vec {x}^t\) represents a d-dimensional features set and its values, and \(\vec {y}^t\) represents the corresponding ground-truth label path (hierarchically structured classes).
These hierarchically structured classes compose a regular concept hierarchy arranged on a partially ordered set \((Y, \succ )\), where Y is a finite set containing all label paths and the relationship \(\succ \) is defined as an asymmetric, anti-reflexive, and transitive subsumption (is-a) relation [35]. Finally, the classification of hierarchical data streams can be formally defined as \(f^t:\vec {x}^t \mapsto \vec {y}^t\), where the function \(f^t\) is continuously updated by mapping features \(\vec {x}\) to the corresponding label paths \(\vec {y}^t\) accurately.
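To make the subsumption relation concrete, the toy taxonomy below (hypothetical names echoing the music-genre example; not a dataset from the paper) encodes each class’s parent and checks the transitive, asymmetric is-a relation:

```python
# Toy taxonomy: each class maps to its parent; None marks a top-level
# class directly under the root. Names are illustrative only.
TAXONOMY = {
    'Rock': None, 'R&B': None,
    'Punk': 'Rock', 'Grunge': 'Rock', 'Soul': 'R&B',
}

def is_a(child, ancestor):
    """Transitive, anti-reflexive subsumption check: child is-a ancestor?"""
    node = TAXONOMY.get(child)
    while node is not None:
        if node == ancestor:
            return True
        node = TAXONOMY.get(node)   # climb toward the root
    return False
```

Under this encoding, a label path such as ['Rock', 'Punk'] lists a class together with all of its ancestors, which is exactly the structure \(\vec {y}^t\) carries.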
As data streams are potentially infinite due to their time component, learning models are constrained by finite computational resources and must work with bounded memory and time, analyzing each instance only once, upon arrival, and then discarding it. The processing time of an incoming instance must not surpass the rate at which new instances become available; otherwise, the learning model will have to discard new instances without analyzing them [5, 10].
3 Related Work
Machine learning models based on the Bayes’ Theorem have been widely used in classification since their outputs are human-readable, they can naturally handle missing values, and are relatively easy to implement [22, 36].
In hierarchical classification, Bayesian classifiers were used with different levels of adaptation. The authors in [14] used Bayesian probabilities attached to each node in the hierarchy using a Local Classifier per Node approach [35] and a top-down strategy to analyze the binary predictions along with the hierarchical structure. Similarly, the authors in [13] used binary classifiers for each class in the hierarchy considering both the parent and child classes of the current class.
In the works of [7, 45], the authors also used Bayes-based classifiers within a Local Classifier per Node approach but to perform hierarchical multilabel classification. Finally, the authors in [36] proposed a Naive Bayes fitted to the hierarchical classification using a global approach [35].
A Bayesian classifier fitted to handle hierarchical classification needs to be adapted, at least, to consider the relationship between the hierarchically structured classes in the calculation of probabilities [36].
In data stream classification, incremental adaptations of Bayesian classifiers have been widely studied and are widely applied in state-of-the-art algorithms. Data stream classification can be handled by a Naive Bayes classifier in a straightforward manner, since the learning model only needs to incrementally store summaries of data that allow the probability calculations as new instances arrive from the data stream [25].
The authors in [3] introduced the idea of recalculating probabilities for each instance provided to a model and this idea was later reinforced by the authors in [2, 25]. In the work [33], the authors proposed an incremental Bayes Tree based on statistical summaries of data which are updated with each incoming instance. The authors in [9] used Naive Bayes classifiers ensembled with other tree-based classifiers to improve specific leaf node predictions. Finally, the authors in [4] also used incremental statistical summaries to restrain a Naive Bayes classifier and cope with limited computational resources.
It is noteworthy, nonetheless, to highlight the work of [28], in which the authors proposed an incremental k-Nearest Neighbors (kNN) [1] approach for the hierarchical classification of data streams. This can be considered a seminal work in the area; yet, it has drawbacks: kNN relies on distance computations, which are computationally intensive and can jeopardize the time and memory constraints required by streaming scenarios [27, 38]. In this sense, in Sect. 5, we compare our proposal (GNB-hDS) against the one proposed in [28] and show that GNB-hDS uses Bayes probabilities to obtain competitive prediction correctness with better computational performance.
4 Proposed Method
Our proposal, hereafter referred to as Gaussian Naive Bayes for Hierarchical Data Streams (GNB-hDS), is an incremental method for the hierarchical classification of data streams based on the Naive Bayes technique [11, 18].
The main idea behind GNB-hDS is the use of incremental data summaries, specifically the mean, standard deviation, and the number of data instances, that allow the calculation of probabilities used in the Bayes’ Theorem [11, 22]. These incremental data summaries are attached to nodes of the hierarchy and are updated as new instances are gathered from the data stream. We implemented two key adaptations in the traditional Naive Bayes classifier to make it handle hierarchical data streams:
-
Regarding the hierarchical data structure, the original algorithm was modified to consider not only one class but all related classes of a given instance. As the hierarchical data structure represents a subsumption relation, any new instance provided from the data stream also belongs to its ancestors. Thus, we traverse the hierarchy to update all data summaries of parent nodes recursively until the root node of the hierarchy.
-
Regarding the streaming input data, the algorithm must store incremental statistical descriptors instead of the actual instances. Thus, we need to compute the mean, the standard deviation, and the count of data instances assigned to each class incrementally, discarding the instance after it is analyzed.
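The two adaptations above can be sketched together: a per-node summary holding \(n\), the incremental mean, and the incremental standard deviation, updated recursively up to the root. This is a minimal sketch (class and attribute names are assumptions, not the paper’s code) using Welford’s numerically stable recurrence, in the spirit of the paper’s Eqs. 1 and 2:

```python
import math

# Hypothetical per-node summary: count n, d-dimensional incremental mean,
# and a running sum of squared deviations (M2) per feature, from which the
# incremental standard deviation is derived (Welford's recurrence).
class NodeSummary:
    def __init__(self, d, parent=None):
        self.parent = parent
        self.n = 0
        self.mean = [0.0] * d
        self.m2 = [0.0] * d            # sum of squared deviations per feature

    def update(self, x):
        """Fold one instance into this node and, recursively, its ancestors."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n            # incremental mean
            self.m2[i] += delta * (xi - self.mean[i]) # incremental spread
        if self.parent is not None:
            # subsumption (is-a): the instance also belongs to all ancestors
            self.parent.update(x)

    def std(self, i):
        return math.sqrt(self.m2[i] / self.n) if self.n else 0.0
```

Note that the instance `x` is never stored: after `update` returns, only the three descriptors remain, which is what keeps memory usage constant.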
Regarding the stated problem, GNB-hDS represents the class taxonomy in a tree structure using local classifiers at each parent node and assigns leaf-node classes as the last class of a predicted label path \(\vec {y}^t\) (mandatory leaf-node, single-path prediction) [35].
We point out that although the GNB-hDS method has been implemented here in a more specific way regarding the stated problem, GNB-hDS also conceptually supports directed acyclic graphs and non-mandatory leaf-node prediction. For that, the data structure of a given node in the hierarchy should allow links to more than one parent node, and the top-down strategy used in the prediction step must consider some stopping criterion (e.g., a probability threshold), resulting in partial-depth label paths.
Figure 3 illustrates the process performed by GNB-hDS. Circles represent classes, and dashed squares enclose classifiers. The method represents the class taxonomy in a tree structure, where R stands for the root node of the hierarchy and classes are related with each other (as described in Sect. 2).
When receiving an incoming instance for prediction, the method tackles the hierarchy using a Local Classifier per Parent Node (LCPN) approach [35], thus analyzing the current parent node and predicting between its child nodes by using probabilities obtained with the Bayes’ Theorem. This process is repeated until a leaf node is reached.
Each node in the tree stores the count of instances (n), a d-dimensional incremental mean (\({\bar{x}}_{n}\)), and a d-dimensional incremental standard deviation (\(\sigma _{n}\)) of the class it represents (as shown in class 2). After the incoming instance is processed, the statistical descriptors (\(n, {\bar{x}}_{n}, \sigma _{n}\)) are updated incrementally with the instance’s feature values on all classes along the hierarchy, according to the ground-truth label path of that instance.
As introduced before, instances are represented by data summaries comprising three statistical descriptors stored incrementally: (i) the count of class instances, (ii) the d-dimensional mean, and (iii) the d-dimensional standard deviation of the instances of a given class.
The number of instances assigned to a class C is stored in an attached counter. When an instance is retrieved from the stream, the C-th class counter is incremented alongside the counters of C’s ancestors.
The incremental mean (\({\bar{x}}_{n}\)) and the incremental standard deviation (\(\sigma _{n})\) considering each attribute from a \(d-\)dimensional \(x_n\) instance are obtained, respectively, from Eqs. 1 and 2, where n stands for the number of instances observed so far assigned to C [15, 40].
It is also important to reinforce that the incremental mean and the incremental standard deviation are d-dimensional descriptors, matching the feature set of the d-dimensional instance \(x_n\). Note that Eqs. 1 and 2 support only continuous feature sets, and that the mean and standard deviation of the previously observed instances assigned to C are represented by \({\bar{x}}_{n-1}\) and \(\sigma _{n-1}\).
The prediction of the class to be assigned to an incoming instance provided from the data stream is performed in three steps: (i) computation of the a priori probabilities based on the count of class instances, (ii) computation of likelihood probabilities based on the Bayes’ Theorem for each attribute of the incoming instance, and (iii) calculation of the maximum value of the a posteriori probability from the product of the independent feature probabilities given a class C.
The calculation of the likelihood probability is described in Eq. 3, where i represents a feature index and j a class index [11].
To perform the class assignment, the GNB-hDS obtains the class label with the maximum value of the a posteriori probability, as described in Eq. 4, from the product of the independent feature probabilities given C [11].
Moreover, these three steps are performed from the top of the hierarchy and repeated until a leaf node is reached; the union of the class assignments made via Eq. 4 represents the final label path assigned to the incoming instance.
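The three prediction steps can be sketched as a top-down LCPN traversal. This is a minimal sketch under the Gaussian assumption, not the paper’s exact implementation; node attributes (`children`, `label`, `n`, `mean`, `std`) and function names are hypothetical:

```python
import math

def gaussian_likelihood(x_i, mu, sigma):
    # Per-feature likelihood under a Gaussian assumption (Eq. 3 style).
    if sigma == 0:
        return 1.0 if x_i == mu else 1e-12   # degenerate class: tiny floor
    z = (x_i - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def predict_path(root, x):
    path, node = [], root
    while node.children:                     # descend until a leaf is reached
        total = sum(c.n for c in node.children)
        best, best_post = None, -1.0
        for child in node.children:
            prior = child.n / total          # (i) a priori from instance counts
            post = prior
            for i, x_i in enumerate(x):      # (ii) per-feature likelihoods
                post *= gaussian_likelihood(x_i, child.mean[i], child.std[i])
            if post > best_post:             # (iii) maximum a posteriori child
                best, best_post = child, post
        path.append(best.label)
        node = best
    return path                              # predicted label path
```

Each level makes one small decision among siblings, so the cost grows with the depth of the hierarchy rather than with the total number of leaf classes.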

Algorithm 1 shows the proposed Gaussian Naive Bayes for Hierarchical Data Streams (GNB-hDS). It receives a hierarchical data stream hDS supplying instances \((\vec {x}, \vec {y})\) over time and, if required, outputs a set of predicted labels (a label path) \(\widehat{\vec {y}_i}\) for each given instance \((\vec {x},\vec {y})\), where \(\vec {x}\) represents a d-dimensional features set and its values, and \(\vec {y}\) represents the corresponding ground-truth label path of that instance.
The algorithm starts by understanding and representing the class taxonomy from the hierarchical data stream. The first loop (line 2 onwards) receives an incoming instance from the hierarchical data stream. The following loop (lines 4–12) handles the hierarchy using the LCPN approach by predicting one of the children labels possible for that parent node.
The a priori probabilities are calculated in line 6 using the counts of class instances. The likelihood and posterior probabilities are calculated in lines 8 and 9 by the application of Eqs. 3 and 4, respectively. The predicted node for the evaluated parent is obtained in line 10, and the respective single label is appended to a partial label path \(\widehat{\vec {y}_i}\) (line 11). This process is repeated until a leaf node is reached and the label path \(\widehat{\vec {y}_i}\) is complete and ready to be output by the algorithm.
Finally, the algorithm updates the statistical descriptors (the count n of class instances, the incremental mean instance \({\bar{x}}_{n}\), and the incremental standard deviation \(\sigma _{n}\)) of all classes contained in \(\vec {y}_i\), from the leaf to the root class.
5 Analysis
In this section, we report the experimental analysis conducted to compare our proposal against existing works in hierarchical data stream classification. First, we provide the experimental protocol adopted. Next, we discuss the results in terms of prediction and performance.
5.1 Experimental Protocol
Table 1 depicts the 14 hierarchically labeled datasets used in our testbed, listing their number of instances, features, and classes, the number of labels per level in the hierarchy (from top-level to leaf level), and references. These datasets contain different features, instances, and domains, thus assessing how our proposal behaves in different scenarios.
During the experiments, classifiers were assessed in terms of hierarchical F-measure [24]. Like traditional classification metrics, the hierarchical F-Measure (hF) relies on hierarchical precision and recall components, but instances are associated with a path of labels, and the entire path is evaluated.
The hierarchical F-Measure is depicted in Eq. 5, while its precision (hP) and recall (hR) components are described in Eqs. 6 and 7, respectively. In both precision and recall metrics, \(\widehat{\vec {y}_i}\) is the set of labels predicted for the i-th instance, and \({\vec {y}_i}\) is its corresponding ground-truth label set.
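A sketch of Eqs. 5–7 follows, assuming (as in [24]) that each instance contributes its full predicted and true label paths and that the components are micro-averaged across instances; whether the root label is included in the paths is a convention left to the caller:

```python
# Hierarchical F-measure: precision and recall over full label paths,
# micro-averaged across instances (sketch of Eqs. 5-7).
def hierarchical_f1(pred_paths, true_paths):
    inter = sum(len(set(p) & set(t)) for p, t in zip(pred_paths, true_paths))
    h_precision = inter / sum(len(set(p)) for p in pred_paths)   # hP, Eq. 6
    h_recall = inter / sum(len(set(t)) for t in true_paths)      # hR, Eq. 7
    return 2 * h_precision * h_recall / (h_precision + h_recall) # hF, Eq. 5
```

Because ancestors are counted, a prediction that misses only the leaf still receives partial credit, which is precisely what makes hF suited to hierarchical problems.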
We report the hF metric using the prequential test-then-train [10, 20] validation method, where each instance is used to test the model before it is used for training and updating [10, 21].
Furthermore, we measured the time performance by calculating the number of instances that a classifier can process per second.
We compared our proposed GNB-hDS to the hierarchical kNN described in Sect. 3 and proposed in [28], hereafter referred to as kNN-hDS. We set up kNN-hDS with \(k \in \{1,3,5\}\) and \(n \text { (buffer size)} \in \{\)5, 10, 15, 20\(\}\). The GNB-hDS method does not require any parameter setting.
Finally, the results obtained by both methods were assessed using Wilcoxon hypothesis tests [41] with a 95% confidence level according to the protocol provided in [16] to verify significant differences in the hF and instances processed per second rates obtained by both methods.
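The significance check can be sketched with SciPy’s signed-rank test; the function name and the example data below are illustrative, not the paper’s actual per-dataset rates:

```python
from scipy import stats

# Sketch: one-tailed Wilcoxon signed-rank test on paired per-dataset rates,
# asking whether method A's rates are significantly greater than method B's.
def significantly_greater(rates_a, rates_b, alpha=0.05):
    _, p_value = stats.wilcoxon(rates_a, rates_b, alternative='greater')
    return p_value < alpha
```

A paired test is the right choice here because both methods are evaluated on the same 14 datasets, so each pair of rates shares a dataset-specific difficulty.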
The experiments in this paper were performed using Python 3.7. The script containing the GNB-hDS method, as well as the datasets, are freely available for download.
5.2 Results
Table 2 shows the Hierarchical F-measure (hF) and Instances per second rates obtained by kNN-hDS and GNB-hDS in the datasets (greater values are highlighted in bold). In the kNN-hDS method, rates represent the best hF results obtained in the parameters configuration (as described in Sect. 5.1).
In terms of predictive performance assessment, the GNB-hDS method obtained better hF rates in 10 out of the 14 datasets. However, hF values are similar across both methods, such that the average difference between them is 0.44% while favoring GNB-hDS. Despite the improvements, the Wilcoxon test showed no statistical difference between hF rates obtained by the methods (p-\(value = 0.2209\)).
Concerning processing speed comparison, the GNB-hDS method was able to process more instances per second across all datasets, with an average rate of 446.21 instances against 140.43 of the kNN-hDS method. Thus, on average, our method was able to process 3.2 times more instances than the kNN-hDS method.
A one-tailed Wilcoxon test indicated a statistical difference between instances per second rates obtained by both methods (p-\(value = 0.0005\)) and confirmed that GNB-hDS is significantly faster when compared to kNN-hDS method.
Considering predictive performance and processing speed rates, GNB-hDS can obtain computational performance improvements without significant threats to the predictive performance by using statistical summaries of data combined with the class hierarchy information.
As aforementioned, the GNB-hDS method assumes a Gaussian (normal) data distribution to represent instances in the learning model [30]. In this sense, in addition to the previous analysis, we investigated whether GNB-hDS could leverage this assumption to obtain better hF rates when data are normally distributed.
Thus, the GNB-hDS method, in addition to its speed, would present an additional advantage over the kNN-hDS method (or any other method that does not assume data normality), since it would be better suited to classify normally distributed data. This advantage can be even more noticeable in the data stream context, where data are potentially unbounded and statistical descriptors, such as the mean and standard deviation, are more likely to represent the population well.
To examine this claim, we performed a Shapiro-Wilk test of normality [34] on all datasets and applied the Yeo-Johnson power transformation [44] to all data to improve their adherence to a normal distribution. Data normality was measured by the Shapiro-Wilk test before and after applying the Yeo-Johnson transformation.
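This before-and-after measurement can be sketched with SciPy (the function name is hypothetical; the actual analysis runs over the 14 datasets):

```python
from scipy import stats

# Sketch: Shapiro-Wilk W statistic on a feature before and after a fitted
# Yeo-Johnson power transform. Higher W (closer to 1) means closer to normal.
def normality_gain(values):
    w_raw, _ = stats.shapiro(values)
    transformed, _ = stats.yeojohnson(values)   # lambda fitted by max likelihood
    w_transformed, _ = stats.shapiro(transformed)
    return w_raw, w_transformed
```

On strongly right-skewed data, the fitted transform behaves much like a log and typically raises W substantially.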
Table 3 depicts the Shapiro-Wilk W statistic before (raw data) and after (transformed data) the Yeo-Johnson transformation. The W statistic is bounded by 1, and values closer to this upper bound indicate data better fitted to a normal distribution.
In addition, Table 3 depicts the hierarchical F-measure (hF) obtained by both GNB-hDS and kNN-hDS when applied to both raw and transformed datasets. One can note that the predictive performance of GNB-hDS improved with transformed data in 13 out of 14 datasets. Likewise, the average hF increased by 2.35%, with noticeable increases in some datasets, such as Entomology (5.23%) and Insects-o-o-c (5.32%). Conversely, kNN-hDS could not achieve the same improvements. In fact, kNN-hDS obtained lower hF rates with the transformed data, resulting in a decrease of 1.6% in the average hF, from 72.16% to 70.56%.
Finally, we performed one-tailed Wilcoxon tests to verify if the results obtained with the transformed datasets are significantly higher than with raw data for both GNB-hDS and kNN-hDS methods.
For kNN-hDS, the test indicated a statistical difference between performances on the two versions of the data (p-\(value = 0.0009\)) favoring the raw data, i.e., kNN-hDS does not benefit from more normally distributed data. In contrast, for GNB-hDS, the test indicated a statistical difference (p-\(value = 0.0006\)) favoring the transformed data, confirming that GNB-hDS can take advantage of more normally distributed data and thus corroborating our claims.
6 Conclusion
In this paper, we proposed GNB-hDS, an algorithm for the hierarchical classification of data streams that uses data summaries to represent data. Our proposal is incremental and handles potentially unbounded data streams with constant memory consumption. Consequently, the proposed method processes more instances per second without detrimental impact on prediction rates when compared to existing kNN-based techniques. To the best of our knowledge, our method extends the state of the art as the first incremental method based on Bayes’ Theorem tailored for the hierarchical classification of data streams.
The resulting source code and all the datasets used in the experiments are freely available for download to be used as a baseline to further research on the hierarchical classification of data streams, such as data preprocessing, computational resources analysis, and concept drift detection and adaptation.
In future works, we are interested in designing and applying other data summaries and different window types to maintain more than one a priori probability per class, allowing a posteriori probability calculations weighted by data recency. We are also interested in applying existing drift detectors [6, 8, 17] to increase responsiveness to changes in the data distribution.
References
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
Alcobé, J.: Incremental learning of tree augmented Naive Bayes classifiers. In: Garijo, F.J., Riquelme, J.C., Toro, M. (eds.) IBERAMIA 2002. LNCS (LNAI), vol. 2527, pp. 32–41. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36131-6_4
Anderson, J.R., Matessa, M.: Explorations of an incremental, Bayesian algorithm for categorization. Mach. Learn. 9(4), 275–308 (1992)
Bahri, M., Maniu, S., Bifet, A.: A sketch-based Naive Bayes algorithms for evolving data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 604–613. IEEE (2018)
Barddal, J.P., Gomes, H.M., Enembreck, F., Pfahringer, B., Bifet, A.: On dynamic feature weighting for feature drifting data streams. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9852, pp. 129–144. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46227-1_9
Barros, R.S., Cabral, D.R., Gonçalves Jr., P.M., Santos, S.G.: RDDM: reactive drift detection method. Expert Syst. Appl. 90, 344–355 (2017)
Bi, W., Kwok, J.T.: Bayes-optimal hierarchical multilabel classification. IEEE Trans. Knowl. Data Eng. 27(11), 2907–2918 (2015)
Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. SIAM (2007)
Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavalda, R.: New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 139–148 (2009)
Bifet, A., Kirkby, R.: Data stream mining a practical approach (2009)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
Burred, J.J., Lerch, A.: A hierarchical approach to automatic musical genre classification. In: Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11. Citeseer (2003)
de Campos Merschmann, L.H., Freitas, A.A.: An extended local hierarchical classifier for prediction of protein and gene functions. In: Bellatreche, L., Mohania, M.K. (eds.) DaWaK 2013. LNCS, vol. 8057, pp. 159–171. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40131-2_14
Cesa-Bianchi, N., Gentile, C., Zaniboni, L.: Incremental algorithms for hierarchical classification. J. Mach. Learn. Res. 7, 31–54 (2006)
Chan, T.F., Golub, G.H., LeVeque, R.J.: Algorithms for computing the sample variance: analysis and recommendations. Am. Stat. 37(3), 242–247 (1983)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Frías-Blanco, I., del Campo-Ávila, J., Ramos-Jimenez, G., Morales-Bueno, R., Ortiz-Díaz, A., Caballero-Mota, Y.: Online and non-parametric drift detection methods based on Hoeffding’s bounds. IEEE Trans. Knowl. Data Eng. 27(3), 810–823 (2014)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29(2), 131–163 (1997)
Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall/CRC (2010)
Gama, J., Sebastião, R., Rodrigues, P.P.: On evaluating stream learning algorithms. Mach. Learn. 90(3), 317–346 (2013)
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 46(4), 44 (2014)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
Hesabi, Z.R., Tari, Z., Goscinski, A., Fahad, A., Khalil, I., Queiroz, C.: Data summarization techniques for big data—a survey. In: Khan, S.U., Zomaya, A.Y. (eds.) Handbook on Data Centers, pp. 1109–1152. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-2092-1_38
Kiritchenko, S., Famili, F.: Functional annotation of genes using hierarchical text categorization. In: Proceedings of BioLink SIG, ISMB, January 2005
Klawonn, F., Angelov, P.: Evolving extended Naive Bayes classifiers. In: Sixth IEEE International Conference on Data Mining-Workshops (ICDMW 2006), pp. 643–647. IEEE (2006)
Kotsiantis, S.B., Zaharakis, I., Pintelas, P.: Supervised machine learning: a review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 160, 3–24 (2007)
Nguyen, H.L., Woon, Y.K., Ng, W.K.: A survey on data stream clustering and classification. Knowl. Inf. Syst. 45(3), 535–569 (2015)
Parmezan, A.R.S., Souza, V.M.A., Batista, G.E.A.P.A.: Towards hierarchical classification of data streams. In: Vera-Rodriguez, R., Fierrez, J., Morales, A. (eds.) CIARP 2018. LNCS, vol. 11401, pp. 314–322. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13469-3_37
Pereira, R.M., Bertolini, D., Teixeira, L.O., Silla Jr., C.N., Costa, Y.M.: COVID-19 identification in chest x-ray images on flat and hierarchical classification scenarios. Comput. Methods Programs Biomed. 194, 105532 (2020)
Pontes, E.A.S.: A brief historical overview of the Gaussian curve: from Abraham de Moivre to Johann Carl Friedrich Gauss. Int. J. Eng. Sci. Invent. (IJESI), 28–34 (2018)
Prasad, B.R., Agarwal, S.: Stream data mining: platforms, algorithms, performance evaluators and research trends. Int. J. Database Theory Appl. 9(9), 201–218 (2016)
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. The MIT Press, Cambridge (2009)
Seidl, T., Assent, I., Kranen, P., Krieger, R., Herrmann, J.: Indexing density models for incremental learning and anytime classification on data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 311–322 (2009)
Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52(3/4), 591–611 (1965)
Silla, C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Discov. 22(1–2), 31–72 (2011)
Silla Jr., C.N., Freitas, A.A.: A global-model Naive Bayes approach to the hierarchical prediction of protein functions. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 992–997. IEEE (2009)
Souza, V.M.A., Reis, D.M., Maletzke, A.G., Batista, G.E.A.P.A.: Challenges in benchmarking stream learning algorithms with real-world data. Data Min. Knowl. Discov., 1–54 (2020). https://doi.org/10.1007/s10618-020-00698-5
Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. In: Wille, L.T. (ed.) New Directions in Statistical Physics, pp. 273–309. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-662-08968-2_16
Tsymbal, A.: The problem of concept drift: definitions and related work. Comput. Sci. Dep. Trinity Coll. Dublin 106(2), 58 (2004)
West, D.: Updating mean and variance estimates: an improved method. Commun. ACM 22(9), 532–535 (1979)
Wilcoxon, F.: Individual comparisons by ranking methods. In: Kotz, S., Johnson, N.L. (eds.) Breakthroughs in Statistics. Springer Series in Statistics (Perspectives in Statistics), pp. 196–202. Springer, New York (1992). https://doi.org/10.1007/978-1-4612-4380-9_16
Wu, F., Zhang, J., Honavar, V.: Learning classifiers using hierarchically structured class taxonomies. In: Zucker, J.-D., Saitta, L. (eds.) SARA 2005. LNCS (LNAI), vol. 3607, pp. 313–320. Springer, Heidelberg (2005). https://doi.org/10.1007/11527862_24
Yassin, N.I., Omran, S., El Houby, E.M., Allam, H.: Machine learning techniques for breast cancer computer aided diagnosis using different image modalities: a systematic review. Comput. Methods Programs Biomed. 156, 25–45 (2018)
Yeo, I.K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Zaragoza, J.C., Sucar, E., Morales, E., Bielza, C., Larranaga, P.: Bayesian chain classifiers for multidimensional classification. In: Twenty-Second International Joint Conference on Artificial Intelligence. Citeseer (2011)
© 2021 Springer Nature Switzerland AG
Tieppo, E., Barddal, J.P., Nievola, J.C. (2021). Classifying Potentially Unbounded Hierarchical Data Streams with Incremental Gaussian Naive Bayes. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science(), vol 13073. Springer, Cham. https://doi.org/10.1007/978-3-030-91702-9_28
Print ISBN: 978-3-030-91701-2
Online ISBN: 978-3-030-91702-9