1 Introduction

The rapid increase in data generation has led to the emergence of data streams, characterized by continuous, real-time data flow that challenges the static nature of traditional datasets [2, 11]. In supervised learning tasks, this dynamic environment presents the dual issues of concept drift, also known as data shift [3], where underlying data distributions shift over time, and class imbalance, where one class vastly outnumbers the others. In data streams, this imbalance is not static but can also evolve, intensifying the difficulty of the learning task. The interplay between the dynamic nature of imbalance and the shifting paradigms of concept drift demands a rethinking of traditional approaches to maintain classifier performance [12, 15].

Many algorithms have been proposed to address the challenges of data stream classification with imbalanced data, such as CALMID [17], CSARF [18], ROSE [8], OOB [19], SMOTE-OB [4] and UOB [19]. Four of these algorithms, CALMID, CSARF, ROSE, and SMOTE-OB, implement explicit approaches to handle concept drift. CSARF, ROSE, OOB, and UOB implement an ensemble approach to handle imbalanced data. Additionally, CSARF also implements a cost-sensitive approach.

One ensemble algorithm that showed robust results for binary classification in data streams with concept drift is the Kappa Updated Ensemble (KUE) [7]. By implementing the Very Fast Decision Tree (VFDT) [13] algorithm as a basis for its experts and creating policies for model updates, vote abstention, and model assessment using the Kappa statistic, the KUE outperformed other state-of-the-art algorithms [7]. However, KUE was not designed for data streams with imbalanced classes.

This paper proposes an algorithm for binary classification named the Balanced Accuracy-based Sliding Window Ensemble (BASWE), inspired by KUE. BASWE uses two sliding windows and resampling techniques, and replaces the Kappa statistic with the Balanced Accuracy metric in ensemble updates. These modifications aim to achieve a more robust ensemble with higher performance when classifying data streams with imbalanced classes and concept drifts. The contributions of this work can be summarized as follows:

  • Introduction of BASWE, an ensemble algorithm designed to manage binary classification in data streams with imbalanced classes and concept drift.

  • The adoption of Balanced Accuracy as a pivotal metric in guiding ensemble updates, encompassing voting strategies, model performance measurement, and expert substitution.

  • The use of class-specific sliding windows and a resampling step in pre-processing. This strategy is tailored to address imbalanced data streams, effectively reducing the imbalance ratio during the model training phase with new data chunks.

  • A comprehensive experimental evaluation comparing BASWE against state-of-the-art algorithms.

2 Data Stream Classification

Data streams can be understood as data instances generated at high speed and arriving continuously, presenting challenges to computational systems for storage and processing [10]. However, if efficiently analyzed, they provide an important source of information for real-time decision-making support.

A data stream is characterized as a sequence denoted by \( S = \langle S_{1}, S_{2}, \ldots , S_{n}, \ldots \rangle \), where each \( S_{j} \) represents a collection of instances with a size \( N \ge 1 \). In the particular case where \( N=1 \), the context is referred to as online learning [10]; otherwise, it is called learning by chunks. Typically, each instance \( s_t \) within each set \( S_j \) is independently and randomly produced following a stationary distribution \( D_{j} \).

The data stream classification task aims to predict the correct label \(y_t\), for each incoming instance \(s_t \in S_j\). Each instance is described by a set of attributes \(X_t=\{x_{t1}, x_{t2}, ..., x_{tn}\}\). A classifier, represented as F, takes the set of instance attributes \(X_t\) as input and generates the predicted label \(\hat{y}_t\), as output. In our research, we focus on fully labeled binary classification, where the correct label \( y_t \) is restricted to one of two possible outcomes. Additionally, every instance \( s_t \) within the data stream is pre-labeled.

In several classification problems, the distribution \(D_{j}\) is subject to changes over time, as the characteristics and definitions of the data stream evolve. This phenomenon is called concept drift [12, 20]. Concept drifts in the pattern of class distribution can be segmented into four categories [12]: sudden, incremental, gradual, or recurrent. In the realm of data streams, concept drift can be addressed either explicitly (using a dedicated drift detector) or implicitly.

Class imbalance in data stream classification often occurs for various reasons. It can be due to the intrinsic nature of the data scope being processed, such as a medical database that includes sporadic occurrences of a rare disease. Alternatively, it could occur purely by chance if the batch of data being processed lacks a representative diversity of instances for each class.

To address the scenario of imbalanced classes in machine learning within data streams, three main methods are commonly employed [14]: (i) data-level methods, where the data is pre-processed using, for example, resampling techniques; (ii) algorithm-level methods, where the learning algorithms are adapted to deal with imbalanced classes; and (iii) hybrid methods, which combine the previous two.

3 BASWE

This section introduces the Balanced Accuracy-based Sliding Window Ensemble (BASWE), an algorithm designed specifically for binary classification in imbalanced data streams. The algorithm is inspired by the KUE algorithm [7]. BASWE has three main characteristics. The first is the use of dynamic ensemble techniques, which provide dynamic expert substitutions and real-time ensemble updates; additionally, BASWE implements an abstention policy to account for changes in the relevance of the experts. The second is the use of Balanced Accuracy as the primary metric for model evaluation and updating, a choice specifically tailored to the imbalanced datasets that BASWE is designed to handle. The third is the application of sliding windows as a data-level method to manage class imbalance in the data stream. These latter two characteristics are explained in Sects. 3.1 and 3.2, respectively.

BASWE’s experts in the ensemble implement the Very Fast Decision Trees (VFDT) algorithm [13]. This algorithm is recognized for its computational efficiency, making it ideal for handling real-time data streams. It operates by building decision trees incrementally, accommodating massive volumes of data while minimizing memory usage.

3.1 Balanced Accuracy Approach to Handle Ensemble Update

Balanced Accuracy (BA) (Eq. 1) provides a robust measure for binary classification problems, especially when dealing with imbalanced datasets.

$$\begin{aligned} \text {Balanced Accuracy} = \frac{1}{2}\left( \frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right) = \frac{TPR+TNR}{2} \end{aligned}$$
(1)

where \(TPR = TP/(TP+FN)\) is the True Positive Rate and \(TNR = TN/(TN+FP)\) is the True Negative Rate.
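As a concrete illustration, Eq. 1 can be computed directly from confusion-matrix counts. The following is a minimal sketch in Python (the function name is ours):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Balanced Accuracy: the mean of the per-class recalls (Eq. 1)."""
    tpr = tp / (tp + fn)  # True Positive Rate (recall on the positive class)
    tnr = tn / (tn + fp)  # True Negative Rate (recall on the negative class)
    return 0.5 * (tpr + tnr)
```

On a 95:5 chunk, a classifier that always predicts the majority class has TPR = 0 and TNR = 1, so its Balanced Accuracy is only 0.5, whereas plain accuracy would be 0.95.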

In the context of the BASWE algorithm, Balanced Accuracy is used as a driving mechanism for two key operations:

  • Expert substitution within the ensemble. With every new chunk of data, q new experts are trained outside the ensemble. The ensemble itself consists of k experts \((\gamma _1,...,\gamma _{k})\), and if the Balanced Accuracy of the newly trained experts exceeds that of the worst-performing ones in the ensemble, a substitution takes place.

  • Abstention of voting from ensemble experts. The abstention policy operates by controlling the voting rights of the ensemble experts based on their Balanced Accuracy. When the ensemble computes the majority vote to determine the final classification, the BA of each expert is assessed against a specific threshold (0.5 in our implementation). Any expert with a BA below this threshold is prevented from voting, meaning their classification decision does not influence the ensemble’s final output. This abstention policy ensures that the ensemble’s decision-making process is not adversely affected by the subpar performance of any of its experts, thereby maintaining the integrity and accuracy of the ensemble’s output.
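The two operations above can be illustrated with a minimal sketch of abstention-aware majority voting (Python; the expert objects and `predict` interface are hypothetical stand-ins, while the 0.5 threshold follows our implementation):

```python
def ensemble_predict(experts, ba_scores, x, threshold=0.5):
    """Majority vote in which low-BA experts abstain."""
    # Only experts whose Balanced Accuracy reaches the threshold may vote.
    votes = [e.predict(x) for e, ba in zip(experts, ba_scores) if ba >= threshold]
    if not votes:
        # Degenerate case: every expert abstained; fall back to all votes.
        votes = [e.predict(x) for e in experts]
    # Binary majority vote over the admitted votes (labels are 0 or 1).
    return int(sum(votes) * 2 > len(votes))
```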

3.2 Sliding Window Approach for Handling Imbalanced Data Streams

In BASWE, the sliding window approach maintains a data cache that reduces the disproportion among the classes over time. This helps handle data imbalance and provides a more accurate and representative sample of the current data for model training.

To implement this approach, the algorithm establishes two sliding windows, \(W_0\) and \(W_1\), each corresponding to one class. The size of each sliding window is set to be half of the chunk size that is used for model processing. As each new chunk of data arrives from the data stream, the instances are added to the corresponding sliding window. If a sliding window is full, meaning the number of instances equals the predefined size, the oldest instance is discarded to make room for the newest one. This policy of expelling older instances helps accommodate concept drift, as instances naturally exit the cache over time, thereby reflecting the changing nature of the data stream.

Algorithm 1 shows the fillWindow function. The function takes as input the current sliding window, the target class covered by this sliding window (desiredClass), the current chunk of data (\(S_i\)), and the maximum window size (cs). The chunk is traversed searching for instances of the desired class (lines 2-3), and upon finding an instance, it is added to the sliding window (line 4). After adding a new instance, the function checks whether the window size has exceeded cs, and if so, removes the oldest instance (lines 5-7). Finally, the function returns the window extended with the new instances of the desiredClass contained in the i-th chunk.

Algorithm 1. fillWindow function
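The fillWindow function described above can be sketched as follows (Python; the representation of instances as (attributes, label) pairs is our assumption):

```python
def fill_window(window, desired_class, chunk, cs):
    """Add instances of desired_class from the chunk to the sliding window.

    When the window exceeds its maximum size cs, the oldest instance is
    evicted, which lets stale concepts leave the cache over time.
    """
    for attributes, label in chunk:
        if label == desired_class:              # lines 2-3: filter by class
            window.append((attributes, label))  # line 4: add the instance
            if len(window) > cs:                # lines 5-7: evict the oldest
                window.pop(0)
    return window
```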

When the BASWE model is trained, if any of the sliding windows is not fully populated, oversampling is performed. The oversampling function iterates over the vector from the most recent to the oldest instance, replicating instances until the number of instances required to complete the chunk size is achieved.
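A sketch of this oversampling step (Python; the wrap-around when the required count exceeds the window length is our assumption):

```python
def oversample(window, cs):
    """Grow the window to cs instances by replicating, newest first."""
    if not window:
        return []
    out = list(window)
    i = len(window) - 1            # start from the most recent instance
    while len(out) < cs:
        out.append(window[i])      # replicate an existing instance
        i = (i - 1) % len(window)  # walk toward the oldest, wrapping around
    return out
```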

In this way, the sliding window approach within the BASWE algorithm not only ensures an up-to-date representation of the data for model training but also plays an instrumental role in managing class imbalance and concept drift.

3.3 BASWE’s Pseudocode

BASWE’s pseudocode is delineated in Algorithm 2, and is segmented into four stages: ensemble initialization, ensemble experts update, training of the new \( q \) components, and replacement of the weakest \( \gamma \) expert in the ensemble \( \epsilon \).

Algorithm 2. BASWE: Balanced Accuracy-based Sliding Window Ensemble

The main loop (lines 3-34) iterates over the chunks that arrive throughout the data stream. In lines \(4-5\), two sliding windows \(W_0\) and \(W_1\) are filled, one for each class. Oversampling is then performed by the oversampling function in lines \(6-7\), creating \(W'_0\) and \(W'_1\), which are joined to form the chunk \(S'_i\) in line 8. This chunk will be used by the experts in the training phase, while the original chunk \(S_i\) will be used to compute the performance.

Upon receiving the first chunk, \(S_1\), the ensemble of k experts is created and initialized, as represented in lines \(9-15\). For each of the \( k \) experts in the ensemble, a random integer \( r \) is drawn uniformly from \([1,f]\) (line 11). This value determines the size of the feature subspace that will be used for the \( j \)-th expert. In line 12, the chooseFeaturesSubspace method selects an \( r \)-dimensional random feature subspace for the \( j \)-th expert, stored in the variable \( \varphi _j \); these subspaces generate diversity in the ensemble and help avoid noisy features. In line 13, the filterFeaturesSubspace method receives two parameters, \( \varphi _j \) and the chunk \(S_1'\), and returns the chunk \(S_1'\) restricted to the features contained in the \( \varphi _j \) subspace. The classifier \( \gamma _{j} \) is then initialized, training on the chunk \(S_1'\) filtered by the filterFeaturesSubspace method. The Balanced Accuracy of the \( j \)-th expert is then computed and stored in the variable \(BA_j\) (line 14).
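The subspace selection in lines 11-13 can be sketched as follows (Python; the function names mirror the pseudocode, but the implementation details are our assumptions):

```python
import random

def choose_features_subspace(f, rng=random):
    """Draw r uniformly from [1, f] and pick an r-dimensional feature subspace."""
    r = rng.randint(1, f)                 # subspace size for this expert
    return sorted(rng.sample(range(f), r))

def filter_features_subspace(subspace, chunk):
    """Project every instance of the chunk onto the chosen feature subspace."""
    return [([x[i] for i in subspace], y) for x, y in chunk]
```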

The ensemble model update is performed in lines 17-20 considering the subsequent chunks. Within this process, the algorithm filters the feature subspace \( \varphi _j \) from chunk \( S'_i \), conducts incremental training of \( \gamma _{j} \) (line 18), and computes the Balanced Accuracy of this expert (line 19).

From lines \(21-26\), \( q \) new experts are trained. If these new experts exhibit superior performance, based on Balanced Accuracy, compared to the \( q \) least effective experts in the ensemble, they replace the latter. The higher the value of \( q \), the more intense the behavioral change of the ensemble tends to be, as more components can be replaced with each data chunk. This offers both the advantage of rapid adaptation and the risk of undesirable abrupt changes caused by anomalous chunks. In line 26, the minBalancedAccuracy method searches for the expert in the ensemble \( \epsilon \) with the lowest performance; it returns the Balanced Accuracy of that expert, stored in \(BA_{min}\), and its position within the ensemble, represented by \( w \). The substitution takes place in lines \(27-31\) of Algorithm 2, where the Balanced Accuracy of the new expert, denoted \( BA' \), is compared to that of the expert with the lowest Balanced Accuracy, \(BA_{min}\). If the new expert demonstrates superior performance, the expert at position \( w \) (represented by \( \gamma _w \)) is substituted with the new \( \gamma ' \). The feature subspace \( \varphi _w \) and the Balanced Accuracy \( BA_w \) of the weakest expert are also updated with \( \varphi ' \) and \( BA' \), respectively.
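The search-and-substitute step in lines 26-31 can be sketched as follows (Python; the container choices are ours):

```python
def replace_weakest(experts, ba, subspaces, new_expert, new_ba, new_subspace):
    """Replace the weakest ensemble member if the candidate outperforms it."""
    # minBalancedAccuracy: position w of the expert with the lowest BA
    w = min(range(len(ba)), key=lambda j: ba[j])
    if new_ba > ba[w]:                 # candidate beats the weakest member
        experts[w] = new_expert        # gamma_w <- gamma'
        subspaces[w] = new_subspace    # phi_w <- phi'
        ba[w] = new_ba                 # BA_w <- BA'
    return experts, ba, subspaces
```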

4 Experimental Results

In this section, we compare BASWE with seven state-of-the-art algorithms.

4.1 Benchmark Algorithms

BASWE was compared with different ensemble classifiers: KUE [7], CALMID [17], CSARF [18], ROSE [8], OOB [19], UOB [19], and SMOTE-OB [4]. CALMID, CSARF, ROSE, UOB, and SMOTE-OB were selected as the top 5 algorithms by a recent survey on imbalanced data streams [1]. Additionally, we included KUE, as it served as the inspiration for BASWE, and OOB, which was proposed by the same authors as UOB and demonstrated good performance in other recent works [8]. Among the selected algorithms, KUE is the only one that was not designed to deal with imbalanced data streams.

All those algorithms are ensembles; however, they implement distinct approaches to address the replacement of new base classifiers. KUE and CALMID algorithms execute, at most, one replacement per new chunk when employed with their default parameters. Conversely, BASWE, CSARF, and ROSE allow the substitution of multiple experts with each new chunk (or instance, in the case of ROSE). On the other hand, the OOB, SMOTE-OB, and UOB algorithms do not replace their base classifiers throughout the data stream.

These algorithms are implemented in MOA and we use their default parameters, as specified in the respective original studies. KUE is the only exception; it has an implementation with \(q = 1\) and an additional implementation that permits the substitution of two experts (referred to as KUE (q = 2) in this section). In BASWE, we use the same default parameters as KUE (\(k=10\)), except q, which is set to 2. BASWE has one additional parameter, cs, representing the sliding window’s cache size; we set cs to half the chunk size.

4.2 Datasets

Experiments were conducted using 24 synthetic binary data stream configurations generated in the MOA (Massive Online Analysis) environment [5], featuring varying degrees of data imbalance. Of these, 8 data streams have sudden concept drift, 8 have gradual concept drift, and the remaining 8 have no concept drift. Additionally, evaluations were carried out on 16 real-world datasets. All experiments in this study treated the data streams as fully labeled.

Synthetic Datasets: A series of synthetic datasets was generated using the MOA environment. The chosen generators were as follows: Agrawal, Asset Negotiation, HyperPlane, Mixed, Random RBF, Random Tree, SEA, and Sine. Each synthetic dataset was generated with a random seed of 42, a chunk size of 500, and 200,000 instances. All synthetic datasets were designed for binary classification. For a comprehensive analysis, these datasets were crafted with five distinct imbalance ratios: 90:10, 95:5, 97.5:2.5, 99:1, and 99.5:0.5, where the value before the colon denotes the proportion of the majority class and the value after it the proportion of the minority class. Additionally, the synthetic data streams were generated in two contexts, with and without concept drift. For the data streams with concept drift, we analyzed sudden and gradual drift, the two most common types of concept drift in the literature [1].

Real Benchmark Datasets: Sixteen real benchmark datasets were utilized for the experimental analysis: Adult, Airlines, Bridges-1VsAll, Census, Covtypenorm-1-2VsAll, Credit.G, Dermatology, Diabetes, Electricity, Gmsc, Mushroom, Sick, Sonar, Vehicle, Vote, and Vowel.

4.3 Metrics

The performance of the classifiers was evaluated using the Kappa metric, which was used in [1, 7, 8], and the F1-Score, which was used in [9, 15, 16]. For each set of experiments, described in Sect. 4.4, the following measures are computed:

  • Absolute Best (AB) that represents the number of experiments in which the algorithm achieved the best absolute metric (F1-Score or Kappa) value.

  • Equivalent Best using t-test (EB) that indicates the number of experiments where, despite the algorithm not having achieved the highest metric (F1-Score or Kappa) value, no significant difference was found (indicating a t-test p-value greater than 0.05) between its metric value and the highest metric value among all other algorithms.

  • Total Best (TB) that is the sum of the Absolute Best and Equivalent Best values, representing the number of experiments where the algorithm demonstrated the best or equivalent to the best metric (F1-Score or Kappa) value.

  • Avg. metric (F1-Score or Kappa) that represents the average of metric (F1-Score or Kappa) value from all experiments.

The algorithms are ranked by the Total Best value and, in case of a tie, the Avg. metric (F1-Score or Kappa) is used to rank the algorithms. We prioritize the Total Best value over Avg. metric (F1-Score or Kappa) since averages can be skewed by the outcomes from specific datasets. In certain scenarios, an algorithm might exhibit a significantly higher or lower Avg. metric (F1-Score or Kappa) due to unique characteristics inherent to a particular dataset. Hence, it is possible for an algorithm to rank first in Total Best while manifesting the least favorable results when evaluated by Avg. metric (F1-Score or Kappa), due to notably poorer performance in a few experiments.
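The ranking rule described above can be sketched as follows (Python; the dictionary layout is our assumption):

```python
def rank_algorithms(results):
    """Rank by Total Best (descending), breaking ties by average metric."""
    # results maps algorithm name -> {"AB": int, "EB": int, "avg": float}
    def sort_key(name):
        r = results[name]
        total_best = r["AB"] + r["EB"]   # TB = Absolute Best + Equivalent Best
        return (-total_best, -r["avg"])  # higher TB first, then higher average
    return sorted(results, key=sort_key)
```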

4.4 Experiments Configuration

In this study, we define an experiment as the execution of an algorithm on a data stream. To compare the algorithms, experiments were performed with synthetic and real data streams. For the synthetic data, the number of streams was 40 without concept drift, 40 with gradual concept drift, and 40 with sudden concept drift. This is because there are 8 synthetic data streams generators, and each has one experiment for every ratio of data imbalance: 90:10, 95:5, 97.5:2.5, 99:1, and 99.5:0.5. For the real data streams, the number of experiments was 16 which corresponds to the number of real data streams. Since 9 algorithms were evaluated across 136 data streams, there are 1,224 experimental configurations. To mitigate potential impacts of performance anomalies associated with specific chunk characteristics or random initialization of the experts, each experiment was executed 10 times, resulting in 12,240 experimental runs.

To calculate the F1-Score and Kappa values for each algorithm on each data stream, the average value of the metrics was calculated over all the chunks of each run, using the “moa.tasks.EvaluatePrequential” configuration of MOA. Each new chunk is processed in two steps: (i) evaluate the current ensemble on the chunk, computing the F1-score and Kappa in line 18 of Algorithm 2; and (ii) update the ensemble with the new chunk in line 19 of Algorithm 2.
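This test-then-train chunk evaluation can be sketched as follows (Python; the classifier interface and metric callback are hypothetical stand-ins for the MOA machinery):

```python
def evaluate_prequential(stream, classifier, metric, chunk_size=500):
    """Average a metric over chunks: evaluate first, then train on each chunk."""
    scores, chunk = [], []
    for x, y in stream:
        chunk.append((x, y))
        if len(chunk) == chunk_size:
            predictions = [classifier.predict(xi) for xi, _ in chunk]
            labels = [yi for _, yi in chunk]
            scores.append(metric(labels, predictions))  # (i) evaluate on the chunk
            classifier.train(chunk)                     # (ii) update the ensemble
            chunk = []
    return sum(scores) / len(scores) if scores else 0.0
```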

The time limit for each experiment execution was 60 min. The experiments were conducted on an Apple M1 Pro CPU with 16 GB of memory, running macOS Ventura 13.3.

4.5 Synthetic Imbalanced Data Streams Without Concept Drift

In this section, we address the results obtained from 40 experiments (8 synthetic imbalanced data streams without concept drift and 5 different imbalance rates). BASWE achieved the highest Total Best value for both F1-Score (15 of 40 experiments) and Kappa (20 of 40 experiments). The second-best Total Best value for both measures was achieved by the OOB ensemble classifier. Conversely, the CSARF algorithm recorded the lowest Total Best value, securing the Total Best F1-Score value in just 2 of 40 experiments and Kappa in 3 of 40. SMOTE-OB did not finish executing the experiments within the time constraint of 60 min.

4.6 Synthetic Imbalanced Data Streams with Concept Drift

In this section, we address the results obtained from 80 experiments (5 different imbalance rates with 8 imbalanced synthetic data streams with sudden concept drift and 8 with gradual concept drift).

For experiments with gradual drift, the datasets followed a consistent pattern until the 40,000th instance. From this point, a gradual drift began, continuing over a span of 20,000 instances until the 60,000th instance. For experiments with sudden drift, the datasets maintained a stable pattern until the 40,000th instance, where an immediate and abrupt change occurred. In both scenarios, with gradual and sudden drifts, SMOTE-OB did not finish the experiments’ execution within the time limit constraint (60 min).

Gradual Concept Drift. Table 1 shows F1-Score and Kappa for the experiments over the imbalanced data streams with gradual concept drift. BASWE emerged as a top-performing algorithm, securing the highest ‘Total Best’ value in both F1-Score (14 out of 40 experiments) and Kappa (19 out of 40 experiments). For synthetic imbalanced data streams without concept drift, OOB had the highest average metric, whether F1-Score or Kappa. However, in the scenario of synthetic imbalanced data streams with gradual concept drift, BASWE took the lead with the highest average F1-Score and ROSE had the highest average Kappa.

Table 1. F1-Score (left) and Kappa (right) obtained by the algorithms for the eight synthetic datasets with gradual concept drift
Fig. 1. F1-Score obtained by the algorithms for the eight synthetic datasets with gradual concept drift

Figure 1 shows the F1-Score achieved by each algorithm under different imbalance ratios. The results of Kappa were also plotted but are not shown here. We can see that in several data streams, the algorithms exhibit somewhat similar behavior. However, UOB and CSARF tend to diverge more from the other algorithms as the imbalance increases. This might suggest that, even though both algorithms are designed to handle imbalanced classes and perform reasonably well in that scope, they are more sensitive to highly imbalanced data contexts, such as ratios of 97.5:2.5 or higher. This behavior is observed for both Kappa and F1-Score. In the case of UOB, which employs undersampling, the reduced performance can be attributed to its aggressive discarding of instances from the majority class at high imbalance ratios, thereby losing valuable information beneficial for learning.

Sudden Concept Drift. Table 2 shows the compilation of results measured by F1-Score and Kappa, respectively, for the experiments over the imbalanced data streams with sudden concept drift. The best algorithm in terms of F1-Score was CALMID, which achieved the ‘Total Best’ in 16 experiments. In terms of Kappa, the top spot went to ROSE, also with a ‘Total Best’ value of 16 experiments. Both algorithms, CALMID and ROSE, implement explicit techniques to handle concept drift. These performances highlight that, in the specific scenario of sudden concept drift, the presence of dedicated mechanisms for detecting concept drifts makes a significant difference, ensuring the best performance in our experiments. This distinction was not as pronounced in experiments with gradual concept drift, where BASWE, which uses an implicit approach to handle concept drift, secured the highest ‘Total Best’ values, for both F1-Score and Kappa.

Table 2. F1-Score (left) and Kappa (right) obtained by the algorithms for the eight synthetic datasets with sudden concept drift

BASWE, while not securing top performance, still showed strong results, tying for the second-highest ‘Total Best’ value in F1-Score (11 out of 40 experiments) and placing second in Kappa (14 out of 40 experiments). This suggests that, even with its implicit approach to concept drift, BASWE exhibited notable robustness within this data scope.

Plots for F1-Score and Kappa for different imbalance rates, analogous to those in Fig. 1, were also analyzed under the sudden concept drift scenario but are not shown here. It was observed that, similar to the experiments with gradual concept drift, UOB deteriorates more quickly as the imbalance increases. CSARF, which previously showed rapid deterioration like UOB in the gradual concept drift experiments, demonstrated more robustness in the experiments with sudden concept drift. This can be attributed to mechanisms designed to explicitly handle the occurrence of concept drift.

4.7 Real Datasets

In this section, we discuss the results from 16 experiments on real data streams, some of which exhibit concept drifts and feature various imbalance ratios. Unlike synthetic data streams, which are generated based on specific probability distributions and are thus well-behaved, real data streams might not follow these distributions. This can introduce unique challenges, such as data chunks that contain instances of only one class.

Table 3. F1-Score (left) and Kappa (right) obtained by the algorithms for the sixteen real datasets

Table 3 shows the F1-Score and Kappa achieved by the algorithms across all real datasets. BASWE delivered the highest Total Best value of F1-Score, achieving this in 8 of the 16 real datasets. In terms of Kappa, there was a tie among the CSARF, ROSE, and BASWE algorithms, each with a Total Best value of 5 experiments. Just as observed in the experiments with synthetic data streams, SMOTE-OB exhibited a high computational cost and was unable to complete the experiments within the 60-minute time constraint for 5 of the 16 datasets.

Table 4. ‘Total Best’ of F1-Score for total experiments
Table 5. ‘Total Best’ of Kappa for total experiments

4.8 Discussion

Tables 4 and 5 present the overall performance of the algorithms measured by F1-Score and Kappa, respectively. Experimental results suggest that the BASWE is a promising choice for binary classification tasks involving class imbalance with and without concept drift. BASWE achieved superior results when measured by F1-Score, achieving the best performance or equivalent to the best in 48 out of 136 general experiments: 15 out of 40 in imbalanced data streams without concept drift, 14 out of 40 for the set with gradual concept drift, 11 out of 40 for the set with sudden concept drift, and 8 out of 16 for the real datasets. Similarly, it attained superior results when measured by the Kappa statistic, achieving the best performance or equivalent to the best in 58 out of 136 general experiments: 20 out of 40 in imbalanced data streams without concept drift, 19 out of 40 for the set with gradual concept drift, 14 out of 40 for the set with sudden concept drift, and 5 out of 16 for the real datasets.

The experimental data highlighted in Sect. 4.6 suggest that the effectiveness of BASWE decreases in the presence of sudden concept drifts. In such cases, the algorithms that recorded the highest ‘Total Best’ scores for each metric (CALMID for F1-Score and ROSE for Kappa) utilize drift detection mechanisms, which directly address concept drift. This could suggest that BASWE’s implicit method of handling concept drift may be less effective during rapid shifts in data distribution, resulting in a slower adaptation to these changes.

5 Conclusion

This paper presents BASWE, an algorithm tailored specifically for binary classification in imbalanced data streams with and without concept drift. BASWE incorporates sliding windows paired with class-specific oversampling. Rather than processing all data uniformly, BASWE prioritizes the minority class instances, ensuring their proportional representation within the model. Class-specific oversampling entails replicating instances of the minority class, effectively countering the prevalent class imbalance.

Furthermore, BASWE leverages Balanced Accuracy for ensemble updates and vote abstention. This not only ensures that the algorithm remains unbiased towards the majority class, offering a fair evaluation of its performance across both classes, but also measures the quality of experts within the ensemble. By evaluating and subsequently updating or replacing underperforming experts based on Balanced Accuracy, BASWE implicitly addresses concept drift over time. This continuous updating also enhances the diversity within the ensemble, which is further augmented by selecting a random subset of features for each expert in the ensemble.

Its consistently high performance across a variety of datasets, coupled with its robustness to different data conditions, makes BASWE a versatile and reliable choice for handling classification tasks in imbalanced data streams, with or without concept drift.