Growing Self-Organizing Maps for Multi-label Classification

Henrique Casarotto, Pedro; Cerri, Ricardo

doi:10.1007/978-3-031-79032-4_3

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 15413))

Included in the following conference series:

Brazilian Conference on Intelligent Systems

399 Accesses
1 Citation

Abstract

In Machine Learning, multi-label classification is the problem of simultaneously classifying an instance into two or more labels. It is a challenging problem since each label has its specialty, and correlations between them must be considered. A Self-Organizing Map (SOM) is a Neural Network where neurons organized in a grid are tuned to represent the input instances in self-organization. After tuning, similar instances in the input space are mapped to closer neurons in the grid. SOMs have already been used for multi-label problems, obtaining competitive results with other methods. However, the static nature of their grid of neurons is a disadvantage since it is difficult to define the optimized grid size for each problem. The Growing Self-Organizing Maps (GSOM) extends the SOMs, allowing the network to grow during execution based on the data characteristics. This paper proposes a GSOM to predict multi-label data. The experiments showed that GSOM obtained better or more competitive results in most of the datasets investigated compared to SOM and had a competitive performance compared to other methods.

Access provided by University of Notre Dame Hesburgh Library. Download conference paper PDF

Incorporation of Neighborhood Concept in Enhancing SOM Based Multi-label Classification

Improving Self-Organizing Maps with Unsupervised Feature Extraction

Enhancing Visual Clustering Using Adaptive Moving Self-Organizing Maps (AMSOM)

1 Introduction

In the Machine Learning context, the most common classification problem is called single-label. This problem involves classifying a data input in a single class l from a set of classes L, where $|L| > 1$. If the data input is expected to be classified over two or more classes simultaneously, then the problem is considered a multi-label classification problem. In the literature, this problem is presented over diverse areas, such image classification and recognition [8, 22], text and document classification [9, 12], music classification [33, 40], in biomedicine, such in genome classification [39, 42] and protein function prediction [29, 30].

There are many strategies to deal with multi-label problems, divided into two main approaches. The first is called the Local Approach, in which traditional classification algorithms independently predict each label. The other strategy is Global, which considers label correlations by creating specific algorithms to deal with the problem and all labels simultaneously. A hybrid of these two approaches is also possible [19].

Colombini et al. [17] proposed using Self-Organizing Maps [21] to solve the multi-label problems. Self-Organizing Maps (SOM), or Kohonen Maps, is an unsupervised neural network model containing a one-layer grid of neurons. Each neuron has a discriminant function measuring the similarity of the neuron weights to an input instance. In addition, Colombini et al. associated a prototype vector to each neuron, which is the mean of the input label vectors of the instances mapped to the neuron. When the model receives a new instance, it is compared with all weights of the neurons of the grid. The neuron that shows more similarity with the instance is selected as the winner neuron, and his prototype vector is returned as the prediction for that instance.

The Growing Self-Organizing Maps (GSOM) [6] are an extension of SOMs in which the number of neurons in the grid increases during training. The increase rate is established by the Growth Threshold, a parameter that compares the total error of the neuron with the input data. A new neuron is created if the error value exceeds the Growth Threshold. The benefit of this adaptation is that the number of neurons is more precise, and thus, the results are better.

In this paper, we propose a method for multi-label classification using Growing Self-Organizing Maps. The method is called Growing Self-Organizing Maps Multi-label Learning (GSOM-MLL), and improves the SOM-MLL proposed in the work of Colombini et al. [17].

The remainder of this paper is organized as follows. Section 2 contains a literature review of the most common algorithms for multi-label problems. Section 3 presents an overview of the Kohonen Maps and introduces the SOM-MLL proposed by Colombini et al. [17]. Our proposal, GSOM-MLL, is presented in Sect. 4, while our methodology is presented in Sect. 5. Section 6 presents and discusses our experiments and results. Finally, Sect. 7 concludes the paper and points to future work.

2 Multi-label Classification

This section presents the two main approaches to classifying multi-label data: Local (Problem Transformation) and Global (Algorithm Adaptation).

2.1 Local Approach

The local approach reduces the original problem into many different single-label problems. Each problem is individually solved, and the corresponding predictions are combined to form the final multi-label classification. The simplest local-based strategy is called Binary-Relevance (BR) [35] in which L binary classifiers are trained, each associated with one of the L labels. One advantage of this strategy is that the classifiers can be run independently from each other (in parallel, for example), and the peculiarities of each label independently influence its corresponding classifier. Another advantage is that traditional algorithms can be used. However, the disadvantage is that this strategy ignores the correlations between labels, a fundamental characteristic of multi-label problems.

Many proposed improvements in the BR algorithm have been explored over the last few years. Classifier Chains (CC) [27] modifies BR, adding a dependency mechanism in the labels. This is done by a chain that randomly connects the binary classifiers, considering the previous labels/predictions as input for training a classifier in the chain. It has the advantage of maintaining BR’s simplicity while learning the labels’ relationships. A similar strategy is Meta Staking (MS) [20], which indirectly considers these relations by adding the number of examples based on the previous classification. In Cherman et al. [15], BR was explored using decision trees and Naïve Bayes to consider label relations. Dembczyński et al. [14] is another approach that also uses the Naïve Bayes algorithm to optimize predictions. The BRkNN [32] is another approach that uses the kNN algorithm conceptually equivalent to BR, with the advantage of being faster.

Another common local-based strategy is called Label-Powerset (LP) [11] in which all classes assigned to each instance are turned into a new and unique class. This is a form to consider the correlations between classes, something BR could not do. However, this method can escalate quickly and create many classes with few positive instances. A modified version called RAndom k-LabELsets (RAKEL) [37] creates combinations of Label-Powerset classifiers iteratively, which considers label correlations without escalating the problem.

Pruned Sets (PS) [26] is a strategy that identifies the relation of the labels and eliminates the ones with less usage, grouping them based on their density and frequency while trying not to lose data. The Ensemble of Pruned Sets (EPS) is also presented as a version of the algorithm that avoids the model overfitting.

2.2 Global Approach

The global approach creates classifiers that take into account label dependency in a direct way, dealing with all labels simultaneously. It uses only one classifier to predict the L labels of the problem. The advantage is that it considers the relationships between labels and is usually faster to execute. However, the approach does not give too much weight to each individual label’s peculiarity. Additionally, the algorithm is more difficult to parallelize, bringing more computational cost to the algorithms and spending more training time.

One of the first global-based strategies is C4.5M [16], a decision-tree-based method created by adapting the C4.5 algorithm to multi-label problems. This was done by altering the entropy formula. The algorithm was also changed to represent a set of labels in their tree leaf nodes.

The MLkNN method [41, 43] is based on the kNN algorithm where, for each instance, the K closest neighbors classes are compared. Based on this, the instance is labeled. In Zhang and Zhou [42], the Back-propagation Multi-Label Learning was proposed, in which a multi-label error measure is proposed and used in the output layer of an artificial neural network.

In Tsoumakas et al. [34], the Hierarchy Of Multi-label classifiERs (HOMER) is proposed to deal with large label sets, focusing on being effective and computationally efficient. This method uses a “divide and conquer” label hierarchy strategy. It was able to have linear training and logarithmic testing complexity.

Colombini et al. [17] proposed SOM-MLL (Self-Organizing Maps Multi-label Learning), implementing Kohonen Maps to predict multi-label data. This algorithm was also used by Alshanqiti and Namoun [7] to compare the method with other baseline models, like linear regression and matrix factorization. In Saini et al. [28], a different structure of SOM for multi-label data was presented and called ML-SOM, where the label vector is determined using the closest and neighboring neurons, obtaining better results compared to SOM-MLL in some cases.

3 Multi-label Classification with Self-Organizing Maps

This section provides an overview of Self-Organizing Maps (Kohonen Maps) and the Self-Organizing Map Multi-Label Learning (SOM-MLL) [17].

3.1 Self-Organizing Maps

Self-organizing maps (SOM) [21] are unsupervised neural networks with a fixed-size bi-dimensional grid of neurons. When an instance is mapped to the grid, its best-matching neuron is selected as the winning neuron, which strengthens its proximity to the instance. The mapping process is illustrated in Fig. 1.

The training starts by calculating the Euclidean distance between the input instance and all neuron weight vectors. The winning neuron is the one with the minimum distance according to Eq. 1. In this equation, $d_j$ is the distance between the input data $\textbf{x}$ and the neuron j, with A the dimension of the input instance. Each neuron weight vector is initialized with small random values.

$$\begin{aligned} d_j (x) = \sqrt{\sum _{i=1}^{A} {(x_i - w_{j,i})^2}} \end{aligned}$$

(1)

As the winner neuron is defined, the value of its weights is updated based on the input data and the weights of the neurons in its neighborhood, according to Eq. 2. The hyperparameter $\alpha $ is the learning rate of the model, which typically decreases with training time.

$$\begin{aligned} w_j (t+1) = w_j(t) + \alpha (t) \times h_{j,i} \times (\textbf{x} - w_j(t)) \end{aligned}$$

(2)

The $h_{j,i}$ is the influence that the winning neuron j has in its neighborhood of neurons. The most commonly used formula is the Gaussian, which can be seen in Eq. 3. The $\sigma $ hyperparameter defines the influence’s size, i.e., the neighborhood width.

$$\begin{aligned} h_{j,i} = \exp \biggl (\, -\frac{d_{j}^2(x)}{2\sigma ^2} \biggl ) \end{aligned}$$

(3)

The Self-Organizing Map Multi-label Learning (SOM-MLL) is trained similarly to the common SOM. The difference is that in the prediction process, multiple labels are considered instead of only one label.

3.2 Predicting New Data

In the SOM-MLL, every instance $\textbf{x}_i$ is associated with a binary vector $\boldsymbol{v}_i$, which has the value 1 in position j if the instance is classified in class $c_j$, and 0 otherwise. After training, these binary vectors are used to construct a prototype vector $\overline{\boldsymbol{v}}_i$ for each neuron. The prototype of a neuron n is the column-wise average of the matrix formed by all binary label vectors associated with the training instances that had the neuron n as their winning neuron. This calculation is presented in Eq. 4, where ${S_{n,j}}$ is the set of training instances mapped to the neuron n classified in the class $c_j$ and ${S_n}$ is the total instances mapped to neuron n.

$$\begin{aligned} \overline{\textbf{v}}_{i,j} = \frac{|S_{n,j}|}{|S_n|} \end{aligned}$$

(4)

The prototype vector represents the probability that the mapped instance $x_i$ will be classified in each of the problem’s L labels. A threshold is applied to the prototype vector to obtain a final binary prediction. The common threshold used is 0.5, so that if $\overline{\textbf{v}}_{i,j}$ is greater or equal to 0.5, $x_i$ is classified in class $c_j$. Figure 2 shows an example of applying a threshold to a prototype vector.

4 Growing Self-Organizing Maps for Multi-label Classification

This section introduces the Growing Self-Organizing Maps (GSOM) and our proposal for a multi-label classification method called Growing Self-Organizing Maps Multi-Label Learning (GSOM-MLL).

4.1 Growing Self-Organizing Maps

The Growing Self-Organizing Maps [6] is an extension of the Kohonen Maps able to increase its number of neurons during training based on the characteristics of the input data. The algorithm has two phases: the Growing phase, where new neurons are added to the map, and the Smooth phase, where the weights of neurons are faceted by their neighbors. The algorithm has a hyperparameter called Spread Factor SF, $0 \le SF \le 1$, which controls the number of grid nodes. Lower SF values create more populated maps, while higher values create more concentrated maps.

4.2 Growing Phase

The growing phase is responsible for creating new nodes on the map by means of a Growth Threshold (GT), a Spread Factor (SF), and the dimension D of the training data. This can be seen in Eq. 5.

$$\begin{aligned} GT = - D \times \log {SF} \end{aligned}$$

(5)

The Euclidean Distance (Eq. 1) is used for each input training instance to find the corresponding closest neuron. After this winning neuron i is found, the weights are adjusted for all neurons j of the grid in the Neighborhood Influence ($h_{j,i}$) of neuron i. This adjustment procedure is the same one used in the original SOM (Eqs. 1, 2 and 3).

As an extension of SOMs, each neuron in GSOM has an associated error value E used to compute the accumulated error in the neuron. This is shown in Eq. 6, where $\textbf{x}$ is the input sample, and A is the number of attributes.

$$\begin{aligned} E_j(t+1) = E_j(t) + \sqrt{\sum _{i=1}^{A} {(x_i - w_{j,i})^2}} \end{aligned}$$

(6)

The largest error value E is called $H_{E_{rr}}$, and is updated every time a new input instance is given to the model. If $H_{E_{rr}} > GT$, the model creates a new node and sets $H_{E_{rr}} = GT/2$.

4.3 Node Generation

A new node will always be connected to a boundary node, a neuron with at least one of its four immediate neighboring positions free. Figure 3 shows the process of a node generation, being (i) the original grid, (ii) the grid where a neuron has an error larger than GT, and (iii) the two possible positions where a new neuron can be inserted in the grid.

The weights vector $\textbf{W}_{new}$ of a new neuron is based on the weight vectors $\textbf{W}1$ and $\textbf{W}2$ of its neighbors so that certain regions have similar characteristics. Figure 4 illustrates the four cases that dictate the weight initialization of a new neuron. All comparisons and operations in the four cases are executed individually for each weight of the vectors $\textbf{W}1$ and $\textbf{W}2$.

1.
The new neuron has two consecutive older nodes on one of its sides (Fig. 4(a)). In this case, the following rules are applied:
- If $W2 > W1$, then $W_{new} = W1 - (W2 - W1)$;
- If $W1 > W2$, then $W_{new} = W1 + (W1 - W2)$.
2.
The new neuron is in between two older nodes (Fig. 4(b)). In this case:
- $W_{new} = \frac{W1 + W2}{2}$.
3.
The new neuron has only one direct neighbor: an older neuron (Fig. 4(c)). In this case, the following rules are applied:
- If $W2 > W1$, then $W_{new} = W1 - (W2 - W1)$;
- If $W1 > W2$, then $W_{new} = W1 + (W1 - W2)$.
4.
The new neuron has only one direct neighbor, which is an old isolated neuron (Fig. 4(d)). In this case:
- $W_{new} = (r1 + r2)/2$, where r1 and r2 are the lower and upper values of the range of the weight vector distribution.

4.4 Smoothing Phase

This phase adjusts the neuron’s weights, tuning the grid smoother. At each epoch, the learning rate $\alpha $ is decreased according to Eq. 7.

$$\begin{aligned} \alpha (e+1) = \alpha \times v (e) \times \alpha (e) \end{aligned}$$

(7)

In Alahakoon et al. [6], v is defined as shown in Eq. 8, where R is experimentally set to 3.8 in cases of a four nodes initialization grid, and N(e) the number of nodes in the beginning of that epoch.

$$\begin{aligned} v (e) = 1 - \frac{R}{N(e)} \end{aligned}$$

(8)

The neighborhood around the winning neuron must also be reduced with training time. This is guaranteed decreasing the neighborhood width according to Eq. 9. In the equation, $\sigma _0$ is the neighborhood width in the algorithm’s initialization, while $\tau $ is a time constant.

$$\begin{aligned} \sigma (t) = \sigma _0 \exp \biggl (-\frac{t}{\tau }\biggl ) \end{aligned}$$

(9)

4.5 Predicting New Data

After finishing training, each neuron is associated with a prototype vector $\overline{\boldsymbol{v}}_i$ calculated the same way as SOM-MLL using Eq. 4. A threshold is then applied to obtain a binary final prediction vector (Fig. 2).

5 Materials and Methods

This section presents the datasets used in the experiments, the baseline algorithms used for comparison, the evaluation measures employed, and the tuning procedure employed to define the hyperparameters of GSOM-MLL.

5.1 Datasets

The datasets used are available in the Mulan Repository^{Footnote 1}. Table 1 shows some characteristics of each dataset: domain, the number of instances, the number of labels and distinct combinations of labels, and the label cardinality and label density. Label Cardinality (LC) is a metric of the average number of labels per instance, while Label Density (LD) is the LC metric over the total number of labels. Equations (10) and (11) show LD and LC, where $Y_i$ is the set of positive labels for instance i, m is the total number of instances, and q is the total number of labels.

Table 1. Characteristics of the Datasets

Full size table

$$\begin{aligned} LC = \frac{1}{m} \sum _{i=1}^{m} {|Y_i|} \end{aligned}$$

(10)

$$\begin{aligned} LD = \frac{1}{m} \sum _{i=1}^{m} \frac{|Y_i|}{q} \end{aligned}$$

(11)

5.2 Baseline Classifiers

The proposed GSOM-MLL was compared with the original SOM-MLL proposed by Colombini et al. [17], where a hexagonal grid of 25 neurons was used, the learning rate was set to decrease linearly from 0.05 to 0.01 at each epoch, and the Gaussian function (Eq. 3) was used as the neighborhood with $\sigma = 2/3$. GSOM-MLL was also compared with different local and global-based methods. They are listed below.

Binary Relevance (BR) and Label Powerset (LP) [35], with the following algorithms as base classifiers: Support Vector Machine (SVM) [38], J48 decision tree induction [24] and k-Nearest Neighbors (KNN) [5];
Random k-Labelsets (Rakel) [37] and Classifiers Chains (CC) [27], both also using SVM, J48 and KNN as base classifiers;
Hierarchy Of Multi-label classifiERs (HOMER) [23], using LP and BR, with SVM, J48 and KNN as base classifiers;
Back-Propagation Multi-Label Learning (BPMLL) [42], Multi-Label k-Nearest Neighbors (MLKNN) [43], Instance-Based Learning by Logistic Regression (IBLR-ML and IBLR-ML+) [13], and Predictive Clustering Trees (Clus) [10].

All the baseline classifiers were executed within Mulan, a library for multi-label learning [36], using their default hyperparameter values. The KNN-based methods used $K = 3$. BPMLL was executed with one hidden layer with its number of neurons equal to 80% of the number of input features.

5.3 Evaluation Measures

We evaluated all classifiers using the well-known F1-Score multi-label measure [20], which is the harmonic mean of Precision and Recall. Its calculation is presented in Eq. 12, with $Z_i$ the set of predicted labels, $Y_i$ the set of true labels, and m the number of instances.

$$\begin{aligned} \text { F1-Score} = \frac{1}{m} \sum _{i=1}^m \frac{2 \times |Y_i \cap Z_i| }{|Y_i| + |Z_i|} \end{aligned}$$

(12)

5.4 Hyperparameters and Tuning

There are many GSOM implementations available in the literature. Adeu et al. [4], for example, evaluated three of them: PyGSOM Python package [3], GSOM Python package [2] and the GrowingSOM R package [1]. These are all implementations based on Alahakoon et al. [6], and according to Adeu et al., all have different implementation details focusing on specific applications that can slightly change the results. Because of this, they chose to implement their own package^{Footnote 2} focusing on their specific application. In the same direction, we created a new Python package to meet our needs here. GSOM-MLL^{Footnote 3} is an adaptation of Adeu et al. for multi-label classification.

We used the Area Under the Precision-Recall Curve (AUPRC) to find the best hyperparameter values for our method. A Precision-Recall curve is obtained by applying threshold values in the interval [0, 1] to the classifier’s outputs. After interpolation all the precision-recall points (PR-points), and posterior connecting them, we can calculate the area under the PR-curve [18].

We performed a grid search to investigate different ranges of values for the hyperparameters. For the initial learning rate, we tested 20 values ranging from 0.005 to 0.195; for the spread factor, we evaluated 20 values ranging from 0.05 to 0.95; and for the smoothing and growing factors, we tested values ranging from 1 to 5 and 1 to 10, respectively. We chose the combination of values that led to the best AUPRC value.

The Gaussian function with $\sigma = 1$ for both growing and smoothing phases was used as the neighborhood influence. After obtaining the best hyperparameter values, the threshold of 0.5 was applied to the prototype vectors in order to obtain the final binary predictions. All grids were initialized with 4 neurons.

6 Experiments and Discussion

All the experiments were performed using the iterative stratified 10-fold cross-validation proposed for multi-label data [31]. Tables 2, 3, and 4 present, respectively, the average F1-score values and standard deviations over the 10 executions for each dataset and method. The best results are highlighted in boldface.

Table 2. Comparing GSOM-MLL with the Strategies BR, LP, and CC

Full size table

Table 3. Comparing GSOM-MLL with the Methods RAKEL and HOMER

Full size table

Table 4. Comparing GSOM-MLL with SOM-MLL, MLKNN, BPMLL, IBLR and Clus

Full size table

The overall results show that GSOM-MLL is superior to SOM-MLL and competitive with the performances of the other methods investigated. The GSOM-MLL F1-score was superior to SOM-MLL, especially in the dataset flags and scene.

We can observe that GSOM-MLL performed surprisingly badly in the birds dataset. This dataset has the smallest label density, meaning it has a very sparse distribution of labels assigned for each dataset instance. Our tuning procedure may not have been effective in maximizing performance in these circumstances.

Overall, the experiments show us that dynamically increasing the neuron grid dimension is better than working with a fixed grid size. Although the F1-score results were not as satisfactory as expected, GSOM-MLL is competitive with the other methods and is a promising strategy, especially for data visualization, since we can observe how instances are grouped in the grid. We can have a 2D visualization, keeping the original properties of the input space.

Table 5 presents the mean values for each hyperparameter after the tuning over the 10-fold cross-validation procedure: the number of growing and smooth epochs, the spread factors, the learning rates, the time in seconds to run the model, and the AUPRC values.

Table 5. Mean Values of the GSOM Hyperparameters after Tuning

Full size table

Table 5 also shows that a high spread factor is preferred for most datasets, indicating a large grid instead of a grid with few neurons. Another observation is that the learning rate is well-suited to most datasets, except for birds, which is very close to the maximum value used during the tuning stage (0.195).

Table 6 shows each dataset’s final number of neurons in the grid. The results of the GSOM-MLL experiment suggest that the number of neurons in the grid significantly influenced the outcomes. This is because all the datasets in the original SOM-MLL experiment had a fixed grid of 25 neurons, while in GSOM-MLL experiment, all datasets started with a grid of 4 neurons.

Table 6. Final Number of Neurons in the GSOM-MLL Grid

Full size table

Another point to consider is that the time spent on each iteration of the algorithm correlates with the number of neurons on the final grid. This is because more neurons can lead to increased computational costs.

7 Conclusion and Future Work

In this work, we presented GSOM-MLL, a growing self-organizing map tailored for multi-label classification. The model’s overall performance proved to be an improvement over the original SOM-MLL. The high values of the spread factor show that the datasets benefit from a higher number of neurons in a dynamic grid. Unfortunately, GSOM-MLL performs poorly in datasets with balanced label cardinality and small label density with many distinct label sets. However, it shows consistent and competitive results in other scenarios.

In future works, we can explore the neighborhood’s influence on the creation of the prototype vector. Exploring different learning rate ranges also allows for analyzing the model’s performance, as many datasets have their optimal value near the limits tested when considering error rates.

Another way to expand the research would be to utilize Hierarchical Growing Self-Organizing Maps [25]. This involves creating a hierarchy of GSOMs based on a hierarchical threshold as a direct extension of SOM and GSOM.

Notes

References

GrowingSOM R package. https://github.com/alexhunziker/growingsom
GSOM - the growing self organizing map implementation on Python. https://github.com/anantadata/gsom
Pygsom - a GSOM (growing self-organizing map) implementation for Python. https://github.com/philippludwig/pygsom
Adeu, R.S., Ferreira, K.R., Andrade, P.R., Santos, L.: Evaluating growing self-organizing maps for satellite image time series clustering. In: GEOINFO, 20 Years After! p. 243 (2019)
Google Scholar
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
Article MATH Google Scholar
Alahakoon, D., Halgamuge, S.K., Srinivasan, B.: Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Trans. Neural Netw. 11(3), 601–614 (2000)
Article MATH Google Scholar
Alshanqiti, A., Namoun, A.: Predicting student performance and its influential factors using hybrid regression and multi-label classification. IEEE Access 8, 203827–203844 (2020)
Article Google Scholar
Baltruschat, I.M., Nickisch, H., Grass, M., Knopp, T., Saalbach, A.: Comparison of deep learning approaches for multi-label chest X-ray classification. Sci. Rep. 9(1), 1–10 (2019)
Article Google Scholar
Banerjee, S., Akkaya, C., Perez-Sorrosal, F., Tsioutsiouliklis, K.: Hierarchical transfer learning for multi-label text classification. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6295–6300 (2019)
Google Scholar
Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees. In: Proceedings of the Fifteenth International Conference on Machine Learning, ICML 1998, pp. 55–63. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1998)
Google Scholar
Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)
Article MATH Google Scholar
Chen, Q., et al.: Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database 2022, baac069 (2022)
Google Scholar
Cheng, W., Hüllermeier, E.: Combining instance-based learning and logistic regression for multilabel classification. Mach. Learn. 76(2–3), 211–225 (2009)
Article MATH Google Scholar
Cheng, W., Hüllermeier, E., Dembczynski, K.J.: Bayes optimal multilabel classification via probabilistic classifier chains. In: Proceedings of the 27th International Conference on Machine Learning, ICML-10, pp. 279–286 (2010)
Google Scholar
Cherman, E.A.: Aprendizado de máquina multirrótulo: explorando a dependência de rótulos e o aprendizado ativo. Ph.D. thesis, Universidade de São Paulo (2013)
Google Scholar
Clare, A., King, R.D.: Knowledge discovery in multi-label phenotype data. In: De Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 42–53. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44794-6_4
Chapter MATH Google Scholar
Colombini, G.G., de Abreu, I.B.M., Cerri, R.: A self-organizing map-based method for multi-label classification. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 4291–4298. IEEE (2017)
Google Scholar
Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: International Conference on Machine Learning, pp. 233–240 (2006)
Google Scholar
Gatto, E.C., Valejo, A.D.B., Ferrandin, M., Cerri, R.: Community detection for multi-label classification. In: Naldi, M.C., Bianchi, R.A.C. (eds.) Intelligent Systems, BRACIS 2023. LNCS, vol. 14195, pp. 78–93. Springer, Cham. https://doi.org/10.1007/978-3-031-45368-7_6
Godbole, S., Sarawagi, S.: Discriminative methods for multi-labeled classification. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 22–30. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24775-3_5
Chapter MATH Google Scholar
Kohonen, T.K.: The self-organizing map. Proc. IEEE 78(9), 1464–1480 (1990)
Article MATH Google Scholar
Li, Q., Peng, X., Qiao, Y., Peng, Q.: Learning category correlations for multi-label image recognition with graph networks. arXiv preprint arXiv:1909.13005 (2019)
Papanikolaou, Y., Tsoumakas, G., Katakis, I.: Hierarchical partitioning of the output space in multi-label data. Data Knowl. Eng. 116, 42–60 (2018)
Article MATH Google Scholar
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993)
Google Scholar
Rauber, A., Merkl, D., Dittenbach, M.: The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data. IEEE Trans. Neural Netw. 13(6), 1331–1341 (2002)
Article MATH Google Scholar
Read, J., Pfahringer, B., Holmes, G.: Multi-label classification using ensembles of pruned sets. In: 2008 8th IEEE International Conference on Data Mining, pp. 995–1000. IEEE (2008)
Google Scholar
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS (LNAI), vol. 5782, pp. 254–269. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04174-7_17
Chapter MATH Google Scholar
Saini, N., Saha, S., Bhattacharyya, P.: Incorporation of neighborhood concept in enhancing SOM based multi-label classification. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D.K., Bora, P.K., Pal, S.K. (eds.) PReMI 2019. LNCS, vol. 11941, pp. 91–99. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34869-4_11
Chapter MATH Google Scholar
Santos, B.Z., Nakano, F.K., Cerri, R., Vens, C.: Predictive bi-clustering trees for hierarchical multi-label classification. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds.) ECML PKDD 2020. LNCS (LNAI), vol. 12459, pp. 701–718. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-67664-3_42
Chapter MATH Google Scholar
Schietgat, L., Vens, C., Struyf, J., Blockeel, H., Kocev, D., Džeroski, S.: Predicting gene function using hierarchical multi-label decision tree ensembles. BMC Bioinf. 11(1), 1–14 (2010)
Article MATH Google Scholar
Sechidis, K., Tsoumakas, G., Vlahavas, I.: On the stratification of multi-label data. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part III. LNCS (LNAI), vol. 6913, pp. 145–158. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_10
Chapter MATH Google Scholar
Spyromitros, E., Tsoumakas, G., Vlahavas, I.: An empirical study of lazy multilabel classification algorithms. In: Darzentas, J., Vouros, G.A., Vosinakis, S., Arnellos, A. (eds.) SETN 2008. LNCS (LNAI), vol. 5138, pp. 401–406. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87881-0_40
Chapter MATH Google Scholar
Trohidis, K., Tsoumakas, G., Kalliris, G., Vlahavas, I.P., et al.: Multi-label classification of music into emotions. In: ISMIR, vol. 8, pp. 325–330 (2008)
Google Scholar
Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceeding of the ECML/PKDD 2008 Workshop on Mining Multidimensional Data, MMD 2008, vol. 21, pp. 53–59 (2008)
Google Scholar
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook, pp. 667–685 (2010)
Google Scholar
Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., Vlahavas, I.: MULAN: a Java library for multi-label learning. J. Mach. Learn. Res. 12, 2411–2414 (2011)
MathSciNet MATH Google Scholar
Tsoumakas, G., Vlahavas, I.: Random k-Labelsets: an ensemble method for multilabel classification. In: Kok, J.N., Koronacki, J., Mantaras, R.L., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 406–417. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74958-5_38
Chapter MATH Google Scholar
Vapnik, V.N.: The Nature of Statistical Learning Theory. Information Science and Statistics, Springer, New York (1999). https://doi.org/10.1007/978-1-4757-3264-1
Book MATH Google Scholar
Vens, C., Struyf, J., Schietgat, L., Džeroski, S., Blockeel, H.: Decision trees for hierarchical multi-label classification. Mach. Learn. 73(2), 185–214 (2008)
Article MATH Google Scholar
Wu, B., Zhong, E., Horner, A., Yang, Q.: Music emotion recognition by multi-label multi-layer multi-instance multi-view learning. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 117–126 (2014)
Google Scholar
Zhang, M.L., Zhou, Z.H.: A k-nearest neighbor based algorithm for multi-label classification. In: 2005 IEEE International Conference on Granular Computing, vol. 2, pp. 718–721. IEEE (2005)
Google Scholar
Zhang, M.L., Zhou, Z.H.: Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans. Knowl. Data Eng. 18(10), 1338–1351 (2006)
Article MATH Google Scholar
Zhang, M.L., Zhou, Z.H.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)
Article MATH Google Scholar

Download references

Acknowledgment

This study was financed by the National Council for Scientific and Technological Development (CNPq) and the São Paulo Research Foundation (FAPESP) grant #2022/02981-8.

Author information

Authors and Affiliations

Department of Computing, Federal University of São Carlos, São Carlos, SP, Brazil
Pedro Henrique Casarotto
Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, SP, Brazil
Ricardo Cerri

Authors

Pedro Henrique Casarotto
View author publications
Search author on:PubMed Google Scholar
Ricardo Cerri
View author publications
Search author on:PubMed Google Scholar

Corresponding author

Correspondence to Ricardo Cerri .

Editor information

Editors and Affiliations

Universidade Federal Fluminense, Niterói, Brazil
Aline Paes
Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
Filipe A. N. Verri

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Henrique Casarotto, P., Cerri, R. (2025). Growing Self-Organizing Maps for Multi-label Classification. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15413. Springer, Cham. https://doi.org/10.1007/978-3-031-79032-4_3

Download citation

DOI: https://doi.org/10.1007/978-3-031-79032-4_3
Published: 30 January 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79031-7
Online ISBN: 978-3-031-79032-4
eBook Packages: Computer ScienceComputer Science (R0)

Growing Self-Organizing Maps for Multi-label Classification

Abstract

Similar content being viewed by others

Incorporation of Neighborhood Concept in Enhancing SOM Based Multi-label Classification

Improving Self-Organizing Maps with Unsupervised Feature Extraction

Enhancing Visual Clustering Using Adaptive Moving Self-Organizing Maps (AMSOM)

1 Introduction

2 Multi-label Classification

2.1 Local Approach

2.2 Global Approach

3 Multi-label Classification with Self-Organizing Maps

3.1 Self-Organizing Maps

3.2 Predicting New Data

4 Growing Self-Organizing Maps for Multi-label Classification

4.1 Growing Self-Organizing Maps

4.2 Growing Phase

4.3 Node Generation

4.4 Smoothing Phase

4.5 Predicting New Data

5 Materials and Methods

5.1 Datasets

5.2 Baseline Classifiers

5.3 Evaluation Measures

5.4 Hyperparameters and Tuning

6 Experiments and Discussion

7 Conclusion and Future Work

Notes

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Keywords

Publish with us

Profiles