A unified approach for cluster-wise and general noise rejection approaches for k-means clustering

Seiki Ubukata
Graduate School of Engineering, Osaka Prefecture University, Sakai, Osaka, Japan
Corresponding author: Seiki Ubukata, subukata@cs.osakafu-u.ac.jp
Academic editor: Sebastian Ventura

Submitted 12 June 2019; Accepted 22 October 2019; Published 18 November 2019
DOI 10.7717/peerj-cs.238
Copyright 2019 Ubukata. Distributed under Creative Commons CC-BY 4.0.

ABSTRACT
Hard C-means (HCM; k-means) is one of the most widely used partitive clustering techniques. However, HCM is strongly affected by noise objects and cannot represent cluster overlap. To reduce the influence of noise objects, some noise rejection approaches, including general noise rejection (GNR) and cluster-wise noise rejection (CNR), reject objects distant from cluster centers. Generalized rough C-means (GRCM) can deal with positive, negative, and boundary belonging of objects to clusters by reference to rough set theory. GRCM realizes cluster overlap by linear function threshold-based object-cluster assignment. In this study, as a unified approach for GNR and CNR in HCM, we propose linear function threshold-based C-means (LiFTCM) by relaxing GRCM. We show that the linear function threshold-based assignment in LiFTCM includes GNR, CNR, and their combinations as well as the rough assignment of GRCM. The classification boundary is visualized so that the characteristics of LiFTCM in various parameter settings are clarified. Numerical experiments demonstrate that the combinations of rough clustering or the combinations of GNR and CNR realized by LiFTCM yield satisfactory results.

Subjects: Data Mining and Machine Learning, Data Science, Optimization Theory and Computation
Keywords: Clustering, k-means, Noise rejection, Rough set theory

INTRODUCTION

Clustering, an important task in data mining and machine learning, is a technique for automatically extracting group (cluster) structures from data without supervision. It is useful for analyzing large-scale unlabeled data. Hard C-means (HCM; k-means) (MacQueen, 1967) is one of the most widely used partitive clustering techniques. Real-world datasets often contain noise objects (outliers) with irregular features that may distort cluster shapes and deteriorate clustering performance. Since C-means-type methods are formulated based on the minimization of the total within-cluster sum-of-squared-error, they are strongly affected by noise objects, which are distant from cluster centers. We focus on two types of noise rejection, namely, general noise rejection (GNR) and cluster-wise noise rejection (CNR). In GNR approaches, whether each object is noise or not is defined with respect to the whole cluster structure: objects distant from every cluster center are rejected as noise. In CNR approaches, whether each object is noise or not is defined for each cluster: for each cluster, objects distant from its center are rejected as noise.
Both GNR and CNR perform noise rejection, but GNR performs exclusive cluster assignment whereas CNR allows cluster overlap.

HCM assigns each object to one and only one cluster with membership in the Boolean (hard; crisp) domain {0, 1}, and thus it cannot represent belonging to multiple clusters or non-belonging to any cluster. However, in real-world datasets, the belonging of objects to clusters is often unclear. Soft computing approaches are useful for representing belonging to multiple clusters or non-belonging to any cluster. Clustering based on rough set theory (Pawlak, 1982; Pawlak, 1991) considers positive, negative, and boundary belonging of objects to clusters. Lingras and West proposed rough C-means (LRCM) (Lingras & West, 2004) as a rough-set-based C-means clustering, and Peters proposed a refined version of RCM (PRCM) (Peters, 2006). Ubukata et al. proposed generalized RCM (GRCM) (Ubukata, Notsu & Honda, 2017) by integrating LRCM and PRCM. GRCM realizes cluster overlap by a linear function threshold with respect to the distance to the nearest cluster and detects the upper area, composed of objects that possibly belong to the cluster. Specifically, the threshold based on the distance to the nearest cluster center is lifted by the linear function so that each object can be assigned to relatively near clusters as well as its nearest cluster.

In this study, we investigate the characteristics of the linear function threshold-based object-cluster assignment in GRCM. We show that the linear function threshold-based assignment in relaxed GRCM can realize GNR, CNR, and their combinations as well as rough assignments. One important point is that the linear function threshold-based assignment essentially includes GNR and CNR in compliance with RCM standards without any extra formulation. As a unified approach for GNR and CNR in HCM, we propose linear function threshold-based C-means (LiFTCM) by relaxing GRCM. The classification boundary is visualized so that the characteristics of LiFTCM in various parameter settings are clarified. Numerical experiments demonstrate that the combinations of rough clustering or the combinations of GNR and CNR realized by LiFTCM yield satisfactory results.

The remainder of the paper is organized as follows. "Related Work" discusses related work. "Preliminaries" presents the preliminaries for the clustering methods. "A Unified Approach for Cluster-wise and General Noise Rejection Approaches" shows that the linear function threshold-based assignment in relaxed GRCM can realize GNR, CNR, and their combinations as well as rough assignments. "Proposed Method" proposes LiFTCM as a relaxation of GRCM. "Visualization of Classification Boundaries" considers the classification boundaries of LiFTCM with various parameter settings. "Numerical Experiments" discusses the clustering performance of LiFTCM with various parameter settings. "Discussion" discusses the calculation of the cluster center in the proposed method.
Finally, conclusions are presented in "Conclusions."

RELATED WORK

Noise rejection in regression analysis and C-means-type clustering
Many machine learning tasks, such as regression analysis, are formulated in the framework of least mean squares (LMS), proposed by Legendre or Gauss (Legendre, 1805; Gauss, 1809), which minimizes the sum of the squared residuals to fit a model to a dataset. However, since the LMS criterion is strongly affected by noise objects and lacks robustness, various robust estimation methods have been proposed to reduce the influence of noise objects. Least absolute values (LAV) (Edgeworth, 1887) is a criterion that minimizes the sum of the absolute values of the residuals to reduce the influence of large residuals. The M-estimator (Huber, 1964; Huber, 1981) is one of the most widely used robust estimators; it replaces the square function in LMS by a symmetric function with a unique minimum at zero that reduces the influence of large residuals. Least median of squares (LMedS) (Hampel, 1975; Rousseeuw, 1984) minimizes the median of the squared residuals. Least trimmed squares (LTS) (Rousseeuw & Leroy, 1987) minimizes the sum of the h smallest squared residuals.

Since C-means-type clustering methods are generally formulated based on the minimization of the within-cluster sum-of-squared-error, the above-mentioned robust estimation methods are promising approaches to noise in the cluster structure (Kaufmann & Rousseeuw, 1987; Dubes & Jain, 1988). In C-means-type clustering, the distance between an object and its nearest cluster center plays the role of the residual. Thus, in GNR, objects distant from every cluster center are rejected as noise. For instance, trimmed C-means (TCM; trimmed k-means, TKM) (Cuesta-Albertos, Gordaliza & Matrán, 1997; García-Escudero & Gordaliza, 1999) introduces the LTS criterion into HCM. TCM calculates the new cluster centers by using only the h objects with the smallest distances to their nearest cluster centers. As a result, objects more than a certain distance away are rejected as noise. Noise rejection in C-means-type clustering is also well discussed in the context of fuzzy C-means (FCM) (Dunn, 1973; Bezdek, 1981). In noise fuzzy C-means (NFCM) (Davé, 1991; Davé & Krishnapuram, 1997), a single noise cluster is introduced in addition to the intended regular clusters, and objects distant from every cluster center are absorbed into the noise cluster.

Another approach to noise is CNR. For instance, possibilistic C-means (PCM) (Krishnapuram & Keller, 1993; Krishnapuram & Keller, 1996) performs cluster-wise noise rejection, in which each cluster is extracted independently while rejecting objects distant from its center. The membership values are interpreted as degrees of possibility of the object belonging to the clusters. PCM represents typicality as absolute membership in clusters rather than relative membership by eliminating the sum-to-one constraint. Fuzzy possibilistic C-means (FPCM) (Pal, Pal & Bezdek, 1997) uses both relative typicalities (memberships) and absolute typicalities. Possibilistic fuzzy C-means (PFCM) (Pal et al., 2005) is a hybridization of FCM and PCM that uses both the probabilistic memberships of FCM and the possibilistic memberships of PCM.
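To make the trimming idea behind TCM concrete, the following is a minimal sketch of one TCM-style center update under the LTS criterion. This is an illustration only, not any authors' reference implementation; the function name and the Euclidean setting are my own choices, and h is the number of objects kept.

```python
import numpy as np

def tcm_center_update(X, centers, h):
    """One TCM-style step: keep only the h objects with the smallest
    distances to their nearest centers (the LTS idea), then recompute
    each center from the kept objects assigned to it."""
    # d[i, c] = distance of object i to center c
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)               # nearest center per object
    keep = np.argsort(d.min(axis=1))[:h]     # indices of the h best-fitted objects
    new_centers = centers.copy()
    for c in range(len(centers)):
        members = keep[nearest[keep] == c]   # kept objects whose nearest center is c
        if len(members) > 0:
            new_centers[c] = X[members].mean(axis=0)
    return new_centers
```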
In this study, we show that GNR, CNR, and their combinations are realized by the linear function threshold-based object-cluster assignment in the proposed LiFTCM. The above-mentioned approaches introduce various mechanisms to realize GNR and CNR. In contrast, the linear function threshold-based assignment essentially includes GNR, CNR, and their combinations in compliance with RCM standards without any extra formulation.

Generalized approaches to hard, fuzzy, noise, possibilistic, and rough clustering
Maji & Pal (2007a) proposed rough-fuzzy C-means (RFCM) as a hybrid algorithm of FCM and RCM. RFCM is formulated so that objects in the lower areas have crisp memberships and objects in the boundary areas have FCM-based fuzzy memberships. Furthermore, Maji & Pal (2007b) proposed rough-fuzzy possibilistic C-means (RFPCM) based on possibilistic fuzzy C-means (PFCM) (Pal et al., 2005). Masson and Denœux proposed evidential C-means (ECM) (Masson & Denœux, 2008) as one of the evidential clustering (EVCLUS) (Denœux & Masson, 2003; Denœux & Masson, 2004) methods based on the Dempster-Shafer theory of belief functions (evidence theory). Evidential clustering considers the basic belief assignment, which indicates the membership (mass of belief) of each object to each subset of clusters, with probabilistic constraints that derive a credal partition. A credal partition can represent hard and fuzzy partitions with a noise cluster by considering assignments to singletons and the empty set. Possibilistic and rough partitions are represented by using the plausibility function and the belief function (Denœux & Kanjanatarakul, 2016).

Although RFCM and RFPCM provide interesting perspectives on the handling of the uncertainty in the boundary area, their object-cluster assignment differs from that of RCM, turning them into a different type of approach. Although the credal partition in ECM has high expressiveness, covering hard, noise, possibilistic, and rough clustering, the object-cluster assignment and cluster center calculation of ECM do not boil down to those of RCM. In contrast to the above-mentioned approaches, the formulation of the proposed LiFTCM is fully compliant with RCM standards. This study reveals that RCM itself inherently includes GNR, CNR, and their combinations as well as rough clustering aspects without any extra formulation.

PRELIMINARIES

Hard C-means and noise rejection
Let U = \{x_1, \ldots, x_i, \ldots, x_n\} be a set of n objects, where each object x_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{ip})^\top is a p-dimensional real feature vector. In C-means-type methods, C (2 \le C < n) clusters, each with a cluster center b_c = (b_{c1}, \ldots, b_{cj}, \ldots, b_{cp})^\top, are extracted. Let u_{ci} be the degree of belonging of object i to cluster c, and let d_{ci} = \|x_i - b_c\| be the distance between the cluster center b_c and object i. The optimization problem of HCM (MacQueen, 1967) is given by

\min. \; J_{HCM} = \sum_{c=1}^{C} \sum_{i=1}^{n} u_{ci} d_{ci}^{2},   (1)

\text{s.t.} \; u_{ci} \in \{0, 1\}, \; \forall c, i,   (2)

\sum_{c=1}^{C} u_{ci} = 1, \; \forall i.   (3)

HCM minimizes the total within-cluster sum-of-squared-error (Eq. (1)) under the Boolean domain constraints (Eq. (2)) and the sum-to-one constraints across clusters (Eq. (3)). HCM first initializes the cluster centers and then alternately updates u_{ci} and b_c until convergence by using the following update rules:

u_{ci} = \begin{cases} 1 & \left( c = \arg\min_{1 \le l \le C} d_{li} \right), \\ 0 & (\text{otherwise}), \end{cases}   (4)

b_c = \frac{\sum_{i=1}^{n} u_{ci} x_i}{\sum_{i=1}^{n} u_{ci}}.   (5)
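As a concrete reference for Eqs. (1)-(5), here is a minimal NumPy sketch of the HCM alternating updates. The function and variable names are my own, and the random initialization corresponds to the naive strategy discussed next.

```python
import numpy as np

def hcm(X, C, max_iter=100, seed=None):
    """HCM (k-means): alternate the assignment rule of Eq. (4) and the
    center update of Eq. (5) until the memberships stop changing."""
    rng = np.random.default_rng(seed)
    n = len(X)
    b = X[rng.choice(n, size=C, replace=False)].astype(float)  # initial centers
    u = np.zeros((C, n), dtype=int)
    for _ in range(max_iter):
        d = np.linalg.norm(X[None, :, :] - b[:, None, :], axis=2)  # d_ci, shape (C, n)
        u_new = np.zeros_like(u)
        u_new[d.argmin(axis=0), np.arange(n)] = 1   # Eq. (4): assign to nearest center
        if np.array_equal(u_new, u):
            break
        u = u_new
        for c in range(C):
            if u[c].sum() > 0:
                b[c] = X[u[c] == 1].mean(axis=0)    # Eq. (5): mean of assigned objects
    return u, b
```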
There are various strategies for initializing the cluster centers. A naive strategy is to choose C objects from U as initial cluster centers by simple random sampling without replacement. Alternatively, there are strategies that set the initial cluster centers away from each other to reduce initial value dependence and improve clustering performance, such as KKZ (Katsavounidis, Kuo & Zhang, 1994) and k-means++ (Arthur & Vassilvitskii, 2007).

General noise rejection (GNR)
Since HCM is formulated based on the LMS criterion, it is strongly affected by noise objects. As in TCM, which introduces the LTS criterion, the influence of noise objects can be reduced by rejecting objects distant from every cluster. In this type of GNR, each object is assigned to its nearest cluster under the condition that the distance d_{ci} is less than or equal to a threshold (noise distance) \delta (\delta > 0):

u_{ci} = \begin{cases} 1 & \left( c = \arg\min_{1 \le l \le C} d_{li} \wedge d_{ci} \le \delta \right), \\ 0 & (\text{otherwise}). \end{cases}   (6)

The smaller \delta is, the more objects are rejected as noise. The noise distance \delta can be chosen according to how many (what percentage of) objects to reject as noise.

Cluster-wise noise rejection (CNR)
GNR is based on HCM-type exclusive assignment and cannot represent cluster overlap. By performing noise rejection independently for each cluster, possibilistic aspects that represent non-belonging to any cluster and belonging to multiple clusters are achieved. In this type of CNR, noise rejection is performed for each cluster by rejecting objects more than \delta_c distant from its center:

u_{ci} = \begin{cases} 1 & (d_{ci} \le \delta_c), \\ 0 & (\text{otherwise}). \end{cases}   (7)

The smaller \delta_c is, the more objects are rejected as noise for cluster c. The cluster-wise noise distance \delta_c can be chosen according to how many (what percentage of) objects to reject as noise for each cluster.

Generalized rough C-means
In RCM-type methods, which are rough set clustering schemes, membership in the lower, upper, and boundary areas of each cluster represents positive, possible, and uncertain belonging to the cluster, respectively (Lingras & West, 2004; Peters, 2006; Peters et al., 2013; Ubukata, Notsu & Honda, 2017). GRCM is constructed based on a heuristic scheme, not an objective function. In every iteration, the membership \overline{u}_{ci} of object i to the upper area of cluster c is first calculated as follows:

d_i^{\min} = \min_{1 \le l \le C} d_{li},   (8)

\overline{u}_{ci} = \begin{cases} 1 & (d_{ci} \le \alpha d_i^{\min} + \beta), \\ 0 & (\text{otherwise}), \end{cases}   (9)

where \alpha (\alpha \ge 1) and \beta (\beta \ge 0) are user-defined parameters that adjust the volume of the upper areas. GRCM assigns each object to the upper area of not only its nearest cluster but also other relatively nearby clusters by using a linear function of the distance to its nearest cluster as a threshold. Larger \alpha and \beta imply larger clustering roughness and larger overlap of the upper areas of the clusters.

Figure 1. GRCM: the linear function threshold T and the allowable range of d_{ci} (gray area).

Figure 1 shows the linear function threshold T and the allowable range of d_{ci} (gray area) in GRCM. The memberships \underline{u}_{ci} and \hat{u}_{ci} of object i to the lower and boundary areas, respectively, are calculated from \overline{u}_{ci} as follows:

\underline{u}_{ci} = \begin{cases} 1 & \left( \overline{u}_{ci} = 1 \wedge \sum_{l=1}^{C} \overline{u}_{li} = 1 \right), \\ 0 & (\text{otherwise}), \end{cases}   (10)

\hat{u}_{ci} = \begin{cases} 1 & \left( \overline{u}_{ci} = 1 \wedge \sum_{l=1}^{C} \overline{u}_{li} \ne 1 \right), \\ 0 & (\text{otherwise}) \end{cases}   (11)

\phantom{\hat{u}_{ci}} = \overline{u}_{ci} - \underline{u}_{ci}.   (12)
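The assignment rules of Eqs. (6)-(12) are all thresholding operations on the distance matrix and can be stated in a few lines. Below is a sketch under my own naming, with d of shape (C, n); the GRCM upper-area rule of Eq. (9) is the general form, and the lower/boundary split follows Eqs. (10)-(12).

```python
import numpy as np

def upper_memberships(d, alpha, beta):
    """Eq. (9): upper-area membership u_ci = 1 iff d_ci <= alpha * d_i^min + beta.
    alpha = 1, beta = 0 gives the HCM nearest assignment; alpha = 0,
    beta = delta_c gives the CNR rule of Eq. (7)."""
    d_min = d.min(axis=0)                     # Eq. (8): distance to nearest center
    return (d <= alpha * d_min + beta).astype(int)

def gnr_memberships(d, delta):
    """Eq. (6): nearest-center assignment, rejecting objects whose nearest
    distance exceeds the noise distance delta."""
    d_min = d.min(axis=0)
    return ((d <= d_min) & (d <= delta)).astype(int)

def lower_and_boundary(u_upper):
    """Eqs. (10)-(12): lower area = objects in exactly one upper area;
    boundary area = the remaining upper-area memberships."""
    exclusive = u_upper.sum(axis=0) == 1
    u_lower = u_upper * exclusive
    return u_lower, u_upper - u_lower
```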
GRCM represents each cluster by these three areas. Therefore, the new cluster center is determined by aggregating the centers of these areas. The cluster center b_c is calculated as a convex combination of the centers of the lower, upper, and boundary areas of cluster c:

b_c = \begin{cases} \frac{\sum_{i=1}^{n} \overline{u}_{ci} x_i}{\sum_{i=1}^{n} \overline{u}_{ci}} & \left( \sum_{i=1}^{n} \underline{u}_{ci} = 0 \vee \sum_{i=1}^{n} \hat{u}_{ci} = 0 \right), \\ \underline{w} \frac{\sum_{i=1}^{n} \underline{u}_{ci} x_i}{\sum_{i=1}^{n} \underline{u}_{ci}} + \overline{w} \frac{\sum_{i=1}^{n} \overline{u}_{ci} x_i}{\sum_{i=1}^{n} \overline{u}_{ci}} + \hat{w} \frac{\sum_{i=1}^{n} \hat{u}_{ci} x_i}{\sum_{i=1}^{n} \hat{u}_{ci}} & (\text{otherwise}), \end{cases}   (13)

\underline{w}, \overline{w}, \hat{w} \ge 0,   (14)

\underline{w} + \overline{w} + \hat{w} = 1,   (15)

where \underline{w}, \overline{w}, and \hat{w} are user-defined parameters that represent the impact of the centers of the lower, upper, and boundary areas, respectively. Ubukata, Notsu & Honda (2017) suggest \hat{w} = 0 because the centers of the boundary areas tend to cause instability in the calculations and poor classification performance.

A UNIFIED APPROACH FOR CLUSTER-WISE AND GENERAL NOISE REJECTION APPROACHES

In this section, we show that GNR, CNR, and their combinations are realized by the linear function threshold in relaxed GRCM. Here, we consider relaxing the condition \alpha \ge 1 to \alpha \ge 0 in Eq. (9).

HCM
In HCM, each object is assigned to the cluster whose center is nearest to the object. This assignment can be interpreted as assigning object i to cluster c if d_{ci} is equal to (or less than) d_i^{\min}, that is,

\overline{u}_{ci} = \begin{cases} 1 & (d_{ci} \le d_i^{\min}), \\ 0 & (\text{otherwise}). \end{cases}   (16)

This is the case \alpha = 1 and \beta = 0 in the linear function threshold \alpha d_i^{\min} + \beta for the assignment of the upper area in GRCM (Eq. (9)).

Figure 2. The linear function threshold T and the allowable range of d_{ci} (gray area): (A) HCM, (B) GNR, and (C) CNR.

Figure 2A shows the linear function threshold T and the allowable range of d_{ci} in HCM. The allowable range is limited to the case d_{ci} = d_i^{\min}. We note that if there are multiple nearest cluster centers for an object, HCM requires certain tie-breaking rules to satisfy the sum-to-one constraints, such as exclusive assignment based on cluster priority or uniform assignment by distributing the membership, depending on the implementation. However, in the present linear function threshold-based assignment, an object has membership 1 with respect to all of its nearest clusters.

The calculation of u_{ci} in HCM can be represented by that of \overline{u}_{ci} in GRCM. The lower and boundary areas are not used in HCM. Thus, the cluster center calculation of HCM is consistent with that of GRCM using only the upper areas, that is, \overline{w} = 1 in Eq. (13). Therefore, GRCM(\alpha = 1, \beta = 0, \overline{w} = 1) represents HCM.

GNR
In GNR, the condition that the distance is less than or equal to the noise distance \delta is imposed in addition to the threshold-based HCM assignment (Eq. (16)) to reject noise objects more than \delta distant from every cluster. For object i to be assigned to cluster c, d_{ci} must be equal to (or less than) d_i^{\min} and also less than or equal to the noise distance \delta, that is,

\overline{u}_{ci} = \begin{cases} 1 & (d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta), \\ 0 & (\text{otherwise}). \end{cases}   (17)

This assignment can also be approximated by the linear function threshold by setting \alpha = \frac{\delta - \varepsilon}{\delta} and \beta = \varepsilon, where \varepsilon \to +0, that is,

\overline{u}_{ci} = \begin{cases} 1 & \left( d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon \right), \\ 0 & (\text{otherwise}). \end{cases}   (18)

Equation (18) implies that \overline{u}_{ci} = 1 if d_{ci} \le d_i^{\min} and d_{ci} \le \delta. Thus, Eq. (18) approaches the update rule Eq. (17).
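Before the formal argument, the equivalence is easy to check numerically. Below is a quick sketch with arbitrary toy values; eps and delta are my choices, and with continuous random distances, ties at the threshold are negligible.

```python
import numpy as np

eps, delta = 1e-9, 0.35
rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, size=(3, 1000))   # toy distances d_ci for C = 3
d_min = d.min(axis=0)

rule_17 = (d <= d_min) & (d <= delta)               # Eq. (17)
rule_18 = d <= (delta - eps) / delta * d_min + eps  # Eq. (18)
print(np.array_equal(rule_17, rule_18))             # True for small enough eps
```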
In order to show that Eqs. (17) and (18) are equivalent, we show that the condition d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta and the condition d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon are equivalent under the conditions \delta > 0 and \varepsilon \to +0.

Proposition 1. If d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta, then d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon.

Proof.
(1) d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta (assumption)
(2) d_{ci} \le d_i^{\min} (conjunction elimination: (1))
(3) d_{ci} \le \delta (conjunction elimination: (1))
(4) d_i^{\min} \le d_{ci} (definition: Eq. (8))
(5) d_i^{\min} \le \delta (transitivity: (3), (4))
(6) \frac{\varepsilon}{\delta} d_i^{\min} + \frac{\delta - \varepsilon}{\delta} d_i^{\min} \le \frac{\varepsilon}{\delta} \delta + \frac{\delta - \varepsilon}{\delta} d_i^{\min} (multiply both sides of (5) by \frac{\varepsilon}{\delta} and add \frac{\delta - \varepsilon}{\delta} d_i^{\min})
(7) d_i^{\min} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon (simplification: (6))
(8) d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon (transitivity: (2), (7)) □

Proposition 2. If d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon, then d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta, under the condition that \varepsilon is sufficiently small.

Proof.
(1) d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon (assumption)
(2) d_{ci} \le d_i^{\min} (from (1) and \varepsilon \to +0)
(3) d_i^{\min} \le d_{ci} (definition: Eq. (8))
(4) d_{ci} \le \frac{\delta - \varepsilon}{\delta} d_{ci} + \varepsilon (from (1), (3))
(5) \delta d_{ci} \le \delta d_{ci} - \varepsilon d_{ci} + \delta \varepsilon (multiply both sides of (4) by \delta)
(6) d_{ci} \le \delta (simplification: (5))
(7) d_{ci} \le d_i^{\min} \wedge d_{ci} \le \delta (conjunction introduction: (2), (6)) □

Hence, Eq. (17) induces Eq. (18), and vice versa.

Figure 2B shows the linear function threshold T and the allowable range of d_{ci} (gray area) in GNR. Since the intersection of the two lines y = \frac{\delta - \varepsilon}{\delta} d_i^{\min} + \varepsilon and y = d_i^{\min} is (\delta, \delta), if d_{ci} > \delta, object i is never assigned to cluster c. If d_i^{\min} \le \delta, the threshold approaches the HCM-based nearest assignment. These characteristics are consistent with those of GNR. As in HCM, in GNR, the cluster centers are calculated using only the upper areas. Therefore, GRCM(\alpha = \frac{\delta - \varepsilon}{\delta}, \beta = \varepsilon, \overline{w} = 1) represents GNR.

CNR
The object-cluster assignment of CNR is determined only by the magnitude relation between d_{ci} and \delta_c, without considering d_i^{\min}. We note that the case \alpha = 0 and \beta = \delta_c in Eq. (9) corresponds to the update rule Eq. (7) of CNR. Figure 2C shows the linear function threshold T and the allowable range of d_{ci} (gray area) in CNR. Independent of d_i^{\min}, if d_{ci} \le \delta_c, object i is assigned to cluster c. As in HCM and GNR, in CNR, the cluster centers are calculated using only the upper areas. Therefore, GRCM(\alpha = 0, \beta = \delta_c, \overline{w} = 1) represents CNR.

Smooth transition between GNR and CNR by tuning the linear function threshold
In reference to the threshold-based assignment of GNR, i.e., Eq. (18), we construct the following rule using a parameter t \in [0, \delta_c]:

\overline{u}_{ci} = \begin{cases} 1 & \left( d_{ci} \le \frac{\delta_c - t}{\delta_c} d_i^{\min} + t \right), \\ 0 & (\text{otherwise}). \end{cases}   (19)

If t = 0, then Eq. (19) reduces to Eq. (16) of HCM. If t = \varepsilon, where \varepsilon \to +0, then Eq. (19) changes to Eq. (18) of GNR. If t = \delta_c, then Eq. (19) reduces to Eq. (7) of CNR. If t \in (0, \delta_c), then Eq. (19) represents combinations of GNR and CNR. Thereby, a smooth transition between HCM, GNR, and CNR is realized.

Figure 3. Combination of GNR and CNR: the linear function threshold T and the allowable range of d_{ci} (gray area).

Figure 3 shows the linear function threshold T and the allowable range of d_{ci} (gray area) in the combinations of GNR and CNR. It can be seen that this linear function transitions between the states shown in Fig. 2 as t varies. For practical use, we consider the normalized parameter z \in [0, 1]. We let z = \frac{t}{\delta_c} \in [0, 1] and replace t in Eq. (19) with z \delta_c:

\overline{u}_{ci} = \begin{cases} 1 & (d_{ci} \le (1 - z) d_i^{\min} + z \delta_c), \\ 0 & (\text{otherwise}). \end{cases}   (20)

Then, z = 0 represents HCM, z \to +0 represents GNR, z \in (0, 1) represents the combinations of GNR and CNR, and z = 1 represents CNR. By Eq. (20), the threshold value is represented as a convex combination of d_i^{\min} and \delta_c. That is, HCM, GNR, and CNR can be characterized by which of d_i^{\min} and \delta_c is emphasized in the threshold value.
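In code, the whole transition of Eq. (20) is a one-line change of threshold. A sketch with my own naming and toy values:

```python
import numpy as np

def transition_memberships(d, z, delta_c):
    """Eq. (20): u_ci = 1 iff d_ci <= (1 - z) * d_i^min + z * delta_c.
    z = 0 is HCM, z -> +0 approximates GNR, z = 1 is CNR, and
    intermediate z blends general and cluster-wise noise rejection."""
    d_min = d.min(axis=0)
    return (d <= (1.0 - z) * d_min + z * delta_c).astype(int)

# Toy distances for C = 2 clusters and n = 3 objects:
d = np.array([[0.2, 1.5, 0.4],
              [0.9, 0.3, 0.5]])
for z in (0.0, 0.001, 0.5, 1.0):
    print(z, transition_memberships(d, z, delta_c=0.45).tolist())
```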
PROPOSED METHOD

In this study, we propose LiFTCM as a relaxation of GRCM. "LiFT" is an acronym for "linear function threshold" and suggests that the threshold is lifted by the linear function. A sample procedure of LiFTCM is described in Algorithm 1. Although this algorithm merely corresponds to the case where the condition \alpha \ge 1 in GRCM is relaxed to \alpha \ge 0, LiFTCM can represent GNR, CNR, and their combinations in addition to GRCM. If 0 \le \alpha \le 1, LiFTCM includes HCM, GNR, CNR, and their combinations. If \alpha \ge 1, LiFTCM includes HCM, LRCM, PRCM, and their combinations.

Algorithm 1 (LiFTCM)
Step 1. Determine \alpha (\alpha \ge 0), \beta (\beta \ge 0), and \underline{w}, \overline{w}, \hat{w} \ge 0 such that \underline{w} + \overline{w} + \hat{w} = 1.
Step 2. Initialize b_c.
Step 3. Calculate \overline{u}_{ci} using Eqs. (8) and (9).
Step 4. Calculate \underline{u}_{ci} and \hat{u}_{ci} using Eqs. (10) and (11).
Step 5. Calculate b_c using Eq. (13).
Step 6. Repeat Steps 3-5 until the \overline{u}_{ci} do not change.

Table 1 summarizes the relationships between HCM, GNR, CNR, and rough clustering, and their combinations, depending on the values of the parameters \alpha and \beta in LiFTCM.

Table 1. Relationship between HCM, GNR, CNR, and rough clustering, and their combinations, in terms of the linear function threshold \alpha d_i^{\min} + \beta in LiFTCM.

                   β = 0    β → +0   0 < β
    α = 0          –        –        CNR
    0 < α < 1      –        GNR      Combinations of GNR and CNR
    α = 1          HCM      HCM      LRCM
    1 < α          PRCM     PRCM     Combinations of LRCM and PRCM (GRCM)

Since it is difficult to adjust the noise sensitivity by directly changing \alpha and \beta when noise rejection is intended in LiFTCM, it is convenient to fix the cluster-wise noise distance \delta_c and adjust the combination of HCM, GNR, and CNR by the parameter z \in [0, 1] with \alpha = 1 - z and \beta = z\delta_c. The representations of the conventional methods by parameter settings of LiFTCM are summarized as follows:
1. HCM: LiFTCM(\alpha = 1, \beta = 0, \overline{w} = 1).
2. LRCM: LiFTCM(\alpha = 1, \beta \ge 0, \overline{w} = 0).
3. PRCM: LiFTCM(\alpha \ge 1, \beta = 0, \hat{w} = 0).
4. GRCM: LiFTCM(\alpha \ge 1, \beta \ge 0).
5. GNR: LiFTCM(\alpha = 1 - z, \beta = z\delta_c, z \to +0, \overline{w} = 1).
6. CNR: LiFTCM(\alpha = 0, \beta = \delta_c, \overline{w} = 1).
7. Combinations of GNR and CNR: LiFTCM(\alpha = 1 - z, \beta = z\delta_c, z \in [0, 1], \overline{w} = 1).

VISUALIZATION OF CLASSIFICATION BOUNDARIES

In this section, we visualize the classification boundaries of the proposed LiFTCM. LiFTCM was applied to a grid-point dataset in which n = 100 × 100 objects are uniformly arranged in the unit square [0, 1] × [0, 1]. C = 3 clusters (c = 1, 2, 3), corresponding to the primary colors (red, green, blue), respectively, are extracted by LiFTCM. The RGB color of object i is determined by (R, G, B)_i = (255 × \overline{u}_{1i}, 255 × \overline{u}_{2i}, 255 × \overline{u}_{3i}). Objects belonging to a single cluster are represented by primary colors, objects belonging to multiple clusters are represented by additive colors, and objects not belonging to any cluster are represented by black. The cluster centers are indicated by cross marks. The initial cluster centers were set to b_1 = (0, 0)^\top, b_2 = (0.5, 1)^\top, and b_3 = (1, 0)^\top.

Figure 4. Classification boundaries of LiFTCM(\alpha \ge 1, \beta \ge 0, \overline{w} = 1) representing LRCM, PRCM, and GRCM assignments: (A) LiFTCM(\alpha = 1, \beta = 0.1, \overline{w} = 1) (LRCM assignment), (B) LiFTCM(\alpha = 1.4, \beta = 0, \overline{w} = 1) (PRCM assignment), and (C) LiFTCM(\alpha = 1.4, \beta = 0.1, \overline{w} = 1) (GRCM assignment).
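The grid-point visualization can be reproduced with a direct implementation of Algorithm 1. The following NumPy sketch is one possible reading of it (all names are mine); the center update implements Eq. (13), falling back to the upper-area center when the lower or boundary area is empty.

```python
import numpy as np

def liftcm(X, b_init, alpha, beta, w_low=0.0, w_up=1.0, w_bnd=0.0, max_iter=100):
    """LiFTCM (Algorithm 1). Returns the upper-area memberships (C, n)
    and the final cluster centers (C, p)."""
    b = np.asarray(b_init, dtype=float).copy()
    C = len(b)
    u = np.zeros((C, len(X)), dtype=int)
    for _ in range(max_iter):
        d = np.linalg.norm(X[None, :, :] - b[:, None, :], axis=2)
        u_new = (d <= alpha * d.min(axis=0) + beta).astype(int)  # Eqs. (8)-(9)
        if np.array_equal(u_new, u):                             # Step 6
            break
        u = u_new
        exclusive = u.sum(axis=0) == 1
        u_low, u_bnd = u * exclusive, u * ~exclusive             # Eqs. (10)-(11)
        for c in range(C):
            if u[c].sum() == 0:
                continue                                         # keep previous center
            center_up = X[u[c] == 1].mean(axis=0)
            if u_low[c].sum() == 0 or u_bnd[c].sum() == 0:       # Eq. (13), first case
                b[c] = center_up
            else:                                                # Eq. (13), second case
                b[c] = (w_low * X[u_low[c] == 1].mean(axis=0)
                        + w_up * center_up
                        + w_bnd * X[u_bnd[c] == 1].mean(axis=0))
    return u, b

# Grid-point demo in the spirit of Figs. 4 and 5, with the initial centers above:
g = np.linspace(0.0, 1.0, 100)
X = np.array([[x, y] for y in g for x in g])
b0 = [[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]]
u, b = liftcm(X, b0, alpha=1.4, beta=0.1)    # GRCM-like assignment (cf. Fig. 4C)
rgb = (255 * u.T).astype(np.uint8)           # color rule (R,G,B)_i = 255 * u_ci
```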
Figure 4 shows the results of LiFTCM(\alpha \ge 1, \beta \ge 0, \overline{w} = 1), which corresponds to GRCM(\overline{w} = 1). Figure 4A shows the result of LiFTCM(\alpha = 1, \beta = 0.1, \overline{w} = 1), which is interpreted as the LRCM assignment. Figure 4B shows the result of LiFTCM(\alpha = 1.4, \beta = 0, \overline{w} = 1), which is interpreted as the PRCM assignment. Figure 4C shows the result of LiFTCM(\alpha = 1.4, \beta = 0.1, \overline{w} = 1), which is interpreted as the GRCM assignment. Thereby, cluster overlap is realized by lifting the threshold by a linear function.

Figure 5. Classification boundaries of LiFTCM(\alpha = 1 - z, \beta = z\delta_c, \overline{w} = 1) representing HCM, GNR, CNR, and their combinations: (A) z = 0 (HCM), (B) z = 0.001 (GNR), (C) z = 0.25 (combination), (D) z = 0.5 (combination), (E) z = 0.75 (combination), and (F) z = 1 (CNR).

Figure 5 shows the results of LiFTCM(\alpha = 1 - z, \beta = z\delta_c, z \in [0, 1], \overline{w} = 1), in which noise rejection is intended. The noise distance was set to \delta_c = 0.35, and the parameter z was set to {0, 0.001, 0.25, 0.5, 0.75, 1}. Figure 5A shows the result for z = 0. A hard partition with a Voronoi boundary is generated in the same manner as in HCM. Figure 5B shows the result for z = 0.001. Such a small value of z realizes general noise rejection; that is, objects more than \delta_c distant from every cluster center are rejected. The boundary between clusters is the Voronoi boundary, and objects whose distance to every cluster center is greater than the noise distance \delta_c are shown in black and rejected as noise. As z approaches 1, in the order of Figs. 5C-5E, the overlap between clusters increases. Figure 5F shows the result for z = 1. In this case, cluster-wise noise rejection is performed, and each cluster forms a circle with radius \delta_c centered at the cluster center. By adjusting the threshold relative to \delta_c, cluster overlap and noise rejection are realized simultaneously. Thereby, LiFTCM can realize HCM, GRCM, GNR, CNR, and their combinations by lifting the threshold by a linear function.

Schematic diagram
Figure 6 is a schematic diagram of the proposal of this study. It shows the representations of HCM, LRCM, PRCM, GRCM, GNR, CNR, and their combinations by the linear function threshold in LiFTCM with the parameters (\alpha, \beta), and their relationships. (\alpha, \beta) = (1, 0) is the default state and represents the HCM assignment. Increasing \alpha from 1 or \beta from 0 increases cluster overlap. Simultaneously increasing \alpha and \beta increases clustering roughness; this corresponds to combinations of LRCM and PRCM, namely, GRCM. LiFTCM additionally gives an interpretation for 0 \le \alpha \le 1 beyond GRCM. As proposed in the smooth transition, when the parameter z is increased from 0 to 1, (\alpha, \beta) transitions from (1, 0) to (0, \delta_c), namely, from HCM to CNR via GNR. The parameter z has the effect of making the clustering more possibilistic. Cluster overlap in CNR is attributed to the increase in \beta, as in LRCM. Lowering the destination \delta_c corresponds to rejecting more objects as noise.

NUMERICAL EXPERIMENTS

This section presents the results of numerical experiments evaluating the clustering performance of the proposed LiFTCM with various parameter settings on four real-world datasets downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/) and summarized in Table 2. Performance was evaluated by the accuracy of class center estimation.
The datasets are labeled and include the feature vector and the correct class label of each object. Each dataset was partitioned into disjoint classes according to the class labels, and the center of each class (class center) was calculated. LiFTCM was applied to the generated unlabeled datasets. The number C of clusters was set to the number of classes. To avoid initial value dependence, the initial cluster centers were set to the cluster centers generated by KKZ-based HCM.

Table 2. Characteristics of the datasets and the ranges of the parameters \alpha, \beta, \delta_c, and z, which tune the linear function threshold in LiFTCM.

Dataset                   #classes  #features  #objects (#objects in classes)  Settings of parameters
Iris                      3         4          150 (50, 50, 50)                α ∈ [1, 1.2], β ∈ [0, 0.6], δ_c ∈ [0.85, 1.5], z ∈ [0, 1]
Wine                      3         13         178 (59, 71, 48)                α ∈ [1, 2.4], β ∈ [0, 250], δ_c ∈ [150, 1000], z ∈ [0, 1]
Glass                     6         9          214 (70, 76, 17, 13, 9, 29)     α ∈ [1, 1.6], β ∈ [0, 1.5], δ_c ∈ [10, 30], z ∈ [0, 1]
Breast Cancer Wisconsin   2         9          683 (444, 239)                  α ∈ [1, 1.5], β ∈ [0, 4], δ_c ∈ [5, 70], z ∈ [0, 1]

Considering the correspondence between the clusters and the classes, the minimum total error between the cluster centers and the class centers, called the center-error, was taken as the measurement value. Let \hat{b}_c be the class center of the class corresponding to cluster c. The center-error is calculated by

\text{center\_error} = \sum_{c=1}^{C} \|b_c - \hat{b}_c\|.   (21)

If the center-error is small, the accuracy of class center estimation is high, and the clustering performance is assumed to be high.

Figure 7. Minimum total errors between cluster centers and class centers by LiFTCM(\alpha \ge 1, \beta \ge 0, \overline{w} = 1) representing GRCM(\overline{w} = 1): (A) Iris, (B) Wine, (C) Glass, and (D) Breast Cancer Wisconsin.

Figure 7 shows the center-error measurements as \alpha and \beta take 100 equally distributed values, using contour lines. Colors closer to purple imply smaller center-error and hence better clustering performance. Figure 7A shows the results for the Iris dataset. Performance is improved at approximately \alpha = 1 and \beta = 0.35, and when \alpha is increased, performance is maintained by decreasing \beta. This implies that moderate roughness improves performance. Figure 7B shows the results for the Wine dataset. Performance is improved at approximately \alpha = 1.9 and \beta = 100. When \alpha and \beta exceed certain values, performance deteriorates rapidly. This implies that moderate roughness is acceptable, but excessive roughness degrades performance. Figure 7C shows the results for the Glass dataset. Performance is improved at approximately \alpha = 1.3 and \beta = 0.7. As with the Iris dataset, performance is improved with moderate roughness. Figure 7D shows the results for the Breast Cancer Wisconsin dataset. Performance is improved at approximately \alpha = 1 and \beta = 2, and it is clear that performance is improved with moderate roughness, as is the case with the Iris and Glass datasets. Therefore, it is suggested that performance is improved when \alpha and \beta are increased to obtain moderate roughness. Thus, the representation of combinations of LRCM and PRCM by LiFTCM performs well.
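For reference, the minimization over cluster-to-class correspondences in Eq. (21) can be resolved with an optimal assignment. The sketch below uses SciPy's Hungarian solver, which is my choice of tool rather than anything specified in the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def center_error(cluster_centers, class_centers):
    """Eq. (21): minimum total distance between cluster centers and class
    centers over all one-to-one cluster/class correspondences."""
    cost = np.linalg.norm(cluster_centers[:, None, :]
                          - class_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal matching (Hungarian method)
    return cost[rows, cols].sum()
```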
Figure 8. Minimum total errors between cluster centers and class centers by LiFTCM(\alpha = 1 - z, \beta = z\delta_c, z \in [0, 1], \overline{w} = 1) representing HCM, GNR, CNR, and their combinations: (A) Iris, (B) Wine, (C) Glass, and (D) Breast Cancer Wisconsin.

Figure 8 shows the center-error measurements as \delta_c and z take 100 equally distributed values, using contour lines. Figure 8A shows the results for the Iris dataset. Performance is improved at approximately \delta_c = 1.1 and z = 0.3, or at approximately \delta_c = 1.3 and z = 0.3. This implies that setting an appropriate noise distance and combining noise and possibilistic clustering yield satisfactory results. Figure 8B shows the results for the Wine dataset. Performance is improved at approximately \delta_c = 300 and z = 0.5. When \delta_c is increased, performance is maintained by decreasing z. This implies that general noise rejection performs better than cluster-wise noise rejection when the noise distance is large. Figure 8C shows the results for the Glass dataset. Performance is improved at approximately \delta_c = 25 and z = 0.05. Among the combinations, those closer to general noise rejection perform well. Figure 8D shows the results for the Breast Cancer Wisconsin dataset. Performance is improved at approximately \delta_c = 20 and z = 0.2. As with the other datasets, combinations perform well. As in the case of the Wine dataset, states close to general noise rejection perform well when \delta_c is large. Therefore, the representation of combinations of GNR and CNR by LiFTCM is satisfactory. When the noise distance is large, states close to GNR tend to yield satisfactory results.

DISCUSSION

Cluster center calculation utilizing probabilistic memberships
RCM-type methods have the problem that, even if the number of objects in the boundary area is small, these objects have an unnaturally large impact on the new cluster center compared with the objects in the lower area, because the cluster center is calculated by a convex combination of the area centers. To cope with this problem, Peters proposed πPRCM by introducing a cluster center calculation based on the normalized membership of the membership to the upper area, which satisfies the probabilistic constraint (Peters, 2014; Peters, 2015). "π" is an acronym for the "Principle of Indifference," by which the probability is assigned equally by dividing among the possible clusters. Ubukata et al. proposed πGRCM (Ubukata et al., 2018) based on GRCM. The proposed LiFTCM has almost the same formulation as GRCM except that the condition \alpha \ge 1 is relaxed to \alpha \ge 0. Thus, πLiFTCM can be formulated in a similar manner to πGRCM by introducing the following normalized membership \tilde{u}_{ci} of the membership to the upper area and the cluster center calculation based on \tilde{u}_{ci}:

\tilde{u}_{ci} = \frac{\overline{u}_{ci}}{\sum_{l=1}^{C} \overline{u}_{li}},   (22)

b_c = \frac{\sum_{i=1}^{n} \tilde{u}_{ci} x_i}{\sum_{i=1}^{n} \tilde{u}_{ci}}.   (23)

Here, attention should be paid to the following case. When \alpha < 1, that is, in the case of GNR and CNR, non-belonging of an object to any cluster is handled, and thus the denominator \sum_{l=1}^{C} \overline{u}_{li} can become zero; in such cases, it is necessary to set \tilde{u}_{ci} = 0 for all clusters.
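A sketch of this normalized center update (Eqs. (22)-(23)), including the zero-denominator case; the naming is mine:

```python
import numpy as np

def pi_liftcm_centers(X, u_upper):
    """pi-LiFTCM center update: normalize upper-area memberships across
    clusters (Eq. (22)), with u~_ci = 0 for objects in no upper area,
    then take the membership-weighted mean (Eq. (23))."""
    col = u_upper.sum(axis=0)                                       # sum_l u_li per object
    u_tilde = np.where(col > 0, u_upper / np.maximum(col, 1), 0.0)  # Eq. (22)
    denom = u_tilde.sum(axis=1, keepdims=True)
    # A cluster with an empty upper area has denom 0; a full implementation
    # would keep its previous center instead of dividing here.
    return (u_tilde @ X) / denom                                    # Eq. (23)
```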
CONCLUSIONS

In this study, as a unified approach for general noise rejection (GNR) and cluster-wise noise rejection (CNR) in hard C-means (HCM), we proposed linear function threshold-based C-means (LiFTCM) by relaxing generalized rough C-means (GRCM) clustering. We showed that the linear function threshold-based assignment in LiFTCM can represent GNR, CNR, and their combinations as well as GRCM. By visualizing the classification boundaries, the transitions among the conventional methods realized by LiFTCM and their characteristics were clarified. In the numerical experiments, the clustering performance of LiFTCM with various parameter settings was evaluated. It was demonstrated that the combinations of LRCM and PRCM, and the combinations of GNR and CNR, realized by LiFTCM performed well.

We plan to investigate the relationship between the proposed method and fuzzy clustering with noise rejection. Automatic determination of the parameters will also be considered.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported by JSPS KAKENHI Grant Number JP17K12753 and the Program to Disseminate Tenure Tracking System, MEXT, Japan. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the author:
JSPS KAKENHI: JP17K12753.
Program to Disseminate Tenure Tracking System, MEXT, Japan.

Competing Interests
The author declares there are no competing interests.

Author Contributions
Seiki Ubukata conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:
The raw datasets are available in the Supplementary File.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.238#supplemental-information.

REFERENCES

Arthur D, Vassilvitskii S. 2007. k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, 1027–1035.
Bezdek JC. 1981. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Cuesta-Albertos JA, Gordaliza A, Matrán C. 1997. Trimmed k-means: an attempt to robustify quantizers. The Annals of Statistics 25(2):553–576. DOI 10.1214/aos/1031833664.
Davé RN. 1991. Characterization and detection of noise in clustering. Pattern Recognition Letters 12(11):657–664. DOI 10.1016/0167-8655(91)90002-4.
Davé RN, Krishnapuram R. 1997. Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5(2):270–293. DOI 10.1109/91.580801.
Denœux T, Kanjanatarakul O. 2016. Evidential clustering: a review. In: International symposium on integrated uncertainty in knowledge modelling and decision making. Cham: Springer, 24–35.
Denœux T, Masson M. 2003. Clustering of proximity data using belief functions. In: Intelligent systems for information processing. Amsterdam: Elsevier, 291–302.
Denœux T, Masson M-H. 2004. EVCLUS: evidential clustering of proximity data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(1):95–109. DOI 10.1109/TSMCB.2002.806496.
Dubes RC, Jain AK. 1988. Algorithms for clustering data. Englewood Cliffs: Prentice Hall.
Dunn JC. 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 3(3):32–57. DOI 10.1080/01969727308546046.
Edgeworth FY. 1887. On observations relating to several quantities. Hermathena 6(13):279–285.
García-Escudero LÁ, Gordaliza A. 1999. Robustness properties of k means and trimmed k means. Journal of the American Statistical Association 94(447):956–969. DOI 10.1080/01621459.1999.10474200.
Gauss CF. 1809. Theoria motus corporum coelestium in sectionibus conicis solem ambientium. Hamburg: Friedrich Perthes und I. H. Besser.
Hampel FR. 1975. Beyond location parameters: robust concepts and methods. Bulletin of the International Statistical Institute 46(1):375–382.
Huber PJ. 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1):73–101.
Huber PJ. 1981. Robust statistics. New York: John Wiley & Sons.
Katsavounidis I, Kuo C-CJ, Zhang Z. 1994. A new initialization technique for generalized Lloyd iteration. IEEE Signal Processing Letters 1(10):144–146. DOI 10.1109/97.329844.
Kaufmann L, Rousseeuw PJ. 1987. Clustering by means of medoids. In: Dodge Y, ed. Statistical data analysis based on the L1-norm and related methods. Amsterdam: Elsevier, 405–416.
Krishnapuram R, Keller JM. 1993. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1(2):98–110. DOI 10.1109/91.227387.
Krishnapuram R, Keller JM. 1996. The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems 4(3):385–393. DOI 10.1109/91.531779.
Legendre A-M. 1805. Nouvelles méthodes pour la détermination des orbites des comètes. Paris: F. Didot.
Lingras P, West C. 2004. Interval set clustering of web users with rough k-means. Journal of Intelligent Information Systems 23(1):5–16. DOI 10.1023/B:JIIS.0000029668.88665.1a.
MacQueen J. 1967.
Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1. Oakland, 281–297.
Maji P, Pal SK. 2007a. RFCM: a hybrid clustering algorithm using rough and fuzzy sets. Fundamenta Informaticae 80(4):475–496.
Maji P, Pal SK. 2007b. Rough set based generalized fuzzy C-means algorithm and quantitative indices. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 37(6):1529–1540. DOI 10.1109/TSMCB.2007.906578.
Masson M-H, Denœux T. 2008. ECM: an evidential version of the fuzzy c-means algorithm. Pattern Recognition 41(4):1384–1397. DOI 10.1016/j.patcog.2007.08.014.
Pal NR, Pal K, Bezdek JC. 1997. A mixed c-means clustering model. In: Proceedings of the 6th international fuzzy systems conference, vol. 1. IEEE, 11–21.
Pal NR, Pal K, Keller JM, Bezdek JC. 2005. A possibilistic fuzzy c-means clustering algorithm. IEEE Transactions on Fuzzy Systems 13(4):517–530. DOI 10.1109/TFUZZ.2004.840099.
Pawlak Z. 1982. Rough sets. International Journal of Computer & Information Sciences 11(5):341–356. DOI 10.1007/BF01001956.
Pawlak Z. 1991. Rough sets: theoretical aspects of reasoning about data. Vol. 9. Dordrecht: Kluwer Academic Publishers.
Peters G. 2006. Some refinements of rough k-means clustering. Pattern Recognition 39(8):1481–1491. DOI 10.1016/j.patcog.2006.02.002.
Peters G. 2014. Rough clustering utilizing the principle of indifference. Information Sciences 277:358–374. DOI 10.1016/j.ins.2014.02.073.
Peters G. 2015. Is there any need for rough clustering? Pattern Recognition Letters 53:31–37. DOI 10.1016/j.patrec.2014.11.003.
Peters G, Crespo F, Lingras P, Weber R. 2013. Soft clustering: fuzzy and rough approaches and their extensions and derivatives. International Journal of Approximate Reasoning 54(2):307–322. DOI 10.1016/j.ijar.2012.10.003.
Rousseeuw PJ. 1984. Least median of squares regression. Journal of the American Statistical Association 79(388):871–880. DOI 10.1080/01621459.1984.10477105.
Rousseeuw PJ, Leroy A. 1987. Robust regression and outlier detection. New York: John Wiley & Sons.
Ubukata S, Kato H, Notsu A, Honda K. 2018. Rough set-based clustering utilizing probabilistic memberships. Journal of Advanced Computational Intelligence and Intelligent Informatics 22(6):956–964. DOI 10.20965/jaciii.2018.p0956.
Ubukata S, Notsu A, Honda K. 2017. General formulation of rough c-means clustering. International Journal of Computer Science and Network Security 17(9):29–38.