key: cord-0303014-3bnqzcke authors: Fang, Xiang; Hu, Yuchong; Zhou, Pan; Wu, Dapeng Oliver title: V3H: View Variation and View Heredity for Incomplete Multi-view Clustering date: 2020-11-23 journal: nan DOI: 10.1109/tai.2021.3052425 sha: 59d87032ae5df56f65154467465704d3a9171e57 doc_id: 303014 cord_uid: 3bnqzcke Abstract-Real data often appear in the form of multiple incomplete views. Incomplete multi-view clustering is an effective method to integrate these incomplete views. Previous methods only learn the consistent information between different views and ignore the unique information of each view, which limits their clustering performance and generalizations. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V 3 H). Inspired by the variation and the heredity in genetics, V 3 H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively. Then, by aligning different views based on their cluster indicator matrices, V 3 H integrates the unique information from different views to improve the clustering performance. Finally, with the help of the adjustable low-rank representation based on the heredity matrix, V 3 H recovers the underlying true data structure to reduce the influence of the large incompleteness. More importantly, V 3 H presents possibly the first work to introduce genetics to clustering algorithms for learning simultaneously the consistent information and the unique information from incomplete multi-view data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-arts. Impact Statement-Incomplete multi-view clustering is a popular technology to cluster incomplete datasets from multiple sources. The technology is becoming more significant due to the absence of the expensive requirement of labeling these datasets. However, previous algorithms cannot fully learn the information of each view.
Inspired by variation and heredity in genetics, our proposed algorithm V 3 H fully learns the information of each view. Compared with the state-of-the-art algorithms, V 3 H improves clustering performance by more than 20% in representative cases. With the large improvement on multiple datasets, V 3 H has wide potential applications, including the analysis of pandemic, financial, and election datasets. The DOI of our code is 10.24433/CO.2119636.v1. Index Terms-Incomplete multi-view clustering, View variation, View heredity. I. INTRODUCTION In most real-world applications, the collected data often appear in multiple views or come from different sources [1]-[4], which are called multi-view data [5]. As an illustration, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the strain of coronavirus that causes coronavirus disease 2019 (COVID-19) [6]-[8]. When we analyze the structural proteins of SARS-CoV-2 to study vaccines, the S (spike), E (envelope), M (membrane), and N (nucleocapsid) proteins can serve as four views [9]-[12]. Also, financial data often come from multiple sources, and each data source corresponds to a view. Multi-view clustering provides a natural way to integrate these views [13]-[16]. Recently, many multi-view clustering algorithms have been proposed [17]-[19]. Most of them assume that each view is complete, but real-world data often suffer from incompleteness [20]-[22]. For example, when detecting COVID-19, the blood test, the temperature measurement, and the neuroimage can be regarded as three views of the detection. When the disease first broke out, however, most individuals underwent only one or two tests due to limited testing capacity. Such incompleteness leads to missing columns or rows in the view matrices, which causes previous algorithms to fail. To cluster incomplete multi-view data, some efforts have been made. PVC [23] aligns the same samples in different views by constructing a latent subspace. To solve the incomplete multi-modality clustering problem, IMG [20] transforms the collected incomplete multi-view data into a complete representation in a latent space. To extend PVC to more than two views, MIC [24] extends MultiNMF [25] based on weighted nonnegative matrix factorization (NMF) [26] and L 2,1 -norm regularization. To decrease the impact of a large missing rate, DAIMC [27] extends MIC by combining semi-nonnegative matrix factorization (semi-NMF) [28] and L 2,1 -norm regularized regression. To solve the multi-view co-clustering problem with incomplete data, [5] integrates complex patterns of incomplete multi-view data. To explore the local structure, UEAF [29] learns a consensus representation for all views. However, these incomplete multi-view clustering methods still have three main drawbacks. First, these methods only consider the consistent information between all the views and ignore the unique information of each view. For example, DAIMC performs clustering by aligning the consistent information. When clustering data with little alignment information, DAIMC cannot achieve satisfactory performance because it ignores the unique information. The unique information from each view can help us better analyze the geometry of the corresponding view. Second, these methods can hardly learn nonlinear information between samples. Most of them are based on NMF, which is a linear operation and only extracts the linear structure information among samples from the data.
When processing some datasets with nonlinear structures, these methods may ignore much important nonlinear information among samples, which limits the application of these methods. Third, these methods do not perform well in some datasets with relatively large missing rates. Based on these samples presented in all views, these algorithms learn the structure information of datasets for clustering. As the missing rate increases, the number of presented samples will decrease significantly. Thus, these algorithms cannot learn enough information and will obtain poor clustering performance. If we directly use these methods to process important data, it is difficult to learn accurate data information because of the above drawbacks. As an illustration, if we use these methods to directly analyze the data of SARS-CoV-2, these methods are difficult to obtain satisfactory results, which may lead to slow research progress on COVID-19. Therefore, incomplete multi-view clustering still contains significant issues. To address these issues, we propose a novel View Variation and View Heredity approach (V 3 H). By introducing biological heredity and biological variation, the theory of genetics can effectively analyze the consistent trait information and the unique trait information in the biological world [30] - [32] . Inspired by the theory, V 3 H first learns a subspace from each view and decomposes each subspace into a heredity matrix shared by all the views and a variation matrix of the corresponding view. The shared heredity matrix can extract the consistent information between all the views, while each variation matrix can learn the unique information of the corresponding view. Based on each variation matrix, V 3 H constructs a graph Laplacian to obtain a corresponding cluster indicator matrix. Then, V 3 H aligns different views by minimizing the disagreement between each cluster indicator matrix and the consensus cluster indicator matrix. To measure the disagreement, V 3 H introduces the linear kernel into the Laplacian for spectral clustering. Finally, instead of learning the low-rank representation of all the subspaces, V 3 H designs an adjustable low-rank representation model via the η-norm of the variation matrix and the τ -norm of error matrices. V 3 H's contributions are mainly summarized as follows: • To our best knowledge, V 3 H is a pioneering work to introduce genetics into the clustering algorithm, which will promote the intersection between the clustering algorithm and other disciplines. Moreover, it is also the first attempt to learn both the consistent information and the unique information of incomplete views simultaneously based on the subspace decomposition. • By minimizing the disagreement between each cluster indicator matrix and the consensus, V 3 H can learn a satisfactory cluster indicator matrix for each view and integrate the unique information of these views, which improves its clustering performance. By introducing the linear kernel into the Laplacian, V 3 H learns the nonlinear structure in the dataset, which guarantees its applicability in datasets with nonlinear structure. • Based on the adjustable low-rank representation model, V 3 H can recover the underlying true data structure as needed, which helps us cluster the multi-view data with a relatively large missing rate. • Experimental results on fifteen benchmark datasets demonstrate the superiority of V 3 H over other stateof-the-arts. 
Impressively, in terms of three evaluation metrics, V 3 H improves the clustering performance by more than 20% in representative cases. The rest of the paper is organized as follows. Section II presents some related works. Section III describes the notation and the background. Section IV first motivates V 3 H's main idea, then proposes our V 3 H approach, and finally solves it efficiently. Section V evaluates V 3 H's performance. Section VI concludes the paper. II. RELATED WORKS The most relevant work of this paper is incomplete multiview clustering, and we present some related works in this section. Recently, many incomplete multi-view clustering methods have been proposed [20] , [23] , [24] , [27] , [29] . Based on the number of views clustered, we can divide these methods into the following two categories. (i) Incomplete two-view clustering (e.g., PVC and IMG). Incomplete two-view clustering methods can only cluster incomplete data with two views. PVC [23] learns the common and private latent spaces based on NMF [26] , [33] and L 1norm regularization. But PVC simply projects samples from each view into a common subspace and overlooks the global information among the two views. To obtain better clustering performance on multi-modal visual datasets, IMG [20] extends PVC and removes the nonnegative constraint to simplify optimization. But both PVC and IMG can only solve the incomplete two-view clustering problem, which limits their application to incomplete data with more than two views. (ii) Incomplete multi-view clustering (e.g., MIC, DAIMC and UEAF). Incomplete multi-view clustering methods can cluster incomplete data with more than two views. As the first method for incomplete multi-view clustering, MIC [24] first fills the missing samples in each incomplete view with average feature values, then learns a common latent subspace based on weighted NMF and L 2,1 -norm regularization. But MIC only simply fills the missing samples with average feature values and if we cluster the data with a relatively large missing rate, this simply filling may result in a serious deviation. To align the information of the presented samples, DAIMC [27] extends MIC via weighted semi-NMF [28] and L 2,1 -norm regularized regression. To obtain the robust clustering results, UEAF [29] performs the unified common embedding aligned with incomplete views inferring framework. Both DAIMC and UEAF rely too much on alignment information. When clustering the dataset without enough alignment information, DAIMC and UEAF always obtain unsatisfactory performance because the loss of alignment information will reduce the availability of their models. Note that besides three main drawbacks in Section I, the previous methods have the above drawbacks. These drawbacks always result in unsatisfactory clustering performance, which limits the real-world applications of these methods. For convenience, we define some notations through the paper. All the matrices are written in uppercase. [n] def = {1, 2, . . . , n}. For a matrix A, its ij-th element and i-th column are denoted by A i,j and A i separately; its trace is denoted by Tr(A); its Frobenius norm is denoted by ||A|| F ; its L 2,1 -norm is denoted by ||A|| 2,1 ; its nuclear norm is denoted by ||A|| * . | · | is the absolute operator; < ·, · > is the inner product operator; 1 is a column vector with all elements as 1; I is an identity matrix. 
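For reference, the norms and the trace above have their standard definitions; writing A_{i,:} for the i-th row of A, they are (in LaTeX notation):

```latex
\mathrm{Tr}(A) = \sum_{i} A_{i,i}, \qquad
\|A\|_{F} = \Big(\sum_{i,j} A_{i,j}^{2}\Big)^{1/2}, \qquad
\|A\|_{2,1} = \sum_{i} \|A_{i,:}\|_{2}, \qquad
\|A\|_{*} = \sum_{i} \sigma_{i}(A),
```

where σ_i(A) is the i-th singular value of A. Reading the L 2,1 -norm row-wise is our interpretation, chosen to match the later row-wise use of E^(v)_{i,:} in Section IV-C.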
For a complete multi-view dataset, the data are denoted by {U^(v)}_{v=1}^{n_v} with U^(v) ∈ R^{n×d_v}, where d_v is the feature dimension of the v-th view, n is the sample number, and n_v is the view number. When the data suffer from incompleteness, the corresponding incomplete views are denoted by {X^(v)}_{v=1}^{n_v} (defined below). Throughout the paper, H is the consensus cluster indicator matrix; M is the heredity matrix; c is the cluster number; α, β and η are nonnegative hyper-parameters. The v-th original view matrix (including missing and presented samples) is represented as U^(v) ∈ R^{n×d_v}, where n and d_v are the number of samples and features, respectively. By removing the missing samples, we can update the v-th original view matrix to a new view matrix X^(v) ∈ R^{m_v×d_v}, where m_v is the number of presented samples (m_v < n). To indicate the update, we define an incomplete index matrix W^(v) ∈ {0, 1}^{m_v×n}, where W^(v)_{i,j} = 1 if the i-th presented sample of X^(v) is the j-th sample of U^(v), and W^(v)_{i,j} = 0 otherwise (Eq. (1)). As an effective complete multi-view clustering method, multi-view subspace clustering (MVSC) integrates different views by first performing subspace clustering on each view and then unifying these subspaces to learn a cluster indicator matrix [34]. In the MVSC framework (Eq. (2)), F is the cluster indicator matrix and, for the v-th view, L^(v) denotes the graph Laplacian constructed from the learned subspace representation Z^(v). In genetics, biological heredity denotes the passing on of traits from parents to their children; biological variation represents the unique trait information of their children. By introducing biological heredity and biological variation, genetics provides theories to analyze the consistent trait information and the unique trait information in the biological world [35], [36]. In genetics, P denotes the observed trait information. Based on the theory for quantitative traits influenced by maternal effects [35]-[37], P is partitioned as P = B_H + B_V (Eq. (3)), where B_H denotes the biological heredity representation (i.e., consistent trait information) and B_V denotes the biological variation representation (i.e., unique trait information). Based on Eq. (3), genetics can explain the observed biological traits at the genetic level. By showing the characteristics of incomplete multi-view data, we first present the motivation of our proposed V 3 H approach. Then we model V 3 H as the joint of the view variation and the view heredity. Finally, we design a seven-step procedure to optimize V 3 H. Real-world incomplete multi-view data have two main characteristics [38]: (i) the common samples presented in all views can be used to extract the consistent information from different views and to integrate these views; (ii) the samples existing in only some of the views can be used to learn the unique information of the corresponding views. Therefore, our motivation is to simultaneously learn consistent information and unique information from incomplete multi-view data for clustering. Borrowing the idea of genetics in Section III-C, we propose the definitions in Definition 1, which relies on the following assumptions: (i) for an incomplete multi-view dataset {X^(v)}_{v=1}^{n_v} with X^(v) ∈ R^{m_v×d_v}, each of its subspaces is a perturbation of a consensus subspace; (ii) each subspace can represent the data structure of the corresponding view. Definition 1: For the v-th view X^(v), Z^(v) denotes its subspace, which is calculated by Eq. (2). Assume that all the subspaces originate from a consensus subspace Z*. Thus, Z* is defined as the parent subspace and Z^(v) is defined as the child subspace. M is defined as the heredity matrix, which represents the consistent information shared by all the views. N^(v) is defined as the variation matrix of the v-th view, which represents the unique information of the v-th view.
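To make this bookkeeping concrete, the following minimal NumPy sketch (ours, not taken from the released code) builds the incomplete index matrix W^(v) as a row-selection matrix, which is consistent with the property W^(v) W^(v)T = I used in Section IV-E, and assembles a child subspace from a shared M and a view-specific N^(v); restricting the n×n matrices to the presented samples through W^(v) is our assumption for dimensional consistency, and all names are illustrative.

```python
import numpy as np

# Toy setting: n = 6 samples overall; view v observes only m_v = 4 of them.
n, d_v = 6, 3
rng = np.random.default_rng(0)
U_v = rng.standard_normal((n, d_v))      # original view matrix U^(v) (all n rows)
present = np.array([0, 2, 3, 5])         # indices of the presented samples in view v
m_v = len(present)

# Incomplete index matrix W^(v): one 1 per row, selecting the presented samples.
W_v = np.zeros((m_v, n))
W_v[np.arange(m_v), present] = 1.0
X_v = W_v @ U_v                          # new view matrix X^(v) with missing rows removed

# Selection matrices satisfy W^(v) W^(v)T = I, as used when updating Z^(v).
assert np.allclose(W_v @ W_v.T, np.eye(m_v))

# Heredity matrix M (shared across views) and variation matrix N^(v) (view-specific),
# both of size n x n so that they also carry information about the missing samples.
M = rng.standard_normal((n, n))
N_v = rng.standard_normal((n, n))

# One plausible reading of the child subspace: the additive combination of M and
# N^(v), restricted to the presented samples of view v (our assumption).
Z_v = W_v @ (M + N_v) @ W_v.T            # m_v x m_v
print(X_v.shape, Z_v.shape)
```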
The view heredity is the phenomenon that the consistent information exists in both the parent subspace and the child subspaces. The view variation is the phenomenon that the unique information only exists in the corresponding child subspace. In general, different subspaces have different variation matrices N^(v), but share the same heredity matrix M. Fig. 1 illustrates the overall framework of V 3 H: in Section IV-A, V 3 H decomposes the subspace of each view X^(v) into a heredity matrix M and a variation matrix N^(v); in Section IV-B, by constructing a normalized graph Laplacian L^(v)_N, V 3 H aligns the cluster indicator matrix F^(v) with the consensus cluster indicator matrix H; in Section IV-C, V 3 H leverages the η-norm and the τ-norm for an adjustable low-rank representation; Section V shows the clustering results. Ideally, we would obtain the optimal representation by learning the parent subspace Z*, because Z* contains most of the available information of the data. However, it is difficult to learn a usable Z* due to missing samples and noise. Therefore, we avoid learning Z* directly and instead learn both the heredity matrix M and the variation matrix N^(v) as an alternative. Each Z^(v) can be decomposed into a heredity matrix M and a variation matrix N^(v). We assume that, for a specific incomplete multi-view dataset, the dimensions of its heredity matrix and variation matrices do not change when the missing rate changes. This assumption guarantees that we can integrate subspaces with different dimensions. In Eq. (3), genetics analyzes the influence of biological heredity and biological variation on biological traits. Inspired by this, to integrate the heredity matrix and the variation matrices, we design a subspace decomposition model (Eq. (4)), where p is adjustable as needed; Z^(v) only contains the information of the m_v presented samples, while both M and N^(v) contain the information of all n samples (i.e., the missing samples and the presented samples). Eq. (4) can learn three kinds of information: the consistent information between views (learned by M), the unique information (learned by N^(v)), and the relationship information between samples (learned by Z^(v)). Since these three kinds of information are exactly what the clustering process requires, solving Eq. (4) is a feasible alternative to learning Z*. Note that we have three variables (Z^(v), M, and N^(v)), but only one equation (Eq. (4)). Therefore, in the next section, we add some constraints on M and N^(v) to solve Eq. (4). Inspired by the phenomenon that the expression traits of parents and children are similar [39], [40], we attempt to align the expression traits of the parent subspace and the child subspaces. We treat the cluster indicator matrices as the expression traits of the corresponding views because we can obtain better clustering results after aligning these cluster indicator matrices. To formulate the alignment, we design the view alignment model in Eq. (5), where γ is a nonnegative hyper-parameter, F^(v) ∈ R^{n×c} is the cluster indicator matrix of the v-th view, and H ∈ R^{n×c} is the consensus cluster indicator matrix of all the views. The Laplacian term Tr(F^(v)T L^(v)_N F^(v)) is used to learn the cluster indicator matrix of each view; Dis(F^(v), H) denotes the disagreement between F^(v) and H; L^(v)_N is the normalized graph Laplacian constructed from the variation matrix N^(v). To measure Dis(F^(v), H), a popular choice is the linear kernel function, which is simple and widely used. However, the linear kernel function can only learn linear structural information in most cases, and it is difficult to learn nonlinear information.
An ideal method would be a linear kernel function that can also capture nonlinear structures. Fortunately, the linear kernel used in the Laplacian for spectral clustering can learn the nonlinear structure of the data [41]. Thus, we choose the linear kernel function to measure the disagreement Dis(F^(v), H) (Eq. (6)), where F^(v)F^(v)T is the linear kernel of F^(v) and HH^T is the linear kernel of H. Therefore, we rewrite Eq. (5) as Eq. (7). Combining Eq. (2), (4), and (7), we obtain Eq. (8), where the constraint N^(v)1 = 1 treats all samples equally, which helps learn the unique information of the v-th view, and the constraint N^(v)_{i,i} = 0 ensures that each sample can only be represented as a combination of the other samples [42]. Most real-world multi-view data have low-rank subspace representations, which can be used to recover the underlying true data structure [43]-[45]. Thus, a reasonable low-rank representation can improve the performance of subspace-based clustering. Outlier Pursuit [46] is a popular technique to obtain a proper low-rank representation, formulated as Eq. (9), where ||M||_* is used to approximate the rank of M and ||E^(v)||_{2,1} encourages a row-sparse error matrix E^(v). In fact, on the one hand, approximating the rank of M by the nuclear norm will lead to a large deviation, which may result in unusable clustering results. On the other hand, the L 2,1 -norm is non-differentiable at zero, which renders the derivative of the L 2 -norm at sparse rows meaningless. Besides, two cases fail Eq. (9): (i) If a dataset suffers from incompleteness or noise, its heredity matrix M is often unavailable. When the unavailable heredity matrix is used directly in Eq. (9), we will have difficulty obtaining a satisfactory low-rank representation, because calculating the rank of the unavailable heredity matrix will produce a large deviation. (ii) When handling different clustering tasks, we often need error matrices E^(v) with different sparsity [47], so the adjustable sparsity of E^(v) is necessary. Therefore, Eq. (9) has two drawbacks: an unavailable heredity matrix and non-adjustable sparsity. The nonconvex relaxation of the matrix rank is a popular technique. Motivated by [48]-[50], to learn an available M, we propose the η-norm defined in Eq. (10), where η is adjustable as needed (η > 0), w is the weight vector (w_i > 0), and σ_i(M) is the i-th singular value of M. Note that, different from common norms (e.g., the L 2,1 -norm and the F-norm), our proposed η-norm is not a real norm. The η-norm has the following characteristics: 1) the η-norm is unitarily invariant, and ||M||_η = ||UMV||_η for any orthonormal U ∈ R^{m×m} and V ∈ R^{n×n}; 2) when η → ∞, we have ||M||_η → ||M||_{w,*}, where ||·||_{w,*} is the weighted nuclear norm [51]; 3) when η → 0, we have ||M||_η → rank(M). To show the advantage of our proposed η-norm, we compare the η-norm with several rank relaxation approaches in Fig. 2(a). As shown in Fig. 2(a), the η-norm (η = 0.001 in this figure) is closer to the true rank than the other approaches. Thus, by learning a satisfactory low-rank representation, the η-norm can ensure the availability of the heredity matrix. Similar to Eq. (10), to learn an adjustable sparse representation, we adopt the τ-norm of the matrix E^(v), defined in Eq. (11), where τ is adjustable for different tasks. Based on the matrix E^(v), we design a diagonal matrix D^(v). Theorem 1. For any matrix E^(v), the τ-norm admits an equivalent expression in terms of D^(v) and E^(v). Proof: the identity can be verified row by row for each row E^(v)_{i,:} in all the cases.
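The displayed definitions of the η-norm (Eq. (10)) and the τ-norm (Eq. (11)) did not survive extraction here. A form that matches the stated properties (unitary invariance, the η → 0 and η → ∞ limits, global differentiability, and interpolation between the F-norm and the L 2,1 -norm) and the nonconvex rank surrogates in the literature the paper cites is the following; it is our reconstruction, not necessarily the authors' exact definition:

```latex
\|M\|_{\eta} = \sum_{i} w_{i}\,\frac{(1+\eta)\,\sigma_{i}(M)}{\eta + \sigma_{i}(M)},
\qquad
\|E^{(v)}\|_{\tau} = \sum_{i} \frac{\|E^{(v)}_{i,:}\|_{2}^{2}}{\|E^{(v)}_{i,:}\|_{2} + \tau}
= \mathrm{Tr}\big(E^{(v)T} D^{(v)} E^{(v)}\big),
\quad
D^{(v)}_{i,i} = \frac{1}{\|E^{(v)}_{i,:}\|_{2} + \tau}.
```

Under this reading, Theorem 1 would be exactly the identity between the row-wise sum and the weighted trace, which is what the diagonal matrix D^(v) is introduced for; as η → 0 each singular-value term tends to an indicator of σ_i(M) > 0 (the rank when w_i = 1), and as τ → 0 the row terms tend to ||E^(v)_{i,:}||_2, recovering the L 2,1 -norm.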
Similar to η-norm, τ -norm has the following characteristics: 1) τ -norm is nonnegative and global differentiable; To verify the adjustability of τ -norm, we compare τ -norm with L 2,1 -norm and F-norm in Fig. 2(b) . When τ is relatively small (τ = 0.1), τ -norm is near to F-norm. As τ decreases, ||E (v) || τ is closer to ||E (v) || 2,1 , and E (v) becomes more sparse. Since we can choose different τ to adjust the sparsity, τ -norm has wider applications than L 2,1 -norm and F-norm. Therefore, considering both the available low-rank subspaces (Eq. (10)) and the adjustable sparsity (Eq. (11)), we can obtain the adjustable low-rank representation as follows Combining the view alignment (Eq. (8)) and the adjustable low-rank representation (Eq. (15)), we have Eq. (16) is a nonconvex function, which is often difficult to optimize directly. In the next section, we will design an iteration procedure to optimize it. To optimize Eq. (16), we design the following augmented Lagrangian function and vector ζ (v) are Lagrange multipliers, ω is a nonnegative penalty parameter. Eq. (17) is not convex for all variables simultaneously, and it is difficult to solve Eq. (17) in one step. Thus, we design the following seven-step procedure to update each variable iteratively [52] . Step 1. Update M . Fixing the other variables, the problem to update M is degraded to solve the following problem To solve Eq. (18), we first develop the following theorem. Then an optimal solution to Eq. Proof: Eq. (21b) holds because the Frobenius norm is unitarily invariant, Eq. Hence M * = U diag(σ * )V T , which is the optimal solution of Eq. (19) . Thus, Theorem 2 is proved. Note that Eq. (20) is a combination of concave and convex functions, which motivates us to leverage the difference of convex (DC) programming algorithm [53] . The algorithm decomposes a nonconvex function as the difference of two convex functions and iteratively optimizes it by linearizing the concave term at each iteration. For the i-th inner iteration, 2 /ω. For the (i + 1)-th inner iteration, we have which admits the following closed-form solution After several iterations, it at least converges to a locally optimal point σ * . Then M = U diag(σ * )V T . Step 2. Update Z (v) . Fixing the other variables, the problem to update Z (v) is degraded to minimize Setting the derivative J(Z (v) ) w.r.t Z (v) to 0, we have Based on the definition of W (v) , we can find W (v) W (v) T = I. By solving Eq. (25), we can update Z (v) by Step 3. Update N (v) and ζ. Fixing the other variables, the problem to update N (v) is degraded to minimize We define 2 /ω, and Eq. (27) can be equivalent to Note that Eq. (28) is independent to each row. Defining T j,: || 2 2 , we transform minimizing Eq. (28) into where max(Y Step 4. Update E (v) . Fixing the other variables, the problem to update E (v) is degraded to minimize Deriving Setting ∂J( Step 5. Update F (v) . Fixing the other variables, we can update Note that ||F (v) F (v) T || 2 F = ||HH T || 2 F = c, and c is a constant for a specific dataset. Thus, we transfer Eq. (35) into Obviously, we can solve Eq. (36) by the eigenvalue decomposition. The optimal F (v) is the eigenvector set that corresponds to the first c smallest eigenvalues of matrix (αL Step 6. Update H. Similar to Step 5, we update H by By the eigenvalue decomposition, we can learn the optimal H, which is also the eigenvector set corresponding to the first c largest eigenvalues of matrix Step 7. Update C and ω. 
We can update them by where ϕ and ω max are constants. The V 3 H algorithm is shown in Algorithm 1. We provide its codes in Code Ocean (DOI:10.24433/CO.2119636.v1) and Github (https://github.com/ZeusDavide/TAI V3H.git). F. Convergence and Complexity 1) Convergence Analysis: To optimize our proposed V 3 H, we need to solve seven subproblems in Algorithm 1. Each subproblem has a closed solution w.r.t the corresponding variable. The objective function is bounded, and all the above seven steps do not increase the objective function value. Thus, the objective function can reduce monotonically to a stationary value, and V 3 H can at least find a locally optimal solution. 2) Complexity Analysis: From Section IV-E, the major computational costs of our proposed V 3 H mainly come from the operations like matrix inverse, SVD, and eigenvalue decomposition. Therefore, Steps 1, 5, and 6 are the main computational costs. For Step 2, its major computational costs are from the inverse operation (I +X (v) T X (v) ) −1 . Since both I and X (v) are not updated in each iteration, we can pre-compute the inverse operation before the iteration for simplicity. For Steps 1, 5, and 6, they have the same computational complexity O(n 3 ). Therefore, the whole computational complexity of V 3 H is about O(in v n 3 ), where i is the iteration number, n v is the view number, and n is the sample number. Note that the complexity of V 3 H has nothing to do with the feature dimension d v . Since most real-world data are highdimensional [54] , V 3 H will have wide applications. We first illustrate the clustering performance of the proposed V 3 H, then verify V 3 H's convergence, and finally analyze the sensitivity of V 3 H's parameters. We conduct experiments on fifteen well-known popular datasets: 3-Sources 2 , 20 New Groups (20-NGs) 3 , 100 Leaves (100-Ls) 4 , BBC with 3 views (BBC (3v)) 5 , BBC with 4 views (BBC (4v)) 6 , BBCSport with 2 views (BS (2v)) 7 , BBCSport with 4 views (BS (4v)) 8 , BUAA [55] , Coil [56] , Digit 9 , NUS [57] , ORL [58] , Outdoor Scene (Scene) [59] , Yale 10 , and Extended YaleB (YaleB) [60] . For these datasets, most multi-view clustering algorithms often cluster their common subsets for simplicity. To compare fairly with these algorithms, we use the same size datasets for clustering. The important statistics of used datasets are shown in Table I , and the detailed statistics are as follows: 1) 3-Sources: it is a news dataset that has 948 samples collected from 3 views: BBC with 3560 features, Reuters with 3631 features and The Guardian with 3068 features. Following [29] , we select a subset with 169 samples, which are categorized into 6 clusters. 2) 20-NGs: it is a document dataset that has 500 samples from 3 views. Each view has 500 features. These documents are categorized into 5 clusters. 3) 100-Ls: it has 1600 samples from 100 clusters. Each sample appears in 3 views, and each view has 64 features. 4) BBC (3v) and 5) BBC (4v): the original BBC dataset has 685 samples, which are described by 3-4 views categorized into 5 clusters. Following [61] , we choose a subset with 282 samples described by 3 views. These views include 2582 features, 2544 features, 2465 features, respectively. Following [62] , we also choose the full dataset described by 4 views. These views include 4659 features, 4633 features, 4665 features, and 4684 features, respectively. 6) BS (2v) and 7) BS (4v): the original BBCSport dataset has 737 samples, which are described by 2-4 views and cate- gorized into 5 clusters. 
Following [63], we select a subset with 544 samples described by 2 views. These views include 3183 features and 3203 features, respectively. Following [29], we also use a subset with 116 samples described by 4 views. CSMSC and MultiNMF cannot directly handle incomplete multi-view data, so, following [27], we first fill the missing samples with average feature values and then perform CSMSC and MultiNMF. Since IMG and PVC cannot perform clustering on incomplete data with more than two views, we run these methods on all the two-view combinations and report the average results for fairness. Since our proposed V 3 H has three parameters, α, β, and γ, we adjust them to get the best performance (see Section V-D). Since p, η, τ, and w_i are adjustable as needed, we set p = 1, η = 10^-3, τ = 10^-2, and w_i = 1 in our experiments for simplicity. Following [20], we repeat each incomplete multi-view clustering experiment 10 times and report the average performance. Following [27] and [24], we randomly delete some samples from each view to obtain incomplete views. We set the missing rate (PER) from 0 (each view is complete) to 0.5 (each view has 50% of its samples missing) with 0.1 as the interval. Evaluation Metric: Following [29], we evaluate the experimental results by three popular metrics: Accuracy (ACC), Normalized Mutual Information (NMI) and Purity. For these metrics, a larger value represents better performance. For convenience, we first divide the compared methods into 3 groups: single-view methods (CK and CS), two-view methods (IMG and PVC) and multi-view methods (CSMSC, DAIMC, MIC, MultiNMF and UEAF). Then, based on the experimental results, we compare and analyze the performance of the different groups. Finally, we critically analyze each method. V 3 H versus single-view methods: compared with single-view methods, V 3 H achieves better performance on all the datasets. For instance, when clustering the Scene dataset with PER=0.5 (Fig. 7(a), 7(b), and 7(c)), compared with CK and CS, V 3 H raises the clustering results by at least 31.10% in ACC, 25.73% in NMI, and 30.54% in Purity, respectively. This is because the Scene dataset has 4 views, and CK and CS simply concatenate these views, which cannot learn the relationship information between different views. On the contrary, V 3 H can extract the relationship information by integrating the different views. Therefore, effectively integrating different views is necessary for multi-view clustering. V 3 H versus two-view methods: compared with two-view methods, V 3 H obtains better clustering results in all the cases. For the datasets with more than 2 views (e.g., BS (4v), 3-Sources, 20-NGs, etc.), V 3 H performs better than PVC and IMG. When we cluster the BS (4v) dataset with PER=0.5 (Fig. 5(a), 5(b) and 5(c)), compared with PVC and IMG, V 3 H raises the clustering results by at least 24.14% in ACC, 43.10% in NMI, and 34.47% in Purity, respectively. The reason is that PVC and IMG can only integrate two views, and the structural information of the remaining views is not learned, which illustrates the necessity of integrating all the views. As for the two-view datasets (e.g., BUAA and BS (2v)), V 3 H still outperforms PVC and IMG. For the BUAA dataset with PER=0.2, V 3 H raises the performance by at least 3.51% in ACC, 1.07% in NMI, and 5.97% in Purity, respectively. The main reason is that the BUAA dataset contains nonlinear structural information, which V 3 H can learn through the linear kernel used in the Laplacian for spectral clustering.
V 3 H versus multi-view methods: compared with multi-view methods, V 3 H also achieves better clustering performance in most cases. When we cluster the BBC (4v) dataset with PER=0.5 (Fig. 4(d), 4(e), and 4(f)), compared with the multi-view methods, V 3 H raises the performance by at least 25.84% in ACC, 18.95% in NMI, and 22.92% in Purity, respectively. This is because each view of the BBC (4v) dataset includes unique information. V 3 H can learn the unique information through the corresponding variation matrix, while this unique information is ignored by the multi-view methods. More impressively, as PER on the dataset increases, V 3 H achieves satisfactory and relatively stable clustering results, while the performance of all the multi-view methods drops significantly. This is because, based on the subspace decomposition in Eq. (4), V 3 H can learn information from both the presented samples and the missing samples, whereas these multi-view methods can only learn information from the presented samples. In summary, all the methods are analyzed as follows: 1) CK and CS: for most clustering tasks, the performance of CK and CS is close because they simply concatenate all views. Although this concatenation is easy to operate, CK and CS always perform poorly due to ignoring the relationship between different views. Therefore, CK and CS often have limited applications on multi-view datasets. 2) PVC and IMG: for the incomplete two-view datasets, PVC can obtain reasonably good clustering performance by establishing a latent subspace from two views. Similarly, IMG also performs well in incomplete two-view clustering by introducing manifold learning into PVC. But when clustering data with more than 2 incomplete views, PVC and IMG cannot obtain an optimal latent subspace because they ignore the global structure of the multi-view data. Thus, it is difficult for them to obtain satisfactory results in incomplete multi-view clustering tasks. 3) CSMSC, MIC, MultiNMF, DAIMC and UEAF: CSMSC, MIC and MultiNMF fill the missing samples in each view. As the missing rate increases, the clustering results of these three methods drop significantly. The reason is that they simply fill the missing samples with the average feature values, which neglects the hidden information of the missing samples. Since real-world data are often incomplete, these three methods are difficult to use widely. For the datasets with little alignment information, DAIMC and UEAF always obtain unsatisfactory clustering results because they learn the consensus representation by aligning these views. These drawbacks limit the application of these methods. 4) Our proposed V 3 H: by aligning the cluster indicator matrices from different views and learning the low-rank representation, V 3 H performs satisfactorily in most cases, which shows its wide applicability. Moreover, when the missing rate is relatively large (e.g., PER=0.4 or PER=0.5), V 3 H has a more obvious advantage over the other state-of-the-art methods, which illustrates its effectiveness in high-incompleteness applications. In terms of {α, β, γ}, we conduct the hyper-parameter experiments on the 3-Sources dataset. Similar to [27], we set PER=0.5 and report V 3 H's NMI versus α, β, and γ within the set {10^0, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6}. As shown in Fig. 8(a) and 8(b), our proposed V 3 H obtains stable and satisfactory clustering performance across a wide range of these parameters. Thus, V 3 H is insensitive to the variation of the parameters.
Also, V 3 H obtains the best clustering results when we set α=10 −3 , β= 10 −4 and γ=10 −1 , which are the recommended values. Based on the recommended values of these hyperparameters, we study the convergence by conducting the experiments on the Coil dataset with different PERs, i.e., PER=0.1, PER=0.3, PER=0.5. Fig. 8(c) show that the convergence curve versus the iteration number, and "Obj fun val" represents "objective function value", which is calculated by (||M || η + v (β||E (v) || τ + αTr(F (v) T L (v) F ), similar to [65] . Obviously, our proposed V 3 H has converged just after 10 iterations for all PERs, which shows its fast convergence. Note that the convergence curves under different PERs are close to each other. This is because, based on Eq. (4), we can learn the information of all samples (including presented samples and missing samples). When PER changes, the dimensions of M and N (v) do not change. Besides, when the iteration number is the same, obj(P ER = 0.5) > obj(P ER = 0.3) > obj(P ER = 0.1). The reason is as follows: in the objective function, only the dimensions of E and Z (v) will change as the missing rate increases. Thus, only the values of ||E (v) || τ and F will rely on missing rate. In fact, for a robust algorithm, its error matrix E will generally be small. Therefore, we can approximate the numerator of obj as a constant. As the missing rate increases, the value of ||X (v) − X (v) Z (v) − E (v) || 2 F will decrease, and the objective function value will increase. In this paper, we propose a novel View Variation and View Heredity approach (V 3 H) for incomplete multi-view clustering. As far as we know, V 3 H is the first attempt to introduce genetics into the clustering method. Also, it can learn the consistent information and the unique information based on view variation and view heredity respectively. Extensive experiments on fifteen datasets demonstrate the superiority of V 3 H over other state-of-the-art methods. Impressively, when clustering the YaleB dataset with the missing rate of 0.2, V 3 H improves at least 23.24% in ACC, 22.42% in NMI, and 22.46% in Purity over the best performing compared method. Our proposed V 3 H is an offline approach for highdimensional incomplete multi-view clustering. A larger challenge is to cluster large-scale high-dimensional data. In the future, we will introduce online learning into V 3 H for the large-scale high-dimensional data about COVID-19. We collect a large amount of data about COVID-19 every day, and online learning is an effective way to process these data. Based on online learning, we will process these data. 
REFERENCES
[1] Consistent and specific multi-view subspace clustering
[2] Multi-view clustering: A scalable and parameter-free bipartite graph fusion method
[3] Efficient and effective regularized incomplete multi-view clustering
[4] A survey on multi-view clustering
[5] Multi-view cluster analysis with incomplete data to understand treatment effects
[6] Covid-19 and Italy: what next
[7] Presumed asymptomatic carrier transmission of COVID-19
[8] The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
[9] Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
[10] Detection of SARS-CoV-2 in different types of clinical specimens
[11] SARS-CoV-2 viral load in upper respiratory specimens of infected patients
[12] Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
[13] Multi-view clustering
[14] Multiview clustering via canonical correlation analysis
[15] Multi-view k-means clustering on big data
[16] Multi-view clustering in latent embedding space
[17] Auto-weighted multi-view co-clustering via fast matrix factorization
[18] Multi-view k-means clustering with adaptive sparse memberships and weight allocation
[19] Uniform distribution non-negative matrix factorization for multiview clustering
[20] Incomplete multi-modal visual data grouping
[21] Partial multi-view subspace clustering
[22] Online multi-view clustering with incomplete views
[23] Partial multi-view clustering
[24] Multiple incomplete views clustering via weighted nonnegative matrix factorization with L2,1 regularization
[25] Multi-view clustering via joint nonnegative matrix factorization
[26] Learning the parts of objects by nonnegative matrix factorization
[27] Doubly aligned incomplete multi-view clustering
[28] Convex and semi-nonnegative matrix factorizations
[29] Unified embedding alignment with missing views inferring for incomplete multi-view clustering
[30] Genetics of the evolutionary process
[31] Populations, species, and evolution: an abridgment of animal species and evolution
[32] Genetics of populations. Jones & Bartlett Learning
[33] Sparse projections over graph
[34] Multi-view subspace clustering
[35] The covariance between relatives for characters composed of components contributed by related individuals
[36] Analysis of cytoplasmic and maternal effects I. A genetic model for diploid plant seeds and animals
[37] Heritability in the genomics era: concepts and misconceptions
[38] Anchors bring ease: An embarrassingly simple approach to partial multi-view clustering
[39] The genetics of human populations. Courier Corporation
[40] Structural variation in the human genome
[41] Co-regularized multi-view spectral clustering
[42] Subspace clustering
[43] Robust recovery of subspace structures by low-rank representation
[44] Robust principal component analysis
[45] A ranked subspace learning method for gene expression data classification
[46] Robust PCA via outlier pursuit
[47] Learning from imbalanced data
[48] A statistical view of some chemometrics regression tools
[49] Variable selection via nonconcave penalized likelihood and its oracle properties
[50] Nonlinear image recovery with half-quadratic regularization
[51] Weighted nuclear norm minimization with application to image denoising
[52] Robust PCA via nonconvex rank approximation
[53] Convex analysis approach to DC programming: theory, algorithms and applications
[54] Scalable nearest neighbor algorithms for high dimensional data
[55] The BUAA-VisNir face database instructions
[56] Columbia Object Image Library (COIL-20)
[57] NUS-WIDE: A real-world web image database from National University of Singapore
[58] Parameterisation of a stochastic model for human face identification
[59] Experiments on high resolution images towards outdoor scene classification
[60] From few to many: illumination cone models for face recognition under variable lighting and pose
[61] Enhancing multiview clustering through common subspace integration by considering both global similarities and local structures
[62] GMC: Graph-based multi-view clustering
[63] Multiple kernel clustering with neighbor-kernel subspace segmentation
[64] Exclusivity-consistency regularized multi-view subspace clustering
[65] Incomplete multi-view spectral clustering with adaptive graph learning