key: cord-0805705-07mr2j4w
authors: Guo, Yan-Ru; Bai, Yan-Qin; Li, Chun-Na; Bai, Lan; Shao, Yuan-Hai
title: Two-dimensional Bhattacharyya bound linear discriminant analysis with its applications
date: 2021-11-05
journal: Appl Intell (Dordr)
DOI: 10.1007/s10489-021-02843-z
sha: 3c912162b9332a9c8f862d9f516fb1c650f5ef63
doc_id: 805705
cord_uid: 07mr2j4w

The recently proposed L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) is an effective improvement over linear discriminant analysis (LDA), but it handles only vector input samples. When faced with two-dimensional (2D) inputs such as images, converting the 2D data to vectors, regardless of the inherent structure of the image, may result in a loss of useful information. In this paper, we propose a novel two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA). 2DBLDA maximizes the matrix-based between-class distance, measured by the weighted pairwise distances of class means, and minimizes the matrix-based within-class distance. The criterion of 2DBLDA is equivalent to optimizing an upper bound of the Bhattacharyya error. The weighting constant between the between-class and within-class terms is determined by the involved data, which makes the proposed 2DBLDA adaptive. The construction of 2DBLDA avoids the small sample size (SSS) problem, is robust, and can be solved through a simple standard eigenvalue decomposition problem. Experimental results on image recognition and face image reconstruction demonstrate the effectiveness of 2DBLDA.

Feature extraction plays an important role in pattern recognition. As a powerful supervised feature extraction method, linear discriminant analysis (LDA) [1] has been successfully applied to many problems, such as face recognition [2, 3], text mining [4, 5], image retrieval [6, 7], gait recognition [8], and microarrays [9, 10]. However, classical LDA is a vector-based, or one-dimensional (1D), method. When input data are naturally in matrix, or two-dimensional (2D), form, such as images, two issues may arise. First, converting 2D data to 1D data may produce high-dimensional vectors and hence lead to the small sample size (SSS) problem [11]; for example, a 32×32 face image corresponds to a 1024-dimensional vector. Second, the transformation from 2D data to 1D data destroys the underlying spatial (structural) information, so useful discriminant information may be lost [12, 13].

To handle these problems, many image-as-matrix methods have been developed [14, 15]. In contrast to image-as-vector methods, image-as-matrix methods treat an image as a second-order tensor, and their objective functions are expressed as functions of the image matrix instead of a high-dimensional image vector. The representative image-as-matrix method is two-dimensional LDA (2DLDA) [16]. 2DLDA constructs the within-class and between-class scatter matrices from the original image samples represented in matrix form, rather than converting the matrices to vectors beforehand. Compared to LDA, 2DLDA can alleviate the SSS problem when a mild condition is satisfied [17] and can preserve the original structure of the input matrix. Thereafter, modifications and improvements of 2DLDA were studied by many researchers. Due to its squared L2-norm nature, however, 2DLDA is sensitive to noise and outliers.
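To make the matrix-based scatter construction concrete, the following is a minimal NumPy sketch of the one-sided (right-projection) variant of this idea; the function names are ours, the small ridge term is a numerical safeguard added for illustration, and [16] additionally learns a left transformation.

```python
import numpy as np

def scatter_2dlda(X, y):
    """Matrix-based scatter for right-side 2DLDA.

    X: (N, d1, d2) stack of image matrices; y: (N,) integer labels.
    Returns (Sb, Sw), both (d2, d2), built from the matrix samples
    directly, without vectorizing the images.
    """
    X, y = np.asarray(X), np.asarray(y)
    mean_all = X.mean(axis=0)                     # overall mean image
    d2 = X.shape[2]
    Sb = np.zeros((d2, d2))
    Sw = np.zeros((d2, d2))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                  # class mean image
        diff = mean_c - mean_all
        Sb += len(Xc) * diff.T @ diff             # between-class scatter
        for Xi in Xc:
            d = Xi - mean_c
            Sw += d.T @ d                         # within-class scatter
    return Sb, Sw

def projection_2dlda(Sb, Sw, r, eps=1e-6):
    """Top-r directions of the generalized problem Sb w = lam Sw w."""
    d2 = Sw.shape[0]
    M = np.linalg.solve(Sw + eps * np.eye(d2), Sb)   # ridge for stability
    vals, vecs = np.linalg.eig(M)
    return vecs[:, np.argsort(-vals.real)[:r]].real  # (d2, r)
```

Note that the scatter matrices here are only d2 × d2, far smaller than the n × n matrices produced by vectorized LDA (32 × 32 versus 1024 × 1024 for 32×32 images), which is why the SSS problem is milder in the matrix setting.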
To improve the robustness of 2DLDA, robust replacements of the L2-norm were studied, including the L1-norm [18-21], the nuclear norm [22, 23], the Lp-norm [24, 25], and the Schatten Lp-norm with 0 < p < 1 [26]. Some studies focused on extracting discriminative transformations on both sides of the matrix samples: the authors in [27, 28] implemented 2DLDA on the matrices sequentially or independently and then combined the left- and right-side transformations to achieve bilateral dimensionality reduction, while Li et al. [25] used iterative schemes to extract the transformations on both sides. Extensions to other machine learning problems and real applications were also investigated; for example, Wang et al. [29] proposed a convolutional 2DLDA for nonlinear dimensionality reduction, and Xiao et al. [30] studied a two-dimensional quaternion sparse discriminant analysis that meets the requirements of representing RGB and RGB-D images.

Although 2DLDA can ease the SSS problem, it may still face the singularity issue in theory, just as LDA does, since it needs to solve a generalized eigenvalue problem. Recently, a novel vector-based L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] was proposed. Compared to LDA, L2BLDA solves a simple standard eigenvalue decomposition problem rather than a generalized one, which avoids the singularity issue and brings robustness. In fact, minimizing the Bhattacharyya error [32] bound is a reasonable way to establish classification [33].

In this paper, inspired by L2BLDA, to cope with the SSS problem and improve the robustness of 2DLDA, we first derive a Bhattacharyya error upper bound for matrix-input classification and then propose a novel two-dimensional linear discriminant analysis, called 2DBLDA, that minimizes this upper bound. The proposed 2DBLDA has the following characteristics:

• 2DBLDA is proposed for the two-dimensional matrix-input problem. Its criterion is derived within the theoretical framework of Bhattacharyya error bound optimality: we prove that the criterion is an upper bound of the Bhattacharyya error and that optimizing this bound leads to an optimal discriminant direction. The rationality of the 2DBLDA optimization problem is therefore guaranteed theoretically.
• The weighting constant between the between-class distance and the within-class distance of 2DBLDA is calculated from the input data, so it adapts to the data involved. This constant not only helps the objective of 2DBLDA achieve the minimum error bound but also frees the proposed 2DBLDA from tuning any parameters. By taking this weighted between-class distance information into account, 2DBLDA also achieves robustness.
• Unlike 2DLDA, 2DBLDA is solved effectively through a standard eigenvalue decomposition problem that does not involve a matrix inverse and hence avoids the SSS problem; see the sketch after this list.
• To observe the discriminant ability of our method, we report the accuracy on different databases, plot the variation of accuracy with the reduced dimension, and measure the reconstruction performance on face images. The experimental results on image recognition and face reconstruction demonstrate the effectiveness of 2DBLDA.
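The following is a minimal NumPy sketch of that solution scheme, assuming the criterion has been assembled into a single symmetric matrix of the form B - delta*Sw; here B, Sw, and delta stand in for the weighted between-class matrix, within-class matrix, and data-determined constant of Section 3, so treat the names as placeholders for illustration rather than the paper's exact construction.

```python
import numpy as np

def solve_standard_eig(B, Sw, delta, r):
    """Maximize tr(W.T (B - delta*Sw) W) subject to W.T W = I.

    A standard symmetric eigendecomposition suffices: no inverse
    of Sw is ever formed, so a rank-deficient Sw (the SSS setting)
    causes no difficulty.
    """
    S = B - delta * Sw
    S = (S + S.T) / 2                      # symmetrize against round-off
    vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    return vecs[:, np.argsort(-vals)[:r]]  # top-r eigenvectors as columns
```

Contrast this with the generalized problem Sb w = λ Sw w behind LDA and 2DLDA, whose textbook solution passes through the inverse of Sw and therefore breaks down when Sw is singular.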
The paper is organized as follows. Section 2 briefly introduces LDA, L2BLDA and 2DLDA. Section 3 proposes our 2DBLDA and gives the corresponding theoretical analysis. Section 4 discusses the relationship between our 2DBLDA and related methods. Section 5 analyses the experimental results. Finally, concluding remarks are given in Section 6. The proof of the Bhattacharyya error upper bound of 2DBLDA is given in the Appendix.

The notations of this paper are given as follows. We consider a supervised learning problem in the space $\mathbb{R}^{d_1 \times d_2}$ of $d_1 \times d_2$ matrices. The training dataset is given by $T = \{(X_1, y_1), \ldots, (X_N, y_N)\}$, where $X_l \in \mathbb{R}^{d_1 \times d_2}$ is the $l$-th input matrix sample and $y_l \in \{1, \ldots, c\}$ is the corresponding label, $l = 1, \ldots, N$. Assume that the $i$-th class contains $N_i$ samples, $i = 1, \ldots, c$, so that $\sum_{i=1}^{c} N_i = N$. Let $\overline{X}$ be the mean of all matrix samples and $\overline{X}_i$ be the mean of the matrix samples in the $i$-th class. For a matrix $Q = (q_1, q_2, \ldots, q_n) \in \mathbb{R}^{m \times n}$, its Frobenius norm (F-norm) is defined as $\|Q\|_F = \sqrt{\sum_{j=1}^{n} \|q_j\|_2^2} = \sqrt{\mathrm{tr}(Q^\top Q)}$. The F-norm is a natural generalization of the vector L2-norm to matrices.

Linear discriminant analysis (LDA) finds a projection transformation matrix $W$ such that the ratio of the between-class distance to the within-class distance is maximized in the projected space. For data in $\mathbb{R}^n$, LDA finds an optimal $W \in \mathbb{R}^{n \times r}$, $r \le n$, such that the most discriminant information of the data is retained in $\mathbb{R}^r$ by solving the following problem:

$$\max_{W}\ \mathrm{tr}\left((W^\top S_w W)^{-1}(W^\top S_b W)\right), \qquad (1)$$

where $\mathrm{tr}(\cdot)$ is the trace operation of a matrix, and the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined by

$$S_b = \sum_{i=1}^{c} N_i (\overline{x}_i - \overline{x})(\overline{x}_i - \overline{x})^\top$$

and

$$S_w = \sum_{i=1}^{c} \sum_{s=1}^{N_i} (x_{is} - \overline{x}_i)(x_{is} - \overline{x}_i)^\top,$$

where $\overline{x}_i \in \mathbb{R}^n$ is the mean of the samples in the $i$-th class, $\overline{x} \in \mathbb{R}^n$ is the mean of the whole data, and $x_{is} \in \mathbb{R}^n$ is the $s$-th sample of the $i$-th class. The optimization problem (1) is equivalent to the generalized eigenvalue problem $S_b w = \lambda S_w w$ with $\lambda \neq 0$, and its solution $W = (w_1, \ldots, w_r)$ is given by the eigenvectors corresponding to the first $r$ largest eigenvalues of $S_w^{-1} S_b$, in case $S_w$ is nonsingular.

As an improvement over LDA, the L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] is a recently proposed vector-based weighted linear discriminant analysis. In the vector space $\mathbb{R}^n$, by minimizing an upper bound of the Bhattacharyya error, the optimization problem of L2BLDA is formulated as

$$\min_{W^\top W = I}\ -\sum_{i<j} \sqrt{P_i P_j}\, \|W^\top(\overline{x}_i - \overline{x}_j)\|_2^2 + \Delta \sum_{i=1}^{c} \sum_{s=1}^{N_i} \frac{\sqrt{P_i}}{N_i} \|W^\top(x_{is} - \overline{x}_i)\|_2^2,$$

where $W \in \mathbb{R}^{n \times r}$, $r \le n$, $P_i = \frac{N_i}{N}$, $P_j = \frac{N_j}{N}$, $\overline{x}_i \in \mathbb{R}^n$ is the mean of the samples in the $i$-th class, $x_{is} \in \mathbb{R}^n$ is the $s$-th sample of the $i$-th class, and $\Delta = \frac{1}{4} \sum_{i<j}^{c} \cdots$ is the weighting constant determined by the involved data.
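As a rough illustration of this weighted criterion, the sketch below evaluates a between-class term built from prior-weighted pairwise distances of projected class means and a within-class term built from distances to the projected class means; the particular weights are illustrative assumptions here, since the exact weighting and the constant $\Delta$ follow the derivation in [31].

```python
import numpy as np
from itertools import combinations

def l2blda_objective(X, y, W, delta):
    """Evaluate a Bhattacharyya-bound-style weighted LDA criterion.

    X: (N, n) vector samples; y: (N,) integer labels; W: (n, r) with
    orthonormal columns; delta: the data-determined weighting constant.
    Smaller values are better (the criterion is minimized).
    """
    X, y = np.asarray(X), np.asarray(y)
    N = len(y)
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.sum(y == c) / N for c in classes}

    # Prior-weighted pairwise distances of projected class means.
    between = sum(
        np.sqrt(priors[i] * priors[j])
        * np.sum((W.T @ (means[i] - means[j])) ** 2)
        for i, j in combinations(classes, 2))

    # Weighted within-class scatter in the projected space.
    within = sum(
        np.sqrt(priors[c]) / np.sum(y == c)
        * np.sum(((X[y == c] - means[c]) @ W) ** 2)
        for c in classes)

    return -between + delta * within
```

2DBLDA keeps this same two-term structure but replaces the vector samples and means with matrices and measures the distances in the Frobenius norm.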