key: cord-0805705-07mr2j4w
authors: Guo, Yan-Ru; Bai, Yan-Qin; Li, Chun-Na; Bai, Lan; Shao, Yuan-Hai
title: Two-dimensional Bhattacharyya bound linear discriminant analysis with its applications
date: 2021-11-05
journal: Appl Intell (Dordr)
DOI: 10.1007/s10489-021-02843-z
sha: 3c912162b9332a9c8f862d9f516fb1c650f5ef63
doc_id: 805705
cord_uid: 07mr2j4w

The recently proposed L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) is an effective improvement over linear discriminant analysis (LDA), but it handles only vector input samples. When faced with two-dimensional (2D) inputs such as images, converting the 2D data to vectors, regardless of the inherent structure of the image, may result in a loss of useful information. In this paper, we propose a novel two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA). 2DBLDA maximizes the matrix-based between-class distance, measured by the weighted pairwise distances of class means, and minimizes the matrix-based within-class distance. The criterion of 2DBLDA is equivalent to optimizing an upper bound of the Bhattacharyya error. The weighting constant between the between-class and within-class terms is determined by the involved data, which makes the proposed 2DBLDA adaptive. The construction of 2DBLDA avoids the small sample size (SSS) problem, is robust, and can be solved through a simple standard eigenvalue decomposition problem. Experimental results on image recognition and face image reconstruction demonstrate the effectiveness of 2DBLDA.

Feature extraction plays an important role in pattern recognition. As a powerful supervised feature extraction method, linear discriminant analysis (LDA) [1] has been successfully applied to many problems, such as face recognition [2, 3], text mining [4, 5], image retrieval [6, 7], gait recognition [8], and microarrays [9, 10]. However, classical LDA is a vector-based, or one-dimensional (1D), method. When input data are naturally in matrix, or two-dimensional (2D), form, such as images, two issues may arise. First, converting 2D data to 1D data may produce high-dimensional vectors and hence lead to the small sample size (SSS) problem [11]; for example, a 32×32 face image corresponds to a 1024-dimensional vector. Second, the transformation from 2D data to 1D data destroys the underlying spatial (structural) information, so useful discriminant information may be lost [12, 13].

To handle these problems, many image-as-matrix methods have been developed [14, 15]. In contrast to image-as-vector methods, image-as-matrix methods treat an image as a second-order tensor, and their objective functions are expressed as functions of the image matrix instead of a high-dimensional image vector. The representative image-as-matrix method is two-dimensional LDA (2DLDA) [16]. 2DLDA constructs the within-class and between-class scatter matrices from the original image samples represented in matrix form, rather than converting the matrices to vectors beforehand. Compared to LDA, 2DLDA can alleviate the SSS problem when a mild condition is satisfied [17] and can preserve the original structure of the input matrix. Thereafter, modifications and improvements of 2DLDA were studied by many researchers. Due to its squared L2-norm nature, however, 2DLDA is sensitive to noise and outliers.
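To make the matrix-based scatter construction concrete, the following is a minimal NumPy sketch of the one-sided (right-projection) variant of this idea; the function names are ours, the small ridge term is a numerical safeguard added for illustration, and [16] additionally learns a left transformation.

```python
import numpy as np

def scatter_2dlda(X, y):
    """Matrix-based scatter for right-side 2DLDA.

    X: (N, d1, d2) stack of image matrices; y: (N,) integer labels.
    Returns (Sb, Sw), both (d2, d2), built from the matrix samples
    directly, without vectorizing the images.
    """
    X, y = np.asarray(X), np.asarray(y)
    mean_all = X.mean(axis=0)                     # overall mean image
    d2 = X.shape[2]
    Sb = np.zeros((d2, d2))
    Sw = np.zeros((d2, d2))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                  # class mean image
        diff = mean_c - mean_all
        Sb += len(Xc) * diff.T @ diff             # between-class scatter
        for Xi in Xc:
            d = Xi - mean_c
            Sw += d.T @ d                         # within-class scatter
    return Sb, Sw

def projection_2dlda(Sb, Sw, r, eps=1e-6):
    """Top-r directions of the generalized problem Sb w = lam Sw w."""
    d2 = Sw.shape[0]
    M = np.linalg.solve(Sw + eps * np.eye(d2), Sb)   # ridge for stability
    vals, vecs = np.linalg.eig(M)
    return vecs[:, np.argsort(-vals.real)[:r]].real  # (d2, r)
```

Note that the scatter matrices here are only d2 × d2, far smaller than the n × n matrices produced by vectorized LDA (32 × 32 versus 1024 × 1024 for 32×32 images), which is why the SSS problem is milder in the matrix setting.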
To improve the robustness of 2DLDA, robust replacements of the L2-norm were studied, including the L1-norm [18-21], the nuclear norm [22, 23], the Lp-norm [24, 25], and the Schatten Lp-norm with 0 < p < 1 [26]. Some studies focused on extracting discriminative transformations on both sides of the matrix samples: the authors in [27, 28] implemented 2DLDA on the matrices sequentially or independently and then combined the left- and right-side transformations to achieve bilateral dimensionality reduction, while Li et al. [25] used iterative schemes to extract the transformations on both sides. Extensions to other machine learning problems and real applications were also investigated; for example, Wang et al. [29] proposed a convolutional 2DLDA for nonlinear dimensionality reduction, and Xiao et al. [30] studied a two-dimensional quaternion sparse discriminant analysis that meets the requirements of representing RGB and RGB-D images.

Although 2DLDA can ease the SSS problem, it may still face the singularity issue in theory, just as LDA does, since it needs to solve a generalized eigenvalue problem. Recently, a novel vector-based L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] was proposed. Compared to LDA, L2BLDA solves a simple standard eigenvalue decomposition problem rather than a generalized one, which avoids the singularity issue and brings robustness. In fact, minimizing the Bhattacharyya error [32] bound is a reasonable way to establish classification [33].

In this paper, inspired by L2BLDA, to cope with the SSS problem and improve the robustness of 2DLDA, we first derive a Bhattacharyya error upper bound for matrix-input classification and then propose a novel two-dimensional linear discriminant analysis, called 2DBLDA, that minimizes this upper bound. The proposed 2DBLDA has the following characteristics:

• 2DBLDA is proposed for the two-dimensional matrix-input problem. Its criterion is derived within the theoretical framework of Bhattacharyya error bound optimality: we prove that the criterion is an upper bound of the Bhattacharyya error and that optimizing this bound leads to an optimal discriminant direction. The rationality of the 2DBLDA optimization problem is therefore guaranteed theoretically.
• The weighting constant between the between-class distance and the within-class distance of 2DBLDA is calculated from the input data, so it adapts to the data involved. This constant not only helps the objective of 2DBLDA achieve the minimum error bound but also frees the proposed 2DBLDA from tuning any parameters. By taking this weighted between-class distance information into account, 2DBLDA also achieves robustness.
• Unlike 2DLDA, 2DBLDA is solved effectively through a standard eigenvalue decomposition problem that does not involve a matrix inverse and hence avoids the SSS problem; see the sketch after this list.
• To observe the discriminant ability of our method, we report the accuracy on different databases, plot the variation of accuracy with the reduced dimension, and measure the reconstruction performance on face images. The experimental results on image recognition and face reconstruction demonstrate the effectiveness of 2DBLDA.
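The following is a minimal NumPy sketch of that solution scheme, assuming the criterion has been assembled into a single symmetric matrix of the form B - delta*Sw; here B, Sw, and delta stand in for the weighted between-class matrix, within-class matrix, and data-determined constant of Section 3, so treat the names as placeholders for illustration rather than the paper's exact construction.

```python
import numpy as np

def solve_standard_eig(B, Sw, delta, r):
    """Maximize tr(W.T (B - delta*Sw) W) subject to W.T W = I.

    A standard symmetric eigendecomposition suffices: no inverse
    of Sw is ever formed, so a rank-deficient Sw (the SSS setting)
    causes no difficulty.
    """
    S = B - delta * Sw
    S = (S + S.T) / 2                      # symmetrize against round-off
    vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    return vecs[:, np.argsort(-vals)[:r]]  # top-r eigenvectors as columns
```

Contrast this with the generalized problem Sb w = λ Sw w behind LDA and 2DLDA, whose textbook solution passes through the inverse of Sw and therefore breaks down when Sw is singular.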
The paper is organized as follows. Section 2 briefly introduces LDA, L2BLDA and 2DLDA. Section 3 proposes our 2DBLDA and gives the corresponding theoretical analysis. Section 4 discusses the relationship between our 2DBLDA and related methods. Section 5 analyses the experimental results. Finally, concluding remarks are given in Section 6. The proof of the Bhattacharyya error upper bound of 2DBLDA is given in the Appendix.

The notations of this paper are given as follows. We consider a supervised learning problem in the space $\mathbb{R}^{d_1 \times d_2}$ of $d_1 \times d_2$ matrices. The training dataset is given by $T = \{(X_1, y_1), \ldots, (X_N, y_N)\}$, where $X_l \in \mathbb{R}^{d_1 \times d_2}$ is the $l$-th input matrix sample and $y_l \in \{1, \ldots, c\}$ is the corresponding label, $l = 1, \ldots, N$. Assume that the $i$-th class contains $N_i$ samples, $i = 1, \ldots, c$, so that $\sum_{i=1}^{c} N_i = N$. Let $\overline{X}$ be the mean of all matrix samples and $\overline{X}_i$ be the mean of the matrix samples in the $i$-th class. For a matrix $Q = (q_1, q_2, \ldots, q_n) \in \mathbb{R}^{m \times n}$, its Frobenius norm (F-norm) is defined as $\|Q\|_F = \sqrt{\sum_{j=1}^{n} \|q_j\|_2^2} = \sqrt{\mathrm{tr}(Q^\top Q)}$. The F-norm is a natural generalization of the vector L2-norm to matrices.

Linear discriminant analysis (LDA) finds a projection transformation matrix $W$ such that the ratio of the between-class distance to the within-class distance is maximized in the projected space. For data in $\mathbb{R}^n$, LDA finds an optimal $W \in \mathbb{R}^{n \times r}$, $r \le n$, such that the most discriminant information of the data is retained in $\mathbb{R}^r$ by solving the following problem:

$$\max_{W}\ \mathrm{tr}\left((W^\top S_w W)^{-1}(W^\top S_b W)\right), \qquad (1)$$

where $\mathrm{tr}(\cdot)$ is the trace operation of a matrix, and the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined by

$$S_b = \sum_{i=1}^{c} N_i (\overline{x}_i - \overline{x})(\overline{x}_i - \overline{x})^\top$$

and

$$S_w = \sum_{i=1}^{c} \sum_{s=1}^{N_i} (x_{is} - \overline{x}_i)(x_{is} - \overline{x}_i)^\top,$$

where $\overline{x}_i \in \mathbb{R}^n$ is the mean of the samples in the $i$-th class, $\overline{x} \in \mathbb{R}^n$ is the mean of the whole data, and $x_{is} \in \mathbb{R}^n$ is the $s$-th sample of the $i$-th class. The optimization problem (1) is equivalent to the generalized eigenvalue problem $S_b w = \lambda S_w w$ with $\lambda \neq 0$, and its solution $W = (w_1, \ldots, w_r)$ is given by the eigenvectors corresponding to the first $r$ largest eigenvalues of $S_w^{-1} S_b$, in case $S_w$ is nonsingular.

As an improvement over LDA, the L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] is a recently proposed vector-based weighted linear discriminant analysis. In the vector space $\mathbb{R}^n$, by minimizing an upper bound of the Bhattacharyya error, the optimization problem of L2BLDA is formulated as

$$\min_{W^\top W = I}\ -\sum_{i<j} \sqrt{P_i P_j}\, \|W^\top(\overline{x}_i - \overline{x}_j)\|_2^2 + \Delta \sum_{i=1}^{c} \sum_{s=1}^{N_i} \frac{\sqrt{P_i}}{N_i} \|W^\top(x_{is} - \overline{x}_i)\|_2^2,$$

where $W \in \mathbb{R}^{n \times r}$, $r \le n$, $P_i = \frac{N_i}{N}$, $P_j = \frac{N_j}{N}$, $\overline{x}_i \in \mathbb{R}^n$ is the mean of the samples in the $i$-th class, $x_{is} \in \mathbb{R}^n$ is the $s$-th sample of the $i$-th class, and $\Delta = \frac{1}{4} \sum_{i<j}^{c} \cdots$ is the weighting constant determined by the involved data.
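As a rough illustration of this weighted criterion, the sketch below evaluates a between-class term built from prior-weighted pairwise distances of projected class means and a within-class term built from distances to the projected class means; the particular weights are illustrative assumptions here, since the exact weighting and the constant $\Delta$ follow the derivation in [31].

```python
import numpy as np
from itertools import combinations

def l2blda_objective(X, y, W, delta):
    """Evaluate a Bhattacharyya-bound-style weighted LDA criterion.

    X: (N, n) vector samples; y: (N,) integer labels; W: (n, r) with
    orthonormal columns; delta: the data-determined weighting constant.
    Smaller values are better (the criterion is minimized).
    """
    X, y = np.asarray(X), np.asarray(y)
    N = len(y)
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.sum(y == c) / N for c in classes}

    # Prior-weighted pairwise distances of projected class means.
    between = sum(
        np.sqrt(priors[i] * priors[j])
        * np.sum((W.T @ (means[i] - means[j])) ** 2)
        for i, j in combinations(classes, 2))

    # Weighted within-class scatter in the projected space.
    within = sum(
        np.sqrt(priors[c]) / np.sum(y == c)
        * np.sum(((X[y == c] - means[c]) @ W) ** 2)
        for c in classes)

    return -between + delta * within
```

2DBLDA keeps this same two-term structure but replaces the vector samples and means with matrices and measures the distances in the Frobenius norm.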