2018 International Conference on Sensor Network and Computer Engineering (ICSNCE 2018)

Fast Aerial UAV Detection Based on Image Segmentation and HOG-FLD Feature Fusion

Li Xiaoping, School of Computer Science and Engineering, Xi'an Technological University, Xi'an, 710021, Shaanxi, e-mail: 919083845@qq.com
Lei Songze, School of Computer Science and Engineering, Xi'an Technological University, Xi'an, 710021, Shaanxi, e-mail: lei_sz@163.com
Wang Yanhong, School of Science, Xi'an Technological University, Xi'an, 710021, Shaanxi, e-mail: 29314998@qq.com
Xiao Feng, Tian Penghui*, School of Computer Science and Engineering, Xi'an Technological University, Xi'an, 710021, Shaanxi, e-mail: 544070146@qq.com, 809576423@qq.com

Abstract—To detect non-cooperative target UAVs quickly and accurately, a novel UAV detection method based on graph-theoretic image segmentation and HOG-FLD feature fusion is presented in this paper. To avoid a time-consuming exhaustive search, candidate UAV regions are obtained through selective search over an image segmentation guided by region similarity; features are then extracted by fusing the histogram of oriented gradients (HOG) with Fisher linear discriminant (FLD) analysis and used to train an SVM classifier with good generalization ability to identify the UAV. The method can detect UAVs quickly and accurately against complicated backgrounds and under varying positions and angles. Compared with the sliding-window HOG+SVM method based on image segmentation, the experimental results show that the speed of this method is clearly improved at comparable recognition accuracy.

Keywords-Image Segmentation; Graph Theory; HOG; FLD; SVM

I. INTRODUCTION

In recent years, UAVs have been widely used in the military field. Owing to their small size, low cost, low noise, high maneuverability, high concealment and stability, UAVs can be applied in many fields: reconnoitering enemy troops, detecting danger zones, tracking targets, electronic interference, communication relay, and even carrying out attack missions with small offensive weapons. Detecting non-cooperative UAVs is therefore necessary. Autonomously identifying aerial non-cooperative UAVs and monitoring national border areas in real time would enable economical, wide-area border surveillance and contribute to national and public security.

Research on UAV detection by domestic and foreign scholars is still rare. The main difficulty is that a UAV's appearance in an image is strongly affected by its position, viewing angle, distance from the camera and its own structure, which makes the UAV hard to detect. This problem closely resembles the general difficulty of target detection and requires further research and practice. At present, the common target detection methods are the following: methods based on the optical flow field [1], the inter-frame difference method, and background-subtraction-based detection. Optical flow detection is a moving-target method whose basic principle is that each pixel of a moving target is assigned a velocity vector so that the image forms an optical flow field, in which the background is clearly distinguished from the moving target's vectors.
Meanwhile, the target's location can be obtained by analyzing the dynamic image. This method suits all kinds of backgrounds and imposes no requirements on them. Its drawbacks are that the number of pixels involved is large, the computed optical flow field is widely distributed and inaccurate, and the computation is heavy. The inter-frame difference method [2] obtains the target foreground by thresholding the pixel-wise difference between two adjacent frames; its computation is small, but its recognition ability is poor when the illumination changes.

To ensure accurate and complete UAV detection, the traditional sliding-window approach requires the sample images and the test image to be rescaled repeatedly until the target in the test image is no larger than the template, so that UAVs larger than the template can still be detected after the whole test image is resized. However, the HOG feature [3-5] is extracted after each rescaling: a sliding window [3] must traverse the whole image to obtain the high-dimensional edge-contour feature, so the traversal takes a long time, descriptor generation is slow, and every scale level requires a large amount of computation.

This paper proposes a UAV detection method based on graph-theoretic image segmentation and HOG-FLD feature fusion. The method has a training stage and a test stage. In the training stage, the positive and negative sample images collected in advance are normalized to a uniform pixel size and converted to gray scale; HOG-FLD feature fusion then yields a low-dimensional feature vector that stably describes the UAV contour. Finally, the feature vectors are fed to a support vector machine, and the SVM classifier is trained through statistical learning. In the test stage, candidate regions are screened out by the graph-based segmentation method and their features are extracted in the same way as in training; the features are then fed to the trained classifier to decide whether a candidate region contains a UAV. The image segmentation merges regions using the connected components of a graph [6] or its minimum spanning tree, deciding whether to merge two regions mainly according to their similarity. This approach not only captures the global structure of the image and yields UAV candidate regions, but is also fast to compute and efficient to process. In the feature extraction stage, the features obtained by combining the histogram of oriented gradients (HOG) with Fisher linear discriminant analysis (FLD) are selected as the input to the statistically trained classifier.
Because the features extracted by HOG-FLD fusion are low-dimensional, robust and discriminative, and are little affected by local illumination changes, background changes, and changes of target position and angle, it is easier to train a classifier with strong generalization ability.

II. SEGMENTATION METHOD BASED ON GRAPH THEORY TO OBTAIN REGIONS OF INTEREST

A. Segmentation Method Based on Graph Theory

Image segmentation separates an image into regions with different characteristics and separates the objects of interest from the background within the segmented blocks, so that objects and background show clear contrast under human visual inspection. Image segmentation plays an important role in further image analysis such as image compression and image recognition. In this paper, graph-based image segmentation and region merging are used to segment the image. Graph-based image segmentation treats the image as a weighted undirected graph and segments it using the connected components of the graph [6], producing region blocks. Region merging then merges the segmented blocks according to specific merging rules. Because the UAV and the background differ greatly in texture, color, size and fit, blocks with very different characteristics must be kept apart and blocks with small differences must be merged [11], following the principle of optimal segmentation [10].

B. Acquisition Principle of UAV Regions of Interest

An image contains information such as shape, size, color and texture. The image is segmented after being traversed by a graph search algorithm. In graph theory, the image-to-graph mapping [12] is defined as follows: $G = (V, E)$ is an undirected graph with vertex set $V$ and edge set $E$. Each edge $(v_i, v_j) \in E$ carries a weight $w(v_i, v_j) > 0$. Each vertex represents a pixel of the image to be processed, and the weight of an edge $(v_i, v_j)$ represents the gray-level difference, the distance or some other dissimilarity between the adjacent pixels $v_i$ and $v_j$. Using the Graph Cuts algorithm [7], the graph $G$ is divided into several disjoint regions, each defined as $G' = (V', E')$ with $E'$ a subset of $E$. The regions are segmented by the Minimum Spanning Tree (MST) algorithm [14]: the internal difference of the pixels in one region is measured by the maximum-weight edge of its MST, defined as

$$Int(C) = \max_{e \in MST(C, E)} w(e), \quad C \subseteq V \tag{1}$$

where $C$ is a generated region and $MST(C, E)$ is the minimum spanning tree [17] that $C$ generates in $E$.
The dissimilarity between two segmented regions can be understood as the minimum-weight edge connecting their vertices, defined as

$$Dif(C_1, C_2) = \min_{v_i \in C_1,\, v_j \in C_2,\, (v_i, v_j) \in E} w(v_i, v_j) \tag{2}$$

When there is no edge connecting the two regions, $Dif(C_1, C_2) = \infty$. The condition for a boundary between them is defined as

$$D(C_1, C_2) = \begin{cases} true, & \text{if } Dif(C_1, C_2) > MInt(C_1, C_2) \\ false, & \text{otherwise} \end{cases} \tag{3}$$

If the expression is true, a boundary exists and the regions stay separate; otherwise they are merged. The minimum internal difference is defined as

$$MInt(C_1, C_2) = \min\big(Int(C_1) + \tau(C_1),\; Int(C_2) + \tau(C_2)\big) \tag{4}$$

where $\tau(\cdot)$ is a threshold function controlling the degree of merging, defined as $\tau(C) = k / |C|$, with $|C|$ the number of vertices in region $C$ and $k$ a constant parameter controlling the coarseness of the segmentation. The segmentation effect is shown in Figure 1(b). Merging small regions means checking whether each segmented region meets the regional boundary condition; if not, the regions are merged. The boundary condition is measured by the following four kinds of similarity in the image:

1) Color similarity [13] $S_{colour}(r_i, r_j)$: after normalization, a 25-bin one-dimensional color histogram is computed for each of the three color channels, and the three components are concatenated, so each region is represented as a 75-dimensional feature vector $C_i = \{c_i^1, \ldots, c_i^n\}$. The color similarity of two regions is

$$S_{colour}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k) \tag{5}$$

and the histogram of a merged region is propagated as

$$C_t = \frac{size(r_i)\, C_i + size(r_j)\, C_j}{size(r_i) + size(r_j)} \tag{6}$$

where $size(r_i)$ is the number of pixels in region $r_i$, and the new merged region contains $size(r_i) + size(r_j)$ pixels.

2) Texture similarity [13] $S_{texture}(r_i, r_j)$: Gaussian derivatives are computed in 8 directions for each of the three color channels separately (with $\sigma = 1$), and a 10-bin one-dimensional histogram is obtained for each direction after normalization, so each region is represented as a 240-dimensional vector $T_i = \{t_i^1, \ldots, t_i^n\}$. The texture similarity is

$$S_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k) \tag{7}$$

3) Size similarity $S_{size}(r_i, r_j)$, used to merge smaller regions as early as possible:

$$S_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)} \tag{8}$$

where $im$ refers to the whole image.

4) Fill similarity $fill(r_i, r_j)$, used to merge overlapping or tightly fitting regions as soon as possible:

$$fill(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)} \tag{9}$$

where $BB_{ij}$ is the bounding box around $r_i$ and $r_j$. The similarity set $S$ is obtained by combining these four kinds of similarity:

$$s(r_i, r_j) = a_1 S_{colour}(r_i, r_j) + a_2 S_{texture}(r_i, r_j) + a_3 S_{size}(r_i, r_j) + a_4\, fill(r_i, r_j) \tag{10}$$

where $a_i \in \{0, 1\}$ indicates whether each similarity measure is used.
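As a concrete illustration of Eqs. (5)-(10), the following is a minimal Python sketch of the similarity computation, assuming the region histograms, pixel counts and joint bounding-box size have already been computed as described above. The function and variable names are illustrative, not from the paper, whose implementation was in Matlab.

```python
import numpy as np

def s_colour(c_i, c_j):
    # Eq. (5): histogram intersection of the 75-dim color histograms
    return np.minimum(c_i, c_j).sum()

def s_texture(t_i, t_j):
    # Eq. (7): histogram intersection of the 240-dim texture histograms
    return np.minimum(t_i, t_j).sum()

def s_size(n_i, n_j, n_im):
    # Eq. (8): encourages merging small regions early
    return 1.0 - (n_i + n_j) / n_im

def s_fill(n_i, n_j, n_bbox, n_im):
    # Eq. (9): n_bbox is the pixel count of the joint bounding box;
    # encourages merging regions that fit tightly together
    return 1.0 - (n_bbox - n_i - n_j) / n_im

def similarity(c_i, t_i, n_i, c_j, t_j, n_j, n_bbox, n_im,
               a=(1, 1, 1, 1)):
    # Eq. (10): weighted combination; a_k in {0, 1} switches terms on/off
    return (a[0] * s_colour(c_i, c_j)
            + a[1] * s_texture(t_i, t_j)
            + a[2] * s_size(n_i, n_j, n_im)
            + a[3] * s_fill(n_i, n_j, n_bbox, n_im))
```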
After the similarities of adjacent segmented regions in the set $S$ are computed and sorted, the two region blocks $r_i$ and $r_j$ with the largest similarity are merged into a new region $r_t$, and the similarities involving $r_i$ and $r_j$ are removed from $S$. Following the same step, the similarities between the new merged region $r_t$ and its current neighbors are then computed and added back to $S$. At the same time, $r_t$ is added to the region set $R$, and each region is finally labeled with a rectangular box. The segmentation effect is shown in Figure 1(c) and (d).

Figure 1. Image segmentation: a) original image; b) segmentation by the Graph Cuts algorithm; c) segmentation by the algorithm in this paper; d) segmentation after adjusting parameters.

III. TARGET UAV DETECTION BASED ON HOG-FLD FEATURE FUSION AND SVM

In practice the method is divided into two stages: training and testing. In the training stage, a window of fixed shape and size traverses the UAV and non-UAV sample images, and features are extracted by the HOG-FLD fusion method, yielding a computable feature vector that describes the image contour, is robust, and is easy to classify. The SVM classifier [8-9] for UAV detection is obtained by statistical analysis and learning on these feature vectors. In the testing stage, the same feature extraction as in training is first applied to the regions to be detected, and the candidate targets in these regions are then classified by the trained SVM classifier to determine whether each candidate region contains a UAV.

A. Feature Extraction by HOG-FLD Feature Fusion

Because the edge contour of a UAV is stable and distinctive, the HOG feature, which describes contours well, is extracted from the data set. However, the high dimensionality of the HOG feature vector is unfavorable for classifier training. To train a classifier with good classification ability, the extracted HOG feature is reduced in dimension by FLD [18], finally yielding a low-dimensional feature vector. Therefore, feature extraction based on HOG-FLD fusion is adopted in this paper.

The basis of the feature extraction is the histogram of oriented gradients (HOG). The algorithm proceeds as follows: compute the gradient of the selected image; divide the whole image into rectangular cells of fixed, equal size, each containing m × m pixels; quantize the gradient orientations of each cell into 9 unsigned (or 18 signed) channels and vote into the orientation histogram of each direction, with the previously computed gradient magnitude as the voting weight. The cells are grouped into fixed blocks of the same size, each block containing n × n cells, and the local feature vector of each block is normalized to reduce the effect of illumination on the result. The feature vectors of all blocks are concatenated to form the HOG feature vector of the image.
The steps for extracting the HOG feature are as follows: gray the UAV sample image; apply Gamma correction to the input image so that it reaches a standard contrast in color space, reducing the effects of local shadows and illumination changes; divide the image into cells and group a fixed number of cells into equal-sized blocks as described above; quantize the gradient orientations according to the rules above; compute the cell features within each block; and finally concatenate all blocks to obtain the feature vector of the whole target UAV image. Formula (11) normalizes a block histogram, formulas (12) and (13) compute the gradient components of each pixel, and formulas (14) and (15) compute the magnitude and direction of the gradient:

$$v_g' = v_g / (\|v_g\|_1 + \varepsilon) \tag{11}$$

$$G_x(x, y) = pi(x+1, y) - pi(x-1, y) \tag{12}$$

$$G_y(x, y) = pi(x, y+1) - pi(x, y-1) \tag{13}$$

$$S(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \tag{14}$$

$$\theta(x, y) = \arctan\big(G_y(x, y) / G_x(x, y)\big) \tag{15}$$

where $v_g'$ is the normalized histogram and $v_g$ the extracted histogram vector; $pi(x+1, y)$, $pi(x-1, y)$, $pi(x, y+1)$ and $pi(x, y-1)$ denote the values at the four neighboring pixel positions; $G_x(x, y)$ and $G_y(x, y)$ denote the gradient components in the horizontal and vertical directions; and $S(x, y)$ and $\theta(x, y)$ denote the magnitude and angle of the gradient vector.

In this paper, a 64 × 128 pixel window scans the sample images and the image to be detected, with a scanning step of 8 pixels in both the horizontal and vertical directions. The window is divided into 8 × 8 pixel cells, giving 8 × 16 = 128 cells. Each group of four adjacent cells (2 × 2, i.e. a 16 × 16 pixel block) forms a block, so a window contains 105 blocks. Following the HOG computation steps, a window containing 105 pixel blocks therefore yields a 3780-dimensional HOG feature descriptor. The layout is illustrated in Figure 2.

Figure 2. Principle diagram of the HOG algorithm: 8 × 8 pixel cells, 16 × 16 pixel blocks, a 64 × 128 pixel window, and a step of 8 pixels.

On the basis of the HOG feature, a linear subspace is constructed by Fisher Linear Discriminant Analysis (FLD) [18]. By computing the optimal projection matrix, a projection for feature extraction of the training set is obtained, and the cosine similarity $S_{\cos}$ of the projected vectors is used as the similarity measure. The purpose is to reduce the within-class scatter $S_W$ and increase the between-class scatter $S_B$ as much as possible; that is, within the training set, to bring sample data of the same class as close together as possible and push sample data of different classes as far apart as possible. In this way, features with good classification ability are extracted.
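To make the pipeline above concrete, here is a minimal sketch of the fused feature extraction, assuming scikit-image's hog function as a stand-in for the paper's Matlab implementation; the projection matrix W_opt is assumed to have been learned beforehand by the FLD procedure formalized below (Eqs. (16)-(21)).

```python
import numpy as np
from skimage.feature import hog

def hog_fld_feature(window, W_opt):
    """window: 128x64 grayscale array (the 64x128 pixel window above);
    W_opt: FLD projection matrix of shape (3780, k), assumed learned
    on the training set as described in the next subsection."""
    # 8x8 cells, 2x2 cells per block, 9 orientation bins, L1 block
    # normalization as in Eq. (11) -> 105 blocks x 36 dims = 3780 dims
    h = hog(window, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm='L1')
    assert h.shape == (3780,)
    return W_opt.T @ h          # Eq. (20): y = W_opt^T x

def cosine_similarity(a, b):
    # Eq. (21): similarity measure used for the projected vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```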
For a problem with $c$ classes, the between-class scatter $S_B$ and within-class scatter $S_W$ are defined as

$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T \tag{16}$$

$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T \tag{17}$$

where $\mu_i$ denotes the mean of class $i$, $\mu$ the mean of all samples, and $N_i$ the number of samples of class $i$. The optimal projection matrix $W_{opt}$ is obtained by solving the optimization problem of formula (18), where $S_W$ must be nonsingular (that is, the total number of training samples $N$ must be greater than the feature dimension of the UAV images):

$$W_{opt} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} \tag{18}$$

$W_{opt}$ can also be obtained by solving the generalized eigenvalue problem of formula (19):

$$S_B W = \lambda S_W W \tag{19}$$

To avoid a singular within-class scatter matrix $S_W$, principal component analysis (PCA) is first used to reduce the dimension of the feature space (to $N - c$), and Fisher linear discriminant analysis (FLD) is then applied. The projection vector $y$ of a test sample $x$ is obtained from formula (20):

$$y = W_{opt}^T x \tag{20}$$

The cosine similarity $S_{\cos}$ is used as the similarity measure of the projected vectors, where the cosine similarity of vectors $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ is defined as

$$S_{\cos} = \frac{\langle A, B \rangle}{\|A\|_2 \|B\|_2} = \frac{\sum_{i=1}^{n} a_i b_i}{\|A\|_2 \|B\|_2} \tag{21}$$

B. Support Vector Machine

Because the support vector machine (SVM) proposed by Vapnik has a simple structure, global optimality, good generalization, and short training and prediction times [9], this paper uses SVM as the machine learning tool to learn the regularities of the samples, so as to learn the sample features quickly and classify them accurately. The main idea of SVM is to handle linear inseparability in the original space by mapping the data to a high-dimensional space with a kernel function (here a polynomial kernel). For two-class classification, sample features such as HOG are first extracted in the original space and then represented as vectors in the high-dimensional space. To minimize the error rate of the two-class problem, a hyperplane separating the two classes is sought in the high-dimensional space. Let the sample set be $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, N$, where $x_i \in R^e$ and $y_i \in \{0, 1\}$ is the class label.
In the $e$-dimensional space, the linear discriminant function is

$$g(x) = w \cdot x + b \tag{22}$$

and the classification surface equation is

$$w \cdot x + b = 0 \tag{23}$$

After normalizing the discriminant function, the two classes of samples must satisfy

$$|g(x)| \geq 1 \tag{24}$$

The classification margin is $2 / \|w\|$, so maximizing the margin requires minimizing $\|w\|$; meanwhile all samples must be correctly classified, which requires

$$y_i[(w \cdot x_i) + b] - 1 \geq 0 \tag{25}$$

An SVM with inner-product kernel $k(x_i, x_j)$ is constructed by the following formulas (which can be understood as finding the extremum of a quadratic function under linear constraints):

$$Q(a) = \sum_{i=1}^{N} a_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j y_i y_j k(x_i, x_j) \tag{26}$$

subject to the constraints

$$\sum_{i=1}^{N} a_i y_i = 0, \quad 0 \leq a_i \leq C \tag{27}$$

The resulting support vector machine decision function is

$$f(x) = \mathrm{sgn}\Big(\sum_{i=1}^{N} a_i y_i k(x_i, x) + b^*\Big) \tag{28}$$

where $b^*$ is a constant parameter indicating the classification threshold.

C. Sample Preparation and Classifier Training

In this paper, 900 positive and 900 negative sample images are used, and 500 original positive images are used to estimate the aspect ratio of the UAV, giving a ratio of 1:2; the images are then normalized to 256 × 512 pixels to avoid any effect of image size on the recognition result. As described in the HOG feature section above, each image yields 105 pixel blocks and each block a 36-dimensional feature vector, so each image finally produces a 3780-dimensional HOG feature vector. The extracted HOG vector is used as the input of the Fisher linear discriminant analysis algorithm to reduce its dimension; the reduced dimension is adjustable, and its value is determined by the recognition efficiency observed in the experiments. The reduced vectors are fed to the SVM model, and the resulting SVM classifier detects whether a UAV is present in the image under test.

IV. PARAMETER ANALYSIS AND EXPERIMENTAL RESULTS

The experiments were run in Matlab R2015b on a 64-bit Windows 7 operating system, on a computer with an i5-6500 CPU and 4.0 GB of memory; 200 test images were used. The best detection effect is obtained by tuning the important parameters that affect the experimental results.

A. Parameter Selection of FLD

In the feature extraction process, the target dimension of the Fisher linear discriminant analysis is the parameter k of the FLD algorithm, and the recognition time changes with this parameter. The recognition-time comparison is shown in Figure 3: when k is 50, the whole algorithm has the shortest recognition time.

Figure 3. Recognition time as the parameter k changes.
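Before turning to the detection experiments, the training procedure of Section III can be summarized in a minimal end-to-end sketch. This assumes scikit-learn's SVC with a polynomial kernel as a stand-in for the paper's Matlab SVM; because a plain two-class FLD yields only a single discriminant direction, a PCA reducer to k components is used here as a stand-in for the paper's k-dimensional projection, which is an assumption, not the authors' exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def train_uav_classifier(hog_pos, hog_neg, k=50):
    """hog_pos, hog_neg: arrays of shape (n, 3780) of HOG vectors from
    the 900 positive / 900 negative samples; how they are loaded is
    left open. k = 50 gave the shortest recognition time above."""
    X = np.vstack([hog_pos, hog_neg])
    y = np.r_[np.ones(len(hog_pos)), np.zeros(len(hog_neg))]
    # stand-in dimensionality reduction (see lead-in note on FLD)
    reducer = PCA(n_components=k).fit(X)
    clf = SVC(kernel='poly').fit(reducer.transform(X), y)
    return reducer, clf

def is_uav(reducer, clf, hog_vec):
    # classify one 3780-dim HOG descriptor of a candidate region
    return bool(clf.predict(reducer.transform(hog_vec.reshape(1, -1)))[0])
```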
B. Experimental Effect of the Aerial UAV Detection Algorithm Based on Regions of Interest

Experiments show that the whole algorithm achieves its best overall recognition effect in accuracy and time when the cells are 8 × 8 pixels, the SVM kernel function type is 2, and the segmentation thresholds are k = 90 and sigma = 10. The recognition results are shown in Figure 4.

Figure 4. Identification results: pictures (a)-(d).

C. Comparison with the Traditional HOG-SVM UAV Detection Algorithm

To verify the efficiency of the proposed method, its experimental results are compared with those of the HOG and SVM method based on image segmentation; the statistics are shown in TABLE I. The average recognition time of the proposed method is shorter than that of the latter.

TABLE I. COMPARISON RESULTS OF THE TEST METHODS

Test Method     Accuracy Rate   Time
HOG-FLD+SVM     93.35%          0.090 s
HOG+SVM         95.75%          0.160 s

V. CONCLUSION

This paper uses a region-of-interest mechanism to obtain candidate regions. In the testing phase, the acquired regions of interest are input to the trained SVM classifier, which reduces the recognition time; in the feature extraction phase, the dimensionality reduction by Fisher linear discriminant analysis (FLD) makes the SVM easy to train. Comparing the region-of-interest-based aerial UAV detection algorithm with the HOG-SVM detection method based on image segmentation, simulation experiments in Matlab show that the region-of-interest-based detection algorithm outperforms the sliding-window method based on image segmentation in detection time at comparable accuracy.

ACKNOWLEDGMENT

Fund projects: Key Projects in the Industrial Field of Shaanxi (2016KTZDGY4-09), Scientific Research Program Funded by Shaanxi Provincial Education Department (17JK0364), National Natural Science Foundation of China (61572392), State and Provincial Joint Engineering Lab. of Advanced Network and Monitoring Control (GSYSJ2016008), Scientific Research Program Funded by Shaanxi Provincial Education Department (16JK1379).

REFERENCES

[1] J. Barron, D. Fleet, S. Beauchemin, "Performance of optical flow techniques," International Journal of Computer Vision, vol. 12, pp. 42-77, 1994.
[2] A. Lipton, H. Fujiyoshi, R. Patil, "Moving target classification and tracking from real-time video," Proc. IEEE Workshop on Applications of Computer Vision, pp. 8-14, 1998.
[3] Dalal N, Triggs B, "Histograms of oriented gradients for human detection," Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2005.
[4] Qiang Zhu, Shai Avidan, Mei-Chen Yeh, Kwang-Ting Cheng, "Fast human detection using a cascade of histograms of oriented gradients," Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[5] Suard F, Rakotomamonjy A, Bensrhair A, et al., "Pedestrian Detection Using Infrared Images and Histograms of Oriented Gradients," Proc. Intelligent Vehicles Symposium, pp. 206-212, 2006.
[6] Felzenszwalb P F, Huttenlocher D P, "Efficient Graph-Based Image Segmentation," International Journal of Computer Vision, vol. 59, pp. 167-181, 2004.
[7] Humayun A, Li F, Rehg J M,
"RIGOR: Reusing Inference in Graph Cuts for Generating Object Regions," Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 336-343, 2014.
[8] Vapnik V N, "The Nature of Statistical Learning Theory," Springer-Verlag, New York, pp. 37-69, 1995.
[9] Guo Mingwei, Zhao Yuzhou, Xiang Junping, et al., "A Survey of Target Detection Algorithms Based on Support Vector Machine," Control and Decision, vol. 29, pp. 192-200, 2014.
[10] Zhang Han, He Dongjian, "An Image Segmentation Method Based on Texture Information and Graph Theory," Computer Science and Engineering, vol. 50, pp. 180-184, 2014.
[11] Zhai Jiyou, Zhuang Yan, "Significant Detection of Boundary Prior and Adaptive Region Merging," Computer Engineering and Application, 2017.
[12] Chen Shanchao, Fu Hongguang, Wang Ying, "Application of an Improved Graph Segmentation Method in Tongue Image Segmentation," Computer Engineering and Application, vol. 48, pp. 201-203, 2012.
[13] Yan Yu, Song Wei, "Color and Texture Mixed Descriptor Image Retrieval Method," Computer Science and Exploration, pp. 1-8, 2016.
[14] Ye Qing, Hu Changbiao, "An Improved Image Segmentation Method Based on Graph Theory," Computer and Modernization, vol. 253, pp. 64-67, 2016.
[15] Sande K E A V D, Uijlings J R R, Gevers T, et al., "Segmentation as selective search for object recognition," International Conference on Computer Vision, IEEE Computer Society, pp. 1879-1886, 2011.
[16] Girshick R, Donahue J, Darrell T, et al., "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Computer Science, pp. 580-587, 2014.
[17] Wang Ping, Wei Zheng, Cui Weihong, "A Minimum Spanning Tree Image Segmentation Criterion Based on Statistical Learning Theory," Journal of Wuhan University (Information Science Edition), vol. 42, pp. 878-883, 2017.
[18] Belhumeur P, Hespanha J, Kriegman D, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 711-720, 1997.