key: cord-0057837-3xjz3q6g
authors: Ben Slima, Ilef; Ammar, Sourour; Ghorbel, Mahmoud; Kessentini, Yousri
title: Possibilistic Classifier Combination for Person Re-identification
date: 2021-02-22
journal: Pattern Recognition and Artificial Intelligence
DOI: 10.1007/978-3-030-71804-6_8
sha: 933233641170e49f1598b6d2b77378820421ba1f
doc_id: 57837
cord_uid: 3xjz3q6g

Possibility theory is particularly efficient in combining multiple information sources that provide incomplete, imprecise, and conflicting knowledge. In this work, we focus on improving the accuracy of a person re-identification system by combining multiple deep learning classifiers based on global and local representations. In addition to the original image, we explicitly leverage the background-subtracted image and the middle and down body parts to alleviate pose and background variations. The proposed combination approach takes place in the framework of possibility theory, since it enables us to deal with the imprecision and uncertainty that can be present in the predictions of poor classifiers. This combination method can take advantage of the complementary information given by each classifier, even the weak ones. Experimental results on the publicly available Market-1501 dataset confirm that the proposed combination method is of interest: it generalizes easily to different deep learning re-identification architectures and it improves the results with respect to the individual classifiers.

Person re-identification (re-id) aims to identify the same person in multiple images captured from different camera views [12]. Many efforts have been dedicated to solving this problem, but person re-id still faces many challenges and can be affected by many factors, such as variations in person appearance (poses [6], illumination [16], camera views [18] and occlusion [15]) and the impact of the background scenes [10, 27].
To address these challenges, many works [20, 28, 30] focus directly on the whole image to learn a global feature description. Other works proposed to enhance the re-id accuracy of multiple baseline methods by applying a re-ranking process to the ranking list [21, 33, 37]. Global representations generally carry unnecessary information coming from the background, while local features can be ignored. To overcome this problem, many studies attempt to exploit local features to enhance person re-id results [17, 19, 25, 29]. Others try to extract local representations from different parts of the original image. For example, [19] and [32] considered the image as a sequence of multiple equal horizontal parts. Other works used semantic segmentation to reduce the impact of background variations or to process body parts separately.

We cite our previous work [10], which proposed a person re-id system based on a late fusion of a two-stream deep convolutional neural network architecture to reduce the background bias. The first stream is the original image, and the second one is the background-subtracted image of the same person. The combination method used in that work is the weighted sum, and the authors demonstrated that combining the outputs of these two streams can enhance person re-id performance. In this paper, we extend our previous work by combining different body part streams with the original and background-subtracted images in order to integrate local representations and alleviate pose and background variations. Traditional combination methods are generally not efficient when combining multiple information sources that provide incomplete, imprecise, and conflicting knowledge. To overcome this limitation, we propose in this paper a possibilistic combination method to aggregate the outputs of multiple person re-identification classifiers.
Possibility theory, introduced by Zadeh [34], is an uncertainty representation framework which deals with uncertainty by means of fuzzy sets [7]. It naturally complements fuzzy set theory for handling uncertainty induced by fuzzy knowledge [4]. Possibility theory has been used in many works to improve the performance of classifiers in the context of uncertain and poor data. For example, the authors in [5] developed possibilistic Bayesian classifiers which performed well in the case of poor data. That work also proposed the aggregation of two possibilistic classifiers in order to take advantage of different classifiers and further improve their accuracy. Besides, the authors in [3] used a possibilistic approach based on a probability-to-possibility transform in order to deal with decision-making under uncertainty. Recently, an aggregation approach named SPOCC was proposed in [1]; it combines different learner predictions in the possibility theory framework and may be applied to any type of classifier. It assumes that the possibility distributions reflect how likely the classifier prediction is to be correct, and it considers the uncertainty factor of the classifier and the imprecision of the data used. Inspired by the work of [1], we propose in this paper a combination method based on possibility theory to deal with the imprecision of our different classifiers and to aggregate their predictions in the context of person re-identification. Thus, the main contributions of this paper are as follows:

• First, we propose to consider different body part streams as input to our Convolutional Neural Network (CNN) classifier. Thus, different classifiers are applied and different predictions can be obtained for a query image.
• Secondly, we propose a combination method that is based on possibility theory and is composed of two main phases: the construction and the aggregation of possibility distributions for each classifier.
• Finally, we evaluate this combination method on the Market-1501 benchmark dataset using two different deep CNN architectures and we show that our method is especially useful when the classifiers are poor.

This paper is organized as follows. We present in Sect. 2 an overview of our proposed method and we provide the implementation details. Then, we present the experimental results in Sect. 3. Finally, we conclude in Sect. 4.

We describe in this section the details of the proposed method. We recall that our work takes place in the context of person re-identification, which aims to identify the same person in multiple images captured from different camera views. Given a probe person image $p$ and a gallery set $G$ with $N$ images belonging to $l$ different identities (class labels), with $G = \{g_i \mid i = 1, \dots, N\}$ and $\Omega = \{m_j \mid j = 1, \dots, l\}$, the aim of a person re-id method is to compare $p$ with all the gallery images $g_i$ in $G$ in order to determine the identity $m_j \in \Omega$ of $p$. We propose in this work to extend our work [10] by considering four different variations of the input image: the original and background-subtracted images and the middle and down body part images. Fig. 1 shows a flowchart describing the three processing steps. The proposed framework takes as input the original image $p$. In the first step, we use the semantic segmentation model (SEG-CNN) proposed in our previous work [10] to generate three segmented images: the background-subtracted image and the middle and down body parts. Four images are then given as input to the second step (the re-identification step), where we apply a deep learning classifier to each input image and generate the corresponding output $m^{p}_{k\text{-pred}}$, the identity predicted for $p$ by the $k$-th classifier.
The third step, called possibilistic fusion, consists of aggregating these outputs ($\{m^{p}_{k\text{-pred}}\}_{k=1,\dots,4}$) to provide the final result using the possibility distributions of each classifier.

In this paper, we focus on the third step, namely the possibilistic fusion. The aim of this method is to combine multiple classifier outputs in order to provide robust person re-identification results. Our possibilistic fusion method consists of two main phases, as shown in Fig. 3: 1) the construction of possibility distributions and 2) the aggregation of these distributions to make a final decision. For the first phase, we apply each classifier to the validation set and we create the corresponding confusion matrix (see Fig. 2). The four confusion matrices $M^{(k)}$, $k = 1, \dots, 4$, are used as input to our combination method, precisely for the construction of the possibility distributions (see Sect. 2.1). Because the training and test sets in re-id always contain different identities, we use in this step the gallery images $G$ as validation data to generate the confusion matrix of each classifier. Then, in the second phase, the possibility distributions generated for the four classifiers are aggregated to make the final decision (see Sect. 2.2). We provide in the next sections the details of these steps.

The first phase of the proposed combination method is to construct the possibility distributions of each classifier. This method is inspired by the work of [1], where the possibility distributions are obtained from the confusion matrices of the corresponding classifiers. These confusion matrices are obtained by applying a re-id classifier to the validation set (see Fig. 2); they reflect how likely each classifier's prediction is to be correct according to the frequentist probabilities estimated on the validation set. It should be recalled that a confusion matrix is an important tool for analyzing the performance of a classification method [11].
It is a two-dimensional matrix which contains information about the actual and the predicted classes and which summarizes the number of correct and incorrect predictions [24]. Using the obtained confusion matrices $M^{(k)}$, we construct the possibility distributions of each classifier in three steps (as presented in the first part of Fig. 3).

It should be noted that a considerable disparity between the numbers of images in each class can exist in many datasets. In this case, the possibility distributions generated from the confusion matrices can be biased by this disparity. For that reason, we propose to normalize the value of each cell of $M^{(k)}$ according to the number of instances in each class, so that the new values of the matrix are of the same order of magnitude. This normalization step is given by the following formula:

$$M^{(k)}_{ij} \leftarrow \frac{M^{(k)}_{ij}}{\sum_{j'=1}^{l} M^{(k)}_{ij'}} \qquad (1)$$

where $M^{(k)}_{ij}$ is the number of validation instances of actual class $i$ that the classifier $C_k$ assigns to class $j$.

In order to construct possibility distributions, we propose to transform the confusion matrix of each classifier into possibility distributions. To do so, the first step is to normalize the $j$-th column of the confusion matrix to obtain an estimate of the probability distribution $p(Y = i \mid m_k = j)$, where $Y$ is the actual class and $m_k$ is the class predicted by the classifier $C_k$. The probability distribution of column $j$ is denoted by $p_{Y \mid m_k = j}$. Then, we use the Dubois and Prade Transformation method (DPT) [8] to transform the probability distributions into possibility distributions. So, for each classifier $C_k$ and for each column $j$, the possibility distribution $\pi^{(k)}_j$ is obtained from the probability distribution $p_{Y \mid m_k = j}$ after sorting its values in descending order. The DPT is given by this formula:

$$\pi_i = \sum_{j=i}^{l} p_j \qquad (2)$$

where the $p_i$ are sorted in descending order ($p_1 \geq p_2 \geq \dots \geq p_l$).
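The construction just described (per-class normalization, column normalization, then the Dubois and Prade transformation) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, rows of the confusion matrix are assumed to index the actual class and columns the predicted class, ties between equal probabilities are not treated specially, and a class that is never predicted is given an all-ones (uninformative) distribution, which is a choice made here for the example.

```python
import numpy as np


def dubois_prade_transform(p):
    """Map a probability vector to a possibility vector (DPT).

    With the probabilities sorted in descending order, the i-th
    possibility is the sum of the i-th and all smaller probabilities.
    """
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p, kind="stable")        # largest probability first
    tail_sums = np.cumsum(p[order][::-1])[::-1]  # sum from rank i to the end
    pi = np.empty_like(p)
    pi[order] = tail_sums                        # back to the original class order
    return pi


def possibility_distributions(confusion):
    """Build one possibility distribution per predicted class.

    confusion[i, j] is assumed to count validation images of actual
    class i predicted as class j. Column j of the result is the
    distribution over actual classes given that class j was predicted.
    """
    m = np.asarray(confusion, dtype=float)
    # Step 1: divide each row by the number of instances of that class.
    row_sums = m.sum(axis=1, keepdims=True)
    m = np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)
    # Step 2: normalize each column into a probability distribution.
    col_sums = m.sum(axis=0, keepdims=True)
    prob = np.divide(m, col_sums, out=np.zeros_like(m), where=col_sums > 0)
    # Step 3: apply the DPT column by column.
    poss = np.stack(
        [dubois_prade_transform(prob[:, j]) for j in range(prob.shape[1])],
        axis=1,
    )
    # Assumption of this sketch: a never-predicted class carries no information.
    poss[:, col_sums.ravel() == 0] = 1.0
    return poss


if __name__ == "__main__":
    toy_confusion = [[8, 2], [1, 9]]  # made-up counts for two identities
    print(possibility_distributions(toy_confusion))
```

In every column, the most probable actual class receives possibility 1, so each column is a normalized possibility distribution.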
In conclusion, for each classifier $C_k$, we obtain the possibility distributions $(\pi^{(k)}_j)_{1 \leq j \leq l}$ as follows:

$$\pi^{(k)}_j = \mathrm{DPT}\left(p_{Y \mid m_k = j}\right), \quad 1 \leq j \leq l \qquad (3)$$

Discounting the Possibility Values: In most cases, the predictive capabilities of the different classifiers differ significantly. Some classifiers have high classification rates and others have weak accuracy. Nevertheless, as mentioned earlier, the combination of the different classifiers can take advantage of the complementary information given by each classifier, even the weak ones. In order to take this difference between the classifiers' predictions into consideration and to down-weight the poorer ones, we propose to apply a discounting mechanism to all the possibility values of the different classifiers. Many discounting methods have been proposed in the literature, as in [2, 22, 23, 26]. In this paper, we use the discounting method given by formulas 4 and 5. Other discounting methods could be used in future work. The discounting method consists in updating all the possibility distributions related to a classifier $C_k$ using formula 4. The coefficient $\alpha_k$ in that formula is specific to the classifier $C_k$ and is given by formula 5, where $r[C_k]$ is the estimated error rate of the classifier $C_k$ on the validation set and $\rho$ is a hyper-parameter tuned by grid search. With this definition, the best base classifier is not discounted, since its $\alpha_k$ is equal to 0.

After constructing all the possibility distributions of the different classifiers, the next phase is to combine these possibility distributions in order to classify a new query image. As presented in Fig. 3, for each query $p$, we consider the class $m^{p}_{k\text{-pred}}$ predicted by each classifier $C_k$. This prediction corresponds to the Top 1 identity predicted by the classifier.
Then, for each classifier, and given its own prediction, we select, among all the possibility distributions constructed in the previous phase, the one that corresponds to the predicted class $m^{p}_{k\text{-pred}}$. Thus, only the four possibility distributions $\pi^{(k)}_{m^{p}_{k\text{-pred}}}$ are considered and aggregated in order to make the final prediction. Like other measures in fuzzy set theory, possibility distributions can be aggregated using a T-norm operator. T-norms are examples of aggregation functions; they are widely used in the treatment of uncertain knowledge [31]. Among the different T-norm operators proposed in the literature [9], we propose to use the elementwise product T-norm ($T_{\times}$) in this work. Therefore, if $\pi_{k_1 k_2}$ is the aggregated possibility distribution obtained by applying a T-norm to the distributions $\pi_{k_1}$ and $\pi_{k_2}$, then $\pi_{k_1 k_2}$ is obtained as follows:

$$\pi_{k_1 k_2}(y) = T_{\times}\left(\pi_{k_1}(y), \pi_{k_2}(y)\right) = \pi_{k_1}(y) \times \pi_{k_2}(y), \quad \forall y \qquad (6)$$

Since the product T-norm is commutative and associative, it extends directly to more than two operands. Consequently, the possibility distribution of the ensemble of aggregated classifiers is obtained as follows:

$$\pi_{ens}(y) = \prod_{k=1}^{4} \pi^{(k)}_{m^{p}_{k\text{-pred}}}(y), \quad \forall y \qquad (7)$$

Finally, the prediction of the ensemble of classifiers for a query $p$ is the class $c_{ens}$ given by:

$$c_{ens}(p) = \arg\max_{y \in \Omega} \pi_{ens}(y) \qquad (8)$$

In this section, we empirically evaluate the proposed method and we show how possibilistic combination can enhance person re-id results.

Experiments are carried out on Market-1501 [35], a publicly available large-scale person re-identification dataset. This dataset is composed of 32,667 images divided into 19,372 gallery images, 3,368 query images and 12,396 training images related to 1501 person identities, with 751 identities for the training phase and 750 for the testing phase. Images are captured by one low-resolution and five high-resolution cameras.
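Returning to the fusion rule of Sect. 2.2, the short sketch below illustrates how the selected possibility distributions could be discounted and then aggregated with the product T-norm before taking the arg max. It is a sketch under explicit assumptions rather than the authors' code: the function names are hypothetical, the discounting coefficients are passed in directly instead of being derived from error rates, and the discounting rule used, $\pi \leftarrow (1 - \alpha)\pi + \alpha$, is one common possibilistic discounting rule chosen here because it leaves a distribution unchanged when $\alpha = 0$; it is not confirmed to be the rule used in this paper.

```python
import numpy as np


def discount(pi, alpha):
    """Discount a possibility distribution toward total ignorance.

    Assumed rule (not confirmed by the text): pi <- (1 - alpha) * pi + alpha.
    alpha = 0 leaves pi unchanged; alpha = 1 yields the all-ones distribution.
    """
    return (1.0 - alpha) * np.asarray(pi, dtype=float) + alpha


def aggregate_product(distributions):
    """Elementwise product T-norm over any number of distributions."""
    return np.prod(np.stack(distributions, axis=0), axis=0)


def ensemble_predict(distributions, alphas=None):
    """Return the index of the identity with the highest fused possibility.

    distributions: one vector per classifier, i.e. the distribution
    attached to the class that classifier predicted for the query.
    alphas: optional per-classifier discounting coefficients in [0, 1].
    """
    if alphas is not None:
        distributions = [discount(d, a) for d, a in zip(distributions, alphas)]
    return int(np.argmax(aggregate_product(distributions)))


if __name__ == "__main__":
    # Made-up distributions over three identities for three classifiers.
    selected = [
        [1.0, 0.2, 0.1],
        [0.3, 1.0, 0.2],
        [0.9, 1.0, 0.1],
    ]
    print(ensemble_predict(selected))
    print(ensemble_predict(selected, alphas=[0.0, 0.5, 0.5]))
```

Because the product is commutative and associative, the order in which the classifiers are listed does not change the fused distribution.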
We implemented the different steps of the proposed combination method and applied them to the classification results obtained by two different deep convolutional neural network architectures: the Siamese CNN architecture (S-CNN) [36] and the ResNet50 architecture [13]. First, the confusion matrices are obtained by applying the two classifiers to the gallery set (the 19,372 gallery images), as presented in Fig. 2. Then, these two classifiers and the combination method are applied to the test set according to the person re-identification protocol (query set containing 3,368 images and gallery set containing 19,372 images) (see Fig. 1). As mentioned in Sect. 2.1, we used a grid search to find the best value of the $\rho$ parameter used in formula 5. To evaluate the performance of our combination method, we use the accuracy metric, which is the proportion of correctly classified query images among all the classified query images. In this paper, we only focus on the Top 1 predicted class.

The results obtained with the two CNN classifiers (S-CNN and ResNet50) are shown in Figs. 4 and 5, respectively. Recall that we use four different re-id streams for each CNN classifier: Full (the original image), No bk (the background-subtracted image), Mid (the middle body part) and Dwn (the down body part). In each panel of Fig. 4 and Fig. 5, the reported Combination refers to the accuracy obtained by fusing the outputs of the corresponding stream classifiers. For example, in the first panel (a) of Fig. 4, we present the result of the classifier using the Full stream (the light blue bar), the result of the classifier using the No bk stream (the grey bar) and finally the result of our fusion method using these two streams, Full + No bk (the red bar). In the same way, we present in the last panel (k) of Fig. 4 the results of the classifiers using the four streams individually (Full in light blue, No bk in grey, Mid in orange and Dwn in blue), followed by the result of our fusion method using these four streams (the red bar).
It should be noted that, among the four individual streams, the Full stream (which extracts information from the whole image) gives the best classification rates with both classifiers (for example, in the case of the S-CNN classifier, we obtain 79.78% with the Full stream against 73.93% with the No bk stream, 47.47% with the Mid stream and 47.77% with the Dwn stream). We should also note that the results of individual streams are better when we use the S-CNN classifier (the Full stream gives 79.78% with S-CNN and 73.60% with ResNet50). Based on the results shown in Figs. 4 and 5, we notice that our fusion method outperforms the individual-stream classifiers in almost all cases (except one single case in diagram (b) of Fig. 4). We also mention that the best accuracy is obtained when we combine the four streams (84.97% with the four streams and the S-CNN classifier (diagram (k) in Fig. 4) and 84.08% with the ResNet50 classifier (diagram (k) in Fig. 5)). When compared with the Full stream (which gives the best results among the four streams), our possibilistic combination method yields a relative improvement of 6.5% with the S-CNN classifier and of 14.23% with the ResNet50 classifier.

From the results shown in Figs. 4 and 5, we also remark that our proposed possibilistic combination method is more effective when using poor classifiers. Recall that the ResNet50 classifier gives less accurate results than the S-CNN classifier. However, our combination method brings a larger improvement with the ResNet50 classifier (a relative improvement of 14.23% compared to the Full stream (see diagram (k) in Fig. 5)) than with the S-CNN classifier (a relative improvement of 6.5% compared to the Full stream (see diagram (k) in Fig. 4)).
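The improvements quoted above for the four-stream combination are consistent with gains measured relative to the Full-stream accuracy rather than with differences in percentage points. The small check below is an illustrative sketch; the function name is hypothetical and only the accuracies reported in the text are used.

```python
def relative_improvement(baseline, combined):
    """Relative gain, in percent, of `combined` over `baseline`."""
    return 100.0 * (combined - baseline) / baseline


if __name__ == "__main__":
    # Full-stream accuracy vs. four-stream fusion accuracy, as reported in the text.
    for name, full, fused in [("S-CNN", 79.78, 84.97), ("ResNet50", 73.60, 84.08)]:
        points = fused - full
        gain = relative_improvement(full, fused)
        print(f"{name}: +{points:.2f} points, {gain:.2f}% relative")
```

For S-CNN this gives a gain of about 5.2 percentage points, or roughly 6.5% in relative terms; for ResNet50 it gives about 10.5 points, or roughly 14.2% in relative terms.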
On the other hand, we remark that when we combine the two poorer classifiers, Mid + Dwn, we obtain the highest improvement: 19.57% with the S-CNN classifier compared to the Dwn stream, which gives a performance of 47.77% (see diagram (f) in Fig. 4), and 21.46% with the ResNet50 classifier compared to the Mid stream, which gives a performance of 47.47% (see diagram (f) in Fig. 5). In contrast, the combination of the two better streams, Full + No bk (see diagrams (a) in Figs. 4 and 5), does not bring a large improvement, especially in the case of the S-CNN classifier (2.47% compared to the Full classifier). To summarize, the proposed combination method based on the possibilistic framework is able to improve on the performance obtained by each classifier separately. It is able to take advantage of the opinions of even the poorer classifiers in order to make a better decision. In addition, the experimental results obtained on the Market-1501 dataset with two different classification methods confirm that the possibilistic combination method is especially useful in the case of poor classifiers.

We also compared our method with other state-of-the-art person re-id methods applied to the Market-1501 dataset. We focus especially on methods based on multi-stream approaches. Table 1 presents a comparative study of these methods. As our approach focuses on the Rank-1 predicted identity, we consider in this comparison only the Rank-1 result predicted by each method. Table 1 shows that our method outperforms most of the other methods by a large margin. For example, it achieves an improvement of 3.8% compared to the BSTS S-CNN method and of 22.6% compared to the PL-Net method. This confirms that the use of multiple classifiers on different streams can improve the classification rates, and that our possibilistic combination method is able to take advantage of the decisions of the different classifiers, even the poorer ones (the middle and down streams).
In this paper, we proposed a possibilistic combination method which merges the outputs of multiple deep learning classifiers based on global and local representations in order to enhance the performance of a person re-identification system. The proposed combination method takes place in the framework of possibility theory, since it enables us to deal with imprecision and uncertainty. Experimental tests were performed on the Market-1501 dataset and led to very satisfactory results compared to the results obtained by each classifier separately, especially when the classifiers are poor. It should be noted that we considered in this paper only the Top 1 identity predicted by each classifier. It would be interesting to extend this work to take into account the Top 5 or Top 10 predicted identities. In addition, we used in this paper the product T-norm when fusing the possibility distributions of the different classifiers. As a perspective, we plan to test other T-norms such as the Łukasiewicz or the drastic T-norms [9]. Finally, another interesting perspective is to propose other discounting methods to take into account the differences between the classifiers and to down-weight the poorer ones [2, 22, 23].
[1] SPOCC: scalable possibilistic classifier combination-toward robust aggregation of classifiers
[2] CPF: concept profiling framework for recurring drifts in data streams
[3] A new possibilistic classifier for mixed categorical and numerical data based on a bi-module possibilistic estimation and the generalized minimum-based algorithm
[4] Fuzzy sets and possibility theory in approximate and plausible reasoning
[5] Possibilistic classifiers for numerical data
[6] Improving person re-identification via pose-aware multishot matching
[7] Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities
[8] On several representations of an uncertain body of evidence
[9] Comparison of different t-norm operators in classification problems
[10] Improving person reidentification by background subtraction using two-stream convolutional networks
[11] Audio classification
[12] Person Re-Identification. ACVPR
[13] Deep residual learning for image recognition
[14] Person re-identification by deep learning muti-part information complementary
[15] Adversarially occluded samples for person re-identification
[16] Illumination-invariant person reidentification
[17] Contribution-based multi-stream feature distance fusion method with k-distribution re-ranking for person re-identification
[18] Person re-identification with discriminatively trained viewpoint invariant dictionaries
[19] Learning deep context-aware features over body and latent parts for person re-identification
[20] DeepReID: deep filter pairing neural network for person re-identification
[21] Improving person re-identification by combining Siamese convolutional neural network and re-ranking process
[22] Sur l'affaiblissement d'une fonction de croyance par une matrice de confusion. Rencontres Francophones sur la Logique Floue et Ses Applications
[23] Refined modeling of sensor reliability in the belief function framework using contextual discounting
[24] Foundations of neural networks. In: Pattern Recognition and Signal Analysis in Medical Imaging
[25] Auto-ReID: searching for a part-aware convnet for person re-identification
[26] A Mathematical Theory of Evidence
[27] Eliminating background-bias for robust person re-identification
[28] Gated Siamese convolutional neural network architecture for human re-identification
[29] Local-global extraction unit for person re-identification
[30] Learning deep feature representations with domain guided dropout for person re-identification
[31] Forms of multi-criteria decision functions and preference information types
[32] Deep representation learning with part loss for person re-identification
[33] Divide and fuse: a re-ranking approach for person re-identification
[34] Fuzzy sets as a basis for a theory of possibility
[35] Scalable person re-identification: a benchmark
[36] A discriminatively learned CNN embedding for person reidentification
[37] Re-ranking person re-identification with k-reciprocal encoding

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.