HAL Id: hal-00671952
https://hal.archives-ouvertes.fr/hal-00671952
Submitted on 20 Feb 2012

To cite this version: Romain Giot, Christophe Rosenberger. Genetic Programming for Multibiometrics. Expert Systems with Applications, Elsevier, 2012, 39 (2), pp.1837–1847. 10.1016/j.eswa.2011.08.066. hal-00671952

Genetic Programming for Multibiometrics

Romain Giot∗, Christophe Rosenberger
GREYC Laboratory, ENSICAEN - University of Caen - CNRS
6 Boulevard Maréchal Juin, 14000 Caen Cedex - France

Abstract

Biometric systems suffer from some drawbacks: a biometric system generally performs well, except for some individuals, because its performance depends highly on the quality of the capture. One solution to some of these problems is multibiometrics, where different biometric systems are combined (multiple captures of the same biometric modality, multiple feature extraction algorithms, multiple biometric modalities, ...). In this paper, we are interested in score-level fusion functions (i.e., we use a multibiometric authentication scheme which accepts or denies the claimant's use of an application).
In the state of the art, the weighted sum of the scores provided by different biometric systems (a linear classifier) and the use of an SVM on these scores (a non-linear classifier) provide some of the best performances. We present a new method based on genetic programming that gives similar or better performance (depending on the complexity of the database). We derive a score fusion function by assembling classical primitive functions (+, ∗, −, ...). We have validated the proposed method on three significant biometric benchmark datasets from the state of the art.

Keywords: Multibiometrics, Genetic Programming, Score fusion, Authentication.

∗Corresponding author
Email addresses: romain.giot@ensicaen.fr (Romain Giot), christophe.rosenberger@greyc.ensicaen.fr (Christophe Rosenberger)
Preprint submitted to Expert Systems with Applications, February 20, 2012

1. Introduction

1.1. Objective

Every day, new developments appear in the field of biometric research. These include the proposal of new algorithms with better performance, new approaches (cancelable biometrics, soft biometrics, ...)
and even new biometric modalities (like finger knuckle recognition [1], for example). There are many different biometric modalities, each classified into one of three main families (even if a more precise typology can be found in the literature):

• biological: recognition based on the analysis of biological data linked to an individual (e.g., DNA analysis [2], odor [3], the analysis of blood or of different physiological signals, such as heart beat or EEG [4]);
• behavioural: based on the analysis of an individual's behaviour while performing a specific task (e.g., keystroke dynamics [5], online handwritten signature [6], the way of using the computer mouse [7], voice recognition [8], gait dynamics (way of walking) [9] or way of driving [10]);
• morphological: based on the recognition of particular physical patterns which are, for most people, permanent and unique (e.g., face recognition [11], fingerprint recognition [12], hand shape recognition [13], or blood vessel [14], ...).

Nevertheless, there will always be some users for whom a biometric modality (or a method applied to this modality) gives bad results, even though the results are better on average. This low performance can have different causes (the quality of the capture, the instant of acquisition, or the individual itself), but the consequences are the same: impostors can be accepted, or genuine users need to authenticate themselves several times before being accepted. Multibiometrics addresses this problem while obtaining better performance (i.e., better security by accepting fewer impostors and better user acceptance by rejecting fewer genuine users), on the expectation that the errors of the different modalities are not correlated. In this paper, we propose a generic approach for multibiometric systems.

We can find different types of biometric multimodality [15]. They use:
1. 
different sensors of the same biometric modality (e.g., capacitive or resistive sensors for fingerprint acquisition);
2. several different representations of the same capture (e.g., use of points of interest or texture for face or fingerprint recognition);
3. different biometric modalities (e.g., face and fingerprint recognition);
4. different instances of the same modality (e.g., left and right eye for iris recognition);
5. multiple captures (e.g., 25 images per second in a video used for face recognition);
6. a hybrid system combining the previous ones.

We are interested in the first four cases in this paper. Our objective is to automatically generate fusion functions which combine the scores provided by different biometric systems in order to obtain the most efficient multibiometric authentication scheme.

1.2. Background

1.2.1. Performance Evaluation

In order to compare different multibiometric systems, we need to present how to evaluate them. Several works have already addressed the evaluation of biometric systems [16, 17]. Evaluation generally covers three aspects:

• performance: it aims to measure various statistical criteria on the performance of the system (Capacity [18], EER, Failure To Enroll (FTE), Failure To Acquire (FTA), computation time, ROC curves, etc. [17]);
• acceptability: it gives information on individuals' perception, opinions and acceptance regarding the system;
• security: it quantifies how well a biometric system (algorithms and devices) can resist several types of logical and physical attacks, such as Denial of Service (DoS) attacks.

In this paper, we are only interested in performance evaluation (because the fusion approach is not modality dependent, whereas perception and security depend on the modalities used).
The main performance metrics are the following:

• FAR (False Acceptance Rate), which represents the ratio of impostors accepted by the system;
• FRR (False Rejection Rate), which represents the ratio of genuine users rejected by the system;
• EER (Equal Error Rate), which is the error rate when the system is configured to obtain a FAR equal to the FRR;
• ROC (Receiver Operating Characteristic) curve, which plots the FRR against the FAR and gives an overview of system performance;
• AUC (Area Under the Curve), which gives the area under the ROC curve. In our case, smaller is better. It is a way to globally compare the performance of different biometric systems.

We can also present the HTER (Half Total Error Rate), which is the mean of the FAR and FRR for a given threshold (this error rate is useful when we cannot get the EER).

1.2.2. Biometric Fusion

Figure 1: Illustration of different fusion mechanisms: (a) template fusion, (b) classical score fusion, (c) cascade fusion, (d) hierarchical fusion.

There are several studies on multibiometrics. The fusion can be performed at different points of the mechanism:

• template fusion: the templates captured by different biometric systems are merged together, then the learning process is performed on these new templates [19, 20]. Figure 1(a) presents this type of fusion. The fusion process is related to feature selection in order to determine the most significant patterns to minimize errors.
• decision fusion: a decision is taken by each biometric authentication system, then the final decision is made by fusing the previous ones [21].
• rank fusion: the decision is made with the help of the different ranks of biometric identification systems.
The main method is the majority vote [22].
• score fusion: the fusion is performed on the outputs of the classifiers. Figure 1(b) presents this type of fusion.

Buyssens et al. [23] showed the interest of biometric fusion for face recognition by combining images in the visible and infrared color spaces with convolutional neural networks. In [24], Mantalvao and Freire combined keystroke dynamics with voice recognition; this seems to be the first time that multibiometrics was done with keystroke dynamics and another biometric modality. In [25], Hocquet et al. demonstrated the interest of fusion in keystroke dynamics in order to improve recognition rates: three different keystroke dynamics functions are used on the same capture. The sum operator (which sums the different scores) seems to be the most powerful approach in the literature.

These fusion architectures are quite simple but powerful. Results can still be improved (in terms of error rate or computation time) by using different architectures. Cascade fusion [26] is another interesting approach. A first test is performed; if the user is correctly verified as the expected client, or is detected as an impostor, the algorithm stops. Otherwise, another biometric authentication (with another capture from another modality) is performed, until a decision of acceptance or rejection is obtained or the end of the cascade is reached. So, instead of using one decision threshold, each test (except the last one) needs two thresholds: one for rejection and one for acceptance. All scores between these thresholds are considered to be in an indecision zone. This mechanism is presented in Figure 1(c).
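The two-threshold cascade described above can be sketched as follows. This is only an illustration, not the implementation of [26]: the function name `cascade_verify`, the stage layout, and the convention that a higher score means a more genuine claimant are assumptions made for the example. Scores are supplied as callables so that a later modality is only captured when an earlier stage falls in the indecision zone.

```python
from typing import Callable, Sequence, Tuple

# One non-final stage (hypothetical layout):
# (score provider, rejection threshold, acceptance threshold).
Stage = Tuple[Callable[[], float], float, float]


def cascade_verify(stages: Sequence[Stage],
                   last_stage: Tuple[Callable[[], float], float]) -> bool:
    """Return True if the claimant is accepted, False if rejected.

    Assumes higher scores indicate a genuine user.
    """
    for get_score, reject_thr, accept_thr in stages:
        score = get_score()          # capture and match only when this stage is reached
        if score >= accept_thr:      # confident acceptance: the cascade stops
            return True
        if score <= reject_thr:      # confident rejection: the cascade stops
            return False
        # reject_thr < score < accept_thr: indecision zone, try the next modality
    get_score, threshold = last_stage
    return get_score() >= threshold  # the last test uses a single threshold
```

For instance, `cascade_verify([(face_score, 0.3, 0.8)], (fingerprint_score, 0.5))` would only call the hypothetical `fingerprint_score` when the face score lies strictly between 0.3 and 0.8.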
Another advantage of this method is that it decreases the verification time: not all the modalities are used, only those that are necessary. This method has been successfully applied to a multibiometric system using face and fingerprint recognition in a mobile environment (where acquisition and computation times are important) [26].

Another kind of architecture has been proposed: a hierarchical fusion scheme [27] (called multiple layers by its authors). Shen et al. presented this method with two different keystroke dynamics methods. The fusion is done at different steps, and involves different mathematical operations on scores (sum, weighted sum, product, min, max) and logical operations on decisions (comparison to a threshold, or, and) on different templates extracted from the same capture. A version extended to any multibiometric system is presented in Figure 1(d). We think our work can be seen as a generalization of this paper.

It is also possible to model the distribution of the genuine and impostor matching scores; this is called density-based score fusion. In [28], scores are modelled with a Gaussian Mixture Model and the method has been tested on three multibiometric databases involving face, fingerprint, iris and speech modalities.

Concerning non-linear algorithms, Support Vector Machines (SVM) can also be used in a fusion process. The scores to combine are arranged in a vector and a training set is used to learn the SVM model. In [29], SVM fusion to improve face recognition gives slightly better performance than the weighted sum. Voice and online signature have been fused with an SVM in [30]. In this experiment, the arithmetic mean gives the best results with noise-free data, while the SVM gives equivalent results with noisy data.

1.3. 
Discussion

In this paper, we are interested in biometric-modality-independent, transformation-based score fusion [28], where the matching scores are first normalized and then combined. We have seen that, in this case, arbitrary functions are often used. Our work builds on these various score fusion architectures in order to produce a score fusion function automatically generated with genetic programming [31].

Moreover, the definition of a fusion architecture is still an open issue in the multibiometrics research field [32], because the range of possible fusion configurations is very large. We think that automatically generated fusion functions can bring a new solution to this kind of problem.

2. Material and Methods

In this section, we present all the information required to allow other researchers to reproduce our experiment.

2.1. Biometric databases

As it is well known that results can be highly dependent on the database, we have used three different multibiometric databases for this study: the first one is the BSSR1 [33] distributed by the NIST [34] (referenced as BSSR1 in the paper), the second one is a database we have created for this purpose (referenced as PRIVATE in the paper), and the third one is a subset of scores computed with the BANCA [35] database (referenced as BANCA in the text; the BANCA database is in fact composed of templates, and we have used the scores available in [36]).

As all these databases are multi-modal, the scores are presented as tuples: the $i$-th tuple of scores is represented as $s_i = (s_i^1, s_i^2, \ldots, s_i^n)$ for a database having $n$ modalities (in our case, $n \in \{4, 5\}$).

The three databases are presented in detail in the following subsections, while Table 1 presents a summary of their description.

2.1.1. 
BSSR1 database

The BSSR1 [33] database consists of a collection of score sets from different biometric systems. In this study, we are interested in the subset containing the scores of two facial recognition systems and the two scores of a fingerprint recognition system applied to two different fingers, for 512 users. We have 512 tuples of intra-scores (comparison of the capture of an individual with their own model) and 512 ∗ 511 = 261,632 tuples of inter-scores (comparison of the capture of an individual with the model of another individual). Each tuple is composed of 4 scores, $s = (s^1_{bssr1}, s^2_{bssr1}, s^3_{bssr1}, s^4_{bssr1})$, which respectively represent the score of face recognition algorithm A, the score of face recognition algorithm B (the same face image is used for the two algorithms), the score of fingerprint recognition with the left index finger, and the score of fingerprint recognition with the right index finger. This database has been used several times in the literature [28, 37].

2.1.2. PRIVATE database

The second database is a chimeric one that we have created by combining two public biometric template databases: the AR [38] for facial recognition and the GREYC keystroke [39] for keystroke dynamics.

The AR database is composed of frontal facial images of 126 individuals under different facial expressions, illumination conditions or occlusions. These specificities make it a quite difficult database. These images were taken during two different sessions with 13 captures per session. The GREYC keystroke database contains captures from several sessions over a two-month period involving 133 individuals.
Users were asked to type the password "greyc laboratory" 6 times on a laptop and 6 times on a USB keyboard, interleaving the typings.

We have selected the first 100 individuals of the AR database and we have associated each of these individuals with another one in a subset of the GREYC keystroke database having 5 sessions of captures. We then used the first 10 captures to create the model of each user and the 16 remaining ones to compute the intra and inter scores.

These scores have been computed by using two different methods for face recognition (the scores $s^1_{private}$ and $s^2_{private}$) and three different ones for keystroke dynamics (the scores $s^3_{private}$, $s^4_{private}$ and $s^5_{private}$). The face recognition algorithms are based on eigenfaces [11] and on comparisons of SIFT keypoints [40] between images from the model and the capture [41]. Keystroke dynamics scores have been computed by using different methods [42] based on SVM, statistical information and rhythm measures.

2.1.3. BANCA database

The last benchmark used is a subset of scores produced with the BANCA database [36]. The selected scores correspond to the following labels: IDIAP voice gmm auto scale 25 100 pca.scores for $s^1_{banca}$, SURREY face nc man scale 100.scores for $s^2_{banca}$, SURREY face svm man scale 0.13.scores for $s^3_{banca}$, and UC3M voice gmm auto scale 10 100.scores for $s^4_{banca}$. We have empirically chosen this subset. The G1 set is used as the learning set, while the G2 set is used as the validation set. Users from G1 are different from users in G2.

Table 1: Summary of the different databases used to validate the proposed method

Nb of          BSSR1     PRIVATE   BANCA
users          512       100       208
intra tuples   512       1600      467
inter tuples   261632    158400    624
items/tuple    4         5         4

2.1.4. 
Discussion

The main differences between these three benchmarks are:

• the biometric modalities used in BSSR1 and BANCA have better performance than the ones in PRIVATE;
• the quantity of intra-scores is larger in PRIVATE (only one tuple of intra-scores per user in BSSR1, instead of several in PRIVATE);
• BSSR1 and BANCA are databases of scores (we do not know the biometric systems that generated them), whereas PRIVATE is a database of templates (we had to compute the scores);
• BSSR1 and BANCA are more suited to physical access control applications (e.g., a building protected by a multi-modal biometric system), while PRIVATE is more suited to logical access control (e.g., authentication to a Web service protected by a multi-modal biometric system).

In the following subsections, we describe the proposed methodology to automatically generate a score fusion function with genetic programming. We adopt the classical score fusion context described in Figure 1(b). Before using the scores provided by different biometric systems, we need to normalize them.

2.2. Score Normalization

It is necessary to normalize the various scores before performing the fusion: these scores come from different classifiers and their values do not necessarily lie within the same interval. We have chosen to use the tanh [43] operator to normalize the scores of each modality. Equation (1) presents the normalization method, where $\mu^m_{gen}$ and $\sigma^m_{gen}$ respectively represent the average and standard deviation of the genuine scores of the modality $m$. The genuine scores are obtained by comparing the model and the capture of the same user: they are also called the intra scores. In contrast, the inter scores are obtained by comparing the model of a user with the capture of other users.
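A minimal NumPy sketch of this tanh normalization for one modality is given here. It assumes the usual tanh-estimator form, in which a score is centred on the genuine mean, divided by the genuine standard deviation, scaled by 1/100, and mapped to the interval (0, 1). The function name `tanh_normalize` and the use of the population standard deviation (NumPy's default) are assumptions of the example, not details stated in the paper.

```python
import numpy as np


def tanh_normalize(scores, genuine_scores):
    """Normalize the scores of one modality with genuine-score statistics.

    `genuine_scores` are the intra scores of the same modality, used only to
    estimate the mean and standard deviation (no impostor score is needed).
    """
    scores = np.asarray(scores, dtype=float)
    mu = np.mean(genuine_scores)      # average of the genuine scores
    sigma = np.std(genuine_scores)    # assumption: population standard deviation
    return 0.5 * (np.tanh(0.01 * (scores - mu) / sigma) + 1.0)
```

With this form, a score equal to the genuine mean is mapped to 0.5, and the mapping is monotonically increasing, so the ordering of the raw scores is preserved.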
$score'$ and $score$ respectively represent the scores after and before normalization.

\[ score' = \frac{1}{2} \left\{ \tanh\left( \frac{1}{100} \left( \frac{score - \mu^m_{gen}}{\sigma^m_{gen}} \right) \right) + 1 \right\} \qquad (1) \]

We have selected this normalization procedure from the state of the art because it is known to be stable [44] and does not use impostor patterns, which can be hard or impossible to obtain in a real application. The aim of this paper is not to analyse the performance of biometric systems depending on the normalization procedure, but to present a new multibiometric fusion procedure. The scores of each modality have been normalized using this procedure.

2.3. Fusion Procedure

In this study, we have chosen to use genetic programming [31] to generate score fusion functions. Genetic programming belongs to the family of evolutionary algorithms and its scheme is quite similar to that of genetic algorithms [45]: a population of computer programs (possibly represented by trees) evolves over several generations; different genetic operators are used to create the new population. Programs are evaluated by a fitness function which produces a value that is used to compare them and that gives a probability of selection during the tournaments.
In a system where the computer programs are represented by trees, the leaves mainly represent the inputs of the problem, the root gives the solution to the problem, and the other nodes are the various functions taking as arguments the values of their child nodes.

The leaves are called terminals and can be of several kinds: (a) pseudo-variables containing the real inputs of the problem (in our case, the list of scores of each modality), (b) constants, possibly randomly generated, (c) functions without arguments that have side effects, or (d) ordinary variables.

The genetic operators usually used during the evolution are (a) crossover, where randomly chosen sub-trees of two different trees are exchanged, (b) mutation, where a sub-tree is destroyed and replaced by another randomly generated one, and (c) copy, where the tree is kept in the next generation. The different steps of a genetic programming engine are the following:

1. An initial population is randomly generated. This population is composed of computer programs using the available functions and terminals. The trees are built using a recursive procedure.
2. 
The following steps are repeated until the termination criterion is satisfied (the fitness function has reached the target value, or the maximum number of generations has been reached).
   (a) Computation of the fitness measure of each program (the program is evaluated on its input data).
   (b) Selection of programs, with a probability based on their fitness, to which the genetic operations are applied.
   (c) Creation of the new generation of programs by applying the following genetic operations (depending on their probabilities) to the previously selected programs:
      • Reproduction: the individual is copied to the new population.
      • Crossover: a new offspring program is created by recombining randomly chosen parts from two selected programs. An example is provided in Figure 2.
      • Mutation: a new offspring program is created by mutating one node of the selected program at a randomly chosen place. An example is provided in Figure 3.
3. The single best program of the whole population is designated as the winner. This can be the solution or an approximate solution to the problem.

Figure 2: Crossover in genetic programming: node C from tree 1 is exchanged with node 2 from tree 2. Program result 1 is the new individual to add to the new generation. Panels: (a) program source 1, (b) program source 2, (c) program result 1, (d) program result 2.

Figure 3: Mutation in genetic programming: node B is replaced by another sub-tree. Panels: (a) program source, (b) program result.

Different applications of genetic programming are presented in [46], together with their bibliographic references. The fields of these applications include curve fitting, data modelling, symbolic regression, image and signal processing, economics, industrial process control, medicine, biology, bioinformatics, compression...
but, to the best of our knowledge, it has not yet been applied to multibiometrics. We only found one reference on genetic programming in the biometrics field. In this paper [47], the authors used genetic programming to learn speaker recognition programs. They used an island model where different islands run their own genetic programming evolution and, after each generation, some individuals can migrate to another island. The obtained performance was similar to the state of the art in speaker recognition in normal conditions, but the generated systems performed better in degraded conditions.

More information about the configuration of the genetic programming system is presented in the next section.

2.4. Parameters of the Genetic Programming

We want to use a score fusion function that returns a score related to the performance of a multibiometric system. This score has to be compared with a threshold in order to decide whether to accept or reject the user. In this case, no logical operation is required in the generated programs, and different information can be extracted from the result of the fusion function (we can compute the ROC curve, the EER, ...).

2.4.1. Fitness Function

The EER (Equal Error Rate) is usually used to compare the performance of different biometric systems. A low EER means that the FAR and FRR are both low and that the system performs well if its threshold is configured to obtain this value. For this reason, we have chosen to use this operating point to evaluate the performance of the generated score fusion functions.

To compute the EER, we consider the highest and lowest values of the final scores generated by the genetic programming. Then, we set a threshold at the lowest score and linearly increment it, in 1000 steps, until it reaches the highest score value.
For each of these steps, we compute the FAR (comparison between the threshold and the inter scores) and the FRR (comparison between the threshold and the intra scores). The ROC curve can be obtained by plotting all these (FAR, FRR) pairs, while the EER is the mean of the FAR and FRR for the pair having the lowest absolute difference. So, the fitness function is $fitness = (FAR_i + FRR_i)/2$, where $i$ is the threshold for which $|FAR_i - FRR_i|$ is minimal.

2.4.2. Genetic Programming Parameters

In this section, we present the various parameters used in the genetic programming algorithm. Table 2 summarizes the parameters of the evolutionary algorithm.

To carry out this experiment, we used the PySTEP [48] library. The generated programs contain basic functions (+, −, ∗, /, min, max, avg). The terminals are the scores of the biometric systems and random constants between 0 and 1. All the fitness cases are processed with a single tree evaluation, thanks to the numpy [49] library. Each fitness case is a tuple of scores (where each score comes from a different biometric modality) and its result value is the score returned by the generated multimodal system. The global fitness value of a tree is the EER value computed with the previously generated scores (computation of the ROC curve, then reading of the EER value from it).

Table 2: Summary of the configuration of the genetic programming iterations. Numbers used in the function set can be scores or constants.

Configuration: Values
Objective: generates a function producing a multibiometric score.
Function set:
  • +: addition of two numbers,
  • −: subtraction of two numbers,
  • ∗: multiplication of two numbers,
  • /: division of two numbers,
  • min: returns the minimum of two numbers,
  • max: returns the maximum of two numbers,
  • avg: returns the mean of two numbers.
Fitness function: computes the EER of the multibiometric system.
Terminal set:
  BSSR1:
  • a: scores from $s^1_{bssr1}$,
  • b: scores from $s^2_{bssr1}$,
  • c: scores from $s^3_{bssr1}$,
  • d: scores from $s^4_{bssr1}$,
  • 50 constants linearly distributed between 0 and 1.
  PRIVATE:
  • a, b, c: keystroke dynamics scores ($s^3_{private}$, $s^4_{private}$, $s^5_{private}$),
  • d, e: face recognition scores ($s^1_{private}$, $s^2_{private}$),
  • 50 constants linearly distributed between 0 and 1.
  BANCA:
  • a: scores from $s^1_{banca}$,
  • b: scores from $s^2_{banca}$,
  • c: scores from $s^3_{banca}$,
  • d: scores from $s^4_{banca}$,
  • 50 constants linearly distributed between 0 and 1.
Initial population: 500 random trees with a depth between 2 and 8, built with the ramped half-and-half method.
Evolution parameters:
  • Number of individuals: 500,
  • Maximal number of generations: 50,
  • Depth limited to: 8,
  • Probability of crossover: 45%,
  • Probability of mutation: 50%,
  • Probability of reproduction: 5% (with elitism),
  • Selection: tournament of size 10 with a selection probability of 80%.
Termination criterion: the best individual has a fitness below 0.001 (in practice, this value is never met) or the maximal number of generations is reached.
Learning set: first half of the intra-score tuples and first half of the inter-score tuples.
Validation set: second half of the intra-score tuples and second half of the inter-score tuples.

PySTEP is a strongly typed genetic programming engine but, in our case, we do not use any particular constraints other than the following: the root node can only have functions as children (any function of the set, but no terminal, in order to avoid a unimodal system), while the other function nodes can have any of the functions as well as any of the terminals as children.

The maximal depth of the generated trees is set to 8.
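The vectorized tree evaluation and the EER-based fitness described in this section can be sketched with NumPy as follows. This is an illustrative sketch and not PySTEP's API: the nested-tuple tree representation and the names `evaluate` and `eer_fitness` are assumptions, the `/` primitive is left out because the paper does not say how division by zero is handled, and a claimant is assumed to be accepted when the fused score is greater than or equal to the threshold.

```python
import numpy as np

# Binary primitives of the function set ("/" deliberately omitted, see above).
FUNCTIONS = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "min": np.minimum,
    "max": np.maximum,
    "avg": lambda x, y: (x + y) / 2.0,
}


def evaluate(tree, terminals):
    """Evaluate a nested-tuple tree on all fitness cases at once.

    A tree is ("name", left, right), a terminal name such as "a",
    or a numeric constant. `terminals` maps names to score arrays.
    """
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, terminals), evaluate(right, terminals))
    if isinstance(tree, str):
        return terminals[tree]        # normalized scores of one modality
    return np.float64(tree)           # constant between 0 and 1


def eer_fitness(intra, inter, steps=1000):
    """Mean of FAR and FRR at the threshold where |FAR - FRR| is minimal."""
    intra = np.asarray(intra, dtype=float)
    inter = np.asarray(inter, dtype=float)
    lowest = min(intra.min(), inter.min())
    highest = max(intra.max(), inter.max())
    thresholds = np.linspace(lowest, highest, steps)
    # Assumption: a score >= threshold means "accept".
    far = np.array([(inter >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(intra < t).mean() for t in thresholds])   # genuine users rejected
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

A candidate tree such as `("avg", "a", ("max", "b", 0.5))` would be scored by evaluating it once on the intra tuples and once on the inter tuples, then passing both result arrays to `eer_fitness`.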
In order to avoid getting stuck in a local minimum, the mutation probability is set to 50%. 500 individuals evolve over 50 generations. We chose these small values because, during our investigations, using a population of 5000 individuals over 100 generations did not give much better results (the gain was not worthwhile given the computation time). Each database has been split into two sets of equal size: the first half is the learning set and the second half is the validation set.

The mutation rate is set to 50%, the crossover rate to 45% and the reproduction rate to 5%. For mutation and crossover, the individuals are selected with a tournament of size 10, with a probability of 80% of selecting the best individual. The same individual can be selected several times. For reproduction, the individuals are selected with an elitism scheme: the best 5% of individuals are copied from generation n − 1 to generation n. During a crossover, only the first offspring (of the two generated ones) is kept.

3. Results

In this section, we present the results of the generated fusion programs on the three benchmark data sets.

The results are compared to other functions from the state of the art: (a) the min rule, which returns the minimum score value, (b) the mul rule, which returns the product of all the scores, (c) the sum rule, which returns the sum of the scores, (d) the weight rule, which returns a weighted sum, and (e) an SVM implementation. The weights of the weighted sum have been configured by using a genetic algorithm on the training sets [50, 51] (in order to give the best possible results). The fitness function is the value of the EER, and the genetic algorithm engine must lower this value.
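The four arithmetic baseline rules can be written in a few lines of NumPy. This is a sketch under assumptions: the function names are made up for the example, and the scores are assumed to be already normalized and stored as an array with one row per tuple and one column per modality.

```python
import numpy as np


def fuse_min(scores):
    """min rule: smallest score of each tuple."""
    return np.asarray(scores, dtype=float).min(axis=1)


def fuse_mul(scores):
    """mul rule: product of the scores of each tuple."""
    return np.asarray(scores, dtype=float).prod(axis=1)


def fuse_sum(scores):
    """sum rule: sum of the scores of each tuple."""
    return np.asarray(scores, dtype=float).sum(axis=1)


def fuse_weight(scores, weights):
    """weight rule: weighted sum, with one weight per modality."""
    return np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)
```

In the weighted case, the weight vector is the quantity that the genetic algorithm searches for, each candidate vector being scored by the EER of the fused scores on the training set.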
Table 3 presents the configuration of the genetic algorithm.

Table 3: Configuration of the genetic algorithm used to set the weights of the weighted sum

Parameter                    Value
Population                   5000
Generations                  500
Chromosome signification     weights of the fusion functions
Chromosome values interval   [−10; 10]
Fitness                      EER on the generated function
Selection                    normalized geometric selection (probability of 0.9)
Elitism                      True

For the SVM, we have computed the best parameters (i.e., searched for the C and γ parameters giving the lowest error rate) on the learning database with a 5-fold cross-validation scheme. We have used the easy.py script provided with libSVM [52] for this purpose. We have then tested the performance on the validation set. We only obtain one operating point (and not a curve) when using an SVM. That is why we have used the HTER instead of the EER.

Table 4 presents the performance, for the three databases, of each biometric system, of the fusion mechanisms from the state of the art, and of our contribution. Concerning the state of the art, we can see that the simple fusion functions sum and mul tend to give better performance than the best biometric method of each database, but they are outperformed by the weight rule. The min operator gives quite bad results (it does not improve on the best biometric system). The SVM method gives good results but is outperformed by the weight method.

Table 5 presents the gain in performance against the weight operator (which gives the best results in Table 4) in terms of EER and AUC. This gain is computed as follows:

\[ gain = 100 \, \frac{EER_{weight} - EER_{gpfunc}}{EER_{weight}} \qquad (2) \]

where $EER_{weight}$ and $EER_{gpfunc}$ are respectively the EER values of the weighted fusion and of the generated score fusion function (the same procedure is used for the AUC). Better values than the weighted sum are represented in bold.
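As a worked instance of Equation (2), and assuming that the error rates reported in Table 4 for the weight rule and for the generated function are the values used as $EER_{weight}$ and $EER_{gpfunc}$, the BSSR1 and BANCA gains of Table 5 can be reproduced as follows:

```latex
\begin{align*}
gain_{BSSR1} &= 100 \, \frac{0.38 - 0.40}{0.38} \approx -5.26\% \\
gain_{BANCA} &= 100 \, \frac{0.91 - 0.75}{0.91} \approx 17.58\%
\end{align*}
```

A negative gain, as for BSSR1, means that the generated function has a slightly higher error rate than the weighted sum.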
The EER gives a local performance for one operating point (system configured in order to obtain an FAR equal to the FRR), while the AUC gives a global performance of the whole system. These two pieces of information are both useful when comparing biometric systems. Figure 4 presents the ROC curves of the generated programs against the weighted sum. The performance of the initial biometric systems is not represented, because we have already seen that they are worse than the weighted sum (the same remark holds for the other fusion functions). Logarithmic scales are used, because error rates are quite small.

We can see from Table 5 and Figure 4 that, most of the time, the functions automatically generated with genetic programming give slightly better results than the weighted sum. These improvements can be local and global and vary between 16% and 59% for the EER and 0.05% and 76% for the area under the curve. When there is no improvement, the results are equal or (in one case) slightly inferior. Even if there is some difference between the training (not represented in this paper) and validation sets, we do not observe any overfitting problem. The BSSR1 dataset presents the largest difference in performance between training and validation sets, but the results are still better than the ones from the state of the art (and the same problem can be observed with the weighted sum). Note that the fitness stopping criterion has never been met: we did not manage to obtain fusion functions making no error. The evolution therefore always ended when reaching the 50th generation.

Table 4: Performance (HTER in %) of the initial methods (s1_*, s2_*, s3_*, s4_*, s5_*), the state-of-the-art fusion functions (sum, min, mul, weight) and our proposal on the three databases. Bold values represent better performance than the initial biometric systems, and * represents fusion results better than the state of the art.

(a) BSSR1

Category            Method      HTER
Biometric systems   s1_bssr1    4.30%
                    s2_bssr1    6.19%
                    s3_bssr1    8.41%
                    s4_bssr1    4.54%
Fusion functions    sum         0.70%
                    min         5.04%
                    mul         0.70%
                    weight      0.38%
                    SVM         0.77% (FAR=1.16%, FRR=0.39%)
Proposal            gpI         0.40%

(b) PRIVATE

Category            Method       HTER
Biometric systems   s1_private   8.92%
                    s2_private   11.53%
                    s3_private   15.69%
                    s4_private   6.21%
                    s5_private   31.43%
Fusion functions    sum          2.70%
                    min          13.72%
                    mul          2.67%
                    weight       2.26%
                    SVM          5.47% (FAR=10.87%, FRR=0.07%)
Proposal            gpA          1.57%*

(c) BANCA

Category            Method      HTER
Biometric systems   s1_banca    4.38%
                    s2_banca    11.54%
                    s3_banca    8.97%
                    s4_banca    7.32%
Fusion functions    sum         1.28%
                    min         4.38%
                    mul         1.28%
                    weight      0.91%
                    SVM         1.01% (FAR=1.71%, FRR=0.32%)
Proposal            gpΦ         0.75%*

Table 5: Performance gain between our proposal and the weighted sum (which gives the best results among the methods of the state of the art).

Database   EER       AUC
BSSR1      -5.26%    0.05%
PRIVATE    34.85%    23.85%
BANCA      17.58%    76.74%

Figure 5 represents the fitness evolution during all the generations of one genetic programming run on the BSSR1 database. A logarithmic scale has been used to give more importance to the low values and to track more easily the fitness evolution of the best individual of each generation. We can observe the same kind of results with the other databases. The fitness converges several generations before the end of the computation. The worst program of each generation is always very bad, which implies that the standard deviation of the fitness is also always quite large. This can be explained by the high mutation probability and the low number of good programs kept for the next generation. When running the experiment several times, we obtain the same convergence value. We can say that we reach the maximum performance of the system.

4. Discussion

The score fusion functions generated by the proposed approach give a slightly better performance than the fusion functions used in the state of the art in multibiometrics. We can argue that genetic programming is well suited to automatically defining score fusion functions returning a score. The price of this performance gain is the need for training patterns, which are not necessary for sum, mul or min (but this requirement is already present for the weighted sum or the use of an SVM). This is not really a problem, because we already need training patterns to configure the decision threshold (if we do not want to do it empirically) or to normalize the scores before doing the fusion.

Another problem inherent to genetic programming is the complexity of the generated programs. It is probable that some subtrees could be pruned or simplified without losing performance. Another option would be to add a regularization term to the fitness function (for example, the number of nodes or the depth of the tree). Generated programs would then be more readable by a human and quicker to interpret. Figure 6 presents a simple generated tree (depending on the database, they can be more or less complex). Even if the program is quite short (compared to the other generated functions), it includes useless code (e.g., the subtree avg(a, a − 1/12) could be simplified to a − 1/24). Some generated trees perform a form of preprocessing by not using all the modalities in the terminal set.

The score fusion functions generated by genetic programming give performance equal to or slightly better than the weighted sum configured by a genetic algorithm.
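The regularization suggested in this section (penalizing the number of nodes or the depth of the tree) could be sketched as follows. This is only a minimal illustration under our own assumptions: programs are represented as nested tuples `(operator, arg1, arg2, ...)`, which is not the pySTEP representation, and the names `node_count`, `depth`, `penalized_fitness` and the weight `alpha` are hypothetical.

```python
def node_count(tree):
    """Number of nodes of a program given as nested tuples (operator, arg, ...)."""
    if not isinstance(tree, tuple):
        return 1  # terminal: a score name or a constant
    # One node for the operator, plus the nodes of each argument.
    return 1 + sum(node_count(arg) for arg in tree[1:])


def depth(tree):
    """Depth of the tree; a terminal has depth 1.

    Assumes every operator has at least one argument.
    """
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(depth(arg) for arg in tree[1:])


def penalized_fitness(eer, tree, alpha=0.01):
    """Error rate plus a size penalty (lower is better); alpha is an assumed weight."""
    return eer + alpha * node_count(tree)


# The redundant subtree quoted in the text and its simplified form.
verbose = ("avg", "a", ("-", "a", 1 / 12))
simple = ("-", "a", 1 / 24)

if __name__ == "__main__":
    print(node_count(verbose), depth(verbose))  # size of the redundant subtree
    print(node_count(simple), depth(simple))    # size of the simplified subtree
    # With the same error rate, the smaller program gets the better fitness.
    print(penalized_fitness(0.40, simple) < penalized_fitness(0.40, verbose))
```

With such a penalty, two programs with the same error rate are no longer equivalent: the shorter one is preferred, which would push the evolution towards more readable fusion functions. The value of `alpha` would have to be tuned so that the penalty does not dominate the error rate.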
Although the computation time of genetic programming is longer than that of the genetic algorithm, and the gain between the two methods may appear modest, genetic programming needed a population ten times smaller and ten times fewer generations to obtain these results.

5. Conclusion

We propose in this paper a new approach for multibiometrics based on the automatic generation of score fusion functions. We have seen interesting approaches in the state of the art and decided to improve on them by automatically generating score fusion programs with the help of genetic programming.

Our contribution concerns the design of multibiometric systems using a generic approach based on genetic programming (inspired by state-of-the-art architectures). The proposed method returns a multibiometric score to be compared with a defined threshold. The proposed multibiometric system has been extensively tested on three different multibiometric databases. We obtained noticeable improvements compared to the classical fusion functions used in the state of the art. We hope to have opened a new path in the fusion of biometric systems thanks to genetic programming.

Results could surely be improved by using different parameters in the genetic programming engine (i.e., more individuals and generations, a different range of constants, different functions, ...). It could be interesting to test other performance metrics, to see whether results could be improved by adding quality measures of the capture, and whether genetic programming could produce template fusion programs.

6. Acknowledgment

The authors would like to thank: the author of pySTEP [48], the library used during the experiment, for his help when encountering problems with it, the authors of the various biometric databases used in this experiment, as well as the French Basse-Normandie region for its financial support.

References

[1] A. Kumar, Y. Zhou, Human Identification Using KnuckleCodes, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[2] M. Hashiyada, Development of biometric dna ink for authentication security, Tohoku J. Exp. Med. 204 (2004) 109–117.

[3] Z. Korotkaya, Biometric person authentication: Odor, Tech. rep., Department of Information Technology, Laboratory of Applied Mathematics, Lappeenranta University of Technology (2003).

[4] A. Riera, A. Soria-Frisch, M. Caparrini, C. Grau, G. Ruffini, Unobtrusive biometric system based on electroencephalogram analysis, EURASIP Journal on Advances in Signal Processing 2008 (2008) 8.

[5] R. Gaines, W. Lisowski, S. Press, N. Shapiro, Authentication by keystroke timing: some preliminary results, Tech. rep., Rand Corporation (1980).

[6] J. Fierrez, J. Ortega-Garcia, On-line signature, Springer US, 2008, pp. 189–209.

[7] A. Weiss, A. Ramapanicker, S. Pranav, S. Noble, L. Immohr, Mouse movements biometric identification: A feasibility study, in: Proceedings of Student/Faculty Research Day, CSIS, Pace University, 2007.

[8] D. Petrovska-Delacretaz, A. El Hannani, G. Chollet, Text-independent speaker verification: State of the art and challenges, Lecture Notes In Computer Science 4391 (2007) 135.

[9] C. Nandini, C. Kumar, Comprehensive framework to gait recognition, International Journal of Biometrics 1 (1) (2008) 129–137.

[10] K. Benli, R. Duzagac, M. Eskil, Driver recognition using gaussian mixture models and decision fusion techniques, in: ISICA 2008, 2008.

[11] M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 591, 1991.

[12] D. Maltoni, A. Jain, S. Prabhakar, Handbook of fingerprint recognition, Springer, 2009.

[13] A. Kumar, D. Zhang, Personal recognition using hand shape and texture, IEEE Transactions on Image Processing 15 (8) (2006) 2454.

[14] Z. Xu, X. Guo, X. Hu, X. Cheng, The blood vessel recognition of ocular fundus, in: Proceedings of the 4th International Conference on Machine Learning and Cybernetics (ICMLC'05), 2005, pp. 4493–4498.

[15] A. Ross, K. Nandakumar, A. Jain, Handbook of multibiometrics, Springer, 2006.

[16] M. Theofanos, B. Stanton, C. A. Wolfson, Usability & Biometrics: Ensuring Successful Biometric Systems, National Institute of Standards and Technology (NIST), 2008.

[17] ISO, Biometric performance testing and reporting, Tech. rep., ISO/IEC 1975-1:2006(E) (2006).

[18] J. Bhatnagar, A. Kumar, On estimating performance indices for biometric identification, Pattern Recognition 42 (2009) 1803–1815.

[19] R. Raghavendra, B. Dorizzi, A. Rao, G. Hemantha Kumar, Pso versus adaboost for feature selection in multimodal biometrics, in: IEEE 3rd International Conference on Biometrics: Theory, Applications and Systems, BTAS 2009, 2009.

[20] A. Rattani, M. Tistarelli, Robust multi-modal and multi-unit feature level fusion of face and iris biometrics, in: International Conference on biometrics (ICB2009), 2009.

[21] A. Ross, A. Jain, Multimodal biometrics: An overview, in: Proceedings of 12th European Signal Processing Conference, Citeseer, 2004, pp. 1221–1224.

[22] Y. Zuev, S. Ivanov, The voting as a way to increase the decision reliability, Journal of the Franklin Institute 336 (2) (1999) 361–378.

[23] P. Buyssens, M. Revenu, O. Lepetit, Fusion of ir and visible light modalities for face recognition, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[24] J. Montalvao Filho, E. Freire, Multimodal biometric fusion - joint typist (keystroke) and speaker verification, in: Telecommunications Symposium, 2006 International, 2006, pp. 609–614.

[25] S. Hocquet, Authentification biométrique adaptative application à la dynamique de frappe et à la signature manuscrite, Ph.D. thesis, Université de Tours (2007).

[26] L. Allano, La biométrie multimodale : stratégies de fusion de scores et mesures de dépendance appliquées aux bases de personnes virtuelles, Ph.D. thesis, Institut National des Télécommunications (2009).

[27] P. S. Teh, A. B. J. Teoh, C. Tee, T. S. Ong, A multiple layer fusion approach on keystroke dynamics, Pattern Analysis & Applications (2009) 14.

[28] K. Nandakumar, Y. Chen, S. Dass, A. Jain, Likelihood ratio-based biometric score fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2) (2008) 342.

[29] J. Czyz, M. Sadeghi, J. Kittler, L. Vandendorpe, Decision fusion for face authentication 7.

[30] S. Garcia-Salicetti, M. Mellakh, L. Allano, B. Dorizzi, Multimodal biometric score fusion: the mean rule vs. support vector classifiers, in: Proc. EUSIPCO, 2005.

[31] J. Koza, J. Rice, Genetic programming, Springer, 1992.

[32] A. Ross, N. Poh, Handbook of Remote Biometrics, Springer, Ch. Multibiometric Systems: Overview, Case Studies, and Open Issues.

[33] NIST, Nist biometric score set (2006). URL http://www.itl.nist.gov/iad/894.03/biometricscores/

[34] N. I. of Standards, Technology, Nist biometric score set (2006). URL http://www.itl.nist.gov/iad/894.03/biometricscores/

[35] E. Bailly-Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée, et al., The BANCA database and evaluation protocol, Lecture Notes in Computer Science (2003) 625–638.

[36] N. Poh, Banca score database. URL http://info.ee.surrey.ac.uk/Personal/Norman.Poh/web/banca_multi/main.php?bodyfile=entry_page.html

[37] N. Sedgwick, C. Limited, Preliminary Report on Development and Evaluation of Multi-Biometric Fusion using the NIST BSSR1 517-Subject Dataset, Cambridge Algorithmica Limited.

[38] A. Martinez, R. Benavente, The ar face database, Tech. rep., CVC Technical report (1998).

[39] R. Giot, M. El-Abed, R. Christophe, Greyc keystroke: a benchmark for keystroke dynamics biometric systems, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[40] D. Lowe, Distinctive image features from scale-invariant keypoints, International journal of computer vision 60 (2) (2004) 91–110.

[41] C. Rosenberger, L. Brun, Similarity-based matching for face authentication, in: Proceedings of the International Conference on Pattern Recognition (ICPR'2008), Tampa, Florida, USA, 2008.

[42] R. Giot, M. El-Abed, C. Rosenberger, Keystroke dynamics with low constraints svm based passphrase enrollment, in: IEEE Third International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2009.

[43] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust statistics: the approach based on influence functions, John Wiley & Sons New York, 1986.

[44] A. Jain, K. Nandakumar, A. Ross, Score normalization in multimodal biometric systems, Pattern Recognition 38 (12) (2005) 2270–2285. URL http://www.sciencedirect.com/science/article/B6V14-4G0DDW4-1/2/d922960ee7ed8928744113dd9494d37a

[45] M. Mitchell, An introduction to genetic algorithms, The MIT press, 1998.

[46] R. Poli, W. Langdon, N. McPhee, A field guide to genetic programming, Lulu Enterprises Uk Ltd, 2008, freely available at http://www.gp-filed-guide.org.uk.

[47] P. Day, A. K. Nandi, Robust text-independent speaker verification using genetic programming, IEEE Transactions on Audio, Speech, and Language Processing 15 (2007) 285–295.

[48] M. Khoury, Python strongly typed genetic programming. URL http://pystep.sourceforge.net

[49] T. Oliphant, Guide to NumPy, Spanish Fork, UT, Trelgol Publishing.

[50] R. Giot, M. El-Abed, C. Rosenberger, Fast learning for multibiometrics systems using genetic algorithms, in: The International Conference on High Performance Computing & Simulation (HPCS 2010), IEEE Computer Society, Caen, France, 2010, p. 8.

[51] R. Giot, B. Hemery, C. Rosenberger, Low cost and usable multimodal biometric system based on keystroke dynamics and 2d face recognition, in: IAPR International Conference on Pattern Recognition (ICPR), IAPR, Istanbul, Turkey, 2010.

[52] C. Chang, C. Lin, LIBSVM: a library for support vector machines (2001).

[Figure 4 panels: (a) Validation with BSSR1, (b) Validation with PRIVATE, (c) Validation with BANCA]

Figure 4: ROC curves of the fusion systems from the state of the art and with genetic programming. The EER of each fusion function is presented in the legend. Note the use of a logarithmic scale.

[Figure 5 plot: x-axis "Generation (#)", from 0 to 50; y-axis "Fitness score (Min/Avg/Max)", logarithmic, from 10^0 to 10^2; series: Min, Max, Mean, Std]

Figure 5: Fitness evolution of one run of the genetic programming evolution. The max, min, mean and std values of the fitness are represented. We want to minimize the fitness value, so lower is better.

Figure 6: Sample of a "simple" generated program. We can observe the complexity of the generated fusion function.