

HAL Id: hal-00671952
https://hal.archives-ouvertes.fr/hal-00671952

Submitted on 20 Feb 2012


Genetic Programming for Multibiometrics
Romain Giot, Christophe Rosenberger

To cite this version:
Romain Giot, Christophe Rosenberger. Genetic Programming for Multibiometrics. Expert Systems with Applications, Elsevier, 2012, 39 (2), pp.1837–1847. doi: 10.1016/j.eswa.2011.08.066. hal-00671952



Genetic Programming for Multibiometrics

Romain Giot∗, Christophe Rosenberger

GREYC Laboratory

ENSICAEN - University of Caen - CNRS

6 Boulevard Maréchal Juin 14000 Caen Cedex - France

Abstract

Biometric systems suffer from some drawbacks: a biometric system generally performs well, except for some individuals, because its performance depends highly on the quality of the capture. One solution to some of these problems is multibiometrics, where different biometric systems are combined (multiple captures of the same biometric modality, multiple feature extraction algorithms, multiple biometric modalities, ...). In this paper, we are interested in the application of score-level fusion functions (i.e., we use a multibiometric authentication scheme which accepts or denies the claimant's use of an application). In the state of the art, the weighted sum of the scores provided by different biometric systems (which is a linear classifier) and the use of an SVM on these scores (which is a non-linear classifier) provide some of the best performances. We present a new method based on genetic programming which gives similar or better performances (depending on the complexity of the database). We derive a score fusion function by assembling some classical primitive functions (+, ∗, −, ...). We have validated the proposed method on three significant biometric benchmark datasets from the state of the art.

Keywords: Multibiometrics, Genetic Programming, Score fusion, Authentication.

∗Corresponding author
Email addresses: romain.giot@ensicaen.fr (Romain Giot), christophe.rosenberger@greyc.ensicaen.fr (Christophe Rosenberger)

Preprint submitted to Expert Systems with Applications February 20, 2012


1. Introduction

1.1. Objective

Every day, new advances are made in the biometric field of research. These advances include the proposal of new algorithms with better performance, new approaches (cancelable biometrics, soft biometrics, ...) and even new biometric modalities (like finger knuckle recognition [1], for example). There are many different biometric modalities, each classified into one of three main families (even if a finer classification can be found in the literature):

• biological: recognition based on the analysis of biological data linked to an individual (e.g., DNA analysis [2], odor [3], the analysis of blood or of different physiological signals such as heart beat or EEG [4]);

• behavioural: recognition based on the analysis of an individual's behaviour while he or she is performing a specific task (e.g., keystroke dynamics [5], online handwritten signature [6], the way of using a computer mouse [7], voice recognition [8], gait dynamics (way of walking) [9] or way of driving [10]);

• morphological: recognition based on particular physical patterns which are, for most people, permanent and unique (e.g., face recognition [11], fingerprint recognition [12], hand shape recognition [13], or blood vessel patterns [14], ...).

Nevertheless, there will always be some users for whom a biometric modality (or a method applied to this modality) gives bad results, even though results are better on average. This low performance can have different causes (the quality of the capture, the instant of acquisition and the individual itself), but the consequences are the same: impostors can be accepted, or users need to authenticate themselves several times before being accepted. Multibiometrics addresses this problem by obtaining better performance (i.e., better security by accepting fewer impostors and better user acceptance by rejecting fewer genuine users), under the expectation that the errors of the different modalities are not correlated. In this paper, we propose a generic approach for multibiometric systems.

We can find different types of biometric multimodalities [15]. They use:

1. different sensors of the same biometric modality (e.g., capacitive or resistive sensors for fingerprint acquisition);

2. several different representations of the same capture (e.g., use of points of interest or texture for face or fingerprint recognition);

3. different biometric modalities (e.g., face and fingerprint recognition);

4. different instances of the same modality (e.g., left and right eye for iris recognition);

5. multiple captures (e.g., 25 images per second in a video used for face recognition);

6. a hybrid system composed of the association of the previous ones.

We are interested in the first four cases in this paper. Our objective is to automatically generate fusion functions which combine the scores provided by different biometric systems in order to obtain the most efficient multibiometric authentication scheme.

1.2. Background

1.2.1. Performance Evaluation

In order to compare different multibiometric systems, we need to present how to evaluate them. Several works have already been done on the evaluation of biometric systems [16, 17]. Evaluation is generally carried out along three aspects:

• performance: its objective is to measure various statistical criteria on the performance of the system (Capacity [18], EER, Failure To Enroll (FTE), Failure To Acquire (FTA), computation time, ROC curves, etc. [17]);

• acceptability: it gives some information on the individuals' perception, opinions and acceptance regarding the system;

• security: it quantifies how well a biometric system (algorithms and devices) can resist several types of logical and physical attacks, such as a Denial of Service (DoS) attack.

In this paper, we are only interested in performance evaluation (because the fusion approach is not modality dependent, whereas acceptability and security depend on the modalities used). The main performance metrics are the following ones:

• FAR (False Acceptance Rate), which represents the ratio of impostors accepted by the system;

• FRR (False Rejection Rate), which represents the ratio of genuine users rejected by the system;

• EER (Equal Error Rate), which is the error rate when the system is configured in order to obtain a FAR equal to the FRR;

• ROC (Receiver Operating Characteristic) curve, which plots the FRR as a function of the FAR and gives an overview of system performance;

• AUC (Area Under the Curve), which gives the area under the ROC curve. In our case, smaller is better. It is a way to globally compare the performance of different biometric systems.

We can also mention the HTER (Half Total Error Rate), which is the mean of the FAR and the FRR for a given threshold (this error rate is useful when the EER cannot be obtained).
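As an illustration of these definitions, the following minimal Python sketch computes the FAR, the FRR and the HTER at one threshold. It assumes that a higher score means a better match (a claimant is accepted when the score is greater than or equal to the threshold); this convention, the function name `far_frr_hter` and the sample scores are assumptions made for the example and are not taken from the evaluated systems.

```python
def far_frr_hter(intra_scores, inter_scores, threshold):
    """Return (FAR, FRR, HTER) at one decision threshold.

    Assumption: higher score = better match, accept when score >= threshold.
    """
    # FAR: proportion of impostor (inter) scores that are accepted.
    far = sum(s >= threshold for s in inter_scores) / len(inter_scores)
    # FRR: proportion of genuine (intra) scores that are rejected.
    frr = sum(s < threshold for s in intra_scores) / len(intra_scores)
    # HTER: mean of the two error rates at this threshold.
    return far, frr, (far + frr) / 2


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.5, 0.3]
far, frr, hter = far_frr_hter(genuine, impostor, threshold=0.45)
```

With a dissimilarity score (lower is better), the two comparisons would simply be reversed.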

1.2.2. Biometric Fusion

There are several studies on multibiometrics. The fusion can be performed at different points of the mechanism:

• template fusion: the templates captured by different biometric systems are merged together, then the learning process is performed on these new templates [19, 20]. Figure 1(a) presents this type of fusion. The fusion process is related to feature selection, in order to determine the most significant patterns to minimize errors.

• decision fusion: a decision is taken by each biometric authentication system, then the final decision is made by fusing the previous ones [21].

• rank fusion: the decision is made with the help of the different ranks returned by biometric identification systems. The main method is the majority vote [22].

• score fusion: the fusion is performed on the outputs of the classifiers. Figure 1(b) presents this type of fusion.

Figure 1: Illustration of different fusion mechanisms: (a) template fusion, (b) classical score fusion, (c) cascade fusion, (d) hierarchical fusion.

Buyssens et al. [23] showed the interest of biometric fusion for face recognition by combining images in the visible and infrared color spaces with convolutional neural networks. In [24], Montalvao and Freire combined keystroke dynamics with voice recognition; this seems to be the first time that multibiometrics has been done with keystroke dynamics and another biometric modality. In [25], Hocquet et al. demonstrated the interest of fusion in keystroke dynamics in order to improve the recognition rates: three different keystroke dynamics functions are used on the same capture. The sum operator (which consists in summing the different scores) seems to be the most powerful approach in the literature.

These fusion architectures are quite simple but powerful. Results can still be improved (in terms of error rate or computation time) by using different architectures. A cascade fusion [26] is another interesting approach. A first test is done: if the user is correctly verified as the expected client, or if he or she is detected as an impostor, the algorithm stops. Otherwise, another biometric authentication (with another capture from another modality) is performed, until a decision of acceptance or rejection is obtained or the end of the cascade is reached. So, instead of using one decision threshold, each test (except the last one) needs two thresholds: one for rejection and one for acceptance. All scores between these thresholds are considered to be in an indecision zone. This mechanism is presented in Figure 1(c). Another advantage of this method is that it decreases the verification time: the modalities are not all used, but only when necessary. This method has been successfully applied to a multibiometric system using face and fingerprint recognition in a mobile environment (where acquisition and computation times are important) [26].
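The two-threshold mechanism of the cascade can be sketched as follows. This is only an illustration and not the implementation of [26]: the function name `cascade_decision`, the threshold values and the convention that a higher score means a better match are assumptions. The last stage is given a single threshold by using the same value for rejection and acceptance.

```python
def cascade_decision(scores, thresholds):
    """Decide with a cascade of biometric tests.

    scores: one score per stage, in the order of the cascade.
    thresholds: list of (reject_below, accept_above) pairs, one per stage.
    Assumption: higher score = better match.
    Returns (decision, number_of_stages_used).
    """
    for stage, (score, (reject_below, accept_above)) in enumerate(
            zip(scores, thresholds), start=1):
        if score >= accept_above:
            return "accept", stage
        if score < reject_below:
            return "reject", stage
        # Otherwise the score lies in the indecision zone: use next stage.
    # End of the cascade reached without a decision: reject by default
    # (an assumption of this sketch).
    return "reject", len(thresholds)


# Hypothetical two-stage cascade; the last stage has a single threshold.
stages = [(0.3, 0.8), (0.6, 0.6)]
undecided_then_accepted = cascade_decision([0.5, 0.9], stages)
rejected_at_first_stage = cascade_decision([0.1, 0.9], stages)
```

In a real system the score of a stage would only be acquired when the previous stage is undecided, which is where the gain in verification time comes from; the list of precomputed scores is used here only to keep the example short.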

Another kind of architecture has been proposed: a hierarchical fusion scheme [27] (called multiple layers by its authors). Shen et al. have presented this method with two different keystroke dynamics methods. The fusion is done at different steps, and involves different mathematical operations on scores (sum, weighted sum, product, min, max) and logical operations on decisions (comparison to a threshold, or, and), applied to different templates extracted from the same capture. An extended version, applicable to any multibiometric system, is presented in Figure 1(d). We think our work can be seen as a generalization of this paper.


It is also possible to model the distributions of the genuine and impostor matching scores; this is called density-based score fusion. In [28], scores are modelled with a Gaussian Mixture Model, and the approach has been tested on three multibiometric databases involving face, fingerprint, iris and speech modalities.

Concerning non-linear algorithms, a Support Vector Machine (SVM) can also be used in a fusion process. The scores to combine are arranged in a vector, and a training set is used to learn the SVM model. In [29], SVM fusion applied to face recognition gives slightly better performance than the weighted sum. Voice and online signature have been fused with an SVM in [30]. In this experiment, the arithmetic mean gives the best results with noise-free data, while the SVM gives equivalent results with noisy data.

1.3. Discussion

In this paper, we are interested in transformation-based score fusion [28] that is independent of the biometric modality: the matching scores are first normalized and then combined. We have previously seen that, in this case, arbitrary functions are often used. Our work builds on these various score fusion architectures in order to produce a score fusion function automatically generated with genetic programming [31].

Moreover, the definition of a fusion architecture is still an open issue in the multibiometrics research field [32], because the range of possible fusion configurations is very large. We think that automatically generated fusion functions can bring a new solution to this kind of problem.

2. Material and Methods

In this section, we present all the information required to allow other researchers to reproduce our experiment.


2.1. Biometric databases

As it is well known that results can be highly dependent on the database, we have used three different multibiometric databases in this study: the first one is the BSSR1 [33] distributed by the NIST [34] (referenced as BSSR1 in the paper), the second one is a database we have created for this purpose (referenced as PRIVATE in the paper) and the third one is a subset of scores computed with the BANCA [35] database (referenced as BANCA in the paper; the BANCA database is in fact composed of templates, and we have used the scores available in [36]).

As all these databases are multi-modal, the scores are presented as tuples: the $i$-th tuple of scores is represented as $s_i = (s_i^1, s_i^2, \ldots, s_i^n)$ for a database having $n$ modalities (in our case, $n \in \{4, 5\}$).

The three databases are presented in detail in the following subsections, while Table 1 summarizes their description.

2.1.1. BSSR1 database

The BSSR1 [33] database consists of a collection of score sets from different biometric systems. In this study, we are interested in the subset containing the scores of two facial recognition systems and the two scores of a fingerprint recognition system applied to two different fingers, for 512 users. We have 512 tuples of intra scores (comparison of the capture of an individual with his or her own model) and 512 ∗ 511 = 261,632 tuples of inter scores (comparison of the capture of an individual with the model of another individual). Each tuple is composed of 4 scores, $s = (s^1_{bssr1}, s^2_{bssr1}, s^3_{bssr1}, s^4_{bssr1})$, which respectively represent the score of face recognition algorithm A, the score of face recognition algorithm B (the same face image is used for the two algorithms), the score of fingerprint recognition with the left index finger and the score of fingerprint recognition with the right index finger. This database has been used several times in the literature [28, 37].

2.1.2. PRIVATE database

The second database is a chimeric one, which we have created by combining two public biometric template databases: the AR [38] database for facial recognition and the GREYC keystroke [39] database for keystroke dynamics.

The AR database is composed of frontal facial images of 126 individuals under different facial expressions, illumination conditions or occlusions. These specificities make it a quite difficult database. The images have been taken during two different sessions, with 13 captures per session. The GREYC keystroke database contains captures from several sessions over a two-month period, involving 133 individuals. Users were asked to type the password "greyc laboratory" 6 times on a laptop and 6 times on a USB keyboard, interleaving the typings.

We have selected the first 100 individuals of the AR database and we have associated each of them with an individual from a subset of the GREYC keystroke database having 5 sessions of captures. We then used the first 10 captures to create the model of each user and the 16 remaining ones to compute the intra and inter scores.

These scores have been computed by using two different methods for face recognition (the scores $s^1_{private}$ and $s^2_{private}$) and three different ones for keystroke dynamics (the scores $s^3_{private}$, $s^4_{private}$ and $s^5_{private}$). The face recognition algorithms are based on eigenfaces [11] and on comparisons of SIFT keypoints [40] between images from the model and the capture [41]. Keystroke dynamics scores have been computed by using different methods [42] based on SVM, statistical information and rhythm measures.

2.1.3. BANCA database

The last benchmark used is a subset of the scores produced with the BANCA database [36]. The selected scores correspond to the files labelled IDIAP_voice_gmm_auto_scale_25_100_pca.scores for $s^1_{banca}$, SURREY_face_nc_man_scale_100.scores for $s^2_{banca}$, SURREY_face_svm_man_scale_0.13.scores for $s^3_{banca}$ and UC3M_voice_gmm_auto_scale_10_100.scores for $s^4_{banca}$.

We have chosen this subset empirically. The G1 set is used as the learning set, while the G2 set is used as the validation set. Users from G1 are different from users in G2.

Table 1: Summary of the different databases used to validate the proposed method

Nb of           BSSR1     PRIVATE   BANCA
users           512       100       208
intra tuples    512       1600      467
inter tuples    261632    158400    624
items/tuple     4         5         4

2.1.4. Discussion

The main differences between these three benchmarks are:

• the biometric modalities used in BSSR1 and BANCA perform better than the ones in PRIVATE;

• the quantity of intra scores is larger in PRIVATE (only one tuple of intra scores per user in BSSR1, instead of several in PRIVATE);

• BSSR1 and BANCA are databases of scores (and we do not know the biometric systems that generated them), whereas PRIVATE is a database of templates (we had to compute the scores);

• BSSR1 and BANCA are better suited to physical access control applications (e.g., a building protected by a multi-modal biometric system), while PRIVATE is better suited to logical access control (e.g., the authentication to a Web service protected by a multi-modal biometric system).

In the following subsections, we describe the proposed methodology to automatically generate a score fusion function with genetic programming. We adopt the classical score fusion context described in Figure 1(b). Before using the scores provided by the different biometric systems, we need to normalize them.

2.2. Score Normalization

It is necessary to normalize the various scores before performing the fusion: indeed, these scores come from different classifiers and their values do not necessarily lie within the same interval. We have chosen to use the tanh [43] operator to normalize the scores of each modality. Equation (1) presents the normalization method, where $\mu^m_{gen}$ and $\sigma^m_{gen}$ respectively represent the average and the standard deviation of the genuine scores of modality $m$. The genuine scores are obtained by comparing the model and the capture of the same user: they are also called the intra scores. Conversely, the inter scores are obtained by comparing the model of a user with the captures of other users. $score'$ and $score$ respectively represent the scores after and before normalization.

$$score' = \frac{1}{2}\left\{\tanh\left(\frac{1}{100}\left(\frac{score - \mu^m_{gen}}{\sigma^m_{gen}}\right)\right) + 1\right\} \qquad (1)$$
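Equation (1) can be sketched in Python as follows. This is a minimal illustration: the function name `tanh_normalize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the estimator is not specified.

```python
import math
import statistics


def tanh_normalize(score, genuine_scores):
    """Normalize one score of a modality with Equation (1).

    genuine_scores: intra scores of this modality (learning set).
    """
    mu = statistics.mean(genuine_scores)
    # Population standard deviation: an assumption of this sketch.
    sigma = statistics.pstdev(genuine_scores)
    return 0.5 * (math.tanh(0.01 * (score - mu) / sigma) + 1.0)


# Hypothetical genuine scores of one modality.
genuine = [10.0, 20.0, 30.0]
at_mean = tanh_normalize(20.0, genuine)
above_mean = tanh_normalize(30.0, genuine)
```

A score equal to the genuine mean is mapped to 0.5, and all normalized scores lie in the interval (0, 1), whatever the original range of the modality.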

We have selected this normalization procedure from the state of the art because it is known to be stable [44] and does not use impostor patterns, which can be hard or impossible to obtain in a real application. The aim of this paper is not to analyse the performance of biometric systems depending on the normalization procedure, but to present a new multibiometric fusion procedure. The scores of each modality have been normalized using this procedure.

2.3. Fusion Procedure

In this study, we have chosen to use genetic programming [31] in order to generate score fusion functions. Genetic programming belongs to the family of evolutionary algorithms and its scheme is quite similar to that of genetic algorithms [45]: a population of computer programs (possibly represented by trees) evolves during several generations, and different genetic operators are used to create the new population. Programs are evaluated with a fitness function, which produces a value that is used to compare them and gives a probability of selection during the tournaments. In a system where the computer programs are represented by trees, the leaves mainly represent the inputs of the problem, the root gives the solution to the problem and the other nodes are the various functions, which take the values of their child nodes as arguments.

The leaves are called terminals and can be of several kinds: (a) pseudo-variables containing the real inputs of the problem (in our case, the list of scores of each modality), (b) constants, possibly randomly generated, (c) functions without arguments that have side effects, or (d) ordinary variables.
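To make the tree representation concrete, here is a minimal sketch of how such a program could be stored and evaluated. The encoding (nested tuples), the names `FUNCTIONS` and `evaluate`, and the example program are assumptions made for this illustration; they are not the internal representation of the genetic programming library used in this work.

```python
# A program is either a terminal (a string naming a score variable, or a
# float constant) or a tuple (function_name, left_child, right_child).
FUNCTIONS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "min": min,
    "max": max,
}


def evaluate(tree, variables):
    """Recursively evaluate a program tree for one tuple of scores."""
    if isinstance(tree, tuple):      # internal node: a function
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, variables),
                               evaluate(right, variables))
    if isinstance(tree, str):        # terminal: pseudo-variable (a score)
        return variables[tree]
    return tree                      # terminal: constant


# Hypothetical program: max(a, b) + 0.5 * c
program = ("+", ("max", "a", "b"), ("*", 0.5, "c"))
fused = evaluate(program, {"a": 0.2, "b": 0.6, "c": 0.4})
```

The value returned at the root is the fused score of the tuple, here max(0.2, 0.6) + 0.5 × 0.4.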

The genetic operators usually used during the evolution are (a) the crossover, where randomly chosen sub-trees of two different trees are exchanged, (b) the mutation, where a sub-tree is destroyed and replaced by another, randomly generated one, and (c) the copy, where the tree is kept unchanged in the next generation. The steps of a genetic programming engine are the following:

1. An initial population is randomly generated. This population is composed of computer programs using the available functions and terminals. The trees are built using a recursive procedure.

2. The following steps are repeated until the termination criterion is satisfied (the fitness function has reached the target value, or the maximum number of generations has been reached).

   (a) Computation of the fitness measure of each program (the program is evaluated on its input data).

   (b) Selection of programs, with a probability based on their fitness, to which the genetic operations are applied.

   (c) Creation of the new generation of programs by applying the following genetic operations (depending on their probabilities) to the previously selected programs:

       • Reproduction: the individual is copied to the new population.

       • Crossover: a new offspring program is created by recombining randomly chosen parts from two selected programs. An example is provided in Figure 2.

       • Mutation: a new offspring program is created by mutating one node of the selected program at a randomly chosen place. An example is provided in Figure 3.

3. The single best program of the whole population is designated as the winner. This can be the solution or an approximate solution to the problem.
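The crossover and mutation operations of step 2(c) can be sketched on trees stored as nested tuples `(function_name, child, child)`. This is only an illustration under assumptions: the encoding and every helper name below are hypothetical, nodes are chosen uniformly at random, and the depth limit that a real engine would enforce is ignored.

```python
import random


def subtree_paths(tree, path=()):
    """List the paths (tuples of child indices) of all nodes of a tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths


def get_subtree(tree, path):
    """Return the node reached by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree where the node at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent1, parent2, rng):
    """Graft a random sub-tree of parent2 at a random node of parent1."""
    target = rng.choice(subtree_paths(parent1))
    donor = get_subtree(parent2, rng.choice(subtree_paths(parent2)))
    return replace_subtree(parent1, target, donor)


def mutate(parent, random_tree, rng):
    """Replace a random sub-tree of parent by a freshly generated one."""
    target = rng.choice(subtree_paths(parent))
    return replace_subtree(parent, target, random_tree(rng))


# Hypothetical parents, for illustration only.
p1 = ("+", "a", ("*", "b", 0.5))
p2 = ("min", "c", "d")
rng = random.Random(0)
child = crossover(p1, p2, rng)
mutant = mutate(p1, lambda r: 0.25, rng)
```

Only one offspring is returned by `crossover`; the symmetric offspring could be obtained by swapping the roles of the two parents. Because tuples are immutable, the parents are left untouched and can be selected again.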

Figure 2: Crossover in genetic programming: node C from tree 1 is exchanged with node 2 from tree 2. Program result 1 is the new individual to add to the new generation. Panels: (a) program source 1, (b) program source 2, (c) program result 1, (d) program result 2.

Figure 3: Mutation in genetic programming: node B is replaced by another sub-tree. Panels: (a) program source, (b) program result.

Different applications of genetic programming are presented in [46], together with their bibliographic references. The fields of application include curve fitting, data modelling, symbolic regression, image and signal processing, economics, industrial process control, medicine, biology, bioinformatics, compression, etc. To the best of our knowledge, however, genetic programming has not yet been applied to multibiometrics. We only found one reference on genetic programming in the biometrics field. In that paper [47], the authors used genetic programming to learn speaker recognition programs. They used an island model, where different islands run their own genetic programming evolution and, after each generation, some individuals can migrate to another island. The obtained performance was similar to the state of the art in speaker recognition under normal conditions, but the generated systems performed better under degraded conditions.

More information about the configuration of the genetic programming system is presented in the next section.

2.4. Parameters of the Genetic Programming

We want to use a score fusion function that returns a single score for the multibiometric system. This score has to be compared with a threshold in order to make the decision of acceptance or rejection of the user. In this case, no logical operation is required in the generated programs, and different information can be extracted from the result of the fusion function (we can compute the ROC curve, the EER, ...).

2.4.1. Fitness Function

The EER (Equal Error Rate) is usually used to compare the performance of different biometric systems. A low EER means that FAR and FRR are both low, and that the system performs well if its threshold is configured to reach this operating point. For this reason, we have chosen to use this operating point to evaluate the performance of the generated score fusion functions.

To compute the EER, we consider the highest and lowest values among the final scores produced by the generated program. Then, we set a threshold at the lowest score and linearly increment it, in 1000 steps, until it reaches the highest score. For each of these steps, we compute the FAR (comparison between the threshold and the inter scores) and the FRR (comparison between the threshold and the intra scores). The ROC curve can be obtained by plotting all these (FAR, FRR) couples, while the EER is the mean of FAR and FRR for the couple having the lowest absolute difference. So, the fitness function is $fitness = (FAR_i + FRR_i)/2$, where $i$ is the threshold for which $|FAR_i - FRR_i|$ is minimal.
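The threshold sweep described above can be sketched in plain Python as follows. The function name `eer_fitness` is hypothetical, and the sketch assumes that a higher fused score means a better match (acceptance when the score is greater than or equal to the threshold), which is not stated explicitly in the text.

```python
def eer_fitness(intra_scores, inter_scores, steps=1000):
    """Approximate the EER by sweeping a threshold over the fused scores.

    Assumption: higher score = better match, accept when score >= threshold.
    """
    all_scores = list(intra_scores) + list(inter_scores)
    low, high = min(all_scores), max(all_scores)
    best_gap, best_eer = None, None
    for k in range(steps + 1):
        # Threshold moving linearly from the lowest to the highest score.
        threshold = low + (high - low) * k / steps
        far = sum(s >= threshold for s in inter_scores) / len(inter_scores)
        frr = sum(s < threshold for s in intra_scores) / len(intra_scores)
        gap = abs(far - frr)
        # Keep the (FAR, FRR) couple with the lowest absolute difference.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical fused scores: perfectly separated, then fully overlapping.
separated = eer_fitness([0.8, 0.9], [0.1, 0.2])
overlapping = eer_fitness([0.1, 0.9], [0.1, 0.9])
```

Collecting the successive `(far, frr)` couples instead of keeping only the best one would give the points of the ROC curve. This loop costs one pass over all scores per threshold; a vectorized version would be preferable on databases with several hundred thousand inter scores.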

2.4.2. Genetic Programming Parameters

In this section, we present the parameters used in the genetic programming algorithm; Table 2 summarizes them.

To carry out this experiment, we used the PySTEP [48] library. The generated programs contain basic functions (+, −, ∗, /, min, max, avg). The terminals are the scores of the biometric systems and random constants between 0 and 1. All the fitness cases are processed in a single tree evaluation, thanks to the numpy [49] library. Each fitness case is a tuple of scores (where each score comes from a different biometric modality) and its result value is the score returned by the generated multimodal system. The global fitness value of a tree is the EER computed from the scores generated in this way (computation of the ROC curve, then reading of the EER value from it).

Table 2: Summary of the configuration of the genetic programming runs. Numbers used in the function set can be scores or constants.

Objective:
    Generate a function producing a multibiometric score.

Function set:
    • +: addition of two numbers,
    • −: subtraction of two numbers,
    • ∗: multiplication of two numbers,
    • /: division of two numbers,
    • min: returns the minimum of two numbers,
    • max: returns the maximum of two numbers,
    • avg: returns the mean of two numbers.

Fitness function:
    Computes the EER of the multibiometric system.

Terminal set (BSSR1):
    • a: scores from $s^1_{bssr1}$,
    • b: scores from $s^2_{bssr1}$,
    • c: scores from $s^3_{bssr1}$,
    • d: scores from $s^4_{bssr1}$,
    • 50 constants linearly distributed between 0 and 1.

Terminal set (PRIVATE):
    • a, b, c: keystroke dynamics scores ($s^3_{private}$, $s^4_{private}$, $s^5_{private}$),
    • d, e: face recognition scores ($s^1_{private}$, $s^2_{private}$),
    • 50 constants linearly distributed between 0 and 1.

Terminal set (BANCA):
    • a: scores from $s^1_{banca}$,
    • b: scores from $s^2_{banca}$,
    • c: scores from $s^3_{banca}$,
    • d: scores from $s^4_{banca}$,
    • 50 constants linearly distributed between 0 and 1.

Initial population:
    500 random trees with a depth between 2 and 8, built with the ramped half-and-half method.

Evolution parameters:
    • Number of individuals: 500,
    • Maximal number of generations: 50,
    • Depth limited to: 8,
    • Probability of crossover: 45%,
    • Probability of mutation: 50%,
    • Probability of reproduction: 5% (with elitism),
    • Selection: tournament of size 10 with a selection probability of 80%.

Termination criterion:
    Best individual has a fitness lower than 0.001 (in practice, this value is never reached) or maximal number of generations reached.

Learning set:
    First half of the intra-score tuples and first half of the inter-score tuples.

Validation set:
    Second half of the intra-score tuples and second half of the inter-score tuples.

PySTEP is a strongly typed genetic programming engine but, in our case, we use almost no constraints: the root node can only have functions as children (any function of the set, but no terminal, in order to avoid a unimodal system), while the other function nodes can have any of the functions as well as any of the terminals as children.

The maximal depth of the generated trees is set to 8. In order to avoid being trapped in a local minimum, the mutation probability is set to 50%. 500 individuals evolve during 50 generations. We have kept these quantities small because, during our investigations, using a population of 5000 individuals over 100 generations did not give much better results (the gain was not worth the computation time). Each database has been split into two sets of equal size: the first half is the learning set and the second half is the validation set.

The mutation rate is set to 50%, the crossover rate to 45% and the reproduction rate to 5%. For mutation and crossover, the individuals are selected with a tournament of size 10, with a probability of 80% of selecting the best individual. The same individual can be selected several times. For reproduction, the individuals are selected with an elitism scheme: the best 5% of the individuals are copied from generation n − 1 to generation n. During a crossover, only the first of the two generated offspring is kept.
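A tournament of size 10 with an 80% probability of selecting the best individual can be sketched as follows. The exact rule applied when the best contestant is not chosen is not detailed here, so this sketch assumes one common variant: the contestants are ranked, and each one in turn is accepted with the same probability. The function name `tournament_select` and the drawing of contestants with replacement are also assumptions.

```python
import random


def tournament_select(population, fitness, rng, size=10, p_best=0.8):
    """Pick one individual with a probabilistic tournament.

    fitness: function returning the EER of an individual (lower is better).
    Assumption: if the best contestant is not chosen, the next best one is
    tried with the same probability, and so on.
    """
    # Contestants are drawn with replacement (an assumption), then ranked
    # from the best (lowest EER) to the worst.
    contestants = sorted((rng.choice(population) for _ in range(size)),
                         key=fitness)
    for candidate in contestants[:-1]:
        if rng.random() < p_best:
            return candidate
    return contestants[-1]


# Hypothetical population where each individual is represented by its EER.
rng = random.Random(0)
population = [0.5, 0.1, 0.3, 0.2]
selected = tournament_select(population, lambda eer: eer, rng)
```

Calling the function once per parent lets the same individual be selected several times, as described above.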

3. Results366

In this section, we present the results of the generated fusion programs on367

the three benchmark data sets.368

The results are compared to other functions from the state of the art: (a) the min rule, which returns the minimum score value, (b) the mul rule, which returns the product of all the scores, (c) the sum rule, which returns the sum of the scores, (d) the weight rule, which returns a weighted sum, and (e) an SVM implementation. The weights of the weighted sum have been configured by using a genetic algorithm on the training sets [50, 51] (in order to give the best possible results). The fitness function is the value of the EER, and the genetic algorithm engine must lower this value. Table 3 presents the configuration of the genetic algorithm.
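As an illustration of the baseline rules and of an EER-based fitness, the following sketch may help. It is not the authors' code: the function names are hypothetical, scores are assumed to be normalized so that a higher value means "more likely genuine", and the EER is approximated by scanning the observed scores as candidate thresholds.

```python
from math import prod


def fuse_min(scores):
    return min(scores)


def fuse_mul(scores):
    return prod(scores)


def fuse_sum(scores):
    return sum(scores)


def fuse_weight(scores, weights):
    return sum(w * s for w, s in zip(weights, scores))


def eer(genuine, impostor):
    """Approximate EER: error rate where FAR and FRR are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


def weight_fitness(weights, genuine_vectors, impostor_vectors):
    """Value a genetic algorithm would minimise: EER of the weighted sum."""
    genuine = [fuse_weight(v, weights) for v in genuine_vectors]
    impostor = [fuse_weight(v, weights) for v in impostor_vectors]
    return eer(genuine, impostor)
```

Each score vector holds one score per biometric system for a single comparison; `weight_fitness` is evaluated on the training set only.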

Table 3: Configuration of the genetic algorithm to set the weights of the weighted sum

Parameter                   Value
Population                  5000
Generations                 500
Chromosome signification    weights of the fusion functions
Chromosome values interval  [−10; 10]
Fitness                     EER on the generated function
Selection                   normalized geometric selection (probability of 0.9)
Elitism                     True

For the SVM, we have computed the best parameters (i.e., searched for the C and γ parameters giving the lowest error rate) using the learning database with a 5-fold cross-validation scheme. We have used the easy.py script provided with libSVM [52] for this purpose. We have then tested the performance on the validation set. We only obtain one operating point (and not a curve) when using an SVM. That is why we have used the HTER instead of the EER.
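Since an SVM outputs accept/reject decisions rather than scores, its single operating point can be summarized as below. This sketch assumes the usual definition of the HTER as the mean of the FAR and the FRR; the function name and the boolean encoding of decisions are hypothetical.

```python
def hter_from_decisions(genuine_accepted, impostor_accepted):
    """Return (HTER, FAR, FRR) from lists of booleans (True = accepted)."""
    # FRR: proportion of genuine attempts that were rejected.
    frr = genuine_accepted.count(False) / len(genuine_accepted)
    # FAR: proportion of impostor attempts that were accepted.
    far = impostor_accepted.count(True) / len(impostor_accepted)
    return (far + frr) / 2.0, far, frr
```

Under this definition, the SVM rates reported for BSSR1 in Table 4 (FAR = 1.16%, FRR = 0.39%) average to about 0.775%, in line with the 0.77% listed there.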

Table 4 presents the performance, for the three databases, of each biometric system, of the fusion mechanisms from the state of the art, and of our contribution. Concerning the state-of-the-art performance, we can see that the simple fusion functions sum and mul give better performance than the best biometric method of each database, but they are outperformed by the weight rule. The min operator gives quite bad results (it does not improve on the best biometric system). The SVM method gives good results but is outperformed by the weight method.

Table 5 presents the performance gain against the weight operator (which gives the best results in Table 4) in terms of EER and AUC.

This gain is computed as follows:

$$\mathrm{gain} = 100 \cdot \frac{\mathrm{EER}_{weight} - \mathrm{EER}_{gpfunc}}{\mathrm{EER}_{weight}} \qquad (2)$$

where EERweight and EERgpfunc are respectively the EER values of the weighted fusion and of the generated score fusion function (the same procedure is used for the AUC). Better values than the weighted sum are represented in bold. The EER gives a local performance for one operating point (system configured in order to obtain a FAR equal to the FRR), while the AUC gives a global performance of the whole system. These two pieces of information are both useful when comparing biometric systems. Figure 4 presents the ROC curves of the generated programs against the weighted sum. The performance of the initial biometric systems is not represented, because we have already seen that they are worse than the weighted sum (same remark for the other fusion functions). Logarithmic scales are used, because error rates are quite small.
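Equation (2) is a plain relative improvement in percent. A minimal sketch (the function name is hypothetical):

```python
def gain(eer_weight, eer_gpfunc):
    """Relative improvement (in %) of the generated function over the weighted sum."""
    return 100.0 * (eer_weight - eer_gpfunc) / eer_weight
```

Feeding it the weight and proposal values of Table 4 for BSSR1 (0.38 and 0.40) gives about −5.26, and for BANCA (0.91 and 0.75) about 17.58, which matches the EER column of Table 5 for these two databases. A negative value means that the generated function is worse than the weighted sum.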

We can see from Table 5 and Figure 4 that, most of the time, the functions automatically generated with genetic programming give slightly better results than the weighted sum. These improvements can be local and global and vary between 16% and 59% for the EER and 0.05% and 76% for the area under the curve. When there is no improvement, the results are equal or (in one case) slightly inferior. Even if there is some difference between the training (not represented in this paper) and validation sets, we do not observe an overfitting problem. The BSSR1 dataset presents the largest difference of performance between training and validation sets, but the results are still better than the ones from the state of the art (and the same problem can be observed with the weighted sum). Moreover, the fitness criterion has never been met: we did not manage to obtain fusion functions making no error. So, the evolution always


Table 4: Performance (HTER in %) of the initial methods (s^1_*, s^2_*, s^3_*, s^4_*, s^5_*), the state of the art fusion functions (sum, min, mul, weight) and our proposal on the three databases. Bold values represent better performance than the initial biometric systems, and * represents fusion results better than state of the art.

(a) BSSR1

Category           Method      HTER
Biometric systems  s^1_bssr1   4.30%
                   s^2_bssr1   6.19%
                   s^3_bssr1   8.41%
                   s^4_bssr1   4.54%
Fusion functions   sum         0.70%
                   min         5.04%
                   mul         0.70%
                   weight      0.38%
                   SVM         0.77% (FAR=1.16%, FRR=0.39%)
Proposal           gpI         0.40%

(b) PRIVATE

Category           Method        HTER
Biometric systems  s^1_private   8.92%
                   s^2_private   11.53%
                   s^3_private   15.69%
                   s^4_private   6.21%
                   s^5_private   31.43%
Fusion functions   sum           2.70%
                   min           13.72%
                   mul           2.67%
                   weight        2.26%
                   SVM           5.47% (FAR=10.87%, FRR=0.07%)
Proposal           gpA           1.57%*

(c) BANCA

Category           Method      HTER
Biometric systems  s^1_banca   4.38%
                   s^2_banca   11.54%
                   s^3_banca   8.97%
                   s^4_banca   7.32%
Fusion functions   sum         1.28%
                   min         4.38%
                   mul         1.28%
                   weight      0.91%
                   SVM         1.01% (FAR=1.71%, FRR=0.32%)
Proposal           gpΦ         0.75%*


Table 5: Performance gain between our proposal and the weighted sum (which gives the best results among the methods of the state of the art).

Database  EER     AUC
BSSR1     -5.26%  0.05%
PRIVATE   34.85%  23.85%
BANCA     17.58%  76.74%

ended when reaching the 50th generation.

Figure 5 represents the fitness evolution during all the generations of one genetic programming run on the BSSR1 database. A logarithmic scale has been used to give more importance to the low values and to track more easily the fitness evolution of the best individual of each generation. We can observe the same kind of results with the other databases. The fitness converges several generations before the end of the computation. The worst program of each generation is always very bad, which implies that the standard deviation of the fitness is also always quite large. This can be explained by the high mutation probability and the small number of good programs kept for the next generation. When running the experiment several times, we obtain the same convergence value. We can say that we reach the maximum performance of the system.

4. Discussion

The score fusion functions generated by the proposed approach give a slightly better performance than the fusion functions used in the state of the art in multibiometrics. We can argue that genetic programming is suited to automatically defining score fusion functions returning a score. The tradeoff of this performance gain is the need for training patterns, which are not necessary for sum, mul or min (but this requirement is already present for the weighted sum or the use of an SVM). However, this is not really a problem, because we already need training patterns to configure the decision threshold (if we do not want to do it empirically) or if we need to normalize the scores before doing the fusion.

Another problem inherent to genetic programming is the complexity of the generated programs. It is probable that some subtrees could be pruned or simplified without losing performance. Another avenue would be to add a regularization parameter to the fitness function (for example, the number of nodes or the depth of the tree). Generated programs would be more readable by a human and quicker to interpret. Figure 6 presents a simple generated tree (depending on the database, they can be more or less complex). Even if the program is quite short (compared to the other generated functions), it includes useless code (e.g., the subtree avg(a, a − 1/12) could be simplified to a − 1/24). Some generated trees include preprocessing steps by not using all the modalities in the terminal set.
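The regularization idea and the simplification example can be illustrated with a small sketch. Everything here is hypothetical: the tuple encoding of trees, the penalty weight `alpha`, and the choice of the node count as the penalty are assumptions, not the configuration used in the experiments.

```python
from fractions import Fraction

# Hypothetical tree encoding: (operator, child, child) tuples; leaves are
# terminal names (str) or constants.
OPS = {
    "avg": lambda x, y: (x + y) / 2,
    "-": lambda x, y: x - y,
}


def evaluate(tree, env):
    """Evaluate a tree given the terminal values in env."""
    if isinstance(tree, tuple):
        op, *children = tree
        return OPS[op](*(evaluate(child, env) for child in children))
    return env[tree] if isinstance(tree, str) else tree


def count_nodes(tree):
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])


def depth(tree):
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(depth(child) for child in tree[1:])


def regularized_fitness(error, tree, alpha=1e-4):
    """Error rate plus a size penalty; lower is better."""
    return error + alpha * count_nodes(tree)


verbose = ("avg", "a", ("-", "a", Fraction(1, 12)))  # avg(a, a - 1/12)
simple = ("-", "a", Fraction(1, 24))                 # a - 1/24
```

Both trees compute the same value for any `a`, but `verbose` has 5 nodes and depth 3 while `simple` has 3 nodes and depth 2, so at equal error rate a size-penalized fitness would prefer `simple`.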

Score fusion functions generated by genetic programming give performance equal to or slightly better than the weighted sum configured by a genetic algorithm. The computation time is higher than for the genetic algorithm, and one may think that the gain between the two methods is not very large; however, to obtain these results, genetic programming needed a population ten times smaller and ten times fewer generations.

5. Conclusion

We propose in this paper a new approach for multibiometrics based on the automatic generation of score fusion functions. We have seen interesting approaches in the state of the art and decided to improve on them by automatically generating score fusion programs with the help of genetic programming.

Our contribution concerns the design of multibiometric systems using a generic approach based on genetic programming (inspired by the state-of-the-art architectures). The proposed method returns a multibiometric score to be compared with a defined threshold. The proposed multibiometric system has been heavily tested on three different multibiometric databases. We obtained great improvements compared to classical fusion functions used in the state of the art. We hope to have opened a new path in the fusion of biometric systems thanks to genetic programming.

Results could surely be improved by using different parameters in the genetic programming engine (i.e., more individuals and generations, different range of constants, different functions, ...). It could be interesting to test other performance metrics, to examine whether results could be improved by adding quality measures of the capture, and to see whether genetic programming could produce template fusion programs.

6. Acknowledgment

The authors would like to thank: the author of pySTEP [48], the library used during the experiment, for his valuable help when we encountered problems with it; the authors of the various biometric databases used in this experiment; as well as the French Basse-Normandie region for its financial support.

References

[1] A. Kumar, Y. Zhou, Human Identification Using KnuckleCodes, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[2] M. Hashiyada, Development of biometric DNA ink for authentication security, Tohoku J. Exp. Med. 204 (2004) 109–117.

[3] Z. Korotkaya, Biometric person authentication: Odor, Tech. rep., Department of Information Technology, Laboratory of Applied Mathematics, Lappeenranta University of Technology (2003).

[4] A. Riera, A. Soria-Frisch, M. Caparrini, C. Grau, G. Ruffini, Unobtrusive biometric system based on electroencephalogram analysis, EURASIP Journal on Advances in Signal Processing 2008 (2008) 8.

[5] R. Gaines, W. Lisowski, S. Press, N. Shapiro, Authentication by keystroke timing: some preliminary results, Tech. rep., Rand Corporation (1980).

[6] J. Fierrez, J. Ortega-Garcia, On-line signature, Springer US, 2008, pp. 189–209.


[7] A. Weiss, A. Ramapanicker, S. Pranav, S. Noble, L. Immohr, Mouse movements biometric identification: A feasibility study, in: Proceedings of Student/Faculty Research Day, CSIS, Pace University, 2007.

[8] D. Petrovska-Delacretaz, A. El Hannani, G. Chollet, Text-independent speaker verification: State of the art and challenges, Lecture Notes in Computer Science 4391 (2007) 135.

[9] C. Nandini, C. Kumar, Comprehensive framework to gait recognition, International Journal of Biometrics 1 (1) (2008) 129–137.

[10] K. Benli, R. Duzagac, M. Eskil, Driver recognition using gaussian mixture models and decision fusion techniques, in: ISICA 2008, 2008.

[11] M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 591, 1991.

[12] D. Maltoni, A. Jain, S. Prabhakar, Handbook of fingerprint recognition, Springer, 2009.

[13] A. Kumar, D. Zhang, Personal recognition using hand shape and texture, IEEE Transactions on Image Processing 15 (8) (2006) 2454.

[14] Z. Xu, X. Guo, X. Hu, X. Cheng, The blood vessel recognition of ocular fundus, in: Proceedings of the 4th International Conference on Machine Learning and Cybernetics (ICMLC'05), 2005, pp. 4493–4498.

[15] A. Ross, K. Nandakumar, A. Jain, Handbook of multibiometrics, Springer, 2006.

[16] M. Theofanos, B. Stanton, C. A. Wolfson, Usability & Biometrics: Ensuring Successful Biometric Systems, National Institute of Standards and Technology (NIST), 2008.

[17] ISO, Biometric performance testing and reporting, Tech. rep., ISO/IEC 19795-1:2006(E) (2006).


[18] J. Bhatnagar, A. Kumar, On estimating performance indices for biometric identification, Pattern Recognition 42 (2009) 1803–1815.

[19] R. Raghavendra, B. Dorizzi, A. Rao, G. Hemantha Kumar, PSO versus AdaBoost for feature selection in multimodal biometrics, in: IEEE 3rd International Conference on Biometrics: Theory, Applications and Systems, BTAS 2009, 2009.

[20] A. Rattani, M. Tistarelli, Robust multi-modal and multi-unit feature level fusion of face and iris biometrics, in: International Conference on Biometrics (ICB2009), 2009.

[21] A. Ross, A. Jain, Multimodal biometrics: An overview, in: Proceedings of 12th European Signal Processing Conference, Citeseer, 2004, pp. 1221–1224.

[22] Y. Zuev, S. Ivanov, The voting as a way to increase the decision reliability, Journal of the Franklin Institute 336 (2) (1999) 361–378.

[23] P. Buyssens, M. Revenu, O. Lepetit, Fusion of IR and visible light modalities for face recognition, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[24] J. Montalvao Filho, E. Freire, Multimodal biometric fusion - joint typist (keystroke) and speaker verification, in: Telecommunications Symposium, 2006 International, 2006, pp. 609–614.

[25] S. Hocquet, Authentification biométrique adaptative application à la dynamique de frappe et à la signature manuscrite, Ph.D. thesis, Université de Tours (2007).

[26] L. Allano, La biométrie multimodale : stratégies de fusion de scores et mesures de dépendance appliquées aux bases de personnes virtuelles, Ph.D. thesis, Institut National des Télécommunications (2009).


[27] P. S. Teh, A. B. J. Teoh, C. Tee, T. S. Ong, A multiple layer fusion approach on keystroke dynamics, Pattern Analysis & Applications (2009) 14.

[28] K. Nandakumar, Y. Chen, S. Dass, A. Jain, Likelihood ratio-based biometric score fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2) (2008) 342.

[29] J. Czyz, M. Sadeghi, J. Kittler, L. Vandendorpe, Decision fusion for face authentication 7.

[30] S. Garcia-Salicetti, M. Mellakh, L. Allano, B. Dorizzi, Multimodal biometric score fusion: the mean rule vs. support vector classifiers, in: Proc. EUSIPCO, 2005.

[31] J. Koza, J. Rice, Genetic programming, Springer, 1992.

[32] A. Ross, N. Poh, Handbook of Remote Biometrics, Springer, Ch. Multibiometric Systems: Overview, Case Studies, and Open Issues.

[33] NIST, NIST biometric score set (2006).
URL http://www.itl.nist.gov/iad/894.03/biometricscores/

[34] National Institute of Standards and Technology, NIST biometric score set (2006).
URL http://www.itl.nist.gov/iad/894.03/biometricscores/

[35] E. Bailly-Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée, et al., The BANCA database and evaluation protocol, Lecture Notes in Computer Science (2003) 625–638.

[36] N. Poh, BANCA score database.
URL http://info.ee.surrey.ac.uk/Personal/Norman.Poh/web/banca_multi/main.php?bodyfile=entry_page.html

[37] N. Sedgwick, C. Limited, Preliminary Report on Development and Evaluation of Multi-Biometric Fusion using the NIST BSSR1 517-Subject Dataset, Cambridge Algorithmica Limited.


[38] A. Martinez, R. Benavente, The AR face database, Tech. rep., CVC Technical report (1998).

[39] R. Giot, M. El-Abed, C. Rosenberger, GREYC keystroke: a benchmark for keystroke dynamics biometric systems, in: IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), 2009.

[40] D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[41] C. Rosenberger, L. Brun, Similarity-based matching for face authentication, in: Proceedings of the International Conference on Pattern Recognition (ICPR'2008), Tampa, Florida, USA, 2008.

[42] R. Giot, M. El-Abed, C. Rosenberger, Keystroke dynamics with low constraints SVM based passphrase enrollment, in: IEEE Third International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2009.

[43] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust statistics: the approach based on influence functions, John Wiley & Sons New York, 1986.

[44] A. Jain, K. Nandakumar, A. Ross, Score normalization in multimodal biometric systems, Pattern Recognition 38 (12) (2005) 2270–2285.
URL http://www.sciencedirect.com/science/article/B6V14-4G0DDW4-1/2/d922960ee7ed8928744113dd9494d37a

[45] M. Mitchell, An introduction to genetic algorithms, The MIT Press, 1998.

[46] R. Poli, W. Langdon, N. McPhee, A field guide to genetic programming, Lulu Enterprises UK Ltd, 2008, freely available at http://www.gp-field-guide.org.uk.

[47] P. Day, A. K. Nandi, Robust text-independent speaker verification using genetic programming, IEEE Transactions on Audio, Speech, and Language Processing 15 (2007) 285–295.


[48] M. Khoury, Python strongly typed genetic programming.
URL http://pystep.sourceforge.net

[49] T. Oliphant, Guide to NumPy, Spanish Fork, UT, Trelgol Publishing.

[50] R. Giot, M. El-Abed, C. Rosenberger, Fast learning for multibiometrics systems using genetic algorithms, in: The International Conference on High Performance Computing & Simulation (HPCS 2010), IEEE Computer Society, Caen, France, 2010, p. 8.

[51] R. Giot, B. Hemery, C. Rosenberger, Low cost and usable multimodal biometric system based on keystroke dynamics and 2D face recognition, in: IAPR International Conference on Pattern Recognition (ICPR), IAPR, Istanbul, Turkey, 2010.

[52] C. Chang, C. Lin, LIBSVM: a library for support vector machines (2001).


Figure 4: ROC curves of the fusion systems from the state of the art and with genetic programming: (a) validation with BSSR1, (b) validation with PRIVATE, (c) validation with BANCA. The EER of each fusion function is presented in the legend. Note the use of a logarithmic scale.


[Figure 5 plot. x-axis: Generation (#), from 0 to 50. y-axis: Fitness score Min/Avg/Max, logarithmic scale from 10^0 to 10^2. Curves: Min, Max, Mean, Std.]

Figure 5: Fitness evolution of one run of the genetic programming evolution. The max, min, mean and std values of the fitness are represented. We want to minimize the fitness value, so lower is better.

Figure 6: Sample of a "simple" generated program. We can observe the complexity of the generated fusion function.