A neural-AdaBoost based facial expression recognition system

Expert Systems with Applications 41 (2014) 3383–3390

Ebenezer Owusu, Yongzhao Zhan, Qi Rong Mao
School of Computer Science and Communication Engineering, Jiangsu University, 301 Xuefu Road, 212301 Zhenjiang, Jiangsu, China

Keywords: Facial expression recognition; Bessel transform; Gabor feature; AdaBoost; MFFNN

Abstract

This study improves the recognition accuracy and execution time of a facial expression recognition system. Several techniques are combined to achieve this. The face detection component is implemented with the Viola–Jones descriptor. The detected face is down-sampled by a Bessel transform to reduce the feature-extraction space and improve processing time. Gabor feature extraction techniques are then employed to extract thousands of facial features that represent various facial deformation patterns. An AdaBoost-based hypothesis is formulated to select a few hundred of the extracted features in order to speed up classification. The selected features are fed into a carefully designed 3-layer neural network classifier trained by a back-propagation algorithm. The system is trained and tested with datasets from the JAFFE and Yale facial expression databases. Average recognition rates of 96.83% and 92.22% are obtained on the JAFFE and Yale databases, respectively. The execution time for a 100 × 100 pixel image is 14.5 ms. Overall, the results of the proposed techniques compare very favorably with existing methods.

1. Introduction

Facial expression is the visible transformation of the human face caused by automatic responses to emotional states. In most situations it is spontaneous and uncontrollable. Automatic facial expression recognition involves the application of an artificial intelligence system to recognize facial expressions under any circumstance. Today, the study of facial expressions has gained keen interest in pattern recognition, computer vision, and related fields. The expressions of interest are mainly the seven prototypical ones, namely anger, fear, surprise, sadness, disgust, happiness, and neutral.

Research into automatic facial expression recognition is very important in today's technological society. The technology is applied in a wide variety of contexts, including robotics, digital signage, mobile applications, and medicine. It is reported that "some robots can operate by first recognizing expressions" of humans (Bruce, 1993). The AIBO robot, for instance, is a biologically inspired robot that can show its emotions via an array of LEDs located in the frontal part of the head (Breazeal & Scassellati, 2002). In addition, the robot can also display a 'happiness' feeling when it detects a face.
In behavioral sciences and medicine, for instance, expression recognition is applied effectively to intensive care monitoring (Morik, Brockhausen, & Joachims, 1999). Systems are currently being developed that can make routine examinations of facial behavior during pain in clinical settings. For infants, the Neonatal Facial Coding System (NFCS) has been employed for real-time assessment of infants at 32 to 33 weeks post-conceptional age who are undergoing a heel lance. The technology is also being used in more advanced settings to reduce accidents through automated detection of driver drowsiness in public transport; such a system relays information about the driver's emotional state to observers for effective surveillance and timely awareness.

The hallmark of every facial expression system is accuracy and, to some extent, execution speed. However, most existing systems perform poorly in terms of accuracy, and as for execution speed, most systems do not even report it. A few examples follow. Franco and Treves (2001) proposed a neural-based facial expression recognition system that used principal component analysis (PCA) to reduce the feature vectors. The features were fed into a feed-forward neural network trained by back-propagation. An average recognition rate of 84.5% was reported on the Yale facial expression database, an achievement that is not very encouraging. Kumbhar, Jadhav, and Patil (2012) described a neural network facial expression recognition system that employs Gabor feature extraction and feature reduction by PCA to distinguish 7-class facial expressions on the JAFFE database. Their feed-forward network used 20 inputs, 40 to 60 hidden-layer neurons, and seven outputs. Again, the 60–70% recognition accuracy they obtained is not encouraging enough to meet the expectations of a real-time system. Recently, Londhe and Pawar (2012) extracted features of the face using Affine Moment Invariants and performed the classification with a feed-forward neural network; the recognition rate obtained was 93.8% on the JAFFE database. Tai and Chung (2007) extracted the facial features using a Sobel filter. In their experiment they retained the maximum connected component to reduce wrinkles and noise, and conducted 7-class classification on the JAFFE facial expression database using an Elman network with two hidden layers, each containing fifteen neurons. With this approach the average accuracy of automatic facial expression recognition is 84.7%. Zhang and Tjondronegoro (2009) extracted the expressive face using Gabor filters, performed feature reduction by PCA, and classified expressions with a neural network.
In this method an average recognition rate of 93.4% was recorded on the JAFFE facial expression database. Dailey and Cottrell (1999) also extracted facial features with Gabor techniques and reduced the features by PCA. The expression classifier was a neural network and the average expression recognition was 94.5% ± 0.7 on the seven prototypical facial expressions; however, the facial expression database used was not mentioned.

Most of these studies advocate the use of a neural network as the expression classifier, extract the facial features with Gabor filters, and reduce the features via PCA. Unfortunately, none of the reported results is very encouraging. This study continues to explore the potential of neural networks for this kind of task, respecting some biological constraints and exploiting the capabilities of modular systems.

Though many techniques have been used to extract facial features, Gabor feature extraction remains a high-quality choice; the alternatives are not very promising. A few examples: Satiyan and Nagarajan (2010) utilized the Haar technique to extract facial features, which were used as input to a neural network for classifying 8 facial expressions. Haar wavelet extraction is very fast (Satiyan & Nagarajan, 2010; Van, 2008); however, the wavelets are too large to yield effective classification when used as input to facial expression classifiers (Cemre, 2008), in other words, they are a potential cause of misclassification and poor performance. Distance-based feature extraction methods are also among the most widely applied techniques for both 2D and 3D static faces. The idea behind these procedures is that the muscle deformations that cause changes in facial expression from the normal expression result in variations of the Euclidean distances between facial landmarks or points. These points, as well as their distances, have been widely employed for static facial expression analysis (Sha, Song, Bu, Chen, & Tao, 2011; Soyel & Demirel, 2007; Tang & Huang, 2008). Among the most successful is feature extraction based on the Bhattacharyya distance (Choi & Lee, 2003). However, despite some advantages of this method, its computational complexity is unacceptably high: matching even a small model shape with a normal image can take half an hour on an eight-processor Sun SPARCServer 1000 (Rucklidge, 1997; Zhang & Lu, 2004). Patch-based feature extraction is another alternative widely exploited for facial expression biometrics. Maalej, Amor, Daoudi, Srivastava, and Berretti (2010), for instance, represented patches extracted from facial surfaces by sets of closed curves and then applied a Riemannian framework to obtain the shape analysis of the extracted patches. However, patch-based features also have several drawbacks. First, particular representations cannot be applied to other problems without major modifications: the majority of these techniques have only been applied to a single class. Also, most methods do not exploit the large amounts of available training data (Aghajanian et al., 2009). On this basis we still consider Gabor features the best approach, not because they have no drawbacks, but because the drawbacks can be managed easily.
The Gabor filter is a good model of simple-cell receptive fields in cat striate cortex (Jones & Palmer, 1987), and it provides an excellent basis for object recognition and face recognition (Lades et al., 1993; Wiskott, Fellous, Kruger, & von der Malsburg, 1997). Moreover, Gabor methods are superior to the methods mentioned above because they extract the maximum information from local image regions (Deng, Jin, Zhen, & Huang, 2005) and are invariant to translation and rotation (Al Daoud, 2009).

In this work, the data were first reduced in dimension by a Bessel transform (Ganga, Prakash, & Gangashetty, 2011); after extraction of facial features by Gabor methods, the features were further reduced via an AdaBoost-based (Freund & Schapire, 1995) feature reduction technique. The selected features, which represent the facial deformation patterns, were then fed into a 3-layer feed-forward neural network trained by a back-propagation algorithm. It is interesting to note that Bessel down-sampling has not previously been adopted for facial expression recognition. The combination of Bessel down-sampling and the formulated AdaBoost-based algorithm is an innovation that reduces the expression dataset to enhance accuracy and speed. Finally, the construction of the feed-forward neural network is influential in bringing about successful results.

The rest of the work is arranged as follows. Section 2 discusses face detection and image down-sampling. Section 3 discusses Gabor feature extraction. Section 4 discusses feature selection. Section 5 discusses the multilayer feed-forward neural network (MFFNN). Results and analysis are presented in Section 6. The final conclusions of the work are drawn in Section 7.

2. Face detection and down-sampling

The face detection component was implemented by adopting the Viola and Jones system (Viola & Jones, 2004). Fig. 1 shows sample face detection by the Viola–Jones classifier.

Fig. 1. Sample face detection images (left), cropped detected face (right).

The detected image is rescaled to a window of 20 × 20 pixels by Bessel down-sampling. Methods such as bilinear interpolation have been used by several authors for this task, but interpolation is prone to aliasing problems (Munoz, Blu, & Unser, 2001). Bessel down-sampling reduces the size of the image while preserving the details and perceptual quality of the original image (Ganga et al., 2011). The down-sampled image signal x_d(t_1, t_2) is expressed as:

$$x_d(t_1, t_2) = \sum_{n_1=1}^{p} \sum_{n_2=1}^{q} c(n_1, n_2)\, J_0\!\left(\frac{\alpha_{n_1}}{p-r}\, t_1\right) J_0\!\left(\frac{\alpha_{n_2}}{q-s}\, t_2\right) \tag{1}$$

where p and q refer to the original image size, p − r and q − s are the required reduced size of the image, r and s are positive integers that represent the reduction values, n is the number of low-frequency DCT coefficients, J_0(α_{n_1}) and J_0(α_{n_2}) are zero-order Bessel functions, c(n_1, n_2) are Bessel coefficients computed from the first-order Bessel function, and t_1 and t_2 are chosen such that 0 ≤ t_1 ≤ p − r and 0 ≤ t_2 ≤ q − s. Interested readers are referred to Al Daoud (2009).
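As a concrete illustration of this stage, the sketch below detects and crops faces with OpenCV's Haar-cascade detector (the standard implementation of the Viola–Jones framework) and reduces each crop to the 20 × 20 working window. This is only a minimal sketch under stated assumptions: the cascade file name and image path are placeholders, and an ordinary area-based resize stands in for the Bessel down-sampling of Eq. (1).

# Minimal sketch of the detection/down-sampling stage, assuming OpenCV is installed.
# The Haar cascade is OpenCV's implementation of the Viola-Jones framework;
# cv2.resize (area interpolation) stands in here for the Bessel down-sampling of Eq. (1).
import cv2

def detect_and_downsample(image_path, cascade_path="haarcascade_frontalface_default.xml",
                          out_size=(20, 20)):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.CascadeClassifier(cascade_path)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:
        face = gray[y:y + h, x:x + w]            # crop the detected face region
        crops.append(cv2.resize(face, out_size,  # reduce to the 20 x 20 working window
                                interpolation=cv2.INTER_AREA))
    return crops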
3. Gabor feature extraction

The 2-D Gabor filters are spatial sinusoids localized by a Gaussian window, and they can be constructed to be selective for orientation, localization, and frequency. Representing images by Gabor wavelets is very flexible because the details of their spatial relations are preserved in the process.

$$G(x, y; \theta, u, \sigma) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \exp\{2\pi i (R_1 + R_2)\} \tag{2}$$

where i is the imaginary unit (the square root of −1), R_1 = ux cos θ and R_2 = uy sin θ, u is the spatial frequency of the band pass, θ is the spatial orientation of the function G, (x, y) specify the position of a light impulse in the visual field, and σ is the standard deviation of the 2-D Gaussian envelope. In this Gabor family, we chose eight orientations {0, π/8, 2π/8, ..., 7π/8} and five scales {4, 4√2, 8, 8√2, 16}. To give added robustness to illumination, we turned the Gabor filter to zero DC (direct current) by the expression

$$\tilde{G}(x, y; \theta, u, \sigma) = G(x, y; \theta, u, \sigma) - \frac{1}{q} \sum_{i=-n}^{n} \sum_{j=-n}^{n} G(x, y; \theta, u, \sigma) \tag{3}$$

where q is the size of the filter, given by q = (2n + 1)^2. Fig. 2 shows the Gabor filter image.

Fig. 2. Gabor filtered image: real (left) and imaginary (middle) parts of the Gabor filter in 3D; the real part of the Gabor kernels (2D) in the spatial and frequency domain.

The sample points of the filtered image are coded into two bits, a real bit x_1 and an imaginary bit x_2, such that

$$G_1 = \begin{cases} x_1 = 1, & \text{if } \operatorname{Re}\!\big[\tilde{G}(x, y; \theta, u, \sigma)\big] \ast I \geq 0 \\ x_1 = 0, & \text{if } \operatorname{Re}\!\big[\tilde{G}(x, y; \theta, u, \sigma)\big] \ast I < 0 \end{cases} \tag{4}$$

$$G_2 = \begin{cases} x_2 = 1, & \text{if } \operatorname{Im}\!\big[\tilde{G}(x, y; \theta, u, \sigma)\big] \ast I \geq 0 \\ x_2 = 0, & \text{if } \operatorname{Im}\!\big[\tilde{G}(x, y; \theta, u, \sigma)\big] \ast I < 0 \end{cases} \tag{5}$$

where I is a subimage of the expressional face, Re and Im are the real and imaginary parts of each Gabor kernel, and ∗ is the convolution operator. With this coding, only the phase information of the facial expression image is stored in the feature vector of size 256 bytes. The final magnitude response, which is used to represent the feature vectors, is computed by

$$G = \sqrt{G_1^2 + G_2^2} \tag{6}$$

Fig. 3 shows the magnitude response of a template image.

Fig. 3. Gabor magnitude response of a face image: a sample image (left); the magnitude response of the whole Gabor filter bank of 40 Gabor filters (right).
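To make the filtering stage concrete, the following numpy sketch builds a bank of 40 complex Gabor kernels (8 orientations × 5 scales), removes the DC component as in Eq. (3), convolves them with a face patch, and collects magnitude responses in the spirit of Eq. (6). The kernel size n and the mapping of each scale to a frequency u and envelope width σ are illustrative assumptions, since the text does not state them explicitly.

# Sketch of a 40-filter Gabor bank per Eqs. (2)-(6); scale-to-(u, sigma) mapping is assumed.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, theta, sigma, n=9):
    y, x = np.mgrid[-n:n + 1, -n:n + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    carrier = np.exp(2j * np.pi * u * (x * np.cos(theta) + y * np.sin(theta)))
    g = envelope * carrier
    return g - g.mean()                                   # remove the DC component, Eq. (3)

def gabor_magnitude(face):
    thetas = [k * np.pi / 8 for k in range(8)]            # 8 orientations
    scales = [4, 4 * np.sqrt(2), 8, 8 * np.sqrt(2), 16]   # 5 scales
    responses = []
    for s in scales:
        for theta in thetas:
            g = gabor_kernel(u=1.0 / s, theta=theta, sigma=s / 2)
            r = fftconvolve(face, g, mode="same")
            responses.append(np.sqrt(r.real**2 + r.imag**2))  # magnitude, Eq. (6)
    return np.stack(responses)                            # 40 x H x W feature maps

features = gabor_magnitude(np.random.rand(20, 20)).ravel()    # flattened Gabor features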
4. Feature selection

Owing to the large number of Gabor wavelets, it is not practical to use all of them as input to the classifier, as this risks misclassification and a possible system crash. AdaBoost-based feature reduction has a particular speed advantage in the classification process (Shen & Bai, 2004). We therefore formulated an AdaBoost-based algorithm to select a small, informative portion of the wavelets.

Assume the extracted Gabor features are represented by a total of i ∈ (1, 2, ..., N) appearance features. The image I is then represented as Ψ_i = {(x_n, y_n)}_{n=1}^{N}, configured by the parameters z, l, v. The positive set φ^{(+)} and the negative set φ^{(−)} are denoted by φ^{(+)} = {(x_n, y_n)}_{n=1}^{N} ⊂ R^J × {+1} and φ^{(−)} = {(x_n, y_n)}_{n=1}^{N} ⊂ R^J × {−1}, respectively, where x_n is the nth data sample containing J features and y_n is its corresponding class label. To train the feature vectors G, denoted by φ_{(l,v,z)}, over a distribution D, we simply determine the weights of all the feature vectors Ψ_i = {(x_n, y_n)}_{n=1}^{N} = φ^{(+)} + φ^{(−)}. This gives a threshold λ that indicates the decision hyperplane; λ is computed as:

$$\lambda = \frac{\sum_{\forall i \in \phi^{(+)}} D(i)\, \phi_{(l,v,z)}}{\left\| \sum_{\forall i \in \phi^{(+)}} D(i)\, \phi_{(l,v,z)} \right\|} + \frac{\sum_{\forall i \in \phi^{(-)}} D(i)\, \phi_{(l,v,z)}}{\left\| \sum_{\forall i \in \phi^{(-)}} D(i)\, \phi_{(l,v,z)} \right\|} \tag{7}$$

A sample is positive (a client) if it is located in the positive half-space of λ (the majority decision); otherwise it is negative (an imposter). The status is reversed if the minority of the positive instances is instead located in the positive half-space. Let c denote the clients and p the imposters. For a given training dataset containing both positive and negative samples, where each sample is (S_i, y_i) and y ∈ {±1} is the corresponding class label, the feature selection algorithm is formulated as follows:

• Initialize the sample distribution D_0 by weighting every training sample equally, such that the initial weight is w_{1,i} = 1/2c or 1/2p for y = 1 and −1, respectively.
• For iterations t = 1, 2, ..., T, where T is the final iteration, do:
  (i) Normalize the weights, w_{t,i} ← w_{t,i} / Σ_{i=1}^{N} w_{t,i}, so that w_t is a probability distribution; N is the total number of features.
  (ii) Train a weak classifier h_t for feature j, which uses a single feature. The training error ε_t is estimated with respect to w_t such that

  $$\epsilon_t = \sum_{i} w_{t,i}\, \frac{|h_t(x_i) - y_i|}{2} \tag{8}$$

  (iii) Select the hypothesis h_t^1 with the most discriminating information, that is, the hypothesis with the least classification error ε_t^1 on the weighted samples.
  (iv) Compute the weight ω_t that weights h_t^1 by its classification performance:

  $$\omega_t = \frac{1}{2} \ln\!\left[\frac{1}{\epsilon_t^1} - 1\right] \tag{9}$$

  (v) Update and normalize the weight distribution:

  $$w_{t+1,i} \leftarrow w_{t,i}\, e^{-\omega_t\, y_i\, h_t^1(S_t^1)} \tag{10}$$

• The final feature selection hypothesis H(S), which is a function of the selected features, is given by:

$$H(S) = \operatorname{sgn}\!\left[\sum_{t=1}^{T} \omega_t\, h_t^1(S_t^1)\right] \tag{11}$$

The selected features represent samples of the facial deformation patterns of the expressive face. The datasets, which were images from the JAFFE and Yale databases, were partitioned into training and testing sets by leave-one-out cross-validation (Wu, Brubaker, Mullin, & Rehg, 2008).
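A compact sketch of this selection procedure is given below, assuming X is an (N_samples × N_features) matrix of Gabor responses and y holds labels in {−1, +1}. It follows the weight-update logic of Eqs. (8)–(10) with two simplifications relative to the text: the initial distribution is uniform rather than split by class counts, and each single-feature weak classifier is a threshold stump placed at the feature's median rather than an exhaustively searched threshold.

# Sketch of AdaBoost-style single-feature selection in the spirit of Eqs. (8)-(11).
import numpy as np

def stump_predict(x, thresh, polarity):
    return polarity * np.sign(x - thresh + 1e-12)

def adaboost_select(X, y, T=200):
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # initial (uniform) sample distribution
    selected = []
    for _ in range(T):
        best = None
        for j in range(d):                        # one weak classifier per single feature
            thresh = np.median(X[:, j])
            for polarity in (1, -1):
                pred = stump_predict(X[:, j], thresh, polarity)
                err = np.sum(w * (pred != y))     # weighted classification error, Eq. (8)
                if best is None or err < best[0]:
                    best = (err, j, thresh, polarity)
        err, j, thresh, polarity = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log(1.0 / err - 1.0)     # classifier weight, Eq. (9)
        pred = stump_predict(X[:, j], thresh, polarity)
        w *= np.exp(-alpha * y * pred)            # reweight the samples, Eq. (10)
        w /= w.sum()                              # renormalize the distribution
        selected.append(j)
    return sorted(set(selected))                  # indices of the retained Gabor features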
5. Multilayer feed-forward neural network (MFFNN) classifier

The selected features are fed into the constructed neural network to train it to identify the seven universal facial expressions. The architecture is a 3-layer feed-forward neural network trained by a back-propagation algorithm (Bouzalmat, Belghini, Zarghili, Kharroubi, & Majda, 2011; Londhe & Pawar, 2012). The back-propagation algorithm essentially replicates its input at its output via a narrow conduit of hidden units, and the hidden units extract regularities from the inputs because they are fully connected to them. Every network was trained to give the maximum value of 1 for the correct facial expression and 0 for all other expressions. The construction is shown in Fig. 4.

Fig. 4. A 3-layer feed-forward neural network.

The input layer has 7 nodes, one for each facial expression, while the hidden layer has 49 neurons, 7 per expression; we chose 7 neurons per expression to match the target output of seven facial expressions. This configuration was used for the seven prototypical facial expressions and validated on the JAFFE facial expression database. Since the experiment was also validated on the Yale database, where four expressions were used, the network was slightly modified for that application: the hidden layer was set to 16 neurons, 4 per facial expression, with 4 neurons in the output layer.

The input vectors of the network are represented by X = [x_1, x_2, ..., x_l]^T and the output layers by Y = [y_1, y_2, ..., y_k]^T. The optimization model is formulated as a mapping h : X → Y. The output of each layer of the network is denoted by y_1^j, ..., y_{k−1}^j, y_k^j, j = 1, 2, ..., k − 1, k, where k − 1 corresponds to the hidden layers and k to the output layer. We denote the target dataset and its additive white noise by (t_1, t_2, ..., t_K) and η = (e_1, e_2, ..., e_K), respectively, where K is the total number of patterns of the network. The corresponding vectors of the hidden units are denoted by V = (v_1, v_2, ..., v_{k−1}). The sigmoid activation functions of the layers are h_1, h_2, ..., h_k. The weights of the network are updated as w_1, w_2, ..., w_k. The number of training epochs is 1000 and the error target is 0.001. The training algorithm is modeled as:

$$\min_{h_1, h_2, v_1, w_1, w_2} \sum_{j=1}^{K} \left(t_j - y_2^j\right)^2 \tag{12}$$

subject to the constraints

$$\left.\begin{aligned} y_1^j &= h_1(w_1 x^j), & w_1 \in \mathbb{R}^{v_1 \times M},\; y_1^j \in \mathbb{R}^{v_1},\; x^j \in \mathbb{R}^{M} \\ y_2^j &= h_2(w_2\, y_1^j), & w_2 \in \mathbb{R}^{1 \times v_1},\; y_2^j \in \mathbb{R}^{1} \end{aligned}\right\} \tag{13}$$

The training process involves weight initialization, calculation of the activation units, weight adjustment and adaptation, and testing for convergence of the network. Let v_{ji} represent the weight between the jth hidden unit and the ith input unit, and w_{kj} the weight between the kth output unit and the jth hidden unit. The activations are calculated sequentially, starting from the input layer. The activations of the hidden and output units are calculated as:

$$y_j^{(p)} = h_{y_j}^{(p)}\!\left(\sum_{i=1}^{I} v_{ji} z_i - v_{j0}\right) \tag{14}$$

$$o_k^{(p)} = h_{o_k}^{(p)}\!\left(\sum_{j=1}^{J} w_{kj} y_j - w_{k0}\right) \tag{15}$$

where y_j^{(p)} is the activation of the jth hidden unit and o_k^{(p)} is the activation of the kth output unit for pattern p, h is a sigmoid function, k is the total number of output units, I is the total number of input units, and J is the total number of hidden units. v_{j0} is the weight connected to the bias unit in the hidden layer, with z_0 = −1 and y_0 = −1.

We adjusted the weights starting at the output units and recursively propagated error signals back to the input layer. The computed output o_k^{(p)} is compared with the corresponding target value t_k^{(p)} for each facial image over the entire training set, using the sigmoid function to express the approximation error of the network's target functions:

$$E^{(p)} = \frac{1}{2} \sum_{k=1}^{K} \left(t_k^{(p)} - o_k^{(p)}\right)^2 \tag{16}$$

Minimizing the error E^{(p)} requires the partial derivative of E^{(p)} with respect to each weight in the network; the change in each weight is proportional to the corresponding derivative:

$$\Delta v_{ji}(t+1) = -\eta\, \frac{\partial E^{(p)}}{\partial v_{ji}} + \alpha\, \Delta v_{ji}(t) \tag{17}$$

$$\Delta w_{kj}(t+1) = -\eta\, \frac{\partial E^{(p)}}{\partial w_{kj}} + \alpha\, \Delta w_{kj}(t) \tag{18}$$

where η is the learning rate, normally between 0 and 1; we set it to 0.9. The momentum coefficient α is also set to 0.9; the last term is a function of the previous weight change.

$$\frac{\partial E}{\partial v_{ji}} = \frac{\partial y_j}{\partial v_{ji}} \sum_{k=1}^{K} -(t_k - o_k)\, o_k (1 - o_k)\, w_{kj} \tag{19}$$

with ∂y_j/∂v_{ji} = y_j (1 − y_j) z_i. Therefore,

$$\Delta v_{ji} = \eta \sum_{k=1}^{K} (t_k - o_k)\, o_k (1 - o_k)\, w_{kj}\; y_j (1 - y_j)\, z_i \tag{20}$$

The weights are updated by

$$w_{kj}(t+1) = w_{kj}(t) + \Delta w_{kj}(t+1) \tag{21}$$

$$v_{ji}(t+1) = v_{ji}(t) + \Delta v_{ji}(t+1) \tag{22}$$

where t is the current time step and Δv_{ji} and Δw_{kj} are the weight adjustments. The process is repeated from Eq. (14) in order to achieve the desired output.
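For readers who want to trace these update rules in code, here is a compact numpy sketch of the training loop in Eqs. (14)–(22), under a few simplifying assumptions: the explicit bias inputs (z_0 = −1) are folded into ordinary bias vectors, the updates are applied in batch form, and the layer sizes follow the JAFFE configuration described above (49 hidden units, 7 outputs, one-hot targets).

# Sketch of the back-propagation loop of Eqs. (14)-(22): sigmoid units, learning
# rate 0.9, momentum 0.9, 1000 epochs or a mean-squared-error target of 0.001.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mffnn(X, T, n_hidden=49, lr=0.9, momentum=0.9, epochs=1000, err_target=0.001):
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    V = rng.normal(scale=0.1, size=(n_in, n_hidden))     # input-to-hidden weights
    W = rng.normal(scale=0.1, size=(n_hidden, n_out))    # hidden-to-output weights
    bV, bW = np.zeros(n_hidden), np.zeros(n_out)
    dV = dW = 0.0
    for _ in range(epochs):
        Y = sigmoid(X @ V + bV)                          # hidden activations, Eq. (14)
        O = sigmoid(Y @ W + bW)                          # output activations, Eq. (15)
        E = 0.5 * np.mean(np.sum((T - O) ** 2, axis=1))  # approximation error, Eq. (16)
        if E < err_target:
            break
        delta_o = (T - O) * O * (1 - O)                  # output error signal
        delta_h = (delta_o @ W.T) * Y * (1 - Y)          # back-propagated hidden error
        dW = lr * Y.T @ delta_o / len(X) + momentum * dW     # weight changes, Eqs. (17)-(20)
        dV = lr * X.T @ delta_h / len(X) + momentum * dV
        W, V = W + dW, V + dV                            # weight updates, Eqs. (21)-(22)
        bW += lr * delta_o.mean(axis=0)
        bV += lr * delta_h.mean(axis=0)
    return V, bV, W, bW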
6. Results and analysis

The facial expression recognition system was validated with the JAFFE and Yale facial expression databases.

The JAFFE database contains 213 images of 10 female Japanese subjects. Each respondent in the database posed three or four examples of each of the seven facial expression prototypes: happy (ha), sad (sa), anger (an), disgust (di), fear (fe), surprise (su), and neutral (ne). Two images of each individual from each expression class were randomly selected for training, giving a total of 140 images (65.7%); the rest were reserved for testing. The trial was performed using tenfold cross-validation to obtain the average recognition rate. In order to create distinct datasets for cross-validation, none of the sets in the training fold appears in any of the remaining folds.

The Yale facial expression database contains 165 grayscale images in GIF format of 15 individuals, with 11 images per subject. Each subject exhibited one of six facial expressions: ha, ne, sa, sleepy (sl), surprise (su), and wink (wi). From this database we manually extracted 130 images corresponding to ha, ne, sa, and su. The datasets in this database were also partitioned into training and testing sets using the same method described for JAFFE; in the end about 77% of the images were used for training and the remainder for testing.

We recorded an average recognition rate of 96.83% on JAFFE and 92.22% on Yale, on a computer with an Intel(R) Core(TM) 2 Duo P8400 CPU @ 2.26 GHz (2 cores) and 2.0 GB RAM running Windows 7 Ultimate 64-bit (6.1, Build 7601); see Tables 1 and 2 for the confusion matrices.

Table 1. Confusion matrix of 7-class facial expression recognition on JAFFE (%).

            Neutral   Sad     Fear    Anger   Disgust  Happy   Surprise
Neutral     92.23     4.31    1.11    2.35    0        0       0
Sad         3.85      93.9    1.28    0.97    0        0       0
Fear        0         0.95    96.08   0       1.83     0       1.14
Anger       2.8       0       1.1     96.10   0        0       0
Disgust     0         0       0       0.61    99.91    0       0.29
Happy       0.21      0       0.07    0       0        99.72   0
Surprise    0         0       0.1     0       0.03     0       99.87
Average recognition = 96.83%

Table 2. Confusion matrix of 4-class facial expression recognition on Yale (%).

            Neutral   Sad     Happy   Surprise
Neutral     86.16     9.81    0       4.03
Sad         8.35      86.79   0       4.86
Happy       1.7       0       97.60   0.7
Surprise    0.95      0.72    0       98.33
Average recognition = 92.22%

The best recognition rates were obtained for ha, su, and di, where we reached almost 100% on the JAFFE database. Facial images with strongly exhibited expressions recorded the best results. In general, the performance on ne was the weakest: about 92.23% on JAFFE and 86.16% on Yale. The results show that the deformations of the muscles around the mouth and the eyes are the most reliable determinants for automatic facial expression recognition, which accounts for why recognition of the neutral face is poorer; the stronger these muscle deformations, the higher the accuracy of automatic recognition. The execution time for a 100 × 100 pixel image is 14.5 ms. Fig. 5 shows the execution time compared with other neural network classifiers.

Fig. 5. Comparing the execution time (processing time in ms versus image pixels) of the proposed method with the Ma & Khorasani, Kharat & Dudul, and Kumbhar et al. classifiers.

The proposed method was compared with three different classifiers to assess its performance in terms of recognition accuracy and execution speed.
The system was also tested with some real-life images from the World Wide Web. The results indicate that the proposed method is statistically better (p < 0.05) in both accuracy and speed. The three comparison methods are as follows:

Method I (the method of Ma and Khorasani (2004)): the feature detector is the discrete cosine transform (DCT), a pruning technique is used to reduce the input size of the network, the training algorithm is the back-propagation procedure, and the expression classifier is a feed-forward neural network with one hidden layer.

Method II (the method of Kumbhar et al. (2012)): the feature detector is the Gabor method, the feature dimensionality is reduced by principal component analysis (PCA), and the expression classifier is a feed-forward neural network.

Method III (the method of Kharat and Dudul (2009)): the feature detector is the discrete cosine transform, the feature reduction is by principal component analysis (PCA), and the classifier is a feed-forward neural network.

All the classifiers were trained and tested with the same datasets used for the proposed method. Figs. 6 and 7 show the average recognition rates of the various expressions on JAFFE and Yale, respectively; the intent is not to compare the performance on the two databases but to investigate the robustness of the system across diverse databases. Fig. 8 shows the comparison of the overall average recognition rates of the various descriptors on JAFFE and Yale. Fig. 9 shows sample real-time expression recognitions by the system.

Fig. 6. Comparing recognition rates of different methods in the JAFFE database.
Fig. 7. Comparing recognition rates of different methods in the Yale database.
Fig. 8. Comparing the average recognition rates of different methods in the JAFFE and Yale databases.
Fig. 9. Sample facial expression recognitions: happy (left), anger (middle), disgust (right).

The average recognition rates are also compared with other methods that employed the same datasets, to give a general idea of the performance of the proposed method (see Tables 3 and 4 for details). This is not a direct comparison, however, because the experiments were not conducted in the same environment. The results show that the proposed method is very encouraging. Although the performance on the Yale facial expression database is lower than on the JAFFE database, it is far better than all the Yale results we compared against.

Table 3. Comparative performance of recognition rates of different methods on the JAFFE database.

Author                          Classifier/method         Database   Rate (%)
Lekshmi and Sasikumar (2009)    SVM                       JAFFE      86.9
Kumbhar et al. (2012)           Image feature             JAFFE      60–70
Zhi and Ruan (2008)             2D-DLPP                   JAFFE      95.91
Zhao, Zhuang, and Xu (2008)     PCA and neural network    JAFFE      93.72
Lee, Huang, and Shih (2010)     RDA                       JAFFE      96.7
Proposed method                 MFFNN                     JAFFE      96.81

Table 4. Comparative performance of recognition rates of different methods on the Yale database.

Author                          Classifier/method   Database   Rate (%)
Lekshmi and Sasikumar (2009)    SVM                 Yale       89.5
Franco and Treves (2001)        Neural network      Yale       84.5
Proposed method                 MFFNN               Yale       91.52

7. Conclusions

This study employs several advanced techniques to improve the recognition rate and execution time of a facial expression recognition system. Face detection was carried out by applying the Viola–Jones descriptor. Detected faces were down-sampled by the Bessel transform, which reduced the image dimensions while preserving the perceptual quality of the original image.
An AdaBoost-based algorithm was formulated to select a few hundred Gabor wavelets from the several thousand extracted features, both to reduce the computational cost and to avoid misclassification. The selected features were fed into a well-designed multilayer feed-forward neural network classifier. The network was trained with sample datasets from both the JAFFE and Yale facial expression databases; the remaining datasets from the two databases and some images from the World Wide Web were used to test the system. The execution time for a 100 × 100 pixel image is 14.5 ms; the average recognition rate on the JAFFE database is 96.83% and on Yale it is 92.22%. The proposed method is compared with several methods and its performance is outstanding. The results of the study also show that automatic expression recognition is very accurate for surprise, disgust, and happy, at about 100%. Milder expressions such as sad, fear, and neutral have lower accuracies; fear, however, can be recognized very accurately at its peak, because recognition accuracy largely depends on the magnitude of the facial deformations around the mouth and eyes. To advance towards 100% efficiency, we believe the development of natural databases would help, since many artificial databases contain confusable cases among sad, neutral, and mild anger. Future improvements of recognition accuracy will also look at increasing the number of hidden neurons devoted to the expressions that recorded lower values.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (Nos. 61272211 and 61170126), the Natural Science Foundation of Jiangsu Province (No. BK2011521), and the Research Foundation for Talented Scholars of Jiangsu University (No. 10JDG065).

References

Aghajanian, J., Warrell, J., Prince, S. J., Li, P., Rohn, J. L., & Baum, B. (2009). Patch-based within-object classification. In IEEE 12th International Conference on Computer Vision (pp. 1125–1132).
Al Daoud, J. E. (2009). Enhancement of the face recognition using a modified Fourier–Gabor filter. International Journal of Advanced Software Computer Applications, 1.
Bouzalmat, A., Belghini, N., Zarghili, A., Kharroubi, J., & Majda, A. (2011). Face recognition using neural network based Fourier Gabor filters & random projection. International Journal of Computer Science and Security, 5, 376–386.
Breazeal, C., & Scassellati, B. (2002). Robots that imitate humans. Trends in Cognitive Sciences, 6, 481–487.
Bruce, V. (1993). What the human face tells the human mind: Some challenges for the robot–human interface. Advanced Robotics, 8, 341–355.
Cemre, Z. (2008). Facial expression recognition. MSc Thesis, University of Surrey.
Choi, E., & Lee, C. (2003). Feature extraction based on the Bhattacharyya distance. Pattern Recognition, 36, 1703–1709.
Dailey, M. N., & Cottrell, G. W. (1999). PCA = Gabor for expression recognition. UCSD CSE TR CS-629.
Deng, H. B., Jin, L. W., Zhen, L. X., & Huang, J. C. (2005). A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA. International Journal of Information Technology, 11, 86–96.
Franco, L., & Treves, A. (2001). A neural network facial expression recognition system using unsupervised local processing. In IEEE Proceedings of the 2nd International Symposium on Image and Signal Processing and Analysis (pp. 628–632).
Freund, Y., & Schapire, R. E. (1995).
A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory (pp. 23–37). Berlin, Heidelberg: Springer.
Ganga, M. P., Prakash, C., & Gangashetty, S. V. (2011). Bessel transform for image resizing. In IEEE 18th International Conference on Systems, Signals and Image Processing (pp. 1–4).
Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1233–1258.
Kharat, G. U., & Dudul, S. V. (2009). Emotion recognition from facial expression using neural networks. In Human–Computer Systems Interaction (pp. 207–219). Berlin, Heidelberg: Springer.
Kumbhar, M., Jadhav, A., & Patil, M. (2012). Facial expression recognition based on image feature. International Journal of Computer and Communication Engineering, 1, 117–119.
Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42, 300–311.
Lee, C. C., Huang, S. S., & Shih, C. Y. (2010). Facial affect recognition using regularized discriminant analysis based algorithms. EURASIP Journal on Advances in Signal Processing, 1.
Lekshmi, V. P., & Sasikumar, M. (2009). Analysis of facial expression using Gabor and SVM. International Journal of Recent Trends in Engineering, 2.
Londhe, R., & Pawar, V. (2012). Facial expression recognition based on Affine Moment Invariants. International Journal of Computer Science Issues, 9.
Ma, L., & Khorasani, K. (2004). Facial expression recognition using constructive feed-forward neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34, 1588–1595.
Maalej, A., Amor, B., Daoudi, M., Srivastava, A., & Berretti, S. (2010). Local 3D shape analysis for facial expression recognition. In IEEE 20th International Conference on Pattern Recognition (pp. 4129–4132).
Morik, K., Brockhausen, P., & Joachims, T. (1999). Combining statistical learning with a knowledge-based approach – A case study in intensive care monitoring. In ICML (pp. 268–277).
Munoz, A., Blu, T., & Unser, M. (2001). Least-squares image resizing using finite differences.
IEEE Transactions on Image Processing, 10, 1365–1378.
Rucklidge, W. J. (1997). Efficiently locating objects using the Hausdorff distance. International Journal of Computer Vision, 24, 251–270.
Satiyan, M., & Nagarajan, R. (2010). Recognition of facial expression using Haar-like feature extraction method. In IEEE International Conference on Intelligent and Advanced Systems (pp. 1–4).
Sha, T., Song, M., Bu, J., Chen, C., & Tao, D. (2011). Feature level analysis for 3D facial expression recognition. Neurocomputing, 74, 2135–2141.
Shen, L., & Bai, L. (2004). AdaBoost Gabor feature selection for classification. In Proceedings of Image and Vision Computing New Zealand (pp. 77–83).
Soyel, H., & Demirel, H. (2007). Facial expression recognition using 3D facial feature distances. In Image Analysis and Recognition (pp. 831–838). Berlin, Heidelberg: Springer.
Tai, S. C., & Chung, K. C. (2007). Automatic facial expression recognition system using neural networks. In TENCON IEEE Region 10 Conference (pp. 1–4).
Tang, H., & Huang, T. (2008). 3D facial expression recognition based on properties of line segments connecting facial feature points. In 8th IEEE International Conference on Automatic Face & Gesture Recognition (pp. 1–6).
Van, P. F. (2008). Discrete Wavelet Transformations: An Elementary Approach with Applications. Hoboken, NJ: Wiley-Interscience.
Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137–154.
Wiskott, L., Fellous, J. M., Kruger, N., & von der Malsburg, C. (1997). Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 775–779.
Wu, J. X., Brubaker, S. C., Mullin, M. D., & Rehg, J. M. (2008). Fast asymmetric learning for cascade face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 369–382.
Zhang, D., & Lu, G. (2004). Review of shape representation and description techniques. Pattern Recognition, 37, 1–19.
Zhang, L., & Tjondronegoro, D. (2009). Selecting, optimizing, and fusing 'salient' Gabor features for facial expression recognition. In Neural Information Processing (pp. 724–732). Berlin, Heidelberg: Springer.
Zhao, L., Zhuang, G., & Xu, X. (2008). Facial expression recognition based on PCA and NMF. In IEEE 7th World Congress on Intelligent Control and Automation (pp. 6826–6829).
Zhi, R., & Ruan, Q. (2008). Facial expression recognition based on two-dimensional discriminant locality preserving projections. Neurocomputing, 71, 1730–1734.