MIJ2K optimization using evolutionary multiobjective optimization algorithms

This document is published in: Expert Systems with Applications (2011), 38(9), 10999–11010. DOI: http://dx.doi.org/10.1016/j.eswa.2011.02.143 © 2011 Elsevier Ltd.

Alvaro Luis Bustamante*, José M. Molina López, Miguel A. Patricio
Univ. Carlos III de Madrid, Avda. Univ. Carlos III, 22, 28270 Colmenarejo, Madrid, Spain
* Corresponding author. Tel.: +34 918561338. E-mail addresses: aluis@inf.uc3m.es (A.L. Bustamante), molina@ia.uc3m.es (J.M. Molina López), mpatrici@inf.uc3m.es (M.A. Patricio).

Abstract: This paper deals with the multiobjective definition of video compression and its optimization. The optimization will be done using NSGA-II, a well-tested and highly accurate algorithm with a high convergence speed developed for solving multiobjective problems. Video compression is defined as a problem including two competing objectives. We try to find a set of optimal, so-called Pareto-optimal solutions, instead of a single optimal solution. The two competing objectives are quality and compression ratio maximization. The optimization will be achieved using a new patent-pending codec, called MIJ2K, also outlined in this paper. Video will be compressed with the MIJ2K codec applied to some classical videos used for performance measurement, selected from the Xiph.org Foundation repository. The result of the optimization will be a set of near-optimal encoder parameters. We also present the convergence of NSGA-II with different encoder parameters and discuss the suitability of MOEAs as opposed to classical search-based techniques in this field.

Keywords: Multi-objective, Optimization, Video, Encoder

1. Introduction

Nowadays, digital video is widely used for many purposes, ranging from mere entertainment, such as TV, video conferencing or video on demand, to more professional environments, such as remote video surveillance. This wide range of applications is possible thanks to recent advances in digital video technology, like broadband connections to the Internet, computing capacity and the digital storage space of new devices.

However, the most important part of a digital video system is the codec. The codec enables digital video compression (with the encoder) and/or decompression (with the decoder), reducing the data necessary for representation. This is necessary because uncompressed digital video still exceeds common network bandwidths for transmission and storage spaces for digital archiving.

Compression usually employs lossy data compression, which reduces a file by permanently eliminating certain information, especially redundant information. When the file is uncompressed, only a part of the original information is still there (although this may go unnoticed by the user, especially in video and sound compression). It inherently implies a reduction of quality in exchange for a reduction in the final amount of information.

Systems like these introduce a complex trade-off between the quality and quantity of data needed to represent the video (also referred to as bit rate). The best we could expect is to get the highest video quality with the smallest file size, but these objectives are in conflict, since better qualities inherently imply a greater bit rate.
Thus, when compressing digital video, the user usually establishes encoder objectives or constraints, i.e. the maximum bit rate (normally used when there are bandwidth limits or storage space constraints (Jiang, 2006; Wang & Leou, 2003)), or video quality (rated on a 0 to 100 scale by some quality metric, to ensure some quality of service (Ng, Leung, & Hui, 2005; Zhang, Zhu, & Zhang, 2005)).

The encoder should compress the digital video according to these objectives, but it is not usually easy to get a direct correlation between these high-level objectives and low-level encoder parameters, because video encoder bit rate and video quality depend on several coding parameters, such as the quantization parameter (Czuni, Csaszar, & Licsar, 2006), coding mode (Chen, Vetro, Sun, & Kung, 1997), macroblock sizes (Tu, Yang, Shen, & Sun, 2003), or motion compensation algorithms (Hang, Chou, & Cheng, 1997). Each of these parameters has its own attributes or thresholds and may have different effects. We have developed MIJ2K, a new video codec, currently patent pending, and we need to optimize some parameters before its release.

In this paper, we present an algorithm to dynamically optimize the two conflicting objectives discussed above (video quality/bit rate) as well as translate the objectives to the video encoder parameters. Such algorithms are known as multiobjective optimization (MO) algorithms. MO problems (Steuer, 1986; Sawaragi, Nakayama, & Tanino, 1985) are very common in complex engineering situations and can be found in many fields: product and process design, finance, aircraft design, the oil and gas industry, automobile design, or wherever optimal decisions need to be taken in the presence of more than one, generally conflicting, objective, preventing the simultaneous optimization of each objective.

Basically, there are two major approaches for solving such MO problems. The first is to combine the individual objective functions into a single composite function and optimize this function only. This could be done with techniques such as the weighted sum method (Koski, 1988), utility theory (Thurston, 2006), etc. The second major approach is to determine an entire Pareto optimal solution set or a representative subset, i.e. a series of solutions that are non-dominated with respect to each other. Pareto optimal solution sets are often preferred to single solutions because they can be practical when considering real-life problems and give the decision maker (DM) the option of evaluating the trade-offs between different solutions.

The use of Multiobjective Evolutionary Algorithms (MOEAs) to output the Pareto front between these two objective functions (quality and bit rate) will satisfy all DM requirements, since the Pareto solution set will offer a wide range of solutions, from low quality, low bit rate to high quality, high bit rate, all of which are optimal in some sense. DMs could use the solution set for many applications with different constraints: real-time streaming, controlling the bit rate according to the available bandwidth; high-definition video, establishing high qualities; video storage with space constraints; etc.

We will use the NSGA-II algorithm to obtain the Pareto front (Deb, Pratap, Agarwal, & Meyarivan, 2002). NSGA-II should maximize two objective functions: the quality, measured by the peak signal-to-noise ratio (PSNR), and the compression ratio (CR), which is the ratio between original and compressed video sizes.
To evaluate each set of parameters, we will test the video compressor with classical sequences used to evaluate encoder performance, like 'Hall Monitor' or 'Akiyo', selected from the well-known Xiph.org Foundation repository (Lora, 1994–2008), which is suitable for evaluating video compression codecs. Compressing and decompressing video to evaluate the fitness function for the quality and bit rate objectives is a tedious process due to the amount of data that has to be managed. Classical search-based techniques would take a long time, whereas MOEAs are well suited for such optimization problems.

The paper is organized as follows. Section 2 gives a general description of how MIJ2K works. Section 3 defines the multiobjective problem. Section 4 specifies the conflicting objective functions and the decision variables for optimization. Finally, Section 5 presents the tests performed with the NSGA-II algorithm.

2. MIJ2K codec

This section outlines the MIJ2K codec to give an understanding of how it works and the internal parameters needed to use evolutionary algorithms for optimization purposes. The basics of video compression rely on two major methods: intra-frame and inter-frame compression techniques (Shi & Sun, 2000). In intra-frame methods, each video frame is an independent entity encoded with a still image compressor, usually JPEG (Pennebaker & Mitchell, 1993; Wallace, 1991) or JPEG2000 (Christopoulos, Skodras, & Ebrahimi, 2000; Rabbani & Joshi, 2002). This technique is extremely useful in real-time environments, like video surveillance, due to its low computing complexity. On the other hand, inter-frame techniques employ advanced coding methods, using algorithms like motion compensation. These techniques are more bandwidth-efficient than intra-frame methods at the expense of more complexity.

In this paper, we optimize the internal parameters of the MIJ2K codec, which is based on the JPEG2000 image compression standard and a mixture of intra- and inter-frame techniques. JPEG2000 is a wavelet-based (Adams & Ward, 2001) image compression standard created by the Joint Photographic Experts Group committee in the year 2000 with the aim of superseding their original discrete cosine transform-based JPEG standard (dating from 1992). JPEG2000 offers a modest increase in compression performance compared with JPEG, but its main benefit is significant code-stream flexibility. The code stream obtained after compression of an image with JPEG2000 is scalable, meaning that it can be decoded in a number of ways. For instance, by truncating the code stream at any point, we can get a representation of the image at a lower resolution or signal-to-noise ratio. By ordering the code stream in various ways, applications can achieve significant performance increases (Adams, 2001).

Apart from the above features, the main benefit of using JPEG2000 for video streaming is that, unlike other video compressors, including MPEG-4, compression can be done in real time (Luis & Patricio, 2000) because it is an intra-frame codec. Thanks to this feature, JPEG2000 can be used in events that require real-time transmission, like video surveillance.

As far as our proposal is concerned, however, the main advantage is that the image can, optionally, be partitioned into smaller independent non-overlapping rectangular blocks called tiles (ISO/IEC, 2000).
We will exploit this exceptional feature, provided by this compressor alone, to perform real-time inter-frame compression using the proposed conditional tile replenishment method, which we optimize in this paper. We will employ a new block-based difference coding technique (Shi & Sun, 2000) for this task, having tiles assume the role of blocks.

Tiles can be of any size, and the whole image can even be considered as one single tile. Once the size has been chosen, though, all the tiles will be of the same size (except, optionally, tiles on the right and bottom borders). Dividing the image into tiles is advantageous in that the encoder/decoder will need less memory to encode/decode the image. It can also opt to encode/decode only selected tiles to achieve a partial coding/decoding of the image. This provides full control of whatever area of the image is being compressed, decompressed, transmitted, etc.

Fig. 1 shows an example of how the J2K code stream is structured and how an image is divided using tiles. The first marker present is the start of code stream (SOC). This is followed by a main header (MH), which includes the common parameters required for image decoding. The tile-part header (TH) contains the information necessary for decoding each tile. It is followed by the corresponding tile-part bit-stream. Finally, the end of code-stream (EOC) marker denotes the termination of a J2K code stream. Notice that each region of the image occupies a definite region in the J2K code stream. Thanks to the header definition method (ISO/IEC, 2000), each region can also be accessed randomly.

Fig. 1. JPEG2000 code-stream structure.

The inter-frame technique adopted in MIJ2K is a real-time-specific block-based difference coding, adapted to JPEG2000 streams. This technique is useful in the real-time transmission scheme because it provides a low computational complexity. It also preserves the real-time latency provided by the native JPEG2000 intra-frame architecture. The MIJ2K architecture designed for this task is outlined in Fig. 2 and explained in more detail in the following sections. The general operating procedure is as follows.

A common JPEG2000-based streaming system is basically divided into three steps. The first step is related to frame acquisition and compression. It is followed by the transmission of the resulting compressed frames. Finally, it ends with the reception and display of each frame. The acquisition and compression step in these systems is not complex. Each frame is acquired and compressed separately, and can be transmitted as soon as it is compressed.

Fig. 2. Overall functioning of the MIJ2K architecture, performing real-time selective tile compression and transmission.

In the proposed MIJ2K streaming architecture, an extra process is inserted in the compression step. Instead of compressing each whole frame, it compresses and transmits only the areas that are different (changing tiles) from the previously transmitted frame. Compressing only changing tiles improves compression, transmission, and decoding performance, since it hugely reduces the total amount of data to manage.

For example, Fig. 3 shows the tiles detected as changing in frame 127 of the 'Akiyo' sequence. They are marked with a green rectangle (for interpretation of colour in Fig. 3, the reader is referred to the web version of this article). Notice how only 17% of the tiles in this frame are detected as changing. When the MIJ2K method is applied to the whole 'Akiyo' sequence, it saves around 87% of bandwidth compared with a native JPEG2000 streaming system.

Fig. 3. 'Akiyo' sequence. Tiles detected as changing in frame 127. In this frame, a bandwidth saving of around 83% is achieved.
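Because only the tile-parts of changing tiles are emitted, a frame's code stream is essentially a concatenation following the structure of Fig. 1. The following is a minimal sketch of that assembly, assuming the main header and per-tile tile-parts have already been produced by a JPEG2000 tile encoder; the helper name and the `tile_parts` mapping are ours, not part of any real JPEG2000 library:

```python
# Sketch: assembling a MIJ2K frame code stream (structure of Fig. 1).
SOC = b"\xff\x4f"  # start-of-codestream marker
EOC = b"\xff\xd9"  # end-of-codestream marker

def build_frame_stream(main_header: bytes,
                       tile_parts: dict[int, bytes],
                       changing: list[int]) -> bytes:
    """Concatenate only the changing tile-parts between the shared
    main header and the end-of-codestream marker."""
    out = bytearray(SOC)
    out += main_header             # MH: common decoding parameters
    for idx in sorted(changing):   # only changing tiles, e.g. 3, 7, 15
        out += tile_parts[idx]     # TH + tile-part bit-stream for tile idx
    out += EOC
    return bytes(out)
```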
Changing tiles are detected using a reference frame F^R_i that stores a representation of the last transmitted frame. Each new frame F^S_i to be transmitted is compared tile by tile with the reference frame F^R_i in order to detect the tiles that differ from their counterparts. F^R_i is updated frame by frame with the tiles that are detected as 'changing' to create a real representation of what the client is viewing.

The client receiving this modified stream (a JPEG2000 code stream containing just some of the tiles) will have to decode each tile received and use it to update the displayed frame F^D_i. The client can easily locate each tile position, since tiles are identified by a tile number. Knowing that all tiles (except at image boundaries) are of the same size, it is easy to calculate the position of the tile inside the full image.

The subsystems illustrated in Fig. 2 that are necessary for understanding which parameters to optimize, and their design details, are explained in more detail in the following sections.

2.1. JPEG2000 encoder

This module compresses the JPEG2000 still images. It basically takes and compresses the source frame F^S_i using tile partitioning. The encoder should be modified to ensure that only the tiles specified by the T^INDEX_i parameter, and not all the tiles of the frame, are compressed. This is feasible since the tiles in JPEG2000 images can be compressed and decompressed separately. Ideally, then, only the changing tiles are compressed. This will save compression time, improving overall system operation. All frames will be compressed with the same quality, given by bpp_i.

Fig. 4 is a diagram of this module. It shows all the required inputs and outputs, which are described in more detail in the following:

- Source frame F^S_i: this is the last image acquired by the digitizer board or digital camera, that is, the frame that is going to be transmitted. It should be formatted in some J2K encoder-understandable format, for instance, a RAW 8 bpp or 24 bpp RGB image (depending on whether it is a gray-scale or color picture).
- Quality bpp_i: this parameter is related to the compression quantity, or target quality, after each still image has been compressed. This parameter is usually expressed as the quantity of bits used to represent each pixel in the generated JPEG2000 code stream, that is, bits per pixel (bpp).
- Tile size T^SIZE: this parameter defines the size of the tiling performed in the J2K compressed image. Once the tile size has been set, all images will be compressed in separate squared regions of the same size. Tile size is variable, but, theoretically, small tile sizes can achieve a better fit to moving objects. This is illustrated in Fig. 5, where the soccer player is a moving object in the sequence, and little tiles are a better fit for the player.
- Changing tiles index T^INDEX_i: this input is delivered by the motion measurement subsystem (this process is detailed later), as shown in Fig. 2. It indicates the index of the tiles that contain some movement, that is, the tiles that should be compressed. In this case, the JPEG2000 encoder should compress only these tiles (regions) of the image. This will improve the compression delay, since the encoder does not have to compress the whole image.
- J2K code stream F^MIJ2K_i: this is the J2K code stream output after compression. It should contain only the changing tiles specified in the T^INDEX_i input. The tile bit-streams should be between the main header and end-of-code-stream markers, as shown in Fig. 4. In this case, only tiles 3, 7 and 15 have been detected as changing. The F^MIJ2K_i output will be passed directly to the packetization subsystem, as illustrated in Fig. 2, which will transmit the frame over the network.

Fig. 4. MIJ2K encoder operation.
Fig. 5. Overlay areas with different tile sizes. From left to right: 16 × 16, 32 × 32 and 64 × 64.
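As noted above, the client maps each received tile number back to pixel coordinates when updating F^D_i. A minimal sketch of that mapping, assuming the raster-order tile numbering used by JPEG2000 (left to right, top to bottom); the helper name is ours:

```python
import math

def tile_origin(tile_index: int, image_width: int, tile_size: int) -> tuple[int, int]:
    """Top-left pixel (x, y) of a tile, assuming raster-order numbering."""
    tiles_per_row = math.ceil(image_width / tile_size)  # border tiles may be narrower
    row, col = divmod(tile_index, tiles_per_row)
    return col * tile_size, row * tile_size

# e.g. with 352-pixel-wide CIF frames and 32x32 tiles, tile 15 starts at:
# tile_origin(15, 352, 32) == (128, 32)
```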
The compression process is defined as in Eq. (1), where the J2K function represents the encoder, and the inputs of the function are the source frame F^S_i, the quality bpp_i, the index of tiles for compression T^INDEX_i, and the tile size T^SIZE with the default value 32 × 32:

$F_i^{MIJ2K} = \mathrm{J2K}(F_i^S, bpp_i, T_i^{INDEX}, T^{SIZE})$    (1)

Notice how the compression module introduces two variables for optimization: bpp_i and T^SIZE. A priori, we do not know the best value for these two variables, since we do not know how quality or tile size affect the motion compensation algorithm and the final video output.

2.2. Motion measurement

This subsystem manages the motion measurement between two consecutive frames and detects the tile indexes of images that contain some movement. The complexity of the algorithm proposed for this task is low, with a view to meeting the needs of real-time transmissions rather than aiming for the highly efficient motion detection performed by common inter-frame encoders. Such sophisticated techniques, as used in H.264/MPEG-4 AVC, are not usually feasible for critical real-time environments, such as video surveillance.

This algorithm, as shown in Fig. 6, takes two frames, F^S_i and F^R_i (source and reference), for comparison. It also has to know the selected tile size T^SIZE used in the compression step.

Fig. 6. Motion measurement input/output.

All the inputs and outputs of this system are described below:

- Source frames F^S_i and F^R_i: F^S_i should be the same image as used in the J2K encoder, in some affordable format where the pixel values of the image can be directly manipulated. The other required image is the reference frame F^R_i. For comparison purposes, it should have the same format as the source frame F^S_i.
- Tile size T^SIZE: as compression is performed using the concept of tiling, motion should be measured in tile units. So, this subsystem must know the working tile size T^SIZE.
- Changing tiles index T^INDEX_i: as mentioned in the JPEG2000 encoder section, this subsystem should provide the indexes of tiles containing some movement. The indexes could range from 0 to the total number of tiles, in any order and without any restriction. If no tiles with movement are detected (the current frame is almost equal to the last frame), this subsystem should somehow notify this circumstance and then send no tile for the current F^S_i frame (but the client side must be notified).

So, this subsystem can be defined as in Eq. (2):

$T_i^{INDEX} = \mathrm{Motion}(F_i^S, F_i^R, T^{SIZE})$    (2)
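Putting Eqs. (1) and (2) together, one MIJ2K iteration per captured frame can be sketched as follows; motion(), j2k() and copy_tile() are hypothetical stand-ins for the subsystems of Fig. 2, not a published API:

```python
def mij2k_step(frame_src, frame_ref, bpp: float, tile_size: int) -> bytes:
    """One MIJ2K frame iteration following Eqs. (1) and (2)."""
    # Eq. (2): detect which tiles of the source frame changed with
    # respect to the reference frame.
    changing = motion(frame_src, frame_ref, tile_size)
    # Eq. (1): JPEG2000-compress only those tiles at the requested quality.
    stream = j2k(frame_src, bpp, changing, tile_size)
    # Reference update: keep F^R in sync with what the client displays
    # (the paper updates F^R from the preprocessed frame; simplified here).
    for idx in changing:
        copy_tile(dst=frame_ref, src=frame_src, index=idx, size=tile_size)
    return stream
```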
In this case, we will specify what the 'Motion' function does in more detail, since this is the most important part of the adopted inter-frame technique. Fig. 7 is a detailed illustration of this function. We also describe all the tasks involved as follows:

Fig. 7. Detailed MIJ2K motion measurement.

- Preprocessing: this task prepares the source frame F^S_i for motion measurement. The conversion performed is related to the extraction of image intensity information, that is, the Y-luminance component of the YUV color space. The conversion specified in Eq. (3) is applied to RGB images. This way we can work with one simple representation of the image, leading to a faster analysis of source frames.

$Y = 0.299R + 0.587G + 0.114B$    (3)

Once the source frame has been correctly transformed to a gray-scale image, an optional simple blur filter is used to reduce the excessive detail and noise present in many surveillance cameras. We do not go into any more detail about the preprocessing algorithm and motion algorithm improvements, but we will try to determine whether the use of the simple blur really does improve compression. So, the preprocessed frame output by this subsystem, F^P_i, should be a gray-scale conversion of F^S_i passed through a simple blur filter, as described in Eq. (4):

$F_i^P = \mathrm{Blur}(\mathrm{GrayScale}(F_i^S))$    (4)

- Tile change detector: this module, illustrated in Fig. 7, should detect the tiles that contain movement for the purpose of selective tile compression and transmission. The method used in this subsystem should be relatively simple in order to meet real-time requirements. This subsystem compares the whole preprocessed frame F^P_i against the reference frame F^R_i tile by tile. It detects how much movement there is in each tile to decide whether or not it should be transmitted. Fig. 8 shows how this subsystem works. It is also described in detail in the following:

Fig. 8. MIJ2K tile change detector.

- Absolute difference ABS_i[x]: each tile from both the preprocessed and reference frames is passed through an absolute difference filter to detect absolute changes between the two tiles. The computational complexity of this operation is low, and it is useful for detecting objective differences between tiles. It generates a black-and-white image in which white pixels represent differences, whereas black pixels signify no changes. This is described in Eq. (5), where F^P_i[x] and F^R_i[x] are tile x of frame i in the preprocessed (P) and reference (R) frames, and ABS_i[x] represents the absolute difference between the tiles. In this case, both F^P_i[x] and F^R_i[x] are u × v matrices containing all the tile's pixel values.

$ABS_i[x] = \left| F_i^P[x] - F_i^R[x] \right|$    (5)

The tile image ABS_i[x] output by the absolute difference process should be analyzed in order to detect how much movement there is, and thus decide whether or not the tile should be transmitted. Changes should be measured somehow, and here we propose two efficient methods, which should work together:
- Mean value MEAN_i[x]: this is the first measurement. It is the mean pixel value of the ABS_i[x] tile. This process takes all the values of the ABS_i[x] tile, calculates the total and divides it by the number of tile pixels, as described in Eq. (6).

$MEAN_i[x] = \frac{1}{u \cdot v} \sum_{s=0}^{u-1} \sum_{t=0}^{v-1} ABS_i[x](s,t)$    (6)

- Max value MAX_i[x]: the second measurement is the maximum pixel value of the ABS_i[x] tile, as described in Eq. (7). This is useful for detecting movement peaks in tiles, since they could be overlooked by the mean value when nearby pixel values are near zero.

$MAX_i[x] = \max_{0 \le s < u,\ 0 \le t < v} ABS_i[x](s,t)$    (7)

Operating in conjunction, MEAN_i[x] and MAX_i[x] can detect all kinds of movements, ranging from small uniform variations (using the mean) to occasional big changes (using the max threshold). Both metrics are easy to implement, providing a very low complexity method for detecting movement.

- Thresholding: together, the MEAN_i[x] and MAX_i[x] indicators tell us when the tile should be transmitted. In this way, there is said to be a big enough change in a tile to warrant transmission when both values are above some threshold.

- Reference update: this subsystem is used to update the reference frame F^R_i. The first reference frame, F^R_0, is an entirely black image, and it is updated frame by frame with the tiles that are different from the preprocessed frame F^P_i. Logically, the first frame F^P_0 will update all the tiles of the reference frame. The reference frame is also updated directly from the blurred image output by the preprocessing subsystem. This stops the reference frame from having to be preprocessed each time and reduces system complexity.

This subsystem introduces two new parameters for optimization: MEAN_i[x] and MAX_i[x]. These variables determine the motion detection algorithm's sensitivity. A priori, it is not feasible to set values for these parameters manually, and we will have evolutionary algorithms select the correct values.
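A compact sketch of the whole tile change detector (Eqs. (3)-(7)), assuming numpy and 8-bit gray-scale frames; the function names are ours, and the OR in the threshold test reflects our reading that either indicator can trigger transmission:

```python
import numpy as np

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Gray-scale conversion (Eq. (3)) plus a simple 3x3 box blur (Eq. (4))."""
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    padded = np.pad(y, 1, mode="edge")
    return sum(padded[r:r + y.shape[0], c:c + y.shape[1]]
               for r in range(3) for c in range(3)) / 9.0

def changing_tiles(pre: np.ndarray, ref: np.ndarray, tsize: int,
                   mean_thr: float, max_thr: float) -> list[int]:
    """Tile change detector: per-tile absolute difference (Eq. (5)),
    thresholded on its mean (Eq. (6)) and maximum (Eq. (7))."""
    height, width = pre.shape
    changed, index = [], 0
    for ty in range(0, height, tsize):
        for tx in range(0, width, tsize):
            diff = np.abs(pre[ty:ty + tsize, tx:tx + tsize]
                          - ref[ty:ty + tsize, tx:tx + tsize])  # Eq. (5)
            if diff.mean() > mean_thr or diff.max() > max_thr:  # thresholding
                changed.append(index)
            index += 1
    return changed
```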
3. Multiobjective optimization

As discussed in the introduction, one way to solve MO problems is to obtain the entire Pareto front. This is a common research field, where the real challenge is to obtain the Pareto front in the minimum number of iterations, because some objectives can be very costly to evaluate, and traditional search-based techniques are very time-inefficient. Accordingly, evolutionary algorithms (EAs) (Bäck, Fogel, & Michalewicz, 1997) have been successfully extrapolated to multiobjective problems. EAs are well suited to MO optimization problems, as they are fundamentally based on biological processes, which are inherently multiobjective. In this case, we will apply Multiobjective Evolutionary Algorithms (MOEAs) (Deb, 2001; Coello, Lamont, & Veldhuizen, 2006) to obtain the Pareto front within the video compression field.

Compression is done by reducing the redundant information, like the spatial and temporal redundancies present in video scenes, to the minimum. This way we can output compressed videos with apparently no loss of information, where the final amount of information needed for representation is reduced. The problem in this case is that the loss of information (albeit redundant) amounts to a reduction in the quality or similarity of the compressed video compared with the original video. Fewer data necessarily imply lower quality, and, when trying to optimize the relation between the conflicting objectives of compression and quality, we are faced with a MO problem.

There is then no single problem solution that satisfies both objectives, where we obtain the highest compression ratio with the highest quality. The solution in this case is to output the best quality-to-compression trade-off across the full encoder operating range and let the DM select the solution that meets his or her requirements.

The Multicriteria Decision Making (MCDM) (Ehrgott & Gandibleux, 2002) literature describes two general approaches for solving user-preference mechanisms involving MO problems. In the first approach, the DM gives preferences first, and the algorithm outputs a set of solutions in the preferred regions of the Pareto front according to those preferences (a priori methods). In the second approach, the algorithm provides almost all the Pareto front solutions, and the DM selects the interesting options (a posteriori methods) (Miettinen, 2001).

A priori methods are preferred when there are many objective functions and the search space grows, since, in such cases, the computing resources can be focused on the preferred areas, returning results sooner. In this case, however, we are working with MO problems in the video compression field, where there are only two objective functions. Moreover, a wide encoder operating range must be covered, since DM constraints can easily vary depending on the use to which the encoder is put. So, a posteriori methods will be applied in this case to obtain the full Pareto front. As shown in Fig. 9, we expect to obtain the full Pareto front between the conflicting objectives of quality, measured by PSNR, and CR.

Fig. 9. Pareto front solutions.

We can formally describe this problem as a set of objective functions to be jointly maximized (or minimized). This can be generally defined as a set of functions described in Eq. (8):

maximize $f_n(\vec{v})$, where $n = 1, 2, \ldots, N$    (8)

In this case, $f_n(\vec{v})$ is an objective function, and the solution $\vec{v}$ is a vector of M decision variables, given by $\vec{v} = (v_1, v_2, \ldots, v_M)$. The values of these decision variables should be between the upper and lower bounds defined by the problem. Once we have obtained solutions, we can compare them using the notion of dominance: a decision vector $\vec{u} = (u_1, u_2, \ldots, u_k)$ is said to dominate another $\vec{v} = (v_1, v_2, \ldots, v_k)$ if and only if

$\forall i \in \{1, \ldots, k\}:\ u_i \ge v_i \ \wedge\ \exists i' \in \{1, \ldots, k\}:\ u_{i'} > v_{i'}$    (9)

In other words, the decision vector $\vec{u}$ is said to dominate $\vec{v}$ if and only if $\vec{u}$ is at least as good as $\vec{v}$ for all objectives and $\vec{u}$ is better than $\vec{v}$ for at least one objective. The Pareto optimal set is given when a decision vector is not dominated by any other vector in the search space.
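Eq. (9) translates directly into code. A minimal sketch for maximization problems (our helper names), which the front comparisons of Section 5 can reuse:

```python
def dominates(u: tuple[float, ...], v: tuple[float, ...]) -> bool:
    """Eq. (9): u dominates v when it is at least as good in every
    objective and strictly better in at least one (maximization)."""
    return (all(ui >= vi for ui, vi in zip(u, v))
            and any(ui > vi for ui, vi in zip(u, v)))

def pareto_front(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Non-dominated subset of a set of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```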
4. MOEA approach to video encoder optimization

In this section we discuss the decision variables that the encoder uses, which we will optimize using NSGA-II, specifying the upper and lower bounds of each one and how they affect the encoder. Moreover, we detail the objective functions used to obtain the Pareto front.

4.1. Objective functions

We will use two standard functions in the scope of video compression to evaluate the performance of each vector of solutions. The first function, f1, is the average PSNR, Eq. (12), which represents an objective quality metric, and the second function, f2, is the compression ratio, Eq. (13), which is the ratio of original to compressed video sizes. These functions are described below:

$MSE_f = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \| I_f(i,j) - K_f(i,j) \|^2$    (10)

$PSNR_f = 20 \cdot \log_{10}\left(\frac{255}{\sqrt{MSE_f}}\right)$    (11)

$f_1 = \frac{\sum_{f=1}^{\#Frames} PSNR_f}{\#Frames}$    (12)

$f_2 = CR = \frac{UncompressedSize}{CompressedSize}$    (13)

The mean square error (MSE_f) is calculated pixel by pixel for each video frame f of M × N pixels between the original frame, represented by I_f, and the reconstructed compressed frame, K_f. Then, PSNR_f is calculated for each video frame f over the MSE_f value. Notice that the MSE is calculated over the luminance component Y (brightness) of the YUV color space to compute the PSNR. These two objective functions should be maximized, since higher values of CR mean smaller compressed video sizes, and a greater number of dBs in the PSNR metric equals a lower MSE, leading to a greater similarity between the original and compressed video frames.

4.2. Decision variables

The encoder we intend to optimize is based on intra-frame encoder techniques (Connor, Brainard, & Limb, 1972), but has been extrapolated to inter-frame coding (Brofferio & Rocca, 1977), basically using a video scene motion detection system (MMS) with the aim of performing simple motion compensation. It is based on JPEG2000 (ISO/IEC, 2000) and its still image compression method. This essentially divides each frame into independently accessible square regions, also called tiles. Our encoder uses tiles for conditional replenishment, depending on whether or not motion is detected, in the same way as similar encoders do with empty macroblocks.

Each frame or still image in the video sequence can be compressed using JPEG2000 to a specific quality, which will determine the final quality and size of the compressed video in combination with the motion detection system. All the decision variables or low-level encoder parameters that we will try to optimize are summarized below, stating the range of values of each one, divided into functional units:

1. Tiling setup
   - T^SIZE = {16 × 16, 24 × 24, 32 × 32}
2. Motion detection system
   - Smoothing = {Enabled, Disabled}
   - MEAN_i[x] = [0, 255]
   - MAX_i[x] = [0, 255]
3. Still image quality control
   - Bits per pixel in 32 × 32 (bpp32) = [0.4, 7.5]
   - Bits per pixel in 24 × 24 (bpp24) = [0.7, 8.73]
   - Bits per pixel in 16 × 16 (bpp16) = [1.2, 11.0]

Tiling setup represents the size of the squared regions of the still images once compressed. The width and height of each tile must be the same. In theory, lower tile sizes will perform better compression, since they make the motion compensation system more precise. On the other hand, they add an extra overhead, making the compressor less efficient.

The motion detection parameters are the attributes that manage the motion measurement subsystem (MMS) in tile regions. Small values of MEAN_i[x] and MAX_i[x] represent a high MMS sensitivity and lower tile reusability, which will imply better quality but less CR. We believe that Smoothing, which is related to the simple blur filtering, improves the MMS by reducing the noise that is usually present in videos, but we do not know exactly what effect it has and whether there is any real improvement.

On the other hand, still image quality control has only one parameter. It represents the bits per pixel that the JPEG2000 compressor should use for each still frame. Again, higher values of this parameter will lead to better quality but less CR. Notice that, for each tile size, there is a specific operating range for this value.

Observe how each of these low-level encoder parameters actually represents a MO problem.
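To make f1 and f2 concrete, here is a minimal sketch of how a candidate parameter vector could be scored, assuming numpy and 8-bit luminance frames; the encoder integration itself is elided:

```python
import numpy as np

def f1_avg_psnr(originals: list[np.ndarray], decoded: list[np.ndarray]) -> float:
    """f1, Eq. (12): average PSNR over the luminance frames."""
    psnrs = []
    for I, K in zip(originals, decoded):
        mse = float(np.mean((I.astype(np.float64)
                             - K.astype(np.float64)) ** 2))  # Eq. (10)
        mse = max(mse, 1e-12)  # guard against identical frames
        psnrs.append(20.0 * np.log10(255.0 / np.sqrt(mse)))  # Eq. (11)
    return float(np.mean(psnrs))

def f2_compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """f2 = CR, Eq. (13)."""
    return uncompressed_bytes / compressed_bytes
```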
For this reason, we will try to obtain a set of parameter values that achieves a set of solutions near to the optimal Pareto front.

5. Experiments

In this section we present all the tests performed to get the near-optimal Pareto solution set of the encoder, using the NSGA-II MOEA. We detail both the purposes of the tests and the parameters used in each execution, and discuss the results. For these experiments, we will use a posteriori methods, that is, output almost the entire Pareto front of the encoder, leaving it to the decision maker to choose the solution that meets his or her requirements a posteriori.

The decision variables for optimization are those stated in Section 4.2. Notice that T^SIZE and Smoothing are discrete variables. Therefore, we will perform additional tests to determine their best values. We will use the concept of Pareto dominance to compare the solutions achieved by the NSGA-II algorithm.

The NSGA-II algorithm will run a maximum of 200 generations with an initial population of 200. The population size has been set in order to ideally get an individual about every 0.15 dB, covering the 25 to 55 dB encoder operating range. This way, the DM's solution selection will be more precise in terms of quality. Each test will be performed twice, with the Hall Monitor (Fig. 10(a)) and Akiyo (Fig. 10(b)) test sequences.

5.1. Video test sequences

The test video sequences used to evaluate the encoder are very commonly used to evaluate encoder performance. The selected sequences, shown in Fig. 10(a) and (b), are from the Xiph.org repository. They are uncompressed CIF-format sequences with a length of 251 frames and a size of 352 × 288 pixels.

Fig. 10. Test video sequences.

5.2. Smoothing test

The first test performed is related to the MMS, and particularly to determining how the Smoothing parameter affects the f1 and f2 objective functions when optimizing the MEAN_i[x], MAX_i[x], and Bits per pixel decision variables. Smoothing is a low-level encoder parameter that can only be enabled or disabled. It should improve the performance of the encoder when enabled, but we have no hard evidence of this. It should be the first test run, since this parameter depends on the tile size parameter, but not vice versa. In this case, then, we can establish an arbitrary tile size, i.e. 32 × 32. The Bits per pixel range of values is set according to the tile size, as defined in the decision variables section and shown in Table 1.

Table 1. Encoder parameters, Test 1.

| Test 1    | Run 1      | Run 2      |
|-----------|------------|------------|
| T^SIZE    | 32 × 32    | 32 × 32    |
| Smoothing | Enabled    | Disabled   |
| MEAN_i[x] | [0, 255]   | [0, 255]   |
| MAX_i[x]  | [0, 255]   | [0, 255]   |
| bpp_i     | [0.4, 7.5] | [0.4, 7.5] |

This test will generate two Pareto fronts that differ only as to whether the encoder uses Smoothing prefiltering. So, we can use the concept of Pareto dominance to determine whether Smoothing is really useful.

Figs. 11 and 12 show the Pareto solution set obtained for each test video with the Smoothing parameter enabled and disabled. It is clear that the Pareto solution sets with the Smoothing parameter enabled dominate the others with Smoothing disabled. This means that the Smoothing parameter really does improve the MMS and the final compression performance when enabled. So, for the best results, Smoothing should be enabled during compression. In the following tests, Smoothing will always be enabled.

Fig. 11. Smoothing test [Hall Monitor].
Fig. 12. Smoothing test [Akiyo].
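For orientation, the continuous-variable optimization described in this section could be driven by an off-the-shelf NSGA-II implementation. A minimal sketch, assuming the pymoo library and a hypothetical encode_and_measure() wrapper that runs the MIJ2K encoder over a test sequence and returns (f1, f2); this is not the paper's original tooling:

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class MIJ2KProblem(ElementwiseProblem):
    """Decision variables: MEAN threshold, MAX threshold, bits per pixel
    (bounds for 32x32 tiles with Smoothing enabled)."""
    def __init__(self):
        super().__init__(n_var=3, n_obj=2,
                         xl=np.array([0.0, 0.0, 0.4]),
                         xu=np.array([255.0, 255.0, 7.5]))

    def _evaluate(self, x, out, *args, **kwargs):
        mean_thr, max_thr, bpp = x
        f1, f2 = encode_and_measure(mean_thr, max_thr, bpp)  # hypothetical
        out["F"] = [-f1, -f2]  # pymoo minimizes, so negate to maximize

res = minimize(MIJ2KProblem(), NSGA2(pop_size=200), ("n_gen", 200), seed=1)
pareto = -res.F  # (PSNR, CR) values of the non-dominated set
```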
5.3. Tile size test

In the second test, we will try to determine the influence of the tile size on compression, and whether there is a tile size that outperforms the others. A priori, lower tile sizes should perform better compression, improving the MMS, since smaller tile sizes usually fit any object in the image better. Other video codecs also use small macroblock sizes, e.g. H.264 (Wiegand, Sullivan, Bjøntegaard, & Luthra, 2003), which works with 16 × 16, 8 × 8 and even 4 × 4 blocks.

Note, however, that more data are necessary to represent smaller tile sizes in JPEG2000; observe how the lower bound of the Bits per pixel low-level encoder parameter grows as the tile size decreases. This indicates that there is an extra overhead for the inclusion of each tile.

In this test we want to compare the Pareto fronts achieved by the MEAN_i[x], MAX_i[x], and Bits per pixel decision variables with the different possible values of the Tile size parameter. We can determine the best value for this parameter by comparing the Pareto solution sets. Table 2 describes the parameters used for the three executions of this test, one for each tile size. Notice that, in this case, Smoothing is enabled in all runs as a consequence of the previous result. This test will provide three Pareto fronts, indicating the best tile size to be used with the encoder.

Table 2. Encoder parameters, Test 2.

| Test 2    | Run 1       | Run 2       | Run 3      |
|-----------|-------------|-------------|------------|
| T^SIZE    | 16 × 16     | 24 × 24     | 32 × 32    |
| Smoothing | Enabled     | Enabled     | Enabled    |
| MEAN_i[x] | [0, 255]    | [0, 255]    | [0, 255]   |
| MAX_i[x]  | [0, 255]    | [0, 255]    | [0, 255]   |
| bpp_i     | [1.2, 11.0] | [0.7, 8.73] | [0.4, 7.5] |

Figs. 13(a) and 14(a) show that all the Pareto fronts obtained are very similar for each tile size, but there is a slight performance improvement when using 32 × 32 tiles. Figs. 13(b) and 14(b) show a detailed area of the Pareto front, located at its inflection point, clearly indicating how the 32 × 32 Pareto front dominates the other Pareto solution sets. From the comparison of these Pareto fronts generated with NSGA-II, we can select the best value for the Tile size low-level encoder parameter, which will be set at 32 × 32. This also shows that the overhead present with small tile sizes is not offset by the theoretical improvement in the MMS.

Fig. 13. Tile sizes test [Hall Monitor].
Fig. 14. Tile sizes test [Akiyo].

5.4. Results of optimization

In the previous tests we determined the best values for the discrete encoder parameters (T^SIZE and Smoothing), and we now know that T^SIZE = 32 × 32 and Smoothing = Enabled improves the compression. The remaining parameters (MEAN_i[x], MAX_i[x] and bpp_i) are continuous, and their values are determined for each vector of solutions by the NSGA-II algorithm.

The DM, using the results of this optimization, can select any solution from the Pareto front that meets his or her requirements (in terms of quality or compression ratio). The Pareto front contains a set of values of MEAN_i[x], MAX_i[x] and bpp_i. The DM can also use the Pareto front to evaluate the trade-offs between all solutions.

In this section we report the Pareto solution sets for each video sequence. Figs. 15 and 17 represent the best Pareto solution sets for each compressed video. They illustrate how these solution sets cover a wide operating range in terms of quality and compression ratio.
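The front-versus-front comparisons used in Sections 5.2 and 5.3 can be made quantitative with the two-set coverage measure; a small sketch reusing the dominates() helper sketched in Section 3 (fronts as lists of (PSNR, CR) tuples); the coverage framing is our addition, not a metric used in the paper:

```python
def coverage(front_a, front_b):
    """Two-set coverage C(A, B): fraction of front B dominated by at
    least one point of A. C(A, B) = 1 with C(B, A) = 0 means A completely
    dominates B, as observed for the Smoothing-enabled fronts."""
    dominated = sum(1 for b in front_b
                    if any(dominates(a, b) for a in front_a))
    return dominated / len(front_b)
```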
We also want to show the convergence speed of NSGA-II for such problems. Fig. 16(a)-(d) shows the evolution of generations for the Hall Monitor test video, and how the solution approximated in the 20th generation is almost equal to the Pareto front solution set obtained with 200 generations, illustrated in Fig. 15. With the Akiyo test sequence, on the other hand, convergence is even faster, and, by the 5th generation (Fig. 18), we get a good Pareto solution set compared with the solution obtained in the 200th generation, shown in Fig. 17.

Fig. 15. Final Pareto solution set [Hall Monitor].
Fig. 16. NSGA-II convergence [Hall Monitor].
Fig. 17. Final Pareto solution set [Akiyo].
Fig. 18. NSGA-II convergence [Akiyo].

This demonstrates the suitability of MOEAs for such problems, which have a large search space and are not suited to approximation using classical search-based techniques. In actual fact, evaluating the f1 and f2 objective functions takes about two minutes for each individual in this problem.

6. Conclusions

In this paper we have worked with the multiobjective definition of video compression. We have used NSGA-II to optimize the internal parameters of a new video codec called MIJ2K, but this procedure could be extrapolated to any video compressor with good results. Using MOEAs, we can output an entire Pareto solution set covering the whole operating range of the codec for both quality and compression ratio. We also show the suitability of MOEAs in the field of video compression, since they achieve the fast convergence speed required by such problems due to the cost of evaluating each objective function.

Even so, we intend to research this problem further. In this paper, we have optimized the low-level encoder parameters in order to create a Pareto solution set for a specific test video. The solutions obtained can be used with different videos that share similar profiles to the videos used here, with the aim of reusing the optimization. In future work, we will research real-time optimization depending on given constraints, independently of the video being compressed. This is potentially very useful, for instance, when the video compressor is used for real-time streaming purposes and the network constraints vary depending on load. The algorithm should then search for new optimal solutions for the new given constraints.

Acknowledgments

This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM MADRINET S-0505/TIC/0255 and DPS2008-07029-C02-02.

References

Adams, M. (2001). The JPEG-2000 still image compression standard. ISO/IEC JTC 1/SC 29/WG 1, N 2412.
Adams, M., & Ward, R. (2001). Wavelet transforms in the JPEG-2000 standard. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (Vol. 1, pp. 160–163).
Bäck, T., Fogel, D. B., & Michalewicz, Z. (Eds.). (1997). Handbook of Evolutionary Computation. Bristol, UK: IOP Publishing Ltd.
Brofferio, S., & Rocca, F. (1977). Interframe redundancy reduction of video signals generated by translating objects. IEEE Transactions on Communications, 25(4), 448–455.
Christopoulos, C., Skodras, A., & Ebrahimi, T. (2000). The JPEG 2000 still image coding system: An overview. IEEE Transactions on Consumer Electronics, 46(4), 1103–1127.
Coello, C. A. C., Lamont, G. B., & Veldhuizen, D. A. V. (2006). Evolutionary Algorithms for Solving Multi-Objective Problems (Genetic and Evolutionary Computation). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Connor, D., Brainard, R., & Limb, J. (1972). Intraframe coding for picture transmission. Proceedings of the IEEE, 60(7), 779–791.
Czuni, L., Csaszar, G., & Licsar, A. (2006). Estimating the optimal quantization parameter in H.264. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition (pp. 330–333). Washington, DC, USA: IEEE Computer Society.
Deb, K. (2001). Multi-Objective Optimization Using Evolutionary Algorithms. Wiley-Interscience Series in Systems and Optimization. Chichester: John Wiley & Sons.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197.
Ehrgott, M., & Gandibleux, X. (2002). Multiple Criteria Optimization: State of the Art Annotated Bibliographic Surveys. Kluwer Academic Publishers.
Hang, H.-M., Chou, Y.-M., & Cheng, S.-C. (1997). Motion estimation for video coding standards. Journal of VLSI Signal Processing Systems, 17(2–3), 113–136.
ISO/IEC (2000). ISO/IEC 15444-1:2000, Information technology - JPEG 2000 image coding system - Part 1: Core coding system. Technical report.
Jiang, M. (2006). Adaptive rate control for advanced video coding. PhD thesis, Santa Clara, CA, USA. Adviser: Nam Ling.
Koski, J. (1988). Multicriteria truss optimization. In W. Stadler (Ed.), Multicriteria Optimization in Engineering and in the Sciences. New York: Plenum Press.
Chen, Y.-K., Vetro, A., Sun, H., & Kung, S.-Y. (1997). Optimizing intra/inter coding mode decisions. In Proceedings of the International Symposium on Multimedia Information Processing (pp. 561–568).
Lora, M. (1994–2008). Xiph.org: Test media. World Wide Web electronic publication.
Luis, A., & Patricio, M. Scalable streaming of JPEG 2000 live video using RTP over UDP.
Miettinen, K. (2001). Some methods for nonlinear multi-objective optimization. Lecture Notes in Computer Science, 1–20.
Ng, J. K.-Y., Leung, K. R., & Hui, C. K.-C. (2005). A QoS-enabled transmission scheme for MPEG video streaming. Real-Time Systems, 30(3), 217–256.
Pennebaker, W., & Mitchell, J. (1993). JPEG Still Image Data Compression Standard. Kluwer Academic Publishers.
Rabbani, M., & Joshi, R. (2002). An overview of the JPEG 2000 still image compression standard. Signal Processing: Image Communication, 17(1), 3–48.
Sawaragi, Y., Nakayama, H., & Tanino, T. (1985). Theory of Multiobjective Optimization. Orlando: Academic Press.
Shi, Y., & Sun, H. (2000). Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards. CRC Press.
Steuer, R. E. (1986). Multiple Criteria Optimization: Theory, Computation, and Application. New York: John Wiley & Sons.
Thurston, D. L. (2006). Multi-attribute utility analysis of conflicting preferences. In K. E. Lewis et al. (Eds.), Decision Making in Engineering Design. New York: ASME.
Tu, Y.-K., Yang, J.-F., Shen, Y.-N., & Sun, M.-T. (2003). Fast variable-size block motion estimation using merging procedure with an adaptive threshold. In ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo (pp. 789–792). Washington, DC, USA: IEEE Computer Society.
Wallace, G. (1991). The JPEG still picture compression standard. Communications of the ACM, 34(4), 30–44.
Wang, Y.-C., & Leou, J.-J. (2003). A rate control scheme for H.26L video transmission. In ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 3 (pp. 349–352). Washington, DC, USA: IEEE Computer Society.
Wiegand, T., Sullivan, G., Bjøntegaard, G., & Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560–576.
Zhang, Q., Zhu, W., & Zhang, Y.-Q. (2005). End-to-end QoS for video delivery over wireless Internet. Proceedings of the IEEE, 93(1), 123–134.