Expert system based parallel multi-1D block matching algorithm with implementation for motion estimation


Expert Systems with Applications 39 (2012) 3249–3256

Journal homepage: www.elsevier.com/locate/eswa

Shin-Yeu Lin a,⇑, Chong-Wei Su b, Jung-Shou Huang c
a Department of Electrical Engineering & Green Technology Research Center at Chang Gung University, Taoyuan, Taiwan, ROC
b Institute of Electrical and Control Engineering at National Chiao Tung University, Hsinchu, Taiwan, ROC
c Elan Electronics Corporation, Hsinchu, Taiwan, ROC

Article info
Keywords:
Expert system
Knowledge base
Inference engine
Block matching algorithm
Motion estimation
Mixed signal
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.09.012

This research work was supported in part by National Science Council in Taiwan under Grant NSC98-2221-E-182-065-MY2.
⇑ Corresponding author. Address: Department of Electrical Engineering & Green Technology Research Center, Chang Gung University, 259 Wen-Hwa 1st Road, Kwei-Shan, Tao-Yuan 333, Taiwan, ROC. Tel.: +886 3 2118800x3221; fax: +886 3 2118026.
E-mail addresses: shinylin@mail.cgu.edu.tw (S.-Y. Lin), cwsu.ece97g@nctu.edu.tw (C.-W. Su), rong@emc.com.tw (J.-S. Huang).
Abstract

In this paper, we propose an expert-system based parallel multi-1-dimensional block matching algorithm (ESPM-1D-BMA) for motion estimation (ME). Instead of the conventional 2D block matching, we employ parallel multi-1D block matching to improve the computing speed. To improve the ME accuracy, we design a knowledge base and an inference engine to determine the true motion vector (MV) from the results of the parallel multi-1D block matching. To increase the computing speed further, we present a hardware architecture for implementing the ESPM-1D-BMA. We demonstrate that the MV estimation accuracy achieved by the proposed ESPM-1D-BMA is much better than that of the compared fast block matching algorithms and is close to that of the 2-dimensional full search block matching algorithm (2D-FSBMA). We also demonstrate that the proposed ESPM-1D-BMA needs only about 60% of the computing time of the mixed-signal 2D-FSBMA (MS-2D-FSBMA).

© 2011 Elsevier Ltd. All rights reserved.
1. Introduction

The two-dimensional (2D) block matching algorithm (BMA) is a commonly adopted method for finding the motion vector (MV) between two image frames, namely the reference frame and the search frame: the MV is obtained when the best-matched pair of 2D blocks, the reference block and one search block, is found. Among the various BMAs, the 2D full-search BMA (2D-FSBMA) using the mean square error (MSE) criterion (Gharavi & Mills, 1990) is considered to be the most accurate algorithm for finding the MV. However, the 2D-FSBMA is computationally complex. Hence, some fast BMAs were proposed, such as the new three-step search (NTSS) (Li, Zeng, & Liou, 1994), the diamond search (DS) (Zhu & Ma, 1997) and the hexagonal based search (HEXBS) (Zhu, Lin, & Chau, 2002). These algorithms use fewer search points to reduce the computational complexity, but at the price of poorer accuracy. The purpose of this paper is therefore to propose a method that reduces the computational complexity of 2D-FSBMA while maintaining its accuracy in finding the MV.

Instead of 2D block matching, we slice a 2D block into multiple, say K, 1D blocks and employ parallel multi-1D block matching to reduce the computational complexity of searching for the MV. Due to the noise appearing in the search and reference frames, 1D block matching should be less accurate than 2D block matching in finding the MV. Additionally, the MVs determined by the K 1D block matchings may differ, because different 1D blocks are contaminated by different noise. Therefore, to remedy the possible inaccuracy in finding the MV using parallel multi-1D block matching, we propose an expert system based parallel multi-1D-BMA (ESPM-1D-BMA).

For the purpose of real-time motion estimation (ME), we need to improve the computing speed further by implementing the proposed algorithm in hardware. Therefore, we also present the hardware implementation architecture of the proposed algorithm.

The paper is organized as follows. Section 2 presents the proposed ESPM-1D-BMA. Section 3 presents the hardware implementation architecture of the proposed algorithm. Section 4 tests the performance of the proposed algorithm and compares it with existing methods in terms of ME accuracy and computing speed using comprehensive simulations. Finally, Section 5 draws a conclusion.
2. Expert system based parallel multi-1D block matching
algorithm (ESPM-1D-BMA)

2.1. Review of 2D-FSBMA

Let $I_n$ and $I_{n-1}$ in Fig. 1(a) denote the reference and search frames, respectively; let block A inside $I_n$ denote the reference block (RB), and let block B, which can be any block in $I_{n-1}$, denote the search block (SB). The idea of 2D-FSBMA is to search all possible SBs and find the one that is most similar to A. This search is performed by computing the MSE between blocks A and B as described below. We assume that the sizes of the RB A (or SB B) and the frame $I_n$ (or $I_{n-1}$) are $X \times Y$ and $M \times N$ pixels, respectively, as shown in Fig. 1(b). The MSE between the RB A and the SB B used in 2D-FSBMA, denoted by $MSE_{2D}$, is defined as

$$\mathrm{MSE}_{2D} = \frac{1}{X \times Y} \sum_{i=1}^{X} \sum_{j=1}^{Y} \left( r(i,j) - s(i,j) \right)^2 \qquad (1)$$

where r(i, j) and s(i, j) denote the (i, j)th pixel values of the RB A and SB B, respectively. The 2D-FSBMA therefore searches through all possible SBs to find the one with the smallest MSE, say B*. The MV is then defined as the difference between the position indices of RB A and SB B*, as shown in Fig. 1(a).
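The full search of Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of NumPy, the mapping of X and M to the first array axis, and the sign convention of the returned MV (position of B* minus position of A) are assumptions, not details given in the paper.

```python
import numpy as np

def mse_2d(ref_block, search_block):
    """Eq. (1): mean square error between two equally sized blocks."""
    diff = ref_block.astype(float) - search_block.astype(float)
    return float(np.mean(diff ** 2))

def full_search(ref_frame, search_frame, top, left, X, Y):
    """Compare RB A, whose first pixel is at (top, left) in the reference
    frame, with every X-by-Y SB of the search frame (assumed layout)."""
    M, N = search_frame.shape
    rb = ref_frame[top:top + X, left:left + Y]
    best_mse, best_pos = float("inf"), None
    for i in range(M - X + 1):
        for j in range(N - Y + 1):
            m = mse_2d(rb, search_frame[i:i + X, j:j + Y])
            if m < best_mse:  # keep the first SB with the smallest MSE
                best_mse, best_pos = m, (i, j)
    # Assumed sign convention: MV = position of B* minus position of A.
    mv = (best_pos[0] - top, best_pos[1] - left)
    return mv, best_mse
```

Every one of the (M - X + 1)(N - Y + 1) candidate positions costs X × Y squared differences, which is the computational burden the 1D slicing in the following sections aims to reduce.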

2.2. Motivation

To improve the computing speed of 2D-FSBMA, we slice the 2D block into multiple 1D blocks and apply a parallel multi-1D block matching algorithm. However, the MVs determined by the individual 1D block matchings may differ, because different 1D blocks are contaminated by different noise. Therefore, we use the expert system concept to help determine the true MV. In the following, we describe the proposed algorithm step by step.

2.3. The 1D block matching

Let B denote the number of pixels in a 1D block; a 1D block is then formed by B consecutive pixel values from a row of the frame. Let r(i) and s(i), i = 1, ..., B, denote the B pixel values of the 1D reference block and the 1D search block, respectively. Then the MSE between r(i) and s(i), i = 1, ..., B, can be computed as follows:

$$\mathrm{MSE}(\mathrm{SBC}) = \frac{1}{B} \sum_{i=1}^{B} \left( r(i) - s(i) \right)^2 \qquad (2)$$

where SBC represents the 1D search block coordinate, which is identified by the coordinate of the first pixel of the 1D search block, s(1).

Fig. 1. (a) Motion vector determination using 2D-FSBMA. (b) Example reference block and image frame.

Fig. 2. A diagram of the K independent 1D blocks in a reference frame.

2.4. Parallel multi-1D blocks matching

To improve the ME accuracy of 1D block matching while keeping its computational efficiency, we use parallel multi-1D block matching. The multi-1D block matching uses K 1D reference blocks, each of which is an independent 1D reference block in the reference frame, as shown in Fig. 2, in which we assume K = 8.

Let $r_k(i)$, i = 1, ..., B, denote the kth 1D reference block in $I_n$, and let $MSE_k(SBC)$ denote the MSE between $r_k(i)$, i = 1, ..., B, in $I_n$ and the 1D search block s(i), i = 1, ..., B, in $I_{n-1}$. Then $MSE_k(SBC)$ for k = 1, ..., K can be computed by

$$\mathrm{MSE}_k(\mathrm{SBC}) = \frac{1}{B} \sum_{i=1}^{B} \left( r_k(i) - s(i) \right)^2 \qquad (3)$$

which can be performed independently and in parallel for each k.
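As an illustration of Eq. (3), the following sketch evaluates $MSE_k(SBC)$ for all K reference blocks and all search block coordinates at once. It is a software analogue only: the function name, the array layout, and the use of NumPy's `sliding_window_view` (available in NumPy 1.20 and later) are assumptions and are not part of the paper, which targets a hardware implementation.

```python
import numpy as np

def multi_1d_mse(ref_blocks, search_frame):
    """Eq. (3) for all k and all SBCs.

    ref_blocks:   shape (K, B), the K 1D reference blocks r_k(i)
    search_frame: shape (M, N), the search frame I_{n-1}
    returns:      shape (K, M, N - B + 1); entry [k, y, x] is MSE_k(SBC)
                  for the 1D search block whose first pixel is at (y, x)
    """
    ref = np.asarray(ref_blocks, dtype=float)
    frame = np.asarray(search_frame, dtype=float)
    B = ref.shape[1]
    # windows[y, x, :] holds the B consecutive row pixels starting at (y, x)
    windows = np.lib.stride_tricks.sliding_window_view(frame, B, axis=1)
    diff = ref[:, None, None, :] - windows[None, :, :, :]
    return np.mean(diff ** 2, axis=-1)
```

The K slices of the result do not depend on one another, which mirrors the statement that the K matchings can run independently and in parallel.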

2.5. Expert system based parallel multi-1D blocks matching algorithm
(ESPM-1D-BMA)

As described earlier, the MVs found by the K 1D block matchings may differ because of the different noise contamination in the search and reference 1D blocks. However, we can view the MSEs computed by the 1D block matching for one reference block in $I_n$ as a result determined by an expert. The MSEs resulting from the K 1D block matchings can therefore be viewed as results determined by K experts. Subsequently, to determine the true MV, we employ the concept of an expert system (Liao, 2005; Li & Sun, 2009; Sun & Li, 2008) to construct the knowledge base (KB) and the inference engine (IE) for the parallel multi-1D block matching as follows.

The input of the expert system is the full set of MSEs produced by the K 1D block matchings. For each of the K 1D block matchings, the true MV should be among the MVs with the smallest MSEs. The KB of the proposed algorithm can therefore be stated as follows. For each of the K 1D block matchings, we select the P search blocks that correspond to the P smallest MSEs and assign them the marked numbers (MNs) P, P − 1, ..., 1, such that a selected search block with a smaller MSE is marked by a larger MN. For example, the MN assigned to the search block with the smallest MSE is P. Since each selected search block yields an MV, the parallel multi-1D block matching produces K × P MVs, and each MV is associated with the MN of the corresponding search block.

However, some of the K × P MVs may be the same. In general, frequently appearing MVs and MVs with smaller MSEs have a higher probability of being the true MV. If an MV appears q times among the K × P MVs, it is associated with q MNs. The IE of our expert system can therefore be stated as follows. For each distinct MV with q copies among the K × P MVs, we sum its q MNs and define the resulting value as the accumulated MN (AMN) of that MV. The MV with the largest AMN is then determined to be the true MV.
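The KB and IE rules can be sketched in Python as follows, using the K = 8, P = 3 entries of Table 1 as input. The function name `infer_true_mv` and the dictionary-based bookkeeping are illustrative assumptions; the paper realizes the IE in hardware (Appendix A). Ties between AMNs are resolved here in favor of the MV encountered first, which is also an assumption.

```python
from collections import defaultdict

def infer_true_mv(top_mvs):
    """top_mvs[k] lists the P MVs selected for reference block k, ordered
    from the smallest MSE to the P-th smallest MSE (the KB selection)."""
    amn = defaultdict(int)
    for mvs in top_mvs:
        P = len(mvs)
        for rank, mv in enumerate(mvs):
            amn[mv] += P - rank  # MN is P for the smallest MSE, down to 1
    best = max(amn, key=amn.get)  # IE: MV with the largest AMN
    return best, dict(amn)

# Rows of Table 1: MVs with the smallest, 2nd and 3rd smallest MSE.
table1 = [
    [(3, 2), (5, 8), (1, 5)],
    [(3, 2), (5, 5), (8, 9)],
    [(5, 8), (3, 2), (8, 8)],
    [(8, 8), (3, 2), (5, 8)],
    [(8, 9), (6, 3), (5, 5)],
    [(5, 8), (6, 3), (3, 2)],
    [(3, 2), (5, 5), (1, 5)],
    [(3, 2), (6, 3), (1, 5)],
]
true_mv, amn = infer_true_mv(table1)
print(true_mv, amn[true_mv])  # (3, 2) 17
```

For this input the MV (3, 2) collects four MNs of 3, two MNs of 2 and one MN of 1, giving the AMN of 17; the runner-up, (5, 8), reaches 9.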

For the sake of illustration, we use the following example to explain the proposed ESPM-1D-BMA. We assume K = 8 and P = 3. In Table 1, the first column shows the index of each of the K 1D reference blocks, and the second, third and fourth columns show the P (= 3) MVs with the P smallest MSEs resulting from the 1D block matching for each reference block. The MVs in columns 2, 3, and 4 are assigned the MNs 3, 2, and 1, respectively. Calculating the AMNs of the seven distinct MVs presented in Table 1, we find that (3, 2) has the largest AMN, 17, and is therefore considered to be the true MV.

Table 1
The top 3 MVs for each 1D reference block.

Index of reference block   MV with smallest MSE   MV with 2nd smallest MSE   MV with 3rd smallest MSE
1                          (3, 2)                 (5, 8)                     (1, 5)
2                          (3, 2)                 (5, 5)                     (8, 9)
3                          (5, 8)                 (3, 2)                     (8, 8)
4                          (8, 8)                 (3, 2)                     (5, 8)
5                          (8, 9)                 (6, 3)                     (5, 5)
6                          (5, 8)                 (6, 3)                     (3, 2)
7                          (3, 2)                 (5, 5)                     (1, 5)
8                          (3, 2)                 (6, 3)                     (1, 5)

3. Implementation for motion estimation

For the purpose of real-time ME, we can further increase the computing speed of the proposed algorithm by hardware implementation. However, a pure digital circuit implementation (Hsieh & Lin, 1992; Yang, Wolf, & Vijaykrishan, 2005) may suffer from implementation problems such as high power consumption and large chip size. To overcome these problems, a mixed-signal approach that uses a simple current-summation circuit to circumvent the computationally complex digital MSE computation should be a good choice (Panovic & Demosthenous, 2006). In the following, we present the implementation of the proposed algorithm using the mixed-signal approach step by step.

3.1. Implementing 1D-block matching using mixed-signal approach

First, we transform the sensed pixel values into voltages within the range [0 V, 2.5 V] by dividing the range of pixel values between black and white into 255 grey levels, which are represented by 255 least significant bits (LSBs), such that black and white correspond to 0 and 255 LSB, respectively. Let $V_x$ denote the transformed voltage of a pixel value of x LSB; then $V_x = \frac{2.5\,\mathrm{V} - 0\,\mathrm{V}}{255} \times x$. Let the voltages $V_{r(i)}$ and $V_{s(i)}$ denote the transformed voltages of r(i) and s(i), respectively. Assuming B = 8, the mixed-signal hardware implementation architecture for the 1D block matching is presented in Fig. 3, in which we preload the transformed voltages $V_{r(i)}$ of the pixel values of the 1D reference block r(i), i = 1, ..., B, in $I_n$ into the reference block memory (RBM). The transformed voltages $V_{s(i)}$ of the pixel values of the search frame $I_{n-1}$ are then serially fed into the search block memory (SBM) from left to right and from top to bottom, and the transformed voltages of the pixel values of the 1D search block are fetched from the SBM as shown in Fig. 3. The two clock signals $t_c$ and $t_r$ are used to control the timing of the propagation of the pixel values fed into the SBM and RBM. The SE(i), denoted by a circle in Fig. 3, represents the ith square error computing circuit, which produces an output current $k_t (V_{r(i)} - V_{s(i)})^2$, where $k_t$ is the transconductance parameter. The output currents of SE(i), denoted by I(i) in Fig. 3, i = 1, ..., 8, are summed and flow into a single resistive load R as shown in Fig. 3. The resulting voltage across R, denoted by $V_O$, is proportional to the MSE of the 1D block matching and is input to a P winner comparator (PWC), which will be presented later. Clearly, the computation of the MSE for all possible 1D search blocks is completed when the last pixel value in the last row of the frame $I_{n-1}$ has been read out.

Fig. 3. The hardware implementation architecture of the 1D block matching using mixed-signal approach.

Fig. 4. The computing architecture of parallel multi-1D block matching.
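The proportionality between the voltage across R and the MSE of Eq. (2) can be made explicit with a short derivation. This is a sketch under idealized assumptions that the text does not state: all SE circuits share the same transconductance parameter $k_t$, the current summation is ideal, and the pixel-to-voltage transform is exactly linear.

```latex
\begin{aligned}
V_O &= R \sum_{i=1}^{B} I(i)
     = R\,k_t \sum_{i=1}^{B} \left( V_{r(i)} - V_{s(i)} \right)^2 \\
    &= R\,k_t \left( \frac{2.5\,\mathrm{V}}{255} \right)^{2}
       \sum_{i=1}^{B} \left( r(i) - s(i) \right)^2 \\
    &= R\,k_t\,B \left( \frac{2.5\,\mathrm{V}}{255} \right)^{2}
       \mathrm{MSE}(\mathrm{SBC})
\end{aligned}
```

Because the factor multiplying MSE(SBC) is the same for every search block, comparing the output voltages ranks the search blocks in the same order as comparing their MSEs.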
3.2. Implementing parallel multi-1D blocks matching

The parallel computing architecture for the K 1D block matchings is presented in Fig. 4. The K 1D reference blocks are preloaded into $RBM_k$, k = 1, ..., K, and the search blocks in the frame $I_{n-1}$ are serially fed into the SBM. The units $MSE_k$, k = 1, ..., K, operate in parallel to obtain $MSE_k(SBC)$, k = 1, ..., K. The detailed structures of SBM, $RBM_k$, and $MSE_k$ in Fig. 4 are the same as those presented in Fig. 3.
3.3. Implementing ESPM-1D-BMA

To implement the ESPM-1D-BMA, we need a PWC to select the P search blocks that correspond to the P smallest MSEs resulting from the 1D block matching for each 1D reference block. The circuit of the PWC for the kth 1D reference block, denoted by $PWC_k$, is presented in Fig. 5 and is designed based on a sorting logic; the solid lines and dotted lines in this figure represent the transmission of data and signals, respectively.

Fig. 5. The circuit of $PWC_k$ and illustrations.

At the very beginning, the P (= 3) sample-and-hold circuits (SHs) are reset to a default value, which is the largest value that an SH can take; similarly, the corresponding P digital memories (DMs) are reset to null values. Then, each incoming $MSE_k(SBC)$ is compared with the MSE values stored in the P SHs by the P comparators, denoted by $COMP_i$, i = 1, ..., P, shown in Fig. 5. $COMP_i$ generates an enable signal $en_i$ whose value depends on the comparison result: if $MSE_k(SBC) < MSE_{SH_i}$, then $en_i = 1$; otherwise $en_i = 0$. The combination of $en_1, ..., en_P$ indicates in which of the following ranges $MSE_k(SBC)$ lies: $[0, MSE_{SH_1}]$, $(MSE_{SH_1}, MSE_{SH_2}]$, ..., $(MSE_{SH_{P-1}}, MSE_{SH_P}]$, or $(MSE_{SH_P}, \infty)$, as presented in the illustrative table in Fig. 5, in which we set P = 3. From the range in which the incoming $MSE_k(SBC)$ lies, we can easily update the P winners as follows. If $en_i = 1$, then $MSE_{SH_i}$, the content of $SH_i$, should be replaced by either $MSE_{SH_{i-1}}$ for the case $en_{i-1} = 1$, or $MSE_k(SBC)$ for the case that $en_{i-1} = 0$ or i = 1, and the content of the corresponding $DM_i$ is replaced by the proper SBC accordingly. To implement this P-winner updating logic, we design a selection signal, $sel_i$, that determines what replaces the contents of $SH_i$ and $DM_i$ when the enable signal $en_i = 1$, in the following manner. If $en_i = 1$ and $sel_i = 1$, the contents of $SH_i$ and $DM_i$ are replaced by those of $SH_{i-1}$ and $DM_{i-1}$, respectively. If $en_i = 1$ and $sel_i = 0$, the contents of $SH_i$ and $DM_i$ are replaced by $MSE_k(SBC)$ and the corresponding SBC, respectively. If $en_i = 0$, $SH_i$ and $DM_i$ remain unchanged. The values of $sel_i$, i = 2, ..., P, which are determined from the values of $en_i$, i = 1, ..., P, are presented in the illustrative table of Fig. 5, and they can be generated using AND gates as shown in Fig. 5. Notably, $sel_1$ is not needed, because if $en_1 = 1$, $MSE_{SH_1}$ and $DM_1$ must be replaced by $MSE_k(SBC)$ and the corresponding SBC. For the kth 1D reference block with reference block coordinate $RBC_k$, the above comparison and replacement process continues until the last pixel value in the last row of the search frame is read out. The final contents of $SH_i$ and $DM_i$, i = 1, ..., P, are the MSEs and the SBCs of the P search blocks with the P smallest MSEs, respectively. Consequently, the P SBCs stored in $DM_i$, i = 1, ..., P, and the kth reference block coordinate ($RBC_k$) are input to the subtracter, SUB, as shown in Fig. 5, to calculate the top P $MV_k$'s corresponding to the P smallest MSEs. This constitutes the operations of $PWC_k$.
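The update logic of the P winner comparator can be simulated in software as follows. This is a behavioral sketch, not the circuit itself. The function names are hypothetical, and the selection rule used here, sel_i = en_i AND en_(i-1), is inferred from the description above as an assumption, since the illustrative table of Fig. 5 is not reproduced in the text.

```python
def pwc_update(sh, dm, mse, sbc):
    """One comparison-and-replacement step of a P winner comparator.

    sh: stored MSEs of SH_1..SH_P (ascending); dm: the matching SBCs.
    Both lists are updated in place.
    """
    P = len(sh)
    en = [mse < sh[i] for i in range(P)]  # enable signals en_i
    # Assumed AND logic: shift from the previous stage when both are enabled.
    sel = [False] + [en[i] and en[i - 1] for i in range(1, P)]
    old_sh, old_dm = list(sh), list(dm)  # all stages update simultaneously
    for i in range(P):
        if not en[i]:
            continue  # en_i = 0: SH_i and DM_i remain unchanged
        if sel[i]:
            sh[i], dm[i] = old_sh[i - 1], old_dm[i - 1]
        else:
            sh[i], dm[i] = mse, sbc

def top_p(stream, P=3):
    """Feed (SBC, MSE) pairs through the comparator and return the winners."""
    sh = [float("inf")] * P  # default: the largest value an SH can take
    dm = [None] * P          # DMs reset to null values
    for sbc, mse in stream:
        pwc_update(sh, dm, mse, sbc)
    return sh, dm
```

For example, feeding the MSE sequence 5, 2, 9, 1, 3 with P = 3 leaves the stored values 1, 2, 3 in SH_1 to SH_3, together with the coordinates of the corresponding search blocks in the DMs.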

With the aid of $PWC_k$, k = 1, ..., K, we can design the IE using the following components. In addition to the indicators for the index and the number of distinct MVs among the K × P MVs, we employ a counter to cyclically generate the MN for each MV, an identifier to recognize the distinct MVs, an adder to calculate the AMN for each distinct MV, and a comparator to identify the MV with the largest AMN. Since the circuit that interconnects these components to implement the IE and generate the true MV is complicated, we present it in Appendix A.

3.4. Hardware implementation architecture of ESPM-1D-BMA for ME

Now, combining the parallel multi-1D block matching computing architecture (Figs. 3 and 4), $PWC_k$ (Fig. 5), k = 1, ..., K, and the IE (Fig. 8 in Appendix A), the hardware implementation architecture of the proposed ESPM-1D-BMA for ME is shown in Fig. 6. The solid lines and dotted lines in Fig. 6 represent the transmission of data and signals, respectively.

Fig. 6. The block diagram of ESPM-1D-BMA for ME.

For simplicity of illustration, we assume K = 3 in Fig. 6, where SBM, $RBM_k$ and $MSE_k$, k = 1, ..., K, are the same as those in Fig. 4. The image sensor (IS) shown in Fig. 6 is used to obtain the image data of the 1D search block and the K 1D reference blocks. The purpose of the digital controller (DC) is to control the activation timing of each unit and the synchronization of the parallel processing architecture. The operations of the hardware implementation architecture presented in Fig. 6 can be described as follows. First, the image data of the K 1D reference blocks are preloaded into $RBM_k$, k = 1, ..., K. The DC sends a control signal to the IS to obtain the image data of $I_{n-1}$, which are input to the SBM. Since $MSE_k$, k = 1, ..., K, are analog circuits, they compute $MSE_k(SBC)$, k = 1, ..., K, directly and in parallel once the data are ready at the outputs of both SBM and $RBM_k$, k = 1, ..., K. Each computed $MSE_k(SBC)$ is input to $PWC_k$, which is controlled by the signal output from the DC, to determine whether it is among the P smallest $MSE_k$ values for the kth 1D reference block. This process repeats until the last pixel value in the last row of the search frame is read out. Then the DC sends a signal to $PWC_k$ to output the top P $MV_k$'s for k = 1, ..., K in parallel. These K × P MVs are input to the IE to generate the true MV.

3.5. Time complexity

Let T(·) denote the computation time or propagation time of unit (·). In addition to T(·), other time delays need to be considered, such as (i) the delay introduced by the DC to guarantee a safety margin when controlling the K parallel units of the same type and (ii) the wire delay between units. However, compared with T(·), these time delays are negligible. Therefore, the estimated time needed by the ESPM-1D-BMA to generate an MV is

$$(M \times N)(T_{SH} + T_{MSE} + T_{PWC}) + T_{IE} \qquad (4)$$

where $T_{SH}$ denotes the time for loading a pixel value into RBM or SBM, $T_{MSE}$ denotes the time required to compute the MSE of the 1D block matching, which is the propagation time of the circuit presented in Fig. 3, $T_{PWC}$ denotes the propagation time of the $PWC_k$ presented in Fig. 5, and $T_{IE}$ denotes the processing time of the IE presented in Fig. 8.

Table 2
The average ME accuracies (%) of the 5000 estimated MVs.

K          P   Football   Greens   Concord   Fabric   Hestain   Pears    Tissue   Board
2          1   91.70      98.84    89.98     96.62    98.60     81.16    98.04    98.34
2          2   95.11      98.96    92.12     98.33    99.14     86.20    97.82    98.76
2          3   96.04      99.00    93.18     98.89    99.30     90.16    97.92    99.00
2          4   96.51      99.36    94.18     99.08    99.20     91.84    97.82    98.70
2          5   97.36      99.04    94.12     99.17    98.92     91.30    98.04    98.72
4          1   98.27      99.60    94.32     99.59    99.54     93.74    98.66    98.72
4          2   98.83      99.36    95.18     99.53    99.16     95.96    98.30    99.98
4          3   99.08      99.44    95.40     99.59    99.04     97.24    98.28    99.70
4          4   99.39      99.42    96.20     99.79    99.20     97.60    98.32    99.66
4          5   99.56      99.48    96.10     99.70    99.12     97.80    98.12    99.66
6          1   99.30      99.68    96.12     99.92    99.44     96.96    98.60    100
6          2   99.55      99.52    96.80     99.65    99.18     97.54    98.54    99.44
6          3   99.58      99.32    97.10     99.68    99.22     98.24    98.44    99.62
6          4   99.50      99.60    97.52     99.63    99.32     98.30    98.54    99.60
6          5   99.57      99.68    97.96     99.62    99.24     98.32    98.52    99.72
8          1   99.63      99.82    96.98     99.94    99.48     97.70    98.88    99.98
8          2   99.65      99.58    97.12     99.72    98.98     98.70    98.78    99.64
8          3   99.58      99.54    97.20     99.57    98.90     99.28    98.52    99.70
8          4   99.60      99.60    97.92     99.63    99.36     99.14    98.78    99.74
8          5   99.69      99.46    98.28     99.54    99.24     98.98    98.58    99.70
2D-FSBMA       100        99.998   99.947    100      99.625    99.923   99.285   100

Fig. 7. The pseudo-code of IE.
4. Test results and comparisons

In this section, we demonstrate the ME accuracy achieved by the proposed ESPM-1D-BMA in comparison with the 2D-FSBMA and some fast block matching algorithms (Li et al., 1994; Zhu et al., 2002; Zhu & Ma, 1997) using extensive simulations. We also compare the computational efficiency of the proposed ESPM-1D-BMA with that of the 2D-FSBMA. We use Matlab as our simulation tool and eight of its pictures, football, greens, concord, fabric, hestain, pears, tissue and board, as our test bed. The eight test pictures were all formatted as eight-bit grey-level images. We set M = 24 and N = 24 for all tests, X = 8 and Y = 8 for 2D-FSBMA, and B = 8 for ESPM-1D-BMA, and we use various combinations of K and P to test the performance of ESPM-1D-BMA. For each test picture, we arbitrarily pick an image frame of M × N pixels to serve as the reference frame $I_n$. For each reference frame $I_n$, we prepare the search frame $I_{n-1}$ by the following procedure. We randomly generate an MV ranging from $-\frac{M-X}{2}$ to $\frac{M-X}{2}$ in the x direction and from $-\frac{N-Y}{2}$ to $\frac{N-Y}{2}$ in the y direction, and apply it to $I_n$ to form a noise-free $I_{n-1}$. For each pixel in the noise-free $I_{n-1}$, we randomly generate a noise signal from a normal distribution with mean 0 LSB and variance 3 LSBs and add it to the pixel. The resulting frame serves as the test search frame $I_{n-1}$. For each of the eight test pictures, we prepare 5000 pairs ($I_{n-1}$, $I_n$) by the above process.
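The preparation of one test pair can be sketched as follows. The paper does not say how the MV is applied to $I_n$; this sketch assumes that the noise-free search frame is a shifted crop of the same picture, maps the x direction to the first array axis, and leaves the noisy values unclipped. The function name `make_pair` and its parameters are hypothetical, and the authors used Matlab rather than Python.

```python
import numpy as np

def make_pair(picture, M=24, N=24, X=8, Y=8, noise_var=3.0, rng=None):
    """Return (reference frame, noisy search frame, true MV)."""
    rng = np.random.default_rng() if rng is None else rng
    max_dx, max_dy = (M - X) // 2, (N - Y) // 2
    dx = int(rng.integers(-max_dx, max_dx + 1))  # upper bound is exclusive
    dy = int(rng.integers(-max_dy, max_dy + 1))
    H, W = picture.shape
    # Keep the reference frame far enough from the border for any shift.
    top = int(rng.integers(max_dx, H - M - max_dx + 1))
    left = int(rng.integers(max_dy, W - N - max_dy + 1))
    ref = picture[top:top + M, left:left + N].astype(float)
    clean = picture[top + dx:top + dx + M,
                    left + dy:left + dy + N].astype(float)
    # Zero-mean Gaussian noise with the stated variance (in LSB units).
    noise = rng.normal(0.0, np.sqrt(noise_var), size=clean.shape)
    return ref, clean + noise, (dx, dy)
```

Calling `make_pair` 5000 times per picture would give a data set of the size described above, under these assumptions.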

4.1. Comparisons of ME accuracy

Now, for each of the eight test pictures, we use the 5000 prepared pairs ($I_{n-1}$, $I_n$) to estimate the corresponding randomly generated MVs using the proposed ESPM-1D-BMA and the 2D-FSBMA. The average accuracies of the 5000 estimated MVs for each picture and for various combinations of K and P are presented in Table 2. From Table 2, we can observe that larger values of K and P in the proposed ESPM-1D-BMA generally give more accurate MV estimation. We can also observe that when K ≥ 8 and P ≥ 4, the proposed algorithm is at most 1% less accurate than the 2D-FSBMA on average.

For reference, the test results of ESPM-1D-BMA for K = 8 and P = 3 from Table 2 are repeated in the first data row of Table 3. We also use the same 5000 prepared pairs ($I_{n-1}$, $I_n$) to test the three fast block matching algorithms, the DS, the NTSS and the HEXBS. The average ME accuracies obtained by these three methods are presented in the last three rows of Table 3, which show that the proposed ESPM-1D-BMA is far more accurate than the three fast block matching algorithms.

4.2. Comparison of computational efficiency

Now, to evaluate the computing times of ESPM-1D-BMA and the mixed-signal 2D-FSBMA (MS-2D-FSBMA), we first need to review the computational complexity of the latter. Similar to the 1 × B current-summation circuit employed in the 1D block matching presented in Fig. 3, the MS-2D-FSBMA employs an X × Y current-summation circuit to obtain the 2D-MSE (Panovic & Demosthenous, 2006). Instead of $PWC_k$, k = 1, ..., K, the MS-2D-FSBMA needs only a single winner comparator (SWC), which consists of a comparator (COMP), an SH, and a DM, to identify the coordinate of the SB with the minimum MSE. For a frame of size M × N and a block of size X × Y, the computing time of the MS-2D-FSBMA for computing an MV is

$$2 \times X \times Y \times T_{SH} + (M - X)(N - Y + 1)(Y \times T_{SH} + T_{MSE_{2D}} + T_{COMP}) + (N - Y)(X \times T_{SH} + T_{MSE_{2D}} + T_{COMP}) \qquad (5)$$

where $T_{SH}$ is the same as that in (4); $T_{MSE_{2D}}$ denotes the time for performing a 2D-MSE computation; and, since the most time-consuming component in the SWC is the COMP, the time needed for a comparison in the SWC is $T_{COMP}$.

Table 3
Comparisons of ESPM-1D-BMA with the three fast block matching algorithms (average ME accuracy, %).

                             Football   Greens   Concord   Fabric   Hestain   Pears   Tissue   Board
ESPM-1D-BMA (K = 8, P = 3)   99.58      99.54    97.20     99.57    98.90     99.28   98.52    99.70
DS                           45.93      69.04    81.79     82.88    89.04     75.54   79.24    38.73
NTSS                         48.69      81.13    86.87     86.22    87.89     75.01   87.69    43.99
HEXBS                        38.15      63.02    73.89     67.10    74.53     62.13   74.69    28.19

Fig. 8. Hardware architecture for implementing the IE.

Based on the existing circuits for SH (Chen, Gu, Shen, Wu, & Hsu, 1998), $MSE_{2D}$ (Panovic & Demosthenous, 2004) and COMP (Razavi & Wooley, 1992), we obtain the following computing times using HSPICE simulation: $T_{SH}$ = 50 ns, $T_{MSE_{2D}}$ = 10 ns, and $T_{COMP}$ = 100 ns. Although the analog circuit for computing the MSE presented in Fig. 3 is simpler than that for computing $MSE_{2D}$ in MS-2D-FSBMA, the operations of all SEs in both circuits are carried out in parallel. Therefore, $T_{MSE} \cong T_{MSE_{2D}}$ = 10 ns. Similarly, the operations of $PWC_k$ (Fig. 5), k = 1, ..., K, are also carried out in parallel, and the most time-consuming component in $PWC_k$ is the comparator; therefore, $T_{PWC} \cong T_{SWC} = T_{COMP}$ = 100 ns. According to the hardware implementation architecture of the IE presented in Appendix A, the processing time of the IE is $T_{IE} = K \times P \times T_{clock}$, where $T_{clock}$ = 13.1 ns is the critical path delay of the IE. Therefore, for K = 8 and P = 3, we have $T_{IE}$ = 314.4 ns. Consequently, based on the parameters employed in our tests, the computing times needed for an MV estimation by ESPM-1D-BMA with K = 8, P = 3 and by MS-2D-FSBMA are 92,474.4 ns and 153,280 ns, respectively. This demonstrates that the proposed ESPM-1D-BMA uses only about 60% of the computing time of the MS-2D-FSBMA while achieving almost the same ME accuracy. Notably, the key to the computing efficiency achieved by the ESPM-1D-BMA is the small amount of time spent on loading the pixel values of the frame $I_{n-1}$ into the SBM. The loading-time parts of (4) and (5) for ESPM-1D-BMA and MS-2D-FSBMA are $M \times N \times T_{SH}$ and $2 \times X \times Y \times T_{SH} + (M - X)(N - Y + 1)(Y \times T_{SH}) + (N - Y)(X \times T_{SH})$, respectively, which constitute 31.14% and 79.33% of the corresponding total computing times.
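The figures quoted above can be reproduced by evaluating Eqs. (4) and (5) directly. The short sketch below does so with the stated parameters; the function names are hypothetical, and all times are in nanoseconds.

```python
def t_espm(M, N, K, P, t_sh, t_mse, t_pwc, t_clock):
    """Eq. (4), with T_IE = K * P * T_clock."""
    return M * N * (t_sh + t_mse + t_pwc) + K * P * t_clock

def t_ms2d(M, N, X, Y, t_sh, t_mse2d, t_comp):
    """Eq. (5)."""
    return (2 * X * Y * t_sh
            + (M - X) * (N - Y + 1) * (Y * t_sh + t_mse2d + t_comp)
            + (N - Y) * (X * t_sh + t_mse2d + t_comp))

espm = t_espm(24, 24, 8, 3, 50, 10, 100, 13.1)
ms2d = t_ms2d(24, 24, 8, 8, 50, 10, 100)
print(round(espm, 1), ms2d, round(espm / ms2d, 3))  # 92474.4 153280 0.603

# Share of the total time spent on loading pixel values.
load_espm = 24 * 24 * 50
load_ms2d = 2 * 8 * 8 * 50 + (24 - 8) * (24 - 8 + 1) * (8 * 50) + (24 - 8) * (8 * 50)
print(round(100 * load_espm / espm, 2), round(100 * load_ms2d / ms2d, 2))  # 31.14 79.33
```

The ratio of 0.603 corresponds to the roughly 60% figure stated above, and the loading shares match the quoted 31.14% and 79.33%.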
5. Conclusion

In this paper, we have proposed a hardware-implementable ESPM-1D-BMA and demonstrated that its ME accuracy is close to that of the 2D-FSBMA and better than that of the three compared fast block matching algorithms. Above all, the ESPM-1D-BMA needs only about 60% of the computing time of the MS-2D-FSBMA.
Appendix A

To implement the IE, we first describe the pseudo code for carrying out the IE and then present the hardware architecture for implementing the pseudo code. Let $mv_{k,p}$ and $MN_{k,p}$, p = 1, ..., P, denote the P MVs that correspond to the P smallest MSEs of the kth 1D block matching and the corresponding MNs, respectively; let D denote the number of distinct MVs among the K × P $mv_{k,p}$'s, and let $Dmv_d$, d = 1, ..., D, denote the D distinct MVs; let $AMN_d$ denote the AMN of $Dmv_d$, and let wMV and wAMN denote the best-so-far MV and the corresponding AMN during the process, respectively. Based on this notation, the pseudo code for carrying out the IE is presented in Fig. 7.

The operations of the pseudo code can be summarized as follows. In the initialization step, we reset all the variables. In step 1, we check whether the incoming MV, $mv_{k,p}$, matches any of the stored distinct MVs, $Dmv_d$. If it does, we identify the matched distinct MV and output a flag new = 0; otherwise, we register the incoming MV as a new distinct MV and set new = 1. In step 2, we compute the AMN of the identified distinct MV. In step 3, we check whether AMN_new of the distinct MV identified previously is greater than the wAMN of wMV and output a flag win = 1 if it is; otherwise we set win = 0. In step 4, we update the distinct MV and the associated AMN, and update wMV and wAMN if win = 1; furthermore, if the flag new from step 1 is 1, we set D = D + 1, that is, we increase the number of distinct MVs by 1.
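The four steps summarized above can be written out as a short streaming routine. Since Fig. 7 is not reproduced in the text, this sketch follows the prose description only; the function name, the list-based storage, and the initial values (no stored MV and wAMN = 0) are assumptions.

```python
def inference_engine(mv, mn):
    """mv[k][p] and mn[k][p] are the incoming MVs and their marked numbers.

    Returns (wMV, wAMN, D): the winning MV, its AMN, and the number of
    distinct MVs seen.
    """
    dmv, amn = [], []      # distinct MVs Dmv_d and their AMNs
    w_mv, w_amn = None, 0  # initialization: reset all variables
    for k in range(len(mv)):
        for p in range(len(mv[k])):
            smv = mv[k][p]
            # Step 1: match the incoming MV against the stored distinct MVs.
            if smv in dmv:
                d, new = dmv.index(smv), 0
            else:
                d, new = len(dmv), 1
            # Step 2: accumulate the marked number.
            amn_old = 0 if new else amn[d]
            amn_new = amn_old + mn[k][p]
            # Step 3: compare with the best-so-far AMN.
            win = 1 if amn_new > w_amn else 0
            # Step 4: update the registers; a new MV increases D by 1.
            if new:
                dmv.append(smv)
                amn.append(amn_new)
            else:
                amn[d] = amn_new
            if win:
                w_mv, w_amn = smv, amn_new
    return w_mv, w_amn, len(dmv)
```

Because every AMN only grows as MVs arrive, the best-so-far pair (wMV, wAMN) at the end of the stream is the MV with the largest final AMN. Applied to the Table 1 data with MNs 3, 2, 1 in each row, the routine yields the MV (3, 2) with an AMN of 17 and D = 7.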

The hardware implementation architecture of the pseudo code is presented in Fig. 8. In the leftmost part of this architecture, we use a counter, counter_kp, marked by (ii) in Fig. 8, to generate the index p = 1, ..., P for each k = 1, ..., K sequentially, and use a KP-to-1 multiplexer, denoted by MUX and marked by (i) in Fig. 8, to select $mv_{k,p}$ based on the generated index; this corresponds to the statement smv = $mv_{k,p}$ in step 1 of Fig. 7. The smv is input to the MV identifier, marked by (iii) in Fig. 8, which compares the smv with the distinct MVs stored in the registers denoted by Distinct MV and AMN. The counter counter_D, marked by (iv) in Fig. 8, generates the index D, such that if smv does not match any existing distinct MV, this smv becomes the Dth distinct MV, and the MV identifier sets the enable signal $en_D$ = 1 to activate the Distinct MV and AMN registers, marked by (v) in Fig. 8, to store the current smv and the corresponding AMN. In the meantime, the MV identifier also sets new = 1 to increase counter_D by 1. We then use an adder, marked by (vi) in Fig. 8, to update the AMN register for the current smv, which may be one of the existing distinct MVs or a new distinct MV, as follows: the adder adds the $MN_{k,p}$ corresponding to the current smv to AMN_old and outputs AMN_new, which is fed back to the Distinct MV and AMN registers. Notably, the $MN_{k,p}$ is generated by counter_MN, marked by (vii) in Fig. 8. Furthermore, AMN_new is compared with wAMN by the comparator, >, marked by (viii) in Fig. 8. If AMN_new > wAMN, we set win = 1, which updates wAMN with AMN_new and wMV with smv, as shown by the blocks marked by (x) and (ix), respectively. This completes the implementation of the pseudo code presented in Fig. 7.

We design the clock period to be long enough that the signal
mvk,p can travel through the critical path, which includes the
propagation through counter_kp, the KP-to-1 MUX, and the MV identifier,
the access of the Distinct MV and AMN registers, the processing time of
the adder and comparator, and the update of wMV and wAMN. Based
on Taiwan Semiconductor Manufacturing Company (TSMC)
0.18 μm CMOS technology, the critical-path delay of the IE presented
in Fig. 8 is within 13.1 ns. We can therefore set
the clock period of the IE to Tclock = 13.1 ns. The time required
for processing the IE is then TIE = K × P × Tclock. Therefore, for K = 8
and P = 3, the IE can obtain the true MV within 314.4 ns.
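
The timing figure above can be checked with a few lines of Python. The function name ie_processing_time_ns is a hypothetical helper introduced only for this check; it assumes, as the text states, that the IE handles one candidate MV per clock cycle.

```python
def ie_processing_time_ns(K, P, t_clock_ns):
    """Return TIE = K * P * Tclock in nanoseconds.

    Assumes one of the K * P candidate MVs is processed per clock cycle.
    """
    return K * P * t_clock_ns


# Values quoted in the text: K = 8, P = 3, Tclock = 13.1 ns.
t_ie = ie_processing_time_ns(K=8, P=3, t_clock_ns=13.1)
print(f"TIE = {t_ie:.1f} ns")   # TIE = 314.4 ns
```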

References

Chen, M. J., Gu, Y. B., Shen, W. C., Wu, T., & Hsu, P. C. (1998). A compact high-speed
Miller-capacitance based sample-and-hold circuit. IEEE Transactions on Circuits
and Systems I: Fundamental Theory and Applications, 45, 198–201.

Gharavi, H., & Mills, M. (1990). Block matching motion estimation algorithms new
results. IEEE Transactions on Circuits and Systems, 37, 649–651.
Hsieh, C. H., & Lin, T. P. (1992). VLSI architecture for block-matching motion
estimation algorithm. IEEE Transactions on Circuits Systems Video Technology,
2(2).

Liao, S. H. (2005). Expert systems methodologies and applications – a decade review
from 1995 to 2004. Expert Systems with Applications, 28(1), 93–103.

Li, H., & Sun, J. (2009). Majority voting combination of multiple case-based
reasoning for financial distress prediction. Expert Systems with Applications,
36(3), 4363–4373.

Li, R., Zeng, B., & Liou, M. L. (1994). A new three-step search algorithm for block
motion estimation. IEEE Transactions on Circuits Systems for Video Technology, 4,
438–442.

Panovic, M., & Demosthenous, A. (2004). A compact block matching cell for
analogue motion estimation processors. Proceedings of 2004 IEEE International
Symposium on Circuits and Systems (ISCAS’04) (Vol. 2, pp. 229–232). Canada:
Vancouver.

Panovic, M., & Demosthenous, A. (2006). Motion estimation processor using mixed-
signal approach. IEEE Transactions on Circuits and Systems II, 53(6),
492–496.

Razavi, B., & Wooley, B. A. (1992). Design techniques for high-speed, high-resolution
comparators. IEEE Journal of Solid-State Circuits, 27(12), 1916–1926.

Sun, J., & Li, H. (2008). Listed companies’ financial distress prediction based on
weighted majority voting combination of multiple classifiers. Expert Systems
with Applications, 35(3), 818–827.

Yang, S., Wolf, W., & Vijaykrishan, N. (2005). Power and performance analysis of
motion estimation based on hardware and software realizations. IEEE
Transactions on Computers, 54, 714–726.

Zhu, S. & Ma, K.-K. (1997). A new diamond search algorithm for fast block matching
motion estimation. In IEEE international conference on communications and signal
processing (pp. 292–296).

Zhu, C., Lin, X., & Chau, L. P. (2002). Hexagon-based search pattern for fast block
motion estimation. IEEE Transactions on Circuits Systems for Video Technology, 12,
349–355.

