Expert system based parallel multi-1D block matching algorithm with implementation for motion estimation

Expert Systems with Applications 39 (2012) 3249–3256

Shin-Yeu Lin a,*, Chong-Wei Su b, Jung-Shou Huang c
a Department of Electrical Engineering & Green Technology Research Center at Chang Gung University, Taoyuan, Taiwan, ROC
b Institute of Electrical and Control Engineering at National Chiao Tung University, Hsinchu, Taiwan, ROC
c Elan Electronics Corporation, Hsinchu, Taiwan, ROC

Keywords: Expert system; Knowledge base; Inference engine; Block matching algorithm; Motion estimation; Mixed signal

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.09.012

This research work was supported in part by the National Science Council in Taiwan under Grant NSC98-2221-E-182-065-MY2.
* Corresponding author. Address: Department of Electrical Engineering & Green Technology Research Center, Chang Gung University, 259 Wen-Hwa 1st Road, Kwei-Shan, Tao-Yuan 333, Taiwan, ROC. Tel.: +886 3 2118800x3221; fax: +886 3 2118026.
E-mail addresses: shinylin@mail.cgu.edu.tw (S.-Y. Lin), cwsu.ece97g@nctu.edu.tw (C.-W. Su), rong@emc.com.tw (J.-S. Huang).

Abstract

In this paper, we propose an expert-system based parallel multi-1-dimensional block matching algorithm (ESPM-1D-BMA) for motion estimation (ME). We employ parallel multi-1D block matching instead of conventional 2D block matching to improve the computing speed. To improve the ME accuracy, we design a knowledge base and an inference engine that determine the true motion vector (MV) from the results of the parallel multi-1D block matching. To increase the computing speed further, we present a hardware architecture for implementing the ESPM-1D-BMA. We demonstrate that the MV estimation accuracy of the proposed ESPM-1D-BMA is much better than that of the compared fast block matching algorithms. Its accuracy is also close to that of the two-dimensional full search block matching algorithm (2D-FSBMA). We also demonstrate that the proposed ESPM-1D-BMA needs only about 60% of the computing time of the mixed-signal 2D-FSBMA (MS-2D-FSBMA).

1. Introduction

The two-dimensional (2D) block matching algorithm (BMA) is a commonly adopted method for finding the motion vector (MV) between two image frames, the reference frame and the search frame. The MV is obtained once the best-matched pair of 2D blocks, consisting of the reference block and one search block, has been found. Among the various BMAs, the 2D full search BMA (2D-FSBMA) using the mean square error (MSE) criterion (Gharavi & Mills, 1990) is considered the most accurate algorithm for finding the MV. However, the 2D-FSBMA is computationally complex. Several fast BMAs have therefore been proposed, such as the new three-step search (NTSS) (Li, Zeng, & Liou, 1994), the diamond search (DS) (Zhu & Ma, 1997) and the hexagon-based search (HEXBS) (Zhu, Lin, & Chau, 2002). These algorithms evaluate fewer search points to reduce the computational complexity, but at the price of poor accuracy. The purpose of this paper is therefore to propose a method that reduces the computational complexity of the 2D-FSBMA while maintaining its accuracy in finding the MV.

Instead of matching 2D blocks, we slice a 2D block into multiple 1D blocks, say K of them. We then employ parallel multi-1D block matching to reduce the computational complexity of finding the MV. Because of the noise in the search and reference frames, 1D block matching is expected to be less accurate than 2D block matching in finding the MV.
In addition, the MVs determined by the K individual 1D block matchings may differ, because each 1D block is contaminated by different noise. To remedy the possible inaccuracy of parallel multi-1D block matching, we therefore propose an expert system based parallel multi-1D-BMA (ESPM-1D-BMA). Real-time motion estimation (ME) requires an even higher computing speed, so we implement the proposed algorithm in hardware and present its implementation architecture.

The paper is organized as follows. Section 2 presents the proposed ESPM-1D-BMA. Section 3 presents the hardware implementation architecture of the proposed algorithm. In Section 4, we use comprehensive simulations to test the proposed algorithm and compare it with existing methods in terms of ME accuracy and computing speed. Finally, Section 5 draws conclusions.

2. Expert system based parallel multi-1D block matching algorithm (ESPM-1D-BMA)

2.1. Review of 2D-FSBMA

We let $I_n$ and $I_{n-1}$ in Fig. 1(a) denote the reference and search frames, respectively. We let block A inside $I_n$ denote the reference block (RB), and we let block B, which can be any block in $I_{n-1}$, denote the search block (SB). The idea of the 2D-FSBMA is to examine all possible SBs and find the one most similar to A. This search is performed by computing the MSE between blocks A and B, as described below.
We assume that the RB A (or SB B) has $X \times Y$ pixels and the frame $I_n$ (or $I_{n-1}$) has $M \times N$ pixels, as shown in Fig. 1(b). The MSE between RB A and SB B used in the 2D-FSBMA, denoted by $\mathrm{MSE}_{2D}$, is defined as

$$\mathrm{MSE}_{2D} = \frac{1}{X \times Y}\sum_{i=1}^{X}\sum_{j=1}^{Y}\bigl(r(i,j)-s(i,j)\bigr)^2 \tag{1}$$

where $r(i,j)$ and $s(i,j)$ denote the $(i,j)$th pixel values of RB A and SB B, respectively. The 2D-FSBMA searches through all possible SBs to find the one with the smallest MSE, say $B^*$. The MV is then defined as the difference between the position indices of RB A and SB $B^*$, as shown in Fig. 1(a).

Fig. 1. (a) Motion vector determination using 2D-FSBMA. (b) Example reference block and image frame.

2.2. Motivation

To improve the computing speed of the 2D-FSBMA, we slice the 2D block into multiple 1D blocks and apply a parallel multi-1D block matching algorithm. However, the MVs determined by the individual 1D block matchings may differ, because each 1D block is contaminated by different noise. We therefore use the expert system concept to help determine the true MV. The following subsections describe the proposed algorithm step by step.

2.3. The 1D block matching

We let B denote the number of pixels in a 1D block, so that a 1D block consists of B consecutive pixel values from one row of the frame. We let $r(i)$ and $s(i)$, $i = 1, \ldots, B$, denote the B pixel values of the 1D reference block and the 1D search block, respectively. The MSE between them is computed as

$$\mathrm{MSE}(SBC) = \frac{1}{B}\sum_{i=1}^{B}\bigl(r(i)-s(i)\bigr)^2 \tag{2}$$

where SBC denotes the 1D search block coordinate, that is, the coordinate of the first pixel $s(1)$ of the 1D search block.

2.4. Parallel multi-1D blocks matching

To improve the ME accuracy of 1D block matching while keeping its computational efficiency, we use parallel multi-1D block matching. Multi-1D block matching uses K 1D reference blocks, each an independent 1D block in the reference frame, as shown in Fig. 2 for K = 8. We let $r_k(i)$, $i = 1, \ldots, B$, denote the kth 1D reference block in $I_n$. We let $\mathrm{MSE}_k(SBC)$ denote the MSE between $r_k(i)$ and the 1D search block $s(i)$, $i = 1, \ldots, B$, in $I_{n-1}$. Then $\mathrm{MSE}_k(SBC)$, $k = 1, \ldots, K$, is computed by

$$\mathrm{MSE}_k(SBC) = \frac{1}{B}\sum_{i=1}^{B}\bigl(r_k(i)-s(i)\bigr)^2 \tag{3}$$

and this computation can be carried out independently and in parallel for each k.

Fig. 2. A diagram of the K independent 1D blocks in a reference frame.

2.5. Expert system based parallel multi-1D blocks matching algorithm (ESPM-1D-BMA)

As noted earlier, the MVs found by the K 1D block matchings may differ because each pair of search and reference 1D blocks is contaminated by different noise. However, the MSEs computed by the 1D block matching for one 1D reference block in $I_n$ can be viewed as the judgement of one expert. The MSEs from the K 1D block matchings can therefore be viewed as the judgements of K experts. To determine the true MV, we employ the concept of an expert system (Liao, 2005; Li & Sun, 2009; Sun & Li, 2008) and construct a knowledge base (KB) and an inference engine (IE) for the parallel multi-1D block matching, as follows.

The input of the expert system is the complete set of MSEs produced by the K 1D block matchings. For each of the K 1D block matchings, the true MV should be among the MVs with the smallest MSEs. The KB of the proposed algorithm can therefore be stated as follows. For each of the K 1D block matchings, we select the P search blocks with the P smallest MSEs and assign them the marked numbers (MNs) $P, P-1, \ldots, 1$, so that a search block with a smaller MSE receives a larger MN. For example, the search block with the smallest MSE is assigned the MN P. Each selected search block yields one MV, so the parallel multi-1D block matching produces $K \times P$ MVs, each associated with the MN of its search block.

However, some of the $K \times P$ MVs may be identical. In general, MVs that appear frequently and MVs with smaller MSEs are more likely to be the true MV. An MV that appears q times among the $K \times P$ MVs is associated with q MNs. The IE of our expert system can therefore be stated as follows. For each distinct MV with q copies among the $K \times P$ MVs, we sum its q MNs and call the result the accumulated MN (AMN) of that MV. The MV with the largest AMN is taken to be the true MV.

The following example illustrates the proposed ESPM-1D-BMA with K = 8 and P = 3. In Table 1, the first column lists the K 1D reference blocks. The second, third and fourth columns show, for each reference block, the P (= 3) MVs with the P smallest MSEs resulting from the 1D block matching. The MVs in columns 2, 3 and 4 are assigned the MNs 3, 2 and 1, respectively. Computing the AMNs of the seven distinct MVs in Table 1 shows that (3, 2) has the largest AMN, 17 (= 4 × 3 + 2 × 2 + 1 × 1), so (3, 2) is taken as the true MV.

Table 1
The top 3 MVs for each 1D reference block.
Index of reference block   MV with smallest MSE   MV with 2nd smallest MSE   MV with 3rd smallest MSE
1                          (3, 2)                 (5, 8)                     (1, 5)
2                          (3, 2)                 (5, 5)                     (8, 9)
3                          (5, 8)                 (3, 2)                     (8, 8)
4                          (8, 8)                 (3, 2)                     (5, 8)
5                          (8, 9)                 (6, 3)                     (5, 5)
6                          (5, 8)                 (6, 3)                     (3, 2)
7                          (3, 2)                 (5, 5)                     (1, 5)
8                          (3, 2)                 (6, 3)                     (1, 5)

3. Implementation for motion estimation

For real-time ME, hardware implementation can speed up the proposed algorithm further. However, a purely digital circuit implementation (Hsieh & Lin, 1992; Yang, Wolf, & Vijaykrishan, 2005) may suffer from implementation problems such as high power consumption and large chip size. A mixed-signal approach avoids these problems: it replaces the computationally complex digital MSE computation with a simple current-summation circuit (Panovic & Demosthenous, 2006). The following subsections present the mixed-signal implementation of the proposed algorithm step by step.

3.1. Implementing 1D-block matching using mixed-signal approach

First, we transform the sensed pixel values into voltages in the range [0 V, 2.5 V]. To do so, we divide the range of pixel values between black and white into 255 steps of one least significant bit (LSB) each, so that black and white correspond to 0 LSB and 255 LSB, respectively. We let $V_x$ denote the transformed voltage of a pixel value of x LSB; then $V_x = \frac{2.5\,\mathrm{V} - 0\,\mathrm{V}}{255}\, x$. We let $V_{r(i)}$ and $V_{s(i)}$ denote the transformed voltages of $r(i)$ and $s(i)$, respectively. Fig. 3 presents the mixed-signal hardware architecture for 1D block matching, assuming B = 8. The transformed voltages $V_{r(i)}$ of the 1D reference block $r(i)$, $i = 1, \ldots, B$, in $I_n$ are preloaded into the reference block memory (RBM). The transformed voltages $V_{s(i)}$ of the search frame $I_{n-1}$ are fed serially into the search block memory (SBM), from left to right and from top to bottom. The voltages of each 1D search block are then fetched from the SBM, as shown in Fig. 3. Two clock signals, $t_c$ and $t_r$, control the timing of the pixel values propagating into the SBM and the RBM.

Fig. 3. The hardware implementation architecture of the 1D block matching using mixed-signal approach.

Each circle SE(i) in Fig. 3 represents the ith square-error computing circuit, whose output current is $k_t (V_{r(i)} - V_{s(i)})^2$, where $k_t$ is the transconductance parameter. The output currents of the SE(i), denoted by I(i), i = 1, ..., 8, in Fig. 3, are summed by flowing into a single resistive load R. The resulting voltage across R, denoted by $V_O$, is proportional to the MSE of the 1D block matching. Because $V_{r(i)} - V_{s(i)} = \frac{2.5}{255}\bigl(r(i)-s(i)\bigr)$, ideally $V_O = R\sum_{i=1}^{B} k_t (V_{r(i)}-V_{s(i)})^2 = R\,k_t\,B\,(2.5/255)^2\,\mathrm{MSE}(SBC)$. $V_O$ is fed to a P-winner comparator (PWC), described in Section 3.3. The MSEs of all possible 1D search blocks have been computed once the last pixel value in the last row of frame $I_{n-1}$ has been read out.

3.2. Implementing parallel multi-1D blocks matching

Fig. 4 presents the parallel computing architecture for the K 1D block matchings. The K 1D reference blocks are preloaded into $\mathrm{RBM}_k$, $k = 1, \ldots, K$, and the pixel values of the search frame $I_{n-1}$ are fed serially into the SBM. The circuits $\mathrm{MSE}_k$, $k = 1, \ldots, K$, operate in parallel to compute $\mathrm{MSE}_k(SBC)$, $k = 1, \ldots, K$. The detailed structures of the SBM, $\mathrm{RBM}_k$ and $\mathrm{MSE}_k$ in Fig. 4 are the same as those in Fig. 3.

Fig. 4. The computing architecture of parallel multi-1D block matching.

3.3. Implementing ESPM-1D-BMA

To implement the ESPM-1D-BMA, we need a PWC that selects, for each 1D reference block, the P search blocks with the P smallest MSEs. Fig. 5 presents the PWC circuit for the kth 1D reference block, denoted by $\mathrm{PWC}_k$. It is based on sorting logic; solid and dotted lines in the figure represent the transmission of data and signals, respectively. Initially, the P (= 3) sample-and-hold circuits (SHs) are reset to a default value, the largest value an SH can take. Similarly, the corresponding P digital memories (DMs) are reset to null values. Each incoming $\mathrm{MSE}_k(SBC)$ is then compared with the MSE values stored in the P SHs by the P comparators $\mathrm{COMP}_i$, $i = 1, \ldots, P$, shown in Fig. 5. $\mathrm{COMP}_i$ generates an enable signal $en_i$, with $en_i = 1$ if $\mathrm{MSE}_k(SBC) < \mathrm{MSE}_{SH_i}$ and $en_i = 0$ otherwise. The combination of $en_1, \ldots, en_P$ indicates which of the following ranges contains $\mathrm{MSE}_k(SBC)$: $[0, \mathrm{MSE}_{SH_1}]$, $(\mathrm{MSE}_{SH_1}, \mathrm{MSE}_{SH_2}]$, ..., $(\mathrm{MSE}_{SH_{P-1}}, \mathrm{MSE}_{SH_P}]$, or $(\mathrm{MSE}_{SH_P}, \infty)$. The illustrative table in Fig. 5 lists these ranges for P = 3.

Fig. 5. The circuit of PWCk and illustrations.

The range that contains the incoming $\mathrm{MSE}_k(SBC)$ determines how the P winners are updated. If $en_i = 1$, the content $\mathrm{MSE}_{SH_i}$ of $SH_i$ is replaced by $\mathrm{MSE}_{SH_{i-1}}$ when $en_{i-1} = 1$. It is replaced by $\mathrm{MSE}_k(SBC)$ when $en_{i-1} = 0$ or $i = 1$. The content of $DM_i$ is replaced in the same way, by the SBC held in $DM_{i-1}$ or by the incoming SBC. To implement this updating logic, we design a selection signal $sel_i$ that determines the replacement when $en_i = 1$:
- if $en_i = 1$ and $sel_i = 1$, the contents of $SH_i$ and $DM_i$ are replaced by those of $SH_{i-1}$ and $DM_{i-1}$, respectively;
- if $en_i = 1$ and $sel_i = 0$, they are replaced by $\mathrm{MSE}_k(SBC)$ and the corresponding SBC, respectively;
- if $en_i = 0$, $SH_i$ and $DM_i$ remain unchanged.

The values of $sel_i$, $i = 2, \ldots, P$, are determined by $en_i$, $i = 1, \ldots, P$. They are listed in the illustrative table of Fig. 5 and can be generated with AND gates, as shown in Fig. 5.
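The streaming behaviour of Figs. 4 and 5 described above can be mimicked in software to check the selection logic. The sketch below is a behavioural model under stated assumptions, not the authors' circuit:
- the names `PWinnerComparator`, `mse_1d` and `match_1d_blocks` are hypothetical;
- an SH reset to "the largest value" is modelled as `math.inf`;
- only 1D search blocks lying inside one row are visited, following Section 2.3;
- $sel_i$ is taken to be $en_{i-1}$ AND $en_i$, which reproduces the update rule stated above.

```python
import math
import random


def mse_1d(r, s):
    """Eqs. (2)/(3): mean square error between two 1D blocks of B pixels."""
    return sum((a - b) ** 2 for a, b in zip(r, s)) / len(r)


class PWinnerComparator:
    """Behavioural model of PWC_k: keeps the P smallest MSEs and their SBCs."""

    def __init__(self, P):
        self.P = P
        self.sh = [math.inf] * P   # SH_1..SH_P, reset to the largest value
        self.dm = [None] * P       # DM_1..DM_P, reset to null

    def update(self, mse, sbc):
        # COMP_i: en_i is true iff the incoming MSE is smaller than SH_i.
        en = [mse < held for held in self.sh]
        new_sh, new_dm = list(self.sh), list(self.dm)
        for i in range(self.P):
            if not en[i]:
                continue                       # en_i = 0: keep SH_i and DM_i
            sel = i > 0 and en[i - 1]          # sel_i = en_{i-1} AND en_i
            if sel:                            # shift the old winner down
                new_sh[i], new_dm[i] = self.sh[i - 1], self.dm[i - 1]
            else:                              # insert the incoming MSE/SBC
                new_sh[i], new_dm[i] = mse, sbc
        self.sh, self.dm = new_sh, new_dm      # all registers update together

    def top_mvs(self, rbc):
        # Subtracter SUB: MV = SBC - RBC, smallest MSE first.
        return [(s[0] - rbc[0], s[1] - rbc[1]) for s in self.dm if s is not None]


def match_1d_blocks(ref_blocks, rbcs, search_frame, P):
    """Stream every in-row 1D search block of I_{n-1} past K PWCs (Fig. 4)."""
    B = len(ref_blocks[0])
    pwcs = [PWinnerComparator(P) for _ in ref_blocks]
    for row, pixels in enumerate(search_frame):        # top to bottom
        for col in range(len(pixels) - B + 1):          # left to right
            s = pixels[col:col + B]                      # SBC = (row, col)
            for r_k, pwc in zip(ref_blocks, pwcs):       # independent per k
                pwc.update(mse_1d(r_k, s), (row, col))
    return [pwc.top_mvs(rbc) for pwc, rbc in zip(pwcs, rbcs)]


# Usage: two hypothetical 1D reference blocks (B = 4) copied from known spots.
rng = random.Random(1)
frame = [[rng.randint(0, 255) for _ in range(10)] for _ in range(6)]
refs = [frame[2][3:7], frame[4][0:4]]
rbcs = [(1, 2), (4, 1)]            # assumed positions of the blocks in I_n
mvs = match_1d_blocks(refs, rbcs, frame, P=3)
print(mvs[0][0], mvs[1][0])        # (1, 1) (0, -1)
```

Because the SH values are kept in ascending order, the enable pattern is always a run of 0s followed by a run of 1s. The first enabled stage therefore receives the new MSE, and every later enabled stage takes its predecessor's old value. This is an insertion into a sorted list of length P, with the largest entry dropped.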
Note that $sel_1$ is not needed: if $en_1 = 1$, $\mathrm{MSE}_{SH_1}$ and $DM_1$ must be replaced by $\mathrm{MSE}_k(SBC)$ and the corresponding SBC. For the kth 1D reference block, with reference block coordinate $RBC_k$, the comparison and replacement process continues until the last pixel value in the last row of the search frame has been read out. The final contents of $SH_i$ and $DM_i$, $i = 1, \ldots, P$, are then the MSEs and the SBCs, respectively, of the P search blocks with the P smallest MSEs. The P SBCs stored in $DM_i$, $i = 1, \ldots, P$, together with $RBC_k$, are input to the subtracter SUB shown in Fig. 5. SUB calculates the top P $MV_k$s corresponding to the P smallest MSEs. This completes the operation of $\mathrm{PWC}_k$.

With the aid of $\mathrm{PWC}_k$, $k = 1, \ldots, K$, the IE can be built from the following components. In addition to indicators for the index and the number of distinct MVs among the $K \times P$ MVs, we employ:
- a counter that cyclically generates the MN of each MV;
- an identifier that recognizes the distinct MVs;
- an adder that calculates the AMN of each distinct MV;
- a comparator that identifies the MV with the largest AMN.

The circuit that interconnects these components to implement the IE and generate the true MV is complicated and tedious, so we present it in Appendix A.

3.4. Hardware Implementation Architecture of ESPM-1D-BMA for ME

Combining the parallel multi-1D block matching architecture (Figs. 3 and 4), $\mathrm{PWC}_k$ (Fig. 5), $k = 1, \ldots, K$, and the IE (Fig. 8 in Appendix A) gives the hardware implementation architecture of the proposed ESPM-1D-BMA for ME, shown in Fig. 6. Solid and dotted lines in Fig. 6 represent the transmission of data and signals, respectively. For simplicity of illustration, Fig. 6 assumes K = 3; its SBM, $\mathrm{RBM}_k$ and $\mathrm{MSE}_k$, $k = 1, \ldots, K$, are the same as those in Fig. 4. The image sensor (IS) in Fig. 6 obtains the image data of the 1D search block and of the K 1D reference blocks. The digital controller (DC) controls the activation timing of each unit and synchronizes the parallel processing architecture.

The architecture in Fig. 6 operates as follows. First, the image data of the K 1D reference blocks are preloaded into $\mathrm{RBM}_k$, $k = 1, \ldots, K$. The DC then signals the IS to acquire the image data of $I_{n-1}$, which are fed into the SBM. The $\mathrm{MSE}_k$, $k = 1, \ldots, K$, are analog circuits, so they compute $\mathrm{MSE}_k(SBC)$ directly and in parallel as soon as the data are ready at the outputs of the SBM and the $\mathrm{RBM}_k$. Each computed $\mathrm{MSE}_k(SBC)$ is input to $\mathrm{PWC}_k$. Under the control of the DC, $\mathrm{PWC}_k$ determines whether this value is among the P smallest $\mathrm{MSE}_k$s for the kth 1D reference block. This process repeats until the last pixel value in the last row of the search frame has been read out. The DC then signals the $\mathrm{PWC}_k$ to output their top P $MV_k$s, $k = 1, \ldots, K$, in parallel. These $K \times P$ MVs are input to the IE, which generates the true MV.

Fig. 6. The block diagram of ESPM-1D-BMA for ME.

3.5. Time complexity

We let $T(\cdot)$ denote the computation time or propagation time of unit $(\cdot)$. Other time delays also exist. Examples are (i) the delay the DC introduces to guarantee a safety margin when controlling K parallel units of the same type, and (ii) the wire delay between units. Compared with $T(\cdot)$, however, these delays are negligible. The estimated time the ESPM-1D-BMA needs to generate one MV is therefore

$$(M \times N)\bigl(T_{SH} + T_{MSE} + T_{PWC}\bigr) + T_{IE} \tag{4}$$

where:
- $T_{SH}$ is the time for loading a pixel value into the RBM or SBM;
- $T_{MSE}$ is the time required to compute the MSE of a 1D block matching, i.e., the propagation time of the circuit in Fig. 3;
- $T_{PWC}$ is the propagation time of $\mathrm{PWC}_k$ in Fig. 5;
- $T_{IE}$ is the processing time of the IE in Fig. 8.

Fig. 7. The pseudo-code of IE.

4. Test results and comparisons

In this section, we use extensive simulations to evaluate the ME accuracy of the proposed ESPM-1D-BMA. We compare it with the 2D-FSBMA and with several fast block matching algorithms (Li et al., 1994; Zhu et al., 2002; Zhu & Ma, 1997). We also compare the computational efficiency of the proposed ESPM-1D-BMA with that of the 2D-FSBMA.

Matlab is our simulation tool, and eight of its sample pictures form our test bed: football, greens, concord, fabric, hestain, pears, tissue and board. All eight test pictures are 8-bit grey-level images. We set M = 24 and N = 24 for all tests, X = 8 and Y = 8 for the 2D-FSBMA, and B = 8 for the ESPM-1D-BMA, and we test the ESPM-1D-BMA with various combinations of K and P. For each test picture, we arbitrarily pick an image frame of $M \times N$ pixels as the reference frame $I_n$, and we prepare the search frame $I_{n-1}$ as follows:
1. We randomly generate an MV ranging from $-\frac{M-X}{2}$ to $\frac{M-X}{2}$ in the x direction and from $-\frac{N-Y}{2}$ to $\frac{N-Y}{2}$ in the y direction.
2. We apply this MV to $I_n$ to form a noise-free $I_{n-1}$.
3. To each pixel of the noise-free $I_{n-1}$, we add a random noise value drawn from a normal distribution with mean 0 LSB and variance 3 LSB².

The resulting frame serves as the test search frame $I_{n-1}$. For each of the eight test pictures, we prepare 5000 pairs $(I_{n-1}, I_n)$ in this way.

4.1. Comparisons of ME accuracy

For each of the eight test pictures, we use the 5000 prepared pairs $(I_{n-1}, I_n)$ to estimate the corresponding randomly generated MVs with the proposed ESPM-1D-BMA and with the 2D-FSBMA. Table 2 presents the average accuracy of the 5000 estimated MVs for each picture and for various combinations of K and P. Table 2 shows that the MV estimation of the proposed ESPM-1D-BMA generally becomes more accurate as K and P increase. The trend is not strictly monotonic for every picture, however (see, e.g., Greens).

Table 2
The average ME accuracies (%) of the 5000 estimated MVs.
K   P   Football  Greens   Concord  Fabric   Hestain  Pears    Tissue   Board
2   1   91.70     98.84    89.98    96.62    98.60    81.16    98.04    98.34
2   2   95.11     98.96    92.12    98.33    99.14    86.20    97.82    98.76
2   3   96.04     99.00    93.18    98.89    99.30    90.16    97.92    99.00
2   4   96.51     99.36    94.18    99.08    99.20    91.84    97.82    98.70
2   5   97.36     99.04    94.12    99.17    98.92    91.30    98.04    98.72
4   1   98.27     99.60    94.32    99.59    99.54    93.74    98.66    98.72
4   2   98.83     99.36    95.18    99.53    99.16    95.96    98.30    99.98
4   3   99.08     99.44    95.40    99.59    99.04    97.24    98.28    99.70
4   4   99.39     99.42    96.20    99.79    99.20    97.60    98.32    99.66
4   5   99.56     99.48    96.10    99.70    99.12    97.80    98.12    99.66
6   1   99.30     99.68    96.12    99.92    99.44    96.96    98.60    100
6   2   99.55     99.52    96.80    99.65    99.18    97.54    98.54    99.44
6   3   99.58     99.32    97.10    99.68    99.22    98.24    98.44    99.62
6   4   99.50     99.60    97.52    99.63    99.32    98.30    98.54    99.60
6   5   99.57     99.68    97.96    99.62    99.24    98.32    98.52    99.72
8   1   99.63     99.82    96.98    99.94    99.48    97.70    98.88    99.98
8   2   99.65     99.58    97.12    99.72    98.98    98.70    98.78    99.64
8   3   99.58     99.54    97.20    99.57    98.90    99.28    98.52    99.70
8   4   99.60     99.60    97.92    99.63    99.36    99.14    98.78    99.74
8   5   99.69     99.46    98.28    99.54    99.24    98.98    98.58    99.70
2D-FSBMA 100      99.998   99.947   100      99.625   99.923   99.285   100
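The test-pair preparation described at the start of this section, together with the 2D-FSBMA baseline of Eq. (1), can be sketched as follows. This is an illustrative reconstruction, not the authors' Matlab code. It rests on these assumptions:
- the helper names are hypothetical, and the larger source image is a random stand-in for a test picture;
- the reference block A sits at the centre of $I_n$, which is consistent with the MV range of $\pm(M-X)/2$;
- the two MV components index rows and columns, respectively;
- the noisy pixels are neither rounded nor clipped, since the text does not say.

```python
import math
import random


def make_test_pair(src, top, left, M, N, X, Y, noise_var=3.0, rng=random):
    """Hypothetical helper: cut I_n (M x N) from a larger image `src` and build
    a shifted, noisy search frame I_{n-1}; returns (I_n, I_prev, true_mv)."""
    hm, hn = (M - X) // 2, (N - Y) // 2
    mv = (rng.randint(-hm, hm), rng.randint(-hn, hn))   # (row, column) shift
    sigma = math.sqrt(noise_var)                        # normal noise, mean 0
    I_n = [[src[top + i][left + j] for j in range(N)] for i in range(M)]
    # The pixel at position p of I_n reappears at position p + mv in I_{n-1};
    # no rounding or clipping is applied (an assumption).
    I_prev = [[src[top + i - mv[0]][left + j - mv[1]] + rng.gauss(0.0, sigma)
               for j in range(N)] for i in range(M)]
    return I_n, I_prev, mv


def mse_2d(ref, a, srch, b, X, Y):
    """Eq. (1): MSE between the X x Y block at a in ref and at b in srch."""
    return sum((ref[a[0] + i][a[1] + j] - srch[b[0] + i][b[1] + j]) ** 2
               for i in range(X) for j in range(Y)) / (X * Y)


def fsbma_2d(I_n, I_prev, X, Y):
    """2D-FSBMA: centred RB A (assumed) against every SB position in I_{n-1}."""
    M, N = len(I_n), len(I_n[0])
    a = ((M - X) // 2, (N - Y) // 2)
    candidates = [(u, v) for u in range(M - X + 1) for v in range(N - Y + 1)]
    best = min(candidates, key=lambda b: mse_2d(I_n, a, I_prev, b, X, Y))
    return (best[0] - a[0], best[1] - a[1])    # MV = position(B*) - position(A)


# Usage with M = N = 24 and X = Y = 8 as in the tests; a 40 x 40 random image
# leaves an 8-pixel margin so that every shift stays inside `src`.
rng = random.Random(0)
src = [[rng.randint(0, 255) for _ in range(40)] for _ in range(40)]
I_n, I_prev, mv = make_test_pair(src, 8, 8, 24, 24, 8, 8, noise_var=0.0, rng=rng)
print(fsbma_2d(I_n, I_prev, 8, 8) == mv)      # True on a noise-free pair
```

With the centred reference block, the $(M-X+1)(N-Y+1) = 17 \times 17$ candidate positions cover exactly the MV offsets from −8 to +8 used to generate the test pairs.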
We also observe that when $K \ge 8$ and $P \ge 4$, the proposed algorithm is, on average, at most 1% less accurate than the 2D-FSBMA.

For reference, the first row of Table 3 repeats the Table 2 results of the ESPM-1D-BMA with K = 8 and P = 3. We also tested three fast block matching algorithms, DS, NTSS and HEXBS, on the same 5000 prepared pairs $(I_{n-1}, I_n)$. Their average ME accuracies appear in the last three rows of Table 3, which show that the proposed ESPM-1D-BMA is far more accurate than all three.

Table 3
Comparisons of ESPM-1D-BMA with the three fast block matching algorithms (average ME accuracy, %).
Method                       Football  Greens   Concord  Fabric   Hestain  Pears    Tissue   Board
ESPM-1D-BMA (K = 8, P = 3)   99.58     99.54    97.20    99.57    98.90    99.28    98.52    99.70
DS                           45.93     69.04    81.79    82.88    89.04    75.54    79.24    38.73
NTSS                         48.69     81.13    86.87    86.22    87.89    75.01    87.69    43.99
HEXBS                        38.15     63.02    73.89    67.10    74.53    62.13    74.69    28.19

Fig. 8. Hardware architecture for implementing the IE.

4.2. Comparison of computational efficiency

To compare the computing times of the ESPM-1D-BMA and the mixed-signal 2D-FSBMA (MS-2D-FSBMA), we first review the computational complexity of the latter. The 1D block matching in Fig. 3 uses a 1 × B current-summation circuit; analogously, the MS-2D-FSBMA uses an X × Y current-summation circuit to obtain the 2D MSE (Panovic & Demosthenous, 2006). Instead of $\mathrm{PWC}_k$, $k = 1, \ldots, K$, the MS-2D-FSBMA needs only a single winner comparator (SWC). The SWC consists of a comparator (COMP), an SH and a DM, and it identifies the coordinate of the SB with the minimum MSE. For a frame of size $M \times N$ and a block of size $X \times Y$, the time the MS-2D-FSBMA needs to compute one MV is

$$2\,X\,Y\,T_{SH} + (M-X)(N-Y+1)\bigl(Y\,T_{SH} + T_{MSE2D} + T_{COMP}\bigr) + (N-Y)\bigl(X\,T_{SH} + T_{MSE2D} + T_{COMP}\bigr) \tag{5}$$

where $T_{SH}$ is the same as in (4) and $T_{MSE2D}$ is the time for one 2D-MSE computation. The COMP is the most time-consuming component of the SWC, so a comparison in the SWC takes $T_{COMP}$.

We simulated existing circuits for the SH (Chen, Gu, Shen, Wu, & Hsu, 1998), the MSE2D (Panovic & Demosthenous, 2004) and the COMP (Razavi & Wooley, 1992) in HSPICE and obtained $T_{SH} = 50$ ns, $T_{MSE2D} = 10$ ns and $T_{COMP} = 100$ ns. The analog MSE circuit in Fig. 3 is simpler than the MSE2D circuit of the MS-2D-FSBMA. However, all SEs in both circuits operate in parallel, so $T_{MSE} \cong T_{MSE2D} = 10$ ns. Similarly, the $\mathrm{PWC}_k$ (Fig. 5), $k = 1, \ldots, K$, operate in parallel, and the most time-consuming components of $\mathrm{PWC}_k$ are its comparators; therefore $T_{PWC} \cong T_{SWC} = T_{COMP} = 100$ ns. According to the IE hardware architecture in Appendix A, the processing time of the IE is $T_{IE} = K \times P \times T_{clock}$, where $T_{clock} = 13.1$ ns is the critical path delay of the IE. For K = 8 and P = 3, this gives $T_{IE} = 314.4$ ns. Consequently, with the parameters used in our tests, one MV estimation takes 92,474.4 ns with the ESPM-1D-BMA (K = 8, P = 3) and 153,280 ns with the MS-2D-FSBMA.
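These timing figures can be re-checked directly from Eqs. (4) and (5). The sketch below only evaluates the two formulas with the parameter values quoted above: $T_{SH} = 50$ ns, $T_{MSE} = T_{MSE2D} = 10$ ns, $T_{PWC} = T_{COMP} = 100$ ns and $T_{IE} = K \times P \times 13.1$ ns. The function names are hypothetical. The block also reproduces the pixel-loading shares discussed next.

```python
def t_espm(M, N, T_SH, T_MSE, T_PWC, T_IE):
    """Eq. (4): time for ESPM-1D-BMA to produce one MV (ns)."""
    return M * N * (T_SH + T_MSE + T_PWC) + T_IE


def t_ms_2d_fsbma(M, N, X, Y, T_SH, T_MSE2D, T_COMP):
    """Eq. (5): time for MS-2D-FSBMA to produce one MV (ns)."""
    return (2 * X * Y * T_SH
            + (M - X) * (N - Y + 1) * (Y * T_SH + T_MSE2D + T_COMP)
            + (N - Y) * (X * T_SH + T_MSE2D + T_COMP))


M = N = 24
X = Y = 8
K, P, T_clock = 8, 3, 13.1
T_SH, T_MSE, T_COMP = 50, 10, 100       # T_MSE ~ T_MSE2D, T_PWC ~ T_COMP

t1 = t_espm(M, N, T_SH, T_MSE, T_COMP, K * P * T_clock)
t2 = t_ms_2d_fsbma(M, N, X, Y, T_SH, T_MSE, T_COMP)
print(round(t1, 1), t2)                 # 92474.4 153280
print(f"{t1 / t2:.1%}")                 # 60.3%

# Pixel-loading parts of (4) and (5): every term that multiplies T_SH.
load1 = M * N * T_SH
load2 = 2 * X * Y * T_SH + (M - X) * (N - Y + 1) * Y * T_SH + (N - Y) * X * T_SH
print(f"{load1 / t1:.2%} {load2 / t2:.2%}")   # 31.14% 79.33%
```

The ratio of about 60% corresponds to the ESPM-1D-BMA being roughly 1.7 times as fast under these parameters. Most of the gap comes from the $T_{SH}$ terms, because the MS-2D-FSBMA reloads Y (or X) pixels for every one of its $(M-X)(N-Y+1) + (N-Y)$ block positions.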
This shows that the proposed ESPM-1D-BMA needs only about 60% of the computing time of the MS-2D-FSBMA while achieving almost the same ME accuracy. The key to this efficiency is the small amount of time the ESPM-1D-BMA spends loading the pixel values of frame $I_{n-1}$ into the SBM. The loading-time part of (4) for the ESPM-1D-BMA is $M \times N \times T_{SH}$. The loading-time part of (5) for the MS-2D-FSBMA is $2\,X\,Y\,T_{SH} + (M-X)(N-Y+1)\,Y\,T_{SH} + (N-Y)\,X\,T_{SH}$. These parts account for 31.14% and 79.33%, respectively, of the corresponding total computing times.

5. Conclusion

In this paper, we have proposed a hardware-implementable ESPM-1D-BMA. We have demonstrated that its ME accuracy is close to that of the 2D-FSBMA and better than that of the three compared fast block matching algorithms. Above all, the ESPM-1D-BMA needs only about 60% of the computing time of the MS-2D-FSBMA.

Appendix A

To implement the IE, we first describe the pseudo-code that carries out the IE and then present the hardware architecture that implements this pseudo-code. We use the following notation:
- $mv_{k,p}$ and $MN_{k,p}$, $p = 1, \ldots, P$, denote the P MVs with the P smallest MSEs of the kth 1D block matching and their MNs, respectively;
- D denotes the number of distinct MVs among the $K \times P$ $mv_{k,p}$'s, and $Dmv_d$, $d = 1, \ldots, D$, denotes the D distinct MVs;
- $AMN_d$ denotes the AMN of $Dmv_d$;
- wMV and wAMN denote the best-so-far MV and its AMN during the process, respectively.

With this notation, the pseudo-code for carrying out the IE is presented in Fig. 7.

The operations of the pseudo-code can be summarized as follows. The initialization step resets all variables. In step 1, we check whether the incoming MV, $mv_{k,p}$, matches any of the stored distinct MVs $Dmv_d$. If it does, we identify the matched distinct MV and output the flag new = 0; otherwise, we record the incoming MV as a new distinct MV and set new = 1.
In step 2, we compute the AMN of the identified distinct MV. In step 3, we check whether the AMN_new of the distinct MV identified in step 1 is greater than the wAMN of wMV. If it is, we output the flag win = 1; otherwise, we set win = 0. In step 4, we update the distinct MV and its AMN, and we also update wMV and wAMN if win = 1. Furthermore, if the flag new from step 1 is 1, we set D = D + 1, increasing the number of distinct MVs by one.

Fig. 8 presents the hardware implementation architecture of the pseudo-code. At the left of this architecture, a counter, counter_kp, marked (ii) in Fig. 8, sequentially generates the index p = 1, ..., P for each k = 1, ..., K. A KP-to-1 multiplexer, denoted by MUX and marked (i) in Fig. 8, selects $mv_{k,p}$ according to the generated index; this corresponds to the statement smv = $mv_{k,p}$ in step 1 of Fig. 7. The smv is input to the MV identifier, marked (iii) in Fig. 8. The identifier compares smv with the distinct MVs stored in the registers denoted by Distinct MV and AMN. The counter counter_D, marked (iv) in Fig. 8, generates the index D. If smv does not match any existing distinct MV, smv becomes the Dth distinct MV. The MV identifier then sets the enable signal $en_D = 1$, which activates the Distinct MV and AMN registers, marked (v) in Fig. 8, to store the current smv and its AMN. At the same time, the MV identifier sets new = 1 to increment counter_D by 1. An adder, marked (vi) in Fig. 8, then updates the AMN register for the current smv, which may be an existing or a new distinct MV: it adds the $MN_{k,p}$ of the current smv to AMN_old, and the resulting AMN_new is fed back to the Distinct MV and AMN registers.
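The four IE steps summarized above can be modelled in a few lines of software. The sketch below is a hypothetical behavioural model, not the Fig. 7 pseudo-code itself, which is not reproduced in the text. MVs arrive one at a time in the order produced by counter_kp, a list lookup plays the role of the MV identifier, and a strict ">" keeps the first MV to reach the largest AMN. The input is the three MVs per reference block from Table 1, read row by row. With this input, the model reproduces the winner reported in Section 2.5.

```python
def inference_engine(top_mvs):
    """Streaming IE: top_mvs[k][p] is the MV with the (p+1)-th smallest MSE
    for 1D reference block k; returns (wMV, wAMN, {distinct MV: AMN})."""
    distinct, amn = [], []       # Dmv_d and AMN_d, d = 1..D
    w_mv, w_amn = None, 0        # best-so-far MV and its AMN
    for row in top_mvs:                          # k = 1..K (counter_kp)
        P = len(row)
        for p, smv in enumerate(row):            # p = 1..P, smv = mv_{k,p}
            mn = P - p                           # MN: P for the smallest MSE
            if smv in distinct:                  # step 1: known distinct MV
                d = distinct.index(smv)
            else:                                # new distinct MV: D = D + 1
                distinct.append(smv)
                amn.append(0)
                d = len(distinct) - 1
            amn[d] += mn                         # step 2: AMN_new = AMN_old + MN
            if amn[d] > w_amn:                   # step 3: win = AMN_new > wAMN
                w_mv, w_amn = smv, amn[d]        # step 4: update wMV and wAMN
    return w_mv, w_amn, dict(zip(distinct, amn))


# Table 1 (K = 8, P = 3): one row per 1D reference block, smallest MSE first.
TABLE_1 = [
    [(3, 2), (5, 8), (1, 5)], [(3, 2), (5, 5), (8, 9)],
    [(5, 8), (3, 2), (8, 8)], [(8, 8), (3, 2), (5, 8)],
    [(8, 9), (6, 3), (5, 5)], [(5, 8), (6, 3), (3, 2)],
    [(3, 2), (5, 5), (1, 5)], [(3, 2), (6, 3), (1, 5)],
]
w_mv, w_amn, table = inference_engine(TABLE_1)
print(w_mv, w_amn, len(table))   # (3, 2) 17 7
```

An AMN can only grow, so comparing each AMN_new with wAMN at the moment it is produced is enough to finish with the largest final AMN. The model therefore needs no separate search pass after the last $mv_{k,p}$, which matches the $K \times P$ clock cycles assumed for $T_{IE}$.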
Note that $MN_{k,p}$ is generated by counter_MN, marked (vii) in Fig. 8. AMN_new is also compared with wAMN by the comparator ">", marked (viii) in Fig. 8. If AMN_new > wAMN, we set win = 1, which updates wAMN to AMN_new and wMV to smv, as shown by the blocks marked (x) and (ix), respectively. This completes the implementation of the pseudo-code in Fig. 7.

The clock period is designed to be long enough for the signal $mv_{k,p}$ to traverse the critical path. This path comprises:
- propagation through counter_kp, the KP-to-1 MUX and the MV identifier;
- access to the Distinct MV and AMN registers;
- the processing time of the adder and the comparator;
- the update of wMV and wAMN.

In Taiwan Semiconductor Manufacturing Company (TSMC) 0.18 μm CMOS technology, the critical path delay of the IE in Fig. 8 can be kept within 13.1 ns, so the clock period of the IE can be set to $T_{clock} = 13.1$ ns. The IE processing time is then $T_{IE} = K \times P \times T_{clock}$; for K = 8 and P = 3, the IE obtains the true MV within 314.4 ns.

References

Chen, M. J., Gu, Y. B., Shen, W. C., Wu, T., & Hsu, P. C. (1998). A compact high-speed Miller-capacitance based sample-and-hold circuit. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 45, 198–201.
Gharavi, H., & Mills, M. (1990). Block matching motion estimation algorithms – new results. IEEE Transactions on Circuits and Systems, 37, 649–651.
Hsieh, C. H., & Lin, T. P. (1992). VLSI architecture for block-matching motion estimation algorithm. IEEE Transactions on Circuits and Systems for Video Technology, 2(2).
Liao, S. H. (2005). Expert systems methodologies and applications – a decade review from 1995 to 2004. Expert Systems with Applications, 28(1), 93–103.
Li, H., & Sun, J. (2009). Majority voting combination of multiple case-based reasoning for financial distress prediction. Expert Systems with Applications, 36(3), 4363–4373.
Li, R., Zeng, B., & Liou, M. L. (1994). A new three-step search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 4, 438–442.
Panovic, M., & Demosthenous, A. (2004). A compact block matching cell for analogue motion estimation processors. In Proceedings of the 2004 IEEE International Symposium on Circuits and Systems (ISCAS'04) (Vol. 2, pp. 229–232). Vancouver, Canada.
Panovic, M., & Demosthenous, A. (2006). Motion estimation processor using mixed-signal approach. IEEE Transactions on Circuits and Systems II, 53(6), 492–496.
Razavi, B., & Wooley, B. A. (1992). Design techniques for high-speed, high-resolution comparators. IEEE Journal of Solid-State Circuits, 27(12), 1916–1926.
Sun, J., & Li, H. (2008). Listed companies' financial distress prediction based on weighted majority voting combination of multiple classifiers. Expert Systems with Applications, 35(3), 818–827.
Yang, S., Wolf, W., & Vijaykrishnan, N. (2005). Power and performance analysis of motion estimation based on hardware and software realizations. IEEE Transactions on Computers, 54, 714–726.
Zhu, S., & Ma, K.-K. (1997). A new diamond search algorithm for fast block matching motion estimation. In IEEE International Conference on Communications and Signal Processing (pp. 292–296).
Zhu, C., Lin, X., & Chau, L. P. (2002). Hexagon-based search pattern for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 12, 349–355.