

International Journal of Advanced Network, Monitoring and Controls          Volume 03, No.04, 2018 


Improvement and Realization of CamShift Algorithm Based Motion Image Tracking

Wang Yubian*
Department of Railway Transportation Control
Belarusian State University of Transport
34, Kirova street, Gomel, 246653, Republic of Belarus
e-mail: alika_wang@mail.ru
*Corresponding author

Yuri Shebzukhov
Department of the International Relations
Belarusian State University of Transport
34, Kirova street, Gomel, 246653, Republic of Belarus
e-mail: oms@bsut.by

 
Abstract—The detection and tracking of moving objects in images is one of the key technologies of computer vision and is widely used in fields such as industry, transportation, and the military. Detecting and tracking moving objects in dynamic scenes captured from a UAV platform remains a technical difficulty in this field. In practical applications, the complicated environment, the small size of the target object, and the moving platform demand higher real-time performance and reliability from the algorithm. Adding other features of the target to the tracking process makes it possible to compensate for the shortcomings of CamShift itself. Based on the CamShift tracking algorithm, this paper integrates SURF feature detection into the algorithm, which greatly improves the tracking accuracy for the target object while retaining good real-time performance, and thus achieves better overall tracking performance.

Keywords-Motion Image; CamShift Algorithm; SURF; Feature Detection

I. INTRODUCTION 

Video tracking is the key technology for detecting moving target objects in dynamic scenes on a UAV platform. It can be realized in two ways. The first is based on target recognition: the video is processed frame by frame by a recognition algorithm that identifies the target object and determines the target match. The second is based on moving-object detection: the moving object is actively detected, and its position is determined from the detection result to realize the tracking. This second method can track any moving object without complicated a priori information such as the shape or size of the object. However, the performance of any tracking algorithm also depends on factors such as background changes around the object, an unpredictable tracking path, an unpredictable target motion path and mode, scene switches, target motion without an analyzable pattern, changes of the camera model, camera shift, and changes in illumination conditions. The causes of changes in the color and shape of the moving object also vary widely [1]. The current mainstream moving-object tracking methods with good tracking performance include feature-based, region-based, model-based, motion-estimation-based, and contour-based tracking methods.

Conventional moving-object detection and tracking algorithms are suitable only for scenes with a static or almost static background and are not suitable for detecting and tracking moving objects in UAV video. The processing of the digital image sequences acquired by the drone must therefore meet requirements such as real-time operation, accuracy, and robustness [2]. Current research shows that target tracking based on the CamShift algorithm can meet the requirements of drone video target tracking.

DOI: 10.21307/ijanmc-2019-028

The CamShift algorithm performs target object recognition and tracking by analyzing the Hue component information of the target region in the HSV color space. It has low sensitivity to target deformation, good real-time performance, a small computational load, and low complexity, and it has therefore been studied extensively.
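The hue-based search described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this paper: the function names, the 16-bin histogram, and the 0..179 hue range are assumptions made for illustration, and only the mean-shift core is shown (the adaptive resizing and orientation step that distinguishes CamShift from plain mean shift is omitted).

```python
import math

def hue_histogram(hue, window, bins=16):
    """Peak-normalized hue histogram inside window = (x, y, w, h).
    Assumes integer hue values in 0..179."""
    x, y, w, h = window
    hist = [0.0] * bins
    for r in range(y, y + h):
        for c in range(x, x + w):
            hist[hue[r][c] * bins // 180] += 1.0
    peak = max(hist) or 1.0
    return [v / peak for v in hist]

def back_project(hue, hist):
    """Replace every pixel by the histogram value of its hue bin."""
    bins = len(hist)
    return [[hist[v * bins // 180] for v in row] for row in hue]

def mean_shift(prob, window, max_iter=20):
    """Shift a fixed-size window to the centroid of the probability
    image until it stops moving (the core iteration inside CamShift)."""
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for r in range(y, y + h):
            for c in range(x, x + w):
                p = prob[r][c]
                m00 += p          # zeroth moment (total mass)
                m10 += p * c      # first moment in x
                m01 += p * r      # first moment in y
        if m00 == 0.0:
            break                 # no target-colored pixels in the window
        # Re-center the window on the centroid, rounding half up.
        nx = int(math.floor(m10 / m00 - (w - 1) / 2.0 + 0.5))
        ny = int(math.floor(m01 / m00 - (h - 1) / 2.0 + 0.5))
        nx = min(max(nx, 0), cols - w)
        ny = min(max(ny, 0), rows - h)
        if (nx, ny) == (x, y):
            break                 # converged
        x, y = nx, ny
    return (x, y, w, h)
```

In use, the histogram would be built once from the manually selected target window, and each new frame would be back-projected and searched starting from the previous window position.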

In drone video target tracking, the CamShift algorithm [3] has both advantages and disadvantages compared with other tracking methods. Because the algorithm relies on the color information of the moving object, it may produce tracking errors or lose the target when the background contains colors similar to the target or when the scene is complicated. If other auxiliary features of the object can be obtained during the search process and supplied as input conditions to the algorithm, the problems that CamShift encounters in such scenes can be compensated for. Because the SURF algorithm recognizes objects well, integrating Speeded-Up Robust Features (SURF) into the CamShift algorithm will greatly improve the accuracy and reliability of object tracking. The improved method preserves good real-time tracking performance, greatly improves the accuracy with which the UAV tracks the object, and ultimately achieves a better tracking effect for the moving object.

II. THE CHARACTERISTICS OF SURF 

Feature tracking of the target object is based on tracking points of interest; it is often used in engineering applications and works well. The difficulty of this method lies in selecting and extracting the features [4]. The selected features should fully cover the key characteristics of the target object in different scene settings and should also be convenient to extract. In general, if too few sampling points are used when extracting features, the object is easily lost and the tracking effect deteriorates. Conversely, if too many are used, the amount of computation and the complexity increase greatly, and the requirements of practical applications cannot be met.

The Harris corner detector is a traditional point-of-interest detection method, but because it works at a fixed scale, it becomes difficult to determine the change in position of the target object between image frames when the object is deformed or changes in size. Prof. David G. Lowe of the University of British Columbia in Canada first proposed the Scale Invariant Feature Transform (SIFT) [5]. The SIFT algorithm searches for feature key points by performing feature detection within a constructed scale space. The orientation of each point of interest is computed from the gradients in its neighborhood, so that the features are invariant to both scale and orientation. However, the algorithm has high computational complexity, places high demands on hardware, and has poor real-time performance. Building on this algorithm, Yan Ke et al. introduced Principal Components Analysis (PCA) into SIFT and proposed the PCA-SIFT algorithm, which substantially improved the matching efficiency of the ordinary SIFT algorithm. However, this method can lead to failures of feature extraction in later stages and to reduced distinctiveness of the features.

On this basis, a point-of-interest algorithm based on SURF has been proposed [6]. The extracted features are used as key local features of the image. Computation is accelerated by the integral image method. Haar wavelet responses around each point of interest are then used to obtain its main orientation and its feature vector [7]. Finally, the Euclidean distance between feature vectors is calculated to verify the match between images. SURF features are invariant to changes in brightness, translation, rotation, and scale. In addition, the method remains robust under noise interference, even when the viewing angle changes. It not only achieves accurate feature recognition but also reduces the computational complexity, which greatly improves the efficiency of the method in use and gives it broad applicability.
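The integral image mentioned above is what makes box-shaped filters cheap: once it is built, the sum over any rectangle costs four lookups regardless of the rectangle size. The following is a minimal sketch with illustrative function names, not code from this paper.

```python
def integral_image(img):
    """Return ii with one extra row and column of zeros, where
    ii[r][c] is the sum of img over rows < r and columns < c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the original image over a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Because the cost of `box_sum` does not depend on the rectangle size, larger filters can be applied to the unchanged image instead of repeatedly shrinking the image.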

 

III. THE SURF ALGORITHM 

A. Constructing Hessian matrix 

SURF relies on an approximation of the determinant of the Hessian matrix computed over the image. The Hessian matrix is the core of the SURF algorithm [8]. First, the Hessian matrix of each pixel is calculated, and the discriminant (determinant) of the Hessian matrix is then used to determine whether the point is an extremum. Gaussian filtering is used to construct the Gaussian pyramid [9]. Compared with the Gaussian pyramid construction in the SIFT algorithm, the SURF construction is faster. In the SIFT algorithm, the image size differs from one set (octave) to the next: the previous set of images is downsampled to 1/4 of its size to obtain the next set. Within a set, the images have the same size and differ only in the scale σ used. In addition, during blurring the Gaussian template size stays constant and only the scale σ changes. In the SURF algorithm, by contrast, the image size remains the same, and only the size of the Gaussian blur template and the scale σ are changed.
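The equation for the Hessian matrix is not reproduced in this text. For reference, the standard formulation from the SURF paper by Bay et al. listed in the references is given below as an assumption about what is meant; L denotes the image convolved with second-order Gaussian derivatives at scale σ, D denotes the box-filter approximations of those derivatives, and the weight 0.9 is the value reported in that paper.

```latex
H(\mathbf{x}, \sigma) =
\begin{pmatrix}
L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\
L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma)
\end{pmatrix},
\qquad
\det\left(H_{\text{approx}}\right) = D_{xx} D_{yy} - \left(0.9\, D_{xy}\right)^{2}
```

A point is kept as a candidate only when this determinant response is a local extremum in position and scale.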

B. Preliminary determination of points of interest using non-maximum suppression

Each pixel processed by the Hessian matrix is compared with its 26 neighbors in the three-dimensional (image and scale) neighborhood. If it is the extremum among these points, it is selected as a preliminary point of interest for the next step [10]. Detection uses a filter whose size corresponds to the scale layer being examined. In this paper, a 3×3 filter is used as an example. The candidate point is one of the 9 pixels in its own scale layer; it is compared with the remaining 8 pixels in that layer and with the 9 pixels in each of the two adjacent scale layers above and below it, which gives the 8 + 9 + 9 = 26 comparisons in the three-dimensional neighborhood.
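The 26-neighbor comparison can be sketched as follows. This is a simplified illustration that assumes the determinant responses are already stored as `stack[scale][row][col]` and that only maxima are of interest; the function names and the threshold parameter are hypothetical.

```python
def is_local_maximum(stack, s, r, c, threshold=0.0):
    """True if stack[s][r][c] exceeds the threshold and is strictly
    greater than all 26 neighbors in the 3x3x3 scale-space cube."""
    value = stack[s][r][c]
    if value <= threshold:
        return False
    for ds in (-1, 0, 1):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (ds, dr, dc) == (0, 0, 0):
                    continue  # skip the candidate itself
                if stack[s + ds][r + dr][c + dc] >= value:
                    return False
    return True

def find_candidates(stack, threshold=0.0):
    """Scan all interior positions and return (scale, row, col) maxima."""
    found = []
    for s in range(1, len(stack) - 1):
        for r in range(1, len(stack[s]) - 1):
            for c in range(1, len(stack[s][r]) - 1):
                if is_local_maximum(stack, s, r, c, threshold):
                    found.append((s, r, c))
    return found
```

Border positions and the first and last scale layers are skipped because they do not have a full set of 26 neighbors.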

C. Precisely locate extremum 

Three-dimensional linear interpolation is used to locate the points of interest with sub-pixel accuracy, and points whose values are below a certain threshold are removed. Raising this threshold reduces the number of detected points of interest, so that only the few points with the strongest features are retained and the amount of work is reduced [11].

D. Select the main orientation of the point of interest 

To ensure rotational invariance, SURF does not compute a gradient histogram; instead it computes the Haar wavelet responses in the region around the point of interest [12]. Specifically, within a circular neighborhood of radius 6s centered on the point of interest (s is the scale of the point of interest), the Haar wavelet responses in the x- and y-directions are computed with a wavelet side length of 4s. The responses are weighted with a Gaussian coefficient so that responses close to the point of interest contribute more and responses far from it contribute less. The weighted responses of all points inside a 60-degree sector are then summed to yield a vector, and the sector is rotated to traverse the whole circular region. The main orientation of the point of interest is defined as the orientation of the longest vector [13].
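The sliding-sector search can be sketched as follows, assuming the Gaussian-weighted Haar responses are already available as (dx, dy) pairs. The function name and the choice of 72 sector positions (5-degree steps) are illustrative assumptions, not values taken from this paper.

```python
import math

def dominant_orientation(responses, window=math.pi / 3, steps=72):
    """responses: list of (dx, dy) Gaussian-weighted Haar responses.
    Slides a 60-degree sector around the circle, sums the responses
    whose direction falls inside it, and returns the angle (radians)
    of the longest summed vector."""
    best_len2, best_angle = 0.0, 0.0
    for k in range(steps):
        start = 2.0 * math.pi * k / steps
        sx = sy = 0.0
        for dx, dy in responses:
            angle = math.atan2(dy, dx)
            # Angular offset from the sector start, wrapped into [0, 2*pi).
            if (angle - start) % (2.0 * math.pi) < window:
                sx += dx
                sy += dy
        len2 = sx * sx + sy * sy
        if len2 > best_len2:
            best_len2, best_angle = len2, math.atan2(sy, sx)
    return best_angle
```

The returned angle is the direction of the summed vector itself rather than of the sector boundary, so it is not quantized to the step size.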

E. Construct descriptor of the SURF point of interest [14]

Extract a square region around the point of interest with a window size of 20s, oriented along the main orientation detected in Section III-D. The region is then divided into 16 sub-regions (a 4×4 grid). In each sub-region, the Haar wavelet responses in the x- and y-directions are computed at 25 sample points and summed, where the x- and y-directions are taken relative to the main orientation. Four values are recorded for each sub-region: the sum of the horizontal responses, the sum of the absolute values of the horizontal responses, the sum of the vertical responses, and the sum of the absolute values of the vertical responses.
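With 16 sub-regions and four values each, the descriptor has 64 components. The sketch below assumes the Haar responses have already been sampled on a 20×20 grid aligned with the main orientation; the function name is hypothetical, and the final normalization to unit length follows the usual SURF practice rather than anything stated in this paper.

```python
import math

def surf_descriptor(dx, dy):
    """dx, dy: 20x20 grids of Haar responses on the oriented window.
    Returns a 64-value descriptor: for each of the 4x4 sub-regions
    (5x5 samples each) the values sum(dx), sum(|dx|), sum(dy), sum(|dy|)."""
    vec = []
    for br in range(0, 20, 5):
        for bc in range(0, 20, 5):
            sx = sy = ax = ay = 0.0
            for r in range(br, br + 5):
                for c in range(bc, bc + 5):
                    sx += dx[r][c]
                    ax += abs(dx[r][c])
                    sy += dy[r][c]
                    ay += abs(dy[r][c])
            vec.extend([sx, ax, sy, ay])
    # Assumed step: scale to unit length so contrast changes cancel out.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```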

IV. EXTRACT AND MATCH THE POINT OF INTEREST 

(1) Select a frame from the drone tracking video and extract the points of interest using the SURF detection method, as shown in Figure 1.

 

 
Figure 1. Extract points of interest 

(2) Match the target region. After selecting the target region, extract the target regions in two adjacent frames and match the points of interest of the two frames. As Figure 2 shows, 6 points of interest are successfully matched.

 
Figure 2. Matching the points of interest 
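Matching points of interest between two frames by Euclidean distance can be sketched as a nearest-neighbor search over the descriptors. The function names are illustrative, and the ratio test used to reject ambiguous matches (with a ratio of 0.7) is an added assumption; the paper states only that the Euclidean metric is used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(desc_a, desc_b, ratio=0.7):
    """For each descriptor in desc_a, find its nearest neighbor in desc_b.
    A match is kept only if the best distance is clearly smaller than the
    second best (assumed ratio test). Returns (index_a, index_b) pairs."""
    matches = []
    for i, a in enumerate(desc_a):
        ranked = sorted((euclidean(a, b), j) for j, b in enumerate(desc_b))
        if not ranked:
            continue
        if len(ranked) == 1 or ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches
```

The brute-force search is quadratic in the number of descriptors, which is acceptable for the small number of points found inside a single target region.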

V. VERIFICATION 

After the target window is selected manually, a feedback mechanism is used to calculate the color similarity between the CamShift tracking window and the initial window, and the feature similarity between the SURF tracking window and the initial window. Large displacements are suppressed, and the displacement weights of the two tracking algorithms are assigned dynamically according to the Bhattacharyya distance [15]. The CamShift tracking algorithm is preferred when the motion tracking is stable; otherwise, the SURF tracking method [16] is preferred. The experiments show that the method tracks well and can overcome the tracking interference caused by background changes and by objects similar to the target. A picture of the tracked object is taken every 15 seconds; examples are shown in Figures 3, 4, 5, and 6.

 
Figure 3. Tracking image 1 

 
Figure 4. Tracking image 2 

 
Figure 5. Tracking image 3 

 

 
Figure 6. Tracking image 4 

As Figures 3, 4, 5, and 6 show, the drone achieves good results in tracking the moving object.
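The dynamic weighting described in this section is not specified in detail, so the following is only one possible reading, given as a sketch. The Bhattacharyya coefficient is the standard histogram similarity; the fusion rule, the function names, and the `max_jump` limit of 30 pixels are all hypothetical choices for illustration.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Similarity of two histograms in [0, 1]; 1 means identical shape.
    Both histograms are normalized to sum to 1 before comparison."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 0.0
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

def fuse_positions(cam_pos, surf_pos, color_sim, feature_sim,
                   prev_pos, max_jump=30.0):
    """Blend the CamShift and SURF position estimates.
    Each tracker is weighted by its similarity to the initial window,
    and a tracker that jumps farther than max_jump from the previous
    position is suppressed (weight 0)."""
    def jump(pos):
        return math.hypot(pos[0] - prev_pos[0], pos[1] - prev_pos[1])

    w_cam = color_sim if jump(cam_pos) <= max_jump else 0.0
    w_surf = feature_sim if jump(surf_pos) <= max_jump else 0.0
    total = w_cam + w_surf
    if total == 0.0:
        return prev_pos  # neither tracker is trusted; keep the old position
    return ((w_cam * cam_pos[0] + w_surf * surf_pos[0]) / total,
            (w_cam * cam_pos[1] + w_surf * surf_pos[1]) / total)
```

Under this reading, a high color similarity lets the CamShift estimate dominate, while a drop in color similarity or a sudden jump shifts the result toward the SURF estimate.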

VI. CONCLUSION 

Starting from the classic CamShift tracking algorithm and focusing on its deficiencies in tracking moving objects in drone video, this paper proposes combining the CamShift algorithm with SURF feature detection to track moving objects in UAV video. The experimental results show that the proposed method can effectively track and locate the target object against a relatively complex aerial photography background. The experiment achieved good results: the moving object is tracked, the tracking speed is fast, the real-time performance is satisfactory, and the time consumption is low.

However, in practical applications the object tracking environment is more complicated and diverse. Further study can proceed in the following respects. First, this paper studied only the tracking of a single object and did not address the tracking of multiple similar objects. Multi-object tracking has practical significance in video surveillance, intelligent traffic detection, air formation, and geographic monitoring, so further study of it is necessary. Second, the object tracking system in this paper needs to be improved, including optimizing the logic of the system and adding parallel processing to improve its tracking efficiency. Third, the CamShift algorithm in this paper is still based on the color histogram, which is sensitive to changes in illuminance and in the color of objects. When the camera resolution is low and the ambient light is insufficient, the tracking effect is poor; future work should therefore focus on tracking features that are insensitive to illumination.

 
REFERENCES 

[1] Liu Yanli, Tang Xianqi, Chen Yuedong. Application research of moving target tracking algorithm based on improved Camshift [J]. Journal of Anhui Polytechnic University, 2012, 27(2): 74-77.

[2] Xiong Tan, Xuchu Yu, Jingzheng Liu, Weijie Huang. Object Fast Tracking Based on Unmanned Aerial Vehicle Video [C]// Proceedings of PACIIA, IEEE Press, 2010: 244-248.

[3] C. Harris, M. J. Stephens. A Combined Corner and Edge Detector [C]. Proc. of the 4th Alvey Vision Conf., 2013: 147-152.

[4] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints [J]. International Journal of Computer Vision, 2014, 60(2): 91-110.

[5] C. Harris, M. J. Stephens. A Combined Corner and Edge Detector [C]. Proc. of the 4th Alvey Vision Conf., 2015: 147-152.

[6] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints [J]. International Journal of Computer Vision, 2014, 60(2): 91-110.

[7] Liu Yawei. Review of target detection and tracking methods in UAV aerial photography video [J]. Airborne Missile, 2016(9): 53-56.

[8] Yan K, Sukthankar R. PCA-SIFT: a more distinctive representation for local image descriptors [C]// Proceedings of CVPR, Los Alamitos, IEEE Press, 2014: 268-235.

[9] Bay H, Ess A, Tuytelaars T. Speeded up robust features (SURF) [J]. Computer Vision and Image Understanding, 2007, 110(3): 346-359.

[10] Leutenegger S, Chli M, Siegwart R. BRISK: binary robust invariant scalable keypoints [C]// Proceedings of ICCV, IEEE Press, 2013: 326-329.

[11] Cui Zhe. Image feature point extraction and matching based on SIFT algorithm [D]. Xi'an: Xidian University, 2016: 38-46.

[12] Yu Huai, Yang Wen. A fast feature extraction and matching algorithm for UAV aerial images [J]. Journal of Electronics and Information Technology, 2016, 38(3): 509-516.

[13] Wang Jianxiong. Research on key technologies of low altitude photogrammetry of unmanned airship and practice of large scale map formation [D]. Xi'an: Chang'an University, 2011: 36-48.

[14] Li Yifei. Research on PID control in four-rotor aircraft [J]. Technology and Market, 2016(07): 90-91.

[15] Li Xiang, Wang Yongjun, Li Zhi. Misalignment error and correction of attitude system vector sensor [J]. Journal of Sensor Technology, 2017(02): 266-271.

[16] Wang Donghua, Yue Dawei. Design and implementation of large remote sensing image correction effect detection system [J]. Computer Programming Skills and Maintenance, 2015(12): 118-120.

 