key: cord-0047186-kzuhfx7f authors: Liu, Rongrong; Ruichek, Yassine; El Bagdouri, Mohammed title: Multispectral Dynamic Codebook and Fusion Strategy for Moving Objects Detection date: 2020-06-05 journal: Image and Signal Processing DOI: 10.1007/978-3-030-51935-3_4 sha: 96843dbfa8df665c2f5ae77fd21d15ad9e42f80a doc_id: 47186 cord_uid: kzuhfx7f
The Codebook model is one of the popular real-time models for background subtraction to detect moving objects. In this paper, we propose two techniques to adapt the original Codebook algorithm to multispectral images: a dynamic mechanism and a fusion strategy. For each channel, only the absolute spectral value is used to calculate the spectral similarity between the current frame pixel and the reference average value in the matching process, which simplifies the matching equations. In addition, the decision boundaries are derived from statistical information extracted from the data and continuously adjust themselves to scene changes. Results demonstrate that, with the proposed techniques, we achieve accuracy comparable to that of other methods evaluated on the same multispectral dataset for background subtraction.
Moving object detection is one of the most commonly encountered low-level tasks in computer vision and a prerequisite for many intelligent video processing applications, such as automated video surveillance [1], object tracking [2, 3] and parking lot management [4], to name just a few. A common and widely used approach for detecting moving objects from a stationary camera is background subtraction. As the name suggests, it is the process of automatically generating a binary mask that classifies the set of pixels into foreground and background, where the moving objects are called the foreground and the static information is called the background. Thus, background subtraction is sometimes also known as foreground detection or foreground-background segmentation [5].
The last decade has witnessed a large number of publications on background subtraction. A quick search for "background subtraction" on IEEE Xplore returns over 2370 publications in the last ten years (2009-2019). Among these, the Gaussian Mixture Model (GMM) [6] models every pixel with a mixture of k Gaussians to handle multiple backgrounds. To deal with cluttered and fast-varying scenes, a non-parametric technique known as Kernel Density Estimation (KDE) [7] was also proposed. Other methods process images not only at the pixel level, but also at the region or even frame level [8]. Moreover, Kim et al. [9] proposed the Codebook algorithm, which summarizes each background pixel by one or more codewords to cope with illumination changes. In our work, the Codebook approach has been chosen as the base modeling framework for its simplicity and efficiency. According to [10], most methods are based on color images, namely Red-Green-Blue (RGB). In recent years, thanks to technological advances in video capture, multispectral imaging has become increasingly accepted and used by the computer vision and robotics communities. In particular, Benezeth et al. [11] present a publicly available collection of multispectral video sequences called the multispectral video sequences (MVS) dataset, the first such dataset available to the research community for background subtraction. In this paper, we utilize the different channels of multispectral images from a new perspective and propose a novel background subtraction framework motivated by the Codebook algorithm. The goal is to simplify the similarity calculation between the current frame pixel and the reference average value in the matching process. The remainder of this paper is organized as follows. The proposed methods are presented in detail in Sect. 2. Section 3 presents the experiments and comparisons with other methods on the MVS dataset. Final conclusions and future work are given in Sect. 4.
The original Codebook background modeling algorithm was proposed in 2005 by Kim [9] to construct a background model from long observation sequences. It is essentially a non-statistical clustering scheme with several important additional elements that make it robust against moving backgrounds. The motivation for adopting such a model is that it is fast to run because it is deterministic, efficient because it requires little memory, adaptive, and able to handle complex backgrounds with sensitivity [12]. It has been shown to be very efficient in dealing with dynamic backgrounds. During the last decade, many works have been dedicated to improving the original Codebook model [13, 14]. For example, [15] adopted a multi-scale multi-feature codebook-based background subtraction targeting challenging environments. In [16], the background model is constructed by encoding each pixel into a codebook consisting of codewords based on a box model, which is also applicable in the Hue-Saturation-Value (HSV) color space. In addition, [17] rewrote the model parameters and then processed the three channels separately to simplify the matching equations. In [18], a dynamic codebook boundary in the Lab color space was developed. As a parametric method, the original Codebook requires parameter tuning to find appropriate values for every scene. In this paper, we utilize a self-adaptive method motivated by [19] to select the optimal parameters automatically. To achieve this, some statistical information needs to be calculated iteratively and recorded for each codeword throughout the whole process. At the beginning, the codebook is an empty set and the number of codewords is set to 0. When the first multispectral frame arrives, the Codebook model is initialized by constructing an associated codeword for each pixel.
The corresponding vector $v_m = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n)$ holds the average spectral value of each channel and is initialized with the spectral values of the pixel in the first frame, where n is the number of channels. The auxiliary information is defined as a four-tuple $aux_m = \langle f_m, \lambda_m, p_m, q_m \rangle$, where $f_m$ is the frequency of access, i.e., the number of times the codeword has been matched, $\lambda_m$ is the maximum length of time between consecutive accesses, and $p_m$ and $q_m$ are the first and last access times of the codeword, respectively. These values are initialized when the codeword is created. In addition to the vector $v_m$ and the auxiliary tuple $aux_m$, we record a third vector, $S_m = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$, which holds the variance $\sigma_i^2$ of each separate spectral band and is also initialized at this stage. Furthermore, two more vectors, $B_{min}$ and $B_{max}$, record the minimum and maximum values of each channel; they are initialized with the spectral values of the first frame. For an input pixel at time instant t with current value $x_t = (X_1, X_2, \ldots, X_n)$, a matching codeword is found if $B_{low,i} \le X_i \le B_{high,i}$ is satisfied for each channel i, where $B_{low}$ and $B_{high}$ denote the lower and upper boundaries, respectively. These boundaries are computed from the statistics of the current codeword, in which $\sigma_i$ is the standard deviation of the i-th band value, whose square is the corresponding i-th element of $S_m$. Thus, throughout the whole process, the boundaries are obtained from the data themselves and no manual parameter tuning is required, which makes the proposed model more practical.
Multispectral Dynamic Codebook Model Updating. When a new frame arrives, the matching criterion is first evaluated. If a match is found, the corresponding codeword is updated with the information of the current pixel; otherwise, a new codeword is created, as illustrated in Algorithm 1, where N is the number of frames.
5: Find the matching codeword to $x_t$ in C if the following condition is satisfied for each channel: $B_{low,i} \le X_i \le B_{high,i}$
6: if $C = \emptyset$ or there is no match then
7: $L \leftarrow L + 1$, create a new codeword $c_L$
8: $v_0 = x_t$
After the Codebook model has been constructed, moving objects are detected by running the matching process. A pixel is detected as foreground if no acceptable matching codeword exists; otherwise, it is classified as background and the corresponding codeword is updated according to the updating strategy.
A further step toward exploiting the benefits of each spectral channel of multispectral images is to fuse the detection results of the monochromatic channels. The idea is straightforward. We first apply the multispectral dynamic Codebook described in the previous subsection to every channel separately, obtaining seven independent binary foreground-background masks. The detection results of the monochromatic channels in a band combination (3, 4, 5, 6 or 7 bands) are then fused via union, vote or intersection to obtain the final foreground-background segmentation. The workflow of the multispectral self-adaptive fusion strategy is shown in Fig. 1.
Fig. 1. Workflow of the fusion strategy for the multispectral self-adaptive Codebook.
To evaluate the performance of the proposed approaches for background subtraction, the MVS dataset presented by Benezeth et al. [11] is used for testing in this section. The proposed approaches are also compared with other methods evaluated on the same dataset [5]. Both visual and numerical results are reported. The MVS dataset contains five challenging video sequences, each with seven multispectral channels or bands (six visible bands and one near-infrared band) captured simultaneously. All sequences are publicly available, and the ground-truth images were annotated manually. Note that the first scene is indoor, while the other four are outdoor.
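The construction and detection steps above can be sketched for a single pixel in Python. This is a minimal, hypothetical sketch, not the authors' implementation: the codeword fields mirror $v_m$, $aux_m$, $S_m$, $B_{min}$ and $B_{max}$, the auxiliary tuple follows the creation and update rules of Kim et al.'s original Codebook [9], and the running mean and variance use Welford-style incremental updates. The exact boundary formula is not given in this text, so the boundaries are assumed to be $v_i \pm K\sigma_i$, and `K`, `INIT_VAR`, `MIN_SIGMA`, `train_pixel` and `classify_pixel` are illustrative names and values, not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical constants: the paper derives its boundaries from the data,
# but its exact formula is not reproduced here.
K = 2.5          # assumed band half-width, in standard deviations
INIT_VAR = 25.0  # assumed initial per-band variance of a new codeword
MIN_SIGMA = 2.0  # assumed floor so a band never collapses to zero width


@dataclass
class Codeword:
    v: List[float]      # per-band mean (v_m)
    f: int              # number of matches (f_m)
    lam: int            # longest gap between accesses (lambda_m)
    p: int              # first access time (p_m)
    q: int              # last access time (q_m)
    s: List[float]      # per-band variances (S_m)
    b_min: List[float]  # per-band minimum seen so far (B_min)
    b_max: List[float]  # per-band maximum seen so far (B_max)

    @classmethod
    def create(cls, x: Sequence[float], t: int) -> "Codeword":
        # aux set as in Kim et al.'s original Codebook: <1, t-1, t, t>
        return cls(list(x), 1, t - 1, t, t, [INIT_VAR] * len(x), list(x), list(x))

    def bounds(self, i: int) -> Tuple[float, float]:
        # Assumed boundary form: mean +/- K * sigma_i
        sigma = max(math.sqrt(self.s[i]), MIN_SIGMA)
        return self.v[i] - K * sigma, self.v[i] + K * sigma

    def matches(self, x: Sequence[float]) -> bool:
        # Every band is tested on its own value; no cross-band terms
        for i, xi in enumerate(x):
            low, high = self.bounds(i)
            if not low <= xi <= high:
                return False
        return True

    def update(self, x: Sequence[float], t: int) -> None:
        n = self.f
        for i, xi in enumerate(x):
            old = self.v[i]
            new = old + (xi - old) / (n + 1)
            # Welford-style running (population) variance
            self.s[i] += ((xi - old) * (xi - new) - self.s[i]) / (n + 1)
            self.v[i] = new
            self.b_min[i] = min(self.b_min[i], xi)
            self.b_max[i] = max(self.b_max[i], xi)
        self.f = n + 1
        self.lam = max(self.lam, t - self.q)  # uses the previous last-access time
        self.q = t


def train_pixel(codebook: List[Codeword], x: Sequence[float], t: int) -> None:
    """Construction step: update the first matching codeword, else add a new one."""
    for cw in codebook:
        if cw.matches(x):
            cw.update(x, t)
            return
    codebook.append(Codeword.create(x, t))


def classify_pixel(codebook: List[Codeword], x: Sequence[float], t: int) -> bool:
    """Detection step: True means foreground (no matching codeword)."""
    for cw in codebook:
        if cw.matches(x):
            cw.update(x, t)
            return False
    return True
```

Calling `train_pixel` over a pixel's value sequence builds its background model, and `classify_pixel` then reports foreground when no codeword passes every band-wise test. Each band is compared only against its own mean and spread, which reflects the per-channel matching described above.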
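The fusion of per-band detection results can be illustrated with a small pure-Python function that combines binary masks pixel by pixel. It is a sketch under stated assumptions: masks are nested lists with 1 for foreground, the name `fuse_masks` is hypothetical, and "vote" is taken here to mean a strict majority of the fused bands, since the text does not specify the voting threshold.

```python
from typing import List, Sequence

Mask = List[List[int]]  # 1 = foreground, 0 = background


def fuse_masks(masks: Sequence[Mask], rule: str = "vote") -> Mask:
    """Fuse per-band binary masks pixel by pixel with union, vote or intersection."""
    if not masks:
        raise ValueError("at least one mask is required")
    k = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    fused = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            votes = sum(m[r][c] for m in masks)  # number of bands flagging this pixel
            if rule == "union":
                fused[r][c] = int(votes >= 1)
            elif rule == "intersection":
                fused[r][c] = int(votes == k)
            elif rule == "vote":
                fused[r][c] = int(2 * votes > k)  # assumed: strict majority
            else:
                raise ValueError(f"unknown rule: {rule}")
    return fused
```

Union favors recall (any band can flag a pixel), intersection favors precision (all bands must agree), and voting sits between the two.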
The accuracy measure employed in these experiments is the F-measure, also known as the balanced F-score or F1 score, which reaches its best value at 1 and its worst at 0. It is the harmonic mean of precision and recall, $F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$. First, the experiments for the multispectral dynamic Codebook are conducted on the thirty-five different three-band combinations, thirty-five four-band combinations, twenty-one five-band combinations, seven six-band combinations and the single seven-band combination, for each of the five videos composing the MVS dataset. The largest F-measures are then selected and listed in Table 1. Accordingly, Table 2 shows the best F-measures obtained with the fusion strategy. The largest F-measure for each video is in bold. Figure 2 shows the visual results on the five video sequences. The top row shows the original frames and the second row the corresponding ground-truth frames provided with the dataset. The third and fourth rows are the results obtained by the proposed multispectral dynamic Codebook and fusion strategy, respectively. As Tables 1 and 2 show, the largest F-measure never appears when all seven channels are used, and four-channel combinations are the most likely to achieve the best performance. This agrees with the conclusion drawn from the Pooling method proposed by Benezeth et al. [11] that only a few channels actually define the moving objects better. The results of the proposed methods are further compared in Table 3 with other methods [5] evaluated on the same dataset, in which brightness (B), spectral distortion (SD) and spectral information divergence (SID) are adopted to calculate the spectral distance in the matching process. In terms of average F-measure over the whole dataset, our approaches produce results comparable to the best method in [5], which utilizes all three criteria (4th column in Table 3); this supports the effectiveness of our methods.
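A minimal sketch of the F-measure for one binary mask against its ground truth, with pixels flattened into sequences (1 = foreground). The function name `f_measure` is hypothetical, and returning 0 when no foreground pixel is correctly detected is a common convention assumed here.

```python
from typing import Sequence


def f_measure(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1 score of a binary foreground mask against ground truth (1 = foreground)."""
    tp = sum(1 for p, g in zip(pred, truth) if p and g)       # correctly detected foreground
    fp = sum(1 for p, g in zip(pred, truth) if p and not g)   # background flagged as foreground
    fn = sum(1 for p, g in zip(pred, truth) if not p and g)   # missed foreground
    if tp == 0:
        return 0.0  # assumed convention: F = 0 when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For `pred = [1, 1, 0, 0, 1]` and `truth = [1, 0, 0, 1, 1]`, there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and the F-measure is 2/3.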
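The band combinations evaluated above can be enumerated with `itertools.combinations`; the counts 35, 35, 21, 7 and 1 are the binomial coefficients $\binom{7}{k}$ for k = 3 to 7, giving 99 combinations per video. The helper `best_subset` is hypothetical: its `score` argument stands in for running a detector on the chosen bands and computing the F-measure, which is not implemented here.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

BANDS = tuple(range(1, 8))  # the seven MVS bands, indexed 1..7


def band_subsets(min_size: int = 3) -> Dict[int, List[Tuple[int, ...]]]:
    """All band combinations of each size from min_size up to all seven bands."""
    return {k: list(combinations(BANDS, k)) for k in range(min_size, len(BANDS) + 1)}


def best_subset(score: Callable[[Tuple[int, ...]], float]) -> Tuple[Tuple[int, ...], float]:
    """Return the combination with the largest score (e.g. F-measure) and that score."""
    candidates = [s for subsets in band_subsets().values() for s in subsets]
    best = max(candidates, key=score)
    return best, score(best)
```

For example, `{k: len(v) for k, v in band_subsets().items()}` evaluates to `{3: 35, 4: 35, 5: 21, 6: 7, 7: 1}`.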
As can be seen, the method adopting B and SD outperforms, on average, all the other mechanisms on the outdoor scenes, but achieves a less satisfactory result on the indoor video. The two techniques proposed in this paper can be regarded as compromise solutions. Another advantage of the proposed methods is the low complexity of the matching equations: the multispectral channels are processed separately using only the intensity value of each channel, so no correlation between channels needs to be considered or calculated in the matching process. Moreover, like the other compared methods, the two proposed algorithms can easily be adapted to any number of channels for background subtraction.
In this paper, we have proposed two techniques to adapt the original Codebook algorithm to multispectral images, a dynamic mechanism and a fusion strategy, both of which process the seven channels of multispectral images independently. For each channel, only the intensity value is used to calculate the spectral similarity between the current frame pixel and the reference one. Moreover, the decision thresholds are not set empirically in advance and fixed for the whole procedure; instead, they are obtained from statistical information extracted from the data themselves and continuously adjust to scene changes. The results demonstrate that we can achieve comparable accuracy with simpler matching equations than other techniques on the same public multispectral dataset. Our work may open a new direction for future work on applying multispectral images to moving object detection. Recently, deep learning-based methods have attracted great attention in the research community for their impressive performance in classification, semantic segmentation, localization, object detection and instance detection.
Motivated by the recent success of deep neural networks for foreground segmentation [20], our next step is to explore deep learning, in addition to traditional machine learning methods, to investigate how multispectral images can improve the performance of background subtraction.
[1] Review on moving object detection in video surveillance
[2] Online multi-object tracking combining optical flow and compressive tracking in Markov decision process
[3] Multi-object tracking with discriminant correlation filter based deep learning tracker
[4] Intelligent parking space detection system based on image processing
[5] Extended codebook with multispectral sequences for background subtraction
[6] Background modeling using mixture of Gaussians for foreground detection - a survey
[7] Non-parametric model for background subtraction
[8] Wallflower: principles and practice of background maintenance
[9] Real-time foreground-background segmentation using codebook model
[10] Traditional and recent approaches in background modeling for foreground detection: an overview
[11] Background subtraction with multispectral video sequences
[12] Hybrid cone-cylinder codebook model for foreground detection with shadow and highlight suppression
[13] Online codebook modeling based background subtraction with a moving camera
[14] Foreground-background segmentation based on codebook and edge detector
[15] Multi-scale multi-feature codebook-based background subtraction
[16] Box-based codebook model for real-time objects detection
[17] Moving object detection based on improved codebook model
[18] Dynamic codebook for foreground segmentation in a video
[19] A self-adaptive CodeBook (SACB) model for real-time background subtraction
[20] Learning multi-scale features for foreground segmentation