The results of the state vectors computed by the tracking algorithm (block-matching and Camshift) and the Kalman filter.

## Abstract

The Kalman filter has long been regarded as the optimal solution to many computer vision problems, such as object tracking, prediction and correction tasks. Its use in the analysis of visual motion is well documented, and it is applied in computer vision and OpenCV-based systems for robotics, military imaging and video, medical applications, and public security and privacy, among others. In this paper, we investigate a Matlab implementation of a Kalman filter combined with three algorithms for tracking and detecting objects in video sequences: block-matching (motion estimation), and Camshift and Meanshift (object localization, detection and tracking). The Kalman filter is presented in three steps: prediction, estimation (correction) and update. The first step predicts the parameters of the tracked and detected objects. The second step corrects and estimates the predicted parameters. The main application of the Kalman filter considered here is the localization and tracking of single and multiple objects, for which results are given. This work presents the extension of an integrated modeling and simulation tool for object tracking and detection in computer vision, described through different algorithm models in implementation systems.

### Keywords

- Kalman filter
- tracking objects
- detection objects
- localization
- video and image processing
- computer vision
- embedded system

## 1. Introduction

Computer vision is, from the point of view of technological evolution, one of the most useful disciplines today. It lies at the border of computer science, mathematics, physics, neuroscience and various other disciplines, and aims to address the specific issues of image and video analysis from

Tracking is the estimation of the location of an object in each image of a video sequence, where the camera and/or the object (face, person, hand, animal, etc.) may be in motion at the same time. The localization process is based on recognizing the object of interest from a set of visual characteristics (color, shape, speed, etc.). Specifically, the purpose of an object tracking method is to estimate, in each image of the sequence, the features used to track the object or objects present in the camera's field of view, such as motion, color, corners, outline, shape, and object view. In object tracking, the class, appearance, scale, and/or location of the tracking region are predicted based on the forward images and on the underlying model for state transitions. The state of the object is generally represented by its location and its speed.

There are three main stages in the analysis of a video sequence. The first stage detects the moving objects. The second tracks these objects from one image to the next. Finally, the tracks are analyzed to recognize the objects' behavior. Many different techniques for tracking objects have been proposed. Detecting events and moving objects in complex scenes is difficult because of camera noise and changing lighting conditions. Each limitation must be overcome in order to avoid failure of the tracking algorithm. In an object tracking algorithm, there are generally four steps: detection, location, association, and trajectory estimation [1, 2, 3]. The algorithms considered here are composed of three important modules: block matching and Meanshift, Camshift, and the Kalman filter.

The Kalman filter is used in a wide range of technological fields. It is a major theme in automation and in frame and signal processing. The Kalman filter (KF) is a set of mathematical equations that provides an efficient (recursive) means of estimating the state of a process. The KF is very powerful in several respects: it supports estimates of past, present and even future states, and it can do so even when the exact nature of the modeled system is unknown. The Kalman filter is a predictor-corrector filter. In an object tracking system, the filter observes an object as it moves, that is, it takes information on the state of the object at a given moment and uses this information to predict where the object will be in the next frame. For this, it takes as input a measurement vector (position in x and y, width and height of the object).

When tracking a moving object, the Kalman filter allows us to estimate the motion states of the object, and therefore to predetermine the areas of motion in the following frames using the combination of three algorithms (block-matching, Camshift and Kalman filter), which adds robustness to object tracking. Many authors have studied the Kalman filter in object tracking [1, 2]. In this work, we optimized several criteria in image and video processing applications, for example execution time, quality and performance of the image and video processing, and artifacts and noise in a frame. The data flow for the Kalman filter is presented in Figure 1.

## 2. Different methods of modeling an object

In a tracking scenario, an object can be defined as anything that is of interest for further analysis, for example people walking on a road, boats on the sea, fish inside an aquarium, airplanes in the air, cars on the road, or a moving hand or face. It is a collection of objects that may be important to track in a specific area or environment. Implementing an object tracking system involves designing two main parts: the object representation and the object localization. The localization step is based on the representation model of the object and on its location in the previous frame. The representation associates the tracked object with shape and/or appearance characteristics that allow it to be recognized in successive frames. In recent studies, representations by shape and appearance are classified into three families: representation by point clouds, representation by bounding boxes (geometric shapes) and representation by silhouettes. In what follows, we describe these methods, as illustrated in Figure 2 [4, 5, 6].

What representation is appropriate for tracking objects?

What algorithm should be used?

How is the movement, appearance and shape of the object modeled?

Tracking events and detecting moving objects in complex scenes is difficult because of camera noise and changing lighting conditions. Each limitation must be overcome in order to avoid failure of the tracking algorithm. In an object tracking algorithm, there are generally four steps: object detection, location, association, and trajectory estimation. In this master's work [1, 2, 7], we study the different methods of representing objects in a video sequence.

### 2.1 Camshift algorithm

Camshift is an algorithm for tracking objects (people, vehicles) in real time. It is based on the colors present in the video sequence and builds on the mean shift algorithm (Meanshift), which iterates until convergence. Camshift takes the HSV color space as a model with the color tone component

#### 2.1.1 Calculation of Camshift parameters

At each iteration of the Camshift algorithm, the object search window is resized. To find the new size, the search window obtained by the mean shift algorithm is slightly enlarged to include the object. The window parameters, such as the width, the length and the center of gravity, must then be adapted. The term

We calculate the second-order moments using Eqs. (4), (5) and (6); the orientation is calculated by Eq. (7), and the length and width of the object search window by Eqs. (11) and (12).
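As an illustration of how these quantities can be obtained, the following minimal Python sketch computes the center of gravity, orientation, length and width of a weighted distribution inside a search window from its zeroth-, first- and second-order moments. It assumes the standard moment-based formulation commonly used with Camshift, which may differ in detail from Eqs. (4) to (12) of this chapter. The function name `camshift_window_parameters` and the `prob[y][x]` layout of the weight map are hypothetical choices made for this example.

```python
import math


def camshift_window_parameters(prob, x0, y0, width, height):
    """Centroid, orientation and size of the distribution inside a window.

    prob is a 2-D list of non-negative weights (for example a hue
    back-projection), indexed as prob[y][x]. The window starts at
    (x0, y0) and spans width columns and height rows.
    """
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            p = prob[y][x]
            m00 += p            # zeroth moment (total weight)
            m10 += x * p        # first moments
            m01 += y * p
            m20 += x * x * p    # second moments
            m02 += y * y * p
            m11 += x * y * p
    if m00 == 0.0:
        raise ValueError("empty window: zeroth moment is zero")

    # Center of gravity of the window.
    xc, yc = m10 / m00, m01 / m00

    # Central second-order terms (assumed standard formulation).
    a = m20 / m00 - xc * xc
    b = 2.0 * (m11 / m00 - xc * yc)
    c = m02 / m00 - yc * yc

    # Orientation of the major axis and spread along both axes.
    theta = 0.5 * math.atan2(b, a - c)
    root = math.sqrt(b * b + (a - c) ** 2)
    length = math.sqrt(max((a + c + root) / 2.0, 0.0))
    width_out = math.sqrt(max((a + c - root) / 2.0, 0.0))
    return xc, yc, theta, length, width_out


# Usage: a horizontal bar of five pixels in a 9 x 5 weight map.
bar = [[0.0] * 9 for _ in range(5)]
for col in range(2, 7):
    bar[2][col] = 1.0
print(camshift_window_parameters(bar, 0, 0, 9, 5))
```

For the horizontal bar, the centroid falls in the middle of the bar, the orientation is zero and the width collapses to zero, which is the expected behavior for a one-pixel-thick shape.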

### 2.2 Kalman filter

#### 2.2.1 Definition

The Kalman filter is a set of mathematical equations that provides an efficient (recursive) means of estimating the state of a process in a way that minimizes the mean squared error. The filter is very powerful in several respects: it supports estimates of past, present and even future states, and it can do so even when the exact nature of the modeled system is unknown. In tracking, the filter makes it possible to correct and restrict the areas in which we search for movement in the next step. The quadratic error is given by Eq. (13).

#### 2.2.2 Role of the Kalman filter in the tracking application

The Kalman filter is used in a wide range of technological fields. In the tracking and detection process, the filter observes an object as it moves, that is, it takes information on the state of the object at a given moment. It then uses this information to predict where the object will be in the next frame. For this, it takes as input a measurement vector (position in x and y, width and height of the object). It then acts on so-called internal parameters (position, speed and acceleration in x and y, as well as the height and the width) to make a prediction and then an estimate of these. Finally, the result is an estimate of the next measurement. When tracking a moving object, the Kalman filter thus allows us to estimate the motion states of the object. Many authors have studied the Kalman filter in object tracking [1, 8, 9]; the present work differs from earlier works in the type of objects and the tracking method.
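To make the relation between the internal parameters and the measurement vector concrete, the sketch below propagates a bounding-box state by one frame. It assumes a constant-acceleration motion model and a window size that does not change between frames; both are illustrative assumptions rather than the model used in this chapter. The dictionary keys and the function names `predict_next_box` and `measurement_from_state` are hypothetical. The starting position and size are taken from frame 2 of the results table, while the speeds and accelerations are made-up values.

```python
def predict_next_box(state, dt=1.0):
    """Propagate a bounding-box state by one frame.

    state holds position, speed and acceleration in x and y, plus the
    width and height of the box. Constant acceleration and a constant
    box size are assumed over the time step dt.
    """
    return {
        "x": state["x"] + state["vx"] * dt + 0.5 * state["ax"] * dt * dt,
        "y": state["y"] + state["vy"] * dt + 0.5 * state["ay"] * dt * dt,
        "vx": state["vx"] + state["ax"] * dt,
        "vy": state["vy"] + state["ay"] * dt,
        "ax": state["ax"],
        "ay": state["ay"],
        "w": state["w"],
        "h": state["h"],
    }


def measurement_from_state(state):
    """Extract the measurement vector (x, y, width, height) from a state."""
    return (state["x"], state["y"], state["w"], state["h"])


# Usage: position and size from frame 2, hypothetical speed and acceleration.
current = {"x": 52.0, "y": 63.0, "vx": 1.0, "vy": 2.0,
           "ax": 0.0, "ay": -1.0, "w": 58.0, "h": 71.0}
predicted = predict_next_box(current)
print(measurement_from_state(predicted))
```

The predicted measurement is what the filter would compare with the window actually returned by the tracking algorithm in the next frame.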

#### 2.2.3 Formulation and modeling of the Kalman filter

The main objective of the Kalman filter is to estimate the state vector of a discrete-time process. This process is governed by the linear stochastic difference equation (14):

With a measurement vector which has the following form (15):

The Matrix

where

### 2.3 The origins of filter formulation

We define

The covariance of the a priori estimation error is illustrated by Eq. (21):

For computing statistics for the Kalman filter, we start with the goal of finding an equation that computes an a posteriori state estimate as a linear combination of an a priori estimate and a weighted difference between an actual measure

The weighting applied to this difference is chosen to be the gain, or mixing factor, that minimizes the a posteriori error covariance. The gain of the Kalman filter has the following form (23):

The use of a Kalman filter then allows us to estimate the parameters for tracking objects. However, the Kalman filter does not by itself extract the moving element or these parameters from the frame. We therefore first propose a method for detecting the moving object in the frame or video sequence. Using a robust, reliable and precise tracking algorithm greatly helped us to extract the measurement and state vectors used to initialize the Kalman filter (the inputs of the Kalman filter).
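The role of the gain can be illustrated in the scalar case. The sketch below applies the textbook correction step, in which the a posteriori estimate is the a priori estimate plus the gain times the difference between the measurement and its prediction. The names `x_prior`, `p_prior`, `h` and `r` (a priori estimate, a priori error variance, measurement coefficient and measurement noise variance) are assumptions made for this example and are not the chapter's notation.

```python
def kalman_correct(x_prior, p_prior, z, h=1.0, r=1.0):
    """Scalar Kalman correction step.

    Returns the a posteriori estimate, its error variance and the gain.
    """
    s = h * p_prior * h + r          # variance of the measurement residual
    if s == 0.0:
        raise ValueError("residual variance is zero: gain is undefined")
    gain = p_prior * h / s           # gain minimizing the posterior variance
    residual = z - h * x_prior       # measurement minus predicted measurement
    x_post = x_prior + gain * residual
    p_post = (1.0 - gain * h) * p_prior
    return x_post, p_post, gain


# Equal trust in prediction and measurement: the estimate lands halfway.
print(kalman_correct(50.0, 4.0, 52.0, r=4.0))   # (51.0, 2.0, 0.5)
# Noise-free measurement: the estimate follows the measurement.
print(kalman_correct(50.0, 4.0, 52.0, r=0.0))   # (52.0, 0.0, 1.0)
```

As the measurement noise variance tends to zero the gain weights the measurement more heavily, and as the a priori error variance tends to zero the gain weights the prediction more heavily.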

## 3. Different functions of the Kalman filter

The Kalman filter is an optimal recursive estimator for the discrete-data linear filtering problem. This filter has two necessary modules: a prediction module and a correction (or estimation) module. The Kalman filter thus estimates the position of the object by reaching a compromise between the position observed in the frame and the predicted position. The input parameters of the Kalman filter are the position of the object in the frame at time “k”, the size of the object, and the width and length of the object search window, which vary because the object moves during the sequence. These parameters form the state vector and the measurement vector of the Kalman filter. Based on the works studied in the literature [1, 4, 5], we chose the Kalman filter for estimating the tracking parameters. Figure 4 shows the Kalman filter cycle [8, 9, 10]. In general, estimating the tracking parameters with a Kalman filter requires the following steps:

- The measurement, which takes the tracking parameters computed by the Camshift algorithm.
- The estimate, which updates the position of the object.
- The prediction, which computes the position of the object in the next frame.
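The three steps above can be sketched for a single coordinate as a loop over the measurements. This is a minimal illustration that assumes a constant-velocity model with state (position, speed), a position-only measurement, and hand-picked noise variances `q` and `r` and initial covariance; none of these values come from the chapter, whose state vector also contains the window size and the center of gravity. The function name `track_1d` is hypothetical.

```python
def track_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Measure, estimate and predict one coordinate over a sequence.

    Returns the corrected (position, speed) per frame and the position
    predicted for the following frame.
    """
    pos, vel = measurements[0], 0.0
    # Error covariance [[p00, p01], [p10, p11]], large initial uncertainty.
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0
    estimates, predictions = [], []

    for z in measurements:                      # measurement step
        # Estimate: correct the prior with the measured position.
        s = p00 + r
        k0, k1 = p00 / s, p10 / s               # gain for position and speed
        residual = z - pos
        pos, vel = pos + k0 * residual, vel + k1 * residual
        p00, p01, p10, p11 = ((1.0 - k0) * p00, (1.0 - k0) * p01,
                              p10 - k1 * p00, p11 - k1 * p01)
        estimates.append((pos, vel))

        # Predict: propagate state and covariance to the next frame.
        pos = pos + dt * vel
        p00, p01, p10, p11 = (p00 + dt * (p01 + p10) + dt * dt * p11 + q,
                              p01 + dt * p11,
                              p10 + dt * p11,
                              p11 + q)
        predictions.append(pos)
    return estimates, predictions


# Usage: an object moving two pixels per frame (synthetic measurements).
zs = [10.0 + 2.0 * k for k in range(50)]
est, pred = track_1d(zs)
print(est[-1], pred[-1])
```

On this synthetic, noise-free sequence the estimated speed approaches two pixels per frame, and the predicted position for the next frame approaches the true one. In a tracker, that prediction would be used to place the search window before the next measurement arrives.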

The variable parameters of the Kalman filter are the state vector and the measurement vector:

The state vector is composed by the initial position, the width and the length of the search window as well as the center of gravity of the object

The measurement vector of the Kalman filter is composed of the initial position, the length and the width of the search window for the object at time

### 3.1 Process to estimate

The Kalman filter estimates the state “s” of a discrete process; this state is modeled by the linear Eq. (26):

With “A” (27) is the transition matrix,

The measurement model is defined by the Eq. (28).

With

The two vectors

The noise process is of the following form (32):

The measurement noise is presented by the dimension matrix

So the noise and measurement process covariances are deduced from

### 3.2 The update equations

Finally, the output equations for the two prediction and correction blocks of the Kalman filter are:

With

NFr | x | y | W | L | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 52 | 63 | 58 | 71 | 87.35 | 91.50 | 69 | 53 | 53 | 63 | 87.19 | 91.49 |
10 | 55 | 67 | 58 | 71 | 90.62 | 96.41 | 69 | 55 | 55 | 67 | 89.80 | 95.68 |
15 | 55 | 68 | 58 | 71 | 90.05 | 96.93 | 55 | 69 | 55 | 68 | 89.58 | 96.81 |
25 | 54 | 65 | 58 | 71 | 89.52 | 94.20 | 56 | 65 | 69 | 56 | 90.04 | 93.27 |
35 | 54 | 55 | 58 | 71 | 89.18 | 83.92 | 55 | 55 | 69 | 55 | 89.28 | 83.72 |
45 | 48 | 56 | 58 | 71 | 83.45 | 85.26 | 49 | 57 | 69 | 49 | 83.36 | 85.37 |
55 | 46 | 63 | 58 | 71 | 81.56 | 92.43 | 46 | 64 | 69 | 46 | 80.97 | 92.44 |
65 | 43 | 65 | 58 | 71 | 78.52 | 93.53 | 44 | 69 | 69 | 44 | 78.49 | 93.37 |
75 | 43 | 63 | 58 | 71 | 78.73 | 92.46 | 43 | 64 | 69 | 43 | 77.98 | 92.59 |
85 | 36 | 57 | 58 | 71 | 71.41 | 85.59 | 37 | 57 | 69 | 37 | 71.03 | 85.34 |
99 | 38 | 62 | 58 | 71 | 73.80 | 91.03 | 39 | 63 | 69 | 39 | 73.91 | 91.26 |

The expression for the estimation error is presented by Eq. (41). We applied this Eq. (41) for all the parameters of the state vectors (x, y, W, L,

The tracking-parameter estimates obtained with our algorithm (Camshift and Kalman filter) on the different test sequences are given in Table 2.

Video sequence | Number of frames | Histogram calculation execution time | Background subtraction execution time | Skin color detection execution time |
---|---|---|---|---|
Foreman | 68 | 16.51 (s) | 5.21 (s) | |
Redcup | 68 | 19.21 (s) | | |
Afef | 68 | 17.02 (s) | 10.58 (s) | |
PETS2001(1) | 68 | 43.54 (s) | 36.56 (s) | |
PETS2001(2) | 68 | 47.83 (s) | 43.29 (s) | |

We can see that the two pre-processing methods (background subtraction and skin color detection) are the fastest, and that the histogram calculation method is the slowest.

## 4. Results for object tracking and detection

The fundamental basis for estimating the parameters of tracking by the Kalman filter consists in estimating the state vector

Another correction of the trajectory of a car (in two video sequences) is shown in Figure 6.

## 5. Results discussion

From the test sequences generated with the different pre-processing methods, we can conclude that object tracking differs from one object to another (a human being, a face, a hand, a glass, a car) and that several parameters can influence the tracking result.

The experimental results obtained indicate that our algorithm (Camshift and the Kalman filter) gives superior results in terms of precision, reliability and execution time compared with the various methods presented in the literature, for example the KLT (Kanade-Lucas-Tomasi) algorithm and classifier-based algorithms (Adaboost and SVM) [1, 2, 3, 7]. In particular, several preprocessing methods are used to detect the object in each frame of the sequence. The implemented algorithms are the Meanshift algorithm and block-matching, the Camshift algorithm and the Kalman filter; the combination of these algorithms gives a robust, precise, reliable and fast algorithm.

Evaluating the performance of a mobile object tracking system in a video sequence is a complex task which requires the definition of metrics bringing into play concepts specific to video analysis, such as time persistence, precision and execution time for example.

## 6. Conclusions

The detection and tracking of objects in a sequence of images or video is a current need for several applications such as video conferencing, video indexing and especially video surveillance. Computer vision with a Human Interface Machine “HIM” is therefore an issue actively studied in many domains, especially since the prices of acquisition and processing equipment have become more attractive. This is an area that touches on everything, starting from the problems of acquisition with their different linked effects, and where the originality of simple ideas can still contribute a great deal. In this chapter, we introduced the Kalman filter algorithm for tracking and detecting single and multiple objects. Localization, target tracking, and object detection were provided as examples to give the reader a better understanding of the practical usage of Kalman filters. We implemented the different modules of the object tracking algorithm, with the tracking parameters estimated by a Kalman filter.

The results obtained make it possible to meet the monitoring requirements of several video surveillance applications. On the one hand, the localization precision achieved by our system makes it a standard module for detection, identification or object tracking systems. On the other hand, a flow at a frequency of 20 frames per second was considered, which is reasonable for an object tracking system with a minimum execution time. The tracking algorithm with its different modules must still be tested with other video sequences. Although the implementation of monitoring systems has certain weaknesses, our method has given promising results.

Many avenues can be envisaged to continue this work. First, we tested the implemented algorithm for tracking two objects (a car and a pedestrian in the sequence “PETS 2001 (1)” and two cars in the sequence “PETS 2001 (2)”), and it can be applied to tracking multiple objects in a video sequence. Second, a detection algorithm based on Adaboost classifiers can be used upstream of the tracking algorithm (Camshift and Kalman filter). The association of these two modules, based on a cascade of Adaboost classifiers, improves the calculation time and the quality of tracking of one or more objects in a sequence of images or video. Third, the detection and tracking system for faces and other objects (pedestrians, cars, hand gestures, glass, etc.) can be validated on an FPGA target platform (Saber-Lite with ARM-Cortex-A9MP). Our solution optimizes the execution time and other criteria in frame and video processing. In the future, we intend to evaluate the method extensively and quantitatively so that it is well tested before being applied in computer vision practice.