Average values of the mean error and number of measurements in the experiments.
This chapter presents various methods for object detection, localization and tracking that use a Wireless Sensor Network (WSN) comprising nodes endowed with low-cost cameras as main sensors. More concretely, it focuses on the integration of WSN nodes with low-cost micro cameras and describes localization and tracking methods based on Maximum Likelihood and Extended Information Filter. Finally, an entropy-based active perception technique that balances perception performance and energy consumption is proposed.
Target localization and tracking attract significant research and development efforts. Satellite-based positioning has proven to be useful and accurate in outdoor settings. However, in indoor scenarios and in GPS-denied environments localization is still an open challenge. A number of technologies have been applied including inertial navigation (Grewal et al., 2007), ultra-wideband (Gezici et al., 2005) or infrared light signals (Depenthal et al., 2009), among others.
In the last decade, the explosion of ubiquitous systems has motivated intense research in localization and tracking methods using Wireless Sensor Networks (WSN). A good number of methods have been developed based on Received Signal Strength Indicator (RSSI) (Zanca et al., 2008) and ultrasound time of flight (TOF) (Amundson et al., 2009). Localization based on Radio Frequency Identification (RFID) systems has been used in fields such as logistics and transportation (Nath et al., 2006), but the limited range between transmitter and reader restricts its potential applications. Note that all the aforementioned approaches require active collaboration from the object to be localized, typically by carrying a receiver, which imposes important limitations in some cases.
Also, recently, multi-camera systems have attracted increasing interest. Camera-based localization has high potential in a wide range of applications including security and safety in urban settings, search and rescue, and intelligent highways, among many others. In fact, the fusion of the measurements gathered from distributed cameras can reduce the uncertainty of the perception, allowing reliable detection, localization and tracking systems. Many efforts have been devoted to the development of cooperative perception strategies exploiting the complementarities among distributed static cameras at ground locations (Black & Ellis, 2006), among cameras mounted on mobile robotic platforms (Shaferman & Shima, 2008) and among static cameras and cameras onboard mobile robots (Grocholski et al., 2006).
In contrast to other techniques, camera-based Wireless Sensor Networks, comprised of distributed WSN nodes endowed with a camera as main sensor, require no collaboration from the object being tracked. At the same time, they profit from the communication infrastructure, robustness to failures, and re-configurability properties provided by Wireless Sensor Networks.
This chapter describes various sensor fusion approaches for detection, localization and tracking of mobile objects using a camera-based Wireless Sensor Network. The main advantages of using WSN multi-camera localization and tracking are: 1) they exploit the distributed sensing capabilities of the WSN; 2) they benefit from the parallel computing capabilities of the distributed nodes; 3) they employ the communication infrastructure of the WSN to overcome multi-camera network issues. Also, camera-based WSN have easier deployment and higher re-configurability than traditional camera networks making them particularly interesting in applications such as security and search and rescue, where pre-existing infrastructure might be damaged.
This chapter is structured as follows:
Section 2 includes a brief introduction to Wireless Sensor Networks and describes the basic scheme adopted for the camera-based WSN.
Section 3 presents a basic data fusion method based on the Maximum Likelihood approach. The method performs poorly when WSN messages are lost, which is not infrequent in some applications.
Section 4 proposes a data fusion method based on the Extended Information Filter. This method performs well at moderate computational cost.
Section 5 summarizes an entropy-based active perception technique that dynamically balances perception performance and use of resources.
Section 6 describes implementation details and presents some experimental results.
Finally, Section 7 is devoted to the final discussions and conclusions.
2. Camera-based WSN
2.1. Brief introduction to wireless sensor networks
A Wireless Sensor Network (WSN) consists of a large number of spatially distributed devices (nodes) with sensing, data storage, computing and wireless communication capabilities. Small size, low cost and particularly low power consumption are three of the key issues of WSN technology. Nodes are designed to operate with minimal hardware and software requirements, see the basic scheme of the main modules in Fig. 1 (left). They often use 8 or 16-bit microcontrollers at low processing rates and a limited RAM capacity for data storage. Nodes often require a few milliwatts for operation. Most nodes can be set in a standby state, from which they wake up occasionally, for instance when one sensor detects an event. Their radio transceivers are also very energy efficient and their transmission range is typically less than 100 m in the open air. Besides, their bandwidth is often low.
In contrast to the simplicity of each node, the main strength of WSN lies in the cooperation of a number of nodes to perform tasks. In fact, a good number of algorithms have been developed to provide them with significant flexibility, scalability, tolerance to failures and self-reconfiguration. WSN are typically organized in tree-like channels between data sources (nodes) and data sinks (WSN base), see Fig. 1 (right). Despite this apparent simplicity, algorithms for network formation and information routing have been intensively researched with the objective of optimizing the energy consumption, the communication delays expressed as the number of hops, the dynamic reconfiguration of the network or its robustness to failures. A survey on WSN routing protocols can be found in (Al-Karaki & Kamal, 2004).
The nodes of the WSN can be equipped with a growing variety of sensors including light intensity sensors, optical barriers, presence sensors, gas sensors and GPS. These features together with battery operation facilitate their deployment with minimal invasion and low installation and maintenance costs. The standardization of communication protocols, such as IEEE 802.15.4, has helped to extend their range of possible applications. WSN have already been applied to building control (Sandhu et al., 2004), environmental monitoring (Polastre et al., 2004) and manufacturing automation (Hanssmann et al., 2008), among others (Akyildiz et al., 2002).
2.2. Camera-based WSN
A camera-based WSN uses cameras as main sensors of the distributed WSN nodes. Although they benefit from robust and reconfigurable WSN communication, camera-based WSN must face the main constraints of WSN technology, i.e. limited computational and data storage capacity and low communication bandwidth. Thus, centralized schemes in which all the images are processed by one node are not suitable for camera-based WSN. One proposed solution is to transmit the images gathered by the cameras through the WSN (Wark et al., 2007). However, this approach scales poorly in terms of bandwidth, which is critical in problems that require images of a certain resolution at a certain rate. Also, additional constraints arise in centralized schemes when considering the computational and memory capacity required to process the images from all the cameras and fuse their results in only one WSN node. Lack of robustness to failures in the central node is another important drawback.
In our case a distributed scheme is adopted: the images captured by each camera are processed locally at each node. Camera nodes have sufficient computational capabilities to execute efficient image-processing algorithms in order to extract from the images the required information, for instance the location of an object on the image plane. Hence, only distilled bandwidth-reduced information from each camera node is transmitted through the WSN. Then, a node receiving the measurements from all the camera nodes can perform data fusion algorithms to determine the location of the object in real-world coordinates. This scheme reduces drastically the bandwidth requirements and distributes the overall computational burden among the nodes of the WSN.
In the adopted scheme, each camera node applies image-processing segmentation techniques to identify and locate the object of interest on the image plane. Thus, the possibility of flexibly programming image processing algorithms is a strong requirement. The selected camera board is the
We implemented a robust algorithm based on a combination of color and motion segmentations capable of being efficiently executed with limited computational and memory resources. The result of the segmentation algorithm is a rectangular region on the image plane characterized by the coordinates of its central pixel, its width and height.
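As an illustration only, the motion part of such a segmentation can be sketched as differencing against a reference image followed by 8-neighbour region growing, the steps detailed in Section 6.1.1. This is a minimal sketch, not the implementation running on the camera nodes: the function name `segment_moving_object`, the grey-level inputs and the `threshold` value are assumptions, and the color-based step is omitted.

```python
from collections import deque


def segment_moving_object(image, reference, threshold=30):
    """Return (cx, cy, width, height) of the largest moving region, or None.

    image, reference: equally sized 2D lists of grey levels (assumed inputs).
    threshold: assumed grey-level difference above which a pixel is "moving".
    """
    rows, cols = len(image), len(image[0])
    # Motion mask: pixels that differ enough from the reference image.
    moving = [[abs(image[r][c] - reference[r][c]) > threshold
               for c in range(cols)] for r in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    best = None  # (pixel_count, min_r, max_r, min_c, max_c)
    for r0 in range(rows):
        for c0 in range(cols):
            if not moving[r0][c0] or visited[r0][c0]:
                continue
            # Grow a region from this seed using 8-neighbour connectivity.
            queue = deque([(r0, c0)])
            visited[r0][c0] = True
            count, min_r, max_r, min_c, max_c = 0, r0, r0, c0, c0
            while queue:
                r, c = queue.popleft()
                count += 1
                min_r, max_r = min(min_r, r), max(max_r, r)
                min_c, max_c = min(min_c, c), max(max_c, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and moving[nr][nc] and not visited[nr][nc]):
                            visited[nr][nc] = True
                            queue.append((nr, nc))
            if best is None or count > best[0]:
                best = (count, min_r, max_r, min_c, max_c)
    if best is None:
        return None
    _, min_r, max_r, min_c, max_c = best
    # Bounding rectangle: central pixel, width and height.
    width, height = max_c - min_c + 1, max_r - min_r + 1
    return ((min_c + max_c) // 2, (min_r + max_r) // 2, width, height)
```

The returned tuple mirrors the output described above: the central pixel of the rectangular region plus its width and height. Keeping only the largest connected region is a simplifying choice made for this sketch.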
Several data fusion methods are used to merge the results from the segmentation algorithms running in every camera node. Data fusion reduces the influence of errors in measurements and increases the overall system accuracy. On the other hand, it requires having the measurements from all the cameras expressed in the same reference frame. In the methods presented, each camera node obtains the coordinates of the region of interest on the image plane by applying image segmentation algorithms and corrects its own optical distortions, transforming them to the undistorted normalized pin-hole projection on the image plane. Each camera is internally calibrated and the calibration parameters are known at each camera node. Hence, the message packets sent by camera nodes include the distortion-corrected normalized measurements for each image analyzed. These messages are transmitted through the WSN for data fusion. This approach standardizes the measurements from all camera nodes, which facilitates data fusion, and distributes the computational cost among the camera nodes. For further details of the implementations refer to Section 6.
3. Localization using maximum likelihood
Maximum Likelihood (ML) is one of the basic statistical data fusion methods, (Mohammad-Djafari, 1997). Its objective is to estimate the state of an event that best justifies the observations maximizing a statistical likelihood function that can be expressed as the probability of measurement
Assume that the state is measured synchronously from
Assume that each measurement is subject to errors that can be considered to be originated by the influence of a high number of independent effects. By virtue of the Central Limit Theorem it can be considered to have Gaussian distribution, (Rice, 2006):
where each measurement is weighted proportionally to the inverse of its covariance: measurements with more noise have lower weight in (4). The overall estimated covariance follows the expression:
It should be noted that since
The following describes the ML method adopted for camera-based WSN. Consider that a point
Figure 3 shows an illustration of the method with two cameras. The probability distribution of the measurements from
The described ML method can be executed in a WSN node in a few milliseconds. This high efficiency facilitates schemes where camera nodes observing the same object interchange their observations and apply data fusion.
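The inverse-covariance weighting behind ML fusion of Gaussian measurements can be sketched as follows. This is a minimal illustration of the standard result, assuming 2D measurements already expressed in a common reference frame; the helper names `inv2` and `ml_fuse` are hypothetical and not part of the chapter.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as two rows of two numbers."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ml_fuse(measurements):
    """Fuse [(z, cov), ...], where z is (x, y) and cov is a 2x2 covariance.

    Returns the ML estimate and its covariance: the estimate is the
    inverse-covariance weighted mean, and the fused covariance is the
    inverse of the sum of the inverse covariances.
    """
    info = [[0.0, 0.0], [0.0, 0.0]]  # sum of inverse covariances
    vec = [0.0, 0.0]                 # sum of inverse covariance times z
    for z, cov in measurements:
        w = inv2(cov)
        for i in range(2):
            for j in range(2):
                info[i][j] += w[i][j]
                vec[i] += w[i][j] * z[j]
    fused_cov = inv2(info)
    estimate = tuple(sum(fused_cov[i][j] * vec[j] for j in range(2))
                     for i in range(2))
    return estimate, fused_cov
```

With two measurements that are each accurate along a different axis, for example covariances diag(0.1, 1.0) and diag(1.0, 0.1), the fused covariance is smaller than either input along both axes, which is the effect exploited when combining cameras with different orientations.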
This method can be used for object localization but it is not suitable for object tracking, and even when used for localization the ML method has important constraints. Applying the ML data fusion requires having previously determined Zi, the location of P in frame
Furthermore, ML has high sensitivity to failures in measurements, for instance in cases where the object is out of the field of view of the camera, occluded in the image or in case of losses of WSN messages, not infrequent in some environments. This sensor fusion method relies totally on the measurements and its performance degrades when some of them are lost. Other sensor fusion techniques such as Bayesian Filters rely on observations and on models, which are very useful in case of lack of measurements.
4. Localization and tracking using EIF
Recursive Bayesian Filters (RBFs) provide a well-founded mathematical framework for data fusion. RBFs estimate the state of the system assuming that measurements and models are subject to uncertainty. They obtain an updated estimation of the system state as a weighted average, using the prediction of its next state according to a system model and a new measurement from the sensor to update the prediction. The purpose of the weights is to give more trust to values with better (i.e., smaller) estimated uncertainty. The result is a new state estimate that lies in between the predicted and measured state, and has a better estimated uncertainty than either alone. This process is repeated every step, with the new estimate and measure of uncertainty used as inputs for the following iteration.
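For a scalar state, the weighted average described above reduces to a few lines. The sketch below is a generic illustration of the update step of a linear Gaussian filter, with the hypothetical name `fuse_scalar`; it is not taken from the chapter.

```python
def fuse_scalar(pred, pred_var, meas, meas_var):
    """Combine a predicted value and a measurement of the same scalar state.

    The gain weights the measurement by the relative uncertainty of the
    prediction: an uncertain prediction gives the measurement more trust.
    """
    gain = pred_var / (pred_var + meas_var)
    estimate = pred + gain * (meas - pred)
    variance = (1.0 - gain) * pred_var
    return estimate, variance
```

For instance, a prediction of 10 with variance 4 and a measurement of 12 with variance 1 give an estimate close to 11.6 with variance close to 0.8: the estimate lies between both values, nearer to the more certain one, and its variance is lower than either input variance.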
The Kalman Filter (KF) is perhaps the most commonly used RBF method. The Kalman Filter and its dual, the Information Filter (IF), use a prediction model that reflects the expected evolution of the state and a measurement model that takes into account the process through which the state is observed, to respectively predict and update the system state:
In our problem the measurements considered are the location of the object on the image of the distributed camera nodes. Even assuming simple pin-hole cameras, these observation models are non-linear and a first order linearization is required. In this case, having non-linear prediction and measurement models leads to the Extended Information Filter (EIF). After linearizing the IF equations via Taylor Expansion, we can assume that the predicted state probability, written as Gaussian, is as follows:
where is the mean of the previous state,
where is the mean of the predicted state,
Information Filters (IF) employ the so-called canonical representation, which consists of an information vector
Therefore, the selection of the state and models has critical impact on the performance and computational burden of the filter. We selected a state vector typical in tracking problems that considers only the current object position and velocity
Of course, we do not know a priori what kind of movement the object will perform. We therefore assume local linear motion and include Gaussian noise in each coordinate to account for errors in the model. This model can efficiently represent local motions and has been extensively applied in RBFs. Also, more complex models increase the computational burden and would require a priori knowledge of the motion, which is unavailable when tracking non-collaborating objects, as is the case in security applications.
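A local linear motion model of this kind can be sketched as a constant-velocity prediction. The state layout [x, y, vx, vy], the time step `dt` and the per-coordinate noise variances `noise_var` are assumptions made for illustration; the chapter's own model matrices are not reproduced here.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def predict_constant_velocity(mean, cov, dt, noise_var):
    """Predict state [x, y, vx, vy] one step ahead.

    mean: list of 4 values; cov: 4x4 covariance; noise_var: 4 variances
    added on the diagonal to model errors of the motion model.
    """
    f = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    new_mean = [sum(f[i][k] * mean[k] for k in range(4)) for i in range(4)]
    fpf = matmul(matmul(f, cov), transpose(f))
    new_cov = [[fpf[i][j] + (noise_var[i] if i == j else 0.0)
                for j in range(4)] for i in range(4)]
    return new_mean, new_cov
```

Each prediction moves the position along the current velocity and inflates the covariance, so the model stays cheap while still expressing growing uncertainty between measurements.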
The EIF uses a different observation model for each camera that is seeing the object. The observation model adopted for camera i uses as measurements the distortion-corrected pin-hole projections from camera i at time
The location of the object at time t in the global reference frame
Thus, the overall measurement model
This observation model is, as already stated, non-linear. At the updating stage the EIF requires using the Jacobian matrices of
Each measurement from camera node i requires only one prediction step and one updating step. With 3 cameras, one iteration of the EIF for 2D localization and tracking requires approximately 6,000 floating point operations, roughly 400 ms on a Xbow TelosB mote such as those used in the experiments. The Bayesian approach provides high robustness in case of losses of measurements. If at time t there are no measurements, only the prediction stage of the EIF algorithm is executed. In this case, the uncertainty of the state keeps growing until new measurements are available. This behavior naturally increases the robustness in case of failures of the segmentation algorithm or losses of measurement messages. Thus, the EIF exhibits higher robustness than ML to noisy measurements and particularly to the lack of measurements. Some experimental results can be found in Section 6.
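The behavior with missing measurements can be illustrated with a deliberately simplified information filter. The sketch assumes a 1D state [position, velocity] and a linear position measurement, so no linearization is needed; the pin-hole observation model and its Jacobians are not reproduced, and all function names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as two rows of two numbers."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def if_predict(omega, xi, dt, q):
    """Prediction with constant-velocity motion, F = [[1, dt], [0, 1]].

    omega: information matrix (inverse covariance); xi: information vector.
    q: assumed noise variance added to each coordinate.
    """
    p = inv2(omega)  # recover covariance and mean from the canonical form
    mu = [p[0][0] * xi[0] + p[0][1] * xi[1],
          p[1][0] * xi[0] + p[1][1] * xi[1]]
    mu = [mu[0] + dt * mu[1], mu[1]]
    # F P F^T + Q, written out for the 2x2 case
    p = [[p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + q,
          p[0][1] + dt * p[1][1]],
         [p[1][0] + dt * p[1][1], p[1][1] + q]]
    omega = inv2(p)
    xi = [omega[0][0] * mu[0] + omega[0][1] * mu[1],
          omega[1][0] * mu[0] + omega[1][1] * mu[1]]
    return omega, xi


def if_update(omega, xi, measurements):
    """Add the information of each (z, r) position measurement, H = [1, 0].

    An empty list leaves the state untouched: prediction only.
    """
    omega = [row[:] for row in omega]
    xi = xi[:]
    for z, r in measurements:
        omega[0][0] += 1.0 / r  # H^T R^-1 H
        xi[0] += z / r          # H^T R^-1 z
    return omega, xi


def position_variance(omega):
    return inv2(omega)[0][0]
```

In the information form each additional camera contributes one additive term in the update, which is why the cost of fusing many observations stays low. Calling `if_predict` repeatedly with no intervening measurements makes `position_variance` grow step after step, and it drops again as soon as `if_update` receives measurements.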
5. Active perception techniques
In the previous schemes all the cameras that are seeing the object at any time
The active perception problem can be broadly defined as the procedure to determine the best actions that should be performed. In our problem there are two types of actions, activate or deactivate camera
In most active perception strategies the selection of the actions is carried out using reward-versus-cost analyses. In the so-called greedy algorithms the objective is to decide the next best action to be carried out without taking into account long-term goals. POMDPs (Kaelbling et al., 1998), on the other hand, consider the long-term goals, providing an elegant way to model the interaction of an agent with an environment when both are uncertain. Nonetheless, POMDPs require intense computing resources and memory capacity. POMDPs also scale poorly with the number of camera nodes. Thus, in our problem we adopted an efficient greedy active perception scheme.
At each time step, the strategy adopted activates or deactivates one camera node taking into account the expected information gain and the cost of the measurement. In our approach the reward is the information gain about the target location due to the new observation. Shannon entropy is used to quantify the information gain.
Consider the prior target location distribution at time
Entropy is a measure of the uncertainty associated to a random variable, i.e. the information content missing when one does not know the value of a random variable. The reward for action
There are analytical expressions to express the entropy of a Gaussian distribution. Assuming
On the other hand, the cost of activating a camera node is mainly expressed in terms of the energy consumed by the camera. However, note that there are other costs, such as those associated with the use of the wireless medium for transmitting the new measurements or the increase in computational burden required to consider the measurements from the new camera in the EIF. Also, these costs can vary depending on the camera node and the currently available resources. For instance, the cost of activating a camera with low battery level is higher than that of activating one with full batteries.
This active perception method can be easily incorporated within a Recursive Bayesian Filter. In our case it was integrated in the EIF described in Section 4. To reduce the complexity and computational burden, the number of actions that can be done at each time is limited to one. Thus, in a deployment with
The main disadvantage of (14) is that the action to be carried out should be decided without actually having the new measurement. We have to rely on estimations of future information gain. At time
This expression assumes that the location distribution of the target is Gaussian, which is not entirely exact due to the nonlinearities in the pin-hole observation models. Also, it provides the expectation of the information gain instead of the information gain itself. Despite these inaccuracies, it provides a useful measure of the information gain from a sensory action in an efficient way. In fact, the active perception method adopted requires, for a setting with 3 cameras, approximately 3,400 floating point operations, roughly 300 ms on Xbow TelosB motes, but can yield remarkable resource savings, up to 70% in some experiments shown in Section 6. It should be noted that its computational burden scales well since it is proportional to the number of cameras in the setting.
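A greedy reward-versus-cost decision of this kind can be sketched under the Gaussian assumption. The sketch uses the standard entropy of a Gaussian and assumes that each candidate camera contributes a known 2x2 information term (its H^T R^-1 H) to a 2D position estimate; the function names, the dictionary-based interface and the cost values are hypothetical, and only activation is shown.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def gaussian_entropy(cov):
    """Entropy in nats of a 2D Gaussian with covariance cov."""
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det2(cov))


def expected_gain(prior_info, cam_info):
    """Entropy reduction when cam_info is added to the prior information.

    Equals gaussian_entropy(prior covariance) minus
    gaussian_entropy(posterior covariance).
    """
    post = [[prior_info[i][j] + cam_info[i][j] for j in range(2)]
            for i in range(2)]
    return 0.5 * math.log(det2(post) / det2(prior_info))


def choose_camera(prior_info, candidates, costs):
    """Return the camera with the best positive (gain - cost), or None."""
    best_cam, best_net = None, 0.0
    for cam, cam_info in candidates.items():
        net = expected_gain(prior_info, cam_info) - costs[cam]
        if net > best_net:
            best_cam, best_net = cam, net
    return best_cam
```

Because the gain is computed from the predicted information matrices only, the decision can be taken before the new measurement exists, and the loop visits each candidate once, so the cost grows linearly with the number of cameras. Raising a camera's cost, for example when its battery is low, makes it less likely to be selected.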
6. Implementation and some results
This Section provides details of the camera-based WSN implementation and presents some experimental results.
6.1. Implementation of camera-based WSN with COTS equipment
Requirements such as energy consumption, size and weight are very important in these systems. In our experiments we used
The micro camera board selected is the
6.1.1. Image segmentation
We assume that the objects of interest are mobile. First, assuming a static environment, the moving objects are identified through difference with respect to a reference image. A pixel of image
In case the color of the object of interest can be characterized, then a color-based segmentation is applied only to the windows with motion previously identified. For this operation the HSI color field is preferred in order to achieve higher stability of color with lighting changes. Then, an efficient 8-neighbours region-growing algorithm is used. Finally, the characteristics of the region of interest such as coordinates of the central pixel, region width and height are obtained. Figure 6 shows the results of each step over an image from a
The algorithm has been efficiently programmed so that the complete segmentation (the images are 352x288 pixels) takes 560 ms., 380 ms. of which are devoted to downloading the image from the internal camera buffer to the
6.1.2. Image distortions correction
In the next step, before transmitting the measurements for data fusion, each camera node corrects its own optical distortions, transforming them to the normalized pin-hole projection. Let
The internal calibration parameters -optical distortion parameters
Also, the position and orientation for each camera in a global reference frame are assumed known at each camera node. These 6 parameters -3 for camera position and 3 for orientation- are included in the measurements packets sent for data fusion so that it can cope with static and mobile cameras, for instance on very light-weight UAVs. The time stamps of the measurements are also included in these packets for synchronization.
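To illustrate how the pose carried in each packet relates a point in the global frame to a normalized pin-hole measurement, the projection can be sketched as follows. The rotation-matrix convention (rows are the camera axes expressed in the global frame) and the function name `project_to_camera` are assumptions; the chapter's own parameterization of the 3 orientation parameters is not reproduced.

```python
def project_to_camera(point, cam_position, cam_rotation):
    """Normalized pin-hole projection of a global 3D point.

    cam_rotation: 3x3 matrix whose rows are the camera axes expressed in
    the global frame (assumed convention), so p_cam = R (p - t).
    Returns (u, v), or None if the point is not in front of the camera.
    """
    d = [point[k] - cam_position[k] for k in range(3)]
    xc, yc, zc = (sum(cam_rotation[i][k] * d[k] for k in range(3))
                  for i in range(3))
    if zc <= 0.0:
        return None
    return (xc / zc, yc / zc)
```

Since the pose travels with every measurement, the fusing node can evaluate this model for a camera whose pose changes between packets without any stored per-camera configuration.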
6.1.3. Interface and synchronization modules
Several software modules were implemented on the
Another software module was devoted to synchronization among camera nodes. The method selected is the so-called
6.2. Some results
Figure 7 (left) shows a picture of one localization and tracking experiment. The objective is to locate and track mobile robots that follow a known trajectory, taken as ground truth. Figure 7 (right) depicts a scheme of an environment involving 5 camera nodes at distributed locations and with different orientations. The local reference frames of each of the cameras are depicted. The global reference frame is represented in black.
In this Section the three data fusion methods presented are compared in terms of accuracy and energy consumption. Accuracy is measured as the mean error with respect to the ground truth. For the consumption analysis we will assume that the energy dedicated by the
In all the experiments the commands given to the robot to generate the motion were the same. The object locations are represented with dots in Fig. 7 (right). In all the experiments the measurements computed by cameras 2 and 3 from
Four different cases were analyzed: ML using cameras 1, 2 and 3; EIF using cameras 1, 2 and 3; EIF using the five cameras; and active perception with all the cameras. A set of ten repetitions of each experiment was carried out. Figures 8a-d show the results obtained for axis X (left) and Y (right) in the four experiments. The ground truth is represented in black and the estimated object locations are in red. In Figs. 8b-d the estimated 3σ confidence interval is represented in blue. Table 1 shows the average of the mean error and the number of measurements used by each method.
| |ML|EIF 3 cameras|EIF 5 cameras|Active Perception|
|---|---|---|---|---|
|Mean error (m)|0.42|0.37|0.18|0.24|
|Number of measurements|225|225|395|239|
In Fig. 8a it can be observed that the ML method performs quite well when the level of noise in the measurements is low. On the other hand, losses of measurements from cameras 2 and 3 cause significant errors in the X coordinate of the estimated object location, while measurements from camera 1 are enough to provide accuracy in the Y coordinate.
The EIF with cameras 1, 2 and 3 exhibits a more robust performance. The loss of measurements from cameras 2 and 3 prevents the EIF from having valid measurements for the X coordinate and thus it relies on the system prediction model. Note that the covariance of the estimation in X increases gradually until measurements from cameras 2 and 3 are available again, see Fig. 8b. The loss of measurements from cameras 2 and 3 has a moderate effect on the confidence interval in Y. Globally the EIF achieved higher accuracy than the ML method in the experiments, see Table 1.
When all the cameras are fused, the estimation of the EIF is even more accurate: the 3σ confidence interval becomes narrower, see Fig. 8b,c, and the mean error becomes significantly lower, see Table 1. The loss of measurements from cameras 2 and 3 has a negligible effect on the estimation because other cameras provide that information to the filter. On the other hand, using a higher number of cameras requires more resources, which are often constrained in WSN applications.
The active perception method dynamically activates the camera nodes required to reduce the uncertainty and deactivates the non-informative camera nodes to save resources. The practical effect is that it obtains good object localization and tracking accuracy, see Fig. 8d, with a drastic reduction in the number of measurements used, see Table 1. In the experiments carried out the mean errors achieved by the active perception method were almost as good as those achieved by the EIF with 5 cameras (0.24 versus 0.18) but it needed 39.49% fewer measurements (239 versus 395).
Figure 9 shows the results in an experiment assuming a cost of
The performance of the active perception is highly dependent on the values of the cost adopted to decide on the sensory action. The higher the cost, the higher the information gain of an action has to be for it to become advantageous. Figure 10 shows results obtained in an experiment using
7. Conclusions
This chapter describes three efficient data fusion methods for localization and tracking with WSN comprising nodes endowed with low-cost cameras as main sensors. The approach adopted is a partially decentralized scheme where the images captured by each camera node are processed locally using segmentation algorithms in order to extract the location of the object of interest on the image plane. Only low-bandwidth data is transmitted through the network for data fusion.
First, a Maximum Likelihood technique that fuses camera observations in a very efficient way is described. ML carries out data fusion using only the information contained in the measurements. It performs well when the level of noise in the measurements is low but degrades with noisy measurements and particularly with missing measurements, for instance when WSN messages are lost.
Then, an Extended Information Filter is proposed. Bayesian Filters compute the estimation based on measurements and on observation and system models. We preferred the EIF over its dual, the EKF, since the update stage of the EIF is more efficient and thus more suitable when there is a high number of observations, as in our case, where a good number of low-cost camera nodes can be used. The uncertainty of the perception using the EIF is reduced by using more camera nodes, at the expense of requiring more resources such as energy, bandwidth, and computational and memory capacity.
Finally, an Active Perception method based on a greedy algorithm balances the information that can be obtained from a camera node against the cost of that information. The method dynamically activates the most informative camera nodes required to reduce the uncertainty and deactivates the least informative ones to save resources.
Several experiments with WSN comprising
The described methods have limited scalability with the number of camera nodes due to the computational and memory constraints of WSN nodes and limitations in the effective WSN bandwidth. The lack of robustness to failures of the node performing the data fusion is also an important drawback. Decentralized data fusion can help to mitigate these issues. Efficient fully decentralized schemes suitable for camera-based WSN are the object of current research.