A Construction Method for Automatic Human Tracking System with Mobile Agent Technology

Human tracking systems that can track a specific person is being researched and developed aggressively, since the system is available for security and a flexible service like investigation of human behaviour. For example, Terashita, Kawaguchi and others propose the method for tracking an object captured by simple active video camera (Terashita et al. 2009; Kawaguchi et al. 2008), and Yin and others propose the solution of the problem of the blurring of the active video camera (Yin et al. 2008). Tanizawa and others propose a mobile agent framework that can become the base of a human tracking system (Tanizawa et al. 2002). These are component technology, and they are available in construction of the human tracking system. On the other hand, Tanaka and others propose a human tracking system using the information from video camera and sensor (Tanaka et al. 2004), and Nakazawa and others propose a human tracking system using recognition technique which recognizes same person using multiple video cameras at the same time (Nakazawa et al. 2001). However, although these proposed systems are available as human tracking system, the systems are constructed under fixed camera position and unchanged photography range of camera. On the other hand, there are several researches to track people with active cameras. Wren and others propose a class of hybrid perceptual systems that builds a comprehensive model of activity in a large space, such as a building, by merging contextual information from a dense network of ultra-lightweight sensor nodes with video from a sparse network of high-capability sensors. They explore the task of automatically recovering the relative geometry between an active camera and a network of one-bit motion detectors. Takemura and others propose a view planning of multiple cameras for tracking multiple persons for surveillance purposes (Takemura et al. 2007). They develop a multi-start local search (MLS)based planning method which iteratively selects fixation points of the cameras by which the expected number of tracked persons is maximized. Sankaranarayanan and others discuss the basic challenges in detection, tracking, and classification using multiview inputs (Sankaranarayanan et al. 2008). In particular, they discuss the role of the geometry induced by imaging with a camera in estimating target characteristics. Sommerlade and others propose a consistent probabilistic approach to control multiple, but diverse active cameras concertedly observing a scene (Sommerlade et al. 2010). The cameras react to objects moving about, arbitrating conflicting interests of target resolution and trajectory accuracy, and the cameras anticipate the appearance of new targets. Porikli and others propose an automatic


Introduction
Human tracking systems that can track a specific person is being researched and developed aggressively, since the system is available for security and a flexible service like investigation of human behaviour.For example, Terashita, Kawaguchi and others propose the method for tracking an object captured by simple active video camera (Terashita et al. 2009;Kawaguchi et al. 2008), and Yin and others propose the solution of the problem of the blurring of the active video camera (Yin et al. 2008).Tanizawa and others propose a mobile agent framework that can become the base of a human tracking system (Tanizawa et al. 2002).These are component technology, and they are available in construction of the human tracking system.On the other hand, Tanaka and others propose a human tracking system using the information from video camera and sensor (Tanaka et al. 2004), and Nakazawa and others propose a human tracking system using recognition technique which recognizes same person using multiple video cameras at the same time (Nakazawa et al. 2001).However, although these proposed systems are available as human tracking system, the systems are constructed under fixed camera position and unchanged photography range of camera.On the other hand, there are several researches to track people with active cameras.Wren and others propose a class of hybrid perceptual systems that builds a comprehensive model of activity in a large space, such as a building, by merging contextual information from a dense network of ultra-lightweight sensor nodes with video from a sparse network of high-capability sensors.They explore the task of automatically recovering the relative geometry between an active camera and a network of one-bit motion detectors.Takemura and others propose a view planning of multiple cameras for tracking multiple persons for surveillance purposes (Takemura et al. 2007).They develop a multi-start local search (MLS)based planning method which iteratively selects fixation points of the cameras by which the expected number of tracked persons is maximized.Sankaranarayanan and others discuss the basic challenges in detection, tracking, and classification using multiview inputs (Sankaranarayanan et al. 2008).In particular, they discuss the role of the geometry induced by imaging with a camera in estimating target characteristics.Sommerlade and others propose a consistent probabilistic approach to control multiple, but diverse active cameras concertedly observing a scene (Sommerlade et al. 2010).The cameras react to objects moving about, arbitrating conflicting interests of target resolution and trajectory accuracy, and the cameras anticipate the appearance of new targets.Porikli and others propose an automatic object tracking and video summarization method for multi-camera systems with a large number of non-overlapping field-of-view cameras is explained (Porikli et al. 2003).In this framework, video sequences are stored for each object as opposed to storing a sequence for each camera.
Thus, these studies are efficient as the method to track targets.In the automatic human tracking system, tracking function must be robust even if the system loses a target person.Present image processing is not perfect because a feature extraction like ``SIFT`` (Lowe. 2004) has high accuracy but takes much processing time.The trade-off of accuracy and processing time is required for such a feature extraction algorithm.In addition, the speed a person walks is various and the person may be unable to be captured correctly in cameras.Therefore, it is necessary to re-detect a target person as tracking function even if the system loses the target.In this chapter, a construction method of human tracking system including the detection method is proposed for realistic environment using active camera like the above mentioned.And the system constructed by the method can continuously track plural people at the same time.The detection methods compensate for the above weakness of feature extraction as a function of system.The detection methods also utilize "neighbor node determination algorithm" to detect the target efficiently.The algorithm can determine neighbor camera/server location information without the location and view distance of video camera.Neighbor camera/servers are called "neighbor camera node/nodes" in this chapter.The mobile agent (Lange et al. 1999;Cabri et al. 2000;Valetto et al. 2001;Gray et al. 2002;Motomura et al. 2005;Kawamura et al. 2005) can detect the target person efficiently with knowing the neighbor camera node location information.In this chapter, the algorithm which can determine the neighbor node even if the view distance of video camera changes is also proposed.

System configuration
The system configuration of the automatic human tracking system is shown in Fig. 1.It is assumed that the system is installed in a given building.Before a person is granted access inside the building, the person's information is registered in the system.Through a camera an image of the persons face and body is captured.Feature information is extracted from the image by SIFT and registered into the system.Any person who is not registered or not recognized by the system is not allowed to roam inside the building.This system is composed of an agent monitoring terminal, agent management server, video recording server and feature extraction server with video camera.The agent monitoring terminal is used for registering the target person's information, retrieving and displaying the information of the initiated mobile agents, and displaying video of the target entity.The agent management server records mobile agents' tracking information history, and provides the information to the agent monitoring terminal.The video recording server records all video images and provides the images to the agent monitoring terminal via request.The feature extraction server along with the video camera analyzes the entity image and extracts the feature information from the image.
A mobile agent tracks a target entity using the feature information and the neighbor nodes information.The number of mobile agents is in direct proportion to the number of the target entities.A mobile agent is initialized at the agent monitoring terminal and launched into the feature extraction server.The mobile agent extracts the features of a captured entity and  compares it with the features already stored by the agent.If the features are equivalent, the entity is located by the mobile agent.
The processing flow of the proposed system is also shown in Fig. 1. (i) First, a system user selects an entity on the screen of the agent monitoring terminal, and extracts the feature information of the entity to be tracked.(ii) Next, the feature information is used to generate a mobile agent per target which is registered into the agent management server.(iii) Then the mobile agent is launched from the terminal to the first feature extraction server.(iv) When the mobile agent catches the target entity on the feature extraction server, the mobile agent transmits information such as the video camera number, the discovery time, and the mobile agent identifier to the agent management server.(v) Finally, the mobile agent deploys a copy of itself to the neighbor feature extraction servers and waits for the person to appear.If the mobile agent identifies the person, the mobile agent notifies the agent management server of the information, removes the original and other copy agents, and deploys the copy of itself to the neighbor feature extraction servers again.Continuous tracking is realized by repeating the above flow.
The system architecture is shown in Fig. 2. The GUI is operated only on the agent monitoring terminal.The GUI is able to register images of the entities and monitor the status of all the mobile agents.The mobile agent server is executed on the feature extraction server and allows the mobile agents to execute.The Feature extraction function is able to extract features of the captured entities, which is then utilized in the tracking of those entities as mobile agents.OSGi (Open Service Gateway Initiative Alliance) S/W acts as a mediator for the different software, allowing the components to utilize each other.The Agent information manager manages all mobile agent information and provides the information to the agent monitoring terminal.The Video recording S/W records all video, and provides the video movie to agent monitoring terminal.Each PC is equipped with an Intel Pentium IV 2.0 GHz processor and 1 GB memory.The system has an imposed condition requirement that maximum execution time of feature judgment is 1 second and maximum execution time of mobile agent transfer is 200 milliseconds.

Influence by change of view distance of video camera
Here is indicated a problem that a change of view distance of video camera makes change for neighbor cameras.And a solution for the problem is also indicated.

Problem of influence by change of view distance of video camera
If a mobile agent tracks a target entity, the mobile agent has to know the deployed location of the video cameras in the system.However the abilities of the neighbor cameras are also determined by their view distances.A problem caused by a difference in the view distances can occur.This problem occurs when there is a difference in expected overlap of a view or an interrupt of view.
A scenario in which a neighbor video camera's location is influenced by view distance is shown in Fig. 3.The upper side figures of Fig. 3 show four diagrams portraying a floor plan with four video cameras each, considering the view distances of each video camera are different and assuming that the target entity to be tracked moves from the location of video camera A to video camera D. The underside figures of Fig. 3 show neighbors of each video camera with arrows.The neighbor of video camera A in object (a-1) of Fig. 3 is video camera B but not C and not D as the arrows in object (a-2) show.In object (a-1) of Fig. 3, video camera C and D are also not considered neighbors of video camera A, because video camera B blocks the view of video camera C and D. And the target entity can be captured at an earlier time on video camera B. But in the case of object (b-1) of Fig. 3, the neighbors of video camera A are video camera B and C but not camera D as the arrows in object (b-2) of Fig. 3 show.In the case of object (c-1) of Fig. 3, the neighbors of video camera A are all video cameras as the arrows in object (c-2) of Fig. 3 show.Thus neighbor video camera's location indicates the difference in view distances of video cameras.The case of object (d-1) in Fig. 3 is more complicated.The neighbors of video camera A in object (d-1) of Fig. 3 are video camera B, C, and D as the arrows in object (d-2) of Fig. 3 show.And video camera B is not considered the neighbor of video camera C. It is because video camera A exists as a neighbor between video camera B and C. When it is assumed that a target entity moves from A to D, the target entity is sure to be captured by video camera A, B, A, and C in that order.This scenario indicates that the definition of "neighbor" cannot be determined clearly because the determination of the neighbor definition is influenced by the change of view distance and it becomes more complicated as the number of video cameras increases.

Neighbor node determination algorithm to resolve the problem
Neighbor node determination algorithm can easily determine the neighbor video camera's location without regard to the influence of view distances and any modification of the information of the currently installed cameras.The modification information is set in the system to compute neighbor video cameras on the diagram, which is expressed as a graph.Nodes are used to compute neighbor video camera's information in this algorithm.The nodes are defined as camera node and non-camera node.Camera node is the location of video camera that is labeled as camera node.The nodes are defined as A = {a 1 , a 2 , ..., a p }.This node is also a server with video camera.Non-camera node is defined as The conditions of a non-camera node are stated below; i) either of crossover, corner, terminal of passage, ii) the position where a video camera is installed, or iii) the end point of the view distance of a video camera.In addition, the point where the above conditions are overlapped is treated as one node.When the view distance of the video camera reaches a non-camera node, the non-camera node is defined as the neighbor of the camera node.When two non-camera nodes are next to each other on a course, those nodes are specified as neighbors.Fig. 4 shows an example of these definitions applied and shows the view distances of the video cameras.
The algorithm accomplishes neighbor node determination using an adjacency matrix.Two kinds of adjacency matrix are used by the algorithm.One is an adjacency matrix X made from camera nodes' locations as rows and non-camera nodes' locations as columns.Element x ij of matrix X is defined as (1).Another one is as adjacency matrix Y made from non-camera nodes' location as rows and columns.Element y ij of matrix Y is defined as (2).The neighbor information for video cameras is calculated from the connection information of non-camera nodes by using adjacency matrix X and Y.
. 1 There is the line which links camera node and non-camera node 0 There is no link. .
1 There is the line which links two non-camera nodes, and 0 There is no link or (3) is satisfied.by excluding unnecessary non-camera nodes from adjacency matrix X and Y. v) Calculate neighbor's location matrix by multiplying adjacency matrix and transposed matrix X' T .This neighbor's location matrix is the neighbor's node information.An unnecessary non-camera node is a non-camera node which has no camera node as a neighbor.Adjacency matrix X' and Y' are computed without unnecessary nodes, and using the procedure shown later.
There are reasons why it might be better to include the unnecessary nodes in the diagram from the beginning as we have done.Since the risk of committing an error will be higher as the diagram becomes larger, we include the unnecessary nodes from the beginning and remove them at the end.Finally, matrix E which indicates the neighbor nodes is derived as (4).
. . 1 is neighbour node to ' ' ' 0 is not neighbour node to

Human tracking method
Human tracking method consists of Follower method and Detection method.Follower method is used for tracking a moving target.Detection method is used for detecting a target when an agent has lost the target.In the tracking method, an agent has three statuses as "Catching", "Not catching" and "Lost".At first, an agent is assumed that it stays on a certain camera node.If the feature parameter the agent keeps is similar to the feature parameter extracted on the node, agent's status is indicated as "Catching".If the parameter the agent keeps is not similar to the feature parameter extracted on the node, agent's status is indicated as "Not catching".If the agent keeps "Not catching" status on a certain time, the agent decides that it lost a target, and agent's status is indicated as "Lost".

Follower method
In Follower method, an agent deploys its copies to neighbor nodes when agent's status becomes "Catching".When one of the copies has "Catching" status, all agents except that copy are removed from the system.And that copy becomes original agent.After that, the agent deploys its copies to neighbor nodes again.The follower method realizes tracking by repeating those routine.

Detection method
The detection method in this chapter is used to re-detect a target when the automatic tracking system loses the target.This method improves the tracking function, because an individual can not be accurately identified in the current image processing.As such the reliability of the system is further improved, because it enhances the continuous tracking function and re-detection of the target even if a target is lost for a long period of time.In this chapter, if a target is not captured within a certain period of time, the mobile agent then concludes that the target is lost.On such case the system can also conclude that the target is lost.
We are proposing two types of detection method: (a) "Ripple detection method" and (b) "Stationary net detection method".These methods are shown in Fig. 5. Ripple detection method widens a search like a ripple from where an agent lost a target to give top priority to re-detect.This method has a feature that the discovery time becomes shorter and usual tracking can resume more quickly, if the target exists near where the agent lost.In addition, this method deletes other agents immediately after discovering the target, and suppresses the waste of the resource.The Ripple detection method is developed and is experimented in search propriety.In the Ripple detection method, the neighbor camera nodes are shown as ( 5).
When a mobile agent lost a target, copy agents are deployed to the next nodes of (5) expressed by ( 6), and search is started.E 2 shows next neighbor camera nodes, because the elements of E 2 larger than 1 can be reached if the elements are larger than 1.Therefore, except neighbor node information E of camera nodes, automatic human tracking system uses a minimum resource by deploying copy agents.
Similarly, it becomes like ( 7) and ( 8) to calculate the next camera node further.
As mentioned above, the equation ( 9) is derived when deploying agents efficiently to the n next camera nodes.n is larger than 2 and is incremented one by one when this equation is used for detection.
1 1 Stationary net detection method widens a search like setting a stationary net with the Neighbor node determination algorithm from where an agent lost a target to give top priority to re-detect.This method uses equation (10) in the algorithm.
. . 1 1 is neighbour node to '( ') ' 0 is not neighbour node to In this equation, adjacency matrix E indicates the node that can reach via n non-camera nodes and n is always set to n ≥ 2. In this method, the coefficient n is set to n = 4 because camera nodes are set with a certain interval.The interval between cameras in the real system may be close, but in that case, number of non-camera nodes between the cameras decreases.
Therefore it is enough interval to re-detect a target if n consists of n ≥ 4.This method has a feature that agents are deployed to neighbor camera nodes via n next non-camera nodes and catch a target like a stationary net.In addition, this method also deletes other agents immediately after discovering the target, and suppresses the waste of the resource.The Stationary net detection method is developed and is experimented in search property.In the Stationary net detection method, the neighbor camera nodes are shown as (11).
When a mobile agent lost a target, copy agents are deployed to the next nodes of (11) expressed by ( 12), and search is started.X'Y' 2 X' T shows neighbor camera nodes via two noncamera nodes, because the elements of X'Y' 2 X' T larger than 1 can be reached if the elements are larger than 1.If copy agents are deployed at each camera nodes via non-camera nodes more than two, detection range of target widens.And, excepting neighbor node information E of camera nodes, automatic human tracking system uses a minimum resource by deploying copy agents.
Similarly, it becomes like ( 13) and ( 14) to calculate the next camera node of more wide range.
As mentioned above, the equation ( 15) is derived when deploying agents efficiently to the next camera nodes via n non-camera nodes.n is larger than 2 and is incremented one by one when this equation is used for detection.

Experimentation
Here are two types of experiment.One is an experiment by simulator, and the other one is an experiment by real environment.In the experiment by simulator, follower method and detection methods are experimented, and the effectiveness is verified.In the experiment by real environment, the tracking method is verified for whether the plural targets can be tracked continuously.

Experiment by simulator
Examination environment for the Ripple detection method and the Stationary net detection method is shown in Fig. 6 and Fig. 7.There are twelve camera nodes in the environment of floor map 1, and there are fourteen camera nodes in the environment of floor map 2. Here, the following conditions are set in order to examine the effectiveness of these detection methods.i) Camera nodes are arranged on latticed floor, 56m × 56m.ii) View distance of camera is set to 10m in one direction.iii) Identification of a target in the image processing does not fail when re-detecting.iv) Walking speed of the target is constant.v) Only one target is searched.vi) The target moves only forward without going back.In the case of the floor map 1, the target moves following the order of a 1 , a 2 , a 3 , a 4 , a 5 , a 6 , a 7 , a 8 , a 9 , a 10 , a 11 , a 12 and a 1 .In the case of the floor map 2, the target moves following the order of a 1 , a 2 , a 4 , a 5 , a 7 , a 9 , a 10 , a 11 and a 1 .In the examination, the time that an agent concludes a failure of tracking is same as search cycle time.The search cycle time is defined as the time concluded that an agent can not discover a target.The search cycle time is prepared using 3 patterns 12 seconds, 9 seconds and 6 seconds.Walking speed of the target is prepared using 3 patterns 1.5m/s, 2m/s and 3m/s.And search of target is prepared that an agent loses a target at a 7 and the agent starts a search in the situation that the target has already moved to a 8 .Furthermore, Stationary net detection method is examined by 3 patterns n = 2, n = 3 and n = 4, because of confirming effectiveness by number of non-camera nodes.On each floor map, using 12 patterns of such combination by each walking speed, discovery time and the number of agents are measured.Generally, the walking speed of a person is around 2.5m/s, and the two types of walking speed, 2m/s and 3m/s, used by the target which was examined are almost equivalent to the walking speed of general person.And walking speed, 1.5m/s, is very slow from the walking speed of general person.
The results of the measurement on the floor map 1 are shown in Table 1, Table 2 and Table 3.The results of the measurement on the floor map 2 are shown in Table 4, Table 5 and Table 6.They are a mean value of 5 measurements.
The result of the Ripple detection method shows that the discovery time becomes shorter and usual tracking can resume more quickly, if the target exists near where the agent lost.But, if the walking speed of a target is faster, the agent will become difficult to discover the target.
The result of the Stationary net detection method shows that the agent can discover a target if coefficient n has larger value, even if the walking speed of a target is faster.And it is not enough interval to re-detect a target if n consists of n ≤ 3 and it is not enough time to redetect the target if the search cycle time is shorter.
From the result of measurement on the floor map 1, if the Stationary net detection method uses coefficient n = 4, there is not the difference of efficiency between the Ripple detection method and the Stationary net detection method.However, from the result of measurement On the whole, the result of both methods shows that a number of agents decreases by searching a target near search cycle but the agents can not search the target if the search cycle time is longer than the waking speed.In addition based on the results, when the walking speed is faster, the discovery time is shortened or equal and the number of agents decreases or is equal.

Experiment by real environment
Upon verification of real system, it aimed to confirm whether targets can be tracked continuously by tracking method.Therefore, in order to reduce influence by image processing performance, the image processing adopts simple processing by color vision.In this experiment, there is a miniature environment of 2m by 2m shown in Fig. 8 and Fig. 9.
Here are three targets and those targets have each moving routes, red, blue, and green, shown in Fig. 8.The environment consists of seven servers with USB camera.The toys are used as targets instead of people tracked.This toy is a train toy with a sensor that recognizes a black line.And the train runs on along the line.The black line is used as a route a target walks and is drawn by hand.The toy is covered with a color paper to keep a certain accuracy of image processing as shown in Fig. 9.The moving speed of target is 6.8m/second.This environment has simulated a floor of 60m by 60m on the scale of 1/30; therefore the A solid line rectangle encloses the part where a red target was discovered, the rectangle of the short dashed line encloses the part where a blue target was discovered, and a dotted line rectangle encloses the part where a green target was discovered.The plot becomes a constant pattern if target is captured correctly, because the movement of the target is periodic.However, The illuminance was actually different according to the place that the camera installed, the color was judged the black by the shadow, and the target might not be able to be caught.Therefore, the number of agents did not increase and decrease by a constant pattern.But, the Neighbor node determination algorithm was able to calculate neighbor camera nodes.However, when the target is captured by video camera, the neighbor nodes are recomputed by the algorithm, then it was confirmed that an agent might not be deployed appropriately for a few dozens milliseconds.About this result, it is necessary to improve the computation time.In graphs of Fig. 10, the plot shows the time the target captured.It is confirmed that a target is lost temporarily from the camera capturing the target.But it is confirmed that the human tracking keeps tracking targets continuously by capturing the target by other cameras.

Conclusion
A construction method of automatic human tracking system with mobile agent technology is proposed using neighbor node determination algorithm.The construction method consists of the neighbor node determination algorithm and tracking methods.The neighbor node determination algorithm can compute neighbor nodes.This algorithm can be efficient to compute neighbor node and can make the system robust even if view distance of camera is changed.The tracking methods consist of follower method and detection method.The follower method can identify feature of a target locally.The detection method can search a lost target but a search cycle has to be within walking speed × distance between cameras.The detection method can be efficient to detect a target if the search cycle is near the walking speed.A mobile agent can keep tracking a target by using these detection methods if the agent lost the target.In addition, from the experiment results, the Stationary net detection method can detect a target faster than the Ripple detection method.And the Stationary net detection method can use smaller number of agents than the Ripple detection method.
Because the Ripple detection method searches a target by widening a search gradually but the Stationary net detection method can widen a search efficiently by the Neighbor node determination algorithm.
The effectiveness of proposed tracking method was experimented using simulator and real environment.In the experiment using simulator, the tracking methods are experimented by the walking speed of a target.In the detection methods, consideration is added about the propriety of the parameter n which gives the number of non-camera nodes.Aimed to confirming behavior of the automatic human tracking system, the system in a real environment uses the simple image processing which can identify the color information of a target except the influence by the accuracy of image processing.And the follower method and the detection method are confirmed to be effective by a toy instead of a targeted person and to be able to construct the automatic human tracking system.
We will research more efficient detection to improve the automatic human tracking system.In addition, the accuracy of image processing has to be improved more to track a target more accurately.We are considering to improve our tracking system by combining effective

Fig. 3 .
Fig. 3. Example of influence by change of view distance.
3)Below is the algorithm to determine neighbor nodes: i) Set camera nodes and non-camera nodes on the diagram as shown in object (b) of Fig.4.ii) Transform the diagram to a graph as shown in object (c) of Fig.4.iii) Generate an adjacency matrix X from camera node locations and non-camera node locations on the graph, and generate an adjacency matrix Y from noncamera node locations on the graph.Adjacency matrix X indicates that rows are camera nodes and columns are non-camera nodes.Adjacency matrix Y indicates that rows and columns are non-camera nodes, which results in adjacency matrix Y resolving an overlap problem of view distances between video cameras.iv) Calculate adjacency matrix X' and Y'

Fig. 6 .
Fig. 6.Floor map 1 for experiment of detection methods.on the floor map 2, if a floor map is complicated, the discovery time of the Stationary net detection method becomes shorter than the discovery time of the Ripple detection method and the number of agents of the Stationary net detection method becomes less than the number of agents of the Ripple detection method.

Table 1 .
Detection time on floor map 1 by walking speed 1.5m/s.Fig.7. Floor map 2 for experiment of detection methods.

Table 3 .
Detection time on floor map 1 by walking speed 3m/s.

Table 5 .
Detection time on floor map 2 by walking speed 2m/s.

Table 6 .
Detection time on floor map 2 by walking speed 3m/s