Human Sensing in Crowd Using Laser Scanners

Human sensing is a critical technology to achieve surveillance systems, smart interfaces, and context-aware services. Although various vision-based methods have been proposed (Aggarwal & Cai, 1999)(Gavrila, 1999)(Yilmaz et al., 2006), tracking humans in a crowd is still extremely difficult. Figure 1 shows a snapshot of a railway station during rush hour, which is one of the hardest examples of human sensing.


Introduction
Human sensing is a critical technology to achieve surveillance systems, smart interfaces, and context-aware services. Although various vision-based methods have been proposed (Aggarwal & Cai, 1999) (Gavrila, 1999) (Yilmaz et al., 2006), tracking humans in a crowd is still extremely difficult. Figure 1 shows a snapshot of a railway station during rush hour, which is one of the hardest examples of human sensing. Suppose a camera is set up diagonally on a low ceiling, like the one in Fig. 1. Significant occlusion tends to occur in crowded places because pedestrians severely overlap each other. As a result, sufficiently high sensing performance cannot be achieved. On the other hand, if a camera is positioned to look straight down in order to reduce occlusions, the viewing angle is limited. Furthermore, covering large areas using multiple cameras is difficult due to the computational cost of data integration. These problems cannot be solved even by using fisheye cameras or omni-directional cameras to expand the viewing angle. In addition, there are some cases in which cameras cannot be installed due to privacy concerns.

Fig. 1. Snapshot of a railway station during rush hour
In this chapter, we propose a method to tackle these problems using laser scanners for human sensing in crowds. We especially focus on human tracking and gait analysis techniques. Our proposed method is well-suited to privacy protection because it does not use images but only range data. Moreover, because of the simple data structure of the laser scanner, we can easily integrate data even as the number of sensors increases, and real-time processing can be performed even when multiple sensors are used. Therefore, our method is especially suitable for crowd sensing in large public spaces such as railway stations, airports, museums, and other such facilities. That is to say, the above issues can be solved with our approach. We tested the proposed method in a crowded railway station in Tokyo in order to evaluate its effectiveness.
This chapter is structured as follows. Section 2 reviews existing research on human sensing. Section 3 proposes a method of tracking people in crowds using multiple laser scanners. Section 4 describes the gait analysis of tracked people. Section 5 presents a performance evaluation in a crowded station.

Review
In this section, we briefly review the existing research on human sensing. The approaches are roughly classified into three types: vision-based, laser-based, and sensor fusion.

Vision-based approach
The first type is the vision-based approach using video cameras.
Much research has been done using this approach, although the number of people targeted for tracking has been relatively small. A well-known human detector was proposed in (Dalal et al., 2005). It used histograms of oriented gradients (HOG) descriptors with a support vector machine (SVM). Felzenszwalb et al. extended this detector based on the deformable part-based model (Felzenszwalb et al., 2009). For tracking targets, mean shift trackers are widely used (Comaniciu et al., 2000).
In order to extend such approaches to track multiple targets in a crowd, we have to handle significant occlusion of each object. A typical solution is to utilize data association such as a Kalman filter, particle filter, or Markov chain Monte Carlo (MCMC) data association approach. Okuma et al. proposed a boosted particle filter that can track multiple targets by combining the AdaBoost detector and a particle filter (Okuma et al., 2004). Zhao et al. proposed a principled Bayesian framework that integrates a human appearance model, a background appearance model, and a camera model (Zhao et al., 2008). The optimal solution is inferred by an MCMC-based approach. The result shows that up to 33 people are tracked in a complex scene. Kratz et al. modeled spatial-temporal motion patterns of crowds by using a hidden Markov model (HMM), and tracked individuals in extremely crowded scenes (Kratz & Nishino, 2010).

Laser-based approach
The second approach is based on lasers. Fod et al. proposed laser-based tracking using multiple laser scanners (Fod et al., 2002). Their system measures a human's body, and tracks it by using a Kalman filter. More practical approaches for tracking people in crowds have been proposed by (Zhao & Shibasaki, 2005) (Nakamura et al., 2006). They measure pedestrians' feet to reduce occlusion, and track individual pedestrians in crowds by recognizing their walking patterns. Experimental results showed that 150 people were simultaneously tracked with 80% precision in a railway station. Cui et al. combined laser-based tracking with a Rao-Blackwellized Monte Carlo data association filter (RBMC-DAF) to overcome tracking errors that occur when two closely situated data points are mixed (Cui et al., 2007). Song et al. proposed a unified framework that couples semantic scene learning and tracking (Song, Shao, Zhao, Cui, Shibasaki & Zha, 2010). Their system dynamically learns semantic scene structures, and utilizes the learned model to increase the accuracy of tracking.

Sensor fusion
The third approach involves sensor fusion. Several techniques have been proposed to track multiple people by fusing the laser and vision approaches. Nakamura et al. used the mean shift visual tracker to support laser-based tracking (Nakamura et al., 2005). Cui et al. extended this approach by combining it with decision-level Bayesian fusion (Cui et al., 2008). Song et al. proposed a system of joint tracking and learning, which trains classifiers that separate targets who are in close proximity (Song, Zhao, Cui, Shao, Shibasaki & Zha, 2010). The trained visual classifiers are used to assist in laser-based tracking. Katabira et al. proposed an advanced air-conditioning system that combines laser scanners and wireless sensor networks (Katabira et al., 2006). The area that should be ventilated is determined by the humans' positions in the room and the temperature distribution.

Focus of this chapter
This chapter introduces a method of laser-based tracking and gait detection, which first emerged as a practical technique for sensing people in crowds of more than a hundred people.

Human sensing using laser scanner
We use a laser scanner called SICK LMS-200. This sensor measures distance by using the time of flight (ToF) of laser light, and can perform wide-area measurements (up to a distance of 30 m). In addition, because the dispersion of the laser beam is minimal, the resolution is high; the finest angular resolution is 0.25°. The wavelength of the laser light is 905 nm (near-infrared region), and it is a class 1A laser that is safe for people's eyes. The sampling frequency varies depending on the settings; it was 37.5 Hz in our case.

Fig. 2. Snapshot of human sensing using a laser scanner

Fig. 3. Example of range data obtained with laser scanner
In the proposed method, a horizontal plane about 16 cm above the floor is scanned by sensors placed on the floor. As a result, range data for ankles, including both static objects and moving objects, can be obtained. Figure 2 shows a sensing system, and Fig. 3 shows an example of the obtained range data.
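As a rough illustration of how one scan becomes a set of 2D points in a sensor's local coordinate frame, the sketch below converts a list of range readings into (x, y) points. The function name, the start angle, and the rule of dropping out-of-range readings are assumptions made for this example and are not taken from the software used in this chapter; only the 0.25° angular step and the 30 m maximum range come from the text.

```python
import math

def scan_to_points(ranges_m, start_deg=0.0, step_deg=0.25, max_range_m=30.0):
    """Convert one scan (ranges in meters, ordered by beam angle) to local (x, y) points.

    start_deg and the filtering rule are illustrative assumptions.
    """
    points = []
    for i, r in enumerate(ranges_m):
        # Skip readings with no valid return or beyond the assumed maximum range.
        if r <= 0.0 or r > max_range_m:
            continue
        theta = math.radians(start_deg + i * step_deg)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```

Each point produced this way is still expressed relative to the individual sensor, so it has to be transformed into a common frame before data from several sensors can be combined.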

Human sensing using multiple Laser scanners
We performed human sensing by using multiple laser scanners in order to minimize occlusions in wide-area sensing. Assuming that each sensor obtains data at the same horizontal level, the range data from multiple sensors can be integrated using the Helmert transformation, in which (x, y) represents a laser point in the local coordinate system, (u, v) represents the transformed laser point in the global coordinate system, m is a scaling factor, α is a rotation angle, and Δx and Δy are the shifts from the origin. These parameters are estimated by visually matching the rotation and shift of shared static objects (e.g., walls and pillars) measured by each sensor. The interface that performs this operation is built into the software.
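A minimal sketch of a two-dimensional Helmert (similarity) transformation is given below, using the symbols defined above. It assumes the common form u = m(x cos α − y sin α) + Δx and v = m(x sin α + y cos α) + Δy, with α measured counter-clockwise in radians; the sign convention of the rotation and the function name are assumptions, since they are not specified in the text.

```python
import math

def helmert_2d(x, y, m, alpha, dx, dy):
    """Map a local laser point (x, y) to global coordinates (u, v).

    Assumed form: scale m, counter-clockwise rotation alpha (radians),
    then a shift (dx, dy). The rotation sign convention is an assumption.
    """
    c, s = math.cos(alpha), math.sin(alpha)
    u = m * (c * x - s * y) + dx
    v = m * (s * x + c * y) + dy
    return u, v
```

For example, with m = 1, α = 90° and no shift, the local point (1, 0) is mapped to approximately (0, 1). In practice one parameter set per sensor would be estimated from the shared static objects.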
After the integration and synchronization, human tracking is conducted by the algorithms explained in Section 3.2. Second, several laser points striking a foot are clustered in order to extract one foot candidate. In this study, a group of points within a radius of 15 cm is clustered as a foot candidate. In practice, due to errors in the sensor calibration, there are cases in which the foot of one human is not entirely within a cluster, or the feet of several humans are within the same cluster. However, such false positives or false negatives can be reduced during the subsequent stages, and there is no significant impact on the tracking process.

Tracking flow
Third, the existing trajectories are extended to the current frame by using the Kalman filter. The details of this process are described in Sections 3.2.2 to 3.2.4. By using a dynamic model of humans, the best foot candidate is integrated into each trajectory.
Last, if a foot candidate is not integrated into any existing trajectory, a new trajectory is created through the following steps of initial detection, and the initial state is set for the Kalman filter.
1. Grouping: When two foot candidates not belonging to any existing trajectory are within 50 cm of each other, they are grouped to create a human candidate. When there is a crowd, several human candidates can be created. Invalid human candidates are eliminated using the following seeding process.
2. Seeding: In consecutive frames, the candidates that satisfy the following two conditions are taken to represent the same human, and the connected centers of gravity of the two moving foot candidates represent a new trajectory.
   (a) At least one foot overlaps for a human candidate in consecutive frames (three or more frames).
   (b) The motion vector created by the other foot, which does not overlap, changes smoothly.
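The clustering and grouping steps can be sketched as follows. This is only an illustration under stated assumptions: a simple greedy centroid clustering is substituted for whatever clustering procedure was actually used, and the function names are hypothetical. The 15 cm clustering radius and the 50 cm grouping distance are the values given in the text.

```python
import math
from itertools import combinations

FOOT_RADIUS_M = 0.15  # clustering radius for one foot candidate (from the text)
PAIR_DIST_M = 0.50    # maximum distance between two feet of one human (from the text)

def centroid(cluster):
    """Center of gravity of a list of (x, y) points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def cluster_feet(points, radius=FOOT_RADIUS_M):
    """Greedy clustering (an assumed stand-in): each point joins the first
    cluster whose current centroid lies within `radius`, else starts a new one."""
    clusters = []
    for p in points:
        for c in clusters:
            cx, cy = centroid(c)
            if math.hypot(p[0] - cx, p[1] - cy) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [centroid(c) for c in clusters]

def group_feet(feet, max_dist=PAIR_DIST_M):
    """Return index pairs of foot candidates close enough to form a human candidate."""
    return [
        (i, j)
        for i, j in combinations(range(len(feet)), 2)
        if math.hypot(feet[i][0] - feet[j][0], feet[i][1] - feet[j][1]) <= max_dist
    ]
```

In a crowd, `group_feet` can return several overlapping pairs for the same foot; these are the invalid human candidates that the seeding conditions (a) and (b) are meant to eliminate over consecutive frames.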

Walking model
When walking, pedestrians make progress by using one foot as an axis and moving the other foot. The two feet change roles alternately as they reach the ground and create a rhythmic walking motion. According to the ballistic walking model (Mochon & McMahon, 1980), muscle power acts when generating speed during the first half of the foot's movement, and the latter half of the foot's movement is passive. Figure 5 represents a simplified model of a pedestrian walking with attention given to the changes in the movement, speed, and acceleration of the feet.
In this research, the movement of the two feet is defined using four phases.Phase 1 is defined as going from a stationary state for both feet through the acceleration of the right foot alone, and to where the two feet are in alignment.Phase 2 is defined as when the right foot then decelerates and reaches the ground.In the same fashion, Phase 3 is defined as when the left foot accelerates, and Phase 4 is defined as when it decelerates.
The values $v_L$ and $v_R$ are the speeds of the left and right feet, respectively; $a_L$ and $a_R$ are their accelerations; and $p_L$ and $p_R$ are their positions. These variables take their values in the observed plane integrated using the process in Section 3.1.2. Here, $f_{L/R}$ represents the acceleration function for the two feet defined in Equation (8), and $v$ represents the unit direction vector. In walking phase 2, the right foot decelerates at a steady rate, and both feet are on the ground. The acceleration acting here has a negative value because of the effect of an external force other than muscle power. This is defined as $|a_R| = -f_R v$. In walking phases 1 and 2, the left foot is virtually stationary, and thus $|v_L| \approx 0$ and $|a_L| \approx 0$. When the right foot is the axis, the acceleration of the left foot in walking phases 3 and 4 is $|a_L| = f_L v$ and $|a_L| = -f_L v$, respectively, while the right foot is virtually stationary, so that $|v_R| \approx 0$ and $|a_R| \approx 0$. In the state in which both feet are on the ground, $|v_{L/R}| \approx 0$ and $|a_{L/R}| \approx 0$.
Table 1. Transitions of state parameters in the walking phases
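The phase-dependent accelerations described above can be summarized in a small lookup function. This is a sketch under assumptions: the accelerations are represented as signed magnitudes along the walking direction, the value for phase 1 is inferred by symmetry with phase 3 (the text states only that the right foot accelerates in phase 1), and the label 0 for the state with both feet on the ground is a hypothetical convention introduced here.

```python
def foot_accelerations(phase, f_left, f_right):
    """Return (a_left, a_right) as signed magnitudes along the walking direction.

    phase 1: right foot accelerates (value assumed by symmetry with phase 3)
    phase 2: right foot decelerates
    phase 3: left foot accelerates
    phase 4: left foot decelerates
    phase 0: both feet on the ground (hypothetical label for this sketch)
    """
    if phase == 0:
        return 0.0, 0.0
    if phase == 1:
        return 0.0, f_right
    if phase == 2:
        return 0.0, -f_right
    if phase == 3:
        return f_left, 0.0
    if phase == 4:
        return -f_left, 0.0
    raise ValueError("phase must be 0, 1, 2, 3 or 4")
```

The stationary foot always receives zero acceleration, which corresponds to the conditions $|v| \approx 0$ and $|a| \approx 0$ for the axis foot.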

Definition of Kalman filter
As was described in Section 3.2.2, the walking model proposed in this chapter has three state parameters: $v_{L/R}$, $a_{L/R}$, and $p_{L/R}$. As shown in Fig. 5, although the position and velocity of each pedestrian vary continuously, the acceleration varies discretely depending on the phase of the foot movement. Thus, the state parameters are divided into two vectors, and the Kalman filter is defined based on the dynamics for moving objects.
Here, $s_{k,n}$ represents the state vector for the positions $p_{L/R}$ and the velocities $v_{L/R}$ of both feet of pedestrian $n$ at measurement time $k$. The vector $u_{k,n}$ represents the state vector for the accelerations $a_{L/R}$, and $\omega$ represents the system noise. The subscripts $x$ and $y$ for each element represent the spatial coordinates.
The transition matrices $\Phi$ and $\Psi$ relate the state vectors $s_{k,n}$ and $u_{k,n}$ in the past frame $k-1$ to the present frame $k$. Here, $\Delta t$ is the interval between observations; in this study, $\Delta t \approx 26$ milliseconds.
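To make the role of the two transition matrices concrete, the sketch below applies the standard constant-acceleration kinematics to one foot: position and velocity are propagated by the term that corresponds to $\Phi$, and the phase-dependent acceleration enters through the term that corresponds to $\Psi$. The exact matrices of this chapter are not reproduced here, so this form, the function name, and the omission of the system noise are assumptions.

```python
DT = 0.026  # observation interval in seconds (about 26 ms, from the text)

def predict_foot(p, v, a, dt=DT):
    """Predict position and velocity of one foot one frame ahead.

    p, v, a are (x, y) tuples. Assumed model (noise omitted):
        p_k = p_{k-1} + v_{k-1} * dt + 0.5 * a_{k-1} * dt**2
        v_k = v_{k-1} + a_{k-1} * dt
    """
    p_new = tuple(pi + vi * dt + 0.5 * ai * dt * dt for pi, vi, ai in zip(p, v, a))
    v_new = tuple(vi + ai * dt for vi, ai in zip(v, a))
    return p_new, v_new
```

Because the acceleration is held fixed within a walking phase and switched between phases, the prediction is piecewise smooth, which matches the description of continuous position and velocity with discretely varying acceleration.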

The acceleration function $f_{L/R}$ is calculated with the equations below using the average step length $S_{L/R}$.
Here, $N$ represents the number of walking phases recognized from frame $j$ to frame $k$. Frame $j$ is determined experimentally. Furthermore, the initial value for the acceleration is calculated based on the amount of movement in the new tracing.
The Kalman filter updates the state vector $s_{k,n}$ using the following equation based on the observed value vector $m_{k,n}$.
where $H$ represents the measurement matrix and $\epsilon$ is the measurement noise.
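The measurement update can be illustrated with a textbook Kalman correction for a single axis of a single foot. The sketch assumes a two-element state [position, velocity], a measurement of position only (so that the measurement matrix is H = [1, 0]), and a scalar measurement noise variance r; these simplifications and the function name are assumptions and do not reproduce the full state vector used in this chapter.

```python
def kalman_update_1d(x, P, z, r):
    """Standard Kalman measurement update for one axis.

    x: [position, velocity] predicted state
    P: 2x2 predicted error covariance (nested lists)
    z: measured position
    r: measurement noise variance
    Assumes H = [1, 0], i.e. only the position is observed.
    """
    innovation = z - x[0]
    s = P[0][0] + r                     # innovation variance H P H^T + r
    k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain K = P H^T / s
    x_new = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P_new = (I - K H) P
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new
```

When no measurement is available, this step is simply skipped and only the prediction is carried forward.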

Tracing trajectories using Kalman Filter
Figure 6 shows the flow diagram of the tracing trajectories using the Kalman Filter.
First, the walking phase is recognized by using the algorithm described in Section 3.2.3, and $u_{k,n}$ is estimated. Then, $\hat{s}_{k,n}$ and $\hat{m}_{k,n}$ are predicted. Next, foot candidates are searched for within a search area $S_{area}$ around the predicted vector $\hat{m}_{k,n}$. If a foot candidate is detected in the search area, it is taken to be the foot $m_{k,n}$ of the pedestrian candidate, and the state vector $s_{k,n}$ is updated. If several foot candidates are found, the one with the smallest Euclidean distance to $\hat{m}_{k,n}$ is taken to be the foot $m_{k,n}$ of the pedestrian candidate. If no foot candidates are found, the target may be occluded, so the search is allowed to continue for only a set period of time $T_{thd}$. In this case, $m_{k,n}$ cannot be obtained, and so only the state vector and the error covariance matrix are predicted. If the set period of time $T_{thd}$ is exceeded, the target is considered lost and the search is canceled. Tracking is performed by repeating this process until all tracings are completed.
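The association logic of this flow can be sketched for one foot and one frame as follows. The circular search area, the use of a missed-frame counter in place of the time period $T_{thd}$, and all names are assumptions made for illustration.

```python
import math

def associate(predicted, candidates, search_radius, missed, max_missed):
    """Pick the measurement for one predicted foot position in one frame.

    predicted:     predicted foot position (x, y)
    candidates:    list of foot candidate positions (x, y)
    search_radius: radius of an assumed circular search area, in meters
    missed:        number of consecutive frames without a measurement so far
    max_missed:    frame-count equivalent of T_thd (an assumption)

    Returns (measurement or None, updated missed count, lost flag).
    """
    in_area = [c for c in candidates if math.dist(predicted, c) <= search_radius]
    if in_area:
        # Several candidates: take the one nearest to the prediction.
        nearest = min(in_area, key=lambda c: math.dist(predicted, c))
        return nearest, 0, False
    # No candidate: possible occlusion, so keep predicting for a limited time.
    missed += 1
    return None, missed, missed > max_missed
```

A tracker would call this once per frame for each foot, apply the measurement update when a measurement is returned, and drop the trajectory when the lost flag becomes true.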

Gait features
Gait refers to the walking style of humans, and is defined by several parameters such as walking speed, stride length, cadence, step width, ratio of stance phase, and swing phase. These parameters are useful not only in clinical applications, but also in research focusing on human identification, gender recognition, and age estimation (Sarkar et al., 2005). Gait detection is actively studied in the field of computer vision. For example, Bobick and Johnson used the action of walking to extract body parameters instead of directly analyzing dynamic gait patterns (Bobick & Johnson, 2001), and BenAbdelkader et al. analyzed the periodicity of the width of an extracted bounding area and then computed the period using autocorrelation (BenAbdelkader et al., 2002).
Stride length and cadence (number of steps per minute) are generally considered to be the most important gait features.This is because these features are easy to measure by visual observation.However, laser scanners can achieve more detailed analyses that have shorter time intervals than those observed visually.In this research, we extracted step length rather than stride length, and cycle time (walking cycle) rather than cadence, as depicted in Fig. 7.

Gait detection by using spatial-temporal clustering of range data
Generally, the movements of pedestrians' feet are periodic. If we put all the laser points into the spatial-temporal domain, we can see that some periodic spiral patterns are generated, as shown in Fig. 8. The cross points of this spiral pattern correspond to the axes of the feet. Therefore, we can detect gait features by using nearby cross points. In this research, we used mean shift clustering (Comaniciu & Meer, 1999) to detect cross points. Mean shift is a well-known algorithm for finding the local maximum of an underlying density function. Here, a Gaussian kernel is used, where $\sigma_s$ and $\sigma_t$ stand for the kernel size in the space and time domains, respectively. We implement the mean shift algorithm with $\sigma_s$ = 0.15 m and $\sigma_t$ = 0.5 s. The detected cross points are indicated by solid circles in Fig. 8. It can be seen that the cross points have been correctly detected. More details of this process can be found in our previous research (Shao et al., 2006).
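A minimal mean shift iteration over spatial-temporal points (x, y, t) is sketched below. It assumes a separable Gaussian kernel, exp(−d_s²/(2σ_s²) − d_t²/(2σ_t²)), where d_s is the spatial distance and d_t the time difference; the exact kernel form, the convergence settings, and the function name are assumptions. The values of σ_s and σ_t are those given in the text.

```python
import math

SIGMA_S = 0.15  # spatial kernel size in meters (from the text)
SIGMA_T = 0.5   # temporal kernel size in seconds (from the text)

def mean_shift_mode(start, points, sigma_s=SIGMA_S, sigma_t=SIGMA_T,
                    tol=1e-6, max_iter=200):
    """Climb from `start` to a local density maximum of (x, y, t) points.

    Uses an assumed separable Gaussian kernel in space and time.
    """
    x, y, t = start
    for _ in range(max_iter):
        w_sum = x_sum = y_sum = t_sum = 0.0
        for px, py, pt in points:
            d_space2 = (px - x) ** 2 + (py - y) ** 2
            d_time2 = (pt - t) ** 2
            w = math.exp(-d_space2 / (2 * sigma_s ** 2) - d_time2 / (2 * sigma_t ** 2))
            w_sum += w
            x_sum += w * px
            y_sum += w * py
            t_sum += w * pt
        if w_sum == 0.0:
            break  # no point carries weight; stay where we are
        new = (x_sum / w_sum, y_sum / w_sum, t_sum / w_sum)
        # Convergence check on the raw shift (mixes meters and seconds; sketch only).
        shift = math.dist((x, y, t), new)
        x, y, t = new
        if shift < tol:
            break
    return x, y, t
```

Running the iteration from every laser point and merging the modes that coincide would give the set of cross-point candidates; that merging step is not shown here.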
Moreover, walking speed can be calculated by $v_n^k = s_n^k / \omega_n^k$. In this research, cross points satisfying $v_n^k \le 0.2$ m/s were eliminated, because a stationary human does not produce a spiral pattern in the spatial-temporal space, which leads to detection errors.
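The speed computation and the elimination rule can be sketched as below. The sketch assumes that $s_n^k$ denotes the step length in meters and $\omega_n^k$ the cycle time in seconds, which is an interpretation based on the gait features used in this chapter rather than a definition given at this point; the function names are hypothetical.

```python
MIN_SPEED_MPS = 0.2  # cross points at or below this speed are eliminated (from the text)

def walking_speed(step_length_m, cycle_time_s):
    """Speed as step length divided by cycle time (assumed meaning of s and omega)."""
    return step_length_m / cycle_time_s

def keep_moving(steps, min_speed=MIN_SPEED_MPS):
    """Keep only (step_length_m, cycle_time_s) pairs whose speed exceeds min_speed."""
    return [
        (s, w)
        for s, w in steps
        if w > 0 and walking_speed(s, w) > min_speed
    ]
```

For instance, a step of 0.6 m taken in 0.5 s gives 1.2 m/s and is kept, whereas a step of 0.05 m in 0.5 s gives 0.1 m/s and is discarded as a stationary person.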

Experimental conditions
We evaluated the effectiveness of the proposed method through an experiment conducted at a railway station in Tokyo, which is used by roughly 250,000 people per day.The station concourse is about 20 meters by 30 meters, and can hold over 150 passengers at a time.
Figure 9 shows a plan view of the concourse and the locations of the sensors. The shaded areas indicate the observation field; the darker the shading, the greater the number of sensors observing the area. Eight laser sensors (#1 through #8) were set up around the most crowded area. Furthermore, in order to evaluate the proposed method under real-world conditions, we set up several cameras to obtain video. Six cameras (#C3 to #C8) were positioned on the ceiling to take video from directly above the concourse, and two video cameras (#C1 and #C2) were positioned to take video diagonally.

Fig. 9. Sensor alignments in a railway station, where #1 to #8 and #C1 to #C8 represent the positions of the laser scanners and video cameras, respectively. The shaded area shows the observation fields.

People tracking in crowd
Figure 10 shows the results of people tracking in crowds during rush hour. The red ellipses are recognized people, and the yellow points are laser points. Although significant occlusions occur, our proposed method can robustly track each pedestrian in the crowd. We found that a maximum of 150 people could be tracked at the same time and that tracking precision exceeded 80% during rush hour. The average pedestrian density at this time was roughly 0.6 people/m².
The proposed method was more effective than vision-based methods for tracking people in crowds in wide open areas. Because it uses only range data, this method also protects privacy, so it can be used for sensing in areas where it is difficult to set up video cameras.

[...] 2 we can see that his cycle time is short and stable, with a mean of 0.33 s and a variance of 20 ms, but his step length varies greatly, from about 0.6 m to 1.5 m, because of the change from fast walking to running. In this experiment, the step length and cycle time of each step were extracted from our walking model and used together with speed to analyze different walking patterns. The results demonstrate that different walking patterns have their own distributions in the step length to cycle time space, and useful information about their behavior can be obtained. More information on activity recognition using gait features can be found in (Nakamura et al., 2007).

[...] two passengers get close to each other within 60 cm). Figure 13 shows the detected average number of train passengers during a day. We can see two peaks during the commuter rushes.

Conclusion
In this chapter, we described a method of human sensing in a crowd using multiple laser scanners. We evaluated the effectiveness of the proposed method for human tracking and gait analysis in a crowd through an experiment conducted at a large railway station. Our proposed method is well-suited to protecting privacy because it does not use images but only range data. Therefore, our method is especially suitable for crowd sensing in public spaces such as railway stations, airports, and museums. We believe that this laser-based method is a necessary complement to vision-based methods and makes a wider range of applications possible.

Figure 4 illustrates the flow diagram of laser-based people tracking.

Fig. 6. Flow diagram of tracing trajectories using the Kalman filter

Fig. 10. Results of people tracking in a crowd during rush hour

Fig. 12. Visualization results of crowd flow in one day. Blue lines are movement from right to left, and yellow lines are from left to right. Red points represent static people, and white points indicate collision avoidance between two people.

Fig. 13. Detected average number of passengers at a railway station in one day.

Table 1 summarizes the transitions of state parameters in the walking phases.