Decentralised Scalable Search for a Hazardous Source in Turbulent Conditions

The problem is autonomous coordinated search by an interconnected group of moving robots for the purpose of finding and localising a source of hazardous emissions (e.g., gas and particles). Dispersion of the emitted substance is assumed to be affected by turbulence, resulting in the absence of concentration gradients. The chapter proposes a search strategy that operates in a completely decentralised manner, as long as the communication network of the moving robots forms a connected graph. By decentralised operation, we mean that each moving robot is reasoning (i.e., estimating the source location and making decisions on robot motion) locally. Coordination of the group is achieved by consensus via communication with the neighbours only, in a manner which does not require global knowledge of the communication network topology.


Introduction
Searching strategies for finding targets using appropriate sensing modalities are of great importance in many aspects of life. In the context of national security, there could be a need to find a source of hazardous emissions [1][2][3]. Similarly, rescue and recovery missions may be tasked with localising a lost piece of equipment that is emitting weak signals [4]. Biological applications include, for example, protein searching for its specific target site on DNA [5], or foraging behaviour of animals in their search for food or a mate [6,7]. The objective of search research [8] is to develop optimal strategies for localising a target in the shortest time (on average), for a given search volume and sensing characteristics.
The use of autonomous vehicles in dangerous missions, such as finding a source of hazardous emissions, has become widespread [9][10][11]. Existing approaches to the search and localisation in the context of atmospheric releases can be loosely divided into three categories: up-flow motion methods, concentration gradient-based methods and information gain-based methods, also known as infotaxis. Both the up-flow motion methods and the concentration gradient methods are simple, in the sense that they require only a limited level of spatial perception [12]. Their limitations manifest in the presence of turbulent flows, due to the absence of concentration gradients, when the plume typically consists of time-varying disconnected patches. The information gain-based methods [13] have been developed specifically for searching in turbulent flows. In the absence of a smooth distribution of concentration (e.g., due to turbulence), this strategy directs the searching robot(s) towards the highest information gain. As a theoretically principled approach, where the source-parameter estimation is carried out in the Bayesian framework and the searching platform motion control is based on the information-theoretic principles, the infotaxic (or cognitive) search strategies have attracted a great deal of interest [3,[14][15][16][17][18][19][20][21][22][23].
This chapter summarizes our recent results in development of an autonomous infotaxic coordinated search strategy for a group of robots, searching for an emitting hazardous source in open terrain under turbulent conditions. The assumption is that the search platforms can move and sense. Two types of sensor measurements are collected sequentially: (a) the concentration of the hazardous substance; (b) the platform location within the search domain. Due to the turbulent transport of the emitted substance, the concentration measurements are typically sporadic and fluctuating. The searching platforms form a moving sensor network, thus enabling the exchange of data and a cooperative behaviour. The multi-robot infotaxis have already been studied in [16,17,20,24]. However, all mentioned references assumed all-to-all (i.e., fully connected) communication network with centralised fusion and control of the searching group.
We develop an approach where the group of searching robots operate in a fully decentralised coordinated manner. Decentralised operation means that each searching robot performs the computations (i.e., source estimation and path planning) locally and independently of other platforms. Having a common task, however the robotic platforms must perform in a coordinated manner. This coordination is achieved by exchanging the data with immediate neighbours only, in a manner which does not require the global knowledge of the communication network topology. For this reason, the proposed approach is scalable in the sense that the complexities for sensing, communication, and computing per sensor platform are independent of the sensor network size. In addition, because all sensor platforms are treated equally (no leader-follower hierarchy), this approach is robust to the failure of any of the searching agents. The only requirement for avoiding the break-up of the searching formation is that the communication graph of the sensor network remains connected at all times. Source-parameter estimation is carried out sequentially, and on each platform independently, using a Rao-Blackwellised particle filter. Platform path planning, in the spirit of infotaxis, is based on entropyreduction and is also carried out independently on every platform.

Mathematical models
First, we describe the measurement model. The concentration measurements are modelled using a Lagrange encounters model developed in [13], based on an open field assumption and a two-dimensional geometry. Let ith robotic vehicle position (i ¼ 1, 2, …, N) at time t k be denoted by r i k ∈ R 2 . Suppose that the emitting source is located at coordinates specified by the vector r 0 ¼ X 0 ; Y 0 ½ ⊺ and its release rate, or strength, is Q 0 . The goal of the search is to detect and estimate the sourceparameter vector η 0 ¼ r ⊺ 0 Q 0 Â Ã ⊺ in the shortest possible time. The particles released from the source propagate with combined molecular and turbulent isotropic diffusivity D, but can also be advected by wind. The released particles have an average lifetime τ before being absorbed. Let the average wind characteristics be the speed U and direction, which by convention, coincides with the direction of the x axis. Suppose a spherical concentration measuring sensor of small radius a is mounted on the ith robot, whose position at time k is 1 r i k ¼ x i k ; y i k Â Ã ⊺ . This sensor will experience a series of encounters with the particles released from the emitting source. The average rate of encounters can be modelled as follows [13]: where D, τ and U are known environmental parameters, is the distance between the source and the ith sensor platform, K 0 is the modified Bessel function of the second kind of order zero, and λ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi The probability that a sensor at location r i k is hit by z ∈ Z þ ∪ 0 f g dispersed particles (where z is a non-negative integer) during a time interval t 0 is Poisson distributed, i.e., (2) is the mean number of particles expected to reach the sensor at location r i k during interval t 0 . Eq. (2) expressed the likelihood function of a concentration measurement z i k collected by ith sensor, i.e., ℓ z i k jη 0 The motion model of a coordinated group of robots is described next. Let the pose vector of the ith robot platform at time t k be denoted Ã ⊺ has already been introduced and ϕ i k is the vehicle heading. The group of searching vehicles moves in a formation. The centroid of the formation at time t k is specified by coordinates: For each platform i ¼ 1, …, N, the offset Δx i ; Δy i À Á from the centroid x c k ; y c k À Á is predefined and known to it (i.e., The measurements of concentration are taken at time instants t k , k ¼ 1, 2, ⋯. Between two consecutive sensing instants, each platform is moving. Let the duration of this interval (referred to as the travel time) for the ith platform be T i k ≥ 0. The assumption is that sensing is suppressed during the travel time.
Motion of the ith platform during interval T i k is controlled by linear velocity V i k and angular velocity Ω i k . Given that the motion control vector applied to the ith platform, its dynamics during a short integration time interval δ ≪ T i k can be modelled by a Markov process whose transitional density is The process noise covariance matrix Q captures the uncertainty in motion due to the unforeseen disturbances. The vehicle motion function β θ i tÀδ ; u i k À Á is: tion of the formation due to process noise with parameters: Here, x i kÀ1 and y i kÀ1 are the estimates of the coordinates of the formation centroid at k À 1 (that is of x c kÀ1 and y c kÀ1 , respectively) available to the ith platform. Coordinates x i kÀ1 and y i kÀ1 refer to the known ith vehicle position at k À 1. Figure 1 illustrates the trajectories of N ¼ 7 autonomous vehicles in a formation using the described transitional density π θ i t jθ i tÀδ ; u i k À Á . In the absence of process noise (i.e., Q ¼ 0), the vehicles would move in a perfect formation if (a) all control vectors are identical (i.e., u 1 In this case, each platform would know the true coordinates of the formation centroid (i.e., , and hence the correction vectors B i kÀ1 would be zero. A robotic platform can communicate with another platform of the formation, if their mutual distance is smaller than a certain range R max . Because of process noise in motion, the distance between the vehicles in the formation will vary and consequently the topology of the communication network graph may also vary. For simplicity, we will assume that communication links (when established) are error free. Figure 1 illustrates the communication graphs of a formation consisting of N ¼ 7 searching platforms at two consecutive time instants.

Decentralised sequential estimation
Estimation and robot motion control are carried out using the measurement dissemination-based decentralised fusion architecture [25]. Measurement locations 2 and the corresponding measured concentration values, i.e., the triple x i k ; y i k ; z i k À Á , are exchanged via the communication network. The protocol is iterative. In the first iteration, platform i broadcasts its triple to its neighbours and receives from them their measurement triples. In the second, third and all subsequent iterations, platform i broadcasts its newly acquired triples to the neighbours, and accepts from them only the triples that this platform has not seen before (newly acquired). Providing that the communication graph is connected, after a sufficient number of iterations (which depends on the topology of the graph), a complete list of measurement triples from all platforms in the formation, denoted will be available at each platform. Suppose the posterior density function of the source at discrete-time k À 1 and platform i be denoted ð Þand d k , the problem of sequential estimation is to compute the posterior at time k, i.e., p i η 0 jd 1:k ð Þ. Using the Bayes rule, the posterior is where g d k jη 0 ð Þis the likelihood function. Assuming that individual platform measurements are conditionally independent, g d k j η 0 ð Þcan be expressed as is independent of Q 0 . The posterior density p i η 0 jd 1:k ð Þis computed using the Rao-Blackwell dimension reduction scheme [26]. Using the chain rule, the posterior can be expressed as: where the posterior of source strength p i Q 0 jr 0 ; d 1:k ð Þwill be worked out analytically, while the posterior of source position p i r 0 jd 1:k ð Þwill be computed using a particle filter. Following [27], we express the posterior p i Q 0 jr 0 ; d 1:kÀ1 ð Þ with the Gamma distribution whose shape and scale parameters are κ kÀ1 and ϑ kÀ1 , respectively. That is 2 Because the measurement locations are assumed to be known exactly, they will not be treated as random variables.
Since the conjugate prior of the Poisson distribution is the Gamma distribution [28], the posterior p Q 0 jr 0 ; d 1:k ð Þis also a Gamma distribution with updated parameters κ k and ϑ k , i.e., p Q 0 jr 0 ; d 1:k ð Þ¼G Q 0 ; κ k ; ϑ k ð Þ . The computation of κ k and ϑ k can be carried out analytically as a function of r 0 and the measurement set The parameters of the prior for source strength, p Q 0 ð Þ ¼ G κ 0 ; ϑ 0 ð Þare chosen so that this density covers a large span of possible values of Q 0 .
Next, we turn our attention to the posterior of source position p i r 0 jd 1:k ð Þin the factorised form (8). Given p r 0 jd 1:kÀ1 ð Þ , the update step of the particle filter using d k applies the Bayes rule: where f d k jd 1:kÀ1 ð Þ¼ Ð g d k jr 0 ; d 1:kÀ1 ð Þ p r 0 jd 1:kÀ1 ð Þ dr 0 is a normalisation constant. The problem in using (11) is that the likelihood function g d k jr 0 ; d 1:kÀ1 ð Þis unknown; only g d k j η 0 ð Þof (6) is known. Fortunately, it is possible to derive an analytic expression for g d k jr 0 ; d 1:kÀ1 ð Þ : The Rao-Blackwellised particle filter (RBPF) fully describes the posterior p i η 0 jd 1:k ð Þby a particle system Þusing the RBPF is carried out by a recursive update of the particle system S i k over time.

Decentralised formation control
In decentralised multi-robot search, each platform autonomously makes a decision at time t kÀ1 about its next control vector u i k , as described in Section 4.1. However, in order to maintain the geometric shape of the formation and thus avoid its break-up, there is a need to impose a form of coordination between the platforms. This will be explained in Section 4.2.

Selection of individual control vectors
A robot platform i autonomously decides on the control vector u i k using the infotaxis strategy [13], which can be formulated as a partially observed Markov decision process (POMDP) [29]. The elements of POMDP are (i) the information state, (ii) the set of admissible actions and (iii) the reward function. The information state at time t kÀ1 is the posterior density p i η 0 jd 1:kÀ1 ð Þ ; it accurately specifies the ith platform current knowledge about the source position and its release rate. Admissible actions can be formed with one or multiple steps ahead. A decision in the context of search is the selection of a motion control vector u i k ∈ U which will maximise the reward function. According to Section 2, the space of admissible actions U is continuous with dimensions: linear velocity V, angular velocity Ω and duration of motion T. In order to reduce the computational complexity of numerical optimisation, U is adopted as a discrete set with only myopic (one step ahead) controls. In addition, U is timeinvariant and identical for all platforms. If V, O and T denote the sets of possible discrete-values of V, Ω and T, respectively, then U is the Cartesian product V Â O Â T . The myopic selection of the control vector at time t k on platform i is expressed as: where D is the reward function and z i k is the future concentration measurement collected by the ith platform if the platform moved under the control v ∈ U to position x i k ; y i k À Á . In reality, this future measurement is not available (the decision has to be made at time t kÀ1 ), and therefore the expectation operator E with respect to the prior measurement PDF features in (13).
Previous studies of search strategies [3,20] found that the reward function defined as the reduction of entropy, results in the most efficient search. Hence, we adopt the expected reward defined as where H kÀ1 is the current differential entropy, defined as while H i k z i k v ð Þ À Á ≤ H kÀ1 is the future differential entropy (after a hypothetical control vector v has been applied to collect z i k ): (14) is with respect to the probability mass function P z i k jd 1:kÀ1 Given that p i η 0 jd 1:kÀ1 ð Þis approximated by a particle system S i kÀ1 , one can approximately compute H i kÀ1 , which features in (14), as In order to compute E H i k z i k v ð Þ À Á È É of (17), first note that P z i k jd 1:kÀ1 , whereμ i kÀ1 is the predicted mean rate of chemical particle encounters at location r i k (where the platform i would move after applying a hypothetical control v), computed based on d 1:kÀ1 . According to Section 2, where the product κ i kÀ1 ϑ m, i kÀ1 approximates the source release rate as the mean of the Gamma distribution with parameters κ i kÀ1 ; ϑ m, i kÀ1 À Á . Next, we find the value of z max as the minimum value of z 0 such that the cumulative distribution function ∑ z 0 z¼0 P z;μ i kÀ1 À Á is greater than a certain threshold 1 À ε, where ε ≪ 1. The summation (17) is then computed only for z i k ¼ 0, 1, ⋯, z max . Computation of H k z i k v ð Þ À Á is carried out according to (18), except that w m, i kÀ1 is replaced with Thus, (17) is approximated with Pseudo-code of the routine for the computation of control vector on platform i is given by Algorithm 1. Computeμ i kÀ1 using (19) 7: Determine Calculate the expected reward ℛ i using (14) 10: end for 11: Find u i k using (13) 12: Output: u i k

Cooperative control through consensus
So far, we have explained how platform i would independent of the other platforms in the formation determine the best action for itself, i.e., u i k . In general, individual platforms will disagree on the best action, and in the extreme In order to maintain the shape of the formation during the motion period (from time t kÀ1 to t k ), the platforms need to reach an agreement on the common action u k , to be applied to all platforms at the same time. But this is not sufficient; according to the motion model in Section 2, the platforms also need to agree on the formation centroid coordinates and the common heading angle ϕ kÀ1 to be applied in (4).
We apply decentralised cooperative control based on the average consensus [30,31]. In a network of collaborating agents, consensus is an iterative protocol designed to reach an agreement regarding a certain quantity of interest. Suppose that every platform, as a node in the communication network, initially has an individual scalar value. The goal of average consensus is for every node in the network to compute the average of initial scalar values, in a completely decentralised manner: by communicating only with the neighbours in the communication graph (without knowing the topology of the communication graph).
In the problem we consider, there is not only a single individual scalar value, but six of them. They include three motion control parameters, i.e., for platform i, V i k , Ω i k and T i k , two formation centroid coordinates, i.e., x i kÀ1 , y i kÀ1 and the heading angle of each platform ϕ i kÀ1 . Let us denote the scalar value of interest by b i , that is Ideally, we want every platform in the formation to compute the mean value If all platforms in the formation were to use identical average values for motion control, centroid coordinates and heading, then their motion would be coordinated (except for process noise, which will be taken care of through vector B i kÀ1 in (4)) and the shape of the formation would be maintained (provided R max is adequate).
Average consensus is an iterative algorithm. At iteration s ¼ 0, the node in the communication graph (the robot platform) will initialise its state b i 0 ð Þ using either a component of vector u i k (if b i is a motion control parameter) or the platform pose θ i kÀ1 (if b i is a formation centroid coordinate or heading angle). This value is locally available. The initial values of centroid coordinates are the actual ith platform coordinates, i.e., x i kÀ1 0 ð Þ ¼ x i kÀ1 and y i kÀ1 0 ð Þ ¼ y i kÀ1 . At each following iteration s ¼ 1, 2, ⋯, each platform updates its state with a linear combination of its own state and the states of its current neighbours. Let us denote the set of current neighbours of platform i by J i . Then [30]: where |J i | is the number of neighbours of platform i. This particular linear combination is based on the so-called maximum degree weights [32]. Other weights can be also used. It can be shown that if the communication graph is connected, the values b i s ð Þ after many iterations converge to the mean b [32]. The search continues until the global stopping criterion is satisfied. The local stopping criterion is calculated on each platform independently based on the spread of the local positional particles r m, i 0, k n o , measured by the square-root of the trace of its sample covariance matrix C k . For example, if the spread of particles on platform i is smaller than a certain threshold ϖ, then the local stopping criterion is satisfied and is given a value of one, otherwise it is zero. This local stopping criterion value (zero or one) becomes the initial state of the global stopping criterion on platform i, denoted σ i 0 ð Þ: The global stopping criterion is computed on each platform using the average consensus algorithm, using (21), but with b i replaced by σ i . After a sufficient number of iterations, S, platform i decides to stop the search if at least one of the platforms in the formation has reached the local stopping criterion, that is, if We point out that both estimation and control are based on the consensus algorithm. While the cooperative control is using the average consensus (21), the decentralised measurement dissemination of Section 3 achieves the consensus on the set of measurements at time k. The consensus algorithm is iterative, and hence its convergence properties are very important. First note that, although the network topology changes with time (as the robots move while searching for the source), during the short interval of time when the exchange of information takes place, the topology can be considered as time-invariant. Furthermore, assuming bidirectional communication between the robots in formation, the network topology can be represented by an undirected graph. The convergence of the consensus algorithm for a time-invariant undirected communication topology is guaranteed if the graph is connected [31][32][33]. Note that this theoretical result is valid for an infinite number of iterations. In practice, if the communication graph at some point of time is not connected, or if an insufficient number of consensus iterations are performed, it may happen that one or more robots are lost (they could re-join the formation only by coincidence). This event, however, does not mean that the search mission has failed: the emitting source will be found eventually, albeit by a smaller formation in possibly longer interval of time.

Numerical results
The proposed search algorithm has been applied to an experimental dataset, collected by COANDA Research & Development Corporation using their large recirculating water channel. The emitting source was releasing fluorescent dye at a constant rate from a narrow tube. The dataset comprises a sequence of 340 frames of instantaneous concentration field measurements in the vertical plane and is sampled at every 10/23 s. The size of a frame is 49 Â 49 pixels, where a pixel corresponds to a square area of 2:935 Â 2:935 mm 2 . As the size of the data is relatively small, we follow the approach used in [24]: upscale each frame by a factor of 3 using bicubic interpolation and place the result in the top left corner of a 500 Â 500 search area. A measurement obtained by a platform is, thus, the integer value of the concentration of the dye taken from the closest spatial and temporal sample from the experimental data.
An example of the search algorithm running on the experimental data is shown in Figure 2. All physical quantities are in arbitrary units (a.u.). The following environmental/sensing parameters were used: D ¼ 1, τ ¼ 250, U ¼ 0, a ¼ 1 and t 0 ¼ 1. Algorithm parameters are selected as follows: κ 0 ¼ 3, ϑ 0 ¼ 5:2, number of particles M ¼ 25 2 , V ¼ 1 f g, O ¼ À3; À2; À1; 0; 1; 2; 3 f g degrees per unit of time and T ¼ 0:5; 1; 2; 4; 8; 16; 32; 64 f g . The number of iterations, both for the exchange of measurement triples and in the consensus algorithm, was fixed to 30. The local search stopping threshold was ϖ ¼ 3. Figure 2 displays the top-down view of the search progress at step indices k ¼ 0; 12; 22; 32. The formation consists of N ¼ 7 platforms, whose trajectories are shown in different colours. The search algorithm terminated at K ¼ 33. Note that the plume size is much smaller than the search area. Panels (a)-(c) of Figure 2 show the particles before resampling: the particles are placed on a regular grid, thus mimicking a grid-based approach, with the value of particle weights indicated by the grey-scale intensity plot (white means a zero weight). This provides a good visual representation of the posterior p r 0 jd 1:k ð Þ. Panel (d) shows the situation after a non-zero concentration measurement was collected by the search team. The positional particles have been resampled at this point of time and moved closer to the true source location.
Using 200 Monte Carlo simulations, the mean search time for the algorithm was 2525 a.u., with a 5th and 95th quantile of 1840 and 3445 a.u., respectively. Note that in all simulations the formation started from the bottom right hand corner indicated in Figure 2

Conclusions
The chapter presented a decentralised infotaxic search algorithm for a group of autonomous robotic platforms. The algorithm allows the platforms to search and locate a source of hazardous emissions in a coordinated manner without the need for a centralised fusion and control system. More precisely, this distributed coordination is achieved only by local exchange of measurement data between neighbouring platforms. Similarly, the movement decisions taken by the platforms were reached using a distributed average consensus algorithm over the whole formation. The key aspect is that individual platforms only require knowledge of their neighbours; the global knowledge of the communication network topology is unnecessary. An advantage of adopted distributed framework is that all platforms are treated equally, making the proposed search algorithm scalable and robust to the failure of a single platform. Numerical results using experimental data confirmed the robust performance of the algorithm. The main limitation of the algorithm is that the environmental parameters (such as diffusivity, the average direction and speed of the wind, particle lifetime), must be known. Future work will explore sensitivity to parametrisation and will aim to develop a team of "search and rescue" robots for further experimentation in realistic environments.